The goal of this exercise was to assess how many shuffles it takes to "sufficiently randomize" a deck of cards. Unfortunately, "sufficiently randomized" is not an easily quantifiable metric. One way it was assessed here was by using the Shannon entropy, which can be described as a measure of "how little information I have about a situation." Mathematically, the Shannon entropy, $S$, is given by \[ S \propto - \sum_{i=1}^{N_{\text{outcomes}}}p_i \log p_i \] where $p_i$ is the probability of outcome $i$. In the classic example of a coin flip, one observes that the Shannon entropy is maximized when the coin is fair, that is, when the probability of heads equals that of tails. With two possible outcomes, $S$ becomes \[ S = -k p_i \log p_i - k(1-p_i)\log (1-p_i), \] where $k$ is a positive constant. To find the value of $p_i$ that maximizes $S$, take the derivative with respect to $p_i$ and set it equal to zero (the positive factor $k$ drops out), \[ \frac{d S}{d p_i} = -p_i \frac{1}{p_i} - \log p_i - (1-p_i)\frac{-1}{1-p_i} + \log (1-p_i) = 0. \] Simplifying, \[ \log \frac{p_i}{1-p_i} = 0 \] \[ p_i = 1 - p_i \] \[ p_i = \frac{1}{2}. \] We can then choose $k$ such that the maximal value of $S$ is $1$, \[ S(p_i=0.5) = 1 \implies k\log{2} = 1 \implies k = \frac{1}{\log 2}, \] so for the case of two outcomes, with the indicated maximal value for $S$ and applying the change of base formula, \[ \boxed{ S = -\sum_{i=1}^{N_{\text{outcomes}}}p_i \log_2 p_i. } \]
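The boxed formula is straightforward to evaluate directly. As a small illustrative sketch (my addition, not part of the original analysis), the following computes $S$ in bits and confirms that the coin-flip entropy peaks at $p = 1/2$:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: S = -sum(p_i * log2(p_i)), with 0*log(0) := 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Coin flip: S is maximized at p = 1/2, where it equals exactly 1 bit.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p}: S = {shannon_entropy([p, 1 - p]):.4f} bits")
```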
In the general case of $N$ possible outcomes, we can solve for the relationship between the probabilities by optimizing $S$ subject to the constraint that the sum of the probabilities is 1, \[ \vec{\nabla} \left[ S(\{p_i\}) + \lambda \left(1-\sum_i^N p_i \right) \right]= \vec{0} \] where $\vec{\nabla}$ indicates \[ \vec{\nabla} = \begin{pmatrix} \frac{\partial}{\partial p_1} \\ \frac{\partial}{\partial p_2} \\ \vdots \\ \frac{\partial}{\partial p_N} \\ \frac{\partial}{\partial \lambda} \end{pmatrix}. \] This results in $N$ expressions of the form, \[ -\log p_i - 1 = \lambda. \] Since the right-hand side is the same $\lambda$ in every expression, we can equate any pair, \[ -\log p_i -1 = -\log p_j -1 \implies p_i=p_j=p \quad \forall \quad i, j \] and the original constraint gives, \[ \sum_i^N p_i = p \sum_i^N 1 = pN = 1 \implies p = 1/N. \] The Shannon entropy is visualized in Figure 1, wherein contours of constant $S$ in a three-outcome system are shaded. In this figure, $p_3$ is implicit upon specification of $p_1$ and $p_2$.
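This result can be checked numerically. The sketch below (my addition, assuming nothing beyond the entropy formula above) compares randomly generated PMFs over $N = 3$ outcomes against the uniform PMF, whose entropy is $\log_2 N$:

```python
import math
import random

def shannon_entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

random.seed(0)
N = 3
uniform = [1 / N] * N
S_max = shannon_entropy(uniform)  # equals log2(3), about 1.585 bits

# No randomly generated PMF should exceed the uniform PMF's entropy.
for _ in range(1000):
    weights = [random.random() for _ in range(N)]
    pmf = [w / sum(weights) for w in weights]
    assert shannon_entropy(pmf) <= S_max + 1e-12

print(f"S_max = {S_max:.4f} bits, log2(N) = {math.log2(N):.4f}")
```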
In the context of shuffling a deck of cards, $S$ was used to assess the probability mass function (PMF) of the location of a card within the deck after $x$ shuffles. Say, for instance, we know a certain card is on the bottom of the deck. When we shuffle the deck once, we can usually make a decent guess as to where the card is in the new arrangement of cards. That is, the probability that the card is in one region, say the bottom 10-20 cards, is greater than the probability that it suddenly jumped to the top of the deck. This non-uniform PMF is precisely what is captured by $S$.
As we continue to shuffle, we intuitively "lose track" of where a particular card went, and would be equally likely to find it at any location within the deck. Figure 2 illustrates this loss of information for a card initially known to be located in the middle of the deck, position 50, after shuffling with a side shuffle the indicated number of times. After the first shuffle, the PMF contains two peaks corresponding to the top and bottom of the deck, according to whether the deck was cut just before or just after the 50th card, respectively, before beginning the side shuffle. Note: in frequency trail plots, each curve is offset by a set amount to give an impression of the progression (or in this case regression) of a distribution with respect to some dependent variable. This means that the absolute "y"-values for two curves with different shading ought to be compared with caution. Figures 3 through 6 show much the same trend, with a marked decay in information on the whereabouts of the initial card.
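This decay of information can be reproduced in simulation. The sketch below is my own illustration, not the actual shuffling code behind the figures: it substitutes a standard Gilbert-Shannon-Reeds riffle for the side shuffle, assumes a 52-card deck (so it tracks the middle card at index 25 rather than position 50), and estimates the PMF of the tracked card's position by Monte Carlo, reporting its entropy after 1, 3, and 7 shuffles:

```python
import math
import random
from collections import Counter

random.seed(1)

def gsr_riffle(deck):
    """One Gilbert-Shannon-Reeds riffle: binomial(n, 1/2) cut, then drop
    cards with probability proportional to the remaining pile sizes."""
    n = len(deck)
    k = sum(random.random() < 0.5 for _ in range(n))
    top, bottom = deck[:k], deck[k:]
    out, i, j = [], 0, 0
    while i < len(top) or j < len(bottom):
        a, b = len(top) - i, len(bottom) - j
        if random.random() < a / (a + b):
            out.append(top[i]); i += 1
        else:
            out.append(bottom[j]); j += 1
    return out

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

n, tracked, trials = 52, 25, 5000  # track the middle card of a 52-card deck
results = {}
for shuffles in (1, 3, 7):
    counts = Counter()
    for _ in range(trials):
        deck = list(range(n))
        for _ in range(shuffles):
            deck = gsr_riffle(deck)
        counts[deck.index(tracked)] += 1
    pmf = [counts[pos] / trials for pos in range(n)]
    results[shuffles] = entropy_bits(pmf)
    print(f"{shuffles} shuffles: S = {results[shuffles]:.3f} bits")
# A fully randomized deck would give S = log2(52), about 5.70 bits.
```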
Additionally, the PMF of how many cards ended up in the top pile after cutting the deck in two was compared to the established Gilbert-Shannon-Reeds model (Figure 7). This model predicts the probability of having $k$ cards in the top pile after cutting an $n$-card deck to be, \[ p(k) = \binom{n}{k} 2^{-n}. \] Curiously, my data suggest a significantly more sharply peaked distribution. To assess the confidence in the distribution of my data, my next step will be to perform bootstrap sampling and obtain error bars for each $p(k)$.
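The planned bootstrap can be sketched as follows. This is my own outline, not the analysis itself: in place of the real measurements it draws stand-in cut sizes from the GSR model (so here the empirical and model PMFs should agree), then resamples them with replacement to obtain a standard error for each $p(k)$:

```python
import math
import random

random.seed(2)
n = 52

def gsr_cut_pmf(n):
    """GSR model for the cut: p(k) = C(n, k) * 2**-n."""
    return [math.comb(n, k) / 2 ** n for k in range(n + 1)]

def empirical_pmf(samples, n):
    counts = [0] * (n + 1)
    for k in samples:
        counts[k] += 1
    return [c / len(samples) for c in counts]

# Stand-in data: cut sizes drawn from the model itself (the real analysis
# would use the measured cut sizes instead).
cuts = [sum(random.random() < 0.5 for _ in range(n)) for _ in range(2000)]

# Bootstrap: resample the cuts with replacement and recompute each p(k).
boot = [empirical_pmf(random.choices(cuts, k=len(cuts)), n) for _ in range(500)]
mean = [sum(b[k] for b in boot) / len(boot) for k in range(n + 1)]
se = [(sum((b[k] - mean[k]) ** 2 for b in boot) / len(boot)) ** 0.5
      for k in range(n + 1)]

k_mid = n // 2
print(f"p({k_mid}): model = {gsr_cut_pmf(n)[k_mid]:.4f}, "
      f"data = {empirical_pmf(cuts, n)[k_mid]:.4f} +/- {se[k_mid]:.4f}")
```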