Chi-square test for randomness python

I'm running a simulation for a class project that relies heavily on random number generators, and as a result we're asked to test the random number generator to see just how "random" it is using the chi-square statistic. After looking through some posts here, I used the following code to find the answer:

from random import randint
import numpy as np
from scipy.stats import chisquare
numIterations = 1000  #I've run it with other numbers as well

observed = []
for i in range(0, numIterations):
    observed.append(randint(0, 100))
data = np.array(observed)
print("(chi squared statistic, p-value) with", numIterations, "samples generated:", chisquare(data))

However, I'm getting a p-value of zero when numIterations is greater than 10, which doesn't really make sense considering the null hypothesis is that the data is uniform. Am I misinterpreting the results? Or is my code simply wrong?
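For what it's worth, a likely culprit: scipy.stats.chisquare treats its input as a list of observed *frequencies*, one per category, not as raw samples. Binning the draws first (here with np.bincount) gives a sensible result. This is a sketch, not the original poster's code; the seed is only there for reproducibility:

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)  # seeded for reproducibility
num_iterations = 1000

# Draw 1000 integers in [0, 100], as in the question
samples = rng.integers(0, 101, size=num_iterations)

# chisquare() expects observed *frequencies*, one per category -- not raw
# samples. Passing the raw draws treats each draw as a category count,
# which is why the p-value collapses toward zero.
counts = np.bincount(samples, minlength=101)

stat, p_value = chisquare(counts)  # null: all 101 values equally likely
print("(chi squared statistic, p-value):", stat, p_value)
```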

I am testing pseudo-random number generators and need to perform a chi-squared test. However, I've encountered some difficulties.

Let's take the following example: I have generated 100 numbers, ranging from 1 to 10. The distribution is as follows:

1: 8

2: 12

3: 9

4: 11

5: 16

6: 6

7: 8

8: 10

9: 13

10: 7

From what I was able to understand, next I should calculate D.

$$D = d_1 + d_2 + d_3 + \cdots + d_{10}.$$

$d_i =$ the squared difference between the expected value and the observed value, divided by the expected value:

$$d_1 = \frac{(8 - 10)^2}{10} = \frac{4}{10}$$

$$d_2 = \frac{(12 - 10)^2}{10} = \frac{4}{10}$$

$$\vdots$$

$$d_{10} = \frac{(7 - 10)^2}{10} = \frac{9}{10}$$

Adding them up results in 84/10 or 8.4.

The next step is comparing this to the critical value $\chi^2_{1-\alpha,\,k-1}$. It is clear that $k = 10$. But what value should I use for $\alpha$? And how do I know the value of $\chi^2$ after I decide what $\alpha$ I am going to use?

It feels that I am close but I just can't figure it out. Many thanks.
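The calculation above, plus the critical-value lookup being asked about, can be sketched in Python with SciPy. The choice of α = 0.05 here is just a common convention, not something fixed by the test:

```python
from scipy.stats import chi2, chisquare

observed = [8, 12, 9, 11, 16, 6, 8, 10, 13, 7]  # counts for values 1..10
expected = [10] * 10  # 100 rolls, 10 equally likely values

# D = sum of (observed - expected)^2 / expected
D = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(D)  # ≈ 8.4, matching the hand calculation

# Critical value: the (1 - alpha) quantile of chi-square with k - 1 dof
alpha = 0.05
critical = chi2.ppf(1 - alpha, df=9)
print(critical)  # ≈ 16.92; since 8.4 < 16.92, we fail to reject uniformity

# Or let SciPy compute the statistic and p-value directly
stat, p = chisquare(observed)
print(stat, p)  # statistic ≈ 8.4, p ≈ 0.49
```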

Sometime in the early part of this decade, I caught onto the board gaming craze. Every month or so, I’d scour through BoardGameGeek’s (“BGG”) highest rated games for new board games to buy. There are literally tens of thousands of board games listed on BGG, many with reviews and critiques. My favorite strategy game is not an unusual pick. Rather, it’s currently the #4 rated game of all-time on BGG and a former #1. That game is Twilight Struggle.


Twilight Struggle is a card-driven 2-player strategy game with a “Cold War” theme. In some ways, it feels like a combination of chess and poker. One side plays the United States and the other side plays the Soviet Union. The game has the same paranoid feel as the real Cold War, as you’re constantly having to guess which cards your opponent might have and how they can harm you with those cards.


While skill is very important in TS, luck also plays a heavy role in outcomes. The game includes coups, realignments, war cards, and a space race, all of which are determined by die rolls.


The Soviets never foresaw The Giant Cat Invasion from the Middle East

A few years ago, after a successful crowd-funding campaign, an online version of Twilight Struggle was released for PCs and Macs (available on Steam). After playing a few hundred online games, I decided I wanted to try to create a luck-measurement system to evaluate my own results. In the process, I started collecting the results for die rolls on “coups” and “war cards”. And this is where things get interesting: my die roll samples had surprising distributions.


After 279 rolls, my average roll was 3.254. After 303 rolls, my opponent’s average roll was 3.855. I wanted to know how unusual this distribution was, so I conducted some chi-square tests in Python.

Understanding Chi-Square Tests

Before we look at those tests, however, I’ll explain chi-square in more detail.

The chi-square statistical test is used to determine whether there’s a significant difference between an expected distribution and an actual distribution. It’s typically used with categorical data such as educational attainment, colors, or gender.

Dice rolls are a great example of data suited for chi-square testing. If we roll a standard 6-sided die a thousand times, we know that each number should come up approximately 1/6 of the time (i.e. 16.66667%). A chi-square test can help determine whether a die is ‘fair’ or if die-roll generators (such as those used in software) are generating ‘random’ results.

However, die rolls are an example of a variable where the ‘expected distribution’ is known. This isn’t always the case. Sometimes, our ‘expected distribution’ is estimated through data.

Let’s pretend for a second we don’t know the expected frequency of die rolls. We’d have to estimate the expected frequency through data samples. Let’s conduct a few samples to try to ascertain the frequency of each roll. I decided to do 4 samples of die rolls manually (i.e. with actual dice); the first 3 samples were 35 rolls each, and the last sample was 45 rolls. These are smaller samples than we’d prefer, but I wanted to give us some real data to work with. Here is my distribution of rolls, with the four samples denoted by letters ‘a’, ‘b’, ‘c’, and ‘d’.

Roll   a    b    c    d   Total
1      6    4    5   10      25
2      8    5    3    3      19
3      5    4    8    4      21
4      4   11    7   13      35
5      5    8    7    6      26
6      7    3    5    9      24

Given what we know about probability, with 150 rolls, we should expect each number to come up approximately 25 times (i.e. 1/6 of 150). We can see that this happened for 1, 5, and 6, but 4 came up quite a bit more than expected, and 2 and 3 were a bit underrepresented. This is likely due to our relatively small sample size (see “law of large numbers”), but we’ll work with it.

Let’s run a chi-square test of independence on this data set, treating the samples as the columns of a contingency table. First I’ll input the data.

import numpy as np
a1 = [6, 4, 5, 10]
a2 = [8, 5, 3, 3]
a3 = [5, 4, 8, 4]
a4 = [4, 11, 7, 13]
a5 = [5, 8, 7, 6]
a6 = [7, 3, 5, 9]
dice = np.array([a1, a2, a3, a4, a5, a6])

Then, I ran the test using the SciPy stats library:

from scipy import stats

stats.chi2_contingency(dice)

Unfortunately, while it’s a very useful tool, SciPy does not provide the results in the prettiest fashion.


I’ll explain what each part of the output means. The first value (16.49) is the chi-square statistic. Skip down to the third number in the output; that’s the ‘degrees of freedom.’ It can be calculated by taking the number of rows minus one and multiplying that result by the number of columns minus one.

In this instance:

Rows = 6 [die rolls 1–6]

Columns = 4 [samples]

So we take (6 - 1) and multiply it by (4 - 1) to get 15 degrees of freedom.

With the chi-square stat and the degrees of freedom, we can find the p-value. The p-value is what we use to determine significance (or independence in this case). Depending on the test, we are generally looking for a threshold at either 0.05 or 0.01. Our test is significant (i.e. we reject the null hypothesis) if we get a p-value below our threshold.

For our purposes, we’ll use 0.01 as the threshold. In this particular example, the p-value (the second number in our output: 0.3502) is far above 0.01, and thus we have not met the threshold for statistical significance.
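As a quick sanity check, the p-value can be recovered from just the statistic and the degrees of freedom via the chi-square survival function (a sketch using the rounded statistic):

```python
from scipy.stats import chi2

# Probability of a statistic at least this large under the null,
# with 15 degrees of freedom (16.49 is the rounded statistic from above)
p = chi2.sf(16.49, df=15)
print(round(p, 4))  # ≈ 0.35, matching the reported p-value
```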

Now that I’ve explained what everything means, we can format the output so it’s easier to read.

chi2_stat, p_val, dof, ex = stats.chi2_contingency(dice)
print("===Chi2 Stat===")
print(chi2_stat)
print("\n")
print("===Degrees of Freedom===")
print(dof)
print("\n")
print("===P-Value===")
print(p_val)
print("\n")
print("===Expected Frequencies===")
print(ex)

This produces a much more coherent, labeled output.


Finally, the array at the end of the output holds the expected frequencies for each cell, based on all of our samples. Note that in this case, these expected values are, in some cases, quite a bit off from what we know we should expect with die rolls. This is because we are using too small a sample to accurately measure the population.

Running a Large Sample to Get Expected Population Distribution

We can run a much larger sample to see how this methodology can work better. Since I’m not willing to hand-roll a die thousands of times, we’ll use Python to do this. We need np.random.randint and np.unique. I did 5 samples of 1,000 die rolls each.

r1 = np.random.randint(1,7,1000)
r2 = np.random.randint(1,7,1000)
r3 = np.random.randint(1,7,1000)
r4 = np.random.randint(1,7,1000)
r5 = np.random.randint(1,7,1000)

Then I saved the counts via np.unique.

unique, counts1 = np.unique(r1, return_counts=True)
unique, counts2 = np.unique(r2, return_counts=True)
unique, counts3 = np.unique(r3, return_counts=True)
unique, counts4 = np.unique(r4, return_counts=True)
unique, counts5 = np.unique(r5, return_counts=True)

Now, we combine our arrays to run stats.chi2_contingency:

dice = np.array([counts1, counts2, counts3, counts4, counts5])

And let’s run the test.

chi2_stat, p_val, dof, ex = stats.chi2_contingency(dice)


Notice our contingency table now produces a more uniform expected distribution. It’s still slightly off (we should expect each number to come up about 166.7 times in a sample of 1,000 die rolls), but we’re getting very close to that distribution.

I decided to run the test one more time, this time with 5 samples of 10,000 die rolls.


We can see our distribution closes in even more on the known population distribution (a 16.667% chance for each number, or 1,666.7 out of 10,000 rolls per sample). The interesting thing about this is that since we know the real expected distribution, we can see how samples allow us to estimate the population distribution.

Twilight Struggle Dice Chi-Square Test

Now, let’s jump into our online Twilight Struggle dice data.


For our actual test, we don’t need the contingency table, because we know the expected distribution: for a 6-sided die, each number is expected to come up approximately 1/6 of the time. Since the expected distribution is known, we can use scipy.stats.chisquare rather than chi2_contingency.

For my Twilight Struggle dice data, I have two samples: die rolls for myself and die rolls for my opponents. Our null hypothesis is that the die rolls are randomly distributed (and hence, evenly distributed).

For my data, I rolled 279 times. We divide by 6 to find the expected distribution (46.5 for each number). When running scipy.stats.chisquare, be careful to get the order of the arguments correct; otherwise, you’ll get inaccurate results. The first argument (f_obs) is for the ‘actual results’ while the second argument (f_exp) is for ‘expected results’.

my_rolls_expected = [46.5, 46.5, 46.5, 46.5, 46.5, 46.5]
my_rolls_actual = [59, 63, 37, 38, 32, 50]
stats.chisquare(my_rolls_actual, my_rolls_expected)

Running this test, we come up with a p-value of 0.0037.


This is below 0.01 and statistically significant: there’s only about a 0.4% chance that we’d see a result at least this extreme if the dice were truly random.
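As a sanity check, the statistic behind that p-value can be computed by hand from the same formula used earlier, D = Σ(observed − expected)² / expected, using my roll counts:

```python
observed = [59, 63, 37, 38, 32, 50]
expected = 46.5  # 279 rolls / 6 faces

# D = sum of (observed - expected)^2 / expected
stat = sum((o - expected) ** 2 / expected for o in observed)
print(round(stat, 2))  # 17.49, the statistic behind the reported p-value
```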

Next, let’s look at my opponents’ die rolls. My opponents rolled 303 times. Once again, we divide by 6 to find the expected count of 50.5 per number and compare it to the actual distribution.

opp_rolls_expected = [50.5,50.5,50.5,50.5,50.5,50.5]
opp_rolls_actual = [39,39,46,54,53,72]
stats.chisquare(opp_rolls_actual, opp_rolls_expected)

We find a similar result.


Our p-value is 0.0097, also below 0.01, indicating that there’s slightly less than a 1% chance that we’d observe a distribution at least this extreme if the die rolls were truly randomized.

While I’ve anecdotally noticed odd patterns in the die rolls (which I had previously waved off as ‘observation bias’), I’m actually a little bit surprised by these results. For both my die rolls and my opponents’ die rolls in our 2 samples, we can reject the null hypothesis that the dice are truly random.

This is interesting and I’ve decided I’m going to continue collecting data to see if I can replicate the results in the future. Unfortunately, it will likely take me a couple more months to build up a few more meaningful data samples.

Conclusions

Chi-square is a great tool to compare results involving categorical data. We can see how a sample deviates from the expected distribution. Python’s SciPy library provides great tools for running chi-square tests.

Further Resources

To understand chi-square better, I recommend Khan Academy’s excellent series of videos.