An article on boosting the power of the chi-squared test for checking randomness.

Alphabet $A = \{a_1, \ldots, a_k\}$ with $k$ on the order of $2^{10}$ or greater. In the chi-squared test, $k$ is the number of categories (i.e., $k-1$ is the number of degrees of freedom), and if we apply the test to a random number generator over alphabet $A$, we need a very large sample size.

Chi-squared test:

$$x^2 = \sum_{i=1}^{k} \frac{(n_i - N p_i)^2}{N p_i}$$

where $N$ is the sample size, $n_i$ is the count of samples in category $i$, and $p_i$ is the probability of category $i$ according to the hypothesis. Then $x^2$ asymptotically follows the chi-squared distribution with $k-1$ degrees of freedom; the smaller $x^2$ is, the better the sample fits the distribution $p_i$. This is a good approximation if $Np_i \ge 5$ for all $i$. Therefore, a necessary (but not sufficient) condition for a good approximation is $N \ge 5k$, i.e., the required sample size grows with the number of categories.
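As a concrete illustration (not from the paper), here is a minimal Python sketch of the plain chi-squared test; the function name `pearson_chi2` and the uniform toy example are my own:

```python
import random
from collections import Counter

from scipy.stats import chi2


def pearson_chi2(samples, probs):
    """Plain Pearson chi-squared test of `samples` against hypothesized `probs`.

    `probs` maps each category to its hypothesized probability p_i.
    Returns the statistic x^2 and the upper-tail p-value.
    """
    N = len(samples)
    counts = Counter(samples)
    x2 = sum((counts.get(c, 0) - N * p) ** 2 / (N * p) for c, p in probs.items())
    dof = len(probs) - 1
    return x2, chi2.sf(x2, dof)


# Toy example: 50,000 draws against the uniform hypothesis on k = 2^10 symbols.
# Here N * p_i = N / k is roughly 49, so the N p_i >= 5 rule of thumb is met.
k = 1024
probs = {i: 1.0 / k for i in range(k)}
samples = [random.randrange(k) for _ in range(50_000)]
print(pearson_chi2(samples, probs))
```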

The paper proposes an “adaptive chi-squared test” that partitions the alphabet $A$ into subsets $A_1,\ldots,A_s$ with $2 \le s \ll k$, so the number of categories is vastly reduced, and then applies the chi-squared test on the partition with $s$ categories instead of $k$.

The partition of $A$ into $s$ subsets $A_h$ is built as follows: we divide the samples $x_1,\ldots,x_N$ into training samples $x_1,\ldots,x_m$ and testing samples $x_{m+1},\ldots,x_N$. Then, based on the training samples, we put symbols $a_i$ and $a_j$ into the same subset $A_h$ iff their frequencies are close. In the paper's example, $A$ is partitioned into $A_0, A_1, A_2$ for symbols that occur zero times, once, and more than once in the training samples, respectively. After the partition, we assign a probability to each subset $A_h$ according to its constituent symbols: under the hypothesis of a uniform distribution on $A$, $\Pr[A_h] = |A_h|/k = \sum_{i:a_i\in A_h} \frac{1}{k}$. These subset probabilities form the null hypothesis $H_0$ of the chi-squared test, which we then test using the frequency counts in the testing samples.
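Below is a minimal Python sketch of this procedure, assuming the uniform null hypothesis and the zero/once/more-than-once partition from the paper's example; the function name `adaptive_chi2` and the handling of empty subsets are my own choices, not taken from the paper:

```python
from collections import Counter

from scipy.stats import chi2


def adaptive_chi2(samples, k, m):
    """Adaptive chi-squared test against the uniform distribution on k symbols.

    The first m samples are the training part: they define the partition
    A_0, A_1, A_2 (symbols occurring zero times, once, more than once).
    The chi-squared statistic is then computed from the testing part
    over the s = 3 subsets instead of the k individual symbols.
    """
    training, testing = samples[:m], samples[m:]
    train_counts = Counter(training)

    def subset_of(symbol):
        c = train_counts.get(symbol, 0)
        return 0 if c == 0 else (1 if c == 1 else 2)

    # Subset sizes give the null probabilities Pr[A_h] = |A_h| / k under uniformity.
    n1 = sum(1 for c in train_counts.values() if c == 1)
    n2 = sum(1 for c in train_counts.values() if c > 1)
    probs = {0: (k - n1 - n2) / k, 1: n1 / k, 2: n2 / k}

    # Frequency of each subset among the testing samples.
    N = len(testing)
    test_counts = Counter(subset_of(x) for x in testing)

    x2 = sum(
        (test_counts.get(h, 0) - N * p) ** 2 / (N * p)
        for h, p in probs.items()
        if p > 0  # skip empty subsets; they contribute no category
    )
    dof = sum(1 for p in probs.values() if p > 0) - 1
    return x2, chi2.sf(x2, dof)
```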

Because combining categories reduces their number from $k$ to $s$, the expected count per category is much larger and the chi-squared approximation holds at much smaller sample sizes, which increases the power of the test for a given sample size.

The paper tested the proposal on encrypted text: try to distinguish encrypted plaintext from random bytes. It takes the alphabet to be 24-bit words, partitions the alphabet into three subsets as described above, and requires a significance level of 5%. One example is English text encrypted with AES with a 128-bit block length. The paper found that, with 100K samples, the “usual” chi-squared test detected non-randomness in 4 out of 40 runs, while the adaptive chi-squared test detected it in 17 out of 40 runs. With 2000K samples, the usual and adaptive chi-squared tests detected it in 30 and 33 out of 40 runs, respectively. Thus the adaptive test gives a significantly better detection rate when fewer samples are available.
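To get a feel for this setup, here is a hypothetical usage sketch (my own toy re-creation, not the paper's experiment), reusing `adaptive_chi2` from the sketch above: chop a byte stream into 24-bit words and test at the 5% level. With `os.urandom` as input the test should rarely reject; to mimic the paper one would feed it AES-encrypted English text instead.

```python
import os


def bytes_to_24bit_words(data: bytes):
    """Interpret a byte string as non-overlapping 24-bit symbols."""
    return [int.from_bytes(data[i:i + 3], "big") for i in range(0, len(data) - 2, 3)]


k = 2 ** 24                    # alphabet of 24-bit words
data = os.urandom(300_000)     # stand-in for ciphertext: ~100K 24-bit words
words = bytes_to_24bit_words(data)
m = len(words) // 2            # half training, half testing (the paper's split may differ)

x2, p_value = adaptive_chi2(words, k, m)
print("reject randomness" if p_value < 0.05 else "cannot reject randomness")
```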

Note: such a test for randomness is incomplete. For example, it says nothing about the cycle length of random sequences. Wikipedia has an article on randomness tests.

Bibliographic data

@article{ryabko2004newtest,
   title = "A new test for randomness and its application to some cryptographic problems",
   author = "B. Ya. Ryabko and V. S. Stognienko and Yu. I. Shokin",
   journal = "Journal of Statistical Planning and Inference",
   volume = "123",
   year = "2004",
   pages = "365--376",
   doi = "10.1016/S0378-3758(03)00149-6",
}