An article on boosting the power of the chi-squared test for checking randomness.

Consider an alphabet \(A = \{a_1, \ldots, a_k\}\) with \(k\) on the order of \(2^{10}\) or greater. In the chi-squared test, \(k\) is the number of categories (i.e., there are \(k-1\) degrees of freedom), so if we apply the test to a random number generator over alphabet \(A\), we need a very large sample size.

The chi-squared test statistic is

\[X^2 = \sum_{i=1}^k \frac{(n_i - Np_i)^2}{Np_i}\]

where \(N\) is the sample size, \(n_i\) is the count of samples in category \(i\), and \(p_i\) is the probability of category \(i\) under the hypothesis. Asymptotically, \(X^2\) follows a chi-squared distribution with \(k-1\) degrees of freedom; the smaller \(X^2\) is, the better the sample fits the distribution \(p_i\). The approximation is good if \(Np_i \ge 5\) for all \(i\). Since \(\min_i p_i \le 1/k\), a necessary (but not sufficient) condition for a good approximation is \(N \ge 5k\), which grows with the number of categories.
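As a quick illustration, here is a minimal sketch of the classical test in Python. The counts and the uniform hypothesis below are made-up example inputs, and scipy's `chi2` distribution supplies the tail probability:

```python
from scipy.stats import chi2

def chi_squared_statistic(counts, probs):
    """Compute X^2 = sum_i (n_i - N p_i)^2 / (N p_i)."""
    N = sum(counts)
    return sum((n - N * p) ** 2 / (N * p) for n, p in zip(counts, probs))

# Hypothetical example: 4 categories under a uniform hypothesis, N = 40.
counts = [12, 9, 8, 11]
probs = [0.25] * 4
x2 = chi_squared_statistic(counts, probs)
p_value = chi2.sf(x2, df=len(counts) - 1)  # tail probability of chi2(k-1)
print(f"X^2 = {x2:.3f}, p-value = {p_value:.3f}")
```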

The paper proposes an “adaptive chi-squared test” that partitions the alphabet \(A\) into subsets \(A_1,\ldots,A_s\) with \(2 \le s\ll k\), vastly reducing the number of categories. We then apply the chi-squared test to the partition with \(s\) categories instead of \(k\).

The partition of \(A\) into \(s\) subsets \(A_h\) is constructed as follows. We divide the samples \(x_1,\ldots,x_N\) into training samples \(x_1,\ldots,x_m\) and testing samples \(x_{m+1},\ldots,x_N\). Based on the training samples, we put letters \(a_i\) and \(a_j\) into the same subset \(A_h\) iff their frequencies are close. In the paper's example, \(A\) is partitioned into \(A_0, A_1, A_2\): the letters that occur zero times, once, and more than once in the training samples, respectively. After the partition, we assign each subset \(A_h\) a probability according to its constituent letters; under the hypothesis of a uniform distribution on \(A\), \(\Pr[A_h] = \sum_{i:a_i\in A_h} \frac{1}{k} = |A_h|/k\). These subset probabilities form the null hypothesis \(H_0\) of the chi-squared test, which we then run against the frequency counts in the testing samples.
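A minimal sketch of this procedure in Python, assuming the three-subset partition from the example above; the name `adaptive_chi_squared` and the encoding of letters as integers \(0,\ldots,k-1\) are my own choices, not the paper's:

```python
from collections import Counter
from scipy.stats import chi2

def adaptive_chi_squared(samples, k, m):
    """Adaptive test under a uniform H0 on {0, ..., k-1}, with the
    three-subset partition: zero / one / more than one occurrence
    in the training half."""
    train, test = samples[:m], samples[m:]
    train_freq = Counter(train)

    def subset(letter):  # A_0: unseen, A_1: seen once, A_2: seen more
        return min(train_freq.get(letter, 0), 2)

    # Subset sizes come from the training frequency table alone, so we
    # never have to enumerate all k letters.
    n1 = sum(1 for c in train_freq.values() if c == 1)
    n2 = sum(1 for c in train_freq.values() if c > 1)
    sizes = [k - n1 - n2, n1, n2]

    # Under H0 (uniform on A): Pr[A_h] = |A_h| / k.
    probs = [size / k for size in sizes]

    # Chi-squared statistic over the *testing* samples with s categories,
    # skipping any empty subset (and reducing the degrees of freedom).
    counts = Counter(subset(x) for x in test)
    N = len(test)
    x2 = sum((counts[h] - N * p) ** 2 / (N * p)
             for h, p in enumerate(probs) if p > 0)
    df = sum(1 for p in probs if p > 0) - 1
    return x2, chi2.sf(x2, df)
```

Computing the subset sizes from the training frequency table rather than scanning all \(k\) letters keeps the cost proportional to the sample size, which matters when \(k\) is as large as \(2^{24}\).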

Because combining categories reduces their number from \(k\) to \(s\), the condition \(Np_i \ge 5\) is satisfied with a far smaller sample size, which increases the power of the test.
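To make the saving concrete with the 24-bit alphabet used in the experiment below: \(k = 2^{24}\), so the rule of thumb \(N \ge 5k\) calls for roughly \(5 \cdot 2^{24} \approx 8.4\times 10^7\) samples, whereas with \(s = 3\) categories the condition \(Np_h \ge 5\) is met by a tiny sample as long as no subset \(A_h\) has negligible probability.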

The paper tested the proposal on encrypted text: try to distinguish encrypted plaintext from random bytes. It takes the letters to be 24-bit words, partitions the alphabet into three subsets as above, and requires a significance level of 5%. One example is English text encrypted with AES with a block length of 128 bits. The paper found that with 100K samples, the “usual” chi-squared test detects non-randomness in 4 out of 40 runs, while the adaptive chi-squared test detects it in 17 out of 40. With 2000K samples, the usual and adaptive tests detect it in 30 and 33 out of 40 runs, respectively. Thus the adaptive test gives a markedly better detection rate when samples are few.
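For intuition only, here is a hypothetical harness in the spirit of the paper's 40-trial protocol. It reuses `adaptive_chi_squared` from the sketch above and substitutes a mildly biased integer source for AES-encrypted English text, so the numbers it produces are illustrative, not a reproduction of the paper's results:

```python
import random

def biased_word(rng, k):
    # Stand-in non-uniform source: 1% of draws come from the lower half.
    return rng.randrange(k // 2) if rng.random() < 0.01 else rng.randrange(k)

def detection_count(n_samples, trials=40, k=2 ** 24, alpha=0.05):
    """Count how often the adaptive test rejects H0 at level alpha."""
    rng = random.Random(0)
    rejections = 0
    for _ in range(trials):
        samples = [biased_word(rng, k) for _ in range(n_samples)]
        # Half/half train-test split is an arbitrary choice here.
        _, p = adaptive_chi_squared(samples, k, m=n_samples // 2)
        rejections += p < alpha
    return rejections

print(detection_count(100_000), "rejections out of 40 trials")
```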

Note: such a test for randomness is incomplete. For example, it says nothing about the cycling property of random sequences. Wikipedia has an article on randomness tests.

Bibliographic data

   title = "A new test for randomness and its application to some cryptographic problems",
   author = "B. Ya. Ryabko and V. S. Stognienko and Yu. I. Shokin",
   journal = "Journal of Statistical Planning and Inference",
   volume = "123",
   year = "2004",
   pages = "365--376",
   doi = "10.1016/S0378-3758(03)00149-6",