Consider a sequence of Bernoulli trials with success probability \(p = 1-q\). This paper is interested in the number of trials needed to see the \(r\)-th occurrence of 2 successes within a window of no more than \(k\) trials. Windows are non-overlapping.

A window containing 2 successes is defined such that its first and last trials are, by definition, the two successes, and its length is less than or equal to \(k\). An alternative formulation of the problem is in terms of a “2-out-of-\(k\) sliding window detector”.

These are the symbols defined in the paper:

  • \(X_1, X_2, \cdots\) is the sequence of Bernoulli trials, each taking value 0 or 1
  • \(p = 1-q\) is the Bernoulli parameter, the probability of success
  • \(k\ge 2\) is the maximum window size
  • \(r\ge 1\) is the number of occurrences to look for
  • \(T_{k,r}\) is the waiting time, i.e., number of trials to see the \(r\)-th occurrence of 2 successes within windows of size \(k\)
  • \(N_{n,k}\) is the number of occurrences, within the first \(n\) trials, of a stretch of at most \(k\) consecutive trials containing 2 successes; \(P(N_{n,k}\ge r) = P(T_{k,r} \le n)\)
  • \(f_{k,r}(n) = P(T_{k,r}=n)\) is the probability mass function of \(T_{k,r}\)
  • \(\phi_{k,r}(z) = \sum_{n=0}^{\infty} f_{k,r}(n)z^n\) is the probability generating function

We can see that,

\[T = T_{k,1} = \inf\left\{n\ge 1: \sum_{i=\max(1,n-k+1)}^n X_i \ge 2\right\}\] \[P(T>n) = P(N_{n,k} = 0)\]

In Sec. 3, we consider only \(T=T_{k,1}\), with \(f(n)=f_{k,1}(n)\), and the first theorem is derived as follows. Obviously \(f(0)=f(1)=0\), for we cannot have two successes before the second trial. For \(n\le k\), i.e., within the size of one window, \(f(n) = (n-1)p^2q^{n-2}\), which is interpreted as having one success somewhere within the first \(n-1\) trials and the other success at trial \(n\). Now consider the probability \(f(n)\) for larger \(n\). Surely trial \(n\) must be a success. If the result of the first trial is a failure (with probability \(q\)), then we are looking for the occurrence of the pattern from the second trial onward, contributing \(qf(n-1)\). But if the first trial is a success (with probability \(p\)), then trials 2 to \(k\) must all be failures (probability \(q^{k-1}\)): otherwise a second success would fall within \(k\) trials of the first and we would have \(T_{k,1} \le k < n\). After those \(k-1\) failures the first success can no longer be part of any window, so the process restarts from trial \(k+1\), contributing \(pq^{k-1}f(n-k)\). Therefore, we established:

\[f(n) = f_{k,1}(n) = \begin{cases} 0 & n\in\{0,1\} \\ (n-1)p^2q^{n-2} & 1<n\le k \\ qf(n-1)+pq^{k-1}f(n-k) & n>k \end{cases}\]

Note that \(\sum_n f(n) = q\sum_n f(n-1) + pq^{k-1} \sum_n f(n-k)\), so the tail probabilities \(F(n) = P(T>n) = \sum_{x=n+1}^{\infty} f(x)\) satisfy the same recurrence relation for \(n\ge k\).
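The recurrence is straightforward to evaluate numerically. A minimal sketch (the values \(p=0.3\), \(k=5\) are arbitrary, for illustration only), which also spot-checks that the tail probabilities satisfy the same recurrence:

def pmf(k, p, nmax):
    """f_{k,1}(n) for n = 0..nmax, via the recurrence above."""
    q = 1.0 - p
    f = [0.0] * (nmax + 1)
    for n in range(2, nmax + 1):
        if n <= k:
            f[n] = (n - 1) * p * p * q ** (n - 2)  # both successes within one window
        else:
            f[n] = q * f[n - 1] + p * q ** (k - 1) * f[n - k]
    return f

k, p, q = 5, 0.3, 0.7
f = pmf(k, p, nmax=2000)
print(sum(f))  # ~1.0, since T is finite almost surely
F = [1.0 - sum(f[:n + 1]) for n in range(50)]  # tail P(T > n)
n = 10
print(F[n], q * F[n - 1] + p * q ** (k - 1) * F[n - k])  # equal for n >= k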

Here we have the special case of \(k=2\) (this is an interview question I was once asked, but at the time I hadn't realized that there is no closed-form solution and we have to resort to a numeric answer):

\[\begin{align} f(n) = f_{2,1}(n) &= qf(n-1)+pqf(n-2) \\ &= f(n-1)-p^2qf(n-3) \end{align}\]

where the second line follows by substituting \(qf(n-2) = f(n-1)-pqf(n-3)\), i.e., the recurrence shifted to \(n-1\).
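A quick numeric check that the two forms agree, with the sign as derived above (\(p=0.5\) is arbitrary):

p = q = 0.5
f = [0.0, 0.0, p * p]  # f(0), f(1), f(2) = p^2 for k = 2
for n in range(3, 10):
    f.append(q * f[n - 1] + p * q * f[n - 2])
n = 6
print(f[n], f[n - 1] - p * p * q * f[n - 3])  # the two forms agree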

Here are a few properties of \(f(n)\). For \(1<n\le k-1\),

\[\frac{f(n+1)}{f(n)} = \frac{n}{n-1}q\]

and

\[\frac{n}{n-1}q \ge 1 \;\Leftrightarrow\; nq \ge n-1 \;\Leftrightarrow\; np \le 1\]

therefore

\[\begin{cases} f(n+1) \ge f(n) & \textrm{if } n\le 1/p \\ f(n+1) \le f(n) & \textrm{if } n\ge 1/p. \end{cases}\]

For \(n\ge k\), \(f(n)\) is lower-bounded by restricting to the case that the first \(k-1\) trials all fail (a stricter case than only the first trial failing), after which the process restarts afresh, giving

\[f(n) \ge q^{k-1}f(n-k+1)\]

Expanding \(f(n+1) = qf(n)+pq^{k-1}f(n-k+1)\) shows that this is equivalent to the same bound with \(f(n+1)\) on the left-hand side:

\[\begin{align} f(n+1) &\ge q^{k-1}f(n-k+1) \\ \Leftrightarrow\quad qf(n)+pq^{k-1}f(n-k+1) &\ge q^{k-1}f(n-k+1) \\ \Leftrightarrow\quad qf(n) &\ge (1-p)q^{k-1}f(n-k+1) \\ \Leftrightarrow\quad f(n) &\ge q^{k-1}f(n-k+1) \end{align}\]

This is the equation on p. 792. From it we can see that \(f\) is non-increasing beyond \(k\):

\[\begin{align} f(n+1) - f(n) &= qf(n)+pq^{k-1}f(n-k+1) - f(n) \\ &= -pf(n)+pq^{k-1}f(n-k+1) \\ &\le p[-q^{k-1}f(n-k+1)+q^{k-1}f(n-k+1)] = 0 \\ \therefore\quad f(n+1) &\le f(n) \quad\text{for } n\ge k \end{align}\]

So we conclude that \(f(n)\) is unimodal with its maximum attained at \(n=\left[\min(k-1, 1/p)\right]+1\). However, the strong unimodality (log-concavity) condition \(f(n)^2\ge f(n-1)f(n+1)\) is reversed at \(n=k+1\). Therefore its convolution with other unimodal distributions is not necessarily unimodal, so it is not easy to deduce the properties of \(f_{k,r}\) from those of \(f_{k,1}\).
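These claims are easy to verify numerically. A small sketch (again \(p=0.3\), \(k=5\) are arbitrary illustrative values):

p, k, nmax = 0.3, 5, 400
q = 1.0 - p
f = [0.0] * (nmax + 1)
for n in range(2, nmax + 1):
    if n <= k:
        f[n] = (n - 1) * p * p * q ** (n - 2)
    else:
        f[n] = q * f[n - 1] + p * q ** (k - 1) * f[n - k]
mode = max(range(nmax + 1), key=lambda i: f[i])
print(mode, int(min(k - 1, 1 / p)) + 1)               # both 4: mode = [min(k-1, 1/p)] + 1
print(all(f[n + 1] <= f[n] for n in range(k, nmax)))  # True: non-increasing beyond k
n = k + 1
print(f[n] ** 2 >= f[n - 1] * f[n + 1])               # False: log-concavity reversed at n = k+1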

The generating function: First consider that \(f(n) = qf(n-1)+p^2q^{n-2}\) for \(1<n\le k\). Then

\[\begin{align} \phi(z) &= \phi_{k,1}(z) \\ &= \sum_{n=0}^{\infty} f(n)z^n \\ &= \sum_{n=0}^{k} f(n)z^n + \sum_{n=k+1}^{\infty} f(n)z^n \\ &= \sum_{n=2}^{k} (qf(n-1)+p^2q^{n-2})z^n + \sum_{n=k+1}^{\infty} (qf(n-1)+pq^{k-1}f(n-k))z^n \\ &= qz\sum_{n=2}^{k} f(n-1)z^{n-1} + p^2 z^2\sum_{n=2}^{k} q^{n-2}z^{n-2} + qz\sum_{n=k+1}^{\infty} f(n-1)z^{n-1} + pq^{k-1}z^k\sum_{n=k+1}^{\infty} f(n-k)z^{n-k} \\ &= qz\sum_{n=1}^{k-1} f(n)z^n + p^2 z^2\sum_{n=0}^{k-2} q^n z^n + qz\sum_{n=k}^{\infty} f(n)z^n + pq^{k-1}z^k\sum_{n=1}^{\infty} f(n)z^n \\ &= qz\sum_{n=1}^{\infty} f(n)z^n + p^2 z^2 \frac{1-(qz)^{k-1}}{1-qz} + pq^{k-1}z^k\sum_{n=1}^{\infty} f(n)z^n \\ &= qz\phi(z) + (pz)^2 \frac{1-(qz)^{k-1}}{1-qz} + pq^{k-1}z^k\phi(z) \\ \phi(z) &= \frac{(pz)^2(1-(qz)^{k-1})/(1-qz)}{1-qz-pq^{k-1}z^k} \end{align}\]
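As a sanity check, the closed form can be compared with the truncated series at some \(|z| < 1\) (a sketch; the values of \(p\), \(k\), \(z\) are arbitrary):

p, k, z = 0.3, 5, 0.8
q = 1.0 - p
f = [0.0] * 400
for n in range(2, 400):
    if n <= k:
        f[n] = (n - 1) * p * p * q ** (n - 2)
    else:
        f[n] = q * f[n - 1] + p * q ** (k - 1) * f[n - k]
series = sum(fn * z ** n for n, fn in enumerate(f))
closed = (p * z) ** 2 * (1 - (q * z) ** (k - 1)) / (1 - q * z) / (1 - q * z - p * q ** (k - 1) * z ** k)
print(series, closed)  # agree up to truncation error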

From which we can find the mean and variance of \(T=T_{k,1}\) (corollary 3.1):

\[\begin{align} \mu = E[T] &= \phi'(1) \\ &= \frac{2-q^{k-1}}{p(1-q^{k-1})} \\ \sigma^2 = \mathrm{Var}[T] &= \phi''(1)+\phi'(1)-(\phi'(1))^2\\ &= \frac{q}{p^2} + (2k-1)\frac{q^{k-1}}{p(1-q^{k-1})^2}+\frac{q}{p^2(1-q^{k-1})^2} \end{align}\]
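As a numeric check of corollary 3.1, compare against the moments of the truncated pmf (a sketch; parameters arbitrary):

p, k, nmax = 0.3, 5, 3000
q = 1.0 - p
f = [0.0] * (nmax + 1)
for n in range(2, nmax + 1):
    if n <= k:
        f[n] = (n - 1) * p * p * q ** (n - 2)
    else:
        f[n] = q * f[n - 1] + p * q ** (k - 1) * f[n - k]
mu = sum(n * fn for n, fn in enumerate(f))
var = sum(n * n * fn for n, fn in enumerate(f)) - mu ** 2
print(mu, (2 - q ** (k - 1)) / (p * (1 - q ** (k - 1))))
print(var, q / p ** 2
           + (2 * k - 1) * q ** (k - 1) / (p * (1 - q ** (k - 1)) ** 2)
           + q / (p ** 2 * (1 - q ** (k - 1)) ** 2))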

For higher order moments \(m_s = E[T^s]\), \(s\ge 0\), we have (theorem 3.3)

\[\begin{align} m_s &= \sum_{n=2}^k n^s f(n) + \sum_{n=k+1}^{\infty} n^s f(n) \\ &= p^2 \sum_{n=2}^k n^s q^{n-2} + q\sum_{n=2}^{\infty} n^s f(n-1) + pq^{k-1} \sum_{n=k+1}^{\infty} n^s f(n-k) \\ &= p^2 \sum_{n=2}^k n^s q^{n-2} + q\sum_{n=1}^{\infty} (n+1)^s f(n) + pq^{k-1} \sum_{n=1}^{\infty} (n+k)^s f(n) \\ &= p^2 \sum_{n=2}^k n^s q^{n-2} + q\sum_{i=0}^{s} \binom{s}{i}m_i + pq^{k-1} \sum_{i=0}^{s} \binom{s}{i}k^{s-i}m_i \\ &= \frac{1}{p(1-q^{k-1})}\left[p^2\sum_{n=2}^k n^sq^{n-2}+q\sum_{i=0}^{s-1}\binom{s}{i}(1+pq^{k-2}k^{s-i})m_i\right] \end{align}\]

where the last step moves the \(i=s\) terms \((q+pq^{k-1})m_s\) to the left-hand side and divides by \(1-q-pq^{k-1}=p(1-q^{k-1})\).
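Theorem 3.3 translates into a short recursion. A sketch (parameters arbitrary) that checks \(m_1\) against corollary 3.1 and \(m_2-m_1^2\) against the variance:

from math import comb

def moments(p, k, s_max):
    """m_s = E[T^s] for s = 0..s_max via the theorem 3.3 recursion."""
    q = 1.0 - p
    m = [1.0]  # m_0 = 1
    for s in range(1, s_max + 1):
        head = p * p * sum(n ** s * q ** (n - 2) for n in range(2, k + 1))
        rest = sum(comb(s, i) * q * (1 + p * q ** (k - 2) * k ** (s - i)) * m[i]
                   for i in range(s))
        m.append((head + rest) / (p * (1 - q ** (k - 1))))
    return m

p, k = 0.3, 5
q = 1.0 - p
m = moments(p, k, 2)
print(m[1], (2 - q ** (k - 1)) / (p * (1 - q ** (k - 1))))  # m_1 = E[T]
print(m[2] - m[1] ** 2)  # equals Var[T] from corollary 3.1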

Using the above result, the paper proposes a way to estimate the Bernoulli parameter \(p\) by solving

\[E[T] = \frac{1}{p}\left(1+\frac{1}{1-(1-p)^{k-1}}\right)\]

where \(E[T] \approx \frac{1}{N}\sum_{i=1}^N T^{(i)}\) is determined by Monte Carlo simulation. Indeed, since windows are non-overlapping, the summation \(\sum_{i=1}^N T^{(i)}\) is one instance of \(T_{k,N}\).
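There is no closed-form inverse, so recovering \(\hat p\) from the observed mean requires a numeric solve. Below is a minimal bisection sketch (the function estimate_p and its tolerances are my own, not from the paper); it relies on \(E[T]\) being strictly decreasing in \(p\), since more frequent successes can only shorten the wait:

def estimate_p(mean_T, k, iters=100):
    """Solve E[T] = (1 + 1/(1 - (1-p)^(k-1))) / p for p by bisection."""
    def g(p):
        return (1.0 + 1.0 / (1.0 - (1.0 - p) ** (k - 1))) / p
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > mean_T:
            lo = mid  # waiting time too long: true p must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)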

I tried out this estimation:

import random

p = 0.14159
k = 5
rangen = random.SystemRandom()

def trial():
    # simulate a Bernoulli trial, with success prob p
    return rangen.random() <= p

def Tn():
    """Simulate T

    Returns:
        tuple (T, n), where T is the number of trials until we first see 2
        successes within a window of at most k trials, and n is the total
        number of successes encountered in those T trials
    """
    T = 0
    last_k = []
    while True:
        result = trial()
        T += 1
        last_k.append(result)
        if not result:
            # the window count can only reach 2 on a success, so skip the check
            continue
        if sum(last_k[-k:]) == 2:
            # the count rises by at most 1 per trial, so it equals exactly 2
            # the first time any window of the last k trials holds 2 successes
            return (T, sum(last_k))

def main():
    # simulate 1000 count of T
    N = 1000
    sum_T = 0
    sum_n = 0
    for _ in range(N):
        T, n = Tn()
        sum_T += T
        sum_n += n
    # compute probability
    simple_p = float(sum_n)/float(sum_T)
    mean_T = float(sum_T) / N
    h = (1 + 1/(1 - (1 - simple_p)**(k - 1))) / simple_p
    th_h = (1 + 1/(1 - (1 - p)**(k - 1))) / p
    print("T, n: %d, %d" % (sum_T, sum_n))
    print("real p: %.6f" % p)
    print("simple p: %.6f" % simple_p)
    print("computed h from simple p: %.6f" % h)
    print("experimental T: %.6f" % mean_T)
    print("theoretical T: %.6f" % th_h)

if __name__ == "__main__":
    main()

and found that this approach is less accurate than simply estimating \(p\) from the outcomes of all the Bernoulli trials ever performed, for the reason that it uses a vastly smaller number of samples.

Sec. 4 answers the ultimate question of \(T_r = T_{k,r}\). Since windows are non-overlapping, it can be seen as a sum of \(r\) iid copies of \(T_{k,1}\), and therefore the paper starts with the probability generating function

\[\begin{align} \phi_r(z) &= \sum_{n=0}^{\infty} f_r(n)z^n = E[z^{T_r}] = \left(E[z^{T}]\right)^r\\ &= \left[\frac{(pz)^2(1-(qz)^{k-1})/(1-qz)}{1-qz-pq^{k-1}z^k}\right]^r \end{align}\]

This can help in finding the probability distribution recursively, by convolution:

\[\begin{align} \phi_r(z) &= [\phi(z)]^r \\ \therefore \phi_{r+1}(z) &= \phi(z)\phi_r(z) \\ \therefore f_{r+1}(n) &= \sum_{m=0}^{n} f(m)f_r(n-m) \end{align}\]

By the properties of sums of iid random variables,

\[\begin{align} E[T_r] &= rE[T] \\ \mathrm{Var}[T_r] &= r\mathrm{Var}[T] \end{align}\]
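Numerically, \(f_r\) is just an \(r\)-fold convolution of \(f_{k,1}\). A sketch (parameters arbitrary, support truncated at nmax) that also checks \(E[T_r]=rE[T]\):

def pmf_r(r, k, p, nmax):
    """f_{k,r}(n) for n = 0..nmax, as the r-fold convolution of f_{k,1}."""
    q = 1.0 - p
    base = [0.0] * (nmax + 1)
    for n in range(2, nmax + 1):
        if n <= k:
            base[n] = (n - 1) * p * p * q ** (n - 2)
        else:
            base[n] = q * base[n - 1] + p * q ** (k - 1) * base[n - k]
    cur = base[:]
    for _ in range(r - 1):  # f_{r+1} = f * f_r
        cur = [sum(base[m] * cur[n - m] for m in range(n + 1))
               for n in range(nmax + 1)]
    return cur

f3 = pmf_r(3, 5, 0.3, 300)
print(sum(n * fn for n, fn in enumerate(f3)))  # ~ 3 * E[T_{5,1}] ~ 23.16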

Bibliographic data

@article{koutras1996,
   title = "On a waiting time distribution in a sequence of Bernoulli trials",
   author = "M. V. Koutras",
   journal = "Ann. Inst. Statist. Math.",
   volume = "48",
   number = "4",
   pages = "789--806",
   year = "1996",
   url = "https://pdfs.semanticscholar.org/5b6a/5299008293657032d170c062bc54f04ce3eb.pdf",
}