Consider a sequence of Bernoulli trials with success probability $p = 1-q$. This paper is interested in the number of trials needed to see the $r$-th occurrence of 2 successes within a window of no more than $k$ trials, where windows are non-overlapping.

A window containing 2 successes is defined such that its first and last trials are successes and its length is at most $k$. An alternative formulation of the problem is as a “2-out-of-$k$ sliding window detector”.

These are the symbols defined in the paper:

• $X_1, X_2, \cdots$ is a sequence of Bernoulli trials, each taking the value 0 or 1
• $p = 1-q$ is the Bernoulli parameter, the probability of success
• $k\ge 2$ is the maximum window size
• $r\ge 1$ is the number of occurrences to look for
• $T_{k,r}$ is the waiting time, i.e., number of trials to see the $r$-th occurrence of 2 successes within windows of size $k$
• $N_{n,k}$ is the number of occurrences, within the first $n$ trials, of a strand of at most $k$ consecutive trials containing 2 successes; note $P(N_{n,k}\ge r) = P(T_{k,r} \le n)$
• $f_{k,r}(n) = P(T_{k,r}=n)$ is the probability mass function of $T_{k,r}$
• $\phi_{k,r}(z) = \sum_{n=0}^{\infty} f_{k,r}(n)z^n$ is the probability generating function

We can see from these definitions that the counting variable $N_{n,k}$ and the waiting time $T_{k,r}$ are two views of the same process.

In sec 3, we consider only $T=T_{k,1}$, with $f(n)=f_{k,1}(n)$, and the first theorem is derived as follows. Obviously $f(0)=f(1)=0$, for we cannot have two successes before the second trial. For $2\le n\le k$, i.e., within the size of one window, $f(n) = (n-1)p^2q^{n-2}$, which is interpreted as having one success somewhere within the first $n-1$ trials and the other success at trial $n$. Now consider the probability $f(n)$ for larger $n$; we surely see a success at trial $n$. If the result of the first trial is a failure (with probability $q$), then we are looking for the occurrence of the pattern from the second trial onward, contributing probability $qf(n-1)$. But if the first trial is a success (with probability $p$), we cannot have another success in trials 2 to $k$, or the criterion $T_{k,1} = n$ would be violated; this contributes $pq^{k-1}f(n-k)$. Therefore, we established:

$$f(n) = \begin{cases} 0 & n \le 1 \\ (n-1)p^2q^{n-2} & 2 \le n \le k \\ qf(n-1) + pq^{k-1}f(n-k) & n > k. \end{cases}$$

Note that $\sum_n f(n) = q\sum_n f(n-1) + pq^{k-1} \sum_n f(n-k)$, so the tail probabilities $F(n) = P(T>n) = \sum_{x=n+1}^{\infty} f(x)$ satisfy the same recurrence relation.
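The recurrence is easy to check numerically. A minimal sketch (my own code, with $p$ and $k$ chosen arbitrarily) that builds $f(n)$ from the boundary cases and verifies the probabilities sum to 1:

```python
# Numerical sanity check of the recurrence (my own sketch; p and k arbitrary):
# build f(n) from the boundary cases, then verify total probability is 1.
p, k = 0.3, 5
q = 1 - p

N = 2000  # truncation point; f(n) decays geometrically so the tail is negligible
f = [0.0] * (N + 1)
for n in range(2, k + 1):
    f[n] = (n - 1) * p**2 * q**(n - 2)               # f(n) = (n-1) p^2 q^(n-2), n <= k
for n in range(k + 1, N + 1):
    f[n] = q * f[n - 1] + p * q**(k - 1) * f[n - k]  # the recurrence for n > k

print(sum(f))  # very close to 1 -- the pmf is proper
```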

Here we have the special case $k=2$, where the recurrence collapses to $f(n) = qf(n-1) + pqf(n-2)$ with $f(2)=p^2$, i.e., the waiting time for two consecutive successes. (This is an interview question I was once asked, but at the time I didn’t realize that there is no closed-form solution and we have to resort to a numerical answer.)
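For $k=2$ the problem is the classic waiting time for the pattern "11", which can be evaluated numerically from the recurrence. A sketch (my own; the closed-form mean $(1+p)/p^2$ is the well-known result for this pattern, not copied from the paper):

```python
# k = 2: waiting time for two consecutive successes ("11").
# Evaluate E[T] numerically from f(n) = q f(n-1) + p q f(n-2), f(2) = p^2.
p = 0.5
q = 1 - p

N = 5000  # truncation; the geometric tail beyond this is negligible
f = [0.0] * (N + 1)
f[2] = p * p
for n in range(3, N + 1):
    f[n] = q * f[n - 1] + p * q * f[n - 2]

mean_T = sum(n * f[n] for n in range(N + 1))
print(mean_T)           # ~6.0 for p = 0.5
print((1 + p) / p**2)   # closed form: 6.0
```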

Here we have a few properties of $f(n)$. On $2 \le n \le k-1$,

$$f(n+1) - f(n) = np^2q^{n-1} - (n-1)p^2q^{n-2} = p^2q^{n-2}(1-np),$$

and

$$1 - np \ge 0 \iff n \le 1/p,$$

therefore $f(n)$ is increasing for $n \le 1/p$ and decreasing thereafter on this range.

For $f(n+1)$ with $n\ge k$, it is lower-bounded by the case in which the first $k-1$ trials all fail (a stricter condition than only the first trial failing):

$$f(n+1) \ge q^{k-1}f(n-k+2).$$

Hence follows the inequality on p. 792 of the paper, from which the behaviour of $f(n)$ beyond $n=k$ can be read off.

So here we concluded that $f(n)$ is unimodal with maximum attained at $n=\left[\min(k-1, 1/p)\right]+1$, but the strong unimodality (log-concavity) characterization $f^2(n)\ge f(n-1)f(n+1)$ has its sign reversed at $n=k+1$. Therefore its convolution with other unimodal distributions is not necessarily unimodal, so it is not easy to tell the properties of $f_{k,r}$ from those of $f_{k,1}$.
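Both claims are easy to confirm numerically. A sketch (my own code; $p=0.3$, $k=5$ chosen arbitrarily, for which $\left[\min(k-1,1/p)\right]+1 = 4$):

```python
# Check the unimodality claims numerically (my own sketch; p and k arbitrary).
p, k = 0.3, 5
q = 1 - p

N = 200
f = [0.0] * (N + 1)
for n in range(2, k + 1):
    f[n] = (n - 1) * p**2 * q**(n - 2)
for n in range(k + 1, N + 1):
    f[n] = q * f[n - 1] + p * q**(k - 1) * f[n - k]

mode = max(range(2, N + 1), key=lambda n: f[n])
print(mode)  # 4, matching [min(k-1, 1/p)] + 1 = [min(4, 3.33)] + 1 = 4

# log-concavity f(n)^2 >= f(n-1) f(n+1) is indeed violated at n = k+1:
print(f[k + 1] ** 2 >= f[k] * f[k + 2])  # False
```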

The generating function: first consider that $f(n) = qf(n-1)+p^2q^{n-2}$ for $2 \le n \le k$. Combining this with the recurrence for $n > k$, multiplying by $z^n$ and summing over $n$ yields

$$\phi_{k,1}(z) = \frac{p^2z^2\left(1-(qz)^{k-1}\right)}{(1-qz)\left(1-qz-pq^{k-1}z^k\right)}.$$

From which we can find the mean and variance of $T=T_{k,1}$ (corollary 3.1): differentiating at $z=1$ gives

$$E[T] = \phi_{k,1}'(1) = \frac{2-q^{k-1}}{p\left(1-q^{k-1}\right)},$$

and the variance follows from $\operatorname{Var}(T) = \phi_{k,1}''(1) + \phi_{k,1}'(1) - \phi_{k,1}'(1)^2$.
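Differentiating the generating function at $z=1$ gives, after simplification, $E[T] = (2-q^{k-1})/\bigl(p(1-q^{k-1})\bigr)$ (my own derivation, not constants copied from the paper). A quick cross-check of this closed form against $\sum_n n f(n)$ computed from the recurrence:

```python
# Cross-check the closed-form mean (my own differentiation of the pgf)
# against sum(n * f(n)) from the recurrence.
p, k = 0.3, 5
q = 1 - p

N = 4000  # truncation; the geometric tail beyond this is negligible
f = [0.0] * (N + 1)
for n in range(2, k + 1):
    f[n] = (n - 1) * p**2 * q**(n - 2)
for n in range(k + 1, N + 1):
    f[n] = q * f[n - 1] + p * q**(k - 1) * f[n - k]

mean_numeric = sum(n * f[n] for n in range(N + 1))
mean_closed = (2 - q**(k - 1)) / (p * (1 - q**(k - 1)))
print(mean_numeric, mean_closed)  # the two agree to many decimal places
```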

and for higher-order moments, $m_s = E[T^s]$, $s\ge 0$, the paper gives a formula (theorem 3.3).

Using the above result, the paper proposed a way to estimate the Bernoulli parameter $p$: equate the theoretical mean $E[T]$ to the sample mean

$$\bar T = \frac{1}{N}\sum_{i=1}^{N} T^{(i)}$$

determined by Monte Carlo simulation, and solve for $p$. Indeed, the summation $\sum_{i=1}^{N} T^{(i)}$ is one instance of $T_{k,N}$.

I tried out this estimation:

```python
import random

p = 0.14159
k = 5
rangen = random.SystemRandom()


def trial():
    # simulate a Bernoulli trial, with success prob p
    return rangen.random() <= p


def Tn():
    """Simulate T.

    Returns:
        tuple (T, n), where T is the number of trials done until we see 2
        successes in a window of k, and n is the total number of successes
        encountered in those T trials
    """
    T = 0
    last_k = []
    while True:
        result = trial()
        T += 1
        last_k.append(result)
        if not result:
            continue
        # the current success plus one earlier success within the last k trials
        if sum(last_k[-k:]) == 2:
            return (T, sum(last_k))


def main():
    # simulate 1000 draws of T
    N = 1000
    sum_T = 0
    sum_n = 0
    for _ in range(N):
        T, n = Tn()
        sum_T += T
        sum_n += n
    # estimate p directly from all Bernoulli trials performed
    simple_p = float(sum_n) / float(sum_T)
    mean_T = float(sum_T) / N
    # E[T] = (1 + 1/(1 - q^(k-1))) / p with q = 1-p; note the exponent is
    # k-1, consistent with the recurrence above (check: k=2, p=1/2 gives 6,
    # the waiting time for the pattern "11")
    h = (1 + 1 / (1 - (1 - simple_p)**(k - 1))) / simple_p
    th_h = (1 + 1 / (1 - (1 - p)**(k - 1))) / p
    print("T, n: %d, %d" % (sum_T, sum_n))
    print("real p: %.6f" % p)
    print("simple p: %.6f" % simple_p)
    print("computed E[T] from simple p: %.6f" % h)
    print("experimental E[T]: %.6f" % mean_T)
    print("theoretical E[T]: %.6f" % th_h)


if __name__ == "__main__":
    main()
```


and found that this approach is less accurate than simply estimating $p$ from the results of all Bernoulli trials ever performed, for the reason that it uses vastly fewer samples.

Sec 4 answers the ultimate question of $T_r = T_{k,r}$. It can be seen as a sum of $r$ iid random variables distributed as $T_{k,1}$, and therefore the paper starts with the probability generating function

$$\phi_{k,r}(z) = \left[\phi_{k,1}(z)\right]^r.$$

This can help in finding the probability distribution, since $f_{k,r}$ is the $r$-fold convolution of $f_{k,1}$, and by the properties of sums of iid random variables,

$$E[T_{k,r}] = r\,E[T_{k,1}], \qquad \operatorname{Var}(T_{k,r}) = r\operatorname{Var}(T_{k,1}).$$
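Since $T_{k,r}$ is a sum of $r$ iid copies of $T_{k,1}$, its pmf is the $r$-fold convolution of $f_{k,1}$. A numerical sketch (my own; $p$, $k$, $r$ chosen arbitrarily) verifying $E[T_{k,r}] = rE[T_{k,1}]$:

```python
# Build f_{k,1} from the recurrence, convolve it r-1 times to get f_{k,r},
# and check that the mean scales linearly in r (my own sketch).
p, k, r = 0.3, 5, 3
q = 1 - p

N = 600  # truncation; the geometric tail beyond this is negligible
f1 = [0.0] * (N + 1)
for n in range(2, k + 1):
    f1[n] = (n - 1) * p**2 * q**(n - 2)
for n in range(k + 1, N + 1):
    f1[n] = q * f1[n - 1] + p * q**(k - 1) * f1[n - k]

fr = f1[:]  # start from f_{k,1}, convolve r-1 more times
for _ in range(r - 1):
    fr = [sum(f1[m] * fr[n - m] for m in range(n + 1)) for n in range(N + 1)]

mean_1 = sum(n * f1[n] for n in range(N + 1))
mean_r = sum(n * fr[n] for n in range(N + 1))
print(mean_r, r * mean_1)  # equal up to truncation error
```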

## Bibliographic data

@article{koutras1996,
  title   = "On a waiting time distribution in a sequence of Bernoulli trials",
  author  = "M. V. Koutras",
  journal = "Ann. Inst. Statist. Math.",
  volume  = "48",
  number  = "4",
  pages   = "789--806",
  year    = "1996",
  url     = "https://pdfs.semanticscholar.org/5b6a/5299008293657032d170c062bc54f04ce3eb.pdf",
}