Consider a sequence of Bernoulli trials with success probability $p$. This paper is interested in the number of trials needed to see the $r$-th occurrence of 2 successes within a window of no more than $k$ trials. Windows are non-overlapping.

A window having 2 successes is defined such that the first and last trials of the window are by definition successes and its length is less than or equal to $k$. An alternative formulation of the problem is based on a "2-out-of-$k$ sliding window detector".

These are the symbols defined in the paper:

- $X_1, X_2, \ldots$ are the Bernoulli trial sequence; each $X_i$ takes values of 0 or 1
- $p$ is the Bernoulli parameter, the probability of success (and $q = 1 - p$ the probability of failure)
- $k$ is the maximum window size
- $r$ is the number of occurrences to look for
- $T_{r,k}$ is the waiting time, i.e., the number of trials to see the $r$-th occurrence of 2 successes within windows of size at most $k$
- $N_{n,k}$ is the number of occurrences, within the first $n$ trials, of a strand of at most $k$ consecutive trials containing 2 successes
- $f_{r,k}(n) = P(T_{r,k} = n)$ is the distribution function for $T_{r,k}$
- $H_{r,k}(z) = E[z^{T_{r,k}}]$ is the probability generating function

We can see that, by the duality between counts and waiting times,

$$P(T_{r,k} \le n) = P(N_{n,k} \ge r).$$

In sec 3, we consider only $r = 1$, with $T = T_{1,k}$ and $f(n) = f_{1,k}(n)$, and the first theorem is derived as follows. Obviously $f(n) = 0$ for $n < 2$, as we cannot have two successes before the second trial. For $2 \le n \le k$, i.e., within the size of one window,

$$f(n) = (n-1)\, p^2 q^{n-2},$$

which is interpreted as having one success somewhere within the first $n-1$ trials and the other success at trial $n$. Consider the probability for larger $n$: we are sure we see a success at trial $n$. If the result of the first trial is a failure (with probability $q$), then we are looking for the occurrence of the same pattern from the second trial onward, i.e., with probability $f(n-1)$. But if the first trial is a success (with probability $p$), we cannot have another success in trials 2 to $k$, or the criterion would already be met with $T \le k$; the success at trial 1 then cannot pair with any later success, so the search restarts afresh at trial $k+1$. So the probability will be $p\, q^{k-1} f(n-k)$. Therefore, we established, for $n > k$:

$$f(n) = q\, f(n-1) + p\, q^{k-1} f(n-k).$$
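The piecewise formula and recurrence are easy to check numerically. Here is a quick sketch of my own (with $p = 1/2$ and $k = 2$ assumed for the check), verifying that the probabilities form a proper distribution:

```python
p, k = 0.5, 2
q = 1 - p
nmax = 2000

# f[n] = P(T = n), built from the piecewise formula and the recurrence
f = [0.0] * nmax
for n in range(2, nmax):
    if n <= k:
        # one success in the first n-1 trials, the other at trial n
        f[n] = (n - 1) * p * p * q ** (n - 2)
    else:
        # first trial fails, or first trial succeeds and trials 2..k all fail
        f[n] = q * f[n - 1] + p * q ** (k - 1) * f[n - k]

print(f[2], f[3], f[4])   # 0.25 0.125 0.125 for p = 1/2, k = 2
print(sum(f))             # ≈ 1, so it is a proper distribution
```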

Note that $f(n) = P(T > n-1) - P(T > n)$, so the tail probabilities $P(T > n)$ satisfy the same recurrence relation.

Here, we have the special case of $k = 2$, i.e., waiting for two consecutive successes:

$$f(2) = p^2, \qquad f(n) = q\, f(n-1) + p q\, f(n-2) \quad \text{for } n > 2.$$

(This is an interview question I was asked, but at the time I hadn't realized that there is no closed-form solution and we have to resort to a numeric answer.)
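While the distribution itself must be computed numerically, the mean for $k = 2$ does reduce to the well-known closed form $E[T] = (1+p)/p^2$ (6 for a fair coin). A quick numeric check of the $k = 2$ recurrence against it:

```python
p = 0.5
q = 1 - p
nmax = 5000

# waiting-time pmf for two consecutive successes (the k = 2 case)
f = [0.0, 0.0, p * p]
for n in range(3, nmax):
    f.append(q * f[n - 1] + p * q * f[n - 2])

mean = sum(n * fn for n, fn in enumerate(f))
print(mean, (1 + p) / p ** 2)   # both ≈ 6 for a fair coin
```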

Here we have a few properties of $f(n)$: On $2 \le n < k$,

$$\frac{f(n+1)}{f(n)} = \frac{n}{n-1}\, q$$

and

$$\frac{n}{n-1}\, q \ge 1 \iff n \le \frac{1}{p},$$

therefore, on this range, $f(n)$ increases until $n$ passes $1/p$ and decreases afterwards.

For $n > k$, $f(n-1)$ should be lower-bounded by the case that the first $k-1$ trials all fail (which is a stricter case than only the first trial failing):

$$f(n-1) \ge q^{k-1} f(n-k).$$

Hence

$$f(n) = q\, f(n-1) + p\, q^{k-1} f(n-k) \le q\, f(n-1) + p\, f(n-1) = f(n-1),$$

which is the equation on p.792. From this we can see that $f(n)$ is decreasing for $n > k$.

So here we concluded that $f(n)$ is unimodal with maximum attained at $n^* = \min(k, \lfloor 1/p \rfloor + 1)$, but the strong unimodality (log-concavity) characterization $f(n)^2 \ge f(n-1)\, f(n+1)$ has its sign reversed for some $n$. Therefore its convolution with other unimodal distributions is not necessarily unimodal, so it is not easy to tell the properties of $T_{r,k}$ from those of $T_{1,k}$.
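The mode location can be spot-checked numerically. A sketch of mine, assuming the mode sits at $n^* = \min(k, \lfloor 1/p \rfloor + 1)$ and that there are no ties, with $p = 0.3$, $k = 10$:

```python
import math

p, k = 0.3, 10
q = 1 - p
nmax = 3000

# pmf of T via the piecewise formula and the recurrence
f = [0.0] * nmax
for n in range(2, nmax):
    if n <= k:
        f[n] = (n - 1) * p * p * q ** (n - 2)
    else:
        f[n] = q * f[n - 1] + p * q ** (k - 1) * f[n - k]

mode = max(range(2, nmax), key=lambda n: f[n])
print(mode, min(k, math.floor(1 / p) + 1))   # both 4 here
```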

The generating function: First consider that $f(n) = 0$ for $n < 2$. Then, multiplying the recurrence by $z^n$, summing over $n$, and accounting for the initial terms,

$$H(z) = E[z^T] = \frac{p^2 z^2 \left(1 - q^{k-1} z^{k-1}\right)}{(1 - qz)\left(1 - qz - p\, q^{k-1} z^k\right)}.$$

From which we can find the mean and variance of $T$ (corollary 3.1):

$$E[T] = \frac{1}{p}\left(1 + \frac{1}{1 - q^{k-1}}\right)$$

and

$$\operatorname{Var}(T) = \frac{q}{p^2} + \frac{q + p\, q^{k-1}}{p^2\left(1 - q^{k-1}\right)^2} + \frac{2(k-1)\, q^{k-1}}{p\left(1 - q^{k-1}\right)^2}$$

and for the higher-order moments $\mu'_m = E[T^m]$, $m = 1, 2, \ldots$, we have a recurrence (theorem 3.3), obtained by repeatedly differentiating $H(z)$ and evaluating at $z = 1$.
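The mean and variance can be verified against the recurrence numerically. A sketch of mine (the variance expression in the code is my rearrangement, which may differ in form from the paper's; for $k = 2$, $p = 1/2$ the classic values are $E[T] = 6$ and $\operatorname{Var}(T) = 22$):

```python
p, k = 0.5, 2
q = 1 - p
nmax = 4000

# pmf of T via the recurrence
f = [0.0] * nmax
for n in range(2, nmax):
    if n <= k:
        f[n] = (n - 1) * p * p * q ** (n - 2)
    else:
        f[n] = q * f[n - 1] + p * q ** (k - 1) * f[n - k]

mean = sum(n * fn for n, fn in enumerate(f))
var = sum(n * n * fn for n, fn in enumerate(f)) - mean ** 2

# closed-form mean and variance, with a = 1 - q^(k-1)
a = 1 - q ** (k - 1)
th_mean = (1 + 1 / a) / p
th_var = (q / p ** 2
          + (q + p * q ** (k - 1)) / (p ** 2 * a ** 2)
          + 2 * (k - 1) * q ** (k - 1) / (p * a ** 2))
print(mean, th_mean)   # both ≈ 6
print(var, th_var)     # both ≈ 22
```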

Using the above result, the paper proposed a way to estimate the Bernoulli parameter $p$: given the sample mean $\bar{T}$ of the observed waiting times, solve

$$\frac{1}{\hat{p}}\left(1 + \frac{1}{1 - (1-\hat{p})^{k-1}}\right) = \bar{T}$$

for $\hat{p}$ numerically, where $\bar{T}$ is determined by Monte Carlo simulation. Indeed, the summation $\sum_{i=1}^{N} T_i$ over the $N$ simulated runs is one instance of $T_{N,k}$.
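There is no closed-form inverse, but since $E[T]$ is strictly decreasing in $p$, bisection recovers $\hat{p}$ from an observed mean waiting time. A sketch, where the target $\bar{T}$ is synthetic rather than simulated:

```python
def mean_T(p, k):
    # E[T] = (1 + 1/(1 - q^(k-1))) / p, strictly decreasing in p
    q = 1 - p
    return (1 + 1 / (1 - q ** (k - 1))) / p

def estimate_p(t_bar, k, lo=1e-9, hi=1 - 1e-9, iters=100):
    # bisection: find p such that mean_T(p, k) == t_bar
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_T(mid, k) > t_bar:
            lo = mid      # mean too large, so the true p is larger
        else:
            hi = mid
    return (lo + hi) / 2

k = 5
t_bar = mean_T(0.3, k)          # stand-in for a Monte Carlo estimate of E[T]
print(estimate_p(t_bar, k))     # ≈ 0.3
```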

I tried out this estimation:

```
import random

p = 0.14159
k = 5
rangen = random.SystemRandom()

def trial():
    # simulate a Bernoulli trial, with success prob p
    return rangen.random() < p

def Tn():
    """Simulate T.

    Returns:
        tuple (T, n), where T is the number of trials done until we see 2
        successes in a window of k, and n is the total number of successes
        encountered in those T trials
    """
    T = 0
    last_k = []
    while True:
        result = trial()
        T += 1
        last_k.append(result)
        if not result:
            continue
        if sum(last_k[-k:]) == 2:
            return (T, sum(last_k))

def main():
    # simulate 1000 instances of T
    N = 1000
    sum_T = 0
    sum_n = 0
    for _ in range(N):
        T, n = Tn()
        sum_T += T
        sum_n += n
    # estimate p from all trials, then compare against the mean waiting time
    # E[T] = (1 + 1/(1 - q**(k-1)))/p, q = 1 - p, for windows of length <= k
    simple_p = float(sum_n) / float(sum_T)
    mean_T = float(sum_T) / N
    h = (1 + 1/(1 - (1 - simple_p)**(k - 1))) / simple_p
    th_h = (1 + 1/(1 - (1 - p)**(k - 1))) / p
    print("T, n: %d, %d" % (sum_T, sum_n))
    print("real p: %.6f" % p)
    print("simple p: %.6f" % simple_p)
    print("computed h from simple p: %.6f" % h)
    print("experimental T: %.6f" % mean_T)
    print("theoretical T: %.6f" % th_h)

if __name__ == "__main__":
    main()
```

and found that this approach is less accurate than simply using the results of all the Bernoulli trials ever performed, for the reason that we use a vastly smaller number of samples.

Sec 4 answers the ultimate question of $T_{r,k}$. Since windows are non-overlapping, it can be seen as a sum of $r$ iid random variables distributed as $T_{1,k}$, and therefore the paper starts with the probability generating function

$$H_{r,k}(z) = \left[H_{1,k}(z)\right]^r.$$

This can help finding the probability distribution:

$$f_{r,k}(n) = \sum_{m} f_{1,k}(m)\, f_{r-1,k}(n-m),$$

and by the properties of sums of iid random variables,

$$E[T_{r,k}] = r\, E[T_{1,k}], \qquad \operatorname{Var}(T_{r,k}) = r\, \operatorname{Var}(T_{1,k}).$$
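The iid-sum structure is easy to verify by convolving the single-occurrence pmf with itself. A sketch of mine for $r = 2$, with $p = 1/2$, $k = 2$:

```python
p, k, nmax = 0.5, 2, 2000
q = 1 - p

# single-occurrence pmf f1[n] = P(T_{1,k} = n)
f1 = [0.0] * nmax
for n in range(2, nmax):
    if n <= k:
        f1[n] = (n - 1) * p * p * q ** (n - 2)
    else:
        f1[n] = q * f1[n - 1] + p * q ** (k - 1) * f1[n - k]

# pmf of T_{2,k}: 2-fold convolution of f1 with itself (truncated at nmax)
f2 = [0.0] * nmax
for i, a in enumerate(f1):
    if a:
        for j in range(nmax - i):
            f2[i + j] += a * f1[j]

mean1 = sum(n * v for n, v in enumerate(f1))
mean2 = sum(n * v for n, v in enumerate(f2))
print(mean1, mean2)   # mean2 ≈ 2 * mean1
```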

## Bibliographic data

```
@article{koutras1996,
title = "On a waiting time distribution in a sequence of Bernoulli trials",
author = "M. V. Koutras",
journal = "Ann. Inst. Statist. Math.",
volume = "48",
number = "4",
pages = "789-806",
year = "1996",
url = "https://pdfs.semanticscholar.org/5b6a/5299008293657032d170c062bc54f04ce3eb.pdf",
}
```