This is a summary of common probability distributions in engineering and statistics. The accompanying chart shows plots of the pdf or pmf of each distribution (LaTeX source available).

discrete distributions

binomial distribution

  • An urn contains white and black balls. Drawing a white ball from the urn has probability \(x\) (a black ball, \(1-x\)). If we draw \(n\) balls from the urn with replacement, the probability of getting \(k\) white balls is:
\[f(k; n, x) = \binom{n}{k} x^k (1-x)^{n-k}\]
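A quick numerical check of this pmf against scipy.stats (the values of \(n\), \(x\), \(k\) here are arbitrary examples):

```python
# Binomial pmf: the urn formula above vs. scipy's implementation.
from math import comb
from scipy.stats import binom

n, x, k = 10, 0.3, 4                           # draws, P(white), white balls seen
manual = comb(n, k) * x**k * (1 - x)**(n - k)
print(manual, binom.pmf(k, n, x))              # both ~0.2001
```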

Poisson distribution

  • Balls are added to the urn at a rate of \(\lambda\) per unit time, with exponentially distributed interarrival times. The probability of \(k\) balls being added to the urn within time \(t\) is:
\[f(k; \lambda t) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}\]
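The same kind of check for the Poisson pmf (example values; scipy parameterises it by the mean \(\mu = \lambda t\)):

```python
# Poisson pmf for k arrivals within time t at rate lam.
from math import exp, factorial
from scipy.stats import poisson

lam, t, k = 2.0, 3.0, 5
mu = lam * t
manual = mu**k * exp(-mu) / factorial(k)
print(manual, poisson.pmf(k, mu))   # both ~0.1606
```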

geometric distribution

  • The probability of having to draw \(k\) balls to see the first white ball is:
\[f(k; x) = (1-x)^{k-1} x\]
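scipy's geom uses the same convention as the formula above, counting the draws up to and including the first white ball (example values):

```python
# Geometric pmf: probability the first white ball appears on draw k.
from scipy.stats import geom

x, k = 0.3, 4
print((1 - x) ** (k - 1) * x, geom.pmf(k, x))   # both ~0.1029
```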

negative binomial distribution

  • same as the distribution of the sum of \(r\) iid geometric random variables (see the sketch after this list)
  • the negative binomial approximates the Poisson with \(\lambda = r(1-x)\) for large \(r\) and \(x\approx 1\)
  • drawing balls from the urn with replacement, the probability that the \(r\)-th white ball appears on the \(k\)-th draw (i.e., we have drawn \(r\) white and \(k-r\) black balls) is:
\[f(k; r, x) = \binom{k-1}{k-r} x^r (1-x)^{k-r}\]
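A Monte Carlo sketch of the sum-of-geometrics view; note that scipy's nbinom counts the \(k-r\) black balls (failures) rather than the total draws \(k\), so its arguments are shifted (example values):

```python
# Negative binomial as a sum of r iid geometric draws.
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(0)
r, x, k = 3, 0.4, 7
draws = rng.geometric(x, size=(100_000, r)).sum(axis=1)  # total draws to r-th white
print((draws == k).mean())                               # empirical P(k)
print(nbinom.pmf(k - r, r, x))                           # theoretical, ~0.1244
```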

hypergeometric distribution

  • An urn holds \(N\) balls (finite), of which \(K\) are white. Drawing \(n\) balls from the urn without replacement, the probability of getting \(k\) white balls is:
\[f(k; N, K, n) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\]
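scipy's hypergeom takes its arguments in the order (total balls, white balls, draws); example values:

```python
# Hypergeometric pmf: k white balls in n draws without replacement.
from scipy.stats import hypergeom

N, K, n, k = 50, 20, 10, 4
print(hypergeom.pmf(k, N, K, n))   # ~0.280
```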

continuous distributions

uniform distribution

  • the extreme case of a flattened distribution: every value between the bounds is equally likely
  • with upper and lower bounds

triangular distribution

  • with upper and lower bounds

normal distribution

  • strong tendency for data at central value; symmetric, equally likely for positive and negative deviations from its central value
  • frequency of deviations falls off rapidly as we move further away from central value
\[f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp(-\frac{(x-\mu)^2}{2\sigma^2})\]
  • \[X_1 \sim N(\mu_1, \sigma^2_1) \text{ and } X_2 \sim N(\mu_2, \sigma^2_2) \text{ independent} \to X_1+X_2 \sim N(\mu_1+\mu_2, \sigma_1^2+\sigma_2^2)\]
  • approximation to Poisson distribution: if \(\lambda\) is large, Poisson distribution approximates normal with \(\mu=\sigma^2=\lambda\)
  • approximation to binomial distribution: if \(n\) is large and \(x\approx \frac{1}{2}\), binomial distribution approximates normal with \(\mu=nx\) and \(\sigma^2=nx(1-x)\)
  • approximation to beta distribution: if \(\alpha\) and \(\beta\) are large, beta distribution approximates normal with \(\mu=\frac{\alpha}{\alpha+\beta}\) and \(\sigma^2=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\)
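A quick look at the binomial case of these approximations (example values; at an integer \(k\) the exact pmf is close to the normal pdf there):

```python
# Normal approximation to the binomial for large n.
from scipy.stats import binom, norm

n, x = 1000, 0.5
mu, var = n * x, n * x * (1 - x)
k = 520
print(binom.pmf(k, n, x))            # exact
print(norm.pdf(k, mu, var ** 0.5))   # normal approximation, close for large n
```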

Laplace distribution

  • uses the absolute difference from the mean where the normal distribution uses the squared difference
  • longer (fatter) tails and higher kurtosis (a sharper peak at the mean)
  • pdf:
\[f(x; \mu, s) = \frac{1}{2s}\exp(-\frac{|x-\mu|}{s})\]

logistic distribution

  • symmetric, with longer tails and higher kurtosis than normal distribution
  • the logistic distribution has a finite mean \(\mu\) and a well-defined variance
  • \[X\sim U(0,1) \to \mu+s[\log(X)-\log(1-X)] \sim \textrm{Logistic}(\mu,s)\]
  • \[X\sim \textrm{Exp}(1) \to \mu+s\log(e^X-1) \sim \textrm{Logistic}(\mu,s)\]
  • logistic pdf:
\[f(x; \mu, s) = \frac{e^{-(x-\mu)/s}}{s\left(1+e^{-(x-\mu)/s}\right)^2}\]
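A Monte Carlo check of the uniform (logit) transform above (example \(\mu\) and \(s\); the logistic variance is \(s^2\pi^2/3\)):

```python
# Generating Logistic(mu, s) samples from U(0,1) via the logit transform.
import numpy as np

rng = np.random.default_rng(0)
mu, s = 1.0, 2.0
u = rng.uniform(size=100_000)
sample = mu + s * (np.log(u) - np.log(1 - u))
print(np.median(sample))                    # ~mu = 1.0
print(sample.var(), s**2 * np.pi**2 / 3)    # both ~13.2
```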

Cauchy distribution

  • symmetric, with longer tails and higher kurtosis than normal distribution
  • the Cauchy distribution has undefined mean and variance, but its median and mode are at \(x_0\)
  • \[X,Y\sim N(0,\sigma^2) \text{ iid} \to X/Y \sim \textrm{Cauchy}(0,1)\]
  • Cauchy pdf:
\[f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1+\left(\frac{x-x_0}{\gamma}\right)^2\right]}\]
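A Monte Carlo sketch of the ratio construction; since the sample mean of a Cauchy never converges, compare quartiles instead (the standard Cauchy quartiles are exactly \(\pm 1\)):

```python
# Ratio of two iid standard normals is standard Cauchy.
import numpy as np
from scipy.stats import cauchy

rng = np.random.default_rng(0)
z = rng.normal(size=(100_000, 2))
sample = z[:, 0] / z[:, 1]
print(np.percentile(sample, [25, 50, 75]))   # ~[-1, 0, 1]
print(cauchy.ppf([0.25, 0.5, 0.75]))         # exactly [-1, 0, 1]
```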

lognormal distribution

  • \(\log(X)\sim N(\mu,\sigma^2)\), positively skewed
  • parameterised by shape (\(\sigma\)), scale (\(e^\mu\), the median), and shift (\(\theta\))
  • \(\mu=0, \sigma=1, \theta=0\) is the standard lognormal distribution
  • as \(\sigma\) rises, the peak shifts to left and skewness increases
\[f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi x^2\sigma^2}}\exp(-\frac{(\log x-\mu)^2}{2\sigma^2})\]
  • the product of two independent lognormal random variables is lognormal with \(\mu=\mu_1+\mu_2\) and \(\sigma^2=\sigma_1^2+\sigma_2^2\) (see the sketch below)
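A Monte Carlo sketch of the product rule (example parameters): taking logs turns the product into a sum of independent normals:

```python
# Product of independent lognormals is lognormal.
import numpy as np

rng = np.random.default_rng(0)
m1, s1, m2, s2 = 0.5, 0.8, -0.2, 0.6
y = rng.lognormal(m1, s1, 100_000) * rng.lognormal(m2, s2, 100_000)
print(np.log(y).mean(), np.log(y).std())   # ~0.3 = m1+m2 and ~1.0 = sqrt(s1**2 + s2**2)
```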

Pareto distribution

  • power law probability distribution
  • continuous counterpart of Zipf’s law
  • positively skewed, no negative tail, support \(x\ge x_m\) with peak at \(x=x_m\)
\[f(x; x_m, \alpha) = \frac{\alpha x_m^{\alpha}}{x^{\alpha+1}}\]

gamma distribution

  • support for \(x\in(0,\infty)\), positive skewness (long right tail)
  • decreasing \(\alpha\) will push distribution towards the left; at low \(\alpha\), left tail will disappear and distribution will resemble exponential
  • models the time to the \(\alpha\)-th Poisson arrival with arrival rate \(\beta\)
  • gamma pdf (\(\alpha=1\) becomes exponential pdf with rate \(\beta\)):
\[f(x; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1}e^{-\beta x}\]
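A quick check of the \(\alpha=1\) special case noted above (scipy parameterises gamma and expon by scale \(=1/\beta\)):

```python
# Gamma with alpha=1 reduces to the exponential with rate beta.
from scipy.stats import expon, gamma

beta = 2.0
print(gamma.pdf(0.7, 1, scale=1 / beta))   # ~0.493
print(expon.pdf(0.7, scale=1 / beta))      # same value
```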

Weibull distribution

  • support for \(x\in(0,\infty)\), positive skewness (long right tail)
  • decreasing \(k\) will push distribution towards the left; at low \(k\), left tail will disappear and distribution will resemble exponential
  • If \(W\sim\textrm{Weibull}(k,\lambda)\), then \(X=W^k \sim \textrm{Exp}(1/\lambda^k)\)
  • Weibull pdf (\(k=1\) becomes exponential pdf with rate \(1/\lambda\)):
\[f(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}e^{-(x/\lambda)^k}\]
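A Monte Carlo sketch of the power transform above (numpy's weibull draws have scale 1, so multiply by \(\lambda\)):

```python
# W ~ Weibull(k, lam)  =>  W**k ~ Exp with rate 1/lam**k (mean lam**k).
import numpy as np

rng = np.random.default_rng(0)
k, lam = 2.0, 3.0
w = lam * rng.weibull(k, size=100_000)
print((w ** k).mean(), lam ** k)   # both ~9.0
```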

Erlang distribution

  • \[X_i\sim\textrm{Exp}(\lambda) \to \sum_{i=1}^k X_i \sim \textrm{Erlang}(k, \lambda)\]
  • arises in teletraffic engineering: the time to the \(k\)-th call
\[f(x; k,\lambda) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{(k-1)!}\]
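A Monte Carlo sketch of the sum construction above (scipy's erlang uses scale \(=1/\lambda\)):

```python
# Sum of k iid Exp(lam) interarrival times is Erlang(k, lam).
import numpy as np
from scipy.stats import erlang

rng = np.random.default_rng(0)
k, lam = 4, 2.0
t = rng.exponential(scale=1 / lam, size=(100_000, k)).sum(axis=1)
print(t.mean(), erlang.mean(k, scale=1 / lam))   # both ~k/lam = 2.0
```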

beta distribution

  • support for \(x\in(0,1)\)
  • allows negative skewness
  • two shape parameters \(\alpha\) and \(\beta\) (also written \(p\) and \(q\)); a general form adds lower and upper bounds \(a\) and \(b\) on the data
\[f(x; \alpha, \beta) = \left(\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}\right)^{-1} x^{\alpha-1}(1-x)^{\beta-1}\]
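A quick check of the skewness claim with scipy (example shapes; \(\alpha>\beta\) gives negative skew):

```python
# Beta(5, 2) is negatively skewed.
from scipy.stats import beta

print(beta.stats(5, 2, moments='s'))   # ~-0.60
```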

extreme value distribution (i.e. Gumbel minimum distribution)

  • negatively skewed
  • Gumbel maximum distribution, \(f(-x;-\mu,\beta)\), is positively skewed
  • Limiting distribution of the (suitably centered) max/min value of \(n\to\infty\) iid samples from \(\textrm{Exp}(\lambda)\) with \(\lambda = 1/\beta\)
  • standard cdf: \(F(x)=1-\exp(-e^x)\)
\[f(x; \mu, \beta) = \frac{1}{\beta}e^{(x-\mu)/\beta}e^{-e^{(x-\mu)/\beta}}\]
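A Monte Carlo sketch of the limiting behaviour, using scipy's gumbel_r (the maximum variant): the max of \(n\) iid \(\textrm{Exp}(1)\) samples, centered by \(\log n\), approaches the standard Gumbel maximum distribution:

```python
# Centered maxima of exponentials converge to the Gumbel (maximum) distribution.
import numpy as np
from scipy.stats import gumbel_r

rng = np.random.default_rng(0)
n = 500
m = rng.exponential(size=(10_000, n)).max(axis=1) - np.log(n)
print(m.mean(), gumbel_r.mean())   # both ~0.577 (the Euler-Mascheroni constant)
```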

Rayleigh distribution

  • positively skewed
  • models the \(L^2\)-norm of two iid zero-mean normal random variables (e.g., the orthogonal components of a 2D vector)
\[f(x; \sigma) = \frac{x}{\sigma^2} \exp(-\frac{x^2}{2\sigma^2})\]
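A Monte Carlo sketch of the 2D-norm construction (scipy's rayleigh uses scale \(=\sigma\)):

```python
# L2 norm of two iid N(0, sigma**2) components is Rayleigh(sigma).
import numpy as np
from scipy.stats import rayleigh

rng = np.random.default_rng(0)
sigma = 1.5
xy = rng.normal(0, sigma, size=(100_000, 2))
r = np.hypot(xy[:, 0], xy[:, 1])
print(r.mean(), rayleigh.mean(scale=sigma))   # both ~sigma * sqrt(pi/2) ~ 1.88
```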

Maxwell-Boltzmann distribution

  • positively skewed
  • 3D counterpart of Rayleigh distribution
  • arises in thermodynamics: the distribution of a particle's speed \(v\) at temperature \(T\)
\[\begin{align} f(v; \sqrt{kT/m}) &= \left(\frac{m}{2\pi kT} \right)^{3/2} 4\pi v^2 \exp(-\frac{mv^2}{2kT}) \\ f(x; a) &= \sqrt{\frac{2}{\pi}}\frac{x^2 e^{-x^2/(2a^2)}}{a^3} \end{align}\]
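The same construction in 3D, checked against scipy's maxwell (scale \(a=\sqrt{kT/m}\)):

```python
# L2 norm of three iid N(0, a**2) components is Maxwell-Boltzmann with scale a.
import numpy as np
from scipy.stats import maxwell

rng = np.random.default_rng(0)
a = 1.5
v = np.linalg.norm(rng.normal(0, a, size=(100_000, 3)), axis=1)
print(v.mean(), maxwell.mean(scale=a))   # both ~2a * sqrt(2/pi) ~ 2.39
```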

Chi-squared distribution

  • distribution of the sum of the square of \(k\ge 1\) i.i.d. standard normal random variables
  • mean \(k\), variance \(2k\)
  • PDF with \(k\) degrees of freedom:
\[f(x; k) = \frac{x^{k/2-1} e^{-x/2}}{2^{k/2}\Gamma(k/2)}\]
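A Monte Carlo sketch of the sum-of-squares construction and the stated moments:

```python
# Sum of squares of k iid standard normals is chi-squared with k dof.
import numpy as np

rng = np.random.default_rng(0)
k = 5
x = (rng.normal(size=(100_000, k)) ** 2).sum(axis=1)
print(x.mean(), x.var())   # ~k = 5 and ~2k = 10
```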

F-distribution

  • Distribution of a random variable defined as the ratio of two independent \(\chi^2\)-distributed random variables, with degrees of freedom \(d_1\) and \(d_2\) respectively
  • Commonly used in ANOVA
  • PDF, with degrees of freedom \(d_1\) and \(d_2\), involves beta function \(B(\alpha,\beta)\):
\[\begin{align} f(x; d_1, d_2) &= \left[x B(\frac{d_1}{2},\frac{d_2}{2})\right]^{-1} \sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x+d_2)^{d_1+d_2}}} \\ \textrm{and if}\qquad z_1 &\sim \chi^2(d_1) \\ z_2 &\sim \chi^2(d_2) \\ \textrm{then}\qquad x = \frac{z_1/d_1}{z_2/d_2} &\sim f(x, d_1, d_2) \end{align}\]
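A Monte Carlo sketch of the ratio construction in the display above (example degrees of freedom):

```python
# (z1/d1) / (z2/d2) for independent chi-squares follows F(d1, d2).
import numpy as np
from scipy.stats import chi2, f

rng = np.random.default_rng(0)
d1, d2 = 5, 12
z1 = chi2.rvs(d1, size=100_000, random_state=rng)
z2 = chi2.rvs(d2, size=100_000, random_state=rng)
x = (z1 / d1) / (z2 / d2)
print(np.median(x), f.median(d1, d2))   # medians agree
```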

Student’s t distribution

  • Distribution of normalized sample mean of \(n=k+1\) observations from a normal distribution, \(\frac{\bar{X}-\mu}{S/\sqrt{n}}\)
  • Equivalently, this is the distribution of \(\frac{x}{\sqrt{y/r}}\), where \(x\) is standard normal and \(y\) is chi-squared with \(r\) degrees of freedom, independent of \(x\)
  • the t distribution with \(k=1\) is the Cauchy distribution
  • PDF with degree of freedom \(k\):
\[f(x, k) = \frac{\Gamma(\frac{k+1}{2})}{\sqrt{k\pi}\Gamma(k/2)} \left(1+\frac{x^2}{k}\right)^{-(k+1)/2}\]
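A Monte Carlo sketch of the \(x/\sqrt{y/k}\) construction, checking a tail quantile:

```python
# z / sqrt(y/k) with z ~ N(0,1) and y ~ chi-squared(k), independent, is t(k).
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)
k = 7
z = rng.normal(size=100_000)
y = rng.chisquare(k, size=100_000)
x = z / np.sqrt(y / k)
print(np.percentile(x, 97.5), t.ppf(0.975, k))   # both ~2.36
```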

tests of fit for distributions

Kolmogorov-Smirnov test (K-S test, on cumulative distribution function \(F(x)\))

\[D_n = \sup_x | F_n(x) - F(x) |\]
  • \(F_n\) is the empirical cdf; if the sample comes from the distribution \(F\), \(D_n\) converges to 0 a.s. as the number of samples \(n\) goes to infinity
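A minimal usage sketch with scipy's kstest (testing a normal sample against the standard normal cdf):

```python
# K-S test: D_n and p-value for a sample vs. a hypothesized cdf.
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
sample = rng.normal(size=500)
stat, pvalue = kstest(sample, 'norm')
print(stat, pvalue)   # small D_n, large p: consistent with N(0, 1)
```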

Shapiro-Wilk test

\[W = \frac{\left(\sum_{i=1}^n a_i x_{(i)}\right)^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\]
  • test of normality in frequentist statistics (i.e. whether the \(x_i\) come from a normal distribution)
  • \(x_{(i)}\) is the \(i\)-th order statistic and \(\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)\) is the sample mean
  • \((a_1,\cdots,a_n) = m^T V^{-1} (m^T V^{-1}V^{-1} m)^{-1/2}\) where \(m\) is the vector of expected values of the order statistics of iid standard normal samples and \(V\) is the covariance matrix of those order statistics
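scipy provides the test directly:

```python
# Shapiro-Wilk normality test: W close to 1 supports normality.
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
stat, pvalue = shapiro(rng.normal(size=500))
print(stat, pvalue)
```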

Anderson-Darling test

\[A^2 = n \int_{-\infty}^{\infty} \frac{(F_n(x)-F(x))^2}{F(x)(1-F(x))} dF(x)\]
  • test whether a sample comes from a specified distribution
  • \(A^2\) is weighted distance between \(F_n(x)\) and \(F(x)\), with more weight on tails of the distribution
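scipy's anderson reports \(A^2\) together with critical values at fixed significance levels rather than a p-value:

```python
# Anderson-Darling test of a sample against the normal family.
import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(0)
result = anderson(rng.normal(size=500), dist='norm')
print(result.statistic, result.critical_values, result.significance_level)
```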

Pearson’s \(\chi^2\) test

\[\chi^2 = \sum_{i=1}^n \frac{(O_i - E_i)^2}{E_i}\]
  • tests whether categorical data fit a distribution: the observed frequency \(O_i\) is checked against the expected frequency \(E_i\) under that distribution for each of the \(n\) categories
  • degrees of freedom: \(n-1\) minus the number of estimated parameters of the fitted distribution
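A minimal usage sketch with scipy's chisquare (hypothetical counts for a fair six-sided die; observed and expected totals must match):

```python
# Pearson chi-squared goodness of fit for categorical counts.
from scipy.stats import chisquare

observed = [16, 18, 16, 14, 12, 12]   # 88 rolls (made-up data)
expected = [88 / 6] * 6
stat, pvalue = chisquare(observed, expected)
print(stat, pvalue)   # stat = 2.0, large p: consistent with a fair die
```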

Reference

Lawrence M. Leemis and Jacquelyn T. McQueston. Univariate Distribution Relationships. The American Statistician, 62(1), pp. 45–53, 2008. DOI: 10.1198/000313008X270448

Aswath Damodaran. Probabilistic Approaches: Scenario Analysis, Decision Trees and Simulations (PDF; the appendix is also available separately), which includes a chart for choosing a distribution.