This is a summary of common probability distributions in engineering and statistics. This chart has the plots of the pdf or pmf (LaTeX source):

# discrete distributions

binomial distribution

• An urn holds white and black balls. Each draw yields a white ball with probability $$x$$ (and a black ball with probability $$1-x$$). If we draw $$n$$ balls from the urn with replacement, the probability of getting $$k$$ white balls is:
$f(k; n, x) = \binom{n}{k} x^k (1-x)^{n-k}$
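
As a quick sanity check of the pmf above, the sketch below evaluates it for every $$k$$ and confirms that the probabilities sum to 1 and that the mean is $$nx$$. The values of `n` and `x` are arbitrary illustrative choices, not taken from the text.

```python
from math import comb

def binomial_pmf(k: int, n: int, x: float) -> float:
    """P(k white balls in n draws with replacement), white probability x."""
    return comb(n, k) * x**k * (1 - x) ** (n - k)

n, x = 10, 0.3  # illustrative values (assumption)
pmf = [binomial_pmf(k, n, x) for k in range(n + 1)]
total = sum(pmf)                              # probabilities should sum to 1
mean = sum(k * p for k, p in enumerate(pmf))  # should equal n * x
print(f"total={total:.6f}, mean={mean:.6f}")
```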

Poisson distribution

• Balls are added to the urn at a rate of $$\lambda$$ per unit time, with independent, exponentially distributed times between additions. The probability that $$k$$ balls are added to the urn within time $$t$$ is:
$f(k; \lambda t) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}$
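
The link between exponential gaps and Poisson counts can be checked by simulation. This is a minimal sketch: the helper name `count_arrivals`, the seed, and the parameter values are assumptions made for illustration.

```python
import random
from math import exp, factorial

def poisson_pmf(k: int, lam: float, t: float) -> float:
    """P(k arrivals within time t) at rate lam."""
    mu = lam * t
    return mu**k * exp(-mu) / factorial(k)

def count_arrivals(rng: random.Random, lam: float, t: float) -> int:
    """Count arrivals in [0, t] when gaps between arrivals are Exp(lam)."""
    elapsed, count = 0.0, 0
    while True:
        elapsed += rng.expovariate(lam)
        if elapsed > t:
            return count
        count += 1

rng = random.Random(0)  # fixed seed so the run is repeatable
lam, t, trials = 2.0, 1.5, 20_000
counts = [count_arrivals(rng, lam, t) for _ in range(trials)]
empirical_p3 = counts.count(3) / trials
print(empirical_p3, poisson_pmf(3, lam, t))
```

The empirical frequency of exactly three arrivals should land close to the pmf value, and the average count close to $$\lambda t$$.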

geometric distribution

• The probability that the first white ball appears on the $$k$$-th draw (drawing with replacement):
$f(k; x) = (1-x)^{k-1} x$

negative binomial distribution

• same as the distribution of the sum of $$r$$ iid geometric random variables
• with large $$r$$ and $$x\approx 1$$, the number of black balls $$k-r$$ is approximately Poisson with $$\lambda = r(1-x)$$
• Drawing balls from the urn with replacement, suppose the $$r$$-th white ball appears on the $$k$$-th draw (so we have drawn $$r$$ white balls and $$k-r$$ black balls). The probability of $$k$$ is:
$f(k; r, x) = \binom{k-1}{k-r} x^r (1-x)^{k-r}$
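
The claim that the negative binomial is the sum of $$r$$ iid geometric variables can be verified exactly by convolving the geometric pmf with itself. The sketch below does this for illustrative values of `x` and `r`; the `convolve` helper is written here for the example.

```python
from math import comb

def geometric_pmf(k: int, x: float) -> float:
    return (1 - x) ** (k - 1) * x if k >= 1 else 0.0

def negative_binomial_pmf(k: int, r: int, x: float) -> float:
    return comb(k - 1, k - r) * x**r * (1 - x) ** (k - r) if k >= r else 0.0

def convolve(p, q):
    """pmf of the sum of two independent variables; list index = value."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

x, r, k_max = 0.4, 3, 30  # illustrative values (assumption)
geom = [geometric_pmf(k, x) for k in range(k_max + 1)]
summed = geom
for _ in range(r - 1):
    summed = convolve(summed, geom)

# Entries up to k_max are unaffected by truncating geom at k_max.
max_gap = max(
    abs(summed[k] - negative_binomial_pmf(k, r, x)) for k in range(k_max + 1)
)
print(max_gap)
```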

hypergeometric distribution

• An urn with $$N$$ balls (finite), of which $$K$$ are white. The probability of getting $$k$$ white balls when drawing $$n$$ balls without replacement:
$f(k; N, K, n) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$
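
A small sketch of the hypergeometric pmf follows. It checks that the probabilities sum to 1 and that the mean is $$nK/N$$, a standard property not stated above. The urn sizes are arbitrary illustrative choices.

```python
from math import comb

def hypergeometric_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(k white balls) when drawing n of N balls without replacement."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0  # outside the support
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 50, 12, 10  # illustrative values (assumption)
pmf = [hypergeometric_pmf(k, N, K, n) for k in range(n + 1)]
total = sum(pmf)
mean = sum(k * p for k, p in enumerate(pmf))
print(f"total={total:.6f}, mean={mean:.6f}, nK/N={n * K / N:.6f}")
```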

# continuous distributions

uniform distribution

• extreme of flattened distribution
• with upper and lower bounds

triangular distribution

• with upper and lower bounds

normal distribution

• strong tendency for data at central value; symmetric, equally likely for positive and negative deviations from its central value
• frequency of deviations falls off rapidly as we move further away from central value
$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp(-\frac{(x-\mu)^2}{2\sigma^2})$
• for independent $$X_1$$ and $$X_2$$: $X_1 \sim N(\mu_1, \sigma^2_1); X_2 \sim N(\mu_2, \sigma^2_2) \to X_1+X_2 \sim N(\mu_1+\mu_2, \sigma_1^2+\sigma_2^2)$
• approximation to Poisson distribution: if $$\lambda$$ is large, the Poisson distribution is approximately normal with $$\mu=\sigma^2=\lambda$$
• approximation to binomial distribution: if $$n$$ is large and $$x\approx \frac{1}{2}$$, the binomial distribution is approximately normal with $$\mu=nx$$ and $$\sigma^2=nx(1-x)$$
• approximation to beta distribution: if $$\alpha$$ and $$\beta$$ are large, the beta distribution is approximately normal with $$\mu=\frac{\alpha}{\alpha+\beta}$$ and $$\sigma^2=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$
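
To see how close the normal approximation to the binomial gets, the sketch below compares the binomial pmf with the normal pdf at every integer $$k$$ for $$n=1000$$, $$x=\frac{1}{2}$$. These values are illustrative assumptions; `statistics.NormalDist` is from the Python standard library.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_pmf(k: int, n: int, x: float) -> float:
    return comb(n, k) * x**k * (1 - x) ** (n - k)

n, x = 1000, 0.5  # illustrative values (assumption)
approx = NormalDist(mu=n * x, sigma=sqrt(n * x * (1 - x)))

# Largest absolute difference between pmf and pdf over all k.
max_gap = max(abs(binomial_pmf(k, n, x) - approx.pdf(k)) for k in range(n + 1))
peak = binomial_pmf(n // 2, n, x)
print(f"peak pmf={peak:.6f}, max gap={max_gap:.2e}")
```

The gap is small relative to the peak of the pmf, which is what the approximation promises for large $$n$$.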

Laplace distribution

• absolute difference from mean compared to squared difference in normal distribution
• longer (fatter) tails, higher kurtosis (sharper peak)
• pdf:
$f(x; \mu, s) = \frac{1}{2s}\exp(-\frac{|x-\mu|}{s})$

logistic distribution

• symmetric, with longer tails and higher kurtosis than normal distribution
• logistic distribution has finite mean $$\mu$$ and finite variance $$s^2\pi^2/3$$
• $X\sim U(0,1) \to \mu+s[\log(X)-\log(1-X)] \sim \textrm{Logistic}(\mu,s)$
• $X\sim \textrm{Exp}(1) \to \mu+s\log(e^X-1) \sim \textrm{Logistic}(\mu,s)$
• logistic pdf:
$f(x; \mu, s) = \frac{e^{-(x-\mu)/s}}{s\left(1+e^{-(x-\mu)/s}\right)^2}$
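
The uniform-to-logistic transform above is inverse-transform sampling. The sketch below checks that the transform inverts the logistic cdf $$1/(1+e^{-(x-\mu)/s})$$ and then draws samples with it. The helper names, seed, and parameter values are assumptions for illustration; the variance $$s^2\pi^2/3$$ is a standard result.

```python
import random
from math import exp, log, pi
from statistics import fmean, pvariance

def logistic_quantile(u: float, mu: float, s: float) -> float:
    """Map u in (0, 1) to mu + s * [log(u) - log(1 - u)]."""
    return mu + s * (log(u) - log(1 - u))

def logistic_cdf(x: float, mu: float, s: float) -> float:
    return 1 / (1 + exp(-(x - mu) / s))

def open_uniform(rng: random.Random) -> float:
    """Uniform draw on (0, 1); rejects 0.0 so that log() is defined."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

mu, s = 1.0, 2.0  # illustrative values (assumption)
round_trip_gap = max(
    abs(logistic_cdf(logistic_quantile(u, mu, s), mu, s) - u)
    for u in (0.01, 0.25, 0.5, 0.9)
)

rng = random.Random(0)
samples = [logistic_quantile(open_uniform(rng), mu, s) for _ in range(50_000)]
sample_mean = fmean(samples)
sample_var = pvariance(samples)
print(round_trip_gap, sample_mean, sample_var, s**2 * pi**2 / 3)
```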

Cauchy distribution

• symmetric, with much heavier tails than normal distribution
• Cauchy distribution has mean and variance undefined, but median & mode at $$x_0$$
• $X,Y\sim N(0,\sigma^2) \textrm{ independent} \to X/Y \sim \textrm{Cauchy}(0,1)$
• Cauchy pdf:
$f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1+\left(\frac{x-x_0}{\gamma}\right)^2\right]}$
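
Because the Cauchy distribution has no mean, a simulation is better summarised by quantiles. The sketch below takes the ratio of two independent zero-mean normal variables and compares the sample quartiles with those of the standard Cauchy distribution, which sit at $$-1$$, $$0$$ and $$1$$. The seed and the value of `sigma` are illustrative assumptions.

```python
import random

rng = random.Random(0)
sigma, trials = 2.0, 50_000  # illustrative values (assumption)

# Ratio of two independent N(0, sigma^2) draws; sigma cancels in the ratio.
ratios = sorted(
    rng.gauss(0.0, sigma) / rng.gauss(0.0, sigma) for _ in range(trials)
)
q1 = ratios[trials // 4]
median = ratios[trials // 2]
q3 = ratios[3 * trials // 4]
print(q1, median, q3)
```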

lognormal distribution

• $$\log(X)\sim N(\mu,\sigma^2)$$, positively skewed
• parameterised by shape ($$\sigma$$), scale ($$e^\mu$$, the median), shift ($$\theta$$)
• $$\mu=0, \theta=0$$ is standard lognormal distribution
• as $$\sigma$$ rises, the peak shifts to left and skewness increases
$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi x^2\sigma^2}}\exp(-\frac{(\log x-\mu)^2}{2\sigma^2})$
• product of two independent lognormal random variables is a lognormal random variable with $$\mu=\mu_1+\mu_2$$ and $$\sigma^2=\sigma_1^2+\sigma_2^2$$ (the sum of two lognormal variables is not lognormal in general)
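
For independent lognormal variables it is the product whose log-parameters add, since $$\log(X_1X_2)=\log X_1+\log X_2$$ is a sum of independent normals. The sketch below checks this by simulation using `random.lognormvariate` from the standard library. The seed and parameter values are illustrative assumptions.

```python
import random
from math import log
from statistics import fmean, pvariance

rng = random.Random(0)
mu1, sigma1 = 0.2, 0.5   # illustrative values (assumption)
mu2, sigma2 = -0.1, 0.3
trials = 50_000

log_products = [
    log(rng.lognormvariate(mu1, sigma1) * rng.lognormvariate(mu2, sigma2))
    for _ in range(trials)
]
log_mean = fmean(log_products)     # expect mu1 + mu2
log_var = pvariance(log_products)  # expect sigma1^2 + sigma2^2
print(log_mean, log_var)
```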

Pareto distribution

• power law probability distribution
• continuous counterpart of Zipf’s law
• positively skewed, no negative tail; support is $$x\ge x_m$$, with peak at $$x=x_m$$
$f(x; x_m, \alpha) = \frac{\alpha x_m^{\alpha}}{x^{\alpha+1}}$

gamma distribution

• support for $$x\in(0,\infty)$$, positive skewness (lean left)
• decreasing $$\alpha$$ will push distribution towards the left; at low $$\alpha$$, left tail will disappear and distribution will resemble exponential
• models the time to the $$\alpha$$-th Poisson arrival with arrival rate $$\beta$$
• gamma pdf ($$\alpha=1$$ becomes exponential pdf with rate $$\beta$$):
$f(x; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1}e^{-\beta x}$

Weibull distribution

• support for $$x\in(0,\infty)$$, positive skewness (lean left)
• decreasing $$k$$ will push distribution towards the left; at low $$k$$, left tail will disappear and distribution will resemble exponential
• If $$W\sim\textrm{Weibull}(k,\lambda)$$, then $$X=W^k \sim \textrm{Exp}(1/\lambda^k)$$
• Weibull pdf ($$k=1$$ becomes exponential pdf with rate $$1/\lambda$$):
$f(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}e^{-(x/\lambda)^k}$
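
The relation $$W^k \sim \textrm{Exp}(1/\lambda^k)$$ can be checked by simulation. An exponential variable with rate $$1/\lambda^k$$ has mean $$\lambda^k$$ and exceeds its mean with probability $$e^{-1}$$. Note that `random.weibullvariate(alpha, beta)` takes the scale first and the shape second. The seed and parameter values are illustrative assumptions.

```python
import random
from math import exp
from statistics import fmean

rng = random.Random(0)
k, lam, trials = 2.0, 3.0, 50_000  # illustrative values (assumption)

# weibullvariate(alpha, beta): alpha is the scale (lam), beta is the shape (k).
powered = [rng.weibullvariate(lam, k) ** k for _ in range(trials)]

mean_powered = fmean(powered)                      # expect lam ** k
tail = sum(v > lam**k for v in powered) / trials   # expect exp(-1)
print(mean_powered, lam**k, tail, exp(-1))
```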

Erlang distribution

• $X_i\sim\textrm{Exp}(\lambda) \textrm{ iid} \to \sum_{i=1}^k X_i \sim \textrm{Erlang}(k, \lambda)$
• arises in teletraffic engineering: time to the $$k$$-th call
$f(x; k,\lambda) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{(k-1)!}$
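
The sketch below sums $$k$$ exponential draws and compares the fraction of sums below a threshold with the integral of the Erlang pdf over the same range, computed with a simple midpoint rule. The `integrate` helper, the seed, and the parameter values are assumptions made for illustration.

```python
import random
from math import exp, factorial

def erlang_pdf(x: float, k: int, lam: float) -> float:
    return lam**k * x ** (k - 1) * exp(-lam * x) / factorial(k - 1)

def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

rng = random.Random(0)
k, lam, x0, trials = 3, 2.0, 1.5, 20_000  # illustrative values (assumption)

sums = [sum(rng.expovariate(lam) for _ in range(k)) for _ in range(trials)]
empirical = sum(s <= x0 for s in sums) / trials
expected = integrate(lambda x: erlang_pdf(x, k, lam), 0.0, x0)
print(empirical, expected)
```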

beta distribution

• support for $$x\in(0,1)$$
• allows negative skewness
• two shape parameters $$\alpha$$ and $$\beta$$; the general form also has lower and upper bounds on the data ($$a$$ and $$b$$), with $$a=0$$ and $$b=1$$ in the standard form below
$f(x; \alpha, \beta) = \left(\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}\right)^{-1} x^{\alpha-1}(1-x)^{\beta-1}$

extreme value distribution (i.e. Gumbel minimum distribution)

• negatively skewed
• Gumbel maximum distribution, $$f(-x;-\mu,\beta)$$, is positively skewed
• Limiting distribution of the max/min value of $$n\to\infty$$ iid samples from $$\textrm{Exp}(\lambda)$$ with $$\lambda = 1/\beta$$
• standard cdf: $$F(x)=1-\exp(-e^x)$$
$f(x; \mu, \beta) = \frac{1}{\beta}e^{(x-\mu)/\beta}e^{-e^{(x-\mu)/\beta}}$

Rayleigh distribution

• positively skewed
• models the $$L^2$$-norm of two iid zero-mean normal random variables (e.g., orthogonal components of a 2D vector)
$f(x; \sigma) = \frac{x}{\sigma^2} \exp(-\frac{x^2}{2\sigma^2})$
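
As a sketch of the 2D-norm interpretation, the code below draws pairs of independent $$N(0,\sigma^2)$$ components and looks at their Euclidean norm. Integrating the pdf above gives the cdf $$1-\exp(-\frac{x^2}{2\sigma^2})$$, and the mean $$\sigma\sqrt{\pi/2}$$ is a standard result. The seed and the value of `sigma` are illustrative assumptions.

```python
import random
from math import exp, hypot, pi, sqrt
from statistics import fmean

rng = random.Random(0)
sigma, trials = 2.0, 50_000  # illustrative values (assumption)

# Length of a 2D vector whose components are independent N(0, sigma^2).
norms = [
    hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(trials)
]
mean_norm = fmean(norms)                              # expect sigma * sqrt(pi / 2)
frac_below = sum(r <= sigma for r in norms) / trials  # expect 1 - exp(-1/2)
print(mean_norm, sigma * sqrt(pi / 2), frac_below, 1 - exp(-0.5))
```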

Maxwell-Boltzmann distribution

• positively skewed
• 3D counterpart of Rayleigh distribution
• arises in thermodynamics: probability density of a particle of mass $$m$$ having speed $$v$$ at temperature $$T$$, with $$k$$ the Boltzmann constant
\begin{align} f(v; \sqrt{kT/m}) &= \left(\frac{m}{2\pi kT} \right)^{3/2} 4\pi v^2 \exp(-\frac{mv^2}{2kT}) \\ f(x; a) &= \sqrt{\frac{2}{\pi}}\frac{x^2 e^{-x^2/(2a^2)}}{a^3} \end{align}

Chi-squared distribution

• distribution of the sum of the square of $$k\ge 1$$ i.i.d. standard normal random variables
• mean $$k$$, variance $$2k$$
• PDF with $$k$$ degrees of freedom:
$f(x; k) = \frac{x^{k/2-1} e^{-x/2}}{2^{k/2}\Gamma(k/2)}$
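
The stated mean $$k$$ and variance $$2k$$ can be checked by summing squared standard normal draws. This is a minimal simulation sketch; the seed and the choice of `k` are illustrative assumptions.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(0)
k, trials = 5, 40_000  # illustrative values (assumption)

# Each sample is the sum of k squared independent standard normal draws.
samples = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(trials)]
sample_mean = fmean(samples)     # expect k
sample_var = pvariance(samples)  # expect 2 * k
print(sample_mean, sample_var)
```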

F-distribution

• Distribution of a random variable defined as the ratio of two independent $$\chi^2$$-distributed random variables, each divided by its degrees of freedom, $$d_1$$ and $$d_2$$ respectively
• Commonly used in ANOVA
• PDF, with degrees of freedom $$d_1$$ and $$d_2$$, involves beta function $$B(\alpha,\beta)$$:
\begin{align} f(x; d_1, d_2) &= \left[x B(\frac{d_1}{2},\frac{d_2}{2})\right]^{-1} \sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x+d_2)^{d_1+d_2}}} \\ \textrm{and if}\qquad z_1 &\sim \chi^2(d_1) \\ z_2 &\sim \chi^2(d_2) \\ \textrm{then}\qquad x = \frac{z_1/d_1}{z_2/d_2} &\sim f(x, d_1, d_2) \end{align}

Student’s t distribution

• Distribution of normalized sample mean of $$n=k+1$$ observations from a normal distribution, $$\frac{\bar{X}-\mu}{S/\sqrt{n}}$$
• Equivalently, this is the distribution of $$\frac{x}{\sqrt{y/r}}$$ where $$x$$ is standard normal and $$y$$ is an independent chi-squared variable with $$r$$ degrees of freedom
• t distribution with $$k=1$$ degree of freedom (i.e. $$n=2$$) is the standard Cauchy distribution
• PDF with degree of freedom $$k$$:
$f(x, k) = \frac{\Gamma(\frac{k+1}{2})}{\sqrt{k\pi}\Gamma(k/2)} \left(1+\frac{x^2}{k}\right)^{-(k+1)/2}$
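
With one degree of freedom ($$k=1$$) the pdf above reduces to the standard Cauchy pdf $$\frac{1}{\pi(1+x^2)}$$, because $$\Gamma(1)=1$$ and $$\Gamma(\frac{1}{2})=\sqrt{\pi}$$. The sketch below confirms this numerically at a few arbitrarily chosen points.

```python
from math import gamma, pi, sqrt

def student_t_pdf(x: float, k: float) -> float:
    norm = gamma((k + 1) / 2) / (sqrt(k * pi) * gamma(k / 2))
    return norm * (1 + x * x / k) ** (-(k + 1) / 2)

def cauchy_pdf(x: float, x0: float = 0.0, g: float = 1.0) -> float:
    return 1 / (pi * g * (1 + ((x - x0) / g) ** 2))

points = [-5.0, -1.0, 0.0, 0.5, 3.0]  # arbitrary evaluation points
max_gap = max(abs(student_t_pdf(x, 1) - cauchy_pdf(x)) for x in points)
print(max_gap)
```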

# test of fit for distributions

Kolmogorov-Smirnov test (K-S test, on cumulative distribution function $$F(x)$$)

$D_n = \sup_x | F_n(x) - F(x) |$
• $$F_n(x)$$ is the empirical cdf of the sample; if the sample comes from the distribution $$F$$, $$D_n$$ converges to 0 a.s. as the number of samples $$n$$ goes to infinity
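
Since the empirical cdf is a step function, the supremum in $$D_n$$ is attained at a sample point, just before or just after a jump. The sketch below computes it that way for a tiny hand-made sample against the $$U(0,1)$$ cdf. The sample values are illustrative assumptions.

```python
def ks_statistic(sample, cdf) -> float:
    """sup_x |F_n(x) - F(x)|, checked on both sides of each jump of F_n."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )

def uniform_cdf(x: float) -> float:
    return min(max(x, 0.0), 1.0)

sample = [0.1, 0.4, 0.7, 0.9]  # illustrative data (assumption)
d_n = ks_statistic(sample, uniform_cdf)
print(d_n)
```

For this sample the largest gap occurs just below $$x=0.7$$, where $$F_n$$ is still 0.5 while $$F(x)=0.7$$.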

Shapiro-Wilk test

$W = \frac{\left(\sum_{i=1}^n a_i x_{(i)}\right)^2}{\sum_{i=1}^n (x_i - \bar{x})^2}$
• test of normality in frequentist statistics (null hypothesis: the $$x_i$$ come from a normal distribution); $$x_{(i)}$$ is the $$i$$-th smallest observation
• $$\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$$ is the sample mean
• $$(a_1,\cdots,a_n) = m^T V^{-1} (m^T V^{-1}V^{-1} m)^{-1/2}$$ where $$m$$ is vector of expected values of the order statistics from normal distribution and $$V$$ the covariance matrix of those order statistics

Anderson-Darling test

$A^2 = n \int_{-\infty}^{\infty} \frac{(F_n(x)-F(x))^2}{F(x)(1-F(x))} dF(x)$
• test whether a sample comes from a specified distribution
• $$A^2$$ is weighted distance between $$F_n(x)$$ and $$F(x)$$, with more weight on tails of the distribution
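
In practice $$A^2$$ is usually computed from the sorted sample with the closed-form sum $$A^2 = -n - \frac{1}{n}\sum_{i=1}^n (2i-1)\left[\ln F(x_{(i)}) + \ln(1-F(x_{(n+1-i)}))\right]$$, a standard identity not stated above. The sketch below evaluates both that sum and a midpoint-rule approximation of the integral definition against $$U(0,1)$$. The sample and the helper names are illustrative assumptions.

```python
from bisect import bisect_right
from math import log

def anderson_darling(sample, cdf) -> float:
    """A^2 from the closed-form sum over the sorted sample."""
    xs = sorted(sample)
    n = len(xs)
    total = sum(
        (2 * i + 1) * (log(cdf(xs[i])) + log(1 - cdf(xs[n - 1 - i])))
        for i in range(n)  # i is 0-based, so 2*i + 1 plays the role of 2i - 1
    )
    return -n - total / n

def anderson_darling_integral(sample, steps: int = 100_000) -> float:
    """Midpoint-rule evaluation of the integral definition for F = U(0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    h = 1.0 / steps
    acc = 0.0
    for j in range(steps):
        u = (j + 0.5) * h              # for U(0, 1), F(x) = x = u
        f_n = bisect_right(xs, u) / n  # empirical cdf at u
        acc += (f_n - u) ** 2 / (u * (1 - u))
    return n * acc * h

sample = [0.1, 0.4, 0.7, 0.9]  # illustrative data (assumption)
a2_sum = anderson_darling(sample, lambda x: x)
a2_int = anderson_darling_integral(sample)
print(a2_sum, a2_int)
```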

Pearson’s $$\chi^2$$ test

$\chi^2 = \sum_{i=1}^n \frac{(O_i - E_i)^2}{E_i}$
• tests whether categorical data fit a distribution: checks the observed frequency $$O_i$$ against the expected frequency $$E_i$$ under the distribution for each of $$n$$ categories
• degrees of freedom: $$n-1$$ minus the number of parameters estimated for the fitted distribution
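
A minimal worked example of the statistic: testing whether a die is fair from made-up roll counts. The degrees of freedom are taken as the number of categories minus one, minus the number of parameters estimated from the data (none here). The counts are illustrative, not real data.

```python
def pearson_chi2(observed, expected) -> float:
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [22, 18, 20, 25, 15, 20]           # made-up counts for faces 1..6
n_rolls = sum(observed)
expected = [n_rolls / 6] * 6                  # fair die: equal expected counts
chi2 = pearson_chi2(observed, expected)
n_estimated_params = 0                        # nothing was fitted to the data
dof = len(observed) - 1 - n_estimated_params
print(chi2, dof)
```

The statistic would then be compared with a critical value of the $$\chi^2$$ distribution with `dof` degrees of freedom.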

# Reference

Lawrence M. Leemis and Jacquelyn T. McQuestion. Univariate Distribution Relationships, Am Stat, 62(1) pp.45–53, 2008, DOI: 10.1198/000313008X270448

Aswath Damodaran. Probabilistic approaches: Scenario analysis, decision trees and simulations (PDF, the appendix is also available separately) and includes the following chart for choosing a distribution: