This is a summary of common probability distributions in engineering and statistics. This chart has the plots of the pdf or pmf (LaTeX source): # discrete distributions

binomial distribution

• A big urn with balls in either white or black color. Drawing a white ball from urn has probability $x$ (i.e., black ball has probability $1-x$). If we draw $n$ balls from urn with replacement, the probability of getting $k$ white balls:

Poisson distribution

• Balls are added to the urn at rate of $\lambda$ per unit time, under exponential distribution. The probability of having $k$ balls added to the urn within time $t$:

geometric distribution

• The probability of have to draw $k$ balls to see the first white ball being drawn:

negative binomial distribution

• same as the distribution of the sum of $r$ iid geometric random variable
• negative binomial approximates Poisson with $\lambda = r(1-x)$ with large $r$ and $x\approx 1$
• Drawing balls from the urn. If we have to draw $k$ balls to see the $r$-th white ball (we have drawn $r$ white balls and $k-r$ black balls). The probability of $k$:

hypergeometric distribution

• A urn with $N$ balls (finite) and $K$ balls amongst are white. Draw, without replacement, $n$ balls from the urn to get $k$ white balls:

# continuous distributions

uniform distribution

• extreme of flattened distribution
• with upper and lower bounds

triangular distribution

• with upper and lower bounds

normal distribution

• strong tendency for data at central value; symmetric, equally likely for positive and negative deviations from its central value
• frequency of deviations falls off rapidly as we move further away from central value
• approximation to Poisson distribution: if $\lambda$ is large, Poisson distribution approximates normal with $\mu=\sigma^2=\lambda$
• approximation to binomial distribution: if $n$ is large and $x\approx \frac{1}{2}$, binomial distribution approximates normal with $\mu=nx$ and $\sigma^2=nx(1-x)$
• approximation to beta distribution: if $\alpha$ and $\beta$ are large, beta distribution approximates normal with $\mu=\frac{\alpha}{\alpha+\beta}$ and $\sigma^2=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$

Laplace distribution

• absolute difference from mean compared to squared difference in normal distribution
• longer (fatter) tails, higher kurtosis (flattened peak)
• pdf:

logistic distribution

• symmetric, with longer tails and higher kurtosis than normal distribution
• logistic distribution has finite mean $\mu$ and variance defined
• logistic pdf:

Cauchy distribution

• symmetric, with longer tails and higher kurtosis than normal distribution
• Cauchy distribution has mean and variance undefined, but mean & mode at $\mu$
• Cauchy pdf:

lognormal distribution

• $\log(X)\sim N(\mu,\sigma^2)$, positively skewed
• parameterised by shape ($\sigma$), scale ($\mu$, or median), shift ($\theta$)
• $\mu=0, \theta=1$ is standard lognormal distribution
• as $\sigma$ rises, the peak shifts to left and skewness increases
• sum of two lognormal random variable is a lognormal random variable with $\mu=\mu_1+\mu_2$ and $\sigma^2=\sigma_1^2+\sigma_2^2$

Pareto distribution

• power law probability distribution
• continuous counterpart of Zipf’s law
• positively skewed, no negative tail, peak at $x=0$

gamma distribution

• support for $x\in(0,\infty)$, positive skewness (lean left)
• decreasing $\alpha$ will push distribution towards the left; at low $\alpha$, left tail will disappear and distribution will resemble exponential
• models the time to the $\alpha$-th Poisson arrival with arrival rate $\beta$
• gamma pdf ($\alpha=1$ becomes exponential pdf with rate $\beta$):

Weibull distribution

• support for $x\in(0,\infty)$, positive skewness (lean left)
• decreasing $k$ will push distribution towards the left; at low $k$, left tail will disappear and distribution will resemble exponential
• If $W\sim\textrm{Weibull}(k,\lambda)$, then $X=W^k \sim \textrm{Exp}(1/\lambda^k)$
• Weibull pdf ($k=1$ becomes exponential pdf with rate $1/\lambda$):

Erlang distribution

• arise from teletraffic engineering: time to $k$-th call

beta distribution

• support for $x\in(0,1)$
• allows negative skewness
• two shape parameters $p$ and $q$, and lower- and upper-bounds on data ($a$ and $b$)

extreme value distribution (i.e. Gumbel minimum distribution)

• negatively skewed
• Gumbel maximum distribution, $f(-x;-\mu,\beta)$, is positively skewed
• Limiting distribution of the max/min value of $n\to\infty$ iid samples from $\textrm{Exp}(\lambda)$ with $\lambda = 1/\beta$
• standard cdf: $F(x)=1-\exp(-e^x)$

Rayleigh distribution

• positively skewed
• modelling the $L^2$-norm of two iid normal distribution with zero mean (e.g., orthogonal components of a 2D vector)

Maxwell-Boltzmann distribution

• positively skewed
• 3D counterpart of Rayleigh distribution
• arise from thermodynamic: probability of a particle in speed $v$ if temperature is $T$

Chi-squared distribution

• distribution of the sum of the square of $k\ge 1$ i.i.d. standard normal random variables
• mean $k$, variance $2k$
• PDF with $k$ degrees of freedom:

F-distribution

• Distribution of a random variable defined as the ratio of two independent $\chi^2$-distributed random variables, with degrees of freedom $d_1$ and $d_2$ respectively
• Commonly used in ANOVA
• PDF, with degrees of freedom $d_1$ and $d_2$, involves beta function $B(\alpha,\beta)$:

Student’s t distribution

• Distribution of normalized sample mean of $n=k+1$ observations from a normal distribution, $\frac{\bar{X}-\mu}{S/\sqrt{n}}$
• Equivalently, this is the distribution of $\frac{x}{\sqrt{y/r}}$ for $x$ is standard normal and $y$ is chi-square with $r$ degrees of freedom
• t distribution with $n=1$ is Cauchy distribution
• PDF with degree of freedom $k$:

# test of fit for distributions

Kolmogorov-Smirnov test (K-S test, on cumulative distribution function $F(x)$)

• if sample comes from distribution, $D_n$ converges to 0 a.s. as number of samples $n$ goes to infinity

Shapiro-Wilk test

• test of normality in frequentist statistics (i.e. for $x_i$ in normal distribution)
• $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$ is the sample mean
• $(a_1,\cdots,a_n) = m^T V^{-1} (m^T V^{-1}V^{-1} m)^{-1/2}$ where $m$ is vector of expected values of the order statistics from normal distribution and $V$ the covariance matrix of those order statistics

Anderson-Darling test

• test whether a sample comes from a specified distribution
• $A^2$ is weighted distance between $F_n(x)$ and $F(x)$, with more weight on tails of the distribution

Pearson’s $\chi^2$ test

• test for categories fit a distribution: checking observed frequency $O_i$ against expected frequency $E_i$ according to distribution for each of $n$ categories
• degree of freedom: $n$ minus number of parameters of the fitted distribution

# Reference

Lawrence M. Leemis and Jacquelyn T. McQuestion. Univariate Distribution Relationships, Am Stat, 62(1) pp.45–53, 2008, DOI: 10.1198/000313008X270448

Aswath Damodaran. Probabilistic approaches: Scenario analysis, decision trees and simulations (PDF, the appendix is also available separately) and includes the following chart for choosing a distribution: 