This is a summary of common probability distributions in engineering and statistics. A chart of the pdf or pmf plots accompanies this summary (LaTeX source available).

discrete distributions

binomial distribution

  • An urn contains balls that are either white or black. Drawing a white ball from the urn has probability $x$ (so a black ball has probability $1-x$). If we draw $n$ balls from the urn with replacement, the probability of getting $k$ white balls is:

$$\Pr[X=k] = \binom{n}{k} x^k (1-x)^{n-k}$$

Poisson distribution

  • Balls are added to the urn at a rate of $\lambda$ per unit time, with exponentially distributed inter-arrival times. The probability of having $k$ balls added to the urn within time $t$ is:

$$\Pr[X=k] = \frac{(\lambda t)^k e^{-\lambda t}}{k!}$$

geometric distribution

  • The probability of having to draw $k$ balls to see the first white ball:

$$\Pr[X=k] = (1-x)^{k-1} x$$

negative binomial distribution

  • same as the distribution of the sum of $r$ iid geometric random variables
  • for large $r$ and $x\approx 1$, the negative binomial approximates a Poisson with $\lambda = r(1-x)$
  • Drawing balls from the urn, suppose we have to draw $k$ balls to see the $r$-th white ball (i.e., we have drawn $r$ white balls and $k-r$ black balls). The probability of $k$ is:

$$\Pr[X=k] = \binom{k-1}{r-1} x^r (1-x)^{k-r}$$
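
The pmf above can be checked against SciPy, whose nbinom counts the number of failures (black balls) rather than the total number of draws; a minimal sketch with arbitrary example values:

```python
from math import comb

from scipy import stats

x, r, k = 0.6, 3, 7  # example values: P(white), target white count, total draws

# pmf as written above: probability that the r-th white ball appears on draw k
p_manual = comb(k - 1, r - 1) * x**r * (1 - x) ** (k - r)

# scipy parameterises by the number of failures (black balls), k - r
p_scipy = stats.nbinom.pmf(k - r, r, x)

assert abs(p_manual - p_scipy) < 1e-12
```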

hypergeometric distribution

  • An urn holds $N$ balls (finite), of which $K$ are white. Drawing $n$ balls from the urn without replacement, the probability of getting $k$ white balls is:

$$\Pr[X=k] = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

continuous distributions

uniform distribution

  • the extreme case of a flattened distribution: every value in the range is equally likely
  • with upper and lower bounds

triangular distribution

  • with upper and lower bounds

normal distribution

  • strong tendency for data at central value; symmetric, equally likely for positive and negative deviations from its central value
  • frequency of deviations falls off rapidly as we move further away from central value
  • $X_1 \sim N(\mu_1, \sigma^2_1)$ and $X_2 \sim N(\mu_2, \sigma^2_2)$ independent $\to X_1+X_2 \sim N(\mu_1+\mu_2, \sigma_1^2+\sigma_2^2)$
  • approximation to Poisson distribution: if $\lambda$ is large, the Poisson distribution is approximately normal with $\mu=\sigma^2=\lambda$
  • approximation to binomial distribution: if $n$ is large and $x\approx \frac{1}{2}$, the binomial distribution is approximately normal with $\mu=nx$ and $\sigma^2=nx(1-x)$ (see the sketch after this list)
  • approximation to beta distribution: if $\alpha$ and $\beta$ are large, the beta distribution is approximately normal with $\mu=\frac{\alpha}{\alpha+\beta}$ and $\sigma^2=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
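
A quick numerical sketch of the binomial case (assuming SciPy is available): compare the exact binomial pmf with the normal density at the same points.

```python
import numpy as np
from scipy import stats

n, x = 400, 0.5  # large n, x near 1/2
k = np.arange(150, 251)

exact = stats.binom.pmf(k, n, x)
approx = stats.norm.pdf(k, loc=n * x, scale=np.sqrt(n * x * (1 - x)))

# the maximum absolute difference is tiny relative to the peak pmf (~0.04)
print(np.abs(exact - approx).max())
```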

Laplace distribution

  • the density decays with the absolute difference from the mean, versus the squared difference for the normal distribution
  • longer (fatter) tails and higher kurtosis (sharper peak)
  • pdf:

$$f(x) = \frac{1}{2b}\exp\left(-\frac{|x-\mu|}{b}\right)$$

logistic distribution

  • symmetric, with longer tails and higher kurtosis than normal distribution
  • the logistic distribution has finite mean $\mu$ and a defined variance, $s^2\pi^2/3$ (unlike the Cauchy)
  • $X\sim U(0,1) \to \mu+s[\log(X)-\log(1-X)] \sim \textrm{Logistic}(\mu,s)$ (checked numerically below)
  • $X\sim \textrm{Exp}(1) \to \mu+s\log(e^X-1) \sim \textrm{Logistic}(\mu,s)$
  • logistic pdf:

$$f(x) = \frac{e^{-(x-\mu)/s}}{s\left(1+e^{-(x-\mu)/s}\right)^2}$$
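
A minimal simulation of the first transform above, comparing the transformed uniform sample against SciPy's logistic cdf with a K-S test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, s = 2.0, 1.5  # example location and scale

u = rng.uniform(size=100_000)
samples = mu + s * (np.log(u) - np.log(1 - u))

# a large p-value is consistent with Logistic(mu, s)
print(stats.kstest(samples, stats.logistic(loc=mu, scale=s).cdf))
```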

Cauchy distribution

  • symmetric, with longer tails and higher kurtosis than normal distribution
  • the Cauchy distribution has undefined mean and variance, but median and mode at $\mu$
  • $X,Y\sim N(0,\sigma^2)$ iid $\to X/Y \sim \textrm{Cauchy}(0,1)$
  • Cauchy pdf, with location $\mu$ and scale $\gamma$:

$$f(x) = \frac{1}{\pi\gamma\left[1+\left(\frac{x-\mu}{\gamma}\right)^2\right]}$$

lognormal distribution

  • $\log(X)\sim N(\mu,\sigma^2)$, positively skewed
  • parameterised by shape ($\sigma$), scale ($\mu$, or median), shift ($\theta$)
  • $\mu=0$ (median 1) and $\theta=0$ give the standard lognormal distribution
  • as $\sigma$ rises, the peak shifts to left and skewness increases
  • the product of two independent lognormal random variables is a lognormal random variable with $\mu=\mu_1+\mu_2$ and $\sigma^2=\sigma_1^2+\sigma_2^2$

Pareto distribution

  • power law probability distribution
  • continuous counterpart of Zipf’s law
  • positively skewed, no negative tail, peak at the minimum value $x_m$ of its support

gamma distribution

  • support for $x\in(0,\infty)$, positive skewness (long right tail)
  • decreasing $\alpha$ pushes the distribution towards the left; at low $\alpha$, the left tail disappears and the distribution resembles the exponential
  • models the time to the $\alpha$-th Poisson arrival with arrival rate $\beta$
  • gamma pdf ($\alpha=1$ becomes the exponential pdf with rate $\beta$):

$$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}$$

Weibull distribution

  • support for $x\in(0,\infty)$, positive skewness (long right tail)
  • decreasing $k$ pushes the distribution towards the left; at low $k$, the left tail disappears and the distribution resembles the exponential
  • If $W\sim\textrm{Weibull}(k,\lambda)$, then $X=W^k \sim \textrm{Exp}(1/\lambda^k)$ (checked numerically below)
  • Weibull pdf ($k=1$ becomes the exponential pdf with rate $1/\lambda$):

$$f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}e^{-(x/\lambda)^k}$$
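
A sketch of the $W^k$ relation above (note NumPy's weibull draws the standard scale-1 form, so the sample is multiplied by $\lambda$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, lam = 2.5, 3.0  # example shape and scale

w = lam * rng.weibull(k, size=100_000)

# W^k should be exponential with rate 1/lambda^k, i.e. mean lambda^k
print(stats.kstest(w**k, stats.expon(scale=lam**k).cdf))
```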

Erlang distribution

  • $X_i\sim\textrm{Exp}(\lambda)$ iid $\to \sum_{i=1}^k X_i \sim \textrm{Erlang}(k, \lambda)$, i.e. a gamma distribution with integer shape $k$ (simulated below)
  • arises in teletraffic engineering: the time until the $k$-th call
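
A minimal simulation of this sum, compared against the gamma distribution with shape $k$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, lam = 4, 2.0  # example shape and rate

# sum of k iid Exp(lambda) variables per sample
s = rng.exponential(scale=1 / lam, size=(100_000, k)).sum(axis=1)

# Erlang(k, lambda) = gamma with shape k and scale 1/lambda
print(stats.kstest(s, stats.gamma(a=k, scale=1 / lam).cdf))
```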

beta distribution

  • support for $x\in(0,1)$
  • allows negative skewness
  • two shape parameters $p$ and $q$, and lower- and upper-bounds on data ($a$ and $b$)

extreme value distribution (i.e. Gumbel minimum distribution)

  • negatively skewed
  • Gumbel maximum distribution, $f(-x;-\mu,\beta)$, is positively skewed
  • Limiting distribution of the max/min value of $n\to\infty$ iid samples from $\textrm{Exp}(\lambda)$ with $\lambda = 1/\beta$
  • standard cdf: $F(x)=1-\exp(-e^x)$

Rayleigh distribution

  • positively skewed
  • models the $L^2$-norm of two iid zero-mean normal random variables (e.g., the orthogonal components of a 2D vector); see the sketch below
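
A minimal sketch of this construction, checked against SciPy's rayleigh with scale equal to the common standard deviation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma = 2.0  # example standard deviation of each component

xy = rng.normal(0.0, sigma, size=(100_000, 2))
r = np.hypot(xy[:, 0], xy[:, 1])  # L2-norm of the two components

print(stats.kstest(r, stats.rayleigh(scale=sigma).cdf))
```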

Maxwell-Boltzmann distribution

  • positively skewed
  • 3D counterpart of Rayleigh distribution
  • arises in thermodynamics: the distribution of a particle's speed $v$ at temperature $T$

Chi-squared distribution

  • distribution of the sum of the square of $k\ge 1$ i.i.d. standard normal random variables
  • mean $k$, variance $2k$
  • PDF with $k$ degrees of freedom:

$$f(x) = \frac{1}{2^{k/2}\Gamma(k/2)}x^{k/2-1}e^{-x/2}$$
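
A quick simulation of the definition, checking the stated mean and variance:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k = 5  # example degrees of freedom

z = rng.standard_normal(size=(100_000, k))
q = (z**2).sum(axis=1)  # sum of squares of k iid standard normals

print(q.mean(), q.var())  # close to k and 2k
print(stats.kstest(q, stats.chi2(df=k).cdf))
```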

F-distribution

  • Distribution of a random variable defined as the ratio of two independent $\chi^2$-distributed random variables, each divided by its degrees of freedom
  • Commonly used in ANOVA
  • PDF, with degrees of freedom $d_1$ and $d_2$, involves the beta function $B(\alpha,\beta)$:

$$f(x) = \frac{1}{B\left(\frac{d_1}{2},\frac{d_2}{2}\right)}\left(\frac{d_1}{d_2}\right)^{d_1/2}x^{d_1/2-1}\left(1+\frac{d_1}{d_2}x\right)^{-(d_1+d_2)/2}$$

Student’s t distribution

  • Distribution of normalized sample mean of $n=k+1$ observations from a normal distribution, $\frac{\bar{X}-\mu}{S/\sqrt{n}}$
  • PDF with degrees of freedom $k$:

$$f(t) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\left(\frac{k}{2}\right)}\left(1+\frac{t^2}{k}\right)^{-\frac{k+1}{2}}$$

test of fit for distributions

Kolmogorov-Smirnov test (K-S test, on cumulative distribution function $F(x)$)

  • the statistic $D_n = \sup_x |F_n(x)-F(x)|$ measures the largest distance between the empirical cdf $F_n$ and the hypothesised cdf $F$; if the sample comes from the distribution, $D_n$ converges to 0 a.s. as the number of samples $n$ goes to infinity
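
A minimal usage sketch with SciPy, testing a sample against a fully specified normal distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=1000)

# large p-value: no evidence against the hypothesised N(5, 2^2)
stat, p = stats.kstest(sample, stats.norm(loc=5.0, scale=2.0).cdf)
print(stat, p)
```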

Shapiro-Wilk test

  • test of normality in frequentist statistics (i.e., whether the $x_i$ come from a normal distribution)
  • the test statistic is $W = \frac{\left(\sum_{i=1}^n a_i x_{(i)}\right)^2}{\sum_{i=1}^n (x_i-\bar{x})^2}$, where $x_{(i)}$ is the $i$-th order statistic
  • $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$ is the sample mean
  • $(a_1,\cdots,a_n) = m^T V^{-1} (m^T V^{-1}V^{-1} m)^{-1/2}$, where $m$ is the vector of expected values of the order statistics from the normal distribution and $V$ is the covariance matrix of those order statistics
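
A minimal usage sketch with SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

print(stats.shapiro(rng.normal(size=200)))       # high p: consistent with normality
print(stats.shapiro(rng.exponential(size=200)))  # tiny p: normality rejected
```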

Anderson-Darling test

  • test whether a sample comes from a specified distribution
  • the statistic $A^2$ is a weighted distance between the empirical cdf $F_n(x)$ and the hypothesised cdf $F(x)$, with more weight on the tails of the distribution
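
A minimal usage sketch with SciPy, which reports the $A^2$ statistic along with critical values at fixed significance levels:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
result = stats.anderson(rng.normal(size=500), dist='norm')

print(result.statistic)           # A^2
print(result.critical_values)     # reject if A^2 exceeds these
print(result.significance_level)  # corresponding significance levels (%)
```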

Pearson’s $\chi^2$ test

  • tests whether categorical data fit a distribution, checking the observed frequency $O_i$ against the expected frequency $E_i$ for each of the $n$ categories: $\chi^2 = \sum_{i=1}^n \frac{(O_i-E_i)^2}{E_i}$
  • degrees of freedom: $n-1$ minus the number of parameters estimated for the fitted distribution
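
A minimal usage sketch with SciPy, testing die-roll counts against a uniform expectation (ddof would be raised if parameters had been estimated from the data):

```python
import numpy as np
from scipy import stats

observed = np.array([16, 18, 16, 14, 12, 24])  # counts in 6 categories
expected = np.full(6, observed.sum() / 6)      # fair-die expectation

stat, p = stats.chisquare(observed, expected, ddof=0)
print(stat, p)
```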

Reference

Lawrence M. Leemis and Jacquelyn T. McQueston. Univariate Distribution Relationships. The American Statistician, 62(1), pp. 45–53, 2008. DOI: 10.1198/000313008X270448

Aswath Damodaran. Probabilistic approaches: Scenario analysis, decision trees and simulations (PDF; the appendix is also available separately), which includes a chart for choosing a distribution.