This is a summary of common probability distributions in engineering and statistics. A chart with plots of each pdf or pmf accompanies these notes (LaTeX source).

# discrete distributions

binomial distribution

- A big urn contains balls that are either white or black. Drawing a white ball from the urn has probability $x$ (so a black ball has probability $1-x$). If we draw $n$ balls from the urn with replacement, the probability of getting $k$ white balls:
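$$f(k) = \binom{n}{k} x^k (1-x)^{n-k}, \qquad k = 0, 1, \ldots, n$$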

Poisson distribution

- Balls are added to the urn at a rate of $\lambda$ per unit time, with exponentially distributed inter-arrival times. The probability of $k$ balls being added to the urn within time $t$:
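$$f(k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}, \qquad k = 0, 1, 2, \ldots$$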

geometric distribution

- The probability of having to draw $k$ balls to see the first white ball:
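$$f(k) = (1-x)^{k-1} x, \qquad k = 1, 2, \ldots$$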

negative binomial distribution

- same as the distribution of the sum of $r$ iid geometric random variables
- for large $r$ and $x\approx 1$, the negative binomial approximates a Poisson distribution with $\lambda = r(1-x)$
- Drawing balls from the urn, suppose we have to draw $k$ balls to see the $r$-th white ball (i.e., we have drawn $r$ white balls and $k-r$ black balls). The probability of $k$:
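$$f(k) = \binom{k-1}{r-1} x^r (1-x)^{k-r}, \qquad k = r, r+1, \ldots$$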

hypergeometric distribution

- An urn holds a finite number $N$ of balls, of which $K$ are white. The probability that drawing $n$ balls from the urn without replacement yields $k$ white balls:
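$$f(k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$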

# continuous distributions

uniform distribution

- the extreme case of a flattened distribution: all values in the range are equally likely
- with upper and lower bounds
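- pdf, writing $a$ and $b$ for the lower and upper bounds:

$$f(x) = \frac{1}{b-a}, \qquad a \le x \le b$$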

triangular distribution

- with upper and lower bounds
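- pdf, writing $a$ and $b$ for the lower and upper bounds and $c$ for the mode:

$$f(x) = \begin{cases} \dfrac{2(x-a)}{(b-a)(c-a)} & a \le x \le c \\[1ex] \dfrac{2(b-x)}{(b-a)(b-c)} & c < x \le b \end{cases}$$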

normal distribution

- strong tendency for data to cluster at a central value; symmetric, with positive and negative deviations from the central value equally likely
- the frequency of deviations falls off rapidly as we move further away from the central value

- $X_1 \sim N(\mu_1, \sigma^2_1)$ and $X_2 \sim N(\mu_2, \sigma^2_2)$ independent $\to X_1+X_2 \sim N(\mu_1+\mu_2, \sigma_1^2+\sigma_2^2)$
- approximation to Poisson distribution: if $\lambda$ is large, the Poisson distribution is approximately normal with $\mu=\sigma^2=\lambda$
- approximation to binomial distribution: if $n$ is large and $x\approx \frac{1}{2}$, the binomial distribution is approximately normal with $\mu=nx$ and $\sigma^2=nx(1-x)$
- approximation to beta distribution: if $\alpha$ and $\beta$ are large, the beta distribution is approximately normal with $\mu=\frac{\alpha}{\alpha+\beta}$ and $\sigma^2=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
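- pdf:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$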

Laplace distribution

- pdf decays with the absolute difference from the mean, rather than the squared difference as in the normal distribution
- longer (fatter) tails and higher kurtosis, with a sharp peak (cusp) at the mean
- pdf, with location $\mu$ and scale $b$:
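$$f(x) = \frac{1}{2b}\exp\left(-\frac{|x-\mu|}{b}\right)$$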

logistic distribution

- symmetric, with longer tails and higher kurtosis than normal distribution
- logistic distribution has finite mean $\mu$ and finite variance $\frac{\pi^2 s^2}{3}$
- $X\sim U(0,1) \to \mu+s[\log(X)-\log(1-X)] \sim \textrm{Logistic}(\mu,s)$
- $X\sim \textrm{Exp}(1) \to \mu+s\log(e^X-1) \sim \textrm{Logistic}(\mu,s)$
- logistic pdf:
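$$f(x) = \frac{e^{-(x-\mu)/s}}{s\left(1 + e^{-(x-\mu)/s}\right)^2}$$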

Cauchy distribution

- symmetric, with longer tails and higher kurtosis than normal distribution
- Cauchy distribution has undefined mean and variance, but median & mode at $\mu$
- $X,Y\sim N(0,1)$ independent $\to X/Y \sim \textrm{Cauchy}(0,1)$
- Cauchy pdf, with location $\mu$ and scale $\gamma$:
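$$f(x) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x-\mu}{\gamma}\right)^2\right]}$$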

lognormal distribution

- $\log(X)\sim N(\mu,\sigma^2)$, positively skewed
- parameterised by shape ($\sigma$), scale ($e^\mu$, the median), and shift ($\theta$)
- $\mu=0$, $\sigma=1$, $\theta=0$ is the standard lognormal distribution
- as $\sigma$ rises, the peak shifts to the left and skewness increases

- the product of two independent lognormal random variables is a lognormal random variable with $\mu=\mu_1+\mu_2$ and $\sigma^2=\sigma_1^2+\sigma_2^2$
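- pdf (no shift, $\theta=0$):

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \qquad x > 0$$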

Pareto distribution

- power law probability distribution
- continuous counterpart of Zipf’s law
- positively skewed, no negative tail; the density peaks at the minimum value $x_m$ and decreases monotonically
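- pdf, with shape $\alpha$:

$$f(x) = \frac{\alpha x_m^\alpha}{x^{\alpha+1}}, \qquad x \ge x_m$$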

gamma distribution

- support on $x\in(0,\infty)$, positive skewness (mass leans left, long right tail)
- decreasing $\alpha$ pushes the distribution towards the left; at low $\alpha$, the left tail disappears and the distribution resembles the exponential
- models the time to the $\alpha$-th Poisson arrival with arrival rate $\beta$
- gamma pdf ($\alpha=1$ becomes exponential pdf with rate $\beta$):
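$$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}, \qquad x > 0$$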

Weibull distribution

- support on $x\in(0,\infty)$, positive skewness (mass leans left, long right tail)
- decreasing $k$ pushes the distribution towards the left; at low $k$, the left tail disappears and the distribution resembles the exponential
- If $W\sim\textrm{Weibull}(k,\lambda)$, then $X=W^k \sim \textrm{Exp}(1/\lambda^k)$
- Weibull pdf ($k=1$ becomes exponential pdf with rate $1/\lambda$):
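$$f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}, \qquad x > 0$$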

Erlang distribution

- $X_i\sim\textrm{Exp}(\lambda)$ iid $\to \sum_{i=1}^k X_i \sim \textrm{Erlang}(k, \lambda)$
- arises in teletraffic engineering: the waiting time until the $k$-th call
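- pdf (a gamma pdf with integer shape $k$):

$$f(x) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{(k-1)!}, \qquad x > 0$$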

beta distribution

- support for $x\in(0,1)$
- allows negative skewness
- two shape parameters $p$ and $q$, and lower- and upper-bounds on data ($a$ and $b$)
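- pdf on the standard support $(0,1)$ (i.e., $a=0$, $b=1$), with $B(p,q)$ the beta function:

$$f(x) = \frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}$$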

extreme value distribution (i.e. Gumbel minimum distribution)

- negatively skewed
- Gumbel maximum distribution, $f(-x;-\mu,\beta)$, is positively skewed
- limiting distribution of the extreme of $n\to\infty$ iid samples: e.g., the maximum of $n$ iid $\textrm{Exp}(\lambda)$ samples, shifted by $\frac{\ln n}{\lambda}$, converges to a Gumbel maximum with $\beta = 1/\lambda$; the minimum version is its mirror image
- standard cdf: $F(x)=1-\exp(-e^x)$
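- pdf, with location $\mu$ and scale $\beta$:

$$f(x) = \frac{1}{\beta} \exp\left(\frac{x-\mu}{\beta} - e^{(x-\mu)/\beta}\right)$$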

Rayleigh distribution

- positively skewed
- models the $L^2$-norm of two iid zero-mean normal random variables (e.g., the orthogonal components of a 2D vector)
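- pdf, with scale $\sigma$ (the standard deviation of each normal component):

$$f(x) = \frac{x}{\sigma^2} e^{-x^2/(2\sigma^2)}, \qquad x \ge 0$$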

Maxwell-Boltzmann distribution

- positively skewed
- 3D counterpart of Rayleigh distribution
- arises in thermodynamics: the distribution of a particle's speed $v$ at temperature $T$
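- pdf, with scale $a$ (for particle mass $m$, $a=\sqrt{kT/m}$ with $k$ the Boltzmann constant):

$$f(v) = \sqrt{\frac{2}{\pi}}\, \frac{v^2}{a^3}\, e^{-v^2/(2a^2)}, \qquad v \ge 0$$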

Chi-squared distribution

- distribution of the sum of the square of $k\ge 1$ i.i.d. standard normal random variables
- mean $k$, variance $2k$
- PDF with $k$ degrees of freedom:
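$$f(x) = \frac{x^{k/2-1} e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}, \qquad x > 0$$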

F-distribution

- Distribution of a random variable defined as the ratio of two independent $\chi^2$-distributed random variables, each divided by its degrees of freedom
- Commonly used in ANOVA
- PDF, with degrees of freedom $d_1$ and $d_2$, involves beta function $B(\alpha,\beta)$:
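$$f(x) = \frac{1}{B\!\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2 - 1} \left(1 + \frac{d_1}{d_2}x\right)^{-\frac{d_1+d_2}{2}}, \qquad x > 0$$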

Student’s t distribution

- Distribution of the normalized sample mean $\frac{\bar{X}-\mu}{S/\sqrt{n}}$ of $n=k+1$ observations from a normal distribution, where $S$ is the sample standard deviation
- PDF with $k$ degrees of freedom:
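$$f(t) = \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\!\left(\frac{k}{2}\right)} \left(1 + \frac{t^2}{k}\right)^{-\frac{k+1}{2}}$$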

# test of fit for distributions

Kolmogorov-Smirnov test (K-S test, on cumulative distribution function $F(x)$)

- if the sample comes from the distribution, $D_n$ converges to 0 almost surely as the number of samples $n$ goes to infinity
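- $D_n$ is the largest gap between the empirical cdf $F_n$ and the hypothesized cdf $F$:

$$D_n = \sup_x \left|F_n(x) - F(x)\right|$$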

Shapiro-Wilk test

- test of normality in frequentist statistics (i.e., whether the samples $x_i$ come from a normal distribution)
- $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$ is the sample mean
- $(a_1,\cdots,a_n) = m^T V^{-1} (m^T V^{-1}V^{-1} m)^{-1/2}$ where $m$ is the vector of expected values of the order statistics of a standard normal sample and $V$ is the covariance matrix of those order statistics
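- the test statistic, with $x_{(i)}$ the $i$-th smallest value in the sample:

$$W = \frac{\left(\sum_{i=1}^n a_i x_{(i)}\right)^2}{\sum_{i=1}^n (x_i - \bar{x})^2}$$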

Anderson-Darling test

- test whether a sample comes from a specified distribution
- $A^2$ is a weighted distance between $F_n(x)$ and $F(x)$, with more weight on the tails of the distribution
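- for $n$ samples with empirical cdf $F_n$, the statistic is:

$$A^2 = n \int_{-\infty}^{\infty} \frac{\left(F_n(x) - F(x)\right)^2}{F(x)\left(1 - F(x)\right)} \, dF(x)$$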

Pearson’s $\chi^2$ test

- tests whether categorical data fit a distribution: the observed frequency $O_i$ is checked against the expected frequency $E_i$ under the distribution, for each of $n$ categories
- degrees of freedom: $n-1$ minus the number of parameters estimated for the fitted distribution
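- the test statistic, approximately $\chi^2$-distributed under the null hypothesis:

$$\chi^2 = \sum_{i=1}^n \frac{(O_i - E_i)^2}{E_i}$$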

# Reference

Lawrence M. Leemis and Jacquelyn T. McQueston. "Univariate Distribution Relationships," The American Statistician, 62(1), pp. 45–53, 2008. DOI: 10.1198/000313008X270448

Aswath Damodaran. "Probabilistic Approaches: Scenario Analysis, Decision Trees and Simulations" (PDF; the appendix is also available separately), which includes a chart for choosing a distribution.