This is a failed attempt to simplify the generation of multivariate Gaussian distributions by using the copula function. However, it has the merit of showing what information a copula function fails to provide.
What is a copula function?
A copula describes the correlation structure of random variables. If we have a vector of RVs $X = (X_1, \dots, X_d)$, where each $X_i$ has a continuous CDF $F_i$, then after the probability integral transform on the RV vector, we have

$$U = (U_1, \dots, U_d) = \left(F_1(X_1), \dots, F_d(X_d)\right)$$

which is a vector over $[0,1]^d$ with each $U_i \sim \text{Uniform}(0,1)$. A copula is the joint CDF:

$$C(u_1, \dots, u_d) = \Pr[U_1 \le u_1, \dots, U_d \le u_d]$$
Wikipedia mentions that it has the following properties:
- $C(u_1, \dots, u_{i-1}, 0, u_{i+1}, \dots, u_d) = 0$, i.e., the copula is 0 if any argument is 0
- $C(1, \dots, 1, u_i, 1, \dots, 1) = u_i$, i.e., the copula equals $u_i$ if all arguments are 1 except the $i$-th
- $C$ is $d$-non-decreasing, i.e.,

$$V_C(B) = \sum_{z \in \times_{i=1}^d \{x_i, y_i\}} (-1)^{N(z)} C(z) \ge 0$$

where $B = [x_1, y_1] \times \cdots \times [x_d, y_d]$ is any hyper-rectangle within $[0,1]^d$, the summation goes over all corners $z$ of $B$ (example of a corner: $z = (x_1, y_2, y_3, \dots, x_d)$, i.e., each element of $z$ takes either the lower or the upper bound of the hyper-rectangle in its dimension), and $N(z)$ counts the number of elements in the vector $z$ that take the lower bound in their dimension.
Use a bivariate copula as an example, with $B = [x_1, y_1] \times [x_2, y_2]$; then

$$V_C(B) = C(y_1, y_2) - C(y_1, x_2) - C(x_1, y_2) + C(x_1, x_2) \ge 0$$
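As a quick sanity check of the 2-increasing property, the following sketch evaluates the signed corner sum on random rectangles for the independence copula $C(u, v) = uv$ (a choice made here purely for illustration):

```python
# Check the 2-increasing property V_C(B) >= 0 for the independence
# copula C(u, v) = u * v on a batch of random rectangles in [0, 1]^2.
import numpy as np

def C(u, v):
    return u * v  # independence copula (illustrative choice)

rng = np.random.default_rng(42)
ok = True
for _ in range(1000):
    x1, y1 = np.sort(rng.uniform(size=2))  # rectangle edge [x1, y1]
    x2, y2 = np.sort(rng.uniform(size=2))  # rectangle edge [x2, y2]
    # Signed sum over the four corners of B = [x1, y1] x [x2, y2]
    V = C(y1, y2) - C(y1, x2) - C(x1, y2) + C(x1, x2)
    ok = ok and V >= 0
print(ok)  # True: every rectangle has non-negative C-volume
```

For the independence copula this sum reduces to $(y_1 - x_1)(y_2 - x_2)$, which is visibly non-negative.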
After all, a copula function is just a probability distribution function. If we define the copula's density

$$c(u_1, \dots, u_d) = \frac{\partial^d C(u_1, \dots, u_d)}{\partial u_1 \cdots \partial u_d}$$

then the joint density function is related to the copula density by

$$f(x_1, \dots, x_d) = c\left(F_1(x_1), \dots, F_d(x_d)\right) \prod_{i=1}^d f_i(x_i)$$

i.e., we can now calculate the joint density as if the RVs were independent of each other, then multiply by the copula density.
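This factorization can be verified numerically for a bivariate normal, where both sides are available in closed form (a sketch; scipy is assumed available, and the correlation value is arbitrary):

```python
# Sanity check of f(x) = c(F1(x1), F2(x2)) * f1(x1) * f2(x2) for a
# bivariate normal with correlation rho, using the closed-form
# Gaussian copula density.
import numpy as np
from scipy.stats import norm, multivariate_normal

rho = 0.6  # arbitrary correlation for the check
R = np.array([[1.0, rho], [rho, 1.0]])

def gaussian_copula_density(u):
    z = norm.ppf(u)                    # z_i = Phi^{-1}(u_i)
    Rinv = np.linalg.inv(R)
    quad = z @ (Rinv - np.eye(2)) @ z  # z' (R^{-1} - I) z
    return np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(R))

x = np.array([0.3, -1.2])              # a point with standard normal margins
u = norm.cdf(x)                        # probability integral transform
lhs = multivariate_normal(mean=[0, 0], cov=R).pdf(x)
rhs = gaussian_copula_density(u) * norm.pdf(x[0]) * norm.pdf(x[1])
print(np.isclose(lhs, rhs))  # True: joint density == copula density x margins
```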
Different kinds of copula functions
There are several kinds of copula functions, depending on what the function $C$ is based on. For example, a Gaussian copula has the form

$$C_R^{\text{Gauss}}(u_1, \dots, u_d) = \Phi_R\left(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_d)\right)$$

where $R$ is the correlation matrix of the RVs, $\Phi^{-1}$ is the inverse CDF of the standard normal distribution, and $\Phi_R$ is the joint CDF of a multivariate normal distribution with zero mean vector and covariance matrix $R$. Therefore the copula's density function is:

$$c_R^{\text{Gauss}}(u) = \frac{1}{\sqrt{\det R}} \exp\left(-\frac{1}{2} \begin{pmatrix} \Phi^{-1}(u_1) \\ \vdots \\ \Phi^{-1}(u_d) \end{pmatrix}^{\!\top} \left(R^{-1} - I\right) \begin{pmatrix} \Phi^{-1}(u_1) \\ \vdots \\ \Phi^{-1}(u_d) \end{pmatrix}\right)$$

This compares to the general $d$-variate normal density function:

$$f(x) = \frac{1}{\sqrt{(2\pi)^d \det \Sigma}} \exp\left(-\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$$
In other words, a Gaussian copula is a multivariate standard normal distribution with correlation matrix $R$, to which each margin has the probability integral transform applied.
Note that a Gaussian copula is not limited to multivariate Gaussian random variables $X_i$, as long as we can find the corresponding marginal distributions (e.g., all $F_i$ are continuous) and the correlation matrix $R$ is well-defined.
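For instance, a Gaussian copula can couple non-Gaussian margins. The sketch below (margin choices are assumptions for illustration) draws correlated pairs whose margins are Exponential(1) and Uniform(0, 1):

```python
# Gaussian copula with non-Gaussian margins: the dependence comes from
# the correlation matrix, the margins from the inverse CDFs.
import numpy as np
from scipy.stats import norm, expon

rho = 0.8  # illustrative correlation
A = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
z = A @ np.random.default_rng(0).standard_normal((2, 10000))
u = norm.cdf(z)              # copula samples in [0, 1]^2
x_exp = expon.ppf(u[0])      # Exponential(1) margin via F^{-1}
x_unif = u[1]                # Uniform(0, 1) margin is just u itself
print(round(x_exp.mean(), 1))  # close to 1.0, the Exponential(1) mean
```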
Another copula function is the Archimedean copula, which is of the form

$$C(u_1, \dots, u_d; \theta) = \psi^{[-1]}\left(\psi(u_1; \theta) + \cdots + \psi(u_d; \theta); \theta\right)$$

where $\theta$ is a parameter, $\psi : [0,1] \times \Theta \to [0, \infty)$ is a continuous, strictly decreasing, convex function known as the generator function, with $\psi(1; \theta) = 0$, and $\psi^{[-1]}$ is its pseudo-inverse, which equals $\psi^{-1}(t; \theta)$ for $0 \le t \le \psi(0; \theta)$ and 0 otherwise.
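A concrete member of this family is the Clayton copula, whose generator is $\psi(t; \theta) = (t^{-\theta} - 1)/\theta$. A minimal sketch (Clayton is my example choice, with $\theta > 0$ so the pseudo-inverse coincides with the true inverse on the range used here):

```python
# Clayton copula built from its Archimedean generator:
# C(u, v) = psi_inv(psi(u) + psi(v)) = (u^-theta + v^-theta - 1)^(-1/theta)
import numpy as np

theta = 2.0  # illustrative parameter, theta > 0

def psi(t):
    return (t ** -theta - 1.0) / theta          # generator, psi(1) = 0

def psi_inv(s):
    return (1.0 + theta * s) ** (-1.0 / theta)  # inverse of the generator

def clayton(u, v):
    return psi_inv(psi(u) + psi(v))

# Margin property C(u, 1) = u, since psi(1) = 0
print(np.isclose(clayton(0.37, 1.0), 0.37))  # True
```

Note that $C(u, v) > uv$ here, reflecting the positive dependence that the Clayton copula induces for $\theta > 0$.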
Expectation for copula models and Monte Carlo integration
Assume we have a function $g(x)$ and we are interested in $E[g(X)]$ for some random vector $X$. If the CDF of this random vector is $H(x)$, then

$$E[g(X)] = \int g(x)\, dH(x) = \int_{[0,1]^d} g\left(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d)\right) dC(u) = \int_{[0,1]^d} g\left(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d)\right) c(u)\, du$$

where the second equality arises if we have a copula model

$$H(x_1, \dots, x_d) = C\left(F_1(x_1), \dots, F_d(x_d)\right)$$

and the third equality arises if the copula is continuous and has a density function $c(u)$. Substituting back $U_i = F_i(X_i)$, and assuming the density is known, we have

$$E[g(X)] = E\left[g\left(F_1^{-1}(U_1), \dots, F_d^{-1}(U_d)\right)\right]$$
We can then find the expectation using a Monte Carlo algorithm:
- Draw $n$ samples $u^{(1)}, \dots, u^{(n)}$ from the copula function $C$
- Produce samples $x^{(k)} = \left(F_1^{-1}(u_1^{(k)}), \dots, F_d^{-1}(u_d^{(k)})\right)$ from each $u^{(k)}$
- Approximate $E[g(X)] \approx \frac{1}{n} \sum_{k=1}^n g(x^{(k)})$
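A sketch of this procedure, using a Gaussian copula, Exponential(1) margins, and $g(x) = x_1 x_2$ (the copula, margins, and $g$ here are all illustrative assumptions):

```python
# Monte Carlo estimate of E[g(X)] under a copula model:
# draw copula samples u, map through the inverse margins, average g.
import numpy as np
from scipy.stats import norm, expon

rng = np.random.default_rng(1)
rho = 0.5
A = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
n = 100_000

u = norm.cdf(A @ rng.standard_normal((2, n)))  # step 1: copula samples
x = expon.ppf(u)                               # step 2: x_i = F_i^{-1}(u_i)
estimate = np.mean(x[0] * x[1])                # step 3: average g(x) = x1 * x2
# Positive dependence pushes this above E[X1] E[X2] = 1
print(round(estimate, 2))
```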
Conversely, if we have collected $n$ samples of $X$, namely $x^{(1)}, \dots, x^{(n)}$, the corresponding copula observations are $u^{(k)} = \left(\hat F_1(x_1^{(k)}), \dots, \hat F_d(x_d^{(k)})\right)$. We can approximate each marginal CDF $F_i$ by its empirical distribution $\hat F_i$, and the corresponding empirical copula is therefore

$$C^n(u_1, \dots, u_d) = \frac{1}{n} \sum_{k=1}^n \mathbf{1}\left[\hat F_1(x_1^{(k)}) \le u_1, \dots, \hat F_d(x_d^{(k)}) \le u_d\right]$$
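In practice the copula observations are computed from ranks, without knowing the $F_i$ at all. A sketch using the common rank$/(n+1)$ scaling (the margins below are arbitrary; any continuous distribution works):

```python
# Pseudo-observations u_i^(k) = rank / (n + 1): empirical copula
# observations built from ranks alone.
import numpy as np

rng = np.random.default_rng(7)
x = rng.lognormal(size=(3, 500))  # raw samples with arbitrary continuous margins
n = x.shape[1]
# Double argsort turns each row into 0-based ranks; scale into (0, 1)
u = (np.argsort(np.argsort(x, axis=1), axis=1) + 1) / (n + 1)
print(u.min() > 0 and u.max() < 1)  # True: all pseudo-observations lie in (0, 1)
```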
Generating random vectors from a Gaussian copula
If we have a Gaussian copula $C_R^{\text{Gauss}}$, we can generate random vectors from it using the following procedure:
- Perform the Cholesky decomposition $R = AA^\top$ of the correlation matrix, where $A$ is a lower triangular matrix
- Generate a $d$-dimensional iid standard normal vector $z = (z_1, \dots, z_d)$
- The multivariate normal random vector to generate is then $x = Az$, with the corresponding copula vector $u = \left(\Phi(x_1), \dots, \Phi(x_d)\right)$
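The three steps above can be sketched as follows (the correlation matrix is the one implied by the covariance matrix used later in this post):

```python
# Sample from a Gaussian copula: Cholesky R = A A', iid normals z,
# x = A z, then u = Phi(x) is the copula vector.
import numpy as np
from scipy.stats import norm

R = np.array([[1.00, 0.90, 0.30],
              [0.90, 1.00, 0.48],
              [0.30, 0.48, 1.00]])
A = np.linalg.cholesky(R)                  # step 1: R = A A'
z = np.random.default_rng(3).standard_normal((3, 20000))  # step 2: iid N(0, 1)
x = A @ z                                  # step 3: correlated standard normals
u = norm.cdf(x)                            # copula vectors, one per column
print(np.allclose(np.corrcoef(x), R, atol=0.05))  # True up to sampling noise
```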
This is exactly the same procedure used to generate multivariate Gaussian random variables from the covariance matrix $\Sigma$, only this time we use the correlation matrix $R$ instead. That makes sense because $\Sigma$ and $R$ are related:

$$\Sigma_{ij} = \operatorname{Cov}[X_i, X_j] = R_{ij}\, \sigma_i \sigma_j$$

that is, the diagonal of $R$ is always 1 and the off-diagonal terms are

$$R_{ij} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii} \Sigma_{jj}}}$$
Up to here, we can see that the copula has lost the information about the mean and variance of the individual components of $X$, as it focuses only on their correlation. The random vector generated from the copula is not distributed as the original distribution, but as a scaled and shifted version of it. This can be seen from the following plot:
```python
# 3D gaussian random variable, generated from the covariance matrix
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, enables 3D projection
import matplotlib.pyplot as plt
import numpy as np

Sigma = np.array([[4, 9, 3], [9, 25, 12], [3, 12, 25]])
S = np.linalg.cholesky(Sigma)  # lower triangular part, i.e. S @ S.T == Sigma
iidnorm = np.random.randn(3, 1000)
mvnorm = np.matmul(S, iidnorm)

fig = plt.figure()
ax = fig.add_subplot(111)
# 2D scatter of the first two components, colored by the third
ax.scatter(mvnorm[0], mvnorm[1], c=mvnorm[2], alpha=0.1)
plt.show()
```
or the equivalent 3D scatter plot:

```python
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(mvnorm[0], mvnorm[1], mvnorm[2], alpha=0.1)
plt.show()
```
and the following is the same plot using the Gaussian copula (i.e., using the correlation matrix):
```python
# From Sigma, generate the correlation matrix R
# Convert the covariance matrix into the correlation matrix
SD = np.sqrt(np.diag(Sigma))   # sqrt of diag vector, i.e. std devs
R = Sigma / np.outer(SD, SD)   # R_ij = Sigma_ij / (sd_i * sd_j)
A = np.linalg.cholesky(R)
mvnorm = np.matmul(A, iidnorm)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(mvnorm[0], mvnorm[1], c=mvnorm[2], alpha=0.1)
plt.show()
```
and the 3D version:
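A sketch of that 3D plot, mirroring the earlier 3D scatter but with the correlation-based samples (the data is regenerated here so the block runs on its own):

```python
# 3D scatter of samples generated from the correlation matrix R
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, enables 3D projection
import matplotlib.pyplot as plt
import numpy as np

Sigma = np.array([[4, 9, 3], [9, 25, 12], [3, 12, 25]])
SD = np.sqrt(np.diag(Sigma))   # per-dimension std devs
R = Sigma / np.outer(SD, SD)   # covariance -> correlation
A = np.linalg.cholesky(R)
mvnorm = A @ np.random.randn(3, 1000)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(mvnorm[0], mvnorm[1], mvnorm[2], alpha=0.1)
plt.show()
```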
So we see that using the correlation matrix to generate the random vectors is like transforming the covariance-matrix version to standard normal margins: we lose the information about the relative scale of the variance in each dimension. Because of this, the copula function does not make it any easier to generate multivariate Gaussian random variables.