This is a failed attempt to simplify the process to generate multivariate Gaussian distribution by utilizing the copula function. However, it has the merit of figuring out what information a copula function failed to provide.

What is a copula function?

Copula is to describe the correlation of random variables. If we have a vector of RVs , each has a continuous CDF , then after integral transform on the RV vector, we have

which is a vector over with each . A copula is the joint CDF:

Wikipedia mention that it has the properties:

  • , i.e., if any argument is 0
  • , i.e., if all arguments are 1 except
  • is non-decreasing, i.e.,

where is any hyper-rectangle within , of the summation goes over all corners of (example of a corner: , i.e. each element of takes either the lower or upper bound in the dimension of the hyper-rectangle), and counts the number of element in vector that takes the lower bound in its dimension.

Use a bivariate as an example, and , then

After all, a copula function is just a probability distribution function. If we define the copula’s density

then the joint density function is related to the copula density by

i.e., we can now calculate the joint density as if the RVs are independent of each other than multiply with the copula density.

Different kinds of copula functions

There are several kinds of copula functions, depends on how the is based on. For example, a Gaussian copula has the form of

which is the correlation matrix of the RVs, and is the inverse CDF of standard normal distribution, and is the joint CDF of a multivariate normal distribution with mean vector of zero and covariance matrix , which therefore the copula’s density function is:

This compares to the general -variate normal density function:

In other words, a Gaussian copula is a multivariate standard normal distribution function with correlation matrix , which each margin has the probability integral transform applied.

Note that a Gaussian copula does not limit to multivariate Gaussian random variables , as long as we can find the corresponding distribution (e.g. all are continuous) and the correlation is well-defined.

Another copula function is the Achimedean copula, which is in the form of

where is a parameter, is a continuous strictly decreasing convex function, known as the generator function, with , and is its pseudo-inverse which for and otherwise.

Expectation for copula models and Monte Carlo integration

Assume we have a function and we are interested in for some random vector . If the CDF of this random vector is , then

which the second equality arises if we have a copula model

and the third equality arises if the copula is continuous and has a density function . Substitute back , and assuming the density is known, we have

Which we can find the expectation using Monte Carlo algorithm:

  1. Draw samples from copula function
  2. Produce samples from each
  3. Evaluate

Conversely, if we collected samples of , , the corresponding copula observations are . We can approximate by empirical distribution and corresponding empirical copula is therefore

Generating random vectors from a Gaussian copula

If we have a Gaussian copula , we can generate random vectors from it using the following procedure:

  1. Perform Cholesky decomposition of the correlation matrix where is the lower triangular matrix
  2. Generate a -dimensional iid standard normal vector
  3. Multivariate normal random vector to generate is then , with the corresponding copula vector

This is exactly the same procedure to generate multivariate Gaussian random variables from the covariance matrix , only this time we use correlation matrix instead. That makes sense because and are related:

that is, diagonal of is always 1 and off-diagonal terms are

Up to here, we can see that the copula lost the information of mean and variance of individual components of , as it focus only on their correlation. The random vector generated from the copula is not the same as the original distribution, but scaled and shifted. This can be seen from the following plot:

# 3D gaussian random variable, generate from covariance matrix
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

Sigma = np.array([[4, 9, 3], [9, 25, 12], [3, 12, 25]])
S = np.linalg.cholesky(Sigma) # lower triangular part, i.e. S * S' == Sigma
iidnorm = np.random.randn(3,1000)
mvnorm = np.matmul(S, iidnorm)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(mvnorm[0], mvnorm[1], c=mvnorm[2], alpha=0.1)
plt.show()

or equiv 3D scatter plot:

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(mvnorm[0], mvnorm[1], mvnorm[2], alpha=0.1)
plt.show()

and the following is the same plot using Gaussian copula function (i.e. using the correlation matrix):

# From Sigma, generate the correlation matrix R
# Convert the covariance matrix into correlation matrix
SD = np.mat(np.sqrt(np.diag(Sigma))) # sqrt of diag vector, i.e. std dev
R = np.array(Sigma/np.multiply(D, D.T))
A = np.linalg.cholesky(R)
mvnorm = np.matmul(A, iidnorm)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(mvnorm[0], mvnorm[1], c=mvnorm[2], alpha=0.1)
plt.show()

and the 3D version:

So we see, using the correlation matrix to generate the random vectors is like transforming the version of using covariance matrix to standard normal, as we lost the information of relative scale of variance in each dimension. Because of this, we do not make it any easier to generate multivariate random variables using copula function.

References