Monday, 5 December 2011

MCMC Methods: The Gibbs Sampler

Suppose we want to draw samples from a non-standard distribution, usually called the target distribution. The target is not a distribution that can be sampled directly by calling a function such as rnorm() in R. An ingenious way of generating realisations from the target is to construct a Markov chain whose stationary distribution is the target. If this can be done, then by running the chain long enough, the realisations from the chain will eventually be realisations from the target. This is what Markov Chain Monte Carlo (MCMC) methods are all about.

Sadly, before we can go any further we need to give a few definitions:

Definitions

A Markov chain is a sequence of random variables $y_{t},y_{t+1},y_{t+2},...$

such that the transition probabilities satisfy

$P(y_{t} = Y| y_{t-1},y_{t-2},y_{t-3},...) = P(y_{t} = Y|y_{t-1})$

for all t.

If we assume that the transition probabilities are independent of $t$ (the chain is time-homogeneous), then the rule which dictates the transition of the chain from $t$ to $t+1$ can be described by a transition kernel $K(x,y)$

where
$K(x,y) = P(y_{t+1} = y | y_{t} = x)$

Finally, let $p_{t+1}(y)$ denote the distribution of the state of the chain at time $t+1$

where
$p_{t+1}(y) = \int K(x,y)p_{t}(x) dx$

A stationary distribution of the transition kernel $K(x,y)$, denoted $p(y)$, satisfies the following equation

$p(y) = \int K(x,y)p(x) dx$

Now for any of this to be of practical use we need to find a transition kernel such that $p_{t}(y)$ converges to the target distribution.
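
As a quick illustration of these definitions (a toy example of my own, not part of the Gibbs sampler below): an AR(1) process $y_{t} = \rho y_{t-1} + \epsilon_{t}$ with $\epsilon_{t} \sim N(0,1)$ is a Markov chain whose stationary distribution is $N(0, 1/(1-\rho^2))$, and simulating it shows the draws settling down to that target whatever the starting value.

# # # # illustration: a chain with a known stationary distribution # # # #
# AR(1) chain: y_t | y_{t-1} ~ N(rho * y_{t-1}, 1)
# stationary distribution: N(0, 1/(1 - rho^2))

rho <- 0.8
nrSteps <- 10000
y <- numeric(nrSteps)
y[1] <- 50  # deliberately far from the stationary distribution

for (t in 2:nrSteps) {
  y[t] <- rho * y[t - 1] + rnorm(1)
}

var(y[1001:nrSteps])  # should be close to 1/(1 - 0.8^2), roughly 2.78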

Enter the Gibbs Sampler

Suppose we partition the vector $y$ into two blocks, $y = (y_1,y_2)$, such that we can find the two full conditional densities of the target, $p_{Y_1|Y_2}$ and $p_{Y_2|Y_1}$. Below, $x = (x_1,x_2)$ denotes the current state of the chain and $y = (y_1,y_2)$ the next state.

Then the following kernel can be shown to have the target $p(y)$ as its stationary distribution.

$K(x,y) =  p_{Y_1|Y_2}(y_1|x_2)p_{Y_2|Y_1}(y_2|y_1)$

This is the Gibbs Sampler kernel.
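
To see why the target is stationary for this kernel, check the stationarity equation directly:

$\int K(x,y)p(x)dx = p_{Y_2|Y_1}(y_2|y_1)\int p_{Y_1|Y_2}(y_1|x_2)\,p(x_1,x_2)\,dx_1 dx_2 = p_{Y_2|Y_1}(y_2|y_1)\,p_{Y_1}(y_1) = p(y)$

Integrating out $x_1$ leaves the marginal $p(x_2)$, and the remaining integral over $x_2$ averages the conditional $p_{Y_1|Y_2}(y_1|x_2)$ over that marginal, which gives the marginal $p_{Y_1}(y_1)$.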

We can iterate the following algorithm to generate realisations according to the transition kernel described above.

1. Draw $y_1$ from $p_{Y_1|Y_2}(y_1|x_2)$
2. Using this $y_1$, draw $y_2$ from $p_{Y_2|Y_1}(y_2|y_1)$
3. Set $x_2 = y_2$ and return to step 1 (a toy sketch of these steps follows below).
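
Here is a minimal sketch of these three steps (my own toy target, not the post's main example): a bivariate normal with unit variances and correlation $\rho$, whose full conditionals are $y_1|y_2 \sim N(\rho y_2, 1-\rho^2)$ and $y_2|y_1 \sim N(\rho y_1, 1-\rho^2)$.

# # # # toy gibbs sampler: bivariate normal target # # # #
# full conditionals: y1 | y2 ~ N(rho * y2, 1 - rho^2), and symmetrically for y2

rho <- 0.9
nrDraws <- 10000
draws <- matrix(0, nrow = nrDraws, ncol = 2)
y2 <- 0  # arbitrary starting value for the conditioning block

for (i in 1:nrDraws) {
  y1 <- rnorm(1, rho * y2, sqrt(1 - rho^2))  # step 1: draw y1 | y2
  y2 <- rnorm(1, rho * y1, sqrt(1 - rho^2))  # step 2: draw y2 | y1
  draws[i, ] <- c(y1, y2)                    # step 3: this y2 feeds back into step 1
}

cor(draws[, 1], draws[, 2])  # should be close to rho = 0.9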

To be sure of convergence a number of regularity conditions (such as irreducibility and aperiodicity of the chain) would have to be checked, but in practice people usually carry out ex-post convergence checks instead.

Let's take a look at an example.

Example

Consider the Normal-Gamma distribution that arises from the Bayesian linear regression model with a gamma prior for the precision (the reciprocal of the variance) and a vague improper prior for the regression coefficients. The kernel of the Normal-Gamma distribution is given below.

$h^{n/2-1}\exp\left ({-\frac{h}{2}(\beta - \hat{\beta})' (X'X)(\beta - \hat{\beta})}\right )\times \exp\left ( -\frac{hvs^2}{2} \right )$

where $\hat{\beta}$ is the OLS estimate of $\beta$, $v = n - k$ is the degrees of freedom ($n$ observations, $k$ regressors), and $s^2$ is the estimated variance of the OLS residuals.

Now in R there is no function we can call to draw samples from this distribution directly. But the density factorizes as the product of a normal density for $\beta$, conditional on $h$, and a gamma density for $h$, and once you see that, you can already see how the algorithm will work.
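
To make the factorization explicit, here is a sketch assuming the usual improper prior $p(\beta,h) \propto 1/h$ (which is what makes the powers of $h$ work out, using $n = k + v$):

$h^{n/2-1}\exp\left(-\frac{h}{2}(\beta - \hat{\beta})'(X'X)(\beta - \hat{\beta})\right)\exp\left(-\frac{hvs^2}{2}\right) = \left[h^{k/2}\exp\left(-\frac{h}{2}(\beta - \hat{\beta})'(X'X)(\beta - \hat{\beta})\right)\right]\times\left[h^{v/2-1}\exp\left(-\frac{hvs^2}{2}\right)\right]$

The first bracket is the kernel of a $N(\hat{\beta},\, h^{-1}(X'X)^{-1})$ density for $\beta$ given $h$, and the second is the kernel of a $\Gamma(v/2,\, vs^2/2)$ density for $h$. One thing worth noting: the gamma factor does not depend on $\beta$, so the two steps below actually produce independent draws rather than a dependent chain; the example is still a nice way to see the mechanics of the sampler.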

Iterating the following steps will generate a sample from the Normal-Gamma distribution.

1. Draw $h_{t} \sim  \Gamma(v/2,vs^2/2)$
2. Draw $\beta_{t} \sim N(\hat{\beta}, \frac{1}{h_{t}} (X'X)^{-1})$

Here is a simple example of this algorithm implemented using R.


# # # # load required package # # # #

library(mvtnorm) 

# # # # generate data # # # #

x1 <- runif(100, min = 0, max = 10)
y <- 1 + 2 * x1 + rnorm(100)

# # # # # # # # Model # # # # # # # 

X <- cbind(1,x1)
XX <- crossprod(X,X)
XXI <- solve(XX)
beta.ols <- XXI %*% t(X) %*% y
resid <- y - X %*% beta.ols
n <- length(y)
k <- ncol(X)
v <- n - k
s2 <- sum(resid^2)/v

# # # # define gibbs sampler function # # # #

gibbs <- function(nrDraws, m, v, s2, X) {

  # draw all the precision values from the Gamma(v/2, v*s2/2) density
  h_sample <- rgamma(nrDraws, v/2, v*s2/2)
  beta_sample <- matrix(0, nrow = nrDraws, ncol = ncol(X))
  XX <- crossprod(X, X)
  XXI <- solve(XX)

  # conditional on each h, draw beta from N(m, (1/h) * (X'X)^-1)
  for(i in 1:nrDraws) {
    beta_sample[i,] <- rmvnorm(1, m, (1/h_sample[i]) * XXI)
  }
  sample <- cbind(beta_sample, h_sample)
  return(sample)
}

# # # get sample for model using gibbs sampler # # #

nrDraws <- 10000
sample <- gibbs(nrDraws,beta.ols,v,s2,X)
colnames(sample) <- c("intercept", "slope", "precision")
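
With the draws in hand we can do the ex-post checks mentioned earlier and summarise the posterior. Here is a quick sketch using base R only; which checks you run is largely a matter of taste.

# # # # ex-post checks and posterior summaries # # # #
# trace plots for a rough visual convergence check

par(mfrow = c(3, 1))
plot(sample[, "intercept"], type = "l", main = "trace: intercept")
plot(sample[, "slope"], type = "l", main = "trace: slope")
plot(sample[, "precision"], type = "l", main = "trace: precision")

colMeans(sample)  # posterior means (true values for the simulated data: 1, 2 and 1)
apply(sample, 2, quantile, probs = c(0.025, 0.975))  # 95% credible intervals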
