Monday, 19 December 2011

Markov-switching AR(1) model

Often the behaviour of economic and financial time series changes dramatically due to structural factors in the economy. An obvious example is the business cycle. During a boom the annual growth rate in GDP will be positive and the time series of growth rates will have a strictly positive mean. But during a recession the growth rate will turn negative and the mean growth rate will be negative. Another example could be CDS spreads for European banks. Fig.1 is the CDS spread for a french bank. We can see from Fig.1 that there is a significant change in both the volatility and level of the series roughly from the end of July coinciding with the downgrade of spanish government debt.

Fig.1
One way of modelling these structural changes is with markov switching models. The basic idea is that the structural parameters which explain a time series can differ depending on the unobserved state. The number of possible unobserved states is chosen to be a small number. Often just two states are modelled e.g recession and boom, risk averse and risk loving, etc. The unobserved state evolves according to a markov process.

Example

Consider the following simple two state markov switching AR(1) model


where 


 and the state 



So that the transition probabilities are as follows





and the vector of unknown parameters which need to be estimated are



In the R code given below I simulated 200 observations from this model and estimated the parameters by maximum likelihood. I found that the likelihood function can be very flat with lots of local maxima, thus the parameter estimates can differ significantly depending on the choice of starting values used. To try and limit the possibility of selecting parameter estimates that only yield a local maximum I repeated the estimation procedure 50 times using random starting values to generate a list of potential MLEs. Then I simply chose the parameter estimates that corresponded to the largest likelihood from the list. In a future post I'll see if I can get faster and more reliable estimates using the Gibbs sampler. 

Fig.2 shows that both the smoothed and filtered probabilities offer an accurate reflection of the true underlying state at each date.

Fig.2


R code


# markov switching model
# generate data for a AR(1) markov switching model with the following pars
# state 0: y_t = 2 + 0.5 * y_{t-1} + e1_t
# state 1: y_t = -1 + 0.8 * y_{t-1} + e2_t
# where e1_t ~ N(0,1), e2_t ~ N(0,0.5^2)
# transition probabilities p_s0_s1 = p_s1_s0 = 0.20

# generate realisations of the state

n <- 200
gamma_s0 <- qnorm(0.8)
gamma_s1 <- qnorm(0.2)
gamma <- rep(0,n)
state <- rep(0,n)

# choose initial state at t=0 to be state 0 

gamma[1] <- gamma_s0
state[1] <- 0

for(i in 2:n) {
if(rnorm(1) < gamma[i-1]) {
gamma[i] <- gamma_s0
state[i] <- 0
}
else {
gamma[i] <- gamma_s1
state[i] <- 1
}
}
# generate observations
# choose y_0 = 0
# recall state at t=1 was set to 0 

y1 <- 2 + 0.5 * 0 + rnorm(1, sd = 1)
y <- rep(0,n)
y[1] <- y1

for(i in 2:n) {
if(state[i]==0) {
y[i] <- 2 + 0.5 * y[i-1] + rnorm(1, sd = 1)
}
else {
y[i] <- -1 + 0.8 * y[i-1] + rnorm(1, sd = 0.5)
}
}

# convert into time series object

y <- ts(y, start = 1, freq = 1)


hamilton.filter <- function(theta, y) {

# construct parameters
    beta_s0 <- theta[1:2]
beta_s1 <- theta[3:4]
sigma2_s0 <- exp(theta[5])
sigma2_s1 <- exp(theta[6])
gamma_s0 <- theta[7]
gamma_s1 <- theta[8]
# construct probabilities

#probit specification
  
p_s0_s0 <- pnorm(gamma_s0)
p_s0_s1 <- pnorm(gamma_s1)
p_s1_s0 <- 1-pnorm(gamma_s0)
p_s1_s1 <- 1-pnorm(gamma_s1)

# # # construct transition probability matrix P # # #

P.mat <- matrix(c(p_s0_s0, p_s1_s0, p_s0_s1, p_s1_s1), nrow = 2, ncol = 2, byrow = TRUE)

# create data matrix

X <- cbind(1,y)

# assume erogodicity of the markov chain
# use unconditional probabilities

p0_s0 <- (1 - p_s1_s1) / (2 -p_s0_s0 -p_s1_s1)
p0_s1 <- 1-p0_s0

# create variables

p_s0_t_1 <- rep(0, nrow(X))
p_s1_t_1 <- rep(0, nrow(X))
p_s0_t <- rep(0, nrow(X))
p_s1_t <- rep(0, nrow(X))
f_s0 <- rep(0,nrow(X)-1)
f_s1 <- rep(0,nrow(X)-1)
f <- rep(0,nrow(X)-1)
logf <- rep(0, nrow(X)-1)

p_s0_t[1] <- p0_s0
p_s1_t[1] <- p0_s1 

# initiate hamilton filter

for(i in 2:nrow(X)) {
# calculate prior probabilities using the TPT
# TPT for this example gives us 
# p_si_t_1 = p_si_t_1_si * p_si_t + p_si_t_1_sj * p_si_t
# where p_si_t_1 is the prob state_t = i given information @ time t-1
# p_si_t_1_sj is the prob state_t = i given state_t_1 = j, and all info @ time t-1
# p_si_t is the prob state_t = i given information @ time t

# in this simple example p_si_t_1_sj = p_si_sj

p_s0_t_1[i] <- (p_s0_s0 * p_s0_t[i-1]) + (p_s0_s1 * p_s1_t[i-1])
p_s1_t_1[i] <- (p_s1_s0 * p_s0_t[i-1]) + (p_s1_s1 * p_s1_t[i-1])

# calculate density function for observation i
# f_si is density conditional on state = i
# f is the density 
  
f_s0[i] <- dnorm(y[i]-X[i-1,]%*%beta_s0, sd = sqrt(sigma2_s0)) 
f_s1[i] <- dnorm(y[i]-X[i-1,]%*%beta_s1, sd = sqrt(sigma2_s1))
f[i] <- (f_s0[i] * p_s0_t_1[i]) + (f_s1[i] * p_s1_t_1[i])

# calculate filtered/posterior probabilities using bayes rule
# p_si_t is the prob that state = i given information @ time t

p_s0_t[i] <- (f_s0[i] * p_s0_t_1[i]) / f[i]
p_s1_t[i] <- (f_s1[i] * p_s1_t_1[i]) / f[i]

logf[i] <- log(f[i])

}

logl <-sum(logf)

output <- list(p_s0_t_1 = p_s0_t_1, p_s1_t_1 = p_s1_t_1,f.prob_s0 = p_s0_t,f.prob_s1 = p_s1_t,T.mat = P.mat,logLik = logl)

return(output)

}


#construct negative loglikelihood function

neg.logl <- function(theta,y) {
out <- hamilton.filter(theta,y)
return(-out$logLik)
}

# pick start values for the 7 unknown parameters
# k is the number of times the estim procedure is run
# different starting values for each run

k <- 50 
start_val <- matrix(rnorm(8*k), nrow = 8, ncol = k)

#matrix to store parameter estimates

par.cand <- matrix(NA,nrow = 8, ncol = k)

#row vector to store likelihood values

like.cand <- matrix(NA, nrow = 1, ncol = k)

#list to store hessian matrices

Hess.cand <- sapply(1:k, function(x) { matrix(0,nrow = 8, ncol = 8) }, simplify = FALSE) 

for(i in 1:k) {

x <-optim(start_val[,i],neg.logl,y=y, hessian = TRUE)

par.cand[,i] <- x$par
like.cand[,i] <-x$value
Hess.cand[[i]] <- x$hessian

cat("Iteration", i, "of", k, "\n")

}

results <- rbind(like.cand,par.cand)

# # # select mle from candidates # # # 

like.cand <- results[1,]
index <- which.min(like.cand)
mle.pars <- results[-1,index]
hessian <- Hess.cand[[index]]

# # # # # # # # # # # # # # # # # # # # # # # # #  
# # # # # # # # # # # # # # # # # # # # # # # # # 

# # # get filtered probabilities # # #

output <- hamilton.filter(mle.pars,y)

fprob_s0 <- output$f.prob_s0
fprob_s1 <- output$f.prob_s1

# # #  get transition matrix # # # 

Tmat <- output$T.mat

# # # define kim's smoothing algorithm # # #

kim.smooth <- function(T.mat,fp_s0,fp_s1,p_s0_t_1,p_s1_t_1) {

n <- length(fp_s0)

# create smoothed probability variables 
p_s0_T <- rep(0,n) 
p_s1_T <- rep(0,n)
# assign initial values to implement kim's algo

p_s0_T[n] <- fp_s0[n]
p_s1_T[n] <- fp_s1[n]

# initiate recursion

for(i in (n-1):1) {
p_s0_T[i] <- fp_s0[i] * ( T.mat[1,1] * (p_s0_T[i+1]/p_s0_t_1[i+1]) + T.mat[2,1] *(p_s1_T[i+1]/p_s1_t_1[i+1])) 
p_s1_T[i] <- fp_s1[i] * ( T.mat[1,2] * (p_s0_T[i+1]/p_s0_t_1[i+1]) + T.mat[2,2] *(p_s1_T[i+1]/p_s1_t_1[i+1]))
}

output <- list(p_s0_T = p_s0_T, p_s1_T = p_s1_T)
return(output)

}

# # # get smoothed probabilities using kim's algo # # #

smoothed.output <- kim.smooth(T.mat = Tmat, fp_s0 = fprob_s0, fp_s1 = fprob_s1, p_s0_t_1 = output$p_s0_t_1, p_s1_t_1 = output$p_s1_t_1)


p_s0_T <- smoothed.output$p_s0_T
p_s1_T <- smoothed.output$p_s1_T

# # # plot results # # #

state <- ts(state, start = 1, freq = 1)

par(mfrow = c(3,1))
plot.ts(y, ylab = "y", main = "Observed series")
plot.ts(cbind(state,p_s1_T), plot.type = c("single"), col = c("black","red"), ylab = "probability/state value", main = "True state and Smoothed probabilities", lty = c(2,1))
legend("topleft",legend = c("true","smoothed prob"), col = c("black", "red"), lty = c(1,1))
plot.ts(cbind(fprob_s1,p_s1_T), plot.type = c("single"), col = c("blue","red"), ylab = "probability", main = "Filtered and Smoothed probabilities", lty = c(1,1))
legend("topleft",legend = c("filtered prob","smoothed prob"), col = c("blue", "red"), lty = c(1,1))

Monday, 12 December 2011

Model selection in a bayesian framework: Which model is more likely?

Suppose we have two competing models named model 0 and model 1. A natural question to ask is "which model is more likely to be have generated the data?". An appealing feature of the bayesian framework is that we can answer this question.

To answer this question we need to calculate the probability that model 0 is the true model and the probability that model 1 is the true model.

I.e. we want to calculate


for k = 0,1.

where m = k means model k is the true model.

How do we calculate these probabilities?

Using Bayes' theorem,



where the probability of the data is given by



And since,


to find each probability we simply evaluate an integral (in practice this normally requires us to carry out monte carlo integration).

Once we have these probabilities we can also calculate the Bayes Factors/Posterior Odds given by



What I really like about this approach to model selection is that it doesn't matter whether the models are nested or not, the approach is always the same. In a frequentist/classical setting testing non-nested models is a lot tougher to handle. Another great property is that as the sample size tends to infinity, with probability 1, this approach will choose the best model. In a frequentist setting even as the sample size tends to infinity we'll still be choosing the wrong models with a probability at least as large as the chosen significance level.

However, a problem with this approach is that it requires us to evaluate integrals which can be computationally expensive. But this problem is becoming less of an issue as computers become cheaper and more powerful.

Example

The following R code considers two bayesian linear regression models with a vague prior. To carry out the monte carlo integration of the posteriors we make use of the gibbs sampler (discussed in my previous post here) to obtain samples from the non-standard distributions.

See R code below


# Model selection example in a bayesian framework
# M0: y_t = theta1 + theta2 * x1^(1/3) + e_t
# M1: y_t = phi1 + phi2 * x1^(2/3) + e_t
# where e_t ~ iidN(0,1)
# want the posterior probabilities of the two models
# postprob_mj = prob(y|model j true) * priorprob_mj / p(y)
# need to calculate the integral of p(y|theta,h)p(theta,h)
# and the integral of p(y|phi,h)p(phi,h)
# recall we chose the vague improper prior p(theta,h) = p(phi,h) = 1/h
# need to sample from the posterior to get an approximation of the integral


# # # # load required package # # # #

library(mvtnorm) 

# # # # generate data # # # #

x1 <- runif(50, min = 0, max = 10)
y <- 1 + 2 * x1^(2/5) + rnorm(50, sd = 0.5)

# # # # # # # # Model 0 # # # # # # # 

Z <- cbind(1,x1^(1/3))
ZZ <- crossprod(Z,Z)
ZZI <- solve(ZZ)
theta.ols <- ZZI %*% t(Z) %*% y
resid_m0 <- y - Z %*% theta.ols
n <- length(y)
k_m0 <- ncol(Z)
v_m0 <- n - k_m0
s2_m0 <- sum(resid_m0^2)/v_m0

# # # # # # # # Model 1 # # # # # # # 

W <- cbind(1,x1^(2/3))
WW <- crossprod(W,W)
WWI <- solve(WW)
phi.ols <- WWI %*% t(W) %*% y
resid_m1 <- y - W %*% phi.ols
n <- length(y)
k_m1 <- ncol(W)
v_m1 <- n - k_m1
s2_m1 <- sum(resid_m1^2)/v_m1

# # # # define gibbs sampler function # # # #

gibbs <- function(nrDraws,m,v,s2,X) {

h_sample <- rgamma(nrDraws, v/2, v*s2/2) 
beta_sample <- matrix(0, nrow = nrDraws, ncol = ncol(X))
XX <- crossprod(X,X)
XXI <- solve(XX)

for(i in 1:nrDraws) {
beta_sample[i,] <- rmvnorm(1,m,(1/h_sample[i]) * XXI)
}
sample <- cbind(beta_sample,h_sample)
return(sample)
}

# # # get sample for model 0 using gibbs sampler # # #
nrDraws <- 10000
m0_sample <- gibbs(nrDraws,theta.ols,v_m0,s2_m0,Z)

# # # get sample for model 1 using gibbs sampler # # #
nrDraws <- 10000
m1_sample <- gibbs(nrDraws,phi.ols,v_m1,s2_m1,W)

# # # # define posterior density function # # # #

f <- function(beta,h,m,v,s2,X) {
XXI <- solve(crossprod(X,X))
x <- dmvnorm(beta,m,1/sqrt(h) * XXI) * dgamma(h, v/2, v*s2/2)
out <- prod(x)
return(out)
}

# # # # monte carlo integration of p(y|m_0) # # # #

post_m0 <- rep(NA,nrow(m0_sample))

for(i in 1:nrow(m0_sample)) {
post_m0[i] <-f(m0_sample[i,1:2],m0_sample[i,3],theta.ols,v_m0,s2_m0,Z)

}

int_m0 <- sum(post_m0) / nrow(m0_sample)

# # # # monte carlo integration of p(y|m_1) # # # #

post_m1 <- rep(NA,nrow(m1_sample))

for(i in 1:nrow(m1_sample)) {
post_m1[i] <-f(m1_sample[i,1:2],m1_sample[i,3],phi.ols,v_m1,s2_m0,W)

}

int_m1 <- sum(post_m1) / nrow(m1_sample)


# # # calculate posterior model probabilities # # #

priorprob_m0 <- 1/2
priorprob_m1 <- 1/2
py <- priorprob_m0 * int_m0 + priorprob_m1 * int_m1

postprob_m0 <- int_m0 * priorprob_m0 / py
postprob_m1 <- int_m1 * priorprob_m1 / py


# # #  calculate Bayes Factor/Posterior Odds  # # #

BF <- postprob_m1/postprob_m0