Simulating time series data.

In the previous document you simulated data according to an AR(1) model. Now, you will simulate data for a vector autoregressive model of order 1, a VAR(1) model. This is an autoregressive model for multiple dependent variables that are regressed on themselves (autoregression) and on each other (cross-regression).

1 Read up on vector autoregressive models. For example, start with sections 1.2 and 3.1 (up to “The model as defined…” on page 69) in my dissertation.

Study equations 3.2 and 3.3. Because there are multiple dependent variables, the equations are now specified in terms of vectors and matrices.

Now go to the section ‘Example’ on https://en.wikipedia.org/wiki/Vector_autoregression. You will see two sets of equations there: one in terms of matrices, and one consisting of two equations that are not in terms of matrices (each regression equation has a single dependent variable).

Use matrix algebra (do this by hand, with pen and paper) to get from the matrix specification at the top, to the two equations at the bottom. For this you will need matrix addition “Entrywise sum”: https://en.wikipedia.org/wiki/Matrix_addition, and matrix multiplication “matrix product”: https://www.mathsisfun.com/algebra/matrix-multiplying.html.
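To check your pen-and-paper work: using the notation of the Wikipedia example (constants c, coefficients a, errors e), multiplying out the matrix product and adding the vectors entrywise gives

y1,t = c1 + a1,1 * y1,t-1 + a1,2 * y2,t-1 + e1,t
y2,t = c2 + a2,1 * y1,t-1 + a2,2 * y2,t-1 + e2,t

so each row of the matrix equation is one ordinary regression equation.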

2 Choose two suitable variables (in theory, normally distributed ones) that will serve as your two dependent variables, and think about how frequently you would measure both of them. Choose the number of repeated measurements you want to generate for both variables and store this number in object `T`.

T <- 100

3 Choose values for the parameters in the vector autoregressive model you will use to generate the data. Think of the variables you picked to come up with reasonable parameter values.

Make four objects:

  • one vector called int, containing the two intercepts of your variables

  • one 2 by 2 matrix called Phi, containing the two autoregression coefficients on the diagonal, and the two cross-lagged coefficients on the off-diagonal.

  • one 2 by 2 matrix called Sigma, containing the variances for the residuals of your data on the diagonal, and the covariance between the residuals on the off-diagonal. Tip: use function cov2cor() on this matrix to see what the correlation matrix of the residuals is.

  • one vector called mures, containing two elements that are both equal to zero (the means for your residuals).

int <- c(0,0) ## intercepts
Phi <- matrix(c(.6,.3,0,.5),2,2,byrow=TRUE) ## autoregression on the diagonal, cross-lagged effects off-diagonal
Sigma <- matrix(c(.3,.12,.12,.3),2,2,byrow=TRUE) ## residual variances on the diagonal, residual covariance off-diagonal
cov2cor(Sigma)
##      [,1] [,2]
## [1,]  1.0  0.4
## [2,]  0.4  1.0
mures <- c(0,0) ## residual means

Call all the objects you have made to see what they look like.

4 Generate T residuals from a bivariate normal distribution. Use object mures for the means and object Sigma for the covariance matrix. Store the residuals in object `resids`.

For this, install package ‘MASS’, load the package, and then use the package’s function mvrnorm() to generate data from the bivariate normal distribution.

#install.packages("MASS") # install the package
library("MASS") ##load the package
?mvrnorm ## search for help on function mvrnorm
resids <- mvrnorm(T, mures, Sigma)

Call object resids to see what it looks like.
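An optional quick check: the sample moments of the generated residuals should be close to the values you chose, though not identical because of sampling variability.

colMeans(resids) ## sample means, should be near mures
cov(resids) ## sample covariance matrix, should be near Sigma
cor(resids) ## sample correlations, compare with cov2cor(Sigma)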

5 Make a matrix called y, with T rows and 2 columns, with each element equal to `NA`.

y <- matrix(NA,T,2)

Call y to see what it looks like.

6 Generate the first observations for the two dependent variables.

The best way would be to generate the first observations from the means and covariance matrix of the VAR(1) process, as we did in the AR(1) simulations. However, this is a bit more complicated now, so I will help you with it later (an optional sketch follows the code below). For now, set the first observation for both variables to zero.

y[1,1:2]<-c(0,0)

Call y to see what it looks like.
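For the curious, here is a sketch of the approach mentioned above; it is optional, and the exercises below assume you keep the zero starting values. For a stationary VAR(1) process, the means are (I - Phi)^-1 int, and the covariance matrix Sigmay satisfies vec(Sigmay) = (I - Phi ⊗ Phi)^-1 vec(Sigma), where ⊗ is the Kronecker product.

I2 <- diag(2) ## 2 by 2 identity matrix
muy <- c(solve(I2 - Phi) %*% int) ## stationary means
Sigmay <- matrix(solve(diag(4) - kronecker(Phi, Phi)) %*% c(Sigma), 2, 2) ## stationary covariance matrix
# y[1,] <- mvrnorm(1, muy, Sigmay) ## would replace the zero starting values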

7 Specify the regression equation for the VAR(1) model, using the matrix specification:

  • with y, with repeated measurement t selected, as the dependent variable (as in exercise 2)

  • with y, with the repeated measurement right before t (that is, t-1) selected for the rows, as the predictor variable (as in exercise 3)

  • add the residuals by adding object resids, with repeated measurement t selected (for the rows).

  • don’t forget to match the names of your parameters and the residuals in the equations with those that you used in exercise 4.

  • for matrix summation in R, use +. For matrix multiplication in R, use %*%.

# y[t,]<-int+Phi%*%y[t-1,]+resids[t,]

8 Specify that t is now equal to 2 (t=2). Run the regression equation code you wrote in the previous exercise. Call your dependent variable y to see what it looks like.

Now specify that t=3. Run the regression equation code you wrote in the previous exercise. Call your dependent variable y to see what it looks like. Now do this for t=4.

t=2
y[t,]<-int+Phi%*%y[t-1,]+resids[t,]

t=3
y[t,]<-int+Phi%*%y[t-1,]+resids[t,]

t=4
y[t,]<-int+Phi%*%y[t-1,]+resids[t,]

If you keep doing this until you have reached t=T, then you have simulated your own VAR(1) data. However, this is of course quite tedious. That is why we are going to make R do this for you, by specifying a ‘for loop’.

9 Specify a for loop to simulate your data

Specify a for loop around your regression equation that goes from t=2 until t=T. Simulate your data by running this for loop with the regression equation inside.

for(t in 2:T){
  y[t,]<-int+Phi%*%y[t-1,]+resids[t,]
}
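A quick way to check that the loop filled the whole matrix:

anyNA(y) ## should be FALSE: the loop filled rows 2 to T, and row 1 was set by hand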

10 Plot your data y, for example using functions ts.plot() and hist(). Calculate some descriptive statistics for your data (a few suggestions follow the histograms below).

ts.plot(y)

hist(y[,1])

hist(y[,2])
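Some options for descriptive statistics (all base R functions):

colMeans(y) ## means of the two variables
apply(y, 2, sd) ## standard deviations
cor(y) ## correlation between the two simulated variables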

11 Make a lagged version of y[,1] called y1tmin1, and of y[,2] called y2tmin1

That is, make a new matrix for each column of y, with 1 column and T rows, and set all its elements to NA. Then make it such that y1tmin1 has NA as its first element, the first element of y[,1] as its second element, and so on. Do the same thing again, but then for variable y[,2] and y2tmin1.

Put y and the two lagged variables in a data frame called `dataset`, with y in the first two columns and y1tmin1 and y2tmin1 in the third and fourth columns.

y1tmin1 <- matrix(NA,T,1)
y1tmin1[2:T] <- y[1:(T-1),1]
y2tmin1 <- matrix(NA,T,1)
y2tmin1[2:T] <- y[1:(T-1),2]


dataset <- data.frame(y, y1tmin1, y2tmin1)

Call object dataset to see what it looks like.
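With T rows this is a long printout; head() shows just the first six rows. Note the NA in the lagged columns of the first row.

head(dataset)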

12 Perform two linear multiple regression analyses using function lm(): one with y[,1] as the dependent variable and y1tmin1 and y2tmin1 as the predictor variables, and one with y[,2] as the dependent variable. Put the regression analyses in objects `myregression1` and `myregression2`.

Inspect function lm() - lm stands for linear model - to see how the function works. Look at the examples provided in the help file. Perform the regression analysis.

myregression1<-lm(X1 ~ 1 + y1tmin1 + y2tmin1, data=dataset) ## the 1 you add to the regression equation is used to add an intercept to the model; X1 is the default name that data.frame() gave to the first column of y

myregression2<-lm(X2 ~ 1 + y1tmin1 + y2tmin1, data=dataset) ## the same model, now with the second variable (X2) as the dependent variable

Call myregression1 and myregression2 to see the results. Use summary() to get more detailed results. Interpret the results.

myregression1
## 
## Call:
## lm(formula = X1 ~ 1 + y1tmin1 + y2tmin1, data = dataset)
## 
## Coefficients:
## (Intercept)      y1tmin1      y2tmin1  
##     0.03243      0.61506      0.32461
summary(myregression1)
## 
## Call:
## lm(formula = X1 ~ 1 + y1tmin1 + y2tmin1, data = dataset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.70565 -0.37913  0.05457  0.43911  1.77142 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.03243    0.06041   0.537  0.59262    
## y1tmin1      0.61506    0.08323   7.390 5.42e-11 ***
## y2tmin1      0.32461    0.11091   2.927  0.00427 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5997 on 96 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.6153, Adjusted R-squared:  0.6073 
## F-statistic: 76.77 on 2 and 96 DF,  p-value: < 2.2e-16
myregression2
## 
## Call:
## lm(formula = X2 ~ 1 + y1tmin1 + y2tmin1, data = dataset)
## 
## Coefficients:
## (Intercept)      y1tmin1      y2tmin1  
##    -0.01188      0.06098      0.50155
summary(myregression2)
## 
## Call:
## lm(formula = X2 ~ 1 + y1tmin1 + y2tmin1, data = dataset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2770 -0.3656 -0.0022  0.4467  1.1682 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.01188    0.06007  -0.198    0.844    
## y1tmin1      0.06098    0.08276   0.737    0.463    
## y2tmin1      0.50155    0.11028   4.548 1.58e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5963 on 96 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:   0.31,  Adjusted R-squared:  0.2956 
## F-statistic: 21.56 on 2 and 96 DF,  p-value: 1.843e-08

13 Now, fit the model again, but with package vars.

  • install package vars

  • load package vars

  • use function VAR() on your matrix y (this fits the same models as in exercise 12; by default, VAR() fits a VAR(1) model)

#install.packages('vars')
library('vars')
## Loading required package: strucchange
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: urca
## Loading required package: lmtest
myvar <- VAR(y)
## Warning in VAR(y): No column names supplied in y, using: y1, y2 , instead.
myvar
## 
## VAR Estimation Results:
## ======================= 
## 
## Estimated coefficients for equation y1: 
## ======================================= 
## Call:
## y1 = y1.l1 + y2.l1 + const 
## 
##      y1.l1      y2.l1      const 
## 0.61506447 0.32461476 0.03243003 
## 
## 
## Estimated coefficients for equation y2: 
## ======================================= 
## Call:
## y2 = y1.l1 + y2.l1 + const 
## 
##       y1.l1       y2.l1       const 
##  0.06098184  0.50155014 -0.01188301
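
For more detailed output, including standard errors and the estimated residual covariance matrix (compare these with the lm() results from exercise 12 and with your Sigma), use summary():

summary(myvar)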

14 Which information is ignored when you estimate two separate models, with two separate regression equations, as in exercises 12 and 13?