Simulating time series data.

1 Generate 100 random samples from a normal distribution, and store them in object testdata.

We will use this data to test some things out before we generate actual time series data.

testdata <- rnorm(100,0,1)

2 Make an object called `t’ (short for time point) and assign it the value 50. Then use the object to select the 50th (the t-th) element from object testdata.

t<-50
testdata[t]
## [1] -0.09289775

3 Now use object t to select the element immediately before t (so the 49th element) from object testdata.

testdata[t-1]
## [1] 0.7031973

4 Choose a suitable (in theory normally distributed) variable that will serve as your dependent variable, and think about how frequently you will measure this variable. Choose the number of repeated measurements you want to generate and store this number in object `T’.

T <- 100

5 Choose values for the parameters in the autoregressive model you will use to generate the data. Think of the variable you picked to come up with reasonable parameter values.

Make four objects:

  • one called mu, containing the mean of your data

  • one called phi, containing the autoregression coefficient

  • one called sigma2, containing the variance for the residuals of your data

  • one called int, containing the intercept of your data - do this by placing the following formula in object int: mu*(1-phi). This is the formula with which you can calculate the intercept based on your mean and autoregression coefficient.

mu<-3
phi <- .2
sigma2 <- 1
int <- mu*(1-phi)

6 Generate T residuals (that is, as many residuals as you have time points) from a normal distribution with mean 0 and variance sigma2. Store the residuals in object `resids’.

Use rnorm() to make an object that contains normally distributed residuals with a mean equal to zero, and a variance equal to the residual variance you chose in the previous exercise.

resids <- rnorm(T,0, sqrt(sigma2))

7 Make a matrix called y, with T rows and 1 column, with each element equal to `NA’.

y=matrix(NA,T,1)

8 Generate the first observation for y.

Do this by sampling from a normal distribution with a mean equal to mu, and a variance equal to sigma2/(1-phi^2). Use function rnorm() for this - and don’t forget that rnorm() uses the standard deviation instead of the variance. Save this value to the first element of matrix y.

mu is the mean of an autoregressive process, and sigma2/(1-phi^2) is the variance of an autoregressive process (as you can see, this variance is a function of the variance of the residuals and of the autoregression coefficient).

y[1]<-rnorm(1,mu,sqrt(sigma2/(1-phi^2)))

Call your dependent variable to see what it looks like.
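As an aside, you can check the formula for the variance empirically: for a very long AR(1) series, the sample mean should be close to mu and the sample variance close to sigma2/(1-phi^2). A quick sketch (not part of the exercise; the for loop used here is explained later in this practical):

```r
# Empirical check: simulate a long AR(1) series and compare the sample
# mean and variance to the theoretical values mu and sigma2/(1-phi^2).
set.seed(1)                # fix the random numbers so the check is reproducible
mu <- 3; phi <- .2; sigma2 <- 1
int <- mu * (1 - phi)
n <- 1e5
e <- rnorm(n, 0, sqrt(sigma2))
x <- numeric(n)
x[1] <- rnorm(1, mu, sqrt(sigma2 / (1 - phi^2)))
for (t in 2:n) x[t] <- int + phi * x[t - 1] + e[t]
mean(x)                    # should be close to mu = 3
var(x)                     # should be close to 1/(1 - .2^2), about 1.04
```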

9 Specify the regression equation for the AR(1) model, just like a regular regression model (see practical 1, exercise 5), but then:

  • with y, with element t selected, as the dependent variable (as in exercise 2)

  • with y with the element right before t selected as the predictor variable (as in exercise 3)

  • add the residuals by adding object resids, with element t selected.

  • don’t forget to match the names of your parameters and the residuals in the equations with those that you used in exercise 4.

  • Running this code at this point will (if all went well) do nothing interesting yet: t is still equal to 50 from exercise 2, and y[49] is still NA.

y[t]<-int+phi*y[t-1]+resids[t]

10 Specify that t is now equal to 2 (t=2). Run the regression equation code you wrote in the previous exercise. Call your dependent variable y to see what it looks like.

Now specify that t=3. Run the regression equation code you wrote in the previous exercise. Call your dependent variable y to see what it looks like. Now do this for t=4.

t=2
y[t]<-int+phi*y[t-1]+resids[t]

t=3
y[t]<-int+phi*y[t-1]+resids[t]

t=4
y[t]<-int+phi*y[t-1]+resids[t]

If you keep doing this until you have reached t=T, then you have simulated your own AR(1) data. However, this is of course quite tedious. That is why we are going to make R do this for you, by specifying a ‘for loop’.

11 Study for loops in R online.
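If you want a minimal example of the syntax before applying it: the loop variable takes each value in the sequence in turn, and the body between the braces is run once per value.

```r
# A minimal for loop: i takes each value in 1:5 in turn,
# and the body between the braces is executed once for every value.
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2
}
squares
## [1]  1  4  9 16 25
```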

Specify a for loop around your regression equation that goes from t=2 until t=T. Simulate your data by running this for loop with the regression equation inside.

for(t in 2:T){
 y[t]<-int+phi*y[t-1]+resids[t]
}

mean(y)
## [1] 3.058476

12 Plot your data y using functions ts.plot() and hist(). Calculate some descriptive statistics for your data y.

Inspect the functions if you want to see the right arguments, and what options you can change.

ts.plot(y)

hist(y)
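For the descriptive statistics, any of the usual summary functions will do. A sketch (the simulation from the previous exercises is repeated here so the block runs on its own; your numbers will differ because the data are random):

```r
# Descriptive statistics for an AR(1) series; the simulation steps from the
# previous exercises are repeated so this block is self-contained.
set.seed(42)
T <- 100; mu <- 3; phi <- .2; sigma2 <- 1
int <- mu * (1 - phi)
resids <- rnorm(T, 0, sqrt(sigma2))
y <- matrix(NA, T, 1)
y[1] <- rnorm(1, mu, sqrt(sigma2 / (1 - phi^2)))
for (t in 2:T) y[t] <- int + phi * y[t - 1] + resids[t]

mean(y)               # should be in the neighborhood of mu
sd(y)                 # sample standard deviation
summary(as.vector(y)) # min, quartiles, median, mean, max
```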

13 Make a lagged version of y called ytmin1.

That is, make a new matrix ytmin1 with 1 column and T rows, and set all its elements to NA. Now, make it such that ytmin1 has NA as the first element, the first element of y as the second element, and so on.

Put y and ytmin1 in a dataframe called `dataset’ with y in the first column and ytmin1 in the second column.

ytmin1 <- matrix(NA, T, 1)
ytmin1[2:T] <- y[1:(T-1)]

dataset <- data.frame(y, ytmin1)
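To see what the lagging does, here is the same trick on a tiny made-up vector: each row pairs a value with the value right before it, and the first row has no predecessor, so it gets NA.

```r
# Lagging on a small made-up example: xlag contains the previous value of x,
# so row t pairs x[t] with x[t-1]; the first element of xlag is NA.
x <- c(10, 20, 30, 40)
xlag <- c(NA, x[1:(length(x) - 1)])
data.frame(x, xlag)
##    x xlag
## 1 10   NA
## 2 20   10
## 3 30   20
## 4 40   30
```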

14 Perform a linear regression with y as the dependent variable and ytmin1 as the predictor variable using function lm(). Put the regression analysis in object ‘myregression’.

Inspect function lm() - lm stands for linear model - to see how the function works. Look at the examples provided in the help file. Perform the regression analysis.

myregression<-lm(y ~ 1 + ytmin1, data=dataset) ## the 1 you add to the regression equation is used to add an intercept to the model.

Call myregression to see the results. Use summary() to get more detailed results. Interpret the results. Note that the intercept and slope estimates should be close to the values of int and phi that you used to generate the data.

myregression
## 
## Call:
## lm(formula = y ~ 1 + ytmin1, data = dataset)
## 
## Coefficients:
## (Intercept)       ytmin1  
##      2.3217       0.2418
summary(myregression)
## 
## Call:
## lm(formula = y ~ 1 + ytmin1, data = dataset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1661 -0.5962  0.1491  0.6064  2.1128 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.32166    0.31979   7.260 9.67e-11 ***
## ytmin1       0.24179    0.09859   2.453    0.016 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.044 on 97 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.05839,    Adjusted R-squared:  0.04868 
## F-statistic: 6.015 on 1 and 97 DF,  p-value: 0.01597

15 Now, just for kicks, use function ar() to fit the model again.

ar(y)
## 
## Call:
## ar(x = y)
## 
## Coefficients:
##      1  
## 0.2412  
## 
## Order selected 1  sigma^2 estimated as  1.081

16 Play around with different parameter values, plot your data, and fit the model to see what happens.
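One convenient way to experiment is to wrap the simulation steps in a function, so that a new dataset is one call away. A sketch (the function name simulate_ar1 is made up here, not a built-in):

```r
# Wrap the simulation in a helper function so different parameter values
# are easy to try. simulate_ar1 is a made-up name, not a built-in.
simulate_ar1 <- function(T, mu, phi, sigma2) {
  int <- mu * (1 - phi)
  resids <- rnorm(T, 0, sqrt(sigma2))
  y <- numeric(T)
  y[1] <- rnorm(1, mu, sqrt(sigma2 / (1 - phi^2)))
  for (t in 2:T) y[t] <- int + phi * y[t - 1] + resids[t]
  y
}

y_strong <- simulate_ar1(100, 3, .8, 1)  # strong positive autocorrelation
y_neg <- simulate_ar1(100, 3, -.5, 1)    # negative autocorrelation: zigzag pattern
ts.plot(y_strong)
ar(y_strong)                             # the estimated coefficient should be roughly .8
```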