Make an object called testdata, containing 100 draws from a standard normal distribution. We will use this data to test some things out before we generate actual time series data.
testdata <- rnorm(100,0,1)
t <- 50
testdata[t]
## [1] -0.09289775
testdata[t-1]
## [1] 0.7031973
T <- 100 # the number of time points to simulate
Make four objects:
- one called mu, containing the mean of your data
- one called phi, containing the autoregression coefficient
- one called sigma2, containing the variance for the residuals of your data
- one called int, containing the intercept of your data. Do this by placing the following formula in object int: mu*(1-phi). This is the formula with which you can calculate the intercept from your mean and autoregression coefficient: taking the expectation of both sides of the AR(1) equation y[t] = int + phi*y[t-1] + residual gives mu = int + phi*mu, which rearranges to int = mu*(1-phi).
mu <- 3
phi <- .2
sigma2 <- 1
int <- mu*(1-phi)
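With mu = 3 and phi = .2, the intercept works out to 3*(1-.2) = 2.4, which you can confirm by calling int:
int
## [1] 2.4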
Use rnorm() to make an object that contains normally distributed residuals with a mean equal to zero, and a variance equal to the residual variance you chose in the previous exercise.
resids <- rnorm(T,0, sqrt(sigma2))
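If you like, check that these residuals behave as intended: the sample mean should be close to 0 and the sample variance close to sigma2 (they will not match exactly, since this is a random sample of only 100 draws):
mean(resids)
var(resids)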
y <- matrix(NA, T, 1) # empty matrix with T rows that will hold the simulated time series
Generate a starting value for your time series. Do this by sampling from a normal distribution with a mean equal to mu, and a variance equal to sigma2/(1-phi^2). Use the function rnorm() for this - and don't forget that rnorm() uses the standard deviation instead of the variance. Save this value to the first element of matrix y.
mu is equal to the mean of an autoregressive process, and the equation for the variance here is actually the equation for the variance of an autoregressive process (as you can see, this variance is a function of the variance of the residuals and of the autoregression coefficient). By sampling the first value this way, y[1] already has the mean and variance of the process, so the series does not need time to settle in.
y[1] <- rnorm(1, mu, sqrt(sigma2/(1-phi^2)))
Call your dependent variable to see what it looks like.
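Only the first element of y is filled in at this point; the remaining T-1 elements are still NA:
y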
Write the regression equation of an AR(1) model in R code:
- with y, with element t selected, as the dependent variable (as in exercise 2)
- with y, with the element right before t selected, as the predictor variable (as in exercise 3)
- add the residuals by adding object resids, with element t selected
- don't forget to match the names of your parameters and the residuals in the equation with those that you used in exercise 4
Running this code at this point will (if all went well) do nothing interesting: t is still set to 50 from the earlier exercise, and y[49] is still NA, so y[50] simply becomes NA as well.
y[t] <- int + phi*y[t-1] + resids[t]
Now specify that t=2 and run the regression equation code you wrote in the previous exercise. Call your dependent variable y to see what it looks like. Then do the same for t=3 and t=4.
t <- 2
y[t] <- int + phi*y[t-1] + resids[t]
t <- 3
y[t] <- int + phi*y[t-1] + resids[t]
t <- 4
y[t] <- int + phi*y[t-1] + resids[t]
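Calling y now shows the first four elements filled in, while elements 5 through T are still NA:
y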
If you keep doing this until you have reached t=T, then you have simulated your own AR(1) data. However, this is of course quite tedious. That is why we are going to make R do this for you, by specifying a ‘for loop’.
Here are some links:
https://www.r-bloggers.com/how-to-write-the-first-for-loop-in-r/
https://www.datacamp.com/community/tutorials/tutorial-on-loops-in-r#gs.eboxw6E
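To get a feel for the syntax first, here is a minimal for loop that simply prints its index on every iteration:
for(i in 1:3){
  print(i)
}
## [1] 1
## [1] 2
## [1] 3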
Specify a for loop around your regression equation that goes from t=2 until t=T. Simulate your data by running this for loop with the regression equation inside.
for(t in 2:T){
  y[t] <- int + phi*y[t-1] + resids[t]
}
mean(y)
## [1] 3.058476
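The sample mean lies close to mu = 3, as it should. You can run a similar check on the variance: the theoretical variance of this process is sigma2/(1-phi^2) = 1/(1-.04), roughly 1.04, and the sample variance of your simulated series should land near that value:
var(y)
sigma2/(1-phi^2)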
Plot your simulated data with ts.plot() and make a histogram of it with hist(). Inspect the help files of these functions if you want to see the right arguments, and what options you can change.
ts.plot(y)
hist(y)
Make a lagged version of y to serve as the predictor variable. That is, make a new matrix ytmin1 with 1 column and T rows, and set all its elements to NA. Then fill it such that ytmin1 has NA as the first element, the first element of y as the second element, and so on.
Put y and ytmin1 in a dataframe called dataset, with y in the first column and ytmin1 in the second column.
ytmin1 <- matrix(NA, T, 1)
ytmin1[2:T] <- y[1:(T-1)] # note the parentheses: 1:T-1 evaluates as (1:T)-1 in R, not 1:(T-1)
dataset <- data.frame(y, ytmin1)
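As a quick check, the first rows of dataset should show each value of ytmin1 equal to the value of y one row earlier:
head(dataset)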
Inspect function lm() - lm stands for linear model - to see how the function works. Look at the examples provided in the help file. Perform the regression analysis.
myregression<-lm(y ~ 1 + ytmin1, data=dataset) ## the 1 you add to the regression equation is used to add an intercept to the model.
Call myregression to see the results. Use summary() to get more detailed results. Interpret the results: the estimated intercept and the coefficient for ytmin1 are estimates of int and phi from your simulation.
myregression
##
## Call:
## lm(formula = y ~ 1 + ytmin1, data = dataset)
##
## Coefficients:
## (Intercept) ytmin1
## 2.3217 0.2418
summary(myregression)
##
## Call:
## lm(formula = y ~ 1 + ytmin1, data = dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1661 -0.5962 0.1491 0.6064 2.1128
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.32166 0.31979 7.260 9.67e-11 ***
## ytmin1 0.24179 0.09859 2.453 0.016 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.044 on 97 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.05839, Adjusted R-squared: 0.04868
## F-statistic: 6.015 on 1 and 97 DF, p-value: 0.01597
ar(y) # compare with the built-in ar() function, which fits an autoregressive model directly and selects the order automatically
##
## Call:
## ar(x = y)
##
## Coefficients:
## 1
## 0.2412
##
## Order selected 1 sigma^2 estimated as 1.081
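Both approaches recover an autoregression coefficient close to the true value of .2 that we used to simulate the data (0.2418 from lm() and 0.2412 from ar()); the remaining discrepancy is sampling error, since we simulated only 100 time points. To put the estimates side by side:
coef(myregression)["ytmin1"] # phi estimate from lm()
ar(y)$ar # phi estimate from ar()
phi # the true value used in the simulation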