It is of key importance that your study is reproducible. Therefore, be sure to save all relevant data/results and all of your code.
You usually set a seed at the top of your simulation study R file (see http://stackoverflow.com/questions/13605271/reasons-for-using-the-set-seed-function#13605506 for reasons to use set.seed()). A good way to choose a seed number is to base it on the current date and time (e.g., 1705300417 for 17:05 on 30-04-2017; note that set.seed() only accepts numbers that fit in a 32-bit integer, so a full date-and-time number like 300420171505 is too large) - this way you are sure you won’t use the same seed for multiple studies.
If you choose to set a seed, be sure to check that each data set you generate is different (e.g., that you don’t use set.seed() in such a way that you end up with 1000 identical data sets).
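As a minimal sketch of correct seed usage (the seed value itself is just an example):

```{r}
# Set the seed once, at the top of the script; every run of the script
# will then reproduce the exact same sequence of generated data sets.
set.seed(1705300417)

# Do NOT call set.seed() with the same value inside the loop that
# generates the data sets - that would make all 1000 data sets identical.
```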
Regardless of whether you use a seed, I still want you to save your data sets for now.
You can use the function save() to save a data set as an object in an .Rdata file, or write.table() to save the data in a .txt file.
You can use load() or read.table() for this, depending on how you saved your file in the previous exercise.
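A minimal sketch of both options (the object and file names here are made up for illustration):

```{r}
# A made-up example data set.
dat <- data.frame(y = rnorm(100), x = rnorm(100))

# Option 1: save as an R object and load it back.
save(dat, file = "dataset1.Rdata")
load("dataset1.Rdata")     # restores the object 'dat' into the workspace

# Option 2: save as a plain text file and read it back.
write.table(dat, file = "dataset1.txt", row.names = FALSE)
dat <- read.table("dataset1.txt", header = TRUE)
```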
For each of the matrices, specify the following column names with colnames(): “est”, “sd”, “tvalue”, “pvalue”, “95cilow”, “95cihigh”.
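For example (resmat is a hypothetical name for one of these matrices):

```{r}
# One results matrix per parameter: 1 row, 6 columns, filled with NA.
resmat <- matrix(NA, nrow = 1, ncol = 6)
colnames(resmat) <- c("est", "sd", "tvalue", "pvalue", "95cilow", "95cihigh")
```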
Use str() on summan to inspect the structure of this object. In the output from str(summan) you see a number of dollar signs, each followed by a word and then a colon with some explanation. You can use summan$word to extract the corresponding content from the object summan. Try, for example, summan$call or summan$residuals.
Use this technique (find the right word in the output from str(summan)) to get the estimate, standard error, t-value, and p-value for each coefficient (except for the residual variance). Put the estimates, standard errors, t-values, and p-values in the appropriate columns of the matrices you made in exercise 5.
Use this technique (find the right word in the output from str(summan)) to get the estimate for the residual standard deviation. Put this estimate in the appropriate column of the matrix you made for the residual standard deviation.
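As an illustration, assuming your analysis was done with lm() - an assumption on my part; the component names may differ for other model functions, so check str(summan) for your own model:

```{r}
# Hypothetical AR(1)-style example: regress y on its own lagged value.
y      <- arima.sim(model = list(ar = 0.4), n = 100)
dat    <- data.frame(y = y[-1], ylag = y[-100])
an     <- lm(y ~ ylag, data = dat)
summan <- summary(an)

# For lm(), summan$coefficients is a matrix with one row per coefficient
# and columns "Estimate", "Std. Error", "t value", and "Pr(>|t|)".
est    <- summan$coefficients[, "Estimate"]
se     <- summan$coefficients[, "Std. Error"]
tvalue <- summan$coefficients[, "t value"]
pvalue <- summan$coefficients[, "Pr(>|t|)"]

# The estimated residual standard deviation:
ressd  <- summan$sigma
```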
Put the lower and upper bounds you found in the appropriate place (columns 95cilow and 95cihigh) of the matrices you made in exercise 5. Now you should have, for each parameter, a fully filled matrix with various kinds of results (except for the residual standard deviation, for which a number of entries will remain empty - that is OK).
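Continuing the lm()-based sketch above, confint() is one way to get these bounds (again an assumption about how you fitted the model):

```{r}
# confint() returns a matrix with one row per coefficient and
# columns "2.5 %" and "97.5 %".
ci <- confint(an, level = 0.95)
resmat[1, "95cilow"]  <- ci["ylag", "2.5 %"]
resmat[1, "95cihigh"] <- ci["ylag", "97.5 %"]
```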
If you get stuck, inspect the help files of the relevant functions using ?functionname, and use Google to find solutions to your problems. There are many resources on the internet that will help you get unstuck!
You will make 1000 data sets for each condition in your simulation study. Make an object ndatasets and assign it the value 1000.
Make several matrices, as in exercise 5, filled with NAs, but now with 1000 rows instead of 1 row.
After this, specify a for loop that iterates over repnum, from 1 to ndatasets (so repnum is 1 in loop 1, 2 in loop 2, etc.).
Inside the for loop, generate one data set and save it. Make sure that the file in which you save the data set has a different name for each loop number. You can use the function paste() to generate a file name that includes repnum; since repnum changes in each loop, the file name will differ in each loop. (A sketch combining these steps follows below.)
Inside the for loop, analyse the data set and save the analysis. Again, make sure that the file in which you save the analysis has a different name for each loop number.
Inside the for loop, take the results for the coefficients and put them in the matrices you made: the results from loop 1 go on row 1, from loop 2 on row 2, and so on.
After the for loop, save all of the matrices you made with the results for the coefficients in an .Rdata file.
Be sure to check carefully that each step works and that everything is saved correctly.
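To put the above steps together, here is a hedged sketch of what the full loop could look like; the data-generating model (an AR(1) with coefficient 0.4), the sample size, and all file names are assumptions you should adapt to your own design:

```{r}
ndatasets <- 1000
n <- 100                    # assumed sample size per data set

# One results matrix per parameter; shown here only for the AR coefficient.
armat <- matrix(NA, nrow = ndatasets, ncol = 6)
colnames(armat) <- c("est", "sd", "tvalue", "pvalue", "95cilow", "95cihigh")

set.seed(1705300417)        # once, at the top - not inside the loop

for (repnum in 1:ndatasets) {
  # Generate one data set (AR(1) with coefficient 0.4, as an example)
  # and save it under a file name that changes with repnum.
  y   <- arima.sim(model = list(ar = 0.4), n = n)
  dat <- data.frame(y = y[-1], ylag = y[-n])
  save(dat, file = paste("dataset", repnum, ".Rdata", sep = ""))

  # Analyse the data set and save the analysis.
  an <- lm(y ~ ylag, data = dat)
  save(an, file = paste("analysis", repnum, ".Rdata", sep = ""))

  # Put the results for this replication on row repnum.
  summan <- summary(an)
  armat[repnum, 1:4] <- summan$coefficients["ylag", ]
  armat[repnum, 5:6] <- confint(an)["ylag", ]
}

# After the loop: save the results matrix.
save(armat, file = "arresults.Rdata")
```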
Calculate the bias for the estimates of each parameter. The bias is defined as the average difference between the estimated parameter and the true value of the parameter. If all is well, the bias should be very close to zero. (A sketch computing this and the measures below follows after these steps.)
Calculate the average standard error for each parameter across the 1000 datasets.
Calculate the standard deviation of the estimates of each parameter across the 1000 datasets.
For each parameter, calculate the ratio of the average standard error (across the 1000 datasets) to the standard deviation of the estimates (across the 1000 datasets). This value should be close to 1.
Calculate the coverage rate for each parameter. The coverage rate is the proportion of the 1000 datasets for which the true value of the parameter fell between the lower and upper bounds of the confidence interval. For a 95% confidence interval, this proportion should be close to .95.
Calculate the empirical rejection rate for each parameter. This is the proportion of the 1000 datasets for which the null hypothesis that the parameter is equal to zero is rejected. You will use this to estimate the power (or the Type I error rate, if the true value of the parameter in question was exactly zero).
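Continuing the sketch above (where the true AR coefficient was 0.4 and armat holds the results - both assumptions), these measures could be computed as follows:

```{r}
truevalue <- 0.4   # the value used to generate the data (an assumption here)

# Bias: average estimate minus the true value; should be near zero.
bias <- mean(armat[, "est"]) - truevalue

# Average standard error and standard deviation of the estimates.
avgse <- mean(armat[, "sd"])
sdest <- sd(armat[, "est"])

# Their ratio; should be close to 1.
seratio <- avgse / sdest

# Coverage rate: proportion of intervals containing the true value.
coverage <- mean(armat[, "95cilow"] <= truevalue &
                 armat[, "95cihigh"] >= truevalue)

# Empirical rejection rate at alpha = .05: power, or the Type I
# error rate if the true value is zero.
rejrate <- mean(armat[, "pvalue"] < .05)
```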
I want you to at least check what happens to the power if you use one (AR(1)), two, or more (3? 5? 10? 100? - you decide) variables (VAR(1)) in your analysis. It would also make a lot of sense to see what happens to the power when you change the sample size, the size of the regression coefficients, or the residual variance. You could also consider looking at what happens for larger lag orders (AR(2), AR(3)). However, do try to keep the design relatively simple, so that you end up with a doable number of conditions to report on (let’s say no more than 30 conditions in total!). Choose reasonable parameter values for the conditions (you can base these on what people have found in the literature, on what I used in my papers, etc.).
You can also already run a number of conditions to see what happens. When I’m back we will discuss your design and make adjustments where necessary.