We present examples of setting up a simulation function for various models and R packages. These are written in a basic way to serve as blueprints for further customization.
The general layout of a simulation function is:
<- function(N) {
simfun # Generate a data set Test the hypothesis
}
For all examples in this vignette, we use an alpha level of .01 and a desired power of .95.
For further guidance on how to use the package and the
find.design
function specifically, see the Readme.md file
library(mlpwr)
We test for one sample whether the mean differs from 0. The true effect size is Cohen’s d = .3. In this scenario, this means that the true mean difference is .3.
<- function(N) {
simfun_ttest # Generate a data set
<- rnorm(n = N, mean = 0.3)
dat # Test the hypothesis
<- t.test(dat)
res $p.value < 0.01
res }
Example Use
<- find.design(simfun = simfun_ttest, boundaries = c(100,
res 300), power = 0.95)
library(mlpwr)
We test for a mean difference among n.groups
groups.
Each group consists of n
participants.
<- function(n, n.groups) {
simfun_anova
# Generate a data set
<- rnorm(n.groups, sd = 0.2) # generate groupmeans using cohen's f=.2
groupmeans <- sapply(groupmeans, function(x) rnorm(n,
dat mean = x, sd = 1)) # generate data
<- dat |>
dat as.data.frame() |>
gather() # format
# Test the hypothesis
<- aov(value ~ key, data = dat) # perform ANOVA
res summary(res)[[1]][1, 5] < 0.01 # extract significance
}
Example Use
<- find.design(simfun = simfun_anova, costfun = function(n,
res 5 * n + 20 * n.groups, boundaries = list(n = c(10,
n.groups) 150), n.groups = c(5, 30)), power = 0.95)
library(mlpwr)
We consider generalized linear models using the stats
package and the glm
function.
We use an “original” data set and fit a generalized linear model
assuming a poisson distributed criterion counts
.
<- data.frame(counts = c(18, 17, 15, 20,
dat.original 10, 20, 25, 13, 12), treatment = gl(3, 1, 9), outcome = gl(3,
3))
<- glm(counts ~ outcome + treatment, data = dat.original,
mod.original family = poisson)
summary(mod.original)
We use the parameters from the original model in the
simfun
.
<- function(N) {
simfun_glm1
# generate data
<- data.frame(outcome = gl(3, 1, ceiling(N/3)),
dat treatment = gl(3, ceiling(N/3)))[1:N, ]
<- predict(mod.original, newdata = dat, type = "response")
a $counts <- rpois(N, a)
dat
# test hypothesis
<- glm(counts ~ outcome + treatment, data = dat,
mod family = poisson)
summary(mod)$coefficients["treatment2", "Pr(>|z|)"] <
0.01
}
Example Use
<- find.design(simfun = simfun_glm1, boundaries = c(20,
res 100), power = 0.95)
We use a logistic regression and generate the data using hand-specified parameters and the logistic function.
<- function(x) 1/(1 + exp(-x)) logistic
We test if the second predictor is significant.
<- function(N) {
simfun_glm2
# generate data
<- data.frame(pred1 = rnorm(N), pred2 = rnorm(N))
dat <- c(1.2, 0.8) # parameter weights
beta <- logistic(as.matrix(dat) %*% beta) # get probability
prob $criterion <- runif(N) < prob # draw according to probability
dat
# test hypothesis
<- glm(criterion ~ pred1 + pred2, data = dat,
mod family = binomial)
summary(mod)$coefficients["pred2", "Pr(>|z|)"] <
0.01
}
Example Use
<- find.design(simfun = simfun_glm2, boundaries = c(90,
res 200), power = 0.95)
library(mlpwr)
library(mirt)
We use the mirt
package to show an example that applies
an item response theory model.
See ?simdata
for additional options and examples to
generate data with the mirt
package.
We first generate data from a 2PL model. Then we want to check whether the 2PL model actually shows a better fit to the data better than the simpler Rasch model. We use a likelihood ratio test for this purpose.
<- function(N) {
simfun_irt1
# generate data
<- simdata(a = c(1.04, 1.2, 1.19, 0.61, 1.31,
dat 0.83, 1.46, 1.27, 0.51, 0.81), d = c(0.06,
-1.79, -1.15, 0.88, -0.2, -1.87, 1.23, -0.08,
-0.71, 0.6), N = N, itemtype = "2PL")
# test hypothesis
<- mirt(dat) # Fit 2PL Model
mod <- "F = 1-4
constrained CONSTRAIN = (1-4, a1)"
<- mirt(dat, constrained) # Fit 2PL with equal slopes
mod_constrained
<- anova(mod_constrained, mod) # perform model comparison
res $p[2] < 0.01 # extract significance
res }
Example Use
<- find.design(simfun = simfun_irt1, boundaries = c(100,
res 500), power = 0.95, evaluations = 500)
We check for differential item functioning in one item. Again, we generate data using the 2PL model, but using different parameters for two different groups in this case. The parameters for the first item are different in one group compared to other. We again apply an likelihood ratio test to test the null hypothesis that all item parameters are actually equal in both groups.
We optimize for the sizes of the two groups as the two design parameters. We assume that the second group involves higher costs than the first.
<- function(N1, N2) 5 * N1 + 7 * N2 costfun_irt2
<- function(N1, N2) {
simfun_irt2
# generate data
<- a2 <- c(1.04, 1.2, 1.19, 0.61, 1.31, 0.83,
a1 1.46, 1.27, 0.51, 0.81)
<- d2 <- c(0.06, -1.79, -1.15, 0.88, -0.2, -1.87,
d1 1.23, -0.08, -0.71, 0.6)
1] <- a2[1] + 0.3
a2[1] <- d2[1] + 0.5
d2[<- simdata(a = a1, d = d1, N = N1, itemtype = "2PL")
dat1 <- simdata(a = a2, d = d2, N = N2, itemtype = "2PL")
dat2 <- as.data.frame(rbind(dat1, dat2))
dat <- c(rep("1", N1), rep("2", N2))
group
# fit models
<- multipleGroup(dat, 1, group = group)
mod1 <- multipleGroup(dat, 1, group = group, invariance = c("slopes",
mod2 "intercepts"))
# test hypothesis
<- anova(mod2, mod1)
res
# extract significance
$p[2] < 0.01
res }
Example Use
<- find.design(simfun = simfun_irt2, boundaries = list(N1 = c(100,
res 700), N2 = c(100, 700)), costfun = costfun_irt2,
power = 0.95)
library(mlpwr)
library(lme4)
library(lmerTest)
We consider multilevel models using the lme4
and
lmerTest
packages. We use the glmer
function
for model fit and the simulate
function to generate data.
For further options for data generation in multilevel models see
?simulate.merMod
.
We generate data using manually specified standard deviation of the random effects and parameter weights. We generate and fit the data according to a generalized linear mixed effects model with a poisson distributed criterion variable.
<- function(N) {
simfun_multi1
# generate data
<- list(theta = 0.5, beta = c(2, -0.2, -0.4,
params -0.6))
<- expand.grid(herd = 1:ceiling(N/4), period = factor(1:4))[1:N,
dat
]$x <- simulate(~period + (1 | herd), newdata = dat,
datfamily = poisson, newparams = params)[[1]]
# test hypothesis
<- glmer(x ~ period + (1 | herd), data = dat,
mod family = poisson) # fit model
<- summary(mod)[["coefficients"]][2:4,
pvalues "Pr(>|z|)"]
any(pvalues < 0.01)
}
Example Use
<- find.design(simfun = simfun_multi1, boundaries = c(100,
res 500), power = 0.95)
We generate data from a fitted generalized linear mixed-effects model. We apply a mixed effects logistic regression in this case. It has two predictors and a random intercepts for each country.
<- function(x) 1/(1 + exp(-x))
logistic
<- 300
N.original <- 20
n.countries.original
# generate original data
<- data.frame(country = rep(1:n.countries.original,
dat.original length.out = N.original), pred1 = rnorm(N.original),
pred2 = rnorm(N.original))
<- rnorm(n.countries.original, sd = 0.5)
country.intercepts $intercepts <- country.intercepts[dat.original$country]
dat.original<- c(1, 0.4, -0.3) # parameter weights
beta <- logistic(as.matrix(dat.original[c("intercepts",
prob "pred1", "pred2")]) %*% as.matrix(beta)) # get probability
$criterion <- runif(N.original) < prob # draw according to probability
dat.original
# fit original model to obtain parameters
<- glmer(criterion ~ pred1 + pred2 + 0 +
mod.original 1 | country), data = dat.original, family = binomial) (
In the simulation function, we generate criterion data using the
original model. Design parameters are the number of participant per
country n
and the number of countries
n.countries
. We test the hypothesis that the second
predictor is significant.
<- function(n, n.countries) {
simfun_multi2
# generate data
<- data.frame(country = rep(1:n.countries,
dat length.out = n * n.countries), pred1 = rnorm(n *
pred2 = rnorm(n * n.countries))
n.countries), $criterion <- simulate(mod.original, nsim = 1,
datnewdata = dat, allow.new.levels = TRUE, use.u = FALSE) |>
unlist() # criterion data from the fitted model
# test hypothesis
<- glmer(criterion ~ pred1 + pred2 + 0 + (1 |
mod data = dat, family = binomial)
country), summary(mod)[["coefficients"]]["pred2", "Pr(>|z|)"] <
0.01
}
As a cost function, we can use
<- function(n, n.countries) 5 * m +
costfun_multi2 100 * n.countries
Example Use
<- find.design(simfun = simfun_multi2, boundaries = list(n = c(10,
res 40), n.countries = c(5, 20)), costfun = costfun_multi2,
power = 0.95)