In this vignette, we discuss how to use multilevelcoda
to specify multilevel models where compositional data are used as
predictors.
The following table outlines the packages used and a brief description of their purpose.
Package | Purpose |
---|---|
multilevelcoda |
calculate between and within composition variables, calculate substitutions and plots |
brms |
fit Bayesian multilevel models using Stan as a backend |
bayestestR |
compute Bayes factors used to compare models |
doFuture |
parallel processing to speed up run times |
library(multilevelcoda)
library(brms)
library(bayestestR)
library(doFuture)
options(digits = 3) # reduce number of digits shown
For the examples, we make use of three built in datasets:
Dataset | Purpose |
---|---|
mcompd |
compositional sleep and wake variables and additional predictors/outcomes (simulated) |
sbp |
a pre-specified sequential binary partition, used in calculating compositional predictors |
psub |
all possible pairwise substitutions between compositional variables, used for substitution analyses |
data("mcompd")
data("sbp")
data("psub")
The following table shows a few rows of data from
mcompd
.
TST | WAKE | MVPA | LPA | SB | ID | Age | Female | STRESS |
---|---|---|---|---|---|---|---|---|
542 | 99.0 | 297.4 | 460 | 41.4 | 185 | 29.7 | 0 | 3.67 |
458 | 49.4 | 117.3 | 653 | 162.3 | 185 | 29.7 | 0 | 7.21 |
271 | 41.1 | 488.7 | 625 | 14.5 | 185 | 29.7 | 0 | 2.84 |
286 | 52.7 | 106.9 | 906 | 89.2 | 184 | 22.3 | 1 | 2.36 |
281 | 18.8 | 403.0 | 611 | 126.3 | 184 | 22.3 | 1 | 1.18 |
397 | 26.5 | 39.9 | 587 | 389.8 | 184 | 22.3 | 1 | 0.00 |
The following table shows the sequential binary partition being used
in sbp
. Columns correspond to the composition variables
(TST, WAKE, MVPA, LPA, SB). Rows correspond to distinct ILR
coordinates.
1 | 1 | -1 | -1 | -1 |
1 | -1 | 0 | 0 | 0 |
0 | 0 | 1 | -1 | -1 |
0 | 0 | 0 | 1 | -1 |
The following table shows how all the possible binary substitions contrasts are setup. Time substitions work by taking time from the -1 variable and adding time to the +1 variable.
TST | WAKE | MVPA | LPA | SB |
---|---|---|---|---|
1 | -1 | 0 | 0 | 0 |
1 | 0 | -1 | 0 | 0 |
1 | 0 | 0 | -1 | 0 |
1 | 0 | 0 | 0 | -1 |
-1 | 1 | 0 | 0 | 0 |
0 | 1 | -1 | 0 | 0 |
0 | 1 | 0 | -1 | 0 |
0 | 1 | 0 | 0 | -1 |
-1 | 0 | 1 | 0 | 0 |
0 | -1 | 1 | 0 | 0 |
0 | 0 | 1 | -1 | 0 |
0 | 0 | 1 | 0 | -1 |
-1 | 0 | 0 | 1 | 0 |
0 | -1 | 0 | 1 | 0 |
0 | 0 | -1 | 1 | 0 |
0 | 0 | 0 | 1 | -1 |
-1 | 0 | 0 | 0 | 1 |
0 | -1 | 0 | 0 | 1 |
0 | 0 | -1 | 0 | 1 |
0 | 0 | 0 | -1 | 1 |
Compositional data are often expressed as a set of isometric log
ratio (ILR) coordinates in regression models. We can use the
compilr()
function to calculate both between- and
within-level ILR coordinates for use in subsequent models as predictors.
Note: compilr()
also calculates total ILR coordinates
that could be used as outcomes (or predictors) in models, if the
decomposition into a between- and within-level ILR coordinates was not
desired.
The compilr()
function for multilevel data requires four
arguments:
Argument | Input |
---|---|
data |
a dataset containing all variables: at a minimum composition variables and an ID variable |
sbp |
a matrix with contrast codes used to calculate the sequential binary partitions for the ILR coordinates |
parts |
a character vector giving the names of the variables
in data that are compositional |
idvar |
a character string giving the name of the variable that contains the IDs |
cilr <- compilr(data = mcompd, sbp = sbp,
parts = c("TST", "WAKE", "MVPA", "LPA", "SB"), idvar = "ID")
We now will use output from the compilr()
to fit our
brms
model, using the brmcoda()
. Here is a
model predicting STRESS
from between- and within-person
sleep-wake behaviours (expressed as ILR coordinates).
Note: make sure you pass the correct names of the ILR coordinates to
brms
model.
m <- brmcoda(compilr = cilr,
formula = STRESS ~ bilr1 + bilr2 + bilr3 + bilr4 +
wilr1 + wilr2 + wilr3 + wilr4 + (1 | ID),
cores = 8, seed = 123, backend = "cmdstanr")
#> Compiling Stan program...
#> Start sampling
Here is a summary()
of the model results.
summary(m$Model)
#> Family: gaussian
#> Links: mu = identity; sigma = identity
#> Formula: STRESS ~ bilr1 + bilr2 + bilr3 + bilr4 + wilr1 + wilr2 + wilr3 + wilr4 + (1 | ID)
#> Data: tmp (Number of observations: 3540)
#> Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup draws = 4000
#>
#> Group-Level Effects:
#> ~ID (Number of levels: 266)
#> Estimate Est.Error l-95% CI u-95% CI
#> sd(Intercept) 1.00 0.06 0.88 1.13
#> Rhat Bulk_ESS Tail_ESS
#> sd(Intercept) 1.00 1370 2369
#>
#> Population-Level Effects:
#> Estimate Est.Error l-95% CI u-95% CI Rhat
#> Intercept 2.56 0.49 1.64 3.50 1.00
#> bilr1 0.17 0.33 -0.49 0.82 1.00
#> bilr2 0.41 0.35 -0.28 1.08 1.00
#> bilr3 0.12 0.21 -0.29 0.54 1.00
#> bilr4 -0.04 0.29 -0.59 0.52 1.00
#> wilr1 -0.34 0.12 -0.58 -0.10 1.00
#> wilr2 0.04 0.13 -0.21 0.29 1.00
#> wilr3 -0.11 0.08 -0.25 0.04 1.00
#> wilr4 0.24 0.10 0.05 0.43 1.00
#> Bulk_ESS Tail_ESS
#> Intercept 1011 1819
#> bilr1 987 1601
#> bilr2 964 2099
#> bilr3 1091 1861
#> bilr4 1001 1775
#> wilr1 3280 3276
#> wilr2 3576 3033
#> wilr3 3196 2789
#> wilr4 3769 2893
#>
#> Family Specific Parameters:
#> Estimate Est.Error l-95% CI u-95% CI Rhat
#> sigma 2.37 0.03 2.31 2.42 1.00
#> Bulk_ESS Tail_ESS
#> sigma 5037 3013
#>
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
Results show that the first and forth within-person ILR coordinate
was associated with stress. The interpretation of these outputs depends
on how you construct your sequential binary partition. For the built-in
sequential binary partition sbp
(shown previously), the
resulting interpretation would be as follows:
ILR coordinates | What it means |
---|---|
bilr1 |
Sleep (TST & WAKE) vs wake behaviours at between-person level |
bilr2 |
Sleep vs awake in bed at between-person level |
bilr3 |
MVPA vs (LPA and SB) at between-person level |
bilr4 |
LPA vs SB at between-person level |
wilr1 |
Sleep (TST & WAKE) vs wake behaviours at between-person level |
wilr2 |
Sleep vs awake in bed at within-person level |
wilr3 |
MVPA vs (LPA and SB) at within-person level |
wilr4 |
LPA vs SB at within-person level |
Due to the nature of within-person ILR coordinates, it is often
challenging to interpret these results in great details. For example,
the significant coefficient for wilr1
shows that the
within-person change in sleep behaviours (sleep duration and time awake
in bed combined), relative to wake behaviours (moderate to vigorous
physical activity, light physical activity, and sedentary behaviour) on
a given day, was associated with stress. However, as there are several
behaviours involved in this coordinate, we don’t know the within-person
change in which of them drives the association. It could be the change
in sleep, such that people sleep more than their own average on a given
day, but it could also be the change in time awake. Further, we don’t
know about the specific changes in time spent across behaviours. That
is, if people slept more, what behaviour did they spend less time
in?
One approach to gain further insights into these relationships, and the changes in outcomes associated with changes in specific time across compositionl components is the substitution model. We will discuss the substitution model later in this vignette.
In the frequentist approach, we usually compare the fits of models
using anova()
. In Bayesian, this can be done by comparing
the marginal likelihoods of two models. Bayes Factors (BFs) are indices
of relative evidence of one model over another. In the context of
compositional multilevel modelling, Bayes Factors provide two main
useful functions:
We can utilize Bayes factors to answer the following question: “Which model is more likely to have produced the observed data?”
Let’s fit a series of model with brmcoda()
to predict
STRESS
from sleep-wake composition. For precise Bayes
factors, we will use 40,000 posterior draws for each model.
Notes : To use Bayes factors, brmsfit
models
must be fitted with an additional non-default argument
save_pars = save_pars(all = TRUE)
.
# intercept only model
m0 <- brmcoda(compilr = cilr,
formula = STRESS ~ 1 + (1 | ID),
iter = 6000, chains = 8, cores = 8, seed = 123, warmup = 1000,
backend = "cmdstanr", save_pars = save_pars(all = TRUE))
#> Compiling Stan program...
#> Start sampling
# between-person composition only model
m1 <- brmcoda(compilr = cilr,
formula = STRESS ~ bilr1 + bilr2 + bilr3 + bilr4 + (1 | ID),
iter = 6000, chains = 8, cores = 8, seed = 123, warmup = 1000,
backend = "cmdstanr", save_pars = save_pars(all = TRUE))
#> Compiling Stan program...
#> Start sampling
# within-person composition only model
m2 <- brmcoda(compilr = cilr,
formula = STRESS ~ wilr1 + wilr2 + wilr3 + wilr4 + (1 | ID),
iter = 6000, chains = 8, cores = 8, seed = 123, warmup = 1000,
backend = "cmdstanr", save_pars = save_pars(all = TRUE))
#> Compiling Stan program...
#> Start sampling
# full model
m <- brmcoda(compilr = cilr,
formula = STRESS ~ bilr1 + bilr2 + bilr3 + bilr4 +
wilr1 + wilr2 + wilr3 + wilr4 + (1 | ID),
iter = 6000, chains = 8, cores = 8, seed = 123, warmup = 1000,
backend = "cmdstanr", save_pars = save_pars(all = TRUE))
#> Compiling Stan program...
#> Start sampling
We can now compare these models with the
bayesfactor_models()
function, using the intercept-only
model as reference.
comparison <- bayesfactor_models(m$Model, m1$Model, m2$Model, denominator = m0$Model)
comparison
#> Bayes Factors for Model Comparison
#>
#> Model BF
#> [1] bilr1 + bilr2 + bilr3 + bilr4 + wilr1 + wilr2 + wilr3 + wilr4 + (1 | ID) 4.01
#> [2] bilr1 + bilr2 + bilr3 + bilr4 + (1 | ID) 0.341
#> [3] wilr1 + wilr2 + wilr3 + wilr4 + (1 | ID) 11.62
#>
#> * Against Denominator: [4] 1 + (1 | ID)
#> * Bayes Factor Type: marginal likelihoods (bridgesampling)
We can see that model with only within-person composition is the best model - with \(BF\) = 11.86 compared to the null (intercept only).
Let’s compare these models against the full model.
update(comparison, reference = 1)
#> Bayes Factors for Model Comparison
#>
#> Model BF
#> [2] bilr1 + bilr2 + bilr3 + bilr4 + (1 | ID) 0.085
#> [3] wilr1 + wilr2 + wilr3 + wilr4 + (1 | ID) 2.89
#> [4] 1 + (1 | ID) 0.249
#>
#> * Against Denominator: [1] bilr1 + bilr2 + bilr3 + bilr4 + wilr1 + wilr2 + wilr3 + wilr4 + (1 | ID)
#> * Bayes Factor Type: marginal likelihoods (bridgesampling)
Again, our data favours the within-person composition only model over the full model, giving 2.93 times more support.
When examining the relationships between compositional data and an
outcome, we often are also interested in the changes in an outcomes when
a fixed duration of time is reallocated from one compositional component
to another, while the other components remain constant. These changes
can be examined using the compositional isotemporal substitution model.
In multilevelcoda
, we extend this model to multilevel
approach to test both between-person and within-person changes. All
substitution models can be computed using the
substitution()
function.
The below example examines the changes in stress for different pairwise substitution of sleep-wake behaviours for 5 minutes, at between-person level.
bsubm <- substitution(object = m, delta = 5,
level = "between", type = "conditional")
The output contains multiple data sets of results for all compositional components. Here are the results for changes in stress when sleep (TST) is substituted for 5 minutes, averaged across levels of covariates.
knitr::kable(bsubm$TST)
|| || || ||
None of the results are significant, given that the credible intervals did not cross 0, showing that increasing sleep (TST) at the expense of any other behaviours was not associated in changes in stress. Notice there is no column indicating the levels of convariates, indicating that these results have been averaged.
Let’s now take a look at how stress changes when different pairwise of sleep-wake behaviours are substituted for 5 minutes, at within-person level.
# Within-person substitution
wsubm <- substitution(object = m, delta = 5,
level = "within", type = "conditional")
Results for 5 minute substitution.
knitr::kable(wsubm$TST)
|| || || ||
At within-person level, there were significant results for substitution of sleep (TST) and time awake in bed (WAKE) for 5 minutes, but not other behaviours. Increasing sleep at the expense of time spent awake in bed predicted 0.02 higher stress [95% CI 0.00, 0.03], on a given day. Conversely, less sleep and more time awake in bed predicted less stress (b = -0.016 [95% CI -0.03, -0.00]).
You can learn more about different types of substitution models
at
Compositional
Multilevel Substitution Models.