The main purpose of the adjustedCurves
R-Package is to
calculate and plot confounder-adjusted survival curves and
cause-specific confounder-adjusted cumulative incidence functions (CIF)
using a variety of methods. Group- or treatment-specific survival curves
and CIFs are often used to graphically display the treatment (or group)
effect on the survival probability. When the data at hand comes from a
randomized controlled trial with balanced subgroups, simple stratified
estimators of the survival curves or CIFs ignoring other covariates are
unbiased. When randomization fails or was not done at all however,
confounding can lead to biased depictions (see Rubin 1974; Pearl
2009).
Luckily, a lot of different methods to adjust survival curves and
CIFs for confounding have been proposed. With the
adjustedCurves
package, these methods can be used with
little effort.
The R-Package is currently not available on CRAN, but can be
installed easily using the devtools
package:
library(devtools)
::install_github("RobinDenz1/adjustedCurves") devtools
or the remotes
package:
library(remotes)
::install_github("RobinDenz1/adjustedCurves") remotes
Further packages might have to be installed, depending on the specified method.
Let’s start with the standard survival setting. Using the
sim_confounded_surv
function, we simulate some survival
data (Chatton et al. 2020):
library(survival)
library(ggplot2)
library(riskRegression)
library(pammtools)
library(adjustedCurves)
# set random number generator seed to make this reproducible
set.seed(44)
# simulate standard survival data with 300 rows
<- sim_confounded_surv(n=300, max_t=1.1, group_beta=-0.6)
data_1 # code the grouping variable as a factor
$group <- as.factor(data_1$group)
data_1
# take a look at the first few rows
head(data_1)
## x1 x2 x3 x4 x5 x6 group time event
## 1 1 0 0 -0.52520986 0.22776621 -0.05319665 1 0.37181297 1
## 2 0 1 1 0.02441988 -2.16818274 0.58097459 0 0.65649863 1
## 3 1 1 1 0.05159071 0.07329185 1.18737651 1 0.08156474 0
## 4 1 0 0 0.01956631 -1.27205270 1.16594437 0 0.34161603 1
## 5 0 1 0 0.33793112 1.32447922 -1.28991227 0 0.52472847 1
## 6 0 1 0 -2.96505672 2.19434272 0.00528396 1 0.87274894 1
Using the default arguments, this function outputs a data.frame with
6 independently drawn covariates (x1
- x6
), a
binary group variable (group
), the observed event time
(time
) and the event indicator (event
). In
this setting, there is only one type of event (1
). When the
observations are right-censored this indicator is set to 0
.
This is the standard data format used in standard time-to-event
analysis.
To calculate confounder-adjusted survival curves fo each group of
this dataset using Direct Standardization (also known as
G-Computation or Corrected Group Prognosis method, see
Makuch (1982) or Chang et al. (1982)), we first have to fit a
coxph
model:
# it is important to use X=TRUE in the coxph function call
<- survival::coxph(Surv(time, event) ~ x1 + x2 + x3 + x4 + x5 +
outcome_model + group, data=data_1, x=TRUE) x6
This model can then be used in a call to the
adjustedsurv
function, as shown below:
<- adjustedsurv(data=data_1,
adjsurv variable="group",
ev_time="time",
event="event",
method="direct",
outcome_model=outcome_model,
conf_int=TRUE)
The argument data
simply refers to our data.frame, the
argument variable
specifies our grouping variable of
interest and the ev_time
and event
variable
specify the time-to-event variables in our data.frame. Setting the
method
to "direct"
will result in
G-Computation estimates, based on the previously fit cox-regression
model supplied using the outcome_model
argument.
Doing this returns a list with some needed output objects. Most
important however is the adjsurv
data.frame in there,
containing the adjusted survival curves and corresponding confidence
intervals (because we used conf_int=TRUE
in the original
function call). We can take a look at this object using the following
code:
head(adjsurv$adjsurv)
## time surv group se ci_lower ci_upper
## 1 0.00000000 1.0000000 0 0.000000000 1.0000000 1.0000000
## 2 0.01457659 0.9959356 0 0.004039858 0.9880177 1.0000000
## 3 0.04062032 0.9917639 0 0.005767859 0.9804591 1.0000000
## 4 0.04232678 0.9876128 0 0.007034862 0.9738247 1.0000000
## 5 0.06129069 0.9834495 0 0.008123567 0.9675276 0.9993714
## 6 0.06271999 0.9792709 0 0.009066121 0.9615017 0.9970402
More importantly however, we can plot the survival curves directly
using the plot
method:
plot(adjsurv)
This plot function comes with many options which are listed in the
documentation. To plot the point-wise confidence intervals, we can set
the argument conf_int
to TRUE:
plot(adjsurv, conf_int=TRUE)
Instead of using colors to differentiate the curves we can also use
different linetypes, by setting the linetype
argument to
TRUE and the color
argument to FALSE
:
plot(adjsurv, linetype=TRUE, color=FALSE)
We can also add small indicator lines for censored observations using
censoring_ind="lines"
:
plot(adjsurv, color=TRUE, censoring_ind="lines")
Indicator lines for the median survival time can be added using
median_surv_lines=TRUE
:
plot(adjsurv, color=TRUE, median_surv_lines=TRUE)
Many more custom settings are available. For more details and
examples see ?plot.adjustedsurv
.
The adjustedsurv
function essentially works the same
with every available method. Since the methods are however vastly
different in nature, some additional arguments have to be supplied by
the user. For example, using the Inverse Probability of Treatment
Weighting method (IPTW), we need to model the treatment assignment
mechanism instead of the outcome mechanism (see Xie and Liu
(2005)). This can be done using a logistic regression model as
follows:
<- glm(group ~ x1 + x2 + x3 + x4 + x5 + x6,
treatment_model data=data_1, family="binomial"(link="logit"))
<- adjustedsurv(data=data_1,
adjsurv variable="group",
ev_time="time",
event="event",
method="iptw_km",
treatment_model=treatment_model,
conf_int=TRUE)
The resulting curves can be plotted as before:
plot(adjsurv)
Since both methods were used correctly here, there are only slight differences in the results. Big differences between the two methods usually indicate that either the cox-regression model or the logistic regression model are incorrectly specified.
Doubly-Robust methods can be helpful in such cases (see Robins and Rotnitzky (1992) or Ozenne et al. (2020)). The standard Augmented Inverse Probability of Treatment Weighting estimator utilized both kinds of models at the same time. If either of the models is correctly specified, the resulting estimates will be unbiased. It can be used in the same way as the other methods, only that this time both models have to be supplied:
<- adjustedsurv(data=data_1,
adjsurv variable="group",
ev_time="time",
event="event",
method="aiptw",
treatment_model=treatment_model,
outcome_model=outcome_model,
conf_int=TRUE)
plot(adjsurv, conf_int=TRUE)
In many situations there are multiple mutually exclusive types of events instead of just one event. This is formally known as a competing risks situation. In these situations, survival curves can not be estimated anymore. However, the cumulative incidence function can be used instead. Without randomization, these CIFs face the same problems due to confounding as the survival curves do. Many of the methods to adjust survival curves for confounders can be used to adjust CIFs in very similar fashion.
While the computational details and the underlying theory is slightly
different (see Ozenne et al. (2020)), the syntax for the R-Package stays
pretty much exactly the same. The only major difference is that instead
of using the adjustedsurv
function, the
adjustedcif
function should be utilized. Additionally, the
user now also has to specify which event-type is of interest (argument
cause
).
First we need new example data, mirroring the competing risks
situation. We are going to use the sim_confounded_crisk
function, which does the same thing as the
sim_confounded_surv
function, but with competing risks
data:
# simulate the data
<- sim_confounded_crisk(n=300)
data_2 $group <- as.factor(data_2$group)
data_2
head(data_2)
## x1 x2 x3 x4 x5 x6 group time event
## 1 1 1 0 0.96216812 -0.66375234 0.65527293 1 0.02085406 0
## 2 1 1 0 -0.58477811 1.73696566 -0.18714061 1 0.18924205 0
## 3 1 0 0 -0.33549773 0.52008044 0.08290031 0 0.26288996 2
## 4 0 1 1 -0.04008865 0.19841348 0.38858944 1 0.08210559 2
## 5 0 0 1 0.78936972 -0.27545174 0.49322272 1 1.67993934 2
## 6 0 0 0 0.26001387 -0.03590098 0.86231143 1 1.70000000 0
Let’s again start with the Direct Standardization method.
Instead of using a simple coxph
method, we need to use a
model for the time-to-event process which takes the multiple event-types
into account. One such method is the Cause-Specific
Cox-Regression model. A simple implementation of this model is
contained in the riskRegression
R-Package.
<- riskRegression::CSC(Hist(time, event) ~ x1 + x2 + x3 + x4 +
outcome_model + x6 + group, data=data_2) x5
This model can then be used in a call to the adjustedcif
function, as shown below:
<- adjustedcif(data=data_2,
adjcif variable="group",
ev_time="time",
event="event",
method="direct",
outcome_model=outcome_model,
cause=1,
conf_int=TRUE)
plot(adjcif, conf_int=TRUE)
This shows the confounder-adjusted CIFs for cause = 1
.
By setting cause
to 2
we get the
confounder-adjusted CIFs for the other cause:
<- adjustedcif(data=data_2,
adjcif variable="group",
ev_time="time",
event="event",
method="direct",
outcome_model=outcome_model,
cause=2,
conf_int=TRUE)
plot(adjcif, conf_int=TRUE)
The IPTW estimator can be used exactly the same way as we did with
the adjustedsurv
function:
<- glm(group ~ x1 + x2 + x3 + x4 + x5 + x6,
treatment_model data=data_2, family="binomial"(link="logit"))
<- adjustedcif(data=data_2,
adjcif variable="group",
ev_time="time",
event="event",
method="iptw",
treatment_model=treatment_model,
cause=1,
conf_int=TRUE)
plot(adjcif, conf_int=TRUE)
The other estimators can be used using essentially the same syntax.
We used bootstrap=TRUE
in the last example, because we will
use functionality relying on it when comparing groups later.
In many applications there are more than two treatments. Some of the methods included in this R-Package allow calculations for an arbitrary number of treatments. In these cases the code does not change at all and can be used exactly in the same way as before. The only difference occurs when using IPTW methods. Here the user has to use a multinomial logistic regression model instead of a regular logistic regression to model the outcome. This is illustrated below.
First we again create a simulated data set (for single event survival data). The function is only able to create binary treatment variables, but we can simply resample all occurrences of 1 into 1 and 2, creating 3 treatments, where 1 and 2 have an identical treatment effect and selection process:
# add another group
# NOTE: this is done only to showcase the method and does not
# reflect what should be done in real situations
$group <- factor(data_1$group, levels=c("0", "1", "2"))
data_1$group[data_1$group=="1"] <- sample(c("1", "2"), replace=TRUE,
data_1size=nrow(data_1[data_1$group=="1",]))
The Direct Adjusted survival curves can be calculated exactly as before:
<- survival::coxph(Surv(time, event) ~ x1 + x2 + x3 + x4 +
outcome_model + x6 + group, data=data_1, x=TRUE)
x5
<- adjustedsurv(data=data_1,
adjsurv variable="group",
ev_time="time",
event="event",
method="direct",
outcome_model=outcome_model,
conf_int=TRUE)
plot(adjsurv)
For the IPTW based estimates we first fit a multinomial logistic
regression model using the multinom
function from the
nnet
R-Package and use it with the same code as before:
<- nnet::multinom(group ~ x1 + x2 + x3 + x4 + x5 + x6,
treatment_model data=data_1)
## # weights: 24 (14 variable)
## initial value 329.583687
## iter 10 value 286.543657
## iter 20 value 283.765460
## final value 283.765313
## converged
<- adjustedsurv(data=data_1,
adjsurv variable="group",
ev_time="time",
event="event",
method="iptw_km",
treatment_model=treatment_model,
conf_int=TRUE,
bootstrap=TRUE,
n_boot=50)
plot(adjsurv, conf_int=TRUE)
We use bootstrap=TRUE
here because we will need it in
the next section. In practice, however, 50 bootstrap samples would
definitely not be enough to guarantee convergence. We choose this low
number here only due to speed considerations enforced by CRAN.
The estimated adjusted curves can subsequently be used to compare the treatment groups using additional plots and statistics.
One simple way to compare the groups is to pick a point in time and
simply calculate the differences between the survival or failure
probability at this point. The adjusted_curve_diff
function
automates this process. In the following example we calculate the
difference between the adjusted survival probability of group 0 and 1 at
time = 0.4:
adjusted_curve_diff(adjsurv, times=0.4, group_1="0", group_2="1", conf_int=TRUE)
## time diff se ci_lower ci_upper p_value
## 1 0.4 -0.008257592 0.09049062 -0.1856159 0.1691008 0.9272911
The confidence interval does include 0, suggesting that this difference is not statistically significant. This can also be seen from the associated p-value.
Instead of focusing on a single point in time, we can also plot the
entire difference curve using the plot_curve_diff
function:
plot_curve_diff(adjsurv, group_1="0", group_2="1", conf_int=TRUE, color="blue")
These kind of curves are a great way to visualize the group effect
over time. When more than two groups are present, this function can be
called multiple times with changing group_1
and
group_2
values.
Similarly, we can calculate the adjusted survival time quantiles
using the adjusted_surv_quantile
function. For example, the
adjusted median survival time is given by:
adjusted_surv_quantile(adjsurv, p=0.5, conf_int=TRUE)
## p group q_surv ci_lower ci_upper
## 1 0.5 0 0.5067310 0.4487885 0.5657139
## 2 0.5 1 0.8014008 0.4068963 NA
## 3 0.5 2 0.4835157 0.4311141 0.6283029
In our case, they are all very similar.
An alternative measure is the restricted mean survival time, which
denotes the integral of the survival curve from time 0 up to a specified
point in time (see Royston et al. 2013). A confounder adjusted version
of this statistic can be calculated by simply integrating the confounder
adjusted survival curves instead of a normal Kaplan-Meier curve (Conner
et al. 2019). The adjustedCurves
package allows the user to
do this by simply calling the adjusted_rmst
function on a
previously created adjustedsurv
object:
adjusted_rmst(adjsurv, to=0.4)
## group rmst
## 1 0 0.3485056
## 2 1 0.3415740
## 3 2 0.3547187
The function returns adjusted RMST for every level of the
"group"
variable. When bootstrapping was performed when
calling the adjustedsurv
function, standard errors and
confidence intervals can be also calculated using this function as well
by setting the conf_int
argument to TRUE
:
adjusted_rmst(adjsurv, to=0.4, conf_int=TRUE)
## group rmst se ci_lower ci_upper n_boot
## 1 0 0.3485056 0.009156084 0.3305600 0.3664512 50
## 2 1 0.3415740 0.019812278 0.3027427 0.3804054 50
## 3 2 0.3547187 0.009117535 0.3368487 0.3725888 50
Similar to the adjusted_curve_diff
function, we can
perform a test of the difference between two RMST values:
adjusted_rmst(adjsurv, to=0.4, conf_int=TRUE, difference=TRUE,
group_1="0", group_2="1")
## diff se ci_lower ci_upper p_value
## 1 0.006931522 0.02182568 -0.03584602 0.04970907 0.7507993
There seems to be no statistically significant difference between the two RMSTs. However, due to the very low number of bootstrap replications used here these results should not be trusted. In practice one should use a much higher number, such as 500 or 1000 replications.
Instead of focusing on a single point in time, it is also possible to
plot the RMST as it evolves over time using the
plot_rmst_curve
function:
plot_rmst_curve(adjsurv)
Bootstrap confidence intervals could also be added to this plot using
conf_int=TRUE
.
A similar statistic is the adjusted Restricted Mean Time
Lost (RMTL), which is defined as the area under the cause-specific
confounder adjusted CIF. It can be calculated in the same way as the
RMST, but using the adjusted_rmtl
function:
adjusted_rmtl(adjcif, to=0.4, conf_int=FALSE)
## group rmtl
## 1 0 0.05621035
## 2 1 0.04968113
In contrast to the adjusted_rmst
function, this works
for both adjustedsurv
and adjustedcif
objects
because the survival curve can easily be transformed to a CIF. It also
allows difference tests, just like the adjusted_rmtl
function.
We can also plot the RMTL over time, this time using the
plot_rmtl_curve
function:
plot_rmtl_curve(adjcif)
Donald B. Rubin (1974). “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies”. In: Journal of Educational Psychology 66.5, pp. 688-701
Judea Pearl (2009). Causality: Models, Reasoning and Inference. 2nd ed. Cambridge: Cambridge University Press
Arthur Chatton, Florent Le Borgne, Clemence Leyrat, and Yohann Foucher (2020). G-Computation and Inverse Probability Weighting for Time-To-Event Outcomes: A Comparative Study. arXiv:2006.16859v1
Robert W. Makuch (1982). “Adjusted Survival Curve Estimation Using Covariates”. In: Journal of Chronic Diseases 35.6, pp. 437-443
I-Ming Chang, Rebecca Gelman, and Marcello Pagano (1982). “Corrected Group Prognostic Curves and Summary Statistics”. In: Journal of Chronic Diseases 35, pp. 669-674
Jun Xie and Chaofeng Liu (2005). “Adjusted Kaplan-Meier Estimator and Log-Rank Test with Inverse Probability of Treatment Weighting for Survival Data”. In: Statistics in Medicine 24, pp. 3089-3110
James M. Robins and Andrea Rotnitzky (1992). “Recovery of Information and Adjustment for Dependent Censoring Using Surrogate Markers”. In: AIDS Epidemiology: Methodological Issues. Ed. by Nicholas P. Jewell, Klaus Dietz, and Vernon T. Farewell. New York: Springer Science + Business Media, pp. 297-331
Brice Maxime Hugues Ozenne, Thomas Harder Scheike, and Laila Staerk (2020). “On the Estimation of Average Treatment Effects with Right-Censored Time to Event Outcome and Competing Risks”. In: Biometrical Journal 62, pp. 751-763
Weixin Cai and Mark J. van der Laan (2020). “One-Step Targeted Maximum Likelihood Estimation for Time-To-Event Outcomes”. In: Biometrics 76, pp. 722-733
Patrick Royston and Mahesh K. B. Parmar (2013). “Restricted Mean Survival Time: An Alternative to the Hazard Ratio for the Design and Analysis of Randomized Trials with a Time-To-Event Outcome”. In: BMC Medical Research Methodology 13.152
Sarah C. Conner, Lisa M. Sullivan, Emelia J. Benjamin, Michael P. LaValley, Sandro Galea, and Ludovic Trinquart (2019). “Adjusted Restricted Mean Survival Times in Observational Studies”. In: Statistics in Medicine 38, pp. 3832-3860