Overview

The adaptr package simulates adaptive clinical trials using adaptive stopping, adaptive arm dropping and/or response-adaptive randomisation.

The package has been developed as part of the INCEPT (Intensive Care Platform Trial) project, funded primarily by a grant from Sygeforsikringen “danmark”.

Additional guidance on the key methodological considerations when planning and comparing adaptive clinical trials can be found in the open access article "An overview of methodological considerations regarding adaptive stopping, arm dropping and randomisation in clinical trials", published in the Journal of Clinical Epidemiology.

Basic example

First, load the package:

library(adaptr)

Set up trial

Then, set up a trial with the desired specifications. adaptr offers the general purpose function setup_trial(), but here we use the built-in setup_trial_binom() for a trial with a binary, binomially distributed, undesirable outcome such as mortality (adaptr also includes setup_trial_norm() for continuous, normally distributed outcomes). The example trial specification has three arms with different true event rates, a minimum allocation probability of 15% in each arm, adaptive analyses after every 100 patients from 300 to 2000 patients, stopping for equivalence, and softened allocation ratios:

binom_trial <- setup_trial_binom(
  arms = c("Arm A", "Arm B", "Arm C"),
  true_ys = c(0.25, 0.20, 0.30), # true event rates in each arm
  min_probs = rep(0.15, 3), # minimum allocation probability in each arm
  data_looks = seq(from = 300, to = 2000, by = 100), # adaptive analyses
  equivalence_prob = 0.9, # probability threshold for equivalence stopping
  equivalence_diff = 0.05, # absolute difference considered equivalent
  soften_power = 0.5 # soften allocation ratios
)

See ?setup_trial() for more details on these arguments. See vignette("Basic-examples", "adaptr") for basic example trial specifications and a thorough review of the general settings, and vignette("Advanced-example", "adaptr") for an advanced example, including details on how to specify user-written functions for generating outcomes and posterior draws.

We can print an overview of the trial specification by simply running:

binom_trial
#> Trial specification: generic binomially distributed outcome trial
#> * Undesirable outcome
#> * No common control arm
#> * Best arm: Arm B
#> 
#> Arms, true outcomes, starting allocation probabilities 
#> and allocation probability limits:
#>   arms true_ys start_probs fixed_probs min_probs max_probs
#>  Arm A    0.25       0.333          NA      0.15        NA
#>  Arm B    0.20       0.333          NA      0.15        NA
#>  Arm C    0.30       0.333          NA      0.15        NA
#> 
#> Maximum sample size: 2000 
#> Maximum number of data looks: 18
#> Planned data looks after:  300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000 patients have reached follow-up
#> Number of patients randomised at each look:  300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000
#> 
#> Superiority threshold: 0.99 (all analyses)
#> Inferiority threshold: 0.01 (all analyses)
#> Equivalence threshold: 0.9 (all analyses) (no common control)
#> Absolute equivalence difference: 0.05
#> No futility threshold (not relevant - no common control)
#> Soften power for all analyses: 0.5

By default, probabilities are shown with 3 decimals. This can be changed by explicitly print()ing the specification with the prob_digits argument, for example:

print(binom_trial, prob_digits = 2)
#> Trial specification: generic binomially distributed outcome trial
#> * Undesirable outcome
#> * No common control arm
#> * Best arm: Arm B
#> 
#> Arms, true outcomes, starting allocation probabilities 
#> and allocation probability limits:
#>   arms true_ys start_probs fixed_probs min_probs max_probs
#>  Arm A    0.25        0.33          NA      0.15        NA
#>  Arm B    0.20        0.33          NA      0.15        NA
#>  Arm C    0.30        0.33          NA      0.15        NA
#> 
#> Maximum sample size: 2000 
#> Maximum number of data looks: 18
#> Planned data looks after:  300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000 patients have reached follow-up
#> Number of patients randomised at each look:  300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000
#> 
#> Superiority threshold: 0.99 (all analyses)
#> Inferiority threshold: 0.01 (all analyses)
#> Equivalence threshold: 0.9 (all analyses) (no common control)
#> Absolute equivalence difference: 0.05
#> No futility threshold (not relevant - no common control)
#> Soften power for all analyses: 0.5

Simulate a single trial

Remember to define the seed to ensure reproducible results:

trial_res <- run_trial(binom_trial, seed = 12345)

trial_res
#> Single simulation result: generic binomially distributed outcome trial
#> * Undesirable outcome
#> * No common control arm
#> 
#> Final status: inconclusive, stopped at final allowed adaptive analysis
#> Final/maximum allowed sample sizes: 2000/2000 (100.0%)
#> Available outcome data at last adaptive analysis: 2000/2000 (100.0%)
#> 
#> Trial results overview:
#>   arms true_ys final_status status_look status_probs final_alloc
#>  Arm A    0.25       active          NA           NA       0.194
#>  Arm B    0.20       active          NA           NA       0.656
#>  Arm C    0.30     inferior        2000        0.007       0.150
#> 
#> Estimates from final analysis (all patients):
#>   arms sum_ys_all ns_all raw_ests_all post_ests_all post_errs_all lo_cri_all
#>  Arm A        180    742        0.243         0.243        0.0161      0.213
#>  Arm B        178    841        0.212         0.212        0.0141      0.185
#>  Arm C        113    417        0.271         0.272        0.0221      0.230
#>  hi_cri_all
#>       0.274
#>       0.240
#>       0.316
#> 
#> Estimates from last adaptive analysis including each arm:
#>   arms sum_ys  ns raw_ests post_ests post_errs lo_cri hi_cri
#>  Arm A    180 742    0.243     0.243    0.0159  0.213  0.275
#>  Arm B    178 841    0.212     0.212    0.0141  0.185  0.241
#>  Arm C    113 417    0.271     0.271    0.0215  0.230  0.316
#> 
#> Simulation details:
#> * Random seed: 12345
#> * Credible interval width: 95%
#> * Number of posterior draws: 5000
#> * Posterior estimation method: medians with MAD-SDs

Again, we can choose the number of decimals with print():

print(trial_res, prob_digits = 2)
#> Single simulation result: generic binomially distributed outcome trial
#> * Undesirable outcome
#> * No common control arm
#> 
#> Final status: inconclusive, stopped at final allowed adaptive analysis
#> Final/maximum allowed sample sizes: 2000/2000 (100.0%)
#> Available outcome data at last adaptive analysis: 2000/2000 (100.0%)
#> 
#> Trial results overview:
#>   arms true_ys final_status status_look status_probs final_alloc
#>  Arm A    0.25       active          NA           NA        0.19
#>  Arm B    0.20       active          NA           NA        0.66
#>  Arm C    0.30     inferior        2000        0.007        0.15
#> 
#> Estimates from final analysis (all patients):
#>   arms sum_ys_all ns_all raw_ests_all post_ests_all post_errs_all lo_cri_all
#>  Arm A        180    742         0.24          0.24         0.016       0.21
#>  Arm B        178    841         0.21          0.21         0.014       0.19
#>  Arm C        113    417         0.27          0.27         0.022       0.23
#>  hi_cri_all
#>        0.27
#>        0.24
#>        0.32
#> 
#> Estimates from last adaptive analysis including each arm:
#>   arms sum_ys  ns raw_ests post_ests post_errs lo_cri hi_cri
#>  Arm A    180 742     0.24      0.24     0.016   0.21   0.27
#>  Arm B    178 841     0.21      0.21     0.014   0.19   0.24
#>  Arm C    113 417     0.27      0.27     0.022   0.23   0.32
#> 
#> Simulation details:
#> * Random seed: 12345
#> * Credible interval width: 95%
#> * Number of posterior draws: 5000
#> * Posterior estimation method: medians with MAD-SDs

Simulate multiple trials

Generally, we want to run many simulations using the same trial specification to assess and compare performance metrics of different trial designs. This is the job of run_trials() (note the final s); again, note the use of a base_seed for reproducible results. Here we run 25 simulations, but in practice you will generally want to run more. The low number of simulations in this example has been chosen to keep run-time tolerable when producing the example, but it leads to uncertainty and instability in the results, as seen below.

trial_res_mult <- run_trials(binom_trial, n_rep = 25, base_seed = 67890)

run_trials() can run simulations on several CPU cores concurrently: set the cores argument to a number greater than 1 (the default is 1, which results in serial/non-parallel processing). As an aside, you can see the number of available CPU cores by calling parallel::detectCores().
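For example, assuming at least two cores are available on your machine, the simulations above could be run in parallel like this (the value 2 is purely illustrative):

```r
# Run the same simulations on two CPU cores concurrently;
# adjust cores to your machine, e.g. based on parallel::detectCores()
trial_res_mult <- run_trials(binom_trial, n_rep = 25, base_seed = 67890,
                             cores = 2)
```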

Calculate performance metrics and summarise results

The results of multiple simulations may be summarised by printing the resulting object:

trial_res_mult
#> Multiple simulation results: generic binomially distributed outcome trial
#> * Undesirable outcome
#> * Number of simulations: 25
#> * Number of simulations summarised: 25 (all trials)
#> * No common control arm
#> * Selection strategy: no selection if no superior arm
#> * Treatment effect compared to: no comparison
#> 
#> Performance metrics (using posterior estimates from last adaptive analysis):
#> * Sample sizes: mean 1324.0 (SD: 547.2) | median 1200.0 (IQR: 1000.0 to 1900.0)
#> * Total summarised outcomes: mean 296.1 (SD: 111.5) | median 282.0 (IQR: 228.0 to 415.0)
#> * Total summarised outcome rates: mean 0.228 (SD: 0.013) | median 0.230 (IQR: 0.220 to 0.235)
#> * Conclusive: 80.0%
#> * Superiority: 64.0%
#> * Equivalence: 16.0%
#> * Futility: 0.0% [not assessed]
#> * Inconclusive at max sample size: 20.0%
#> * Selection probabilities: Arm A: 0.0% | Arm B: 64.0% | Arm C: 0.0% | None: 36.0%
#> * RMSE: 0.01642
#> * RMSE treatment effect: not estimated
#> * Ideal design percentage: 100.0%
#> 
#> Simulation details:
#> * Simulation time: 1.1 secs
#> * Base random seed: 67890
#> * Credible interval width: 95%
#> * Number of posterior draws: 5000
#> * Estimation method: posterior medians with MAD-SDs

Printing the object calls the summary() method (as known from, e.g., regression models in R), which summarises the results, and then prints its output in a human-friendly manner using the print() method for summarised simulations.

The summary() function can also be called directly, which allows more control of how results are summarised (including which arms are selected in inconclusive trials), and allows subsequent extraction of individual key results. In addition, the number of digits can be controlled when printed:

res_sum <- summary(trial_res_mult)

print(res_sum, digits = 1)
#> Multiple simulation results: generic binomially distributed outcome trial
#> * Undesirable outcome
#> * Number of simulations: 25
#> * Number of simulations summarised: 25 (all trials)
#> * No common control arm
#> * Selection strategy: no selection if no superior arm
#> * Treatment effect compared to: no comparison
#> 
#> Performance metrics (using posterior estimates from last adaptive analysis):
#> * Sample sizes: mean 1324.0 (SD: 547.2) | median 1200.0 (IQR: 1000.0 to 1900.0)
#> * Total summarised outcomes: mean 296.1 (SD: 111.5) | median 282.0 (IQR: 228.0 to 415.0)
#> * Total summarised outcome rates: mean 0.228 (SD: 0.013) | median 0.230 (IQR: 0.220 to 0.235)
#> * Conclusive: 80.0%
#> * Superiority: 64.0%
#> * Equivalence: 16.0%
#> * Futility: 0.0% [not assessed]
#> * Inconclusive at max sample size: 20.0%
#> * Selection probabilities: Arm A: 0.0% | Arm B: 64.0% | Arm C: 0.0% | None: 36.0%
#> * RMSE: 0.01642
#> * RMSE treatment effect: not estimated
#> * Ideal design percentage: 100.0%
#> 
#> Simulation details:
#> * Simulation time: 1.1 secs
#> * Base random seed: 67890
#> * Credible interval width: 95%
#> * Number of posterior draws: 5000
#> * Estimation method: posterior medians with MAD-SDs
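As an aside, how arms are selected in trials not stopped for superiority can be controlled via summary()'s select_strategy argument (see ?summary for the valid strategies); a minimal sketch, assuming the "best" strategy (select the arm with the best final estimate) is desired:

```r
# Summarise again, selecting the best-performing arm in simulations
# that did not end with a superiority decision
res_sum_best <- summary(trial_res_mult, select_strategy = "best")
```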

The summary() method, however, may not necessarily be what you want. adaptr has additional functions that may be used on multiple simulations using the same trial specification.

The extract_results() function extracts key trial results and yields a tidy data.frame with one simulation per row:

extr_res <- extract_results(trial_res_mult)

nrow(extr_res)
#> [1] 25

head(extr_res)
#>   sim final_n sum_ys  ratio_ys final_status superior_arm selected_arm
#> 1   1    2000    415 0.2075000          max         <NA>         <NA>
#> 2   2     600    139 0.2316667  superiority        Arm B        Arm B
#> 3   3    1000    237 0.2370000  superiority        Arm B        Arm B
#> 4   4     900    209 0.2322222  equivalence         <NA>         <NA>
#> 5   5    2000    441 0.2205000  superiority        Arm B        Arm B
#> 6   6    1900    431 0.2268421  superiority        Arm B        Arm B
#>         sq_err sq_err_te
#> 1           NA        NA
#> 2 7.853843e-04        NA
#> 3 4.190319e-05        NA
#> 4           NA        NA
#> 5 3.422824e-06        NA
#> 6 4.852161e-05        NA
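Because the result is an ordinary data.frame, custom summaries are straightforward with base R; for example, using the columns shown above:

```r
# Mean total sample size across all simulations
mean(extr_res$final_n)

# Distribution of final trial statuses
table(extr_res$final_status)
```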

Further, the check_performance() function calculates key performance metrics and returns them in a tidy data.frame with one metric per row; it can also assess the uncertainty of these estimates using bootstrapping:

perf_res <- check_performance(trial_res_mult, uncertainty = TRUE, n_boot = 1000,
                              boot_seed = "base")

print(perf_res, digits = 3)
#>                   metric      est  err_sd err_mad    lo_ci    hi_ci
#> 1           n_summarised   25.000   0.000   0.000   25.000   25.000
#> 2              size_mean 1324.000 104.423 106.747 1131.800 1524.000
#> 3                size_sd  547.175  47.966  48.405  441.268  627.113
#> 4            size_median 1200.000 229.311 296.520 1000.000 1600.000
#> 5               size_p25 1000.000 132.490   0.000  600.000 1100.000
#> 6               size_p75 1900.000 184.737 148.260 1500.000 2000.000
#> 7            sum_ys_mean  296.120  21.297  22.120  255.065  335.856
#> 8              sum_ys_sd  111.481  10.793  10.592   87.716  129.987
#> 9          sum_ys_median  282.000  41.603  63.752  234.000  359.000
#> 10            sum_ys_p25  228.000  31.031  13.343  138.000  266.000
#> 11            sum_ys_p75  415.000  32.732   7.413  325.000  431.000
#> 12         ratio_ys_mean    0.228   0.003   0.003    0.223    0.233
#> 13           ratio_ys_sd    0.013   0.002   0.002    0.009    0.016
#> 14       ratio_ys_median    0.230   0.004   0.003    0.220    0.234
#> 15          ratio_ys_p25    0.220   0.004   0.005    0.210    0.228
#> 16          ratio_ys_p75    0.235   0.003   0.003    0.232    0.242
#> 17       prob_conclusive    0.800   0.079   0.059    0.640    0.960
#> 18         prob_superior    0.640   0.095   0.119    0.440    0.800
#> 19      prob_equivalence    0.160   0.076   0.059    0.040    0.320
#> 20         prob_futility    0.000   0.000   0.000    0.000    0.000
#> 21              prob_max    0.200   0.079   0.059    0.040    0.360
#> 22 prob_select_arm_Arm A    0.000   0.000   0.000    0.000    0.000
#> 23 prob_select_arm_Arm B    0.640   0.095   0.119    0.440    0.800
#> 24 prob_select_arm_Arm C    0.000   0.000   0.000    0.000    0.000
#> 25      prob_select_none    0.360   0.095   0.119    0.200    0.560
#> 26                  rmse    0.016   0.004   0.004    0.008    0.024
#> 27               rmse_te       NA      NA      NA       NA       NA
#> 28                   idp  100.000   0.000   0.000  100.000  100.000

Finally, the stability of performance metrics according to the number of simulations may be assessed visually using the plot_convergence() function. Note that all adaptr plotting functions require the ggplot2 package.

# Convergence plots for four performance metrics
plot_convergence(trial_res_mult, metrics = c("size mean", "prob superior",
                                             "rmse", "idp"))

The convergence plots show that the low number of simulations used in this example leads to substantial uncertainty and instability in the performance metrics (while the ideal design percentage is always at 100% here, it would likely drop somewhat if more simulations were conducted).

Visualise trial results

In addition to the convergence plots, results may be visualised using the plot_status() and plot_history() functions. We need non-sparse results for plot_history() (but not for plot_status() or plot_convergence() presented above), so first we re-run run_trials() with the sparse argument set to FALSE:

trial_res_mult <- run_trials(binom_trial, n_rep = 25, base_seed = 67890,
                             sparse = FALSE)

First, we plot the overall trial statuses according to the total number of patients randomised (this function does not require sparse = FALSE):

plot_status(trial_res_mult, x_value = "total n")

We can also plot the statuses for specific arms, or for all arms by supplying NA to the arm argument:

plot_status(trial_res_mult, x_value = "total n", arm = NA, ncol = 1)

Next, we plot the history of allocation probabilities at each adaptive analysis look. Intervals cover the inter-quartile range by default (interval_width = 0.5):

plot_history(trial_res_mult)

Plotting other summary metrics is possible; see plot_history().
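For instance, the interval width mentioned above can be changed from its default; a small sketch (the metric plotted and further options are documented in ?plot_history):

```r
# Allocation probability history with 95% intervals
# instead of the default inter-quartile range
plot_history(trial_res_mult, interval_width = 0.95)
```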

Citation

If using the package, please consider citing:

citation(package = "adaptr")
#> 
#> To cite adaptr in publications use:
#> 
#>   Granholm A, Jensen AKG, Lange T, Kaas-Hansen BS (2022). adaptr: an R
#>   package for simulating and comparing adaptive clinical trials.
#>   Journal of Open Source Software, 7(72), 4284. URL
#>   https://doi.org/10.21105/joss.04284.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {{adaptr}: an R package for simulating and comparing adaptive clinical trials},
#>     author = {Anders Granholm and Aksel Karl Georg Jensen and Theis Lange and Benjamin Skov Kaas-Hansen},
#>     journal = {Journal of Open Source Software},
#>     year = {2022},
#>     volume = {7},
#>     number = {72},
#>     pages = {4284},
#>     url = {https://doi.org/10.21105/joss.04284},
#>     doi = {10.21105/joss.04284},
#>   }