Installing from CRAN.
install.packages("CRE")
Installing the latest development version.
library(devtools)
install_github("NSAPH-Software/CRE", ref = "develop")
Import.
library("CRE")
Data (required)
y: The observed response/outcome vector (binary or continuous).
z: The treatment/exposure/policy vector (binary).
X: The covariate matrix (binary or continuous).
Parameters (not required)
method_params: The list of parameters that define the models used, including:
- ratio_dis: The ratio of data delegated to the discovery sub-sample (default: 0.5).
- ite_method_dis: The method to estimate the individual treatment effect (ITE) on the discovery sub-sample (default: ‘aipw’) [1].
- ps_method_dis: The estimation model for the propensity score on the discovery sub-sample (default: ‘SL.xgboost’).
- oreg_method_dis: The estimation model for the outcome regressions in estimate_ite_aipw on the discovery sub-sample (default: ‘SL.xgboost’).
- ite_method_inf: The method to estimate the individual treatment effect (ITE) on the inference sub-sample (default: ‘aipw’) [1].
- ps_method_inf: The estimation model for the propensity score on the inference sub-sample (default: ‘SL.xgboost’).
- oreg_method_inf: The estimation model for the outcome regressions in estimate_ite_aipw on the inference sub-sample (default: ‘SL.xgboost’).
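To make the role of ratio_dis concrete, the following base R sketch splits row indices into a discovery part and an inference part. The helper split_indices is hypothetical and written only for illustration; it is not the package's internal implementation, which may split the data differently.

```r
# Hypothetical helper (not part of CRE): split row indices into a
# discovery part of size floor(n * ratio_dis) and an inference part.
split_indices <- function(n, ratio_dis = 0.5) {
  stopifnot(ratio_dis > 0, ratio_dis < 1)
  n_dis <- floor(n * ratio_dis)
  idx_dis <- sample(seq_len(n), size = n_dis)
  list(discovery = sort(idx_dis),
       inference = setdiff(seq_len(n), idx_dis))
}

set.seed(1)
idx <- split_indices(n = 1000, ratio_dis = 0.25)
length(idx$discovery)  # 250
length(idx$inference)  # 750
```

A smaller ratio_dis leaves more observations for the inference step, at the cost of fewer observations for discovering rules.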
hyper_params: The list of hyperparameters to fine-tune the method, including:
- intervention_vars: Intervention-able variables used for rules generation (default: NULL).
- offset: Name of the covariate to use as offset (e.g. ‘x1’) for T-Poisson ITE estimation. NULL if not used (default: NULL).
- ntrees_rf: The number of decision trees for the random forest (default: 20).
- ntrees_gbm: The number of decision trees for the generalized boosted regression modeling algorithm (default: 20).
- node_size: Minimum size of the trees’ terminal nodes (default: 20).
- max_nodes: Maximum number of terminal nodes per tree (default: 5).
- max_depth: Maximum rule length (default: 3).
- replace: Boolean variable for replacement in bootstrapping for rules generation by random forest (default: TRUE).
- t_decay: The decay threshold for rules pruning (default: 0.025).
- t_ext: The threshold to define too generic or too specific (extreme) rules (default: 0.01).
- t_corr: The threshold to define correlated rules (default: 1).
- t_pvalue: The threshold to define statistically significant rules (default: 0.05).
- stability_selection: Whether to use stability selection for selecting the rules (default: TRUE).
- cutoff: Threshold defining the minimum cutoff value for the stability scores (default: 0.9).
- pfer: Upper bound for the per-family error rate (tolerated amount of falsely selected rules) (default: 1).
- penalty_rl: Order of the penalty on rule length during LASSO regularization (0: no penalty, 1: rules_length, 2: rules_length^2) (default: 1).
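As an illustration of the penalty_rl options, the sketch below computes a per-rule penalty weight from the rule length. The rule strings, the " & " separator, and the helper rule_penalty are assumptions made for this example; they are not taken from the package's code.

```r
# Hypothetical rules, written as conjunctions of conditions (assumed format).
rules <- c("x1>0.5",
           "x1>0.5 & x2<=0.5",
           "x1>0.5 & x2<=0.5 & x3>0.5")

# Rule length = number of conditions in each rule.
rules_length <- lengths(strsplit(rules, " & ", fixed = TRUE))

# Hypothetical helper: penalty weight per rule for a given penalty order.
rule_penalty <- function(rules_length, penalty_rl = 1) {
  rules_length^penalty_rl
}

rule_penalty(rules_length, penalty_rl = 0)  # 1 1 1 (no length penalty)
rule_penalty(rules_length, penalty_rl = 1)  # 1 2 3
rule_penalty(rules_length, penalty_rl = 2)  # 1 4 9
```

With a higher order, longer rules receive a larger weight and are therefore less likely to survive the LASSO step.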
Additional Estimates (not required)
ite: The estimated ITE vector. If given, both ITE estimation steps (discovery and inference) are skipped (default: NULL).
[1] Options for the ITE estimation are as follows:
- slearner
- tlearner
- tpoisson
- xlearner
- aipw
- cf
- bcf
- bart
If other estimates of the ITE are provided through the additional ite argument, both ITE estimation steps (discovery and inference) are skipped and the provided estimates are used instead.
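For intuition about the default ‘aipw’ option, here is a minimal base R sketch of the textbook augmented inverse probability weighting pseudo-outcome on simulated data. It uses glm and lm as stand-ins for the propensity score and outcome models; all variable names are illustrative, and the package's own estimate_ite_aipw may differ in its details.

```r
set.seed(42)
n <- 2000
x <- rnorm(n)
z <- rbinom(n, 1, 0.5)            # randomized binary treatment
y <- 1 + x + 2 * z + rnorm(n)     # simulated outcome, true effect = 2
d <- data.frame(y = y, z = z, x = x)

# Propensity score model (stand-in for the ps_method_* learner).
ps_fit <- glm(z ~ x, family = binomial(), data = d)
e_hat <- predict(ps_fit, type = "response")

# Outcome regressions fitted separately on treated and control units
# (stand-ins for the outcome regression learner).
mu1_fit <- lm(y ~ x, data = d[d$z == 1, ])
mu0_fit <- lm(y ~ x, data = d[d$z == 0, ])
mu1_hat <- predict(mu1_fit, newdata = d)
mu0_hat <- predict(mu0_fit, newdata = d)

# AIPW pseudo-outcome: regression contrast plus weighted residual corrections.
ite_aipw <- (mu1_hat - mu0_hat) +
  z * (y - mu1_hat) / e_hat -
  (1 - z) * (y - mu0_hat) / (1 - e_hat)

mean(ite_aipw)  # should be close to the simulated effect of 2
```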
One can create a customized wrapper for SuperLearner internal packages. The following is an example of providing the number of cores (e.g., 12) for the xgboost package in a shared memory system.
m_xgboost <- function(nthread = 12, ...) {
  SuperLearner::SL.xgboost(nthread = nthread, ...)
}
Then use “m_xgboost”, instead of “SL.xgboost”.
Example 1 (default parameters)
set.seed(9687)
dataset <- generate_cre_dataset(n = 1000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

cre_results <- cre(y, z, X)
summary(cre_results)
plot(cre_results)
Example 2 (personalized ite estimation)
set.seed(9687)
dataset <- generate_cre_dataset(n = 1000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

ite_pred <- ... # personalized ite estimation
cre_results <- cre(y, z, X, ite = ite_pred)
summary(cre_results)
plot(cre_results)
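One possible way to produce a personalized ITE vector is sketched below: a simple T-learner built from two linear models in base R, run on simulated data. This is only an illustration under assumed data and model choices, not the estimator intended by the example above; any method that returns one numeric estimate per observation could be used instead.

```r
set.seed(7)
n <- 500
X <- data.frame(x1 = rbinom(n, 1, 0.5), x2 = rbinom(n, 1, 0.5))
z <- rbinom(n, 1, 0.5)
y <- 2 * z * X$x1 + rnorm(n)   # simulated: effect of 2 only when x1 == 1

# T-learner: fit one outcome model per treatment arm,
# then take the difference of the two predictions for every unit.
d <- cbind(y = y, X)
fit1 <- lm(y ~ ., data = d[z == 1, ])
fit0 <- lm(y ~ ., data = d[z == 0, ])
ite_pred <- as.numeric(predict(fit1, newdata = d) - predict(fit0, newdata = d))

length(ite_pred)  # one estimate per observation
```

A vector obtained this way could then be passed through the ite argument, as in cre(y, z, X, ite = ite_pred).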
Example 3 (setting parameters)
set.seed(9687)
dataset <- generate_cre_dataset(n = 1000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

method_params <- list(ratio_dis = 0.25,
                      ite_method_dis = "aipw",
                      ps_method_dis = "SL.xgboost",
                      oreg_method_dis = "SL.xgboost",
                      ite_method_inf = "aipw",
                      ps_method_inf = "SL.xgboost",
                      oreg_method_inf = "SL.xgboost")

hyper_params <- list(intervention_vars = c("x1", "x2", "x3", "x4"),
                     offset = NULL,
                     ntrees_rf = 20,
                     ntrees_gbm = 20,
                     node_size = 20,
                     max_nodes = 5,
                     max_depth = 3,
                     t_decay = 0.025,
                     t_ext = 0.025,
                     t_corr = 1,
                     t_pvalue = 0.05,
                     replace = FALSE,
                     stability_selection = TRUE,
                     cutoff = 0.8,
                     pfer = 0.1,
                     penalty_rl = 1)

cre_results <- cre(y, z, X, method_params, hyper_params)
summary(cre_results)
plot(cre_results)
More synthetic data sets can be generated using generate_cre_dataset().