While {reservr}
is capable of fitting distributions to
censored and truncated observations, it does not directly allow
modelling the influence of exogenous variables observed alongside the
primary outcome. This is where the integration with TensorFlow comes
in.
The TensorFlow integration makes it possible to fit a neural network to all parameters of a distribution simultaneously while taking exogenous variables into account.
{reservr}
accepts any partial TensorFlow network that yields a single rank-2 tensor
of arbitrary dimension as output (e.g. the output of any dense layer)
and can connect suitable layers to this intermediate
output such that the complete network predicts the parameters of
any pre-specified distribution family.
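For illustration, such a partial network could be a small stack of dense layers ending in the rank-2 intermediate output. The sketch below only visualizes this idea; the input shape, layer sizes and object names are arbitrary choices and are not part of the example that follows.
# Illustrative only: a partial network whose last dense layer serves as
# the rank-2 intermediate output (shape: batch size x 8).
sketch_input <- keras::layer_input(shape = 3L, name = "covariates")
sketch_hidden <- keras::layer_dense(sketch_input, units = 16L, activation = "relu")
sketch_intermediate <- keras::layer_dense(sketch_hidden, units = 8L, activation = "relu")
# Supplying `sketch_intermediate` as the intermediate output to tf_compile_model()
# lets reservr attach the layers that map it to the distribution parameters.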
The integration also dynamically compiles a suitable conditional likelihood-based
loss, depending on the type of problem (censoring, truncation), which
can be optimized using the keras::fit
implementation
out of the box. This means there is full flexibility with respect to
callbacks, optimizers, mini-batching, etc.
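To sketch how this looks for a censored and truncated problem: the snippet below is not run and is purely illustrative. It assumes that reservr and keras are attached (as they are just below) with keras >= 2.6.0, that the covariate and bound vectors (x_covariates, x_lower, x_upper, t_lower, t_upper) are hypothetical placeholders that already exist, that trunc_obs() bundles censoring and truncation bounds into the observation format expected by fit(), and that fit() forwards extra arguments such as callbacks to keras::fit().
# Hedged sketch (not run): interval-censored, truncated observations.
# All data objects (x_covariates, x_lower, x_upper, t_lower, t_upper)
# are hypothetical placeholders.
cens_input <- keras::layer_input(shape = 1L, name = "covariate")
cens_intermediate <- keras::layer_dense(cens_input, units = 8L, activation = "relu")

cens_net <- tf_compile_model(
  inputs = list(cens_input),
  intermediate_output = cens_intermediate,
  dist = dist_normal(),
  optimizer = keras::optimizer_adam(learning_rate = 0.01), # assumes keras >= 2.6.0
  censoring = TRUE,  # observations may only be known up to an interval
  truncation = TRUE  # observations outside (tmin, tmax) are never recorded
)

# Assumption: trunc_obs() combines censoring intervals (xmin, xmax) and
# truncation bounds (tmin, tmax) into one observation object.
obs <- trunc_obs(xmin = x_lower, xmax = x_upper, tmin = t_lower, tmax = t_upper)

fit(
  cens_net,
  x = x_covariates,
  y = obs,
  epochs = 200L,
  batch_size = 32L,
  # extra arguments such as callbacks are assumed to be passed on to keras::fit()
  callbacks = list(keras::callback_early_stopping(monitor = "loss", patience = 10L))
)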
library(reservr)
library(tensorflow)
library(keras)
library(tibble)
library(ggplot2)
The following example shows the code necessary to fit a simple model, with the same distributional assumptions as OLS, to simulated data. As the true relationship we use \(y = 2 x + \epsilon\) with \(\epsilon \sim \mathcal{N}(0, 1)\). We will not use censoring or truncation.
if (keras::is_keras_available()) {
  set.seed(1431L)
  tensorflow::set_random_seed(1432L)

  dataset <- tibble(
    x = runif(100, min = 10, max = 20),
    y = 2 * x + rnorm(100)
  )

  ggplot2::qplot(x, y, data = dataset)
  # Specify distributional assumption of OLS:
  dist <- dist_normal(sd = 1.0) # OLS assumption: homoskedasticity

  # Optional: Compute a global fit
  global_fit <- fit(dist, dataset$y)
  # Define a neural network
  nnet_input <- layer_input(shape = 1L, name = "x_input")
  # in practice, this would be deeper
  nnet_output <- nnet_input

  optimizer <- if (packageVersion("keras") >= "2.6.0") {
    optimizer_adam(learning_rate = 0.1)
  } else {
    optimizer_adam(lr = 0.1)
  }
  nnet <- tf_compile_model(
    inputs = list(nnet_input),
    intermediate_output = nnet_output,
    dist = dist,
    optimizer = optimizer,
    censoring = FALSE, # Turn off unnecessary features for this problem
    truncation = FALSE
  )
  nnet_fit <- fit(
    nnet,
    x = dataset$x,
    y = dataset$y,
    epochs = 100L,
    batch_size = 100L,
    shuffle = FALSE
  )
  plot(nnet_fit)
  pred_params <- predict(nnet, data = list(k_constant(dataset$x)))

  lm_fit <- lm(y ~ x, data = dataset)

  dataset$y_pred <- pred_params$mean
  dataset$y_lm <- predict(lm_fit, newdata = dataset, type = "response")
  ggplot(dataset, aes(x = x, y = y)) +
    geom_point() +
    geom_line(aes(y = y_pred)) +
    geom_line(aes(y = y_lm), linetype = 2L)
  coef_nnet <- rev(as.numeric(nnet$model$get_weights()))
  coef_lm <- coef(lm_fit)

  print(coef_nnet)
  print(coef_lm)
}
#> Loaded Tensorflow version 2.9.2
#> Warning: `qplot()` was deprecated in ggplot2 3.4.0.
#> [1] 0.8800764 1.9325488
#> (Intercept)           x
#>   0.5645856   1.9574191
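Both fits recover a slope close to the true value of 2 (about 1.93 for the network versus 1.96 for lm()); the remaining difference, most visible in the intercept, comes from the approximate, gradient-based optimization of the network.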