EcoDiet explored on a realistic example

Heloise Thero, Pierre-Yves Hernvann

2023-01-06

The introduction employed a simplistic expemple of food web to familiarize the user with the basic commands and options of the EcoDiet package. Here we will use a more realistic example (although still artificial!) to run the different EcoDiet configurations, compare their results and hence higlight the complementarity in the different data used.

The data corresponds to 10 trophic groups with stomach content data, and very distinct isotopic measures.

realistic_stomach_data_path <- system.file("extdata", "realistic_stomach_data.csv",
                                           package = "EcoDiet")
realistic_stomach_data <- read.csv(realistic_stomach_data_path)
knitr::kable(realistic_stomach_data)
X Cod Pout Sardine Shrimps Crabs Bivalves Worms Zooplankton Phytoplankton Detritus
Cod 0 0 0 0 0 0 0 0 0 0
Pout 1 0 0 0 0 0 0 0 0 0
Sardine 9 0 0 0 0 0 0 0 0 0
Shrimps 4 4 29 0 24 0 0 0 0 0
Crabs 1 24 0 0 0 0 0 0 0 0
Bivalves 0 3 0 0 11 0 0 0 0 0
Worms 16 30 0 1 24 0 0 0 0 0
Zooplankton 0 27 6 3 0 0 0 0 0 0
Phytoplankton 0 0 14 10 0 16 0 20 0 0
Detritus 0 0 0 12 19 18 18 0 0 0
full 21 30 29 19 29 27 18 20 0 0
realistic_biotracer_data_path <- system.file("extdata", "realistic_biotracer_data.csv",
                                           package = "EcoDiet")
realistic_biotracer_data <- read.csv(realistic_biotracer_data_path)
knitr::kable(realistic_biotracer_data[c(1:3, 31:33, 61:63), ])
group d13C d15N
1 Cod -12.94144 19.18913
2 Cod -14.96070 20.23939
3 Cod -13.77822 19.48809
31 Pout -13.47127 18.57353
32 Pout -13.16888 17.58714
33 Pout -14.23085 17.38938
61 Sardine -14.56111 16.95231
62 Sardine -15.04729 17.15197
63 Sardine -14.63688 16.90906
library(EcoDiet)

plot_data(biotracer_data = realistic_biotracer_data,
          stomach_data = realistic_stomach_data)

#> Warning: Use of `biotracer_data$group` is discouraged. Use `group` instead.

Yes, we are aware that isotopic data is usually messier, but isn’t it a beautiful plot?

The configuration without literature data

We define the configuration we are in, and preprocess the data:

literature_configuration <- FALSE

data <- preprocess_data(biotracer_data = realistic_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        stomach_data = realistic_stomach_data)
#> The model will investigate the following trophic links:
#>               Bivalves Cod Crabs Detritus Phytoplankton Pout Sardine Shrimps
#> Bivalves             0   0     1        0             0    1       0       0
#> Cod                  0   0     0        0             0    0       0       0
#> Crabs                0   1     0        0             0    1       0       0
#> Detritus             1   0     1        0             0    0       0       1
#> Phytoplankton        1   0     0        0             0    0       1       1
#> Pout                 0   1     0        0             0    0       0       0
#> Sardine              0   1     0        0             0    0       0       0
#> Shrimps              0   1     1        0             0    1       1       0
#> Worms                0   1     1        0             0    1       0       1
#> Zooplankton          0   0     0        0             0    1       1       1
#>               Worms Zooplankton
#> Bivalves          0           0
#> Cod               0           0
#> Crabs             0           0
#> Detritus          1           0
#> Phytoplankton     0           1
#> Pout              0           0
#> Sardine           0           0
#> Shrimps           0           0
#> Worms             0           0
#> Zooplankton       0           0

In this configuration, priors are set for each trophic link identified as plausible by the user but the priors are not informed by literature data, and are thus uninformative:

plot_prior(data, literature_configuration)

The marginal prior distributions have different shape depending on the variables:

plot_prior(data, literature_configuration, pred = "Pout")

We define the model, and test if it compiles well with a few iterations and adaptation steps:

filename <- "mymodel.txt"
write_model(file.name = filename, literature_configuration = literature_configuration, print.model = F)
mcmc_output <- run_model(filename, data, run_param="test")
#> 
#> Processing function input....... 
#> 
#> Done. 
#>  
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 316
#>    Unobserved stochastic nodes: 125
#>    Total graph size: 1104
#> 
#> Initializing model
#> 
#> Adaptive phase, 500 iterations x 3 chains 
#> If no progress bar appears JAGS has decided not to adapt 
#>  
#> 
#>  Burn-in phase, 500 iterations x 3 chains 
#>  
#> 
#> Sampling from joint posterior, 500 iterations x 3 chains 
#>  
#> 
#> Calculating statistics.......
#> Warning in doTryCatch(return(expr), name, parentenv, handler): At least one Rhat
#> value could not be calculated.
#> 
#> Done.
#> 
#>   /!\ Convergence warning:
#> Out of the 51 variables, 12 variables have a Gelman-Rubin statistic > 1.1.
#> You may consider modifying the model run settings.
#> The variables with the poorest convergence are: PI[6,2], PI[7,2], PI[3,2], PI[1,6], PI[9,8], PI[8,6], PI[5,8], PI[4,1], PI[5,1], PI[5,7].
#> JAGS output for model 'mymodel.txt', generated by jagsUI.
#> Estimates based on 3 chains of 1000 iterations,
#> adaptation = 500 iterations (sufficient),
#> burn-in = 500 iterations and thin rate = 1,
#> yielding 1500 total samples from the joint posterior. 
#> MCMC ran for 0.515 minutes at time 2023-01-06 12:43:28.
#> 
#>              mean     sd    2.5%     50%   97.5% overlap0 f  Rhat n.eff
#> eta[4,1]    0.657  0.085   0.486   0.663   0.813    FALSE 1 1.002   691
#> eta[5,1]    0.594  0.089   0.413   0.597   0.758    FALSE 1 1.005   386
#> eta[3,2]    0.087  0.057   0.010   0.074   0.231    FALSE 1 1.007   524
#> eta[6,2]    0.089  0.059   0.013   0.077   0.230    FALSE 1 1.003   626
#> eta[7,2]    0.441  0.104   0.245   0.441   0.645    FALSE 1 1.018   119
#> eta[8,2]    0.221  0.085   0.084   0.209   0.412    FALSE 1 1.001  1500
#> eta[9,2]    0.733  0.089   0.534   0.739   0.884    FALSE 1 1.005   347
#> eta[1,3]    0.387  0.086   0.232   0.384   0.565    FALSE 1 0.999  1500
#> eta[4,3]    0.647  0.082   0.484   0.651   0.794    FALSE 1 0.999  1500
#> eta[8,3]    0.808  0.070   0.648   0.814   0.926    FALSE 1 1.003   850
#> eta[9,3]    0.806  0.069   0.661   0.813   0.918    FALSE 1 1.001  1075
#> eta[1,6]    0.122  0.056   0.036   0.116   0.251    FALSE 1 1.002  1430
#> eta[3,6]    0.784  0.072   0.632   0.792   0.906    FALSE 1 1.004   488
#> eta[8,6]    0.153  0.062   0.051   0.147   0.290    FALSE 1 1.001  1500
#> eta[9,6]    0.969  0.030   0.894   0.979   0.999    FALSE 1 1.001  1440
#> eta[10,6]   0.875  0.059   0.739   0.886   0.962    FALSE 1 1.005   604
#> eta[5,7]    0.486  0.089   0.316   0.486   0.660    FALSE 1 1.006   338
#> eta[8,7]    0.968  0.031   0.891   0.977   0.999    FALSE 1 1.006   648
#> eta[10,7]   0.230  0.073   0.108   0.226   0.388    FALSE 1 1.000  1500
#> eta[4,8]    0.618  0.104   0.415   0.622   0.809    FALSE 1 1.009   199
#> eta[5,8]    0.525  0.108   0.314   0.523   0.729    FALSE 1 1.000  1500
#> eta[9,8]    0.093  0.061   0.013   0.081   0.246    FALSE 1 1.002   622
#> eta[10,8]   0.200  0.086   0.064   0.191   0.395    FALSE 1 1.001  1500
#> eta[4,9]    0.949  0.048   0.814   0.962   0.999    FALSE 1 1.004   765
#> eta[5,10]   0.952  0.045   0.836   0.965   0.999    FALSE 1 1.002  1222
#> PI[4,1]     0.397  0.321   0.000   0.340   1.000    FALSE 1 1.242    13
#> PI[5,1]     0.603  0.321   0.000   0.660   1.000    FALSE 1 1.242    13
#> PI[3,2]     0.143  0.236   0.000   0.012   0.847    FALSE 1 1.566     8
#> PI[6,2]     0.118  0.198   0.000   0.026   0.698    FALSE 1 1.631     8
#> PI[7,2]     0.401  0.325   0.000   0.390   0.963    FALSE 1 1.589     7
#> PI[8,2]     0.161  0.209   0.000   0.050   0.695    FALSE 1 1.033    66
#> PI[9,2]     0.177  0.159   0.000   0.146   0.526    FALSE 1 1.113    22
#> PI[1,3]     0.182  0.213   0.000   0.100   0.713    FALSE 1 1.059    41
#> PI[4,3]     0.192  0.171   0.000   0.161   0.573    FALSE 1 1.028    74
#> PI[8,3]     0.264  0.198   0.000   0.236   0.705    FALSE 1 1.127    21
#> PI[9,3]     0.363  0.256   0.000   0.329   0.879    FALSE 1 1.065    39
#> PI[1,6]     0.015  0.068   0.000   0.000   0.258    FALSE 1 1.349    31
#> PI[3,6]     0.334  0.211   0.000   0.324   0.759    FALSE 1 1.043    52
#> PI[8,6]     0.011  0.030   0.000   0.000   0.110    FALSE 1 1.320    20
#> PI[9,6]     0.362  0.222   0.017   0.331   0.814    FALSE 1 1.003   707
#> PI[10,6]    0.279  0.213   0.000   0.258   0.747    FALSE 1 1.045    51
#> PI[5,7]     0.204  0.178   0.000   0.185   0.563    FALSE 1 1.213    14
#> PI[8,7]     0.544  0.242   0.051   0.568   0.995    FALSE 1 1.040    71
#> PI[10,7]    0.252  0.294   0.000   0.119   0.920    FALSE 1 1.086    30
#> PI[4,8]     0.243  0.198   0.000   0.216   0.726    FALSE 1 1.020   104
#> PI[5,8]     0.287  0.272   0.000   0.226   1.000    FALSE 1 1.255    15
#> PI[9,8]     0.078  0.127   0.000   0.009   0.458    FALSE 1 1.324    11
#> PI[10,8]    0.391  0.269   0.000   0.408   0.902    FALSE 1 1.020   129
#> PI[4,9]     1.000  0.000   1.000   1.000   1.000    FALSE 1    NA     1
#> PI[5,10]    1.000  0.000   1.000   1.000   1.000    FALSE 1    NA     1
#> deviance  867.210 11.529 846.874 866.513 892.169    FALSE 1 1.017   146
#> 
#> **WARNING** Rhat values indicate convergence failure. 
#> Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
#> For each parameter, n.eff is a crude measure of effective sample size. 
#> 
#> overlap0 checks if 0 falls in the parameter's 95% credible interval.
#> f is the proportion of the posterior with the same sign as the mean;
#> i.e., our confidence that the parameter is positive or negative.
#> 
#> DIC info: (pD = var(deviance)/2) 
#> pD = 65.6 and DIC = 932.836 
#> DIC is an estimate of expected predictive error (lower is better).

You should now try to run the model until it converges (it should take around half an hour to run, so we won’t do it in this vignette):

mcmc_output <- run_model(filename, data, run_param="normal", parallelize = T)

Here are the figures corresponding to the results that have converged:

plot_results(mcmc_output, data)

plot_results(mcmc_output, data, pred = "Pout")

You can also plot the results for specific prey if you want a clearer figure:

plot_results(mcmc_output, data, pred = "Pout", 
             variable = "PI", prey = c("Bivalves", "Worms"))

The configuration with literature data

We now change the configuration to add literature data to the model:

literature_configuration <- TRUE
realistic_literature_diets_path <- system.file("extdata", "realistic_literature_diets.csv",
                                               package = "EcoDiet")
realistic_literature_diets <- read.csv(realistic_literature_diets_path)
knitr::kable(realistic_literature_diets)
X Cod Pout Sardine Shrimps Crabs Bivalves Worms Zooplankton Phytoplankton Detritus
Cod 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0 0.0 0 0
Pout 0.4275065 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0 0.0 0 0
Sardine 0.3603675 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0 0.0 0 0
Shrimps 0.0300737 0.5295859 0.0002143 0.0000000 0.0082107 0.0000000 0.0 0.0 0 0
Crabs 0.1410430 0.3332779 0.0000000 0.0000000 0.0000000 0.0000000 0.0 0.0 0 0
Bivalves 0.0000000 0.0006130 0.0000000 0.0000000 0.3441081 0.0000000 0.0 0.0 0 0
Worms 0.0410093 0.1023676 0.0000000 0.0171336 0.4435377 0.0000000 0.0 0.0 0 0
Zooplankton 0.0000000 0.0341557 0.7381375 0.9121505 0.0000000 0.0000000 0.0 0.0 0 0
Phytoplankton 0.0000000 0.0000000 0.2616482 0.0000610 0.0000000 0.9966847 0.0 1.0 0 0
Detritus 0.0000000 0.0000000 0.0000000 0.0706550 0.2041434 0.0033153 1.0 0.0 0 0
pedigree 0.8000000 0.1000000 0.5000000 0.3000000 0.7000000 0.1000000 0.2 0.2 1 1
data <- preprocess_data(biotracer_data = realistic_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        stomach_data = realistic_stomach_data,
                        literature_diets = realistic_literature_diets,
                        nb_literature = 12,
                        literature_slope = 0.5)
#> The model will investigate the following trophic links:
#>               Bivalves Cod Crabs Detritus Phytoplankton Pout Sardine Shrimps
#> Bivalves             0   0     1        0             0    1       0       0
#> Cod                  0   0     0        0             0    0       0       0
#> Crabs                0   1     0        0             0    1       0       0
#> Detritus             1   0     1        0             0    0       0       1
#> Phytoplankton        1   0     0        0             0    0       1       1
#> Pout                 0   1     0        0             0    0       0       0
#> Sardine              0   1     0        0             0    0       0       0
#> Shrimps              0   1     1        0             0    1       1       0
#> Worms                0   1     1        0             0    1       0       1
#> Zooplankton          0   0     0        0             0    1       1       1
#>               Worms Zooplankton
#> Bivalves          0           0
#> Cod               0           0
#> Crabs             0           0
#> Detritus          1           0
#> Phytoplankton     0           1
#> Pout              0           0
#> Sardine           0           0
#> Shrimps           0           0
#> Worms             0           0
#> Zooplankton       0           0

Now we see that the prior distributions are informed by the literature data:

plot_prior(data, literature_configuration)

plot_prior(data, literature_configuration, pred = "Pout")

Again, we verify that the model compiles well:

filename <- "mymodel_literature.txt"
write_model(file.name = filename, literature_configuration = literature_configuration, print.model = F)
mcmc_output <- run_model(filename, data, run_param="test")
#> 
#> Processing function input....... 
#> 
#> Done. 
#>  
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 316
#>    Unobserved stochastic nodes: 125
#>    Total graph size: 1594
#> 
#> Initializing model
#> 
#> Adaptive phase, 500 iterations x 3 chains 
#> If no progress bar appears JAGS has decided not to adapt 
#>  
#> 
#>  Burn-in phase, 500 iterations x 3 chains 
#>  
#> 
#> Sampling from joint posterior, 500 iterations x 3 chains 
#>  
#> 
#> Calculating statistics.......
#> Warning in doTryCatch(return(expr), name, parentenv, handler): At least one Rhat
#> value could not be calculated.
#> 
#> Done.
#> 
#>   /!\ Convergence warning:
#> Out of the 51 variables, 15 variables have a Gelman-Rubin statistic > 1.1.
#> You may consider modifying the model run settings.
#> The variables with the poorest convergence are: PI[10,8], PI[3,2], PI[5,8], PI[8,6], PI[4,1], PI[5,1], PI[9,8], PI[1,6], PI[8,7], PI[10,6].
#> JAGS output for model 'mymodel_literature.txt', generated by jagsUI.
#> Estimates based on 3 chains of 1000 iterations,
#> adaptation = 500 iterations (sufficient),
#> burn-in = 500 iterations and thin rate = 1,
#> yielding 1500 total samples from the joint posterior. 
#> MCMC ran for 0.568 minutes at time 2023-01-06 12:44:06.
#> 
#>              mean     sd    2.5%     50%   97.5% overlap0 f  Rhat n.eff
#> eta[4,1]    0.668  0.084   0.492   0.672   0.818    FALSE 1 1.007   272
#> eta[5,1]    0.604  0.087   0.431   0.608   0.775    FALSE 1 1.001  1399
#> eta[3,2]    0.355  0.084   0.201   0.350   0.523    FALSE 1 1.001  1455
#> eta[6,2]    0.354  0.082   0.206   0.349   0.521    FALSE 1 1.007   318
#> eta[7,2]    0.601  0.082   0.436   0.601   0.757    FALSE 1 1.002   659
#> eta[8,2]    0.446  0.085   0.281   0.444   0.615    FALSE 1 1.005   417
#> eta[9,2]    0.818  0.067   0.670   0.825   0.927    FALSE 1 1.001  1500
#> eta[1,3]    0.529  0.078   0.364   0.530   0.674    FALSE 1 1.000  1500
#> eta[4,3]    0.719  0.073   0.566   0.727   0.848    FALSE 1 1.001  1018
#> eta[8,3]    0.848  0.057   0.717   0.855   0.937    FALSE 1 1.005   460
#> eta[9,3]    0.854  0.054   0.735   0.857   0.942    FALSE 1 1.001  1301
#> eta[1,6]    0.159  0.063   0.057   0.153   0.296    FALSE 1 1.000  1500
#> eta[3,6]    0.790  0.072   0.632   0.797   0.912    FALSE 1 1.002   669
#> eta[8,6]    0.188  0.063   0.082   0.182   0.325    FALSE 1 1.003   495
#> eta[9,6]    0.970  0.030   0.893   0.979   0.999    FALSE 1 1.002  1500
#> eta[10,6]   0.880  0.057   0.745   0.889   0.965    FALSE 1 1.001  1500
#> eta[5,7]    0.566  0.081   0.410   0.564   0.723    FALSE 1 1.005   391
#> eta[8,7]    0.974  0.026   0.908   0.982   0.999    FALSE 1 1.006  1043
#> eta[10,7]   0.368  0.079   0.228   0.367   0.524    FALSE 1 1.001  1094
#> eta[4,8]    0.672  0.090   0.489   0.675   0.832    FALSE 1 1.000  1500
#> eta[5,8]    0.593  0.100   0.389   0.596   0.777    FALSE 1 1.002   761
#> eta[9,8]    0.229  0.080   0.094   0.220   0.401    FALSE 1 1.001  1500
#> eta[10,8]   0.304  0.092   0.140   0.301   0.497    FALSE 1 1.008   254
#> eta[4,9]    0.953  0.045   0.830   0.966   0.999    FALSE 1 1.005   946
#> eta[5,10]   0.958  0.041   0.849   0.970   0.999    FALSE 1 1.002  1214
#> PI[4,1]     0.198  0.324   0.000   0.003   1.000    FALSE 1 1.439     9
#> PI[5,1]     0.802  0.324   0.000   0.997   1.000    FALSE 1 1.439     9
#> PI[3,2]     0.207  0.293   0.000   0.040   0.949    FALSE 1 1.687     7
#> PI[6,2]     0.271  0.296   0.000   0.123   0.890    FALSE 1 1.042    55
#> PI[7,2]     0.366  0.334   0.000   0.317   0.965    FALSE 1 1.199    14
#> PI[8,2]     0.071  0.126   0.000   0.013   0.435    FALSE 1 1.282    17
#> PI[9,2]     0.085  0.103   0.000   0.042   0.350    FALSE 1 1.021   122
#> PI[1,3]     0.364  0.216   0.001   0.353   0.798    FALSE 1 1.030    74
#> PI[4,3]     0.100  0.106   0.000   0.068   0.362    FALSE 1 1.025    95
#> PI[8,3]     0.049  0.078   0.000   0.010   0.277    FALSE 1 1.134    38
#> PI[9,3]     0.487  0.233   0.060   0.477   0.971    FALSE 1 1.028   106
#> PI[1,6]     0.021  0.079   0.000   0.000   0.199    FALSE 1 1.425    16
#> PI[3,6]     0.337  0.264   0.000   0.334   0.876    FALSE 1 1.228    13
#> PI[8,6]     0.236  0.315   0.000   0.007   0.915    FALSE 1 1.547     7
#> PI[9,6]     0.206  0.199   0.000   0.150   0.673    FALSE 1 1.008   264
#> PI[10,6]    0.200  0.205   0.000   0.138   0.663    FALSE 1 1.348     9
#> PI[5,7]     0.079  0.120   0.000   0.012   0.420    FALSE 1 1.083    44
#> PI[8,7]     0.067  0.179   0.000   0.000   0.673    FALSE 1 1.350    17
#> PI[10,7]    0.853  0.243   0.001   0.949   1.000    FALSE 1 1.318    18
#> PI[4,8]     0.215  0.239   0.000   0.136   0.879    FALSE 1 1.028   104
#> PI[5,8]     0.143  0.231   0.000   0.005   0.785    FALSE 1 1.569     8
#> PI[9,8]     0.299  0.302   0.000   0.217   0.911    FALSE 1 1.432     8
#> PI[10,8]    0.343  0.372   0.000   0.121   0.994    FALSE 1 2.010     5
#> PI[4,9]     1.000  0.000   1.000   1.000   1.000    FALSE 1    NA     1
#> PI[5,10]    1.000  0.000   1.000   1.000   1.000    FALSE 1    NA     1
#> deviance  865.826 11.103 845.743 865.696 888.519    FALSE 1 1.010   208
#> 
#> **WARNING** Rhat values indicate convergence failure. 
#> Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
#> For each parameter, n.eff is a crude measure of effective sample size. 
#> 
#> overlap0 checks if 0 falls in the parameter's 95% credible interval.
#> f is the proportion of the posterior with the same sign as the mean;
#> i.e., our confidence that the parameter is positive or negative.
#> 
#> DIC info: (pD = var(deviance)/2) 
#> pD = 61.1 and DIC = 926.945 
#> DIC is an estimate of expected predictive error (lower is better).

You should now try to run the model until it converges (it should take around half an hour to run, so we won’t do it in this vignette):

mcmc_output <- run_model(filename, data, run_param=list(nb_iter=100000, nb_burnin=50000, nb_thin=50, nb_adapt=50000), parallelize = T)

Here are the figures corresponding to the results that have converged:

plot_results(mcmc_output, data)

plot_results(mcmc_output, data, pred = "Pout")

You can save the figures as PNG using:

plot_results(mcmc_output, data, pred = "Pout", save = TRUE, save_path = ".")