RFPM
provides an implementation of the Floating
Percentile Model (FPM) in the R statistical environment. The FPM was
originally developed by others on behalf of the Washington State
Department of Ecology (Avocet 2003) and has since been recommended for
use in Washington, Oregon, and Idaho (Ecology 2011). The FPM currently
exists as a Microsoft Excel macro (written with Visual Basic for
Applications). RFPM
represents an independent effort to
port the FPM to R with the purpose of:
- correcting a data handling error
- streamlining the chemical selection and sediment quality benchmark
calculation procedures
- providing new tools to rapidly evaluate and refine benchmarks
- improving several aspects of the FPM algorithm
- improving the availability, accessibility, and transparency of the FPM
tool
The purpose of this vignette is to lead the reader through an example
analysis of real-world data using the core functions of
RFPM
rather than go into great detail on any particular
function. For more function details, see the help files by using
?
or help()
(or by navigating in RStudio to
the Packages tab, selecting RFPM from the list of available packages,
and then selecting the function of interest).
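For example, either of the following opens the help page for the FPM function (assuming RFPM is installed):
library(RFPM)
?FPM          ## shorthand
help("FPM")   ## equivalent long form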
The main user interface within RFPM
is via the
FPM
function. To use this function, the user must first
prepare a sediment chemistry and toxicity data.frame that contains one
or more chemical columns and a single toxicity column called
Hit
. Several example datasets are provided with
RFPM
with this structure, for example
h.northport
, which we will evaluate repeatedly in this
vignette. Other datasets include the empirical c.northport
and h.tristate
as well as simulated FPM data
perfect
, lowNoise
, and
highNoise
.
Other key functions to which the user is directed include optimFPM, cvFPM, and chemVI, which
can help the user to explore and refine FPM inputs and optimize FPM
outputs.
In this vignette, we focus on evaluating a case study dataset called
h.northport
, which includes a data.frame of bulk sediment
chemistry metals concentrations (mg/kg) and Hit
data. Hits
are logical results indicating whether a particular sample was deemed
toxic (TRUE
) or not (FALSE
) based on the
Hyalella azteca biomass endpoint in 28-day laboratory
exposures. The definition of a toxic Hit is up to the practitioner and
not a part of the model. Data must be input to FPM
in a
cross-tab (wide) format, meaning each sediment chemical has a unique
column and each sample has just one row.
head(h.northport)
#> Al As Cu Cd Cr Fe Pb Hg Ni Zn Organism
#> 442 8640 8.38 396.0 3.740 34.6 56900 226.0 0.159 13.20 4300 HA
#> 443 5890 7.69 167.0 4.860 18.7 28500 219.0 0.247 14.00 1780 HA
#> 444 10200 11.10 913.0 0.727 50.8 83300 151.0 0.051 9.91 6490 HA
#> 445 15800 14.60 1630.0 0.688 100.0 161000 211.0 0.016 13.10 13600 HA
#> 446 5950 2.55 30.4 0.721 11.5 14600 63.8 0.020 11.20 484 HA
#> 447 8370 8.74 644.0 0.774 39.3 62400 99.2 0.027 9.68 4740 HA
#> Meas_Day Endpoint Hit
#> 442 28 Biomass FALSE
#> 443 28 Biomass FALSE
#> 444 28 Biomass TRUE
#> 445 28 Biomass TRUE
#> 446 28 Biomass FALSE
#> 447 28 Biomass FALSE
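If your data are instead in long format (one row per sample-chemical pair), they must first be reshaped. Below is a minimal sketch using base R; the long-format object and its column names (SampleID, Hit, Chemical, Conc) are hypothetical:
## hypothetical long-format data: one row per sample-chemical pair
long <- data.frame(SampleID = rep(1:3, each = 2),
                   Hit      = rep(c(FALSE, TRUE, FALSE), each = 2),
                   Chemical = rep(c("Cu", "Zn"), times = 3),
                   Conc     = c(30.4, 484, 396, 4300, 167, 1780))
## pivot to the cross-tab (wide) format expected by FPM
wide <- reshape(long, idvar = c("SampleID", "Hit"), timevar = "Chemical",
                v.names = "Conc", direction = "wide")
names(wide) <- sub("^Conc\\.", "", names(wide))  ## columns: SampleID, Hit, Cu, Zn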
To run FPM
, you will need to specify, at a minimum, the
data.frame object and the column names associated with sediment
chemistry data. As noted above, FPM
also requires that the
data.frame include a column called Hit
. Columns other than
Hit
and those specified using the paramList
argument are ignored.
p.northport <- names(h.northport)[1:10] ## all chemical column names
FPM(data = h.northport, paramList = p.northport) ## minimum input - dataset and chemical column names
#> $FPM
#> Cu Cr Fe Zn FN_crit TP FN TN FP pFN pFP HR NR PHR PNR
#> 1 2740 26.7 27400 738 0.2 49 12 35 51 0.197 0.593 0.803 0.407 0.49 0.745
#> FHR FNR OR
#> 1 0.51 0.255 0.571
#>
#> $chemDensity
#> Cu Cr Fe Zn
#> 1 1.36e-05 0.988 1 0.991
The output of FPM
displays three types of data:
- The sediment quality benchmark for each of the chemicals selected by
FPM
as significant (meaning that concentrations when
Hit == TRUE
are significantly greater than when
Hit == FALSE
).
- Selected FN_crit
value, the user-defined limit on
benchmark conservatism (default = 0.2
)
- Classification metrics: e.g., the false negative rate (pFN
),
false positive rate (pFP
), and overall reliability
(OR
)
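Because FPM returns a list whose $FPM element is a data.frame (as shown above), individual metrics and benchmarks can be pulled out directly:
out <- FPM(data = h.northport, paramList = p.northport)
out$FPM$OR                            ## overall reliability
out$FPM$pFN                           ## false negative rate
out$FPM[, c("Cu", "Cr", "Fe", "Zn")]  ## the benchmark set itself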
This is technically all that was required to run the FPM on this
example dataset. There is of course a lot going on behind the scenes.
Functions that were used within FPM include:
- chemSig
: chemical selection algorithm, which evaluates
distribution assumptions of normality and equal variance, then applies
appropriate hypothesis test methods to identify significant differences
between concentrations when Hit == TRUE
and
Hit == FALSE
. The output of chemSig
is a
logical vector indicating which chemicals are significant (therefore
selected by FPM
for generating benchmarks).
- chemSigSelect
: a function that uses chemSig
but instead returns the chemistry data for significant chemicals and has
options for plotting chemistry data distributions to compare
Hit == TRUE
and Hit == FALSE
subsets. Setting
plot = TRUE
in FPM
turns on plot outputs (by
passing that argument to chemSigSelect
).
As an example, see the following figures generated by
plot.chemSigSelect
(identical to those generated by
FPM
when plot = TRUE
) for a significant
chemical (Cr
), shown with blue points, and a
non-significant chemical (Al
), shown with green points:
plot(chemSigSelect(h.northport[, c("Al", "Cr", "Hit")], paramList = c("Al", "Cr")), type = "boxplot")
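The selection step can also be run on its own. A minimal sketch, assuming chemSig takes the same data and paramList arguments that FPM does (consistent with its use inside FPM):
sig <- chemSig(h.northport, paramList = p.northport)  ## logical vector of significance
p.northport[sig]  ## chemicals that would be retained for benchmark generation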
There are also many arguments that can be adjusted in
FPM
that affect how chemSig
and
chemSigSelect
function or how the FPM algorithm itself
functions. Users can adjust the test methods/assumptions for selecting
chemicals, force the FPM to accept the assumptions of the original
Excel-based function (i.e., setting ExcelMode = TRUE
), etc.
See ?FPM
and ?chemSig
for more
information.
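For example, a run that mimics the original Excel-based tool might look like the following (a sketch; all other arguments left at their defaults):
FPM(data = h.northport, paramList = p.northport, ExcelMode = TRUE)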
In RFPM
, there are currently two optimization functions
available:
- optimFPM is intended to find optimal values of the FN_crit and/or alpha inputs to FPM. These optima are based on the full dataset (the data argument) and, therefore, may be overfit to those data. The FN_crit
and alpha
arguments should
be input either as ranges or single values depending on how you want the
optimization to run. Inputting a single value for FN_crit
and a range for alpha
would optimize the alpha
value only and vice-versa. Inputting ranges for both values results in
slightly more complex output owing to the 2-dimensional
optimization.
- cvFPM
is currently parameterized to optimize
FN_crit
inputs using a cross-validation (CV) approach. The
result is a “best” value that tries to account for uncertainty in future
data. The user can adjust the k
parameter to change the
number of folds in the CV algorithm. This affects the smoothness of the
mean/median optimization curves by generating more hypothetical
datapoints; it also is likely to expand the min/max range of
optimization values by providing more chances for extreme outcomes. By
using larger training datasets (i.e., with larger k
inputs), the CV-based optimum will converge toward the empirical
optimization curve generated by optimFPM
(and shown in the
output of cvFPM
). The value of k
is
constrained (e.g., the leave-one-out CV method is impossible here) by
the requirement to have at least 3 Hit and 3 No-hit data in the test and
training datasets, else FPM
and chemSig
cannot
run within the cvFPM
algorithm. Thus, we recommend not
increasing k
by a large amount unless data
is
also very large.
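As a rough heuristic (our assumption, not a constraint enforced by cvFPM), the 3-Hit/3-No-hit requirement caps k at about one third of the rarer Hit class, assuming near-equal stratified folds:
n.rare <- min(table(h.northport$Hit))  ## size of the rarer Hit class
floor(n.rare / 3)                      ## approximate upper bound on k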
This is an example of the empirical optimization approach with
optimFPM
:
## one-way optimization of the FN Limit - vertical lines show best values based on two metrics
optimFPM(h.northport, p.northport, FN_crit = seq(0.1, 0.9, 0.05), alpha = 0.05)
#> balance_FN.FP max_OR
#> 0.4 0.6
## two-way optimization of both FN Limit and alpha - black squares show best values based on two metrics
optimFPM(h.northport, p.northport,
FN_crit = seq(0.1, 0.9, 0.05),
alpha = seq(0.01, 0.2, 0.01))
#> balance_FN.FP max_OR
#> FN_crit 0.40 0.25
#> alpha 0.01 0.11
The resulting plots show a mix of potential outcomes. When only one
parameter is being optimized (i.e.,
length(alpha) == 1 | length(FN_crit) == 1
), then the first
plot is produced, showing the rates of FNs, FPs, and OR. Vertical lines
identify optimal values: either the highest OR value (lowest
alpha
or FN_crit
value when there are ties) or
the point at which the FN and FP rates are most similar (i.e., balanced
under- and overprediction of toxicity).
In the case that two parameters are being optimized together (i.e.,
length(alpha) > 1 & length(FN_crit) > 1
), then a
matrix of results is provided, with heat colors indicating the optimization outcome (yellow = suboptimal, red = optimal). Squares
indicate the selected values, which are also output into the console. A
black square indicates which value applies to the stated optimization
statistic; the gray square is the value selected according to the
alternative statistic (i.e., the second figure).
This is an example of the CV-based optimization approach with
cvFPM
and k = 10 folds:
cvFPM(h.northport, p.northport, k = 10, FN_crit = seq(0.1, 0.9, 0.05), alpha = 0.05)
#> $optim_FN
#> balance_FN.FP max_OR
#> 0.50 0.65
Two figures are generated showing the optimization statistics from
multiple CV runs (as well as the empirical result from
optimFPM
- the “actual” value) and how stable the
classification metrics may be at various model inputs. The first figure
shows the optimization of the overall reliability; the second minimizes
the difference between FP and FN rates, similar to
optimFPM
. Optimized outputs attempt to account for the
mean, median, maximum, and minimum results (among CV runs)
altogether.
Based on the output from either optimFPM
or
cvFPM
, the practitioner may decide to adjust and rerun the
FPM
using optimized inputs. The decision of whether this is
appropriate will depend on the specific management scenario. For
example, the optimal FN Limit could end up being 80%, but that lack of
conservatism may be unacceptable to a regulator. Similarly, the optimal
alpha could be 0.5, meaning that the regulator would need to accept a
50% probability of error when selecting significant chemicals; this is a
very high uncertainty, 10-fold higher than typical. Regardless,
optimization provides useful context to the RFPM
user.
Using both optimFPM
and cvFPM
is recommended,
because they can provide somewhat different or complementary results
that facilitate decision making about subsequent FPM
runs.
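For example, a rerun using the two-way max_OR optima reported by optimFPM above (hard-coded here for clarity):
FPM(h.northport, p.northport, FN_crit = 0.25, alpha = 0.11)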
The chemVI
function outputs several potentially useful
metrics for evaluating individual metals in the sediment quality
benchmark set. These metrics are:
- chemDensity
: how little the benchmark “floated” within
the FPM algorithm. High density values correspond to benchmarks that
remained at relatively low concentrations (i.e., did not float up),
suggesting relative importance
- MADP
: average change in other chemicals' benchmarks
resulting from removal of a chemical from data
- dOR
: change in the overall reliability resulting from
removal of the chemical from data
The MADP
and dOR
metrics are the most
useful for identifying important chemicals; for example:
chemVI(h.northport, p.northport)
#> chemDensity MADP dOR
#> Cu 0.001 0.00 0.0
#> Cr 98.800 996.00 2.0
#> Fe 100.000 8.52 0.0
#> Zn 99.100 32.50 0.6
Here we see that copper (Cu
) has an MADP
and dOR
of 0
, meaning that there was no change
in the benchmarks or their ability to predict toxicity after removing
copper. Clearly copper is, therefore, not important in this case and
could be excluded from further consideration. However, that
isn’t to say that copper isn’t related to toxicity; it’s very important
to keep in mind that the FPM is a classification tool that attempts to
accurately predict Hit
values.
The significant chemicals in this list are mostly well correlated, so
toxicity could be related to one, all, or perhaps none of the chemicals
(i.e., an unmeasured but related chemical or non-chemical stressor). The
inclusion of new chemicals in (or exclusion of important chemicals from)
data
can result in markedly different benchmarks and
chemVI
metrics (particularly for the
chemDensity
metric). Therefore, we recommend pre-screening
data
to remove chemicals that are unlikely to cause
toxicity (e.g., due to poor bioavailability, low toxicity, or uniformly
low concentrations). These chemicals, while potentially “significant”
with respect to having higher concentrations when
Hit == TRUE
, may lead to problems down the road when
predicting toxicity more generally. In other words, try to the extent
possible to establish a weight of evidence pointing toward causation
rather than letting pure correlation lead your analysis. In the current
example, we perhaps should have pre-screened chemicals like Fe and Al,
which we expected to have low toxicity at natural levels. Al was
ultimately screened out by FPM
as non-significant, but Fe
was significant and retained.
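Pre-screening is straightforward to implement; for example, dropping Fe and Al from the parameter list before rerunning FPM:
p.screened <- setdiff(p.northport, c("Fe", "Al"))
FPM(data = h.northport, paramList = p.screened)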
chemVI
can be helpful with respect to developing a
parsimonious benchmark set by identifying benchmarks that do not affect
prediction at the site. In our h.northport
example,
Cu
and Fe
will be removed because they don’t
affect toxicity predictions (either positively or negatively). Whenever
any chemicals would be removed from consideration through an iterative
FPM benchmark development process, FPM
must be rerun to
generate new benchmarks. Compare the two outputs:
FPM(data = h.northport, paramList = p.northport)
#> $FPM
#> Cu Cr Fe Zn FN_crit TP FN TN FP pFN pFP HR NR PHR PNR
#> 1 2740 26.7 27400 738 0.2 49 12 35 51 0.197 0.593 0.803 0.407 0.49 0.745
#> FHR FNR OR
#> 1 0.51 0.255 0.571
#>
#> $chemDensity
#> Cu Cr Fe Zn
#> 1 1.36e-05 0.988 1 0.991
FPM(data = h.northport, paramList = c("Cr", "Zn"))
#> $FPM
#> Cr Zn FN_crit TP FN TN FP pFN pFP HR NR PHR PNR FHR FNR
#> 1 26.1 910 0.2 49 12 35 51 0.197 0.593 0.803 0.407 0.49 0.745 0.51 0.255
#> OR
#> 1 0.571
#>
#> $chemDensity
#> Cr Zn
#> 1 1 0.983
For the FPM benchmarks generated for h.northport
, the
MADP
for Fe was >0, and the benchmark for Zn shifted
after removing Fe from consideration. Although the benchmark changed,
the classification errors did not, as reflected by the unchanged
OR
. The final FPM benchmark set including only
Cr
and Zn
is, therefore, the most
parsimonious, by which we mean that it classifies Hit
values just as accurately as the larger set of benchmarks but does so
with fewer benchmarks.
You might have noticed that we didn’t use the optimized inputs in the
example above for chemVI
. We omitted that to provide an
example where relatively unimportant chemicals could be screened out
using the metrics generated by chemVI
. If we instead change
the inputs of chemVI
to the optimized FN_crit
and alpha
(using the max_OR
optimization
approach), the chemVI
results look like this instead:
FPM(h.northport, p.northport, FN_crit = 0.25, alpha = 0.11)$FPM
#> Al Cu Cr Fe Pb Zn FN_crit TP FN TN FP pFN pFP HR NR
#> 1 24400 131 131 247250 904 1320 0.25 46 15 53 33 0.246 0.384 0.754 0.616
#> PHR PNR FHR FNR OR
#> 1 0.582 0.779 0.418 0.221 0.673
chemVI(h.northport, p.northport, FN_crit = 0.25, alpha = 0.11)
#> chemDensity MADP dOR
#> Al 60.200 435.0 12.2
#> Cu 99.500 13.4 3.4
#> Cr 0.512 0.0 0.0
#> Fe 1.060 0.0 0.0
#> Pb 0.090 0.0 0.0
#> Zn 100.000 22.1 13.6
In this case, adjusting alpha has allowed two new chemicals to be
added, Al and Pb, which were not selected when
alpha = 0.05
. After including these chemicals (and slightly
adjusting the FN_crit
), the final OR
increased
to 67%, a 10% improvement in benchmark accuracy over the default values
(not shown). Based on the results of chemVI
shown above,
there is a marked shift in the relative importance of metals for
toxicity predictions. Removing Al or Zn would result in a
roughly 10% reduction in accuracy (based on the dOR
metric), and removing Cu
now would have an effect on the
other benchmark values (as shown by the MADP
metric
>0). The chemDensity
value for Cr
has
dropped to <1%, meaning that the benchmark is now close to the
maximum possible value, whereas it had one of the highest
chemDensity
values before.
As you can see, variable importance is highly dependent on the set of benchmarks, not just individual benchmarks in isolation. The addition of new chemicals had a large impact not only on the relative importance of each chemical but also on the overall reliability of the set and on the FPM benchmark values themselves.
We want to emphasize several findings from our analyses of the
characteristics, behavior, and sensitivity of the FPM
(the
subject of two anticipated manuscripts).
FPM benchmarks should be treated as a set of values, not
values to be used singularly or mixed and matched between runs of the
FPM
using different species, toxicity test endpoints,
sites, etc. There may be a desire or an inclination to select the most
sensitive species and test endpoints among chemicals. Doing so, however,
would invalidate the various assumptions made when generating FPM
benchmarks, including the projected FN Limit (i.e., pFN
)
and the accuracy of benchmarks (i.e., OR
). If there are
multiple species that need to be accounted for by the model, consider
different approaches to defining Hit
values that capture
multiple species and/or endpoints prior to running
FPM
. The final set of FPM benchmarks should then be used as
a group to predict Hits. If highly protective benchmarks are desired,
then we recommend running the FPM
for all species and
endpoints of interest, selecting the endpoint with the greatest degree
of effect, and then either running the FPM
using that
endpoint to define Hits
, or, alternatively, running
FPM
on all endpoints and selecting the single lowest set
of benchmarks among endpoints. Lastly, the FN_crit
input
(i.e., FN Limit) can be decreased to increase benchmark conservatism (at
the possible expense of accuracy).
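A minimal sketch of the "lowest set" approach, assuming hypothetical per-endpoint data.frames dat.biomass and dat.survival that share the chemical columns in p.list; the classification-metric columns reported by FPM are dropped before taking minima:
fpm.bio  <- FPM(dat.biomass,  paramList = p.list)$FPM  ## hypothetical biomass data
fpm.surv <- FPM(dat.survival, paramList = p.list)$FPM  ## hypothetical survival data
stats <- c("FN_crit", "TP", "FN", "TN", "FP", "pFN", "pFP",
           "HR", "NR", "PHR", "PNR", "FHR", "FNR", "OR")
chems <- setdiff(intersect(names(fpm.bio), names(fpm.surv)), stats)
pmin(unlist(fpm.bio[chems]), unlist(fpm.surv[chems]))  ## most protective benchmark set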
FPM benchmarks should be considered as predictive of toxicity
classifications (i.e., Hit
values) rather than
continuous toxicity test results (e.g., 10% mortality, 27% biomass,
etc.). For this reason, we would generally argue against the application
of FPM benchmarks within a Natural Resource Damage Assessment (NRDA)
context. We recognize that future approaches might be proposed for
applying FPM benchmarks in a NRDA context; while theoretically possible,
such attempts should be considered carefully, particularly with regard
to how Hits are defined (and whether the definition reasonably
corresponds with definitions of injury).
FPM benchmarks are most powerful when they are site-specific. By
using site toxicity and sediment chemistry data, and by implicitly
considering factors like bioavailability, FPM benchmarks should improve
upon existing default benchmarks (Ecology 2011). Therefore, we recommend
using RFPM
and site-specific data to the extent practicable
to derive new benchmarks.
While it is most common to generate benchmarks based on bulk sediment concentrations, it is not necessary that this be the case. For example, benchmarks could reflect organic carbon-normalized sediment concentrations, pore water concentrations, etc. These measurements might improve the relevance of toxicity predictions by taking bioavailability more explicitly into account. Note, however, that benchmarks based on these types of data complicate application by requiring that all future samples be analyzed, and concentrations reported, in the same way.
While there may be a concern about mixing chemicals of different
types and mechanisms/modes of action (e.g., metals and polycyclic
aromatic hydrocarbons) when generating FPM benchmarks, it does not seem
to us that such a distinction should matter. From a conceptual
standpoint, the FPM could be used to predict any variable that is
distilled to a TRUE
/FALSE
classification from
a set of one or more independent variables (chemical or otherwise). The
key limitations that arise when mixing chemicals (and/or non-chemical
stressors) are:
- Variables input to FPM via data need to have a common directionality with respect to Hits. In other words, toxicity is expected to increase as, for example, copper increases. Therefore, you could not also include pore water acidity, which increases toxicity at both high and low pH. This limitation is related to the chemical selection process, which (by default) assumes that concentrations will be higher among toxic samples. It also relates to the iteratively increasing nature of the FPM algorithm; concentrations start at a low level and then shift in one direction (up).
- Correlated variables can add unnecessary complexity to the benchmark set. In the h.northport dataset presented above (using default FPM inputs), Fe was correlated with other metals (not shown) and, as a result, was selected for inclusion among the benchmarks despite being relatively non-toxic and contributing not at all to the predictive accuracy of the benchmark set. As described above, chemVI can help to identify potential simplifications to the benchmark set when correlated variables lead to unnecessary complexity. The user is also free to pre-screen their data to remove redundant variables.
- FPM has been found to work poorly or not at all with missing data. Consider removing analytes with incomplete datasets, omitting samples with partial analysis, and/or imputing values to fill data gaps.
FPM
does not currently address non-detection in a
meaningful way. As the model is largely based on percentiles and ranges
rather than averages, this shouldn’t generally lead to major problems as
long as there isn’t a high degree of non-detection (e.g., >20%). We
leave it to the user to address how non-detects will be handled and to
use FPM benchmarks based on non-detected data with caution. A high degree of non-detection of a chemical is a good reason for excluding it from data (or paramList), as there is not a reasonable way to use such a chemical to predict Hits. The use of more advanced censored data imputation methods (e.g., Kaplan-Meier or Regression on Order Statistics [ROS]) may prove useful, but these methods should also be used with care; they (like treating non-detects as half of detection limits or as zero) have the potential
to result in benchmarks that fall below detection limits. We would argue
that it is unreasonable to apply benchmarks that fall at or below
detection limits.
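One way to implement that exclusion (a sketch, assuming a hypothetical logical matrix nd in which TRUE marks a non-detect, with one column per chemical):
nd.rate <- colMeans(nd)  ## proportion of non-detects per chemical
p.use   <- setdiff(p.northport, names(nd.rate)[nd.rate > 0.2])
FPM(data = h.northport, paramList = p.use)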
As with other toxicity-based benchmarks, it may be important to
consider background chemical concentrations and reference area toxicity
levels when developing and applying FPM
benchmarks.
Reference area toxicity can be incorporated explicitly into
Hit definitions by normalizing toxicity data to the
reference condition prior to classifying Hits
, and
background can be used to qualify chemical exceedances of FPM benchmarks
that fall below background levels.
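As a sketch of the first approach, with hypothetical biomass and reference columns in dat, a Hit could be defined relative to the mean reference-area response (the 75% threshold is illustrative only):
ref.mean <- mean(dat$biomass[dat$reference])  ## mean response in reference areas
dat$Hit  <- (dat$biomass / ref.mean) < 0.75   ## Hit if <75% of reference response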
Avocet. 2003. Development of freshwater sediment quality values for use in Washington State. Phase II report: Development and recommendation of SQVs for freshwater sediments in Washington State. Publication No. 03-09-088. Prepared for Washington Department of Ecology. Avocet Consulting, Kenmore, WA.
Ecology. 2011. Development of benthic SQVs for freshwater sediments in Washington, Oregon, and Idaho. Publication no. 11-09-054. Toxics Cleanup Program, Washington State Department of Ecology, Olympia, WA.