This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.
library(PHEindicatormethods)
library(dplyr)
This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.
Function | Type | Description |
---|---|---|
phe_proportion | Non-aggregate | Performs a calculation on each row of data (unless data is grouped) |
phe_rate | Non-aggregate | Performs a calculation on each row of data (unless data is grouped) |
phe_mean | Aggregate | Performs a calculation on each grouping set |
phe_dsr | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs |
calculate_ISRatio | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs |
calculate_ISRate | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs |
The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:
<- data.frame(
df area = rep(c("Area1","Area2","Area3","Area4"), 2),
year = rep(2015:2016, each = 4),
obs = sample(100, 2 * 4, replace = TRUE),
pop = sample(100:200, 2 * 4, replace = TRUE))
df#> area year obs pop
#> 1 Area1 2015 9 102
#> 2 Area2 2015 54 177
#> 3 Area3 2015 88 139
#> 4 Area4 2015 84 158
#> 5 Area1 2016 99 135
#> 6 Area2 2016 93 138
#> 7 Area3 2016 28 162
#> 8 Area4 2016 55 198
INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.
OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.
Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified
# default proportion
phe_proportion(df, obs, pop)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 9 102 0.08823529 0.04711556 0.1592446 95% proportion of 1
#> 2 Area2 2015 54 177 0.30508475 0.24198941 0.3764609 95% proportion of 1
#> 3 Area3 2015 88 139 0.63309353 0.55039583 0.7086326 95% proportion of 1
#> 4 Area4 2015 84 158 0.53164557 0.45401284 0.6077760 95% proportion of 1
#> 5 Area1 2016 99 135 0.73333333 0.65303779 0.8007172 95% proportion of 1
#> 6 Area2 2016 93 138 0.67391304 0.59191304 0.7464930 95% proportion of 1
#> 7 Area3 2016 28 162 0.17283951 0.12237447 0.2384609 95% proportion of 1
#> 8 Area4 2016 55 198 0.27777778 0.22007129 0.3439430 95% proportion of 1
#> method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson
# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 9 102 0.08823529 0.03332059 0.2136507 99.8% proportion of 1
#> 2 Area2 2015 54 177 0.30508475 0.21040918 0.4197159 99.8% proportion of 1
#> 3 Area3 2015 88 139 0.63309353 0.50203957 0.7470356 99.8% proportion of 1
#> 4 Area4 2015 84 158 0.53164557 0.41069915 0.6489847 99.8% proportion of 1
#> 5 Area1 2016 99 135 0.73333333 0.60321518 0.8326216 99.8% proportion of 1
#> 6 Area2 2016 93 138 0.67391304 0.54286833 0.7824461 99.8% proportion of 1
#> 7 Area3 2016 28 162 0.17283951 0.10000154 0.2821011 99.8% proportion of 1
#> 8 Area4 2016 55 198 0.27777778 0.19138399 0.3846208 99.8% proportion of 1
#> method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson
# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#> area year obs pop value lowercl uppercl confidence statistic method
#> 1 Area1 2015 9 102 8.823529 4.711556 15.92446 95% percentage Wilson
#> 2 Area2 2015 54 177 30.508475 24.198941 37.64609 95% percentage Wilson
#> 3 Area3 2015 88 139 63.309353 55.039583 70.86326 95% percentage Wilson
#> 4 Area4 2015 84 158 53.164557 45.401284 60.77760 95% percentage Wilson
#> 5 Area1 2016 99 135 73.333333 65.303779 80.07172 95% percentage Wilson
#> 6 Area2 2016 93 138 67.391304 59.191304 74.64930 95% percentage Wilson
#> 7 Area3 2016 28 162 17.283951 12.237447 23.84609 95% percentage Wilson
#> 8 Area4 2016 55 198 27.777778 22.007129 34.39430 95% percentage Wilson
# specify level of detail to output for proportion
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#> area year obs pop value lowercl uppercl confidence statistic method
#> 1 Area1 2015 9 102 8.823529 3.332059 21.36507 99.8% percentage Wilson
#> 2 Area2 2015 54 177 30.508475 21.040918 41.97159 99.8% percentage Wilson
#> 3 Area3 2015 88 139 63.309353 50.203957 74.70356 99.8% percentage Wilson
#> 4 Area4 2015 84 158 53.164557 41.069915 64.89847 99.8% percentage Wilson
#> 5 Area1 2016 99 135 73.333333 60.321518 83.26216 99.8% percentage Wilson
#> 6 Area2 2016 93 138 67.391304 54.286833 78.24461 99.8% percentage Wilson
#> 7 Area3 2016 28 162 17.283951 10.000154 28.21011 99.8% percentage Wilson
#> 8 Area4 2016 55 198 27.777778 19.138399 38.46208 99.8% percentage Wilson
# specify level of detail to output for proportion and remove metadata columns
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#> area year obs pop value lowercl uppercl
#> 1 Area1 2015 9 102 8.823529 3.332059 21.36507
#> 2 Area2 2015 54 177 30.508475 21.040918 41.97159
#> 3 Area3 2015 88 139 63.309353 50.203957 74.70356
#> 4 Area4 2015 84 158 53.164557 41.069915 64.89847
#> 5 Area1 2016 99 135 73.333333 60.321518 83.26216
#> 6 Area2 2016 93 138 67.391304 54.286833 78.24461
#> 7 Area3 2016 28 162 17.283951 10.000154 28.21011
#> 8 Area4 2016 55 198 27.777778 19.138399 38.46208
# default rate
phe_rate(df, obs, pop)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 9 102 8823.529 4034.68 16749.81 95% rate per 100000
#> 2 Area2 2015 54 177 30508.475 22917.36 39807.71 95% rate per 100000
#> 3 Area3 2015 88 139 63309.353 50774.49 77999.75 95% rate per 100000
#> 4 Area4 2015 84 158 53164.557 42404.81 65822.10 95% rate per 100000
#> 5 Area1 2016 99 135 73333.333 59600.31 89281.55 95% rate per 100000
#> 6 Area2 2016 93 138 67391.304 54392.05 82559.75 95% rate per 100000
#> 7 Area3 2016 28 162 17283.951 11482.52 24981.06 95% rate per 100000
#> 8 Area4 2016 55 198 27777.778 20924.66 36157.28 95% rate per 100000
#> method
#> 1 Exact
#> 2 Byars
#> 3 Byars
#> 4 Byars
#> 5 Byars
#> 6 Byars
#> 7 Byars
#> 8 Byars
# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 9 102 8.823529 2.404338 22.21311 99.8% rate per 100
#> 2 Area2 2015 54 177 30.508475 19.254307 45.65917 99.8% rate per 100
#> 3 Area3 2015 88 139 63.309353 44.470464 87.08100 99.8% rate per 100
#> 4 Area4 2015 84 158 53.164557 37.012185 73.65871 99.8% rate per 100
#> 5 Area1 2016 99 135 73.333333 52.635818 99.10346 99.8% rate per 100
#> 6 Area2 2016 93 138 67.391304 47.828369 91.91981 99.8% rate per 100
#> 7 Area3 2016 28 162 17.283951 8.894771 29.97285 99.8% rate per 100
#> 8 Area4 2016 55 198 27.777778 17.611838 41.42614 99.8% rate per 100
#> method
#> 1 Exact
#> 2 Byars
#> 3 Byars
#> 4 Byars
#> 5 Byars
#> 6 Byars
#> 7 Byars
#> 8 Byars
# specify rate parameters and reduce columns output and remove metadata columns
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#> area year obs pop value lowercl uppercl
#> 1 Area1 2015 9 102 8.823529 2.404338 22.21311
#> 2 Area2 2015 54 177 30.508475 19.254307 45.65917
#> 3 Area3 2015 88 139 63.309353 44.470464 87.08100
#> 4 Area4 2015 84 158 53.164557 37.012185 73.65871
#> 5 Area1 2016 99 135 73.333333 52.635818 99.10346
#> 6 Area2 2016 93 138 67.391304 47.828369 91.91981
#> 7 Area3 2016 28 162 17.283951 8.894771 29.97285
#> 8 Area4 2016 55 198 27.777778 17.611838 41.42614
These functions can also return aggregate data if the input dataframes are grouped:
# default proportion - grouped
%>%
df group_by(year) %>%
phe_proportion(obs, pop)
#> # A tibble: 2 × 9
#> # Groups: year [2]
#> year obs pop value lowercl uppercl confidence statistic method
#> <int> <int> <int> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 2015 235 576 0.408 0.369 0.449 95% proportion of 1 Wilson
#> 2 2016 275 633 0.434 0.396 0.473 95% proportion of 1 Wilson
# default rate - grouped
%>%
df group_by(year) %>%
phe_rate(obs, pop)
#> # A tibble: 2 × 9
#> # Groups: year [2]
#> year obs pop value lowercl uppercl confidence statistic method
#> <int> <int> <int> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 2015 235 576 40799. 35748. 46362. 95% rate per 100000 Byars
#> 2 2016 275 633 43444. 38460. 48894. 95% rate per 100000 Byars
The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.
The df test data generated earlier can be used to demonstrate phe_mean:
INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.
OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.
Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified
# default mean
phe_mean(df,obs)
#> value_sum value_count stdev value lowercl uppercl confidence statistic
#> 1 510 8 32.82747 63.75 36.30555 91.19445 95% mean
#> method
#> 1 Student's t-distribution
# multiple means in a single execution with 99.8% confidence
%>%
df group_by(year) %>%
phe_mean(obs, confidence=0.998)
#> # A tibble: 2 × 10
#> # Groups: year [2]
#> year value_sum value_count stdev value lowercl uppercl confi…¹ stati…² method
#> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 2015 235 4 36.5 58.8 -128. 245. 99.8% mean Stude…
#> 2 2016 275 4 33.4 68.8 -102. 239. 99.8% mean Stude…
#> # … with abbreviated variable names ¹confidence, ²statistic
# multiple means in a single execution with 99.8% confidence and data-only output
%>%
df group_by(year) %>%
phe_mean(obs, type = "standard", confidence=0.998)
#> # A tibble: 2 × 7
#> # Groups: year [2]
#> year value_sum value_count stdev value lowercl uppercl
#> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2015 235 4 36.5 58.8 -128. 245.
#> 2 2016 275 4 33.4 68.8 -102. 239.
The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:
<- data.frame(
df_std area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
year = rep(2006:2010, each = 19 * 2),
sex = rep(rep(c("Male", "Female"), each = 19), 5),
ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
50,55,60,65,70,75,80,85,90), times = 10),
obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#> area year sex ageband obs pop
#> 1 Area1 2006 Male 0 24 10192
#> 2 Area1 2006 Male 5 169 18681
#> 3 Area1 2006 Male 10 158 14702
#> 4 Area1 2006 Male 15 99 11868
#> 5 Area1 2006 Male 20 2 13844
#> 6 Area1 2006 Male 25 109 19556
INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data is:
The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.
The function can also accept standard populations provided as a column within the input data frame.
standard populations provided as a vector - the vector and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order
standard populations provided as a column within the input data frame - the standard populations can be appended to the input data frame by the user prior to execution of the function - if the data is grouped to generate multiple dsrs then the standard populations will need to be repeated and appended to the data rows for every grouping set.
OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.
Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified
# calculate separate dsrs for each area, year and sex
%>%
df_std group_by(area, year, sex) %>%
phe_dsr(obs, pop)
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex total_count total_…¹ value lowercl uppercl confi…² stati…³
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 Area1 2006 Female 2007 269260 756. 721. 793. 95% dsr pe…
#> 2 Area1 2006 Male 1698 288766 559. 531. 588. 95% dsr pe…
#> 3 Area1 2007 Female 2232 276230 875. 836. 915. 95% dsr pe…
#> 4 Area1 2007 Male 2180 296054 724. 693. 757. 95% dsr pe…
#> 5 Area1 2008 Female 1720 292559 650. 617. 684. 95% dsr pe…
#> 6 Area1 2008 Male 2178 290502 733. 701. 767. 95% dsr pe…
#> 7 Area1 2009 Female 1673 314807 556. 529. 585. 95% dsr pe…
#> 8 Area1 2009 Male 1730 273165 625. 593. 658. 95% dsr pe…
#> 9 Area1 2010 Female 1991 281760 698. 665. 731. 95% dsr pe…
#> 10 Area1 2010 Male 2151 279641 804. 767. 843. 95% dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> # names ¹total_pop, ²confidence, ³statistic
# calculate separate dsrs for each area, year and sex and drop metadata fields from output
%>%
df_std group_by(area, year, sex) %>%
phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 × 8
#> # Groups: area, year, sex [40]
#> area year sex total_count total_pop value lowercl uppercl
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl>
#> 1 Area1 2006 Female 2007 269260 756. 721. 793.
#> 2 Area1 2006 Male 1698 288766 559. 531. 588.
#> 3 Area1 2007 Female 2232 276230 875. 836. 915.
#> 4 Area1 2007 Male 2180 296054 724. 693. 757.
#> 5 Area1 2008 Female 1720 292559 650. 617. 684.
#> 6 Area1 2008 Male 2178 290502 733. 701. 767.
#> 7 Area1 2009 Female 1673 314807 556. 529. 585.
#> 8 Area1 2009 Male 1730 273165 625. 593. 658.
#> 9 Area1 2010 Female 1991 281760 698. 665. 731.
#> 10 Area1 2010 Male 2151 279641 804. 767. 843.
#> # … with 30 more rows
# calculate same specifying standard population in vector form
%>%
df_std group_by(area, year, sex) %>%
phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex total_count total_…¹ value lowercl uppercl confi…² stati…³
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 Area1 2006 Female 2007 269260 756. 721. 793. 95% dsr pe…
#> 2 Area1 2006 Male 1698 288766 559. 531. 588. 95% dsr pe…
#> 3 Area1 2007 Female 2232 276230 875. 836. 915. 95% dsr pe…
#> 4 Area1 2007 Male 2180 296054 724. 693. 757. 95% dsr pe…
#> 5 Area1 2008 Female 1720 292559 650. 617. 684. 95% dsr pe…
#> 6 Area1 2008 Male 2178 290502 733. 701. 767. 95% dsr pe…
#> 7 Area1 2009 Female 1673 314807 556. 529. 585. 95% dsr pe…
#> 8 Area1 2009 Male 1730 273165 625. 593. 658. 95% dsr pe…
#> 9 Area1 2010 Female 1991 281760 698. 665. 731. 95% dsr pe…
#> 10 Area1 2010 Male 2151 279641 804. 767. 843. 95% dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> # names ¹total_pop, ²confidence, ³statistic
# calculate the same dsrs by appending the standard populations to the data frame
%>%
df_std mutate(refpop = rep(esp2013,40)) %>%
group_by(area, year, sex) %>%
phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex total_count total_…¹ value lowercl uppercl confi…² stati…³
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 Area1 2006 Female 2007 269260 756. 721. 793. 95% dsr pe…
#> 2 Area1 2006 Male 1698 288766 559. 531. 588. 95% dsr pe…
#> 3 Area1 2007 Female 2232 276230 875. 836. 915. 95% dsr pe…
#> 4 Area1 2007 Male 2180 296054 724. 693. 757. 95% dsr pe…
#> 5 Area1 2008 Female 1720 292559 650. 617. 684. 95% dsr pe…
#> 6 Area1 2008 Male 2178 290502 733. 701. 767. 95% dsr pe…
#> 7 Area1 2009 Female 1673 314807 556. 529. 585. 95% dsr pe…
#> 8 Area1 2009 Male 1730 273165 625. 593. 658. 95% dsr pe…
#> 9 Area1 2010 Female 1991 281760 698. 665. 731. 95% dsr pe…
#> 10 Area1 2010 Male 2151 279641 804. 767. 843. 95% dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> # names ¹total_pop, ²confidence, ³statistic
# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
%>%
df_std filter(ageband <= 70) %>%
group_by(area, year, sex) %>%
phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex total_count total_…¹ value lowercl uppercl confi…² stati…³
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 Area1 2006 Female 1545 215014 742. 704. 781. 95% dsr pe…
#> 2 Area1 2006 Male 1314 226680 551. 521. 583. 95% dsr pe…
#> 3 Area1 2007 Female 1750 214719 886. 844. 929. 95% dsr pe…
#> 4 Area1 2007 Male 1745 244313 714. 680. 749. 95% dsr pe…
#> 5 Area1 2008 Female 1383 234043 648. 613. 685. 95% dsr pe…
#> 6 Area1 2008 Male 1661 235481 716. 681. 752. 95% dsr pe…
#> 7 Area1 2009 Female 1385 250593 556. 527. 587. 95% dsr pe…
#> 8 Area1 2009 Male 1235 214445 616. 582. 652. 95% dsr pe…
#> 9 Area1 2010 Female 1393 224130 642. 608. 677. 95% dsr pe…
#> 10 Area1 2010 Male 1623 210733 805. 765. 846. 95% dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> # names ¹total_pop, ²confidence, ³statistic
# calculate separate dsrs for persons for each area and year)
%>%
df_std group_by(area, year, ageband) %>%
summarise(obs = sum(obs),
pop = sum(pop),
.groups = "drop_last") %>%
phe_dsr(obs,pop)
#> # A tibble: 20 × 10
#> # Groups: area, year [20]
#> area year total_count total_…¹ value lowercl uppercl confi…² stati…³ method
#> <chr> <int> <int> <int> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 Area1 2006 3705 558026 651. 628. 674. 95% dsr pe… Dobson
#> 2 Area1 2007 4412 572284 755. 732. 779. 95% dsr pe… Dobson
#> 3 Area1 2008 3898 583061 671. 649. 694. 95% dsr pe… Dobson
#> 4 Area1 2009 3403 587972 573. 552. 594. 95% dsr pe… Dobson
#> 5 Area1 2010 4142 561401 734. 710. 758. 95% dsr pe… Dobson
#> 6 Area2 2006 3504 587960 590. 570. 612. 95% dsr pe… Dobson
#> 7 Area2 2007 3894 570559 705. 681. 728. 95% dsr pe… Dobson
#> 8 Area2 2008 3672 599158 606. 585. 627. 95% dsr pe… Dobson
#> 9 Area2 2009 3812 586282 648. 626. 670. 95% dsr pe… Dobson
#> 10 Area2 2010 3500 595776 616. 595. 638. 95% dsr pe… Dobson
#> 11 Area3 2006 4077 568050 747. 723. 772. 95% dsr pe… Dobson
#> 12 Area3 2007 4709 569532 800. 776. 824. 95% dsr pe… Dobson
#> 13 Area3 2008 3382 612962 543. 524. 563. 95% dsr pe… Dobson
#> 14 Area3 2009 3568 576734 659. 636. 682. 95% dsr pe… Dobson
#> 15 Area3 2010 3862 565964 696. 673. 719. 95% dsr pe… Dobson
#> 16 Area4 2006 3694 556898 697. 673. 722. 95% dsr pe… Dobson
#> 17 Area4 2007 4137 560678 759. 735. 784. 95% dsr pe… Dobson
#> 18 Area4 2008 3641 539819 657. 634. 680. 95% dsr pe… Dobson
#> 19 Area4 2009 4329 579105 749. 726. 774. 95% dsr pe… Dobson
#> 20 Area4 2010 3604 546257 637. 615. 660. 95% dsr pe… Dobson
#> # … with abbreviated variable names ¹total_pop, ²confidence, ³statistic
INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the calculate_ISRatio and calculate_ISRate functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.
The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:
reference data provided as a data frame or as vectors - the data frame/vectors and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order.
reference data provided as columns within the input data frame - the reference numerators and denominators can be appended to the input data frame prior to execution of the function - if the data is grouped to generate multiple smrs/isrs then the reference data will need to be repeated and appended to the data rows for every grouping set.
OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (isr only), the smr or isr, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.
The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:
<- df_std %>%
df_ref filter(year == 2006) %>%
group_by(ageband) %>%
summarise(obs = sum(obs),
pop = sum(pop),
.groups = "drop_last")
head(df_ref)
#> # A tibble: 6 × 3
#> ageband obs pop
#> <dbl> <int> <int>
#> 1 0 790 103962
#> 2 5 810 144855
#> 3 10 707 128045
#> 4 15 507 126804
#> 5 20 423 119120
#> 6 25 962 121688
Here are some example code chunks to demonstrate the calculate_ISRatio function and the arguments that can optionally be specified
# calculate separate smrs for each area, year and sex
%>%
df_std group_by(area, year, sex) %>%
calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex observed expected value lowercl uppercl confidence stati…¹
#> <chr> <int> <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 Area1 2006 Female 2007 1746. 1.15 1.10 1.20 95% indire…
#> 2 Area1 2006 Male 1698 1946. 0.872 0.831 0.915 95% indire…
#> 3 Area1 2007 Female 2232 1810. 1.23 1.18 1.29 95% indire…
#> 4 Area1 2007 Male 2180 1972. 1.11 1.06 1.15 95% indire…
#> 5 Area1 2008 Female 1720 1875. 0.917 0.874 0.961 95% indire…
#> 6 Area1 2008 Male 2178 1937. 1.12 1.08 1.17 95% indire…
#> 7 Area1 2009 Female 1673 2072. 0.808 0.769 0.847 95% indire…
#> 8 Area1 2009 Male 1730 1843. 0.938 0.895 0.984 95% indire…
#> 9 Area1 2010 Female 1991 1898. 1.05 1.00 1.10 95% indire…
#> 10 Area1 2010 Male 2151 1848. 1.16 1.12 1.21 95% indire…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> # name ¹statistic
# calculate the same smrs by appending the reference data to the data frame
%>%
df_std mutate(refobs = rep(df_ref$obs,40),
refpop = rep(df_ref$pop,40)) %>%
group_by(area, year, sex) %>%
calculate_ISRatio(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex observed expected value lowercl uppercl confidence stati…¹
#> <chr> <int> <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 Area1 2006 Female 2007 1746. 1.15 1.10 1.20 95% indire…
#> 2 Area1 2006 Male 1698 1946. 0.872 0.831 0.915 95% indire…
#> 3 Area1 2007 Female 2232 1810. 1.23 1.18 1.29 95% indire…
#> 4 Area1 2007 Male 2180 1972. 1.11 1.06 1.15 95% indire…
#> 5 Area1 2008 Female 1720 1875. 0.917 0.874 0.961 95% indire…
#> 6 Area1 2008 Male 2178 1937. 1.12 1.08 1.17 95% indire…
#> 7 Area1 2009 Female 1673 2072. 0.808 0.769 0.847 95% indire…
#> 8 Area1 2009 Male 1730 1843. 0.938 0.895 0.984 95% indire…
#> 9 Area1 2010 Female 1991 1898. 1.05 1.00 1.10 95% indire…
#> 10 Area1 2010 Male 2151 1848. 1.16 1.12 1.21 95% indire…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> # name ¹statistic
# calculate separate smrs for each year and drop metadata columns from output
%>%
df_std group_by(year, ageband) %>%
summarise(obs = sum(obs),
pop = sum(pop),
.groups = "drop_last") %>%
calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 × 6
#> # Groups: year [5]
#> year observed expected value lowercl uppercl
#> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2006 14980 14980 1 0.984 1.02
#> 2 2007 17152 15070. 1.14 1.12 1.16
#> 3 2008 14593 15459. 0.944 0.929 0.959
#> 4 2009 15112 15414. 0.980 0.965 0.996
#> 5 2010 15108 15127. 0.999 0.983 1.01
The calculate_ISRate function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:
# calculate separate isrs for each area, year and sex
%>%
df_std group_by(area, year, sex) %>%
calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 12
#> # Groups: area, year, sex [40]
#> area year sex observed expected ref_rate value lowercl uppercl confide…¹
#> <chr> <int> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Female 2007 1746. 660. 758. 726. 792. 95%
#> 2 Area1 2006 Male 1698 1946. 660. 576. 548. 604. 95%
#> 3 Area1 2007 Female 2232 1810. 660. 814. 780. 848. 95%
#> 4 Area1 2007 Male 2180 1972. 660. 729. 699. 761. 95%
#> 5 Area1 2008 Female 1720 1875. 660. 605. 577. 634. 95%
#> 6 Area1 2008 Male 2178 1937. 660. 742. 711. 774. 95%
#> 7 Area1 2009 Female 1673 2072. 660. 533. 507. 559. 95%
#> 8 Area1 2009 Male 1730 1843. 660. 619. 590. 649. 95%
#> 9 Area1 2010 Female 1991 1898. 660. 692. 662. 723. 95%
#> 10 Area1 2010 Male 2151 1848. 660. 768. 736. 801. 95%
#> # … with 30 more rows, 2 more variables: statistic <chr>, method <chr>, and
#> # abbreviated variable name ¹confidence
# calculate the same isrs by appending the reference data to the data frame
%>%
df_std mutate(refobs = rep(df_ref$obs,40),
refpop = rep(df_ref$pop,40)) %>%
group_by(area, year, sex) %>%
calculate_ISRate(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 × 12
#> # Groups: area, year, sex [40]
#> area year sex observed expected ref_rate value lowercl uppercl confide…¹
#> <chr> <int> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Female 2007 1746. 660. 758. 726. 792. 95%
#> 2 Area1 2006 Male 1698 1946. 660. 576. 548. 604. 95%
#> 3 Area1 2007 Female 2232 1810. 660. 814. 780. 848. 95%
#> 4 Area1 2007 Male 2180 1972. 660. 729. 699. 761. 95%
#> 5 Area1 2008 Female 1720 1875. 660. 605. 577. 634. 95%
#> 6 Area1 2008 Male 2178 1937. 660. 742. 711. 774. 95%
#> 7 Area1 2009 Female 1673 2072. 660. 533. 507. 559. 95%
#> 8 Area1 2009 Male 1730 1843. 660. 619. 590. 649. 95%
#> 9 Area1 2010 Female 1991 1898. 660. 692. 662. 723. 95%
#> 10 Area1 2010 Male 2151 1848. 660. 768. 736. 801. 95%
#> # … with 30 more rows, 2 more variables: statistic <chr>, method <chr>, and
#> # abbreviated variable name ¹confidence
# calculate separate isrs for each year and drop metadata columns from output
%>%
df_std group_by(year, ageband) %>%
summarise(obs = sum(obs),
pop = sum(pop),
.groups = "drop_last") %>%
calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 × 7
#> # Groups: year [5]
#> year observed expected ref_rate value lowercl uppercl
#> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2006 14980 14980 660. 660. 649. 670.
#> 2 2007 17152 15070. 660. 751. 740. 762.
#> 3 2008 14593 15459. 660. 623. 613. 633.
#> 4 2009 15112 15414. 660. 647. 636. 657.
#> 5 2010 15108 15127. 660. 659. 648. 669.