Introduction to PHEindicatormethods

Georgina Anderson

2022-11-30

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.

The following packages must be installed and loaded if not already available

library(PHEindicatormethods)
library(dplyr)

Package functions

This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.

Function Type Description
phe_proportion Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_rate Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_mean Aggregate Performs a calculation on each grouping set
phe_dsr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRatio Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRate Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

df <- data.frame(
        area = rep(c("Area1","Area2","Area3","Area4"), 2),
        year = rep(2015:2016, each = 4),
        obs = sample(100, 2 * 4, replace = TRUE),
        pop = sample(100:200, 2 * 4, replace = TRUE))
df
#>    area year obs pop
#> 1 Area1 2015   9 102
#> 2 Area2 2015  54 177
#> 3 Area3 2015  88 139
#> 4 Area4 2015  84 158
#> 5 Area1 2016  99 135
#> 6 Area2 2016  93 138
#> 7 Area3 2016  28 162
#> 8 Area4 2016  55 198

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop      value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015   9 102 0.08823529 0.04711556 0.1592446        95% proportion of 1
#> 2 Area2 2015  54 177 0.30508475 0.24198941 0.3764609        95% proportion of 1
#> 3 Area3 2015  88 139 0.63309353 0.55039583 0.7086326        95% proportion of 1
#> 4 Area4 2015  84 158 0.53164557 0.45401284 0.6077760        95% proportion of 1
#> 5 Area1 2016  99 135 0.73333333 0.65303779 0.8007172        95% proportion of 1
#> 6 Area2 2016  93 138 0.67391304 0.59191304 0.7464930        95% proportion of 1
#> 7 Area3 2016  28 162 0.17283951 0.12237447 0.2384609        95% proportion of 1
#> 8 Area4 2016  55 198 0.27777778 0.22007129 0.3439430        95% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#>    area year obs pop      value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015   9 102 0.08823529 0.03332059 0.2136507      99.8% proportion of 1
#> 2 Area2 2015  54 177 0.30508475 0.21040918 0.4197159      99.8% proportion of 1
#> 3 Area3 2015  88 139 0.63309353 0.50203957 0.7470356      99.8% proportion of 1
#> 4 Area4 2015  84 158 0.53164557 0.41069915 0.6489847      99.8% proportion of 1
#> 5 Area1 2016  99 135 0.73333333 0.60321518 0.8326216      99.8% proportion of 1
#> 6 Area2 2016  93 138 0.67391304 0.54286833 0.7824461      99.8% proportion of 1
#> 7 Area3 2016  28 162 0.17283951 0.10000154 0.2821011      99.8% proportion of 1
#> 8 Area4 2016  55 198 0.27777778 0.19138399 0.3846208      99.8% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#>    area year obs pop     value   lowercl  uppercl confidence  statistic method
#> 1 Area1 2015   9 102  8.823529  4.711556 15.92446        95% percentage Wilson
#> 2 Area2 2015  54 177 30.508475 24.198941 37.64609        95% percentage Wilson
#> 3 Area3 2015  88 139 63.309353 55.039583 70.86326        95% percentage Wilson
#> 4 Area4 2015  84 158 53.164557 45.401284 60.77760        95% percentage Wilson
#> 5 Area1 2016  99 135 73.333333 65.303779 80.07172        95% percentage Wilson
#> 6 Area2 2016  93 138 67.391304 59.191304 74.64930        95% percentage Wilson
#> 7 Area3 2016  28 162 17.283951 12.237447 23.84609        95% percentage Wilson
#> 8 Area4 2016  55 198 27.777778 22.007129 34.39430        95% percentage Wilson

# specify level of detail to output for proportion
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop     value   lowercl  uppercl confidence  statistic method
#> 1 Area1 2015   9 102  8.823529  3.332059 21.36507      99.8% percentage Wilson
#> 2 Area2 2015  54 177 30.508475 21.040918 41.97159      99.8% percentage Wilson
#> 3 Area3 2015  88 139 63.309353 50.203957 74.70356      99.8% percentage Wilson
#> 4 Area4 2015  84 158 53.164557 41.069915 64.89847      99.8% percentage Wilson
#> 5 Area1 2016  99 135 73.333333 60.321518 83.26216      99.8% percentage Wilson
#> 6 Area2 2016  93 138 67.391304 54.286833 78.24461      99.8% percentage Wilson
#> 7 Area3 2016  28 162 17.283951 10.000154 28.21011      99.8% percentage Wilson
#> 8 Area4 2016  55 198 27.777778 19.138399 38.46208      99.8% percentage Wilson

# specify level of detail to output for proportion and remove metadata columns
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#>    area year obs pop     value   lowercl  uppercl
#> 1 Area1 2015   9 102  8.823529  3.332059 21.36507
#> 2 Area2 2015  54 177 30.508475 21.040918 41.97159
#> 3 Area3 2015  88 139 63.309353 50.203957 74.70356
#> 4 Area4 2015  84 158 53.164557 41.069915 64.89847
#> 5 Area1 2016  99 135 73.333333 60.321518 83.26216
#> 6 Area2 2016  93 138 67.391304 54.286833 78.24461
#> 7 Area3 2016  28 162 17.283951 10.000154 28.21011
#> 8 Area4 2016  55 198 27.777778 19.138399 38.46208

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop     value  lowercl  uppercl confidence       statistic
#> 1 Area1 2015   9 102  8823.529  4034.68 16749.81        95% rate per 100000
#> 2 Area2 2015  54 177 30508.475 22917.36 39807.71        95% rate per 100000
#> 3 Area3 2015  88 139 63309.353 50774.49 77999.75        95% rate per 100000
#> 4 Area4 2015  84 158 53164.557 42404.81 65822.10        95% rate per 100000
#> 5 Area1 2016  99 135 73333.333 59600.31 89281.55        95% rate per 100000
#> 6 Area2 2016  93 138 67391.304 54392.05 82559.75        95% rate per 100000
#> 7 Area3 2016  28 162 17283.951 11482.52 24981.06        95% rate per 100000
#> 8 Area4 2016  55 198 27777.778 20924.66 36157.28        95% rate per 100000
#>   method
#> 1  Exact
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop     value   lowercl  uppercl confidence    statistic
#> 1 Area1 2015   9 102  8.823529  2.404338 22.21311      99.8% rate per 100
#> 2 Area2 2015  54 177 30.508475 19.254307 45.65917      99.8% rate per 100
#> 3 Area3 2015  88 139 63.309353 44.470464 87.08100      99.8% rate per 100
#> 4 Area4 2015  84 158 53.164557 37.012185 73.65871      99.8% rate per 100
#> 5 Area1 2016  99 135 73.333333 52.635818 99.10346      99.8% rate per 100
#> 6 Area2 2016  93 138 67.391304 47.828369 91.91981      99.8% rate per 100
#> 7 Area3 2016  28 162 17.283951  8.894771 29.97285      99.8% rate per 100
#> 8 Area4 2016  55 198 27.777778 17.611838 41.42614      99.8% rate per 100
#>   method
#> 1  Exact
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters and reduce columns output and remove metadata columns
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#>    area year obs pop     value   lowercl  uppercl
#> 1 Area1 2015   9 102  8.823529  2.404338 22.21311
#> 2 Area2 2015  54 177 30.508475 19.254307 45.65917
#> 3 Area3 2015  88 139 63.309353 44.470464 87.08100
#> 4 Area4 2015  84 158 53.164557 37.012185 73.65871
#> 5 Area1 2016  99 135 73.333333 52.635818 99.10346
#> 6 Area2 2016  93 138 67.391304 47.828369 91.91981
#> 7 Area3 2016  28 162 17.283951  8.894771 29.97285
#> 8 Area4 2016  55 198 27.777778 17.611838 41.42614

These functions can also return aggregate data if the input dataframes are grouped:

# default proportion - grouped
df %>%
  group_by(year) %>%
  phe_proportion(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop value lowercl uppercl confidence statistic       method
#>   <int> <int> <int> <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   235   576 0.408   0.369   0.449 95%        proportion of 1 Wilson
#> 2  2016   275   633 0.434   0.396   0.473 95%        proportion of 1 Wilson

# default rate - grouped
df %>%
  group_by(year) %>%
  phe_rate(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop  value lowercl uppercl confidence statistic       method
#>   <int> <int> <int>  <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   235   576 40799.  35748.  46362. 95%        rate per 100000 Byars 
#> 2  2016   275   633 43444.  38460.  48894. 95%        rate per 100000 Byars



Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified

# default mean
phe_mean(df,obs)
#>   value_sum value_count    stdev value  lowercl  uppercl confidence statistic
#> 1       510           8 32.82747 63.75 36.30555 91.19445        95%      mean
#>                     method
#> 1 Student's t-distribution

# multiple means in a single execution with 99.8% confidence
df %>%
    group_by(year) %>%
        phe_mean(obs, confidence=0.998)
#> # A tibble: 2 × 10
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl confi…¹ stati…² method
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr> 
#> 1  2015       235           4  36.5  58.8   -128.    245. 99.8%   mean    Stude…
#> 2  2016       275           4  33.4  68.8   -102.    239. 99.8%   mean    Stude…
#> # … with abbreviated variable names ¹​confidence, ²​statistic

# multiple means in a single execution with 99.8% confidence and data-only output
df %>%
    group_by(year) %>%
        phe_mean(obs, type = "standard", confidence=0.998)
#> # A tibble: 2 × 7
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl>
#> 1  2015       235           4  36.5  58.8   -128.    245.
#> 2  2016       275           4  33.4  68.8   -102.    239.

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

df_std <- data.frame(
            area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
            year = rep(2006:2010, each = 19 * 2),
            sex = rep(rep(c("Male", "Female"), each = 19), 5),
            ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
                           50,55,60,65,70,75,80,85,90), times = 10),
            obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
            pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#>    area year  sex ageband obs   pop
#> 1 Area1 2006 Male       0  24 10192
#> 2 Area1 2006 Male       5 169 18681
#> 3 Area1 2006 Male      10 158 14702
#> 4 Area1 2006 Male      15  99 11868
#> 5 Area1 2006 Male      20   2 13844
#> 6 Area1 2006 Male      25 109 19556

Execute phe_dsr

INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data is:

The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.

The function can also accept standard populations provided as a column within the input data frame.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified

# calculate separate dsrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_…¹ value lowercl uppercl confi…² stati…³
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        2007   269260  756.    721.    793. 95%     dsr pe…
#>  2 Area1  2006 Male          1698   288766  559.    531.    588. 95%     dsr pe…
#>  3 Area1  2007 Female        2232   276230  875.    836.    915. 95%     dsr pe…
#>  4 Area1  2007 Male          2180   296054  724.    693.    757. 95%     dsr pe…
#>  5 Area1  2008 Female        1720   292559  650.    617.    684. 95%     dsr pe…
#>  6 Area1  2008 Male          2178   290502  733.    701.    767. 95%     dsr pe…
#>  7 Area1  2009 Female        1673   314807  556.    529.    585. 95%     dsr pe…
#>  8 Area1  2009 Male          1730   273165  625.    593.    658. 95%     dsr pe…
#>  9 Area1  2010 Female        1991   281760  698.    665.    731. 95%     dsr pe…
#> 10 Area1  2010 Male          2151   279641  804.    767.    843. 95%     dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   names ¹​total_pop, ²​confidence, ³​statistic

# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 × 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        2007    269260  756.    721.    793.
#>  2 Area1  2006 Male          1698    288766  559.    531.    588.
#>  3 Area1  2007 Female        2232    276230  875.    836.    915.
#>  4 Area1  2007 Male          2180    296054  724.    693.    757.
#>  5 Area1  2008 Female        1720    292559  650.    617.    684.
#>  6 Area1  2008 Male          2178    290502  733.    701.    767.
#>  7 Area1  2009 Female        1673    314807  556.    529.    585.
#>  8 Area1  2009 Male          1730    273165  625.    593.    658.
#>  9 Area1  2010 Female        1991    281760  698.    665.    731.
#> 10 Area1  2010 Male          2151    279641  804.    767.    843.
#> # … with 30 more rows

# calculate same specifying standard population in vector form
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_…¹ value lowercl uppercl confi…² stati…³
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        2007   269260  756.    721.    793. 95%     dsr pe…
#>  2 Area1  2006 Male          1698   288766  559.    531.    588. 95%     dsr pe…
#>  3 Area1  2007 Female        2232   276230  875.    836.    915. 95%     dsr pe…
#>  4 Area1  2007 Male          2180   296054  724.    693.    757. 95%     dsr pe…
#>  5 Area1  2008 Female        1720   292559  650.    617.    684. 95%     dsr pe…
#>  6 Area1  2008 Male          2178   290502  733.    701.    767. 95%     dsr pe…
#>  7 Area1  2009 Female        1673   314807  556.    529.    585. 95%     dsr pe…
#>  8 Area1  2009 Male          1730   273165  625.    593.    658. 95%     dsr pe…
#>  9 Area1  2010 Female        1991   281760  698.    665.    731. 95%     dsr pe…
#> 10 Area1  2010 Male          2151   279641  804.    767.    843. 95%     dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   names ¹​total_pop, ²​confidence, ³​statistic

# calculate the same dsrs by appending the standard populations to the data frame
df_std %>%
    mutate(refpop = rep(esp2013,40)) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_…¹ value lowercl uppercl confi…² stati…³
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        2007   269260  756.    721.    793. 95%     dsr pe…
#>  2 Area1  2006 Male          1698   288766  559.    531.    588. 95%     dsr pe…
#>  3 Area1  2007 Female        2232   276230  875.    836.    915. 95%     dsr pe…
#>  4 Area1  2007 Male          2180   296054  724.    693.    757. 95%     dsr pe…
#>  5 Area1  2008 Female        1720   292559  650.    617.    684. 95%     dsr pe…
#>  6 Area1  2008 Male          2178   290502  733.    701.    767. 95%     dsr pe…
#>  7 Area1  2009 Female        1673   314807  556.    529.    585. 95%     dsr pe…
#>  8 Area1  2009 Male          1730   273165  625.    593.    658. 95%     dsr pe…
#>  9 Area1  2010 Female        1991   281760  698.    665.    731. 95%     dsr pe…
#> 10 Area1  2010 Male          2151   279641  804.    767.    843. 95%     dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   names ¹​total_pop, ²​confidence, ³​statistic

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
    filter(ageband <= 70) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_…¹ value lowercl uppercl confi…² stati…³
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        1545   215014  742.    704.    781. 95%     dsr pe…
#>  2 Area1  2006 Male          1314   226680  551.    521.    583. 95%     dsr pe…
#>  3 Area1  2007 Female        1750   214719  886.    844.    929. 95%     dsr pe…
#>  4 Area1  2007 Male          1745   244313  714.    680.    749. 95%     dsr pe…
#>  5 Area1  2008 Female        1383   234043  648.    613.    685. 95%     dsr pe…
#>  6 Area1  2008 Male          1661   235481  716.    681.    752. 95%     dsr pe…
#>  7 Area1  2009 Female        1385   250593  556.    527.    587. 95%     dsr pe…
#>  8 Area1  2009 Male          1235   214445  616.    582.    652. 95%     dsr pe…
#>  9 Area1  2010 Female        1393   224130  642.    608.    677. 95%     dsr pe…
#> 10 Area1  2010 Male          1623   210733  805.    765.    846. 95%     dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   names ¹​total_pop, ²​confidence, ³​statistic
    
# calculate separate dsrs for persons for each area and year)
df_std %>%
    group_by(area, year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last") %>%
    phe_dsr(obs,pop)
#> # A tibble: 20 × 10
#> # Groups:   area, year [20]
#>    area   year total_count total_…¹ value lowercl uppercl confi…² stati…³ method
#>    <chr> <int>       <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr> 
#>  1 Area1  2006        3705   558026  651.    628.    674. 95%     dsr pe… Dobson
#>  2 Area1  2007        4412   572284  755.    732.    779. 95%     dsr pe… Dobson
#>  3 Area1  2008        3898   583061  671.    649.    694. 95%     dsr pe… Dobson
#>  4 Area1  2009        3403   587972  573.    552.    594. 95%     dsr pe… Dobson
#>  5 Area1  2010        4142   561401  734.    710.    758. 95%     dsr pe… Dobson
#>  6 Area2  2006        3504   587960  590.    570.    612. 95%     dsr pe… Dobson
#>  7 Area2  2007        3894   570559  705.    681.    728. 95%     dsr pe… Dobson
#>  8 Area2  2008        3672   599158  606.    585.    627. 95%     dsr pe… Dobson
#>  9 Area2  2009        3812   586282  648.    626.    670. 95%     dsr pe… Dobson
#> 10 Area2  2010        3500   595776  616.    595.    638. 95%     dsr pe… Dobson
#> 11 Area3  2006        4077   568050  747.    723.    772. 95%     dsr pe… Dobson
#> 12 Area3  2007        4709   569532  800.    776.    824. 95%     dsr pe… Dobson
#> 13 Area3  2008        3382   612962  543.    524.    563. 95%     dsr pe… Dobson
#> 14 Area3  2009        3568   576734  659.    636.    682. 95%     dsr pe… Dobson
#> 15 Area3  2010        3862   565964  696.    673.    719. 95%     dsr pe… Dobson
#> 16 Area4  2006        3694   556898  697.    673.    722. 95%     dsr pe… Dobson
#> 17 Area4  2007        4137   560678  759.    735.    784. 95%     dsr pe… Dobson
#> 18 Area4  2008        3641   539819  657.    634.    680. 95%     dsr pe… Dobson
#> 19 Area4  2009        4329   579105  749.    726.    774. 95%     dsr pe… Dobson
#> 20 Area4  2010        3604   546257  637.    615.    660. 95%     dsr pe… Dobson
#> # … with abbreviated variable names ¹​total_pop, ²​confidence, ³​statistic

Execute calculate_ISRatio and calculate_ISRate

INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the calculate_ISRatio and calculate_ISRate functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (isr only), the smr or isr, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

df_ref <- df_std %>%
    filter(year == 2006) %>%
    group_by(ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last")
    
head(df_ref)
#> # A tibble: 6 × 3
#>   ageband   obs    pop
#>     <dbl> <int>  <int>
#> 1       0   790 103962
#> 2       5   810 144855
#> 3      10   707 128045
#> 4      15   507 126804
#> 5      20   423 119120
#> 6      25   962 121688

Here are some example code chunks to demonstrate the calculate_ISRatio function and the arguments that can optionally be specified

# calculate separate smrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl confidence stati…¹
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>      <chr>  
#>  1 Area1  2006 Female     2007    1746. 1.15    1.10    1.20  95%        indire…
#>  2 Area1  2006 Male       1698    1946. 0.872   0.831   0.915 95%        indire…
#>  3 Area1  2007 Female     2232    1810. 1.23    1.18    1.29  95%        indire…
#>  4 Area1  2007 Male       2180    1972. 1.11    1.06    1.15  95%        indire…
#>  5 Area1  2008 Female     1720    1875. 0.917   0.874   0.961 95%        indire…
#>  6 Area1  2008 Male       2178    1937. 1.12    1.08    1.17  95%        indire…
#>  7 Area1  2009 Female     1673    2072. 0.808   0.769   0.847 95%        indire…
#>  8 Area1  2009 Male       1730    1843. 0.938   0.895   0.984 95%        indire…
#>  9 Area1  2010 Female     1991    1898. 1.05    1.00    1.10  95%        indire…
#> 10 Area1  2010 Male       2151    1848. 1.16    1.12    1.21  95%        indire…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   name ¹​statistic

# calculate the same smrs by appending the reference data to the data frame
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl confidence stati…¹
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>      <chr>  
#>  1 Area1  2006 Female     2007    1746. 1.15    1.10    1.20  95%        indire…
#>  2 Area1  2006 Male       1698    1946. 0.872   0.831   0.915 95%        indire…
#>  3 Area1  2007 Female     2232    1810. 1.23    1.18    1.29  95%        indire…
#>  4 Area1  2007 Male       2180    1972. 1.11    1.06    1.15  95%        indire…
#>  5 Area1  2008 Female     1720    1875. 0.917   0.874   0.961 95%        indire…
#>  6 Area1  2008 Male       2178    1937. 1.12    1.08    1.17  95%        indire…
#>  7 Area1  2009 Female     1673    2072. 0.808   0.769   0.847 95%        indire…
#>  8 Area1  2009 Male       1730    1843. 0.938   0.895   0.984 95%        indire…
#>  9 Area1  2010 Female     1991    1898. 1.05    1.00    1.10  95%        indire…
#> 10 Area1  2010 Male       2151    1848. 1.16    1.12    1.21  95%        indire…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   name ¹​statistic

# calculate separate smrs for each year and drop metadata columns from output
df_std %>%
    group_by(year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last") %>%
    calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 × 6
#> # Groups:   year [5]
#>    year observed expected value lowercl uppercl
#>   <int>    <int>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    14980   14980  1       0.984   1.02 
#> 2  2007    17152   15070. 1.14    1.12    1.16 
#> 3  2008    14593   15459. 0.944   0.929   0.959
#> 4  2009    15112   15414. 0.980   0.965   0.996
#> 5  2010    15108   15127. 0.999   0.983   1.01

The calculate_ISRate function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:

# calculate separate isrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 12
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected ref_rate value lowercl uppercl confide…¹
#>    <chr> <int> <chr>     <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>    
#>  1 Area1  2006 Female     2007    1746.     660.  758.    726.    792. 95%      
#>  2 Area1  2006 Male       1698    1946.     660.  576.    548.    604. 95%      
#>  3 Area1  2007 Female     2232    1810.     660.  814.    780.    848. 95%      
#>  4 Area1  2007 Male       2180    1972.     660.  729.    699.    761. 95%      
#>  5 Area1  2008 Female     1720    1875.     660.  605.    577.    634. 95%      
#>  6 Area1  2008 Male       2178    1937.     660.  742.    711.    774. 95%      
#>  7 Area1  2009 Female     1673    2072.     660.  533.    507.    559. 95%      
#>  8 Area1  2009 Male       1730    1843.     660.  619.    590.    649. 95%      
#>  9 Area1  2010 Female     1991    1898.     660.  692.    662.    723. 95%      
#> 10 Area1  2010 Male       2151    1848.     660.  768.    736.    801. 95%      
#> # … with 30 more rows, 2 more variables: statistic <chr>, method <chr>, and
#> #   abbreviated variable name ¹​confidence

# calculate the same isrs by appending the reference data to the data frame
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 × 12
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected ref_rate value lowercl uppercl confide…¹
#>    <chr> <int> <chr>     <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>    
#>  1 Area1  2006 Female     2007    1746.     660.  758.    726.    792. 95%      
#>  2 Area1  2006 Male       1698    1946.     660.  576.    548.    604. 95%      
#>  3 Area1  2007 Female     2232    1810.     660.  814.    780.    848. 95%      
#>  4 Area1  2007 Male       2180    1972.     660.  729.    699.    761. 95%      
#>  5 Area1  2008 Female     1720    1875.     660.  605.    577.    634. 95%      
#>  6 Area1  2008 Male       2178    1937.     660.  742.    711.    774. 95%      
#>  7 Area1  2009 Female     1673    2072.     660.  533.    507.    559. 95%      
#>  8 Area1  2009 Male       1730    1843.     660.  619.    590.    649. 95%      
#>  9 Area1  2010 Female     1991    1898.     660.  692.    662.    723. 95%      
#> 10 Area1  2010 Male       2151    1848.     660.  768.    736.    801. 95%      
#> # … with 30 more rows, 2 more variables: statistic <chr>, method <chr>, and
#> #   abbreviated variable name ¹​confidence

# calculate separate isrs for each year and drop metadata columns from output
df_std %>%
    group_by(year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last")  %>%
    calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 × 7
#> # Groups:   year [5]
#>    year observed expected ref_rate value lowercl uppercl
#>   <int>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    14980   14980      660.  660.    649.    670.
#> 2  2007    17152   15070.     660.  751.    740.    762.
#> 3  2008    14593   15459.     660.  623.    613.    633.
#> 4  2009    15112   15414.     660.  647.    636.    657.
#> 5  2010    15108   15127.     660.  659.    648.    669.