openair is an R package developed for the purpose of analysing air quality data or, more generally, atmospheric composition data. The package is extensively used in academia and in the public and private sectors. The project was initially funded by the UK Natural Environment Research Council (NERC), with additional funds from Defra.
The most up-to-date information on openair can be found in the package itself and at the book website (https://bookdown.org/david_carslaw/openair/).
Installation can be done in the normal way:
install.packages("openair")
The development version can be installed from GitHub, which is easy using the devtools package. Note that because openair contains C++ code, a compiler is also needed. On Windows, for example, Rtools is needed.
# install.packages("devtools")
devtools::install_github("davidcarslaw/openair")
openair has developed over several years to help analyse air quality data.
This package continues to develop and input from other developers would be welcome. Some of the features are:
- Data import with importAURN and family functions.
- timeAverage and selectByDate to make it easier to manipulate atmospheric composition data.
- Wind and pollution roses with windRose and pollutionRose.
- The openair type option, available in most functions.
- The type option to easily evaluate models by season, hour of the day etc. These include key model statistics, Taylor Diagram and Conditional Quantile plots.

It is easy to import hourly data from hundreds of sites, and to import several sites and several years of data at one time.
library(openair)
kc1 <- importAURN(site = "kc1", year = 2020)
kc1
#> # A tibble: 8,784 × 14
#> site code date co nox no2 no o3 so2 pm10
#> <chr> <chr> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 London N… KC1 2020-01-01 00:00:00 0.214 64.8 46.2 12.1 1.13 NA 41.8
#> 2 London N… KC1 2020-01-01 01:00:00 0.237 74.1 45.0 19.0 1.20 NA 43
#> 3 London N… KC1 2020-01-01 02:00:00 0.204 60.5 41.4 12.4 1.50 NA 46.0
#> 4 London N… KC1 2020-01-01 03:00:00 0.204 53.5 39.8 8.93 1.60 NA 48.5
#> 5 London N… KC1 2020-01-01 04:00:00 0.169 37.7 33.6 2.63 5.79 NA 45.4
#> 6 London N… KC1 2020-01-01 05:00:00 0.160 43.3 36.8 4.25 6.09 NA 43.8
#> 7 London N… KC1 2020-01-01 06:00:00 0.157 48.2 39.4 5.76 2.74 NA 43.6
#> 8 London N… KC1 2020-01-01 07:00:00 0.178 60.5 44.7 10.3 1.20 NA 42.1
#> 9 London N… KC1 2020-01-01 08:00:00 0.233 71.8 47.9 15.6 2.25 NA 41.8
#> 10 London N… KC1 2020-01-01 09:00:00 0.329 128. 46.9 53.2 2.25 NA 37.6
#> # … with 8,774 more rows, and 4 more variables: pm2.5 <dbl>, ws <dbl>,
#> # wd <dbl>, air_temp <dbl>
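Importing several sites and several years in one call can be sketched as follows. This assumes network access; the second site code "my1" and the year range 2019:2020 are illustrative choices rather than values taken from the text above.

```r
library(openair)

# Import two AURN sites and two years in a single call.
# "my1" and 2019:2020 are illustrative; other valid site codes work the same way.
multi <- importAURN(site = c("kc1", "my1"), year = 2019:2020)

# The code column identifies which site each hourly row belongs to
table(multi$code)
```

The result is a single data frame with the sites stacked, so the site or code column can later be used to split analyses by site.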
Using the selectByDate function it is easy to select quite complex time-based periods. For example, to select weekday (Monday to Friday) data from June to September 2020 for the hours 7am to 7pm inclusive:
sub <- selectByDate(kc1,
  day = "weekday",
  year = 2020,
  month = 6:9,
  hour = 7:19
)
sub
#> # A tibble: 1,144 × 14
#> date site code co nox no2 no o3 so2 pm10
#> <dttm> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2020-06-01 07:00:00 London … KC1 0.125 23.1 16.8 4.14 56.5 2.29 14.8
#> 2 2020-06-01 08:00:00 London … KC1 0.133 25.2 17.8 4.79 61.7 2.68 17.8
#> 3 2020-06-01 09:00:00 London … KC1 0.119 15.6 12.2 2.22 75.8 2.35 15.4
#> 4 2020-06-01 10:00:00 London … KC1 0.104 13.8 11.1 1.79 87.1 1.57 15.2
#> 5 2020-06-01 11:00:00 London … KC1 0.0956 14.0 11.8 1.46 96.7 1.44 11.4
#> 6 2020-06-01 12:00:00 London … KC1 0.0985 11.3 9.97 0.893 106. 1.44 11.9
#> 7 2020-06-01 13:00:00 London … KC1 0.0927 11.0 9.64 0.893 112. 2.03 11.8
#> 8 2020-06-01 14:00:00 London … KC1 0.0927 12.5 10.8 1.14 114. 2.81 11.9
#> 9 2020-06-01 15:00:00 London … KC1 0.0811 10.7 9.48 0.822 115. 2.88 10
#> 10 2020-06-01 16:00:00 London … KC1 0.0898 13.9 11.9 1.29 104. 2.22 9.1
#> # … with 1,134 more rows, and 4 more variables: pm2.5 <dbl>, ws <dbl>,
#> # wd <dbl>, air_temp <dbl>
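To make the selection logic concrete, here is a minimal base R sketch of the same filter applied to a synthetic hourly time series for 2020. It does not use openair and is not how selectByDate is implemented; the variable names are hypothetical and the time stamps are assumed to be in UTC.

```r
# Synthetic hourly time stamps for 2020 (UTC), standing in for kc1$date
dates <- seq(
  as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
  as.POSIXct("2020-12-31 23:00:00", tz = "UTC"),
  by = "hour"
)

# POSIXlt exposes the date components: wday is 0 (Sunday) to 6 (Saturday),
# mon is 0 (January) to 11 (December), hour is 0 to 23
lt <- as.POSIXlt(dates, tz = "UTC")

keep <- lt$wday %in% 1:5 &      # Monday to Friday
  (lt$mon + 1) %in% 6:9 &       # June to September
  lt$hour %in% 7:19             # 07:00 to 19:00 inclusive

selected <- dates[keep]
length(selected)
```

For a complete hourly series this gives 88 weekdays of 13 hours each, which is consistent with the 1,144 rows in the selectByDate output above.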
Similarly it is easy to time-average data in many flexible ways. For example, 2-week means can be calculated as
sub2 <- timeAverage(kc1, avg.time = "2 week")
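A few other averaging choices are sketched below. The example assumes the mydata example data set bundled with openair, and uses the statistic and data.thresh arguments of timeAverage; the particular values are illustrative.

```r
library(openair)

# Monthly means; mydata is the hourly example data set shipped with openair
monthly <- timeAverage(mydata, avg.time = "month")

# Daily maxima; with data.thresh = 75 a daily value is returned as NA
# unless at least 75 % of the hourly values for that day are present
daily_max <- timeAverage(mydata,
  avg.time = "day",
  statistic = "max",
  data.thresh = 75
)

head(monthly)
```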
The type option
One of the key aspects of openair is the use of the type option, which is available for almost all openair functions. The type option partitions data by different categories of variable. There are many built-in options that type can take based on splitting your data by different date values. The built-in values of type include:
- hemisphere option that can be either "northern" (default) or "southern"
- longitude and latitude
- If wind direction (wd) is available, type = "wd" will split the data up into 8 sectors: N, NE, E, SE, S, SW, W, NW.
- type = "season" will just split the data into four seasons regardless of the year.

If a categorical variable is present in a data frame, e.g. site, then that variable can be used directly, e.g. type = "site".
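The eight-sector split used by type = "wd" can be pictured with a small base R sketch. The helper wd_to_sector is hypothetical and not part of openair, and the exact sector boundaries (each sector 45 degrees wide and centred on its compass point) are an assumption.

```r
# Hypothetical helper (not part of openair): map degrees to 8 compass sectors
wd_to_sector <- function(wd) {
  sectors <- c("N", "NE", "E", "SE", "S", "SW", "W", "NW")
  # Shift by half a sector (22.5 degrees) so that N spans 337.5 to 22.5
  idx <- floor(((wd + 22.5) %% 360) / 45) + 1
  factor(sectors[idx], levels = sectors)
}

wd <- c(0, 44, 90, 180, 225, 350)

# Group the wind directions by the sector they fall into
split(wd, wd_to_sector(wd))
```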
type can also be a numeric variable. In this case the numeric variable is split up into 4 quantiles, i.e. four partitions containing equal numbers of points. Note the user can supply the option n.levels to indicate how many quantiles to use.
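The idea of splitting a numeric variable into partitions with equal numbers of points can be illustrated in base R. This is a sketch of the concept on synthetic data, not openair's own code; n_levels is a hypothetical variable playing the role of n.levels.

```r
set.seed(1)
x <- rnorm(1000)   # synthetic numeric variable
n_levels <- 4      # number of quantile partitions

# Break points at the 0 %, 25 %, 50 %, 75 % and 100 % quantiles
breaks <- quantile(x, probs = seq(0, 1, length.out = n_levels + 1), na.rm = TRUE)

# Assign each value to its quantile interval; include.lowest keeps the minimum
groups <- cut(x, breaks = breaks, include.lowest = TRUE)

table(groups)
```

Each interval holds the same number of points, while the interval widths differ according to how the values are distributed.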
openair can plot basic wind roses very easily provided the variables ws (wind speed) and wd (wind direction) are available.
windRose(mydata)
However, the real flexibility comes from being able to use the type option.
windRose(mydata,
type = "year",
layout = c(4, 2)
)
There are many flavours of bivariate polar plots that are useful for understanding air pollution sources.
polarPlot(mydata,
pollutant = "so2",
statistic = "cpf",
percentile = 90,
cols = "YlGnBu"
)
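The statistic = "cpf" option refers to a conditional probability function: the fraction of measurements under given wind conditions that exceed a high percentile of the concentration. The base R sketch below computes that ratio per wind sector on synthetic data with an artificial source to the east. It is a simplified illustration under those assumptions, not openair's implementation; polarPlot works on a smoothed wind speed and direction surface rather than on discrete sectors.

```r
set.seed(42)
n <- 4000
wd <- runif(n, 0, 360)   # synthetic wind directions in degrees

# Synthetic concentrations, elevated when the wind blows from 60 to 120 degrees
so2 <- rexp(n, rate = 1 / 5) + ifelse(wd >= 60 & wd < 120, 10, 0)

# Assign each observation to one of 8 sectors, each 45 degrees wide
sectors <- c("N", "NE", "E", "SE", "S", "SW", "W", "NW")
sector <- factor(sectors[floor(((wd + 22.5) %% 360) / 45) + 1], levels = sectors)

# Threshold: 90th percentile of all concentrations
threshold <- quantile(so2, probs = 0.90)

# CPF per sector: share of observations in the sector at or above the threshold
cpf <- tapply(so2 >= threshold, sector, mean)
round(cpf, 2)
```

Sectors pointing towards the source show a much higher probability than the rest, which is the pattern a CPF polar plot is meant to reveal.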