This vignette demonstrates how to use the rfars
package to download FARS data and use it to compare crash characteristics and trends in rural and urban areas. First we download the data, then make a convenient adjustment to the given coding scheme. Rural and urban crashes and fatalities are then compared on many different dimensions.
Below we get three years of data for one state, Virginia.
<- get_fars(years = 2018:2020, states = "VA") myFARS
Here we make an adjustment to the standard variable definitions to label all people driving motorcycles as motorcyclists rather than drivers of motorcycles. This makes it easier to refer to motorcyclists later. Note that the myFARS
object is a list, with the flat
tibble containing the required variables: body_typ
(vehicle body type) and per_typ
(person type).
$flat$per_typ <-
myFARSifelse(grepl("motorcycle",
$flat$body_typ,
myFARSignore.case = TRUE),
"Motorcyclist",
$flat$per_typ) myFARS
The counts()
function makes it easy to generate specific types of counts. Here we create a function to run two counts, differentiated by urbanicity, and then stack the results for easy graphing.
<- function(myFARS, what, involved=NULL){
compare_counts
bind_rows(
counts(myFARS, what=what, involved=involved, where="rural") %>%
mutate(where = "Rural"),
counts(myFARS, what=what, involved=involved, where="urban") %>%
mutate(where = "Urban")
%>%
) return()
}
The number of crashes is a reasonable starting point. Below we use our compare_counts()
function and ggplot()
to plot the annual count of crashes and fatalities in rural and urban areas, from 2015 to 2020.
compare_counts(myFARS, "crashes") %>%
ggplot(aes(x=date, y=n, label=scales::comma(n))) +
geom_col() +
geom_label(vjust=1) +
facet_wrap(.~where) +
labs(x=NULL, y=NULL, title = "Crashes", fill=NULL)
compare_counts(myFARS, "fatalities") %>%
ggplot(aes(x=date, y=n, label=scales::comma(n))) +
geom_col() +
geom_label(vjust=1) +
facet_wrap(.~where) +
labs(x=NULL, y=NULL, title = "Fatalities", fill=NULL)
involved
Argumentcounts()
makes it easy to hone in on specific crash types by using the involved
argument. It can be any of: distracted driver, drowsy driver, police pursuit, motorcycle, pedalcyclist, bicyclist, pedestrian, pedbike, young driver, older driver, speeding, alcohol, drugs, hit and run, roadway departure, rollover, or large trucks. Specifying involved
will filter the counts to those matching the criterion. For example involved="distracted driver"
will return counts associated with crashes involving a distracted driver. Multiple values can be supplied; if so, the resulting counts will satisfy all criteria. That is, they are combined with the and operator (as opposed to or).
Below we loop through all options available in rfars
and generate simple plots.
<- c("distracted driver", "drowsy driver",
crashfactors "police pursuit", "motorcycle", "pedalcyclist",
"bicyclist", "pedestrian", "pedbike",
"young driver", "older driver", "speeding",
"alcohol", "drugs", "hit and run",
"roadway departure", "rollover", "large trucks"
)
for(crashfactor in crashfactors){
<-
p compare_counts(myFARS, "fatalities", involved = crashfactor) %>%
ggplot(aes(x=year, y=n, label=scales::comma(n))) +
geom_col(position="dodge") +
facet_wrap(.~where) +
geom_label(position = position_dodge(.9), vjust=1) +
labs(title = paste0("Fatalities: ", crashfactor))
print(p)
}
#> Note: Young drivers are defined as those between the ages of 15 and 20.
#> Note: Young drivers are defined as those between the ages of 15 and 20.
#> Note: Older drivers are defined as those aged 65+.
#> Note: Older drivers are defined as those aged 65+.
filterOnly
OptionThe counts()
function has a filterOnly
option, which returns pre-summarized data fitting the other specifications (what, where, etc.). This can be useful when generating custom counts. For example, acc_type
(Crash Type) is a vehicle-level variable. To count the number of crashes by acc_type
, we need to prevent over-counting (as there will be one value for acc_type
for each vehicle involved in each crash). Below we take the value associated with veh_no
1 (vehicle number 1). This is reasonable, but may not be appropriate for all analysis situations.
bind_rows(
counts(myFARS,
what = "crashes",
where = "rural",
filterOnly = TRUE
%>%
) filter(veh_no==1) %>% #crash type is on the vehicle-level, this prevents over-counting
select(id, year, acc_type) %>% unique() %>% group_by(acc_type, year) %>% summarize(n=n()) %>%
mutate(where = "Rural"),
counts(myFARS,
what = "crashes",
where = "urban",
filterOnly = TRUE
%>%
) filter(veh_no==1) %>%
select(id, year, acc_type) %>% unique() %>% group_by(acc_type, year) %>% summarize(n=n()) %>%
mutate(where = "Urban")
%>%
) filter(!is.na(acc_type)) %>%
group_by(where, acc_type) %>% summarize(n=sum(n, na.rm=TRUE)) %>%
::pivot_wider(names_from = "where", values_from = "n") %>%
tidyrmutate(Total = Urban + Rural,
rural_pct = Rural/Total) %>%
arrange(desc(Total)) %>%
slice(1:20) %>%
arrange(desc(rural_pct)) %>%
mutate(acc_type = reorder(acc_type, rural_pct)) %>%
ggplot(aes(y=acc_type, x=rural_pct, fill=Rural, label=scales::percent(rural_pct, accuracy = 1))) +
geom_col() +
geom_label(hjust=1, fill="white") +
scale_fill_continuous(labels=scales::comma) +
labs(x=NULL, y=NULL,
title = "20 Most Common Crash Types by Prevalence in Rural Areas") +
theme(plot.title.position = "plot")
#> `summarise()` has grouped output by 'acc_type'. You can override using the
#> `.groups` argument.
#> `summarise()` has grouped output by 'acc_type'. You can override using the
#> `.groups` argument.
#> `summarise()` has grouped output by 'where'. You can override using the
#> `.groups` argument.
flat
TibbleIt is sometimes easiest to access the data directly, rather than with the counts()
function. The object created by use_fars()
is a list with five elements, all tibbles: flat
, multi_acc
, multi_veh
, multi_per
, and events
. The flat
data frame contains over 200 variables, and can often provide what’s needed.
Below are several examples:
$flat %>%
myFARSmutate(
vprofile = ifelse(vprofile %in% c("Uphill", "Downhill"), "Up/downhill", vprofile),
valign = ifelse(grepl("Curve", valign), "Curve", valign)
%>%
) filter(veh_no == 1, #to avoid over-counting
%in% c("Rural", "Urban"),
rur_urb %in% c("Straight", "Curve"),
valign !(vprofile %in% c("Unknown", "Reported as Unknown", "Not Reported"))
%>%
) select(id, vprofile, valign, rur_urb) %>% unique() %>%
group_by(vprofile, valign, rur_urb) %>%
summarize(n = n()) %>%
ggplot(aes(x=valign, y=vprofile, fill=n, label=scales::comma(n))) +
#geom_tile() +
facet_wrap(.~rur_urb) +
::scale_fill_viridis() +
viridisgeom_label() +
labs(title = "Roadway Profile and Alignment")
#> `summarise()` has grouped output by 'vprofile', 'valign'. You can override
#> using the `.groups` argument.
$flat %>%
myFARSfilter(rur_urb %in% c("Rural", "Urban")) %>%
filter(grepl("(K)", inj_sev)) %>%
group_by(rur_urb, per_typ) %>%
summarise(n=n()) %>%
filter(n>2) %>%
mutate(per_typ = stringr::str_wrap(per_typ, 15)) %>%
ggplot(aes(x=per_typ, y=n, fill=rur_urb, label = scales::comma(n))) +
geom_col(position = "dodge") +
geom_label(vjust=1, position = position_dodge(.9)) +
labs(title = "Fatalities by Person Type and Urbanicity")
#> `summarise()` has grouped output by 'rur_urb'. You can override using the
#> `.groups` argument.
$flat %>%
myFARSfilter(rur_urb %in% c("Rural", "Urban")) %>%
filter(grepl("(K)", inj_sev)) %>%
group_by(rur_urb, sex) %>%
summarise(n=n()) %>%
filter(n>90) %>%
mutate(sex = stringr::str_wrap(sex, 15)) %>%
ggplot(aes(x=sex, y=n, fill=rur_urb, label = scales::comma(n))) +
geom_col(position = "dodge") +
geom_label(vjust=1, position = position_dodge(.9)) +
labs(title = "Fatalities by Sex and Urbanicity")
#> `summarise()` has grouped output by 'rur_urb'. You can override using the
#> `.groups` argument.
$flat %>%
myFARSfilter(rur_urb %in% c("Rural", "Urban")) %>%
filter(grepl("(K)", inj_sev)) %>%
group_by(rur_urb, hispanic) %>%
summarise(n=n()) %>%
filter(n>10) %>%
mutate(hispanic = stringr::str_wrap(hispanic, 15)) %>%
ggplot(aes(x=hispanic, y=n, fill=rur_urb, label = scales::comma(n))) +
geom_col(position = "dodge") +
geom_label(vjust=1, position = position_dodge(.9)) +
labs(title = "Fatalities by Ethnicity and Urbanicity")
#> `summarise()` has grouped output by 'rur_urb'. You can override using the
#> `.groups` argument.
$flat %>%
myFARSfilter(rur_urb %in% c("Rural", "Urban")) %>%
filter(grepl("(K)", inj_sev)) %>%
filter(!(per_typ %in% c("Bicyclist", "Pedestrian"))) %>%
group_by(rur_urb, body_typ) %>%
summarise(n=n()) %>%
filter(n>30) %>%
mutate(body_typ = stringr::str_wrap(body_typ, 80)) %>%
ggplot(aes(y=body_typ, x=n, fill=rur_urb, label=scales::comma(n, accuracy = 1))) +
geom_col(position = "dodge") +
geom_label(hjust=1, position = position_dodge(.9)) +
labs(title = "Fatalities by Vehicle Type and Urbanicity")
#> `summarise()` has grouped output by 'rur_urb'. You can override using the
#> `.groups` argument.
$flat %>%
myFARSfilter(grepl("(K)", inj_sev),
%in% c("Rural", "Urban")) %>%
rur_urb mutate(age_n = gsub("\\D+","", age) %>% as.numeric()) %>%
group_by(rur_urb, age_n) %>% summarize(n=n()) %>%
filter(age_n <=90) %>%
ggplot(aes(x=age_n, y=n, color = rur_urb)) +
geom_line(size=1.2, alpha=.8) +
labs(title = "Fatalities by Age and Urbanicity")
#> `summarise()` has grouped output by 'rur_urb'. You can override using the
#> `.groups` argument.
$flat %>%
myFARSmutate(age_n = gsub("\\D+","", age) %>% as.numeric()) %>%
filter(grepl("(K)", inj_sev),
%in% c("Rural", "Urban"),
rur_urb < 25,
hour <= 90) %>%
age_n group_by(rur_urb, age_n, hour) %>% summarize(n=n()) %>%
ggplot(aes(x=hour, y=age_n, fill=n)) +
geom_tile() +
facet_wrap(.~rur_urb) +
::scale_fill_viridis() +
viridislabs(title = "Fatalities by Age, Time of Day, and Urbanicity")
#> `summarise()` has grouped output by 'rur_urb', 'age_n'. You can override using
#> the `.groups` argument.
multi_per
TibbleIf the flat
tibble does not have the required information, it may be in one of the multi_
tibbles. Below, we access the multi_per
tibble to visualize fatalities by race.
$multi_per %>%
myFARSfilter(name == "race") %>%
select(state, st_case, veh_no, per_no, year, race=value) %>%
inner_join(myFARS$flat) %>%
filter(rur_urb %in% c("Rural", "Urban")) %>%
filter(grepl("(K)", inj_sev)) %>%
group_by(rur_urb, race) %>%
summarise(n=n()) %>%
filter(n>9) %>%
mutate(race = stringr::str_wrap(race, 15)) %>%
ggplot(aes(x=race, y=n, fill=rur_urb, label = scales::comma(n))) +
geom_col(position = "dodge") +
geom_label(vjust=1, position = position_dodge(.9))
#> Joining, by = c("state", "st_case", "veh_no", "per_no", "year")
#> `summarise()` has grouped output by 'rur_urb'. You can override using the
#> `.groups` argument.