The goal of rfars
is to simplify the process of
analyzing FARS data. FARS stands for Fatality
Analysis Reporting System, the census of fatal crashes in the United
States maintained by the National
Highway Traffic Safety Administration. The Fatality and Injury Reporting System
Tool allows users to generate queries, and can produce simple tables
and graphs. This suffices for simple analysis, but often leaves
researchers wanting more. Digging any deeper, however, involves a
time-consuming process of downloading annual ZIP files and attempting to
stitch them together - after first combing through the immense data
dictionary to determine the required variables and table names.
rfars
allows users to download FARS data back to 2015 with
just one line of code. The result is a full, rich dataset ready for
mapping, modeling, and other downstream analysis. Helper functions are
also provided to produce common counts and comparisons.
A companion package rfarsplus
(currently in development)
will provide exposure data and facilitate the calculation of various
rates.
You can install the latest version of rfars
from GitHub with:
# install.packages("devtools")
::install_github("s87jackson/rfars") devtools
Then load the required packages:
Use the get_fars()
function to bring FARS data into the
current environment. This is done by (a) downloading the data to a
temporary directory, (b) downloading to a permanent directory, or (c)
importing from a permanent directory. After data is downloaded to a
permanent directory, the function will look there rather than
downloading the data again. If a year of data is requested but not
found, R will ask your permission to download the missing data.
Here we get three years of data for Virginia:
<- get_fars(years = 2018:2020, states = "VA") myFARS
This returns a ‘FARS’ object: a list with five tibbles:
flat
, multi_acc
, multi_veh
,
multi_per
, and events
.
The flat
tibble contains all variables for which there
is just one value per crash (“accident”), vehicle, or person (e.g.,
weather conditions, travel speed, age). Each row corresponds to a person
involved in a crash. As there may be multiple people and/or vehicles
involved in one crash, some variable-values are repeated within a crash
or vehicle. Each crash is uniquely identified with id
,
which is a combination of year
and st_case
.
Note that st_case
is not unique across years, for example,
st_case
510001 will appear in each year. The
id
variable attempts to avoid this issue.
The multi_
tibbles contain those variables for which
there may be a varying number of values for any entity (e.g., driver
impairments, vehicle events, weather conditions at time of crash). Each
tibble has the requisite data elements corresponding to the entity:
multi_acc
includes st_case
and
year
, multi_veh
adds veh_no
(vehicle number), and multi_per
adds per_no
(person number).
The events
tibble provides a sequence of numbered events
for each vehicle in each crash.
glimpse(myFARS)
#> List of 5
#> $ flat : tibble [5,259 × 194] (S3: tbl_df/tbl/data.frame)
#> ..$ year : num [1:5259] 2018 2018 2018 2018 2018 ...
#> ..$ state : chr [1:5259] "Virginia" "Virginia" "Virginia" "Virginia" ...
#> ..$ st_case : num [1:5259] 510001 510001 510001 510002 510002 ...
#> ..$ id : num [1:5259] 2.02e+09 2.02e+09 2.02e+09 2.02e+09 2.02e+09 ...
#> ..$ veh_no : num [1:5259] 1 1 1 1 2 0 1 1 1 2 ...
#> ..$ per_no : num [1:5259] 1 2 3 1 1 1 1 1 2 1 ...
#> ..$ county : chr [1:5259] "RICHMOND (760)" "RICHMOND (760)" "RICHMOND (760)" "PITTSYLVANIA (143)" ...
#> ..$ city : chr [1:5259] "RICHMOND" "RICHMOND" "RICHMOND" "NOT APPLICABLE" ...
#> ..$ lon : num [1:5259] -77.4 -77.4 -77.4 -79.4 -79.4 ...
#> ..$ lat : num [1:5259] 37.6 37.6 37.6 36.7 36.7 ...
#> ..$ ve_total : num [1:5259] 1 1 1 2 2 1 1 2 2 2 ...
#> ..$ ve_forms : num [1:5259] 1 1 1 2 2 1 1 2 2 2 ...
#> ..$ pvh_invl : num [1:5259] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ peds : num [1:5259] 0 0 0 0 0 1 1 0 0 0 ...
#> ..$ persons : num [1:5259] 3 3 3 2 2 1 1 3 3 3 ...
#> ..$ permvit : num [1:5259] 3 3 3 2 2 1 1 3 3 3 ...
#> ..$ pernotmvit : num [1:5259] 0 0 0 0 0 1 1 0 0 0 ...
#> ..$ day : num [1:5259] 1 1 1 3 3 3 3 2 2 2 ...
#> ..$ month : chr [1:5259] "January" "January" "January" "January" ...
#> ..$ day_week : chr [1:5259] "Monday" "Monday" "Monday" "Wednesday" ...
#> ..$ hour : num [1:5259] 16 16 16 7 7 20 20 12 12 12 ...
#> ..$ minute : num [1:5259] 15 15 15 30 30 34 34 10 10 10 ...
#> ..$ nhs : chr [1:5259] "This section IS NOT on the NHS" "This section IS NOT on the NHS" "This section IS NOT on the NHS" "This section IS NOT on the NHS" ...
#> ..$ route : chr [1:5259] "Local Street - Municipality" "Local Street - Municipality" "Local Street - Municipality" "State Highway" ...
#> ..$ tway_id : chr [1:5259] "VA-127" "VA-127" "VA-127" "SR-41 FRANKLIN TRPK" ...
#> ..$ tway_id2 : chr [1:5259] NA NA NA "GOLF CLUB RD" ...
#> ..$ rur_urb : chr [1:5259] "Urban" "Urban" "Urban" "Rural" ...
#> ..$ func_sys : chr [1:5259] "Major Collector" "Major Collector" "Major Collector" "Minor Arterial" ...
#> ..$ rd_owner : chr [1:5259] "City or Municipal Highway Agency" "City or Municipal Highway Agency" "City or Municipal Highway Agency" "State Highway Agency" ...
#> ..$ milept : chr [1:5259] "11" "11" "11" "35" ...
#> ..$ sp_jur : chr [1:5259] "No Special Jurisdiction" "No Special Jurisdiction" "No Special Jurisdiction" "No Special Jurisdiction" ...
#> ..$ harm_ev : chr [1:5259] "Tree (Standing Only)" "Tree (Standing Only)" "Tree (Standing Only)" "Motor Vehicle In-Transport" ...
#> ..$ man_coll : chr [1:5259] "Not a Collision with Motor Vehicle In-Transport" "Not a Collision with Motor Vehicle In-Transport" "Not a Collision with Motor Vehicle In-Transport" "Front-to-Front" ...
#> ..$ reljct1 : chr [1:5259] "No" "No" "No" "No" ...
#> ..$ reljct2 : chr [1:5259] "Non-Junction" "Non-Junction" "Non-Junction" "Intersection-Related" ...
#> ..$ typ_int : chr [1:5259] "Not an Intersection" "Not an Intersection" "Not an Intersection" "T-Intersection" ...
#> ..$ wrk_zone : chr [1:5259] "None" "None" "None" "None" ...
#> ..$ rel_road : chr [1:5259] "On Roadside" "On Roadside" "On Roadside" "On Roadway" ...
#> ..$ lgt_cond : chr [1:5259] "Daylight" "Daylight" "Daylight" "Daylight" ...
#> ..$ sch_bus : chr [1:5259] "No" "No" "No" "No" ...
#> ..$ rail : chr [1:5259] "Not Applicable" "Not Applicable" "Not Applicable" "Not Applicable" ...
#> ..$ not_hour : num [1:5259] 99 99 99 99 99 99 99 99 99 99 ...
#> ..$ not_min : num [1:5259] 99 99 99 99 99 99 99 99 99 99 ...
#> ..$ arr_hour : num [1:5259] 99 99 99 99 99 99 99 99 99 99 ...
#> ..$ arr_min : num [1:5259] 99 99 99 99 99 99 99 99 99 99 ...
#> ..$ hosp_hr : num [1:5259] 99 99 99 99 99 88 88 99 99 99 ...
#> ..$ hosp_mn : num [1:5259] 99 99 99 99 99 88 88 99 99 99 ...
#> ..$ fatals : num [1:5259] 1 1 1 1 1 1 1 1 1 1 ...
#> ..$ drunk_dr : num [1:5259] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ str_veh : num [1:5259] 0 0 0 0 0 1 0 0 0 0 ...
#> ..$ age : chr [1:5259] "23 Years" "22 Years" "19 Years" "29 Years" ...
#> ..$ sex : chr [1:5259] "Male" "Male" "Female" "Male" ...
#> ..$ per_typ : chr [1:5259] "Driver of a Motor Vehicle In-Transport" "Passenger of a Motor Vehicle In-Transport" "Passenger of a Motor Vehicle In-Transport" "Driver of a Motor Vehicle In-Transport" ...
#> ..$ inj_sev : chr [1:5259] "Suspected Minor Injury (B)" "Suspected Serious Injury (A)" "Fatal Injury (K)" "Suspected Serious Injury (A)" ...
#> ..$ seat_pos : chr [1:5259] "Front Seat, Left Side" "Second Seat, Left Side" "Front Seat, Right Side" "Front Seat, Left Side" ...
#> ..$ rest_use : chr [1:5259] "None Used / Not Applicable" "None Used / Not Applicable" "None Used / Not Applicable" "Shoulder and Lap Belt Used" ...
#> ..$ rest_mis : chr [1:5259] "No" "No" "No" "No" ...
#> ..$ air_bag : chr [1:5259] "Deployed- Front" "Not Deployed" "Not Deployed" "Deployed- Front" ...
#> ..$ ejection : chr [1:5259] "Not Ejected" "Not Ejected" "Not Ejected" "Not Ejected" ...
#> ..$ ej_path : chr [1:5259] "Ejection Path Not Applicable" "Ejection Path Not Applicable" "Ejection Path Not Applicable" "Ejection Path Not Applicable" ...
#> ..$ extricat : chr [1:5259] "Not Extricated or Not Applicable" "Not Extricated or Not Applicable" "Not Extricated or Not Applicable" "Not Extricated or Not Applicable" ...
#> ..$ drinking : chr [1:5259] "No (Alcohol Not Involved)" "Not Reported" "Not Reported" "No (Alcohol Not Involved)" ...
#> ..$ alc_det : chr [1:5259] "Not Reported" "Not Reported" "Not Reported" "Not Reported" ...
#> ..$ alc_status : chr [1:5259] "Test Not Given" "Test Not Given" "Test Not Given" "Test Not Given" ...
#> ..$ atst_typ : chr [1:5259] "Test Not Given" "Test Not Given" "Test Not Given" "Test Not Given" ...
#> ..$ alc_res : chr [1:5259] "Test Not Given" "Test Not Given" "Test Not Given" "Test Not Given" ...
#> ..$ drugs : chr [1:5259] "Reported as Unknown" "Not Reported" "Not Reported" "No (drugs not involved)" ...
#> ..$ drug_det : chr [1:5259] "Not Reported" "Not Reported" "Not Reported" "Not Reported" ...
#> ..$ dstatus : chr [1:5259] "Test Not Given" "Test Not Given" "Test Not Given" "Test Not Given" ...
#> ..$ hospital : chr [1:5259] "EMS Unknown Mode" "EMS Unknown Mode" "Not Transported" "EMS Unknown Mode" ...
#> ..$ doa : chr [1:5259] "Not Applicable" "Not Applicable" "Died at Scene" "Not Applicable" ...
#> ..$ death_da : chr [1:5259] "Not Applicable (Non-Fatal)" "Not Applicable (Non-Fatal)" "1" "Not Applicable (Non-Fatal)" ...
#> ..$ death_mo : chr [1:5259] "Not Applicable (Non-Fatal)" "Not Applicable (Non-Fatal)" "January" "Not Applicable (Non-Fatal)" ...
#> ..$ death_yr : chr [1:5259] "Not Applicable (Non-fatal)" "Not Applicable (Non-fatal)" "2018" "Not Applicable (Non-fatal)" ...
#> ..$ death_hr : num [1:5259] 88 88 16 88 8 20 88 12 88 88 ...
#> ..$ death_mn : num [1:5259] 88 88 20 88 45 40 88 39 88 88 ...
#> ..$ death_tm : chr [1:5259] "Not Applicable (Non-fatal)" "Not Applicable (Non-fatal)" "1620" "Not Applicable (Non-fatal)" ...
#> ..$ lag_hrs : num [1:5259] 999 999 0 999 1 0 999 0 999 999 ...
#> ..$ lag_mins : num [1:5259] 99 99 5 99 15 6 99 29 99 99 ...
#> ..$ work_inj : chr [1:5259] "Not Applicable (not a fatality)" "Not Applicable (not a fatality)" "No" "Not Applicable (not a fatality)" ...
#> ..$ hispanic : chr [1:5259] "Not A Fatality (not Applicable)" "Not A Fatality (not Applicable)" "Non-Hispanic" "Not A Fatality (not Applicable)" ...
#> ..$ location : chr [1:5259] "Occupant of a Motor Vehicle" "Occupant of a Motor Vehicle" "Occupant of a Motor Vehicle" "Occupant of a Motor Vehicle" ...
#> ..$ numoccs : chr [1:5259] "03" "03" "03" "01" ...
#> ..$ unittype : chr [1:5259] "Motor Vehicle In-Transport (Inside or Outside the Trafficway)" "Motor Vehicle In-Transport (Inside or Outside the Trafficway)" "Motor Vehicle In-Transport (Inside or Outside the Trafficway)" "Motor Vehicle In-Transport (Inside or Outside the Trafficway)" ...
#> ..$ hit_run : chr [1:5259] "No" "No" "No" "No" ...
#> ..$ reg_stat : chr [1:5259] "Virginia" "Virginia" "Virginia" "Virginia" ...
#> ..$ owner : chr [1:5259] "Driver (in this crash) was Registered Owner" "Driver (in this crash) was Registered Owner" "Driver (in this crash) was Registered Owner" "Driver (in this crash) was Registered Owner" ...
#> ..$ make : chr [1:5259] "Honda" "Honda" "Honda" "Honda" ...
#> ..$ model : num [1:5259] 32 32 32 32 2 NA 401 403 403 881 ...
#> ..$ mak_mod : chr [1:5259] "Honda Accord (Note: For Crosstour model years 2010 and 2011 only. For Crosstour model years 2012-2015, see vehi"| __truncated__ "Honda Accord (Note: For Crosstour model years 2010 and 2011 only. For Crosstour model years 2012-2015, see vehi"| __truncated__ "Honda Accord (Note: For Crosstour model years 2010 and 2011 only. For Crosstour model years 2012-2015, see vehi"| __truncated__ "Honda Accord (Note: For Crosstour model years 2010 and 2011 only. For Crosstour model years 2012-2015, see vehi"| __truncated__ ...
#> ..$ body_typ : chr [1:5259] "4-door sedan, hardtop" "4-door sedan, hardtop" "4-door sedan, hardtop" "4-door sedan, hardtop" ...
#> ..$ mod_year : chr [1:5259] "2005" "2005" "2005" "2003" ...
#> ..$ vin : chr [1:5259] "1HGCM66535A0" "1HGCM66535A0" "1HGCM66535A0" "1HGCM56333A0" ...
#> ..$ tow_veh : chr [1:5259] "No Trailing Units" "No Trailing Units" "No Trailing Units" "No Trailing Units" ...
#> ..$ j_knife : chr [1:5259] "Not an Articulated Vehicle" "Not an Articulated Vehicle" "Not an Articulated Vehicle" "Not an Articulated Vehicle" ...
#> ..$ mcarr_i1 : chr [1:5259] "Not Applicable" "Not Applicable" "Not Applicable" "Not Applicable" ...
#> ..$ mcarr_i2 : chr [1:5259] "Not Applicable" "Not Applicable" "Not Applicable" "Not Applicable" ...
#> ..$ mcarr_id : chr [1:5259] "Not Applicable" "Not Applicable" "Not Applicable" "Not Applicable" ...
#> ..$ v_config : chr [1:5259] "Not Applicable" "Not Applicable" "Not Applicable" "Not Applicable" ...
#> .. [list output truncated]
#> $ multi_acc:'data.frame': 4777 obs. of 5 variables:
#> ..$ state : chr [1:4777] "Virginia" "Virginia" "Virginia" "Virginia" ...
#> ..$ st_case: num [1:4777] 510001 510002 510003 510004 510005 ...
#> ..$ name : chr [1:4777] "weather" "weather" "weather" "weather" ...
#> ..$ value : chr [1:4777] "Not Reported" "Not Reported" "Not Reported" "Not Reported" ...
#> ..$ year : num [1:4777] 2018 2018 2018 2018 2018 ...
#> $ multi_veh:'data.frame': 47959 obs. of 6 variables:
#> ..$ state : chr [1:47959] "Virginia" "Virginia" "Virginia" "Virginia" ...
#> ..$ st_case: num [1:47959] 510001 510002 510002 510003 510004 ...
#> ..$ veh_no : num [1:47959] 1 1 2 1 1 2 1 1 1 1 ...
#> ..$ name : chr [1:47959] "vehiclesf" "vehiclesf" "vehiclesf" "vehiclesf" ...
#> ..$ value : chr [1:47959] "None" "None" "None" "None" ...
#> ..$ year : num [1:47959] 2018 2018 2018 2018 2018 ...
#> $ multi_per:'data.frame': 31896 obs. of 7 variables:
#> ..$ state : chr [1:31896] "Virginia" "Virginia" "Virginia" "Virginia" ...
#> ..$ st_case: num [1:31896] 510001 510001 510001 510002 510002 ...
#> ..$ veh_no : num [1:31896] 1 1 1 1 2 0 1 1 1 2 ...
#> ..$ per_no : num [1:31896] 1 2 3 1 1 1 1 1 2 1 ...
#> ..$ name : chr [1:31896] "race" "race" "race" "race" ...
#> ..$ value : chr [1:31896] "Not a Fatality (not Applicable)" "Not a Fatality (not Applicable)" "Black" "Not a Fatality (not Applicable)" ...
#> ..$ year : num [1:31896] 2018 2018 2018 2018 2018 ...
#> $ events :'data.frame': 8410 obs. of 7 variables:
#> ..$ state : chr [1:8410] "Virginia" "Virginia" "Virginia" "Virginia" ...
#> ..$ st_case : num [1:8410] 510001 510001 510002 510002 510002 ...
#> ..$ veh_no : num [1:8410] 1 1 1 1 2 1 1 2 1 1 ...
#> ..$ veventnum: num [1:8410] 1 2 1 2 1 1 1 1 1 2 ...
#> ..$ soe : chr [1:8410] "Ran Off Roadway - Right" "Tree (Standing Only)" "Cross Centerline" "Motor Vehicle In-Transport" ...
#> ..$ aoi : chr [1:8410] "Non-Harmful Event" "1 Clock Point" "Non-Harmful Event" "12 Clock Point" ...
#> ..$ year : num [1:8410] 2018 2018 2018 2018 2018 ...
#> - attr(*, "class")= chr [1:2] "list" "FARS"
You can review the list of variables to help guide your analysis with:
View(fars_varnames)
A first step in many transportation safety analyses involves counting
the number of relevant crashes, fatalities, or people involved.
counts()
lets users specify what to count,
where to count them (rural/urban and/or in specified states),
who to include, which years to include and an
aggregation interval (annually or monthly), and factors
involved in the crash. It returns a simple tibble that can be
easily piped into ggplot()
to quickly visualize counts.
<- counts(
my_counts
myFARS,what = "crashes",
interval = c("year")
)
head(my_counts)
#> # A tibble: 3 × 3
#> year n date
#> <dbl> <int> <date>
#> 1 2018 778 2018-01-01
#> 2 2019 774 2019-01-01
#> 3 2020 796 2020-01-01
%>%
my_counts ggplot(aes(x=date, y=n, label=scales::comma(n))) +
geom_col() +
geom_label(vjust=1.2) +
labs(x=NULL, y=NULL, title = "Fatal Crashes in Virginia")
counts(
myFARS,what = "fatalities",
interval = c("year")
%>%
) ggplot(aes(x=date, y=n, label=scales::comma(n))) +
geom_col() +
geom_label(vjust=1.2) +
labs(x=NULL, y=NULL, title = "Fatalities in Virginia")
counts(myFARS,
what = "fatalities",
where = "rural",
interval = c("year")
%>%
) ggplot(aes(x=date, y=n, label=scales::comma(n))) +
geom_col() +
geom_label(vjust=1.2) +
labs(x=NULL, y=NULL, title = "Rural Fatalities in Virginia")
counts(myFARS,
what = "fatalities",
where = "rural",
interval = c("year"),
involved = "speeding"
%>%
) ggplot(aes(x=date, y=n, label=scales::comma(n))) +
geom_col() +
geom_label(vjust=1.2) +
labs(x=NULL, y=NULL, title = "Speeding-Related Fatalities in Rural Virginia")
We can combine two counts()
results to make a
comparison. Here we compare the number of speeding-related fatalities in
rural and urban Virginia:
bind_rows(
counts(myFARS,
what = "fatalities",
where = "rural",
interval = c("year"),
involved = "speeding"
%>%
) mutate(where = "Rural"),
counts(myFARS,
what = "fatalities",
where = "urban",
interval = c("year"),
involved = "speeding"
%>%
) mutate(where = "Urban")
%>%
) ggplot(aes(x=date, y=n, label=scales::comma(n))) +
geom_col() +
geom_label(vjust=1.2) +
facet_wrap(.~where) +
labs(x=NULL, y=NULL, title = "Speeding-Related Fatalities in Virginia", fill=NULL)
We can take advantage of having access to the full data with maps. Here we map pedestrian and bicyclist fatalities in Virginia:
counts(
myFARS, what = "crashes",
involved = "pedbike",
filterOnly = TRUE
%>%
) leaflet() %>%
addTiles() %>%
addHeatmap(group = "Heatmap", radius=10, blur=20, minOpacity = .01, max = .2, cellSize = 1) %>%
addCircleMarkers(
radius = 1,
color = "red",
stroke = FALSE,
fillOpacity = 0.7, group = "Crash Locations")
#> Assuming "lon" and "lat" are longitude and latitude, respectively
#> Assuming "lon" and "lat" are longitude and latitude, respectively
Drug-related crashes:
counts(
myFARS, what = "crashes",
involved = "drugs",
filterOnly = TRUE
%>%
) filter(!is.na(lat), !is.na(lon)) %>%
leaflet() %>%
addTiles() %>%
addHeatmap(group = "Heatmap", radius=10, blur=20, minOpacity = .01, max = .2, cellSize = 1) %>%
addCircleMarkers(
radius = 1,
color = "red",
stroke = FALSE,
fillOpacity = 0.7, group = "Crash Locations")
#> Assuming "lon" and "lat" are longitude and latitude, respectively
#> Assuming "lon" and "lat" are longitude and latitude, respectively
Young drivers:
counts(
myFARS, what = "crashes",
involved = "young driver",
filterOnly = TRUE
%>%
) filter(!is.na(lat), !is.na(lon)) %>%
leaflet() %>%
addTiles() %>%
addHeatmap(group = "Heatmap", radius=10, blur=20, minOpacity = .01, max = .2, cellSize = 1) %>%
addCircleMarkers(
radius = 1,
color = "red",
stroke = FALSE,
fillOpacity = 0.7, group = "Crash Locations")
#> Note: Young drivers are defined as those between the ages of 15 and 20.
#> Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
#> Assuming "lon" and "lat" are longitude and latitude, respectively
#> Assuming "lon" and "lat" are longitude and latitude, respectively
Having access to the full dataset also allows us to develop statistical models. Here we fit a simple model of injury severity as a function of age and restraint use. The results indicate that injury severity increases with age, and when seat belts are not used properly.
# table(myFARS$flat$inj_sev)
# table(myFARS$flat$rest_use, useNA = "ifany")
# table(myFARS$flat$per_typ, useNA = "ifany")
<-
model_data $flat %>%
myFARSfilter(rest_use %in% c("Lap Belt Only Used",
"Shoulder Belt Only Used",
"None Used / Not Applicable",
"None Used/Not Applicable",
"Shoulder and Lap Belt Used"),
%in% c("Driver of a Motor Vehicle In-Transport",
per_typ "Passenger of a Motor Vehicle In-Transport")
%>%
) mutate(
rest_use = case_when(
%in% c("Lap Belt Only Used", "Shoulder Belt Only Used") ~ "Partial",
rest_use %in% c("None Used / Not Applicable", "None Used/Not Applicable") ~ "None",
rest_use %in% c("Shoulder and Lap Belt Used") ~ "Full",
rest_use TRUE ~ "Unknown"
%>%
) as.factor() %>%
relevel(ref = "Full"),
kabco = case_when(
== "Fatal Injury (K)" ~ 4,
inj_sev %in% c("Suspected Serious Injury (A)",
inj_sev "Suspected Serious Injury(A)") ~ 3,
%in% c("Suspected Minor Injury (B)",
inj_sev "Suspected Minor Injury(B)") ~ 2,
== "Possible Injury (C)" ~ 1,
inj_sev == "No Apparent Injury (O)" ~ 0,
inj_sev TRUE ~ as.numeric(NA)
),age_n = gsub("\\D+","", age) %>% as.numeric())
<- lm(kabco ~ age_n + rest_use, data = model_data)
my_model
::stargazer(my_model, type = "html") stargazer
Dependent variable: | |
kabco | |
age_n | 0.009*** |
(0.001) | |
rest_useNone | 1.474*** |
(0.042) | |
rest_usePartial | 0.520** |
(0.222) | |
Constant | 1.702*** |
(0.053) | |
Observations | 4,301 |
R2 | 0.224 |
Adjusted R2 | 0.224 |
Residual Std. Error | 1.338 (df = 4297) |
F Statistic | 414.442*** (df = 3; 4297) |
Note: | p<0.1; p<0.05; p<0.01 |
<- expand.grid(
new_data age_n = c(20, 60),
rest_use = factor(c("Full", "Partial", "None"), levels = c("Full", "Partial", "None"), ordered = TRUE) )
%>%
new_data mutate(pred = predict(my_model, newdata = new_data),
age = paste0(age_n, " yrs")) %>%
ggplot(aes(x=rest_use, y=pred)) +
geom_col() +
facet_wrap(.~age) +
scale_y_continuous(
limits = c(0,4), breaks = 0:4, labels = c("O", "C", "B", "A", "K"), expand = expansion()) +
theme(panel.grid = element_blank(),
panel.grid.major.y = element_line(c("black")),
axis.ticks = element_blank()) +
labs(x="", y="", title = "Predicted Injury Severity by Age and Restraint Use",
caption = "Full = correctly used seatbelt, partial = partially correctly used, none = no seatbelt.")