The JumpeR R package is used for converting human-readable track and field (athletics) results into data frames for use in analysis.
JumpeR is available on CRAN:
install.packages("JumpeR")
The development version can be installed from GitHub:
devtools::install_github("gpilgrim2670/JumpeR")
v0.3.0 - November 16th, 2021
The package is still under heavy development, so development versions will be unstable. Please use the stable CRAN release unless you have a very good reason not to.
JumpeR reads track and field results into R, similar to what the SwimmeR package does for swimming results.
JumpeR currently supports reading in single column Hy-tek/Active.com style results in either .html or .pdf format. JumpeR also supports Flash Results style results in .pdf format (but not .html).
These are Hy-tek results in html format, from the 2019 Greg Page relays at Cornell University. This particular file contains the entire meet.
It can be imported into R using JumpeR:
tf_parse(
  read_results(
    "https://www.leonetiming.com/2019/Indoor/GregPageRelays/Results.htm"
  )
)
This is a Hy-tek .pdf results file, from the Singapore Masters Track and Field Association 2019 Championship. It contains the entire meet.
Once saved (it’s included in JumpeR as an example) it can be imported into R using JumpeR:
tf_parse(
  read_results(
    system.file("extdata", "SMTFA-2019-Full-Results.pdf", package = "JumpeR")
  ),
  rounds = TRUE
)
This is a Flash Results .pdf result, from the prelims of the 2019 NCAA Men’s 100m Championships.
It can be imported into R using JumpeR:
tf_parse(
  read_results(
    "https://www.flashresults.com/2019_Meets/Outdoor/06-05_NCAAOTF-Austin/001-1.pdf"
  )
)
Flash Results also posts .html versions of results like these, which are currently NOT supported.
JumpeR reads track and field results into R and outputs tidy data frames. JumpeR uses read_results to read in either a PDF or HTML file (such as a URL) and the tf_parse (for track and field) function to convert the read file to a tidy data frame.
read_results has two arguments.
* file, which is the file path to read in
* node, required only for HTML files. This is a CSS selector node where the results reside. node defaults to "pre", which has been correct in every instance tested thus far.
tf_parse has six arguments as of version 0.1.0.
file is the output of read_results and is required.
avoid is a list of strings. Rows in file containing any of those strings will not be included in the final results. avoid is optional. Incorrectly specifying it may lead to nonsense rows in the final data frame, but will not cause an error. Nonsense rows can be removed after import.
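As a rough illustration of the filtering idea behind avoid, here is a minimal base R sketch. It is not JumpeR's internal code: the raw_lines and avoid values are invented for the example, and the real package may match rows differently.

```r
# Hypothetical raw lines, standing in for the output of read_results()
raw_lines <- c(
  "Event 1  Women 60 Meter Dash",
  "  1 Smith, Jane        SR Cornell      7.65",
  "Meet Record: 7.40  2015  Jane Doe",
  "  2 Jones, Amy         JR Ithaca       7.80"
)

# Strings marking rows that should be dropped (illustrative values only)
avoid <- c("Meet Record", "Facility Record")

# Flag a line if it contains ANY of the avoid strings (literal matching)
contains_avoid <- vapply(
  raw_lines,
  function(line) {
    any(vapply(avoid, function(a) grepl(a, line, fixed = TRUE), logical(1)))
  },
  logical(1),
  USE.NAMES = FALSE
)

# Keep only the lines that were not flagged
kept_lines <- raw_lines[!contains_avoid]
print(kept_lines)
```

A string in avoid that never appears simply drops nothing, which is consistent with the note above that a poorly chosen avoid produces extra rows rather than an error.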
typo and replacement work together to fix typos, by replacing them with replacements. Strings in typo will be replaced by strings in replacement in element index order - that is, the first element of typo will be replaced everywhere it appears by the first element of replacement. Uncorrected typos can cause lost data and nonsense rows.
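The index-paired behavior of typo and replacement can be sketched in base R as follows. This is only an illustration under assumed inputs (the raw_lines, typo and replacement values are made up), not the code tf_parse actually runs.

```r
# Hypothetical raw lines containing two kinds of typo
raw_lines <- c(
  "  1 Smith, Jane        SR Cornell Univeristy   7.65",
  "  2 Jones,  Amy        JR Ithaca               7.80"
)

# typo[i] is paired with replacement[i]
typo        <- c("Univeristy", ",  ")
replacement <- c("University", ", ")

# The two vectors must line up element by element
stopifnot(length(typo) == length(replacement))

# Apply each pair in index order, everywhere it appears (literal matching)
fixed_lines <- raw_lines
for (i in seq_along(typo)) {
  fixed_lines <- gsub(typo[i], replacement[i], fixed_lines, fixed = TRUE)
}

print(fixed_lines)
```

Because pairs are applied in order, an earlier replacement can create or remove text that a later typo would match, so the order of the two vectors matters.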
relay_athletes defaults to FALSE. Setting it to TRUE will cause tf_parse to try to pull out the names of athletes participating in relays. Athlete names will be in separate columns called Relay_Athlete_1, Relay_Athlete_2 etc.
Here’s the Women’s 4x400m relay from the 2019 Greg Page relays at Cornell University.
Here’s the same thing after importing with JumpeR:
tf_parse(
  read_results(
    "https://www.leonetiming.com/2019/Indoor/GregPageRelays/Results.htm"
  ),
  relay_athletes = TRUE
)
rounds records a unit of length for events where athletes get to try multiple times (long jump, javelin, pole vault etc. - basically the “field” events in track and field). The default is FALSE but setting rounds to TRUE will cause tf_parse to attempt to collect the distance/height (or FOUL) for each round. New columns called Round_1, Round_2 etc. will be created.
Here’s the long jump prelims from the 2019 Virginia Grand Prix at the University of Virginia with the “rounds” highlighted in yellow.
Here’s the same thing after importing with JumpeR:
tf_parse(
  read_results(
    "https://www.flashresults.com/2018_Meets/Outdoor/04-28_VirginiaGrandPrix/035-1.pdf"
  ),
  rounds = TRUE
)
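Once Round_1, Round_2 etc. columns exist, they can be worked with like any other character columns. The sketch below uses a small invented data frame (not real output from the file above, and the exact formatting of marks in real output is an assumption) to pull out each athlete's best mark in base R.

```r
# Hypothetical results, shaped like tf_parse(..., rounds = TRUE) output
results <- data.frame(
  Name    = c("Athlete A", "Athlete B"),
  Round_1 = c("7.12m", "FOUL"),
  Round_2 = c("FOUL", "6.95m"),
  Round_3 = c("7.30m", "7.01m"),
  stringsAsFactors = FALSE
)

# Find the round columns by name
round_cols <- grep("^Round_[0-9]+$", names(results), value = TRUE)

# Strip everything except digits and the decimal point; "FOUL" becomes NA
to_numeric <- function(x) {
  cleaned <- gsub("[^0-9.]", "", x)
  suppressWarnings(as.numeric(cleaned))
}

# One numeric column per round, then the row-wise maximum
marks <- sapply(results[round_cols], to_numeric)
results$Best_Mark <- unname(apply(marks, 1, max, na.rm = TRUE))

print(results[, c("Name", "Best_Mark")])
```

An athlete with no legal mark in any round would get -Inf (with a warning) from max, so real data may need an extra check for that case.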
round_attempts records the outcome of each attempt (height) in the vertical jumping events (high jump, pole vault). The default for round_attempts is FALSE but setting it to TRUE will include these values (usually some combination of “X”, “O” and “-”) in new columns called Round_1_Attempts, Round_2_Attempts etc. If round_attempts = TRUE then rounds = TRUE must be set as well.
Here’s the pole vault results from the 2019 Duke Invite at (natch) Duke University with the “round_attempts” highlighted in yellow and the “rounds” circled in red.
Here’s the same thing after importing with JumpeR - adding all these columns makes the results very wide.
tf_parse(
  read_results(
    "https://www.flashresults.com/2018_Meets/Outdoor/04-20_DukeInvite/014-1.pdf"
  ),
  rounds = TRUE,
  round_attempts = TRUE
)
split_attempts defaults to FALSE. Setting split_attempts = TRUE will cause tf_parse to break each Round_X_Attempts column into pieces. A column containing “XXO” for example will become three columns, the first containing “X”, the second containing the second “X” and the third containing “O”. This will mean there are a lot of columns! If split_attempts = TRUE then round_attempts must be TRUE as well.
Looking at those same Duke pole vault results, here’s how using split_attempts works - adding all these columns makes the results extremely wide. I’m only going to show the first six split columns, called Round_1_Attempt_1, Round_1_Attempt_2, Round_1_Attempt_3 etc.
tf_parse(
  read_results(
    "https://www.flashresults.com/2018_Meets/Outdoor/04-20_DukeInvite/014-1.pdf"
  ),
  rounds = TRUE,
  round_attempts = TRUE,
  split_attempts = TRUE
)
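To make the splitting concrete, here is a small base R sketch of turning one attempts column into per-attempt columns. The attempts vector is invented, the limit of three tries per height is an assumption, and padding short strings with NA is a choice made for this example rather than a statement about what tf_parse returns.

```r
# Hypothetical values from a Round_1_Attempts column
attempts <- c("O", "XO", "XXO", "XXX")

# Assumed maximum number of tries at one height
max_tries <- 3

# Split each string into single characters
chars <- strsplit(attempts, "")

# Pad every athlete to max_tries entries so the pieces form a rectangle
split_matrix <- t(vapply(
  chars,
  function(x) c(x, rep(NA_character_, max_tries - length(x))),
  character(max_tries)
))

# One column per attempt, named in the style described above
split_df <- as.data.frame(split_matrix, stringsAsFactors = FALSE)
names(split_df) <- paste0("Round_1_Attempt_", seq_len(max_tries))

print(split_df)
```

Each height adds up to three columns this way, which is why the full result gets so wide.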
See ?tf_parse for more information.
While setting split_attempts = TRUE in tf_parse can be used to generate wide format results of vertical jump attempts, it might be more useful to create long format results instead. This can be accomplished after tf_parse.
Using those same Duke pole vault results, here’s the first place finisher in long format:
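library(dplyr) # provides %>% and select()

# Read the results, keeping rounds and attempts in wide format
df <- tf_parse(
  read_results(
    "https://www.flashresults.com/2018_Meets/Outdoor/04-20_DukeInvite/014-1.pdf"
  ),
  rounds = TRUE,
  round_attempts = TRUE
)

# Convert the attempts to long format and keep the columns of interest
df %>%
  attempts_split_long() %>%
  select(Place, Name, Age, Team, Finals_Result, Event, Bar_Height, Attempt, Result)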
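For readers who want to see what "long format" means here, the following base R sketch reshapes a tiny invented wide table into one row per attempt. It only imitates the idea: the input data are hypothetical, the output column names are borrowed from the select() call above, and attempts_split_long itself may behave differently.

```r
# Hypothetical wide results for two vaulters over two heights
wide <- data.frame(
  Name             = c("Athlete A", "Athlete B"),
  Round_1          = c("4.50m", "4.50m"),
  Round_1_Attempts = c("O", "XO"),
  Round_2          = c("4.65m", "4.65m"),
  Round_2_Attempts = c("XXO", "XXX"),
  stringsAsFactors = FALSE
)

rows <- list()
for (r in 1:2) {
  height_col   <- paste0("Round_", r)
  attempts_col <- paste0("Round_", r, "_Attempts")
  for (i in seq_len(nrow(wide))) {
    # Break "XXO" into "X", "X", "O"
    tries <- strsplit(wide[[attempts_col]][i], "")[[1]]
    # One output row per individual try at this height
    rows[[length(rows) + 1]] <- data.frame(
      Name       = wide$Name[i],
      Bar_Height = wide[[height_col]][i],
      Attempt    = seq_along(tries),
      Result     = tries,
      stringsAsFactors = FALSE
    )
  }
}

# Stack the pieces and sort by athlete, height, then attempt number
long <- do.call(rbind, rows)
long <- long[order(long$Name, long$Bar_Height, long$Attempt), ]
rownames(long) <- NULL

print(long)
```

In long format the table grows in rows rather than columns, which tends to be easier to filter, count and plot.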
By default all results (like the Finals_Result column) returned by JumpeR are characters, not numeric. This is because lots of results don’t fit R’s notions of what a number is. A result like "1.65m" for a long jump can’t be a number because of the “m”. A result like "1:45.32" as a time can’t be a number because of the “:”. Luckily JumpeR is here to help with all of that. Passing results to math_format will return results formatted as numeric, such that they can be used in math.
Please note however that JumpeR doesn’t understand units. Passing math_format(c("1.65m", "DNS", "1:45.32")) will return 1.65 (meters, but not noted), NA (nice touch there), and 105.32 (seconds, also not noted). You’ll need to keep track of your units yourself, or perhaps use the units package. This is an area of possible future development.
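The conversions described above can be imitated in a few lines of base R. The function below, to_math, is a hypothetical stand-in written for this example and is not how math_format is implemented; it simply shows why "1:45.32" comes out as 105.32 and why the unit is lost along the way.

```r
# Hypothetical helper: convert result strings to plain numbers
to_math <- function(x) {
  vapply(x, function(val) {
    # Keep only digits, decimal points and colons ("1.65m" -> "1.65")
    cleaned <- gsub("[^0-9.:]", "", val)
    # Nothing numeric left (e.g. "DNS"), so there is no value to report
    if (!nzchar(cleaned)) return(NA_real_)
    # Split on ":" and fold left, multiplying by 60 at each step,
    # so minutes:seconds (or hours:minutes:seconds) becomes seconds
    parts <- as.numeric(strsplit(cleaned, ":", fixed = TRUE)[[1]])
    Reduce(function(acc, p) acc * 60 + p, parts, 0)
  }, numeric(1), USE.NAMES = FALSE)
}

converted <- to_math(c("1.65m", "DNS", "1:45.32"))
print(converted)
```

The returned vector mixes meters and seconds without any label, which is exactly the bookkeeping problem the paragraph above warns about.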
The best use of math_format is to convert an entire column, like Finals_Result:
library(dplyr) # provides %>%, mutate() and select()

df <- tf_parse(
  read_results(
    "https://www.leonetiming.com/2019/Indoor/GregPageRelays/Results.htm"
  )
)

# Add a numeric version of Finals_Result alongside the original
df <- df %>%
  mutate(Finals_Result_Math = math_format(Finals_Result)) %>%
  select(Place, Name, Team, Finals_Result, Finals_Result_Math, Event)
You’re welcome to contact me with bug reports, feature requests, etc. for JumpeR.
If you find a bug, please provide a minimal reproducible example at GitHub.
JumpeR is conceptually very similar to the SwimmeR package, which I also developed and maintain. I do a lot of demos on how to use SwimmeR at my blog Swimming + Data Science, which may be instructive for users of JumpeR as well. SwimmeR also has a vignette (JumpeR does not at the moment).