The REDCapTidieR package uses vocabulary that is standard for REDCap database architects but not necessarily well known to all R users. It also introduces several idiosyncratic terms.
Below is a rough mapping of REDCap concepts to the corresponding artifacts that REDCapTidieR generates in the R environment, followed by a list of term definitions.
REDCap | REDCapTidieR |
---|---|
Project, Database | Supertibble |
Instrument, Form | One row of the supertibble. Data is in the data tibble |
Field | Data column (a column of the data tibble) |
Field name | Variable name of a data column |
Field type | Data type of a data column |
Field label | Variable label of a data column (only present if supertibble is labelled) |
Record | One or several rows of a data tibble. The record ID column is the first column of a data tibble |
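Event | `redcap_event` column of a data tibble (only present if the project is longitudinal) |
Arm | `redcap_arm` column of a data tibble (only present if the project is longitudinal with multiple arms) |
Repeat instance | `redcap_repeat_instance` column of a data tibble (only present if the instrument is repeating) |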
A group of events. Arms provide a mechanism that allows one longitudinal project to have multiple different sequences of events defined. ↩︎
A rectangular data structure (matrix) that is constructed from multiple smaller rectangular data structures (blocks). In the context of REDCap, the block matrix is the rectangular data set that contains data from all instruments returned by the REDCap API. ↩︎
A primary key is a column in a table that is distinct in each row and serves to identify each row. A composite primary key is a primary key that consists of multiple columns that in combination are distinct in each row and serve to identify each row. Taken together, the identifier columns of the data tibble form a composite primary key. This makes it easy to join data tibbles together. ↩︎
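As a minimal sketch of how a composite primary key supports joins, the example below builds two small data tibbles by hand and joins them with dplyr. The instrument names, field names, and values are all invented for illustration; only the `record_id` and `redcap_event` identifier columns mirror the structure described above.

```r
library(dplyr)

# Hypothetical data tibbles from two nonrepeating instruments in a
# longitudinal project. All names and values are invented.
vitals <- tibble(
  record_id    = c(1, 1, 2),
  redcap_event = c("baseline", "followup", "baseline"),
  heart_rate   = c(72, 75, 64)
)
labs <- tibble(
  record_id    = c(1, 1, 2),
  redcap_event = c("baseline", "followup", "baseline"),
  hemoglobin   = c(13.5, 13.9, 12.8)
)

# The identifier columns that together form the composite primary key.
key <- c("record_id", "redcap_event")

# Neither column is distinct on its own, but the combination is:
# anyDuplicated() returns 0 when no combination repeats.
key_is_unique <- anyDuplicated(vitals[key]) == 0

# Join the two data tibbles on the composite key.
joined <- left_join(vitals, labs, by = key)
```

Because both tibbles share the same granularity (one row per record per event), the join neither duplicates nor drops rows.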
An option or category defined in the context of a single-answer or multi-answer categorical field type in REDCap. You can define choices using the REDCap Field Editor. Choices have a raw value (a unique identifier - usually a serial number but this can be changed) and a choice label (a human readable description of the choice, which is displayed during data entry).
In the context of REDCapTidieR, choices come into play in two scenarios during the construction of the data tibble. Choice labels of single-answer type fields (dropdown and radio) are used to define the values of data columns that are derived from those fields. Raw values of the multi-answer checkbox field are used to construct the names of data columns derived from them. ↩︎
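The two scenarios can be sketched in base R. This is an illustration of the idea, not REDCapTidieR's internal code: the field names `sex` and `race` and their choices are hypothetical, and the `___` separator assumes REDCap's usual export naming for checkbox fields.

```r
# Hypothetical choices: raw values as names, choice labels as values.
choices <- c("1" = "Female", "2" = "Male", "3" = "Other")

# Scenario 1: a single-answer field. Raw values from the export are
# mapped to choice labels, which become the values of the data column.
raw_sex <- c("2", "1", "1", "3")
sex <- factor(raw_sex, levels = names(choices), labels = unname(choices))

# Scenario 2: a multi-answer checkbox field. Each choice gets its own
# column, and the raw value is used to build the column name.
race_raw_values <- c("1", "2", "3")
checkbox_cols <- paste0("race___", race_raw_values)
```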
Also known as a traditional project, this is the simplest type of REDCap project. You can define one or multiple instruments (also called forms) for data entry. Both repeating and nonrepeating instruments are allowed. Nonrepeating instruments are completed only once for each record. For nonrepeating instruments, one row of data in the data tibble represents one record. Repeating instruments can be completed an arbitrary number of times for each record. For repeating instruments, one row of data in the data tibble represents one repeat instance of one record. See also: Longitudinal project. ↩︎
In the context of REDCap, this is the same as a project. We prefer the term “project” because it has a more specific meaning. ↩︎
A column of the data tibble that is derived from data that were entered into the fields of a REDCap instrument. ↩︎
A tibble that contains data that were entered into the fields of a specific REDCap instrument. The `redcap_data` column of the supertibble contains the data tibbles of a project. The columns of the data tibble include identifier columns that jointly identify each row and data columns that contain data that were entered into REDCap. REDCapTidieR provides several functions to extract data tibbles from the supertibble. See also: Metadata tibble. ↩︎
A part of the RStudio IDE functionality that allows you to inspect data frames, tibbles, and some other data structures. It includes features to perform basic exploratory data analysis such as sorting, filtering, and searching. The supertibble is designed to work well with the data viewer. ↩︎
A fundamental data structure in R that allows binding a set of names to a set of objects. The global environment is the namespace in which you bind objects such as values and tibbles during interactive work. The `bind_tibbles()` function takes a supertibble and binds its data tibbles to the global environment. ↩︎
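To make the idea of binding names to objects concrete, here is a base R sketch. It is not how `bind_tibbles()` is implemented; it only shows the underlying mechanism. The list contents are invented, and a fresh environment is used instead of the global environment so that nothing in your session is overwritten.

```r
# Hypothetical stand-ins for the data tibbles of a project.
data_tibbles <- list(
  demographics = data.frame(record_id = 1:2, age = c(34, 51)),
  medications  = data.frame(record_id = c(1, 1), drug = c("A", "B"))
)

# Create a fresh environment and bind each list element to its name.
env <- new.env()
list2env(data_tibbles, envir = env)

# The names are now bound inside the environment.
bound_names <- sort(ls(env))
demographics <- get("demographics", envir = env)
```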
A part of a longitudinal project. Each event can be associated with one or multiple instruments. ↩︎
A data type in R for categorical data. By default, single-answer categorical REDCap field types (dropdown, radio) are represented as factor variables in the data tibble. ↩︎
An attribute about an entity (e.g., age or height) that can be captured in REDCap. Instruments are made up of fields. You can configure the fields of an instrument using the REDCap Field Editor. Fields have a field type and can have a descriptive field label. The data tibble contains the data entered into the fields of a REDCap project. ↩︎
A piece of text that acts as the prompt for data entry in REDCap. The `make_labelled()` function creates variable labels based on the field label. ↩︎
The data type of the data that can be entered into a specific field. Important field types include:
text, which is used for free-text and numeric data
yesno and truefalse, which are used for logical data
dropdown and radio, which are used for single-answer categorical data
checkbox, which is used for multi-answer categorical data ↩︎
In the context of REDCap, this is the same as an instrument. We prefer the term “instrument” because it has a more specific meaning than “form.” ↩︎
A function provided by REDCapTidieR that is designed to help turn field labels of data columns into clean variable labels. See the format-helpers reference page. ↩︎
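The kind of cleanup a format helper performs can be sketched with a small function. `strip_trailing_colon()` below is a hypothetical helper written for this illustration and is not part of REDCapTidieR; the package ships its own helpers, documented on the reference page mentioned above.

```r
# Hypothetical helper: trim surrounding whitespace, then drop a trailing
# colon (and any whitespace after it) from each label.
strip_trailing_colon <- function(x) {
  sub(":\\s*$", "", trimws(x))
}

# Invented field labels as they might appear as data entry prompts.
field_labels <- c("Date of birth:", "  Height (cm): ", "Weight")
clean_labels <- strip_trailing_colon(field_labels)
```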
The level of detail that a specific row in a data tibble represents. This depends on the structure of the project (classic vs. longitudinal vs. longitudinal with arms) and the structure of the instrument (nonrepeating vs. repeating). For example, a data tibble containing data from a nonrepeating instrument in a longitudinal project with two arms has a granularity of one row per record per event per arm. See also: the section Longitudinal REDCap projects in the Diving Deeper vignette. ↩︎
A column in the data tibble that serves to partially identify the entity described in a row. The record ID column is always present in the data tibble. Depending on the structure of the project (classic vs. longitudinal vs. longitudinal with arms) and the structure of the instrument (nonrepeating vs. repeating), there may be additional identifier columns, including `redcap_event`, `redcap_arm`, and/or `redcap_repeat_instance`. Taken together, the identifier columns form a composite primary key. See also: the section Longitudinal REDCap projects in the Diving Deeper vignette. ↩︎
In the context of REDCapTidieR, this is the process of using the REDCap API to query data from a REDCap project to make it available inside the R environment. We use the term “import” in the sense described in R for Data Science, which is to “take data stored in a file, database, or web application programming interface (API), and load it into a data frame in R.” Of note, the term “import” is ambiguous. From the perspective of REDCap, “import” may mean writing external data into the database. To clarify the direction of the import, we have named the main function of REDCapTidieR `read_redcap()`, which is analogous to other import functions in the tidyverse such as `read_csv()`. You can use the `read_redcap()` function to import data from a REDCap project. ↩︎
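A hedged sketch of an import follows. The wrapper `import_project()` and the environment variable names `REDCAP_URI` and `REDCAP_TOKEN` are assumptions made for this example; actually running the import requires access to a REDCap instance and a valid API token.

```r
# Hypothetical wrapper around read_redcap(). Reading the credentials
# from environment variables keeps the token out of the script.
import_project <- function(uri = Sys.getenv("REDCAP_URI"),
                           token = Sys.getenv("REDCAP_TOKEN")) {
  # Fail early if either credential is missing.
  stopifnot(nzchar(uri), nzchar(token))
  REDCapTidieR::read_redcap(redcap_uri = uri, token = token)
}

# With valid credentials set, the call would look like this:
# supertbl <- import_project()
```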
Also called a form. An electronic data entry form in REDCap. An instrument consists of one or more fields. In the supertibble, each row corresponds to one instrument. The instrument’s name and human-readable label are shown in the `redcap_form_name` and `redcap_form_label` columns of the supertibble, respectively. A data tibble contains all the data that were entered into a specific instrument. ↩︎
The labelled R package provides functions to attach a human-readable description (a label) to a variable (a variable label). Labelled data can streamline data exploration and assist with the generation of a data dictionary. There are multiple packages that support labelled data. The `make_labelled()` function attaches variable labels to the variables of a supertibble and the variables of the data tibbles and metadata tibbles contained in that supertibble. ↩︎
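The following base R sketch shows what a variable label is at the data level. It relies on the fact that the labelled package stores a variable label in the `label` attribute of a column. The data frame stands in for a data tibble, and the field names and labels are invented.

```r
# Hypothetical data tibble and the field labels of its columns.
data_tibble <- data.frame(
  record_id = 1:3,
  dob       = as.Date(c("1990-01-01", "1985-06-15", "2001-12-31"))
)
field_labels <- c(record_id = "Record ID", dob = "Date of birth")

# Attach each field label to its column as a variable label.
for (col in names(field_labels)) {
  attr(data_tibble[[col]], "label") <- field_labels[[col]]
}

# Collect the labels into a simple data dictionary (a named vector).
dictionary <- vapply(data_tibble, function(x) attr(x, "label"), character(1))
```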
A list is a fundamental data type in R. A tibble can contain columns that are lists, and these columns are called list columns. REDCapTidieR leverages list columns to store tibbles inside of the supertibble. For example, the `redcap_data` column of the supertibble is a list column that contains data tibbles, and `redcap_metadata` is a list column that contains metadata tibbles. ↩︎
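As an illustration of list columns, the sketch below builds a mock supertibble by hand. A real supertibble comes from `read_redcap()` and has more columns; the instrument names and data here are invented.

```r
library(tibble)

# A mock supertibble: one row per instrument, with the data tibbles
# stored in a list column.
mock_supertbl <- tibble(
  redcap_form_name = c("demographics", "medications"),
  redcap_data = list(
    tibble(record_id = 1:2, age = c(34, 51)),
    tibble(
      record_id              = c(1L, 1L),
      redcap_repeat_instance = 1:2,
      drug                   = c("A", "B")
    )
  )
)

# Each element of the list column is itself a tibble. Look up the row
# for one instrument and pull out its data tibble.
idx <- which(mock_supertbl$redcap_form_name == "medications")
medications <- mock_supertbl$redcap_data[[idx]]
```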
A type of REDCap project that contains events and optionally arms. One instrument can be associated with multiple events. This makes it possible to collect the same kind of data for the same record multiple times, which is useful for longitudinal research studies with multiple study visits. See also: Classic project. ↩︎
A tibble that contains metadata about a specific REDCap instrument. The `redcap_metadata` column of the supertibble contains the metadata tibbles of a project. The rows of the metadata tibble represent fields of the instrument. The columns represent attributes of those fields. For example, the `field_name`, `field_label`, and `field_type` columns show the field’s name, a human-readable description (the field label), and its field type. ↩︎
A structure of an instrument that allows it to be filled out exactly once per record in a classic project and once per record per event (per arm) in a longitudinal project. See also: Repeating. ↩︎
Also called a database, a REDCap project is a self-contained collection of all of the data and metadata related to some data collection activity (for example, a specific research study). A project may be classic or longitudinal. A classic project consists of instruments that contain fields. A longitudinal project may additionally include events and arms. You can use `read_redcap()` to import the data from a project. ↩︎
The set of information about a single entity (e.g., a study participant) for which data is being captured in REDCap. Each record consists of discrete data values organized into fields that can be spread across multiple instruments, events, and/or arms. Each record has a unique record ID. In the data tibble, the record ID is always the first column. It is one of the identifier columns. ↩︎
The application programming interface (API) of a REDCap instance allows external programs to connect to a REDCap project and to upload and download its data. To access the REDCap API, a user must have appropriate access privileges, an API token, and the uniform resource identifier (URI) of the API endpoint (something like “my.institution.edu/redcap/api”). The REDCapTidieR package uses REDCapR to query the REDCap API. ↩︎
The REDCapR R package provides functions to interact with the REDCap API. REDCapTidieR builds on REDCapR to import data. ↩︎
A structure of an instrument that allows it to be filled out zero, one, or multiple times for each record. See also: Nonrepeating. ↩︎
A horizontal series of cells in a data frame or tibble. One row of a supertibble represents an instrument. One row of a data tibble can represent different things, depending on the granularity of the data. See also: Column. ↩︎
The structure of an instrument can be repeating or nonrepeating. The supertibble shows the instrument’s structure in the `structure` column. The structure of a project can be classic, longitudinal, or longitudinal with arms. The granularity of a data tibble depends on the structure of both the instrument and the project. See also: the section Longitudinal REDCap projects in the Diving Deeper vignette. ↩︎
A special tibble, returned by the `read_redcap()` function, that contains the data and metadata of a REDCap project. Each row of the supertibble corresponds to one instrument. The `redcap_form_name` and `redcap_form_label` columns identify the instrument. The `redcap_data` and `redcap_metadata` columns contain the instrument’s data tibble and metadata tibble. Additional columns contain useful information about the data tibble, such as row and column counts, size in memory, and the percentage of missing values in the data. ↩︎
A special kind of instrument that can be completed by someone who is not a user on a REDCap project. ↩︎
A variant of the R data frame that makes data analysis in the tidyverse a little easier. The data structures generated by REDCapTidieR are based on tibbles. See also: chapter on Tibbles in R for Data Science. ↩︎
The term “tidy” is part of REDCapTidieR’s name because it underlies two key ideas of the package.
The first is the concept of Tidy Data, which sets out requirements that a rectangular data structure must satisfy to be considered tidy.
Data returned by the REDCap API (the “block matrix”) often satisfies the first two requirements of tidy data. However, if the project contains both repeating and nonrepeating instruments then the granularity is inconsistent from row to row. A key function of the REDCapTidieR package is to break down the block matrix by instrument. The resulting set of data tibbles is tidy because the granularity of each data tibble is consistent. This makes it easy to work with them.
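The breakdown by instrument can be sketched in base R. This is an illustration of the idea under simplifying assumptions, not REDCapTidieR's implementation: the block matrix is built by hand, the field names `age` and `drug` and the instrument name `medications` are invented, and the `redcap_repeat_instrument` and `redcap_repeat_instance` columns follow the naming REDCap uses in raw exports of projects with repeating instruments.

```r
# Hypothetical block matrix: "age" belongs to a nonrepeating instrument,
# "drug" to a repeating instrument called "medications". Rows mix two
# granularities, so many cells are empty.
block <- data.frame(
  record_id                = c(1, 1, 1, 2),
  redcap_repeat_instrument = c(NA, "medications", "medications", NA),
  redcap_repeat_instance   = c(NA, 1, 2, NA),
  age                      = c(34, NA, NA, 51),
  drug                     = c(NA, "A", "B", NA)
)

# Rows that belong to a repeat instance of the repeating instrument.
is_repeat <- !is.na(block$redcap_repeat_instrument)

# Nonrepeating instrument: one row per record.
demographics <- block[!is_repeat, c("record_id", "age")]

# Repeating instrument: one row per record per repeat instance.
medications <- block[is_repeat, c("record_id", "redcap_repeat_instance", "drug")]
```

After the split, each table has a single, consistent granularity and no structurally empty cells.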
The second is the idea of Tidy Tools, a set of design principles that guide the packages of the Tidyverse.
We strive to follow these principles in the design of the REDCapTidieR package. ↩︎