3 Examples

3.1 redcap_data

Exported data from REDCap:

datos <- redcap_data(data_path="C:/Users/username/example.r",
                     dic_path="C:/Users/username/example_dictionary.csv")

Data using API:

datos_api <- redcap_data(uri ="https://redcap.idibell.cat/api/",
                         token = "55E5C3D1E83213ADA2182A4BFDEA")

For the examples we will use COVICAN dataset that is already included in the package. For more information use ?covican.

datos_redcap <- covican

List of 2
 $ data      :'data.frame': 342 obs. of  56 variables:
 $ dictionary:'data.frame': 21 obs. of  8 variables:

datos_redcap$data:

   record_id        redcap_event_name redcap_data_access_group
1      100-6      initial_visit_arm_1              hospital_11
2      100-6 follow_up_visit_da_arm_1              hospital_11
21    100-13      initial_visit_arm_1              hospital_11
22    100-13 follow_up_visit_da_arm_1              hospital_11
34    100-16      initial_visit_arm_1              hospital_11
35    100-16 follow_up_visit_da_arm_1              hospital_11

datos_redcap$dictionary:

            field_name                   form_name field_type
1            record_id inclusionexclusion_criteria       text
2                inc_1 inclusionexclusion_criteria      radio
3                inc_2 inclusionexclusion_criteria      radio
4                inc_3 inclusionexclusion_criteria      radio
6                exc_1 inclusionexclusion_criteria      radio
10 screening_fail_crit inclusionexclusion_criteria       calc

3.2 rd_transform

3.2.1 Raw data transformation

datos <- rd_transform(
  data = datos_redcap$data, 
  dic = datos_redcap$dictionary
)
data <- datos$data

#To print the results
datos$results

1. Recalculating calculated fields and saving them as '[field_name]_recalc'

| Total calculated fields | Non-transcribed fields | Recalculated different fields |
|:-----------------------:|:----------------------:|:-----------------------------:|
|            2            |         0 (0%)         |            1 (50%)            |


|     field_name      | Transcribed? | Is equal? |
|:-------------------:|:------------:|:---------:|
|         age         |     Yes      |   FALSE   |
| screening_fail_crit |     Yes      |   TRUE    |

2. Transforming checkboxes: changing their values to No/Yes, their names to the names of its options and transforming missing values of those checkboxes having question doors specified in the branching logic

Table: Checkbox variables advisable to be reviewed

| Variables without any branching logic |
|:-------------------------------------:|
|        type_underlying_disease        |

3. Replacing original variables for their factor version
4. Deleting variables that contain some patterns

As you can see, there are 4 steps in the transformation:

Recalculate autocalculated fields

There were two autocalculated fields, as we see in the summary we have been able to recalculate both of them and there is one that has changed:

data %>% 
  dplyr::select(d_birth, d_ingreso, age, age_recalc) %>% 
  dplyr::filter(age != age_recalc)

     d_birth  d_ingreso age age_recalc
1 1945-04-16 2020-04-16  74         75

Transformation of the checkboxes: change the names of the checkboxes to the names of their options and the name of their labels to No/Yes. Also, if the checkbox contains a door question, it changes the missings to

#Checkbox no gatekeeper

table(data$type_underlying_disease_haematological_cancer)


 No Yes 
103  87

#Checkbox with gatekeeper: [type_underlying_disease(0)]='1'

#In the original data set:
table(datos_redcap$data$type_underlying_disease___0, datos_redcap$data$underlying_disease_hemato___1)

#In the transformed data set:
table(data$type_underlying_disease_haematological_cancer, data$underlying_disease_hemato_acute_myeloid_leukemia)

     
      No Yes
  No   0   0
  Yes 84   3

Changes the original variables by their version in factor, except ‘redcap_event_name’ and ‘redcap_data_access_group’ which keep the two versions and the checkboxes that have already been transformed.

str(data$dm)

 Factor w/ 2 levels "No","Yes": 1 NA 2 NA 2 NA 1 NA 2 NA ...

Eliminates variables that contain a pattern. By default the pattern it looks for is ’_complete’. In this case there was no ’_complete’ variable initially.

3.2.2 Data transformation and classification by event

In order to perform the transformation by event we have to add the path where the file with the correspondence between events and forms is located. In this case, we will use the path where we have saved the file of this correspondence that we have downloaded from the COVICAN project.

datos <- rd_transform(
             data = datos_redcap$data, 
             dic = datos_redcap$dictionary,
             event_path = "files/COVICAN_instruments.csv",
             final_format = "by_event"
             )

#To print the results
datos$results

1. Recalculating calculated fields and saving them as '[field_name]_recalc'

| Total calculated fields | Non-transcribed fields | Recalculated different fields |
|:-----------------------:|:----------------------:|:-----------------------------:|
|            2            |         0 (0%)         |            1 (50%)            |


|     field_name      | Transcribed? | Is equal? |
|:-------------------:|:------------:|:---------:|
|         age         |     Yes      |   FALSE   |
| screening_fail_crit |     Yes      |   TRUE    |

2. Transforming checkboxes: changing their values to No/Yes, their names to the names of its options and transforming missing values of those checkboxes having question doors specified in the branching logic

Table: Checkbox variables advisable to be reviewed

| Variables without any branching logic |
|:-------------------------------------:|
|        type_underlying_disease        |

3. Replacing original variables for their factor version
4. Deleting variables that contain some patterns
5. Erasing variables from forms that are not linked to any event
6. Final arrangment of the data by event

Now a step in the transformation has been added, which is to split the preprocessed data for each of the events in the study so that the function returns

datos$data

# A tibble: 2 × 3
  events                   vars       df             
  <chr>                    <list>     <list>         
1 initial_visit_arm_1      <chr [25]> <df [190 × 25]>
2 follow_up_visit_da_arm_1 <chr [8]>  <df [152 × 8]>

3.2.3 Data transformation and classification by form

To perform the transformation by form we must also add the path of the file of the events and forms

datos <- rd_transform(data = datos_redcap$data, 
             dic = datos_redcap$dictionary,
             event_path = "files/COVICAN_instruments.csv",
             final_format = "by_form"
             )

#To print the results
datos$results

1. Recalculating calculated fields and saving them as '[field_name]_recalc'

| Total calculated fields | Non-transcribed fields | Recalculated different fields |
|:-----------------------:|:----------------------:|:-----------------------------:|
|            2            |         0 (0%)         |            1 (50%)            |


|     field_name      | Transcribed? | Is equal? |
|:-------------------:|:------------:|:---------:|
|         age         |     Yes      |   FALSE   |
| screening_fail_crit |     Yes      |   TRUE    |

2. Transforming checkboxes: changing their values to No/Yes, their names to the names of its options and transforming missing values of those checkboxes having question doors specified in the branching logic

Table: Checkbox variables advisable to be reviewed

| Variables without any branching logic |
|:-------------------------------------:|
|        type_underlying_disease        |

3. Replacing original variables for their factor version
4. Deleting variables that contain some patterns
5. Erasing variables from forms that are not linked to any event
6. Final arrangment of the data by form

As before, a final step is added, which consists of splitting the preprocessed data for each form of the study so that the function returns

datos$data

# A tibble: 6 × 4
  form                        events    vars       df             
  <chr>                       <list>    <list>     <list>         
1 inclusionexclusion_criteria <chr [1]> <chr [11]> <df [190 × 11]>
2 demographics                <chr [1]> <chr [9]>  <df [190 × 9]> 
3 comorbidities               <chr [1]> <chr [10]> <df [190 × 10]>
4 vital_signs                 <chr [2]> <chr [7]>  <df [177 × 7]> 
5 laboratory_findings         <chr [2]> <chr [7]>  <df [177 × 7]> 
6 microbiological_studies     <chr [1]> <chr [6]>  <df [190 × 6]>

3.2.4 Additional arguments

checkbox_labels: specify the name of the categories that will have the checkbox variables. Default is No/Yes, we can change it to N/Y.

datos <- rd_transform(
  data = datos_redcap$data, 
  dic = datos_redcap$dictionary,
  checkbox_labels = c("N", "Y")
)

data <- datos$data

We see how the categories of the following checkboxes have changed, for example

table(data$type_underlying_disease_haematological_cancer)


  N   Y 
103  87

exclude_to_factor: specify the name of a variable that we do not want to be transformed to a factor. For example, if we want the variable dm to keep its numeric version

datos <- rd_transform(
  data = datos_redcap$data, 
  dic = datos_redcap$dictionary,
  exclude_to_factor = "dm"
)

data <- datos$data

table(data$dm)


  0   1 
140  45

keep_labels: logical argument to retain data set tags that are processed from REDCap

datos <- rd_transform(
  data = datos_redcap$data, 
  dic = datos_redcap$dictionary,
  keep_labels = TRUE
)

data <- datos$data

str(data[,1:5])

'data.frame':   342 obs. of  5 variables:
 $ record_id                      : 'labelled' chr  "100-13" "100-13" "100-16" "100-16" ...
  ..- attr(*, "label")= Named chr ""
  .. ..- attr(*, "names")= chr "record_id"
 $ redcap_event_name              : 'labelled' chr  "initial_visit_arm_1" "follow_up_visit_da_arm_1" "initial_visit_arm_1" "follow_up_visit_da_arm_1" ...
  ..- attr(*, "label")= Named chr "Event Name"
  .. ..- attr(*, "names")= chr "redcap_event_name"
 $ redcap_event_name.factor       : Factor w/ 5 levels "Initial visit",..: 1 2 1 2 1 2 1 2 1 2 ...
  ..- attr(*, "label")= Named chr ""
  .. ..- attr(*, "names")= chr "redcap_data_access_group"
 $ redcap_data_access_group       : 'labelled' chr  "hospital_11" "hospital_11" "hospital_11" "hospital_11" ...
  ..- attr(*, "label")= Named chr "Patients older than 18 years"
  .. ..- attr(*, "names")= chr "inc_1"
 $ redcap_data_access_group.factor: Factor w/ 26 levels "Hospital 1","Hospital 2",..: 11 11 11 11 11 11 11 11 11 11 ...
  ..- attr(*, "label")= Named chr "Cancer patients"
  .. ..- attr(*, "names")= chr "inc_2"

delete_vars: specify strings whereby any variables containing them will be removed from the data set. By default all variables containing ’_complete’ are removed. In this case we did not have any complete variables. We want for example to remove all inclusion and exclusion criteria

datos <- rd_transform(
  data = datos_redcap$data, 
  dic = datos_redcap$dictionary,
  delete_vars = c("inc_", "exc_")
)

data <- datos$data

names(data)

 [1] "record_id"                                              
 [2] "redcap_event_name"                                      
 [3] "redcap_event_name.factor"                               
 [4] "redcap_data_access_group"                               
 [5] "redcap_data_access_group.factor"                        
 [6] "screening_fail_crit"                                    
 [7] "screening_fail_crit_recalc"                             
 [8] "d_birth"                                                
 [9] "d_ingreso"                                              
[10] "age"                                                    
[11] "age_recalc"                                             
[12] "dm"                                                     
[13] "type_dm"                                                
[14] "copd"                                                   
[15] "fio2_aportado"                                          
[16] "analitica_disponible"                                   
[17] "potassium"                                              
[18] "resp_freq"                                              
[19] "hemato_neo"                                             
[20] "leukemia"                                               
[21] "type_underlying_disease_haematological_cancer"          
[22] "type_underlying_disease_solid_tumour"                   
[23] "underlying_disease_hemato_acute_myeloid_leukemia"       
[24] "underlying_disease_hemato_myelodysplastic_syndrome"     
[25] "underlying_disease_hemato_chronic_myeloid_leukaemia"    
[26] "underlying_disease_hemato_acute_lymphoblastic_leukaemia"
[27] "underlying_disease_hemato_hodgkin_lymphoma"             
[28] "underlying_disease_hemato_non_hodgkin_lymphoma"         
[29] "underlying_disease_hemato_multiple_myeloma"             
[30] "underlying_disease_hemato_myelofibrosis"                
[31] "underlying_disease_hemato_aplastic_anaemia"             
[32] "urine_culture"

which_event: in the event-driven transformation, specify whether you only want a specific event to be returned. For example, we only want the first visit

datos <- rd_transform(
             data = datos_redcap$data, 
             dic = datos_redcap$dictionary,
             event_path = "files/COVICAN_instruments.csv",
             final_format = "by_event",
             which_event = "initial_visit_arm_1"
             )

data <- datos$data

table(data$redcap_event_name)


initial_visit_arm_1 
                190

which_form: in the transformation by form, specify if you only want a specific form to be returned. For example, we only want the form demographics

datos <- rd_transform(
             data = datos_redcap$data, 
             dic = datos_redcap$dictionary,
             event_path = "files/COVICAN_instruments.csv",
             final_format = "by_form",
             which_form = "demographics"
             )

data <- datos$data

names(data)

[1] "record_id"                       "redcap_event_name"              
[3] "redcap_data_access_group"        "redcap_event_name.factor"       
[5] "redcap_data_access_group.factor" "d_ingreso"                      
[7] "d_birth"                         "age"                            
[9] "age_recalc"

wide: so that the data set that is returned in the transformation by form is in wide format or not. We do this for the laboratory findings form

datos <- rd_transform(
             data = datos_redcap$data, 
             dic = datos_redcap$dictionary,
             event_path = "files/COVICAN_instruments.csv",
             final_format = "by_form",
             which_form = "laboratory_findings",
             wide = TRUE
             )

data <- datos$data

head(data)

# A tibble: 6 × 5
  record_id analitica_disponible_1 analitica_disponible_2 potassium_1 potassiu…¹
  <chr>     <fct>                  <fct>                        <dbl>      <dbl>
1 100-13    Yes                    Yes                           3.66       4.1 
2 100-16    Yes                    No                            4.04      NA   
3 100-31    Yes                    <NA>                          4.58      NA   
4 100-34    Yes                    No                            3.48      NA   
5 100-36    Yes                    No                            4.09      NA   
6 100-52    Yes                    Yes                           3.7        7.15
# … with abbreviated variable name ¹potassium_2

3.3 rd_rlogic

It transforms the REDCap logic into logic that can be evaluated in R. This function is built into the recalculate function, but it may be useful to use it separately. Let’s see how it transforms the logics of the autocalculated field calculations.

#screening failure
rd_rlogic(logic = "if([exc_1]='1' or [inc_1]='0' or [inc_2]='0' or [inc_3]='0',1,0)",
          data = datos_redcap$data)

[1] "ifelse(data$exc_1=='1' | data$inc_1=='0' | data$inc_2=='0' | data$inc_3=='0',1,0)"

#age
rd_rlogic(logic = 'rounddown(datediff([d_birth],[d_ingreso],"y","dmy"),0)',
          data = datos_redcap$data)

[1] "floor(lubridate::time_length(lubridate::interval(data$d_birth,data$d_ingreso), 'year'))"

3.4 rd_insert_na

Function to set missing variables when a certain logic is fulfilled. Useful for example in the checkboxes that we do not have a gatekeeper. For example, we put the checkbox without gatekeeper (type_underlying_disease) to missing when the age is less than 65 years old.

datos <- rd_transform(
    data = datos_redcap$data, 
    dic = datos_redcap$dictionary
)

data <- datos$data

#Before inserting missings
table(data$type_underlying_disease_haematological_cancer)


 No Yes 
103  87

data2 <- rd_insert_na(
  data = data,
  filter = rep("age < 65", 2),
  vars = grep("type_underlying_disease", names(data), value = TRUE)
)

#After inserting missings
table(data2$type_underlying_disease_haematological_cancer)


 No Yes 
 65  50

3.5 rd_query

3.5.1 Output

Identifier	DAG	Event	Instrument	Field	Description	Query	Code
100-58	Hospital 11	Initial visit	Comorbidities	copd	Chronic pulmonary disease	The value is NA and it should not be missing	100-58-1
102-113	Hospital 24	Initial visit	Demographics	age	Age	The value is NA and it should not be missing	102-113-1
105-11	Hospital 5	Initial visit	Comorbidities	copd	Chronic pulmonary disease	The value is NA and it should not be missing	105-11-1
105-11	Hospital 5	Initial visit	Demographics	age	Age	The value is NA and it should not be missing	105-11-2
105-56	Hospital 5	Initial visit	Comorbidities	copd	Chronic pulmonary disease	The value is NA and it should not be missing	105-56-1
105-56	Hospital 5	Initial visit	Demographics	age	Age	The value is NA and it should not be missing	105-56-2

3.5.2 Missings

Mandatory arguments:

example <- rd_query(variables = c("copd", "age"),
                         expression = c("%in%NA", "%in%NA"),
                         event = "initial_visit_arm_1",
                         dic = datos_redcap$dictionary,
                         data = datos_redcap$data)

# Printing results
example$results

Report of queries
Variables	Description	Total
copd	Chronic pulmonary disease	6
age	Age	5

Optional arguments:

example<- rd_query(variables = c("age", "copd"),
                        variables_names = c("Age", "Chronic obstructive pulmonary disease"),#### OPCIONAL
                        expression = c("%in%NA", "%in%NA"),
                        query_name = c("Age is missing at baseline visit", "COPD"), #### OPCIONAL
                        instrument = c("Inclusión del paciente","Inclusión"),  #### OPCIONAL
                        event = "initial_visit_arm_1",
                        dic = datos_redcap$dictionary,
                        data = datos_redcap$data)

3.5.3 Missings in a branching logic

Sin filtro:

example <- rd_query(variables = c("age", "copd", "potassium"),
                          expression = c("%in%NA", "%in%NA", "%in%NA"),
                          event = "initial_visit_arm_1",
                          dic = datos_redcap$dictionary,
                          data=datos_redcap$data)

Warning: Some of the variables that were checked for missings present a branching logic. 
Check the results tab of output for more details (...$results).

# Printing results
example$results

Report of queries
Variables	Description	Total	Branching logic
age	Age	5	-
copd	Chronic pulmonary disease	6	-
potassium	Potassium	31	[analitica_disponible]=‘1’

Applying filter:

example <- rd_query(variables = c("potassium"),
                         expression = c("%in%NA"),
                         event = "initial_visit_arm_1",
                         dic = datos_redcap$dictionary,
                         data = datos_redcap$data,
                         filter = c("analitica_disponible=='1'"))

Warning: Some of the variables that were checked for missings present a branching logic. 
Check the results tab of output for more details (...$results).

# Printing results
example$results

Report of queries
Variables	Description	Total	Branching logic
potassium	Potassium	21	[analitica_disponible]=‘1’

3.5.4 Expresiones

Simple:

example <- rd_query(variables=c("age"),
                         expression=c(">20"),
                         event="initial_visit_arm_1",
                         dic=datos_redcap$dictionary,
                         data=datos_redcap$data)

# Printing results
example$results

Report of queries
Variables	Description	Total
age	Age	185

Complex:

example <- rd_query(variables=c("age", "copd"),
                         expression=c("(>20 & <70) | %in%NA", "==1"),
                         event="initial_visit_arm_1",
                         dic=datos_redcap$dictionary,
                         data=datos_redcap$data)

# Printing results
example$results

Report of queries
Variables	Description	Total
age	Age	108
copd	Chronic pulmonary disease	21

3.5.5 Special cases

Same expression for all variables:

example <- rd_query(variables = c("copd","age","dm"),
                         expression = c("%in%NA"),
                         event = "initial_visit_arm_1",
                         dic = datos_redcap$dictionary,
                         data = datos_redcap$data)

Warning: There are more variables than expressions, so the same expression was
applied to all variables

# Printing results
example$results

Report of queries
Variables	Description	Total
copd	Chronic pulmonary disease	6
age	Age	5
dm	Diabetes (treated with insulin or antidiabetic …	5

Not defining an event:

example <- rd_query(variables = c("copd"),
                         expression = c("%in%NA"),
                         dic = datos_redcap$dictionary,
                         data = datos_redcap$data)

Warning: event = NA, but the dataset presents a variable that indicates the
presence of events, please specify the event.

# Printing results
example$results

Report of queries
Variables	Description	Total
copd	Chronic pulmonary disease	158

3.5.6 Additional arguments

negate: negation of expression used

example <- rd_query(variables = c("copd"),
                         expression = c("%in%NA"),
                         negate = TRUE,
                         event = "initial_visit_arm_1",
                         dic = datos_redcap$dictionary,
                         data = datos_redcap$data)

# Printing results
example$results

Report of queries
Variables	Description	Total
copd	Chronic pulmonary disease	184

addTo: join queries to an existing data frame

example2 <- rd_query(variables = c("age"),
                         expression = c("%in%NA"),
                         event = "initial_visit_arm_1",
                         dic = datos_redcap$dictionary,
                         data=datos_redcap$data,
                         addTo = example)

# Printing results
example2$results

Report of queries
Variables	Description	Total
copd	Chronic pulmonary disease	184
age	Age	5

report_title: customize the queries table title

example <- rd_query(variables = c("copd", "age"),
                         expression = c("%in%NA", "<20"),
                         event = "initial_visit_arm_1",
                         dic = datos_redcap$dictionary,
                         data = datos_redcap$data,
                         report_title = "Missing COPD values in the baseline event")

# Printing results
example$results

Missing COPD values in the baseline event
Variables	Description	Total
copd	Chronic pulmonary disease	6

report_zeros: choose whether variables with zero queries should be reported in the table

example <- rd_query(variables = c("copd", "age"),
                         expression = c("%in%NA", "<20"),
                         event = "initial_visit_arm_1",
                         dic = datos_redcap$dictionary,
                         data = datos_redcap$data,
                         report_zeros = TRUE)

# Printing results
example$results

Report of queries
Variables	Description	Total
copd	Chronic pulmonary disease	6
age	Age	0

3.6 rd_event