First let’s load the required packages for the code in this vignette. If you haven’t already installed them, all the other packages can be installed using ´install.packages()´
library(CDMConnector)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
Now let´s connect to a duckdb database with the Eunomia data (https://github.com/OHDSI/Eunomia).
<- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())
con <- cdm_from_con(con, cdm_schema = "main")
cdm
cdm#> # OMOP CDM reference (tbl_duckdb_connection)
#>
#> Tables: person, observation_period, visit_occurrence, visit_detail, condition_occurrence, drug_exposure, procedure_occurrence, measurement, observation, death, location, care_site, provider, drug_era, dose_era, condition_era, cdm_source, concept, vocabulary, concept_relationship, concept_synonym, concept_ancestor, drug_strength
This cdm object is now what we´ll use going forward. It provides a reference to the OMOP CDM tables. We can see that these tables are still in the database, but now we have a reference to each of the ones we might want to use in our analysis. For example, the person table can be referenced like so
Say we want to make a histogram of year of birth in the person table. We can select that variable, bring it into memory, and then use ggplot to make the histogram.
$person %>%
cdmselect(year_of_birth) %>%
collect() %>%
ggplot(aes(x = year_of_birth)) +
geom_histogram(bins = 30)
If we wanted to make a boxplot for length of observation periods we could do the computation on the database side, bring in the new variable into memory, and use ggplot to produce the boxplot
$observation_period %>%
cdmselect(observation_period_start_date, observation_period_end_date) %>%
mutate(observation_period = (observation_period_end_date - observation_period_start_date)/365, 25) %>%
select(observation_period) %>%
collect() %>%
ggplot(aes(x = observation_period)) +
geom_boxplot()
We use show_query to check the sql that is being run against duckdb
$person %>%
cdmtally() %>%
show_query()
#> <SQL>
#> SELECT COUNT(*) AS n
#> FROM main.person
$person %>%
cdmsummarise(median(year_of_birth))%>%
show_query()
#> Warning: Missing values are always removed in SQL aggregation functions.
#> Use `na.rm = TRUE` to silence this warning
#> This warning is displayed once every 8 hours.
#> <SQL>
#> SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY year_of_birth) AS "median(year_of_birth)"
#> FROM main.person
$person %>%
cdmmutate(gender = case_when(
== "8507" ~ "Male",
gender_concept_id == "8532" ~ "Female",
gender_concept_id TRUE ~ NA_character_))%>%
show_query()
#> <SQL>
#> SELECT
#> *,
#> CASE
#> WHEN (gender_concept_id = '8507') THEN 'Male'
#> WHEN (gender_concept_id = '8532') THEN 'Female'
#> ELSE NULL
#> END AS gender
#> FROM main.person
::dbDisconnect(con, disconnect = TRUE) DBI