The bigrquery package makes it easy to work with data stored in Google BigQuery by allowing you to query BigQuery tables and retrieve metadata about your projects, datasets, tables, and jobs. The bigrquery package provides three levels of abstraction on top of BigQuery:
The low-level API provides thin wrappers over the underlying REST API. All the low-level functions start with bq_, and mostly have the form bq_noun_verb(). This level of abstraction is most appropriate if you’re familiar with the REST API and you want to do something not supported in the higher-level APIs.
The DBI interface wraps the low-level API and makes working with BigQuery like working with any other database system. This is the most convenient layer if you want to execute SQL queries in BigQuery or upload smaller amounts (i.e. <100 MB) of data.
The dplyr interface lets you treat BigQuery tables as if they are in-memory data frames. This is the most convenient layer if you don’t want to write SQL, but instead want dbplyr to write it for you.
The current bigrquery release can be installed from CRAN:
install.packages("bigrquery")
The newest development release can be installed from GitHub:
# install.packages('devtools')
devtools::install_github("r-dbi/bigrquery")
library(bigrquery)
billing <- bq_test_project() # replace this with your project ID
sql <- "SELECT year, month, day, weight_pounds FROM `publicdata.samples.natality`"

tb <- bq_project_query(billing, sql)
bq_table_download(tb, n_max = 10)
#> # A tibble: 10 × 4
#> year month day weight_pounds
#> <int> <int> <int> <dbl>
#> 1 1969 3 15 6.88
#> 2 1969 7 11 6.12
#> 3 1969 11 8 7.50
#> 4 1969 3 15 7.69
#> 5 1969 3 12 6.31
#> 6 1969 10 24 7.19
#> 7 1969 5 14 7.69
#> 8 1969 10 14 4.31
#> 9 1969 4 5 8.44
#> 10 1969 2 6 8.50
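The bq_noun_verb() naming pattern also covers the metadata helpers. The following is a minimal sketch, assuming you are already authenticated and can read the public natality table queried above. It builds a table reference and asks for a few facts about it.

```r
library(bigrquery)

# Build a reference to the public table; this does not contact BigQuery yet.
natality <- bq_table("publicdata", "samples", "natality")

# Each helper follows the bq_<noun>_<verb>() pattern.
bq_table_exists(natality)  # TRUE if the table can be found
bq_table_fields(natality)  # column names and types
bq_table_nrow(natality)    # number of rows
```

Output is omitted here because it depends on the current state of the table.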
library(DBI)
con <- dbConnect(
  bigrquery::bigquery(),
  project = "publicdata",
  dataset = "samples",
  billing = billing
)
con
#> <BigQueryConnection>
#> Dataset: publicdata.samples
#> Billing: gargle-169921
dbListTables(con)
#> [1] "github_nested" "github_timeline" "gsod" "natality"
#> [5] "shakespeare" "trigrams" "wikipedia"
dbGetQuery(con, sql, n = 10)
#> # A tibble: 10 × 4
#> year month day weight_pounds
#> <int> <int> <int> <dbl>
#> 1 1969 3 15 6.88
#> 2 1969 7 11 6.12
#> 3 1969 11 8 7.50
#> 4 1969 3 15 7.69
#> 5 1969 3 12 6.31
#> 6 1969 10 24 7.19
#> 7 1969 5 14 7.69
#> 8 1969 10 14 4.31
#> 9 1969 4 5 8.44
#> 10 1969 2 6 8.50
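The DBI layer is also the one suggested above for uploading small amounts of data. The sketch below assumes a project and dataset that you own and can write to. The identifiers "my-project-id" and "scratch" and the table name "demo_upload" are placeholders, not real resources.

```r
library(DBI)

# Placeholders (assumptions): replace with a project and dataset you own.
my_project <- "my-project-id"
my_dataset <- "scratch"

con <- dbConnect(
  bigrquery::bigquery(),
  project = my_project,
  dataset = my_dataset,
  billing = my_project
)

# A tiny data frame, far below the ~100 MB guideline for this layer.
demo <- data.frame(id = 1:3, label = c("a", "b", "c"))

# overwrite = TRUE lets the example be re-run without an "already exists" error.
dbWriteTable(con, "demo_upload", demo, overwrite = TRUE)
dbReadTable(con, "demo_upload")
```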
library(dplyr)
natality <- tbl(con, "natality")
#> Warning: <BigQueryConnection> uses an old dbplyr interface
#> ℹ Please install a newer version of the package or contact the maintainer
#> This warning is displayed once every 8 hours.
natality %>%
  select(year, month, day, weight_pounds) %>%
  head(10) %>%
  collect()
#> # A tibble: 10 × 4
#> year month day weight_pounds
#> <int> <int> <int> <dbl>
#> 1 2005 2 NA 9.31
#> 2 2005 9 NA 7.75
#> 3 2005 3 NA 7.39
#> 4 2005 12 NA 6.75
#> 5 2005 7 NA 8.38
#> 6 2005 11 NA 7.79
#> 7 2005 11 NA 8.98
#> 8 2005 5 NA 7.51
#> 9 2005 4 NA 8.38
#> 10 2005 12 NA 7.37
When using bigrquery interactively, you’ll be prompted to authorize bigrquery in the browser. By default, your token will be cached across sessions inside the folder ~/.R/gargle/gargle-oauth/. For non-interactive usage, it is preferred to use a service account token and put it into force via bq_auth(path = "/path/to/your/service-account.json"). More places to learn about auth:
The help for bigrquery::bq_auth().
bigrquery obtains a token with gargle::token_fetch(), which supports a variety of token flows. The gargle documentation provides full details, such as how to take advantage of Application Default Credentials or service accounts on GCE VMs.
Note that bigrquery requests permission to modify your data, but it will never do so unless you explicitly request it (e.g. by calling bq_table_delete() or bq_table_upload()). Our Privacy policy provides more info.
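For non-interactive use, one possible arrangement is to keep the key file path out of the script. The sketch below assumes the path is stored in an environment variable. Both the helper auth_from_env() and the variable name BIGQUERY_SERVICE_ACCOUNT_JSON are hypothetical and are not bigrquery conventions.

```r
library(bigrquery)

# Hypothetical helper: authenticate with a service account key whose path is
# stored in an environment variable (the variable name is an assumption).
auth_from_env <- function(var = "BIGQUERY_SERVICE_ACCOUNT_JSON") {
  key_path <- Sys.getenv(var)
  if (!nzchar(key_path) || !file.exists(key_path)) {
    stop("Set ", var, " to the path of a service account key file.",
         call. = FALSE)
  }
  bq_auth(path = key_path)
  # Report whether a token is now in place.
  invisible(bq_has_token())
}
```

Calling auth_from_env() at the top of a scheduled script fails early with a clear message when the key is missing, instead of falling back to a browser prompt that nobody can answer.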
If you just want to play around with the BigQuery API, it’s easiest to start with Google’s free sample data. You’ll still need to create a project, but if you’re just playing around, it’s unlikely that you’ll go over the free limit (1 TB of queries / 10 GB of storage).
To create a project:
Open https://console.cloud.google.com/ and create a project. Make a note of the “Project ID” in the “Project info” box.
Click on “APIs & Services”, then “Dashboard” in the left menu.
Click on “Enable APIs and Services” at the top of the page, then search for “BigQuery API” and “Cloud Storage”.
Use your project ID as the billing project whenever you work with free sample data; and as the project when you work with your own data.
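The difference between the two roles can be sketched with the DBI interface. Here "my-project-id" and "my_dataset" are placeholders for your own project ID and dataset, not real resources.

```r
library(DBI)

# Placeholder (assumption): replace with the project ID you noted earlier.
my_project <- "my-project-id"

# Free sample data: the tables live in "publicdata",
# while query costs are charged to your own project.
public_con <- dbConnect(
  bigrquery::bigquery(),
  project = "publicdata",
  dataset = "samples",
  billing = my_project
)

# Your own data: the same ID names both where the data lives and who pays.
own_con <- dbConnect(
  bigrquery::bigquery(),
  project = my_project,
  dataset = "my_dataset", # placeholder dataset name
  billing = my_project
)
```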
Please note that the ‘bigrquery’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.