parquetize
An R package for converting datasets from various formats (csv, SAS, SPSS, Stata, rds, duckdb, sqlite, JSON, ndJSON) to parquet format, each with a single function call.
remotes::install_github("ddotta/parquetize")
library(parquetize)
# install.packages(c("haven","arrow","curl","readr","dplyr","jsonlite","DBI","duckdb","RSQLite"))
install.packages("parquetize",
                 repos = "https://nexus.insee.fr/repository/r-local",
                 type = "source")
This package is a simple wrapper around some very useful functions from the haven, readr, jsonlite, RSQLite, duckdb and arrow packages.
While working with parquet files, I realized that I was often repeating the same operation: importing a file into R with one of these packages, then exporting it to parquet format.
As a fervent believer in the DRY principle (don't repeat yourself), I wrote the exported functions of this package to make my life easier: each one carries out these operations in a single call.
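The repeated pattern described above can be sketched as follows. This is a minimal illustration that uses readr and arrow directly, not code taken from the package; the temporary file paths and the sample data (the first rows of the built-in iris dataset) are assumptions made for the example.

```r
library(readr)
library(arrow)

# Hypothetical input: write a small csv file to a temporary location
csv_path <- tempfile(fileext = ".csv")
write_csv(head(iris, 10), csv_path)

# Step 1: import the file into R
df <- read_csv(csv_path, show_col_types = FALSE)

# Step 2: export the data to parquet format
parquet_path <- tempfile(fileext = ".parquet")
write_parquet(df, parquet_path)

# Read the parquet file back to check the round trip
df_back <- read_parquet(parquet_path)
```

A function such as csv_to_parquet() is meant to replace both steps with one call; see its documentation for the exact arguments.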
A further benefit of the {parquetize} package is that its functions can create either a single parquet file or partitioned files, depending on the arguments passed to them.
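To make the difference between the two kinds of output concrete, here is a minimal sketch written with arrow directly rather than with {parquetize}. The package's own arguments for choosing between them are described in its documentation and are not shown here; the paths and the use of the built-in iris dataset are assumptions made for the example.

```r
library(arrow)

# Hypothetical output directory
out_dir <- tempfile()
dir.create(out_dir)

# Single file: all rows are written to one parquet file
single_path <- file.path(out_dir, "iris.parquet")
write_parquet(iris, single_path)

# Partitioned files: one subdirectory per value of the partitioning column
partitioned_path <- file.path(out_dir, "iris_partitioned")
write_dataset(iris, partitioned_path,
              format = "parquet",
              partitioning = "Species")

# List the partition directories created for the Species column
partitions <- list.files(partitioned_path)
```

Partitioning pays off when later queries filter on the partitioning column, since only the matching subdirectories need to be read.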
For more details, see the documentation and examples:
- table_to_parquet()
- csv_to_parquet()
- json_to_parquet()
- rds_to_parquet()
- sqlite_to_parquet()
- duckdb_to_parquet()
Want to use the Insee file of first names by birth department? Use R and the {parquetize} package, which takes care of everything: it downloads the data (3.7 million rows) and converts it to parquet format in a few seconds!
Contributions are welcome: feel free to add features that you find useful in your daily work. Ideas are welcome in the issues.