tidytable is a data frame manipulation library for users who need data.table speed but prefer tidyverse-like syntax.
Install the released version from CRAN with:
install.packages("tidytable")
Or install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("markfairbanks/tidytable")
tidytable replicates tidyverse syntax but uses data.table in the background. In general you can simply use library(tidytable) to replace your existing dplyr and tidyr code with the faster tidytable equivalents.
A full list of implemented functions can be found here.
library(tidytable)
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
#> # A tidytable: 3 × 5
#> x y z double_x x_plus_y
#> <int> <int> <chr> <dbl> <int>
#> 1 1 4 a 2 5
#> 2 2 5 a 4 7
#> 3 3 6 b 6 9
You can use the normal tidyverse group_by()/ungroup() workflow, or you can use .by syntax to reduce typing. Using .by in a function is shorthand for df %>% group_by() %>% fn() %>% ungroup().
Pass a single column with .by = z, or multiple columns with .by = c(y, z).
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
#> # A tidytable: 2 × 3
#> x y avg_z
#> <chr> <chr> <dbl>
#> 1 a a 1.5
#> 2 b b 3
All functions that can operate by group have a .by argument built in (mutate(), filter(), summarize(), etc.).
The above syntax is equivalent to:
df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
#> # A tidytable: 2 × 3
#> x y avg_z
#> <chr> <chr> <dbl>
#> 1 a a 1.5
#> 2 b b 3
Both options are available for users, so you can use the syntax that you prefer.
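As a further illustration of .by in functions other than summarize(), here is a minimal sketch using mutate() and filter(). The data and the names with_avg and top_rows are made up for this example.

```r
library(tidytable)

# Hypothetical example data: two groups of two rows each
df <- data.table(x = 1:4, z = c("a", "a", "b", "b"))

# mutate() with .by adds a per-group value without collapsing rows
with_avg <- df %>%
  mutate(avg_x = mean(x), .by = z)

# filter() with .by evaluates the condition within each group,
# here keeping the row with the largest x per group
top_rows <- df %>%
  filter(x == max(x), .by = z)
```

In both calls the result comes back ungrouped, so no trailing ungroup() is needed.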
tidytable allows you to select/drop columns just like you would in the tidyverse by utilizing the tidyselect package in the background.
Normal selection can be mixed with all tidyselect helpers: everything(), starts_with(), ends_with(), any_of(), where(), etc.
df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select(a, starts_with("b"))
#> # A tidytable: 3 × 3
#> a b1 b2
#> <int> <int> <int>
#> 1 1 4 7
#> 2 2 5 8
#> 3 3 6 9
A full overview of selection options can be found here.
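A few of the other helpers mentioned above can be sketched the same way. This is an illustrative example rather than an exhaustive one; the result names (num_cols, no_b, some_cols) and the deliberately missing column name "not_a_column" are invented for the demonstration.

```r
library(tidytable)

df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

# where() selects columns by a predicate on their contents
num_cols <- df %>%
  select(where(is.numeric))

# Negating a helper drops the matching columns
no_b <- df %>%
  select(-starts_with("b"))

# any_of() silently skips names that are not in the data
some_cols <- df %>%
  select(any_of(c("a", "not_a_column")))
```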
tidyselect helpers also work when using .by:
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
#> # A tidytable: 2 × 3
#> x y avg_z
#> <chr> <chr> <dbl>
#> 1 a a 1.5
#> 2 b b 3
Tidy evaluation can be used to write custom functions with tidytable functions. The embracing shortcut {{ }} works, or you can use enquo() with !! if you prefer:
df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
#> # A tidytable: 3 × 4
#> x y z new_col
#> <dbl> <int> <chr> <dbl>
#> 1 1 4 a 2
#> 2 1 5 a 2
#> 3 1 6 b 2
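For comparison, the same helper can be sketched with enquo() and !! instead of the embracing shortcut. The function name add_one_quo is hypothetical, and enquo() is called through the rlang namespace here so the sketch does not depend on which rlang functions tidytable re-exports.

```r
library(tidytable)

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

# Hypothetical variant of add_one() using enquo() and !!
add_one_quo <- function(data, add_col) {
  # Capture the column expression supplied by the caller
  add_col <- rlang::enquo(add_col)
  data %>%
    # Unquote it inside mutate(); parentheses keep the precedence explicit
    mutate(new_col = (!!add_col) + 1)
}

result <- df %>%
  add_one_quo(x)
```

The embracing form is usually shorter; capturing with enquo() can be handy when the captured expression needs to be inspected or reused before it is unquoted.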
The .data and .env pronouns also work within tidytable functions:
var <- 10

df %>%
  mutate(new_col = .data$x + .env$var)
#> # A tidytable: 3 × 4
#> x y z new_col
#> <dbl> <int> <chr> <dbl>
#> 1 1 4 a 11
#> 2 1 5 a 11
#> 3 1 6 b 11
A full overview of tidy evaluation can be found here.
dt() helper

The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
#> # A tidytable: 2 × 2
#> z avg_x
#> <chr> <dbl>
#> 1 a 1.5
#> 2 b 3
For those interested in performance, speed comparisons can be found here.
verb.() syntax

For backwards compatibility tidytable exports verb.() versions of functions. This also allows users to more easily combine dplyr and tidytable functions in one script:
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  mutate.(double_x = x * 2)
#> # A tidytable: 3 × 4
#> x y z double_x
#> <int> <int> <chr> <dbl>
#> 1 1 4 a 2
#> 2 2 5 a 4
#> 3 3 6 b 6
tidytable is only possible because of the great contributions to R by the data.table and tidyverse teams. data.table is used as the main data frame engine in the background, while tidyverse packages like rlang, vctrs, and tidyselect are heavily relied upon to give users an experience similar to dplyr and tidyr.