Explanation of the cubble data structure

This vignette provides some details on the cubble class. Here is a diagram that summarises the attributes associated with a nested and long cubble:

Here the attributes can be divided into four categories:

We now look into these attributes through two examples, one for a nested cubble and one for a long cubble:

Nested cubble

A nested cubble is built on top of a rowwise_df, which comes with the groups attribute as well as the usual names, row.names, and class. Note that the class attribute also lists the classes that a rowwise_df is itself built upon: tbl_df, tbl, and data.frame. The attributes of a nested cubble look like this:

climate_small <- climate_flat %>%
  filter(date %in% as.Date(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04")))

nested <- climate_small %>%
  as_cubble(key = id, index = date, coords = c(long, lat))
nested
#> # cubble:   id [5]: nested form
#> # bbox:     [115.97, -32.94, 133.55, -12.42]
#> # temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
#>   id            lat  long  elev name           wmo_id ts              
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl> <list>          
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610 <tibble [4 × 4]>
#> 2 ASN00010311 -31.9  117. 179   york            94623 <tibble [4 × 4]>
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627 <tibble [4 × 4]>
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120 <tibble [4 × 4]>
#> 5 ASN00015131 -17.6  134. 220   elliott         94236 <tibble [4 × 4]>
attributes(nested)
#> $class
#> [1] "cubble_df"  "rowwise_df" "tbl_df"     "tbl"        "data.frame"
#> 
#> $row.names
#> [1] 1 2 3 4 5
#> 
#> $groups
#> # A tibble: 5 × 2
#>   id                .rows
#>   <chr>       <list<int>>
#> 1 ASN00009021         [1]
#> 2 ASN00010311         [1]
#> 3 ASN00010614         [1]
#> 4 ASN00014015         [1]
#> 5 ASN00015131         [1]
#> 
#> $names
#> [1] "id"     "lat"    "long"   "elev"   "name"   "wmo_id" "ts"    
#> 
#> $index
#> [1] "date"
#> attr(,"ordered")
#> [1] TRUE
#> 
#> $coords
#> [1] "long" "lat" 
#> 
#> $form
#> [1] "nested"

In this example, five stations are recorded at four time points: 1st to 4th Jan 2020. The key specified is mapped to the groups attribute, while index and coords are stored as variable names in the cubble. The groups attribute will be the same as the key most of the time, but it is a useful piece of information to record later in the hierarchical data structure.
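To make the mapping from key to groups concrete, here is a minimal sketch using plain dplyr rather than cubble itself. The toy `stations` tibble and its values are made up for illustration, and `rowwise(id)` is used only to mimic the structure shown above; it is not necessarily how cubble constructs the object internally.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: three stations, one row each
stations <- tibble(
  id   = c("a", "b", "c"),
  elev = c(15.4, 179, 338)
)

# rowwise(id) puts every row in its own group and keeps `id` as the grouping variable
rw <- stations %>% rowwise(id)

class(rw)
#> [1] "rowwise_df" "tbl_df"     "tbl"        "data.frame"

# The groups attribute: one row per key value, each pointing at a single row index
groups <- group_data(rw)
groups
```

Each entry of `.rows` has length one, which is the same shape as the `$groups` output of the nested cubble.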

The rowwise_df class uses the groups attribute to ensure each row is in its own group, and this structure makes it simpler to calculate on the list-column. For example, the number of rainy days (days with non-zero precipitation) can be calculated by:

nested %>%  
  mutate(rain = sum(ts$prcp != 0, na.rm = TRUE))
#> # cubble:   id [5]: nested form
#> # bbox:     [115.97, -32.94, 133.55, -12.42]
#> # temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
#>   id            lat  long  elev name           wmo_id ts                rain
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl> <list>           <int>
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610 <tibble [4 × 4]>     1
#> 2 ASN00010311 -31.9  117. 179   york            94623 <tibble [4 × 4]>     0
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627 <tibble [4 × 4]>     0
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120 <tibble [4 × 4]>     2
#> 5 ASN00015131 -17.6  134. 220   elliott         94236 <tibble [4 × 4]>     3

which matches the basic mutate style of calculation, so it is easier to remember than the purrr-style syntax:

climate_small %>%
  tidyr::nest(data = c(date, prcp, tmax, tmin)) %>%
  mutate(rain = purrr::map_dbl(data, ~ sum(.x$prcp != 0, na.rm = TRUE)))
#> # A tibble: 5 × 8
#>   id            lat  long  elev name           wmo_id data              rain
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl> <list>           <dbl>
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610 <tibble [4 × 4]>     1
#> 2 ASN00010311 -31.9  117. 179   york            94623 <tibble [4 × 4]>     0
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627 <tibble [4 × 4]>     0
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120 <tibble [4 × 4]>     2
#> 5 ASN00015131 -17.6  134. 220   elliott         94236 <tibble [4 × 4]>     3
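The reason the mutate version works is a general property of rowwise data frames in dplyr: inside a rowwise mutate, a list-column name refers to the single element in the current row rather than to the whole list. The sketch below shows this with plain dplyr on made-up data (the `toy` tibble and its values are hypothetical), so it can be run without cubble.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: two stations, each with a small time series in a list-column
toy <- tibble(
  id = c("a", "b"),
  ts = list(
    tibble(prcp = c(0, 6, 0)),
    tibble(prcp = c(58, 32, 0))
  )
)

# With rowwise(), `ts` is the tibble of the current row, so `ts$prcp` is an ordinary vector
rainy <- toy %>%
  rowwise() %>%
  mutate(rain = sum(ts$prcp != 0, na.rm = TRUE)) %>%
  ungroup()

rainy$rain
#> [1] 1 2
```

Without `rowwise()`, `ts$prcp` would be evaluated on the whole list-column and return `NULL`, which is why the purrr version needs an explicit `map_dbl()`.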

Long cubble

A long-form cubble is built on the grouped_df class, where observations from the same site form a group. A long cubble is printed below along with its attributes:

long <- nested %>% face_temporal()
long
#> # cubble:  date, id [5]: long form
#> # bbox:    [115.97, -32.94, 133.55, -12.42]
#> # spatial: lat [dbl], long [dbl], elev [dbl], name [chr], wmo_id [dbl]
#>    id          date        prcp  tmax  tmin
#>    <chr>       <date>     <dbl> <dbl> <dbl>
#>  1 ASN00009021 2020-01-01     0  31.9  15.3
#>  2 ASN00009021 2020-01-02     0  24.9  16.4
#>  3 ASN00009021 2020-01-03     6  23.2  13  
#>  4 ASN00009021 2020-01-04     0  28.4  12.4
#>  5 ASN00010311 2020-01-01     0  38    16.4
#>  6 ASN00010311 2020-01-02     0  30    15.5
#>  7 ASN00010311 2020-01-03     0  25.2  10.5
#>  8 ASN00010311 2020-01-04     0  26.3   7  
#>  9 ASN00010614 2020-01-01     0  34    14  
#> 10 ASN00010614 2020-01-02     0  27.5  15.2
#> 11 ASN00010614 2020-01-03     0  21    10.9
#> 12 ASN00010614 2020-01-04     0  24.9   6.5
#> 13 ASN00014015 2020-01-01    58  34.4  23.6
#> 14 ASN00014015 2020-01-02    32  34.5  25.9
#> 15 ASN00014015 2020-01-03     0  34.7  26.3
#> 16 ASN00014015 2020-01-04     0  35    26.6
#> 17 ASN00015131 2020-01-01    18  40.1  24.6
#> 18 ASN00015131 2020-01-02    32  40.6  27.6
#> 19 ASN00015131 2020-01-03    34  41.2  24  
#> 20 ASN00015131 2020-01-04     0  40.9  26.8
attributes(long)
#> $class
#> [1] "cubble_df"  "grouped_df" "tbl_df"     "tbl"        "data.frame"
#> 
#> $row.names
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
#> 
#> $names
#> [1] "id"   "date" "prcp" "tmax" "tmin"
#> 
#> $groups
#> # A tibble: 5 × 2
#>   id                .rows
#>   <chr>       <list<int>>
#> 1 ASN00009021         [4]
#> 2 ASN00010311         [4]
#> 3 ASN00010614         [4]
#> 4 ASN00014015         [4]
#> 5 ASN00015131         [4]
#> 
#> $index
#> [1] "date"
#> attr(,"ordered")
#> [1] TRUE
#> 
#> $spatial
#> # A tibble: 5 × 6
#> # Rowwise:  id
#>   id            lat  long  elev name           wmo_id
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl>
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610
#> 2 ASN00010311 -31.9  117. 179   york            94623
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120
#> 5 ASN00015131 -17.6  134. 220   elliott         94236
#> 
#> $coords
#> [1] "long" "lat" 
#> 
#> $form
#> [1] "long"

The spatial attribute records the time-invariant variables in the data, so that these variables are not lost when switching back to the nested form.

Switching between the nested and long cubble
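A minimal sketch of the round trip between the two forms. It assumes that `face_spatial()` is the counterpart of `face_temporal()` in the installed version of cubble (the examples above only show `face_temporal()`) and that the `climate_flat` data is available as in the earlier examples. Output is omitted because the printed header may differ between package versions.

```r
library(cubble)
library(dplyr)

# Rebuild a small nested cubble so this block runs on its own
nested <- climate_flat %>%
  filter(date %in% as.Date(c("2020-01-01", "2020-01-02"))) %>%
  as_cubble(key = id, index = date, coords = c(long, lat))

# Nested -> long: one row per station and date
long <- nested %>% face_temporal()

# Long -> nested (assumed counterpart): one row per station again,
# with the time-invariant variables restored
back <- long %>% face_spatial()
back
```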

Access cubble attributes

Apart from the usual %@% or attr(DATA, "ATTRIBUTE") for extracting attributes, cubble provides functions named after the corresponding attributes for easier extraction: index(), key_vars(), key_data(), coords(), and spatial() can be used to extract the relevant component in both forms:

key_vars(long)
#> [1] "id"
key_data(long)
#> # A tibble: 5 × 2
#>   id                .rows
#>   <chr>       <list<int>>
#> 1 ASN00009021         [4]
#> 2 ASN00010311         [4]
#> 3 ASN00010614         [4]
#> 4 ASN00014015         [4]
#> 5 ASN00015131         [4]
spatial(long)
#> # A tibble: 5 × 6
#> # Rowwise:  id
#>   id            lat  long  elev name           wmo_id
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl>
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610
#> 2 ASN00010311 -31.9  117. 179   york            94623
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120
#> 5 ASN00015131 -17.6  134. 220   elliott         94236
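The remaining accessors named above, `index()` and `coords()`, can be tried in the same way. The sketch below rebuilds a long cubble so that it runs on its own; the object name `long_cubble` is only chosen here to avoid clashing with the `long` column. Outputs are left out because the exact return type may vary with the package version.

```r
library(cubble)
library(dplyr)

long_cubble <- climate_flat %>%
  filter(date %in% as.Date(c("2020-01-01", "2020-01-02"))) %>%
  as_cubble(key = id, index = date, coords = c(long, lat)) %>%
  face_temporal()

# Accessor functions
key_vars(long_cubble)
index(long_cubble)
coords(long_cubble)

# The same information read directly from the attribute
attr(long_cubble, "coords")
```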