Global Historical Climatology Network (GHCN) provides daily climate
measures from stations across the world. prcp_aus
extracts
daily precipitation and minimum temperature from GHCN for 639 stations
in Australia from 2016 to 2020. This is where these stations locate in
an Australia map:
<- st_simplify(ozmaps::abs_ste, dTolerance = 4000)
state_map ggplot() +
geom_sf(data = state_map, inherit.aes = FALSE,
color = "grey80", alpha = 0.4, linetype = 3) +
geom_point(data = prcp_aus , aes(x = long, y = lat)) +
theme_void()
This is a lot of stations to look at for one climate variable and they can’t all fit into a glyph map. What we can do is to group stations into clusters and look at the aggregated series in the glyph map. In this vignette, I will introduce how to perform aggregation using a hierarchical data structure in cubble.
First, let’s summarise the daily data into weekly sum for each station:
prcp_aus#> # cubble: id [663]: nested form
#> # bbox: [113.53, -43.66, 153.64, -10.05]
#> # temporal: wk [dbl], prcp [dbl]
#> id long lat elev name wmo_id ts
#> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <list>
#> 1 ASN00001006 128. -15.5 3.8 wyndham aero 95214 <tibble [53 × 2]>
#> 2 ASN00001007 126. -13.8 6 troughton island 94102 <tibble [53 × 2]>
#> 3 ASN00001013 128. -15.5 11 wyndham 94214 <tibble [53 × 2]>
#> 4 ASN00001018 126. -16.4 546 mount elizabeth 94211 <tibble [53 × 2]>
#> 5 ASN00001019 127. -14.3 23 kalumburu 94100 <tibble [53 × 2]>
#> 6 ASN00001020 126. -14.1 51 truscott 95101 <tibble [53 × 2]>
#> 7 ASN00001025 126. -15.4 385 doongan 94215 <tibble [53 × 2]>
#> 8 ASN00002012 128. -18.2 422 halls creek airport 94212 <tibble [53 × 2]>
#> 9 ASN00002032 128. -17.0 203 warmun 94213 <tibble [53 × 2]>
#> 10 ASN00002056 129. -15.8 44 kununurra aero 94216 <tibble [53 × 2]>
#> # … with 653 more rows
First we need to assign each station a cluster number. Here we use a
simple kmean algorithm based on the distance matrix to create 20
clusters. This creates station_nested
as a station level
nested cubble with a cluster column indicating the group each station
belongs to.
station_nested#> # cubble: id [663]: nested form
#> # bbox: [113.53, -43.66, 153.64, -10.05]
#> # temporal: wk [dbl], prcp [dbl]
#> id long lat elev name wmo_id ts cluster
#> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <list> <int>
#> 1 ASN00001006 128. -15.5 3.8 wyndham aero 95214 <tibble> 15
#> 2 ASN00001007 126. -13.8 6 troughton island 94102 <tibble> 15
#> 3 ASN00001013 128. -15.5 11 wyndham 94214 <tibble> 15
#> 4 ASN00001018 126. -16.4 546 mount elizabeth 94211 <tibble> 15
#> 5 ASN00001019 127. -14.3 23 kalumburu 94100 <tibble> 15
#> 6 ASN00001020 126. -14.1 51 truscott 95101 <tibble> 15
#> 7 ASN00001025 126. -15.4 385 doongan 94215 <tibble> 15
#> 8 ASN00002012 128. -18.2 422 halls creek airport 94212 <tibble> 15
#> 9 ASN00002032 128. -17.0 203 warmun 94213 <tibble> 15
#> 10 ASN00002056 129. -15.8 44 kununurra aero 94216 <tibble> 15
#> # … with 653 more rows
To create a group level cubble, use switch_key()
with
the new key variable, cluster
:
<- station_nested %>% switch_key(cluster)
cluster_nested %>% head(5)
cluster_nested #> # cubble: cluster [5]: nested form
#> # bbox: [118.84, -43.66, 152.87, -10.05]
#> # temporal: id [chr], wk [dbl], prcp [dbl]
#> cluster .val ts
#> <int> <list> <list>
#> 1 1 <tibble [18 × 6]> <tibble [954 × 3]>
#> 2 2 <tibble [51 × 6]> <tibble [2,677 × 3]>
#> 3 3 <tibble [19 × 6]> <tibble [1,007 × 3]>
#> 4 4 <tibble [6 × 6]> <tibble [318 × 3]>
#> 5 5 <tibble [63 × 6]> <tibble [3,339 × 3]>
The resulted cluster_nested
now has cluster
as the key and all the station level time invariant variables are nested
inside .val
. Currently, there is no cluster level time
invariant variables and we can add the centroid of each cluster by
get_centroid()
:
<- cluster_nested %>% get_centroid())
(cluster_nested #> # cubble: cluster [20]: nested form
#> # bbox: [113.53, -43.66, 153.64, -10.05]
#> # temporal: id [chr], wk [dbl], prcp [dbl]
#> cluster .val ts hull cent_long cent_lat
#> <int> <list> <list> <list> <dbl> <dbl>
#> 1 1 <tibble [18 × 6]> <tibble [954 × 3]> <tibble> 141. -22.9
#> 2 2 <tibble [51 × 6]> <tibble [2,677 × 3]> <tibble> 147. -41.8
#> 3 3 <tibble [19 × 6]> <tibble [1,007 × 3]> <tibble> 121. -30.6
#> 4 4 <tibble [6 × 6]> <tibble [318 × 3]> <tibble> 143. -11.9
#> 5 5 <tibble [63 × 6]> <tibble [3,339 × 3]> <tibble> 150. -32.3
#> 6 6 <tibble [56 × 6]> <tibble [2,951 × 3]> <tibble> 151. -28.2
#> 7 7 <tibble [21 × 6]> <tibble [1,113 × 3]> <tibble> 145. -29.5
#> 8 8 <tibble [61 × 6]> <tibble [3,233 × 3]> <tibble> 117. -31.2
#> 9 9 <tibble [58 × 6]> <tibble [3,074 × 3]> <tibble> 148. -35.3
#> 10 10 <tibble [55 × 6]> <tibble [2,915 × 3]> <tibble> 145. -37.8
#> 11 11 <tibble [37 × 6]> <tibble [1,961 × 3]> <tibble> 142. -36.1
#> 12 12 <tibble [29 × 6]> <tibble [1,537 × 3]> <tibble> 149. -23.6
#> 13 13 <tibble [16 × 6]> <tibble [848 × 3]> <tibble> 130. -27.3
#> 14 14 <tibble [26 × 6]> <tibble [1,378 × 3]> <tibble> 143. -17.7
#> 15 15 <tibble [19 × 6]> <tibble [1,007 × 3]> <tibble> 125. -17.7
#> 16 16 <tibble [48 × 6]> <tibble [2,544 × 3]> <tibble> 139. -33.9
#> 17 17 <tibble [20 × 6]> <tibble [1,060 × 3]> <tibble> 116. -23.7
#> 18 18 <tibble [17 × 6]> <tibble [901 × 3]> <tibble> 134. -17.3
#> 19 19 <tibble [24 × 6]> <tibble [1,272 × 3]> <tibble> 132. -13.0
#> 20 20 <tibble [19 × 6]> <tibble [1,007 × 3]> <tibble> 137. -30.3
You can also use face_temporal()
to get the cluster
level long cubble:
<- cluster_nested %>% face_temporal(ts))
(cluster_long #> # cubble: wk, cluster [20]: long form
#> # bbox: [113.53, -43.66, 153.64, -10.05]
#> # spatial: id [chr], long [dbl], lat [dbl], elev [dbl], name [chr], wmo_id
#> # [dbl], hull [list], cent_long [dbl], cent_lat [dbl]
#> cluster id wk prcp
#> <int> <chr> <dbl> <dbl>
#> 1 1 ASN00015602 1 256
#> 2 1 ASN00015602 2 1554
#> 3 1 ASN00015602 3 8
#> 4 1 ASN00015602 4 184
#> 5 1 ASN00015602 5 2022
#> 6 1 ASN00015602 6 80
#> 7 1 ASN00015602 7 44
#> 8 1 ASN00015602 8 28
#> 9 1 ASN00015602 9 102
#> 10 1 ASN00015602 10 1658
#> # … with 35,086 more rows
Now we should have access to both station and cluster level in the nested and long form. Let’s summarise them within a diagram:
With these data, we can make a glyph map to understand the precipitation pattern in Australia:
Or to inspect the station membership of each cluster:
ggplot() +
geom_sf(data = state_map, inherit.aes = FALSE,
color = "grey80", alpha = 0.4, linetype = 3) +
geom_point(data = station_nested, aes(x = long, y = lat), size = 0.5) +
::geom_mark_hull(
ggforcedata = cluster_nested %>% tidyr::unnest(hull),
expand = 0, radius = 0,
aes(x = long, y = lat, group = cluster)) +
theme_void()