Each occurrence record contains taxonomic information and information
about the observation itself, like its location and the date of
observation. These pieces of information are recorded and categorised
into respective fields. When you import data using galah, the columns of
the resulting tibble correspond to these fields.
Data fields are important because they provide a means to narrow
queries so that they return only the information you need, and no more.
Consequently, much of the architecture of galah has been designed to
make narrowing queries as simple as possible. The relevant functions
are:
galah_identify or identify
galah_filter or filter
galah_select or select
galah_group_by or group_by
galah_geolocate or st_crop
galah_down_to
These names have been chosen to echo comparable functions from dplyr;
namely filter, select and group_by. With the exception of
galah_geolocate, they also use dplyr tidy evaluation and syntax. This
means that how you use dplyr functions is also how you use galah_
functions.
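To make the parallel concrete, here is a minimal offline sketch using dplyr alone. The `records` tibble is made-up illustration data, not output from any atlas; the point is only that the expressions passed to filter, select and group_by have the same shape as those passed to their galah_ counterparts.

```r
library(dplyr)

# Made-up, occurrence-like records for illustration only (not atlas data)
records <- tibble(
  scientificName = c("Eolophus roseicapilla", "Bufo bufo",
                     "Lepus europaeus", "Lepus granatensis"),
  year = c(1999, 2005, 2012, 2020),
  basisOfRecord = c("PRESERVED_SPECIMEN", "HUMAN_OBSERVATION",
                    "HUMAN_OBSERVATION", "HUMAN_OBSERVATION")
)

# filter() keeps rows; galah_filter() takes the same kind of expression
recent <- records |> filter(year > 2000)

# select() keeps columns; galah_select() takes the same kind of column names
slim <- recent |> select(scientificName, year)

# group_by() followed by a count mirrors galah_group_by() + atlas_counts()
counts <- recent |>
  group_by(basisOfRecord) |>
  summarise(count = n())
```

The difference is where the work happens: dplyr narrows a tibble already in memory, whereas the galah_ functions narrow the query before any data are returned.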
Perhaps unsurprisingly, search_taxa searches for taxonomic information.
It uses fuzzy matching to work a lot like the search bar on the Atlas of
Living Australia website, and you can use it to search for taxa by their
scientific name. Finding your desired taxon with search_taxa is an
important step before using this taxonomic information to download data
with galah.
For example, to search for reptiles, we first need to check that our query matches the correct taxon:
search_taxa("Reptilia")
## # A tibble: 1 × 9
## search_term scientific_name taxon_concept_id rank match_type kingdom phylum class issues
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Reptilia REPTILIA https://biodiversity.org.au/afd/taxa/682e1228-5b3c-45ff-833b-550efd40c399 class exactMatch Animalia Chordata Reptilia noIssue
To be more specific, you can provide additional taxonomic information
to search_taxa as a data.frame containing more levels of the taxonomic
hierarchy:
search_taxa(data.frame(genus = "Eolophus", class = "Aves"))
## # A tibble: 1 × 13
## search_term scientific_name scientific_name_authorship taxon_concept_id rank match…¹ kingdom phylum class order family genus issues
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Eolophus_Aves Eolophus Bonaparte, 1854 https://biodiversity.org.au/afd/taxa/009169a9-a9… genus exactM… Animal… Chord… Aves Psit… Cacat… Eolo… noIss…
## # … with abbreviated variable name ¹match_type
Once we know that our search matches the correct taxon or taxa, we can
use galah_identify to narrow the results of our queries:
galah_call() |>
  galah_identify("Reptilia") |>
  atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 1482850
taxa <- search_taxa(data.frame(genus = "Eolophus", class = "Aves"))
galah_call() |>
  galah_identify(taxa) |>
  atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 948944
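When searching for several names at once, it can help to check the result in code before passing it on. The following is a rough sketch, assuming an internet connection and that the `issues` column takes the value "noIssue" for clean matches, as in the output shown above; the choice of search terms is illustrative.

```r
library(galah)

# Search for two names at once
taxa <- search_taxa(c("Reptilia", "Eolophus"))

# TRUE only when every search term matched without a reported issue
all_clean <- isTRUE(all(taxa$issues == "noIssue"))

# Only build the query when the matches look trustworthy
if (all_clean) {
  galah_call() |>
    galah_identify(taxa) |>
    atlas_counts()
}
```

If `all_clean` is FALSE, inspecting the `match_type` and `issues` columns of `taxa` should show which search term needs refining.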
If you’re using an international atlas, search_taxa will
automatically switch to using the local name-matching service. For
example, Portugal uses the GBIF taxonomic backbone, but integrates
seamlessly with our standard workflow.
galah_config(atlas = "Portugal")
## Atlas selected: GBIF Portugal (GBIF.pt) [Portugal]
galah_call() |>
  galah_identify("Lepus") |>
  galah_group_by(species) |>
  atlas_counts()
## # A tibble: 5 × 2
## species count
## <chr> <int>
## 1 Lepus granatensis 1378
## 2 Lepus microtis 64
## 3 Lepus europaeus 10
## 4 Lepus saxatilis 2
## 5 Lepus capensis 1
Conversely, the UK’s National Biodiversity Network (NBN) has its own taxonomic backbone, but is supported using the same function call.
galah_config(atlas = "United Kingdom")
## Atlas selected: National Biodiversity Network (NBN) [United Kingdom]
galah_call() |>
  galah_identify("Bufo") |>
  galah_group_by(species) |>
  atlas_counts()
## # A tibble: 2 × 2
## species count
## <chr> <int>
## 1 Bufo bufo 75241
## 2 Bufo spinosus 1
galah_config(atlas = "Australia")
## Atlas selected: Atlas of Living Australia (ALA) [Australia]
Perhaps the most important function in galah is galah_filter, which is
used to filter the rows of queries:
# Get total record count for years after 2000
galah_call() |>
  galah_filter(year > 2000) |>
  atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 72911598
# Get total record count for iNaturalist Australia for years after 2000
galah_call() |>
  galah_filter(
    year > 2000,
    dataResourceName == "iNaturalist Australia") |>
  atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 4055078
To find available fields and corresponding valid values, use the field
lookup functions show_all(fields), search_all(fields) and show_values().
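As a brief sketch of how these lookups fit together (assuming an internet connection; the search strings are illustrative), you might first search for a field by keyword and then list the values it can take:

```r
library(galah)

# Find fields that mention "basis"
basis_fields <- search_all(fields, "basis")

# List the values recorded for basisOfRecord; these are the strings
# to compare against inside galah_filter()
basis_values <- search_all(fields, "basisOfRecord") |>
  show_values()
```

A value found this way, such as HUMAN_OBSERVATION, can then be used in a filter like galah_filter(basisOfRecord == "HUMAN_OBSERVATION").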
Finally, a special case of galah_filter is to make more complex
taxonomic queries than are possible using search_taxa. By using the
taxonConceptID field, it is possible to build queries that exclude
certain taxa, for example. This can be useful for paraphyletic concepts
such as invertebrates:
galah_call() |>
  galah_filter(
    taxonConceptID == search_taxa("Animalia")$taxon_concept_id,
    taxonConceptID != search_taxa("Chordata")$taxon_concept_id
  ) |>
  galah_group_by(class) |>
  atlas_counts()
## # A tibble: 83 × 2
## class count
## <chr> <int>
## 1 Insecta 4114912
## 2 Gastropoda 883073
## 3 Arachnida 573491
## 4 Malacostraca 562455
## 5 Maxillopoda 424009
## 6 Polychaeta 257786
## 7 Bivalvia 216454
## 8 Anthozoa 170662
## 9 Demospongiae 113677
## 10 Ostracoda 59271
## # … with 73 more rows
When working with the ALA, a notable feature is the ability to specify
a data quality profile to remove records that are suspect in some way.
galah_call() |>
  galah_filter(year > 2000) |>
  galah_apply_profile(ALA) |>
  atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 65921857
To see a full list of data quality profiles, use show_all(profiles).
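To see what a profile actually removes, the lookup functions can be chained in the same way as for fields. This is a sketch under the assumption that searching profiles for "ALA" returns the profile used above and that show_values() lists its component filters; it requires an internet connection.

```r
library(galah)

# All available data quality profiles
all_profiles <- show_all(profiles)

# The individual quality filters that make up the "ALA" profile
ala_filters <- search_all(profiles, "ALA") |>
  show_values()
```

Reading through `ala_filters` before applying a profile is a reasonable way to confirm that it excludes only records you are happy to lose.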
Use galah_group_by to group and summarise record counts by specified
fields:
# Get record counts from 2016 to 2020, grouped by year and basis of record
galah_call() |>
  galah_filter(year > 2015 & year <= 2020) |>
  galah_group_by(year, basisOfRecord) |>
  atlas_counts()
## # A tibble: 25 × 3
## basisOfRecord year count
## <chr> <chr> <int>
## 1 HUMAN_OBSERVATION 2020 6309037
## 2 HUMAN_OBSERVATION 2019 5516864
## 3 HUMAN_OBSERVATION 2018 5219597
## 4 HUMAN_OBSERVATION 2017 4313276
## 5 HUMAN_OBSERVATION 2016 3483039
## 6 OCCURRENCE 2016 165997
## 7 OCCURRENCE 2018 116242
## 8 OCCURRENCE 2017 102206
## 9 OCCURRENCE 2019 91640
## 10 OCCURRENCE 2020 39429
## # … with 15 more rows
Use galah_select to choose which columns are returned when downloading
records:
# Get *Reptilia* records from 1930, but only 'eventDate' and 'kingdom' columns
occurrences <- galah_call() |>
  galah_identify("reptilia") |>
  galah_filter(year == 1930) |>
  galah_select(eventDate, kingdom) |>
  atlas_occurrences()
occurrences |> head()
## # A tibble: 6 × 2
## eventDate kingdom
## <dttm> <chr>
## 1 1929-12-31 14:00:00 Animalia
## 2 1929-12-31 14:00:00 Animalia
## 3 1929-12-31 14:00:00 Animalia
## 4 1929-12-31 14:00:00 Animalia
## 5 1929-12-31 14:00:00 Animalia
## 6 1929-12-31 14:00:00 Animalia
galah_select() also accepts the selection helpers that work with
dplyr::select(), such as starts_with() and ends_with():
occurrences <- galah_call() |>
  galah_identify("reptilia") |>
  galah_filter(year == 1930) |>
  galah_select(starts_with("elev") & ends_with("n")) |>
  atlas_occurrences()
occurrences |> head()
## # A tibble: 6 × 55
## recor…¹ catal…² taxon…³ verba…⁴ raw_v…⁵ scien…⁶ taxon…⁷ verna…⁸ kingdom phylum class order family genus species subsp…⁹ dataR…˟ insti…˟ insti…˟ colle…˟ colle…˟ dcter…˟
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 050d4c… J3729 https:… Oxyura… coasta… Oxyura… species Taipan Animal… Chord… Rept… Squa… Elapi… Oxyu… Oxyura… <NA> dr1132 <NA> <NA> <NA> <NA> CC-BY …
## 2 0aee0e… 391391 https:… Tympan… Lined … Tympan… species Grassl… Animal… Chord… Rept… Squa… Agami… Tymp… Tympan… <NA> dr361 <NA> <NA> <NA> <NA> CC-BY …
## 3 0cbfa7… R36835 https:… Natrix… <NA> COLUBR… family <NA> Animal… Chord… Rept… Squa… Colub… <NA> <NA> <NA> dr346 in22 South … co125 South … CC-BY
## 4 0fb28f… <NA> https:… Notech… easter… Notech… species Tiger … Animal… Chord… Rept… Squa… Elapi… Note… Notech… <NA> dr1132 <NA> <NA> <NA> <NA> CC-BY …
## 5 15e65c… 34102 https:… Emydur… Murray… Emydur… subspe… Macqua… Animal… Chord… Rept… Test… Cheli… Emyd… Emydur… Emydur… dr1132 <NA> <NA> <NA> <NA> CC-BY …
## 6 170fbb… 77798 https:… Deniso… orname… Deniso… species Orname… Animal… Chord… Rept… Squa… Elapi… Deni… Deniso… <NA> dr1132 <NA> <NA> <NA> <NA> CC-BY …
## # … with 33 more variables: institutionCode <chr>, collectionCode <chr>, locality <chr>, verbatimLatitude <dbl>, verbatimLongitude <dbl>,
## # verbatimCoordinateSystem <chr>, decimalLatitude <dbl>, decimalLongitude <dbl>, coordinatePrecision <dbl>, coordinateUncertaintyInMeters <dbl>, country <chr>,
## # stateProvince <chr>, cl959 <chr>, cl21 <chr>, cl1048 <chr>, minimumElevationInMeters <lgl>, maximumElevationInMeters <lgl>, minimumDepthInMeters <lgl>,
## # maximumDepthInMeters <lgl>, individualCount <dbl>, recordedBy <chr>, year <dbl>, month <dbl>, day <dbl>, eventDate <dttm>, verbatimBasisOfRecord <chr>,
## # basisOfRecord <chr>, occurrenceStatus <chr>, raw_sex <chr>, preparations <chr>, informationWithheld <lgl>, dataGeneralizations <lgl>, spatiallyValid <lgl>, and
## # abbreviated variable names ¹recordID, ²catalogNumber, ³taxonConceptID, ⁴verbatimScientificName, ⁵raw_vernacularName, ⁶scientificName, ⁷taxonRank, ⁸vernacularName,
## # ⁹subspecies, ˟dataResourceUid, ˟institutionUid, ˟institutionName, ˟collectionUid, ˟collectionName, ˟`dcterms:license`
Use galah_geolocate to specify a geographic area or region to limit
your search:
# Get a list of Perameles species recorded within the specified area
# (Note: the area can also be specified by a shapefile)
wkt <- "POLYGON((131.36328125 -22.506468769126,135.23046875 -23.396716654542,134.17578125 -27.287832521411,127.40820312499 -26.661206402316,128.111328125 -21.037340349154,131.36328125 -22.506468769126))"
galah_call() |>
  galah_identify("perameles") |>
  galah_geolocate(wkt) |>
  atlas_species()
## # A tibble: 2 × 10
## kingdom phylum class order family genus species author species_guid verna…¹
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Animalia Chordata Mammalia Peramelemorphia Peramelidae Perameles Perameles eremiana Spencer, 1897 https://biodiversity.org.au/afd/taxa/59459401-eaa… Desert…
## 2 Animalia Chordata Mammalia Peramelemorphia Peramelidae Perameles Perameles bougainville Quoy & Gaimard, 1824 https://biodiversity.org.au/afd/taxa/dbd00fc7-ecd… Shark …
## # … with abbreviated variable name ¹vernacular_name
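Because a malformed polygon is an easy mistake to make, it can be worth checking a WKT string locally before sending a query. The sketch below uses the sf package (an assumption; it is not required by the example above) to parse the same WKT string, test that the geometry is valid, and report its bounding box. It runs offline.

```r
library(sf)

# Same WKT string as in the example above
wkt <- "POLYGON((131.36328125 -22.506468769126,135.23046875 -23.396716654542,134.17578125 -27.287832521411,127.40820312499 -26.661206402316,128.111328125 -21.037340349154,131.36328125 -22.506468769126))"

# Parse the string as a geometry in WGS84 (EPSG:4326) longitude/latitude
poly <- st_as_sfc(wkt, crs = 4326)

# TRUE when the ring is closed and does not cross itself
is_valid <- st_is_valid(poly)

# Bounding box, handy for a sanity check that the area is where you expect
bb <- st_bbox(poly)
```

If the checks pass, the same `wkt` string can be passed to galah_geolocate() exactly as shown above.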
Use galah_down_to to specify the lowest taxonomic rank to include when
constructing a taxonomic tree:
galah_call() |>
  galah_identify("fungi") |>
  galah_down_to(phylum) |>
  atlas_taxonomy()
## levelName
## 1 Fungi
## 2 ¦--Dikarya
## 3 ¦ °--Entorrhizomycota
## 4 ¦--Ascomycota
## 5 ¦--Basidiomycota
## 6 ¦--Blastocladiomycota
## 7 ¦--Chytridiomycota
## 8 ¦--Cryptomycota
## 9 ¦--Glomeromycota
## 10 ¦--Microspora
## 11 ¦--Microsporidia
## 12 ¦--Mucoromycota
## 13 ¦--Neocallimastigomycota
## 14 ¦--Zoopagomycota
## 15 °--Zygomycota