Each occurrence record contains taxonomic information and information about the observation itself, like its location and the date of observation. These pieces of information are recorded and categorised into respective fields. When you import data using galah, columns of the resulting tibble correspond to these fields.

Data fields are important because they provide a means to manipulate queries to return only the information that you need, and no more. Consequently, much of the architecture of galah has been designed to make narrowing as simple as possible. The functions covered below are galah_identify, galah_filter, galah_apply_profile, galah_group_by, galah_select, galah_geolocate and galah_down_to.

These names have been chosen to echo comparable functions from dplyr; namely filter, select and group_by. With the exception of galah_geolocate, they also use dplyr tidy evaluation and syntax. This means that how you use dplyr functions is also how you use galah_ functions.
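
To make the analogy concrete, here is a minimal sketch of the dplyr verbs that the galah_ functions echo. The `records` tibble is a hypothetical stand-in for occurrence data, not something returned by galah, and the example runs entirely offline.

```r
library(dplyr)

# Hypothetical stand-in for a small set of occurrence records
records <- tibble(
  species = c("Bufo bufo", "Bufo bufo", "Lepus europaeus", "Lepus europaeus"),
  year = c(1998L, 2005L, 2010L, 2021L),
  basisOfRecord = c("PRESERVED_SPECIMEN", "HUMAN_OBSERVATION",
                    "HUMAN_OBSERVATION", "HUMAN_OBSERVATION")
)

# filter() keeps matching rows, as galah_filter() narrows which records are queried
recent <- records |>
  filter(year > 2000)

# select() keeps named columns, as galah_select() chooses which fields are returned
recent_cols <- recent |>
  select(species, year)

# group_by() followed by a count mirrors galah_group_by() followed by atlas_counts()
counts <- recent |>
  group_by(species) |>
  count(name = "count") |>
  ungroup()
```

The main difference is where the work happens: dplyr operates on a tibble already in memory, whereas the galah_ functions build up a query that is only sent to the atlas by an atlas_ function.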

galah_identify & search_taxa

Perhaps unsurprisingly, search_taxa searches for taxonomic information. It uses fuzzy matching to work a lot like the search bar on the Atlas of Living Australia website, and you can use it to search for taxa by their scientific name. Confirming that search_taxa returns the taxon you intended is an important first step before using that taxon to download data with galah.

For example, to search for reptiles, we first need to identify whether we have the correct query:

search_taxa("Reptilia")
## # A tibble: 1 × 9
##   search_term scientific_name taxon_concept_id                                                          rank  match_type kingdom  phylum   class    issues 
##   <chr>       <chr>           <chr>                                                                     <chr> <chr>      <chr>    <chr>    <chr>    <chr>  
## 1 Reptilia    REPTILIA        https://biodiversity.org.au/afd/taxa/682e1228-5b3c-45ff-833b-550efd40c399 class exactMatch Animalia Chordata Reptilia noIssue

If we want to be more specific, we can give search_taxa a data.frame containing more levels of the taxonomic hierarchy:

search_taxa(data.frame(genus = "Eolophus", class = "Aves"))
## # A tibble: 1 × 13
##   search_term   scientific_name scientific_name_authorship taxon_concept_id                                  rank  match…¹ kingdom phylum class order family genus issues
##   <chr>         <chr>           <chr>                      <chr>                                             <chr> <chr>   <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr> 
## 1 Eolophus_Aves Eolophus        Bonaparte, 1854            https://biodiversity.org.au/afd/taxa/009169a9-a9… genus exactM… Animal… Chord… Aves  Psit… Cacat… Eolo… noIss…
## # … with abbreviated variable name ¹​match_type

Once we know that our search matches the correct taxon or taxa, we can use galah_identify to narrow the results of our queries:

galah_call() |>
  galah_identify("Reptilia") |>
  atlas_counts()
## # A tibble: 1 × 1
##     count
##     <int>
## 1 1482850

taxa <- search_taxa(data.frame(genus = "Eolophus", class = "Aves"))

galah_call() |>
  galah_identify(taxa) |>
  atlas_counts()
## # A tibble: 1 × 1
##    count
##    <int>
## 1 948944

If you’re using an international atlas, search_taxa will automatically switch to using the local name-matching service. For example, Portugal uses the GBIF taxonomic backbone, but integrates seamlessly with our standard workflow.

galah_config(atlas = "Portugal")
## Atlas selected: GBIF Portugal (GBIF.pt) [Portugal]

galah_call() |>
  galah_identify("Lepus") |>
  galah_group_by(species) |>
  atlas_counts()
## # A tibble: 5 × 2
##   species           count
##   <chr>             <int>
## 1 Lepus granatensis  1378
## 2 Lepus microtis       64
## 3 Lepus europaeus      10
## 4 Lepus saxatilis       2
## 5 Lepus capensis        1

Conversely, the UK’s National Biodiversity Network (NBN) has its own taxonomic backbone, but is supported using the same function call.

galah_config(atlas = "United Kingdom")
## Atlas selected: National Biodiversity Network (NBN) [United Kingdom]

galah_call() |>
  galah_identify("Bufo") |>
  galah_group_by(species) |>
  atlas_counts()
## # A tibble: 2 × 2
##   species       count
##   <chr>         <int>
## 1 Bufo bufo     75241
## 2 Bufo spinosus     1

galah_config(atlas = "Australia")
## Atlas selected: Atlas of Living Australia (ALA) [Australia]

galah_filter

Perhaps the most important function in galah is galah_filter, which is used to filter the rows of queries:

# Get total record count for years after 2000
galah_call() |>
  galah_filter(year > 2000) |>
  atlas_counts()
## # A tibble: 1 × 1
##      count
##      <int>
## 1 72911598

# Get total record count for iNaturalist Australia for years after 2000
galah_call() |>
  galah_filter(
    year > 2000,
    dataResourceName == "iNaturalist Australia") |>
  atlas_counts()
## # A tibble: 1 × 1
##     count
##     <int>
## 1 4055078

To find available fields and corresponding valid values, use the field lookup functions show_all(fields), search_all(fields) & show_values().
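
As a rough sketch of how these lookup functions fit together, the calls below search for a field and then list its valid values. They query the atlas, so they need a network connection; the search term "basis" is only an illustrative choice, and no output is shown because results change over time.

```r
library(galah)

# Find fields whose name or description matches a search term
search_all(fields, "basis")

# List the values that a single field can take;
# these are the values that can appear on the right-hand side of galah_filter()
search_all(fields, "basisOfRecord") |>
  show_values()
```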

Finally, galah_filter can also express more complex taxonomic queries than are possible using search_taxa alone. By filtering on the taxonConceptID field, it is possible to build queries that exclude certain taxa, for example. This can be useful for paraphyletic concepts such as invertebrates:

galah_call() |>
  galah_filter(
     taxonConceptID == search_taxa("Animalia")$taxon_concept_id,
     taxonConceptID != search_taxa("Chordata")$taxon_concept_id
  ) |>
  galah_group_by(class) |>
  atlas_counts()
## # A tibble: 83 × 2
##    class          count
##    <chr>          <int>
##  1 Insecta      4114912
##  2 Gastropoda    883073
##  3 Arachnida     573491
##  4 Malacostraca  562455
##  5 Maxillopoda   424009
##  6 Polychaeta    257786
##  7 Bivalvia      216454
##  8 Anthozoa      170662
##  9 Demospongiae  113677
## 10 Ostracoda      59271
## # … with 73 more rows

galah_apply_profile

When working with the ALA, a notable feature is the ability to apply a data quality profile, which removes records that are suspect in some way:

galah_call() |>
  galah_filter(year > 2000) |>
  galah_apply_profile(ALA) |>
  atlas_counts()
## # A tibble: 1 × 1
##      count
##      <int>
## 1 65921857

To see a full list of data quality profiles, use show_all(profiles).
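
A short sketch of how a profile might be inspected before applying it is shown below. Both calls contact the ALA, so a network connection is assumed, and the list of profiles and their filters may differ from what was available when this was written.

```r
library(galah)

# List the data quality profiles that are available
show_all(profiles)

# Show the individual filters that make up the "ALA" profile
search_all(profiles, "ALA") |>
  show_values()
```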

galah_group_by

Use galah_group_by to summarise record counts by one or more fields:

# Get record counts from 2016 to 2020, grouped by year and basis of record
galah_call() |>
  galah_filter(year > 2015 & year <= 2020) |>
  galah_group_by(year, basisOfRecord) |>
  atlas_counts()
## # A tibble: 25 × 3
##    basisOfRecord     year    count
##    <chr>             <chr>   <int>
##  1 HUMAN_OBSERVATION 2020  6309037
##  2 HUMAN_OBSERVATION 2019  5516864
##  3 HUMAN_OBSERVATION 2018  5219597
##  4 HUMAN_OBSERVATION 2017  4313276
##  5 HUMAN_OBSERVATION 2016  3483039
##  6 OCCURRENCE        2016   165997
##  7 OCCURRENCE        2018   116242
##  8 OCCURRENCE        2017   102206
##  9 OCCURRENCE        2019    91640
## 10 OCCURRENCE        2020    39429
## # … with 15 more rows
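
Note that year is returned as a character column in the output above, so it sorts alphabetically rather than numerically. A minimal sketch of tidying the result with dplyr follows; the `counts` tibble is built by hand from four of the rows shown above so that the example runs offline, and in practice it would be the object returned by atlas_counts().

```r
library(dplyr)

# Four rows copied from the output above
counts <- tibble(
  basisOfRecord = c("HUMAN_OBSERVATION", "HUMAN_OBSERVATION",
                    "OCCURRENCE", "OCCURRENCE"),
  year = c("2020", "2019", "2016", "2018"),
  count = c(6309037L, 5516864L, 165997L, 116242L)
)

# Convert year to integer, then order chronologically within each basis of record
counts_sorted <- counts |>
  mutate(year = as.integer(year)) |>
  arrange(basisOfRecord, year)
```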

galah_select

Use galah_select to choose which columns are returned when downloading records:

# Get *Reptilia* records from 1930, but only 'eventDate' and 'kingdom' columns
occurrences <- galah_call() |>
  galah_identify("reptilia") |>
  galah_filter(year == 1930) |>
  galah_select(eventDate, kingdom) |>
  atlas_occurrences()

occurrences |> head()
## # A tibble: 6 × 2
##   eventDate           kingdom 
##   <dttm>              <chr>   
## 1 1929-12-31 14:00:00 Animalia
## 2 1929-12-31 14:00:00 Animalia
## 3 1929-12-31 14:00:00 Animalia
## 4 1929-12-31 14:00:00 Animalia
## 5 1929-12-31 14:00:00 Animalia
## 6 1929-12-31 14:00:00 Animalia
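
The eventDate values above fall on 31 December 1929 even though the query asks for year == 1930. One plausible reading, which is an assumption rather than something stated here, is that the timestamps are printed in UTC and correspond to midnight on 1 January 1930 in eastern Australia. The base R sketch below shows how such a timestamp could be viewed in a local time zone; the choice of "Australia/Brisbane" (UTC+10, no daylight saving in that period) is purely illustrative.

```r
# One timestamp copied from the output above; treating it as UTC is an assumption
stamp <- as.POSIXct("1929-12-31 14:00:00", tz = "UTC")

# Format the same instant as a calendar date in an illustrative local time zone
local_date <- format(stamp, format = "%Y-%m-%d", tz = "Australia/Brisbane")
local_date
```

If that reading is right, the local date falls in 1930, which would be consistent with the year filter.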

You can also use the selection helpers that work with dplyr::select(), such as starts_with() and ends_with(), in galah_select():

occurrences <- galah_call() |>
  galah_identify("reptilia") |>
  galah_filter(year == 1930) |>
  galah_select(starts_with("elev") & ends_with("n")) |>
  atlas_occurrences()

occurrences |> head()
## # A tibble: 6 × 55
##   recor…¹ catal…² taxon…³ verba…⁴ raw_v…⁵ scien…⁶ taxon…⁷ verna…⁸ kingdom phylum class order family genus species subsp…⁹ dataR…˟ insti…˟ insti…˟ colle…˟ colle…˟ dcter…˟
##   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
## 1 050d4c… J3729   https:… Oxyura… coasta… Oxyura… species Taipan  Animal… Chord… Rept… Squa… Elapi… Oxyu… Oxyura… <NA>    dr1132  <NA>    <NA>    <NA>    <NA>    CC-BY …
## 2 0aee0e… 391391  https:… Tympan… Lined … Tympan… species Grassl… Animal… Chord… Rept… Squa… Agami… Tymp… Tympan… <NA>    dr361   <NA>    <NA>    <NA>    <NA>    CC-BY …
## 3 0cbfa7… R36835  https:… Natrix… <NA>    COLUBR… family  <NA>    Animal… Chord… Rept… Squa… Colub… <NA>  <NA>    <NA>    dr346   in22    South … co125   South … CC-BY  
## 4 0fb28f… <NA>    https:… Notech… easter… Notech… species Tiger … Animal… Chord… Rept… Squa… Elapi… Note… Notech… <NA>    dr1132  <NA>    <NA>    <NA>    <NA>    CC-BY …
## 5 15e65c… 34102   https:… Emydur… Murray… Emydur… subspe… Macqua… Animal… Chord… Rept… Test… Cheli… Emyd… Emydur… Emydur… dr1132  <NA>    <NA>    <NA>    <NA>    CC-BY …
## 6 170fbb… 77798   https:… Deniso… orname… Deniso… species Orname… Animal… Chord… Rept… Squa… Elapi… Deni… Deniso… <NA>    dr1132  <NA>    <NA>    <NA>    <NA>    CC-BY …
## # … with 33 more variables: institutionCode <chr>, collectionCode <chr>, locality <chr>, verbatimLatitude <dbl>, verbatimLongitude <dbl>,
## #   verbatimCoordinateSystem <chr>, decimalLatitude <dbl>, decimalLongitude <dbl>, coordinatePrecision <dbl>, coordinateUncertaintyInMeters <dbl>, country <chr>,
## #   stateProvince <chr>, cl959 <chr>, cl21 <chr>, cl1048 <chr>, minimumElevationInMeters <lgl>, maximumElevationInMeters <lgl>, minimumDepthInMeters <lgl>,
## #   maximumDepthInMeters <lgl>, individualCount <dbl>, recordedBy <chr>, year <dbl>, month <dbl>, day <dbl>, eventDate <dttm>, verbatimBasisOfRecord <chr>,
## #   basisOfRecord <chr>, occurrenceStatus <chr>, raw_sex <chr>, preparations <chr>, informationWithheld <lgl>, dataGeneralizations <lgl>, spatiallyValid <lgl>, and
## #   abbreviated variable names ¹​recordID, ²​catalogNumber, ³​taxonConceptID, ⁴​verbatimScientificName, ⁵​raw_vernacularName, ⁶​scientificName, ⁷​taxonRank, ⁸​vernacularName,
## #   ⁹​subspecies, ˟​dataResourceUid, ˟​institutionUid, ˟​institutionName, ˟​collectionUid, ˟​collectionName, ˟​`dcterms:license`

galah_geolocate

Use galah_geolocate to specify a geographic area or region to limit your search:

# Get a list of Perameles species recorded within the specified area:
# (Note: This can also be specified by a shapefile)
wkt <- "POLYGON((131.36328125 -22.506468769126,135.23046875 -23.396716654542,134.17578125 -27.287832521411,127.40820312499 -26.661206402316,128.111328125 -21.037340349154,131.36328125 -22.506468769126))"

galah_call() |>
  galah_identify("perameles") |>
  galah_geolocate(wkt) |>
  atlas_species()
## # A tibble: 2 × 10
##   kingdom  phylum   class    order           family      genus     species                author               species_guid                                       verna…¹
##   <chr>    <chr>    <chr>    <chr>           <chr>       <chr>     <chr>                  <chr>                <chr>                                              <chr>  
## 1 Animalia Chordata Mammalia Peramelemorphia Peramelidae Perameles Perameles eremiana     Spencer, 1897        https://biodiversity.org.au/afd/taxa/59459401-eaa… Desert…
## 2 Animalia Chordata Mammalia Peramelemorphia Peramelidae Perameles Perameles bougainville Quoy & Gaimard, 1824 https://biodiversity.org.au/afd/taxa/dbd00fc7-ecd… Shark …
## # … with abbreviated variable name ¹​vernacular_name
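
Before sending a polygon to the atlas, it can help to check it locally. The sketch below uses the sf package to parse the same WKT string and confirm that it is a valid geometry. Using CRS 4326 (WGS84 longitude/latitude) is an assumption, and the final galah call is left as a comment because it needs a network connection.

```r
library(sf)

wkt <- "POLYGON((131.36328125 -22.506468769126,135.23046875 -23.396716654542,134.17578125 -27.287832521411,127.40820312499 -26.661206402316,128.111328125 -21.037340349154,131.36328125 -22.506468769126))"

# Parse the WKT into an sf object; CRS 4326 is an assumption
region <- st_sf(geometry = st_as_sfc(wkt, crs = 4326))

# Local sanity checks: is the geometry valid, and what extent does it cover?
region_ok <- all(st_is_valid(region))
region_bbox <- st_bbox(region)

# The sf object could then be used in place of the WKT string, e.g.
# galah_call() |>
#   galah_identify("perameles") |>
#   galah_geolocate(region) |>
#   atlas_species()
```

An object read from a shapefile with sf::st_read() is also an sf object, so the same checks apply to it.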

galah_down_to

Use galah_down_to to specify the lowest taxonomic rank to include when constructing a taxonomic tree:

galah_call() |>
  galah_identify("fungi") |>
  galah_down_to(phylum) |>
  atlas_taxonomy()
##                    levelName
## 1  Fungi                    
## 2   ¦--Dikarya              
## 3   ¦   °--Entorrhizomycota 
## 4   ¦--Ascomycota           
## 5   ¦--Basidiomycota        
## 6   ¦--Blastocladiomycota   
## 7   ¦--Chytridiomycota      
## 8   ¦--Cryptomycota         
## 9   ¦--Glomeromycota        
## 10  ¦--Microspora           
## 11  ¦--Microsporidia        
## 12  ¦--Mucoromycota         
## 13  ¦--Neocallimastigomycota
## 14  ¦--Zoopagomycota        
## 15  °--Zygomycota