Look up information

Martin Westgate & Dax Kellie

2022-11-30

show_all() & search_all()

As of galah 1.5.0, there are two simplified functions to look up information: show_all() and search_all().

Each is a single entry point: you name the type of information you want as the first argument, rather than calling a separate sub-function for each type.

For example, to show all available Living Atlases supported:

show_all(atlases)
## # A tibble: 10 × 4
##    atlas          institution                                                             acronym url                         
##    <chr>          <chr>                                                                   <chr>   <chr>                       
##  1 Australia      Atlas of Living Australia                                               ALA     https://www.ala.org.au      
##  2 Austria        Biodiversitäts-Atlas Österreich                                         BAO     https://biodiversityatlas.at
##  3 Brazil         Sistemas de Informações sobre a Biodiversidade Brasileira               SiBBr   https://sibbr.gov.br        
##  4 Estonia        eElurikkus                                                              <NA>    https://elurikkus.ee        
##  5 France         Inventaire National du Patrimoine Naturel                               INPN    https://inpn.mnhn.fr        
##  6 Guatemala      Sistema Nacional de Información sobre Diversidad Biológica de Guatemala SNIBgt  https://snib.conap.gob.gt   
##  7 Portugal       GBIF Portugal                                                           GBIF.pt https://www.gbif.pt         
##  8 Spain          GBIF Spain                                                              GBIF.es https://www.gbif.es         
##  9 Sweden         Swedish Biodiversity Data Infrastructure                                SBDI    https://biodiversitydata.se 
## 10 United Kingdom National Biodiversity Network                                           NBN     https://nbn.org.uk

To search for a specific available Living Atlas:

search_all(atlases, "Spain")
## # A tibble: 1 × 4
##   atlas institution acronym url                
##   <chr> <chr>       <chr>   <chr>              
## 1 Spain GBIF Spain  GBIF.es https://www.gbif.es

To show all fields:

show_all(fields)
## # A tibble: 572 × 4
##    id                    description                                                                                    type  link 
##    <chr>                 <chr>                                                                                          <chr> <chr>
##  1 abcdTypeStatus        ABCD field in use by herbaria                                                                  fiel… <NA> 
##  2 acceptedNameUsage     http://rs.tdwg.org/dwc/terms/acceptedNameUsage                                                 fiel… <NA> 
##  3 acceptedNameUsageID   http://rs.tdwg.org/dwc/terms/acceptedNameUsageID                                               fiel… <NA> 
##  4 accessRights          <NA>                                                                                           fiel… <NA> 
##  5 assertionUserId       User ID of the person who has made an assertion about this record                              fiel… <NA> 
##  6 assertions            A list of all assertions (user and system supplied) for a record resulting from data quality … fiel… <NA> 
##  7 associatedMedia       http://rs.tdwg.org/dwc/terms/associatedMedia                                                   fiel… <NA> 
##  8 associatedOccurrences http://rs.tdwg.org/dwc/terms/associatedOccurrences                                             fiel… <NA> 
##  9 associatedOrganisms   http://rs.tdwg.org/dwc/terms/associatedOrganisms                                               fiel… <NA> 
## 10 associatedReferences  http://rs.tdwg.org/dwc/terms/associatedReferences                                              fiel… <NA> 
## # … with 562 more rows

And to search for a specific field:

search_all(fields, "australian states")
## # A tibble: 2 × 4
##   id     description                                                                                                    type  link 
##   <chr>  <chr>                                                                                                          <chr> <chr>
## 1 cl2013 ASGS Australian States and Territories Australian Statistical Geography Standard  Australian States and Terri… laye… http…
## 2 cl22   Australian States and Territories Australian States and Territories                                            laye… http…

Here is a list of information types that can be used with show_all() and search_all():

| Category | Information type | Description | Sub-functions |
|---|---|---|---|
| Configuration | atlases | Show what living atlases are available | show_all_atlases(), search_atlases() |
| Configuration | apis | Show what APIs & functions are available for each atlas | show_all_apis(), search_apis() |
| Configuration | reasons | Show what values are acceptable as ‘download reasons’ for a specified atlas | show_all_reasons(), search_reasons() |
| Taxonomy | taxa | Search for one or more taxonomic names | search_taxa() |
| Taxonomy | identifiers | Take a universal identifier and return taxonomic information | search_identifiers() |
| Taxonomy | ranks | Show valid taxonomic ranks (e.g. Kingdom, Class, Order, etc.) | show_all_ranks(), search_ranks() |
| Filters | fields | Show fields that are stored in an atlas | show_all_fields(), search_fields() |
| Filters | assertions | Show results of data quality checks run by each atlas | show_all_assertions(), search_assertions() |
| Group filters | profiles | Show what data quality profiles are available | show_all_profiles(), search_profiles() |
| Group filters | lists | Show what species lists are available | show_all_lists(), search_lists() |
| Data providers | providers | Show which institutions have provided data | show_all_providers(), search_providers() |
| Data providers | collections | Show the specific collections within those institutions | show_all_collections(), search_collections() |
| Data providers | datasets | Show all the data groupings within those collections | show_all_datasets(), search_datasets() |
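Any information type in the list above can be passed to either function in the same way. As a minimal sketch, assuming the default atlas (Australia) and a working internet connection, the following looks up download reasons; the variable names are illustrative only:

```r
library(galah)

# Return every accepted download reason for the current atlas
all_reasons <- show_all(reasons)

# Return only the reasons whose text matches the query "research"
research_reasons <- search_all(reasons, "research")

research_reasons
```

The second call should return a subset of the first, which makes search_all() convenient when the full table is long.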

show_all_ subfunctions

While show_all() is useful for a variety of cases, you can still call the underlying subfunctions if you prefer. Functions with the prefix show_all_ do what the name suggests: each returns a tibble showing all possible values of the specified category.

show_all_ functions require no arguments. Simply call the function and it will return all accepted values as a tibble:

show_all_atlases()
## # A tibble: 10 × 4
##    atlas          institution                                                             acronym url                         
##    <chr>          <chr>                                                                   <chr>   <chr>                       
##  1 Australia      Atlas of Living Australia                                               ALA     https://www.ala.org.au      
##  2 Austria        Biodiversitäts-Atlas Österreich                                         BAO     https://biodiversityatlas.at
##  3 Brazil         Sistemas de Informações sobre a Biodiversidade Brasileira               SiBBr   https://sibbr.gov.br        
##  4 Estonia        eElurikkus                                                              <NA>    https://elurikkus.ee        
##  5 France         Inventaire National du Patrimoine Naturel                               INPN    https://inpn.mnhn.fr        
##  6 Guatemala      Sistema Nacional de Información sobre Diversidad Biológica de Guatemala SNIBgt  https://snib.conap.gob.gt   
##  7 Portugal       GBIF Portugal                                                           GBIF.pt https://www.gbif.pt         
##  8 Spain          GBIF Spain                                                              GBIF.es https://www.gbif.es         
##  9 Sweden         Swedish Biodiversity Data Infrastructure                                SBDI    https://biodiversitydata.se 
## 10 United Kingdom National Biodiversity Network                                           NBN     https://nbn.org.uk

show_all_reasons()
## # A tibble: 13 × 2
##       id name                            
##    <int> <chr>                           
##  1     0 conservation management/planning
##  2     1 biosecurity management/planning 
##  3     2 environmental assessment        
##  4     3 education                       
##  5     4 scientific research             
##  6     5 collection management           
##  7     6 other                           
##  8     7 ecological research             
##  9     8 systematic research/taxonomy    
## 10    10 testing                         
## 11    11 citizen science                 
## 12    12 restoration/remediation         
## 13    13 species modelling
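
These lookup tables are mostly useful for checking a value before using it elsewhere. The sketch below is one possible use and not part of the text above: it assumes galah_config() accepts `atlas` and `download_reason_id` arguments, and it checks both values against the lookup tables first. The variable names are illustrative.

```r
library(galah)

# Fetch the lookup tables once
atlas_table  <- show_all_atlases()
reason_table <- show_all_reasons()

# Check that the values we intend to use are actually accepted
stopifnot("Australia" %in% atlas_table$atlas)
stopifnot(4 %in% reason_table$id)  # 4 is "scientific research" in the table above

# Assumption: galah_config() takes these two arguments.
# "Australia" is the default atlas, so this leaves the atlas unchanged.
galah_config(atlas = "Australia", download_reason_id = 4)
```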

search_ subfunctions

The second set of lookup subfunctions uses the search_ prefix. Unlike the show_all_ functions, they require a query to work. They are used to look up detailed information that can’t be summarised across the whole atlas.

Search for a single taxon or multiple taxa by name with search_taxa.

search_taxa("reptilia")
## # A tibble: 1 × 9
##   search_term scientific_name taxon_concept_id                                            rank  match…¹ kingdom phylum class issues
##   <chr>       <chr>           <chr>                                                       <chr> <chr>   <chr>   <chr>  <chr> <chr> 
## 1 reptilia    REPTILIA        https://biodiversity.org.au/afd/taxa/682e1228-5b3c-45ff-83… class exactM… Animal… Chord… Rept… noIss…
## # … with abbreviated variable name ¹​match_type

To search for several taxa at once, supply each name as a separate argument:

search_taxa("reptilia", "aves", "mammalia", "pisces")
## # A tibble: 4 × 10
##   search_term scientific_name taxon_concept_id                                    rank  match…¹ kingdom phylum class issues verna…²
##   <chr>       <chr>           <chr>                                               <chr> <chr>   <chr>   <chr>  <chr> <chr>  <chr>  
## 1 reptilia    REPTILIA        https://biodiversity.org.au/afd/taxa/682e1228-5b3c… class exactM… Animal… Chord… Rept… noIss… <NA>   
## 2 aves        AVES            https://biodiversity.org.au/afd/taxa/65625205-db74… class exactM… Animal… Chord… Aves  noIss… Birds  
## 3 mammalia    MAMMALIA        https://biodiversity.org.au/afd/taxa/e9e7db31-04df… class exactM… Animal… Chord… Mamm… noIss… Mammals
## 4 pisces      PISCES          https://biodiversity.org.au/afd/taxa/e22efeb4-2cb5… <NA>  exactM… Animal… Chord… <NA>  noIss… Fishes 
## # … with abbreviated variable names ¹​match_type, ²​vernacular_name

search_identifiers is the partner function to search_taxa. If we already know a taxonomic identifier, we can use search_identifiers to look up the taxon it belongs to:

search_identifiers("urn:lsid:biodiversity.org.au:afd.taxon:682e1228-5b3c-45ff-833b-550efd40c399")
## # A tibble: 1 × 8
##   scientific_name taxon_concept_id                                                        rank  match…¹ kingdom phylum class issues
##   <chr>           <chr>                                                                   <chr> <chr>   <chr>   <chr>  <chr> <chr> 
## 1 REPTILIA        https://biodiversity.org.au/afd/taxa/682e1228-5b3c-45ff-833b-550efd40c… class taxonI… Animal… Chord… Rept… noIss…
## # … with abbreviated variable name ¹​match_type
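
Taxonomic lookups are usually a first step before querying records. As a rough sketch, assuming galah_identify() (a galah function not otherwise covered in this section) accepts the same name that search_taxa matched, a record count for that taxon could be requested like this:

```r
library(galah)

# Check the name first, so an unmatched or misspelled name is caught early
reptiles <- search_taxa("reptilia")

# Restrict a record count to the same taxon
reptile_counts <- galah_call() |>
  galah_identify("reptilia") |>
  atlas_counts()

reptile_counts
```

Running the lookup separately makes it easy to inspect the match_type and issues columns before committing to a larger query.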

Sifting through the output of show_all_fields to find a specific field can be inefficient. Instead, we might wish to use search_fields to look for specific fields that match a search. As with search_taxa, search_fields requires a query to work.

search_fields("date") |> head()
## # A tibble: 6 × 4
##   id            description                                                                                             type  link 
##   <chr>         <chr>                                                                                                   <chr> <chr>
## 1 month         "Month of observation, specimen collection date. http://rs.tdwg.org/dwc/terms/month"                    fiel… <NA> 
## 2 eventDate     "The ISO formatted date of observation, specimen collection date. http://rs.tdwg.org/dwc/terms/eventDa… fiel… <NA> 
## 3 cl10903       "Tenure of Australia's forests (2013) v2.0 Tenure of Australia's forests (2013) v2.0 is a continental … laye… http…
## 4 cl10955       "National Indicative Aggregated Fire Extent Dataset 2019-2020 - v20200324 The National Indicative Aggr… laye… http…
## 5 lastLoadDate   <NA>                                                                                                   fiel… <NA> 
## 6 datePrecision "The precision of the date information for the record. Values include Day, Month, Year, Year range, Mo… fiel… <NA>

show_values() & search_values()

Once you have found the field you want, use show_values to see the values that field contains. If the search matches more than one field, show_values says so and reports which field it is showing, as in this example:

search_all(fields, "basis") |> show_values()
## ! Search returned 2 matched fields.
## • Showing values for 'basisOfRecord'.
## # A tibble: 9 × 2
##   field         category           
##   <chr>         <chr>              
## 1 basisOfRecord HUMAN_OBSERVATION  
## 2 basisOfRecord PRESERVED_SPECIMEN 
## 3 basisOfRecord OBSERVATION        
## 4 basisOfRecord OCCURRENCE         
## 5 basisOfRecord MACHINE_OBSERVATION
## 6 basisOfRecord MATERIAL_SAMPLE    
## 7 basisOfRecord LIVING_SPECIMEN    
## 8 basisOfRecord MATERIAL_CITATION  
## 9 basisOfRecord FOSSIL_SPECIMEN

This provides the information you need to pass meaningful queries to galah_filter.

galah_call() |> 
  galah_filter(basisOfRecord == "LIVING_SPECIMEN") |> 
  atlas_counts()
## # A tibble: 1 × 1
##    count
##    <int>
## 1 213918
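
search_values is the counterpart for narrowing a field's values by a query. The sketch below assumes that search_values, like show_values, accepts the output of search_all(), followed by a query string; how the query is matched (for example, whether case is ignored) is not confirmed here.

```r
library(galah)

# Find the field, then keep only the values that match "specimen"
specimen_values <- search_all(fields, "basisOfRecord") |>
  search_values("specimen")

specimen_values
```

This is mainly useful for fields with many possible values, where printing them all with show_values would be unwieldy.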

This works for several other types of query, such as data profiles:

search_all(profiles, "ALA") |> 
  show_values() |> 
  head()
## ! Search returned 2 matched profiles.
## • Showing values for 'ALA'.
## # A tibble: 6 × 2
##   description                                                                                                                filter
##   <chr>                                                                                                                      <chr> 
## 1 "Exclude all records where spatial validity is \"false\""                                                                  "-spa…
## 2 "Exclude all records with an assertion that the scientific name provided does not match any of the names lists used by th… "-ass…
## 3 "Exclude all records with an assertion that the scientific name provided is not structured as a valid scientific name. Al… "-ass…
## 4 "Exclude all records with an assertion that the name and classification supplied can't be used to choose between 2 homony… "-ass…
## 5 "Exclude all records with an assertion that kingdom provided doesn't match a known kingdom e.g. Animalia, Plantae"         "-ass…
## 6 "Exclude all records with an assertion that the scientific name provided in the record does not match the expected taxono… "-ass…

Or collections:

search_all(collections, "herbarium") |> 
  show_values() |> 
  head()
## ! Search returned 46 matched collections.
## • Showing values for 'co214'.
## # A tibble: 1 × 17
##   name     acronym uid   phone email pubDe…¹ latit…² longi…³ websi…⁴ alaPu…⁵ dateC…⁶ lastU…⁷ userL…⁸ active numRe…⁹ numRe…˟ geogr…˟
##   <chr>    <chr>   <chr> <chr> <chr> <chr>     <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>    <int>   <int> <chr>  
## 1 Allan H… CHR     co214 +64 … Herb… "The A…   -43.6    172. http:/… https:… 2015-0… 2021-1… not av… Activ…  660000  236000 Worldw…
## # … with abbreviated variable names ¹​pubDescription, ²​latitude, ³​longitude, ⁴​websiteUrl, ⁵​alaPublicUrl, ⁶​dateCreated,
## #   ⁷​lastUpdated, ⁸​userLastModified, ⁹​numRecords, ˟​numRecordsDigitised, ˟​geographicDescription