Welcome to rfishbase 4
. This is the fourth rewrite of
the original rfishbase
package described in Boettiger et
al. (2012).
rfishbase 1.0
relied on parsing of XML pages served
directly from Fishbase.org.rfishbase 2.0
relied on calls to a ruby-based API,
fishbaseapi
, that provided access to SQL snapshots of about
20 of the more popular tables in FishBase or SeaLifeBase.rfishbase 3.0
side-stepped the API by making queries
which directly downloaded compressed csv tables from a static web host.
This substantially improved performance a reliability, particularly for
large queries. The release largely remained backwards compatible with
2.0, and added more tables.rfishbase 4.0
extends the static model and interface.
Static tables are distributed in parquet and accessed through a
provenance-based identifier. While old functions are retained, a new
interface is introduced to provide easy access to all fishbase
tables.We welcome any feedback, issues or questions that users may encounter through our issues tracker on GitHub: https://github.com/ropensci/rfishbase/issues
::install_github("ropensci/rfishbase") remotes
library("rfishbase")
library("dplyr") # convenient but not required
All fishbase tables can be accessed by name using the
fb_tbl()
function:
fb_tbl("ecosystem")
# A tibble: 155,792 × 18
autoctr E_CODE EcosystemRefno Speccode Stockcode Status CurrentPresence
<int> <int> <int> <int> <int> <chr> <chr>
1 1 1 50628 549 565 native Present
2 2 1 189 552 568 native Present
3 3 1 189 554 570 native Present
4 4 1 79732 873 889 native Present
5 5 1 5217 948 964 native Present
6 7 1 39852 956 972 native Present
7 8 1 39852 957 973 native Present
8 9 1 39852 958 974 native Present
9 10 1 188 1526 1719 native Present
10 11 1 188 1626 1819 native Present
# … with 155,782 more rows, and 11 more variables: Abundance <chr>,
# LifeStage <chr>, Remarks <chr>, Entered <int>, Dateentered <dttm>,
# Modified <int>, Datemodified <dttm>, Expert <int>, Datechecked <dttm>,
# WebURL <chr>, TS <dttm>
You can see all the tables using fb_tables()
to see a
list of all the table names (specify sealifebase
if
desired). Careful, there are a lot of them! The fishbase databases have
grown a lot in the decades, and were not intended to be used directly by
most end-users, so you may have considerable work to determine what’s
what. Keep in mind that many variables can be estimated in different
ways (e.g. trophic level), and thus may report different values in
different tables. Also note that species is name (or SpecCode) is not
always the primary key for a table – many tables are specific to stocks
or even individual samples, and some tables are reference lists that are
not species focused at all, but meant to be joined to other tables
(faoareas
, etc). Compare tables against what you see on
fishbase.org, or ask on our issues forum for advice!
<- c("Oreochromis niloticus", "Salmo trutta")
fish
fb_tbl("species") %>%
mutate(sci_name = paste(Genus, Species)) %>%
filter(sci_name %in% fish) %>%
select(sci_name, FBname, Length)
# A tibble: 2 × 3
sci_name FBname Length
<chr> <chr> <dbl>
1 Oreochromis niloticus Nile tilapia 60
2 Salmo trutta Sea trout 140
SeaLifeBase.org is maintained by the same organization and largely
parallels the database structure of Fishbase. As such, almost all
rfishbase
functions can instead be instructed to address
the
fb_tbl("species", "sealifebase")
# A tibble: 97,220 × 109
SpecCode Genus Species Author SpeciesRefNo FBname FamCode Subfamily GenCode
<int> <chr> <chr> <chr> <int> <chr> <int> <chr> <int>
1 32307 Aapto… americ… (Pilsb… 19 <NA> 815 <NA> 27838
2 32306 Aapto… brinto… Newman… 81749 <NA> 815 <NA> 27838
3 32308 Aapto… callis… (Pilsb… 19 <NA> 815 <NA> 27838
4 32304 Aapto… leptod… Newman… 19 <NA> 815 <NA> 27838
5 32305 Aapto… trider… Newman… 19 <NA> 815 <NA> 27838
6 51720 Aaptos aaptos (Schmi… 19 <NA> 2630 <NA> 9253
7 165941 Aaptos bergma… de Lau… 108813 <NA> 2630 <NA> 9253
8 105687 Aaptos ciliata (Wilso… 3477 <NA> 2630 <NA> 9253
9 139407 Aaptos duchas… (Topse… 85482 <NA> 2630 <NA> 9253
10 130070 Aaptos laxosu… (Solla… 81108 <NA> 2630 <NA> 9253
# … with 97,210 more rows, and 100 more variables: TaxIssue <int>,
# Remark <chr>, PicPreferredName <chr>, PicPreferredNameM <chr>,
# PicPreferredNameF <chr>, PicPreferredNameJ <chr>, Source <chr>,
# AuthorRef <int>, SubGenCode <int>, Fresh <int>, Brack <int>,
# Saltwater <int>, Land <int>, BodyShapeI <chr>, DemersPelag <chr>,
# AnaCat <chr>, MigratRef <int>, DepthRangeShallow <int>,
# DepthRangeDeep <int>, DepthRangeRef <int>, DepthRangeComShallow <int>, …
By default, tables are downloaded the first time they are used.
rfishbase
defaults to download the latest available
snapshot; be aware that the most recent snapshot may be months behind
the latest data on fishbase.org. Check available releases:
available_releases()
[1] "21.06" "19.04"
Users can trigger a one-time download of all fishbase tables (or a
list of desired tables) using fb_import()
. This will ensure
later use of any function can operate smoothly even when no internet
connection is available. Any table already downloaded will not be
re-downloaded. (Note: fb_import()
also returns a remote
duckdb database connection to the tables, for users who prefer to work
with the remote data objects.)
<- fb_import() conn
If you have very limited RAM (e.g. <= 2 GB available) it may be
helpful to use fishbase
tables in remote form by setting
collect = FALSE
. This allows the tables to remain on disk,
while the user is still able to use almost all dplyr
functions (see the dbplyr
vignette). Once the table is
appropriately subset, the user will need to call
dplyr::collect()
to use generic non-dplyr functions, such
as plotting commmands.
fb_tbl("occurrence", collect = FALSE)
# Source: table<occurrence> [?? x 106]
# Database: duckdb_connection
catnum2 OccurrenceRefNo SpecCode Syncode Stockcode GenusCol SpeciesCol
<int> <int> <int> <int> <int> <chr> <chr>
1 34424 36653 227 22902 241 "Megalops" "cyprinoid…
2 95154 45880 NA NA NA "" ""
3 97606 45880 NA NA NA "" ""
4 100025 45880 5520 25676 5809 "Johnius" "belangeri…
5 98993 45880 5676 16650 5969 "Chromis" "retrofasc…
6 99316 45880 454 23112 468 "Drepane" "punctata"
7 99676 45880 5388 145485 5647 "Gymnothorax" "boschi"
8 99843 45880 16813 119925 15264 "Hemiramphus" "balinensi…
9 100607 45880 8288 59635 8601 "Ostracion" "rhinorhyn…
10 101529 45880 NA NA NA "Scomberoides" "toloo-par…
# … with more rows, and 99 more variables: ColName <chr>, PicName <chr>,
# CatNum <chr>, URL <chr>, Station <chr>, Cruise <chr>, Gazetteer <chr>,
# LocalityType <chr>, WaterDepthMin <dbl>, WaterDepthMax <dbl>,
# AltitudeMin <int>, AltitudeMax <int>, LatitudeDeg <int>, LatitudeMin <dbl>,
# NorthSouth <chr>, LatitudeDec <dbl>, LongitudeDeg <int>,
# LongitudeMIn <dbl>, EastWest <chr>, LongitudeDec <dbl>, Accuracy <chr>,
# Salinity <chr>, LatitudeTo <dbl>, LongitudeTo <dbl>, LatitudeDegTo <int>, …
RStudio users can also browse all fishbase tables interactively in
the RStudio connection browser by using the function
fisbase_pane()
. Note that this function will first download
a complete set of the fishbase tables.
rfishbase
4.0 tries to maintain as much backwards
compatibility as possible with rfishbase 3.0. Because parquet preserves
native data types, some encoded types may differ from earlier versions.
As before, these are not always the native type – e.g. fishbase encodes
some boolean (logical TRUE/FALSE) values as integer (-1, 0) or character
types. Use as.logical()
to coerce into the appropriate type
in that case.
Toggling between fishbase and sealifebase servers using an
environmental variable, FISHBASE_API
, is now
deprecated.
Note that fishbase will store downloaded files by hash in the app
directory, given by db_dir()
. The default location can be
set by configuring the desired path in the environmental variable,
FISHBASE_HOME
.
Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.