Note: This package is deprecated. For new projects, we recommend using the sf package to interface with geodatabases.
The postGIStools package extends the standard R / PostgreSQL interface (as implemented in RPostgreSQL) to provide support for two popular PostgreSQL extensions: PostGIS (spatial data) and hstore (key/value pairs).
Install the package from this GitHub repository using the following code:
install.packages("devtools") # if necessary
::install_github("SESYNC-ci/postGIStools") devtools
We demonstrate the postGIStools functions using a test database hosted on Heroku. It contains a single table country with the following fields:
name | type | comments |
---|---|---|
name | text | country name |
iso2 | text | ISO two-letter code (primary key) |
capital | text | |
population | integer | |
translations | hstore | key/value pairs where key is language (e.g. "es", "fr") |
geom | geometry | country polygons |
The data originates from the REST Countries API, whereas the country geometries are from the wrld_simpl map included in maptools R package.
To read data from PostgreSQL into R, postGIStools provides the
get_postgis_query
function. Like the
dbGetQuery
function in PostgreSQL, it requires a connection
object and a SQL statement, which in this case must be a SELECT
statement. In addition, the user may identify a geometry and/or hstore
field by name.
library(RPostgreSQL)
library(postGIStools)
<- dbConnect(PostgreSQL(), dbname = "d2u06to89nuqei", user = "mzcwtmyzmgalae",
con host = "ec2-107-22-246-250.compute-1.amazonaws.com",
password = "UTv2BuwJUPuruhDqJthcngyyvO")
<- get_postgis_query(con, "SELECT * FROM country
countries WHERE population > 1000000",
geom_name = "geom", hstore_name = "translations")
class(countries)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
When a geometry column is specified, the query output is a spatial data frame type from the sp package. The hstore column is converted to a list-column in R, where each “cell” is a named list.
str(countries@data[1:2,])
## 'data.frame': 2 obs. of 5 variables:
## $ name : chr "Afghanistan" "Albania"
## $ iso2 : chr "AF" "AL"
## $ capital : chr "Kabul" "Tirana"
## $ population : int 26023100 2893005
## $ translations:List of 2
## ..$ :List of 5
## .. ..$ de: chr "Afghanistan"
## .. ..$ es: chr "Afganistán"
## .. ..$ fr: chr "Afghanistan"
## .. ..$ it: chr "Afghanistan"
## .. ..$ ja: chr "アフガニスタン"
## ..$ :List of 5
## .. ..$ de: chr "Albanien"
## .. ..$ es: chr "Albania"
## .. ..$ fr: chr "Albanie"
## .. ..$ it: chr "Albania"
## .. ..$ ja: chr "アルバニア"
The query statement can include the output of PostGIS spatial
functions. In that case, it is simpler to set an alias for the output
column and pass that alias as the geom_name
. For example,
the following query returns a SpatialPointsDataFrame of the
country centroids.
<- get_postgis_query(con,
centroids "SELECT name, ST_Centroid(geom) centr FROM country",
geom_name = "centr")
head(centroids)
## coordinates name
## 1 (66.00111, 33.85878) Afghanistan
## 2 (19.97356, 60.20129) Åland Islands
## 3 (20.07437, 41.13255) Albania
## 4 (2.627813, 28.17211) Algeria
## 5 (-169.8823, -14.25412) American Samoa
## 6 (1.650388, 42.56045) Andorra
To interact with hstore columns imported into R, postGIStools defines
the %->%
operator, which is analogous to
->
in PostgreSQL. Specifically,
hstore %->% "key"
extracts the value in each cell of the
hstore corresponding to the given key, or NA
when the key
is absent from a given cell.
head(countries$translations %->% "es")
## [1] "Afganistán" "Albania" "Argelia" "Angola" "Argentina"
## [6] "Armenia"
The operator is also compatible with single bracket subsetting of the hstore.
$translations[5:7] %->% "fr"
countries## [1] "Argentine" "Arménie" "Australie"
The assignment version of %->%
operates similarly,
with the option of deleting keys by assigning them to
NULL
.
$translations[2] %->% "nl" <- "Albanië"
countries$translations[3] %->% "fr" <- NULL
countries$translations[2:3]
countries## [[1]]
## [[1]]$de
## [1] "Albanien"
##
## [[1]]$es
## [1] "Albania"
##
## [[1]]$fr
## [1] "Albanie"
##
## [[1]]$it
## [1] "Albania"
##
## [[1]]$ja
## [1] "アルバニア"
##
## [[1]]$nl
## [1] "Albanië"
##
##
## [[2]]
## [[2]]$de
## [1] "Algerien"
##
## [[2]]$es
## [1] "Argelia"
##
## [[2]]$it
## [1] "Algeria"
##
## [[2]]$ja
## [1] "アルジェリア"
The new_hstore
function creates a blank hstore of a
given length, which is just an empty list of lists. It is most useful
when assigned to a data frame column
(e.g. df$hs <- new_hstore(3)
) that can be then populated
with %->%
and written back to a PostgreSQL database.
The two write methods postgis_insert
and
postgis_update
wrap around their namesake SQL commands,
while also converting R spatial objects and list-columns back into the
geometry and hstore data types, respectively.
To demonstrate these functions, we create a new temporary table in the database.
dbSendQuery(con, paste("CREATE TEMP TABLE cty_tmp (name text,",
"iso2 text PRIMARY KEY, capital text,",
"translations hstore, geom geometry)"))
Calls to postgis_insert
must specify the connection,
data frame and table name. By default, all data frame columns are
inserted, but a subset of columns can be specified as
write_cols
. In both cases, the names of inserted columns
must have a match in the target table.
postgis_insert(con, countries[1:10,], "cty_tmp",
write_cols = c("name", "iso2", "translations"),
geom_name = "geom", hstore_name = "translations")
## <PostgreSQLResult:(39779,0,4)>
# Reimport to check
<- get_postgis_query(con, paste("SELECT name, iso2, capital,",
cty_tmp "geom, translations FROM cty_tmp"),
geom_name = "geom", hstore_name = "translations")
head(cty_tmp@data)
## name iso2 capital
## 1 Afghanistan AF <NA>
## 2 Albania AL <NA>
## 3 Algeria DZ <NA>
## 4 Angola AO <NA>
## 5 Argentina AR <NA>
## 6 Armenia AM <NA>
## translations
## 1 Afghanistan, Afganistán, Afghanistan, Afghanistan, アフガニスタン
## 2 Albanien, Albania, Albanie, Albania, アルバニア, Albanië
## 3 Algerien, Argelia, Algeria, アルジェリア
## 4 Angola, Angola, Angola, Angola, アンゴラ
## 5 Argentinien, Argentina, Argentine, Argentina, アルゼンチン
## 6 Armenien, Armenia, Arménie, Armenia, アルメニア
We next update the records in cty_tmp to include the
capital field. The syntax of postgis_update
is
similar to postgis_insert
, except that we must specify both
id_cols
, the column(s) identifying the records to update,
as well as update_cols
, the column(s) to be updated. (The
underlying PostgreSQL operation is of the format UPDATE… SET …
FROM….) Neither the geometry nor the hstore can be used as
id_cols
. Note that since the input data frame
countries[1:10,]
includes spatial and list-column data, we
need to specify geom_name
and hstore_name
,
even if those columns are not needed for the update operation.
postgis_update(con, countries[1:10,], "cty_tmp", id_cols = "iso2",
update_cols = "capital", geom_name = "geom",
hstore_name = "translations")
## <PostgreSQLResult:(39779,0,7)>
<- get_postgis_query(con, paste("SELECT name, iso2, capital,",
cty_tmp "geom, translations FROM cty_tmp"),
geom_name = "geom", hstore_name = "translations")
head(cty_tmp@data)
## name iso2 capital
## 1 Australia AU Canberra
## 2 Afghanistan AF Kabul
## 3 Albania AL Tirana
## 4 Algeria DZ Algiers
## 5 Angola AO Luanda
## 6 Argentina AR Buenos Aires
## translations
## 1 Australien, Australia, Australie, Australia, オーストラリア
## 2 Afghanistan, Afganistán, Afghanistan, Afghanistan, アフガニスタン
## 3 Albanien, Albania, Albanie, Albania, アルバニア, Albanië
## 4 Algerien, Argelia, Algeria, アルジェリア
## 5 Angola, Angola, Angola, Angola, アンゴラ
## 6 Argentinien, Argentina, Argentine, Argentina, アルゼンチン
By default, hstore columns are updated by concatenation: keys present
in the input data frame but not the original table are added to the
hstore, keys present in both the data frame and table have their
associated values updated, but keys absent from the input data frame are
not deleted from the table. This can be changed by setting
hstore_concat = FALSE
, in which case whole hstore cells are
replaced with corresponding ones in the input data frame.
$translations[2] %->% "nl" <- NULL
countries$translations[3] %->% "fr" <- "Algérie"
countries
postgis_update(con, countries[1:10,], "cty_tmp", id_cols = "iso2",
update_cols = "translations", geom_name = "geom",
hstore_name = "translations")
## <PostgreSQLResult:(39779,0,10)>
<- get_postgis_query(con, paste("SELECT name, iso2, capital,",
cty_tmp "geom, translations FROM cty_tmp"),
geom_name = "geom", hstore_name = "translations")
@data[cty_tmp$iso2 %in% c("AL", "DZ"), ]
cty_tmp## name iso2 capital
## 2 Albania AL Tirana
## 5 Algeria DZ Algiers
## translations
## 2 Albanien, Albania, Albanie, Albania, アルバニア, Albanië
## 5 Algerien, Argelia, Algérie, Algeria, アルジェリア
# Key deletion not reflected in database unless hstore_concat = FALSE
postgis_update(con, countries[1:10,], "cty_tmp", id_cols = "iso2",
update_cols = "translations", geom_name = "geom",
hstore_name = "translations", hstore_concat = FALSE)
## <PostgreSQLResult:(39779,0,13)>
<- get_postgis_query(con, paste("SELECT name, iso2, capital,",
cty_tmp "geom, translations FROM cty_tmp"),
geom_name = "geom", hstore_name = "translations")
@data[cty_tmp$iso2 %in% c("AL", "DZ"), ]
cty_tmp## name iso2 capital translations
## 2 Albania AL Tirana Albanien, Albania, Albanie, Albania, アルバニア
## 5 Algeria DZ Algiers Algerien, Argelia, Algérie, Algeria, アルジェリア
The code to import geom fields is based on blog post from Lee Hachadoorian: Load PostGIS geometries in R without rgdal
Development of this R package was supported by the National Socio-Environmental Synthesis Center (SESYNC) under funding received from the National Science Foundation DBI-1052875.