This R Notebook demonstrates how to work with the R packages sf
and ggplot2
to create basic maps. For this demo, we will create two maps showing the distribution of yellowjacket occurrences across the contiguous 48 United States.
Disclaimer: I’m still very new to working with geospatial data. Take this demo with a grain of salt, and consult more detailed tutorials and GIS courses to get a more thorough udnerstanding of principles. My aim with this demo is to demonstrate two common tasks I’ve encountered working with museum data: plotting latitude/longitude points on a map, and creating heatmaps to show how counts vary across states or countries.
library(tidyverse)
Registered S3 method overwritten by 'dplyr':
method from
print.rowwise_df
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
[30m-- [1mAttaching packages[22m --------------------------------------- tidyverse 1.3.0 --[39m
[30m[32mv[30m [34mggplot2[30m 3.2.1 [32mv[30m [34mpurrr [30m 0.3.3
[32mv[30m [34mtibble [30m 2.1.3 [32mv[30m [34mdplyr [30m 0.8.3
[32mv[30m [34mtidyr [30m 1.0.0 [32mv[30m [34mstringr[30m 1.4.0
[32mv[30m [34mreadr [30m 1.3.1 [32mv[30m [34mforcats[30m 0.4.0[39m
[30m-- [1mConflicts[22m ------------------------------------------ tidyverse_conflicts() --
[31mx[30m [34mdplyr[30m::[32mfilter()[30m masks [34mstats[30m::filter()
[31mx[30m [34mdplyr[30m::[32mlag()[30m masks [34mstats[30m::lag()[39m
library(data.table) # Read/write large tabular data files
Registered S3 method overwritten by 'data.table':
method from
print.data.table
data.table 1.12.6 using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
Attaching package: 㤼㸱data.table㤼㸲
The following objects are masked from 㤼㸱package:dplyr㤼㸲:
between, first, last
The following object is masked from 㤼㸱package:purrr㤼㸲:
transpose
library(sf) # Classes and functions for vector mapping data
package 㤼㸱sf㤼㸲 was built under R version 3.6.2Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
For this demo, we will be working with two geographic datasets: a shapefile containing data on the US states, and a CSV containing georeferenced occurrence data for yellowjackets in the United States.
A shapefile is a geospatial vector data format commonly used with GIS software (like ArcGIS). For this demo, we can think of it as a set of related files containing data on set of spatial features (geographic places) and their associated geometrical representations (points, lines, polygons).
There are different packages and different R objects for working with geospatial data. We will be using a package called sf
, which creates a type of R object called a simple feature object. This object acts much like a data frame, where every row contains data on a geographic place (or feature). In addition, each row has an associated geometry, which contains the geometric representation of that place on a map (as a point, line, or polygon). Simple feature objects can be read into R from existing geospatial data sources (like shapefiles), or created from other file types containing geospatial data (like CSV files).
To plot our maps, we will use the ggplot2
package and its built-in geom geom_sf()
. This geom is designed to plot sf object data.
Coordinate reference systems (CRS) are systems of equations and visual representations cartographers use to represent the 3D spherical Earth as 2D images. For this demo, we will use a CRS called WGS 84. We need to make sure that all our spatial data are using the WGS 84 CRS, or else our different datasets will not align properly.
WGS 84 is represented with the EPSG code 4326.
wgs84 <- 4326 # WGS 84, used in GPS. https://epsg.io/4326
For a good explanation of CRSes:
The US Census Bureau publishes shapefiles containing spatial data for different United States regions. These shapefiles are available here: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
For this demo, we will be using the file cb_2018_us_state_20m.zip, which contains the names and low-resolution geographic boundaries for the US states and territories.
This file was unzipped and placed in a folder called data
. The shapefile itself uses the file extension .shp
, but all the files in the zip archive are required in order to successfully access the data.
We can use the sf function st_read()
to read the data in as an sf object.
us_states <- st_read('data/cb_2018_us_state_20m.shp')
Reading layer `cb_2018_us_state_20m' from data source `C:\Users\Amanda\GitHub\cbb-r-mapping\data\cb_2018_us_state_20m.shp' using driver `ESRI Shapefile'
Simple feature collection with 52 features and 9 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
epsg (SRID): 4269
proj4string: +proj=longlat +datum=NAD83 +no_defs
When we read the shapefile, we see that the current CRS (+datum) is NAD83, which is a projected coordinate system that centers on North America. We will convert the us_states
data to the WGS 84 instead.
us_states <- st_transform(us_states, crs = wgs84)
We can use ggplot2 to do a quick plot to see how our state data are looking. When we plot sf data, we will use a geom called geom_sf()
, which plots the geometry for each row in our data.
ggplot()+ geom_sf(data = us_states)
For this demo, we will focus on just the contiguous 48 states. We can simply filter our sf object as we would a normal data frame.
non_contiguous <- c('Alaska', 'Hawaii', 'Puerto Rico')
us_states_con48 <- filter(us_states, !(NAME %in% non_contiguous))
We can take a look again to see our filtered US states data.
ggplot()+ geom_sf(data = us_states_con48)
We can also create an object containing some theme settings for our map. We can then include this object for future ggplot calls.
map_theme <- theme_bw() +
theme(axis.line = element_blank(), axis.text.x = element_blank(),
axis.text.y = element_blank(), axis.ticks = element_blank(),
axis.title.x = element_blank(), axis.title.y = element_blank(),
panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
text = element_text(size = 16), legend.text = element_text(size = 16),
legend.title = element_text(face = "bold"))
ggplot() +
geom_sf(data = us_states_con48) +
map_theme
Our data on yellowjacket occurrences were downloaded from the Global Biodiversity Information Facility (GBIF). Each occurrence record contains taxonomic information, occurrence type information, and a georeferenced coordinate.
Dataset citation: GBIF.org (29 April 2020) GBIF Occurrence Download https://doi.org/10.15468/dl.44t6at
We can import the GBIF data as a standard data frame. I like to use the data.table
package (and its fread()
function), which works really well for parsing large files. I will also filter out records where decimalLatitude
or decimalLongitude
were not supplied.
gbif <- fread('data/gbif_yellowjackets.csv') %>%
filter(!is.na(decimalLatitude),
!is.na(decimalLongitude))
We next need to convert this data frame to a geospatial sf
object. The st_as_sf()
function requires that the coordinate columns be specified (x = longitude, y = latitude). The CRS can also be specified. Most GBIF data is assumed to use the WGS 84 coordinate system.
gbif_sf <- st_as_sf(gbif,
coords = c("decimalLongitude", "decimalLatitude"),
crs = wgs84,
remove = FALSE) # retains original lat/long columns
gbif_sf
Simple feature collection with 19814 features and 50 fields
geometry type: POINT
dimension: XY
bbox: xmin: -166.5367 ymin: 19.3417 xmax: -67.2844 ymax: 71.3875
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
First 10 features:
gbifID datasetKey
1 2603556438 50c9509d-22c7-4a22-a47d-8c48425ef4a7
2 2603542509 50c9509d-22c7-4a22-a47d-8c48425ef4a7
3 2603527237 50c9509d-22c7-4a22-a47d-8c48425ef4a7
4 2603520270 50c9509d-22c7-4a22-a47d-8c48425ef4a7
5 2603515579 50c9509d-22c7-4a22-a47d-8c48425ef4a7
6 2603514616 50c9509d-22c7-4a22-a47d-8c48425ef4a7
7 2603469718 50c9509d-22c7-4a22-a47d-8c48425ef4a7
8 2603464579 50c9509d-22c7-4a22-a47d-8c48425ef4a7
9 2603462426 50c9509d-22c7-4a22-a47d-8c48425ef4a7
10 2603460129 50c9509d-22c7-4a22-a47d-8c48425ef4a7
occurrenceID kingdom
1 https://www.inaturalist.org/observations/42252062 Animalia
2 https://www.inaturalist.org/observations/42288189 Animalia
3 https://www.inaturalist.org/observations/42190277 Animalia
4 https://www.inaturalist.org/observations/42172468 Animalia
5 https://www.inaturalist.org/observations/42347183 Animalia
6 https://www.inaturalist.org/observations/42334838 Animalia
7 https://www.inaturalist.org/observations/42043468 Animalia
8 https://www.inaturalist.org/observations/41991262 Animalia
9 https://www.inaturalist.org/observations/41919239 Animalia
10 https://www.inaturalist.org/observations/40430108 Animalia
phylum class order family genus
1 Arthropoda Insecta Hymenoptera Vespidae Vespula
2 Arthropoda Insecta Hymenoptera Vespidae Vespula
3 Arthropoda Insecta Hymenoptera Vespidae Vespula
4 Arthropoda Insecta Hymenoptera Vespidae Dolichovespula
5 Arthropoda Insecta Hymenoptera Vespidae Dolichovespula
6 Arthropoda Insecta Hymenoptera Vespidae Vespula
7 Arthropoda Insecta Hymenoptera Vespidae Vespula
8 Arthropoda Insecta Hymenoptera Vespidae Dolichovespula
9 Arthropoda Insecta Hymenoptera Vespidae Vespula
10 Arthropoda Insecta Hymenoptera Vespidae Dolichovespula
species infraspecificEpithet taxonRank
1 Vespula squamosa NA SPECIES
2 Vespula pensylvanica NA SPECIES
3 Vespula maculifrons NA SPECIES
4 Dolichovespula maculata NA SPECIES
5 Dolichovespula maculata NA SPECIES
6 Vespula acadica NA SPECIES
7 Vespula maculifrons NA SPECIES
8 Dolichovespula maculata NA SPECIES
9 Vespula maculifrons NA SPECIES
10 Dolichovespula maculata NA SPECIES
scientificName verbatimScientificName
1 Vespula squamosa (Drury, 1770) Vespula squamosa
2 Vespula pensylvanica (de Saussure, 1857) Vespula pensylvanica
3 Vespula maculifrons (Buysson, 1905) Vespula maculifrons
4 Dolichovespula maculata (Linnaeus, 1763) Dolichovespula maculata
5 Dolichovespula maculata (Linnaeus, 1763) Dolichovespula maculata
6 Vespula acadica (Sladen, 1918) Vespula acadica
7 Vespula maculifrons (Buysson, 1905) Vespula maculifrons
8 Dolichovespula maculata (Linnaeus, 1763) Dolichovespula maculata
9 Vespula maculifrons (Buysson, 1905) Vespula maculifrons
10 Dolichovespula maculata (Linnaeus, 1763) Dolichovespula maculata
verbatimScientificNameAuthorship countryCode locality
1 US
2 US
3 US
4 US
5 US
6 US
7 US
8 US
9 US
10 US
stateProvince occurrenceStatus individualCount
1 North Carolina NA
2 Arizona NA
3 Maryland NA
4 North Carolina NA
5 North Carolina NA
6 Alaska NA
7 New Jersey NA
8 Maryland NA
9 Virginia NA
10 New York NA
publishingOrgKey decimalLatitude
1 28eb1a3f-1c15-4a95-931a-4af90ecb574d 34.96565
2 28eb1a3f-1c15-4a95-931a-4af90ecb574d 31.72451
3 28eb1a3f-1c15-4a95-931a-4af90ecb574d 38.54904
4 28eb1a3f-1c15-4a95-931a-4af90ecb574d 35.27257
5 28eb1a3f-1c15-4a95-931a-4af90ecb574d 36.12257
6 28eb1a3f-1c15-4a95-931a-4af90ecb574d 57.03464
7 28eb1a3f-1c15-4a95-931a-4af90ecb574d 40.49301
8 28eb1a3f-1c15-4a95-931a-4af90ecb574d 38.60010
9 28eb1a3f-1c15-4a95-931a-4af90ecb574d 38.91710
10 28eb1a3f-1c15-4a95-931a-4af90ecb574d 42.38316
decimalLongitude coordinateUncertaintyInMeters coordinatePrecision
1 -79.40654 28762 NA
2 -110.83710 10 NA
3 -77.18601 2 NA
4 -80.96621 1407 NA
5 -79.52758 1080 NA
6 -135.31406 NA NA
7 -74.41784 162 NA
8 -76.76374 28210 NA
9 -77.27886 14 NA
10 -76.56134 NA NA
elevation elevationAccuracy depth depthAccuracy
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
7 NA NA NA NA
8 NA NA NA NA
9 NA NA NA NA
10 NA NA NA NA
eventDate day month year taxonKey speciesKey
1 2020-04-14T10:59:53 14 4 2020 1311662 1311662
2 2020-04-15T12:31:55 15 4 2020 1311698 1311698
3 2020-04-11T15:27:00 11 4 2020 1311693 1311693
4 2020-04-14T13:12:07 14 4 2020 1311815 1311815
5 2020-04-16T17:51:16 16 4 2020 1311815 1311815
6 2020-03-23T14:35:00 23 3 2020 1311640 1311640
7 2015-09-24T08:33:52 24 9 2015 1311693 1311693
8 2020-04-12T08:32:06 12 4 2020 1311815 1311815
9 2020-04-11T11:23:35 11 4 2020 1311693 1311693
10 2020-02-28T15:12:00 28 2 2020 1311815 1311815
basisOfRecord institutionCode collectionCode catalogNumber
1 HUMAN_OBSERVATION iNaturalist Observations 42252062
2 HUMAN_OBSERVATION iNaturalist Observations 42288189
3 HUMAN_OBSERVATION iNaturalist Observations 42190277
4 HUMAN_OBSERVATION iNaturalist Observations 42172468
5 HUMAN_OBSERVATION iNaturalist Observations 42347183
6 HUMAN_OBSERVATION iNaturalist Observations 42334838
7 HUMAN_OBSERVATION iNaturalist Observations 42043468
8 HUMAN_OBSERVATION iNaturalist Observations 41991262
9 HUMAN_OBSERVATION iNaturalist Observations 41919239
10 HUMAN_OBSERVATION iNaturalist Observations 40430108
recordNumber identifiedBy dateIdentified license
1 2020-04-15T18:26:11 CC_BY_NC_4_0
2 2020-04-16T02:43:21 CC_BY_NC_4_0
3 2020-04-14T21:05:06 CC_BY_NC_4_0
4 2020-04-15T11:36:59 CC_BY_NC_4_0
5 2020-04-16T22:00:44 CC_BY_NC_4_0
6 2020-04-17T04:43:12 CC_BY_NC_4_0
7 2020-04-12T22:47:23 CC_BY_4_0
8 2020-04-12T12:46:29 CC_BY_NC_4_0
9 2020-04-11T17:29:13 CC_BY_NC_4_0
10 2020-03-22T00:38:00 CC_BY_NC_4_0
rightsHolder recordedBy typeStatus establishmentMeans
1 snail_hiker snail_hiker
2 Tony Palmer Tony Palmer
3 spyingnaturalist spyingnaturalist
4 karyncook79 karyncook79
5 kristin0107 kristin0107
6 rolandwirth rolandwirth
7 John Beetham John Beetham
8 klanderson klanderson
9 Izabella Farr Izabella Farr
10 Jason Dombroskie Jason Dombroskie
lastInterpreted mediaType
1 2020-04-22T02:25:24.859Z StillImage
2 2020-04-22T02:23:43.478Z StillImage
3 2020-04-22T02:23:37.213Z StillImage
4 2020-04-22T02:25:22.709Z StillImage
5 2020-04-22T02:25:26.216Z StillImage
6 2020-04-22T02:25:24.884Z
7 2020-04-22T02:23:12.788Z StillImage
8 2020-04-22T02:24:56.760Z StillImage
9 2020-04-22T02:25:10.318Z StillImage
10 2020-04-22T02:24:48.416Z StillImage
issue
1 COORDINATE_ROUNDED;GEODETIC_DATUM_ASSUMED_WGS84
2 COORDINATE_ROUNDED;GEODETIC_DATUM_ASSUMED_WGS84
3 COORDINATE_ROUNDED;GEODETIC_DATUM_ASSUMED_WGS84
4 COORDINATE_ROUNDED;GEODETIC_DATUM_ASSUMED_WGS84
5 COORDINATE_ROUNDED;GEODETIC_DATUM_ASSUMED_WGS84
6 COORDINATE_ROUNDED;GEODETIC_DATUM_ASSUMED_WGS84
7 GEODETIC_DATUM_ASSUMED_WGS84
8 COORDINATE_ROUNDED;GEODETIC_DATUM_ASSUMED_WGS84
9 COORDINATE_ROUNDED;GEODETIC_DATUM_ASSUMED_WGS84
10 GEODETIC_DATUM_ASSUMED_WGS84
geometry
1 POINT (-79.40654 34.96565)
2 POINT (-110.8371 31.72451)
3 POINT (-77.18601 38.54904)
4 POINT (-80.96621 35.27257)
5 POINT (-79.52758 36.12257)
6 POINT (-135.3141 57.03464)
7 POINT (-74.41784 40.49301)
8 POINT (-76.76374 38.6001)
9 POINT (-77.27886 38.9171)
10 POINT (-76.56134 42.38316)
We can now plot our GBIF data. We’ll use two geom_sf()
calls - the first will plot our US state data, and then second will plot our GBIF data on top of it.
ggplot() +
geom_sf(data = us_states_con48) +
geom_sf(data = gbif_sf) +
map_theme
We can see some data points that seem to be occurring in Alaska and Hawaii. Since we’re limiting our map to the contiguous 48 states, we’d like to remove these data points.
We can use the us_states_48con
sf object to filter our GBIF data, returning only the GBIF points that occur within one of the contiguous 48 states. We can do this with a function called st_join()
, which joins the data from two sf objects according to their geometries and retains the geometry for the first sf object. Here, we take each GBIF point and join the corresponding US state data in which it occurs. left = FALSE
specifies that we are performing an inner join, so any point that does not find a corresponding state is removed.
gbif_con48 <- st_join(gbif_sf, us_states_con48, left = FALSE)
although coordinates are longitude/latitude, st_intersects assumes that they are planar
although coordinates are longitude/latitude, st_intersects assumes that they are planar
ggplot() +
geom_sf(data = us_states_con48) +
geom_sf(data = gbif_con48) +
map_theme
Like in any ggplot figure, we can use aesthetic mapping to provide additional information. For example, we could color our GBIF data by species of yellowjacket.
species_map <- ggplot() +
geom_sf(data = us_states_con48, fill = "white") +
geom_sf(data = gbif_con48,
mapping = aes(color = species),
alpha = 0.4,
size = 2,
show.legend = FALSE) +
map_theme +
labs(title = 'Yellowjacket records by species',
subtitle = "GBIF data for Vespula and Dolichovespula, April 2020")
species_map
We can also create heatmaps (or choropleth maps) of our states, coloring each state according to one of its corresponding counts. For example, we might be interested in the number of occurrences found in each state.
To do this, we can again join our US state and GBIF sf objects. This time, we will join them in the reverse order, retaining the geographic information for our states and attaching GBIF data for each corresponding record. Then we can group and summarize (as we would with a normal data frame) to get the occurrence counts for each area.
I created two summary columns - records (the count of records), and the square root of record count, which we can use for plotting.
state_records <- st_join(us_states_con48, gbif_sf) %>%
group_by(STUSPS) %>%
summarize(records = n(),
records_sqrt = sqrt(records))
although coordinates are longitude/latitude, st_intersects assumes that they are planar
This time when we plot our map, we will only use one geom_sf()
call, because we only need to plot our state data. We will also call geom_sf_text()
, which will allow us to put labels on each of our states.
state_heatmap <- ggplot(state_records, aes(fill = records_sqrt, label = records)) +
geom_sf(show.legend = FALSE) +
geom_sf_text(size = 4) +
map_theme +
scale_fill_gradient(low = 'white', high = 'goldenrod1') +
labs(title = "Yellowjacket records by US state",
subtitle = "GBIF data for Vespula and Dolichovespula, April 2020")
state_heatmap
Once we are happy with our maps, we can save them as image files with ggplot’s ggsave()
.
ggsave(
'figures/records_species_map.png',
plot = species_map,
width = 12, height = 7, units = 'in')
ggsave(
'figures/state_counts_map.png',
plot = state_heatmap,
width = 12, height = 7, units = 'in')