12 `sf` and map making in R

Learning objectives

Read spatial data
Understand how to convert a dataframe into an sf object
Practice transforming sf spatial data between projections
Demonstrate a spatial join, st_intersection()
Create interactive webmaps
Create static maps with the base plot() method and ggplot2
Save/export static maps

12.1 Geospatial data with `sf`

R is a powerful tool for working with spatial data and making maps. Within R, you can do nearly everything that a GUI type program can do (e.g., ArcGIS or QGIS), and moreover, you can write a script to automate or scale up routine analyses, thus saving time on repetitive tasks and ensuring you create reproducible workflows.

There are a variety of spatial mapping/plotting packages in R. However, the best option being used widely for vector-based spatial data in R is the sf package. sf is a powerful, clean, and fairly straightforward approach because it has an easy to use syntax, it is fast and it can do most if not all of the tasks commonly done in other geospatial software programs. Even better, sf spatial objects are simply data.frames with a geometry column, so all of the tidy, wrangle, join, and plot capabilities from dplyr and ggplot2 also work on sf objects. Therefore, it is possible to use R to address all your data processing needs in a single environment without the need to move between tools for tabular data (e.g., Excel) and geospatial data (e.g., ArcGIS or QGIS).

sf stands for “simple features” which is a way of representing spatial data. See the R sf package documentation for more details. “Sticky geometry” in the figure means that all geometry information is contained in a single column, and that column follows the data, no matter how you work with it.

12.1.1 Reading spatial data

The sf package can read spatial data in various formats (e.g., shapefile, GeoJSON, PostGIS, kml), and is very straightforward if the data is already spatial, requiring only a call to the function st_read().

# First we load the sf and here packages
library(sf)
library(tidyverse)

# may need to unzip first:
unzip("data/shp/sac_county.zip", exdir = "data/shp/sac")

# Read a shapefile of Sacramento county 
sac <- st_read("data/shp/sac/sac_county.shp", quiet = TRUE) 

colnames(sac)

 [1] "OBJECTID"   "COUNTY_NAM" "COUNTY_ABB" "COUNTY_NUM" "COUNTY_COD"
 [6] "COUNTY_FIP" "Shape"      "Shape.STAr" "Shape.STLe" "geometry"

Note, the quiet = TRUE option suppresses printing information about the imported spatial data. Sometimes this information is helpful, but see read_sf() which is the same as st_read() but defaults to quietly reading in data.

12.1.2 Converting a dataframe to sf

At other times, you may need to convert tabular data to a spatial format. To make an sf object, we are creating a geometry column. This contains all the geospatial information we need, and it lives within the dataframe. Importantly, this geometry column is “sticky”, meaning whatever we do to our data (tidy, filter, mutate etc) the associated geometry column will stick with the data. What’s awesome is this column can contain anything from information for a point to a line to a complex polygon – all in one column. To make an sf object, we need to know two important pieces of information…

coords: The columns that contain the geospatial coordinates (either name or column number)
crs: The projection or EPSG (CRS, SRID, etc) that the data in¹

To practice, let’s read all groundwater level monitoring stations in Sacramento County, stations_sac.csv, and convert this tabular data to an sf object with the function st_as_sf(). We will specify which columns contain the coordinates (coords), and what the projection, or coordinate reference system (crs) the data is in. In this case, we know the data is in NAD83, or EPSG 4269.

Notice that all functions in sf begin with an “st_” prefix. This makes it easy to type “st_” and use tab-completion to find sf functions, and it also makes it easier to read code and spot spatial transformations.

# read groundwater level stations in Sacramento county as dataframe
stations <- read_csv("data/gwl/stations_sac.csv")

# check object class
class(stations)

[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

# convert stations into an sf object by specifying coordinates and crs
stations <- st_as_sf(stations, 
                     coords = c("LONGITUDE", "LATITUDE"), # note x goes first
                     crs = 4269, # projection, this is NAD83
                     remove = FALSE) # don't remove lat/lon cols from dataframe

# check the class of the new object
class(stations)

[1] "sf"          "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

12.1.3 Projecting spatial data

A common problem in geospatial analysis is when two different datasets are in different projections. We can check the projection of our sac and stations objects with the function st_crs(), and transform our data (or re-project) with the st_transform() function.

We know stations is in NAD83, but what about sac? Let’s check with st_crs(). Line 2 of the output indicates WGS84.

st_crs(sac)

Coordinate Reference System:
  User input: WGS 84 / Pseudo-Mercator 
  wkt:
PROJCRS["WGS 84 / Pseudo-Mercator",
    BASEGEOGCRS["WGS 84",
        ENSEMBLE["World Geodetic System 1984 ensemble",
            MEMBER["World Geodetic System 1984 (Transit)"],
            MEMBER["World Geodetic System 1984 (G730)"],
            MEMBER["World Geodetic System 1984 (G873)"],
            MEMBER["World Geodetic System 1984 (G1150)"],
            MEMBER["World Geodetic System 1984 (G1674)"],
            MEMBER["World Geodetic System 1984 (G1762)"],
            MEMBER["World Geodetic System 1984 (G2139)"],
            MEMBER["World Geodetic System 1984 (G2296)"],
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ENSEMBLEACCURACY[2.0]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["Popular Visualisation Pseudo-Mercator",
        METHOD["Popular Visualisation Pseudo Mercator",
            ID["EPSG",1024]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["False easting",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Web mapping and visualisation."],
        AREA["World between 85.06°S and 85.06°N."],
        BBOX[-85.06,-180,85.06,180]],
    ID["EPSG",3857]]

We can re-project, or transform the projection (crs) with st_transform() and by specifying the EPSG code to transform the data to. Here we use NAD83 (EPSG: 4269), so the Sacramento county boundary (sac) is in the same projection as the groundwater level monitoring points (stations).

sac <- st_transform(sac, crs = 4269)

Lastly, we verify our transformation worked.

st_crs(sac)

Coordinate Reference System:
  User input: EPSG:4269 
  wkt:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Geodesy."],
        AREA["North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands. British Virgin Islands."],
        BBOX[14.92,167.65,86.45,-40.73]],
    ID["EPSG",4269]]

We can even ask R if the projection (crs) of sac and stations are identical, and they are.

identical(st_crs(stations), st_crs(sac))

[1] TRUE

We can also transform data using the actual sf object or dataframe, without needing to find the specific EPSG or CRS code. For example, if we want to transform our sac county polygon into the same projection as our stations data, we can do the following:

sac <- st_transform(sac, crs = st_crs(stations))

# verify these are the same
identical(st_crs(stations), st_crs(sac))

[1] TRUE

# or look at just the EPSG code:
st_crs(sac)$epsg

[1] 4269

st_crs(stations)$epsg

[1] 4269

12.1.4 Spatial join with st_intersection()

With all of our spatial data in the same projection, we can perform a spatial join. Perhaps the most common spatial join is an intersection. For example, above, the stations object only contains stations in Sacramento County, but it came from a much larger set of stations (n = 43,807). Let’s bring in all groundwater stations in the state of California, convert it to an sf object class, and plot the data.

# all gw stations in California
all_gw_stations <- read_csv("data/gwl/stations.csv") %>% 
  st_as_sf(coords = c("LONGITUDE", "LATITUDE"), 
           crs = 4269, 
           remove = FALSE)

As we can see above, groundwater monitoring stations are concentrated in Bulletin 118 subbasins.²

Using dplyr::filter() we can subset these stations to Sacramento County.

# filter to Sacramento county & verify this worked
stations_sac <- stations %>% 
  filter(COUNTY_NAME == "Sacramento") 

# verify this worked
unique(stations_sac$COUNTY_NAME)

[1] "Sacramento"

But what if we didn’t have the county data already, or if we wanted to filter these data by a polygon that wasn’t detailed in one of the existing variables in our dataframe? In this case, we can use st_intersection() which is a spatial join that takes two arguments, the sf object we want to filter (x), and another sf object to filter by (y). If we use x = all_gw_stations and y = sac it will return all of the points in all_gw_stations that fall within sac county.

Before performing the spatial join, we must re-project our data from a geographic coordinate reference system (CRS) to a projected coordinate reference system.³

# it's good practice to ensure your spatial data are in a projected CRS
# like meters before performing spatial operations, so we transform to 3310
all_gw_stations_3310 <- st_transform(stations, 3310)
sac_3310 <- st_transform(sac, 3310)

# perform the intersection 
stations_sac_3310 <- st_intersection(all_gw_stations_3310, sac_3310) 

# number of observations in each county
table(stations_sac_3310$COUNTY_NAME)


     Placer  Sacramento San Joaquin      Sutter 
          2         485           1           1

Interestingly, there are 4 counties in our data after the spatial join. Why is that? If we visualize the data, we can see that all of these points are right on the border of Sacramento County, and the process that previously added county names to these data must have used a slightly different Sacramento county polygon than the one we are using in this module.

There are many many more methods available beyond intersections, including area, distances, buffers, crops, voronoi polygons, nearest neighbor calculations, convex hull calculations, centroid calculations, and much, much more. The list of operations within sf are shown below.

methods(class = 'sfc')

 [1] [                            [<-                         
 [3] as.data.frame                c                           
 [5] coerce                       format                      
 [7] fortify                      identify                    
 [9] initialize                   mapView                     
[11] obj_sum                      Ops                         
[13] points                       print                       
[15] rep                          scale_type                  
[17] show                         slotsFromS3                 
[19] st_area                      st_as_binary                
[21] st_as_grob                   st_as_s2                    
[23] st_as_sf                     st_as_text                  
[25] st_bbox                      st_boundary                 
[27] st_break_antimeridian        st_buffer                   
[29] st_cast                      st_centroid                 
[31] st_collection_extract        st_concave_hull             
[33] st_convex_hull               st_coordinates              
[35] st_crop                      st_crs                      
[37] st_crs<-                     st_difference               
[39] st_exterior_ring             st_geometry                 
[41] st_inscribed_circle          st_intersection             
[43] st_intersects                st_is_full                  
[45] st_is_valid                  st_is                       
[47] st_line_merge                st_m_range                  
[49] st_make_valid                st_minimum_bounding_circle  
[51] st_minimum_rotated_rectangle st_nearest_points           
[53] st_node                      st_normalize                
[55] st_point_on_surface          st_polygonize               
[57] st_precision                 st_reverse                  
[59] st_sample                    st_segmentize               
[61] st_set_precision             st_shift_longitude          
[63] st_simplify                  st_snap                     
[65] st_sym_difference            st_transform                
[67] st_triangulate_constrained   st_triangulate              
[69] st_union                     st_voronoi                  
[71] st_wrap_dateline             st_write                    
[73] st_z_range                   st_zm                       
[75] str                          summary                     
[77] text                         type_sum                    
[79] unique                       vec_cast.sfc                
[81] vec_ptype2.sfc              
see '?methods' for accessing help and source code

To learn more about advanced spatial operations, see the Spatial data module in the Intermediate to Advanced wrds course, and online books and resources.

12.2 Plotting `sf` data

With all of our spatial data in the same projection, we can start making maps! We will cover the built-in plot() method for sf objects, interactive maps with mapview, and plotting with ggplot2.

12.2.1 Inspect data with `plot()`

After reading spatial data you may want to plot it to make sure that it imported correctly, and to understand the fields. For a quick plot, you can simply use plot().

# plot the geometry
plot(sac$geometry)

If we don’t specify the “geometry” column, plot() will plot the first 10 columns in the dataframe (you can control the number of subplots shown with the max.plot argument). Here we can see there are 4 distinct basins (BASIN_CODE) in Sacramento County.

plot(stations)

12.3 Interactive mapping with `mapview`

One of the easiest and coolest packages you’ll find for interactive mapping is mapview. As long as data are in sf format, you can quickly make an interactive map. First let’s make sure we have an sf class of data.

class(stations)

[1] "sf"          "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

class(sac)

[1] "sf"         "data.frame"

12.3.1 Basic use

Next we can use the simple mapview() function to create an interactive webmap!

library(mapview)

mapview(sac)

We can add mapview objects to one another in the same way we add layers to a ggplot, by using a +. We can then toggle them on and off from the interactive map from the top-left hand layer control icon. We can also change the basemap layers being used on the map from this same menu.

# combine sac and stations data
mapview(sac) + mapview(stations)

Note that we can open a mapview object in our default web browser by clicking on the little box and arrow to expand and view. This is particularly helpful when pop-up tables contain dense information, as is the case with our stations dataframe.

12.3.2 Customizing mapview

Not only can you combine mapview objects, but you can also customize their appearance by adjusting a variety of built-in arguments to the mapview() function.

# make sac polygon transparent, with a thick red outline
mapview(sac, alpha.regions = 0, color = "red", lwd = 2, layer.name = "Sac Co") +
  # color points by the well depth
  mapview(stations, zcol = "WELL_DEPTH", layer.name = "Well depth (ft)")

Challenge 1: You Try!

Create a new mapview object of Sacramento County (sac), plus stations colored by “WELL_USE”. Add the argument burst = TRUE, and read the mapview documentation to learn what this does (Hint: Enter ?mapview and scroll to “Arguments”).
Toggle all layers off except for irrigation and residential wells. Recall this relationship for the next module on EDA.

Click for Answers!

mapview(sac,                    # sacramento county sf polygon
        alpha.regions = 0,      # transparent interior
        color = "red",          # red outline
        lwd = 2,                # thick outline
        layer.name = "Sac Co",  # layer name 
        legend = FALSE) +       # hide legend
    mapview(stations,           # stations sf points
            zcol = "WELL_USE",  # color by the well use
            burst = TRUE)       # split each category into a layer

Additional Info

mapview is a great package for quickly visualizing and sharing spatial data. To export and save a .html map that can be shared with others, it’s currently advisable to specify mapviewOptions(fgb = FALSE) after loading the library. This allows us to save the map as a self-contained html file. To do so, click on the Viewer tab, and then on Export > Save as Web Page. This .html file can then be zipped and emailed or shared, and opened in most web browsers.

12.4 Static maps with `ggplot2`

Interactive maps are useful for fast data exploration and integration into web applications, however, depending on the project, static maps may more appropriate for reports, presentations, and sharing. Mapmaking with sf objects in ggplot2 follows the same syntax we practiced in the ggplot2 module, using the geom_sf() function.

12.4.1 Basic use

# put data in geoms rather than ggplot() as we have multiple datasets in one plot
p <- ggplot() +
  geom_sf(data = sac) +
  geom_sf(data = stations, color = "blue", alpha = 0.5)

Adding a north arrow and scale bar is achieved with the ggspatial package.

p + 
  # north arrow (top left) & scale bar (bottom right) 
  ggspatial::annotation_north_arrow(location = "tl") +
  ggspatial::annotation_scale(location = "br") +
  labs(x = "Longitude (NAD83)", y = "Latitude", 
       title = "Groundwater monitoring staions",
       subtitle = "Sacramento County") +
  theme_minimal()

Just as before in ggplot2, we can connect an aesthetic, aes, like color to one of the column variables. We can also facet and change the theme(). For maps, theme_void() is useful because it removes graticules (grid lines), axis ticks, and labels which allows us to focus on the data explored in the plot.

p <- ggplot() +
  geom_sf(data = sac) +
  geom_sf(data = stations, aes(color = WELL_DEPTH)) +
  scale_color_viridis_c("Well depth (ft)") + 
  theme_void() 
p

Challenge 2: Debug and modify.

Fix the following code that, as written, will return an error. Then, with the fixed code, map the color aesthetic to how the well is used (WELL_USE), and add a viridis color scale to this discrete variable.

ggplot() %>% 
  geom_sf(data = sac) +
  geom_sf(data = stations)

Click for Answers!

# Use `+` instead of `%>%` to add ggplot objects together
ggplot() +
  geom_sf(data = sac) +
  geom_sf(data = stations)

Map “WELL_USE” to color in the stations dataframe, and ensure the color scale is a discrete variable with: scale_color_viridis_d()

ggplot() +
  geom_sf(data = sac) +
  geom_sf(data = stations, aes(color = WELL_USE)) +
  scale_color_viridis_d("Well type", option = "B") +
  theme_void()

12.4.2 Customizing maps in ggplot2

Because we’re working in ggplot2, we can also facet_wrap() and customize the theme().

# facet the plot we created above, p, by the well use
p +
  facet_wrap(~WELL_USE, nrow = 2) +
  theme(legend.position = "top") +
  guides(color = guide_colorbar(barwidth = unit(20, "lines"), 
                                barheight = unit(0.5, "lines"),
                                title.hjust = 0.5, 
                                title.position = "top"))

We might notice that a few extremely deep wells dominate the color scale and cause it to extend to large values that prevent us from seeing variation in shallower well depths. We can improve this visualization by drawing on some skills from the dplyr module. Let’s overwrite large well depth values with dplyr::mutate() and improve our labeling on the map.

# only 4.7% of wells have a well depth that exceeds 700 feet
# but the color bar goes all the way to 1600 feet!
sum(stations$WELL_DEPTH >= 700, na.rm = TRUE) / nrow(stations)

[1] 0.04703476

# overwrite large values & assign the resulting dataframe to a new object
stations_viz <- stations %>% 
  mutate(WELL_DEPTH = ifelse(WELL_DEPTH >= 700, 700, WELL_DEPTH))

# replot with the new stations_viz object
p <- ggplot() +
  geom_sf(data = sac) +
  geom_sf(data = stations_viz, aes(color = WELL_DEPTH)) +
  facet_wrap(~WELL_USE, nrow = 2) + 
  # improve labels to reflect the changes made to values in these data
  scale_color_viridis_c("Well depth (ft)",
                        # break points for labels along the colorbar
                        breaks = seq(0, 700, 100),
                        # labels to insert at each of the breaks
                        labels = c(seq(0, 600, 100), "> 700")) + 
  theme_void() +
  theme(legend.position = "top") +
  guides(color = guide_colorbar(barwidth = unit(20, "lines"), 
                                barheight = unit(0.5, "lines"),
                                title.hjust = 0.5, 
                                title.position = "top"))
p

In this improved visualization, it’s clear that irrigation wells tend to be deeper than residential wells. The main cluster of “Other” wells is associated with the Aerojet Superfund site, and are fairly deep, around 300-600 ft.

12.4.3 Exporting maps in ggplot2

Once a map has been created, how do we get it out of R and into a form we can share with others? One option is to save a map directly from the Plot viewer in RStudio.

For more control and reproducibility, we can also save a map by using the methods covered in the “Saving plots” section of data visualization module, which include ggsave(). It’s good to practice this way of doing things because it will build your understanding and help you automate saving hundreds of plots at a single time down the road.

ggsave("results/sac_well_depth_type.png", p, height = 5, width = 8)

12.5 Additional Resources

We covered 3 common frameworks in which to plot spatial data in R: the plot() method for sf objects, mapview package, and ggplot2. Although these are popular and powerful approaches to visualize and map spatial data, other packages in the R ecosystem are available including:

leaflet and extensions like leafpop
tmap
mapdeck
rasterVis for raster data
plotly

The online books/guides Geocomputation with R, RSpatial.org, and Spatial Data Science are additional resources for a deeper dive into spatial data processing and mapping in R.

Coordinate reference systems are an entire lesson unto themselves. For a good overview, check out this Data Carpentry lesson. ↩︎
Bulletin 118 subbasins are DWR desginated zones that generally correspond to productive aquifers, and can be viewed here.↩︎
A discussion on coordinate reference systems is a complex topic in and of itself, and for the purposes of this module, we summarize it as follows: A geographic CRS is round and based on angular units of degrees (lat/lng), whereas a projected CRS is flat and has linear units (meters or feet). Many functions in sf that make calculations on data expect a projected CRS, and can return inaccurate results if an object in a geographic CRS is used. This is a fascinating topic with lots written about it! For more reading see this Esri blog, the Data Carpentry geospatial lesson, and the online Geocomputation with R book.↩︎

12.1 Geospatial data with sf

12.1.1 Reading spatial data

12.1.2 Converting a dataframe to sf

12.1.3 Projecting spatial data

12.1.4 Spatial join with st_intersection()

12.2 Plotting sf data

12.2.1 Inspect data with plot()

12.3 Interactive mapping with mapview

12.3.1 Basic use

12.3.2 Customizing mapview

12.4 Static maps with ggplot2

12.4.1 Basic use

12.4.2 Customizing maps in ggplot2

12.4.3 Exporting maps in ggplot2

12.5 Additional Resources

12.1 Geospatial data with `sf`

12.2 Plotting `sf` data

12.2.1 Inspect data with `plot()`

12.3 Interactive mapping with `mapview`

12.4 Static maps with `ggplot2`