Chapter 4 Multiple-Dataset GIS Operations / Visualization pt. 2
4.1 Learning Objectives
- Combine two datasets with spatial join
- Perform spatial aggregation (point in polygon)
- Manipulate data with dplyr
- Save a ggplot image
4.2 Functions Learned
st_join()
select()
count()
arrange()
st_geometry()
ggsave()
Hint: For each new function we go over, type ?
in front of it in the console to pull up the help page.
4.3 Interactive Tutorial
This workshop’s script can be found here.
4.4 Challenges
We’ve been reading shapefiles that we’ve downloaded, but we call also read data directly from a website using an “API”. These are often great ways to get data without having to manually download it.
We’re going to read data from the Chicago Data Portal:
- Libraries points
- Community Area polygons
Click on the “API” button to directly access the data, rather than having to download a csv via “Export”.
Challenge
- Which one of these is a geographic data format?
Challenge
- Fill in the following script:
# Load libraries for use
areas <- st_read("https://data.cityofchicago.org/resource/igwz-8jzy.geojson")
## Reading layer `igwz-8jzy' from data source `https://data.cityofchicago.org/resource/igwz-8jzy.geojson' using driver `GeoJSON'
## Simple feature collection with 77 features and 9 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
# Read in libraries and areas data
# Project both
# Make a ggplot with libraries and community areas
Challenge
- Which community areas have no libraries? Use
st_intersects
andfilter
to make a map.
# Load library with filter() in it
# Find which areas intersect with libraries and save as a variable called "intersects"
# Filter areas by *without* libraries. Save as a variable called "no_lib" Hint: use "==" instead of ">" in the logical comparison
# Make a ggplot with libraries, community areas, and community areas without libraries
The order in which you give arguments to st_intersects
matters! I always have to look it up, but for point-in-polygon, you want the polygon first, then the points.
One question you may be asking yourself is, how many libraries are in each area?
We can tackle this with an operation known as a spatial join. What we do is join information about the polygons to the points, so we have for each point which community area it’s in. More formally, we are adding the attributes of a layer to the other one.
A spatial join is not the same as an attribute join, which is based on common column (attribute) values between two datasets. Spatial joins are based on a spatial relationship: is this point inside this polygon?
You can try doing an attribute join on community area number/name with this Public Health dataset, and the command left_join()
from dplyr
.
The syntax is generally as follows, for point-in-polygon:
st_join(point_sf, poly_sf)
4.4.1 A simple example
We can spatial join just one attribute, or a few. We can use select()
to choose attributes.
One we’ve done our spatial join, we can manipulate the data with count()
and arrange()
to figure out which community areas have the most libraries. This is also known as spatial aggregation.
If work with the spatial data gets too clumsy or slow, we can drop the geometry column with st_geometry()<-
.
4.4.2 Saving your plots
We ran out of time for this last time, but to save a ggplot image, you can use ggsave()
after a ggplot2 command. You can adjust the width and height of the image in arguments to the function.
Challenge
figs/name-of-plot.png
.
4.5 Links
All the links in this workshop:
- Link to Chicago Libraries data: https://data.cityofchicago.org/Education/Libraries-Locations-Hours-and-Contact-Information/x8fc-8rcq
- Link to Chicago Community Areas data: https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6