Chapter 14 Spatial Autocorrelation

14.1 Install packages

Make sure you have the spdep package installed:

install.packages("spdep")

I also developed a helper package for some of this spatial autocorrelation stuff, called sfExtras. Install with:

# install.packages("remotes")
remotes::install_github("spatialanalysis/sfExtras")

14.2 Review from previous workshops

We’ll be working with a new dataset, called “ncovr”. Go ahead and load it with geodaData:

library(geodaData)
library(sf)
head(ncovr)
## Simple feature collection with 6 features and 69 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -120.8746 ymin: 47.79012 xmax: -94.43044 ymax: 49.37173
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##                NAME STATE_NAME STATE_FIPS CNTY_FIPS  FIPS STFIPS COFIPS
## 1 Lake of the Woods  Minnesota         27       077 27077     27     77
## 2             Ferry Washington         53       019 53019     53     19
## 3           Stevens Washington         53       065 53065     53     65
## 4          Okanogan Washington         53       047 53047     53     47
## 5      Pend Oreille Washington         53       051 53051     53     51
## 6          Boundary      Idaho         16       021 16021     16     21
##   FIPSNO SOUTH     HR60     HR70      HR80      HR90      HC60      HC70
## 1  27077     0 0.000000 0.000000  8.855827  0.000000 0.0000000 0.0000000
## 2  53019     0 0.000000 0.000000 17.208742 15.885624 0.0000000 0.0000000
## 3  53065     0 1.863863 1.915158  3.450775  6.462453 0.3333333 0.3333333
## 4  53047     0 2.612330 1.288643  3.263814  6.996502 0.6666667 0.3333333
## 5  53051     0 0.000000 0.000000  7.770008  7.478033 0.0000000 0.0000000
## 6  16021     0 0.000000 0.000000  4.573101  4.000640 0.0000000 0.0000000
##        HC80      HC90  PO60  PO70  PO80  PO90       RD60       RD70
## 1 0.3333333 0.0000000  4304  3987  3764  4076 -0.1751055 -0.1965356
## 2 1.0000000 1.0000000  3889  3655  5811  6295 -0.8368683 -0.8478556
## 3 1.0000000 2.0000000 17884 17405 28979 30948 -0.5373716 -0.2252831
## 4 1.0000000 2.3333333 25520 25867 30639 33350 -0.8201698 -0.3911261
## 5 0.6666667 0.6666667  6914  6025  8580  8915 -0.9361701 -0.4514569
## 6 0.3333333 0.3333333  5809  5484  7289  8332 -0.8503190 -0.6301795
##          RD80       RD90       PS60       PS70       PS80       PS90 UE60
## 1 -0.36285011 -0.8027737 -1.4499461 -1.4625594 -1.5851232 -1.4955070  7.9
## 2  0.11932743 -0.1354830 -1.7072056 -1.6977199 -1.4440801 -1.3610839  8.2
## 3 -0.51119686 -0.2765439 -0.5681465 -0.5918834 -0.3154613 -0.2831232 10.1
## 4 -0.08242177  0.3707621 -0.5549386 -0.5520160 -0.5253836 -0.4724997  7.5
## 5 -0.28951626  0.1079923 -1.1064386 -1.1817536 -1.0122015 -0.9719778 10.6
## 6 -0.12031784 -0.5515577 -1.2083943 -1.2194174 -1.0980090 -0.9876962 10.8
##   UE70      UE80     UE90     DV60     DV70     DV80      DV90 MA60 MA70
## 1  9.0  5.902579  3.89479 1.858974 2.619808 3.746594  7.388535 28.8 30.5
## 2 15.4 15.422886 16.81159 2.863278 3.686007 6.625442 11.543135 25.9 27.1
## 3  9.0 13.574064 10.70079 2.711447 2.976378 5.448223  9.123212 29.6 31.8
## 4 10.5 12.700195 10.20354 3.372041 4.090386 7.122333  9.245627 29.4 31.1
## 5 13.4 18.148999 14.99102 3.008988 4.010695 5.287860 10.158351 31.2 33.8
## 6 12.1 16.395711 11.37951 2.590420 3.489399 6.071152  8.477287 27.6 30.3
##   MA80 MA90     POL60     POL70     POL80     POL90    DNL60     DNL70
## 1 34.5 35.5  8.367300  8.290794  8.233238  8.312871 1.188755 1.1122489
## 2 27.2 32.8  8.265907  8.203851  8.667508  8.747511 0.568786 0.5067301
## 3 28.7 34.5  9.791662  9.764513 10.274327 10.340064 1.975245 1.9480958
## 4 31.2 35.0 10.147218 10.160723 10.330029 10.414813 1.571567 1.5850725
## 5 31.3 36.1  8.841304  8.703673  9.057189  9.095491 1.595649 1.4580177
## 6 29.2 32.8  8.667164  8.609590  8.894122  9.027859 1.516462 1.4588886
##       DNL80    DNL90   MFIL59   MFIL69   MFIL79   MFIL89 FP59 FP69
## 1 1.0660453 1.145294 8.220134 8.708309 9.551729 10.27921 36.3 16.1
## 2 0.9712953 1.049482 8.519790 9.006754 9.679719 10.25397 22.5 11.6
## 3 2.4623128 2.524736 8.403352 8.868835 9.739379 10.26270 30.9 15.9
## 4 1.7581014 1.845350 8.479907 8.969923 9.715228 10.04962 25.0 12.9
## 5 1.8129617 1.850906 8.508354 8.895219 9.648466 10.11144 26.7 18.3
## 6 1.7488466 1.882032 8.423102 8.946505 9.604745 10.13809 27.2 11.1
##       FP79     FP89      BLK60      BLK70      BLK80      BLK90      GI59
## 1 13.37047  8.51419 0.44144981 0.47654878 0.05313496 0.02453386 0.2852352
## 2 18.63395 17.52451 0.07714065 2.29822161 0.55067975 0.31771247 0.2561578
## 3 11.22669 13.58268 0.07828226 0.09192761 0.09317092 0.21002973 0.2839986
## 4 12.19627 16.94620 0.09796238 0.28221286 0.10117824 0.15592204 0.2585395
## 5 12.59607 15.93098 0.02892682 0.01659751 0.06993007 0.13460460 0.2432630
## 6 14.78873 10.63830 0.00000000 0.05470460 0.04115791 0.03600576 0.2619386
##        GI69      GI79      GI89      FH60 FH70      FH80      FH90
## 1 0.3723362 0.3421036 0.3364546 11.279621  5.4  5.663881  9.515860
## 2 0.3606654 0.3619284 0.3606395 10.053476  2.6 10.079576 11.397059
## 3 0.3940829 0.3575660 0.3699418  9.258437  5.6  6.812127 10.352015
## 4 0.3712182 0.3812402 0.3945189  9.039900  8.1 10.084926 12.840340
## 5 0.3656141 0.3587056 0.3878477  8.243930  4.1  7.557643 10.313002
## 6 0.3503514 0.3559135 0.3405252  7.112971  6.8  8.249497  9.343201
##                         geometry
## 1 MULTIPOLYGON (((-95.34258 4...
## 2 MULTIPOLYGON (((-118.8505 4...
## 3 MULTIPOLYGON (((-117.4378 4...
## 4 MULTIPOLYGON (((-118.971 47...
## 5 MULTIPOLYGON (((-117.4375 4...
## 6 MULTIPOLYGON (((-117.028 48...

Question

Take a few minutes and try to understand what this is about.

  • How many observations and variables are there? What data is stored? (dim(), str(), head(), summary())
  • What does the metadata tell you about this data? (?ncovr)
  • What geometries are in this data? Can you make a quick map with plot()?
  • What coordinate reference system is there? Is this data projected? (st_crs())

Question

Try to make a simple map of one of the attributes of interest (homicides, etc) with tmap. Can you make it interactive?

14.3 Spatial autocorrelation

We’ll start by talking about spatial weights. There are two main types of spatial weights, contiguity and distance based weights.

We’ll focus on contiguity today: aka, that a spatial unit shares a border with another spatial unit. We look at rook vs. queen contiguity.

There’s also second order contiguity:

How we define “neighbors” matters when we are trying to determine spatial autocorrelation.

14.4 Do it in R

Load two more libraries:

library(sfExtras)
library(spdep)

We’ll be using the following functions (potentially buggy):

?st_rook
?st_queen
?st_as_nb
?st_centroid_coords # for mapping the weights

The code we wrote last week to calculate weights:

library(tmap)
tm_shape(ncovr) +
  tm_polygons("HR60")

ncovr_rook <- st_rook(ncovr)
## although coordinates are longitude/latitude, st_relate_pattern assumes that they are planar
ncovr_queen <- st_queen(ncovr)
## although coordinates are longitude/latitude, st_relate_pattern assumes that they are planar
# check average number of neighbors per county
rook_neighbors <- lengths(ncovr_rook)
queen_neighbors <- lengths(ncovr_queen)

mean(rook_neighbors) # 5.6 ish
## [1] 5.571475
mean(queen_neighbors) # 5.8 ish
## [1] 5.889141
# convert lists of neighbors to "nb" object to make map
rook_nb <- st_as_nb(ncovr_rook)
summary(rook_nb)
## Neighbour list object:
## Number of regions: 3085 
## Number of nonzero links: 17188 
## Percentage nonzero weights: 0.1805989 
## Average number of links: 5.571475 
## Link number distribution:
## 
##    1    2    3    4    5    6    7    8    9   10   11   13 
##   24   41  108  377  863 1007  484  136   37    6    1    1 
## 24 least connected regions:
## 45 49 585 643 837 1197 1377 1402 1442 1464 1470 1523 1531 1532 1543 1596 1605 1653 1737 1767 1775 2892 2893 2919 with 1 link
## 1 most connected region:
## 606 with 13 links
queen_nb <- st_as_nb(ncovr_queen)
summary(queen_nb)
## Neighbour list object:
## Number of regions: 3085 
## Number of nonzero links: 18168 
## Percentage nonzero weights: 0.190896 
## Average number of links: 5.889141 
## Link number distribution:
## 
##    1    2    3    4    5    6    7    8    9   10   11   13   14 
##   24   36   91  281  620 1037  704  227   50   11    2    1    1 
## 24 least connected regions:
## 45 49 585 643 837 1197 1377 1402 1442 1464 1470 1523 1531 1532 1543 1596 1605 1653 1737 1767 1775 2892 2893 2919 with 1 link
## 1 most connected region:
## 1371 with 14 links
centroid_coords <- st_centroid_coords(ncovr)

plot(queen_nb, centroid_coords, lwd = 0.2, cex = 0.5, col = "blue")

14.5 Second order contiguity weights

14.6 Distance-Band Spatial Weights

To match the GeoDa documentation, we’ll be using a dataset called “clev_pts”. Go ahead and load it with geodaData (if you can’t, you can download it from the Data and Lab website):

library(geodaData)
library(sf)
head(clev_pts)
## Simple feature collection with 6 features and 9 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 2177090 ymin: 662872 xmax: 2182100 ymax: 663462
## epsg (SRID):    3734
## proj4string:    +proj=lcc +lat_1=41.7 +lat_2=40.43333333333333 +lat_0=39.66666666666666 +lon_0=-82.5 +x_0=600000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
##   unique_id     parcel       x      y sale_price tract10int Quarter year1
## 1      1183 002-02-036 2177340 663165     235500     101200       4  2015
## 2      1198 002-02-053 2177090 662872      65000     101200       4  2015
## 3      1516 002-14-053 2182100 663462      92000     103500       4  2015
## 4      1606 002-15-038 2181090 663162       5000     103400       4  2015
## 5      1612 002-15-043 2181090 663380     116250     103400       4  2015
## 6      1624 002-16-003 2180350 663301     120000     103100       4  2015
##   yrquarter               geometry
## 1       154 POINT (2177340 663165)
## 2       154 POINT (2177090 662872)
## 3       154 POINT (2182100 663462)
## 4       154 POINT (2181090 663162)
## 5       154 POINT (2181090 663380)
## 6       154 POINT (2180350 663301)

Question

Take a few minutes and try to understand what this is about.

  • How many observations and variables are there? What data is stored? (dim(), str(), head(), summary())
  • What does the metadata tell you about this data? (?clev_pts)
  • What geometries are in this data? Can you make a quick map with plot()?
  • What coordinate reference system is there? Is this data projected? (st_crs())

Question

Try to make a simple map of one of the attributes of interest (homicides, etc) with tmap. Can you make it interactive?

There are two ways to go about doing distance-based spatial weights, as mentioned in the GeoDa documentation:

  1. Determine a distance band under which something is considered a neighbor (see below)
  2. Determine the number of neighbors each point should have (distances will vary)

We’ll start with the first.

?dnearneigh

Let’s specify a bandwidth of 1000.

dnearneigh(clev_pts, 0, 1000)
## Neighbour list object:
## Number of regions: 205 
## Number of nonzero links: 314 
## Percentage nonzero weights: 0.7471743 
## Average number of links: 1.531707 
## 55 regions with no links:
## 3 16 18 20 22 29 30 36 37 55 56 57 58 59 63 64 92 96 111 112 116 117 118 120 121 131 139 140 141 142 145 148 149 150 151 152 153 154 155 156 157 160 163 164 165 166 167 173 174 187 190 197 198 199 200

Question

What units does “1000” refer to? (Hint: check with st_crs!)

We can plot this to get a sense of what’s going on. Remember, the syntax for plot() is plot(nb, coord_matrix, options).

dist_1000_nb <- dnearneigh(clev_pts, 0, 1000)
plot(dist_1000_nb, st_coordinates(clev_pts), lwd=.2, col="blue", cex = .5)

Notice how a lot of points are missing neighbors? They’re referred to as “isolates”.

Maybe we want to specify a minimum distance so that each point has a neighbor. This is referred to as the critical threshold.

If we want each point to have one neighbor, we can use knearneigh. This (older function) takes in a matrix and returns a matrix, so we need to convert formats.

knn_matrix <- knearneigh(st_coordinates(clev_pts))

To convert this matrix into a familiar nb, we use knn2nb:

knn2nb(knn_matrix)
## Neighbour list object:
## Number of regions: 205 
## Number of nonzero links: 205 
## Percentage nonzero weights: 0.4878049 
## Average number of links: 1 
## Non-symmetric neighbours list

Now when we plot again, we see each point has a neighbor:

k1_nb <- knn2nb(knn_matrix)
plot(k1_nb, st_coordinates(clev_pts), lwd=.2, col="blue", cex = .5)

Each point has exactly one neighbor here.

Question

Can you set it so that each point has 6 neighbors? Plot it!

Now let’s try doing something different: setting the distance band equal to the critical threshold in dnearneigh().

The nbdists function will give us the distance between a point and its closest neighbor.

nbdists(k1_nb, st_coordinates(clev_pts))

Question

Can you find the maximum distance in this list and assign it to a variable called critical_thres? Can you put it into the dnearneigh function and plot a map where we have a fixed distance band of the critical threshold?

14.7 Connectivity histograms

When doing distance band weights, points have different numbers of neighbors, unlike in k-nearest weights. Let’s get the cardinality (aka number of neighbors) for each observation

card(dist_crit_nb)
##   [1]  9  9 15 15 15 17 15 15 19 18 20 20 20 19 14 17 17 19 13  3  9 10  7
##  [24]  6  6 12 10 10 10 13 16 17 12 16 14 17  9 11 13 11 16 14 17 17 18 13
##  [47] 13 17 18 16 16 24 27 22 17  8  7  5  0 11 11 11 11 11 13 13 13 13 13
##  [70] 16 16 14 14 13 13 10 10  9 10 23 25 32 28 28 26 23 23 32 23 23 21 24
##  [93] 18 18 17 21 15 20 20 22 19 22 22 19 16 18 20 19 19 23 19 29 30  1  1
## [116]  1  1  3  5  3  3  4  9  9  9  9 11 10 11 13 11  8  9 12 16 10  9  7
## [139]  6  6  3  6  8  8  7  7  7  4  6  5  5  3  2  2  2  7  8  8 10  8  6
## [162]  3  2  2  3  2  6  7  8 10 14 14 12 17 16 16 20 19 19 13 17 13 15 12
## [185] 11 14  5  8  9  8 10 12  6  5  4  3  4  1  9  4 13 13 15 15 17

Question

What’s the most common number of neighbors? Try making a ggplot histogram out of this, too.