Chapter 12 Local Spatial Autocorrelation 1

Introduction

This notebook covers the functionality of the Local Spatial Autocorrelation section of the GeoDa workbook. We refer to that document for details on the methodology, references, etc. The goal of these notes is to approximate as closely as possible the operations carried out using GeoDa by means of a range of R packages.

The notes are written with R beginners in mind; more seasoned R users can probably skip most of the comments on data structures and other R particulars. Also, as always in R, there are typically several ways to achieve a specific objective, so what is shown here is just one way that works. There are often others that may be more elegant, work faster, or scale better.

For this notebook, we use the guerry data set from the geodaData package. Our goal in this lab is to show how to identify local clusters and spatial outliers by means of local spatial autocorrelation statistics.

12.0.1 Objectives

After completing the notebook, you should know how to carry out the following tasks:

  • Identify clusters with the Local Moran cluster map and significance map

  • Identify clusters with the Local Geary cluster map and significance map

  • Identify clusters with the Getis-Ord Gi and Gi* statistics

  • Identify clusters with the Local Join Count statistic

  • Interpret the spatial footprint of spatial clusters

  • Assess potential interaction effects by means of conditional cluster maps

  • Assess the significance by means of a randomization approach

  • Assess the sensitivity of different significance cut-off values

  • Interpret significance by means of Bonferroni bounds

12.0.1.1 R Packages used

  • spatmap: To construct significance and cluster maps for a variety of local statistics

  • geodaData: To load the data for this notebook

  • tmap: To format the maps made

  • rgeoda: To run local spatial autocorrelation analysis

12.0.1.2 R Commands used

Below follows a list of the commands used in this notebook. For further details and a comprehensive list of options, please consult the R documentation.

  • Base R: install.packages, library, setwd, set.seed, cut, rep

  • tmap: tm_shape, tm_borders, tm_fill, tm_layout, tm_facets

12.1 Preliminaries

Before starting, make sure to have the latest version of R and of packages that are compiled for the matching version of R (this document was created using R 3.5.1 of 2018-07-02). Also, optionally, set a working directory, even though we will not actually be saving any files.

12.1.1 Load packages

First, we load all the required packages using the library command. If you don’t have some of these on your system, make sure to install them first, together with their dependencies. You will get an error message if something is missing. If needed, just install the missing piece and everything will work after that.
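
As a minimal sketch of how to find out which packages are missing before loading them, the snippet below checks the packages named in the library calls of this notebook. The variable names pkgs, is_installed and missing_pkgs are made up for illustration, and the install step is left commented out because it assumes a configured CRAN mirror; a package that is not distributed through CRAN has to be installed from its own source instead.

```r
# Packages loaded in this notebook (taken from its library() calls).
pkgs <- c("sf", "tmap", "rgeoda", "geodaData", "RColorBrewer")

# requireNamespace() returns FALSE instead of stopping when a package is absent.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]
print(missing_pkgs)

# Uncomment to install whatever is missing (assumes a configured CRAN mirror).
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

An empty result means that every package can be loaded with library.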

library(sf)
library(tmap)
library(rgeoda)
library(geodaData)
library(RColorBrewer)

12.1.2 spatmap

The main package used throughout this notebook is rgeoda, which provides the computations for the local spatial statistics; tmap handles the mapping component. All of the visualizations are built in a style similar to GeoDa. They include cluster maps and their associated significance maps. The mapping functions are built on top of tmap, so additional layers such as tm_borders or tm_layout can be added to them.

12.1.3 geodaData

All of the data for the R notebooks is available in the geodaData package. We loaded the library earlier; to access the individual data sets, we use the double colon notation. This works similarly to accessing a variable with $, in that a drop-down menu will appear with a list of the data sets included in the package. For this notebook, we use guerry.

guerry <- geodaData::guerry

12.1.4 Univariate analysis

Throughout the notebook, we will focus on the variable Donatns, which is charitable donations per capita. Before proceeding with the local spatial statistics and visualizations, we will take a preliminary look at the spatial distribution of this variable. This is done with tmap functions. We will not go into too much detail on these, because there is a lot to cover on local spatial statistics and this functionality was covered in a previous notebook. Please see the Basic Mapping notebook for more information on basic tmap functionality.

For the univariate map, we use the natural breaks or jenks style to get a general sense of the spatial distribution for our variable.

tm_shape(guerry) +
  tm_fill("Donatns", style = "jenks", n = 6) +
  tm_borders() +
  tm_layout(legend.outside = TRUE, legend.outside.position = "left")

12.2 Local Moran

12.2.1 Principle

The local Moran statistic was suggested in Anselin (1995) as a way to identify local clusters and local spatial outliers. Most global spatial autocorrelation statistics can be expressed as a double sum over the i and j indices, such as \(\Sigma_i\Sigma_jg_{ij}\). The local form of such a statistic would then be, for each observation (location) i, the sum of the relevant expression over the j index, \(\Sigma_jg_{ij}\).

Specifically, the local Moran statistic takes the form \(cz_i\Sigma_jw_{ij}z_j\), with z in deviations from the mean. The scalar c is the same for all locations and therefore does not play a role in the assessment of significance. The latter is obtained by means of a conditional permutation method, where, in turn, each \(z_i\) is held fixed, and the remaining z-values are randomly permuted to yield a reference distribution for the statistic. This operates in the same fashion as for the global Moran’s I, except that the permutation is carried out for each observation in turn. The result is a pseudo p-value for each location, which can then be used to assess significance. Note that this notion of significance is not the standard one, and should not be interpreted that way (see the discussion of multiple comparisons below).
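
To make the conditional permutation idea concrete, here is a minimal base R sketch on a made-up data set of six locations on a line. It is not the rgeoda implementation. The toy values, the choice of the constant c as n divided by the sum of squared deviations, and the folding of the pseudo p-value to the smaller tail are all assumptions made for illustration.

```r
set.seed(123)  # the permutations are random; fix the seed for reproducibility

# Hypothetical toy data: six locations on a line, each linked to its neighbors.
x <- c(10, 12, 11, 3, 2, 4)
n <- length(x)
W <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}
W <- W / rowSums(W)                # row-standardize the weights

z <- x - mean(x)                   # deviations from the mean
c_scalar <- n / sum(z^2)           # assumed choice for the constant c
local_i <- c_scalar * z * as.vector(W %*% z)

permutations <- 999
p_value <- numeric(n)
for (i in seq_len(n)) {
  nbrs <- which(W[i, ] > 0)
  others <- z[-i]                  # z_i is held fixed, the rest are reshuffled
  ref <- replicate(permutations, {
    draw <- sample(others, length(nbrs))   # random values at the neighbor slots
    c_scalar * z[i] * sum(W[i, nbrs] * draw)
  })
  # Pseudo p-value: count reference values at least as large as the observed
  # statistic, fold to the smaller tail, and add one for the observed value.
  larger <- sum(ref >= local_i[i])
  extreme <- min(larger, permutations - larger)
  p_value[i] <- (extreme + 1) / (permutations + 1)
}

print(data.frame(x, local_i, p_value))
```

With 999 permutations the smallest pseudo p-value that can be obtained is 1/1000, which is why the number of permutations limits the significance levels that can be reported.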

Assessing significance in and of itself is not that useful for the Local Moran. However, when an indication of significance is combined with the location of each observation in the Moran Scatterplot, a very powerful interpretation becomes possible. The combined information allows for a classification of the significant locations as high-high and low-low spatial clusters, and high-low and low-high spatial outliers. It is important to keep in mind that the reference to high and low is relative to the mean of the variable, and should not be interpreted in an absolute sense.
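
The classification described above can be sketched in a few lines of base R. The vectors z, lag_z and p below are hypothetical values standing in for the mean-centered variable, its spatial lag and the pseudo p-values; they are not taken from the guerry data.

```r
# Hypothetical inputs: mean-centered values, their spatial lags, pseudo p-values.
z     <- c( 3.0,  5.0,  4.0, -4.0, -5.0, -3.0,  2.0)
lag_z <- c( 5.0,  3.5,  0.5, -0.5, -3.5, -5.0, -4.0)
p     <- c(0.01, 0.02, 0.40, 0.30, 0.03, 0.01, 0.04)
alpha <- 0.05

# Quadrant of the Moran scatterplot: the sign of z against the sign of its lag.
quadrant <- ifelse(z >= 0 & lag_z >= 0, "High-High",
            ifelse(z <  0 & lag_z <  0, "Low-Low",
            ifelse(z >= 0 & lag_z <  0, "High-Low", "Low-High")))

# Only locations that pass the significance cut-off keep their quadrant label.
cluster <- ifelse(p <= alpha, quadrant, "Not significant")
print(data.frame(z, lag_z, p, cluster))
```

Because z is measured relative to the mean, a "High-High" location is high only in comparison with the other observations in the data set.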

12.2.2 Implementation

With the function local_moran from rgeoda, we can compute the local Moran statistics that underlie a local Moran cluster map. The arguments needed are a spatial weights object and the variable of interest, passed as a column of the sf data frame, which is guerry in our case.

Some helper functions that create maps based on the statistical results of rgeoda:

match_palette <- function(patterns, classifications, colors){
  classes_present <- base::unique(patterns)
  mat <- matrix(c(classifications,colors), ncol = 2)
  logi <- classifications %in% classes_present
  pre_col <- matrix(mat[logi], ncol = 2)
  pal <- pre_col[,2]
  return(pal)
}

lisa_map <- function(df, lisa, alpha = .05) {
  # cluster code for each location at the given cut-off (0 = not significant)
  clusters <- lisa_clusters(lisa, cutoff = alpha)
  labels <- lisa_labels(lisa)
  colors <- lisa_colors(lisa)
  # cluster codes start at 0, R vectors are indexed from 1
  lisa_patterns <- labels[clusters + 1]

  # keep only the colors and labels of the categories present in the map
  pal <- match_palette(lisa_patterns, labels, colors)
  labels <- labels[labels %in% lisa_patterns]

  df["lisa_clusters"] <- clusters
  tm_shape(df) +
    tm_fill("lisa_clusters", labels = labels, palette = pal, style = "cat")
}

significance_map <- function(df, lisa, permutations = 999, alpha = .05) {
  pvalue <- lisa_pvalues(lisa)
  target_p <- 1 / (1 + permutations)
  potential_brks <- c(.00001, .0001, .001, .01)
  brks <- potential_brks[which(potential_brks > target_p & potential_brks < alpha)]
  brks2 <- c(target_p, brks, alpha)
  labels <- c(as.character(brks2), "Not Significant")
  brks3 <- c(0, brks2, 1)
  
  cuts <- cut(pvalue, breaks = brks3,labels = labels)
  df["sig"] <- cuts
  
  pal <- rev(brewer.pal(length(labels), "Greens"))
  pal[length(pal)] <- "#D3D3D3"
  
  tm_shape(df) +
    tm_fill("sig", palette = pal)
}
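
The way significance_map bins the pseudo p-values can be seen in isolation with a few made-up values. The sketch below repeats only the break logic of the function, using base R; the vector p is hypothetical.

```r
permutations <- 999
alpha <- 0.05

target_p <- 1 / (1 + permutations)          # smallest attainable pseudo p-value
candidate_brks <- c(0.00001, 0.0001, 0.001, 0.01)
# keep only the candidate breaks strictly between target_p and alpha
brks <- candidate_brks[candidate_brks > target_p & candidate_brks < alpha]
upper <- c(target_p, brks, alpha)           # upper bounds of the significant bins
labels <- c(as.character(upper), "Not Significant")

p <- c(0.001, 0.004, 0.03, 0.2, 0.05)       # hypothetical pseudo p-values
bins <- cut(p, breaks = c(0, upper, 1), labels = labels)
print(data.frame(p, bins))
```

Each label is the upper bound of its bin, so a location labeled 0.01 has a pseudo p-value above 0.001 and at most 0.01.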

It is important to note the default parameters of local_moran. These include permutations = 999 and significance_cutoff = .05. Permutations is the number of permutations used in computing the reference distribution of the local statistic for each location. Significance_cutoff, or alpha, is the cut-off significance level. The spatial weights used for the computation of the local statistics are passed as the first argument; here we construct first-order queen contiguity weights with queen_weights.

w <- queen_weights(guerry)
lisa <- local_moran(w, guerry['Donatns'])
lisa_map(guerry, lisa)

To get a significance map for the local Moran, we use significance_map. The default number of permutations is 999 and the default alpha level is .05.

significance_map(guerry, lisa) 
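
One way to get a feel for the sensitivity of the results to the significance cut-off, without drawing a map, is to tabulate the cluster categories at several cut-off values. The sketch below recomputes the local Moran from scratch so that it runs on its own; the three cut-off values are chosen only for illustration.

```r
library(sf)
library(rgeoda)
library(geodaData)

guerry <- geodaData::guerry
w <- queen_weights(guerry)
lisa <- local_moran(w, guerry["Donatns"])

# Count the locations in each cluster category at several cut-off values.
labels <- lisa_labels(lisa)
for (alpha in c(0.05, 0.01, 0.001)) {
  clusters <- lisa_clusters(lisa, cutoff = alpha)
  cat("cutoff =", alpha, "\n")
  # cluster codes start at 0, so add 1 to index the label vector
  print(table(factor(labels[clusters + 1], levels = labels)))
}
```

As the cut-off becomes stricter, locations move from the cluster and outlier categories into the not significant category.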

12.2.2.1 tmap additions

With the mapping function lisa_map, additional tmap layers can be added with the + operator. This gives the maps strong formatting options. With tm_borders, we can make the borders of the local Moran map more distinct. With tm_layout, we can add a title and move the legend to the outside of the map. There are many more formatting options, including tmap_arrange, which we used earlier.

lisa_map(guerry, lisa) +
  tm_borders() +
  tm_layout(title = "Local Moran Cluster Map of Donatns", legend.outside = TRUE)

We can set the tmap mode to "view" with tmap_mode to get an interactive base map.

tmap_mode("view")
lisa_map(guerry, lisa) +
  tm_borders() +
  tm_layout(title = "Local Moran Cluster Map of Donatns",legend.outside = TRUE)