Make sure to move to the proper working directory. We continue to work with the NYC_Sub_borough_Area data.
We will be using four packages: tidyverse
, sf
, tmap
, and RColorBrewer
. You may need to install some of these if you don’t have them yet.
library(tidyverse)
[30m── [1mAttaching packages[22m ────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──[39m
[30m[32m✔[30m [34mggplot2[30m 3.0.0 [32m✔[30m [34mpurrr [30m 0.2.5
[32m✔[30m [34mtibble [30m 1.4.2 [32m✔[30m [34mdplyr [30m 0.7.6
[32m✔[30m [34mtidyr [30m 0.8.1 [32m✔[30m [34mstringr[30m 1.3.1
[32m✔[30m [34mreadr [30m 1.1.1 [32m✔[30m [34mforcats[30m 0.3.0[39m
[30m── [1mConflicts[22m ───────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[30m [34mdplyr[30m::[32mfilter()[30m masks [34mstats[30m::filter()
[31m✖[30m [34mdplyr[30m::[32mlag()[30m masks [34mstats[30m::lag()[39m
library(sf)
Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(tmap)
library(RColorBrewer)
Spatial data are characterized by the combination of locational information (the precise definition of points, lines or areas) and so-called attribute information (variables). So far, we have only worked with attribute information, stored in the file NYC_Sub_borough_Area.dbf and turned into a data frame (or tibble). In order to make maps and carry out spatial analysis we need the locational information as well. There are many formats to store this information. To keep things simple we will be using a so-called shape file, a standard supported by ESRI, one of the major commercial GIS vendors.
It is a bit confusing, since there is no such thing as one shape file, but there is instead a collection of three (or four) files. One file has the extension .shp, one .shx, one .dbf, and one .prj (with the projection information). The first three are required, the fourth one is optional, but highly recommended. The files should all be in the same directory and have the same file name.
There are a number of different ways to read shape files, but we will use the one associated with the package sf
. The command is st_read
and you pass the file name for the .shp file (with the shp file extension). In our example, that would be st_read("NYC_Sub_borough_Area.shp")
. We assign this to the nyc data frame.
nyc <- st_read("Nyc_Sub_borough_Area.shp")
Reading layer `NYC_Sub_borough_Area' from data source `/Users/luc/Dropbox/z_Courses/uc_spatial_analysis_1_2018/R_stuff/NYC_Sub_borough_Area.shp' using driver `ESRI Shapefile'
Simple feature collection with 55 features and 34 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 913037.2 ymin: 120117 xmax: 1067549 ymax: 272751.4
epsg (SRID): NA
proj4string: +proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000.0000000001 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
Upon successful completion, some summary characteristics of the layer are listed that describe its spatial characteristics. When you compute a summary
of the contents of the data frame, you will notice that in addition to the by now familiar variable names, there is also a column named geometry. This contains the spatial information. You will be exploring this further next quarter, but for now it suffices to know that the attributes are combined with the spatial properties.
summary(nyc)
bor_subb name code subborough forhis06 forhis07 forhis08 forhis09
Min. :101.0 Astoria : 1 Min. :101.0 Astoria : 1 Min. :10.70 Min. :10.37 Min. : 9.689 Min. :14.66
1st Qu.:204.5 Bay Ridge : 1 1st Qu.:204.5 Bay Ridge : 1 1st Qu.:29.61 1st Qu.:28.92 1st Qu.:28.182 1st Qu.:28.28
Median :218.0 Bayside / Little Neck: 1 Median :218.0 Bayside/Little Neck: 1 Median :39.72 Median :39.21 Median :39.567 Median :35.99
Mean :274.4 Bedford Stuyvesant : 1 Mean :274.4 Bedford Stuyvesant : 1 Mean :39.22 Mean :38.33 Mean :37.764 Mean :37.46
3rd Qu.:403.5 Bellerose / Rosedale : 1 3rd Qu.:403.5 Bensonhurst : 1 3rd Qu.:44.61 3rd Qu.:45.30 3rd Qu.:45.496 3rd Qu.:45.14
Max. :503.0 Bensonhurst : 1 Max. :503.0 Borough Park : 1 Max. :69.52 Max. :69.40 Max. :69.341 Max. :69.16
(Other) :49 (Other) :49
forwh06 forwh07 forwh08 forwh09 hhsiz1990 hhsiz00 hhsiz02 hhsiz05 hhsiz08
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. :1.560 Min. :1.574 Min. :1.532 Min. :1.544 Min. :1.544
1st Qu.:14.26 1st Qu.:14.56 1st Qu.:14.30 1st Qu.:13.57 1st Qu.:2.428 1st Qu.:2.445 1st Qu.:2.234 1st Qu.:2.271 1st Qu.:2.203
Median :21.18 Median :21.10 Median :21.39 Median :19.69 Median :2.716 Median :2.718 Median :2.568 Median :2.567 Median :2.488
Mean :24.36 Mean :23.54 Mean :22.67 Mean :21.98 Mean :2.635 Mean :2.639 Mean :2.505 Mean :2.471 Mean :2.424
3rd Qu.:31.53 3rd Qu.:29.78 3rd Qu.:31.38 3rd Qu.:27.98 3rd Qu.:2.937 3rd Qu.:2.929 3rd Qu.:2.812 3rd Qu.:2.682 3rd Qu.:2.665
Max. :60.36 Max. :59.64 Max. :59.11 Max. :56.89 Max. :3.376 Max. :3.202 Max. :3.087 Max. :3.106 Max. :3.222
kids2000 kids2005 kids2006 kids2007 kids2008 kids2009 rent2002 rent2005 rent2008
Min. : 8.382 Min. : 8.708 Min. : 8.683 Min. : 8.067 Min. : 8.004 Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0
1st Qu.:30.301 1st Qu.:27.871 1st Qu.:26.215 1st Qu.:27.366 1st Qu.:27.843 1st Qu.:26.69 1st Qu.: 750.0 1st Qu.: 897.5 1st Qu.:1000
Median :38.228 Median :35.763 Median :36.524 Median :35.414 Median :33.883 Median :33.53 Median : 800.0 Median : 990.0 Median :1100
Mean :36.040 Mean :34.353 Mean :33.847 Mean :33.801 Mean :33.212 Mean :32.07 Mean : 944.4 Mean :1045.9 Mean :1257
3rd Qu.:42.773 3rd Qu.:42.152 3rd Qu.:41.174 3rd Qu.:40.290 3rd Qu.:40.610 3rd Qu.:39.68 3rd Qu.:1000.0 3rd Qu.:1100.0 3rd Qu.:1362
Max. :55.367 Max. :50.872 Max. :51.911 Max. :51.502 Max. :49.716 Max. :48.13 Max. :2500.0 Max. :2500.0 Max. :2900
rentpct02 rentpct05 rentpct08 pubast90 pubast00 yrhom02 yrhom05 yrhom08
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. :23.89 Min. : 0.8981 Min. : 8.216 Min. : 8.583 Min. : 8.278
1st Qu.:13.27 1st Qu.:13.04 1st Qu.:15.48 1st Qu.:62.38 1st Qu.: 3.3860 1st Qu.:11.264 1st Qu.:11.268 1st Qu.:10.846
Median :21.11 Median :22.59 Median :23.70 Median :75.44 Median : 6.8781 Median :12.391 Median :12.217 Median :12.274
Mean :22.01 Mean :22.08 Mean :23.41 Mean :71.53 Mean : 8.4372 Mean :12.251 Mean :12.273 Mean :12.058
3rd Qu.:29.49 3rd Qu.:30.51 3rd Qu.:31.50 3rd Qu.:84.59 3rd Qu.:11.5369 3rd Qu.:13.160 3rd Qu.:13.288 3rd Qu.:13.001
Max. :43.38 Max. :40.88 Max. :47.38 Max. :96.65 Max. :23.4318 Max. :16.124 Max. :16.627 Max. :15.425
geometry
MULTIPOLYGON :55
epsg:NA : 0
+proj=lcc ...: 0
Checking the class
of the nyc data frame, we see that it is of type sf, which stands for simple features, the data structure used to deal with spatial information.
class(nyc)
[1] "sf" "data.frame"
We can also make a quick plot
, but this is not exactly what we want. A small map is created for each of the variables. It just confirms that we indeed have both the attributes and the spatial information.
plot(nyc)
plotting the first 9 out of 34 attributes; use max.plot = 34 to plot all
We will again need variables to do some conditioning. If you did not save these with the dbf file, we can quickly re-create them and add them to the nyc data frame.
The manbronx variable:
nyc <- nyc %>% mutate(manbronx = if_else((code > 300 & code < 311) | (code > 100 & code < 111),"Select","Rest"))
The cut-off by household size in 2000:
nyc$cut.hhsiz <- cut_number(nyc$hhsiz00,n=2)
nyc$cut.hhsiz
[1] (2.72,3.2] [1.57,2.72] (2.72,3.2] [1.57,2.72] [1.57,2.72] (2.72,3.2] (2.72,3.2] [1.57,2.72] [1.57,2.72] [1.57,2.72] [1.57,2.72] (2.72,3.2]
[13] (2.72,3.2] [1.57,2.72] (2.72,3.2] (2.72,3.2] [1.57,2.72] [1.57,2.72] [1.57,2.72] [1.57,2.72] [1.57,2.72] [1.57,2.72] [1.57,2.72] [1.57,2.72]
[25] [1.57,2.72] [1.57,2.72] (2.72,3.2] (2.72,3.2] (2.72,3.2] (2.72,3.2] (2.72,3.2] (2.72,3.2] [1.57,2.72] (2.72,3.2] [1.57,2.72] [1.57,2.72]
[37] (2.72,3.2] (2.72,3.2] (2.72,3.2] (2.72,3.2] [1.57,2.72] [1.57,2.72] [1.57,2.72] (2.72,3.2] (2.72,3.2] (2.72,3.2] (2.72,3.2] (2.72,3.2]
[49] (2.72,3.2] [1.57,2.72] (2.72,3.2] (2.72,3.2] [1.57,2.72] [1.57,2.72] [1.57,2.72]
Levels: [1.57,2.72] (2.72,3.2]
A final check on all the variables using summary
:
summary(nyc)
bor_subb name code subborough forhis06 forhis07 forhis08 forhis09
Min. :101.0 Astoria : 1 Min. :101.0 Astoria : 1 Min. :10.70 Min. :10.37 Min. : 9.689 Min. :14.66
1st Qu.:204.5 Bay Ridge : 1 1st Qu.:204.5 Bay Ridge : 1 1st Qu.:29.61 1st Qu.:28.92 1st Qu.:28.182 1st Qu.:28.28
Median :218.0 Bayside / Little Neck: 1 Median :218.0 Bayside/Little Neck: 1 Median :39.72 Median :39.21 Median :39.567 Median :35.99
Mean :274.4 Bedford Stuyvesant : 1 Mean :274.4 Bedford Stuyvesant : 1 Mean :39.22 Mean :38.33 Mean :37.764 Mean :37.46
3rd Qu.:403.5 Bellerose / Rosedale : 1 3rd Qu.:403.5 Bensonhurst : 1 3rd Qu.:44.61 3rd Qu.:45.30 3rd Qu.:45.496 3rd Qu.:45.14
Max. :503.0 Bensonhurst : 1 Max. :503.0 Borough Park : 1 Max. :69.52 Max. :69.40 Max. :69.341 Max. :69.16
(Other) :49 (Other) :49
forwh06 forwh07 forwh08 forwh09 hhsiz1990 hhsiz00 hhsiz02 hhsiz05 hhsiz08
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. :1.560 Min. :1.574 Min. :1.532 Min. :1.544 Min. :1.544
1st Qu.:14.26 1st Qu.:14.56 1st Qu.:14.30 1st Qu.:13.57 1st Qu.:2.428 1st Qu.:2.445 1st Qu.:2.234 1st Qu.:2.271 1st Qu.:2.203
Median :21.18 Median :21.10 Median :21.39 Median :19.69 Median :2.716 Median :2.718 Median :2.568 Median :2.567 Median :2.488
Mean :24.36 Mean :23.54 Mean :22.67 Mean :21.98 Mean :2.635 Mean :2.639 Mean :2.505 Mean :2.471 Mean :2.424
3rd Qu.:31.53 3rd Qu.:29.78 3rd Qu.:31.38 3rd Qu.:27.98 3rd Qu.:2.937 3rd Qu.:2.929 3rd Qu.:2.812 3rd Qu.:2.682 3rd Qu.:2.665
Max. :60.36 Max. :59.64 Max. :59.11 Max. :56.89 Max. :3.376 Max. :3.202 Max. :3.087 Max. :3.106 Max. :3.222
kids2000 kids2005 kids2006 kids2007 kids2008 kids2009 rent2002 rent2005 rent2008
Min. : 8.382 Min. : 8.708 Min. : 8.683 Min. : 8.067 Min. : 8.004 Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0
1st Qu.:30.301 1st Qu.:27.871 1st Qu.:26.215 1st Qu.:27.366 1st Qu.:27.843 1st Qu.:26.69 1st Qu.: 750.0 1st Qu.: 897.5 1st Qu.:1000
Median :38.228 Median :35.763 Median :36.524 Median :35.414 Median :33.883 Median :33.53 Median : 800.0 Median : 990.0 Median :1100
Mean :36.040 Mean :34.353 Mean :33.847 Mean :33.801 Mean :33.212 Mean :32.07 Mean : 944.4 Mean :1045.9 Mean :1257
3rd Qu.:42.773 3rd Qu.:42.152 3rd Qu.:41.174 3rd Qu.:40.290 3rd Qu.:40.610 3rd Qu.:39.68 3rd Qu.:1000.0 3rd Qu.:1100.0 3rd Qu.:1362
Max. :55.367 Max. :50.872 Max. :51.911 Max. :51.502 Max. :49.716 Max. :48.13 Max. :2500.0 Max. :2500.0 Max. :2900
rentpct02 rentpct05 rentpct08 pubast90 pubast00 yrhom02 yrhom05 yrhom08
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. :23.89 Min. : 0.8981 Min. : 8.216 Min. : 8.583 Min. : 8.278
1st Qu.:13.27 1st Qu.:13.04 1st Qu.:15.48 1st Qu.:62.38 1st Qu.: 3.3860 1st Qu.:11.264 1st Qu.:11.268 1st Qu.:10.846
Median :21.11 Median :22.59 Median :23.70 Median :75.44 Median : 6.8781 Median :12.391 Median :12.217 Median :12.274
Mean :22.01 Mean :22.08 Mean :23.41 Mean :71.53 Mean : 8.4372 Mean :12.251 Mean :12.273 Mean :12.058
3rd Qu.:29.49 3rd Qu.:30.51 3rd Qu.:31.50 3rd Qu.:84.59 3rd Qu.:11.5369 3rd Qu.:13.160 3rd Qu.:13.288 3rd Qu.:13.001
Max. :43.38 Max. :40.88 Max. :47.38 Max. :96.65 Max. :23.4318 Max. :16.124 Max. :16.627 Max. :15.425
manbronx geometry cut.hhsiz
Length:55 MULTIPOLYGON :55 [1.57,2.72]:28
Class :character epsg:NA : 0 (2.72,3.2] :27
Mode :character +proj=lcc ...: 0
If you want to keep the new variables for future use, you need to write them out as a new shape file. Without further details, the command is st_write
. You must specify the data frame, a name for the shape file (without the file extension) and the driver
, e.g. st_write(nyc,"new_nyc",driver="ESRI Shapefile")
. This will create a folder containing the four new files.
st_write(nyc,"new_nyc",driver="ESRI Shapefile")
We now turn to the choropleth mapping functionality included in the tmap package. This package is extremely powerful, and there are many features that we do not cover. Detailed information on tmap functionality can be found at tmap documentation. In addition, several extensive examples are given in the chapter on mapping in Lovelace, Nowosad, and Muenchow’s Geocomputation with R book.
The package tmap uses the same layered logic for graphics as ggplot. The initial command is tm_shape
, which specifies the geography to which the mapping is applied. This is followed by a number of tm_*
options that select the type of map and several optional customizations.
We start with a basic choropleth map of the rent2008 variable.
In tmap there are two ways to create a choropleth map. The simplest one is to use the tm_polygons
command with the variable name as the argument (under the hood, this is the value for the col parameter, i.e., the color of the polygons).
In our example, the map is created with two commands, tm_shape(nyc)
and tm_polygons("rent2008")
(note the quotes in “rent2008”, unlike what we did for ggplot).
The color coding corresponds to style = "pretty"
, which is the default when the classification breaks are not set explicitly, and the default values are used (see the discussion of map classifications below).
tm_shape(nyc) +
tm_polygons("rent2008")
As it turns out, the tm_polygons
command is a wrapper around two other functions, tm_fill
and tm_borders
. tm_fill
controls the contents of the polygons (color, classification, etc.), while tm_borders
does the same for the polygon outlines.
For example, using the same shape (but no variable), whe obtain the outlines of the sub-boroughs from the tm_borders( )
command.
tm_shape(nyc) +
tm_borders()
Similarly, we obtain a choropleth map without the polygon outlines when we just use the tm_fill("rent2008")
command.
tm_shape(nyc) +
tm_fill("rent2008")
When we combine the two commands, we obtain the same map as with tm_polygons
(this illustrates how in R one can often obtain the same result in a number of different ways).
tm_shape(nyc) +
tm_fill("rent2008") +
tm_borders()
An extensive set of options is available to customize the appearance of the map. A full list is given in the documentation page for tm_fill
. In what follows, we briefly consider the most common ones.
The range of colors used to depict the spatial distribution of a variable is determined by the palette. The palette is an argument to the tm_fill
function. Several built-in palettes are contained in tmap. For example, using palette = "Reds"
(again, note the quotes) would yield the following map for our example.
tm_shape(nyc) +
tm_fill("rent2008",palette="Reds") +
tm_borders()
Under the hood, “Reds” refers to one of the color schemes supported by the RColorBrewer package, discussed next.
The preferred approach to select a color palette is to choose one of the schemes contained in the RColorBrewer package. These are based on the research of cartographer Cynthia Brewer (see the colorbrewer2 web site for details). ColorBrewer makes a distinction between sequential scales (for a scale that goes from low to high), diverging scales (to highlight how values differ from a central tendency), and qualitative scales (for categorical variables). For each scale, a series of single hue and multi-hue scales are suggested. In the RColorBrewer package, these are referred to by a name (e.g., the “Reds” palette we used above is an example). The full list is contained in the RColorBrewer documentation.
There are two very useful commands in this package. One sets a color palette by specifying its name and the number of desired categories. The result is a character vector with the hex codes of the corresponding colors.
For example, we select a sequential color scheme going from blue to green, as BuGn, by means of the command brewer.pal
, with the number of categories (6) and the scheme as arguments. The resulting vector contains the HEX codes for the colors.
pal <- brewer.pal(6,"BuGn")
pal
[1] "#EDF8FB" "#CCECE6" "#99D8C9" "#66C2A4" "#2CA25F" "#006D2C"
Using this palette in our map yields the following result.
tm_shape(nyc) +
tm_fill("rent2008",palette=pal) +
tm_borders()
The command display.brewer.pal
allows us to explore different color schemes before applying them to a map. Continuing with our “BuGn” example, that would be invoked with display.brewer.pal(6,"BuGn")
.
display.brewer.pal(6,"BuGn")
There are many options to change the formatting of the legend entries through the legend.format argument. We refer to the tm_fill
documentation for specific details.
Often, the automatic title for the legend is not that attractive, since it is simply the variable name. This can be customized by setting the title argument to tm_fill
. For example, keeping all the other settings to the default, we change the legend title to Rent in 2008 by specifying title="Rent in 2008"
as an option to tm_fill
.
tm_shape(nyc) +
tm_fill("rent2008",title="Rent in 2008") +
tm_borders()
Another important aspect of the legend is its positioning. This is handled through the tm_layout
function. This function has a vast number of options, as detailed in the documentation. There are also specialized subsets of layout functions, focused on specific aspects of the map, such as tm_legend
, tm_style
and tm_format
. We illustrate the positioning of the legend.
The default is to position the legend inside the map. Often, this default solution is appropriate, but sometimes further control is needed. The legend.position argument to the tm_layout
function takes a vector of two string variables that determine both the horizontal position (“left”, “right”, or “center”) and the vertical position (“top”, “bottom”, or “center”).
For example, if we would want to move the legend to the lower-right position (clearly inferior to the default solution), we would specify the legend.position = c("right","bottom")
to the tm_layout
function.
tm_shape(nyc) +
tm_fill("rent2008",title="Rent in 2008") +
tm_borders() +
tm_layout(legend.position = c("right", "bottom"))
There is also the option to position the legend outside the frame of the map. This is accomplished by setting legend.outside to TRUE (the default is FALSE), and optionally also specify its position by means of legend.outside.position. The latter can take the values “top”, “bottom”, “right”, and “left”.
For example, to position the legend outside and on the right, would be accomplished by passing the option legend.outside = TRUE, legend.outside.position = "right"
to the tm_layout
function.
tm_shape(nyc) +
tm_fill("rent2008",title="Rent in 2008") +
tm_borders() +
tm_layout(legend.outside = TRUE, legend.outside.position = "right")
We can also customize the size of the legend, its alignment, font, etc. We refer to the documentation for specifics.
A final interesting option for the legend is to match it with a histogram that visualizes the number of observations in each map category. This is accomplished by setting legend.hist=TRUE
in the tm_fill
command. Further customization is possible, but is not covered here.
tm_shape(nyc) +
tm_fill("rent2008",title="Rent in 2008",legend.hist=TRUE) +
tm_borders() +
tm_layout(legend.outside = TRUE, legend.outside.position = "right")
Another functionality of the tm_layout
function is to set a title for the map, and specify its position, size, etc. For example, we can set title = "Rent 2008 NYC Sub-Boroughs", title.size = 0.8, title.position = c("right","bottom")
(for details, see the tm_layout
documentation). We made the font size a bit smaller (0.8) in order not to overwhelm the map, and positioned it in the bottom right-hand corner.
tm_shape(nyc) +
tm_fill("rent2008",title="Rent in 2008") +
tm_borders() +
tm_layout(title = "Rent 2008 NYC Sub-Boroughs", title.size = 0.8, title.position = c("right","bottom"))
Finally, in order to have a title appear on top (or on the bottom) of the map, rather than inside (the default), we need to set the main.title argument of the tm_layout
function. For example, main.title = "Rent 2008 NYC Sub-Boroughs", title.size = 1.5,main.title.position="center"
has the title centered on top and with the size set to 1.5 to have a larger font).
tm_shape(nyc) +
tm_fill("rent2008",title="Rent in 2008") +
tm_borders() +
tm_layout(main.title = "Rent 2008 NYC Sub-Boroughs", title.size = 1.5,main.title.position="center")
The default mode in tmap is plot, i.e., the standard plotting environment in R to draw a map on the screen or to a device (e.g., a pdf file). There is also an interactive mode, which builds upon leaflet to add a basemap and interact with the map through zooming and identification of individual observations. The interactive mode is referred to as view. We switch between modes by means of the tmap_mode
command. For example, to switch to the interactive mode, we set tmap_mode("view")
.
tmap_mode("view")
tmap mode set to interactive viewing
There are a number of different ways in which a basemap can be added. The current preferred approach is through the tm_basemap
command. This takes two important arguments. One is the name of the server. A number of basemaps are supported (and no doubt, that number will increase over time), but we will illustrate the OpenStreetMap option. A second argument is the degree of transparency, or alpha level. This can also be set for the map itself through tm_fill
. In practice, one typically needs to experiment a bit to find the right balance between the information in the choropleth map and the background.
In the example below, we set the alpha level for the main map to 0.7 in tm_fill
, and for the base layer to 0.5 in tm_basemap
, as tm_basemap(server="OpenStreetMap",alpha=0.5)
.
When we move the pointer over the polygons, their ID value is shown. A click on a location also gives the value for the variable that is being mapped. Zooming and panning are supported as well. In addition, it is possible to stack several layers, but we won’t pursue that here.
tm_shape(nyc) +
tm_fill("rent2008",title="Rent in 2008",alpha=0.7) +
tm_borders() +
tm_basemap(server="OpenStreetMap",alpha=0.5)
Before we proceed, we turn the mode back to plot.
tmap_mode("plot")
tmap mode set to plotting
The statistical aspects of a choropleth map are expressed through the map classification that is used to convert observations for a continuous variable into a small number of bins, similar to what is the case for a histogram. Different classifications reveal different aspects of the spatial distribution of the variable.
In tmap, the classification is selected by means of the style option in tm_fill
. So far, we have used the default, which is set to pretty. The latter is a base R function to compute a sequence of roughly equally spaced values.
There are many classification types available in tmap, which each correspond to either a base R function, or to a custom expression contained in the classIntervals functionality.
In this section, we review the quantile map, natural breaks map, and equal intervals map, the most commonly used map classifications. We continue to use the same example for rent2008.
A quantile map shows the spatial distribution of a variable by classifying the observations into categories according to the quantile to which they below. For example, for a quintile map (with 5 categories), the observations are sorted, and the first 20% are given the color for the first quintile, the second for the second quintile, etc.
A quantile map is obtained by setting style="quantile"
in the tm_fill
command. The number of categories is taken from the n argument, which has a default value of 5. So, for example, using this default will yield a quintile map with five categories (we specify the type of map in the title through tm_layout
), e.g., title= "Quintile Map"
, with title.size = 0.9
for somewhat better proportions.
tm_shape(nyc) +
tm_fill("rent2008",title="Rent in 2008",style="quantile") +
tm_borders() +
tm_layout(title = "Quintile Map", title.size = 0.9, title.position = c("right","bottom"))
For a quartile map, with four categories, we need to specify the argument n=4
explicitly (five is the default, so we did not have to specify that in the quintile map). The rest of the commands are the same as before (but now with title = "Quartile Map"
.
tm_shape(nyc) +
tm_fill("rent2008",title="Rent in 2008",n=4,style="quantile") +
tm_borders() +
tm_layout(title = "Quartile Map", title.size = 0.9, title.position = c("right","bottom"))
An equal intervals map is the counterpart of a histogram. You specify the number of bins that each cover the same range from beginning to end for the variable, hence the name equal intervals. For example, we can check the range for rent2008 using the summary
command (remember the data frame and the $ symbol).
summary(nyc$rent2008)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 1000 1100 1257 1362 2900
The minimum is 0 (which is suspicious, but we ignore that aspect for now) and the maximum is 2900. If we wanted 4 intervals, each would cover a range of 2900/4 = 725. So, the first interval would be from 0 to 725, the second from 725 to 1450, etc.
Again, we specify this type of classification by setting the proper style as style = "equal"
, and by setting the number of bins as n = 4
. We adjust the title to give title="Equal Intervals Map"
. Note the difference with the quantile map.
tm_shape(nyc) +
tm_fill("rent2008",title="Rent in 2008",n=4,style="equal") +
tm_borders() +
tm_layout(title = "Equal Intervals Map", title.size = 0.9, title.position = c("right","bottom"))
A natural breaks, or Jenks classification is based on an algorithm that creates groups of observations such that they are most similar within the group, and dissimilar between groups. This is a clustering logic for the variable we want to map. The algorithm is fairly complex, but it does result in (mostly) intuitive breaks in the data.
A natural breaks map is obtained by specifying the style = “jenks” in tm_fill
. All the other options are as before. We illustrate this for six categories, with n=6
, but using the standard palette (feel free to customize). We set the title = "Natural Breaks Map"
.
Note how the classification nicely separates the observations with rent zero, as well as the highest rent neighborhoods in Manhattan.
tm_shape(nyc) +
tm_fill("rent2008",title="Rent in 2008",n=6,style="jenks") +
tm_borders() +
tm_layout(title = "Natural Breaks Map", title.size = 0.9, title.position = c("right","bottom"))
For all the built-in styles, the category breaks are computed internally. In most cases this is fine, but what if we want to compare the rent over different years? The classifications are all relative to the values observed in that year. For example, for a quartile map, we could track whether an observations (neighborhood) moves from one quartile to another, but that doesn’t tell us anything about the actual rent. It would be more informative to set fixed break points for all the years to illustrate how the rent changes over time.
In order to override the standard defaults, the breakpoints can be set explicitly by means of the breaks
argument to the tm_fill
function. For example, say we wanted to track the rent between 2002, 2005 and 2008 (i.e., rent2002, rent2005, and rent2008). First, we check the descriptive statistics we computed earlier with the summary
command. From that, we will use the quartiles for 2005 as the break points (just to illustrate how to go about this). These are 898, 990 and 1100 (rounded). We also need to set a minimum (0) and a maximum, the largest value in 2008, or 2900.
In tmap, the custom break points are set in the breaks
option. They must include a minimum and a maximum. So, in our example, we need to set the breakpoints to 0 (minimum), 898, 990, 1100, and 2900 (maximum). This is passed as a vector, using the c( )
notation, as breaks = c(0,898,990,1100,2900)
. Or, alternatively, we first set a vector as custom.breaks = c(0,898,990,1100,2900)
, and then pass that vector to the breaks
option.
custom.breaks <- c(0,898,990,1100,2900)
Now, we can make our three maps using the custom classification. We use the palette="YlOrBr"
from RColorBrewer. First, for 2002 (with an appropriate title, resized to 0.8)
tm_shape(nyc) +
tm_fill("rent2002",title="Median Rent",breaks=custom.breaks,palette="YlOrBr") +
tm_borders() +
tm_layout(title = "Custom Breaks Map - 2002", title.size = 0.9, title.position = c("right","bottom"))
For 2005:
tm_shape(nyc) +
tm_fill("rent2005",title="Median Rent",breaks=custom.breaks,palette="YlOrBr") +
tm_borders() +
tm_layout(title = "Custom Breaks Map - 2005", title.size = 0.9, title.position = c("right","bottom"))
And for 2008:
tm_shape(nyc) +
tm_fill("rent2008",title="Median Rent",breaks=custom.breaks,palette="YlOrBr") +
tm_borders() +
tm_layout(title = "Custom Breaks Map - 2008", title.size = 0.9, title.position = c("right","bottom"))
The maps clearly show how the rent increases spread throughout the neighborhoods.
A conditional map, or facet map, or small multiples, is created by the tm_facets
command. This largely follows the logic of the facet_grid
command in ggplot that we covered earlier. An extensive set of options is available to customize the facet maps. An in-depth coverage of all the subtleties is beyond our scope (details can be found on the tm_facets
documentation page)
We will use the same conditioning variables as before (which we recreated above), i.e., manbronx and cut.hhsiz. These two factor variables are used as conditioning variables in the by
argument of the tm_facets
command. The first variable conditions the rows (i.e., is the y-axis), the second variable conditions the columns (i.e., is the x-axis), in the same way as we saw for facet_grid
in ggplot.
We illustrate this with the rent2008 variable, using the default map type. Two important options that affect the look of the conditional map are free.coords
and drop.units
. The default is that each facet map has its own scaling, focused on the relevant observations. Typically, one wants the same spatial layout in each facet, i.e., all the sub-boroughs in our example. This is ensured by setting free.coords=FALSE
. In addition, we want all the spatial units to be shown in each facet (the default is to only show those that match the conditions). This is accomplished by setting drop.units=FALSE
.
One “gotcha” is that manbronx is not a factor which is what tmap requires for the conditioning variable. So, first we need to create an additional variable that is a factor, based on the values of manbronx, i.e., using nyc$mb <- as.factor(nyc$manbronx)
nyc$mb <- as.factor(nyc$manbronx)
Now, we can proceed with the command tm_facets(by = c("cut.hhsiz", "mb"),free.coords = FALSE,drop.units=FALSE)
. We just keep the defaults (titles, labels, etc. can be added later).
tm_shape(nyc) +
tm_fill("rent2008") +
tm_borders() +
tm_facets(by = c("cut.hhsiz", "mb"),free.coords = FALSE,drop.units=FALSE)