# How to (quickly) enrich a map with natural and anthropic details

(This article was first published on *Francesco Bailo*'s blog, and kindly contributed to R-bloggers.)

In this post I show how to enrich a ggplot map with data obtained from the OpenStreetMap (OSM) API. After adding elevation details to the map, I add water bodies and elements identifying human activity. To highlight the more densely inhabited areas, I propose using a density-based clustering algorithm on OSM features.

## Data

Let’s first start with some data: a few points somewhere in Italy.

```r
# A few example points somewhere in Northern Italy (illustrative coordinates)
my_points.df <- data.frame(lon = c(9.442366, 9.328476, 9.355755, 9.387320, 9.453077),
                           lat = c(46.102119, 46.010367, 46.017013, 46.037173, 46.064798))
```

I will use the great `sf` package, so let's create an `sf` spatial object with

```r
library(sf)
my_points.sf <- st_as_sf(my_points.df, coords = c('lon', 'lat'), crs = 4326)
```

`crs = 4326` sets the Coordinate Reference System using an integer EPSG code. EPSG:4326 uses the World Geodetic System 1984 (WGS 84), the same system used by GPS, and its unit is the degree (not the meter!).
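To see the difference in units, we can transform a single point between a degree-based and a meter-based system (the coordinates here are illustrative):

```r
library(sf)
# a point expressed in degrees (EPSG:4326)
p <- st_sfc(st_point(c(9.0, 46.0)), crs = 4326)
st_coordinates(p)  # lon/lat in degrees

# the same point in meters after projecting to UTM zone 32N (EPSG:32632)
st_coordinates(st_transform(p, crs = 32632))  # easting/northing in meters
```

Note that 9°E is the central meridian of UTM zone 32, so the easting of this point is (by construction of the UTM grid) 500,000 m.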

As a first step let’s define a bounding box that includes my points.

```r
my_bbox <- c(xmin = min(my_points.df$lon),
             ymin = min(my_points.df$lat),
             xmax = max(my_points.df$lon),
             ymax = max(my_points.df$lat))
```

Once I have set the points, I can create a polygon with the function `sf::st_polygon()`, which takes a list of two-column matrices (one per ring) and creates a simple feature geometry, and the function `sf::st_geometry()`, which takes a simple feature object and gets its geometry. The matrix needs five points because we always need to close our polygons: the last (fifth) row repeats the pair of coordinates of the first row. `st_crs()` then sets the CRS of the geometry `my_bbox.sf`.

```r
my_bbox.m <- matrix(c(my_bbox['xmin'], my_bbox['ymin'],
                      my_bbox['xmin'], my_bbox['ymax'],
                      my_bbox['xmax'], my_bbox['ymax'],
                      my_bbox['xmax'], my_bbox['ymin'],
                      my_bbox['xmin'], my_bbox['ymin']),
                    ncol = 2, byrow = TRUE)
my_bbox.sf <- st_geometry(st_polygon(list(my_bbox.m)))
st_crs(my_bbox.sf) <- 4326
```

To give some margin to my points, I create additional buffered bbox polygons around `my_bbox.sf`. I want to specify the buffer distance in meters, so I need to transform my polygon to a meter-based coordinate system such as EPSG:32632 and then back to EPSG:4326. EPSG:32632 uses the same geodetic datum as EPSG:4326 (WGS 84) but crucially its unit is the meter (not the degree!), so all calculations are meter-based. EPSG:32632 is the 32N zone of the Universal Transverse Mercator (UTM) coordinate system; a world grid of UTM zones can help you find the best zone for your project.
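UTM zones are six degrees of longitude wide, so the zone number can also be computed directly from the longitude (a small helper function, not part of any package):

```r
# UTM zone from longitude: zone 1 starts at 180°W, each zone spans 6 degrees
utm_zone <- function(lon) floor((lon + 180) / 6) + 1
utm_zone(9.0)  # 32: in the northern hemisphere this gives EPSG:32632
```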

```r
my_bbox_buff_2500.sf <- my_bbox.sf %>%
  st_transform(crs = 32632) %>%
  st_buffer(dist = 2500) %>% # 2.5 kilometers
  st_transform(crs = 4326)

my_bbox_buff_5000.sf <- my_bbox.sf %>%
  st_transform(crs = 32632) %>%
  st_buffer(dist = 5000) %>% # 5 kilometers
  st_transform(crs = 4326)

my_bbox_buff_25000.sf <- my_bbox.sf %>%
  st_transform(crs = 32632) %>%
  st_buffer(dist = 25000) %>% # 25 kilometers
  st_transform(crs = 4326)
```

This is what my point data look like in essence:

```r
library(ggplot2)

my_world_map <- map_data('world')

ggplot() +
  geom_sf(data = my_bbox.sf, fill = NA) +
  geom_sf(data = my_points.sf) +
  geom_polygon(data = my_world_map,
               aes(x = long, y = lat, group = group), fill = NA, colour = 'black') +
  coord_sf(xlim = c(st_bbox(my_bbox_buff_25000.sf)['xmin'],
                    st_bbox(my_bbox_buff_25000.sf)['xmax']),
           ylim = c(st_bbox(my_bbox_buff_25000.sf)['ymin'],
                    st_bbox(my_bbox_buff_25000.sf)['ymax'])) +
  theme_bw()
```

To give some geographic context to the data, I also plotted the country polygons of Italy and Switzerland from `ggplot2::map_data('world')`. `coord_sf` allows setting the limits of the plot, which are extracted from the bounding box of `my_bbox_buff_25000.sf` with the function `sf::st_bbox()`.

Depending on the specific geography of the place you need to map, it might be helpful to provide an idea of the elevation. Elevation differences (that is, hills, mountains and valleys) are crucial constraints on the development of human and animal activities. To map the elevation around our points, I need to download data from a Digital Elevation Model (DEM). There are different options; here I will use data obtained by the Shuttle Radar Topography Mission (SRTM). The `raster` package ships with a function, `raster::getData()`, to download the data directly from the R console. The arguments `lat` and `lon` are used to get the data tile that is needed for the project.

```r
library(raster)
# lat/lon identify the SRTM tile covering the area of interest
dem.raster <- getData('SRTM', lat = 46.0, lon = 9.4, download = TRUE)
```

The SRTM data file is actually pretty large, so it is probably a good idea to crop it to reduce plotting time. The function `raster::crop()` takes an sp object (from the `sp` package) to be used as a mask to determine the extent of the resulting (cropped) raster, so I need to convert my bbox first with `as(my_bbox_buff_25000.sf, 'Spatial')`.

```r
dem.raster <- crop(dem.raster, as(my_bbox_buff_25000.sf, 'Spatial'), snap = 'out')
```

Because I need to add my raster as a ggplot layer, I first transform it into a matrix of points with `raster::rasterToPoints()` and then into a `data.frame` object. The third column of `dem.df` contains the altitude value (`alt`).

```r
dem.m  <- rasterToPoints(dem.raster)
dem.df <- as.data.frame(dem.m)
colnames(dem.df) <- c('lon', 'lat', 'alt')
```

I can already plot `dem.df` with

```r
ggplot() +
  geom_raster(data = dem.df, aes(lon, lat, fill = alt), alpha = .45) +
  geom_sf(data = my_bbox_buff_2500.sf, fill = NA) +
  coord_sf(xlim = c(st_bbox(my_bbox_buff_25000.sf)['xmin'],
                    st_bbox(my_bbox_buff_25000.sf)['xmax']),
           ylim = c(st_bbox(my_bbox_buff_25000.sf)['ymin'],
                    st_bbox(my_bbox_buff_25000.sf)['ymax'])) +
  geom_polygon(data = my_world_map,
               aes(x = long, y = lat, group = group), fill = NA, colour = 'black') +
  theme_bw()
```

Now, instead of colouring each cell of the raster based on its altitude, I want to convey the same information with the hill shade. First, I extract the terrain characteristics (`slope` and `aspect` in this case) with `raster::terrain()`. Then, I compute the hill shade with `raster::hillShade()`, setting the elevation angle of the sun to `40` and the direction angle of the light to `270`. I then transform the resulting `hill.raster` into a `data.frame` as before. (Thanks to Tim Riffe for this.)

```r
slope.raster  <- terrain(dem.raster, opt = 'slope')
aspect.raster <- terrain(dem.raster, opt = 'aspect')
hill.raster   <- hillShade(slope.raster, aspect.raster, angle = 40, direction = 270)

hill.m  <- rasterToPoints(hill.raster)
hill.df <- as.data.frame(hill.m)
colnames(hill.df) <- c('lon', 'lat', 'hill')
```

And here we go:

```r
ggplot() +
  geom_raster(data = hill.df, aes(lon, lat, fill = hill), alpha = .45) +
  geom_sf(data = my_bbox_buff_2500.sf, fill = NA) +
  coord_sf(xlim = c(st_bbox(my_bbox_buff_25000.sf)['xmin'],
                    st_bbox(my_bbox_buff_25000.sf)['xmax']),
           ylim = c(st_bbox(my_bbox_buff_25000.sf)['ymin'],
                    st_bbox(my_bbox_buff_25000.sf)['ymax'])) +
  geom_polygon(data = my_world_map,
               aes(x = long, y = lat, group = group), fill = NA, colour = 'black') +
  theme_bw()
```

As you might have guessed from the presence of a few flat surfaces, there are lakes in the region. We can use the excellent `osmdata` package to query the OpenStreetMap (OSM) Overpass API to download information about lakes and rivers. With the function `osmdata::opq()` we define the bbox for the query, and with `osmdata::add_osm_feature()` we extract specific OSM features using their tags (`key` and `value`).

```r
library(osmdata)
osm_lakes.sf <- opq(bbox = st_bbox(my_bbox_buff_25000.sf)) %>%
  add_osm_feature(key = 'water', value = 'lake') %>%
  osmdata_sf()
osm_lakes.sf <- osm_lakes.sf$osm_multipolygons
osm_rivers.sf <- opq(bbox = st_bbox(my_bbox_buff_25000.sf)) %>%
  add_osm_feature(key = 'waterway', value = 'river') %>%
  osmdata_sf()
osm_rivers.sf <- osm_rivers.sf$osm_lines
```

Similarly we can download details about roads. The road network on OSM might be very dense, so it is better to do some filtering using the OSM tags.

```r
osm_roads_primary.sf <- opq(bbox = st_bbox(my_bbox_buff_25000.sf)) %>%
  add_osm_feature(key = 'highway', value = 'trunk') %>%
  osmdata_sf()
osm_roads_primary.sf <- osm_roads_primary.sf$osm_lines

osm_roads_secondary.sf <- opq(bbox = st_bbox(my_bbox_buff_25000.sf)) %>%
  add_osm_feature(key = 'highway', value = 'secondary') %>%
  osmdata_sf()
osm_roads_secondary.sf <- osm_roads_secondary.sf$osm_lines

osm_roads_tertiary.sf <- opq(bbox = st_bbox(my_bbox_buff_25000.sf)) %>%
  add_osm_feature(key = 'highway', value = 'tertiary') %>%
  osmdata_sf()
osm_roads_tertiary.sf <- osm_roads_tertiary.sf$osm_lines
```

## Identify densely populated areas

Densely populated areas are certainly very important in every map that aims to describe human activities. Again, I use OSM data to identify these areas. First, I query for features that are usually associated with human residency: residential roads and buildings. On OSM, roads and buildings are not mapped with the same completeness everywhere, but the density of their presence might provide a good enough approximation of where people actually live. The number of such features (polygons and lines) is usually very large, but I don't actually need the individual features, just the geography of densely populated areas.

First, I query the OSM API for all the buildings (`osm_polygons`), residential roads and unclassified roads (`osm_lines`). Second, I transform polygons and lines into points with `sf::st_centroid()` (which requires a CRS transformation). Finally, I store all the point simple feature objects into `osm_residential_areas_pnt.df`, with two columns respectively for longitude and latitude, and create from it the spatial object `osm_residential_areas_pnt.sf`.

```r
osm_buildings.sf <- opq(bbox = st_bbox(my_bbox_buff_5000.sf)) %>%
  add_osm_feature(key = 'building') %>%
  osmdata_sf()
osm_buildings_pnt.sf <- osm_buildings.sf$osm_polygons %>%
  st_transform(crs = 32632) %>%
  st_centroid() %>%
  st_transform(crs = 4326)

osm_roads_residential.sf <- opq(bbox = st_bbox(my_bbox_buff_5000.sf)) %>%
  add_osm_feature(key = 'highway', value = 'residential') %>%
  osmdata_sf()
osm_roads_residential_pnt.sf <- osm_roads_residential.sf$osm_lines %>%
  st_transform(crs = 32632) %>%
  st_centroid() %>%
  st_transform(crs = 4326)

osm_roads_unclassified.sf <- opq(bbox = st_bbox(my_bbox_buff_5000.sf)) %>%
  add_osm_feature(key = 'highway', value = 'unclassified') %>%
  osmdata_sf()
osm_roads_unclassified_pnt.sf <- osm_roads_unclassified.sf$osm_lines %>%
  st_transform(crs = 32632) %>%
  st_centroid() %>%
  st_transform(crs = 4326)

osm_residential_areas_pnt.df <-
  rbind(st_coordinates(osm_buildings_pnt.sf),
        st_coordinates(osm_roads_residential_pnt.sf),
        st_coordinates(osm_roads_unclassified_pnt.sf)) %>%
  as.data.frame()
colnames(osm_residential_areas_pnt.df) <- c('lon', 'lat')
```

Once I have the coordinate `data.frame`, with one row for each OSM feature, I pass it to the function `dbscan::dbscan()` of the `dbscan` package, which implements a density-based spatial clustering algorithm to identify the areas with the highest density of points. `eps` and `minPts`, respectively the size of the neighbourhood and the minimum number of points per region, should be adjusted according to the local geography. Once the clusters are identified, I can create a polygon from the convex hull of each cluster's points with the function `grDevices::chull()`. Each resulting polygon is stored in the spatial object `pop_dense_areas.sf`.

```r
library(dbscan)
# eps (neighbourhood size, here in degrees) and minPts are illustrative values
# and should be tuned to the local geography
res <- dbscan(osm_residential_areas_pnt.df[, c('lon', 'lat')],
              eps = 0.0005, minPts = 25)

# one convex-hull polygon per cluster (cluster 0 is noise and is skipped)
pop_dense_areas.sf <- do.call(c, lapply(seq_len(max(res$cluster)), function(cl) {
  pts  <- osm_residential_areas_pnt.df[res$cluster == cl, c('lon', 'lat')]
  hull <- chull(pts$lon, pts$lat)
  st_sfc(st_polygon(list(as.matrix(pts[c(hull, hull[1]), ]))), crs = 4326)
}))
```

To give an idea of the actual result, you see below a densely populated area inferred from the cluster of points derived from buildings and residential roads. The algorithm successfully identified the relatively more anthropised regions. Tweaking the two arguments of the density-based clustering function might be necessary depending on the size and human geography of the final map.
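To illustrate how `eps` and `minPts` interact, here is a minimal synthetic example (the coordinates are made up): two tight clouds of points are recognised as clusters, while sparse points end up in the noise group (cluster `0`).

```r
library(dbscan)
set.seed(42)
# two dense clouds of 50 points each, plus 10 sparse noise points
cloud_a <- cbind(rnorm(50, 0.00, 0.001), rnorm(50, 0.00, 0.001))
cloud_b <- cbind(rnorm(50, 0.05, 0.001), rnorm(50, 0.05, 0.001))
noise   <- cbind(runif(10, -0.1, 0.1),  runif(10, -0.1, 0.1))
pts <- rbind(cloud_a, cloud_b, noise)

res <- dbscan(pts, eps = 0.005, minPts = 10)
max(res$cluster)   # number of clusters found
table(res$cluster) # cluster sizes, with noise listed under 0
```

Halving `eps` or raising `minPts` makes the algorithm stricter: fewer, tighter clusters and more points labelled as noise.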

The final enrichment of my map is the addition of a few names of populated places (towns and villages in the OSM tagging system). Again I use the OSM API. I filter the results for the more important places based on their relative population, and to avoid overcrowding the map with labels I only keep the 10 largest places. `geom_sf` doesn't (yet) allow labelling points, so it is necessary to convert the spatial object into a simple `data.frame` with two coordinate columns. Names might be too long, so I add a `\n` (newline) after every run of up to 10 characters with `gsub('(.{1,10})(\\s|$)', '\\1\n', str)`. (Thanks to xiechao for this.)

```r
library(dplyr)
osm_villages.sf <- opq(bbox = st_bbox(my_bbox_buff_25000.sf)) %>%
  add_osm_feature(key = 'place', value = 'village') %>%
  osmdata_sf()
osm_villages.sf <- osm_villages.sf$osm_points
osm_towns.sf <- opq(bbox = st_bbox(my_bbox_buff_25000.sf)) %>%
  add_osm_feature(key = 'place', value = 'town') %>%
  osmdata_sf()
osm_towns.sf <- osm_towns.sf$osm_points

osm_larger_places.sf <- rbind(osm_villages.sf[, c('name', 'population')],
                              osm_towns.sf[, c('name', 'population')]) %>%
  mutate(population = as.numeric(as.character(population))) %>%
  top_n(10, population)

osm_larger_places.df <- data.frame(st_coordinates(osm_larger_places.sf),
                                   name = osm_larger_places.sf$name)
osm_larger_places.df$name <- gsub('(.{1,10})(\\s|$)', '\\1\n',
                                  osm_larger_places.df$name)
```
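As a quick check, the wrapping substitution behaves like this on a hypothetical place name:

```r
# insert a newline after each run of up to 10 characters ending at a space (or end)
gsub('(.{1,10})(\\s|$)', '\\1\n', 'Santa Maria Maggiore')
# "Santa\nMaria\nMaggiore\n"
```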

And now let’s plot everything along with my initial data points:

```r
ggplot() +

  geom_raster(data = hill.df, aes(lon, lat, fill = hill), alpha = .45) +

  geom_sf(data = pop_dense_areas.sf, fill = '#fee08b', colour = NA, alpha = 0.9) +
  geom_sf(data = osm_lakes.sf, fill = '#9ecae1', colour = NA) +
  geom_sf(data = osm_rivers.sf, colour = '#9ecae1', size = 0.4) +
  geom_sf(data = osm_roads_primary.sf, colour = '#636363', size = 0.1) +
  geom_sf(data = osm_roads_secondary.sf, colour = '#636363', size = 0.08) +
  geom_sf(data = osm_roads_tertiary.sf, colour = '#636363', size = 0.08) +
  geom_sf(data = my_points.sf, shape = 5, colour = 'black', size = .5) +

  geom_text(data = osm_larger_places.df, aes(x = X, y = Y, label = name),
            size = 2.5, alpha = .65) +

  coord_sf(xlim = c(st_bbox(my_bbox_buff_2500.sf)['xmin'],
                    st_bbox(my_bbox_buff_2500.sf)['xmax']),
           ylim = c(st_bbox(my_bbox_buff_2500.sf)['ymin'],
                    st_bbox(my_bbox_buff_2500.sf)['ymax'])) +
  guides(fill = FALSE) +
  labs(x = NULL, y = NULL) +
  theme_bw()
```

# References

Hahsler, Michael, and Matthew Piekenbrock. 2018. Dbscan: Density Based Clustering of Applications with Noise (Dbscan) and Related Algorithms. https://CRAN.R-project.org/package=dbscan.

Hijmans, Robert J. 2017. Raster: Geographic Data Analysis and Modeling. https://CRAN.R-project.org/package=raster.

Padgham, Mark, Bob Rudis, Robin Lovelace, and Maëlle Salmon. 2018. Osmdata: Import ‘Openstreetmap’ Data as Simple Features or Spatial Objects. https://CRAN.R-project.org/package=osmdata.

Pebesma, Edzer. 2018. Sf: Simple Features for R. https://CRAN.R-project.org/package=sf.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, and Kara Woo. 2018. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.

Wickham, Hadley, Romain Francois, Lionel Henry, and Kirill Müller. 2017. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Xie, Yihui. 2018. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.