When visualizing a network with nodes that refer to a geographic place, it is often useful to put these nodes on a map and draw the connections (edges) between them. This way, we can directly see the geographic distribution of nodes and their connections in our network. This is different from a traditional network plot, where the placement of the nodes depends on the layout algorithm that is used (which may, for example, form clusters of strongly interconnected nodes).
In this blog post, I’ll present three ways of visualizing network graphs on a map using R with the packages igraph, ggplot2 and optionally ggraph. Several properties of our graph should be visualized along with the positions on the map and the connections between them. Specifically, the size of a node on the map should reflect its degree, the width of an edge between two nodes should represent the weight (strength) of this connection (since we can’t use proximity to illustrate the strength of a connection when we place the nodes on a map), and the color of an edge should illustrate the type of connection (some categorical variable, e.g. a type of treaty between two international partners).
We’ll need to load the following libraries first:
library(assertthat)
library(dplyr)
library(purrr)
library(igraph)
library(ggplot2)
library(ggraph)
library(ggmap)
Now, let’s load some example nodes. I’ve picked some random countries with their geo-coordinates:
So we now have 15 countries, each with an ID, geo-coordinates (lon, lat) and a name. These are our graph nodes. We’ll now create some random connections (edges) between our nodes:
set.seed(123)  # set random generator state for the same output
N_EDGES_PER_NODE_MIN … %>% mutate(category = as.factor(category))
Each of these edges defines a connection via the node IDs in the from and to columns. Additionally, we generated random connection categories and weights. Such properties are often used in graph analysis and will later be visualized too.
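Since the listings for the nodes and edges are incomplete here, the following is a minimal sketch of how such tables could be built. The four countries, their approximate coordinates, and the constants MIN_EDGES, MAX_EDGES and N_CATEGORIES are assumptions for illustration only, not the data used for the plots in this post.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for the nodes table; coordinates are approximate
nodes <- tibble(
  id   = 1:4,
  lon  = c(-106.3, 10.5, 138.3, 22.9),
  lat  = c(56.1, 51.2, 36.2, -30.6),
  name = c("Canada", "Germany", "Japan", "South Africa")
)

set.seed(123)       # reproducible random edges

MIN_EDGES    <- 1   # assumed lower bound of edges created per node
MAX_EDGES    <- 3   # assumed upper bound
N_CATEGORIES <- 4   # assumed number of connection types

edges <- map(nodes$id, function(id) {
  n  <- sample(MIN_EDGES:MAX_EDGES, 1)     # how many edges start at this node
  to <- sample(setdiff(nodes$id, id), n)   # random targets, never the node itself
  tibble(from     = id,
         to       = to,
         weight   = runif(n),              # connection strength between 0 and 1
         category = sample(N_CATEGORIES, n, replace = TRUE))
}) %>%
  bind_rows() %>%
  mutate(category = as.factor(category))
```

Each row of edges then holds one connection with its weight and its categorical type.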
Our nodes and edges fully describe a graph, so we can now generate a graph structure g with the igraph library. This is especially necessary for fast calculation of the degree or other properties of each node later.
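A graph structure like this can be built from two data frames with igraph’s graph_from_data_frame. The sketch below uses hypothetical toy data with the same column layout as in this post and assumes an undirected graph.

```r
library(igraph)

# Toy data (hypothetical values) with the same column layout as in this post
nodes <- data.frame(id = 1:3, name = c("A", "B", "C"))
edges <- data.frame(from   = c(1, 1, 2),
                    to     = c(2, 3, 3),
                    weight = c(0.2, 0.5, 0.9))

# The first two columns of `edges` are read as the endpoints of each edge,
# the first column of `vertices` as the vertex IDs they refer to
g <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

node_degree <- degree(g)   # number of edges attached to each node
```

The degree vector comes back in the same order as the rows of nodes, which is what makes it possible to attach it as a column later.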
We now create some data structures that will be needed for all the plots that we will generate. First, we create a data frame for plotting the edges. This data frame will be the same as the edges data frame but with four additional columns that define the start and end points for each edge (x, y and xend, yend):
edges_for_plot <- edges %>%
  inner_join(nodes %>% select(id, lon, lat), by = c('from' = 'id')) %>%
  rename(x = lon, y = lat) %>%
  inner_join(nodes %>% select(id, lon, lat), by = c('to' = 'id')) %>%
  rename(xend = lon, yend = lat)

assert_that(nrow(edges_for_plot) == nrow(edges))
Let’s give each node a weight and use the degree metric for this. This will be reflected by the node sizes on the map later.
nodes$weight = degree(g)
Now we define a common ggplot2 theme that is suitable for displaying maps (sans axes and grids):
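The theme listing is missing here, so this is a minimal sketch of a map theme without axes and grids. The background color is an assumption; a dark panel is chosen because the node labels in the plots of this post are drawn in white.

```r
library(ggplot2)

maptheme <- theme(
  panel.grid       = element_blank(),                 # no grid lines
  axis.text        = element_blank(),                 # no axis labels
  axis.ticks       = element_blank(),                 # no axis ticks
  axis.title       = element_blank(),                 # no axis titles
  legend.position  = "bottom",
  panel.background = element_rect(fill = "#596673")   # assumed color
)
```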
Not only will the theme be the same for all plots; they will also share the same world map as “background” (using map_data('world')) and the same fixed-ratio coordinate system that also specifies the limits of the longitude and latitude coordinates.
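The two shared objects are referred to as country_shapes and mapcoords in the plots of this post. A minimal sketch of how they could be defined follows; the colors and the coordinate limits are assumptions, and map_data additionally requires the maps package to be installed.

```r
library(ggplot2)   # map_data() additionally needs the 'maps' package installed

# World map polygons as a reusable layer
country_shapes <- geom_polygon(
  aes(x = long, y = lat, group = group),
  data  = map_data("world"),
  fill  = "#CECECE",   # assumed fill color
  color = "#515151"    # assumed border color
)

# Fixed aspect ratio plus limits for longitude and latitude (assumed values)
mapcoords <- coord_fixed(xlim = c(-150, 180), ylim = c(-55, 80))
```

Both objects can be added to any ggplot with +, which is how they are reused across the three plots.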
Plot 1: Pure ggplot2
Let’s start simple by using ggplot2. We’ll need three geometric objects (geoms) in addition to the country polygons from the world map (country_shapes): Nodes can be drawn as points using geom_point and their labels with geom_text; edges between nodes can be realized as curves using geom_curve. For each geom we need to define aesthetic mappings that “describe how variables in the data are mapped to visual properties” in the plot. For the nodes, we map the geo-coordinates to the x and y positions in the plot and make the node size dependent on its weight (aes(x = lon, y = lat, size = weight)). For the edges, we pass our edges_for_plot data frame and use x, y and xend, yend as start and end points of the curves. Additionally, we make each edge’s color dependent on its category, and its “size” (which refers to its line width) dependent on the edge’s weight (we will see that the latter will fail). Note that the order of the geoms is important, as it defines which object is drawn first and can be occluded by an object that is drawn later in the next geom layer. Hence we draw the edges first, then the node points, and finally the labels on top:
ggplot(nodes) + country_shapes +
  geom_curve(aes(x = x, y = y, xend = xend, yend = yend,       # draw edges as arcs
                 color = category, size = weight),
             data = edges_for_plot, curvature = 0.33, alpha = 0.5) +
  scale_size_continuous(guide = "none", range = c(0.25, 2)) +  # scale for edge widths
  geom_point(aes(x = lon, y = lat, size = weight),             # draw nodes
             shape = 21, fill = 'white', color = 'black', stroke = 0.5) +
  scale_size_continuous(guide = "none", range = c(1, 6)) +     # scale for node size
  geom_text(aes(x = lon, y = lat, label = name),               # draw text labels
            hjust = 0, nudge_x = 1, nudge_y = 4,
            size = 3, color = "white", fontface = "bold") +
  mapcoords + maptheme
A warning will be displayed in the console saying “Scale for ‘size’ is already present. Adding another scale for ‘size’, which will replace the existing scale.” This is because we used the “size” aesthetic and its scale twice, once for the node size and once for the line width of the curves. Unfortunately, you cannot use two different scales for the same aesthetic, even when they’re used for different geoms (here: “size” for both node size and the edges’ line widths). At the time of writing, there is also no alternative to “size” I know of for controlling a line’s width in ggplot2 (ggplot2 3.4.0 and later add a separate linewidth aesthetic for this).
With ggplot2, we’re left with deciding which geom’s size we want to scale. Here, I go for a static node size and a dynamic line width for the edges:
ggplot(nodes) + country_shapes +
  geom_curve(aes(x = x, y = y, xend = xend, yend = yend,       # draw edges as arcs
                 color = category, size = weight),
             data = edges_for_plot, curvature = 0.33, alpha = 0.5) +
  scale_size_continuous(guide = "none", range = c(0.25, 2)) +  # scale for edge widths
  geom_point(aes(x = lon, y = lat),                            # draw nodes
             shape = 21, size = 3, fill = 'white', color = 'black', stroke = 0.5) +
  geom_text(aes(x = lon, y = lat, label = name),               # draw text labels
            hjust = 0, nudge_x = 1, nudge_y = 4,
            size = 3, color = "white", fontface = "bold") +
  mapcoords + maptheme
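For readers on a recent ggplot2, here is a hedged sketch of a variant that avoids the clash: since version 3.4.0, line widths have their own linewidth aesthetic and scale, so node sizes and edge widths can be scaled independently. The toy data below is hypothetical and the map layers are left out to keep the example self-contained.

```r
library(ggplot2)   # assumes ggplot2 >= 3.4.0, which added the linewidth aesthetic

# Hypothetical toy data in the same shape as nodes and edges_for_plot
toy_nodes <- data.frame(lon = c(0, 10, 20), lat = c(0, 10, 0),
                        weight = c(1, 2, 3))
toy_edges <- data.frame(x = c(0, 10, 0), y = c(0, 10, 0),
                        xend = c(10, 20, 20), yend = c(10, 0, 0),
                        weight = c(0.2, 0.5, 0.9))

p <- ggplot(toy_nodes) +
  geom_curve(aes(x = x, y = y, xend = xend, yend = yend,
                 linewidth = weight),                             # edge width
             data = toy_edges, curvature = 0.33, alpha = 0.5) +
  scale_linewidth_continuous(range = c(0.25, 2), guide = "none") +
  geom_point(aes(x = lon, y = lat, size = weight),                # node size
             shape = 21, fill = "white", color = "black", stroke = 0.5) +
  scale_size_continuous(range = c(1, 6), guide = "none")
```

Because the two geoms now use different aesthetics, no scale replaces the other.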
Plot 2: ggplot2 + ggraph
Luckily, there is an extension to ggplot2 called ggraph with geoms and aesthetics added specifically for plotting network graphs. This allows us to use separate scales for the nodes and edges. By default, ggraph will place the nodes according to a layout algorithm that you can specify. However, we can also define our own custom layout using the geo-coordinates as node positions:
node_pos <- nodes %>%
  select(lon, lat) %>%
  rename(x = lon, y = lat)   # node positions must be called x, y
lay …
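The call that creates the layout is cut off above. As a minimal sketch, assuming ggraph 2.0 or later (where the manual layout takes the positions as x and y arguments), a custom layout from geo-coordinates could be built like this on hypothetical toy data:

```r
library(igraph)
library(ggraph)   # assumes ggraph >= 2.0

# Hypothetical toy nodes and edges
nodes <- data.frame(id = 1:3,
                    lon = c(0, 10, 20), lat = c(0, 10, 0),
                    name = c("A", "B", "C"))
edges <- data.frame(from = c(1, 1, 2), to = c(2, 3, 3))
g <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

# Manual layout: node positions are taken directly from the coordinates
lay <- create_layout(g, layout = "manual", x = nodes$lon, y = nodes$lat)

# Add the node degree so it can be mapped to the node size
lay$weight <- degree(g)
```

The resulting layout is a data frame with one row per node, so its row count should match that of nodes.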
We pass the layout lay and use ggraph’s geoms geom_edge_arc and geom_node_point for plotting:
ggraph(lay) + country_shapes +
  geom_edge_arc(aes(color = category, edge_width = weight,   # draw edges as arcs
                    circular = FALSE),
                data = edges_for_plot, curvature = 0.33, alpha = 0.5) +
  scale_edge_width_continuous(range = c(0.5, 2),             # scale for edge widths
                              guide = "none") +
  geom_node_point(aes(size = weight), shape = 21,            # draw nodes
                  fill = "white", color = "black", stroke = 0.5) +
  scale_size_continuous(range = c(1, 6), guide = "none") +   # scale for node sizes
  geom_node_text(aes(label = name), repel = TRUE, size = 3,
                 color = "white", fontface = "bold") +
  mapcoords + maptheme
The edges’ widths can be controlled with the edge_width aesthetic and its scale functions scale_edge_width_*. The nodes’ sizes are controlled with size as before. Another nice feature is that geom_node_text has an option to distribute node labels with repel = TRUE so that they do not occlude each other that much.
Note that the edges are drawn differently than in the pure ggplot2 plots before. The connections are still the same; only their placement differs, due to the different layout algorithms used by ggraph. For example, the turquoise edge line between Canada and Japan has moved from the very north to the south, across the center of Africa.
Plot 3: the hacky way (overlay several ggplot2 “plot grobs”)
I do not want to withhold another option which may be considered a dirty hack: You can overlay several separately created plots (with transparent background) by annotating them as “grobs” (short for “graphical objects”). This is probably not how grob annotations should be used, but anyway it can come in handy when you really need to overcome the aesthetics limitation of ggplot2 described above in plot 1.
As explained, we will produce separate plots and “stack” them. The first plot will be the “background” which displays the world map as before. The second plot will be an overlay that only displays the edges. Finally, a third overlay shows only the points for the nodes and their labels. With this setup, we can control the edges’ line widths and the nodes’ point sizes separately because they are generated in separate plots.
The two overlays need to have a transparent background so we define it with a theme:
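A minimal sketch of such a theme, assuming it only needs to make the panel and plot backgrounds transparent (the name theme_transp_overlay is an assumption):

```r
library(ggplot2)

# Transparent panel and plot background, without a border
theme_transp_overlay <- theme(
  panel.background = element_rect(fill = "transparent", color = NA),
  plot.background  = element_rect(fill = "transparent", color = NA)
)
```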
The base or “background” plot is easy to make and only shows the map:
Now we create the first overlay with the edges whose line width is scaled according to the edges’ weights:
The second overlay shows the node points and their labels:
Finally, we combine the overlays using grob annotations. Note that proper positioning of the grobs can be tedious. I found that using ymin works quite well, but manual tweaking of the parameter seems necessary.
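The stacking approach described above can be sketched end to end as follows. This is an illustration on hypothetical toy data without the world map, not the original listing: all object names and values are assumptions, and the overlays are placed with the default annotation bounds instead of a tuned ymin.

```r
library(ggplot2)

# Hypothetical toy nodes and edges
toy_nodes <- data.frame(lon = c(0, 10, 20), lat = c(0, 10, 0),
                        name = c("A", "B", "C"), weight = c(1, 2, 3))
toy_edges <- data.frame(x = c(0, 10), y = c(0, 10),
                        xend = c(10, 20), yend = c(10, 0),
                        weight = c(0.3, 0.8))

# Every layer uses the same coordinate system so the overlays line up
coords <- coord_fixed(xlim = c(-5, 25), ylim = c(-5, 15))
blank  <- theme_void() + theme(legend.position = "none")
transp <- theme(
  panel.background = element_rect(fill = "transparent", color = NA),
  plot.background  = element_rect(fill = "transparent", color = NA)
)

# Background plot (here just a colored panel instead of the world map)
p_base <- ggplot() + coords + blank +
  theme(panel.background = element_rect(fill = "#596673", color = NA))

# Overlay 1: edges with their own size scale
# (newer ggplot2 versions warn that `size` for lines is deprecated)
p_edges <- ggplot(toy_edges) +
  geom_curve(aes(x = x, y = y, xend = xend, yend = yend, size = weight),
             curvature = 0.33, alpha = 0.5) +
  scale_size_continuous(range = c(0.5, 2)) +
  coords + blank + transp

# Overlay 2: nodes and labels with a separate size scale
p_nodes <- ggplot(toy_nodes) +
  geom_point(aes(x = lon, y = lat, size = weight),
             shape = 21, fill = "white", color = "black", stroke = 0.5) +
  scale_size_continuous(range = c(1, 6)) +
  geom_text(aes(x = lon, y = lat, label = name),
            hjust = 0, nudge_x = 1, nudge_y = 1,
            size = 3, color = "white", fontface = "bold") +
  coords + blank + transp

# Stack the overlays on the base plot as grob annotations
p <- p_base +
  annotation_custom(ggplotGrob(p_edges)) +
  annotation_custom(ggplotGrob(p_nodes))
```

Because each overlay is a complete plot of its own, each one gets its own size scale, which is the whole point of the hack.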
As explained before, this is a hacky solution and should be used with care. Still, it is also useful in other circumstances. For example, when you need to use different scales for point sizes and line widths in line graphs, or need to use different color scales in a single plot, this approach might be an option to consider.
All in all, network graphs displayed on maps can be useful to show connections between the nodes in your graph on a geographic scale. A downside is that it can look quite cluttered when you have many geographically close points and many overlapping connections. It can be useful then to show only certain details of a map or add some jitter to the edges’ anchor points.
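As a small illustration of the jitter idea, the sketch below shifts the anchor points of each edge by a small random offset using base R’s jitter function. The data frame and the offset of half a degree are assumptions for illustration.

```r
set.seed(1)   # reproducible offsets

# Hypothetical edges that share the same start point
edges_for_plot <- data.frame(x = c(10, 10), y = c(50, 50),
                             xend = c(20, 30), yend = c(40, 45))

# Move every anchor point by at most 0.5 degrees in each direction
edges_jittered <- transform(
  edges_for_plot,
  x    = jitter(x,    amount = 0.5),
  y    = jitter(y,    amount = 0.5),
  xend = jitter(xend, amount = 0.5),
  yend = jitter(yend, amount = 0.5)
)
```

Edges that would otherwise start at exactly the same point are then drawn slightly apart.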
The full R script is available as a gist on GitHub.