Introducing the Kernelheaping Package II

By INWT-Blog-RBloggers

(This article was first published on INWT-Blog-RBloggers, and kindly contributed to R-bloggers)
In the first part of "Introducing the Kernelheaping Package", I showed how to compute and plot kernel density estimates on rounded or interval-censored data using the Kernelheaping package. Now, let's make a big leap forward to the two-dimensional case. Interval censoring can be generalised to rectangles or even arbitrary shapes. These may include counties, zip codes, electoral districts or administrative districts. Standard area-level mapping methods such as choropleth maps suffer from very different area sizes or odd area shapes, which can greatly distort the visual impression. The Kernelheaping package provides a way to convert such area-level data to a smooth point estimate.

For the German capital city of Berlin, for example, there is an open data initiative that makes data on, e.g., demographics publicly available. We first load a dataset on the Berlin population, which can be downloaded from: https://www.statistik-berlin-brandenburg.de/opendata/EWR201512E_Matrix.csv

```r
library(dplyr)
library(fields)
library(ggplot2)
library(Kernelheaping)
library(maptools)
library(RColorBrewer)
library(rgdal)
gpclibPermit()
```

```r
data [...]
```

The shape file of the administrative districts can be downloaded from: https://www.statistik-berlin-brandenburg.de/opendata/RBS_OD_LOR_2015_12.zip

```r
berlin [...] %>%
  do.call(rbind, .) %>%
  cbind(data$E_E65U80)
```

In the next step we calculate the bivariate kernel density with the “dshapebivr” function (this may take some minutes) using the prepared data and the shape file:

```r
est [...] %>%
  filter(Density > 0)
```

Now we are able to plot the density together with the administrative districts:

```r
ggplot(kData) +
  geom_raster(aes(long, lat, fill = Density)) +
  ggtitle("Bivariate density of Inhabitants between 65 and 80 years") +
  scale_fill_gradientn(colours = c("#FFFFFF", "#5c87c2", "#19224e")) +
  geom_path(color = "#000000", data = berlinDf, aes(long, lat, group = group)) +
  coord_quickmap()
```
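Looking back at the data preparation step: the pipeline ending in `do.call(rbind, .) %>% cbind(data$E_E65U80)` suggests that the prepared input has one row per area, made up of an area center and the count for that area. The following is only a minimal sketch of such an input, assuming a three-column layout (x-coordinate, y-coordinate, number of observations). The coordinates, counts and the names `centers`, `counts` and `dataIn` are made up for illustration and are not the Berlin data.

```r
# Hypothetical area centers (long, lat), one vector per area
centers <- list(
  c(13.40, 52.52),
  c(13.45, 52.50),
  c(13.35, 52.48)
)

# Hypothetical number of inhabitants aged 65 to 80 in each area
counts <- c(120, 85, 240)

# Stack the centers row-wise and append the counts as a third column
dataIn <- cbind(do.call(rbind, centers), counts)
colnames(dataIn) <- c("long", "lat", "n")

print(dataIn)
```

Each row then describes one area by a single point plus a count, which is the interval-censored information the smoothing step has to spread back over the area.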
The smoothed density map gives a much better overall impression of the distribution of older people than a simple choropleth map:

```r
ggplot(berlinDf) +
  geom_polygon(aes(x = long, y = lat, fill = E_E65U80, group = id)) +
  ggtitle("Number of Inhabitants between 65 and 80 years by district") +
  scale_fill_gradientn(colours = c("#FFFFFF", "#5c87c2", "#19224e"), name = "n") +
  geom_path(color = "#000000", data = berlinDf, aes(long, lat, group = group)) +
  coord_quickmap()
```
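One reason a count-based choropleth can mislead is that districts differ in size, so equal counts do not mean equal concentration. The toy example below illustrates this with invented numbers (the data frame `districts` and its columns are hypothetical, not taken from the Berlin dataset). It only divides counts by area and is not what the Kernelheaping package computes.

```r
# Three hypothetical districts with invented counts and areas
districts <- data.frame(
  id       = c("A", "B", "C"),
  n_65_80  = c(500, 500, 100),  # inhabitants aged 65 to 80
  area_km2 = c(1, 10, 0.5)      # district area in square kilometres
)

# Inhabitants per square kilometre
districts$per_km2 <- districts$n_65_80 / districts$area_km2

print(districts)
```

Districts A and B would get the same colour in a count-based choropleth, although A is ten times as densely populated by this age group as B. The small district C has the lowest count but a higher concentration than B.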
Often, as is the case with Berlin, there are large uninhabited areas such as woods or lakes. Furthermore, we may want to compute the proportion of older people relative to the overall population in a spatial setting. The third part of this series shows how to compute boundary-corrected and smooth proportion estimates with the Kernelheaping package.
