# Take your data frames to the next level.

(This article was first published on R – Real Data, and kindly contributed to R-bloggers)

While finishing up R-rockstar Hadley Wickham’s book (R for Data Science, free online), I came across something pretty cool in the section on model building that I had no idea about: list columns.

Most of us have probably seen the following data frame column format:

``df <- data.frame("col_uno" = c(1,2,3),"col_dos" = c('a','b','c'), "col_tres" = factor(c("google", "apple", "amazon")))``

And the output:

``df``
``````##   col_uno col_dos col_tres
## 1       1       a   google
## 2       2       b    apple
## 3       3       c   amazon``````

This is an awesome way to organize data and one of R’s strong points. However, a column does not have to be a plain vector. It can also be a list, which lets a single cell hold an entire object such as another data frame. Check this out:

``````library(tidyverse)
library(datasets)``````
``head(iris)``
``````##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa``````
``````nested <- iris %>%
  group_by(Species) %>%
  nest()
nested``````
``````## # A tibble: 3 × 2
##      Species              data
##       <fctr>            <list>
## 1     setosa <tibble [50 × 4]>
## 2 versicolor <tibble [50 × 4]>
## 3  virginica <tibble [50 × 4]>``````
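
To make the `<list>` column less mysterious, here is a minimal sketch that looks inside it. It rebuilds `nested` exactly as above; the name `first_group` is only introduced for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the nested tibble from the example above.
nested <- iris %>%
  group_by(Species) %>%
  nest()

# The data column is an ordinary list with one element per species.
is.list(nested$data)
length(nested$data)

# Each element is a tibble holding that species' rows,
# without the grouping column (Species).
first_group <- nested$data[[1]]
dim(first_group)
names(first_group)
```

Because each cell is a regular tibble, anything you can do to a data frame you can do to `nested$data[[1]]`.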

`nest` collapses each group’s rows into a single tibble stored in the `data` list column, so the data frame has one row per species and is easy to iterate over. Here we use `map` from the `purrr` package to compute the mean of each column within each nested tibble.

``means <- map(nested$data, colMeans)``
``````## [[1]]
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width
##        5.006        3.428        1.462        0.246
##
## [[2]]
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width
##        5.936        2.770        4.260        1.326
##
## [[3]]
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width
##        6.588        2.974        5.552        2.026``````
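
Since the post was prompted by the book's section on model building, here is a hedged sketch of how the same `map` pattern might store results next to the data they came from. The column names `means`, `model`, and `slope`, and the choice of regressing `Sepal.Length` on `Petal.Length`, are assumptions made purely for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

nested <- iris %>%
  group_by(Species) %>%
  nest()

# Keep per-species results in new columns of the same tibble.
# ungroup() makes the behavior the same whether or not nest()
# returned a grouped tibble.
with_models <- nested %>%
  ungroup() %>%
  mutate(
    # list column: one named vector of column means per species
    means = map(data, colMeans),
    # list column: one fitted linear model per species
    model = map(data, ~ lm(Sepal.Length ~ Petal.Length, data = .x)),
    # plain numeric column: the fitted slope pulled out of each model
    slope = map_dbl(model, ~ coef(.x)[["Petal.Length"]])
  )

with_models
```

Keeping the means and models in the same row as their species avoids juggling a separate unnamed list like `means` and matching it back to groups by position.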

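Once you’re done messing around with data-ception, use `unnest` to flatten your data back into ordinary rows. The result has the same values as the original, returned as a tibble with `Species` as the first column.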

``head(unnest(nested))``
``````## # A tibble: 6 × 5
##   Species Sepal.Length Sepal.Width Petal.Length Petal.Width
##    <fctr>        <dbl>       <dbl>        <dbl>       <dbl>
## 1  setosa          5.1         3.5          1.4         0.2
## 2  setosa          4.9         3.0          1.4         0.2
## 3  setosa          4.7         3.2          1.3         0.2
## 4  setosa          4.6         3.1          1.5         0.2
## 5  setosa          5.0         3.6          1.4         0.2
## 6  setosa          5.4         3.9          1.7         0.4``````
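
Newer versions of tidyr prefer that you name the list column to expand. The sketch below assumes tidyr 1.0.0 or later, where that is done through the `cols` argument; `restored` is a name introduced here for illustration.

```r
library(dplyr)
library(tidyr)

nested <- iris %>%
  group_by(Species) %>%
  nest()

# Assumes tidyr >= 1.0.0: the list column to expand is passed via `cols`.
restored <- nested %>%
  ungroup() %>%
  unnest(cols = data)

# Same number of rows and columns as iris.
dim(restored)
```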

I was pretty excited to learn about this property of data.frames and will definitely make use of it in the future. If you have any neat examples of nested dataset usage, please feel free to share in the comments. As always, I’m happy to answer questions or talk data!

Kiefer Smith