Populating data frame cells with more than one value

By markheckmann

(This article was first published on “R” you ready?, and kindly contributed to R-bloggers)

Data frames are lists

Most R users will know that data frames are lists. You can easily verify that a data frame is a list by typing

d <- data.frame(id=1:2, name=c("Jon", "Mark"))
d
 id name
1 1 Jon
2 2 Mark
is.list(d)
[1] TRUE

However, data frames are lists with some special properties. For example, all entries in the list must have the same length (here 2), etc. You can find a nice description of the differences between lists and data frames here. To access the first column of d, we find that it contains a vector (and a factor in case of column name). Note, that [[ ]] is an operator to select a list element. As data frames are lists, they will work here as well.

is.vector(d[[1]])
[1] TRUE

Data frame columns can contain lists

A long time, I was unaware of the fact, that data frames may also contain lists as columns instead of vectors. For example, let’s assume Jon’s children are Mary and James, and Mark’s children are called Greta and Sally. Their names are stored in a list with two elements. We can add them to the data frame like this:

d$children <-  list(c("Mary", "James"), c("Greta", "Sally"))
d
 id name children
1 1 Jon Mary, James
2 2 Mark Greta, Sally

A single data frame entry in column children now contains more than one value. Given that the column is a list, not a vector, we cannot go as usual when modifying an entry of the column. For example, to change Jon’s children, we cannot do

> d[1 , "children"] <- c("Mary", "James", "Thomas")

Error in `[<-.data.frame`(`*tmp*`, 1, "children", value = c("Mary", "James", :
replacement has 3 rows, data has 1

Taking into account the list structure of the column, we can type the following to change the values in a single cell.

d[1 , "children"][[1]] <- list(c("Mary", "James", "Thomas"))

# or also

d$children[1] <- list(c("Mary", "James", "Thomas"))
d
 id name children
1 1 Jon Mary, James, Thomas
2 2 Mark Greta, Sally

You can also create a data frame having a list as a column using the data.frame function, but with a little tweak. The list column has to be wrapped inside the function I. This will protect it from several conversions taking place in data.frame (see ?I documentation).

d <- data.frame(id = 1:2,
                   name = c("Jon", "Mark"),
                   children = I(list(c("Mary", "James"),
                                     c("Greta", "Sally")))
                )

This is an interesting feature, which gives me a deeper understanding of what a data frame is. But when exactly would I want to use it? I have not encountered the need to use it very often yet (though of course there may be plenty of situations where it makes sense). But today I had a case where this feature seemed particularly useful.

Converting lists and data frames to JSON

I had two separate types of information. One stored in a data frame and the other one in a list Referring to the example above, I had

d <- data.frame(id=1:2, name=c("Jon", "Mark"))
d
 id name
1 1 Jon
2 2 Mark

and

ch <- list(c("Mary", "James"), c("Greta", "Sally"))
ch
[[1]]
[1] "Mary" "James"

[[2]]
[1] "Greta" "Sally"

I needed to return an array of JSON objects which look like this.

[
  {
    "id": 1,
    "name": "Jon",
    "children": ["Mary", "James"]
  }, 
  {
    "id": 2,
    "name": "Mark",
    "children": ["Greta", "Sally"]
  }
]

Working with the superb jsonlite package to convert R to JSON, I could do the following to get the result above.

library(jsonlite)

l <- split(d, seq(nrow(d))) # convert data frame rows to list
l <- unname(l)              # remove list names
for (i in seq_along(l))     # add element from ch to list
    l[[i]] <- c(l[[i]], children=ch[i])

toJSON(l, pretty=T, auto_unbox = T) # convert to JSON

The results are correct, but getting there involved quite a number of tedious steps. These can be avoided by directly placing the list into a column of the data frame. Then jsonlite::toJSON takes care of the rest.

d$children <- list(c("Mary", "James"), c("Greta", "Sally"))
toJSON(d, pretty=T, auto_unbox = T)
[
  {
    "id": 1,
    "name": "Jon",
    "children": ["Mary", "James"]
  },
  {
    "id": 2,
    "name": "Mark",
    "children": ["Greta", "Sally"]
  }
]

Nice What we do here, is basically creating the same nested list structure as above, only now it is disguised as a data frame. However, this approach is much more convenient.

To leave a comment for the author, please follow the link and comment on their blog: “R” you ready?.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

Leave a Reply

Your email address will not be published. Required fields are marked *

Time limit is exhausted. Please reload CAPTCHA.