The Mathematics Genealogy Project: Customizing my mathematical family tree

By Francois Keck

(This article was first published on R_EN – Piece of K, and kindly contributed to R-bloggers)

Some time ago, Maëlle Salmon published a very nice post showing how she scraped her mathematical family tree from the Mathematics Genealogy Project. Of course, I immediately wanted to produce my own! I am not a mathematician myself, but one of my PhD supervisors has a PhD in mathematics, which makes me the indirect descendant of a long lineage of famous mathematicians! As Maëlle kindly invited me to share my tree on Twitter, I decided to write this post to show how I customized the appearance of my tree to make it sexier. [It is actually part of my job to plot sexy phylogenetic trees…] So, this post is a remix of Maëlle’s post, which was itself a remix of Nathalie Vialaneix’s post.

1. Scraping more data

First, I slightly modified Maëlle’s scraping function to collect more data. I added XPath selectors for the degree, the university, the country, the year and the title of the thesis. I am not going to use all of this information, but it is there if you need it.

library(magrittr)
library(DiagrammeR)
library(dplyr)
library(igraph)
library(purrr)
library(stringr)
library(rvest)
library(xml2)


.get_advisors <- function(id, sleep_time, terminal = FALSE){
  # small pause between requests
  Sys.sleep(sleep_time)

  # get the page for this id (URL pattern assumed)
  page <- paste0("https://www.genealogy.math.ndsu.nodak.edu/id.php?id=", id) %>%
    httr::GET() 
  
  # try until it works but not more than 5 times
  try <- 1
  while(httr::status_code(page) != 200 & try <= 5){
    Sys.sleep(sleep_time)
    page <- paste0("https://www.genealogy.math.ndsu.nodak.edu/id.php?id=", id) %>%
      httr::GET() 
    try = try + 1
  }
  
  # Now get student's data
  student_name <- httr::content(page) %>%
    rvest::xml_nodes(xpath = '//h2[@style="text-align: center; margin-bottom: 0.5ex; margin-top: 1ex"]') %>%
    rvest::html_text() %>%
    stringr::str_remove("\n")
  
  degree <- httr::content(page) %>%
    rvest::xml_nodes(xpath = '//span[@style="margin-right: 0.5em"]/span/preceding-sibling::text()') %>%
    rvest::html_text()
  
  university <- httr::content(page) %>%
    rvest::xml_nodes(xpath = '//span[@style="margin-right: 0.5em"]/span') %>%
    rvest::html_text()
  
  year <- httr::content(page) %>%
    rvest::xml_nodes(xpath = '//span[@style="margin-right: 0.5em"]/span/following-sibling::text()') %>%
    rvest::html_text() %>% 
    stringr::str_trim()
  
  country <- httr::content(page) %>%
    rvest::xml_nodes(xpath = '//div[@style="line-height: 30px; text-align: center; margin-bottom: 1ex"]/img') %>%
    rvest::html_attr("title")
  
  thesis_title <- httr::content(page) %>%
    rvest::xml_nodes(xpath = '//span[@id="thesisTitle"]') %>%
    rvest::html_text() %>%
    stringr::str_remove_all("\n")
  
  # Get all nodes corresponding to advisors
  # Thanks to their... formatting but it works
  all_advisors <- httr::content(page) %>%
    rvest::xml_nodes(xpath = "//p[@style='text-align: center; line-height: 2.75ex']") %>%
    rvest::html_nodes("a")
  
  if(terminal){
    name % mutate(student_name = stringr::str_trim(student_name),
                    name = stringr::str_trim(name))


terminal_df % 
  map(get_advisors, terminal = TRUE, sleep_time = 30) %>% 
  bind_rows() %>%
  mutate(student_name = stringr::str_trim(student_name),
         name = stringr::str_trim(name))
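
The degree, university and year selectors above all hang off the same inner span and use the XPath axes preceding-sibling and following-sibling to grab the loose text around it. Here is a minimal offline sketch of that idea with xml2. The HTML fragment is made up for illustration (it only imitates the kind of layout those selectors target, it is not copied from the real site), and the variable names are mine.

```r
library(xml2)

# Made-up fragment: degree and year are bare text nodes around an inner <span>
snippet <- '<div><span style="margin-right: 0.5em">Ph.D. <span style="color: #006633">Example University</span> 1950</span></div>'
doc <- read_html(snippet)

# The inner span is the anchor for the three selectors
base <- '//span[@style="margin-right: 0.5em"]/span'

degree     <- xml_text(xml_find_all(doc, paste0(base, "/preceding-sibling::text()")))
university <- xml_text(xml_find_all(doc, base))
year       <- xml_text(xml_find_all(doc, paste0(base, "/following-sibling::text()")))

# Text nodes keep their surrounding spaces, so trim them
fields <- trimws(c(degree, university, year))
print(fields)
```

Selectors that match on a full style attribute are brittle: any change to the inline CSS of the page breaks them, so it is worth checking that each one still returns something before crawling a whole lineage.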


2. Labels

I wanted to create labels with the name of the mathematician, the flag of the country and the year. Displaying a picture in a node with Graphviz (using DiagrammeR) was not simple. Apparently it is possible to use some kind of basic HTML to format nodes, but I failed to include images. Finally, I decided to use emojis. Thanks to Hadley Wickham’s package emo, it was fairly easy.

#### Construct the graph ####
# create nodes
labels %
  bind_rows(terminal_df) %>%
  filter(!duplicated(student_name)) %>% 
  right_join(nodes_df, by = c("student_name" = "label"))


# create edges
edges_df % 
  map_chr(str_trim) %>%
  str_replace_all("(?% 
  str_replace_all(" /", ", ") %>% 
  paste0("(", ., ")") %>% 
  str_replace_all("\\(\\)", "")


df_red$country % 
  map(stringr::str_replace, "UnitedKingdom", "United Kingdom")

country_flag % 
  map_chr(paste, collapse = " ⋅ ")


label %
  str_replace_all("'", " ") %>%
  str_replace_all("[[:space:]]{2,}", " ")
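
To make the idea of a "name + flag + year" label concrete, here is a small sketch that does not depend on the emo package. Instead of looking flags up by country name as the post does, it swaps in a different technique: building the flag emoji from a two-letter ISO 3166-1 code with base R. The helper names iso_to_flag and make_label, and the example person, are hypothetical.

```r
# A flag emoji is a pair of "regional indicator" code points, one per letter
# of the ISO code; 127397 is the offset from "A" (65) to U+1F1E6.
iso_to_flag <- function(iso2) {
  intToUtf8(127397 + utf8ToInt(toupper(iso2)))
}

# Build a node label: name, flag(s), then the year in parentheses on a new line
make_label <- function(name, iso2, year) {
  flags <- paste(vapply(iso2, iso_to_flag, character(1)), collapse = " ⋅ ")
  paste0(name, " ", flags, "\n(", year, ")")
}

label <- make_label("Jane Example", c("FR", "GB"), 1950)
cat(label, "\n")
```

Whether the flags show up in color depends on the fonts installed on the machine that renders the graph, which is the same limitation mentioned later in this post.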

3. Adding myself

nodes_df n(2016)"))
          
edges_df 

4. Customizing the style of the nodes

I changed the color and shape of the nodes. For some obscure reason, the rectangle-based shapes of Graphviz are not rendered correctly in Firefox on Ubuntu (the labels overlap). It worked on Windows, but then I couldn't display my flags with colored emojis 🙁 The only solution I found was to fix the font size manually (see next point).

# Customizing the nodes
nodes_df %
  igraph::write.graph(file = "graph.dot",
                      format = "dot") 
DiagrammeR::grViz("graph.dot", width = 4000, height = 5000)
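
As a rough illustration of the step above, here is a toy graph whose styling is set through igraph vertex attributes and then exported to a DOT file. The people, the attribute values and the file name are all made up. The sketch only relies on the fact that igraph writes vertex attributes into the DOT output, where Graphviz reads them as node attributes.

```r
library(igraph)

# Hypothetical toy genealogy: two advisors and one student
edges <- data.frame(from = c("Advisor A", "Advisor B"),
                    to   = c("Student C", "Student C"))
g <- graph_from_data_frame(edges, directed = TRUE)

# Styling lives in vertex attributes named after Graphviz node attributes
V(g)$label     <- V(g)$name
V(g)$shape     <- "ellipse"
V(g)$style     <- "filled"
V(g)$fillcolor <- "beige"

# Export to DOT in a temporary location
dot_file <- file.path(tempdir(), "toy_graph.dot")
write_graph(g, file = dot_file, format = "dot")

dot_lines <- readLines(dot_file)
print(dot_lines)
```

The resulting file can then be passed to DiagrammeR::grViz() in the same way as graph.dot above.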

5. The final touch

The viz.js library which stands behind DiagrammeR renders DOT objects in the browser via SVG. SVG files are XML-based and therefore can be directly processed with R. This gives us great power to manipulate the look of our tree.

First, we need to convert the widget to static HTML/SVG. We can do that from the command line, using Chromium in headless mode to render the widget page.

DiagrammeR::grViz("graph.dot", width = 4000, height = 5000) %>% 
  htmlwidgets::saveWidget("index.html")
system("chromium-browser --headless --dump-dom index.html > genealogy.html")
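
The system() call assumes that a binary called chromium-browser is on the PATH, which is not the case on every machine. A slightly more defensive variant could look like the sketch below. The function name dump_dom is hypothetical, and it only wraps the same --headless --dump-dom flags used above.

```r
# Render a page with headless Chromium and save the resulting DOM.
# Returns FALSE if the browser binary cannot be found or the command fails.
dump_dom <- function(input, output, browser = "chromium-browser") {
  path <- Sys.which(browser)
  if (!nzchar(path)) {
    return(FALSE)
  }
  # stdout = output redirects the dumped DOM into the output file
  status <- system2(path,
                    c("--headless", "--dump-dom", shQuote(input)),
                    stdout = output)
  status == 0
}

# Example call (does nothing if the browser is missing):
# dump_dom("index.html", "genealogy.html")
```

On systems where the binary has another name (for example chromium or google-chrome), pass that name through the browser argument.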

Finally, I used R and xml2 to directly edit the SVG content and improve the look of the tree. In the code below I show how to fill the page and label backgrounds with a texture image, how to fix the text size, and how to add a shadow effect on the labels.


# Load and clean html
html % 
  xml_remove()

# Background
xml_find_all(html, '/html/body') %>% 
  xml_set_attr('style', 'background-image: url("ricepaper2.png"); margin: 0px; padding: 40px;')

xml_find_all(html, '/html/body/div/div/svg/g/polygon') %>% 
  xml_set_attr('fill', 'transparent')


# Labels text size
xml_find_all(html, '//text') %>% 
  xml_text() %>% 
  str_detect("(^[A-Z])|(^\\()") %>%
  extract(xml_find_all(html, '//text'), .) %>% 
  xml_set_attr('font-size', '12')


# Labels background
xnodes %
  xml_add_child(read_xml('
  '), .where = 0)


# Labels shadow fx
xml_find_all(html, '//polygon[@id="paper_tag"]') %>% 
  xml_set_attr('filter', 'url(#f3)')

xnodes %
  xml_add_child(read_xml('
  '), .where = 0)

write_html(html, "genealogy.html")
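
The same kind of SVG surgery can be tried on a tiny self-contained page. The sketch below is not the code used for my tree: the markup, the filter id "shadow" and the filter parameters are illustrative assumptions. It shows three operations: setting an attribute on every text element, inserting a filter definition as the first child of the svg element, and pointing the polygons at that filter.

```r
library(xml2)

# Tiny stand-in for the rendered widget page
page <- read_html('<html><body><svg width="200" height="100">
  <g class="node">
    <polygon points="0,0 100,0 100,40 0,40" fill="none"></polygon>
    <text x="10" y="25">Ada Example</text>
  </g>
</svg></body></html>')

# 1. Fix the font size of every label
xml_set_attr(xml_find_all(page, "//text"), "font-size", "12")

# 2. Add a drop-shadow filter definition as the first child of <svg>
svg <- xml_find_first(page, "//svg")
shadow <- read_xml('<defs><filter id="shadow" x="0" y="0" width="200%" height="200%">
  <feOffset result="offOut" in="SourceAlpha" dx="3" dy="3"/>
  <feGaussianBlur result="blurOut" in="offOut" stdDeviation="2"/>
  <feBlend in="SourceGraphic" in2="blurOut" mode="normal"/>
</filter></defs>')
xml_add_child(svg, shadow, .where = 0)

# 3. Apply the filter to the label boxes
xml_set_attr(xml_find_all(page, "//polygon"), "filter", "url(#shadow)")

out_file <- file.path(tempdir(), "toy_tree.html")
write_html(page, out_file)
```

Because the page is parsed with read_html(), the SVG elements carry no XML namespace, so plain XPath expressions such as //text work without a namespace prefix.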

I find the result pretty nice.

And this is my mathematical family tree! I recognize some illustrious names here! Do you?

My tree

Bonus:

For a better parchment look, we can use the calligraphic font Tangerine for the labels. Note that some glyphs are unfortunately not supported by this font.

  
# Labels font. 
xml_find_all(html, '//text') %>% 
  xml_text() %>% 
  str_detect("(^[A-Z])|(^\\()") %>%
  extract(xml_find_all(html, '//text'), .) %>% 
  xml_set_attr('font-size', '20')

xml_find_all(html, '//text') %>% 
  xml_text() %>% 
  str_detect("(^[A-Z])|(^\\()") %>%
  extract(xml_find_all(html, '//text'), .) %>% 
  xml_set_attr('font-family', 'Tangerine')

xml_find_all(html, '/html/head') %>%
  xml_add_child(read_xml(''))
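
For the font to be available in the browser, the page head needs a stylesheet link that loads it. One way to add such a link with xml2 is sketched below on a stand-in document. The Google Fonts URL is my assumption about where Tangerine would be loaded from, and doc and head_node are hypothetical names.

```r
library(xml2)

# Stand-in for the page being edited
doc <- read_html("<html><head><title>Tree</title></head><body></body></html>")
head_node <- xml_find_first(doc, "/html/head")

# Named arguments of xml_add_child() become attributes of the new node
xml_add_child(head_node, "link",
              href = "https://fonts.googleapis.com/css?family=Tangerine",
              rel  = "stylesheet")

print(xml_find_first(doc, "//link"))
```

Loading the font this way requires an internet connection when the page is opened. Otherwise the browser silently falls back to a default font.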
