By Peter Prevos

**The Devil is in the Data**, and kindly contributed to R-bloggers)

R logo in ASCII art by picascii.com

Euler problem 22 is another trivial one that takes us to the realm of ASCII codes. ASCII is a method to convert symbols into numbers, originally invented for telegraphs.

Back in the 8-bit days, ASCII art was a method to create images without using lots of memory. Each image consists of a collection of text characters that give the illusion of an image. Euler problem 22 is, unfortunately, a bit less poetic.

## Euler Problem 22 Definition

Using names.txt, a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.

For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938^{th} name in the list. So, COLIN would obtain a score of 938 × 53 = 49,714.

What is the total of all the name scores in the file?

## Solution

This code reads and cleans the file and sorts the names alphabetically. The charToRaw function determines the numerical value of each character in each name. This output of this function is the hex ASCII code for each character. The letter A is number 65, so we subtract 64 from each value to get the sum total.

# ETL: reads the file and converts it to an ordered vector. namesWe can have a bit more fun with this problem by comparing this list with the most popular baby names in 2016. The first section of the code extracts the list of popular names from the website. The rest of the code counts the number of matches between the lists.

# Most popular baby names library(rvest) url % read_html() %>% html_nodes(xpath = '//*[@id="babyNameList"]/table') %>% html_table() babynamesThe post Euler Problem 22 : Names Scores appeared first on The Devil is in the Data.

Toleave a commentfor the author, please follow the link and comment on their blog:The Devil is in the Data.

R-bloggers.com offersdaily e-mail updatesabout R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

Source:: R News