By Peter Prevos
R logo in ASCII art by picascii.com
Euler problem 22 is another trivial one that takes us to the realm of ASCII codes. ASCII is a method to convert symbols into numbers, originally invented for telegraphs.
Back in the 8-bit days, ASCII art was a method to create images without using lots of memory. Each image consists of a collection of text characters that give the illusion of an image. Euler problem 22 is, unfortunately, a bit less poetic.
Euler Problem 22 Definition
Using names.txt, a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.
For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 × 53 = 49,714.
This code reads and cleans the file and sorts the names alphabetically. The charToRaw function determines the numerical value of each character in each name. This output of this function is the hex ASCII code for each character. The letter A is number 65, so we subtract 64 from each value to get the sum total.
# ETL: reads the file and converts it to an ordered vector. names
We can have a bit more fun with this problem by comparing this list with the most popular baby names in 2016. The first section of the code extracts the list of popular names from the website. The rest of the code counts the number of matches between the lists.
# Most popular baby names library(rvest) url % read_html() %>% html_nodes(xpath = '//*[@id="babyNameList"]/table') %>% html_table() babynames
To leave a comment for the author, please follow the link and comment on their blog: The Devil is in the Data.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...
Source:: R News