This set of exercises will help you improve your skills with character functions in R. Most of the exercises are related to text mining, a technique that analyses text using statistics. If you find them interesting, I would suggest checking the library tm, which includes functions designed for this task. There are many applications of text mining; a pretty popular one is the ability to associate a text with its author, which was how J.K. Rowling (the Harry Potter author) was caught publishing a new novel series under an alias. Before proceeding, it might be helpful to look over the help page for strsplit. Also take a look at the library stringr and the functions it includes.
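As a warm-up, here is a minimal sketch of how strsplit combines with a few other base R character functions. The sentence is made up for illustration and is not part of the tm corpus, and the choice of gsub, nchar, tolower and table is my own assumption about what is useful here, not a list taken from the help pages mentioned above.

```r
# Toy sentence (hypothetical, for illustration only)
sentence <- "Veni, vidi, vici: I came, I saw, I conquered."

# gsub() with the POSIX class [[:punct:]] strips punctuation marks
clean <- gsub("[[:punct:]]", "", sentence)

# strsplit() returns a list with one element per input string,
# so [[1]] extracts the vector of words
words <- strsplit(clean, " ")[[1]]

# nchar() counts the characters in each word
word_lengths <- nchar(words)

# table() tallies how often each lower-cased word occurs
word_counts <- table(tolower(words))

print(clean)
print(words)
print(word_counts)
```

Note that strsplit always returns a list, even for a single string, which is why the `[[1]]` is needed before counting or tabulating the words.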
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Before starting the set of exercises, run the following lines of code:
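# Install tm if it is missing, then load it
if (!'tm' %in% installed.packages()) install.packages('tm')
library(tm)
# Read the sample Latin texts shipped with tm into a corpus
txt = system.file("texts", "txt", package = "tm")
ovid = VCorpus(DirSource(txt, encoding = "UTF-8"),
               readerControl = list(language = "lat"))
# One character vector of lines per document
TEXT = lapply(ovid[1:5], as.character)
# All lines gathered into a single list element, OVID$text
OVID = c(data.frame(text=unlist(TEXT), stringsAsFactors = FALSE))
TEXT1 = TEXT[]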
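The shape of the OVID object can be surprising, so here is a small sketch of the same construction on a hypothetical stand-in list (toy_text and toy_ovid are names invented for this illustration, not objects from the setup code). It assumes only base R and should behave the same way as the unlist/data.frame/c combination used for OVID.

```r
# Hypothetical stand-in for TEXT: a list of character vectors, one per document
toy_text <- list(doc1 = c("first line", "second line"), doc2 = "third line")

# unlist() flattens the list into one character vector of lines;
# wrapping the data frame in c() turns it into a plain list of its columns
toy_ovid <- c(data.frame(text = unlist(toy_text), stringsAsFactors = FALSE))

# Inspect the structure: a list with a single character element named text
str(toy_ovid)

# Individual lines can then be reached by position
print(toy_ovid$text[[3]])
```

In other words, the result is a list whose single element holds every line of every document, which is convenient when you need to search across lines.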
Delete all the punctuation marks from TEXT1.
How many letters does TEXT1 contain?
How many words does TEXT1 contain?
What is the most common word in TEXT1?
Get an object that contains all the words with at least one capital letter. (Make sure the object contains each word only once.)
Which are the 5 most common letters in the object?
Which letters of the alphabet are not in the object?
In the OVID object, there is a character from the popular sitcom ‘FRIENDS’. Who is he/she? There were six main characters (Chandler, Phoebe, Ross, Monica, Joey, Rachel).
Find the line where this character is mentioned.
How many words finish with a vowel, and how many with a consonant?
Source: R News