By Sharp Sight
“Master the basics.”
That’s a common mantra here at Sharp Sight.
Loyal readers know what I mean by “master the basics.” To master data science, you need to master the foundational tools.
That means knowing how to create essential plots like:
- Bar charts
- Line charts
And performing data manipulations like:
- Creating new variables
- Joining datasets
After the foundations, the little details matter
Having said that, the little details matter too. After you master the basics, you need to learn them.
A great example of this is plot annotation.
Adding little details like plot annotations helps you communicate more clearly and “tell a story” with your plots.
Moreover, annotations help your data visualizations “stand on their own.”
What I mean by this is that they help your plots communicate without you being there to do the communicating.
Certainly, as a data scientist, you will often present a report or analysis to an audience in person. In these instances, you will be there to explain your work and give a personal “voice over” for your visualizations.
But that’s not always the case. For example, you’ll often send your analyses to partners as reference documents. Other times, you’ll publish them (maybe on a blog, or internal company website). In these instances, you won’t be there to explain your work. You need your work to “speak for you.” The charts need to communicate on their own. To make sure that your work is still effective when you are not there, you’ll often need to annotate your work.
Therefore, annotations are one of the “little details” that you need to learn after you learn the foundations.
A simple annotation in ggplot2
Here, I’ll show you a very simple example of a plot annotation in ggplot2.
Before I show it to you though, I want to introduce you to a learning principle that you can use when you’re learning and practicing data science techniques.
To learn annotate(), start with a very simple example
When you’re learning a new skill, it’s very effective to learn and practice with very simple examples. The simpler the better.
As a side note, this is one of the reasons that I strongly discourage the “jump in and build something” method of learning. When people “jump in” and work on a project, they typically select something that’s far too complicated, so they spend all of their time struggling to do simple things that they should have learned before working on a project.
So before you jump into a project, learn individual techniques. And when you’re learning a new technique, use very, very basic examples.
Code: how to create an annotation in ggplot2
With that in mind, I’ll show you a very, very simple plot annotation in ggplot2.
Here we have a histogram of 10,000 values drawn from a normal distribution with a mean of 5. A dashed red line marks the mean at x = 5.
library(ggplot2)

set.seed(10)
df.rnorm <- data.frame(rnorm = rnorm(10000, mean = 5))

ggplot(data = df.rnorm, aes(x = rnorm)) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = 5, color = "red", linetype = "dashed")
We want to add an annotation that explicitly calls out the value of the mean.
Add an annotation with the annotate() function
ggplot(data = df.rnorm, aes(x = rnorm)) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = 5, color = "red", linetype = "dashed") +
  annotate("text", label = "Mean = 5", x = 4.5, y = 170, color = "white")
Here, we’ve added the annotation that says “Mean = 5.”
This is very straightforward.
To accomplish this, we’ve used the annotate() function.
The first argument of the function is “text”. This specifies that we want to use a text annotation. As it turns out, there are several different annotation types, including rectangles and line segments, so you need to specify exactly what type of annotation you want to add. Because we’re adding a text annotation, “text” is the appropriate option.
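To make the other annotation types concrete, here is a minimal sketch that adds a rectangle and a line segment to the same histogram. The position arguments (xmin, xmax, ymin, ymax for the rectangle; x, xend, y, yend for the segment) are the ones those geoms use. The specific coordinates, the blue color, and the plot name p_shapes are my own illustrative choices, not part of the original example.

```r
library(ggplot2)

set.seed(10)
df.rnorm <- data.frame(rnorm = rnorm(10000, mean = 5))

p_shapes <- ggplot(data = df.rnorm, aes(x = rnorm)) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = 5, color = "red", linetype = "dashed") +
  # Rectangle annotation: shade x from 4 to 6 (one standard deviation
  # either side of the mean, since rnorm() defaults to sd = 1)
  annotate("rect", xmin = 4, xmax = 6, ymin = 0, ymax = 600,
           fill = "blue", alpha = 0.2) +
  # Segment annotation: a horizontal line ending just left of the mean line
  annotate("segment", x = 2.5, xend = 4.9, y = 500, yend = 500,
           color = "blue")

p_shapes
```

The only thing that changes between annotation types is the first argument and the set of position arguments that go with it.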
The next piece of syntax within annotate() is the label = parameter. label just specifies the exact text that we want to add to the plot. Here, we’re specifying that we want to add the text “Mean = 5”.
Next, we specify the exact location of the annotation by detailing the x and y coordinates with the x and y parameters respectively.
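As a hedged variation on label, x, and y, the sketch below computes the sample mean rather than hard-coding 5, builds the label from it, and positions the text relative to the line. The names sample_mean, mean_label, and p_label are hypothetical helpers I've introduced for illustration. The key point is that x and y are given in the same units as the plot's axes.

```r
library(ggplot2)

set.seed(10)
df.rnorm <- data.frame(rnorm = rnorm(10000, mean = 5))

# Compute the sample mean instead of hard-coding 5
sample_mean <- mean(df.rnorm$rnorm)

# Build the label text from the computed value
mean_label <- paste("Mean =", round(sample_mean, 2))

p_label <- ggplot(data = df.rnorm, aes(x = rnorm)) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = sample_mean, color = "red", linetype = "dashed") +
  # x and y are in data units: half a unit left of the line, at a count of 170
  annotate("text", label = mean_label,
           x = sample_mean - 0.5, y = 170, color = "white")

p_label

# Inspect the annotation layer (the third layer) to see where it landed
layer_data(p_label, 3)[, c("x", "y", "label")]
```

The layer_data() call at the end is optional. It is just a quick way to check that the annotation sits at the coordinates you passed in.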
Finally, we specify the color. In this case, I’ve set the color to “white”.
You could simplify this code even further by removing the color specification and letting it default to black. When you practice making annotations, I would probably recommend that you remove the color specification for the sake of simplicity. (You are practicing R, right?)
Having said that, in this case, the white text looks best against the dark grey histogram, so I left that code in.
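For practice, a stripped-down version without the color argument might look like the sketch below. Because the text falls back to the default color, I've assumed a different position (x = 3, y = 600) so that the label should sit on the light panel background instead of on top of the dark bars. Those coordinates and the name p_simple are my own choices for illustration.

```r
library(ggplot2)

set.seed(10)
df.rnorm <- data.frame(rnorm = rnorm(10000, mean = 5))

p_simple <- ggplot(data = df.rnorm, aes(x = rnorm)) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = 5, color = "red", linetype = "dashed") +
  # No color argument: the text uses the default text color
  annotate("text", label = "Mean = 5", x = 3, y = 600)

p_simple
```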
To remember the annotate() technique, you need to practice it
annotate() is a fairly simple technique.
Nevertheless, my bet is that many people will forget it in the long run because they fail to practice it. They’ll “cut and paste” once or twice, and then quickly forget it.
I’ve said before that it’s not enough just to learn a new technique. You need to remember it in the long run.
The problem is that even if you learn a new ggplot2 technique today, you’re very likely to forget it within a few days.
So if you want to remember how to use annotate() and other R tools, and if you want to master R, you need to practice.