By Sharp Sight
Last week’s blog post about Amazon’s search for a location for a second headquarters left me thinking about the company’s growth.
After looking at the long-term growth of the stock price, it occurred to me that visualizing the stock price data would be a great example of how to create a line chart in R using ggplot2.
So in this blog post, I’ll show you how to make a line chart with ggplot2, step by step.
Let’s jump in.
First, we’ll load several packages: readr, which we’ll use to read in the data; tidyverse, which includes ggplot2, dplyr, and several other important packages; and stringr, which will let us do some string manipulation.
#===============
# LOAD PACKAGES
#===============
library(readr)
library(tidyverse)
library(stringr)
Now that we’ve loaded the packages that we need, we’ll read in the data.
The data are contained in a .csv file that I’ve uploaded to the Sharp Sight webpage.
We’ll use readr::read_csv() to read in the file. This is an extremely straightforward use of read_csv().
#==========
# READ DATA
#==========
stock_amzn <- read_csv("…")  # URL of the .csv file on the Sharp Sight site
Now we'll quickly inspect the data by looking at the column names and printing out the first few rows of data.
#========
# INSPECT
#========
stock_amzn %>% names()
stock_amzn %>% head()
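Because a line chart needs a true date on the x-axis, it can also be worth confirming how read_csv() typed each column. The sketch below is an illustration rather than part of the original code: it reads a tiny made-up table from literal text using I(), which assumes readr 2.0 or later. The values are invented, and the column names Date and Close are assumed to resemble the capitalized names in the real file.

```r
library(readr)

# Made-up rows standing in for the real file; values are illustrative only.
toy_csv <- "Date,Close\n2017-01-03,10.5\n2017-01-04,11.0\n2017-01-05,10.8\n"

# I() tells read_csv() to treat the string as literal data, not a file path.
toy_stock <- read_csv(I(toy_csv), show_col_types = FALSE)

# Check the type read_csv() guessed for each column.
sapply(toy_stock, class)

# TRUE when the dates were parsed as Date rather than left as text.
inherits(toy_stock$Date, "Date")
```

If the date column came back as character instead, ggplot2 would treat it as a discrete variable, and the line chart would not be drawn as intended.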
Upon inspection, you can see that the column names are capitalized. This is a minor problem, but ideally you want your variable names to be lower case; this makes them easier to type.
To convert the variable names to all lower case, we’ll use the str_to_lower() function from the stringr package.
#=========================================================
# CHANGE COLUMN NAMES: lower case
# - in the raw form (as read in) the first letter of
#   each variable is capitalized.
# - This makes them harder to type! Not ideal.
# - we'll use stringr::str_to_lower() to change the column
#   names to lower case
#=========================================================
colnames(stock_amzn) <- colnames(stock_amzn) %>% str_to_lower()

# inspect
stock_amzn %>% names()
Here, on the right-hand side of the assignment operator, we’re using colnames(stock_amzn) to retrieve the column names. Then we pipe the column names into str_to_lower(), which converts the names to lower case.
The resulting output is then re-assigned to the column names of the dataframe. We do this with the left-hand side of the assignment: colnames(stock_amzn) <- . Essentially, we’re taking the result from the right-hand side and assigning that result to the column names using colnames(). To be clear, colnames() can both retrieve the column names and set them.
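To make the retrieve-and-set behavior concrete, here is a minimal sketch on a hypothetical one-row data frame (toy and old_names are illustrative names, not part of the original code). It calls str_to_lower() directly instead of piping into it, which gives the same result.

```r
library(stringr)

# Hypothetical toy data frame with capitalized column names.
toy <- data.frame(Date = as.Date("2017-01-03"), Close = 10.5)

# Getter: colnames() returns the names as a character vector.
old_names <- colnames(toy)

# Setter: assigning to colnames() replaces the names in place.
colnames(toy) <- str_to_lower(old_names)

# Inspect the new names.
colnames(toy)
```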
Now that the data are in the right form, let’s make a simple line chart.
#======
# PLOT
#======

#--------------------------------------
# FIRST ITERATION
# - this is the quick-and-dirty version
#--------------------------------------
ggplot(data = stock_amzn, aes(x = date, y = close)) +
  geom_line()
This is about as simple as it gets in ggplot2, but let’s break it down.
The ggplot() function indicates that we’re going to use ggplot2 to make a plot.
The data = parameter specifies that we’re going to be plotting data in the stock_amzn dataframe.
Then, the aes() function allows us to specify our variable mappings. With the statement x = date, we are mapping the date variable to the x-axis. Similarly, with the statement y = close, we are mapping the close variable to the y-axis.
Finally, geom_line() specifies that we want to draw lines.
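One way to see these pieces working separately is to build the plot in two steps and store each step in a variable. This is only a sketch on made-up data (toy_stock, p_base, and p_line are hypothetical names), and it assumes that the layers of a plot object can be inspected through its layers element.

```r
library(ggplot2)

# Made-up data; column names mirror the lower-cased names used in this post.
toy_stock <- data.frame(
  date  = as.Date("2017-01-02") + 0:4,
  close = c(10, 12, 11, 14, 15)
)

# Step 1: ggplot() sets the data and the aesthetic mappings. There is no
# geom yet, so printing p_base would show axes with nothing drawn.
p_base <- ggplot(data = toy_stock, aes(x = date, y = close))

# Step 2: adding geom_line() adds the layer that draws the line.
p_line <- p_base + geom_line()

length(p_base$layers)  # number of layers before adding the geom
length(p_line$layers)  # number of layers after adding the geom
```

Printing p_line draws the same kind of chart as the one-statement version; splitting it up just makes it easier to see which part of the code contributes what.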
Again, this is just about as simple as it gets.
Once you know more about how ggplot2 works, you can format the plot.
Having said that, let’s take a look at a ‘polished’ version of the plot … a version that’s been heavily formatted:
#--------------------------------------
# POLISHED VERSION
# - this is the 'finalized' version
# - we arrive at this after a lot of
#   iteration ....
#--------------------------------------
ggplot(stock_amzn, aes(x = date, y = close)) +
  geom_line(color = 'cyan') +
  geom_area(fill = 'cyan', alpha = .1) +
  labs(x = 'Date'
       , y = 'Closing\nPrice'
       , title = "Amazon's stock price has increased dramatically\nover the last 20 years") +
  theme(text = element_text(family = 'Gill Sans', color = "#444444")
        ,panel.background = element_rect(fill = '#444B5A')
        ,panel.grid.minor = element_line(color = '#4d5566')
        ,panel.grid.major = element_line(color = '#586174')
        ,plot.title = element_text(size = 28)
        ,axis.title = element_text(size = 18, color = '#555555')
        ,axis.title.y = element_text(vjust = 1, angle = 0)
        ,axis.title.x = element_text(hjust = 0)
        )
And here’s the final chart:
If you’re a beginner, don’t be intimidated: this finalized chart is not hard to do.
Really. With a little practice, you should be able to learn to create a well-formatted chart like this very quickly. It should take you only a few hours to learn how the code works, and you should be able to memorize this syntax within a week or two.
… and when I say memorize, I mean that you should be able to write all of this code from memory.
Ideally, if you’re fluent in R and ggplot2, it should only take you 10 or 15 minutes to write all of this code, start to finish.
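One way to practice is to rebuild the polished chart in stages, adding one piece at a time and printing the result after each step. The sketch below does this on made-up data (toy_stock and p1 through p4 are hypothetical names, and the title is a placeholder). It uses only a subset of the theme settings and leaves out the font family, on the assumption that 'Gill Sans' may not be installed on every machine.

```r
library(ggplot2)

# Made-up data standing in for stock_amzn; values are illustrative only.
toy_stock <- data.frame(
  date  = as.Date("2017-01-02") + 0:4,
  close = c(10, 12, 11, 14, 15)
)

# Stage 1: the quick-and-dirty line chart, with a line color.
p1 <- ggplot(toy_stock, aes(x = date, y = close)) +
  geom_line(color = "cyan")

# Stage 2: shade the area under the line.
p2 <- p1 + geom_area(fill = "cyan", alpha = .1)

# Stage 3: add axis labels and a title ("\n" breaks a label across lines).
p3 <- p2 + labs(x = "Date", y = "Closing\nPrice", title = "Toy closing price")

# Stage 4: adjust a few theme elements.
p4 <- p3 + theme(
  panel.background = element_rect(fill = "#444B5A"),
  axis.title.y = element_text(vjust = 1, angle = 0)
)

p4  # print the plot; p1, p2, and p3 can be printed the same way
```

Comparing the output of each stage shows what every added line contributes, which makes the full syntax easier to remember.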
Source: R News