As many folks know, I live in semi-rural Maine and we were hit pretty hard by a wind+rain storm Sunday to Monday. The hrbrmstr compound has had no power (besides a generator) and no stable/high-bandwidth internet (Verizon LTE was heavily congested) since 0500 Monday, and still doesn’t as I write this post.
I’ve played with scraping power outage data from Central Maine Power but there’s a great Twitter account, PowerOutage_us, that has done much of the legwork for the entire country. They don’t cover everything and do not provide easily accessible historical data (likely b/c evil folks wld steal it w/o payment or credit) but they do have a site you can poke at and do provide updates via Twitter. As you’ve seen in a previous post, we can use the rtweet package to easily read Twitter data. And, the power outage tweets are regular enough to identify and parse. But raw data is so…raw.
While one could graph data just for one’s self, I decided to marry this power scraping capability with a recent idea I’ve been toying with, gg_tweet(). Imagine being able to take a ggplot2 object and “plot” it to Twitter, fully conforming to Twitter’s stream or card image sizes. When images conform to these size constraints, they don’t get cropped in the timeline view (if you allow images to be previewed in-timeline). This is even more powerful if you have some helper functions for proper theme-ing (font sizes especially need to be tweaked). Enter
We’ll cover scraping @PowerOutage_us first, but we’ll start with all the packages we’ll need and a helper function to convert power outage estimates to numeric values:
library(httr)
library(magick)
library(rtweet)
library(stringi)
library(hrbrthemes)
library(tidyverse)

words_to_num
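The body of the helper is not shown above, so here is a minimal sketch of what a words_to_num()-style function could look like. It is an illustration rather than the original implementation, and it assumes the tweets phrase counts either as plain digits ("450,000") or as a number followed by a magnitude word ("1.2+ Million"), and that the first number in the text is the outage count. It uses base R only.

```r
# Hypothetical stand-in for the post's words_to_num() helper (base R only).
# ASSUMPTION: the first number in each string is the outage estimate.
words_to_num <- function(x) {
  multipliers <- c(thousand = 1e3, million = 1e6)
  # number, optional "+", optional whitespace, optional magnitude word
  pattern <- "([0-9][0-9,.]*)[+]?[[:space:]]*(thousand|million)?"
  matches <- regmatches(x, regexec(pattern, x, ignore.case = TRUE))
  vapply(matches, function(parts) {
    if (length(parts) == 0) return(NA_real_)       # no number found
    value <- as.numeric(gsub(",", "", parts[2]))   # drop thousands separators
    unit <- tolower(parts[3])                      # "" when no magnitude word
    scale <- if (unit %in% names(multipliers)) multipliers[[unit]] else 1
    value * scale
  }, numeric(1))
}

words_to_num(c("1.2+ Million customers out", "450,000 without power"))
```

Strings with no digits come back as NA, which ggplot2 will drop with a warning rather than plot as zero.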
Now, I can’t cover setting up rtweet OAuth here. The vignette and package web site do that well.
The bot tweets infrequently enough that this is really all we need (though, bump up n as you need to):
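A minimal sketch of that fetch, assuming a working rtweet token is already configured. The wrapper name fetch_outage_tweets() is made up for illustration; the pipeline further down only expects the result to be stored in outage.

```r
# Sketch: pull the most recent tweets from the outage bot's timeline.
# fetch_outage_tweets() is a hypothetical convenience wrapper, not part of rtweet.
fetch_outage_tweets <- function(n = 300) {
  rtweet::get_timeline("PowerOutage_us", n = n)
}

# Needs network access and Twitter credentials, so it is left commented out:
# outage <- fetch_outage_tweets()
```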
Yep, that gets the last 300 tweets from said account. It’s amazingly simple.
Now, the outage tweets for the east coast / northeast are not individually uniform but collectively they are (there’s a pattern that may change, but you can tweak this if it does):
filter(outage, stri_detect_regex(text, "#(EastCoast|NorthEast)")) %>%
  mutate(created_at = lubridate::with_tz(created_at, 'America/New_York')) %>%
  mutate(number_out = words_to_num(text)) %>%
  ggplot(aes(created_at, number_out)) +
  geom_segment(aes(xend=created_at, yend=0), size=5) +
  scale_x_datetime(date_labels = "%Y-%m-%d\n%H:%M", date_breaks="2 hours") +
  scale_y_comma(limits=c(0,2000000)) +
  labs(
    x=NULL, y="# Customers Without Power",
    title="Northeast Power Outages",
    subtitle="Yay! Twitter as a non-blather data source",
    caption="Data via: @PowerOutage_us "
  ) -> gg
That pipe chain looks for key hashtags (for my area), rejiggers the time zone, and calls the helper function to, say, convert 1.2+ Million to 1200000. Finally it builds a mostly complete ggplot2 object (you should make the max Y limit more dynamic).
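One way to make that upper limit dynamic is to round the largest observed value up to the next "nice" step. This is only a sketch: nice_upper_limit() and its 250,000 default step are assumptions picked for illustration, not something from the original code.

```r
# Hypothetical helper: round the largest value up to the next multiple of `step`.
nice_upper_limit <- function(x, step = 250000) {
  ceiling(max(x, na.rm = TRUE) / step) * step
}

nice_upper_limit(c(1200000, 450000, NA))
```

With the filtered and mutated tweets stored in a data frame (say, outage_df), the scale call would become scale_y_comma(limits = c(0, nice_upper_limit(outage_df$number_out))).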
You can plot that on your own (print gg). We’re here to tweet, so let’s go into the next section.
@opencpu made it possible to shunt plot output to a magick device. This means we have really precise control over ggplot2 output size as well as the ability to add other graphical components to a ggplot2 plot using magick idioms. One thing we need to take into account is “retina” plots. They are, essentially, double-resolution plots (72 => 144 pixels per inch). For the best looking plots we need to go retina, but that also means kicking up base plot theme font sizes a bit. Let’s build on hrbrthemes::theme_ipsum_rc() a bit and make a tweet-sized theme:
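theme_tweet_rc <- function(grid = "XY") {
  # ASSUMPTION: the size values are illustrative placeholders, bumped up from
  # the theme_ipsum_rc() defaults so text stays legible at retina resolution.
  font_sizes <- c(title = 24, subtitle = 18, axis_title = 16, axis_text = 14, caption = 12)
  theme_ipsum_rc(
    grid = grid,
    plot_title_size = font_sizes[["title"]],
    subtitle_size = font_sizes[["subtitle"]],
    axis_title_size = font_sizes[["axis_title"]],
    axis_text_size = font_sizes[["axis_text"]],
    caption_size = font_sizes[["caption"]]
  )
}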
Now, we just need a way to take a ggplot2 object and shunt it off to Twitter. The following gg_tweet() function does not (for now) use rtweet, as I’ll likely add it to hrbrthemes and want to keep dependencies to a minimum. I may add an opt-in to bypass the current method, since it relies on environment variables rather than an RDS file for app credential storage. Regardless, one thing I wanted to do here was provide a way to preview the image before tweeting.
So you pass in a ggplot2 object (likely adding the tweet theme to it) and a Twitter status text (there’s a TODO to check the length for 140c compliance) plus choose a style (stream or card, defaulting to stream) and decide on whether you’re cool with the “retina” default.
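The length-check TODO could be covered by something as small as the sketch below. check_status_length() is a hypothetical helper, and it simply counts characters against the 140 limit mentioned above; Twitter's own counting rules (for URLs, for instance) are more involved, so treat this as a rough guard rather than an exact check.

```r
# Hypothetical guard for the status text; plain character count only.
# ASSUMPTION: a raw nchar() count is close enough to Twitter's own counting.
check_status_length <- function(status, limit = 140L) {
  n <- nchar(status, type = "chars")
  if (n > limit) {
    stop(sprintf("Status is %d characters; the limit is %d.", n, limit), call. = FALSE)
  }
  invisible(status)  # pass the text through unchanged when it fits
}

check_status_length("Progress! #rtweet #gg_tweet")
```

Calling it at the top of a tweeting function means an over-long status fails before any image is rendered or uploaded.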
Unless you tell it to send the tweet it won’t, giving you a chance to preview the image before sending, just in case you want to tweak it a bit before committing it to the Twitterverse. It also returns the magick object it creates in the event you want to do something more with it:
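# Reconstructed sketch. ASSUMPTIONS: the default status text, the pixel sizes,
# the environment variable names and the upload endpoint/field names are
# placeholders; check them against the current Twitter API docs before use.
gg_tweet <- function(g, status = "ggplot2 image", style = c("stream", "card"),
                     retina = TRUE, send = FALSE) {

  style <- match.arg(style)

  switch(
    style,
    stream = c(w = 1024, h = 512),
    card = c(w = 800, h = 320)
  ) -> dims

  dims["res"] <- 72

  if (retina) dims <- dims * 2 # 72 => 144 ppi, twice the pixels

  # draw the plot on a magick graphics device so it can be previewed
  fig <- image_graph(width = dims[["w"]], height = dims[["h"]], res = dims[["res"]])
  print(g)
  dev.off()

  if (send) {

    tf <- tempfile(fileext = ".png")
    image_write(fig, tf, format = "png")

    # app credentials come from environment variables
    app <- oauth_app(
      appname = Sys.getenv("TWITTER_APP_NAME"),
      key = Sys.getenv("TWITTER_CONSUMER_KEY"),
      secret = Sys.getenv("TWITTER_CONSUMER_SECRET")
    )

    twitter_token <- oauth1.0_token(oauth_endpoints("twitter"), app)

    POST(
      url = "https://api.twitter.com/1.1/statuses/update_with_media.json",
      config(token = twitter_token),
      body = list(status = status, media = upload_file(tf))
    ) -> res

    warn_for_status(res)

    unlink(tf)

  }

  fig

}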
Two Great Tastes That Taste Great Together
We can combine the power outage scraper & plotter with the tweeting code and just do:
gg_tweet( gg + theme_tweet_rc(grid="Y"), status = "Progress! #rtweet #gg_tweet", send=TRUE )
That was, in fact, the last power outage tweet I sent.
Ironically, given current levels of U.S. news and public “discourse” on Twitter and some inane machinations in my own area of domain expertise (cyber), gg_tweet() is likely one of the few ways I’ll be interacting with Twitter for a while. You can ping me on Keybase (hrbrmstr) or join the rstats Keybase team via keybase team request-access rstats if you need to poke me for anything in the meantime.
Kick the tyres and watch for gg_tweet() ending up in hrbrthemes. Don’t hesitate to suggest (or code up) feature requests. This is still an idea in-progress and definitely not ready for prime time without a bit more churning. (Also, words_to_num() can be optimized; it was hastily crafted.)
Source:: R News