By David Smith
Like many modern cities, New York offers a public pick-up/drop-off bicycle service (called Citi Bike). Subscribing Citi Bike members can grab a bike from almost 500 stations scattered around the city, hop on and ride to their destination, and drop the bike at a nearby station. (Visitors to the city can also purchase day passes.) The Citi Bike program shares data with the public about the operation of the service: time and location of pick-ups and drop-offs, and basic demographic data (age and gender) of subscriber riders.
Data scientist Todd Schneider has followed up on his tour-de-force analysis of taxi rides in NYC with a similar analysis of the Citi Bike data. Check out the wonderful animation of bike rides on September 16 below. While the Citi Bike data doesn’t include actual trajectories (just the pick-up and drop-off locations), Todd has “interpolated” these points using Google Maps biking directions. Though these may not match actual routes (and give extra weight to roads with bike lanes), it’s nonetheless an elegant visualization of bike commuter patterns in the city.
Check out in particular the rush hours of 7-9AM and 4-6PM. September 16 was a Wednesday, but as Todd shows in the chart below, biking patterns are very different on the weekends as the focus switches from commuting to pleasure rides.
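To make the weekday/weekend comparison concrete, here is a minimal sketch of how trips could be tallied by start hour and split by day type. It is written in Python with made-up start times, not Todd's R code or the real Citi Bike data; the function name `hourly_counts` and the sample records are assumptions for illustration only.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(start_times):
    """Count trips per start hour, split into weekday and weekend buckets."""
    counts = {"weekday": Counter(), "weekend": Counter()}
    for ts in start_times:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        kind = "weekend" if ts.weekday() >= 5 else "weekday"
        counts[kind][ts.hour] += 1
    return counts


# Hypothetical trip start times (the year 2015 is assumed, so that
# September 16 falls on a Wednesday as described in the post).
sample = [
    datetime(2015, 9, 16, 8, 5),    # Wednesday morning rush
    datetime(2015, 9, 16, 8, 40),   # Wednesday morning rush
    datetime(2015, 9, 16, 17, 15),  # Wednesday evening rush
    datetime(2015, 9, 19, 13, 30),  # Saturday afternoon
]

counts = hourly_counts(sample)
for kind in ("weekday", "weekend"):
    for hour, n in sorted(counts[kind].items()):
        print(f"{kind} {hour:02d}:00 trips={n}")
```

With real data, the weekday counts would be expected to peak around the commuting hours, while the weekend counts would be spread more evenly across the middle of the day.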
Todd also matched the biking data with NYC weather data to take a look at its effect on biking patterns. Unsurprisingly, low temperatures and rain both have a dampening effect (pun intended!) on ridership: one inch of rain deters as many riders as a 24-degree (F) drop in temperature. Surprisingly, snow doesn’t have such a dramatic effect: an inch of snow depresses ridership like a 1.4-degree (F) drop in temperature. (However, Todd’s data doesn’t include the recent blizzard in New York, from which many Citi Bike stations are still waiting to be dug out.)
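The two equivalences above can be put side by side with a little arithmetic. The sketch below is in Python and uses only the two figures quoted in this post; it assumes the effects scale linearly with depth and simply add together, which is a simplification for illustration and not how Todd's nonlinear model works. The function name `equivalent_temperature_drop` is hypothetical.

```python
# Equivalences quoted in the post: degrees Fahrenheit of temperature drop
# that deter as many riders as one inch of precipitation.
RAIN_DEGREES_PER_INCH = 24.0
SNOW_DEGREES_PER_INCH = 1.4


def equivalent_temperature_drop(rain_inches=0.0, snow_inches=0.0):
    """Temperature drop (F) with a comparable deterrent effect.

    Assumes linear scaling and additive effects, which is an
    illustrative simplification rather than the fitted model.
    """
    return (rain_inches * RAIN_DEGREES_PER_INCH
            + snow_inches * SNOW_DEGREES_PER_INCH)


# How many inches of snow match the deterrent effect of one inch of rain?
rain_to_snow_ratio = RAIN_DEGREES_PER_INCH / SNOW_DEGREES_PER_INCH

print(equivalent_temperature_drop(rain_inches=0.5))  # half an inch of rain
print(round(rain_to_snow_ratio, 1))                  # roughly 17 inches of snow
```

Under these assumptions, an inch of rain is about as discouraging as seventeen inches of snow, which is what makes the snow result surprising.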
Todd conducted all of the analysis and data visualization with the R language (he shares the R code on GitHub). He mainly used the RPostgreSQL package for data extraction, the dplyr package for the data manipulation, the ggplot2 package for the graphics, and the minpack.lm package for the nonlinear least squares analysis of the weather impact.
There’s plenty more detail to the analysis, including the effects of age and gender on cycling speed. For the complete analysis and lots more interesting charts, follow the link to the blog post below.
Todd W. Schneider: A Tale of Twenty-Two Million Citi Bike Rides: Analyzing the NYC Bike Share System