Two new online R courses on time series (via DataCamp)

By Tal Galili

DataCamp recently launched two new online R courses on time series analysis.

Introduction to Time Series Analysis

What You’ll Learn:

  • Chapter One: Exploratory Time Series Data Analysis (FREE)
    Learn how to organize and visualize time series data in R.
  • Chapter Two: Predicting the Future
    Conduct trend spotting, learn the white noise model, the random walk model, and the definition of stationary processes.
  • Chapter Three: Correlation Analysis and the Autocorrelation Function
    Review the correlation coefficient, then practice estimating and visualizing autocorrelations for time series data.
  • Chapter Four: Autoregression
    Discover the autoregressive model and several of its basic properties.
  • Chapter Five: A Simple Moving Average
    Learn about the simple moving average model, then compare the performance of several models.

Play now…

ARIMA Modeling with R

What You’ll Learn:

  • Chapter One: Time Series Data and Models
    Investigate time series data and learn the basics of ARMA models, which can explain the behavior of such data.
  • Chapter Two: Fitting ARMA Models
    Discover the wonderful world of ARMA models and learn how to fit these models to time series data.
  • Chapter Three: ARIMA Models
    Learn about integrated ARMA (ARIMA) models for nonstationary time series.
  • Chapter Four: Seasonal ARIMA
    Learn how to fit and forecast seasonal time series data using seasonal ARIMA models.

Play now…

Source:: R News

Reproducible Finance with R: Sector Correlations

By Jonathan Regenstein

(This article was first published on RStudio, and kindly contributed to R-bloggers)


Welcome to the first installment of reproducible finance for 2017. It’s a new year, a new President takes office soon, and we could be entering a new political-economic environment. What better time to think about a popular topic over the last few years: equity correlations. Elevated correlations are important for several reasons – life is hard for active managers and diversification gains are vanishing – but I personally enjoy thinking about them more from an inference or data exploration perspective. Are changing correlations telling us something about the world? Are sectors diverging? How much can be attributed to the Central Bank regime at hand? So many questions, so many hypotheses to be explored. Let’s get started.

Today, we will build a Notebook and start exploring the historical rolling correlations between sector ETFs and the S&P 500. That is, we want to explore how equity returns in different sectors have been correlated with the returns of the broader index. Perhaps they are all moving in lockstep, perhaps they have been diverging. Either way, this Notebook will be the first step toward a flexdashboard that lets us do more interactive exploration – choosing different sector ETFs and rolling windows.

We are going to accomplish a few things today. We will load up the sector ETF tickers, then build a function to download their price history and calculate weekly returns. We will save this to one xts object. Next, we will build a function to calculate the rolling correlations between a chosen sector ETF and the S&P 500. Finally, dygraphs will make its usual appearance to help visualize the rolling correlation time series.

As usual, we will be living in the R Markdown world and, by way of disclaimer, the data import and return calculation functions here should be familiar from previous posts. That is by design, and hopefully it won’t be too boring for devotees of this series (I know you’re out there somewhere!). More importantly, I hope the usefulness of reproducible, reusable code is emerging. Some of the code chunks in previous posts might have seemed trivially simple, containing just a simple function and little else. But, the simplicity of those code chunks made it very easy to return to those previous scripts, understand the functions, and use them in this post.

Let’s load up a few packages.
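That chunk isn’t reproduced in this digest, but a reasonable guess at the imports for what follows (my assumption, not necessarily the author’s exact list) would be:

# Likely packages for this workflow: quantmod for getSymbols()/periodReturn(),
# xts for the time-series objects, dplyr for light wrangling, dygraphs for charts.
library(quantmod)
library(xts)
library(dplyr)
library(dygraphs)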

Now, we need the tickers and sectors for the sector ETFs. They are copied below and available here. I deleted the XLRE real estate ETF because it’s only been around since 2015, and I want to look back several years in this Notebook.

We’ve got our dataframe of tickers and sectors. Let’s build a function to download price history and then convert those price histories to weekly returns. We’ll use a combination of getSymbols() and periodReturn() to accomplish that. If you want to change this script to use daily returns, change the argument below to period = ‘daily’, but be prepared to import quite a bit more data.
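The function itself isn’t reproduced in this digest; here is a minimal sketch of what it might look like, built on getSymbols() and periodReturn(). The names are mine, and ticker_sector (with its ticker column) stands in for the data frame of tickers loaded above:

# Sketch: download adjusted prices for one ticker and convert to weekly log returns.
get_weekly_returns <- function(ticker, start = "2007-01-01") {
  prices <- getSymbols(ticker, src = "yahoo", from = start, auto.assign = FALSE)
  periodReturn(Ad(prices), period = "weekly", type = "log")
}

# Apply it to every sector ticker plus SPY and merge everything into one xts object.
tickers <- c(as.character(ticker_sector$ticker), "SPY")
etf_returns <- do.call(merge, lapply(tickers, get_weekly_returns))
colnames(etf_returns) <- tickers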

This function has done some good work for us, and it was refreshingly comfortable to put in place because we used very similar functionality in this post and this post.

A pattern seems to be emerging in these Notebooks: grab tickers, get price history, convert to returns and save new xts object. In an ideal world, that pattern of data import and conversion would be so familiar as to be commonplace.

That said, enough with the commonplace stuff – let’s get on to something a little more dangerous: rolling correlations among ETF returns. Correlations are important because high correlations make it hard to find diversification opportunities and they make it hard to deliver alpha – though I suppose it’s always hard to deliver alpha. Fortunately, we don’t have to worry about generating alpha today, so let’s get to our function.

Calculating rolling correlations in R is pretty straightforward. We use the rollapply() function, along with the cor() function, pass in our data and a time window, and it’s off to the races. We’ll create our own function below to handle these jobs and return an xts object.
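A sketch of such a function (again using my own names, and assuming the merged xts object of weekly returns built above, with a SPY column):

# Sketch: rolling correlation between one sector's returns and the SPY returns.
sector_correlation <- function(returns, sector, window = 20) {
  merged <- merge(returns[, sector], returns[, "SPY"])
  rolling_cor <- rollapply(merged, width = window,
                           FUN = function(x) cor(x[, 1], x[, 2]),
                           by.column = FALSE, align = "right")
  out <- merge(merged, rolling_cor)
  colnames(out) <- c(sector, "SPY", "rolling_cor")
  out
}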

Notice that this function does something that seems unnecessary: it creates a new xts object that holds the sector returns, SPY returns and the rolling correlation. We don’t have much use for that separate object, and could probably have just added columns to our original xts object. Indeed, if this were our final product we might spend more time eliminating its presence. I chose not to do that here for two reasons. First, this Notebook is built to underlie a flexdashboard that could go into production. I want to get the logic right here, then focus more on efficiency in the final app.

Second, and relatedly, we are prioritizing clarity of workflow in this Notebook. It should be crystal clear how we are moving from an xts object of ETF returns to creating a new xts object of two returns plus one correlation. The goal is for any collaborators, including my future self, to open this Notebook and see the workflow. If that collaborator finds this step to be unnecessary and has a more clever solution – that’s fantastic because it means this document is intelligible enough to serve as the basis for more sophisticated work.

Let’s go ahead and use this function. We will pass in a time series of Information Technology ETF returns and a window of size 20 for the rolling correlation.
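Assuming XLK is the Information Technology ticker in the sector list, that call (using the sketch above) might look like:

tech_spy_cor <- sector_correlation(etf_returns, "XLK", window = 20)
tail(tech_spy_cor)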

Alright, the function seems to have succeeded in building that new xts object and storing the rolling correlation. Now we will use dygraphs to visualize this rolling correlation over time and see if anything jumps out as interesting or puzzling.
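A minimal dygraphs call for that step might be (a sketch, not the author’s exact chunk):

# Chart just the rolling correlation column as an interactive time series.
dygraph(tech_spy_cor[, "rolling_cor"],
        main = "Rolling 20-week correlation: XLK vs. SPY") %>%
  dyAxis("y", label = "correlation") %>%
  dyRangeSelector()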

The correlation between the Tech ETF and the S&P 500 ETF seems quite high. It dipped a bit in the middle of 2009 and again towards the end of 2013. It would be interesting to see if this was true of the other sector ETFs as well. In other words, were these periods of generally declining correlations, or was it limited to the technology/S&P 500 relationship?

The best way to do some exploratory analysis on that is, no surprise, to build a Shiny app that allows users to choose their own sectors and rolling windows. We’ll do that next time – see you in a few days!

To leave a comment for the author, please follow the link and comment on their blog: RStudio.


Source:: R News

The Genetic Map Comparator: a user-friendly application to display and compare genetic maps

By Holtz

(This article was first published on Blog – The R graph Gallery, and kindly contributed to R-bloggers)

The Genetic Map Comparator is an R Shiny application made to compare and characterize genetic maps. You can use it through the online version and read the related publication in Bioinformatics.

The biological perspective

A genetic map provides the position of genetic markers along chromosomes. Geneticists often have to visualize these maps and calculate their basic statistics (length, number of markers, gap size, etc.). When several segregating populations are studied, they must also deal with multiple maps that share some markers. These maps are compared to highlight their overall relative strengths and weaknesses (e.g. via marker distributions or map lengths), or local marker inconsistencies.

The Genetic Map Comparator is an effective, user-friendly tool for these tasks. You can upload your own genetic maps and explore them through the various sheets accessible via links at the top of the window. The app also provides example datasets, which makes it easy to get a sense of its capabilities in less than 2 minutes, so why not have a look?

The Dataviz perspective

This app highlights a few features that show off the power of Shiny as an exploratory dataviz tool:

  • The insertion of interactive charts (mainly using plotly) is straightforward
  • CSS can be applied, which gives a nice look to the app
  • It is easy to share the tool: in our case we installed it on our private Shiny server, but the app can also be run straight from the GitHub repository!
  • It is possible to load your files into the app, and export results really easily

If you want to see how to use these features, the code is freely available on github.
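As a generic illustration of that first point (this is not code from the app itself), embedding an interactive plotly chart in a Shiny app takes only a few lines:

library(shiny)
library(plotly)

ui <- fluidPage(
  titlePanel("Minimal plotly-in-Shiny sketch"),
  plotlyOutput("markers")
)

server <- function(input, output) {
  output$markers <- renderPlotly({
    # toy data standing in for marker positions along one chromosome
    plot_ly(x = runif(50, 0, 100), y = rep("chr1", 50),
            type = "scatter", mode = "markers")
  })
}

shinyApp(ui, server)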

Example:

Here is a screenshot of the main tabs of the app:


To leave a comment for the author, please follow the link and comment on their blog: Blog – The R graph Gallery.


Source:: R News

Workout Wednesday Redux (2017 Week 3)

By hrbrmstr

(This article was first published on R – rud.is, and kindly contributed to R-bloggers)

I started a “52 Vis” initiative back in 2016 to encourage folks to get practice making visualizations, since that’s the only way to get better at virtually anything. Life got crazy, 52 Vis fell by the wayside, and now there are more visible alternatives such as Makeover Monday and Workout Wednesday. They’re geared towards the “T” crowd (I’m not giving a closed-source, locked-in-data product any more marketing than two links), but that doesn’t mean R, Python or other open-tool/open-data communities can’t join in for the ride and learning experience.

This week’s workout is a challenge to reproduce or improve upon a chart by Matt Stiles. You should go to both (give them the clicks and eyeballs they both deserve since they did great work). They both chose a line chart, but the whole point of these exercises is to try out new things to help you learn how to communicate better. I chose to use geom_segment() to make mini-column charts since that:

  • eliminates the giant rose-coloured rectangles that end up everywhere
  • helps show the differences a bit better (IMO), and
  • also helps highlight some of the states that have had more difficulties than others

Click/tap to “embiggen”. I kept the same dimensions that Andy did but unlike Matt’s creation this is a plain ol’ PNG as I didn’t want to deal with web fonts (I’m on a Museo Sans Condensed kick at the moment but don’t have it in my TypeKit config yet). I went with official annual unemployment numbers as they may be calculated/adjusted differently (I didn’t check, but I knew that data source existed, so I used it).

One reason I’m doing this is a quote on the Workout Wednesday post:

This will be a very tedious exercise. To provide some context, this took me 2-3 hours to create. Don’t get discouraged and don’t feel like you have to do it all in one sitting. Basically, try to make yours look identical to mine.

This took me 10 minutes to create in R:

#' ---
#' output:
#'  html_document:
#'    keep_md: true
#' ---
#+ message=FALSE
library(ggplot2)
library(hrbrmisc)
library(readxl)
library(tidyverse)

# Use official BLS annual unemployment data vs manually calculating the average
# Source: https://data.bls.gov/timeseries/LNU04000000?years_option=all_years&periods_option=specific_periods&periods=Annual+Data
read_excel("~/Data/annual.xlsx", skip=10) %>%
  mutate(Year=as.character(as.integer(Year)), Annual=Annual/100) -> annual_rate


# The data source Andy Kriebel curated for you/us: https://1drv.ms/x/s!AhZVJtXF2-tD1UVEK7gYn2vN5Hxn #ty Andy!
read_excel("~/Data/staadata.xlsx") %>%
  left_join(annual_rate) %>%
  filter(State != "District of Columbia") %>%
  mutate(
    year = as.Date(sprintf("%s-01-01", Year)),
    pct = (Unemployed / `Civilian Labor Force Population`),
    us_diff = -(Annual-pct),
    col = ifelse(us_diff<0,
               "Better than U.S. National Average",
               "Worse than U.S. National Average")
  ) -> df

credits <- "Notes: Excludes the District of Columbia. 2016 figure represents October rate.\nData: U.S. Bureau of Labor Statistics <https://www.bls.gov/lau/staadata.txt>\nCredit: Matt Stiles/The Daily Viz <thedailyviz.com>"

#+ state_of_us, fig.height=21.5, fig.width=8.75, fig.retina=2
ggplot(df, aes(year, us_diff, group=State)) +
  geom_segment(aes(xend=year, yend=0, color=col), size=0.5) +
  scale_x_date(expand=c(0,0), date_labels="'%y") +
  scale_y_continuous(expand=c(0,0), label=scales::percent, limit=c(-0.09, 0.09)) +
  scale_color_manual(name=NULL, expand=c(0,0),
                     values=c(`Better than U.S. National Average`="#4575b4",
                              `Worse than U.S. National Average`="#d73027")) +
  facet_wrap(~State, ncol=5, scales="free_x") +
  labs(x=NULL, y=NULL, title="The State of U.S. Jobs: 1976-2016",
       subtitle="Percentage points below or above the national unemployment rate, by state. Negative values represent unemployment rates\nthat were lower — or better, from a jobs perspective — than the national rate.",
       caption=credits) +
  theme_hrbrmstr_msc(grid="Y", strip_text_size=9) +
  theme(panel.background=element_rect(color="#00000000", fill="#f0f0f055")) +
  theme(panel.spacing=unit(0.5, "lines")) +
  theme(plot.subtitle=element_text(family="MuseoSansCond-300")) +
  theme(legend.position="top")

Swap out ~/Data for where you stored the files.

The “weird”-looking comments enable me to spin the script and are pretty much just the inverse markup of knitr R Markdown documents. As the comments say, you should really thank Andy for curating the BLS data for you/us.
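If you haven’t used spinning before: save the chunk above as an R script (the filename below is just an example) and a single call renders it to a document, driven entirely by those #' and #+ comments.

knitr::spin("2017-week3.R")   # or rmarkdown::render("2017-week3.R")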

If I hadn’t fussed over aesthetics it would have taken me 5 minutes (most of that was waiting for re-rendering). Formatting the blog post took much longer. Plus, I can update the data source and re-run this in the future without clicking anything. This re-emphasizes a caution I give my students: beware of dragon droppings (“drag-and-drop data science/visualization tools”).

Hopefully you presently follow or will start following Workout Wednesday and Makeover Monday and dedicate some time to hone your skills with those visualization katas.

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.


Source:: R News

Elements of a successful #openscience #rstats workshop

By lortie

(This article was first published on R – christopher lortie, and kindly contributed to R-bloggers)

What makes an open science workshop effective or successful*?

Over the last 15 years, I have had the good fortune to participate in workshops as a student and sometimes as an instructor. They have consistently provided beneficial discovery experiences, and at times some of the processes highlighted have been transformative. Last year, I participated in Software Carpentry at UCSB and Software Carpentry at YorkU, and in the past I have attended (in part) workshops such as Open Science for Synthesis. Several of us are now deciding what to attend as students in 2017. I have been wondering about the potential efficacy of the workshop model and why workshops seem to be so relatively effective. I propose that the answer is expectations. Here is a set of brief lists of observations from workshops that lead me to this conclusion.

*Note: I define a workshop as effective or successful when it provides me with something practical that I did not have before the workshop. Practical outcomes can include tools, ideas, workflows, insights, or novel viewpoints from discussion. Anything that helps me do better open science. Efficacy for me is relative to learning by myself (i.e. through reading, watching webinars, or struggling with code or data), asking for help from others, taking an online course (that I always give up on), or attending a scientific conference.

Delivery elements of an open science training workshop

  1. Lectures
  2. Tutorials
  3. Demonstrations
  4. Q & A sessions
  5. Hands-on exercises
  6. Webinars or group viewing of recorded vignettes.

Summary expectations from this list: a workshop will offer me content in more than one way unlike a more traditional course offering. I can ask questions right there on the spot about content and get an answer.

Content elements of an open science training workshop

  1. Data and code
  2. Slide decks
  3. Advanced discussion
  4. Experts that can address basic and advanced queries
  5. A curated list of additional resources
  6. Opinions from the experts on the ‘best’ way to do something
  7. A list of problems or questions that need to be addressed or solved, both routinely and in specific contexts, when doing science
  8. A toolkit in some form associated with the specific focus of the workshop.

Summary of expectations from this list: the best, most useful content is curated. It is contemporary, and it would be a challenge for me to find out this on my own.

Pedagogical elements of an open science training workshop

  1. Organized to reflect authentic challenges
  2. Uses problem-based learning
  3. Content is very contemporary
  4. Very light on lecture and heavy on practical application
  5. Reasonably small groups
  6. Will include team science and networks to learn and solve problems
  7. Short duration, high intensity
  8. Will use an open science tool for discussion and collective note taking
  9. Will be organized by major concepts such as data & meta-data, workflows, code, data repositories OR will be organized around a central problem or theme, and we will work together through the steps to solve a problem
  10. There will be a specific, quantifiable outcome for the participants (i.e. we will learn how to do or use a specific set of tools for future work).

Summary of expectations from this list: the training and learning experience will emulate a scientific working group that has convened to solve a problem. In this case the problem is how we can all get better at doing a certain set of scientific activities, as opposed to, say, whether a group can aggregate and summarize a global alpine dataset. These collaborative problem-solving models need not be exclusive.

Higher-order expectations that summarize all these open science workshop elements

  1. Experts, curated content, and contemporary tools.
  2. Everyone is focussed exclusively on the workshop, i.e. we all try to put our lives on hold to teach and learn together rapidly for a short time.
  3. Experiences are authentic and focus on problem solving.
  4. I will have to work trying things, but the slope of the learning curve/climb will be mediated by the workshop process.
  5. There will be some, but not too much, lecturing to give me the big picture highlights of why I need to know/use a specific concept or tool.

To leave a comment for the author, please follow the link and comment on their blog: R – christopher lortie.


Source:: R News

GitHub Growth Appears Scale Free

By Neil Gunther

In 2013, a blogger claimed that the growth of GitHub (GH) users follows a certain type of diffusion model called Bass diffusion. Growth here refers to the number of unique user IDs as a function of time, not the number project repositories, which can have a high degree of multiplicity.

In a response, I tweeted a plot that suggested GH growth might be following a power law, aka scale free growth. The tell-tale sign is the asymptotic linearity of the growth data on double-log axes, which the original blog post did not discuss. The periods on the x-axis correspond to years, with the first period representing calendar year 2008 and the fifth period being the year 2012.

Scale-free networks can arise from preferential attachment to super-nodes that have a higher vertex degree and therefore more connections to other nodes, i.e., a kind of rich-get-richer effect. The same idea applies to GH growth viewed as a particular kind of social network: the interaction between software developers on GH can be thought of as involving super-nodes corresponding to influential users, who prompt prospective GH users to open a new account and contribute to their projects.

On this basis, I predicted GH would reach 4 million users during October 2013 and 5 million users during March 2014 (yellow points in the Linear axes plot below). In fact, GH reached those values slightly earlier than predicted by the power law model, and slightly later than the dates predicted by the diffusion model.

Since 2013, new data has been reported, so I extended my previous analysis in R. Details of the respective models are contained in the R script at the end of this post. In the Linear axes plot, both the diffusion model and the power law model essentially form an envelope around the newer data: diffusive on the upper side (red curve) and power law on the lower side (blue curve). In this sense, it could be argued that the jury is still out on which model offers the more reliable predictions.

However, there is an aspect of the diffusion model that was overlooked in 2013. It predicts that GH growth will eventually plateau at 20 million users in 2020 (the 12th period, not shown). The beginnings of this leveling off is apparent in the 10th period (i.e., 2017). By contrast, the power law model predicts that GH will reach 23.65 million users by the end of the same period (yellow point). Whereas the diffusion and power law models respectively represent the upper and lower edges of an envelope surrounding the more recent data in periods 6–9, their predictions will start to diverge in the 10th period.

“GitHub is not the only player in the market. Other companies like GitLab are doing a good job but GitHub has a huge head start and the advantage of the network effect around public repositories. Although GitHub’s network effect is weaker compared to the likes of Facebook/Twitter or Lyft/Uber, they are the default choice right now.”
(Source: GitHub is Doing Much Better Than Bloomberg Thinks)

Although there will inevitably be an equilibrium bound on the number of active GH users, it seems unlikely to be as small as 20 million, given the combination of GH’s first-mover advantage and its current popularity. Presumably the private investors in GH also hope it will be a large number. This year will tell.
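The fitted objects gh.exp and gh.fit referenced in the plotting script below are not shown in this excerpt. One way they might have been estimated (purely my assumption about the author's script, taking df.gh3 to hold a positive monthly index and cumulative user counts, with the parameter order chosen to match the coefficient indexing used in the plot calls):

gh_t     <- df.gh3$index / 13      # the script rescales the index by 13 throughout
gh_users <- df.gh3$users

# Exponential growth: users = a * exp(b * t); with this start order,
# coef(gh.exp)[1] is the rate b and coef(gh.exp)[2] is the scale a.
gh.exp <- nls(gh_users ~ a * exp(b * gh_t), start = list(b = 0.5, a = 5e4))

# Power law (scale-free) growth on log10 axes: log10(users) = a0 + p * log10(t);
# coef(gh.fit)[1] is the exponent p and coef(gh.fit)[2] is the intercept a0.
gh.fit <- nls(log10(gh_users) ~ a0 + p * log10(gh_t), start = list(p = 2, a0 = 5))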


# Data source ... https://classic.scraperwiki.com/scrapers/github_users_each_year/

# LINEAR axes plot
plot(df.gh3$index, df.gh3$users, xlab="Period (years)",
     ylab="Users (million)", col="gray",
     ylim=c(0, 3e7), xaxt="n", yaxt="n")
axis(side=1, tck=1, at=c(0, seq(12, 120, 12)), labels=0:10,
     col.ticks="lightgray", lty="dotted")
axis(side=2, tck=1, at=c(0, 10e6, 20e6, 30e6), labels=c(0, 10, 20, 30),
     col.ticks="lightgray", lty="dotted")

# Simple exponential model
curve(coef(gh.exp)[2] * exp(coef(gh.exp)[1] * (x/13)),
      from=1, to=108, add=TRUE, col="red2", lty="dotdash")

# Super-exponential model
curve(49100 * (x/13) * exp(0.54 * (x/13)),
      from=1, to=120, add=TRUE, col="red", lty="dashed")

# Bass diffusion model
curve(21e6 * ( 1 - exp(-(0.003 + 0.83) * (x/13)) ) / ( 1 + (0.83 / 0.003) * exp(-(0.003 + 0.83) * (x/13)) ),
      from=1, to=120, add=TRUE, col="red")

# Power law model
curve(10^coef(gh.fit)[2] * (x/13)^coef(gh.fit)[1], from=1, to=120, add=TRUE,
      col="blue")

title(main="Linear axes: GitHub Growth 2008-2017")
legend("topleft",
       legend=c("Original data", "New data", "Predictions", "Exponential", "Super exp", "Bass diffusion", "Scale free"),
       lty=c(NA, NA, NA, 4, 2, 1, 1), pch=c(1, 19, 21, NA, NA, NA, NA),
       col=c("gray", "black", "yellow", "red", "red", "red", "blue"),
       pt.bg=c(NA, NA, "yellow", NA, NA, NA, NA),
       cex=0.75, inset=0.05)

Source:: R News

RProtoBuf 0.4.8: Windows support for proto3

By Thinking inside the box

(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

Issue ticket #20 demonstrated that we had not yet set up Windows for version 3 of Google Protocol Buffers (“Protobuf”) — while the other platforms support it. So I made the change, and there is release 0.4.8.

RProtoBuf provides R bindings for the Google Protocol Buffers (“Protobuf”) data encoding and serialization library used and released by Google, and deployed as a language and operating-system agnostic protocol by numerous projects.
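For readers new to the package, here is a minimal round trip loosely based on the addressbook example that ships with RProtoBuf (the tutorial.Person message type is defined in the package’s addressbook.proto):

library(RProtoBuf)

# Register the example .proto schema bundled with the package
readProtoFiles(system.file("proto", "addressbook.proto", package = "RProtoBuf"))

# Create a message, serialize it to raw bytes, then parse it back
p <- new(tutorial.Person, id = 1, name = "Ada Lovelace")
bytes <- p$serialize(NULL)        # raw vector in protobuf wire format
q <- tutorial.Person$read(bytes)
cat(as.character(q))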

The NEWS file summarises the release as follows:

Changes in RProtoBuf version 0.4.8 (2017-01-17)

  • Windows builds now use the proto3 library as well (PR #21 fixing #20)

CRANberries also provides a diff to the previous release. The RProtoBuf page has an older package vignette, a ‘quick’ overview vignette, a unit test summary vignette, and the pre-print for the JSS paper. Questions, comments etc should go to the GitHub issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box .


Source:: R News

ggedit 0.0.2: a GUI for advanced editing of ggplot2 objects

By Tal Galili

(This article was first published on R – R-statistics blog, and kindly contributed to R-bloggers)

Guest post by Jonathan Sidi, Metrum Research Group

Last week the updated version of ggedit was presented at RStudio::conf2017. First, a BIG thank you to the whole RStudio team for a great conference and for being so awesome in answering the insane number of questions I had (sorry!). For a quick intro to the package, see the previous post.

To install the package:

devtools::install_github("metrumresearchgroup/ggedit",subdir="ggedit")

Highlights of the updated version.

  • verbose script handling during updating in the gadget (see video below)
  • verbose script output for updated layers and theme to parse and evaluate in console or editor
  • colourpicker control for both single colours/fills and palettes
  • output for scale objects, e.g. scale_*_gradient, scale_*_gradientn and scale_*_manual
  • verbose script output for scales, e.g. scale_*_gradient, scale_*_gradientn and scale_*_manual, to parse and evaluate in console or editor
  • input plot objects can have the data in the layer object or in the base object, e.g.:
    • ggplot(data=iris,aes(x=Sepal.Width,y=Sepal.Length,colour=Species))+geom_point()
    • ggplot(data=iris,aes(x=Sepal.Width,y=Sepal.Length))+geom_point(aes(colour=Species))
    • ggplot()+geom_point(data=iris,aes(x=Sepal.Width,y=Sepal.Length,colour=Species))
  • plot.theme(): S3 method for class ‘theme’
    • visualizing theme objects in a single output
    • visual comparison of two theme objects in a single output
    • will be expanded upon in upcoming post

RStudio::conf2017 Presentation

#devtools::install_github("metrumresearchgroup/ggedit",subdir="ggedit")
rm(list=ls())
library(ggedit)
#?ggedit

p0=list(
  Scatter=iris%>%ggplot(aes(x =Sepal.Length,y=Sepal.Width))+
    geom_point(aes(colour=Species),size=6),
  
  ScatterFacet=iris%>%ggplot(aes(x =Sepal.Length,y=Sepal.Width))+
    geom_point(aes(colour=Species),size=6)+
      geom_line(linetype=2)+
    facet_wrap(~Species,scales='free')+
    labs(title='Some Title')
  )

#a=ggedit(p.in = p0,verbose = T) #run ggedit
dat_url <- paste0("https://raw.githubusercontent.com/metrumresearchgroup/ggedit/master/RstudioExampleObj.rda")
load(url(dat_url)) #pre-run example

ldply(a, names)  # ldply() comes from plyr; run library(plyr) if it is not already loaded
##                     .id      V1           V2
## 1          UpdatedPlots Scatter ScatterFacet
## 2         UpdatedLayers Scatter ScatterFacet
## 3 UpdatedLayersElements Scatter ScatterFacet
## 4     UpdatedLayerCalls Scatter ScatterFacet
## 5         updatedScales Scatter ScatterFacet
## 6    UpdatedScalesCalls Scatter ScatterFacet
## 7         UpdatedThemes Scatter ScatterFacet
## 8     UpdatedThemeCalls Scatter ScatterFacet
plot(a)

comparePlots=c(p0,a$UpdatedPlots)
names(comparePlots)[c(3:4)]=paste0(names(comparePlots)[c(3:4)],"Updated")

Initial Comparison Plot

plot(as.ggedit(comparePlots))

Apply updated theme of first plot to second plot

comparePlots$ScatterFacetNewTheme=p0$ScatterFacet+a$UpdatedThemes$Scatter

plot(as.ggedit(comparePlots[c("ScatterFacet","ScatterFacetNewTheme")]),
      plot.layout = list(list(rows=1,cols=1),list(rows=2,cols=1))
     )

#Using Remove and Replace Function ##Overlay two layers of same geom

(comparePlots$ScatterMistake=p0$Scatter+a$UpdatedLayers$ScatterFacet[[1]])

Remove

(comparePlots$ScatterNoLayer=p0$Scatter%>%
  rgg(oldGeom = 'point'))

Replace Geom_Point layer on Scatter Plot

(comparePlots$ScatterNewLayer=p0$Scatter%>%
  rgg(oldGeom = 'point',
      oldGeomIdx = 1,
      newLayer = a$UpdatedLayers$ScatterFacet[[1]]))

Remove and Replace Geom_Point layer and add the new theme

(comparePlots$ScatterNewLayerTheme=p0$Scatter%>%
  rgg(oldGeom = 'point',
      newLayer = a$UpdatedLayers$ScatterFacet[[1]])+
  a$UpdatedThemes$Scatter)

Cloning Layers

A geom_point layer

(l=p0$Scatter$layers[[1]])
## mapping: colour = Species 
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity

Clone the layer

(l1=cloneLayer(l))
## mapping: colour = Species 
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
all.equal(l,l1)
## [1] TRUE

Verbose copy of layer

(l1.txt=cloneLayer(l,verbose = T))
## [1] "geom_point(mapping=aes(colour=Species),na.rm=FALSE,size=6,data=NULL,position="identity",stat="identity",show.legend=NA,inherit.aes=TRUE)"

Parse the text

(l2=eval(parse(text=l1.txt)))
## mapping: colour = Species 
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
all.equal(l,l2)
## [1] TRUE

Back to our example

  #Original geom_point layer
  parse(text=cloneLayer(p0$ScatterFacet$layers[[1]],verbose = T))
## expression(geom_point(mapping = aes(colour = Species), na.rm = FALSE, 
##     size = 6, data = NULL, position = "identity", stat = "identity", 
##     show.legend = NA, inherit.aes = TRUE))
  #new Layer
  parse(text=a$UpdatedLayerCalls$ScatterFacet[[1]])
## expression(geom_point(mapping = aes(colour = Species), na.rm = FALSE, 
##     size = 3, shape = 22, fill = "#BD2020", alpha = 1, stroke = 0.5, 
##     data = NULL, position = "identity", stat = "identity", show.legend = NA, 
##     inherit.aes = TRUE))


Visualize Themes

pTheme=list()
(pTheme$Base=plot(a$UpdatedThemes$Scatter))

Visualize Part of Themes

(pTheme$Select=plot(a$UpdatedThemes$Scatter,themePart = c('plot','legend'),fnt = 18))

Visually Compare Theme

(pTheme$Compare=plot(obj=a$UpdatedThemes$Scatter,obj2 = ggplot2:::theme_get()))



Jonathan Sidi joined Metrum Research Group in 2016 after working for several years on problems in applied statistics, financial stress testing and economic forecasting in both industrial and academic settings.

To learn more about additional open-source software packages developed by Metrum Research Group please visit the Metrum website.

Contact: For questions and comments, feel free to email me at yonis@metrumrg.com or open an issue on GitHub.

To leave a comment for the author, please follow the link and comment on their blog: R – R-statistics blog.


Source:: R News

Git Gud with Git and R

By David Smith

If you’re doing any kind of in-depth programming in the R language (say, creating a report in Rmarkdown, or developing a package) you might want to consider using a version-control system. And if you collaborate with another person (or a team) on the work, it makes things infinitely easier when it comes to coordinating changes. Amongst other benefits, a version-control system:

  • Saves you from the worry of making irrevocable changes. Instead of keeping multiple versions of files around (are filenames like Report.Rmd; Report2.Rmd; Report-final.Rmd; Report-final-final.Rmd familiar?) you just keep the latest version of the file, knowing that the older versions are accessible should you need them.
  • Keeps a remote backup of your files. If you accidentally delete a critical file, you can retrieve it. If your hard drive crashes, it’s easy to restore the project.
  • Makes it easy to work with others. Multiple people can work on the same file at the same time, and it’s (relatively) easy to keep changes in sync.
  • Relatedly, it makes it easy to get a collaborator. Even if your project is currently a solo effort, you may want to get help in the future, and a version-control system makes it easy to add project members. If it’s an open-source project, you might even get contributions from people you don’t know!

There are many version-control systems out there, but a popular one is Git. You’ve possibly interacted with projects (especially R packages) managed under Git on GitHub, the online hosting service built around Git. And while you can get a fair bit done with just your browser and GitHub, the real power comes from installing Git on your desktop. Using Git’s command-line interface is a bear (here’s a fake, but representative, example of the documentation), but fortunately RStudio and RTVS provide interfaces that make things much easier.

If you want to get started with Git and RStudio, Jenny Bryan has provided an excellent guide to setting up your system and using version control: Happy Git and GitHub for the R User. The guide is quite long and detailed, but fear not: the pace is brisk, and it provides everything you need to get going. During a two-hour workshop that Jenny presented at the RStudio conference, I was able to install Git for Windows, configure it with my GitHub credentials, connect it to RStudio, commit changes to an existing R package, and create and share my own repository. It’s easier than you think! Just start with the link below, and work your way through the sections.

Jenny Bryan: Happy Git and GitHub for the R User

Source:: R News

Multivariate Apply Exercises

By John Akwei

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

mapply() is a multivariate version of sapply(): it applies a function to the corresponding elements of a set of vector or list arguments, and simplifies the output where possible.

Structure of the mapply() function:
mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
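A quick illustration (not one of the exercises below): rep() is called element-wise on pairs taken from the two vectors, and because the results have different lengths the output stays a list.

mapply(rep, 1:4, 4:1)
# calls rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1) and returns them as a list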

Answers to the exercises are available here.

Exercise 1
Beginning level

Required dataframe:
PersonnelData <- data.frame(Representative=c(1:4),
Sales=c(95,110,115,90), Territory=c(1:4))

Using mapply(), find the classes of PersonnelData‘s columns.

Exercise 2
Beginning level

Print “PersonnelData” with the mapply() function.

Exercise 3
Beginning level

Use mapply() to inspect “PersonnelData” for numeric values.

Exercise 4
Intermediate level

Use mapply() to sum the vectors “5:10” and “20:25“.

Exercise 5
Intermediate level

Use mapply() to paste the vectors “1:4” and “5:8”, using “LETTERS[1:4]” as the separator.

Learn more about mapply, and the entire family of apply() functions in the online course Learn R by Intensive Practice. In this course you will learn how to:

  • Do any sort of manipulation with datasets
  • Create and master the manipulation of vectors, lists, dataframes, and matrices
  • Confidently write apply() functions and design any logic within the apply function.
  • Melt, reshape, aggregate, and make pivot tables from dataframes
  • And much more

Exercise 6
Intermediate level

Use mapply() to paste “PersonnelData$Representative”, “PersonnelData$Sales”, and “PersonnelData$Territory”, with the MoreArgs argument set to list(sep="-").

Exercise 7
Advanced level

Required variable:
NewSales <- data.frame(Representative=c(1:4), Sales=c(104, 97, 112, 94), Territory=c(1:4))

Sum the corresponding elements of PersonnelData$Sales and NewSales$Sales.

Exercise 8
Advanced level

Required function:
merge.function <- function(x,y){return(x+y)}

Use merge.function to combine the Sales totals from PersonnelData and NewSales.

Exercise 9
Advanced level

mcmapply is a parallelized version of mapply.

The structure of mcmapply() is:
mcmapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE, mc.preschedule = TRUE, mc.set.seed = TRUE, mc.silent = FALSE, mc.cores = getOption("mc.cores", 2L), mc.cleanup = TRUE)

Required library:
library(parallel)

Use mcmapply() to generate 5 lists of random numbers, of lengths 1 through 5.

Exercise 10
Advanced level

Using mcmapply(), create a 10 by 10 matrix with 10 rows of the sequence 1:10:

To leave a comment for the author, please follow the link and comment on their blog: R-exercises.


Source:: R News