Fundamentals of R: Free course by General Assembly & DataCamp

By DataCamp

(This article was first published on The DataCamp Blog » R, and kindly contributed to R-bloggers)

Together with General Assembly, DataCamp created a free set of videos on the fundamentals of R. Discover it now!

In a series of short videos, the team behind DataCamp teaches you about the fundamentals of R, an open-source statistical programming language. Use this course to understand the advantages and disadvantages of R, and discover at the same time how you can take your first steps into the amazing world of data science.

With the help of real-life case studies from Facebook and OKCupid, you’ll see the power of R and understand its advantages. Furthermore, a walkthrough screencast demonstrates a real-time R analysis so you can start doing your own. We hope you will enjoy it!

About:

  • DataCamp is an online Data Science school using video material and coding challenges to teach data analysis with R. For only $25/month you can start your data-driven career.
  • General Assembly is an education institution that is creating a unique community of learners that spans the globe. Their mission is to transform thinkers into creators through education in technology, business and design.


Targeted Learning R Packages for Causal Inference and Machine Learning

By Joseph Rickert

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

by Sherri Rose
Assistant Professor of Health Care Policy
Harvard Medical School

Targeted learning methods build machine-learning-based estimators of parameters defined as features of the probability distribution of the data, while also providing influence-curve-based or bootstrap-based confidence intervals. The theory offers a general template for creating targeted maximum likelihood estimators for a data structure, a nonparametric or semiparametric statistical model, and a parameter mapping. These estimators of causal inference parameters are doubly robust and have a variety of other desirable statistical properties.

Targeted maximum likelihood estimation was built on the loss-based “super learning” system such that lower-dimensional parameters could be targeted (e.g., a marginal causal effect), with the remaining bias for the (low-dimensional) target feature of the probability distribution removed. Targeted learning for effect estimation and causal inference allows for the complete integration of machine learning advances in prediction while providing statistical inference for the target parameter(s) of interest. Further details about these methods can be found in the many targeted learning papers as well as the 2011 targeted learning book.

Practical tools for the implementation of targeted learning methods for effect estimation and causal inference have developed alongside the theoretical and methodological advances. While some work has been done to develop computational tools for targeted learning in proprietary programming languages, such as SAS, the majority of the code has been built in R.

Of key importance are the two R packages SuperLearner and tmle. Ensembling with SuperLearner allows us to use many algorithms to generate an ideal prediction function that is a weighted average of all the algorithms considered. The SuperLearner package, authored by Eric Polley (NCI), is flexible, allowing for the integration of dozens of prespecified potential algorithms found in other packages as well as a system of wrappers that provide the user with the ability to design their own algorithms, or include newer algorithms not yet added to the package. The package returns multiple useful objects, including the cross-validated predicted values, final predicted values, vector of weights, and fitted objects for each of the included algorithms, among others.
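Beyond the prespecified algorithms, a user-written wrapper is simply an R function that returns predictions and a fit object. Here is a minimal sketch following the template of the package’s built-in SL.glm wrapper; the two-way-interaction model is purely an illustrative assumption.

## Hypothetical user-written wrapper, modeled on the built-in SL.glm template
SL.glm.interactions <- function(Y, X, newX, family, ...) {
  fit.glm <- glm(Y ~ .^2, data = X, family = family) # all two-way interactions
  pred <- predict(fit.glm, newdata = newX, type = "response")
  fit <- list(object = fit.glm)
  class(fit) <- "SL.glm" # reuse the predict method for SL.glm-style fits
  list(pred = pred, fit = fit)
}
## It can then be listed alongside prespecified algorithms, e.g.:
## SL.library <- c("SL.glm", "SL.glm.interactions")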

Below is sample code with the ensembling prediction package SuperLearner using a small simulated data set.

library(SuperLearner)

## Generate simulated data ##
set.seed(27)
n <- 500
data <- data.frame(W1 = runif(n, min = 0.5, max = 1),
                   W2 = runif(n, min = 0, max = 1),
                   W3 = runif(n, min = 0.25, max = 0.75),
                   W4 = runif(n, min = 0, max = 1))
data <- transform(data, # add W5, dependent on W2, W3
                  W5 = rbinom(n, 1, 1 / (1 + exp(1.5 * W2 - W3))))
data <- transform(data, # add Y, dependent on W1, W2, W4, W5
                  Y = rbinom(n, 1, 1 / (1 + exp(-(-0.2 * W5 - 2 * W1 + 4 * W5 * W1 - 1.5 * W2 + sin(W4))))))
summary(data)

## Specify a library of algorithms ##
SL.library <- c("SL.nnet", "SL.glm", "SL.randomForest")

## Run the super learner to obtain its predicted values as well as CV risks for the algorithms in the library ##
fit.data.SL <- SuperLearner(Y = data[, 6], X = data[, 1:5], SL.library = SL.library,
                            family = binomial(), method = "method.NNLS", verbose = TRUE)

## Run the cross-validated super learner to obtain its CV risk ##
fitSL.data.CV <- CV.SuperLearner(Y = data[, 6], X = data[, 1:5], V = 10,
                                 SL.library = SL.library, verbose = TRUE,
                                 method = "method.NNLS", family = binomial())

## Cross-validated risks ##
mean((data[, 6] - fitSL.data.CV$SL.predict)^2) # CV risk for the super learner
fit.data.SL # CV risks for the algorithms in the library

The final lines of code return the cross-validated risks for the super learner as well as for each algorithm considered within it. While this is a trivial example with a small data set and few covariates, the results demonstrate that the super learner, which takes a weighted average of the algorithms in the library, has the smallest cross-validated risk and outperforms each individual algorithm.

The tmle package, authored by Susan Gruber (Reagan-Udall Foundation), allows for the estimation of both average treatment effects and parameters defined by a marginal structural model in cross-sectional data with a binary intervention. The package can also incorporate missingness in the outcome and the intervention, use SuperLearner to estimate the relevant components of the likelihood, and handle data with a mediating variable. Additionally, TMLE and collaborative TMLE R code specifically tailored to answer quantitative trait loci mapping questions, such as those discussed in Wang et al. (2011), is available in the supplementary material of that paper.
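As a minimal sketch of the core function, reusing the simulated data above and, purely for illustration, treating the binary W5 as the exposure, estimating an average treatment effect might look like this:

library(tmle)
## Illustrative assumption: W5 plays the role of the binary treatment A,
## with W1-W4 as baseline covariates
fit.tmle <- tmle(Y = data[, 6], A = data$W5, W = data[, c("W1", "W2", "W3", "W4")],
                 family = "binomial",
                 Q.SL.library = SL.library, g.SL.library = SL.library)
fit.tmle$estimates$ATE # additive effect estimate with influence-curve-based CI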

The multiPIM package, authored by Stephan Ritter (Omicia, Inc.), is designed specifically for variable importance analysis, and estimates an attributable-risk-type parameter using TMLE. This package also allows the use of SuperLearner to estimate nuisance parameters and produces additional estimates using estimating-equation-based estimators and g-computation. The package includes its own internal bootstrapping function to calculate standard errors when this is preferred over the use of influence curves, or when influence curves are not valid for the chosen estimator.

Four additional prediction-focused packages are casecontrolSL, cvAUC, subsemble, and h2oEnsemble, all primarily authored by Erin LeDell (Berkeley). The casecontrolSL package relies on SuperLearner and performs subsampling in a case-control design with inverse-probability-of-censoring weighting, which may be particularly useful in settings with rare outcomes. The cvAUC package is a toolkit for evaluating area-under-the-ROC-curve estimators when using cross-validation. The subsemble package was developed based on a new approach to ensembling that fits each algorithm on a subset of the data and combines these fits using cross-validation. This technique can be used on data sets of all sizes, but has been demonstrated to be particularly useful for smaller data sets. A new implementation of super learner can be found in the Java-based h2oEnsemble package, which was designed for big data. The package uses the H2O R interface to run super learning in R with a selection of prespecified algorithms.
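For instance, a hedged sketch of cvAUC applied to the cross-validated super learner predictions computed earlier, with the fold assignments taken from the CV.SuperLearner object:

library(cvAUC)
## Cross-validated AUC with a 95% confidence interval
ci.cvAUC(predictions = fitSL.data.CV$SL.predict, labels = data[, 6],
         folds = fitSL.data.CV$folds, confidence = 0.95)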

Another TMLE package is ltmle, primarily authored by Joshua Schwab (Berkeley). This package mainly focuses on parameters in longitudinal data structures, including the treatment-specific mean outcome and parameters defined by a marginal structural model. The package returns estimates for TMLE, g-computation, and estimating-equation-based estimators.
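A minimal single-time-point sketch (the column names and data-generating process are illustrative assumptions; columns must appear in temporal order):

library(ltmle)
## Simulate baseline covariate W, binary treatment A, binary outcome Y
set.seed(1)
n <- 1000
W <- rnorm(n)
A <- rbinom(n, 1, plogis(-1 + 2 * W))
Y <- rbinom(n, 1, plogis(W + A))
## Estimate the treatment-specific mean E[Y_1] by TMLE
result <- ltmle(data.frame(W, A, Y), Anodes = "A", Ynodes = "Y", abar = 1)
summary(result)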

The text above is a modified excerpt from the chapter “Targeted Learning for Variable Importance” by Sherri Rose in the forthcoming Handbook of Big Data (2015), edited by Peter Bühlmann, Petros Drineas, Michael John Kane, and Mark van der Laan, to be published by CRC Press.


Using genomation to analyze methylation profiles from Roadmap epigenomics and ENCODE

By Katarzyna Wręczycka

(This article was first published on Recipes, scripts and genomics, and kindly contributed to R-bloggers)


Distribution of covered CpGs across gene regions

genomation facilitates visualization of the locations of features aggregated over exons, introns, promoters and TSSs. To find the distribution of covered CpGs within these gene structures, we will use the transcript features we obtained previously. Here is the breakdown of the code (a sketch follows the list):
  1. Count overlap statistics between our CpGs from the H1 cell type WGBS and RRBS data and the gene structures.
  2. Calculate the percentage of CpGs overlapping with the annotation.
  3. Plot them in the form of pie charts.
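A hedged sketch of these three steps, assuming a GRangesList gene.parts from readTranscriptFeatures() and a GRanges object of covered CpGs, here called h1.cpgs, both built in the earlier steps:

library(genomation)
## 1. Overlap statistics between covered CpGs and gene structures
annot <- annotateWithGeneParts(h1.cpgs, gene.parts)
## 2. Percentage of CpGs overlapping with each annotation category
getTargetAnnotationStats(annot, percentage = TRUE, precedence = TRUE)
## 3. Plot the percentages as a pie chart
plotTargetAnnotation(annot, precedence = TRUE, main = "CpGs across gene regions")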




Another Interactive Map for the Cholera Dataset

By arthur charpentier

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

Following my previous post, François (aka @FrancoisKeck) posted a comment mentioning another package I could use to get an interactive map, the rleafmap package. And the heatmap was easy to include here.

This time, we do not use OpenStreetMap. The first part, getting the data, is still the same,

> require(rleafmap)
> library(sp)
> library(rgdal)
> library(maptools)
> library(KernSmooth)
> setwd("/home/arthur/Documents/")
> deaths <- readShapePoints("Cholera_Deaths")
> df_deaths <- data.frame(deaths@coords)
> coordinates(df_deaths)=~coords.x1+coords.x2
> proj4string(df_deaths)=CRS("+init=epsg:27700") 
> df_deaths = spTransform(df_deaths,CRS("+proj=longlat +datum=WGS84"))
> df=data.frame(df_deaths@coords)

To get a first visualisation, use

> stamen_bm <- basemap("stamen.toner")
> j_snow <- spLayer(df_deaths, stroke = FALSE)
> writeMap(stamen_bm, j_snow, width = 1000, height = 750, setView = c( mean(df[,1]),mean(df[,2])), setZoom = 14)

and again, using the + and the – in the top left area, we can zoom in or out. Or we can do it manually,

> writeMap(stamen_bm, j_snow, width = 1000, height = 750, setView = c( mean(df[,1]),mean(df[,2])), setZoom = 16)

To get the heatmap, use

> library(spatstat)
> library(maptools)

> win <- owin(xrange = bbox(df_deaths)[1,] + c(-0.01,0.01), yrange = bbox(df_deaths)[2,] + c(-0.01,0.01))
> df_deaths_ppp <- ppp(coordinates(df_deaths)[,1],  coordinates(df_deaths)[,2], window = win)
> 
> df_deaths_ppp_d <- density.ppp(df_deaths_ppp, 
  sigma = min(bw.ucv(df[,1]),bw.ucv(df[,2])))
 
> df_deaths_d <- as.SpatialGridDataFrame.im(df_deaths_ppp_d)
> df_deaths_d$v[df_deaths_d$v < 10^3] <- NA

> stamen_bm <- basemap("stamen.toner")
> mapquest_bm <- basemap("mapquest.map")
 
> j_snow <- spLayer(df_deaths, stroke = FALSE)
> df_deaths_den <- spLayer(df_deaths_d, layer = "v", cells.alpha = seq(0.1, 0.8, length.out = 12))
> my_ui <- ui(layers = "topright")

> writeMap(stamen_bm, mapquest_bm, j_snow, df_deaths_den, width = 1000, height = 750, interface = my_ui, setView = c( mean(df[,1]),mean(df[,2])), setZoom = 16)

The amazing thing here is the set of options in the top right corner. For instance, we can remove some layers, e.g. to remove the points

or to change the background

To get an html file, instead of a standard visualisation in RStudio, use

> writeMap(stamen_bm, mapquest_bm, j_snow, df_deaths_den, width = 450, height = 350, interface = my_ui, setView = c( mean(df[,1]),mean(df[,2])), setZoom = 16, directView ="viewer")

which will generate the html file (as well as some additional files, actually) shown above. Awesome, isn’t it?


Interactive pivot tables with R

By Markus Gesmann

(This article was first published on mages’ blog, and kindly contributed to R-bloggers)

I love interactive pivot tables. The rpivotTable package, which brings Nicolas Kruchten’s PivotTable.js to R, makes them available with just a few lines of code:

## Install packages
library(devtools)
install_github("ramnathv/htmlwidgets")
install_github("smartinsightsfromdata/rpivotTable")
## Load rpivotTable
library(rpivotTable)
data(mtcars)
## One line to create the pivot table
rpivotTable(mtcars, rows = "gear", cols = "cyl", aggregatorName = "Average",
            vals = "mpg", rendererName = "Treemap")

The following animated Gif from Nicolas’ project page gives an idea of the interactive functionality of PivotTable.js.

Example of PivotTable.js Source: Nicolas Kruchten

Session Info

R version 3.1.3 (2015-03-09)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.2 (Yosemite)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils
[5] datasets methods base

other attached packages:
[1] rpivotTable_0.1.3.4

loaded via a namespace (and not attached):
[1] digest_0.6.8 htmltools_0.2.6
[3] htmlwidgets_0.3.2 RJSONIO_1.3-0
[5] tools_3.1.3 yaml_2.1.13

all syntax to download and analyze publicly-available survey microdata confirmed to work on windows, macintosh, and unix systems

By Anthony Damico

(This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers)

the creators of monetdb at centrum wiskunde and informatica generously provided me with a unix shell account to test out each of the scripts in the repository. much of my code already worked across platforms, but i have finished tweaking and testing, and can confirm that every script now works without changes on all three platforms. special thanks to hannes, our man in amsterdam. if you want to analyze millions of records, on your personal computer, for free, in seconds, read this.


rPithon vs. rPython

By statcompute

(This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers)

Similar to rPython, the rPithon package (http://rpithon.r-forge.r-project.org) allows users to execute Python code from R and exchange data between Python and R. However, the underlying mechanisms of these two packages are fundamentally different. While rPithon communicates with Python from R through pipes, rPython accomplishes the same task with JSON. A major advantage of rPithon over rPython is that multiple Python processes can be started within an R session. However, rPithon is not very robust when exchanging large data objects between R and Python.

rPython Session

library(sqldf)
df_in <- sqldf('select Year, Month, DayofMonth from tbl2008 limit 5000', dbname = '/home/liuwensui/Documents/data/flights.db')
library(rPython)
### R DATA.FRAME TO PYTHON DICTIONARY ###
python.assign('py_dict', df_in)
### PASS PYTHON DICTIONARY BACK TO R LIST
r_list <- python.get('py_dict')
### CONVERT R LIST TO DATA.FRAME
df_out <- data.frame(r_list)
dim(df_out)
# [1] 5000    3
#
# real	0m0.973s
# user	0m0.797s
# sys	0m0.186s

rPithon Session

library(sqldf)
df_in <- sqldf('select Year, Month, DayofMonth from tbl2008 limit 5000', dbname = '/home/liuwensui/Documents/data/flights.db')
library(rPithon)
### R DATA.FRAME TO PYTHON DICTIONARY ###
pithon.assign('py_dict', df_in)
### PASS PYTHON DICTIONARY BACK TO R LIST
r_list <- pithon.get('py_dict')
### CONVERT R LIST TO DATA.FRAME
df_out <- data.frame(r_list)
dim(df_out)
# [1] 5000    3
#
# real	0m0.984s
# user	0m0.771s
# sys	0m0.187s


Improved memory usage and RJSONIO compatibility in jsonlite 0.9.15

By Jeroen Ooms

(This article was first published on OpenCPU, and kindly contributed to R-bloggers)

The jsonlite package implements a robust, high performance JSON parser and generator for R, optimized for statistical data and the web. Last week version 0.9.15 appeared on CRAN which improves memory usage and compatibility with other packages.

Migrating to jsonlite

The upcoming release of shiny will switch from RJSONIO to jsonlite. To make the transition painless for shiny users, Winston Chang has added some compatibility options to jsonlite that mimic the (legacy) behavior of RJSONIO. The following wrapper results in the same output as RJSONIO::toJSON for the majority of cases. Hopefully this will make it easier for other package authors to make the transition to jsonlite as well.

# RJSONIO compatibility wrapper
toJSON_legacy <- function(x, ...) {
  jsonlite::toJSON(I(x), dataframe = "columns", null = "null", na = "null",
   auto_unbox = TRUE, use_signif = TRUE, force = TRUE,
   rownames = FALSE, keep_vec_names = TRUE, ...)
}

However, be aware that the RJSONIO defaults can sometimes result in unexpected behavior and odd edge cases (which is why jsonlite was created in the first place). Therefore it is still recommended to switch to the jsonlite defaults when possible (see the jsonlite paper for a discussion of the mapping). One exception is perhaps the auto_unbox argument, which many people seem to prefer to set to TRUE for encoding relatively simple, static data structures.
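A quick illustration of that argument (expected output shown in comments):

# jsonlite default keeps length-1 vectors as JSON arrays:
jsonlite::toJSON(list(x = 1))                     # {"x":[1]}
# auto_unbox = TRUE collapses them to scalars, as RJSONIO does:
jsonlite::toJSON(list(x = 1), auto_unbox = TRUE)  # {"x":1}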

Memory usage

The new version should use less memory when parsing JSON, especially from a file or URL. This is mostly due to a new push-parser implementation that incrementally parses JSON in small pieces, which eliminates the overhead of copying gigantic JSON strings. In addition, jsonlite now uses the new curl package for retrieving data via a connection interface.

mydata1 <- jsonlite::fromJSON("https://jeroenooms.github.io/data/dmd.json")

The call above results in the same output as the call below, but it should consume less memory, especially for very large JSON files.

library(httr)
req <- GET("https://jeroenooms.github.io/data/dmd.json")
mydata2 <- jsonlite::fromJSON(content(req, "text"))

None of this changes anything in the API; these changes are all internal.


MCMskv, Lenzerheide, Jan. 5-7, 2016

By xi’an

(This article was first published on Xi’an’s Og » R, and kindly contributed to R-bloggers)

Following the highly successful [authorised opinion!, from objective sources] MCMski IV in Chamonix last year, the BayesComp section of ISBA has decided in favour of a two-year period, which means the great item of news that next year we will meet again for MCMski V [or MCMskv for short], this time on the snowy slopes of the Swiss town of Lenzerheide, south of Zürich. The committees are headed by the indefatigable Antonietta Mira and Mark Girolami. The plenary speakers have already been contacted and Steve Scott (Google), Steve Fienberg (CMU), David Dunson (Duke), Krys Latuszynski (Warwick), and Tony Lelièvre (Mines, Paris) have agreed to talk. Similarly, the nine invited sessions have been selected and will include Hamiltonian Monte Carlo, Algorithms for Intractable Problems (ABC included!), Theory of (Ultra)High-Dimensional Bayesian Computation, Bayesian NonParametrics, Bayesian Econometrics, Quasi Monte Carlo, Statistics of Deep Learning, Uncertainty Quantification in Mathematical Models, and Biostatistics. There will be afternoon tutorials, including a practical session from the Stan team, tutorials for which the call is open, poster sessions, and a conference dinner at which we will be entertained by the unstoppable Imposteriors. The Richard Tweedie ski race is back as well, with a pair of Blossom skis for the winner!

As in Chamonix, there will be parallel sessions and hence the scientific committee has issued a call for proposals to organise contributed sessions, tutorials and the presentation of posters on particularly timely and exciting areas of research relevant and of current interest to Bayesian Computation. All proposals should be sent to Mark Girolami directly by May the 4th (be with him!).


Fastest Growing Software for Scholarly Analytics: Python, R, KNIME…

By Bob Muenchen

(This article was first published on r4stats.com » R, and kindly contributed to R-bloggers)

In my ongoing quest to “analyze the world of analytics”, I’ve added the following section below to The Popularity of Data Analysis Software:

It would be useful to have growth trend graphs for each of the analytics packages I track, but collecting such data is too time-consuming since it must be re-collected every year (search algorithms change). What I’ve done instead is collect data only for the past two complete years, 2013 and 2014. Figure 2e shows the percent change from 2013 to 2014, with the “hot” packages whose use is growing shown in red. Those whose use is declining or “cooling” are shown in blue. Since the number of articles tends to be in the thousands or tens of thousands, I have removed any software that had fewer than 100 articles in 2013. Going from one article to five may represent fivefold growth, but it’s not of much interest.

Figure 2e. Change in the number of scholarly articles using each software in the most recent two complete years (2013 to 2014). Packages shown in red are “hot” and growing, while those shown in blue are “cooling down” or declining.

The three fastest growing packages are all free and open source: Python, R and KNIME. All three saw more than 25% growth. Note that the Python figures are strictly for analytics use as defined here. At the other end of the scale are SPSS and SAS, both of which declined in use by around 25%. Recall that Fig. 2a shows that despite recent years of decline, SPSS is still extremely dominant for scholarly use.

Three of the packages whose use is growing implement the powerful and easy-to-use workflow or flowchart user interface: KNIME, RapidMiner and SPSS Modeler. As useful as that approach is, it’s not sufficient for success as we see with SAS Enterprise Miner, whose use declined nearly 15%.

It will be particularly interesting to see what the future holds for KNIME and RapidMiner. The companies were two of only four chosen by the Gartner Group as having both a complete vision of the future and the ability to execute that vision (Fig. 7a). Until recently, both were free and open source. RapidMiner then started charging for its current version, leaving its older version as the only free one. Recent offers to make it free for academic use don’t include use on projects with grant funding, so I expect KNIME’s growth to remain faster than RapidMiner’s. However, in absolute terms, scholarly use of RapidMiner is currently almost twice that of KNIME, as shown in Fig. 2b.
