New leadership of the R Foundation

By David Smith

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

The R Foundation, the Austria-based non-profit founded in 2003 to oversee the R Project, recently held leadership elections and added some new members. The new R Foundation Board is:

Co-Presidents: Duncan Murdoch, Martyn Plummer
Secretary General: Martin Mächler
Treasurer: Kurt Hornik
Members at large: John Chambers, Brian Ripley, Dirk Eddelbuettel

These names are familiar to anyone working with R, as all have contributed tremendously to the project. It’s impossible to single out any one contribution, but here’s my attempt. Duncan Murdoch is responsible for the Windows builds of R. Martyn Plummer contributed Bayesian and epidemiology packages. Martin Mächler maintains the Matrix package and many other numerical algorithms. Kurt Hornik maintains CRAN. John Chambers designed the object-orientation systems. Brian Ripley is a regular committer to the source code and mailing lists. Dirk Eddelbuettel maintains R on Debian.

The R Foundation has also recently added several new members to the leadership team (as “ordinary members”). Torsten Hothorn, Michael Lawrence, Uwe Ligges, Martin Morgan, Deepayan Sarkar, Marc Schwartz, Hadley Wickham and Achim Zeileis bring the total to 29. Former presidents Robert Gentleman and Ross Ihaka and former secretary Friedrich Leisch remain on the board, as do auditors Peter Dalgaard and Roger Bivand.

I second Martin Mächler’s sentiment from the announcement:

Thank you all for supporting the R project, a Free Software (hence Open source) endeavor only made possible by the dedicated work of many thousands of volunteers and their contributions in several ways, including to become supporting (or ++) members of the R foundation!

R-announce mailing list: The R Foundation – New board

To leave a comment for the author, please follow the link and comment on his blog: Revolutions.

Source:: R News

Sketching Scatterplots to Demonstrate Different Correlations

By Tony Hirst

(This article was first published on OUseful.Info, the blog… » Rstats, and kindly contributed to R-bloggers)

Looking just now for an openly licensed graphic showing a set of scatterplots that demonstrate different correlations between X and Y values, I couldn’t find one.

So here’s a quick R script for constructing one, based on a Cross Validated question/answer (Generate two variables with precise pre-specified correlation):

library(MASS)

corrdata=function(samples=200,r=0){
  data = mvrnorm(n=samples, mu=c(0, 0), Sigma=matrix(c(1, r, r, 1), nrow=2), empirical=TRUE)
  X = data[, 1]  # standard normal (mu=0, sd=1)
  Y = data[, 2]  # standard normal (mu=0, sd=1)
  data.frame(x=X,y=Y)
}

#stack samples drawn at a range of target correlations
df=data.frame()
for (i in c(1,0.8,0.5,0.2,0,-0.2,-0.5,-0.8,-1)){
  tmp=corrdata(200,i)
  tmp['corr']=i
  df=rbind(df,tmp)
}

library(ggplot2)

#facet the scatterplots by target correlation and add a fitted regression line
g=ggplot(df,aes(x=x,y=y))+geom_point(size=1)
g+facet_wrap(~corr)+stat_smooth(method='lm',se=FALSE,color='red')

And here’s an example of the result:

It’s actually a little tidier if we also add in + coord_fixed() to fix up the geometry/aspect ratio of the chart so the axes are of the same length:
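For reference, the full plotting call with the fixed aspect ratio is then (same df and g objects as above):

g+facet_wrap(~corr)+stat_smooth(method='lm',se=FALSE,color='red')+coord_fixed()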

So what sort of OER does that make this post?!;-)

PS methinks it would be nice to be able to use different distributions, such as a uniform distribution across x. Is there a similarly straightforward way of doing that?
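One possible approach (my own sketch, not from the original post) is a Gaussian copula: draw correlated bivariate normals exactly as above, then transform the x margin with pnorm(), which makes x uniform on [0, 1] while preserving the rank correlation (the Pearson correlation will no longer be exactly r, though):

corrdata_unif=function(samples=200,r=0){
  data = mvrnorm(n=samples, mu=c(0, 0), Sigma=matrix(c(1, r, r, 1), nrow=2), empirical=TRUE)
  data.frame(x=pnorm(data[, 1]), y=data[, 2])  # x is now uniform on [0, 1]
}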

To leave a comment for the author, please follow the link and comment on his blog: OUseful.Info, the blog… » Rstats.

Source:: R News

Preferring a preference index

By ibartomeus

(This article was first published on Bartomeus lab » Rstats, and kindly contributed to R-bloggers)

I’ve been reading about preference indexes lately, specifically for characterising pollinator preferences for plants, so here is what I learnt. Preference is defined as using an item (e.g. plant) more than expected given the item abundance.

First, I like to use a quantitative framework (you can also use rank-based indices as in Williams et al. 2011, which have nice properties too). The simplest quantitative index is the forage ratio:

#obs: observed frequency of visits to item 1
#exp: expected frequency of visits to item 1 (i.e. resource abundance/availability)
fr <- function(obs, exp){
    p_obs <- obs/sum(obs)
    p_exp <- exp/sum(exp)
    out <- p_obs/p_exp
    out
}

But it ranges from 0 to infinity, which makes it hard to interpret.

Second, the most used index is “Electivity” (Ivlev 1961), which is nice because it is bounded between -1 and 1. “Jacobs” index is similar but corrects for item depletion, which does not apply to my question; I include it below for completeness.

electicity <- function(obs, exp){
    p_obs <- obs/sum(obs)
    p_exp <- exp/sum(exp)
    out <- (p_obs-p_exp)/(p_obs+p_exp)
    out
}

jacobs <- function(obs, exp){
    p_obs <- obs/sum(obs)
    p_exp <- exp/sum(exp)
    out <- (p_obs-p_exp)/(p_obs+p_exp-2*p_obs*p_exp)
    out
}

However, those indexes do not tell you whether pollinators have an overall preference across several items, nor which of these items are preferred or avoided. To solve that we can use a simple chi-squared test: the chi-squared statistic assesses whether there is an overall preference, and the Pearson residuals of the test, (obs - exp)/√exp, estimate the magnitude of preference or avoidance for a given item based on its deviation from the expected value. Significance can be assessed by building Bonferroni confidence intervals (see Neu et al. 1974 and Beyers and Steinhorst 1984). That way you know which items are preferred or avoided.

chi_pref <- function(obs, exp, alpha = 0.05){
    chi <- chisq.test(obs, p = exp, rescale.p = TRUE)
    print(chi) #tells you if there is an overall preference. (sig = pref)
    res <- chi$residuals
    #res <- (obs-exp)/sqrt(exp) #hand calculation, same result.
    #calculate Bonferroni Z-statistic for each plant.
    k <- length(obs)
    n <- sum(obs)
    p_obs <- obs/n
    ak <- alpha/(2*k)
    Zak <- abs(qnorm(ak))
    low_interval <- p_obs - (Zak*(sqrt(p_obs*(1-p_obs)/n)))
    upper_interval <- p_obs + (Zak*(sqrt(p_obs*(1-p_obs)/n)))
    p_exp <- exp/sum(exp)
    sig <- ifelse(p_exp >= low_interval & p_exp <= upper_interval, "ns", "sig")
    plot(c(0,k+1), c(min(low_interval),max(upper_interval)), type = "n", 
         ylab = "Preference", xlab = "items", las = 1)
    arrows(x0 = c(1:k), y0 = low_interval, x1 = c(1:k), y1 = upper_interval, code = 3
           ,angle = 90)
    points(p_exp, col = "red")
    out <- data.frame(chi_test_p = rep(chi$p.value, length(res)), 
                      chi_residuals = res, sig = sig)
    out
}

And we wrap up all indexes…

#wrap up for all indexes
preference <- function(obs, exp, alpha = 0.05){
    f <- fr(obs, exp)
    e <- electicity(obs, exp)
    #j <- jacobs(obs, exp)
    c <- chi_pref(obs, exp, alpha = alpha)
    out <- data.frame(exp = exp, obs = obs, fr = f, electicity = e, c)
    out
}

It works well for preferences among items with similar availability, and when all items are used to some extent. The plot shows expected values in red and the observed confidence intervals in black; when an interval does not overlap its expected value, it indicates a significant preference (or avoidance).

x <- preference(obs = c(25, 22,30,40), exp = c(40,12,12,53))
##  Chi-squared test for given probabilities
## 
## data:  obs
## X-squared = 44.15, df = 3, p-value = 1.404e-09
##   exp obs     fr electicity chi_test_p chi_residuals sig
## 1  40  25 0.6250    -0.2308  1.404e-09        -2.372 sig
## 2  12  22 1.8333     0.2941  1.404e-09         2.887  ns
## 3  12  30 2.5000     0.4286  1.404e-09         5.196 sig
## 4  53  40 0.7547    -0.1398  1.404e-09        -1.786 sig

We can see that indices are correlated by simulating lots of random values.

x <- preference(obs = sample(c(0:100),100), exp = sample(c(0:100),100))
##  Chi-squared test for given probabilities
## 
## data:  obs
## X-squared = Inf, df = 99, p-value < 2.2e-16
pairs(x[,c(-5,-7)]) #more or less the same pattern across indexes.

But the indexes do not behave well in two situations. First, when an item is not used at all it comes out as significantly un-preferred, regardless of its abundance. I don’t like that, because from a biological point of view a very rare plant is expected to get 0 visits. The problem is that the Bonferroni interval around an observed proportion of 0 has, by definition, width 0, so any expected value different from 0 appears significant.

x <- preference(obs = c(0,25,200), exp = c(0.00003,50,100))
##  Chi-squared test for given probabilities
## 
## data:  obs
## X-squared = 50, df = 2, p-value = 1.389e-11
##     exp obs     fr electicity chi_test_p chi_residuals sig
## 1 3e-05   0 0.0000    -1.0000  1.389e-11     -0.006708 sig
## 2 5e+01  25 0.3333    -0.5000  1.389e-11     -5.773502 sig
## 3 1e+02 200 1.3333     0.1429  1.389e-11      4.082486 sig

The second issue is that if you have noise due to sampling (as always), rare items are more likely to be wrongly assessed than common items.

#assume no preferences
x <- preference(obs = c(5,50,100), exp = c(5,50,100))
##  Chi-squared test for given probabilities
## 
## data:  obs
## X-squared = 0, df = 2, p-value = 1
##   exp obs fr electicity chi_test_p chi_residuals sig
## 1   5   5  1          0          1             0  ns
## 2  50  50  1          0          1             0  ns
## 3 100 100  1          0          1             0  ns
#now assume some sampling error of +-4 observations
x <- preference(obs = c(1,54,96), exp = c(5,50,100))
##  Chi-squared test for given probabilities
## 
## data:  obs
## X-squared = 3.671, df = 2, p-value = 0.1595
##   exp obs     fr electicity chi_test_p chi_residuals sig
## 1   5   1 0.2053  -0.659341     0.1595       -1.7539 sig
## 2  50  54 1.1086   0.051508     0.1595        0.7580  ns
## 3 100  96 0.9854  -0.007338     0.1595       -0.1438  ns

In conclusion, if all items are reasonably well represented (no highly uneven distributions) and all are used to some extent (no items with 0 visits), this is a great tool.

I’d be happy to hear about better options in the comments, if there are any.

Gist with the code of this post here.

To leave a comment for the author, please follow the link and comment on his blog: Bartomeus lab » Rstats.

Source:: R News

Joint Models with Categorical Longitudinal Outcomes

By Dimitris Rizopoulos

(This article was first published on iProgn: Interactive Prediction Tools based on Joint Models, and kindly contributed to R-bloggers)

Extending the Standard Joint Model

In a previous post we briefly introduced the framework of joint models for longitudinal and time-to-event data, and we showed how models of this type can be fitted in R using package JMbayes. In that post we presented what can be called the ‘standard’ joint model, in which we have one continuous longitudinal outcome and one survival endpoint. Starting from this standard joint model, different extensions have been proposed in the literature to accommodate different types of longitudinal and survival data. The aim of this post is to present one of these extensions, namely joint models with categorical longitudinal responses, and how such models can be fitted using JMbayes.

Generalized Linear Mixed Effects Models

Before discussing joint models with categorical longitudinal outcomes we first need to introduce the framework of Generalized Linear Mixed Models (GLMMs). GLMMs are an extension of linear mixed models to allow response variables from different distributions, such as binary responses. Alternatively, you could think of GLMMs as an extension of generalized linear models (e.g., logistic regression) to include both fixed and random effects (hence mixed models). In contrast to linear mixed models, fitting GLMMs under maximum likelihood is computationally a much more challenging task. This is due to the fact that in the definition of the observed data log-likelihood there is an integral over the (normally distributed) random effects, which does not have, in general, a closed-form solution (for linear mixed models this integral does have a closed-form solution, which facilitates computations). Under the Bayesian approach there is essentially no difference between linear mixed models and GLMMs, because in both the random effects are considered parameters for which we wish to derive their posterior distribution. In R there are several packages that can fit GLMMs under either maximum likelihood or the Bayesian approach, such as lme4 and MCMCglmm, among others. However, in order to fit joint models with categorical longitudinal responses using package JMbayes, the user first needs to fit the GLMM using function glmmPQL() from package MASS. Even though the PQL algorithm has been shown to perform suboptimally, especially for dichotomous longitudinal outcomes, the purpose of fitting the GLMM with glmmPQL() is to extract from the resulting model object the response vector and the design matrices for the fixed and random effects, and initial values for the parameters. The following code illustrates the use of glmmPQL() for fitting a mixed-effects logistic regression using the Primary Biliary Cirrhosis (PBC) data set. Since in the PBC data there was no binary biomarker recorded during follow-up, we artificially create one by dichotomizing serum bilirubin at the threshold value of 1.8 mg/dL.

The syntax of glmmPQL() is similar to that of function lme() from package nlme (the PQL algorithm actually requires fitting a linear mixed-effects model for a transformed response variable).
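The code chunk from the original post is not reproduced in this extract; the following is a minimal sketch of what such a call could look like, assuming the pbc2 data set shipped with JMbayes (the object and variable names for the dichotomized biomarker are my own):

library(JMbayes)
library(MASS)

# artificial binary biomarker: serum bilirubin above 1.8 mg/dL
pbc2$serBilirD <- as.numeric(pbc2$serBilir > 1.8)

# mixed-effects logistic regression fitted with penalized quasi-likelihood
lmeFit <- glmmPQL(serBilirD ~ year, random = ~ year | id,
                  family = binomial, data = pbc2)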

A Joint Model with a Binary Longitudinal Biomarker

To fit a joint model with a binary longitudinal outcome we need to appropriately define argument densLong of the jointModelBayes() function. This argument accepts a function that calculates the probability density function (and its natural logarithm) of the longitudinal outcome. The arguments of this function are: y, the vector of longitudinal responses; eta.y, the subject-specific linear predictor (i.e., including both the fixed and random effects); scale, a potential scale parameter (e.g., the measurement error standard deviation in the case of linear mixed models); log, a logical argument denoting whether the pdf or the logarithm of the pdf is returned by the function; and data, a data.frame that contains variables potentially used in the definition of densLong. For our example, the function we need to define is:
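The original function definition is missing from this extract; a minimal sketch for a Bernoulli (binary) longitudinal outcome, following the argument list described above, might be:

dLongBin <- function (y, eta.y, scale, log = FALSE, data) {
    # success probability from the subject-specific linear predictor
    dbinom(x = y, size = 1, prob = plogis(eta.y), log = log)
}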



Next we fit a Cox model that represents the survival submodel of the joint model, and we supply all of the above as arguments to jointModelBayes(), i.e.,
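The original code chunk is also missing here; a sketch of the survival submodel and the joint model call, assuming the pbc2.id data set from JMbayes and the objects defined above (names are my own), could be:

library(survival)

coxFit <- coxph(Surv(years, status2) ~ drug, data = pbc2.id, x = TRUE)

jointFit <- jointModelBayes(lmeFit, coxFit, timeVar = "year",
                            densLong = dLongBin)
summary(jointFit)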


In the output the association parameter Assoct denotes the log hazard ratio, at any time point t, for a unit increase in the log odds of having serum bilirubin above the threshold value of 1.8 mg/dL, at the same time point.

To leave a comment for the author, please follow the link and comment on his blog: iProgn: Interactive Prediction Tools based on Joint Models.

Source:: R News

New R package for electricity forecasting

By Rob J Hyndman

(This article was first published on Hyndsight » R, and kindly contributed to R-bloggers)

Shu Fan and I have developed a model for electricity demand forecasting that is now widely used in Australia for long-term forecasting of peak electricity demand. It has become known as the “Monash Electricity Forecasting Model”. We have decided to release an R package that implements our model so that other people can easily use it. The package is called “MEFM” and is available on github. We will probably also put it on CRAN eventually.

The model was first described in Hyndman and Fan (2010). We are continually improving it, and the latest version is described in the model documentation, which will be updated from time to time.

The package is being released under a GPL licence, so anyone can use it. All we ask is that our work is properly cited.

Naturally, we are not able to provide free technical support, although we welcome bug reports. We are available to undertake paid consulting work in electricity forecasting.

To leave a comment for the author, please follow the link and comment on his blog: Hyndsight » R.

Source:: R News

More Snowdoop Coming

By matloff

(This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers)

In spite of the banter between Yihui and me, I’m glad to hear that he may be interested in Snowdoop, as are some others. I’m quite busy this week (finishing writing my Parallel Computation for Data Science book, and still have a lot of Fall Quarter grading to do 🙂 ), but you’ll definitely be hearing more from me on Snowdoop and partools, including the following:

  • Vignettes for Snowdoop and the debugging tool.
  • Code snippets for splitting and coalescing files, including dealing with header records.
  • Code snippet implementing a distributed version of subset().

And yes, I’ll likely break down and put it on Github. 🙂 [I’m not old-fashioned, just nuisance-averse. 🙂 ] Watch this space for news, next installment maybe 3-4 days from now.

To leave a comment for the author, please follow the link and comment on his blog: Mad (Data) Scientist.

Source:: R News

Introducing V8: An Embedded JavaScript Engine for R

By Jeroen Ooms

(This article was first published on OpenCPU, and kindly contributed to R-bloggers)

JavaScript is a fantastic language for building applications. It runs on browsers, servers and databases, making it possible to design an entire web stack in a single language.

The OpenCPU JavaScript client already allows for calling R functions from JavaScript (see jsfiddles and apps). With the new V8 package we can now do the reverse as well: run JavaScript inside R!

The V8 Engine

V8 is Google’s open source, high performance JavaScript engine. It is written in C++ and implements ECMAScript as specified in ECMA-262, 5th edition. The V8 R package builds on this C++ library to provide a completely standalone JavaScript engine within R:

library(V8)

# Create a new context
ct <- new_context();

# Evaluate some code
ct$eval("foo=123")
ct$eval("bar=456")
ct$eval("foo+bar")
# [1] "579"

However, note that V8 by itself is just the naked JavaScript engine. Currently there is no DOM, no network or disk IO, not even an event loop. That is fine, because we already have all of those in R. In this sense V8 resembles other foreign language interfaces such as Rcpp or rJava, but for JavaScript.

A major advantage over the other foreign language interfaces is that V8 requires no compilers, external executables or other run-time dependencies to execute JavaScript. The entire engine is contained within a 6MB R package (2MB when zipped) and works on all major platforms.

ct$eval("JSON.stringify({x:Math.random()})")
# [1] "{\"x\":0.08649904327467084}"
ct$eval("(function(x){return x+1;})(123)")
# [1] "124"

Sounds promising? There is more!

V8 + jsonlite = awesome

The native data structure in JavaScript is basically JSON, hence we can use jsonlite for seamless data interchange between V8 and R:

ct$assign("mydata", mtcars)
out <- ct$get("mydata")
all.equal(out, mtcars)
# TRUE

Because jsonlite stores data in its natural structure, we can plug it straight into existing JavaScript libraries:

# Use a JavaScript library
ct$source("http://underscorejs.org/underscore-min.js")
ct$call("_.filter", mtcars, I("function(x){return x.mpg < 15}"))
#                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
# Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
# Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
# Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
# Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4

JavaScript Libraries

JavaScript libraries written specifically for the browser (such as jQuery or D3) or libraries for Node that depend on disk/network functionality might not work in plain V8, but many of them actually do.

For example, crossfilter is a high performance data filtering library that I have used for creating D3 data dashboards in the browser, but crossfilter itself is just vanilla JavaScript:

ct$source("https://cdnjs.cloudflare.com/ajax/libs/crossfilter/1.3.11/crossfilter.min.js")
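As a quick, purely illustrative smoke test (not from the original post), we can push a data frame in and ask crossfilter how many records it sees; the call should return the number of rows of mtcars as a string:

ct$assign("mydata", mtcars)
ct$eval("var cf = crossfilter(mydata); cf.size()")
# expected: "32"  (one record per row of mtcars)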

I’ll continue here in the next blog post later this week. Have a look at the (very short) package manual in the meantime.

To leave a comment for the author, please follow the link and comment on his blog: OpenCPU.

Source:: R News

Yikes…It’s Been Awhile

By Andy

(This article was first published on Citizen-Statistician » R Project, and kindly contributed to R-bloggers)

Apparently our last blog post was in August. Dang. Where did five months go? Blog guilt would be killing me, but I swear it was just yesterday that Mine posted.

I will give a bit of a review of some of the books that I read this semester related to statistics. Most recently, I finished Hands-On Matrix Algebra Using R: Active and Motivated Learning with Applications. This was a fairly readable book for those looking to understand a bit of matrix algebra. The emphasis is definitely on economics, but there are some statistics examples as well. I am not as sure where the “motivated learning” part comes in, but the examples are practical and the writing is pretty coherent.

The two books that I read that I am most excited about are Model Based Inference in the Life Sciences: A Primer on Evidence and The Psychology of Computer Programming. The latter, written in the 70’s, explored psychological aspects of computer programming, especially in industry, and on increasing productivity. Weinberg (the author) stated his purpose in the book was to study “computer programming as a human activity.” This was compelling on many levels to me, not the least of which is to better understand how students learn statistics when using software such as R.

Reading this book, along with participating in a student-led computing club in our department, has sparked some interest in beginning to read the literature related to these ideas this spring semester (feel free to join us…maybe we will document our conversations as we go). I am very interested in how instructors choose software to teach with (see concerns raised about using R in Harwell (2014). Not so fast my friend: The rush to R and the need for rigorous evaluation of data analysis and software in education. Education Research Quarterly.) I have also thought long and hard about not only what influences the choice of software to use in teaching (I do use R), but also about subsequent choices related to that decision (e.g., if R is adopted, which R packages will be introduced to students). All of these choices probably have some impact on student learning and also on students’ future practice (what you learn in graduate school is what you ultimately end up doing).

The Model Based Inference book was a shorter, readable version of Burnham and Anderson’s (2003) Springer volume on multimodel inference and information theory. I was introduced to these ideas when I taught out of Jeff Long’s, Longitudinal Data Analysis for the Behavioral Sciences Using R. They remained with me for several years and after reading Anderson’s book, I am going to teach some of these ideas in our advanced methods course this spring.

Anyway…just some short thoughts to leave you with. Happy Holidays.

To leave a comment for the author, please follow the link and comment on his blog: Citizen-Statistician » R Project.

Source:: R News

10 new R jobs (for December 16th 2014) – from Intel to University of Iowa and more

By Tal Galili

This is the bimonthly R Jobs post (for 2014-12-16), based on the R-bloggers’ sister website: R-users.com.

If you are an employer who is looking to hire people from the R community, please visit this link to post a new R job (it’s free, and registration takes less than 10 seconds).

If you are a job seeker, please follow the links below to learn more and apply for your job of interest (or visit previous R jobs posts).

  1. Full-Time
    R CUDA developer
    St Jude Children’s Research Hospital – Posted by cowbell
    Memphis
    Tennessee, United States
    16 Dec 2014
  2. Part-Time
    Student data scientist for the New Devices Group (NDG) at Intel
    Intel – Posted by omrimendels
    Yakum
    Center District, Israel
    15 Dec 2014
  3. Full-Time
    Business Intelligence Senior Analyst
    Compassion International – Posted by andrewz
    Colorado Springs
    Colorado, United States
    15 Dec 2014
  4. Full-Time
    Postdoctoral fellow: genome informatics in psychiatry at the University of Iowa
    University of Iowa – Posted by jmichaelson
    Iowa City
    Iowa, United States
    15 Dec 2014
  5. Full-Time
    Statistical Programmer for work in life sciences
    Quantics Consulting Ltd – Posted by Quantics
    Edinburgh
    Scotland, United Kingdom
    15 Dec 2014
  6. Full-Time
    Statistician (Graduate) for statistical consultancy business in life sciences
    Quantics Consulting Ltd – Posted by Quantics
    Edinburgh
    Scotland, United Kingdom
    15 Dec 2014
  7. Full-Time
    Business Intelligence Analyst
    Alvanon – Posted by Alvanon
    New York
    New York, United States
    10 Dec 2014
  8. Full-Time
    NSF Postdoctoral Chemometrics Fellow
    eLutions Integrated Systems, Inc. – Posted by eLutions
    San Francisco
    California, United States
    5 Dec 2014
  9. Freelance
    R Senior Developer for a Bay area startup
    GRANDSLA – Posted by stefano.spada
    Anywhere
    4 Dec 2014
  10. Full-Time
    Clinical Data R Wrangler for clinical studies data on Type 1 Diabetes
    University of Southern California, Center for Applied Molecular Medicine – Posted by Naim Matasci
    Los Angeles
    California, United States
    4 Dec 2014

Source:: R News

The Mandelbrot Set in R

By Myles Harrison

(This article was first published on everyday analytics, and kindly contributed to R-bloggers)

Introduction

I was having a conversation with someone I know about weather forecasts the other day and it went something like this:
“Yes, their forecasts are really not very good. They really need to work on improving them.”
“Well, yes, I think they’re okay, considering what they’re trying to do is impossible.”
The thought that a relatively simple set of equations can give rise to infinitely complex behavior is something which has intrigue and appeal beyond the mathematically minded and academic. Fractals became a part of pop culture, and the idea of chaos is not unknown to those outside mathematical research, as it was popularized by the term “the butterfly effect”.
The Mandelbrot set itself has a remarkably simple definition:
1. Start with z_0 = 0.
2. Iterate z_(n+1) = (z_n)^2 + c, where c is a complex number; the values of c fill the complex plane.
3. The Mandelbrot set is the set of c for which z remains bounded for all n.
And mathematically it can be shown that if, under iteration, |z| ever exceeds 2, then c is not in the set.
In practice one keeps track of the number of iterations it takes z to diverge for a given c, and then colours the points according to their “speed of divergence”.

Execution

In R this is straightforward code. Pick some parameters about the space (range and resolution for x and y, number of iterations, etc.) and then iterate. Naively, the code could be executed as below:
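The original listing is missing from this copy of the post, so here is my own minimal sketch of the naive approach, assuming a 500 x 500 grid over [-2, 1] x [-1.5, 1.5] and a cap of 100 iterations:

nmax <- 100                                  # maximum number of iterations
xs <- seq(-2, 1, length.out=500)             # real axis
ys <- seq(-1.5, 1.5, length.out=500)         # imaginary axis
iters <- matrix(0, nrow=length(xs), ncol=length(ys))

for (i in seq_along(xs)) {
  for (j in seq_along(ys)) {
    c0 <- complex(real=xs[i], imaginary=ys[j])
    z <- 0+0i
    for (k in 1:nmax) {
      z <- z^2 + c0
      if (Mod(z) > 2) break                  # diverged
    }
    iters[i, j] <- k                         # iterations until divergence (or nmax)
  }
}
image(xs, ys, iters, col=rainbow(nmax))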
However, if you know anything about writing good code in R, you know that for loops are bad, and it’s better to rely upon the highly optimized underpinnings of the language by using array-based functions like which and lapply. So a more efficient version of the same code is below:
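Again, the original listing is missing; below is a sketch of a vectorized version under the same assumptions, iterating the whole grid at once and using which() to update only the points that have not yet diverged:

nmax <- 100
xs <- seq(-2, 1, length.out=500)
ys <- seq(-1.5, 1.5, length.out=500)
cgrid <- outer(xs, ys*1i, "+")               # matrix of c values over the grid
z <- matrix(0+0i, length(xs), length(ys))
iters <- matrix(0, length(xs), length(ys))
alive <- rep(TRUE, length(cgrid))            # points that have not diverged yet

for (k in 1:nmax) {
  idx <- which(alive)
  z[idx] <- z[idx]^2 + cgrid[idx]
  diverged <- idx[Mod(z[idx]) > 2]
  iters[diverged] <- k
  alive[diverged] <- FALSE
}
iters[alive] <- nmax                         # still bounded after nmax iterations
image(xs, ys, iters, col=rainbow(nmax))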

We can then plunk these into functions where the different criteria for the rendered fractal are parameters:
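As a sketch (again my own, with parameter names that are assumptions rather than the author's), the vectorized version wrapped into a function with the grid limits, resolution and iteration cap as parameters:

mandelbrot <- function(xlim=c(-2, 1), ylim=c(-1.5, 1.5), res=500, nmax=100){
  xs <- seq(xlim[1], xlim[2], length.out=res)
  ys <- seq(ylim[1], ylim[2], length.out=res)
  cgrid <- outer(xs, ys*1i, "+")
  z <- matrix(0+0i, res, res)
  iters <- matrix(0, res, res)
  alive <- rep(TRUE, length(cgrid))
  for (k in 1:nmax) {
    idx <- which(alive)
    z[idx] <- z[idx]^2 + cgrid[idx]
    diverged <- idx[Mod(z[idx]) > 2]
    iters[diverged] <- k
    alive[diverged] <- FALSE
  }
  iters[alive] <- nmax
  image(xs, ys, iters, col=rainbow(nmax))
  invisible(iters)
}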
Let’s compare the runtime between the two shall we? For the base settings, how does the naive version compare to the vectorized one?
> compare_runtimes()
user system elapsed
37.713 0.064 37.773
user system elapsed
0.482 0.094 0.576
>
The results speak for themselves: a ~65x decrease in runtime by using R indexing functions instead of for loops! This definitely speaks to the importance of writing optimized code taking advantage of R’s vectorized functions.
You can tweak the different parameters for the function to return different parts of the complex space. Below are some pretty example plots of what the script can do with different parameters:

Conclusion

What does all this have to do with my conversation about the weather? Why, everything, of course! It’s where chaos theory came from. Enjoy the pretty pictures. It was fun getting R to churn out some nice fractals and good to take a trip down memory lane.

References and Resources

The Mandelbrot Set:
code on github:

To leave a comment for the author, please follow the link and comment on his blog: everyday analytics.

Source:: R News