RcppArmadillo 0.6.200.2.0

By Thinking inside the box

armadillo image

(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

Yet another monthly upstream Armadillo update gets us the first changes to the new 6.* series. It was preceded by two GitHub-only test releases, both of which were tested against all reverse dependencies as usual. A matching upload to Debian will follow shortly.

Armadillo is a powerful and expressive C++ template library for linear algebra, aiming towards a good balance between speed and ease of use, with a syntax deliberately close to Matlab.

This release is fairly straightforward with few changes:

Changes in RcppArmadillo version 0.6.200.2.0 (2015-10-31)

  • Upgraded to Armadillo 6.200.0 (“Midnight Blue Deluxe”)

    • expanded diagmat() to handle non-square matrices and arbitrary diagonals

    • expanded trace() to handle non-square matrices
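
Both expanded functions can be exercised from R via inline C++. Here is a minimal sketch, assuming Rcpp and RcppArmadillo are installed (the wrapper name traceNonSquare is mine, for illustration):

library(Rcpp)
cppFunction(depends = "RcppArmadillo", '
  double traceNonSquare(const arma::mat& X) {
    // as of Armadillo 6.200.0, trace() accepts non-square matrices,
    // summing the min(n_rows, n_cols) main-diagonal elements
    return arma::trace(X);
  }
')
traceNonSquare(matrix(as.numeric(1:6), nrow = 2))
## [1] 5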

Courtesy of CRANberries, there is also a diffstat report for the most recent CRAN release. As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box.

Source:: R News

Call for Help: Front-End Javascript/.NET Web App Developer

By Isaac Petersen

(This article was first published on Fantasy Football Analytics » R | Fantasy Football Analytics, and kindly contributed to R-bloggers)

Dear Fantasy Football Analytics Community,

Two years ago, we released web apps to help people make better decisions in fantasy football based on the wisdom of the crowd. Over the past two years, the community response has been incredibly supportive, and we continually improved the apps in response to user feedback. The community also contributed directly to the project, with a number of users making additions and edits to our public source R scripts on our GitHub repo. In sum, we provide free web apps built by the people, for the people.

This brings me to our call for help. Our lead front-end web developer can no longer commit time to the project, and we are looking for a replacement: a front-end web developer to help develop our web apps (apps.fantasyfootballanalytics.net) using R/OpenCPU on an Azure server running an ASP.NET MVC web application. The apps also use HTML/CSS/JavaScript (Bootstrap, RequireJS, KnockoutJS). Some knowledge of American football and fantasy football would also be preferable.

Crucial skills:

  • Javascript
  • .NET

Nice-to-have skills:

  • Knockout
  • Bootstrap
  • Azure
  • Linux
  • Design/UI

Bonus skills:

  • OpenCPU
  • R

If interested, please email the following to Isaac (isaac AT fantasyfootballanalytics DOT net):

  1. resume/CV
  2. brief description of relevant skills
  3. how much time you expect to be able to contribute

Sincerely,
Isaac Petersen

The post Call for Help: Front-End Javascript/.NET Web App Developer appeared first on Fantasy Football Analytics.

To leave a comment for the author, please follow the link and comment on their blog: Fantasy Football Analytics » R | Fantasy Football Analytics.

Source:: R News

Don’t use stats::aggregate()

By John Mount

(This article was first published on Win-Vector Blog » R, and kindly contributed to R-bloggers)

When working with an analysis system (such as R) there are usually good reasons to prefer using functions from the “base” system over using functions from extension packages. However, base functions are sometimes locked into unfortunate design compromises that can now be avoided. In R’s case I would say: do not use stats::aggregate().

Read on for our example.

For our example we create a data frame. The issue is this: I am working in the Pacific time zone on Saturday, October 31st, 2015, and the time data I want to work with is in an Asian time zone.

print(date())
## [1] "Sat Oct 31 08:14:38 2015"
d <- data.frame(group='x',
 time=as.POSIXct(strptime('2006/10/01 09:00:00',
   format='%Y/%m/%d %H:%M:%S',
   tz="Etc/GMT+8"),tz="Etc/GMT+8"))  # I'd like to say UTC+8 or CST
print(d)
##   group                time
## 1     x 2006-10-01 09:00:00
print(d$time)
## [1] "2006-10-01 09:00:00 GMT+8"
str(d$time)
##  POSIXct[1:1], format: "2006-10-01 09:00:00"
print(unclass(d$time))
## [1] 1159722000
## attr(,"tzone")
## [1] "Etc/GMT+8"

Suppose I try to aggregate the data to find the earliest time for each group. I have a problem: aggregate() loses the time zone and gives a bad answer.

d2 <- aggregate(time~group,data=d,FUN=min)
print(d2)
##   group                time
## 1     x 2006-10-01 10:00:00
print(d2$time)
## [1] "2006-10-01 10:00:00 PDT"

This is bad. Our time has lost its time zone and changed from 09:00:00 to 10:00:00. This violates John M. Chambers’ “Prime Directive” that:

computations can be understood and trusted.

Software for Data Analysis, John M. Chambers, Springer 2008, page 3.

The issue is that a POSIXct time is essentially a numeric array carrying its time zone around as an attribute. Much base R code has problems when there are extra attributes on a numeric array, so R code in this style tends to drop attributes when it can. It is odd that the class() is kept (class is itself stored attribute-style) while the time zone is lost, but R is full of hand-specified corner cases.
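
To see the attribute-dropping in isolation (my own illustration, not code from the original post): in the R of the day even base c() stripped the tzone attribute, so the value was re-displayed in the local zone (newer versions of R keep a common tzone).

t1 <- as.POSIXct("2006-10-01 09:00:00", tz = "Etc/GMT+8")
attr(t1, "tzone")
## [1] "Etc/GMT+8"
attr(c(t1), "tzone")  # c() has silently stripped the time zone
## NULL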

dplyr gets the right answer.

library('dplyr')
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
by_group = group_by(d,group)
d3 <- summarize(by_group,min(time))
print(d3)
## Source: local data frame [1 x 2]
## 
##   group           min(time)
## 1     x 2006-10-01 09:00:00
print(d3[[2]])
## [1] "2006-10-01 09:00:00 GMT+8"

And plyr also works.

library('plyr')
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## 
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
d4 <- ddply(d,.(group),summarize,time=min(time))
print(d4)
##   group                time
## 1     x 2006-10-01 09:00:00
print(d4$time)
## [1] "2006-10-01 09:00:00 GMT+8"

To leave a comment for the author, please follow the link and comment on their blog: Win-Vector Blog » R.

Source:: R News

Why bother with magrittr

By civilstat

(This article was first published on Civil Statistician » R, and kindly contributed to R-bloggers)

I’ve seen R users swooning over the magrittr package for a while now, but I couldn’t make heads or tails of all these scary %>% symbols. Finally I had time for a closer look, and it seems potentially handy indeed. Here’s the idea and a simple toy example.

So, it can be confusing and messy to write (and read) functions from the inside out. This is especially true when functions take multiple arguments. Instead, magrittr lets you write (and read) functions from left to right.

Say you need to compute the LogSumExp function, log(sum(exp(x))), and you’d like your code to specify the logarithm base explicitly.

In base R, you might write
log(sum(exp(MyData)), exp(1))
But this is a bit of a mess to read. It takes a lot of parentheses-matching to see that the exp(1) is an argument to log and not to one of the other functions.

Instead, with magrittr, you program from left to right:
MyData %>% exp %>% sum %>% log(exp(1))
The pipe operator %>% takes the output from its left-hand side and uses it as the first argument of the function on its right. Now it’s very clear that the exp(1) is an argument to log.
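
For completeness, here is a self-contained version of the toy example (the MyData values are made up for illustration):

library(magrittr)
MyData <- c(0.5, 1.0, 1.5)
log(sum(exp(MyData)), exp(1))             # nested form
MyData %>% exp %>% sum %>% log(exp(1))    # piped form, same result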

There’s a lot more you can do with magrittr, but code with fewer nested parentheses is already a good selling point for me.

Apart from cleaning up your nested functions, this approach to programming might be helpful if you write a lot of JavaScript code, for example if you make D3.js visualizations. R’s magrittr pipe is similar in spirit to JavaScript’s method chaining, so it might make context-switching a little easier.

To leave a comment for the author, please follow the link and comment on their blog: Civil Statistician » R.

Source:: R News

The Traveling Vampire Problem

By Francis Smart

(This article was first published on Econometrics by Simulation, and kindly contributed to R-bloggers)
Let’s say you are a vampire and you would like to figure out the shortest route to visit the supple necks of N maidens. But there is only so much time in any night!

You can fly from location to location, ignoring barriers.

With a few maidens, the problem is trivial.

However, as you entice more and more maidens you find the task of route management increasingly complex.

You buy a computer but find that using a blanket search algorithm to check all possible routes quickly becomes very time consuming as each additional maiden is added to the optimization.

The problem, you realize, is that each additional maiden increases the number of routes dramatically: the number of routes equals the number of permutations of N maidens, which is N! (a brute-force sketch follows the counts below).

Four maidens, an easy problem.

So the number of routes:
1 maiden: 1=1
2 maidens: 1*2=2 (for example 1,2 or 2,1)
3 maidens: 1*2*3=6 (for example 1,2,3 or 1,3,2 or 2,1,3 or 2,3,1 or 3,2,1 or 3,1,2)
4 maidens: 1*2*3*4=24
5 maidens: 1*2*3*4*5=120
6 maidens: 1*2*3*4*5*6=720
7 maidens: 1*2*3*4*5*6*7=5,040
8 maidens: 1*2*3*4*5*6*7*8=40,320
9 maidens: 1*2*3*4*5*6*7*8*9=362,880
10 maidens: 1*2*3*4*5*6*7*8*9*10=3,628,800
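
As promised, here is a brute-force sketch of such a blanket search (my own illustration, not the author's optimizer; it assumes the combinat package and measures straight-line flight distance):

library(combinat)                  # for permn()
route_length <- function(route, xy) {
  pts <- xy[route, , drop = FALSE]
  sum(sqrt(rowSums(diff(pts)^2)))  # total length of the straight-line flights
}
set.seed(1)
xy <- matrix(runif(8), ncol = 2)   # four maidens on the unit square
routes <- permn(nrow(xy))          # all 4! = 24 orderings
best <- routes[[which.min(sapply(routes, route_length, xy = xy))]]

Swap the four points for ten and you get the 10! = 3,628,800 routes counted above, which is where the hour-plus runtimes come from.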

As you start getting more and more maidens, your algorithm for selecting the best route becomes extremely slow. You realize that using R you are going to face a practical limitation of spending as much time running the optimization as you do actually sucking necks. You know of Julia (which can run up to 500x faster than R), but you quickly realize that this only postpones the problem. Even running 500 times faster, Julia would handle two more maidens about four times faster than your current run (11*12/500 ≈ 0.26) but three more maidens over three times slower (11*12*13/500 ≈ 3.4).

Seven Maidens. Getting a bit more tricky.

You consider hibernating for a hundred years to see if increases in computational speed will simplify the problem, but you realize that as long as you approach the problem with a binary computer and the same strategies as before, you will face similar computational limits. Eventually, even very far into the future, you will run out of computer speed long before you run out of maidens.

Being a clever vamp, you decide to start looking into alternative strategies to solving this kind of problem. But that is for another day.

——————————–

For what it is worth, I wrote a traveling-vamp optimizer allowing an arbitrary number of dimensions to be specified. The most complex problem it solved was a 10-maiden problem, which took a little over an hour.

Two solutions for a 10 maiden problem. Top is shortest route while bottom is longest.

Find the code here.

To leave a comment for the author, please follow the link and comment on their blog: Econometrics by Simulation.

Source:: R News

Demo: R in SQL Server 2016

By David Smith

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

At the PASS Summit in Seattle this week, Microsoft’s Jason Wilcox and Gopi Kumar demonstrated a SQL Server 2016 application that embeds R to predict what time you need to leave to catch a flight, given traffic, check-in time, and the likelihood of a flight leaving early or being delayed.

The underlying predictive model was created with the RevoScaleR package in Microsoft R Services running in SQL Server 2016, and the prediction and the histogram shown in the app were generated by R called in real time directly from SQL.

If you’d like to try R in SQL Server 2016 yourself, Henk Vandervalk provides a step-by-step guide to installing the components. And for more information about how the various components connect with each other, check out this article in InfoWorld.

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

Source:: R News

Curious about big data in Montreal?

By Murtaza Haider

(This article was first published on eKonometrics, and kindly contributed to R-bloggers)

Are you in Montreal and curious about big data? Here is your chance to attend a session on exactly that at Concordia University on Tuesday, November 3 at 6:00 pm.

www.BigDataUniversity.com, an IBM-led initiative, is running meetups across North America to create awareness about, and training in, big data analytics.

BigDataUniversity runs MOOCs and, through its online data scientist workbench, provides access to Python, R, and even Spark. You can also learn about Watson Analytics and see how you can work with the state of the art in computing.

Further details are available at:

Getting started with Data Science and Introduction to Watson Analytics

http://www.meetup.com/YUL-Social-Mobile-Analytics-Cloud-Meetup/

When: Tuesday, November 3rd at 6-9 PM

Where: H1269, 12th floor of the Hall Bldg
(1455, blvd. De Maisonneuve ouest – Metro Guy-Concordia)

To leave a comment for the author, please follow the link and comment on their blog: eKonometrics.

Source:: R News

Machine Learning with R

By Cory Lesmeister

(This article was first published on Fear and Loathing in Data Science, and kindly contributed to R-bloggers)

My effort on machine learning with R is now available.

To leave a comment for the author, please follow the link and comment on their blog: Fear and Loathing in Data Science.

Source:: R News

Next Meetup of the Kasseler useR Group on November 11, 2015: Data analysis with R

By eoda GmbH

Data analysis with R is the topic of the next meetup of the Kasseler useR Group

(This article was first published on eoda, R und Datenanalyse » eoda english R news, and kindly contributed to R-bloggers)

The next meeting, at 6:30 pm on November 11 at the Science Park Kassel, will revolve around the subject of “data analysis with R”. Experienced R users will present the topics “cluster analysis” and “Hidden Markov models”.

The lecture on cluster analysis by Andreas Wygrabek will deal with different algorithms and the classification procedure. Jens Bruno Wittek will present the implementation of Hidden Markov models in R and show practical examples of their application.

The useR Group is looking forward to many participants and additional lectures on the topic “data analysis with R”.

Please sign up here if you would like to join: http://www.meetup.com/Kassel-useR-Group/

Review of the last meetup

Motivated by two interesting lectures, the R users exchanged their experiences with R at the October meetup.

On the occasion of the 7000th R package, Martin Schneider looked at the development of R and the status quo of the language. For the first time in the history of the Kasseler useR Group there was a lecture in English: Dr. Paul Marrow spoke about his work with R and the advantages of using the language professionally. The slides of both presentations are available on the meetup website of the Kasseler useR Group (http://www.meetup.com/de/Kassel-useR-Group/files/).

To leave a comment for the author, please follow the link and comment on their blog: eoda, R und Datenanalyse » eoda english R news.

Source:: R News

littler 0.3.0 — on CRAN !!

By Thinking inside the box

max-heap image

(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

A new major release of littler is now available. And for the first time in the nine years since Jeff started the effort in 2006 (which I joined not long after), we are now a CRAN package.

This required a rewrite of the build system, foregoing the calls to aclocal, autoheader and automake and leaving just a simpler autoconf layer that creates a configure script and a simple src/Makevars.in. R provides a robust and well-understood build system (which I have gotten to know reasonably well given all my R packages), and being on CRAN and leveraging its mechanism for installation and upgrades is clearly worth the change.

There may be a moment or two of transition. While we can create a binary in an R package, we cannot (generally) copy it to /usr/bin or /usr/local/bin as part of the build process (for lack of write rights to those directories). So if you do not have r in the $PATH and load the package, it makes a suggestion (shown here with a line break added):

R> library(littler)
The littler package provides 'r' as a binary.
You could link to the 'r' binary installed in
'/usr/local/lib/R/site-library/littler/bin/r' from
'/usr/local/bin' in order to use 'r' for scripting.
R> 

Similarly, you could copy (or softlink) r to ~/bin if that is in your $PATH.
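
The linking can even be scripted from R (a sketch, assuming ~/bin exists and is in your $PATH; the package path follows the suggestion above and will differ per system):

bin <- system.file("bin", "r", package = "littler")
file.symlink(bin, file.path(Sys.getenv("HOME"), "bin", "r"))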

The Debian (and Ubuntu) packages will continue to provide /usr/bin/r as before. Note that these packages will now be called r-cran-littler to match all other CRAN packages.

The NEWS file entry is below.

Changes in littler version 0.3.0 (2015-10-29)

  • Changes in build system

    • First CRAN Release as R package following nine years of source releases

    • Script configure, src/Makevars.in and remainder of build system rewritten to take advantage of the R package build infrastructure

    • Reproducible builds are better supported as the (changing) compilation timestamps etc are only inserted for ‘verbose builds’ directly off the git repo, but not for Debian (or CRAN) builds off the release tarballs

  • Changes in littler functionality

    • Also source $R_HOME/etc/Rprofile.site and ~/.Rprofile if present

  • Changes in littler documentation

    • Added new vignette with examples

Full details for the littler release are provided as usual at the ChangeLog page.

The code is available via the GitHub repo, from tarballs off my littler page and the local directory here — and now of course also from its CRAN page and via install.packages("littler"). A fresh package has gone to the incoming queue at Debian, where it will sit for a few days as the binary package was renamed from littler to r-cran-littler to match all other CRAN packages. Michael Rutter will probably have new Ubuntu binaries at CRAN once the source package gets into Debian proper.

Comments and suggestions are welcome at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box.

Source:: R News