Bio7 2.0 for Linux Released!

By » R

[Screenshot: Bio7 2.0 running on Linux]

(This article was first published on » R, and kindly contributed to R-bloggers)

01.02.2015

I released the Linux version of Bio7 2.0 (see screenshot below).

For an overview of the new Bio7 features follow the link: Bio7 2.0 new features

Added Linux features:

  • A new Java menu action (in the Scripts menu) is available to start a Py4J server for communication between Java and CPython (if enabled in the native preferences of Bio7 and if Py4J is installed in the local Python distribution). With this feature it is, e.g., possible to transfer ImageJ values to CPython or to call the Bio7 Java API.
  • Improved the embedded native pseudo terminal to send signals to a process tree (e.g., SIGINT = Ctrl+C).
  • Added an option in the native preferences to interpret a Python script with a native CPython interpreter (version >= 3.0).

Installation:

Download Bio7 and simply unzip the Bio7 archive file in your preferred location. Bio7 comes bundled with a JRE so you don’t need to install Java separately. The Bio7 application was tested on Ubuntu 14.04 (minimum requirement).

R features:

For the Linux version of Bio7 2.0, R and Rserve have to be installed. For use with Bio7, Rserve has to be compiled with a special flag that enables the cooperative mode (see below).

In a shell simply execute:

sudo PKG_CPPFLAGS=-DCOOPERATIVE R CMD INSTALL Rserve_1.8-1.tar.gz

The command will compile and install the Rserve package in your default Linux R application. This is necessary to share the R workspace when you switch from a local Rserve connection to the native Bio7 R console (see this video for an explanation of the new connection mode in Bio7).
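
As a quick sanity check (my own suggestion, not part of the official instructions), you can verify that the freshly compiled Rserve package loads and starts from a plain R session before launching Bio7:

library(Rserve)
# start a local Rserve instance (default port 6311) that Bio7 can connect to;
# "--no-save" is needed because Rserve spawns a non-interactive R session
Rserve(args = "--no-save")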

Please note that the default R package location of Bio7 is: /usr/lib/R/site-library

The location (and the path to R) can be changed in the R preferences of Bio7 (e.g., menu: R->Preferences).

To leave a comment for the author, please follow the link and comment on his blog: » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

Rediscovering Formula One Race Battlemaps

By Tony Hirst


(This article was first published on OUseful.Info, the blog… » Rstats, and kindly contributed to R-bloggers)

A couple of days ago, I posted a recipe on the F1DataJunkie blog that described how to calculate track position from laptime data.

Using that information, as well as additional derived columns such as the identity of, and time to, the cars immediately ahead of and behind a particular selected driver, both in terms of track position and race position, I revisited a chart type I first started exploring several years ago – race battle charts.

The main idea behind the battlemaps is that they can help us search for stories amidst the runners.

[sourcecode language='R']
library(ggplot2)  # lapTimes is assumed to come from the track-position recipe linked above

dirattr=function(attr,dir='ahead') paste(attr,dir,sep='')

#We shall find it convenient later on to split out the initial data selection
battlemap_df_driverCode=function(driverCode){
  lapTimes[lapTimes['code']==driverCode,]
}

battlemap_core_chart=function(df,g,dir='ahead'){
  car_X=dirattr('car_',dir)
  code_X=dirattr('code_',dir)
  factor_X=paste('factor(position_',dir,'<position)',sep='')
  code_race_X=dirattr('code_race',dir)
  if (dir=='ahead') diff_X='diff' else diff_X='chasediff'

  if (dir=='ahead') drs=1000 else drs=-1000
  g=g+geom_hline(aes_string(yintercept=drs),linetype=5,col='grey')

  #Plot the offlap cars that aren't directly being raced
  g=g+geom_text(data=df[df[dirattr('code_',dir)]!=df[dirattr('code_race',dir)],],
                aes_string(x='lap',
                           y=car_X,
                           label=code_X,
                           col=factor_X),
                angle=45,size=2)
  #Plot the cars being raced directly
  g=g+geom_text(data=df,
                aes_string(x='lap',
                           y=diff_X,
                           label=code_race_X),
                angle=45,size=2)
  g=g+scale_color_discrete(labels=c("Behind","Ahead"))
  g+guides(col=guide_legend(title="Intervening car"))
}

battle_WEB=battlemap_df_driverCode("WEB")
g=battlemap_core_chart(battle_WEB,ggplot(),'ahead')
battlemap_core_chart(battle_WEB,g,dir='behind')
[/sourcecode]

In this first sketch, from the 2012 Australian Grand Prix, I show the battlemap for Mark Webber:

We see how at the start of the race Webber kept pace with Alonso, albeit around about a second behind, at the same time as he drew away from Massa. In the last third of the race, he was closely battling with Hamilton whilst drawing away from Alonso. Coloured labels are used to highlight cars on a different lap (either ahead (aqua) or behind (orange)) that are in a track position between the selected driver and the car one place ahead or behind in terms of race position (the black labels). The y-axis is the time delta in milliseconds between the selected car and cars ahead (y > 0) or behind (y < 0). A dashed line at the +/- one second mark identifies cars within DRS range.

As well as charting the battles in the vicinity of a particular driver, we can also chart the battle in the context of a particular race position. We can reuse the chart elements and simply need to redefine the filtered dataset we are charting.

For example, if we filter the dataset to just get the data for the car in third position at the end of each lap, we can then generate a battle map of this data.
battlemap_df_position=function(position){
  lapTimes[lapTimes['position']==position,]
}

battleForThird=battlemap_df_position(3)

g=battlemap_core_chart(battleForThird,ggplot(),dir='behind')+xlab(NULL)+theme_bw()
g=battlemap_core_chart(battleForThird,g,'ahead')+guides(col=FALSE)
g

For more details, see the original version of the battlemap chapter. For updates to the chapter, I recommend that you invest in a copy of the Wrangling F1 Data With R book if you haven’t already done so :-)

To leave a comment for the author, please follow the link and comment on his blog: OUseful.Info, the blog… » Rstats.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

Carl Boettiger on accessing online data with ROpenSci

By Noam Ross

(This article was first published on Noam Ross – R, and kindly contributed to R-bloggers)

Today, Carl Boettiger gave a tutorial on the ROpenSci project and how to use its many packages to connect to online data repositories to retrieve and deposit data. Here’s our screencast of the talk.
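
(For readers who have not used the rOpenSci packages, here is a tiny hedged sketch of the kind of call they wrap. The rgbif package, the occ_search() call and the species name are my own illustrative choices, not material from Carl's talk.)

# Hypothetical example: pull species occurrence records from GBIF via rgbif
library(rgbif)
bears <- occ_search(scientificName = "Ursus americanus", limit = 50)
head(bears$data)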

All the code from this talk is available at this github repository, and you can download it as a *.zip file here. Thanks to Carl for a great session and to the ROpenSci team for their work!

To leave a comment for the author, please follow the link and comment on his blog: Noam Ross – R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

Comparing Flexible and Elastic Asset Allocation

By Ilya Kipnis

(This article was first published on QuantStrat TradeR » R, and kindly contributed to R-bloggers)

So recently, I tried to combine Flexible and Elastic Asset Allocation. The operative word being: tried. Essentially, I saw Flexible Asset Allocation as an incomplete algorithm: although it was an excellent method for selecting securities, there had to be a better way to weight stocks than a naive equal-weight scheme.

It turns out, the methods outlined in Elastic Asset Allocation weren’t doing the trick (that is, a four-month cumulative return raised to the return weight, multiplied by the correlation to a daily-rebalanced equal-weight index of the selected securities with cumulative return greater than zero). Either I managed a marginally higher return at the cost of much higher volatility and protracted drawdowns, or I maintained my Sharpe ratio at the cost of much lower returns. Thus, I scrapped all of it, which was a shame, as I was hoping to combine the two methodologies into one system that would extend the research I previously blogged about. Instead, after scrapping it, I decided to have a look at why I was running into the issues I was.
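
For readers who have not seen EAA, here is a back-of-the-envelope sketch of the kind of generalized-momentum score described above. This is not the EAA() function from IKTrading; the function name, arguments and defaults are made up, and I have written the correlation term as (1 - correlation) so that assets less correlated with the equal-weight index score higher, which is how the score is usually formulated.

# Hypothetical sketch of a generalized-momentum score, NOT IKTrading::EAA().
# cumRet4m:   four-month cumulative return of an asset
# corToIndex: correlation of the asset to a daily-rebalanced equal-weight index
#             of the eligible assets (those with cumulative return > 0)
eaaStyleScore <- function(cumRet4m, corToIndex, wR = 1, wC = 1) {
  # a low correlation to the index raises the score
  (cumRet4m ^ wR) * ((1 - corToIndex) ^ wC)
}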

In any case, here’s the quick demo I did.

require(quantmod)
require(PerformanceAnalytics)
require(IKTrading)

symbols <- c("VTSMX", "FDIVX", "VEIEX", "VBMFX", "VFISX", "VGSIX", "QRAAX")

getSymbols(symbols, from="1990-01-01")
prices <- list()
for(i in 1:length(symbols)) {
  prices[[i]] <- Ad(get(symbols[i]))  
}
prices <- do.call(cbind, prices)
colnames(prices) <- gsub(".[A-z]*", "", colnames(prices))
ep <- endpoints(prices, "months")
adPrices <- prices
prices <- prices[ep,]
prices <- prices["1997-03::"]
adPrices <- adPrices["1997-04::"]

eaaOffensive <- EAA(monthlyPrices = prices, returnWeights = TRUE, cashAsset = "VBMFX", bestN = 3)
eaaOffNoCrash <- EAA(monthlyPrices = prices, returnWeights = TRUE, cashAsset ="VBMFX", 
                     bestN = 3, enableCrashProtection = FALSE)
faa <- FAA(prices = adPrices, riskFreeName = "VBMFX", bestN = 3, returnWeights = TRUE, stepCorRank = TRUE)
faaNoStepwise <- FAA(prices = adPrices, riskFreeName = "VBMFX", bestN = 3, returnWeights = TRUE, stepCorRank = FALSE)

# turn the monthly weights into daily portfolio returns
eaaOffDaily <- Return.portfolio(R = Return.calculate(adPrices), weights = eaaOffensive[[1]])
eaaOffNoCrash <- Return.portfolio(R = Return.calculate(adPrices), weights = eaaOffNoCrash[[1]])
charts.PerformanceSummary(cbind(faa[[2]], eaaOffDaily))

comparison <- cbind(eaaOffDaily, eaaOffNoCrash, faa[[2]], faaNoStepwise[[2]])
colnames(comparison) <- c("Offensive EAA", "Offensive EAA (no crash protection)", "FAA (stepwise)", "FAA (no stepwise)")
charts.PerformanceSummary(comparison)

rbind(table.AnnualizedReturns(comparison), maxDrawdown(comparison))

Essentially, I compared FAA with and without the stepwise correlation rank algorithm, and the offensive EAA with and without crash protection. The results were disappointing.

Here are the equity curves:

In short, the best default FAA variant handily outperforms any of the EAA variants.

And here are the statistics:

                          Offensive EAA Offensive EAA (no crash protection) FAA (stepwise) FAA (no stepwise)
Annualized Return             0.1247000                           0.1305000      0.1380000          0.131400
Annualized Std Dev            0.1225000                           0.1446000      0.0967000          0.089500
Annualized Sharpe (Rf=0%)     1.0187000                           0.9021000      1.4271000          1.467900
Worst Drawdown                0.1581859                           0.2696754      0.1376124          0.130865

Note of warning: if you run EAA, it seems unwise to do so without crash protection (that is, decreasing your stake in everything but the cash/risk-free asset in proportion to the number of negative-return securities). I didn’t include the defensive variant of EAA since it gives markedly lower returns.

Not that this should discredit EAA, but on the whole, I feel that there should be a way to beat the (usually) equal-weight scheme that FAA employs (sometimes the cash asset gets a larger weight because a negative-momentum asset makes it into the top assets by virtue of the rank of its volatility and correlation, and ends up getting zeroed out), and that treating FAA as an asset selection mechanism rather than a weighting mechanism may yield some value. However, I have not yet found it myself.

Thanks for reading.

NOTE: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn

To leave a comment for the author, please follow the link and comment on his blog: QuantStrat TradeR » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

Reproducibility with Revolution R Open and the checkpoint package

By David Smith

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Thanks to everyone at the Chicago R User Group for giving me such a warm welcome for my presentation last night. In my talk, I gave an introduction to Revolution R Open, with a focus on how the checkpoint package makes sharing R code in a reproducible way easy:

If you’d like to try out the checkpoint package, it’s available on CRAN now. You can also install a newer, more efficient version from GitHub as follows:

library(devtools)
install_github("RevolutionAnalytics/checkpoint")

You can’t see my demo in the slides, but I ran this script in RStudio and demonstrated what happens when you adjust the checkpoint date. You can learn more about checkpoint and the Reproducible R Toolkit here.
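
If you want to try it on one of your own projects, the basic pattern is just two lines at the top of the script (a minimal sketch; the snapshot date below is an arbitrary example):

library(checkpoint)
# scan the project for required packages and install/load the versions that
# were on CRAN on the given snapshot date
checkpoint("2015-01-15")
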
To leave a comment for the author, please follow the link and comment on his blog: Revolutions.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

littler 0.2.2

By Thinking inside the box


(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

A new minor release of littler is available now.

Several examples were added or extended:

  • a new script check.r to check a source tarball with R CMD check after loading required packages first (and a good use case was given in the recent UBSAN testing with Rocker post);
  • a new script to launch Shiny apps via runApp();
  • a new feature to install.r and install2.r whereby source tarballs are recognized and installed directly;
  • new options to install.r and install2.r to set repos and lib location.

See the littler examples page for more details.

Another useful change is that r now reads either one (or both) of /etc/littler.r and ~/.littler.r. These are interpreted as standard R files, allowing users to provide initialization, package loading and more.
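
For example, a hypothetical ~/.littler.r might look like this (the contents are purely illustrative, not taken from the release):

## ~/.littler.r -- read by r at startup and evaluated as ordinary R code
options(repos = c(CRAN = "https://cran.rstudio.com"))
suppressMessages(library(methods))  # pre-load packages you always want available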

Carl Boettiger and I continue to make good use of these littler examples (to install directly from CRAN or GitHub, or to run checks) in our Rocker builds of R for Docker.

Full details for the littler release are provided as usual at the ChangeLog page.

The code is available via the GitHub repo, from tarballs off my littler page and the local directory here. A fresh package has gone to the incoming queue at Debian; Michael Rutter will probably have new Ubuntu binaries at CRAN in a few days too.

Comments and suggestions are welcome via the mailing list or issue tracker at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

To leave a comment for the author, please follow the link and comment on his blog: Thinking inside the box .

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

How e Can Help You To Find The Love Of Your Life

By aschinchon


(This article was first published on Ripples, and kindly contributed to R-bloggers)

Match.com will bring more love to the planet than anything since Jesus Christ (Gary Kremen, founder of Match.com)

Charlie is a brilliant 39-year-old mathematician living in Massachusetts. He lives alone and has dedicated his whole life to study. Lately he has realized that theorems and books no longer satisfy him. Charlie has realized that he needs to find love.

To find the love of his life, Charlie joined Match.com and plans to have a date with a different girl every week for a year (52 dates in total). Charlie has a method to rate each girl according to her sympathy, her physical appearance, her punctuality, her conversation and her hobbies. This method allows him to compare girls with each other. Charlie wants to pick the best of the 52 girls according to their score, but there is a problem: he needs to be quick if he wants to woo a girl. If he takes too long to call back after the first date, she may be in the arms of another man, so Charlie has to decide immediately whether the girl is the love of his life. His plan is as follows: he will start having some dates only to assess the candidates, without declaring his love to any of them. After this, he will try to win over the first girl who scores better than all of the initial candidates.

But how many girls should he discard to maximize the probability of choosing the best one? If he discards just one, the probability of a later date being better than her is very high. But will that later girl be the best of all 52? Probably not. On the other hand, if he discards many girls, it becomes very likely that the best candidate is discarded as well.

Charlie ran a simulation in R of the 52 dates to approximate the probability of choosing the best girl as a function of the number of discarded girls. He found that this probability is maximized by discarding the first 19 girls, as can be seen in the following graph:

Why 19? This is one of those rare places where the number e can be found. You can see an explanation of the phenomenon here.
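
For the record, the standard secretary-problem calculation behind this result (a well-known derivation, not part of Charlie's simulation): if the first k of n candidates are discarded and you then choose the first candidate who beats all of them, the probability of ending up with the very best one is

P(k) = (k/n) * [1/k + 1/(k+1) + … + 1/(n-1)] ≈ (k/n) * ln(n/k),

which is maximized at k ≈ n/e. With n = 52 this gives 52/e ≈ 19.1, hence the 19 discards.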

require(extrafont)
require(ggplot2)
n=52
sims=1000
results=data.frame(discards=numeric(0), triumphs=numeric(0))
for(i in 0:n)
{
  triumphs=0
  for (j in 1:sims) {
    # random arrival order of the n girls; the best girl is the one scored n
    opt=sample(1:n, n, replace=FALSE)
    if (max(opt[1:i])==n) triumphs=triumphs+0  # best girl was among the i discarded
    else {
      # otherwise, pick the first remaining girl better than all discarded ones
      candidates=opt[(i+1):n]
      triumphs=triumphs+(candidates[min(which(candidates > max(opt[1:i])))]==n)
    }
  }
  results=rbind(results, data.frame(discards=i, triumphs=triumphs/sims))
}
opts=theme(
  panel.background = element_rect(fill="darkolivegreen1"),
  panel.border = element_rect(colour="black", fill=NA),
  axis.line = element_line(size = 0.5, colour = "black"),
  axis.ticks = element_line(colour="black"),
  panel.grid.major = element_line(colour="white", linetype = 1),
  panel.grid.minor = element_blank(),
  axis.text.y = element_text(colour="black", size=20),
  axis.text.x = element_text(colour="black", size=20),
  text = element_text(size=25, family="xkcd"),
  legend.key = element_blank(),
  legend.background = element_blank(),
  plot.title = element_text(size = 40))
ggplot(results, aes(discards, triumphs))+
  geom_vline(xintercept = n/exp(1), size = 1, linetype=2, colour = "black", alpha=0.8)+
  geom_line(color="green4", size=1.5)+
  geom_point(color="gray92", size=8, pch=16)+
  geom_point(color="green4", size=6, pch=16)+
  ggtitle("How e can help you to find the love of your life")+
  xlab("Discards") +
  ylab("Prob. of finding the love of your life")+
  scale_x_continuous(breaks=seq(0, n, by = 2))+opts

To leave a comment for the author, please follow the link and comment on his blog: Ripples.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

The time series of JGB Interest Rate(10Y) over the last one year

By teramonagi

(This article was first published on My Life as a Mock Quant in English, and kindly contributed to R-bloggers)

Worldwide quantitative easing does not seem to be ending. I live and work in Japan, which has the lowest interest rate.

Thanks to Quandl and RStudio, I can easily get the interest-rate data and visualize it with R and the dygraphs package!

The following document, published on RPubs, shows how to download the interest-rate data from Quandl and visualize it with R as an interactive JavaScript graph (dygraphs).
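
In outline, the workflow looks roughly like this (a hedged sketch: the Quandl dataset code below is my guess at the 10-year JGB series and should be checked on Quandl before use):

library(Quandl)
library(xts)
library(dygraphs)

# download the 10Y JGB yield as an xts series (dataset code is an assumption)
jgb10y <- Quandl("MOFJ/INTEREST_RATE_JAPAN_10Y", type = "xts")

# keep the last year of data and render an interactive dygraphs chart
dygraph(last(jgb10y, "1 year"), main = "JGB interest rate (10Y), last year") %>%
  dyRangeSelector()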

You can also download the entire content of the above document, including the R code, from the following link:

To leave a comment for the author, please follow the link and comment on his blog: My Life as a Mock Quant in English.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

An R User Group Mini Conference

By Joseph Rickert

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

by Joseph Rickert

With a membership that spans the entire Bay Area and the ability to hold meetings from San Jose to San Francisco, the Bay Area useR Group is in the very fortunate position of being able to attract world-class R speakers on a regular basis. Meetings that feature multiple speakers often attract 80 to 100 or more attendees and generate the excitement and feel of a “mini-conference”. Our recent January meeting set a new benchmark for this kind of event. Not only did we have three excellent speakers: Ryan Hafen, Hadley Wickham and Nick Elprin, but through blind luck there turned out to be some considerable synergy among the talks. Moreover, through the generosity of Mozilla, our host for the meeting, we have a video of the event to share.

Each speaker was allocated about 20 minutes to present. It is extremely challenging to give an informative, technical talk in this limited amount of time. However, as you will see, even though they express dismay at being pressured by the clock, all three speakers do a superb job. From the audience’s point of view, though, I think 20 minutes is just about right. An intense twenty-minute talk holds the audience’s attention and is enough time for a speaker to convey the main points and present sufficient detail for motivated people to follow up on their own.

In the video below, the first speaker, Ryan Hafen presents an overview of the Tessera project supported by Purdue University, the Pacific Northwest National Laboratory, and Mozilla. Ryan first explains Bill Cleveland’s Divide and Recombine strategy and describes the underlying architecture (a key value store that supports the use of Map Reduce or Spark to implement the divide and recombine steps for large data sets). He then goes on to explain the use of Tukey style cognostics (metrics for each panel in an array of plots that indicate how important it is to look at a particular panel) and shows examples of interactive trellis visualizations using the latest ggvis technology.

Hadley Wickham, the second speaker, presents “Pure, predictable, pipeable: creating fluent interfaces with R”. Hadley begins by making the case for his magrittr pipe function, %>%, and then goes on to argue that adhering to the three principles of writing “pure”, “predictable” and “pipeable” functions greatly facilitates the goal of solving complex problems by combining simple pieces. Always an engaging and entertaining speaker, Hadley is at the top of his game here. No one comes better prepared for a talk, and few speakers can match Hadley’s passionate delivery and wry sense of humor. (Hadley’s examples of inconsistent R expressions are very amusing.)
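
(If you have not seen magrittr before, the whole idea fits in one line; this trivial illustration is mine, not an example from the talk.)

library(magrittr)
# read left to right: take the vector, take square roots, then sum the results
c(1, 4, 9, 16) %>% sqrt() %>% sum()  # 1 + 2 + 3 + 4 = 10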

Last in the line up, and not at all fazed about following Hadley, Nick Elprin presents a tutorial on parallelizing machine learning algorithms in R. He begins by emphasizing that he is targeting applications where the data will fit into the memory of a robust machine, and presents a strategy for parallelizing code that emphasizes understanding your code well enough so that you can employ parallelism where it will do the most good, and taking care to parallelize tasks to match your resources. After spinning up a server using the Domino platform that allows running R code from an IPython notebook (see Nick’s recent post on R Notebooks for details), Nick runs through a series of thoughtful examples emphasizing different aspects of his parallelization principles. The following example contrasts the execution time of a simple Random Forests model with a parallel version built around a foreach loop.

# random forests
wine <- read.csv("winequality-red.csv", sep=';', header = TRUE)
head(wine)
y_dat = wine$quality
x_dat <- wine[,1:11]


# sequential baseline: one forest of 500 trees on a single core
library(randomForest)
num_trees = 500
system.time({
  randomForest(y = y_dat, x = x_dat, ntree = num_trees)
})


# parallel version: grow a share of the trees on each core and combine the forests
library(foreach)
library(doParallel)
numCores <- detectCores()  # set the number of worker cores (not defined in the original snippet)
registerDoParallel(numCores)

trees_per_core = floor(num_trees / numCores)
system.time({
  wine_model <- foreach(trees=rep(trees_per_core, numCores), .combine=combine, .multicombine=TRUE) %dopar% {
    randomForest(y = y_dat, x = x_dat, ntree = trees)
  }
})
The video is well worth watching all the way through. (Well, maybe you should skip the first 20 seconds.) Here are some additional resources for all three presentations.

To leave a comment for the author, please follow the link and comment on his blog: Revolutions.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

What does a Bayes factor feel like?

By FelixS


(This article was first published on Nicebread » R, and kindly contributed to R-bloggers)

A Bayes factor (BF) is a statistical index that quantifies the evidence for a hypothesis, compared to an alternative hypothesis (for introductions to Bayes factors, see here, here or here).

Although the BF is a continuous measure of evidence, humans love verbal labels, categories, and benchmarks. Labels give interpretations of the objective index – and that is both the good and the bad thing about labels. The good thing is that these labels can facilitate communication (but see @richardmorey), and people simply crave verbal interpretations to guide their understanding of those “boring” raw numbers.

The bad thing about labels is that an interpretation should always be context dependent (for example, "30 min." can be either a long time (a train delay) or a short time (a concert), as @CaAl said). But once a categorical system has been established, it is no longer context dependent.

These labels can also be a dangerous tool, as they implicitly introduce cutoff values ("Hey, the BF jumped over the boundary of 3. It's not anecdotal any more, it's moderate evidence!"). But we do not want another sacred .05 criterion! (See also Andrew Gelman's blog post and its critical comments.) The strength of the BF is precisely its non-binary nature.

Several labels for paraphrasing the size of a BF have been suggested. The most common system seems to be the suggestion of Harold Jeffreys (1961):

Table 1: Evidence categories for Bayes factors, following Jeffreys (1961).

Bayes factor BF_{10}   Label
> 100                  Extreme evidence for H1
30 – 100               Very strong evidence for H1
10 – 30                Strong evidence for H1
3 – 10                 Moderate evidence for H1
1 – 3                  Anecdotal evidence for H1
1                      No evidence
1/3 – 1                Anecdotal evidence for H0
1/3 – 1/10             Moderate evidence for H0
1/10 – 1/30            Strong evidence for H0
1/30 – 1/100           Very strong evidence for H0
< 1/100                Extreme evidence for H0

Note: The original label for 3 < BF < 10 was “substantial evidence”. Lee and Wagenmakers (2013) changed it to “moderate”, as “substantial” already sounds too decisive. “Anecdotal” formerly was known as “Barely worth mentioning”.

Kass and Raftery suggested a comparable classification, except that their “strong evidence” category starts at BF > 20 (see also the Wikipedia entry).

Getting a feeling for Bayes factors

How much is a BF_{10} of 3.7? It indicates that the data are 3.7 times more likely under H_1 than under H_0, given the priors assumed in the model. Is that a lot of evidence for H_1? Or not?

Following Table 1, it can be labeled “moderate evidence” for an effect – whatever that means.

Some have argued that strong evidence, such as a BF > 10, is quite evident from eyeballing alone:

“If your result needs a statistician then you should design a better experiment.” (attributed to Ernest Rutherford)

If you have to search for the statistically significant, then it’s not. #statistics #ddj #dataviz

— Edward Tufte (@EdwardTufte) January 13, 2015

Is that really the case? Can we just “see” it when there is an effect?

Let’s approach the topic a bit more experientially. What does such a BF look like, visually? We take the good old urn model as a first example.

Visualizing Bayes factors for proportions

Imagine the following scenario: When I give a present to my two boys (4 and 6 years old), it is not so important what it is. The most important thing is: “Is it fair?”. (And my boys are very sensitive detectors of unfairness).

Imagine you have bags with red and blue marbles. Obviously, the blue marbles are much better, so it is key to make sure that in each bag there is an equal number of red and blue marbles. Hence, for our familial harmony I should check whether reds and blues are distributed evenly or not. In statistical terms: H_0: p = 0.5, H_1: p != 0.5.

When drawing samples from the bags, the strongest evidence for an even distribution (H_0) is obtained when exactly the same number of red and blue marbles has been drawn. How much evidence for H_0 is it when I draw n=2, 1 red/1 blue? The answer is in Figure 1, upper table, first row: the BF_{10} is 0.86 in favor of H_1, or equivalently a BF_{01} of 1.16 in favor of H_0 – i.e., anecdotal evidence for an equal distribution.

You can get these values easily with the famous BayesFactor package for R:

[cc lang="rsplus" escaped="true"]
proportionBF(y=1, N=2, p=0.5)
[/cc]

What if I had drawn two reds instead? Then the BF would be 1.14 in favor of H_1 (see Figure 1, lower table, row 1).

[cc lang="rsplus" escaped="true"]
proportionBF(y=2, N=2, p=0.5)
[/cc]

Obviously, with small sample sizes it’s not possible to generate strong evidence, neither for H_0 nor for H_1. You need a minimal sample size to leave the region of “anecdotal evidence”. Figure 1 shows some examples how the BF gets more extreme with increasing sample size.
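
You can reproduce the flavor of Figure 1 yourself with proportionBF(); the particular counts below are arbitrary examples of mine, and the exact BF values depend on the package's default prior:

library(BayesFactor)
proportionBF(y = 10, N = 20, p = 0.5)  # an even split with a larger sample
proportionBF(y = 20, N = 20, p = 0.5)  # a maximally uneven sample of the same size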

Figure 1.

These visualizations indeed seem to indicate that for simple designs such as the urn model you do not really need a statistical test if your BF is > 10. You can just see it from looking at the data (although the “obviousness” is more pronounced for large BFs in small sample sizes).

Maximal and minimal Bayes factors for a certain sample size

The dotted lines in Figure 2 show the maximal and the minimal BF that can be obtained for a given number of drawn marbles. The minimum BF is obtained when the sample is maximally consistent with H_0 (i.e. when exactly the same number of red and blue marbles has been drawn), the maximal BF is obtained when only marbles from one color are drawn.


Figure 2: Maximal and minimal BF for a certain sample size.

Figure 2 highlights two features:

  • If you have few data points, you cannot have strong evidence, neither for H_1 nor for H_0.
  • It is much easier to get strong evidence for H_1 than for H_0. This property depends somewhat on the choice of the prior distribution of H_1 effect sizes. If you expect very strong effects under the H_1, it is easier to get evidence for H_0. But still, with every reasonable prior distribution, it is easier to gather evidence for H_1.

Get a feeling yourself!

Here’s a shiny widget that lets you draw marbles from the urn. Monitor how the BF evolves as you sequentially add marbles to your sample!

[Open app in separate window]

Teaching sequential sampling and Bayes factors


When I teach sequential sampling and Bayes factors, I bring an actual bag with marbles (or candies of two colors).

In my typical setup I ask some volunteers to test whether the bag contains the same amount of both colors. (The bag of course has a cover so that they cannot see the marbles.) They may sample as many marbles as they want, but each marble costs them 10 cents (i.e., an efficiency criterion: sample as much as necessary, but not too much!). They should think aloud about when they have a first hunch, and when they are relatively sure about the presence or absence of an effect. I use a color mixture of 2:1 – in my experience this gives a good chance to detect the difference, but it’s not too obvious (some teams stop sampling and conclude “no difference”).

This exercise typically reveals the following insights (hopefully!):

  • By intuition, humans sample sequentially. When the evidence is not strong enough, more data is sampled, until they are sure enough about the (un)fairness of the distribution.
  • Intuitively, nobody does a fixed-n design with an a priori power analysis.
  • Often, they stop quite soon, in the range of “anecdotal evidence”. It’s also my own impression: BFs that are still in the “anecdotal” range already look quite conclusive for everyday hypothesis testing (e.g., a 2 vs. 9 distribution; BF_{10} = 2.7). This might change, however, if in the scenario a wrong decision is associated with higher costs. Next time, I will try a scenario of prescription drugs which have potentially severe side effects.

The “interocular traumatic test”

The analysis so far seems to support the “interocular traumatic test”: “when the data are so compelling that conclusion hits you straight between the eyes” (attributed to Joseph Berkson; quoted from Wagenmakers, Verhagen, & Ly, 2014).

But the authors go on and quote Edwards et al. (1963, p. 217), who said: “…the enthusiast’s interocular trauma may be the skeptic’s random error. A little arithmetic to verify the extent of the trauma can yield great peace of mind for little cost.”.

In the next visualization we will see that large Bayes factors are not always obvious.

Visualizing Bayes factors for group differences

What happens if we switch to group differences? European women have an average self-reported height of 165.8 cm, European men of 177.9 cm – a difference of 12.1 cm; the pooled standard deviation is around 7 cm. (Source: European Community Household Panel; see Garcia, J., & Quintana-Domeque, C., 2007; based on ~50,000 participants born between 1970 and 1980). This translates to a Cohen’s d of 1.72.

Unfortunately, this source only contains self-reported heights, which can be subject to biases (males over-report their height on average). But it was the only source I found which also contains the standard deviations within sex. However, Meyer et al (2001) report a similar effect size of d = 1.8 for objectively measured heights.
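
If you want to play with numbers like these yourself, here is a rough sketch using simulated heights (the group size and seed are my own arbitrary choices and will not reproduce the exact Bayes factor reported below):

library(BayesFactor)
set.seed(42)
men   <- rnorm(10, mean = 177.9, sd = 7)  # means and SD taken from the text above
women <- rnorm(10, mean = 165.8, sd = 7)
ttestBF(x = men, y = women)  # default two-sample Bayes factor for the difference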

Now look at this plot. Would you say the blue lines are obviously higher than the red ones?


I couldn’t say for sure. But the BF_{10} is 14.54 – “strong” evidence!

If we sort the lines by height the effect is more visible:


… and alternatively, we can plot the distributions of males’ and females’ heights:

Again, you can play around with the interactive app:

[Open app in separate window]

Can we get a feeling for Bayes factors?

To summarize: Whether a strong evidence “hits you between the eyes” depends on many things – the kind of test, the kind of visualization, the sample size. Sometimes a BF of 2.5 seems obvious, and sometimes it is hard to spot a BF>100 by eyeballing only. Overall, I’m glad that we have a numeric measure of strength of evidence and do not have to rely on eyeballing only.

Try it yourself – draw some marbles in the interactive app, or change the height difference between males and females, and calibrate your personal gut feeling with the resulting Bayes factor!

To leave a comment for the author, please follow the link and comment on his blog: Nicebread » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News