The R Consortium has funded half a million dollars to R projects

By David Smith

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

The R Consortium passed a significant milestone this month: since its inception, the non-profit body has provided more than US$500,000 in grant funding to projects proposed by the R Community. The R Consortium uses the dues from its member organizations to fund grant proposals, which are reviewed twice a year by its Infrastructure Steering Committee. (If you’d like to propose a project, proposals for the next round are being accepted through April 1.)

New projects funded in this round include:

  • Creating a new data type for R to unify the “units” and “errors” packages
  • Updating the R module for the Simplified Wrapper and Interface Generator (SWIG) to support modern R programming practices like reference classes
  • Providing an API and test framework to underlie the “future” package
  • A package to process spatiotemporal data held on servers with a dplyr-like syntax

With these new grants, the R Consortium has funded 21 projects in total, from R packages to community events to developer tools. You can read more about the new projects in the announcement linked below.

R Consortium: Announcing the second round of ISC Funded Projects for 2017

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

2018-03 Putting the Macron in Māori: Accented text in R Graphics

By pmur002

(This article was first published on R – Stat Tech, and kindly contributed to R-bloggers)

This report describes different methods for correctly rendering macrons in Māori text within R plots. The topics covered will also have relevance to rendering other special characters in R graphics and possibly to rendering macrons in other software.
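As a minimal sketch of one common approach (an assumption here, not necessarily one of the methods the report recommends), the macron can be written as a Unicode escape and drawn on a Cairo-based device, which generally handles UTF-8 text. The label, the temporary file and the device choice are illustrative.

```r
# Illustrative label: "ā" written as the Unicode escape U+0101
label <- "M\u0101ori"

# Assumption: a Cairo-based PDF device is used because it usually renders
# UTF-8 text directly; the plot is only drawn if this R build supports Cairo
if (capabilities("cairo")) {
    out <- tempfile(fileext = ".pdf")
    grDevices::cairo_pdf(out, width = 4, height = 3)
    plot.new()
    text(0.5, 0.5, label, cex = 2)
    invisible(dev.off())
}
```

Whether the macron actually appears still depends on the font in use having a glyph for it.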

Paul Murrell

To leave a comment for the author, please follow the link and comment on their blog: R – Stat Tech.


#17: Dependencies.

By Thinking inside the box

(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

Dependencies are invitations for other people to break your package.
— Josh Ulrich, private communication

Welcome to the seventeenth post in the relentlessly random R ravings series of posts, or R4 for short.

Dependencies. A truly loaded topic.

As R users, we are spoiled. Early in the history of R, Kurt Hornik and Friedrich Leisch built support for packages right into R, and started the Comprehensive R Archive Network (CRAN). And R and CRAN have had a fantastic run with it. Roughly twenty years later, we are looking at over 12,000 packages which can (generally) be installed with absolute ease and no surprises. No other (relevant) open source language has anything of comparable rigour and quality. This is a big deal.

And coding practices evolved and changed to play to this advantage. Packages are a near-unanimous recommendation, use of the install.packages() and update.packages() tooling is nearly universal, and most R users learned to their advantage to group code into interdependent packages. Obvious advantages are versioning and snap-shotting, attached documentation in the form of help pages and vignettes, unit testing, and of course continuous integration as a side effect of the package build system.

But the notion of ‘oh, let me just build another package and add it to the pool of packages’ can get carried away. A recent example I had was the work on the prrd package for parallel recursive dependency testing — coincidentally, created entirely to allow for easier voluntary tests I do on reverse dependencies for the packages I maintain. It uses a job queue for which I relied on the liteq package by Gabor which does the job: enqueue jobs, and reliably dequeue them (also in a parallel fashion) and more. It looks light enough:

R> AP <- available.packages()   # CRAN package database, used as 'db' below
R> tools::package_dependencies(package="liteq", recursive=FALSE, db=AP)$liteq
[1] "assertthat" "DBI"        "rappdirs"   "RSQLite"   
R> 

Two dependencies because it uses an internal SQLite database, one for internal tooling and one for configuration.

All good then? Not so fast. The devil here is the very innocuous and versatile RSQLite package because when we look at fully recursive dependencies all hell breaks loose:

R> tools::package_dependencies(package="liteq", recursive=TRUE, db=AP)$liteq
 [1] "assertthat" "DBI"        "rappdirs"   "RSQLite"    "tools"     
 [6] "methods"    "bit64"      "blob"       "memoise"    "pkgconfig" 
[11] "Rcpp"       "BH"         "plogr"      "bit"        "utils"     
[16] "stats"      "tibble"     "digest"     "cli"        "crayon"    
[21] "pillar"     "rlang"      "grDevices"  "utf8"      
R>
R> tools::package_dependencies(package="RSQLite", recursive=TRUE, db=AP)$RSQLite
 [1] "bit64"      "blob"       "DBI"        "memoise"    "methods"   
 [6] "pkgconfig"  "Rcpp"       "BH"         "plogr"      "bit"       
[11] "utils"      "stats"      "tibble"     "digest"     "cli"       
[16] "crayon"     "pillar"     "rlang"      "assertthat" "grDevices" 
[21] "utf8"       "tools"     
R> 

Now we went from four to twenty-four, due to the twenty-two dependencies pulled in by RSQLite.
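The jump from direct to recursive dependencies can be checked for any package. A minimal sketch follows; dep_counts() is a hypothetical helper (not part of tools), and the database defaults to the locally installed packages so that it runs offline. Passing db = available.packages() would query CRAN as in the calls above.

```r
# Hypothetical helper: count direct and fully recursive dependencies.
# Assumption: `db` is a package database matrix such as the result of
# installed.packages() or available.packages().
dep_counts <- function(pkg, db = installed.packages()) {
    direct    <- tools::package_dependencies(pkg, db = db, recursive = FALSE)[[pkg]]
    recursive <- tools::package_dependencies(pkg, db = db, recursive = TRUE)[[pkg]]
    c(direct = length(direct), recursive = length(recursive))
}

# Example with a base package that is always installed
dep_counts("stats")
```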

There, my dear friend, lies madness. The moment one of these packages breaks we get potential side effects. And this is no laughing matter. Here is a tweet from Kieran posted days before a book deadline of his when he was forced to roll a CRAN package back because it broke his entire setup. (The original tweet has by now been deleted; why people do that to their entire tweet histories is something I fail to comprehend too; in any case the screenshot is from a private discussion I had with a few like-minded folks over Slack.)

That illustrates the quote by Josh at the top. As I too have “production code” (well, CRANberries for one relies on it), I was interested to see if we could easily amend RSQLite. And yes, we can. A quick fork and a few commits later, we have something we could call ‘RSQLighter’ as it reduces the dependencies quite a bit:

R> IP <- installed.packages()   # using my installed mod'ed version
R> tools::package_dependencies(package="RSQLite", recursive=TRUE, db=IP)$RSQLite
 [1] "bit64"     "DBI"       "methods"   "Rcpp"      "BH"        "bit"      
 [7] "utils"     "stats"     "grDevices" "graphics" 
R>

That is less than half. I have not proceeded with the fork because I do not believe in needlessly splitting codebases. But this could be a viable candidate for an alternate or shadow repository with more minimal and hence more robust dependencies. Or, as Josh calls it, the tinyverse.
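To see exactly which packages fall away, the two dependency lists printed above can be compared directly. A small sketch, with both vectors copied from the output shown earlier (the variable names are illustrative):

```r
# Recursive dependencies of the CRAN version of RSQLite, as listed above
full <- c("bit64", "blob", "DBI", "memoise", "methods", "pkgconfig", "Rcpp",
          "BH", "plogr", "bit", "utils", "stats", "tibble", "digest", "cli",
          "crayon", "pillar", "rlang", "assertthat", "grDevices", "utf8", "tools")

# Recursive dependencies of the modified version, as listed above
light <- c("bit64", "DBI", "methods", "Rcpp", "BH", "bit", "utils", "stats",
           "grDevices", "graphics")

dropped <- setdiff(full, light)   # packages no longer pulled in
added   <- setdiff(light, full)   # packages only the modified version pulls in

length(dropped)
added
```

With these lists, thirteen packages are dropped and only the base package graphics is new.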

Another maddening aspect of dependencies is the ruthless application of what we could jokingly call Metcalfe’s Law: the likelihood of breakage does of course increase with the number of edges in the dependency graph. A nice illustration is this post by Jenny trying to rationalize why one of the 87 (as of today) tidyverse packages now has the state “ORPHANED” at CRAN:

An invitation for other people to break your code. Well put indeed. Or to put rocks up your path.
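One way to make this intuition concrete is a back-of-the-envelope model. It rests on a strong simplifying assumption that is not from the post: each dependency breaks independently, with the same small probability p over some period. The chance that at least one of n dependencies breaks is then 1 - (1 - p)^n. The value p = 0.01 is purely illustrative.

```r
# Probability that at least one of n dependencies breaks, assuming
# independent breakage with identical probability p (illustrative assumption)
p_any_break <- function(n, p) 1 - (1 - p)^n

# Four direct dependencies versus twenty-four recursive ones
p_any_break(c(4, 24), p = 0.01)
```

Under these assumptions the risk rises from roughly 4% with four dependencies to roughly 21% with twenty-four.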

But things are not all that dire. Most folks appear to understand the issue, and some even do something about it. The DBI and RMySQL packages have saner strict dependencies; maybe one day things will improve for RMariaDB and RSQLite too:

R> tools::package_dependencies(package=c("DBI", "RMySQL", "RMariaDB"), recursive=TRUE, db=AP)
$DBI
[1] "methods"

$RMySQL
[1] "DBI"     "methods"

$RMariaDB
 [1] "bit64"     "DBI"       "hms"       "methods"   "Rcpp"      "BH"       
 [7] "plogr"     "bit"       "utils"     "stats"     "pkgconfig" "rlang"    

R> 

And to be clear, I do not believe in giving up and using everything via docker, or virtualenvs, or packrat, or … A well-honed dependency system is wonderful and the right resource to get code deployed and updated. But it requires buy-in from everyone involved, and an understanding of the possible trade-offs. I think we can, and will, do better going forward.

Or else, there will always be the tinyverse.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box .


Using Stata and R together

By Robert

(This article was first published on R – Dataviz – Stats – Bayes, and kindly contributed to R-bloggers)

I’m running a one-day course on this topic with Timberlake, Stata’s UK distributors. We’ll run it next Friday, the 9th of March, and again later in the year (10 Aug, 6 Dec), each time at Cass Business School in the City of London. If you use one of Stata or R, and have at least had a quick look at the other, then this course is for you. I won’t introduce the software from scratch (installation etc.), but I assume you are comfortable working in one of them. We’ll use RStudio as an IDE but you can apply what you learn in any R GUI, or none.

The learning outcomes are:

  • understand the differences between a functional language and an imperative language
  • know several strengths and weaknesses of both Stata and R
  • be able to include chunks of R code inside a Stata do file and have them run from Stata
  • be able to include chunks of Stata code inside an R script and have them run from R
  • understand the limitations of passing data back and forth between Stata and R, and how to spot problems

If you want to know more, you can email me or get in touch on Twitter. If you want to book a place or ask about practicalities, travel etc, check out the Timberlake page.

To leave a comment for the author, please follow the link and comment on their blog: R – Dataviz – Stats – Bayes.


Introducing The Calibre De Cartier Chronograph: An Impressive New Interior Movement

By casualinference

(This article was first published on Cartier Replica Watches Sale, and kindly contributed to R-bloggers)

The big news from Cartier for 2013 is its first in-house chronograph movement, the 1904-CH MC, introduced in the Calibre de Cartier Chronograph. The self-winding, vertical-clutch, column-wheel movement is built on Cartier’s own base movement. We got our hands on it today and it impressed me in a few ways.

Vertical-clutch, column-wheel chronographs are known for their smooth and crisp action, and the Calibre is no exception. Start, stop and reset need just the right amount of pressure and respond with a satisfying “click” that makes the chronograph a pleasure to use. Cartier uses a linear “zeroing” hammer to ensure instantaneous reset of all hands regardless of the pressure on the reset pusher. For self-winding, the rotor runs on ceramic ball bearings and uses a click system instead of the traditional reverser, ensuring more efficient two-way winding. Power reserve is a fairly common 48 hours.

Visually, the watch carries bold, sporty Cartier styling cues, with a mix of brushed and polished surfaces, sword-shaped hands and the familiar oversized Roman numerals. The doubled, engraved railroad-track borders are well balanced and restrained, and the pushers are integrated into the crown guard. The chronograph will be offered in steel and in rose gold, on a three-link bracelet or a strap.

The watch retails for $10,800 in steel on a leather strap, and it enters an ever-growing field of in-house chronographs that runs from the Rolex Daytona to Breitling’s B01-based models and IWC’s calibre 89365. Given Cartier’s reputation and the quality of this watch, its competitors will certainly pay close attention to it. So will we.

To leave a comment for the author, please follow the link and comment on their blog: Cartier Replica Watches Sale.


Using R to Reason & Test Theory: A Case Study from the Field of Reading Education

By tylerrinker

(This article was first published on R – TRinker’s R Blog, and kindly contributed to R-bloggers)

This past week I was preparing slides for a reading assessment class with a lecture focus on the Visual Word Form Area [VWFA] (Cohen, et al., 2000). This is an area of the brain that is hypothesized to be able to see words (plus morphemes and likely smaller chunks) as shapes, as picture forms and that may have a connecting link between the visual and language portions of the brain.

In a sense it allows a proficient reader to see words and know them in the same way that we see people’s faces and we know them (if we’ve encountered them before). Essentially, phonics is useful, particularly at certain points in our reading development but is rather inefficient and not the work horse of a proficient reader’s reading process. Instant word recognition is required for fluency and comprehension. For additional information on the reading process see the video below.

Cohen, L., Dehaene, S., Naccache, L., Lehéricy, S., Dehaene-Lambertz, G., Hénaff, M. A., Michel, F. (2000). The visual word form area: Spatial and temporal characterization of an initial stage of reading in normal subjects and posterior split-brain patients. Brain. 123(2). 291–307. doi:10.1093/brain/123.2.291. ISSN 0006-8950. PMID 10648437

As I prepared for class I wanted to demonstrate two points about the VWFA to students:

  1. General shape of words is an attribute used by the VWFA
  2. The first and last letter are very important to instant word recognition; the exact ordering of the individual letters (graphemes) of the middle portions of words is less important

The first can be evidenced by altering the case of letters within words and seeing that it does indeed slow down reading rate. The second can be demonstrated by randomly reordering the inner letters within a word. The amount of case changing or reordering of letters are parameters that can be changed and can slow down reading rate in varied ways. What better way to demonstrate this than using R to reason and programmatically allow the testing of theory? The two sections below show R code that tests the (1) altering case theory of the VWFA and (2) the lowered importance of the ordering of the middle letters of words.

First you’ll need to install my textshape package to get started:

if (!require("pacman")) install.packages("pacman"); library(pacman)
pacman::p_load(textshape)

Altering Case Effects

Mayall, Humphreys & Olson (1997) show that letter case randomization can disrupt the ability to process words. If true, this is evidence that the VWFA (if it exists) uses a general shape attribute for word recognition since mixing case alters shape not letters.

Mayall, K., Humphreys, G. W., & Olson, A. (1997). Disruption to word or letter processing?: The origins of case-mixing effects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23(5), 1275-1286. 10.1037/0278-7393.23.5.1275

The R script below is a function that takes text and randomly replaces a set proportion of lower case letters with upper case. In the script below I show some text that has 2%, 10%, and 50% (worst case; no pun intended) of lower case letters randomly replaced with upper case letters. The reader can informally see that indeed the letters are the same but the picture quality seen by the brain reduces the ability to process the words. Secondly, 2% change is less disruptive than 50%. This is evidence that there is a VWFA and one of the attributes it uses is word shape.

#' Randomly Change the Case of Letters Within Words
#' 
#' Following Mayall, Humphreys, & Olson (1997), this function randomly 
#' converts a proportion of lower case letters to upper case.
#' 
#' @param x A vector of text strings to upper case.
#' @param prop A proportion of graphemes to change the case of.
#' @param wrap An integer value of how wide to wrap the strings.  Using the default 
#' \code{NULL} disables this feature.
#' @param \ldots ignored.
#' @return Prints wrapped lines with internal graphemes randomly converted to 
#' upper case.
#' @references 
#' Mayall, K., Humphreys, G. W., & Olson, A. (1997). Disruption to word or letter 
#' processing?: The origins of case-mixing effects. Journal of Experimental Psychology: 
#' Learning, Memory, and Cognition, 23(5), 1275-1286. 10.1037/0278-7393.23.5.1275
#' @export
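random_upper <- function(x, prop = .5, wrap = NULL, ...) {

    ## NOTE: base R sketch of the behaviour documented above; the default for
    ## prop is an assumption and this is not necessarily the author's exact code
    stopifnot(prop > 0 & prop <= 1)

    out <- vapply(x, function(txt) {
        chars <- strsplit(txt, "")[[1]]
        lower <- which(chars %in% letters)               # lower case positions
        ups   <- lower[sample.int(length(lower), round(prop * length(lower)))]
        chars[ups] <- toupper(chars[ups])
        paste(chars, collapse = "")
    }, character(1), USE.NAMES = FALSE)

    if (!is.null(wrap)) {
        out <- vapply(out, function(s) paste(strwrap(s, wrap), collapse = "\n"),
            character(1), USE.NAMES = FALSE)
    }
    cat(paste(out, collapse = "\n\n"), "\n", sep = "")
    invisible(out)
}

x <- "Many English words are formed by taking basic words and adding combinations of prefixes and suffixes to them."

## 2% random upper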
random_upper(x, .02, 60)

## MAny English words Are formed by taking basic words and
## adding combinations of prefixes and suffixes to them.

## 10% random upper
random_upper(x, .1, 60)

## Many English words are forMed by taking basic worDs aNd
## adding Combinations of prEFixes and SuffIxeS to them.

## 50% random upper
random_upper(x, .5, 60)

## MaNy EnglIsH WORDs ARE FoRmED BY taking BAsic WOrDs And
## AddiNg CombinATIoNS OF pRefIXEs AnD sUffIXES to thEM.

Transposing Internal Letters

Another interesting phenomenon is the transposing of letters within the middle of words. This was popularized as an Internet meme & hoax about research at Cambridge University:

While the claim the meme makes about the research and Cambridge wasn’t true, obviously, there is an element of truth to the inner word transpose effect noted by researchers in the 70s and 80s (e.g., McCusker, Gough, & Bias, 1981). Indeed the reader can still understand the message but there is a cognitive cost to scrambling letters (Rayner, White, Johnson, & Liversedge, 2006).

McCusker, L. X., Gough, P. B., & Bias, R. G. (1981). Word recognition inside out and outside in. Journal Of Experimental Psychology: Human Perception And Performance, 7(3), 538-551. doi:10.1037/0096-1523.7.3.538

Rayner, K., White, S. J., Johnson, R. L., & Liversedge, S. P. (2006). Raeding wrods with jubmled lettres: There is a cost. Psychological Science, 17(3), 192-193. 10.1111/j.1467-9280.2006.01684.x

The R code below allows the user to group the inner portion of words as character ngrams, reorder within these grams, and optionally reorder the position of the reordered ngram groups. Both the size of the ngrams and the reordering of ngram group position are parameters of the effect that we can alter and informally observe via our self reported effects in our ability to read the strings after altering various parameters. The larger the ngram unit the more the inner portion of words will be scrambled.

The sample.grams parameter allows us to see the effect of keeping scrambled ngram groups in their original position or not. Indeed the longer words are, and the more thorough the remix, the bigger the cost of the letter transposition. When commonly (or expected) co-occurring ngrams are located randomly (far away from each other) this also may contribute to the cost of the scrambling effect. In the final code chunk I allow the first and/or the last letter of words to be scrambled as well. This is evidence that the VWFA is keyed in on the first and last letters and that certain letters are expected to be close to one another.

#' Transpose Internal Letters Within Words
#' 
#' Following a famous Internet meme and Rayner, White, Johnson, & Liversedge 
#' (2006), this function randomly scrambles the internal (not the first or last
#' letter of > 3 character words) letters.
#' 
#' @details Internet meme:
#' 
#' It deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt 
#' tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be 
#' a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the 
#' huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.
#'
#' @param x A vector of text strings to scramble.
#' @param gram.length The length of gram groups to scramble.  Setting this lower
#' will keep expected graphemes close together.  Setting it to a high value (e.g.,
#' 100) will allow the positions of graphemes to deviate farther from the expected
#' clustering.
#' @param sample.grams logical.  If \code{TRUE} then the ngram groups don't retain
#' their original location.  For example, let's say we had the sequence \code{123456}. 
#' Sampling grams of length 3 (\code{gram.length}) may produce \code{231564}. Setting 
#' \code{sample.grams = TRUE} may further produce \code{564231}.
#' @param wrap An integer value of how wide to wrap the strings.  Using the default 
#' \code{NULL} disables this feature.
#' @param remix.first logical.  If \code{TRUE} the first letter is allowed to be 
#' remixed as well.
#' @param remix.last logical.  If \code{TRUE} the last letter is allowed to be 
#' remixed as well.
#' @param \ldots ignored.
#' @return Prints wrapped lines with internal graphemes scrambled.
#' @references 
#' Rayner, K., White, S. J., Johnson, R. L., & Liversedge, S. P. (2006). Raeding 
#' wrods with jubmled lettres: There is a cost. Psychological Science, 17(3), 
#' 192-193. 10.1111/j.1467-9280.2006.01684.x
#' @export
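random_scramble <- function(x, gram.length = 2, sample.grams = TRUE, wrap = NULL,
    remix.first = FALSE, remix.last = FALSE, ...) {

    ## NOTE: base R sketch of the behaviour documented above; the defaults and
    ## the word pattern are assumptions, not necessarily the author's exact code
    shuffle <- function(v) v[sample.int(length(v))]

    scramble_word <- function(word) {
        chars <- strsplit(word, "")[[1]]
        pos <- seq_along(chars)                          # positions allowed to move
        if (!remix.first) pos <- pos[-1]
        if (!remix.last)  pos <- pos[-length(pos)]
        if (length(pos) < 2) return(word)
        n <- gram.length[sample.int(length(gram.length), 1)]
        grams <- lapply(split(pos, ceiling(seq_along(pos) / n)), shuffle)
        if (sample.grams) grams <- shuffle(grams)        # move whole gram groups
        chars[pos] <- chars[unlist(grams, use.names = FALSE)]
        paste(chars, collapse = "")
    }

    out <- vapply(x, function(txt) {
        m <- gregexpr("[[:alpha:]']+", txt)              # words, incl. apostrophes
        regmatches(txt, m) <- list(vapply(regmatches(txt, m)[[1]], scramble_word,
            character(1), USE.NAMES = FALSE))
        txt
    }, character(1), USE.NAMES = FALSE)

    if (!is.null(wrap)) {
        out <- vapply(out, function(s) paste(strwrap(s, wrap), collapse = "\n"),
            character(1), USE.NAMES = FALSE)
    }
    cat(paste(out, collapse = "\n\n"), "\n", sep = "")
    invisible(out)
}

x <- c(
    "According to a study at an English University, it doesn't matter in what order the letters in a word are, the only important thing is that the first and last letter be at the right place. The rest can be a total mess and you can still read it without problem. This is because the human mind does not read every letter by itself but the word as a whole.",
    "A vehicle exploded at a police checkpoint near the UN headquarters in Baghdad on Monday killing the bomber and an Iraqi police officer",
    "Big council tax increases this year have squeezed the incomes of many pensioners",
    "A doctor has admitted the manslaughter of a teenage cancer patient who died after a hospital drug blunder."
)
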
## Bigram & retain mixed ngram group locations
random_scramble(x, gram.length = 2, wrap = 70)

## Aroccnidg to a sduty at an Esngilh Uevrstiniy, it deosn't mteatr in
## what order the ltteers in a word are, the only ipmantrot tnihg is
## that the first and lsat letetr be at the rghit place. The rset can be
## a tatol mses and you can stlil read it wuoihtt pbleorm. Tihs is
## becuase the hamun mnid does not raed eevry lteetr by itself but the
## word as a whole.
## 
## A vehicle exploedd at a polcie cinheckopt naer the UN hrtaueaerdqs in
## Baghadd on Monady kilnlig the bomber and an Iaqri police oceffir
## 
## Big cncouil tax iescnreas tihs yaer hvae sezeequd the iecnmos of mnay
## pneerisons
## 
## A docotr has aitmdetd the mhgtenalsuar of a teegane cnacer pneitat
## who died aetfr a hiptaosl durg bulednr.

## Bigram and reorder the mixed ngram groups
random_scramble(x, gram.length = 2, sample.grams = FALSE, wrap = 70)

## According to a sutdy at an Enlgish Univesrity, it deosn't matetr in
## waht order the lteters in a word are, the olny impotrant thing is
## that the first and last letter be at the right plcae. The rset can be
## a toatl mess and you can still read it wihtuot problem. Tihs is
## because the hmuan mnid deos not raed eevry letetr by istlef but the
## wrod as a whole.
## 
## A vehilce epxoledd at a police checkponit near the UN haedquarters in
## Bgahdad on Monady killing the bmoebr and an Iraqi polcie officer
## 
## Big council tax icnreases this yaer have suqeezed the incoems of mnay
## penisonres
## 
## A dcootr has amditetd the mnaslaughetr of a tenegae cnacer ptaeint
## who deid afetr a hosipatl drug blunder.

## Gram length randomly between 2-5 retain mixed ngram group locations
random_scramble(x, gram.length = 2:5, wrap = 70)

## Accroindg to a sdtuy at an Esilgnh Unveisirty, it d'onset mtetar in
## waht oerdr the lerttes in a wrod are, the only iantormpt tihng is
## taht the frist and last lteetr be at the rhigt plcae. The rset can be
## a taotl mess and you can stlil read it wtuioht pelrobm. This is
## bacusee the haumn mnid deos not raed eervy letter by ieltsf but the
## wrod as a wolhe.
## 
## A viclhee eplodxed at a picloe conikpehct near the UN huaqdearters in
## Bgadhad on Mnoady kililng the bbemor and an Iqari picole ofefcir
## 
## Big cionucl tax ieeascnrs this year hvae seqeuezd the imeocns of mnay
## pnseiorens
## 
## A dotocr has atedmitd the mslautenahgr of a tgaenee cecanr ptianet
## who died after a hatspiol drug bdnelur.

## 5-gram retain mixed ngram group locations
random_scramble(x, gram.length = 5, wrap = 70)

## Accdnirog to a sdtuy at an Elisgnh Usrietnivy, it dnsoe't matetr in
## what oerdr the lrettes in a word are, the olny ianmrotpt tinhg is
## taht the frist and lsat leettr be at the rghit pcale. The rset can be
## a ttoal mses and you can sitll read it woutiht peborlm. Tihs is
## bceause the hmaun mind does not read evrey letetr by ilstef but the
## word as a wolhe.
## 
## A vichlee exolpedd at a piolce conkipehct near the UN hrtearuqdeas in
## Bhagdad on Mndoay knliilg the boembr and an Iqrai poicle ofceifr
## 
## Big cniuocl tax iseenracs this year have seeuqzed the ioncems of mnay
## pernsoines
## 
## A docotr has atemtidd the mlsaanheugtr of a tgeenae canecr pteinat
## who died afetr a haptoisl durg beunldr.

Let’s ramp it up a bit more and see the effect when we allow the first and last letter to be remixed as well.

## Bigram & retain fixed ngram group locations & remix last letter
random_scramble(x, wrap = 70, remix.last = TRUE)

## Angdiorcc to a sutyd at an Elignsh Uityinsrev, it dns'toe mretta in
## wtha oerrd teh lteters in a wdro aer, teh olny iorpmntat tngih is
## ttah teh frist adn ltsa lertte be at teh rgiht plaec. The rset can be
## a ttoal mses and yuo can sllti rdea it withotu porlbme. Tihs is
## bceesau the hnaum mdin dsoe nto rdae eveyr lttree by ilftse but teh
## wdro as a wohel.
## 
## A velheci eedlodxp at a piloce chekctnipo nera teh UN heartqdrsaue in
## Bgaaddh on Modnay klingil teh berbom and an Iraiq pcieol ociffre
## 
## Big cliuocn txa incasrese tsih yare heva sezqeued the iesomnc of mnya
## pissnoneer
## 
## A dotroc has aittmdde the merhtnagusla of a teeegan cancre patient
## who ddie after a hitspoal durg berlund.

## Bigram & retain fixed ngram group locations & remix first letter
random_scramble(x, wrap = 70, remix.first = TRUE)

## drinoccAg to a tsduy at an Ensiglh inUverstiy, it esod'nt tetamr in
## waht roder the reletts in a rowd are, hte lony rtminapot thing is
## that hte rsift and last eelttr be at hte ghrit aclpe. hTe rest can be
## a taotl sems and oyu acn still eard it thouwit obrpelm. hiTs is
## ebacuse hte uhamn mind deos ont eard ervey etletr by itself but the
## owrd as a hwloe.
## 
## A hievcle olpxeded at a cilpoe nicepkohct aenr hte UN adtrehaureqs in
## daBaghd on ndaoMy kiinllg the bbemor and an aqIri polcie ofcefir
## 
## iBg ocuncil tax eainrcess ihts eyar vahe ezsqueed hte ocinmes of namy
## isprenoens
## 
## A dootcr has tdamited hte utelaamhgnsr of a gaentee caecnr entiapt
## who died teafr a taiphosl urdg deblunr.

## Bigram & retain fixed ngram group locations & remix first + last letters
random_scramble(x, wrap = 70, remix.first = TRUE, remix.last = TRUE) 

## Acicongdr to a yduts ta na Enlihsg tysiivUnre, it t'nsoed amtter ni
## what redro the terstel ni a owrd rae, the noly aimtroptn htign is
## htat the trsif dna stla tterel be ta eht igthr pleca. ehT erts cna eb
## a latot mess nad yuo nca illst daer it wihttuo mprleob. hTis si
## aubcese the hunam mind does otn ader every eltter by tiesfl but eht
## word sa a hwole.
## 
## A ehvleci pldoxede ta a poilce ecntoikpch arne eht UN daehsrrauqte ni
## hdagBda on yaMond illking the rebmbo and na Iqiar poliec ofrecfi
## 
## gBi unocilc txa niesscrae thsi arey have zedeueqs het nimocse fo amyn
## peiorssnne
## 
## A roctod sah adimedtt the ugerlasnmaht fo a teeneag cncare patietn
## how ided etafr a hospitla gudr rnuedbl.

## 5-gram & retain fixed ngram group locations & remix first + last letters
random_scramble(x, wrap = 70, gram.length = 5, remix.first = TRUE, remix.last = TRUE) 

## dingrccoA to a duyst ta na gEinlsh nievUsrity, ti denso't meratt in
## twha drreo the rtestle in a owrd aer, eht ynol nattmiopr tgnhi is
## atth eth isfrt nad salt rtlete be at het trghi claep. Teh tser nac be
## a tlota mses nda ouy nac ltsli read ti witothu rpleobm. Tish si
## ebseuac teh hmaun imdn sdoe otn edra reeyv tetelr by tsilef but hte
## owdr as a ewohl.
## 
## A lehviec epxdldeo ta a oplcie toipnehkcc nrae eth NU auqhedaesrrt ni
## dahgdBa no Mandoy lnlgiki het rombbe dna na rqaIi pcolie ofciefr
## 
## gBi lnciuco txa eassencir tish raey ehav zdeeeqsu hte comnise fo mayn
## osnerniesp
## 
## A doctro hsa edttmdia teh resmnlaathug fo a tneaege cancer nettipa
## woh eddi fatre a hosipatl rdug blnerdu.

This post showed how I recently used R for some quick theory testing and demonstration. Of course the code could be optimized but the point is quick exploration of concepts I’m reading in the literature.

Similar quick, iterative testing with R could be done by researchers and teachers across many fields. The field of visualization comes to mind. This could be made into a shiny app to allow non-technical users to still interact with the code. It is my hope that both the content and the code are of interest.

To leave a comment for the author, please follow the link and comment on their blog: R – TRinker’s R Blog.


Using RcppArmadillo to price European Put Options

By Rcpp Gallery

(This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers)

Introduction

In the quest for ever faster code, one generally begins exploring ways to integrate C++
with R using Rcpp. This post provides an example of multiple
implementations of a European Put Option pricer. The implementations are done in pure R,
pure Rcpp using some Rcpp sugar functions,
and then in Rcpp using
RcppArmadillo, which exposes the
incredibly powerful linear algebra library, Armadillo.

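Under the Black-Scholes model, the value of a European put option has the closed-form solution:

$$V = K e^{-rt} N(-d_2) - S e^{-yt} N(-d_1)$$

where $N(\cdot)$ is the standard normal cumulative distribution function,

$$d_1 = \frac{\ln(S / K) + (r - y + \sigma^2 / 2)\, t}{\sigma \sqrt{t}}$$

and

$$d_2 = d_1 - \sigma \sqrt{t}$$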

Armed with the formulas, we can create the pricer using just R.

put_option_pricer <- function(s, k, r, y, t, sigma) {

    d1 <- (log(s / k) + (r - y + sigma^2 / 2) * t) / (sigma * sqrt(t))
    d2 <- d1 - sigma * sqrt(t)

    V <- pnorm(-d2) * k * exp(-r * t) - s * exp(-y * t) * pnorm(-d1)

    V
}

# Valuation with 1 stock price
put_option_pricer(s = 55, 60, .01, .02, 1, .05)
[1] 5.52021
# Valuation across multiple prices
put_option_pricer(s = 55:60, 60, .01, .02, 1, .05)
[1] 5.52021 4.58142 3.68485 2.85517 2.11883 1.49793
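A quick sanity check on the pricer is put-call parity. The sketch below assumes the standard Black-Scholes call formula with a continuous dividend yield; call_option_pricer() is a hypothetical companion function and is not part of the original post.

```r
# Put pricer as defined above
put_option_pricer <- function(s, k, r, y, t, sigma) {
    d1 <- (log(s / k) + (r - y + sigma^2 / 2) * t) / (sigma * sqrt(t))
    d2 <- d1 - sigma * sqrt(t)
    pnorm(-d2) * k * exp(-r * t) - s * exp(-y * t) * pnorm(-d1)
}

# Hypothetical companion: the matching Black-Scholes call price
call_option_pricer <- function(s, k, r, y, t, sigma) {
    d1 <- (log(s / k) + (r - y + sigma^2 / 2) * t) / (sigma * sqrt(t))
    d2 <- d1 - sigma * sqrt(t)
    s * exp(-y * t) * pnorm(d1) - k * exp(-r * t) * pnorm(d2)
}

# Put-call parity: C - P = S * exp(-y * t) - K * exp(-r * t)
s   <- 55:60
lhs <- call_option_pricer(s, 60, .01, .02, 1, .05) -
       put_option_pricer(s, 60, .01, .02, 1, .05)
rhs <- s * exp(-.02 * 1) - 60 * exp(-.01 * 1)
all.equal(lhs, rhs)
```

If the two sides agree for every price, the put formula is at least consistent with the corresponding call formula.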

Let’s see what we can do with Rcpp. Besides explicitly stating the
types of the variables, not much has to change. We can even use the sugar function,
Rcpp::pnorm(), to keep the syntax as close to R as possible. Note how we are being
explicit about the symbols we import from the Rcpp namespace: the basic vector type, as
well as the (vectorized) ‘Rcpp Sugar’ calls log() and pnorm(). Similarly, we use
sqrt() and exp() for the calls on atomic double variables from the C++ Standard
Library. With these declarations the code itself is essentially identical to the R code
(apart of course from requiring both static types and trailing semicolons).

#include <Rcpp.h>
using Rcpp::NumericVector;
using Rcpp::log;
using Rcpp::pnorm;
using std::sqrt;
using std::exp;

// [[Rcpp::export]]
NumericVector put_option_pricer_rcpp(NumericVector s, double k, double r, double y, double t, double sigma) {

    NumericVector d1 = (log(s / k) + (r - y + sigma * sigma / 2.0) * t) / (sigma * sqrt(t));
    NumericVector d2 = d1 - sigma * sqrt(t);
    
    NumericVector V = pnorm(-d2) * k * exp(-r * t) - s * exp(-y * t) * pnorm(-d1);
    return V;
}

We can call this from R as well:

# Valuation with 1 stock price
put_option_pricer_rcpp(s = 55, 60, .01, .02, 1, .05)
[1] 5.52021
# Valuation across multiple prices
put_option_pricer_rcpp(s = 55:60, 60, .01, .02, 1, .05)
[1] 5.52021 4.58142 3.68485 2.85517 2.11883 1.49793

Finally, let’s look at
RcppArmadillo. Armadillo has a
number of object types, including mat, colvec, and rowvec. Here, we just use
colvec to represent a column vector of prices. By default in Armadillo, * represents
matrix multiplication, and % is used for element-wise multiplication. We need to make
this change to element-wise multiplication in one place, but otherwise the changes are just
switching out the types and the sugar functions for Armadillo-specific functions.
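
For readers coming from R, it may help to note that Armadillo's operator convention is the reverse of R's. A minimal base-R sketch of the two kinds of product, with the Armadillo counterparts noted in comments (the vectors a and b are made up for illustration):

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Element-wise product: `*` in R, `%` in Armadillo (a % b)
elementwise <- a * b

# Matrix product, here a 1x3 row times a 3x1 column:
# `%*%` in R, `*` in Armadillo (a.t() * b)
inner <- drop(t(a) %*% b)

elementwise
inner
```

In the pricer, only the product of two vectors needs `%`; multiplying a vector by a scalar such as k or exp(-r * t) works with `*` in both systems.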

Note that the arma::normcdf() function is in the upcoming release of
RcppArmadillo, which is
0.8.400.0.0 at the time of writing and still in CRAN’s incoming. It also requires the
C++11 plugin.

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp11)]]

using arma::colvec;
using arma::log;
using arma::normcdf;
using std::sqrt;
using std::exp;


// [[Rcpp::export]]
colvec put_option_pricer_arma(colvec s, double k, double r, double y, double t, double sigma) {
  
    colvec d1 = (log(s / k) + (r - y + sigma * sigma / 2.0) * t) / (sigma * sqrt(t));
    colvec d2 = d1 - sigma * sqrt(t);
    
    // Notice the use of % to represent element wise multiplication
    colvec V = normcdf(-d2) * k * exp(-r * t) - s * exp(-y * t) % normcdf(-d1); 

    return V;
}

Use from R:

# Valuation with 1 stock price
put_option_pricer_arma(s = 55, 60, .01, .02, 1, .05)
        [,1]
[1,] 5.52021
# Valuation across multiple prices
put_option_pricer_arma(s = 55:60, 60, .01, .02, 1, .05)
        [,1]
[1,] 5.52021
[2,] 4.58142
[3,] 3.68485
[4,] 2.85517
[5,] 2.11883
[6,] 1.49793

Finally, we can run a speed test to see which comes out on top.

s <- matrix(seq(0, 100, by = .0001), ncol = 1)

rbenchmark::benchmark(R = put_option_pricer(s, 60, .01, .02, 1, .05),
                      Arma = put_option_pricer_arma(s, 60, .01, .02, 1, .05),
                      Rcpp = put_option_pricer_rcpp(s, 60, .01, .02, 1, .05), 
                      order = "relative", 
                      replications = 100)[,1:4]
  test replications elapsed relative
2 Arma          100   6.409    1.000
3 Rcpp          100   7.917    1.235
1    R          100   9.091    1.418

Interestingly, Armadillo comes out on top here on this (multi-core)
machine (as Armadillo uses OpenMP where available in newer versions). But the difference
is slender, and there is certainly variation in repeated runs. And the nicest thing about
all of this is that it shows off the “embarrassment of riches” that we have in the R and
C++ ecosystem for multiple ways of solving the same problem.


Image Recognition and Object Detection

By R on Locke Data Blog

(This article was first published on R on Locke Data Blog, and kindly contributed to R-bloggers)

In this latest blog, I’m responding to a cry for help. Someone got in touch with us recently asking for some advice on image detection algorithms, so let’s see what we can do!
They already know which algorithms they want to use, so let’s start with those. Hang on, no: for the uninitiated, let’s start with what an image detection algorithm even is.
“An image detection algorithm takes an image, or piece of an image as an input, and outputs what it thinks the image contains.


Using R/exams for Short Exams during a Statistics Course

By R/exams

(This article was first published on R/exams, and kindly contributed to R-bloggers)

Experiences with using R/exams for multiple automated and randomized exams in an applied statistics course for environmental scientists at Universität Koblenz-Landau.

Guest post by Ralf B. Schäfer (Universität Koblenz-Landau, Institute for Environmental Sciences).

Background

In many study programs in Germany, exams are written directly after the course period, resulting in a high workload for students within a short period. Although we had recommended that students study and practise continuously during the course period, this advice was not widely taken up. Therefore, after discussion with the student representatives, we decided to replace the exam at the end of our course in Applied Statistics for Environmental Scientists with six short exams. The exams were held approximately every third week and were designed to be completed within 15 minutes. Given the resulting increase in the number of exams and in workload for a course with almost 100 students, we decided to transition to automated exams.

Exam implementation

After a brief review of available tools, we identified R/exams as matching our requirements for automation, which were the following:

  • easy to set up,
  • availability of supporting resources such as tutorials and examples,
  • usable for paper exams,
  • mixed exercises possible (e.g., free text, multiple choice),
  • automated creation of randomized exams and exercises,
  • automated evaluation of filled exams.

Our implementation was guided by the available supporting resources at http://www.R-exams.org/tutorials/ and https://CRAN.R-project.org/package=exams (vignettes). Indeed, the resources made the transition to automated exams very smooth and we strongly recommend digging into them if you plan to use the package. Notably, the help from the package developer Achim Zeileis with a few minor glitches was incredibly fast and supportive. The main transition work was to recast our questions and exercises into the R/Markdown format employed by R/exams (alternatively you could use R/LaTeX).

The short written exams were generated by exams2nops using a mix of multiple-choice and numeric exercises (treated as open-ended questions by exams2nops). As an example, the code below creates a PDF (nops1.pdf) from three such exercise templates (1_mch_GLM_LM.Rmd, 1_num_multi.Rmd, 2_mch_PCA.Rmd) along with the institute logo (logo.png). The multiple-choice exercises can be scanned fully automatically while the open-ended question has to be marked manually before being scanned as well.

## load package
library("exams")

## exams2nops
## PDF output in NOPS format (1 file per exam)
## -> for exams that can be printed, scanned, and automatically evaluated in R

## define an exam (= list of exercises)
myexam 

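
A minimal sketch of the step described above, using the three exercise files and the logo named in the text. The argument values (n, dir, name, the seed) are assumptions chosen for illustration, not the settings used in the course, and the call is guarded so that it only runs when the exams package and all files are available.

```r
## exercise templates named in the text
myexam <- list("1_mch_GLM_LM.Rmd", "1_num_multi.Rmd", "2_mch_PCA.Rmd")

## only attempt to build the PDF if the package and all files are present
can_build <- requireNamespace("exams", quietly = TRUE) &&
  all(file.exists(c(unlist(myexam), "logo.png")))

if (can_build) {
  set.seed(1)  # arbitrary seed so the randomized exam is reproducible
  exams::exams2nops(myexam,
                    n = 1,            # number of exam copies (assumed)
                    dir = ".",        # output directory (assumed)
                    name = "nops",    # file prefix (assumed), giving nops1.pdf
                    logo = "logo.png")
}
```

Increasing n would produce one differently randomized PDF per copy.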

The transition came with several advantages, in particular the ability to fully randomize exams:

  1. The answers to multiple choice questions can be shuffled (by setting exshuffle: TRUE in the metadata of the R/Markdown files, see 1_mch_GLM_LM.Rmd, 2_mch_PCA.Rmd).
  2. The questions included in a single exam can be randomly sampled from a pool of questions by using vectors as list elements when defining the list of exercises/questions.
  3. Questions relying on data sets can be randomized through resampling (see 1_num_multi.Rmd for an example). The following lines represent the crucial part of the code to generate a random data set:
data("USairpollution", package = "HSAUR2")  
nsize 
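
Points 2 and 3 can be sketched in base R as follows. This is an illustration under stated assumptions rather than the course code: the file 1_num_alt.Rmd is a hypothetical second exercise, draw_exam() only mimics the per-exam draw that the exams functions perform internally, and the built-in mtcars data and the sample size range stand in for the data set and settings used in the original exercise.

```r
## (2) a vector inside the exercise list acts as a pool of alternatives;
##     "1_num_alt.Rmd" is a hypothetical extra exercise
myexam <- list(
  "1_mch_GLM_LM.Rmd",
  c("1_num_multi.Rmd", "1_num_alt.Rmd"),
  "2_mch_PCA.Rmd"
)

## mimic the draw with base R: pick one entry from each list element
draw_exam <- function(pool) {
  vapply(pool, function(x) x[sample.int(length(x), 1)], character(1))
}

## (3) resample a data set so that each exam is based on different data
set.seed(42)                          # arbitrary seed for reproducibility
nsize <- sample(25:30, 1)             # assumed range of sample sizes
rows <- sample(nrow(mtcars), nsize)   # row indices, drawn without replacement
exam_data <- mtcars[rows, ]

draw_exam(myexam)
dim(exam_data)
```

Because the data are drawn when the exercise is rendered, the correct numeric solution has to be computed from the resampled data inside the exercise file as well.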

Challenges and outlook

Randomization is particularly interesting because the exams can be re-used and students cannot copy from other exams (at least not as easily). The main challenges in using R/exams were actually due to human fallibility. Despite instructions stated on the exam sheet and explained before the exam, several students filled in their exam incorrectly, for example by not marking the boxes on the exam sheet correctly, which required manual revisiting and correction:

One of the challenges with automated exams is incorrect marking.

Such cases made up about 5-10% of exams in the first round and less than 5% in subsequent rounds. Overall, R/exams saved us many hours of correcting exams. We will certainly continue using it, and it has attracted attention from other faculty members. In the future, we plan to complement our course with practical exercises in R using the learning management platform OpenOLAT. For this, we will benefit from the exams package's options to create and export exercises to learning management platforms such as OLAT or Moodle.


New releases: Microsoft R Client 3.4.3, Microsoft ML Server 9.3

By David Smith

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

An update to Microsoft R Client, Microsoft’s distribution of open source R with additional proprietary packages — including RevoScaleR (for data analysis at scale) and MicrosoftML (for machine learning) — is now available. Microsoft R Client 3.4.3 updates the R engine to R 3.4.3, and (on Linux) now supports deploying computations to a remote SQL Server with the sqlrutils package.

Microsoft R Client 3.4.3 is free to download and use, and as the video above shows is designed for developing analytical applications that will be deployed to production servers. It works with Microsoft Machine Learning Server 9.3, also released today. You can use Microsoft ML Server to scale your R analysis to data sets of any size or workloads of any intensity: as a server or cluster of servers on premises or in the Azure cloud, or as part of a hybrid architecture with Azure Stack.

For more information on the new capabilities in Microsoft ML Server and Microsoft R Client, take a look at the announcement linked below.

Machine Learning Blog: Introducing the Microsoft Machine Learning Server 9.3 Release
