Playing with Leaflet (and Radar locations)

By Arthur Charpentier

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

Yesterday, my friend Fleur showed me some interesting features of the leaflet package in R.

library(leaflet)

To illustrate, consider the locations of (fixed) radars in several European countries. To get the data, use

download.file("http://carte-gps-gratuite.fr/radars/zones-de-danger-destinator.zip", "radar.zip")
unzip("radar.zip")

# Read one Destinator file and keep longitude, latitude and the numeric
# part of the free-text third column (the speed limit)
ext_radar <- function(nf) {
  radar <- read.table(file = paste("destinator/", nf, sep = ""),
                      sep = ",", header = FALSE, stringsAsFactors = FALSE)
  radar$type <- sapply(radar$V3, function(x) {
    z <- as.numeric(unlist(strsplit(x, " ")[[1]]))
    return(z[!is.na(z)])
  })
  radar <- radar[, c(1, 2, 4)]
  names(radar) <- c("lon", "lat", "type")
  return(radar)
}

L  <- list.files("./destinator/")
nl <- nchar(L)
id <- which(substr(L, 4, 8) == "Radar" & substr(L, nl - 2, nl) == "csv")

radar_E <- NULL
for (i in id) radar_E <- rbind(radar_E, ext_radar(L[i]))

(To be honest, if you run that code, you will get several countries, but not France… but if you want to add it, you should be able to do so…) The first tool is based on popups: if you click on a point on the map, you get some information, such as the speed limit at the radar's location. To get a nice pictogram, use

fileUrl <- "http://evadeo.typepad.fr/.a/6a00d8341c87ef53ef01310f9238e6970c-800wi"
download.file(fileUrl, "radar.png", mode = "wb")
RadarICON <- makeIcon(iconUrl = fileUrl, iconWidth = 20, iconHeight = 20)

Then use the following code to get a dynamic map, specifying which variable should be used for the popup:

m <- leaflet(data = radar_E) 
m <- m %>% addTiles() 
m <- m %>% addMarkers(~lon, ~lat, icon = RadarICON, popup = ~as.character(type))
m

Because the picture is a bit heavy, with almost 20,000 points, let us focus only on France.
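
Here is a minimal sketch of one way to do that, keeping only the points inside a rough bounding box around metropolitan France (the box coordinates below are approximate and purely illustrative):

radar_F <- radar_E[radar_E$lon > -5 & radar_E$lon < 9.6 &
                   radar_E$lat > 41 & radar_E$lat < 51.5, ]
m <- leaflet(data = radar_F)
m <- m %>% addTiles()
m <- m %>% addMarkers(~lon, ~lat, icon = RadarICON, popup = ~as.character(type))
m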

The other interesting (not to say amazing) tool is the one that creates clusters.

We have here the number of radars in the neighborhood of each point shown on the map. So far, it looks like the polygons used are arbitrary…

The code is extremely simple:

m <- leaflet(radar_E) 
m <- m %>% addTiles()
m <- m %>% addMarkers(~lon, ~lat, clusterOptions = markerClusterOptions())
m

I still wonder how we can change the polygons (e.g. using more standard administrative regions). But I have to admit that this option is awesome.


Learn R interactively with our new Introduction to R tutorial

By DataCamp

(This article was first published on The DataCamp Blog » R, and kindly contributed to R-bloggers)

At DataCamp we are obsessed with how we can make our R courses and learning technology better. As a result, we’re continuously innovating and nothing stays the same for very long.

Today, we’re proud to announce we’ve made some huge updates to our free Introduction to R tutorial. It’s our most popular course, taken by over 90,000 R enthusiasts (so if you haven’t, you should start today!), and we’re very excited to see everyone’s response to the new version. In total the course now has over a hundred interactive coding challenges and 16 fun videos explaining you the basics of R.

What’s new

Over the past two years we have received amazing feedback on the first version of the Introduction to R tutorial. So when we decided to rewrite the course, we had a detailed look at all this feedback to find out how best to update and improve the tutorial’s challenges.

One of the major improvements is that the course now makes use of the latest version of our correction system, meaning you will get even more tailored and detailed feedback on the mistakes you make. Every existing exercise was rewritten, clarified, and tested from scratch, and tons of new exercises were added (including an entire chapter on graphics). Furthermore, we decided to add short explainer videos and slides to every chapter to introduce you to the concepts and theory covered.

What you’ll learn

This free introduction to R tutorial will help you master the basics of R. In seven sections, you will cover its basic syntax, making you ready to undertake your own first data analysis using R. Starting from variables and basic operations, you will learn how to handle data structures such as vectors, matrices, lists and data frames. In the final section, you will dive deeper into the graphical capabilities of R, and create your own stunning data visualizations. No prior knowledge in programming or data science is required.

In general, the focus is on actively understanding how to code your way through interesting data science tasks. By using in-browser coding challenges you will experiment with the different aspects of the R language in real time, and you will receive instant and personalized feedback that guides you to the solution.

Loving this new update? Start with our brand new Introduction to R Tutorial, and experience the changes yourself. Don’t forget to let us know what you think about it!

The post Learn R interactively with our new Introduction to R tutorial appeared first on The DataCamp Blog.


#MonthOfJulia Day 25: Interfacing with Other Languages

By Andrew Collier

(This article was first published on Exegetic Analytics » R, and kindly contributed to R-bloggers)

Julia has native support for calling C and FORTRAN functions. There are also add-on packages which provide interfaces to C++, R and Python. We’ll have a brief look at the support for C and R here. Further details on these and the other supported languages can be found on GitHub.

Why would you want to call other languages from within Julia? Here are a couple of reasons:

  • to access functionality which is not implemented in Julia;
  • to exploit some efficiency associated with another language.

The second reason should apply relatively seldom because, as we saw some time ago, Julia provides performance which rivals native C or FORTRAN code.

C

C functions are called via ccall(), where the name of the C function and the library it lives in are passed as a tuple in the first argument, followed by the return type of the function and the types of the function arguments, and finally the arguments themselves. It’s a bit clunky, but it works!

julia> ccall((:sqrt, "libm"), Float64, (Float64,), 64.0)
8.0

It makes sense to wrap a call like that in a native Julia function.

julia> csqrt(x) = ccall((:sqrt, "libm"), Float64, (Float64,), x);
julia> csqrt(64.0)
8.0

This function will not be vectorised by default (just try calling csqrt() on a vector!), but it’s a simple matter to produce a vectorised version using the @vectorize_1arg macro.

julia> @vectorize_1arg Real csqrt;
julia> methods(csqrt)
# 4 methods for generic function "csqrt":
csqrt{T<:Real}(::AbstractArray{T<:Real,1}) at operators.jl:359
csqrt{T<:Real}(::AbstractArray{T<:Real,2}) at operators.jl:360
csqrt{T<:Real}(::AbstractArray{T<:Real,N}) at operators.jl:362
csqrt(x) at none:6

Note that a few extra specialised methods have been introduced and now calling csqrt() on a vector works perfectly.

julia> csqrt([1, 4, 9, 16])
4-element Array{Float64,1}:
 1.0
 2.0
 3.0
 4.0

R

I’ll freely admit that I don’t dabble in C too often these days. R, on the other hand, is a daily workhorse. So being able to import R functionality into Julia is very appealing. The first thing that we need to do is load up a few packages, the most important of which is RCall. There’s great documentation for the package here.

julia> using RCall
julia> using DataArrays, DataFrames

We immediately have access to R’s built-in data sets and we can display them using rprint().

julia> rprint(:HairEyeColor)
, , Sex = Male

       Eye
Hair    Brown Blue Hazel Green
  Black    32   11    10     3
  Brown    53   50    25    15
  Red      10   10     7     7
  Blond     3   30     5     8

, , Sex = Female

       Eye
Hair    Brown Blue Hazel Green
  Black    36    9     5     2
  Brown    66   34    29    14
  Red      16    7     7     7
  Blond     4   64     5     8

We can also copy those data across from R to Julia.

julia> airquality = DataFrame(:airquality);
julia> head(airquality)
6x6 DataFrame
| Row | Ozone | Solar.R | Wind | Temp | Month | Day |
|-----|-------|---------|------|------|-------|-----|
| 1   | 41    | 190     | 7.4  | 67   | 5     | 1   |
| 2   | 36    | 118     | 8.0  | 72   | 5     | 2   |
| 3   | 12    | 149     | 12.6 | 74   | 5     | 3   |
| 4   | 18    | 313     | 11.5 | 62   | 5     | 4   |
| 5   | NA    | NA      | 14.3 | 56   | 5     | 5   |
| 6   | 28    | NA      | 14.9 | 66   | 5     | 6   |

rcopy() provides a high-level interface to function calls in R.

julia> rcopy("runif(3)")
3-element Array{Float64,1}:
 0.752226
 0.683104
 0.290194

However, for some complex objects there is no simple way to translate between R and Julia, and in these cases rcopy() fails. We can see in the case below that the object of class lm returned by lm() does not diffuse intact across the R-Julia membrane.

julia> "fit <- lm(bwt ~ ., data = MASS::birthwt)" |> rcopy
ERROR: `rcopy` has no method matching rcopy(::LangSxp)
 in rcopy at no file
 in map_to! at abstractarray.jl:1311
 in map_to! at abstractarray.jl:1320
 in map at abstractarray.jl:1331
 in rcopy at /home/colliera/.julia/v0.3/RCall/src/sexp.jl:131
 in rcopy at /home/colliera/.julia/v0.3/RCall/src/iface.jl:35
 in |> at operators.jl:178

But the call to lm() was successful and we can still look at the results.

julia> rprint(:fit)

Call:
lm(formula = bwt ~ ., data = MASS::birthwt)

Coefficients:
(Intercept)          low          age          lwt         race  
    3612.51     -1131.22        -6.25         1.05      -100.90  
      smoke          ptl           ht           ui          ftv  
    -174.12        81.34      -181.95      -336.78        -7.58 

You can use R to generate plots with either the base functionality or that provided by libraries like ggplot2 or lattice.

julia> reval("plot(1:10)");                # Will pop up a graphics window...
julia> reval("library(ggplot2)");
julia> rprint("ggplot(MASS::birthwt, aes(x = age, y = bwt)) + geom_point() + theme_classic()")
julia> reval("dev.off()")                  # ... and close the window.

Watch the videos below for some other perspectives on multi-language programming with Julia. Also check out the complete code for today (including examples with C++, FORTRAN and Python) on GitHub.



The post #MonthOfJulia Day 25: Interfacing with Other Languages appeared first on Exegetic Analytics.


Combining Choropleth Maps and Reference Maps in R

By Ari Lamstein

(This article was first published on AriLamstein.com » R, and kindly contributed to R-bloggers)

Recent updates to my mapping packages now make it easy to combine choropleth maps and reference maps in R. All you have to do is pass the parameter reference_map = TRUE to the existing functions. This should “just work”, regardless of which region you zoom in on or what data you display. The following table shows the affected functions and their packages.

Map                       Function             Package
US States                 state_choropleth     choroplethr
US Counties               county_choropleth    choroplethr
US ZIP Codes              zip_choropleth       choroplethrZip
California Census Tracts  ca_tract_choropleth  choroplethrCaCensusTract

If you want to learn more about how to use these packages, sign up for my free email course Learn to Map Census Data in R.

Install the Packages

Here is how to get the versions of the packages that include these changes:


install.packages("choroplethr")

# install.packages("devtools")
library(devtools)
install_github('arilamstein/choroplethrZip@v1.4.0')
install_github("arilamstein/choroplethrCaCensusTract@v1.1.0")

In my experience reference maps provide the most value when viewing small regions. So let’s start with viewing the smallest geographic unit that my packages support: Census Tracts.

Census Tracts

Consider this choropleth map which shows income estimates of census tracts in Los Angeles county from 2013:

Some natural questions this map raises are “What is that large tract in the northeast?” and “Why is Los Angeles county discontiguous?” Both of these questions can be easily answered by combining the choropleth map with a reference map:

The large tract in the northeast is a forest, and Los Angeles is discontiguous because it has two large islands. Here is the code to create those maps:

library(choroplethrCaCensusTract)
data(df_ca_tract_demographics)
df_ca_tract_demographics$value = df_ca_tract_demographics$per_capita

ca_tract_choropleth(df_ca_tract_demographics,
    title       = "2013 Los Angeles Census Tract\nPer Capita Income",
    legend      = "Income",
    num_colors  = 4,
    county_zoom = 6037)

ca_tract_choropleth(df_ca_tract_demographics,
    title         = "2013 Los Angeles Census Tract\nPer Capita Income",
    legend        = "Income",
    num_colors    = 4,
    county_zoom   = 6037,
    reference_map = TRUE)

ZIP Codes

Consider this choropleth which shows income estimates of Manhattan Zip Code Tabulated Areas (ZCTAs) in 2013:

[Figure: Manhattan ZIP code income choropleth]

Someone not familiar with Manhattan’s geography might wonder what the dark neighborhood on the east is, and what the light neighborhood in the north is. Combining the choropleth map with a reference map answers those questions.

[Figure: the same choropleth combined with a reference map]

The low-income neighborhood in the north is Harlem, and the high income neighborhood in the east is the Upper East Side.

Here is the source code to create those maps:


library(choroplethrZip)
data(df_zip_demographics)
df_zip_demographics$value = df_zip_demographics$per_capita_income

zip_choropleth(df_zip_demographics,
    title       = "2013 Manhattan ZIP Code Income Estimates",
    legend      = "Per Capita Income",
    county_zoom = 36061)

zip_choropleth(df_zip_demographics,
    title         = "2013 Manhattan ZIP Code Income Estimates",
    legend        = "Per Capita Income",
    county_zoom   = 36061,
    reference_map = TRUE)

Counties

Reference maps can also be useful when viewing county choropleths. Consider this map which shows county populations in California:

[Figure: California county population choropleth]

A common question when viewing this map is “What is the large, low-population county on the eastern part of the state?” Adding a reference map allows us to easily answer the question:

[Figure: the same choropleth combined with a reference map]

The county in question contains Death Valley – the hottest, driest and lowest point in North America.

Here is the code to produce those maps:

library(choroplethr)
data(df_pop_county)
county_choropleth(df_pop_county,
    title      = "2012 California County Population Estimates",
    legend     = "Population",
    state_zoom = "california")
 
county_choropleth(df_pop_county,
    title         = "2012 California County Population Estimates",
    legend        = "Population",
    state_zoom    = "california",
    reference_map = TRUE)

States

You can also combine choropleth maps with reference maps at the state level:

[Figures: state population choropleth, with and without a reference map]

At this time you cannot make reference maps with maps that contain insets, such as maps of the 50 US states. Here is the code to produce those maps:

library(choroplethr)
data(df_pop_state)
data(continental_us_states)

state_choropleth(df_pop_state,
    title  = "2012 State Population Estimates",
    legend = "Population",
    zoom   = continental_us_states)

state_choropleth(df_pop_state,
    title         = "2012 State Population Estimates",
    legend        = "Population",
    zoom          = continental_us_states,
    reference_map = TRUE)

More Information

If you are curious about how this code works, then look at the function render_with_reference_map. If you have technical questions, the best place to ask is the choroplethr google group.


The post Combining Choropleth Maps and Reference Maps in R appeared first on AriLamstein.com.


For Some Definition of “Directly” and/or “Contort”

By hrbrmstr

(This article was first published on rud.is » R, and kindly contributed to R-bloggers)

Junk Charts did a post, "Don't pick your tool before having your design", and made the claim that this:

“cannot be produced directly from a tool (without contorting your body in various painful locations)”.

I beg to differ.

With R & ggplot2, I get to both pick my tool and design at the same time since I have a very flexible and multi-purpose tool. I also don’t believe that the code below qualifies as “contortions”, though I am a ggplot2 fanboi. It’s no different than Excel folks clicking on radio buttons and color pickers, except my process is easily repeatable & scalable once finalized (this is not finalized as it’s not 100% parameterized but it’s not difficult to do that last part).

library(ggplot2)
 
dat <- data.frame(year=2010:2015,
                  penalties=c(627, 625, 653, 617, 661, 730))
 
avg <- data.frame(val=mean(head(dat$penalties, -1)),
                  last=dat$penalties[6],
                  lab="5-Yr\nAvg")
 
gg <- ggplot(dat, aes(x=year, y=penalties))
gg <- gg + geom_point()
gg <- gg + scale_x_continuous(breaks=c(2010, 2014, 2015), limits=c(NA, 2015.1))
gg <- gg + scale_y_continuous(breaks=c(600, 650, 700, 750), 
                              limits=c(599, 751), expand=c(0,0))
gg <- gg + geom_segment(data=avg, aes(x=2010, xend=2015, y=val, yend=val), linetype="dashed")
gg <- gg + geom_segment(data=avg, aes(x=2015, xend=2015, y=val, yend=last), color="steelblue")
gg <- gg + geom_point(data=avg, aes(x=2015, y=val), shape=4, size=3)
gg <- gg + geom_text(data=avg, aes(x=2015, y=val), label="5-Yr\nAvg", size=2.5, hjust=-0.3)
gg <- gg + geom_point(data=avg, aes(x=2015, y=700), shape=17, col="steelblue")
gg <- gg + geom_point(data=avg, aes(x=2015, y=730), shape=4, size=3)
gg <- gg + labs(x=NULL, y="Number of Penalties", 
                title="NFL Penalties Jumped 15% in the\nFirst 3 Weeks of the 2015 Season\n")
gg <- gg + theme_bw()
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(panel.grid.major.x=element_blank())
gg <- gg + theme(panel.grid.major.y=element_line(color="white"))
gg <- gg + theme(panel.background=element_rect(fill="#f3f2f7"))
gg <- gg + theme(axis.ticks=element_blank())
gg

[Figure: the resulting chart]


Rebuilding Map Example With Apply Functions

By Christoph Safferling

(This article was first published on Opiate for the masses, and kindly contributed to R-bloggers)

Yesterday Hadley’s functional programming package purrr was published to CRAN. It is designed to bring convenient functional programming paradigma and add another data manipulation framework for R.

“Where dplyr focusses on data frames, purrr focusses on vectors” – Hadley Wickham in a blogpost

The core of the package consists of the map functions, which operate similarly to the base apply functions. So I tried to rebuild, with apply, the first example used to present the map function in the blog post and the GitHub repo. The example looks like this:

library(purrr)

mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")
##         4         6         8 
## 0.5086326 0.4645102 0.4229655

Let’s do it with apply now:

mtcars %>% 
  split(.$cyl) %>% 
  lapply(function(x) lm(mpg~wt, data = x)) %>% 
  lapply(summary) %>% 
  sapply(function(x) x$r.squared)
##         4         6         8 
## 0.5086326 0.4645102 0.4229655

As you can see, map works with three different kinds of input: function names (exactly like apply), anonymous functions written as formulas, and character strings to select elements. The apply functions, on the other hand, only accept functions, but with a little piping voodoo we can also shortcut the anonymous functions like this:

mtcars %>% 
  split(.$cyl) %>% 
  lapply(. %>% lm(mpg~wt, data = .)) %>% 
  lapply(summary) %>% 
  sapply(. %>% .$r.squared)
##         4         6         8 
## 0.5086326 0.4645102 0.4229655

This is almost as appealing as the map alternative. Checking the speed of both approaches also reveals no significant time differences:

library(microbenchmark)

# Wrap each pipeline in a function and *call* it inside microbenchmark();
# benchmarking the bare names would only time the symbol lookup. (Distinct
# names also avoid masking purrr::map and base::apply.)
map_version   <- function() {mtcars %>% split(.$cyl) %>% map(~ lm(mpg ~ wt, data = .)) %>% map(summary) %>% map_dbl("r.squared")}
apply_version <- function() {mtcars %>% split(.$cyl) %>% lapply(. %>% lm(mpg ~ wt, data = .)) %>% lapply(summary) %>% sapply(. %>% .$r.squared)}

microbenchmark(map_version(), apply_version())
## (both versions take a comparable amount of time per call)

Now don’t get me wrong, I don’t want to say, that purrr is worthless or reduntant. I just picked the most basic function of the package and explained what it does by rewriting with apply functions. The map function and their derivatives are convenient alternatives to apply in my eyes without computational overhead. Furthermore the package offers very interesting functions like zip_n and lift. If you love apply and word with lists, you definitely should check out purrr.

Rebuilding Map Example With Apply Functions was originally published by Kirill Pomogajko at Opiate for the masses on September 30, 2015.


Unbalanced Data Is a Problem? No, BALANCED Data Is Worse

By matloff

(This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers)

Say we are doing classification analysis with classes labeled 0 through m-1. Let Ni be the number of observations in class i. There is much handwringing in the machine learning literature over situations in which there is a wide variation among the Ni. I will argue here, though, that the problem is much worse in the case in which there is — artificially — little or no variation among those sample sizes.

To simplify matters, in what follows I will take m = 2, with the population class probabilities denoted by p and 1-p. Let Y be 1 or 0, according to membership in Class 1 or 0, and let X be the vector of v predictor variables.

First, what about this problem of lack of balance? If your data are a random sample from the target population, then the lack of balance is natural if p is near 0 or 1, and there really isn’t much you can do about it, short of manufacturing data. (Some have actually proposed that, in various forms.) And with a parametric model, say a logit, you may do fairly well if the model is pretty accurate over the range of X. To be sure, the lack of balance may result in substantial within-class misclassification rates even if the overall rate is low. One can try different weightings and the like, but one is pretty much stuck with it.

But at least in this unbalanced situation, you will get consistent estimators of the regression function P(Y = 1 | X = t), as the sample size grows. That’s not true for what I will call the artificially balanced case. Here the Ni are typically the same or nearly so, and arise from our doing separate samplings of each of the classes. Clearly we cannot estimate p in this case, and it matters. Here’s why.

By an elementary derivation (essentially Bayes’ theorem), we have that (at the population level)

P(Y = 1 | X = t) = 1 / (1 + [(1-p)/p] [f(t)/g(t)])   Eqn. (1)

where f and g are the densities of X within Classes 0 and 1. Consider the logistic model. Equation (1) implies that

β0 + β1 t1 + … + βv tv = -ln[(1-p)/p] - ln[f(t)/g(t)]   Eqn. (2)

From this you can see that

β0 = -ln[(1-p)/p], Eqn. (3)

which in turn implies that if the sample sizes are chosen artificially, then our estimate of β0 in the output of R’s glm() function (or any other code for logit) will be wrong. If our goal is Prediction, this will cause a definite bias. And worse, it will be a permanent bias, in the sense that we will not have consistent estimates as the sample size grows.

So, arguably the problem of (artificially) balanced data is worse than the unbalanced case.

The remedy is easy, though. Equation (2) shows that even with the artificially balanced sampling scheme, our estimates of βi WILL be consistent for i > 0 (since the within-class densities of X won’t change due to the sampling scheme). So, if we have an external estimate of p, we can just substitute it in Equation (3) to get the right intercept term, and then happily do our future classifications.
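
Here is a minimal sketch of that substitution (the helper is illustrative: fit is a logit fitted with glm() on the artificially balanced sample, q is the artificial proportion of Class 1 used in the sampling, and p is the external estimate of the true class probability):

# By Eqn. (3), only the intercept is off: remove the term induced by the
# artificial sampling fraction q, then add the one implied by the true p.
correct_intercept <- function(fit, p, q = 0.5) {
  b <- coef(fit)
  b["(Intercept)"] <- b["(Intercept)"] - log(q / (1 - q)) + log(p / (1 - p))
  b
}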

As an example, consider the UCI Letters data set. There, the various English letters have approximately equal sample sizes, quite counter to what we know about English. But there are good published sources for the true frequencies.

Now, what if we take a nonparametric regression approach? We can still use Equation (1) to make the proper adjustment. For each t at which we wish to predict class membership, we do the following (a short code sketch follows the list):

  • Estimate the left-hand side (LHS) of (1) nonparametrically, using any of the many methods on CRAN, or the version of kNN in my regtools package.
  • Solve for the estimated ratio f(t)/g(t).
  • Plug back into (1), this time with the correct value of (1-p)/p from the external source, now providing the correct value of the LHS.
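
Here is a minimal sketch of those three steps (illustrative: phat is the nonparametric estimate of the LHS of (1) computed on the balanced sample, q is the artificial sampling proportion, and p is the externally sourced true class probability):

adjust_estimate <- function(phat, p, q = 0.5) {
  # Step 2: solve Eqn. (1) for the density ratio f(t)/g(t)
  ratio <- (1 / phat - 1) * q / (1 - q)
  # Step 3: plug the ratio back into Eqn. (1) with the correct (1-p)/p
  1 / (1 + ((1 - p) / p) * ratio)
}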


ThinkToStart is looking for R Authors

By Julian Hillebrand

(This article was first published on ThinkToStart » R Tutorials, and kindly contributed to R-bloggers)

ThinkToStart is growing and we need more people to help us on our mission to bridge the gap between data and business. We are therefore looking for motivated R fans who can help us produce more great content.

What is ThinkToStart?

ThinkToStart started as a blog mainly about R tutorials a little bit more than 2 years ago. After some time I realised that there is a huge need not just for hands-on R tutorials but also for putting them in a business context. So I am currently transforming ThinkToStart from a blog to a platform. I wrote an article about the mission and you can find it here: http://thinktostart.com/thinktostart-bridging-gap-data-business/

Reaching the goal of connecting business and data is not a task that one person can achieve alone; it takes several people working together to spread this message. At the moment ThinkToStart is figuring out how best to communicate it, so we are working on both ends: the data science side with R tutorials, and the business side with the new Social Media category.

R Tutorials:

R tutorials can be about almost anything. I personally focus heavily on Social Media, but I can think of endless topics where R tutorials are needed.

We can offer you the chance to position yourself as a brand and present yourself as an expert in R and data science in general. You can link to all your personal social media accounts or websites in your public author profile, and you can of course tell everybody you are part of ThinkToStart; many readers want to find out more about the author of an article. Because of the focus on different categories, we can also present our work to people who visit the site for other reasons, for example a new social media post. Think of somebody visiting the site because of a new post about how effective Facebook is for marketing, who is then offered a real hands-on R tutorial on exactly this topic. This is how we can show other people the power of R and data science.

We are really excited to get to know you!

If you are interested, please drop us a line here:

The post ThinkToStart is looking for R Authors appeared first on ThinkToStart.


purrr 0.1.0

By hadleywickham

(This article was first published on RStudio Blog, and kindly contributed to R-bloggers)

Purrr is a new package that fills in the missing pieces in R’s functional programming tools: it’s designed to make your pure functions purrr. Like many of my recent packages, it works with magrittr to allow you to express complex operations by combining simple pieces in a standard way.

Install it with:

install.packages("purrr")

Purrr wouldn’t be possible without Lionel Henry. He wrote a lot of the package and his insightful comments helped me rapidly iterate towards a stable, useful, and understandable package.

Map functions

The core of purrr is a set of functions for manipulating vectors (atomic vectors, lists, and data frames). The goal is similar to dplyr: help you tackle the most common 90% of data manipulation challenges. But where dplyr focusses on data frames, purrr focusses on vectors. For example, the following code splits the built-in mtcars dataset up by number of cylinders (using the base split() function), fits a linear model to each piece, summarises each model, then extracts the R^2:

mtcars %>%
  split(.$cyl) %>%
  map(~lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")
#>     4     6     8 
#> 0.509 0.465 0.423

The first argument to all map functions is the vector to operate on. The second argument, .f, specifies what to do with each piece. It can be:

  • A function, like summary().
  • A formula, which is converted to an anonymous function, so that ~ lm(mpg ~ wt, data = .) is shorthand for function(x) lm(mpg ~ wt, data = x).
  • A string or number, which is used to extract components, i.e. "r.squared" is shorthand for function(x) x[["r.squared"]] and 1 is shorthand for function(x) x[[1]].

Map functions come in a few different variations based on their inputs and output:

  • map() takes a vector (list or atomic vector) and returns a list. map_lgl(), map_int(), map_dbl(), and map_chr() take a vector and return an atomic vector. flatmap() works similarly, but allows the function to return arbitrary length vectors.
  • map_if() only applies .f to those elements of the list where .p is true. For example, the following snippet converts factors into characters:
    iris %>% map_if(is.factor, as.character) %>% str()
    #> 'data.frame':    150 obs. of  5 variables:
    #>  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
    #>  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
    #>  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
    #>  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
    #>  $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...

    map_at() works similarly, but instead of working with a logical vector or predicate function, it works with an integer vector of element positions.
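
    A small sketch of map_at() (my own illustration, not from the announcement), assuming the semantics just described:

    list(1, 2, 3) %>%
      map_at(c(1, 3), ~ . * 10) %>%
      str()
    #> List of 3
    #>  $ : num 10
    #>  $ : num 2
    #>  $ : num 30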

  • map2() takes a pair of lists and iterates through them in parallel:
    map2(1:3, 2:4, c)
    #> [[1]]
    #> [1] 1 2
    #> 
    #> [[2]]
    #> [1] 2 3
    #> 
    #> [[3]]
    #> [1] 3 4
    map2(1:3, 2:4, ~ .x * (.y - 1))
    #> [[1]]
    #> [1] 1
    #> 
    #> [[2]]
    #> [1] 4
    #> 
    #> [[3]]
    #> [1] 9

    map3() does the same thing for three lists, and map_n() does it in general.

  • invoke(), invoke_lgl(), invoke_int(), invoke_dbl(), and invoke_chr() take a list of functions, and call each one with the supplied arguments:
    list(m1 = mean, m2 = median) %>%
      invoke_dbl(rcauchy(100))
    #>    m1    m2 
    #> 9.765 0.117
  • walk() takes a vector, calls a function on each piece, and returns its original input. It’s useful for functions called for their side-effects; it returns the input so you can use it in a pipe.
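
For instance, here is a tiny sketch of walk() (my own illustration, not from the announcement): printing is the side-effect, and the original list keeps flowing through the pipe, so length() still sees both elements.

list(1:3, letters[1:2]) %>%
  walk(print) %>%
  length()
#> [1] 1 2 3
#> [1] "a" "b"
#> [1] 2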

Purrr and dplyr

I’m becoming increasingly enamoured with the list-columns in data frames. The following example combines purrr and dplyr to generate 100 random test-training splits in order to compute an unbiased estimate of prediction quality. These tools are still experimental (and currently need quite a bit of extra scaffolding), but I think the basic approach is really appealing.

library(dplyr)
random_group <- function(n, probs) {
  probs <- probs / sum(probs)
  g <- findInterval(seq(0, 1, length = n), c(0, cumsum(probs)),
    rightmost.closed = TRUE)
  names(probs)[sample(g)]
}
partition <- function(df, n, probs) {
  n %>% 
    replicate(split(df, random_group(nrow(df), probs)), FALSE) %>%
    zip_n() %>%
    as_data_frame()
}

msd <- function(x, y) sqrt(mean((x - y) ^ 2))

# Generate 100 random test-training splits
cv <- mtcars %>%
  partition(100, c(training = 0.8, test = 0.2)) %>% 
  mutate(
    # Fit the model
    model = map(training, ~ lm(mpg ~ wt, data = .)),
    # Make predictions on test data
    pred = map2(model, test, predict),
    # Calculate mean squared difference
    diff = map2(pred, test %>% map("mpg"), msd) %>% flatten()
  )
cv
#> Source: local data frame [100 x 5]
#> 
#>                   test             training   model     pred  diff
#>                 (list)               (list)  (list)   (list) (dbl)
#> 1  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.70
#> 2  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  2.03
#> 3  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  2.29
#> 4  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  4.88
#> 5  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.20
#> 6  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  4.68
#> 7  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.39
#> 8  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.82
#> 9  <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  2.56
#> 10 <data.frame [7,11]> <data.frame [25,11]> <S3:lm> <dbl[7]>  3.40
#> ..                 ...                  ...     ...      ...   ...
mean(cv$diff)
#> [1] 3.22

Other functions

There are too many other pieces of purrr to describe in detail here. A few of the most useful functions are noted below:

  • zip_n() allows you to turn a list of lists “inside-out”:

    x <- list(list(a = 1, b = 2), list(a = 2, b = 1))
    x %>% str()
    #> List of 2
    #>  $ :List of 2
    #>   ..$ a: num 1
    #>   ..$ b: num 2
    #>  $ :List of 2
    #>   ..$ a: num 2
    #>   ..$ b: num 1
    
    x %>%
      zip_n() %>%
      str()
    #> List of 2
    #>  $ a:List of 2
    #>   ..$ : num 1
    #>   ..$ : num 2
    #>  $ b:List of 2
    #>   ..$ : num 2
    #>   ..$ : num 1
    
    x %>%
      zip_n(.simplify = TRUE) %>%
      str()
    #> List of 2
    #>  $ a: num [1:2] 1 2
    #>  $ b: num [1:2] 2 1
  • keep() and discard() allow you to filter a vector based on a predicate function. compact() is a helpful wrapper that throws away empty elements of a list.
    1:10 %>% keep(~. %% 2 == 0)
    #> [1]  2  4  6  8 10
    1:10 %>% discard(~. %% 2 == 0)
    #> [1] 1 3 5 7 9
    
    list(list(x = TRUE, y = 10), list(x = FALSE, y = 20)) %>%
      keep("x") %>% 
      str()
    #> List of 1
    #>  $ :List of 2
    #>   ..$ x: logi TRUE
    #>   ..$ y: num 10
    
    list(NULL, 1:3, NULL, 7) %>% 
      compact() %>%
      str()
    #> List of 2
    #>  $ : int [1:3] 1 2 3
    #>  $ : num 7
  • lift() (and friends) allow you to convert a function that takes multiple arguments into a function that takes a list. It helps you compose functions by lifting their domain from one kind of input to another. The domain can be changed to and from a list (l), a vector (v) and dots (d).
  • cross2(), cross3() and cross_n() allow you to create the Cartesian product of the inputs (with optional filtering).
  • A number of functions let you manipulate functions: negate(), compose(), partial(). (A short sketch follows this list.)
  • A complete set of predicate functions provides predictable versions of the is.* functions: is_logical(), is_list(), is_bare_double(), is_scalar_character(), etc.
  • Other equivalent functions wrap existing base R functions into the consistent design of purrr: replicate() -> rerun(), Reduce() -> reduce(), Find() -> detect(), Position() -> detect_index().
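
As a small illustration of the function-manipulation helpers above (my own sketch, assuming negate() flips a predicate and partial() pre-fills arguments, as their names suggest):

library(purrr)
is_even <- function(x) x %% 2 == 0
is_odd  <- negate(is_even)                # negate() flips a predicate
is_odd(3)
#> [1] TRUE
mean_narm <- partial(mean, na.rm = TRUE)  # partial() pre-fills an argument
mean_narm(c(1, 2, NA))
#> [1] 1.5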

Design philosophy

The goal of purrr is not to try and turn R into Haskell: it does not implement currying, or destructuring binds, or pattern matching. The goal is to give you similar expressiveness to a classical FP language, while allowing you to write code that looks and feels like R.

  • Anonymous functions are verbose in R, so we provide two convenient shorthands. For predicate functions, ~ .x + 1 is equivalent to function(.x) .x + 1. For chains of transformation functions, . %>% f() %>% g() is equivalent to function(x) g(f(x)).
  • R is weakly typed, so we can implement general zip_n(), rather than having to specialise on the number of arguments. That said, we still provide map2() and map3() since it’s useful to clearly separate which arguments are vectorised over. Functions are designed to be output type-stable (respecting Postel’s law) so you can rely on the output being as you expect.
  • R has named arguments, so instead of providing different functions for minor variations (e.g. detect() and detectLast()) we use named arguments.
  • Instead of currying, we use ... to pass in extra arguments. Arguments of purrr functions always start with . to avoid matching to the arguments of .f passed in via ....
  • Instead of point free style, use the pipe, %>%, to write code that can be read from left to right.


Gallery of Images from Boston EARL Conference

By Angela Roberts

(This article was first published on Mango Solutions, and kindly contributed to R-bloggers)

Were you at the EARL London R Conference this year? See if you can find yourself in our selection of images of this popular event.


[Photo gallery: images from EARL London 2015]

Look good? Wish you were there? There’s still a chance to come to EARL Boston on 2-4 November 2015.

The EARL Boston 2015 conference will be held at the Microsoft NERD Exhibition Center in Cambridge, Massachusetts. With stunning views over the Charles River and a great location within Cambridge, it is the perfect venue for a city conference. The Conference Reception will take place on Tuesday 3rd November in the Boston Museum of Science.

With over 40 speakers, 4 workshops and 2 evening drinks receptions, don’t miss out.

Take a look here
