RcppGetconf 0.0.2

By Thinking inside the box

(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

A first update for the recent RcppGetconf package for reading system configuration — not unlike getconf from the libc library — is now out. Almost immediately after I tweeted / blogged asking for help with OS X builds, fellow Rcpp hacker Qiang Kou created a clever pull request allowing for exactly that. So now we cover two POSIX systems that matter most these days — Linux and OS X — but as there are more out there, please do try, test and send those pull requests.

We also added a new function getConfig() for retrieving a single (given) value, complementing the earlier catch-all function getAll(). You can find out more about RcppGetconf from the local RcppGetconf page and the GitHub repo.

Changes in this release are as follows:

Changes in RcppGetconf version 0.0.2 (2016-08-01)

  • A new function getConfig for single values was added.

  • The struct vars is now defined more portably, allowing compilation on OS X (PR #1 by Qiang Kou).

Courtesy of CRANberries, there is a diffstat report. More about the package is at the local RcppGetconf page and the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box .

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

Using parameter and multiparameters with sp_execute_external_script

By TomazTsql

(This article was first published on R – TomazTsql, and kindly contributed to R-bloggers)

With the RTM version of SQL Server 2016, the sp_execute_external_script stored procedure has undergone a couple of changes prior to its final form. The parametrization of this external procedure somewhat resembles that of a typical extended stored procedure.

Indeed, sp_execute_external_script is an extended stored procedure written using CLR (whereas ordinary stored procedures are written natively in T-SQL), and its main purpose is to run external commands that a normal T-SQL stored procedure could not handle.

Those who are (or have been) working with any kind of external stored procedure, or with stored procedures using

AS { EXTERNAL NAME assembly_name.class_name.method_name }

will be familiar with the sp_execute_external_script notation.

  EXECUTE sys.sp_execute_external_script
          @language = 
         ,@script = 
         ,@input_data_1 = 
         ,@input_data_1_name =
         ,@output_data_1_name =
         ,@parallel =
         ,@params = 
         ,@parameter1 =

Parameters @params and @parameter1 are interesting, but what might be a bit puzzling are the numbers at the end of names such as @input_data_1 and @input_data_1_name. As far as I have found out, they have no technical meaning: they do not enumerate anything, and if you create an @input_data_2 parameter by analogy, you will get an error in return. In a way this error is to be expected, since joining two SQL statements into one R data set would make no sense. More likely, the numbers simply indicate that data parameters can be enumerated within the string value of a particular input parameter, and that you need at least one of them when using this parameter.

The parameters with an enumerator number in their names can hold several values, and @params and @parameter1 are paired, just like @input_data_1 and @input_data_1_name:

@params is the list of input parameter declarations, and

@parameter1 is the list of values for those input parameters.

Simple example would be getting Chi-Square value and statistical significance in one run out of R:

USE WideWorldImporters;
GO

DECLARE @F_Value VARCHAR(1000)
DECLARE @Signif VARCHAR(1000)


  EXECUTE sys.sp_execute_external_script
          @language = N'R'
         ,@script = N'mytable <- table(WWI_OrdersPerCustomer$CustomerID, WWI_OrdersPerCustomer$Nof_Orders) 
                     data.frame(margin.table(mytable, 2))
                     Ch <- unlist(chisq.test(mytable))
                     F_Val <- as.character(Ch[1])
                     Sig <- as.character(Ch[3])'
         ,@input_data_1 = N'select TOP 10 CustomerID, count(*) as Nof_Orders 
from [Sales].[Orders] GROUP BY CustomerID'
         ,@input_data_1_name = N'WWI_OrdersPerCustomer'
         ,@params = N' @F_Val VARCHAR(1000) OUTPUT, @Sig VARCHAR(1000) OUTPUT'
         ,@F_Val = @F_Value OUTPUT
         ,@Sig = @Signif OUTPUT


SELECT 
       @F_Value AS CHI_Value
      ,@Signif AS CHI_Square_SIGNIFICANCE;
GO
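Stripped of its T-SQL wrapper, the R part of the script above boils down to a few lines. Here is a stand-alone sketch using made-up order counts in place of the WideWorldImporters query result:

```r
# Hypothetical stand-in for the WWI_OrdersPerCustomer input data set
WWI_OrdersPerCustomer <- data.frame(
  CustomerID = 1:5,
  Nof_Orders = c(4, 2, 7, 3, 4)
)

mytable <- table(WWI_OrdersPerCustomer$CustomerID,
                 WWI_OrdersPerCustomer$Nof_Orders)

# unlist() flattens the htest object returned by chisq.test();
# element 1 is the test statistic, element 3 the p-value
# (suppressWarnings: the tiny made-up table triggers a small-counts warning)
Ch <- unlist(suppressWarnings(chisq.test(mytable)))
F_Val <- as.character(Ch[1])  # chi-squared statistic
Sig   <- as.character(Ch[3])  # statistical significance (p-value)
```

The two character values at the end are exactly what the stored procedure passes back through the @F_Val and @Sig output parameters.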

With @params and @parameter1 I was able to get two separate values from the result list of a statistical test (against some sample data) in one run. Of course, the result of the unlist function could be added to a data.frame and parsed more easily, but what if I also wanted the data displayed as frequencies together with the test of statistical significance? I can simply do:

USE WideWorldImporters;
GO

DECLARE @F_Value VARCHAR(1000)
DECLARE @Signif VARCHAR(1000)


  EXECUTE sys.sp_execute_external_script
          @language = N'R'
         ,@script = N'mytable <- table(WWI_OrdersPerCustomer$CustomerID, 
WWI_OrdersPerCustomer$Nof_Orders) 
                     data.frame(margin.table(mytable, 2))
                     Ch <- unlist(chisq.test(mytable))
                     F_Val <- as.character(Ch[1])
                     Sig <- as.character(Ch[3])
                     OutputDataSet<-data.frame(margin.table(mytable, 2))'
         ,@input_data_1 = N'select TOP 10 CustomerID, count(*) as Nof_Orders 
from [Sales].[Orders] GROUP BY CustomerID'
         ,@input_data_1_name = N'WWI_OrdersPerCustomer'
         ,@params = N' @F_Val VARCHAR(1000) OUTPUT, @Sig VARCHAR(1000) OUTPUT'
         ,@F_Val = @F_Value OUTPUT 
         ,@Sig = @Signif OUTPUT
 WITH RESULT SETS(
                  (Cust_data INT
                  ,Freq INT)
                  )

SELECT @F_Value AS CHI_Value
    ,@Signif AS CHI_Square_SIGNIFICANCE

As you can see, a WITH RESULT SETS clause has been added, and the R script now defines three outputs: one for the data.frame output and two variables returned through output parameters for the statistical test; as shown in the screenshot:

Such export of the results is always very useful, whether in Reporting Services, in Power BI, or simply in SSMS when inspecting the results.

Code available at Github.

Happy R-SQLing!

To leave a comment for the author, please follow the link and comment on their blog: R – TomazTsql.


Source:: R News

7 new R jobs from around the world (2016-08-01)

By Tal Galili


Here are the new R Jobs for 2016-08-01.

To post your R job on the next post

Just visit this link and post a new R job to the R community. You can either post a job for free (which works great), or pay $50 to have your job featured (and get extra exposure).

Current R jobs

Job seekers: please follow the links below to learn more and apply for your R job of interest:

New Featured Jobs

  1. Full-Time
    Data Scientist – Analytics
    Booking.com – Posted by work_at_booking
    Anywhere
    1 Aug 2016
  2. Full-Time
    Economic Analyst
    Thumbtack – Posted by Thumbtack
    Anywhere
    26 Jul 2016
  3. Full-Time
    Research Analyst
    National Network for Safe Communities – Posted by NNSCjobs
    New York
    New York, United States
    21 Jul 2016
  4. Full-Time
    Research and Analytics Associate
    Hodges Ward Elliott – Posted by tkiely@hwerealestate.com
    New York
    New York, United States
    20 Jul 2016
  5. Full-Time
    Monitoring and Data Analytics Officer
    International Labour Organization – Better Work programme – Posted by eisenbraun
    Genève
    Genève, Switzerland
    20 Jul 2016
  6. Full-Time
    Junior Data Scientist @ Enkhuizen, Noord-Holland, Netherlands
    Enza Zaden – Posted by EnzaZaden
    Enkhuizen
    Noord-Holland, Netherlands
    18 Jul 2016
  7. Full-Time
    Data Analytics Associate
    Income Discovery – Posted by ANunan
    Hoboken
    New Jersey, United States
    11 Jul 2016
  8. Full-Time
    R Programming Rock Star (10080)
    Object Systems International – Posted by snielsen@objectsystems.com
    Salt Lake City
    Utah, United States
    7 Jul 2016
  9. Full-Time
    Data Scientist / Quantitative Analyst
    Sporting Data Limited – Posted by sportingdata
    London
    England, United Kingdom
    27 Jun 2016
  10. Full-Time
    Senior Data Scientist
    Global Strategy Group – Posted by datanorms
    New York
    New York, United States
    20 Jun 2016

More New Jobs

  1. Full-Time
    Data Scientist – Analytics
    Booking.com – Posted by work_at_booking
    Anywhere
    1 Aug 2016
  2. Full-Time
    Economic Analyst
    Thumbtack – Posted by Thumbtack
    Anywhere
    26 Jul 2016
  3. Full-Time
    Research Analyst
    National Network for Safe Communities – Posted by NNSCjobs
    New York
    New York, United States
    21 Jul 2016
  4. Full-Time
    Research and Analytics Associate
    Hodges Ward Elliott – Posted by tkiely@hwerealestate.com
    New York
    New York, United States
    20 Jul 2016
  5. Full-Time
    Monitoring and Data Analytics Officer
    International Labour Organization – Better Work programme – Posted by eisenbraun
    Genève
    Genève, Switzerland
    20 Jul 2016
  6. Full-Time
    Worldwide CoC Team – Data Scientist and Managing Strategy Consultant
    IBM – Posted by andywxman
    Anywhere
    20 Jul 2016
  7. Full-Time
    Director of Quantitative Analysis & Research @ Phoenix, Arizona, United States
    Arizona State Retirement System – Posted by tracyd
    Phoenix
    Arizona, United States
    19 Jul 2016

In R-users.com you can see all the R jobs that are currently available.

R-users Resumes

R-users also has a resume section which features CVs from over 200 R users. You can submit your resume (as a “job seeker”) or browse the resumes for free.

(you may also look at previous R jobs posts).

Source:: R News

Sportsbook Betting (Part 1): Odds

By Andrew Collier


(This article was first published on R – Exegetic Analytics, and kindly contributed to R-bloggers)


This series of articles was written as support material for Statistics exercises in a course that I’m teaching for iXperience. In the series I’ll be using illustrative examples for wagering on a variety of Sportsbook events including Horse Racing, Rugby and Tennis. The same principles can be applied across essentially all betting markets.

Odds

To make some sense of gambling we’ll need to understand the relationship between odds and probability. Odds can be expressed as either “odds on” or “odds against”. Whereas the former is the odds in favour of an event taking place, the latter reflects the odds that an event will not happen. Odds against is the form in which gambling odds are normally expressed, so we’ll focus on that. The odds against are defined as the ratio, L/W, of losing outcomes (L) to winning outcomes (W). To link these odds to probabilities we note that the winning probability, p, is W/(L+W). The odds against are thus equivalent to (1-p)/p.

To make this more concrete, consider the odds against rolling a 6 with a single die. The number of losing outcomes is L = 5 (for all of the other numbers on the die: 1, 2, 3, 4 and 5) while the number of winning outcomes is W = 1. The odds against are thus 5/1, while the winning probability is 1/(5+1) = 1/6.
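The die example can be checked directly in R; a minimal sketch of the definitions above:

```r
# Odds against rolling a 6 with a single die
L <- 5                    # losing outcomes: 1, 2, 3, 4, 5
W <- 1                    # winning outcomes: 6
odds_against <- L / W     # 5, i.e. odds of 5/1
p <- W / (L + W)          # winning probability 1/6
# The odds against are equivalent to (1 - p) / p
all.equal(odds_against, (1 - p) / p)
```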

Fractional Odds

Fractional odds are quoted as L/W, L:W or L-W. From a gambler’s perspective these odds reflect the net winnings relative to the stake. For example, fractional odds of 5/1 imply that the gambler stands to make a profit of 50 on a stake of 10. In addition to the profit, a winning gambler gets the stake back too. So, in the previous scenario, the gambler would receive a total of 60. Conversely, factional odds of 1/2 would pay out 10 for a stake of 20. Odds of 1/1 are known as “even odds” or “even money”, and will pay out the same amount as was wagered.

The numerator and denominator in fractional odds are always integers.

In a fair game a player who placed a wager at fractional odds of L/W would reasonably expect to win L/W times his wager.
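These payout rules are easy to sketch as a small helper function (the function name is my own, not from any betting library):

```r
# Net profit and total payout for a winning wager at fractional odds L/W
fractional_payout <- function(L, W, stake) {
  profit <- stake * L / W
  c(profit = profit, total = profit + stake)
}

fractional_payout(5, 1, 10)  # profit 50, total 60
fractional_payout(1, 2, 20)  # profit 10, total 30
```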

Decimal Odds

Decimal odds quote the ratio of the full payout (including original stake) to the stake. Using the same symbols as above, this is equivalent to the ratio (L+W)/W or 1+L/W. The decimal odds are numerically equal to the fractional odds plus 1. In a fair game the decimal odds are also the inverse of the probability of a winning outcome. This makes sense because the inverse of the decimal odds is W/(L+W).

From a gambler’s perspective, decimal odds reflect the gross total which will be paid out relative to the stake. For example, decimal odds of 6.0 are equivalent to fractional odds of 5/1 and imply that the gambler stands to get back 60 on a stake of 10. Similarly, decimal odds of 1.5 are the same as fractional odds of 1/2, and a winning gambler would get back 30 on a wager of 20.

Decimal odds are quoted as a positive number greater than 1.
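The conversion from fractional to decimal odds is a one-liner; a quick sketch:

```r
# Decimal odds are the fractional odds plus one: (L + W) / W = 1 + L / W
decimal_odds <- function(L, W) (L + W) / W

decimal_odds(5, 1)  # 6.0, equivalent to fractional odds of 5/1
decimal_odds(1, 2)  # 1.5, equivalent to fractional odds of 1/2
```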

Odds and Probability

As indicated above, there is a direct relationship between odds and probabilities. For a fair game, this relationship is simple: the probabilities are the reciprocal of the decimal odds. And for a fair game, the sum of the probabilities of all possible outcomes must be 1.

The reciprocal relationship between decimals odds and probabilities implies that outcomes with the lowest odds are the most likely to be realised. This might not tie up with the conventional understanding of odds, but is a consequence of the fact that we are looking at the odds against that outcome.
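For a fair market the reciprocal relationship is easy to verify; here with three hypothetical outcomes whose decimal odds are chosen to make the game fair:

```r
# A fair three-outcome market: decimal odds of 6, 3 and 2
odds <- c(6, 3, 2)
probability <- 1 / odds
sum(probability)  # 1 for a fair game
```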


Example: Fair Odds on Rugby

The Crusaders are playing the Hurricanes at the AMI Stadium. A bookmaker is offering 1/2 odds on the Crusaders and 2/1 odds on the Hurricanes. These fractional odds translate into decimal odds of 1.5 and 3.0 respectively. Based on these odds, the implied probabilities of either team winning are

> (odds = c(Crusaders = 1.5, Hurricanes = 3))
 Crusaders Hurricanes 
       1.5        3.0 
> (probability = 1 / odds)
 Crusaders Hurricanes 
   0.66667    0.33333 

The Crusaders are perceived as being twice as likely to win. Since they are clearly the favourites for this match it stands to reason that there would be more wagers placed on the Crusaders than on the Hurricanes. In fact, on the basis of the odds we would expect there to be roughly twice as much money placed on the Crusaders.

A successful wager of 10 on the Crusaders would yield a net win of 5, while the same wager on the Hurricanes would stand to yield a net win of 20. If we include the initial stake then we get the corresponding gross payouts of 15 and 30.

> (odds - 1) * 10                                          # Net win
Crusaders Hurricanes 
5         20 
> odds * 10                                                # Gross Win
Crusaders Hurricanes 
15         30 

In keeping with the reasoning above, suppose that a total of 2000 was wagered on the Crusaders and 1000 was wagered on the Hurricanes. In the event of a win by the Crusaders the bookmaker would keep the 1000 wagered on the Hurricanes, but pay out 1000 on the Crusaders wagers, leaving no net profit. Similarly, if the Hurricanes won then the bookmaker would pocket the 2000 wagered on the Crusaders but pay out 2000 on the Hurricanes wagers, again leaving no net profit. The bookmaker’s expected profit based on either outcome is zero. This does not represent a very lucrative scenario for a bookmaker. But, after all, this is a fair game.
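The balanced-book arithmetic above can be reproduced in a few lines, using the stakes assumed in the text:

```r
odds    <- c(Crusaders = 1.5, Hurricanes = 3.0)   # decimal odds
wagered <- c(Crusaders = 2000, Hurricanes = 1000) # total staked on each team

total_taken    <- sum(wagered)    # 3000 taken in wagers
payout_if_wins <- odds * wagered  # gross payout if that team wins
total_taken - payout_if_wins      # bookmaker profit in each case: 0 and 0
```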

From a punter’s perspective, a wager on the Crusaders is more likely to be successful, but is not particularly rewarding. By contrast, the likelihood of a wager on the Hurricanes paying out is lower, but the potential reward is appreciably higher. The choice of a side to bet on would then be dictated by the punter’s appetite for risk and excitement (or perhaps simply their allegiance to one team or the other).

The expected outcome, which weights the payout by its likelihood, of a wager on either the Crusaders or the Hurricanes is zero.

> (probability = c(win = 2, lose = 1) / 3)                 # Wager on Crusaders
win    lose 
0.66667 0.33333 
> payout = c(win = 0.5, lose = -1)
> sum(probability * payout)
[1] 0
> (probability = c(win = 1, lose = 2) / 3)                 # Wager on Hurricanes
win    lose 
0.33333 0.66667 
> payout = c(win = 2, lose = -1)
> sum(probability * payout)
[1] 0

Again this is because the odds represent a fair game.

Most games of chance are not fair, so the situation above represents a special (and not very realistic) case. Let’s look at a second example which presents the actual odds being quoted by a real bookmaker.

Example: Real Odds on Tennis

The odds below are from an online betting website for the tennis match between Madison Keys and Venus Williams. These are real, live odds and the implications for the player and the bookmaker are slightly different.

We’ll focus our attention on the overall winner, for which the decimal odds on Madison Keys are 1.83, while those on Venus Williams are 2.00.

> (odds = c(Madison = 1.83, Venus = 2.00))
Madison   Venus 
   1.83    2.00 
> (probability = 1 / odds)
Madison   Venus 
0.54645 0.50000 

The first thing you have probably noticed is that the implied probabilities do not sum to 1. We’ll return to this point in the next article.

The odds quoted for the two players are very similar, which implies that the bookmaker considers them to be evenly matched. Madison Keys has slightly lower odds, which suggests that she is seen as the slightly stronger contender. A wager on either player will not yield major rewards because of the low odds. At the same time, however, a wager on either player has a similar probability of success: both around 50%.

Let’s look at another match. Below are the odds from the same online betting website for the game between Novak Djokovic and Radek Stepanek.


The odds for this match are profoundly different from those for the ladies’ match above.

> (odds = c(Novak = 1.03, Radek = 16.00))
Novak Radek 
 1.03 16.00 
> (probability = 1 / odds)
  Novak   Radek 
0.97087 0.06250 

Novak Djokovic is considered the almost certain winner. A wager on him thus stands to return only 3% in winnings. Radek Stepanek, on the other hand, is a rank outsider in this match. His perceived chance of winning is low, so the potential returns should he win are large.
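Expressed as a percentage net return on a winning wager, the contrast is stark:

```r
odds <- c(Novak = 1.03, Radek = 16.00)
(odds - 1) * 100  # net return in percent: 3 for Djokovic, 1500 for Stepanek
```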

In the next instalment we’ll examine how bookmakers’ odds ensure their profit yet provide a potentially rewarding (and entertaining) experience for gamblers.

The post Sportsbook Betting (Part 1): Odds appeared first on Exegetic Analytics.

To leave a comment for the author, please follow the link and comment on their blog: R – Exegetic Analytics.


Source:: R News

Pipe-friendly bootstrapping with list-variables in #rstats

By Daniel

(This article was first published on R – Strenge Jacke!, and kindly contributed to R-bloggers)

A few days ago, my package sjstats was updated on CRAN. Most functions of this package are convenient functions for common statistical computations, especially for (mixed) regression models. This latest update introduces some pipe-friendly bootstrapping-methods, namely bootstrap(), boot_ci(), boot_se() and boot_p(). In this post, I just wanted to give a quick example of these functions, used within a pipeline-workflow.

First, load the required libraries:

library(dplyr)
library(sjstats)

Now, load the sample data and fit a regular model. The model estimates how the dependency (e42dep) of an older person relates to the burden of care (neg_c_7) of the person who provides care to them:

data(efc)
fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)

I demonstrate the boot_ci()-function, so the confidence intervals are of interest here:

confint(fit)

>                  2.5 %    97.5 %
> (Intercept)  5.3374378 7.7794162
> e42dep       1.2929451 1.7964296
> c161sex     -0.1193198 0.9871336

Now let’s see how to obtain bootstrapped confidence intervals for this model. First, the bootstrap()-function generates bootstrap replicates and returns a data frame with just one column, $strap, which is a list-variable containing the bootstrap samples:

bootstrap(efc, 1000)

This is what the list-variable looks like:

# A tibble: 1,000 x 1
                     strap
                    <list>
1  <data.frame [908 x 26]>
2  <data.frame [908 x 26]>
3  <data.frame [908 x 26]>
4  <data.frame [908 x 26]>
5  <data.frame [908 x 26]>
6  <data.frame [908 x 26]>
7  <data.frame [908 x 26]>
8  <data.frame [908 x 26]>
9  <data.frame [908 x 26]>
10 <data.frame [908 x 26]>
# ... with 990 more rows

Since all data frames are saved in a list, you can use lapply() to easily run the same linear model (used above) over all bootstrap samples and save these fitted model objects in another list-variable (named models in the example below). Then, using lapply() again, we can extract the coefficient of interest (here the second coefficient, the estimate for e42dep) from each "bootstrap" model and save these coefficients in another variable (named dependency in the example below). Finally, we use the boot_ci()-function to calculate confidence intervals of the bootstrapped coefficients.

The complete code looks like this:

efc %>% 
  # generate bootstrap replicates, saved in
  # the list-variable 'strap'
  bootstrap(1000) %>% 
  # run linear model on all bootstrap samples
  mutate(models = lapply(.$strap, function(x) {
    lm(neg_c_7 ~ e42dep + c161sex, data = x)
  })) %>%
  # extract coefficient for "e42dep" (dependency) variable
  mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>%
  # compute bootstrapped confidence intervals
  boot_ci(dependency)

And the result (depending on your random seed) is:

conf.low conf.high 
1.303847  1.790724

The complete code, put together:

library(dplyr)
library(sjstats)
data(efc)

fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)
confint(fit)

efc %>% 
  bootstrap(1000) %>% 
  mutate(models = lapply(.$strap, function(x) {
    lm(neg_c_7 ~ e42dep + c161sex, data = x)
  })) %>%
  mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>%
  boot_ci(dependency)


To leave a comment for the author, please follow the link and comment on their blog: R – Strenge Jacke!.


Source:: R News

Results from the R Shapefile Contest!

By Ari Lamstein

(This article was first published on R – AriLamstein.com, and kindly contributed to R-bloggers)

Today I am happy to announce the results from the R Shapefile Contest.

The contest was an incredible success – there were 19 entries that covered a range of topics. Each entry was well thought out, and I encourage you to read each of them.

Here are the entries, in order of submission:

Bonus: Get all the entries as a PDF!
  • Cambridge Outdoor Lighting Ordinance, by Kent S Johnson
  • Long-Term View of Tornado Risk: County-Level Tornado Rates Adjusted for Population & Exposure, by James B. Elsner, Thomas H. Jagger, and Tyler Fricker
  • Airport Effects on U.S. County Unemployment Rates, by Robby Powell
  • Australian Federal Election 2016 – Polling Place Breakdown, by Jonathan Carroll
  • National Propensity to Cycle Tool, by the Propensity to Cycle Tool team
  • Getting Started With CaricRture, by Chris Brunsdon
  • Spatial neighbors in R – an interactive illustration, by Kyle Walker
  • Spatstat_Object_To_Shapefile.R, by David Maupin
  • Working with Shapefiles, by Dennis Chandler
  • Crime in Greece in 2010, by Nikos Papakonstantinou
  • R & Shapefile, a short script, by geoobserver
  • SociocaRtograpy: Delhi Crime Map, by Parth Khare
  • Hong Kong Population Center of Gravity (COG), by Fung Yip
  • Overview of ground-based rainfall measurement network data quality for Venezuela, by Andrew Sajo
  • Venezuelan rainfall dynamics, by Andrew Sajo
  • Washington, DC Parking Violations, by Andrew Breza
  • Marine Boundaries in R: Reading EEZ Shapefiles, by Daniel Palacios
  • London Crime Analysis, by Henry Partridge
  • Twitter Sentiment analysis of Trump and Clinton, by Charlie Thompson

Please join me in thanking each of the entrants!

Goals of the Contest

As a reminder, the goal of the contest was to “do something in R, with a shapefile, that does something other than make a choropleth map”. This goal was entirely selfish: I have spent years analyzing data using choropleth maps. But as I don’t have a background in geospatial statistics, I am really not aware of what other analytical techniques I can be using. I hoped that by running a contest I could learn some more useful techniques that I could then apply to my own work.

And the winner is …

There are actually two winners to the contest. They both provided concise explanations, and real-world demonstrations, of geospatial concepts that I was simply not aware of.

  • Spatial neighbors in R – an interactive illustration by Kyle Walker. Kyle is a geography professor, which may have helped him intuitively understand the types of analyses that I was looking for. His entry demonstrates different definitions of “neighbor” in spatial statistics, and how those definitions can affect interpretations of the data.
  • London Crime Analysis by Henry Partridge goes a step further. Henry developed an application to map different types of crime in London. He then used Moran’s I to calculate spatial autocorrelation. There were actually several entries that dealt with mapping crime, but only Henry’s entry introduced this extra step beyond a choropleth map.

It’s worth pointing out that both of the winning entries used RStudio’s Shiny framework.

Honorable Mention

Several entries besides the winners stood out as teaching me something new in the area of R and shapefiles in a concise, enjoyable way:

Prizes

As a reminder, both of the winners will get two prizes:

  1. A free copy of my course Mapmaking in R with Choroplethr ($99 value) and
  2. A free copy of my course Shapefiles for R Programmers ($99 value).

I will be in touch with the winners today about how to get their copies of the courses.

Bonus: Get all the entries as a PDF!

The post Results from the R Shapefile Contest! appeared first on AriLamstein.com.

To leave a comment for the author, please follow the link and comment on their blog: R – AriLamstein.com.


Source:: R News