Dockerizing a Shiny App

By Flavio Barros

(This article was first published on Flavio Barros » r-bloggers, and kindly contributed to R-bloggers)

After a long pause of more than four months, I am finally back to post here. Unfortunately, many commitments kept me from posting, but on my return I changed the deployment (this blog now runs entirely inside a Docker container, along with some other cool things I intend to post about soon) and wrote this post.

1. R and Shiny apps

If you are reading this post here, you probably know what Shiny is. But in case you don’t, you can see it in action! This is the app that I dockerized. Soon you will be able to run it on any computer with Docker installed.

2. Docker

If you follow open source news at all, you have probably heard of Docker. Docker is a fantastic tool for creating software containers that are totally isolated from the host operating system. It works like a virtual machine, but it is much lighter.

The idea behind Docker is that the developer creates a container with all the dependencies they need, makes sure that everything works, and is done. The staff responsible for deployment does not need to know what is inside the container; they just need to be able to run it on the server. While this could be achieved with virtual machines, VMs end up bundling much more than necessary, so their image files are very large and the host system becomes slow.

On the other hand, Docker does not use a full OS; it shares the host kernel (yes, it needs to run on Linux) while providing a completely isolated environment. Running a Docker container is therefore much lighter than running a virtual machine. Docker's features do not stop there: it also offers a kind of versioning, and it has a sort of GitHub for containers, the Docker Hub, where users can download ready-made images for various software stacks, such as MySQL, Postgres, LAMP, WordPress, RStudio, among others. If you want to better understand what Docker is, watch this video.

3. Dockerizing a Shiny app

I just showed you an example of a Shiny app running locally in RStudio. For development that's fine, but what if I want to make it available to anyone? One solution is to send the project files. For a basic Shiny application only two files are needed (ui.R and server.R).
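To make that concrete, here is a minimal pair of files, a sketch only (the slider and histogram are illustrative, not taken from the dockerized app):

```r
## ui.R -- the user interface
library(shiny)
fluidPage(
  titlePanel("Minimal Shiny app"),
  sliderInput("n", "Observations:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

## server.R -- the server logic (kept in its own file alongside ui.R)
library(shiny)
function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste(input$n, "random normals"))
  })
}
```

With both files in one folder, `runApp("folder")` serves the app locally.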

But what if I want to put on the web? There are two alternatives:

1) A Shiny Server

2) The PaaS shinyapps.io

Option 1) can be complicated for some users, and sometimes not workable, because of the need to install and configure a server.

Option 2) is more interesting; however, it can be expensive, since the free plan may be too limited for some needs.

How can Docker help? To start with, Docker lets you create a Shiny Server with a single command, which greatly simplifies deployment of a server. See this short video:

You just need:

docker run --rm -p 3838:3838 rocker/shiny

WINDOWS AND MAC USERS: You will need boot2docker to reproduce this.

This seems to solve the problem. However, you may still run into several questions, such as:

1) How can I put my apps on the server?

2) How can I get a URL pointing directly to my app?

3) And this port 3838, how can I change it?

4) How can I create an image for my own app?

To answer these questions I created a sample container, with a sample app that opens in the browser while the image is running. It’s available on Docker Hub, ready to test and use. The source code is on GitHub.

In the following videos I show how to deploy this app locally and on Digital Ocean. First locally:

And on Digital Ocean:

Note that with the command above you do not return straight to the terminal, and you will need Ctrl + C to close the container. To keep the container running in the background and get the terminal back, append &:

docker run --rm -p 80:80 flaviobarros/shiny &
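The same flags also answer the earlier questions about serving your own apps and changing the port if you stay with the stock rocker/shiny image: map the host port you want onto the container's 3838, and mount a directory of apps into Shiny Server's app folder (the host path below is illustrative):

```shell
# Each subdirectory of ~/myapps that contains ui.R and server.R
# is served as an app under http://<host>/<subdirectory>/
docker run --rm -p 80:3838 \
    -v ~/myapps:/srv/shiny-server/ \
    rocker/shiny &
```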

You can run this app on Amazon Web Services, Google Cloud and Microsoft Azure, as all of them support Docker. My suggestion, however, is Digital Ocean, which is a lot easier to use.

IMPORTANT: through any Digital Ocean link in this post, you will earn US$10.00 in credit, with no commitment to keep the service. With this credit you can keep a simple VPS with 512MB RAM running for two months, for free!

To leave a comment for the author, please follow the link and comment on his blog: Flavio Barros » r-bloggers.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

Le Monde puzzle [#909]

By xi’an

(This article was first published on Xi’an’s Og » R, and kindly contributed to R-bloggers)

Another of those “drop-a-digit” Le Monde mathematical puzzle:

Find all integers n with 3 or 4 digits and a single interior zero digit, such that removing that zero digit produces a divisor of n.

As in puzzle #904, I made use of the digin R function:

digin=function(n){
  as.numeric(strsplit(as.character(n),"")[[1]])}

and simply checked all integers up to 10⁶:

plura=divid=NULL
for (i in 101:10^6){
 dive=rev(digin(i))
 if ((min(dive[1],rev(dive)[1])>0)&
    (sum((dive[-c(1,length(dive))]==0))==1)){
   dive=dive[dive>0]
   dive=sum(dive*10^(0:(length(dive)-1)))
 if (i==((i%/%dive)*dive)){
   plura=c(plura,i)
   divid=c(divid,dive)}}}

which leads to the output

> plura
[1]   105   108   405  2025  6075 10125 30375 50625 70875
> plura/divid
[1] 7 6 9 9 9 9 9 9 9

leading to the conclusion there is no solution beyond 70875. (Allowing for more than a single zero within the inner digits sees many more solutions.)
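As a quick sanity check in base R (independent of the loop above): dropping the zero digit of the smallest solution, 105, leaves 15, and 15 divides 105 exactly, matching the ratio of 7 reported above.

```r
n <- 105
m <- as.numeric(gsub("0", "", as.character(n)))  # drop the zero digit
stopifnot(m == 15, n %% m == 0, n / m == 7)
```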

To leave a comment for the author, please follow the link and comment on his blog: Xi’an’s Og » R.


Introduction to Applied Econometrics With R

By Dave Giles

(This article was first published on Econometrics Beat: Dave Giles’ Blog, and kindly contributed to R-bloggers)
I came across a January post from David Smith at Revolution Analytics, in his Revolutions blog. It’s titled, An Introduction to Applied Econometrics With R, and it refers to a very useful resource that’s been put together by Bruno Rodrigues of the University of Strasbourg. It’s called Introduction to Programming Econometrics With R, and you can download it from here.
Bruno’s material is a work in progress, but it’s definitely worth checking out if you’re looking for something to help economics students learn about R in an introductory statistics/econometrics course.

© 2015, David E. Giles

To leave a comment for the author, please follow the link and comment on his blog: Econometrics Beat: Dave Giles’ Blog.


The First NY R Conference

By Joseph Rickert

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Last Friday and Saturday the NY R Conference briefly lit up Manhattan’s Union Square neighborhood as the center of the R world. You may have caught some of the glow on twitter. Jared Lander, volunteers from the New York Open Statistical Programming Meetup along with the staff at Workbench (the conference venue) set the bar pretty darn high for a first time conference.

The list of speakers was impressive (a couple of the presentations approached the sublime), the venue was bright and upscale, the food was good, and some of the best talks ran way over the time limit but somehow the clock slowed down to sync to the schedule.

But the best part of the conference was the vibe! It was a sweet brew of competency, cooperation and fun. The crowd, clearly out to enjoy themselves, provided whatever lift the speakers needed to be at the top of their game. For example, when near the very end of the second day Stefan Karpinsky’s PC just “up and died” as he was about to start his Julia to R demo the crowd hung in there with him and Stefan managed an engaging, ad lib, no visuals 20 minute talk. It was also uncanny how the talks seemed to be arranged in just the right order. Mike Dewar, a data scientist with the New York Times, gave the opening presentation which featured some really imaginative and impressive data visualizations that wowed the audience. But Bryan Lewis stole back the thunder, and the applause, later in the morning when as part of his presentation on htmlwidgets he reproduced Dewar’s finale viz with mushroom data.

Bryan has posted his slides on his site here along with a promise to post all of the code soon.

The slides from all of the presentations have yet to be posted on the NY R Conference website. So, all I can do here today is to provide an opportunity sample drawn from postings I have managed to find scattered about the web. Here are Winston Chang’s talk on Dashboarding with Shiny, Jared Lander’s talk on Making R Go Faster and Bigger, Wes McKinney’s talk on Data Frames, my talk on Reproducibility with the checkpoint package, and Max Richman’s talk on R for Survey Analysis.

For the rest of the presentations, we will have to wait for the slides to become available on the conference site. There is a lot to look forward to: Vivian Peng’s presentation on Storytelling and Data Visualization will be worth multiple viewings and you will not want to miss Hilary Parker’s hilarious “grand slam” talk on Reproducible Analysis in Production featuring explainR and complainR. But for sure, look for Andrew Gelman’s talk: But When You Call Me A Bayesian I Know I’m Not the Only One. Gelman delivered what was possibly the best technical talk ever, but we will have to wait for the conference video to reassess that.

Was Gelman’s talk really the best ever, or was it just the magic of his delivery and the mood of the audience that made it seem so? Either way, I’m glad I was there.

To leave a comment for the author, please follow the link and comment on his blog: Revolutions.


EARL2015 Conference Agenda (London) Announced

By Liz Matthews

(This article was first published on Mango Solutions, and kindly contributed to R-bloggers)

The full agenda for the London EARL2015 Conference has now been announced. The conference focusses on the real world commercial usage and business applications of the R Language and features speakers from companies such as Shell, Allianz, KPMG, UBS, Millward Brown, Lloyd’s of London, Jacobs Douwe Egberts and LateRooms.

The full agenda has been published on the EARL Website; to access it please click here

Please note that due to space restrictions the pre-conference workshops have only limited places. Please book early to avoid disappointment.

To leave a comment for the author, please follow the link and comment on his blog: Mango Solutions.


Circle packing in R (again)

By Michael Bedward

(This article was first published on Last Resort Software, and kindly contributed to R-bloggers)

Back in 2010 I posted some code for circle packing in R.
And here’s the code:


# Create some random circles, positioned within the central portion
# of a bounding square, with smaller circles being more common than
# larger ones.
#
# NB: the assignment operators and several literal values in this listing
# were lost in formatting; the values below are plausible reconstructions.

ncircles <- 200
limits <- c(-50, 50)
inset <- diff(limits) / 3
rmax <- 20

xyr <- data.frame(
  x = runif(ncircles, min(limits) + inset, max(limits) - inset),
  y = runif(ncircles, min(limits) + inset, max(limits) - inset),
  r = rbeta(ncircles, 1, 10) * rmax)

# Next, we use the `circleLayout` function to try to find a non-overlapping
# arrangement, allowing the circles to occupy any part of the bounding square.
# The returned value is a list with elements for the layout and the number
# of iterations performed.

library(packcircles)

res <- circleLayout(xyr, limits, limits, maxiter = 1000)
cat(res$niter, "iterations performed")

# Now draw the before and after layouts with ggplot

library(ggplot2)
library(gridExtra)

## plot data for the `before` layout
dat.before <- circlePlotData(xyr)

## plot data for the `after` layout returned by circleLayout
dat.after <- circlePlotData(res$layout)

doPlot <- function(dat, title)
  ggplot(dat) +
    geom_polygon(aes(x, y, group = id), colour = "brown", fill = "burlywood", alpha = 0.3) +
    coord_equal(xlim = limits, ylim = limits) +
    theme_bw() +
    theme(axis.text = element_blank(),
          axis.ticks = element_blank(),
          axis.title = element_blank()) +
    labs(title = title)

grid.arrange(
  doPlot(dat.before, "before"),
  doPlot(dat.after, "after"),
  nrow = 1)

To leave a comment for the author, please follow the link and comment on his blog: Last Resort Software.


Wakefield: Random Data Set (Part II)

By tylerrinker

(This article was first published on TRinker’s R Blog » R, and kindly contributed to R-bloggers)

This post is part II of a series detailing the GitHub package, wakefield, for generating random data sets. The First Post (part I) was a test run to gauge user interest. I received positive feedback and some ideas for improvements, which I’ll share below.

The post is broken into the following sections:

You can view just the R code HERE or PDF version HERE

1 Brief Package Description

First we’ll use the pacman package to grab the wakefield package from GitHub and then load it as well as the handy dplyr package.

if (!require("pacman")) install.packages("pacman"); library(pacman)
p_install_gh("trinker/wakefield")
p_load(dplyr, wakefield)

The main function in wakefield is r_data_frame. It takes n (the number of rows) and any number of variable functions that generate random columns. The result is a data frame with named, randomly generated columns. Below is an example; for details see Part I or the README.

set.seed(10)

r_data_frame(n = 30,
    id,
    race,
    age(x = 8:14),
    Gender = sex,
    Time = hour,
    iq,
    grade, 
    height(mean=50, sd = 10),
    died,
    Scoring = rnorm,
    Smoker = valid
)
## Source: local data frame [30 x 11]
## 
##    ID     Race Age Gender     Time  IQ Grade Height  Died    Scoring
## 1  01    White  11   Male 01:00:00 110  90.7     52 FALSE -1.8227126
## 2  02    White   8   Male 01:00:00 111  91.8     36  TRUE  0.3525440
## 3  03    White   9   Male 01:30:00  87  81.3     39 FALSE -1.3484514
## 4  04 Hispanic  14   Male 01:30:00 111  83.2     46  TRUE  0.7076883
## 5  05    White  10 Female 03:30:00  95  80.1     51  TRUE -0.4108909
## 6  06    White  13 Female 04:00:00  97  93.9     61  TRUE -0.4460452
## 7  07    White  13 Female 05:00:00 109  89.5     44  TRUE -1.0411563
## 8  08    White  14   Male 06:00:00 101  92.3     63  TRUE -0.3292247
## 9  09    White  12   Male 06:30:00 110  90.1     52  TRUE -0.2828216
## 10 10    White  11   Male 09:30:00 107  88.4     47 FALSE  0.4324291
## .. ..      ... ...    ...      ... ...   ...    ...   ...        ...
## Variables not shown: Smoker (lgl)

2 Improvements

2.1 Repeated Measures Series

Big thanks to Ananda Mahto for suggesting better handling of repeated measures series and for providing concise code to extend this capability.

The user may now specify the same variable function multiple times and it is named appropriately:

set.seed(10)

r_data_frame(
    n = 500,
    id,
    age, age, age,
    grade, grade, grade
)
## Source: local data frame [500 x 7]
## 
##     ID Age_1 Age_2 Age_3 Grade_1 Grade_2 Grade_3
## 1  001    28    33    32    80.2    87.2    85.6
## 2  002    24    35    31    89.7    91.7    86.8
## 3  003    26    33    23    92.7    85.7    88.7
## 4  004    31    24    28    82.2    90.0    86.0
## 5  005    21    21    29    86.5    87.0    88.4
## 6  006    23    28    25    85.6    93.5    86.7
## 7  007    24    22    26    89.3    90.3    87.6
## 8  008    24    21    23    92.4    88.3    89.3
## 9  009    29    23    32    86.4    84.4    88.2
## 10 010    26    34    32    97.6    84.2    90.6
## .. ...   ...   ...   ...     ...     ...     ...

But he went further, recommending a shorthand for variable, variable, variable. The r_series function takes a variable function and j, the number of columns. The resulting columns can also be renamed with the name argument:

set.seed(10)

r_data_frame(n=100,
    id,
    age,
    sex,
    r_series(gpa, 2),
    r_series(likert, 3, name = "Question")
)
## Source: local data frame [100 x 8]
## 
##     ID Age    Sex GPA_1 GPA_2        Question_1        Question_2
## 1  001  28   Male  3.00  4.00 Strongly Disagree   Strongly Agree 
## 2  002  24   Male  3.67  3.67          Disagree           Neutral
## 3  003  26   Male  3.00  4.00          Disagree Strongly Disagree
## 4  004  31   Male  3.67  3.67           Neutral   Strongly Agree 
## 5  005  21 Female  3.00  3.00             Agree   Strongly Agree 
## 6  006  23 Female  3.67  3.67             Agree             Agree
## 7  007  24 Female  3.67  4.00          Disagree Strongly Disagree
## 8  008  24   Male  2.67  3.00   Strongly Agree            Neutral
## 9  009  29 Female  4.00  3.33           Neutral Strongly Disagree
## 10 010  26   Male  4.00  3.00          Disagree Strongly Disagree
## .. ... ...    ...   ...   ...               ...               ...
## Variables not shown: Question_3 (fctr)

2.2 Dummy Coding Expansion of Factors

It is sometimes nice to expand a factor into j (number of groups) dummy coded columns. Here we see a factor version and then a dummy coded version of the same data frame:

set.seed(10)

r_data_frame(n=100,
    id,
    age,
    sex,
    political
)
## Source: local data frame [100 x 4]
## 
##     ID Age    Sex    Political
## 1  001  28   Male Constitution
## 2  002  24   Male Constitution
## 3  003  26   Male     Democrat
## 4  004  31   Male     Democrat
## 5  005  21 Female Constitution
## 6  006  23 Female     Democrat
## 7  007  24 Female     Democrat
## 8  008  24   Male   Republican
## 9  009  29 Female Constitution
## 10 010  26   Male     Democrat
## .. ... ...    ...          ...

The dummy coded version…

set.seed(10)

r_data_frame(n=100,
    id,
    age,
    r_dummy(sex, prefix = TRUE),
    r_dummy(political)
)
## Source: local data frame [100 x 9]
## 
##     ID Age Sex_Male Sex_Female Constitution Democrat Green Libertarian
## 1  001  28        1          0            1        0     0           0
## 2  002  24        1          0            1        0     0           0
## 3  003  26        1          0            0        1     0           0
## 4  004  31        1          0            0        1     0           0
## 5  005  21        0          1            1        0     0           0
## 6  006  23        0          1            0        1     0           0
## 7  007  24        0          1            0        1     0           0
## 8  008  24        1          0            0        0     0           0
## 9  009  29        0          1            1        0     0           0
## 10 010  26        1          0            0        1     0           0
## .. ... ...      ...        ...          ...      ...   ...         ...
## Variables not shown: Republican (int)

2.3 Factor to Numeric Conversion

There are times when you want a factor and times when you want an integer version of it. This is particularly useful with Likert-type data and other ordered factors. The as_integer function takes a data.frame and allows the user to specify the indices (j) to convert from factor to numeric. Here I show a factor data.frame and then the integer conversion:

set.seed(10)

r_data_frame(5,
    id, 
    r_series(likert, j = 4, name = "Item")
)
## Source: local data frame [5 x 5]
## 
##   ID          Item_1   Item_2          Item_3            Item_4
## 1  1         Neutral    Agree        Disagree           Neutral
## 2  2           Agree    Agree         Neutral   Strongly Agree 
## 3  3         Neutral    Agree Strongly Agree              Agree
## 4  4        Disagree Disagree         Neutral             Agree
## 5  5 Strongly Agree   Neutral           Agree Strongly Disagree

As integers…

set.seed(10)

r_data_frame(5,
    id, 
    r_series(likert, j = 4, name = "Item")
) %>% 
    as_integer(-1)
## Source: local data frame [5 x 5]
## 
##   ID Item_1 Item_2 Item_3 Item_4
## 1  1      3      4      2      3
## 2  2      4      4      3      5
## 3  3      3      4      5      4
## 4  4      2      2      3      4
## 5  5      5      3      4      1

2.4 Viewing Whole Data Set

dplyr has a nice print method that hides excessive rows and columns. Typically this is great behavior. Sometimes, though, you want to see the whole width of the data set quickly. We could use View, but that is still wide and shows all columns. The peek function shows minimal rows, truncated columns, and prints wide for quick inspection. This is particularly nice for text strings as data. dplyr prints wide data sets like this:

r_data_frame(100,
    id, 
    name,
    sex,
    sentence    
)
## Source: local data frame [100 x 4]
## 
##     ID     Name    Sex
## 1  001   Gerald   Male
## 2  002    Jason   Male
## 3  003 Mitchell   Male
## 4  004      Joe Female
## 5  005   Mickey   Male
## 6  006   Michal   Male
## 7  007   Dannie Female
## 8  008   Jordan   Male
## 9  009     Rudy Female
## 10 010   Sammie Female
## .. ...      ...    ...
## Variables not shown: Sentence (chr)

Now use peek:

r_data_frame(100,
    id, 
    name,
    sex,
    sentence    
) %>% peek
## Source: local data frame [100 x 4]
## 
##     ID    Name    Sex   Sentence
## 1  001     Jae Female Excuse me.
## 2  002 Darnell Female Over the l
## 3  003  Elisha Female First of a
## 4  004  Vernon Female Gentlemen,
## 5  005   Scott   Male That's wha
## 6  006   Kasey Female We don't h
## 7  007 Michael   Male You don't 
## 8  008   Cecil Female I'll get o
## 9  009    Cruz Female They must 
## 10 010  Travis Female Good night
## .. ...     ...    ...        ...

2.5 Visualizing Column Types and NAs

When we build a large random data set it is nice to get a sense of the column types and the missing values. The table_heat function (also the plot method for the tbl_df class) does this. Here I’ll generate a data set, add missing values (r_na), and then plot:

set.seed(10)

r_data_frame(n=100,
    id,
    dob,
    animal,
    grade, grade,
    death,
    dummy,
    grade_letter,
    gender,
    paragraph,
    sentence
) %>%
   r_na() %>%
   plot(palette = "Set1")

3 Table of Variable Functions

There are currently 66 wakefield-based variable functions to choose from for building columns. Use variables() to see them or variables(TRUE) to see a list of them broken into variable types. Here’s an HTML table version:


age dob height_in month speed
animal dummy income name speed_kph
answer education internet_browser normal speed_mph
area employment iq normal_round state
birth eye language paragraph string
car gender level pet upper
children gpa likert political upper_factor
coin grade likert_5 primary valid
color grade_letter likert_7 race year
date_stamp grade_level lorem_ipsum religion zip_code
death group lower sat
dice hair lower_factor sentence
died height marital sex
dna height_cm military smokes

4 Possible Uses

4.1 Testing Methods

I personally will use this most frequently when I’m testing out a model. For example say you wanted to test psychometric functions, including the cor function, on a randomly generated assessment:

dat <- r_data_frame(120,
    id, 
    sex,
    age,
    r_series(likert, 15, name = "Item")
) %>% 
    as_integer(-c(1:3))

dat %>%
    select(contains("Item")) %>%
    cor %>%
    heatmap
4.2 Unique Student Data for Course Assignments

Sometimes it's nice if students each have their own data set to work with, but one in which you control the parameters. Simply supply each student with a unique integer id and they can use it inside set.seed with a wakefield r_data_frame you've constructed for them in advance. Voila: 25 instant data sets that are structurally the same but randomly different.
student_id <- ## INSERT YOUR ID HERE
    
set.seed(student_id)

dat <- r_data_frame(1000,
    id, 
    gender,
    religion,
    internet_browser,
    language,
    iq,
    sat,
    smokes
)

4.3 Blogging and Online Help Communities

wakefield can make data sharing on blog posts and in online help communities (e.g., TalkStats, StackOverflow) fast and accessible, with little space or cognitive effort. Use variables(TRUE) to see variable functions by class and select the ones you want:

variables(TRUE)
## $character
## [1] "lorem_ipsum" "lower"       "name"        "paragraph"   "sentence"   
## [6] "string"      "upper"       "zip_code"   
## 
## $date
## [1] "birth"      "date_stamp" "dob"       
## 
## $factor
##  [1] "animal"           "answer"           "area"            
##  [4] "car"              "coin"             "color"           
##  [7] "dna"              "education"        "employment"      
## [10] "eye"              "gender"           "grade_level"     
## [13] "group"            "hair"             "internet_browser"
## [16] "language"         "lower_factor"     "marital"         
## [19] "military"         "month"            "pet"             
## [22] "political"        "primary"          "race"            
## [25] "religion"         "sex"              "state"           
## [28] "upper_factor"    
## 
## $integer
## [1] "age"      "children" "dice"     "level"    "year"    
## 
## $logical
## [1] "death"  "died"   "smokes" "valid" 
## 
## $numeric
##  [1] "dummy"        "gpa"          "grade"        "height"      
##  [5] "height_cm"    "height_in"    "income"       "iq"          
##  [9] "normal"       "normal_round" "sat"          "speed"       
## [13] "speed_kph"    "speed_mph"   
## 
## $`ordered factor`
## [1] "grade_letter" "likert"       "likert_5"     "likert_7"

Then throw them inside r_data_frame to make a quick data set to share.

r_data_frame(8,
    name,
    sex,
    r_series(iq, 3)
) %>%
    peek %>%
    dput

5 Getting Involved

If you’re interested in using wakefield or contributing, you can:

  1. Install and use wakefield
  2. Provide feedback via comments below
  3. Provide feedback (bugs, improvements, and feature requests) via wakefield’s Issues Page
  4. Fork from GitHub and give a Pull Request

Thanks for reading, your feedback is welcomed.


*Get the R code for this post HERE
*Get a PDF version of this post HERE

To leave a comment for the author, please follow the link and comment on his blog: TRinker’s R Blog » R.


Modeling the Latent Structure That Shapes Brand Learning

By Joel Cadwell

(This article was first published on Engaging Market Research, and kindly contributed to R-bloggers)

What is a brand? Metaphorically, the brand is the white sphere in the middle of this figure, that is, the ball surrounded by the radiating black cones. Of course, no ball has been drawn, just the conic thorns positioned so that we construct the sphere as an organizing structure (a form of reification in Gestalt psychology). Both perception and cognition organize input into Gestalts or Wholes generalizing previously learned forms and configurations.

It is because we are familiar with pictures like the following that we impose an organization on the black objects and “see” a ball with spikes. You did not need to be thinking about “spikey balls”; the figure recruits its own interpretative frame. Similarly, brands and product categories impose structure on feature sets. The brand may be an associative net (what comes to mind when I say “McDonald’s”), but that network is built on a scaffolding that we can model using R.

In a previous post, I outlined how product categories are defined by their unique tradeoff of strengths and weaknesses. In particular, the features associated with fast food restaurants fall along a continuum from easy to difficult to deliver. Speed is achievable. Quality food seems to be somewhat harder to serve. Brands within the product category can separate themselves by offering their own unique affordances.

My example was Subway Sandwich offering “fresh” fast food. Respondents were given a list of 8 attributes (seating, menu selection, ease of ordering, food preparation, taste, filling, healthy and fresh) and asked to check off which attributes Subway successfully delivered.

The item characteristic curves in the above figure were generated using the R package ltm (latent trait modeling). The statistical model comes from achievement testing, which is why the attribute is called an item and the x-axis is labeled as ability. Test items can be arrayed in terms of their difficulty with the harder questions answered correctly only by the smarter students. Ability has been standardized, so the x-axis shows z-scores. The curves are the logistic functions displaying how the probability of endorsing each of the eight attributes is a function of each respondent’s standing on the x-axis. Only 6 of the 8 attribute names can be seen on the chart with the labels for the lowest two curves, menu and seating, falling off the chart.

The zero point for ability is the average for the sample filling in the checklist. The S-curve for “fresh” has the highest elevation and is the farthest to the left. Reading up from ability equals zero, we can see that on the average more than 80% are likely to tell us that Subway serves fresh food. You can see the gap between the four higher curves for fresh, healthy, filling and taste and the four lower curves for preparation, ordering, menu and seating. The lowest S-curve indicates that the average respondent would check seating with a likelihood of less than 20%.

What is invariant is the checklist pattern. Those who love Subway might check all the attributes except for the bottom one or two. For example, families may be fine with everything but the available seating. On the other hand, those looking for a greasy hamburger might reluctantly endorse fresh or healthy and nothing else. As one moves from left to right along the Ability scale, the checklist is filled in with fresh, then healthy, and so on in an order reflecting the brand image. Fresh is easy for Subway. Healthy is only a little more difficult, but seating can be a problem. Moreover, an individual who is happy with the seating and the menu is likely to be satisfied with the taste and the freshness of the food. Response patterns follow an ordering that reflects the underlying scaffolding holding the brand concept together.

Latent trait or item response theory is a statistical model in which we estimate the parameters of the equation specifying the relationship between the latent x-axis and response probability. R offers nonparametric alternatives such as KernSmoothIRT and princurve. Hastie’s work on principal curves may be of particular interest since it comes from outside achievement testing. A more general treatment of the same issues lets us take a different perspective and see how observed responses are constrained by the underlying data generation process.

Branding is the unique spin added to a product category, one that has evolved out of repeated interactions between consumer needs and providers’ skill at satisfying those needs at a profit. The data scientist can see the impact of that branding when modeling consumer perceptions. Consumers are not scientists running comparative tests under standardized conditions. Moreover, inferences are made when experience is lacking. Our take-out-only customer feels comfortable commenting on seating, although they may have only glanced at it on the way in or out. It gets worse for ratings on scales and for attributes that the average consumer lacks the expertise to evaluate (e.g., credence attributes associated with quality, reliability or efficacy).

We often believe that product experience is self-evident and definitive when, in fact, it may be ambiguous and even seductive. Much of what we know about products, even products that we use, has been shaped by a latent structure learned from what we have heard and read. Even if the thorns are real, the scaffolding comes from others.

To leave a comment for the author, please follow the link and comment on his blog: Engaging Market Research.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

the most patronizing start to an answer I have ever received

By xi’an

(This article was first published on Xi’an’s Og » R, and kindly contributed to R-bloggers)

Another occurrence [out of many!] of a question on X validated where the originator (primitivus petitor) was trying to get an explanation without the proper background, on either Bayesian statistics or simulation. The introductory sentence to the question was about “trying to understand how the choice of priors affects a Bayesian model estimated using MCMC”, but the bulk of the question was in fact a failure to understand the R code for a random-walk Metropolis-Hastings algorithm for a simple regression model provided in an introductory blog by Florian Hartig. And even more precisely about confusing the R code dnorm(b, sd = 5, log = T) in the prior with rnorm(1, mean = b, sd = 5, log = T) in the proposal…
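The distinction that tripped up the questioner is between evaluating a density and drawing from it. In base R, dnorm() returns the (optionally log) density at a point, which is what a log-prior needs, while rnorm() generates random draws for the proposal and has no log argument at all. A minimal sketch of the two ingredients, with the N(0, 5) prior and random-walk sd mirroring the quoted code (the surrounding MH loop from Hartig's post is not reproduced):

```r
set.seed(1)

b <- 0.7  # current value of the regression coefficient

# Prior evaluation: the log-density of a N(0, 5) prior at b.
log_prior <- dnorm(b, mean = 0, sd = 5, log = TRUE)

# Proposal draw: a random-walk step centred at b. Note that rnorm()
# has no log argument -- passing one, as in the question, is an error.
b_proposed <- rnorm(1, mean = b, sd = 5)

log_prior    # a (log-density) number, not a draw
b_proposed   # a random draw, not a density
```

In an MH acceptance ratio one sums log-densities like log_prior; the proposal draw only supplies the candidate value at which those densities are then evaluated.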

“You should definitely invest some time in learning the bases of Bayesian statistics and MCMC methods from textbooks or on-line courses.” X

So I started my

To leave a comment for the author, please follow the link and comment on his blog: Xi’an’s Og » R.


Visualizing fits, inference, implications of (G)LMMs with Jaime Ashander

By Noam Ross

(This article was first published on Noam Ross – R, and kindly contributed to R-bloggers)

A couple of weeks ago at the Davis R Users’ Group, Jaime Ashander gave a presentation on visualizing and diagnosing (G)LMMs in R. Here’s the video:

Jaime also wrote up the notes from his talk, including all the code, on his blog here (with the raw R Markdown file on github here). The blog post expands on and improves the original talk, which means the video and the posted code don’t match exactly.

To leave a comment for the author, please follow the link and comment on his blog: Noam Ross – R.
