Different demand functions and optimal price estimation in R

By insightr

(This article was first published on R – insightR, and kindly contributed to R-bloggers)

By Yuri Fonseca

Demand models

In the previous post about pricing optimization (link here), we discussed a little about linear demand and how to estimate optimal prices in that case. In this post we are going to compare three different demand models for homogeneous products and show how to find the optimal price under each of them.

For the linear model, demand is given by:

d(p) = \alpha p + \beta,

where alpha is the slope of the curve and beta the intercept. For the linear model, the magnitude of the elasticity varies along the curve, going from zero to infinity. Another very common demand model is the constant-elasticity model, given by:

\ln d(p) = \alpha \ln p + \beta,

or, equivalently,

d(p) = e^{\beta} p^{\alpha} = C p^{\alpha},

where alpha is the elasticity of demand and C = e^beta is a scale factor. A much more interesting demand curve is given by the logistic/sigmoid function:

d(p) = C\,\frac{e^{\alpha p + \beta}}{1 + e^{\alpha p + \beta}} = \frac{C}{1+e^{-\alpha(p - p_0)}},

where C is a scale factor and alpha measures price sensitivity. Note that p_0 = -beta/alpha is the inflection point of the demand curve.
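
A quick numerical check of this equivalence and of the inflection point (not part of the original post; the parameter values below are purely illustrative):

# the two logistic parameterizations coincide when p0 = -beta/alpha,
# and demand at the inflection point equals C/2
C <- 100; alpha <- -0.2; beta <- 10   # illustrative values
p0 <- -beta / alpha                   # inflection point: 50
p  <- seq(1, 100)
d1 <- C * exp(alpha * p + beta) / (1 + exp(alpha * p + beta))
d2 <- C / (1 + exp(-alpha * (p - p0)))
all.equal(d1, d2)   # TRUE
d2[50]              # demand at p = p0 = 50: equals C/2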

Some books change the signs of the coefficients, assuming that alpha is a positive constant and placing a minus sign in front of it. This does not change the estimation procedure or the final result; it is just a matter of convenience. Here, we expect alpha to be negative in all three models.

In the figure below we compare the shapes of the three demand models:

library(ggplot2)
library(reshape2)
library(magrittr)

linear = function(p, alpha, beta) alpha*p + beta
constant_elast = function(p, alpha, beta) exp(alpha*log(p)+beta)
logistic = function(p, c, alpha, p0) c/(1+exp(-alpha*(p-p0)))

p = seq(1, 100)
y1 = linear(p, -1, 100)
y2 = constant_elast(p, -.5, 4.5)
y3 = logistic(p, 100, -.2, 50)

df = data.frame('Prices' = p, 'Linear' = y1, 'Constant_elast' = y2, 'Logistic' = y3)
df.plot = melt(df, id = 'Prices') %>% set_colnames(c('Prices', 'Model', 'Demand'))

ggplot(df.plot) + aes(x = Prices, y = Demand) +
  geom_line(color = 'blue', alpha = .6, lwd = 1) +
  facet_grid(~Model)

[Plot of chunk demand_models: the linear, constant-elasticity and logistic demand curves side by side]

Of course, in practice prices do not vary from 1 to 100; the idea is just to show the main differences in the shapes of the models.

All the models presented above have strengths and weaknesses. Although a local linear approximation may be reasonable for small price changes, sometimes this assumption is too strong and does not capture the correct sensitivity to larger price changes. The constant-elasticity model, even though it specifies a non-linear relationship between demand and price, relies on a constant-elasticity assumption that might be too restrictive. Moreover, it tends to overestimate demand at very low and very high prices. At first glance, I would venture to say that the logistic function is the most robust and realistic of the three types.
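
To make the elasticity comparison concrete, here is a small illustration (not part of the original post) that computes arc elasticities along the three curves plotted above:

# arc elasticity between consecutive points: change in log demand over change in log price
elasticity <- function(d, p) diff(log(d)) / diff(log(p))

# drop the last point of the linear curve, where demand hits zero
range(elasticity(y1[1:99], p[1:99]))  # linear: magnitude grows from near 0 towards infinity
range(elasticity(y2, p))              # constant elasticity: -0.5 everywhere
range(elasticity(y3, p))              # logistic: varies smoothly with price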

Pricing with demand models

In a general setting, one has the following total profit function:

L(p) = d(p)\,(p - c),

where L gives the profit, d is the demand function (which depends on the price) and c is the marginal cost. Taking the derivative with respect to price, we have:

L'(p) = d'(p)(p - c) + d(p).

Setting L'(p) = 0 to find the optimal price (first-order condition), we have:

d'(p^\star)(p^\star - c) + d(p^\star) = 0,
d'(p^\star)\,p^\star + d(p^\star) = d'(p^\star)\,c,

which is the famous condition that, at the optimal price, marginal revenue equals marginal cost. Next, let's see how to calculate the optimal price for each demand function.
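
Before doing so, here is a quick numerical illustration of this condition (not part of the original post): we maximize profit for the linear demand curve plotted above, with an assumed marginal cost of 20, and check that the derivatives of revenue and total cost with respect to price coincide at the optimum.

dem    <- function(p) linear(p, alpha = -1, beta = 100)  # demand curve from the plot above
cost   <- 20                                             # assumed marginal cost
profit <- function(p) dem(p) * (p - cost)

p.star <- optimize(profit, interval = c(1, 100), maximum = TRUE)$maximum  # about 60

# numerical derivatives of revenue p*d(p) and total cost cost*d(p) at p.star
h <- 1e-4
MR <- ((p.star + h) * dem(p.star + h) - p.star * dem(p.star)) / h
MC <- cost * (dem(p.star + h) - dem(p.star)) / h
c(MR = MR, MC = MC)  # both approximately -20: marginal revenue equals marginal cost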

Linear model

For the linear model d'(p) = alpha. Hence:

d'(p^\star)\,p^\star + d(p^\star) = d'(p^\star)\,c,
\alpha p^\star + \alpha p^\star + \beta = \alpha c,
p^\star = \frac{\alpha c - \beta}{2\alpha}.

Example:

library(tidyverse)

# Synthetic data
p = seq(80,130)
d = linear(p, alpha = -1.5, beta = 200) + rnorm(sd = 5, length(p))
c = 75
profit = d*(p-c)

# Fit of the demand model
model1 = lm(d~p)
profit.fitted = model1$fitted.values*(p - c)

# Pricing Optimization
alpha = model1$coefficients[2]
beta = model1$coefficients[1]
p.max.profit = (alpha*c - beta)/(2*alpha)

# Plots
df.linear = data.frame('Prices' = p, 'Demand' = d,
                       'Profit.fitted' = profit.fitted, 'Profit' = profit)

ggplot(select(df.linear, Prices, Demand)) + aes(x = Prices, y = Demand) +
  geom_point() + geom_smooth(method = lm)

[Plot of chunk profit_linear_demand: observed demand and fitted linear demand curve]

ggplot(select(df.linear, Prices, Profit)) + aes(x = Prices, y = Profit) +
  geom_point() + geom_vline(xintercept = p.max.profit, lty = 2) +
  geom_line(data = df.linear, aes(x = Prices, y = Profit.fitted), color = 'blue')

[Plot of chunk profit_linear_demand: observed and fitted profit, with the optimal price marked by the dashed line]

Constant elasticity model

For the constant-elasticity model, since \lim_{\Delta p \rightarrow 0}\frac{\Delta D}{\Delta p} = d'(p), the price elasticity (defined here as a positive number for downward-sloping demand) is:

\epsilon = -\frac{\%\Delta D}{\%\Delta p} = -\frac{p\,\Delta D}{D\,\Delta p} = -\frac{d'(p)\,p}{D}.

Therefore,

d'(p^\star)\,p^\star + d(p^\star) = d'(p^\star)\,c,
\frac{d'(p^\star)\,p^\star}{d(p^\star)} + 1 = \frac{d'(p^\star)\,c}{d(p^\star)},
-\epsilon + 1 = -\epsilon\,\frac{c}{p^\star},
p^\star = \frac{\epsilon c}{\epsilon - 1} = \frac{c}{1-1/\epsilon}.

Moreover, knowing that \frac{\%\Delta D}{\%\Delta p} \approx \frac{\Delta \ln D}{\Delta \ln p} and using the constant-elasticity model, we have that:

\epsilon \approx -\lim_{\Delta \rightarrow 0} \frac{\Delta \ln D}{\Delta \ln p} = -\frac{d\ln D}{d\ln p} = -\alpha = |\alpha|.

Thus, we can calculate the profit-maximizing price for the constant-elasticity model as:

p^\star = \frac{c}{1 - \frac{1}{|\alpha|}}.

It is interesting to note that one needs |\alpha| > 1; otherwise the profit function is strictly increasing in price and the optimal price would be infinite. If one has a monopolistic market, this assumption normally holds.
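
Two quick numerical checks of these statements (not part of the original post, reusing the constant_elast() function defined earlier and an assumed marginal cost of 75, as in the example below):

c0 <- 75  # assumed marginal cost

# |alpha| = 3 > 1: the numerical maximizer matches c0 / (1 - 1/|alpha|) = 112.5
profit_elastic <- function(p) constant_elast(p, alpha = -3, beta = 15) * (p - c0)
optimize(profit_elastic, interval = c(80, 300), maximum = TRUE)$maximum  # about 112.5
c0 / (1 - 1/3)                                                           # 112.5

# |alpha| = 0.5 < 1: profit keeps increasing with price, so there is no finite optimum
p.grid <- seq(80, 5000, by = 10)
profit_inelastic <- constant_elast(p.grid, alpha = -0.5, beta = 4.5) * (p.grid - c0)
all(diff(profit_inelastic) > 0)  # TRUE: profit is monotonically increasing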

Example:

# Synthetic data
p = seq(80,130)
d = constant_elast(p, alpha = -3, beta = 15)*exp(rnorm(sd = .15, length(p)))
c = 75
profit = d*(p-c)

# Fitting of demand model
model2 = lm(log(d)~log(p))
profit.fitted = exp(model2$fitted.values)*(p - c)

# pricing optimization
alpha = model2$coefficients[2]
p.max.profit = c/(1-1/abs(alpha))

# Plots
df.const_elast = data.frame('Prices' = p, 'Demand' = d,
                       'Profit.fitted' = profit.fitted, 'Profit' = profit)

ggplot(select(df.const_elast, Prices, Demand)) + aes(x = log(Prices), y = log(Demand)) +
  geom_point() + geom_smooth(method = lm)

[Plot of chunk profit_constant_elastc: log-demand versus log-price with the fitted regression line]

ggplot(select(df.const_elast, Prices, Profit)) + aes(x = Prices, y = Profit) +
  geom_point() + geom_vline(xintercept = p.max.profit, lty = 2) +
  geom_line(data = df.const_elast, aes(x = Prices, y = Profit.fitted), color = 'blue')

[Plot of chunk profit_constant_elastc: observed and fitted profit, with the optimal price marked by the dashed line]

Logistic model

For the logistic function, one can check that d'(p) = alpha d(p)(1-d(p)/C). Thus:

d'(p^\star)(p^\star - c) + d(p^\star) = 0,
\alpha\, d(p^\star)(1-d(p^\star)/C)(p^\star-c) + d(p^\star) = 0,
\alpha(1-d(p^\star)/C)(p^\star-c) + 1 = 0,
\frac{\alpha\, e^{-\alpha(p^\star - p_0)}(p^\star - c) + 1 + e^{-\alpha(p^\star - p_0)}}{1+ e^{-\alpha(p^\star - p_0)}} = 0,
\left[\alpha(p^\star-c)+1\right]e^{-\alpha(p^\star - p_0)} + 1 = 0.

Since the last equation above does not have an analytical solution (at least we could not find one), the optimal price can easily be found with a Newton-type root finder or by solving a minimization problem. We will use the second approach, with the following formulation:

\min_{p \in \mathbb{R}} \Big( \left[\alpha(p-c)+1\right]e^{-\alpha(p - p_0)} + 1 \Big)^2

Example:

# Objective functions for optimization
demand_objective = function(par, p, d) sum((d - logistic(p, par[1], par[2], par[3]))^2)
price_objective = function(p, alpha, c, p0) (exp(-alpha*(p-p0))*(alpha*(p-c)+1) + 1)^2 

# A cleaner alternative for pricing optimization is to min:
price_objective2 = function(p, c, alpha, C, p0) -logistic(p, C, alpha, p0)*(p-c)

# synthetic data
p = seq(80,130)
c = 75
d = logistic(p, 120, -.15, 115) + rnorm(sd = 10, length(p))
profit = d*(p-c)

# Demand fitting, we can't use lm anymore
par.start = c(max(d), 0, mean(d)) # initial guess

demand_fit = optim(par = par.start, fn = demand_objective, method = 'BFGS',
                   p = p, d = d)

par = demand_fit$par # estimated parameters for demand function
demand.fitted = logistic(p, c = par[1], alpha = par[2], p0 = par[3])
profit.fitted = demand.fitted*(p - c)

# Pricing Optimization, we don't have a closed expression anymore
price_fit = optim(mean(p), price_objective, method = 'BFGS',
                  alpha = par[2], c = c, p0 = par[3])

# or

price_fit2 = optim(mean(p), price_objective2, method = 'BFGS',
                  c = c, C = par[1], alpha = par[2], p0 = par[3]) 

# both results are almost identical
p.max.profit = price_fit$par

# Graphics
df.logistic = data.frame('Prices' = p, 'Demand' = d, 'Demand.fitted' = demand.fitted,
                       'Profit.fitted' = profit.fitted, 'Profit' = profit)

ggplot(select(df.logistic, Prices, Demand)) + aes(x = Prices, y = Demand) +
  geom_point() +
  geom_line(data = df.logistic, aes(x = Prices, y = Demand.fitted), color = 'blue')

[Plot of chunk profit_logistic_demand: observed demand and fitted logistic demand curve]

ggplot(select(df.logistic, Prices, Profit)) + aes(x = Prices, y = Profit) +
  geom_point() + geom_vline(xintercept = p.max.profit, lty = 2) +
  geom_line(data = df.logistic, aes(x = Prices, y = Profit.fitted), color = 'blue')

[Plot of chunk profit_logistic_demand: observed and fitted profit, with the optimal price marked by the dashed line]
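
An alternative to the squared-objective minimization used above (a sketch, not part of the original post) is to solve the first-order condition directly with stats::uniroot, reusing the fitted parameters par. The search interval below is an assumption and may need widening for other data:

# first-order condition for the logistic model: [alpha(p - c) + 1] e^{-alpha(p - p0)} + 1 = 0
foc <- function(p, alpha, c, p0) (alpha * (p - c) + 1) * exp(-alpha * (p - p0)) + 1

p.root <- uniroot(foc, interval = c(80, 200),
                  alpha = par[2], c = c, p0 = par[3])$root
p.root  # should be very close to price_fit$par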

I hope you liked the examples. In the next post we will discuss choice models, which are demand models for heterogeneous products. Goodbye and good luck!

References

Phillips, Robert Lewis. Pricing and revenue optimization. Stanford University Press, 2005.


rqdatatable: rquery Powered by data.table

By John Mount


(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

rquery is an R package for specifying data transforms using piped Codd-style operators. It has already shown great performance on PostgreSQL and Apache Spark. rqdatatable is a new package that supplies a screaming fast implementation of the rquery system in-memory using the data.table package.

rquery is already one of the fastest and most teachable (due to deliberate conformity to Codd’s influential work) tools to wrangle data on databases and big data systems. And now rquery is also one of the fastest methods to wrangle data in-memory in R (thanks to data.table, via a thin adaption supplied by rqdatatable).

Teaching rquery and fully benchmarking it is a big task, so in this note we will limit ourselves to a single example and benchmark. Our intent is to use this example to promote rquery and rqdatatable, but frankly the biggest result of the benchmarking is how far out of the pack data.table itself stands at small through large problem sizes. This is already known, but it is a much larger difference and at more scales than the typical non-data.table user may be aware of.

The R package development candidate rquery 0.5.0 incorporates a number of fixes and improvements. One interesting new feature is that the DBI package is now suggested (optional) instead of required. This means rquery is ready to talk to non-DBI big data systems such as SparkR (example here), and it let us recruit a very exciting new rquery service provider: data.table!

data.table is, by far, the fastest way to wrangle data at scale in-memory in R. Our experience is that it starts to outperform base R internals and all other packages at moderate data sizes such as mere tens or hundreds of rows. Of course data.table is most famous for its performance in the millions of rows and gigabytes of data range.

However, because of the different coding styles there are not as many comparative benchmarks as one would like. So performance is often discussed as anecdotes or rumors. As a small step we are going to supply a single benchmark based on our “score a logistic regression by hand” problem from “Let’s Have Some Sympathy For The Part-time R User” (what each coding solution looks like can be found here).

In this note we compare idiomatic solutions to the example problem using: rquery, data.table, base R (using stats::aggregate()), and dplyr. dplyr is included due to its relevance and popularity. Full details of the benchmarking can be found here and full results here. One can always do more benchmarking and control for more factors in the experiments. One learns more from a diversity of benchmarks than from critiquing any one benchmark, so we will work this example briefly and provide links to a few other benchmarks. Our measurements confirm the common (and correct) observation that data.table is very fast. Our primary new observation is that the overhead from the new rqdatatable adapter is not too large and that rqdatatable is issuing reasonable data.table commands.

Both the rquery and dplyr solutions can be run in multiple modalities: allowing the exact same code to be used in memory or on a remote big data system (a great feature, critical for low-latency rehearsal and debugging). These two systems can be run as follows.

  • rquery is a system for describing operator trees. It deliberately does not implement the data operators, but depends on external systems for implementations. Previously any sufficiently standard SQL92 database that was R DBI compliant could serve as a back-end or implementation. This already includes the industrial scale database PostgreSQL and the big data system Apache Spark (via the SparklyR package). The 0.5.0 development version of rquery relaxes the DBI requirement (allowing rquery to be used directly with SparkR) and admits the possibility of non-SQL based implementations. We have a new data.table based implementation in development as the rqdatatable package.
  • dplyr also allows multiple implementations (in-memory, DBI SQL, or data.table). We tested all three, and the dplyr pipeline worked identically in-memory and with PostgreSQL. However, the dtplyr pipeline did not generate valid data.table commands, due to an issue with window functions or ranking, so we were not able to time it using data.table.

We are thus set up to compare the following solutions to the logistic scoring problem:

  • A direct data.table solution running in memory.
  • The base R stats::aggregate() solution working on in-memory data.frames.
  • The rquery solution using the data.table service provider rqdatatable to run in memory.
  • The rquery solution sending data to PostgreSQL, performing the work in database, and then pulling results back to memory.
  • The dplyr solution working directly on in-memory data.frames.
  • The dplyr solution working directly on in-memory tibbles (we are not counting any time for conversion).
  • The dplyr solution sending data to PostgreSQL, performing the work in database, and then pulling results back to memory.

Running a 1,000,000 row by 13 column example can be summarized with the following graph.

The vertical dashed line is the median time that repeated runs of the base R stats::aggregate() solution took. We can consider results to the left of it as “fast” and results to the right of it as “slow.” Or, in physical terms: data.table and rquery using data.table each take about 1.5 seconds on average, whereas dplyr takes over 20 seconds on average. These two durations represent vastly different user experiences when attempting interactive analyses.

We have run some more tests to try to see how this is a function of problem scale (varying the number of rows of the data). Due to the large range (2 to 10,000,000 rows) we are using log scales, but they unfortunately are just not as readable as the linear scales.

[Figure: benchmark timings as a function of the number of rows, log-log scale]
What we can read off this graph includes:

  • data.table is always the fastest system (or at worst indistinguishable from the fastest system) for this example,
    at the scales of problems tested, and for this configuration and hardware.
  • The data.table backed version of rquery becomes comparable to native data.table itself at around 100,000 rows. This is evidence that the translation overhead is not too bad for this example and that the sequence of data.table commands issued by rqdatatable is fairly good data.table practice.
  • The database backed version of rquery starts to outperform dplyr at around 10,000 rows. Note: all database measurements include the overhead of moving the data to the database and then moving the results back to R. This is slower than how one would normally use a database in production: with data starting and ending on the database and no data motion between R and the database.
  • dplyr appears to be slower than the base R stats::aggregate() solution at all measured scales (it is always above the shaded region).
  • It is hard to read, but changes in heights are ratios of runtimes. For example, the data.table based solutions are routinely over 10 times faster than the dplyr solutions once we get to 100,000 rows or more. This is an object size of only about 10 megabytes, well below the usual “use data.table once you are in the gigabytes range” advice.

Of course benchmarks depend on the example problems, versions, and machines, so results will vary. That being said, large differences often have a good chance of being preserved across variations of tests (we share another grouped example here, and a join example here; for the join example dplyr is faster at smaller problem sizes, so results do depend on task and scale).

We are hoping to submit the rquery update to CRAN in August and then submit rqdatatable as a new CRAN package soon after. Until then you can try both packages by a simple application of:

devtools::install_github("WinVector/rqdatatable")

These are new packages, but we think they can already save substantial development time, documentation time, debugging time, and machine investment in “R and big data” projects. Our group (Win-Vector LLC) is offering private training in rquery to get teams up to speed quickly.

Note: rqdatatable is an implementation of rquery powered by data.table, not a data.table scripting tool (rqdatatable does not support important data.table features not found in rquery, such as rolling joins).


Fancy Plot (with Posterior Samples) for Bayesian Regressions

By Dominique Makowski

(This article was first published on Dominique Makowski, and kindly contributed to R-bloggers)

As Bayesian models usually generate a lot of samples (iterations), one might want to plot them as well, instead of (or along with) the posterior “summary” (with indices like the 90% HDI). This can be done quite easily by extracting all the iterations with get_predicted from the psycho package.

The Model

# devtools::install_github("neuropsychology/psycho.R")  # Install the last psycho version if needed

# Load packages
library(tidyverse)
library(psycho)

# Import data
df <- psycho::affective

# Fit a logistic regression model
fit <- rstanarm::stan_glm(Sex ~ Adjusting, data=df, family = "binomial")

We fitted a Bayesian logistic regression to predict the sex (W / M) with one’s ability to flexibly adjust to his/her emotional reaction.

Plot

To visualize the model, the neatest way is to extract a “reference grid” (i.e., a theoretical dataframe with balanced data). Our refgrid is made of equally spaced predictor values. With it, we can make predictions using the previously fitted model. This will compute the median of the posterior prediction, as well as the 90% credible interval. However, we're interested in keeping all the prediction samples (iterations). Note that get_predicted automatically transforms log odds ratios (the scale on which the model is expressed) into probabilities, which are easier to interpret.

# Generate a new refgrid
refgrid <- df %>% 
  dplyr::select(Adjusting) %>% 
  psycho::refdata(length.out=10)

# Get predictions and keep iterations
predicted <- psycho::get_predicted(fit, newdata=refgrid, keep_iterations=TRUE)

# Reshape this dataframe to have iterations as factor
predicted <- predicted %>% 
  tidyr::gather(Iteration, Iteration_Value, starts_with("iter"))

# Plot all iterations with the median prediction
ggplot(predicted, aes(x=Adjusting)) +
  geom_line(aes(y=Iteration_Value, group=Iteration), size=0.3, alpha=0.01) +
  geom_line(aes(y=Sex_Median), size=1) + 
  ylab("Probability of being a man") +
  theme_classic()

Credits

This package helped you? Don’t forget to cite the various packages you used 🙂

You can cite psycho as follows:

  • Makowski, (2018). The psycho Package: an Efficient and Publishing-Oriented Workflow for Psychological Science. Journal of Open Source Software, 3(22), 470. https://doi.org/10.21105/joss.00470

Ceteris Paribus Plots – a new DALEX companion

By smarterpoland

(This article was first published on English – SmarterPoland.pl, and kindly contributed to R-bloggers)

If you like magical incantations in Data Science, please welcome the Ceteris Paribus Plots. Otherwise feel free to call them What-If Plots.

Ceteris Paribus (Latin for “all else unchanged”) Plots explain complex Machine Learning models around a single observation. They supplement tools like breakDown, Shapley values, LIME or LIVE. In addition to feature importance/feature attribution, we can now see how the model response changes along a specific variable, keeping all other variables unchanged.

How do cancer risk scores change with age? How do credit scores change with salary? How do insurance costs change with age?

Well, use the ceterisParibus package to generate plots like the one below.
Here we have an explanation for a random forest model that predicts apartment prices. The presented profiles are prepared for a single observation, marked with dashed lines (a 130 m2 apartment on the 3rd floor). From these profiles one can read how the model response is linked with particular variables.
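
For readers who want to try something similar, here is a rough sketch of the typical DALEX + ceterisParibus workflow. It is assembled from the packages' documentation rather than taken from this post, so treat the function names and signatures (explain(), ceteris_paribus()) as assumptions to check against your installed versions:

library(DALEX)           # provides the apartments data and explain()
library(randomForest)
library(ceterisParibus)  # ceteris_paribus() name assumed from the package docs

# fit a model for apartment prices
rf_model <- randomForest(m2.price ~ construction.year + surface + floor + no.rooms,
                         data = apartments)

# wrap the model in a DALEX explainer
explainer_rf <- explain(rf_model,
                        data = apartments[, c("construction.year", "surface",
                                              "floor", "no.rooms")],
                        y = apartments$m2.price)

# ceteris paribus profiles for one chosen apartment
new_apartment <- apartments[1, ]
cp_profiles <- ceteris_paribus(explainer_rf, new_apartment)
plot(cp_profiles)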

Instead of the original values on the OX (horizontal) axis, one can plot quantiles. This way one can put all variables in a single plot.

And once all variables are on the same scale, one can compare two or more models.

Yes, they are model agnostic and will work for any model!
Yes, they can be interactive (see plot_interactive function or examples below)!
And yes, you can use them with other DALEX explainers!
More examples with R code.


StatCheck the Game

By David Smith

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

If you don’t get enough joy from publishing scientific papers in your day job, or simply want to experience what it’s like to be in a publish-or-perish environment where the P-value is the only important part of a paper, you might want to try StatCheck: the board game where the object is to publish two papers before any of your opponents.

As the game progresses, players combine “Test”, “Statistic” and “P-value” cards to form the statistical test featured in the paper (and of course, significant tests are worth more than non-significant ones). Opponents may then have the opportunity to play a “StatCheck” card to challenge the validity of the test, which can then be verified using a companion R package or online Shiny application. Other modifier cards include “Bayes Factor” (which can be used to boost the value of your own papers, or diminish the value of an opponents’), “Post-Hoc Theory” (improving the value of already-published papers), and “Behind the Paywall” (making it more difficult to challenge the validity of your statistics).

StatCheck The Game was created by Sacha Epskamp and Adela Isvoranu, who provide all the code to create the cards as open source on GitHub, along with instructions to print and play with your own game materials. You can find everything you need (except the required 8-sided die and some like-minded friends to play with) at the link below.

StatCheck: An open source game for methodological terrorists!


My book ‘Practical Machine Learning in R and Python: Second edition’ on Amazon

By Tinniam V Ganesh

(This article was first published on R – Giga thoughts …, and kindly contributed to R-bloggers)

The second edition of my book ‘Practical Machine Learning with R and Python – Machine Learning in stereo’ is now available in both paperback ($10.99) and kindle ($7.99/Rs449) versions. This second edition includes more content, extensive comments and formatting for better readability.

In this book I implement some of the most common and important Machine Learning algorithms in R, together with equivalent Python code.
1. Practical Machine Learning with R and Python: Second Edition – Machine Learning in Stereo (Paperback, $10.99)
2. Practical Machine Learning with R and Python: Second Edition – Machine Learning in Stereo (Kindle, $7.99/Rs449)

This book is ideal both for beginners and for experts in R and/or Python. Those starting their journey into data science and ML will find the first 3 chapters useful, as they touch upon the most important programming constructs in R and Python and also deal with equivalent statements in R and Python. Those who are expert in either of the languages will find the equivalent code ideal for brushing up on the other language. And finally, those who are proficient in both languages can use the R and Python implementations to internalize the ML algorithms better.

Here is a look at the topics covered

Table of Contents
Preface – 4
Introduction – 6
1. Essential R – 8
2. Essential Python for Datascience – 57
3. R vs Python – 81
4. Regression of a continuous variable – 101
5. Classification and Cross Validation – 121
6. Regression techniques and regularization – 146
7. SVMs, Decision Trees and Validation curves – 191
8. Splines, GAMs, Random Forests and Boosting – 222
9. PCA, K-Means and Hierarchical Clustering – 258
References – 269

Pick up your copy today!!
Hope you have a great time learning as I did while implementing these algorithms!


Praise you like I should: Shiny Appreciation Month

By Mango Solutions

(This article was first published on Mango Solutions, and kindly contributed to R-bloggers)

Aimée Gott, Education Practice Lead

Back in the summer of 2012 I was meant to be focusing on one thing: finishing my thesis. But, unfortunately for me, a friend and former colleague came back from a conference (JSM) and told me all about a new package that she had seen demoed.

“You should sign up for the beta testing and try it out,” she said.

So, I did.

That package was Shiny and after just a couple of hours of playing around I was hooked. I was desperate to find a way to incorporate it into my thesis, but never managed to; largely due to the fact it wasn’t available on CRAN until a few months after I had submitted and because, at the time, it was quite limited in its functionality. However, I could see the potential – I was really excited about the ways it could be used to make analytics more accessible to non-technical audiences. After joining Mango I quickly became a Shiny advocate, telling everyone who would listen about how great it was.

Six years on at Mango, not a moment goes by when somebody in the team isn’t using Shiny for something. From prototyping to large scale deployments, we live and breathe Shiny. And we are extremely grateful to the team at RStudio, led by Joe Cheng, for the continued effort that they are putting into its development. It really is a hugely different tool from the package I beta tested so long ago.

As Shiny has developed and the community around it has grown, so too has the need to teach it: more people than ever are looking to become Shiny users. For a number of years we have been teaching the basics of Shiny to those who want to get started, and more serious development tools to those who want to deploy apps in production. But increasingly we have seen a demand for more. And as the Shiny team have added more and more functionality, it was time for a major update to our teaching materials.

Over the past six months we have had many long discussions over what functionality should be included. We have debated best practices, we have drawn on all of our combined experiences of both learning and deploying Shiny, and we eventually reached a consensus over what we felt was best for industry users of Shiny to learn.

We are now really pleased to announce an all new set of Shiny training courses.

Our courses cover everything from taking your first steps in building a Shiny application to building production-ready applications, and a whole host of topics in between. For those who want to take a private course, we can tailor the content to your needs; topics as diverse as getting the most from tables in DT and managing database access in apps can all be covered in just a few days.

For us, an important element of these courses is that they are all taught by data science consultants who have hands-on experience building and deploying apps for commercial use. These consultants are supported by platform experts who can advise on the best approaches for getting an app out to end users, so that you can see the benefits of using Shiny as quickly as possible.

But, one blog post was never going to be enough for all of the Shiny enthusiasts at Mango to share their passion. We needed more time, more than one blog post and more ways to share with the community.

Therefore, Mango are declaring June to be Shiny Appreciation Month!

For the whole of June, we will be talking all things Shiny. Follow us on Twitter where we will be sharing tips, ideas and resources. To get involved, share your own with us and the Shiny community, using #ShinyAppreciation. On the blog we will be sharing, among other things, some of the ways we are using Shiny in industry and some of the technical challenges we have had to overcome.

Watch this space for updates but, for now, if you want to know more about the Shiny training that we offer, take a look at our training pages. If you are based in the UK we will be running public Shiny courses in London (see below for the currently scheduled dates). We will also be offering a snapshot of the materials for intermediate Shiny users at London EARL in September.

Public course dates:

Introduction to Shiny: 17th July
Intermediate Shiny: 18th July, 5th September
Advanced Shiny: 6th September


Coloring Sudokus

By @aschinchon

(This article was first published on R – Fronkonstin, and kindly contributed to R-bloggers)

Someday you will find me
caught beneath the landslide
(Champagne Supernova, Oasis)

I recently read a book called Snowflake Seashell Star: Colouring Adventures in Numberland by Alex Bellos and Edmund Harris which is full of mathematical patterns to be coloured. All the images are truly appealing and attract anyone who looks at them, regardless of their age, gender, education or political orientation. This book demonstrates how maths can be an astonishing way to reach beauty.

One of my favourite patterns is the tridoku, a sophisticated colored version of the sudoku. Coloring a sudoku is simple: once it is solved, it is enough to assign a color to each number (from 1 to 9). If you superimpose three colored sudokus in which no cells at the same position share the same color, again using nine colors, the resulting image is a tridoku:

There is something attractive in a tridoku due to the balance of colors, but they also seem quite messy: they are charmingly unbalanced. I wrote a script to generalize the concept to n-dokus. The idea is the same: superimpose n sudokus without cells sharing color and position (I call them disjoint sudokus), using just nine different colors. I didn't prove it, but I think the maximum number of sudokus that can be superimposed under these constraints is 9. This is the complete series from 1-doku to 9-doku (click on any image to enlarge):








I am a big fan of the colourlovers package. These tridokus are colored with some of my favourite palettes from there:




Just two technical things to highlight:

  • There is a package called sudoku that generates sudokus (of course!). I use it to obtain the first solved sudoku, which forms the base.
  • Subsequent sudokus are obtained from this one by doing two operations: first interchanging groups of columns (there are three groups: columns 1 to 3, 4 to 6 and 7 to 9), and then interchanging columns within each group (a minimal sketch of this idea follows below).
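
Here is a minimal sketch of the column-permutation idea (not the author's script, which relies on the sudoku package; the base grid below comes instead from the classic shift construction, which always produces a valid solved sudoku):

# a solved sudoku from the classic shift pattern (stand-in for a sudoku-package grid)
base <- outer(0:8, 0:8, function(r, c) (3 * (r %% 3) + r %/% 3 + c) %% 9) + 1

# cyclically shift the three column groups: old columns 4-9 move to the front and
# old columns 1-3 to the back; within a row all nine values are distinct, so every
# cell changes value and the shifted grid is "disjoint" from the original
shift_groups <- function(m) m[, c(4:9, 1:3)]

s2 <- shift_groups(base)
s3 <- shift_groups(s2)

all(base != s2)              # TRUE
all(base != s3 & s2 != s3)   # TRUE: the three sudokus are pairwise disjoint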

You can find the code here: make your own colored n-dokus!
