Learn Shiny in an Afternoon!

By Ari Lamstein

(This article was first published on R – AriLamstein.com, and kindly contributed to R-bloggers)

Today I am happy to announce my newest course: Learn Shiny in an Afternoon!

To learn more, watch this video:

Shiny has revolutionized the way that I view data analysis. Instead of creating a static report, I can now create a web app that lets users explore datasets on their own. Learning to create Shiny apps will improve the quality of your analyses.

Learn Shiny in an Afternoon contains 15 video lessons, each with a screencast and downloadable code. It is designed to teach you to create your own apps in just an afternoon.

Learn Shiny in an Afternoon is the latest addition to my membership program. By becoming a member you will gain access to this course, my other courses, the members-only forum, and live office hours. Join today!


The post Learn Shiny in an Afternoon! appeared first on AriLamstein.com.

To leave a comment for the author, please follow the link and comment on their blog: R – AriLamstein.com.

R 3.3.2 now available

By David Smith

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

R 3.3.2, the latest update to the R language, was released today. Binary releases for Linux and Mac are available now from your local CRAN mirror, and the Windows builds will be available shortly.

As a minor update to the R 3.3 series, this release focuses mainly on fixing bugs and doesn’t make any major changes to the language. As a result, you can expect existing scripts and packages to continue to work if you’re upgrading from R 3.3.1. This update includes some performance improvements (particularly in the calculation of eigenvalues), better handling of date axes in graphics, and improved documentation for the methods package. (Fun fact: when printed as a 9Mb PDF reference manual, the documentation for the R base and recommended packages now runs to 3452 pages. That’s almost 3 copies of War and Peace!)
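
As a quick aside: once you've installed the update, you can confirm which version you're running straight from the console (nothing here is specific to 3.3.2; this is standard base R):

# print the running version, e.g. "R version 3.3.2 (2016-10-31)"
R.version.string
# or compare programmatically when guarding version-dependent code
getRversion() >= "3.3.2"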

The nickname for this release is “Sincere Pumpkin Patch”, in recognition of the Hallowe’en release date; per RWeekly, it references this clip from “It’s the Great Pumpkin, Charlie Brown”.

For the official announcement from the R core team including the detailed list of changes, see the posting in the R-announce mailing list linked below.

R-announce mailing list: R 3.3.2 is released

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

ShinyProxy 0.6.0 released!

By tobias

(This article was first published on Open Analytics – Blog, and kindly contributed to R-bloggers)
Monday 31 October 2016 – 15:58

ShinyProxy is a novel, open source platform to deploy Shiny apps for the enterprise or larger organizations.

Why is this needed?

There is currently no valid open source alternative that offers this functionality.

What does it offer?

  • authentication
  • authorization
  • securing traffic with TLS/SSL
  • usage statistics
  • scalability

This is free and open source; is there also a paid, proprietary version?

To paraphrase the Discourse team:

There is only one version of ShinyProxy – the awesome open source version. There’s no super secret special paid commercial version with better or more complete features. Because ShinyProxy is 100% open source, now and forever, it belongs to you as much as it belongs to us. That’s how community works.

Want to learn more and give it a try?

The project has full documentation here:

http://shinyproxy.io

Where can you get community support?

https://support.openanalytics.eu

Other blog posts will follow. In the meantime, have fun!

This post is about: r, shinyproxy

To leave a comment for the author, please follow the link and comment on their blog: Open Analytics – Blog.

Running the Numbers – How Can Hamilton Still Take the 2016 F1 Drivers’ Championship?

By Tony Hirst

(This article was first published on Rstats – OUseful.Info, the blog…, and kindly contributed to R-bloggers)

Way back in 2012, I posted a simple R script for working out the finishing combinations in the last two races of that year’s F1 season for Fernando Alonso and Sebastian Vettel, exploring the circumstances under which Alonso could take the championship (Paths to the F1 2012 Championship Based on How They Might Finish in the US Grand Prix). I also put together a simple Shiny version of the script to make it a bit more app-like (Interactive Scenarios With Shiny – The Race to the F1 2012 Drivers’ Championship), which I updated again for the 2014 season (F1 Championship Race, 2014 – Winning Combinations…).

And now we come to 2016, and once again, with two races to go, there are two drivers in with a chance of winning overall… But what race finishing combinations could see Hamilton make a last stand and reclaim his title? The F1 Drivers’ Championship Scenarios, 2016 shiny app will show you…

You can find the code in a gist here:
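
The gist itself isn’t embedded in this extract, but a minimal sketch of the underlying idea (mine, not the author’s gist) is to enumerate every finishing combination for the two contenders over the final two races and flag the ones where Hamilton ends up ahead on points. The pre-race totals below are illustrative assumptions; check the real standings before relying on them:

# 2016 FIA points for positions P1..P22 (zero outside the top 10)
points <- c(25, 18, 15, 12, 10, 8, 6, 4, 2, 1, rep(0, 12))

ham_start <- 330  # assumed Hamilton total with two races to go
ros_start <- 349  # assumed Rosberg total (verify against real standings)

scenarios <- expand.grid(ham_r1 = 1:22, ros_r1 = 1:22,
                         ham_r2 = 1:22, ros_r2 = 1:22)
# the two drivers can't share a finishing position in the same race
scenarios <- subset(scenarios, ham_r1 != ros_r1 & ham_r2 != ros_r2)
scenarios$ham <- ham_start + points[scenarios$ham_r1] + points[scenarios$ham_r2]
scenarios$ros <- ros_start + points[scenarios$ros_r1] + points[scenarios$ros_r2]

# how many combinations hand Hamilton the title outright on points?
hamilton_wins <- subset(scenarios, ham > ros)
nrow(hamilton_wins)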

To leave a comment for the author, please follow the link and comment on their blog: Rstats – OUseful.Info, the blog….

Weighted Effect Coding: Dummy coding when size matters

By Rense Nieuwenhuis

(This article was first published on Rense Nieuwenhuis » R-Project, and kindly contributed to R-bloggers)

If your regression model contains a categorical predictor variable, you commonly test the significance of its categories against a preselected reference category. If all categories have (roughly) the same number of observations, you can also test all categories against the grand mean using effect (ANOVA) coding. In observational studies, however, the number of observations per category typically varies. We published a paper in the International Journal of Public Health, showing how all categories can be tested against the sample mean.

In a second paper in the same journal, the procedure is expanded to regression models that test interaction effects. Within this framework, the weighted effect coded interaction displays the extra effect on top of the main effect found in a model without the interaction effect. This offers a promising new route to estimate interaction effects in observational data, where different category sizes often prevail.

To apply the procedure introduced in these papers, called weighted effect coding, implementations are available for R, SPSS, and Stata. For R, we created the ‘wec’ package, which can be installed by typing:

install.packages("wec")
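
As a rough illustration (this assumes the package’s contr.wec() interface; see ?contr.wec after installing), weighted effect coding is attached to a factor’s contrasts before fitting, so that each coefficient is a deviation from the weighted sample mean:

library(wec)

# simulated unbalanced categorical predictor (category sizes differ on purpose)
set.seed(1)
edu <- factor(sample(c("low", "mid", "high"), 500, replace = TRUE,
                     prob = c(0.5, 0.3, 0.2)))
income <- 30 + 5 * (edu == "mid") + 12 * (edu == "high") + rnorm(500, sd = 10)

# weighted effect coding, omitting the "low" category from the output
contrasts(edu) <- contr.wec(edu, omitted = "low")
summary(lm(income ~ edu))  # each coefficient: deviation from the sample mean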

To leave a comment for the author, please follow the link and comment on their blog: Rense Nieuwenhuis » R-Project.

Poster/cheatsheet for R/BioC package genomation

By altuna

(This article was first published on Recipes, scripts and genomics, and kindly contributed to R-bloggers)
We prepared a poster/cheatsheet for the Bioconductor package genomation, a package for summarizing and annotating genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, and introns. The genomic intervals represent regions with a defined chromosomal position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, or methylation scores. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, it can use BAM or BigWig files as input. [Download from SlideShare for better resolution.]
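
For readers who haven’t used the package, a minimal sketch of the workflow described above might look like this. The file names are hypothetical, and the calls follow genomation’s documented ScoreMatrix()/heatMatrix() interface; treat the package vignette as the authoritative reference:

library(genomation)

# read pre-defined functional regions (hypothetical BED file of promoters);
# windows passed to ScoreMatrix() should all have the same width
promoters <- readBed("promoters.bed")

# summarize a scored signal (here a hypothetical BAM of aligned reads)
# over the promoter windows
sm <- ScoreMatrix(target = "h3k4me3.bam", windows = promoters, type = "bam")

# visualize the signal around the promoters
heatMatrix(sm, xcoords = c(-1000, 1000))
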
To leave a comment for the author, please follow the link and comment on their blog: Recipes, scripts and genomics.

Detecting outliers and fraud with R and SQL Server on my bank account data – Part 1

By tomaztsql

(This article was first published on R – TomazTsql, and kindly contributed to R-bloggers)

Detecting outliers and fraudulent behaviour (transactions, purchases, events, actions, triggers, etc.) takes a great deal of experience and a statistical/mathematical background.

One of the samples Microsoft provided with the release of the new SQL Server 2016 used the simple logic of Benford’s law. This law works great with naturally occurring numbers and can be applied across many kinds of problems. By naturally occurring, I mean numbers that are not generated mechanically, such as a page number in a book, an incremented number in your SQL table, or a sequence number of any kind, but numbers that occur independently of each other in nature (the length or width of trees, mountains, rivers), the lengths of roads in cities, addresses in your home town, city/country populations, etc. The law describes the distribution of leading digits from 1 to 9 and stipulates that the digit 1 will occur about 30% of the time, 2 about 17% of the time, 3 about 12% of the time, and so on. Randomly generated numbers, by contrast, will produce each leading digit from 1 to 9 with probability roughly 1/9. The law may also fail when the data are range-restricted; for example, height expressed in inches will not follow Benford’s distribution. My height is 188 cm, which is 74 inches, or 6ft2: none of these numbers produces the correct leading-digit distribution, even though height is a natural phenomenon.

So the probability of a number starting with n equals log10(n+1) - log10(n). For example, the probability that a number starts with 29 is log10(30) - log10(29) = 1.4771 - 1.4623 = 0.0148. But keep in mind that this law applies only to data where the leading digit 1 appears roughly 30% of the time.

As a quick example to support this, we can compute and plot the distribution in R:

# expected Benford frequency of leading digit d
benfordL <- function(d){return(log(1 + 1/d)/log(10))}
# distribution of leading digits 1 to 9 according to the formula
benfordL(1:9)
# plot the distribution
plot(benfordL(1:9))

Scatterplot with log distribution of numbers from 1 to 9:

Code for Benford’s Law is available at RosettaCode for your favorite programming language.

Now I want to check this rule against the data in WideWorldImporters. I will use the following query to test Benford’s Law:

SELECT TransactionAmount
FROM [WideWorldImporters].[Sales].[CustomerTransactions]
WHERE
    TransactionAmount > 0

To run this against the query results, I will execute the R script through sp_execute_external_script:

DECLARE @RScript NVARCHAR(MAX)
SET @RScript = N'
            WWWTrans <- InputDataSet
            # extract the leading digit of each transaction amount
            get_first_num <- function(number){return(as.numeric(substr(number, 1, 1)))}
            # expected Benford frequency of leading digit d
            pbenford <- function(d){return(log10(1 + (1/d)))}
            lead_dig <- mapply(get_first_num, WWWTrans$TransactionAmount)
            # observed relative frequencies: divide by the number of rows
            # rather than a hard-coded constant
            obs_freq_WWWTrans <- table(lead_dig)/length(lead_dig)
            OutputDataSet <- data.frame(obs_freq_WWWTrans)'

DECLARE @SQLScript NVARCHAR(MAX)
SET @SQLScript = N'
        SELECT
            TransactionAmount
        FROM [WideWorldImporters].[Sales].[CustomerTransactions]
        WHERE
            TransactionAmount > 0'

EXECUTE sp_execute_external_script
          @language = N'R'
        , @script = @RScript
        , @input_data_1 = @SQLScript
WITH RESULT SETS ((
                     lead_dig INT
                    ,Freq DECIMAL(10,2)
));

Comparing the general Benford distribution with the observed distribution of TransactionAmount leading digits shows some discrepancies, but in general they follow the same log distribution.
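
As a quick sanity check on any such comparison, a chi-squared goodness-of-fit test quantifies how far the observed leading digits deviate from Benford’s expected frequencies. This is a generic base-R sketch, not part of the original post; the simulated amounts below stand in for the TransactionAmount column:

# simulated stand-in for the TransactionAmount column (log-normal amounts)
set.seed(42)
amounts <- rlnorm(10000, meanlog = 5, sdlog = 2)

# leading digit of each (positive) amount
lead <- floor(amounts / 10^floor(log10(amounts)))
obs <- table(factor(lead, levels = 1:9))

# H0: leading digits follow the Benford probabilities log10(1 + 1/d)
chisq.test(obs, p = log10(1 + 1/(1:9)))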

In Part 2 I will cover outlier detection with GLM and Random forest using my bank account dataset.

For Benford’s Law there are a couple of R packages available: benford.analysis and BenfordTests.

To leave a comment for the author, please follow the link and comment on their blog: R – TomazTsql.

xlim_tree: set x axis limits for only Tree panel

By R on Guangchuang Yu

(This article was first published on R on Guangchuang YU, and kindly contributed to R-bloggers)

A ggtree user recently asked me the following question in the Google group:

I try to plot long tip labels in ggtree and usually adjust them using xlim(); however, when creating a facet_plot, xlim affects all plots and minimizes them.

Is it possible to work around this and only affect the tree and its tip labels, leaving the other plots in facet_plot unaffected?

This is indeed a desired feature, as ggplot2 can’t automatically adjust xlim for text, since the units are in two different spaces (data and pixel).

Here is an example where the tip labels are truncated.

library(ape)    # rtree()
library(ggtree) # ggtree(), geom_tiplab(), facet_plot(); also attaches ggplot2

set.seed(2016-10-31)
tr <- rtree(50)
tr$tip.label <- paste(tr$tip.label, tr$tip.label, sep = "_")
p <- ggtree(tr) + geom_tiplab(align = TRUE) + theme_tree2()
d <- data.frame(id = tr$tip.label, v = rnorm(50))

facet_plot(p, geom = geom_point, data = d, mapping = aes(x = v), panel = 'dot') +
        ggtitle("truncated tip labels")

If we only visualize the tree, this is easy to solve by using xlim() to allocate more space for the labels. But xlim() applies to all panels, so combining facet_plot() and xlim() produces a figure with a lot of wasted space.

facet_plot(p+xlim(NA, 6), geom=geom_point, data=d, mapping=aes(x=v), panel='dot') + 
        ggtitle("xlim applies to all panels")

To overcome this issue, ggtree provides xlim_tree to set x axis limits for only the Tree panel.

facet_plot(p+xlim_tree(6), geom=geom_point, data=d, mapping=aes(x=v), panel='dot') + 
        ggtitle('*xlim_tree* only change x axis limits of *Tree* panel')

# or using:
# facet_plot(p, geom=geom_point, data=d, mapping=aes(x=v), panel='dot') + xlim_tree(6)

Citation

G Yu, DK Smith, H Zhu, Y Guan, TTY Lam*. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. doi:10.1111/2041-210X.12628.

To leave a comment for the author, please follow the link and comment on their blog: R on Guangchuang YU.

Fastest Way to Add New Variables to A Large Data.Frame

By statcompute

(This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers)
pkgs <- list("hflights", "doParallel", "foreach", "dplyr", "rbenchmark", "data.table")
lapply(pkgs, require, character.only = T)

data(hflights)

benchmark(replications = 10, order = "user.self", relative = "user.self",
  transform = {
    ### THE GENERIC FUNCTION MODIFYING THE DATA.FRAME, SIMILAR TO DATA.FRAME() ###
    transform(hflights, wday = ifelse(DayOfWeek %in% c(6, 7), 'weekend', 'weekday'), delay = ArrDelay + DepDelay)
  },
  within    = {
    ### EVALUATE THE EXPRESSION WITHIN THE LOCAL ENVIRONMENT ###
    within(hflights, {wday = ifelse(DayOfWeek %in% c(6, 7), 'weekend', 'weekday'); delay = ArrDelay + DepDelay})
  },
  mutate   = {
    ### THE SPECIFIC FUNCTION IN DPLYR PACKAGE TO ADD VARIABLES ###
    mutate(hflights, wday = ifelse(DayOfWeek %in% c(6, 7), 'weekend', 'weekday'), delay = ArrDelay + DepDelay)
  },
  foreach = {
    ### SPLIT AND THEN COMBINE IN PARALLEL ###
    registerDoParallel(cores = 2)
    v <- c(names(hflights), 'wday', 'delay')
    f <- expression(ifelse(hflights$DayOfWeek %in% c(6, 7), 'weekend', 'weekday'),
                    hflights$ArrDelay + hflights$DepDelay)
    df <- foreach(fn = iter(f), .combine = mutate, .init = hflights) %dopar% {
      eval(fn)
    }
    names(df) <- v
  },
  data.table = {
    ### DATA.TABLE ###
    data.table(hflights)[, c("wday", "delay") := list(ifelse(hflights$DayOfWeek %in% c(6, 7), 'weekend', 'weekday'), hflights$ArrDelay + hflights$DepDelay)]
  }
)

#         test replications elapsed relative user.self sys.self user.child
# 4    foreach           10   1.442    1.000     0.240    0.144      0.848
# 2     within           10   0.667    2.783     0.668    0.000      0.000
# 3     mutate           10   0.679    2.833     0.680    0.000      0.000
# 5 data.table           10   0.955    3.983     0.956    0.000      0.000
# 1  transform           10   1.732    7.200     1.728    0.000      0.000
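# (Note: timings are machine-dependent; the relative column is scaled to the
# fastest user.self time. The foreach approach does most of its work in child
# processes, which is why its user.self is small but user.child is large.)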

To leave a comment for the author, please follow the link and comment on their blog: S+/R – Yet Another Blog in Statistical Computing.

ratio-of-uniforms [#2]

By xi’an

(This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers)

Following my earlier post on Kinderman’s and Monahan’s (1977) ratio-of-uniform method, I must confess I remain quite puzzled by the approach. Or rather by its consequences. When looking at the set A of (u,v)’s in R⁺×X such that 0≤u²≤ƒ(v/u), as discussed in the previous post, it can be represented by its parameterised boundary

u(x) = √ƒ(x),  v(x) = x√ƒ(x),  x in X

Similarly, since the simulation from ƒ(v/u) can also be derived [check Luc Devroye’s Non-uniform random variate generation in the exercise section 7.3] from a uniform on the set B of (u,v)’s in R⁺×X such that 0≤u≤ƒ(v+u), on the set C of (u,v)’s in R⁺×X such that 0≤u³≤ƒ(v/√u)², or on the set D of (u,v)’s in R⁺×X such that 0≤u²≤ƒ(v/u), which is actually exactly the same as A [and presumably many other versions!, for which I would like to guess the generic rule of construction], there are many sets on which one can consider running simulations. And one to pick for optimality?! Here are the three sets for a mixture of two normal densities:

For instance, assuming slice sampling is feasible on every one of those three sets, which one is the most efficient? While I have no clear answer to this question, I found on Sunday night that a generic family of transforms is indexed by a differentiable monotone function h over the positive half-line, with the uniform distribution being taken over the set

H = {(u,v); 0 ≤ u ≤ h(f(v/g(u)))}

when the primitive G of g is the inverse of h, i.e., G(h(x))=x. [Here are the slides I gave at the Warwick reading group on Devroye’s book:]
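
For readers who want to experiment, here is a minimal R sketch (mine, not from the post) of simulating from the basic set A by rejection from its bounding rectangle, with a two-component normal mixture as the target. Computing the box bounds on a grid is an assumption that works for this bounded, light-tailed example:

# target density: a mixture of two normal densities
f <- function(x) 0.5 * dnorm(x, -2, 1) + 0.5 * dnorm(x, 2, 1)

# bounding box for A = {(u,v): 0 <= u^2 <= f(v/u)}:
# u in (0, sup sqrt(f)], v in [inf x*sqrt(f(x)), sup x*sqrt(f(x))]
xs <- seq(-10, 10, length.out = 1e4)
u_max <- max(sqrt(f(xs)))
v_rng <- range(xs * sqrt(f(xs)))

# rejection sampling: uniform on the box, accept when (u,v) falls in A
n <- 1e5
u <- runif(n, 0, u_max)
v <- runif(n, v_rng[1], v_rng[2])
x <- v/u
keep <- u^2 <= f(x)
mean(keep)  # acceptance rate

hist(x[keep], breaks = 100, freq = FALSE, main = "ratio-of-uniforms sample")
curve(f, -6, 6, add = TRUE, col = "red")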

Filed under: Books, R, Statistics Tagged: Luc Devroye, mixtures of distributions, Non-Uniform Random Variate Generation, pseudo-random generator, R, ratio of uniform algorithm, slice sampler

To leave a comment for the author, please follow the link and comment on their blog: R – Xi’an’s Og.
