R 3.4.1 “Single Candle” released

By David Smith

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

The R core team announced today the release of R 3.4.1 (codename: Single Candle). This release fixes a few minor bugs reported after the release of R 3.4.0, including an issue sometimes encountered when attempting to install packages on Windows, and problems displaying functions containing Unicode characters (like “日本語”) in the Windows GUI. The other fixes are mostly relevant to package developers and those building R from source, and you can see the full list in the announcement linked below.

At the time of writing, Windows builds are already available at the main CRAN cloud mirror, and the Debian builds are out, but Mac builds aren’t there quite yet (unless you want to build from source). Binaries for all platforms should propagate across the mirror network over the next couple of days.

r-announce mailing list: R 3.4.1 is released

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

Stan Weekly Roundup, 30 June 2017

By Bob Carpenter


(This article was first published on R – Statistical Modeling, Causal Inference, and Social Science, and kindly contributed to R-bloggers)

Here are some of the things that have been going on with Stan since last week’s roundup.

The post Stan Weekly Roundup, 30 June 2017 appeared first on Statistical Modeling, Causal Inference, and Social Science.

To leave a comment for the author, please follow the link and comment on their blog: R – Statistical Modeling, Causal Inference, and Social Science.


Source:: R News

Data Visualization with googleVis exercises part 5

By Euthymios Kasvikis

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

Candlestick, Pie, Gauge, Intensity Charts

In the fifth part of our journey we will meet some special but increasingly useful types of charts that googleVis provides. More specifically, you will learn about the features of Candlestick, Pie, Gauge and Intensity charts.

Read the examples below to understand the logic of what we are going to do and then test your skills with the exercise set we prepared for you. Let’s begin!

Answers to the exercises are available here.

Package & Data frame

As you already know, the first thing you have to do is install and load the googleVis package with:
install.packages("googleVis")
library(googleVis)

Secondly, we will create an experimental data frame which will be used for plotting our charts. You can create it with:
co=data.frame(country=c("US", "GB", "BR"),
population=c(15,17,19),
size=c(33,42,22))

NOTE: The charts are created locally by your browser. In case they are not displayed at once press F5 to reload the page.

Candlestick chart

It is quite simple to create a Candlestick Chart with googleVis. We will use the “OpenClose” dataset. Look at the example below:
CandleC <- gvisCandlestickChart(OpenClose,
                                options=list(legend='none'))
plot(CandleC)

Exercise 1

Create a list named “CandleC” and pass to it the “OpenClose” dataset as a candlestick chart. HINT: Use gvisCandlestickChart().

Exercise 2

Plot the candlestick chart. HINT: Use plot().

Pie chart

It is quite simple to create a Pie Chart with googleVis. We will use the “CityPopularity” dataset. Look at the example below:
PieC <- gvisPieChart(CityPopularity)
plot(PieC)

Exercise 3

Create a list named “PieC” and pass to it the “CityPopularity” dataset as a pie chart. HINT: Use gvisPieChart().

Learn more about using GoogleVis in the online course Mastering in Visualization with R programming. In this course you will learn how to:

  • Work extensively with the GoogleVis package and its functionality
  • Learn what visualizations exist for your specific use case
  • And much more

Exercise 4

Plot the pie chart. HINT: Use plot().

Gauge

The gauge chart is not very common compared with those we saw before but can be useful under certain circumstances. We will use the “CityPopularity” dataset. Look at the example:
GaugeC <- gvisGauge(CityPopularity)
plot(GaugeC)

Exercise 5

Create a list named “GaugeC” and pass to it the “CityPopularity” dataset as a gauge chart. HINT: Use gvisGauge().

Exercise 6

Plot the gauge. HINT: Use plot().

The gauge gives you the ability to use colours in order to separate each area from the others more easily. For example:
options=list(min=0, max=1200, blueFrom=900,blueTo=1200,greenFrom=600,
greenTo=900, yellowFrom=300, yellowTo=600,
redFrom=0, redTo=300, width=400, height=300)
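
For instance, wiring that option list into gvisGauge() might look like the following sketch (the object name GaugeC2 is only for illustration):
GaugeC2 <- gvisGauge(CityPopularity,
                     options=list(min=0, max=1200, blueFrom=900, blueTo=1200,
                                  greenFrom=600, greenTo=900,
                                  yellowFrom=300, yellowTo=600,
                                  redFrom=0, redTo=300,
                                  width=400, height=300))
plot(GaugeC2)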

Exercise 7

Separate the gauge into three areas, using colours of your choice, over the range from 0 to 900, and plot it. HINT: Use list().

Exercise 8

Set width to 400 and height to 300. HINT: Use width and height.

Intensity Map

The last chart we are going to see in this part is the Intensity Map.
It is quite simple to create an Intensity Map with googleVis. We will use the experimental data frame “co” we created before. Look at the example below:
IntensityC <- gvisIntensityMap(co)
plot(IntensityC)

Exercise 9

Create a list named “IntensityC” and pass to it the “co” dataset you just created as an intensity map. HINT: Use gvisIntensityMap().

Exercise 10

Plot the intensity map. HINT: Use plot().

To leave a comment for the author, please follow the link and comment on their blog: R-exercises.


Source:: R News

HexJSON HTMLWidget for R, Part 3

By Tony Hirst

(This article was first published on Rstats – OUseful.Info, the blog…, and kindly contributed to R-bloggers)

In HexJSON HTMLWidget for R, Part 1 I described a basic HTMLwidget for rendering hexJSON maps using d3-hexJSON, and HexJSON HTMLWidget for R, Part 2 described updates for supporting colour.

Having booked off today for emergency family cover that turned out not to be required, I had another stab at the package, so it now supports the following additional features…

Firstly, I had a go at popping some “base” hexjson files into a location within the package from which I could load them (checkin). This was based on a crib from here, which suggests putting data files into an extdata folder in the package inst/ folder, from where devtools::build() makes them available in the root directory of the built package.

hexjsonbasefiles 

With the files in place, we can use any of the base hexjson files included in the package as the basis for hexmaps.
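
A hedged sketch of how those bundled files can then be found at runtime (the package name is an assumption here):
# list the base hexjson files shipped in the installed package's extdata folder
list.files(system.file("extdata", package = "hexjsonwidget"))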

I also added the ability to switch off labels although later in the day I simplified this process…

One thing that was close to the top of my list was the ability to merge the contents of a dataframe into a hexJSON object. In particular, for a row identified by a particular key value associated with a hex key value, I wanted to map columns onto hex attributes. The hexjson object is represented as a list, so this required a couple of things: firstly, getting the dataframe data into an appropriate list form; secondly, merging this into the hexjson list using the rlist::list.merge() function. Here's the gist of the trick I ended up with, which was to construct a list from each row of the dataframe using split(), with the row name as the list name, and then convert each row to a list with lapply(.., as.list):

library(rlist)
# one list element per dataframe row, keyed by row name
ll=lapply(split(customdata, rownames(customdata)), as.list)
jsondata$hexes = list.merge(jsondata$hexes, ll)

A hexjsondatamerge(hexjson,df) function takes a hexjson file and merges the contents of the dataframe into the hexes:

The contents of a dataframe can also be merged in directly when creating a hexjsonwidget:
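
A hedged usage sketch of that merge step (jsondata and df are placeholder names; hexjsondatamerge() is the function named above):
merged <- hexjsondatamerge(jsondata, df)  # dataframe rows are keyed to hex ids via their rownames
hexjsonwidget(merged)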

Having started to work with dataframes, it also seemed like it might be sensible to support the creation of a hexjson object directly from a dataframe. This uses a similar trick to the one used in creating the nested list for the merge function:

hexjsonfromdataframe 


As you might expect, we can then use the hexjson object to create a hexjsonwidget:

A hexjsonwidget can also be created directly from a dataframe:
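
A hedged sketch of that dataframe workflow (the dataframe df is a placeholder; the functions are the ones named in the post):
jsondata <- hexjsonfromdataframe(df)  # build a hexjson object from a dataframe
hexjsonwidget(jsondata)               # render it directly as a widget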

In creating the hexjson-from-dataframe, I also refactored some of the other bits of code to simplify the number of parameters I'd started putting into the hexjsonwidget() function, in effect overloading them so the same named parameter could be used in different supporting functions.

I think that’s pretty much it from the developments I had in mind for the package. Now all I need to do is put it into practice… (testing for which will, no doubt, throw up issues!)

To leave a comment for the author, please follow the link and comment on their blog: Rstats – OUseful.Info, the blog….


Source:: R News

Cubic and Smoothing Splines in R

By Anish Singh Walia

Splines are a smooth and flexible way of fitting non-linear models and learning non-linear interactions from the data. In most of the methods in which we fit non-linear models to data, we learn the non-linearities by transforming the data or the variables with a non-linear transformation.

Cubic Splines

A cubic spline with knots (cutpoints) at $\xi_K, K = 1, 2, \dots, k$ is a piece-wise cubic polynomial with continuous derivatives up to order 2 at each knot; that is, it has continuous 1st and 2nd derivatives. The order of continuity is $(d - 1)$, where $d$ is the degree of the polynomial. Now we can represent the model with a truncated power basis function $b(x)$. What happens is that we transform the variables $X_i$ by applying a basis function $b(x)$ and fit a model using these transformed variables, which adds non-linearities to the model and enables the splines to fit smoother and more flexible non-linear functions.

The Regression Equation Becomes —

$f(x) = y_i = \alpha + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \dots + \beta_{k+3} b_{k+3}(x_i) + \epsilon_i$

#loading the Splines Packages
require(splines)
#ISLR contains the Dataset
require(ISLR)
attach(Wage) #attaching Wage dataset
?Wage #for more details on the dataset
agelims <- range(age)
age.grid <- seq(from = agelims[1], to = agelims[2])

Now let's fit a Cubic Spline with 3 Knots (cutpoints)
The idea here is to transform the variables and add a linear combination of them, via the basis power functions, to the regression function $f(x)$. The bs() function is used in R to fit a cubic spline.
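
With 3 interior knots and a cubic (degree 3) basis, bs() generates k + 3 = 6 basis columns, which is why the summary below shows six bs() coefficients. A quick check, assuming the Wage data is attached as above:
dim(bs(age, knots = c(25, 40, 60)))
## [1] 3000    6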

# 3 cutpoints (knots) at ages 25, 40, 60
fit <- lm(wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)
summary(fit)
## 
## Call:
## lm(formula = wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -98.832 -24.537  -5.049  15.209 203.207 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       60.494      9.460   6.394 1.86e-10 ***
## bs(age, knots = c(25, 40, 60))1    3.980     12.538   0.317 0.750899    
## bs(age, knots = c(25, 40, 60))2   44.631      9.626   4.636 3.70e-06 ***
## bs(age, knots = c(25, 40, 60))3   62.839     10.755   5.843 5.69e-09 ***
## bs(age, knots = c(25, 40, 60))4   55.991     10.706   5.230 1.81e-07 ***
## bs(age, knots = c(25, 40, 60))5   50.688     14.402   3.520 0.000439 ***
## bs(age, knots = c(25, 40, 60))6   16.606     19.126   0.868 0.385338    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 39.92 on 2993 degrees of freedom
## Multiple R-squared:  0.08642,    Adjusted R-squared:  0.08459 
## F-statistic: 47.19 on 6 and 2993 DF,  p-value: 

Now plotting the Regression Line

#Plotting the Regression Line to the scatterplot   
plot(age,wage,col="grey",xlab="Age",ylab="Wages")
points(age.grid,predict(fit,newdata = list(age=age.grid)),col="darkgreen",lwd=2,type="l")
#adding cutpoints
abline(v=c(25,40,60),lty=2,col="darkgreen")

Gives this plot:

The Dashed Lines are the Cutpoints or the Knots. The above Plot shows the smoothing and local effect of Cubic Splines.

Smoothing Splines

Smoothing splines are mathematically more challenging, but they are smoother and more flexible as well. They do not require selecting the number of knots; they only require choosing a roughness penalty, which accounts for the wiggliness (fluctuations) and controls the roughness of the function and the variance of the model. Another important thing to remember about smoothing splines is that they have a knot at every unique value of $x_i$. Our aim with smoothing splines is to minimize the error function, which is modified by adding a roughness penalty that penalizes roughness (wiggliness) and high variance.

$\min_{g} \; \sum_{i=1}^{n} \left( y_i - g(x_i) \right)^2 + \lambda \int g''(t)^2 \, dt, \quad \lambda > 0$

($\lambda$ is the tuning parameter.)

# fitting a smoothing spline using smooth.spline(X, Y, df = ...)
fit1 <- smooth.spline(age, wage, df = 16)  # a large df gives a wiggly fit
plot(age, wage, col = "grey")
lines(fit1, col = "red", lwd = 2)

Gives this plot:

Now we can notice that the red line, i.e. the smoothing spline, is more wiggly and fits the data more flexibly. This is probably due to the high degrees of freedom. The best way to select the value of $\lambda$ and df is cross-validation. Now we have a direct method to implement cross-validation in R using smooth.spline().

Implementing cross-validation to select the value of λ and fit the smoothing spline:

fit2 <- smooth.spline(age, wage, cv = TRUE)
fit2
## Call:
## smooth.spline(x = age, y = wage, cv = TRUE)
## 
## Smoothing Parameter  spar= 0.6988943  lambda= 0.02792303 (12 iterations)
## Equivalent Degrees of Freedom (Df): 6.794596
## Penalized Criterion: 75215.9
## PRESS: 1593.383
# CV selects lambda = 0.0279 and df = 6.794596; the chosen roughness is a
# heuristic and can vary with how rough the underlying function is
plot(age,wage,col="grey")
#Plotting Regression Line
lines(fit2,lwd=2,col="purple")
legend("topright",("Smoothing Splines with 6.78 df selected by CV"),col="purple",lwd=2)

Gives this plot:

This Model is also very Smooth and Fits the data well.

Conclusion

Hence this was a simple overview of cubic and smoothing splines, how they transform variables and add non-linearities to the model, and how they are more flexible and smoother than other techniques. They are also better at extrapolation. Other techniques such as polynomial regression are very bad at extrapolation: they oscillate a lot once they get outside the boundaries of the data and become very wiggly, which is a sign of high variance, and they mostly overfit at larger polynomial degrees. The main thing to remember while fitting non-linear models to the data is that we need to transform the data or the variables in order to make the model more flexible and stronger at learning non-linear interactions between the input $X_i$ and output $Y$ variables.

Hope you guys liked the article and make sure to like and share it. Cheers!

    Related Post

    1. Chi-Squared Test – The Purpose, The Math, When and How to Implement?
    2. Missing Value Treatment
    3. R for Publication by Page Piccinini
    4. Assessing significance of slopes in regression models with interaction
    5. First steps with Non-Linear Regression in R

    Source:: R News

    Model Operational Loss Directly with Tweedie GLM

    By statcompute

    (This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers)

    In the development of operational loss forecasting models, the Frequency-Severity modeling approach, in which the frequency and the severity of a Unit of Measure (UoM) are modeled separately, has been widely employed in the banking industry. However, sometimes it also makes sense to model the operational loss directly, especially for UoMs with non-material losses. First of all, given the low loss amount, the effort of developing two models, e.g. frequency and severity, might not be justified. Secondly, for UoMs with low losses due to low frequencies, modeling the frequency and the severity separately might overlook the internal connection between the low frequency and the subsequent low loss amount. For instance, when the frequency N = 0, then the loss L = $0 inevitably.

    The Tweedie distribution is defined as a Poisson sum of Gamma random variables. In particular, if the frequency of loss events N is assumed to follow a Poisson distribution and the loss amount L_i of an event i, where i = 1, 2 … N, is assumed to follow a Gamma distribution, then the total loss amount L = SUM[L_i] would have a Tweedie distribution. When there is no loss event, i.e. N = 0, then Prob(L = $0) = Prob(N = 0) = Exp(-Lambda). However, when N > 0, then L = L_1 + … + L_N > $0 is governed by a Gamma distribution, since the sum of i.i.d. Gamma random variables is also Gamma.

    For the Tweedie loss, E(L) = Mu and VAR(L) = Phi * (Mu ** P), where P is called the index parameter and Phi is the dispersion parameter. When P approaches 1 and therefore VAR(L) approaches Phi * E(L), the Tweedie would be similar to a Poisson-like distribution. When P approaches 2 and therefore VAR(L) approaches Phi * (E(L) ** 2), the Tweedie would be similar to a Gamma distribution. When P is between 1 and 2, then the Tweedie would be a compound mixture of Poisson and Gamma, where P and Phi can be estimated.

    To estimate a regression with the Tweedie distributional assumption, there are two implementation approaches in R with cplm and statmod packages respectively. With the cplm package, the Tweedie regression can be estimated directly as long as P is in the range of (1, 2), as shown below. In the example, the estimated index parameter P is 1.42.

    > library(cplm)
    > data(FineRoot)
    > m1  summary(m1)
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -1.0611  -0.6475  -0.3928   0.1380   1.9627  
    
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept) -1.95141    0.14643 -13.327  
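
    For reference, here is a hedged sketch of what the cpglm() call above might look like (the model formula is an assumption, using FineRoot's RLD response with Zone and Stock as covariates):

    library(cplm)
    data(FineRoot)
    # direct Tweedie GLM; the index parameter p is estimated along with the coefficients
    m1 <- cpglm(RLD ~ Zone + Stock, data = FineRoot)
    summary(m1)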
    

    The statmod package provides a more general and flexible solution with a two-stage estimation, which estimates the index parameter P first and then estimates the regression parameters. In real-world practice, we could do a coarse search to narrow down a reasonable range of P and then do a fine search to identify the optimal P value. As shown below, all estimated parameters are fairly consistent with the ones in the previous example.

    > library(tweedie)
    > library(statmod)
    > prof  prof$p.max
    [1] 1.426531
    > m2  summary(m2)
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -1.0712  -0.6559  -0.3954   0.1380   1.9728  
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept) -1.95056    0.14667 -13.299  
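
    And here is a hedged sketch of the two-stage tweedie/statmod approach shown above (again, the formula and the p search grid are assumptions):

    library(tweedie)
    library(statmod)
    # stage 1: profile the index parameter p over a grid of candidate values
    prof <- tweedie.profile(RLD ~ Zone + Stock, data = FineRoot,
                            p.vec = seq(1.1, 1.9, by = 0.05), do.plot = FALSE)
    prof$p.max
    # stage 2: plug the profiled p into a GLM with a log link
    m2 <- glm(RLD ~ Zone + Stock, data = FineRoot,
              family = tweedie(var.power = prof$p.max, link.power = 0))
    summary(m2)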
    

    To leave a comment for the author, please follow the link and comment on their blog: S+/R – Yet Another Blog in Statistical Computing.


    Source:: R News

    Colorcoded map: regional population structures at a glance

    By Ilya Kashnitsky


    (This article was first published on Ilya Kashnitsky, and kindly contributed to R-bloggers)

    Data visualization is quite often a struggle to represent multiple relevant dimensions while preserving the readability of the plot. In this post I will show my recent multidimensional dataviz prepared for Rostock Retreat Visualization, an event that gathered demographers for an amazing “three-day-long coffee break”.

    The European population is rapidly ageing. But the process is not happening uniformly in all parts of Europe (see my recent paper for more info). Regions differ quite a lot: Eastern Europe is still undergoing a demographic dividend; Southern European regions form a cluster of lowest-low fertility; Western Europe experiences the greying of the baby boomers; urban regions attract young professionals and force out young parents; peripheral rural regions lose their youths forever… How can we grasp all the differences at a glance?

    Here I want to present a colorcoded map. For each NUTS-3 region, a unique color is produced by mixing the red, green, and blue color channels in proportions that reflect, correspondingly, the relative shares of the elderly population (aged 65+), the population at working ages (15-64), and kids (0-14).

    Each of the three variables mapped here is scaled between 0 and 1: otherwise, the map would be just green with slight variations in tone, because the share of the working-age population ranges between 65% and 75% across modern European regions. Thus, it is important to note that this map is not meant to inform the reader of the exact population structure of a specific region. Rather, it provides a snapshot of all the regional population structures, facilitating comparisons between them. So, by design, the colors are only meaningful in comparison, and only for the given set of regions in a given year, in this case 2015. If we want cross-year comparisons, the variables have to be scaled across the whole time series, meaning that each separate map would, most likely, become less contrasting.
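
    A minimal sketch of the colour-mixing idea described above (the data frame and its column names are hypothetical):

    # rescale each share to the 0-1 range across all regions
    rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

    # mix the channels: red = elderly (65+), green = working ages (15-64), blue = kids (0-14)
    regions$colour <- rgb(red   = rescale01(regions$share_65plus),
                          green = rescale01(regions$share_15_64),
                          blue  = rescale01(regions$share_0_14))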

    In the map we can easily spot the major differences between the subregions of Europe. Turkey still has relatively high fertility, especially in the south-eastern Kurdish part, so it has a higher share of kids and is colored in blueish tones. High-fertility Ireland is also evidently blue on the map. East-European regions are green due to the still-lasting demographic dividend. Southern Europe is ageing fastest, thus the colors are reddish.

    We can also see most of the major capital regions, which are bright green as opposed to the depleted periphery. In some countries there are huge regional differences: Northern and Southern Italy, Eastern and Western Germany.

    It is striking how clearly we can see the borders between European countries: Poland and Germany, the Czech Republic and Slovakia, Portugal and Spain, France and all its neighbors. The slowly evolving population structures bear the imprints of unique population histories, which largely correspond with state borders.

    The obvious drawback of the map is that it is not colorblind friendly, and there is no way to make it so because color is the main player in this dataviz.

    To leave a comment for the author, please follow the link and comment on their blog: Ilya Kashnitsky.


    Source:: R News

    HexJSON HTMLWidget for R, Part 2

    By Tony Hirst

    (This article was first published on Rstats – OUseful.Info, the blog…, and kindly contributed to R-bloggers)

    In my previous post – HexJSON HTMLWidget for R, Part 1 – I described a first attempt at an HTMLwidget for displaying hexJSON maps using d3-hexJSON.

    I had another play today and added a few extra features, including the ability to:

    • add a grid (as demonstrated in the original d3-hexJSON examples),
    • modify the default colour of data and grid hexes,
    • set the data hex colour via a col attribute defined on a hexJSON hex, and
    • set the data hex label via a label attribute defined on a hexJSON hex.

    We can now also pass in the path to a hexJSON file, rather than just the hexJSON object:
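
    For instance, a hedged sketch of a hexJSON object that uses the col and label hex attributes listed above (the layout, hex ids and colours are made up for illustration):

    hexjson <- list(
      layout = "odd-r",
      hexes = list(
        Q0R0 = list(q = 0, r = 0, col = "#bb3366", label = "Hex A"),
        Q1R0 = list(q = 1, r = 0, col = "#336688", label = "Hex B")
      )
    )
    hexjsonwidget(hexjson)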

    Here’s the example hexJSON file:

    And here’s an example of the default grid colour and a custom text colour:

    I’ve also tried to separate out the code changes as separate commits for each feature update: code checkins. For example, here’s where I added the original colour handling.

    I’ve also had a go at putting some docs in place, generated using roxygen2 called from inside the widget code folder with devtools::document(). (The widget itself gets rebuilt by running the command devtools::install().)

    Next up – some routines to annotate a base hexJSON file with data to colour and label the hexes. (I’m also wondering if I should support the ability to specify arbitrary hexJSON hex attribute names for label text (label) and hex colour (col), or whether to keep those names as a fixed requirement?)

    To leave a comment for the author, please follow the link and comment on their blog: Rstats – OUseful.Info, the blog….


    Source:: R News

    Algebra of Sets in R

    By Aaron Schlegel

    Venn diagram of relative complement of two sets

    (This article was first published on R – Aaron Schlegel, and kindly contributed to R-bloggers)
    Part 4 of 4 in the series Set Theory

    The set operations union and intersection, the relative complement, and the inclusion relation (subset), $\subseteq$, are known as the algebra of sets. The algebra of sets can be used to find many identities related to set relations that will be discussed later. We turn now to introducing the relative complement.

    Relative Complement

    The relative complement of two sets A and B is defined as the members of A not in B and is denoted $A - B$ (or $A \setminus B$). More formally, the relative complement of two sets is defined as:

    $A - B = \{x \in A \mid x \notin B\}$

    Just like the set operations union and intersection, the relative complement can be visualized using Venn diagrams.

    The shaded area represents the relative complement A - B.

    For example, consider the following three sets A, B, C.

    • Let A be the set of all Calico cats
    • Let B be the set of all Manx cats
    • Let C be the set of all male cats

    What are the elements of the set $A \cup (B - C)$? Start with the relative complement in parentheses, which is the set of all nonmale Manx cats. It wouldn’t be as correct to state B - C is the set of all female Manx cats as it hasn’t been explicitly defined that the cats not in the set C are female. Then the union of A and this set is, therefore, the set of all cats who are either Calico or Manx nonmales (or both).

    Determining the elements of the set $(A \cup B) - C$ proceeds in the same way. The union of A and B represents the set of all cats who are Calico or Manx or both. Thus the relative complement of this set and C is then the set of all nonmale cats who are either or both Calico or Manx.

    The set $(A - C) \cup (B - C)$ simplifies to one of the sets discussed above. The relative complement A - C is the set of all nonmale Calico cats while B - C is the set of all nonmale Manx cats. The union of these two sets thus results in the set of all nonmale cats who are either Calico or Manx, which is the same as the set $(A \cup B) - C$.

    We can define an R function to find the relative complement of two sets.

    relcomp 
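
    A minimal sketch of what such a helper could look like, consistent with the outputs shown below (the author's exact implementation is not reproduced here):

    relcomp <- function(a, b) {
      a[!(a %in% b)]  # the members of a that are not in b
    }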
    

    Find the relative complements of the sets A = {1,2,3,4,5} and B = {1,3,5,7}

    a <- c(1, 2, 3, 4, 5)
    b <- c(1, 3, 5, 7)
    print(relcomp(b, a))
    ## [1] 7
    
    Set Identities

    Many identities can be formed using the set operations we have explored.

    Commutative Laws

    $A \cup B = B \cup A$
    $A \cap B = B \cap A$

    We can show this identity using the isequalset() and set.union() functions we created in the previous post on union and intersections.
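
    So that the examples run standalone, here are minimal sketches of those helpers (the author's implementations from the earlier post may differ):

    set.union <- function(a, b) unique(c(a, b))

    set.intersection <- function(a, b) {
      res <- vector()
      for (i in unique(a)) {
        if (i %in% b) res <- append(res, i)
      }
      res
    }

    isequalset <- function(a, b) setequal(a, b)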

    a 
    
    isequalset(set.union(a, b), set.union(b, a))
    ## [1] TRUE
    
    isequalset(set.intersection(a, b), set.intersection(b, a))
    ## [1] TRUE
    

    Associative Laws

    $A \cup (B \cup C) = (A \cup B) \cup C$
    $A \cap (B \cap C) = (A \cap B) \cap C$

    Create a third set c.

    c 
    

    Starting with the first associative law, $A \cup (B \cup C) = (A \cup B) \cup C$:

    assoc.rhs 
    

    Showing the second associative law, $A \cap (B \cap C) = (A \cap B) \cap C$:

    assoc2.rhs 
    

    Distributive Laws

    $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
    $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$

    Starting with the first distributive law, $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$:

    dist.rhs 
    

    These are equal sets, as member order does not matter when determining the equality of two sets. The second distributive law, $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$, can be demonstrated likewise.

    dist2.rhs 
    

    De Morgan's Laws

    $C - (A \cup B) = (C - A) \cap (C - B)$
    $C - (A \cap B) = (C - A) \cup (C - B)$

    We can use the function we wrote earlier for finding the relative complement of two sets to show De Morgan's laws. Starting with the first law, $C - (A \cup B) = (C - A) \cap (C - B)$:

    morgan.rhs 
    

    The second De Morgan's law, $C - (A \cap B) = (C - A) \cup (C - B)$, can be shown similarly.

    morgan2.rhs 
    

    De Morgan's laws are often stated without C, it being understood as a fixed set. All sets are a subset of some larger set, which can be called a ‘space’, or S. If one considers the space to be the set of all real numbers $\mathbb{R}$, and A and B to be two subsets of S ($\mathbb{R}$), then De Morgan's laws can be abbreviated as:

    $-(A \cup B) = -A \cap -B$
    $-(A \cap B) = -A \cup -B$

    We will close the post by stating some identities under the assumption $A \subseteq S$:

    $A \cup S = S \qquad A \cap S = A$
    $A \cup -A = S \qquad A \cap -A = \varnothing$

    Though we cannot directly program the set of all real numbers $\mathbb{R}$, as it is an uncountable set, we can show these identities by using a subset of $\mathbb{R}$ where a set A is a subset of that subset.

    Generate the set A as the set of integers from one to ten and S, our simulated set of all real numbers, as the set of integers from one to 100.

    a <- 1:10
    s <- 1:100

    Show the first identity: $A \cup S = S$

    isequalset(set.union(a, s), s)
    ## [1] TRUE
    

    Second identity: $A \cap S = A$

    isequalset(set.intersection(a, s), a)
    ## [1] TRUE
    

    Third identity: $A \cup -A = S$

    isequalset(set.union(a, relcomp(s, a)), s)
    ## [1] TRUE
    

    Fourth identity: $A \cap -A = \varnothing$

    set.intersection(a, relcomp(s, a))
    ## logical(0)
    
    References

    Enderton, H. (1977). Elements of set theory (1st ed.). New York: Academic Press.

    The post Algebra of Sets in R appeared first on Aaron Schlegel.

    To leave a comment for the author, please follow the link and comment on their blog: R – Aaron Schlegel.


    Source:: R News

    How R is used by the FDA for regulatory compliance

    By David Smith

    (This article was first published on Revolutions, and kindly contributed to R-bloggers)

    I was recently alerted (thanks Maëlle and Mikhail!) to an enlightening presentation from last year’s useR! conference. (This year’s useR! conference takes place next week in Belgium.) Paul H Schuette, Scientific Computing Coordinator at the FDA Center for Drug Evaluation and Research (CDER), talked about how R is used in the process of regulating and approving drugs at the FDA.

    In what has become a common theme of FDA presentations at R conferences, Schuette refutes the fallacy that SAS is the only software that can be used for FDA submissions by sponsors such as pharmaceutical companies. On the contrary, he says “sponsors may propose to use R, and R has been used by some sponsors for certain types of analyses and simulations (post-market).”

    The myth persists despite the FDA’s Statistical Software Clarifying Statement declaring that any suitable software can be used. This is probably because some data-exchange regulations do require the use of the “XPT” (also known as SAS XPORT) file format, but that data format is an open standard and not restricted to SAS. XPT files can be read into R with the built-in read.xport function, and exported from R with the write.xport function in the SASxport package. (If you have legacy data in other SAS formats, here’s a handy SAS macro to export XPT files.) The R Foundation also provides guidance on how R complies with other FDA regulations in the document R: Regulatory Compliance and Validation Issues A Guidance Document for the Use of R in Regulated Clinical Trial Environments.
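
    For example, reading and writing XPT files from R might look like this (the file names are hypothetical; read.xport() comes from the foreign package that ships with R, and write.xport() from the SASxport package):

    library(foreign)   # provides read.xport()
    dat <- read.xport("adsl.xpt")            # read a SAS XPORT file into a data frame

    library(SASxport)  # provides write.xport()
    write.xport(dat, file = "adsl_out.xpt")  # write the data frame back out as XPT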

    In addition to sponsors using R in submission, R is also used internally at the FDA. Statisticians there may use the statistical package of their choice, provided it’s fit for the purpose. The software used includes SAS, R, Minitab and Stata. Schuette notes that R is used specifically for:

    • Statistical review of data analysis in clinical trial submissions. The primary goal here is, “Can we, on our own, replicate the conclusions of the sponsor?”
    • Methodology development, innovation and evaluation.
    • Graphics (in some cases, the detailed information folded and included with prescription medications feature R graphics).
    • Simulations.
    • The openFDA initiative, including this LRT Signal Analysis for a Drug Shiny application.

    Check out the entire presentation, embedded below.

    Channel 9: Using R in a regulatory environment: FDA experiences.

    To leave a comment for the author, please follow the link and comment on their blog: Revolutions.


    Source:: R News