The Power of Standards and Consistency

By hrbrmstr


(This article was first published on R – rud.is, and kindly contributed to R-bloggers)

I’m going to (eventually) write a full post on the package I’m mentioning in this one: osqueryr. The TLDR on osqueryr is that it is an R DBI wrapper (that has just enough glue to also be plugged into dbplyr) for osquery. The TLDR on osquery is that it “exposes an operating system as a high-performance relational database. This design allows you to write SQL-based queries efficiently and easily to explore operating systems.”

In short, osquery turns the metadata and state information of your local system (or remote system(s)) into a SQL-compliant database. It also works on Windows, Linux, BSD and macOS. This means you can query a fleet of systems with a (mostly) normalized set of tables and get aggregated results. Operations and information security staff use this to manage systems and perform incident response tasks, but you can use it to get just about anything and there are even more powerful modes of operation for osquery. But, more on all the features of osquery[r] in another post.

If you are skeptical, here’s some proof (which I need to show regardless of your skepticism state). First, a local “connection”:


library(DBI)
library(osqueryr)

con <- dbConnect(Osquery()) # Osquery() is the package's DBI driver constructor

then, a remote “connection”:


con2 <- dbConnect(Osquery(), host = "user@remote-host") # host syntax here is illustrative
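
A quick hedged sketch of poking at such a connection, assuming the standard DBI verbs are implemented by the driver (the table and column names come from osquery's published schema):

dbListTables(con)[1:5] # osquery exposes dozens of virtual tables
dbGetQuery(con, "SELECT name, pid FROM processes LIMIT 3")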

“You’re talking an awful lot about the package when you said this was a post on ‘standards’ and ‘consistency’.”

True, but we needed that bit above for context. To explain what this post has to do with “standards” and “consistency” I also need to tell you a bit more about how both osquery and the osqueryr package are implemented.

You can read about osquery in-depth starting at the link at the top of this post, but the authors of the tool really wanted a consistent idiom for accessing system metadata with usable, normalized output. They chose (to use a word they didn’t but one that works for an R audience) a “data frame” as the output format and picked the universal language of “data frames” — SQL — as the inquiry interface. So, right there are examples of both standards and consistency: using SQL vs coming up with yet-another-query-language and avoiding the chaos of the myriad of outputs from various system commands by making all results conform to a rectangular data structure.

Let’s take this one step further with a specific example. All modern operating systems have the concept of a “process” and said processes have (mostly) similar attributes. However, the commands used to get a detailed listing of those processes differ (sometimes wildly) from OS to OS. The authors of osquery came up with a set of schemas to ensure a common, rectangular output and naming conventions (note that some schemas are unique to a particular OS since some elements of operating systems have no useful counterparts on other operating systems).

The osquery authors also took consistency and standards to yet-another-level by taking advantage of a feature of SQLite called virtual tables. That enables them to write C/C++/Objective-C “glue” that gets called when a query is made, dispatching the intent to the proper functions or shell commands and sending the results back, or letting the SQLite engine do joining, filtering, UDF-calling, etc. to produce rich, targeted rectangular output.
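
To make that concrete, here is a hedged sketch of the kind of query this design supports, with the SQLite engine performing the join across two virtual tables whose rows are generated on demand (table and column names per osquery's schema):

# which process owns each listening socket?
dbGetQuery(con, "
  SELECT p.name, l.address, l.port
    FROM listening_ports AS l
    JOIN processes AS p USING (pid)
   WHERE l.port != 0
")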

By not reinventing the wheel and instead relying on well-accepted features like data frames, SQL and SQLite, the authors could direct all their focus on solving the problem they posited.

“Um, you’re talking a lot about everything but R now.”

We’re getting to the good (i.e. “R”) part now.

Because the authors didn’t try to become expert SQL-parser writers and instead relied on the standard SQL offerings of SQLite, the queries made are “real” SQL (if you’ve worked with more than one database engine, you know how they all implement different flavours of SQL).

Because these queries are “real” SQL, we can write an R DBI driver for it. The DBI package aims “[to define] a common interface between R and database management systems (DBMS). The interface defines a small set of classes and methods similar in spirit to Perl’s DBI, Java’s JDBC, Python’s DB-API, and Microsoft’s ODBC. It defines what operations are possible and how they are performed.”

If you look at the osqueryr package source, you’ll see a bunch of DBI boilerplate code (much of it following the r-dbi organization example code) and only a handful of “touch points” for the actual calls to osqueryi (the command that processes SQL). There is no handling of anything but passing SQL on to the osqueryi engine and getting rectangular results back. By abstracting the system call details, R users can work with a familiar, consistent, standard interface and have full access to the power of osquery without firing up a terminal.
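
To give a flavor of that boilerplate, here is a heavily abridged sketch (OsqueryConnection is the class name you'll see in the dplyr output below; OsqueryDriver and osquery_exec() are illustrative stand-ins, not the actual package source):

library(DBI)
library(methods)

setClass("OsqueryDriver", contains = "DBIDriver")
setClass("OsqueryConnection", contains = "DBIConnection",
         slots = list(host = "character"))

# the "touch point": hand the SQL to osqueryi, get a data frame back
setMethod("dbGetQuery", c("OsqueryConnection", "character"),
  function(conn, statement, ...) {
    osquery_exec(conn, statement) # hypothetical helper wrapping the system call
  }
)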

But it gets even better.

As noted above, one design aspect of osquery was to enable remote usage. Rather than come up with yet-another-daemon-and-custom-protocol, the osquery authors suggest ssh as one way of invoking the command on remote systems and getting the rectangular results back.

Because the osqueryr package uses the sys package for making local system calls, only a tiny bit of extra effort was required to switch from sys::exec_internal() to a sibling call in the ssh package, ssh::ssh_exec_internal(), when remote connections are specified. (Said effort could have been zero if I had chosen a slightly different function in sys, too.)
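
The dispatch idea looks roughly like this — a sketch under stated assumptions, not the package's actual internals (osqueryi really does have a --json flag; run_osquery() is an illustrative helper):

run_osquery <- function(sql, host = NULL) {
  if (is.null(host)) {
    out <- sys::exec_internal("osqueryi", c("--json", sql))$stdout
  } else {
    session <- ssh::ssh_connect(host)
    on.exit(ssh::ssh_disconnect(session))
    out <- ssh::ssh_exec_internal(session,
                                  command = paste("osqueryi --json", shQuote(sql)))$stdout
  }
  jsonlite::fromJSON(rawToChar(out)) # rectangular results either way
}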

Relying on well-accepted standards made both osqueryi and the R DBI driver work seamlessly without much code at all, and definitely without a rat’s nest of nested if/else statements and custom httr functions.

But it gets even more better-er

Some folks like & grok SQL, others don’t. (Humans have preferences, go figure.)

A few years ago, Hadley (do I even need to use his last name at this point in time?) came up with the idea to have a more expressive and consistent way to work with data frames. We now know this as the tidyverse but one core element of the tidyverse is dplyr, which can really level-up your data frame game (no comments about data.table, or the beauty of base R, please). Not too long after the birth of dplyr came the ability to work with remote, rectangular, SQL-based data sources with (mostly) the same idioms.

And, not too long after that, the remote dplyr interface (now, dbplyr) got up close and personal with DBI. Which ultimately means that if you make a near-fully-compliant DBI interface to a SQL back-end you can now do something like this:


library(DBI)
library(dplyr)
library(osqueryr)

con <- dbConnect(Osquery())

osqdb <- src_dbi(con)

# listening_ports supplies address/port; processes supplies the name
left_join(tbl(osqdb, "processes"), tbl(osqdb, "listening_ports"), by = "pid") %>%
  filter(port != "", protocol == "17") %>% # IP protocol 17 == UDP
  distinct(name, port, address, pid)
## # Source:   lazy query [?? x 4]
## # Database: OsqueryConnection
##    address name              pid   port 
##  1 0.0.0.0 BetterTouchTool   46317 57183
##  2 0.0.0.0 Dropbox           1214  17500
##  3 0.0.0.0 SystemUIServer    429   0    
##  4 0.0.0.0 SystemUIServer    429   62240
##  5 0.0.0.0 UserEventAgent    336   0    
##  6 0.0.0.0 WiFiAgent         493   0    
##  7 0.0.0.0 WiFiProxy         725   0    
##  8 0.0.0.0 com.docker.vpnkit 732   0    
##  9 0.0.0.0 identityservicesd 354   0    
## 10 0.0.0.0 loginwindow       111   0    
## # ... with more rows

The src_dbi() call wires up everything for us because d[b]plyr can rely on DBI doing its standard & consistent job, and DBI can rely on the SQLite processing crunchy goodness of osqueryi to ultimately get us a list of really dangerous (if not firewalled off) processes that are listening on all network interfaces. (Note to self: find out why the BetterTouchTool and Dropbox authors feel the need to bind to 0.0.0.0.)

FIN

What did standards and consistency get us?

  • The osquery authors spent time solving a hard problem vs creating new data formats and protocols
  • Rectangular data (i.e. “data frames”) provides consistency and structure, which ends up creating more freedom
  • “Standard” SQL enables a consistent means to work with rectangular data
  • ssh normalizes (secure) access across systems with a consistent protocol
  • A robust, well-defined standard mechanism for working with SQL databases enabled nigh instantaneous wiring up of a whole new back-end to R
  • The common idioms of ssh and sys made working with the new back-end on remote systems as easy as it is on a local system
  • Another robust, well-defined modern mechanism for working with rectangular data got wired up to this new back-end with (pretty much) one line of code because of the defined standard and expectation of consistency (and works for local and remote)

Standards and consistency are pretty darned cool.


RStudio:addins part 3 – View objects, files, functions and more with 1 keypress

By Jozef's Rblog


(This article was first published on Jozef’s Rblog, and kindly contributed to R-bloggers)

Introduction

In this post in the RStudio:addins series we will try to make our work more efficient with an addin for better inspection of objects, functions and files within RStudio. RStudio already has a very useful View function and a Go To Function / File feature with F2 as the default keyboard shortcut. And yes, I know I promised automatic generation of @importFrom roxygen tags in the previous post; unfortunately, we will have to wait a bit longer for that one, but I believe this addin more than makes up for it in usefulness.

The addin we will create in this article will let us use RStudio to View and inspect a wide range of objects, functions and files with 1 keypress.

The addins in action

Retrieving objects from sys.frames

As a first step, we need to be able to retrieve the value of the object we are looking for based on a character string from a frame within the currently present sys.frames() for our session. This may get tricky, as it is not sufficient to only look at parent frames, because we may easily have multiple sets of “parallel” call stacks, especially when executing addins.

An example can be seen in the following screenshot, where we have a browser() call executed during the Addin execution itself. We can see that our current frame is 18 and browsing through its parent would get us to frames 17 -> 16 -> 15 -> 14 -> 0 (0 being the .GlobalEnv). The object we are looking for is however most likely in one of the other frames (9 in this particular case):

Example of sys.frames
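
As a tiny aside, a toy sketch of the mechanics: sys.frames() returns every frame on the call stack, not just the parents of the current one.

f <- function() g()
g <- function() length(sys.frames())
f() # 2: one frame for the call to f(), one for g()
sys.frames() # NULL at top level, since no calls are active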

getFromSysframes <- function(x) {
  if (!(is.character(x) && length(x) == 1 && nchar(x) > 0)) {
    warning("Expecting a non-empty character of length 1. Returning NULL.")
    return(invisible(NULL))
  }
  # a sketch of the search (the original differs in details):
  # try every live frame, then the global environment
  validframes <- c(sys.frames(), list(globalenv()))
  for (env in validframes) {
    res <- tryCatch(get(x, envir = env, inherits = FALSE), error = function(e) NULL)
    if (!is.null(res)) return(res)
  }
  invisible(NULL)
}

Viewing files, objects, functions and more efficiently

As a second step, we write a function to actually view our object in RStudio. We have quite a bit of flexibility here, so as a first attempt we can do the following:

  1. Open a file if the selection (or the selection with quotes added) is a path to an existing file. This is useful for viewing our scripts, data files, etc. even if they are not quoted, such as the links in your Rmd files
  2. Attempt to retrieve the object by the name and if found, try to use View to view it
  3. If we did not find the object, we can optionally still try to retrieve the value by evaluating the provided character string. This carries some pitfalls, but is very useful for example for
    • viewing elements of lists, vectors, etc. where we need to evaluate [, [[ or $ to do so.
    • viewing operation results directly in the viewer, as opposed to writing them out into the console, useful for example for wide matrices that (subjectively) look better in the RStudio viewer, compared to the console output
  4. If the View fails, we can still show useful information by trying to View its structure, enabling us to inspect objects that cannot be coerced to a data.frame and therefore would fail to be viewed.
viewObject <- function(chr) {
  # a sketch of steps 1-4 above; the original implementation differs in details
  if (!(is.character(chr) && length(chr) == 1 && nchar(chr) > 0)) {
    message("Invalid input, expecting a non-empty character of length 1")
    return(invisible(1L))
  }
  # 1. open the selection as a file if it points to one (quotes stripped first)
  fname <- gsub("^[\"']|[\"']$", "", chr)
  if (file.exists(fname) && !dir.exists(fname)) return(rstudioapi::navigateToFile(fname))
  # 2. and 3. look the object up in the live frames, else evaluate the string
  obj <- getFromSysframes(chr)
  if (is.null(obj)) obj <- tryCatch(eval(parse(text = chr)), error = function(e) NULL)
  # 4. View the object, capturing any error output
  ViewWrap <- get("View", envir = as.environment("package:utils"))
  Viewout <- capture.output(tryCatch(ViewWrap(obj, title = chr),
    error = function(e) cat("Error:", conditionMessage(e))))
  if (length(Viewout) > 0 && any(grepl("Error", Viewout))) {
    # could not view, try to at least View the str of the object
    try(ViewWrap(capture.output(str(obj)), title = chr), silent = TRUE)
  }
  invisible(0L)
}

This function can of course be improved and updated in many ways, for example using the summary method instead of str for selected object classes, or showing contents of .csv (or other data) files already read into a data.frame.

The addin function, updating the .dcf file and key binding

If you followed the previous posts in the series, you most likely already know what is coming up next. First, we need a function serving as a binding for the addin that will execute our viewObject function on the active document’s selections:

viewSelection <- function() {
  # a sketch: run viewObject on the text of each selection in the active document
  context <- rstudioapi::getActiveDocumentContext()
  invisible(lapply(context$selection, function(sel) viewObject(sel[["text"]])))
}

Secondly, we update the inst/rstudio/addins.dcf file by adding the binding for the newly created addin:

Name: viewSelection
Description: Tries to use View to View the object defined by a text selected in RStudio
Binding: viewSelection
Interactive: false

Finally, we re-install the package and assign the keyboard shortcut via the Tools -> Addins -> Browse Addins... -> Keyboard Shortcuts... menu. Personally, I assigned the single F4 keystroke to it, as I use it very often:

Assigning a keyboard shortcut to use the Addin

The addin in action

Now, let’s view a few files, a data.frame, a function and a try-error class object, just by pressing F4.

TL;DR – Just give me the package



Tips for great graphics

By aghaynes

(This article was first published on R – Insights of a PhD, and kindly contributed to R-bloggers)

R is a great program for generating top-notch graphics. But to get the best out of it, you need to put in a little more work. Here are a few tips for adapting your R graphics to make them look a little better.

1) Don’t use the “File/Save as…” menu.

If you set up your graphic properly in the first place, there’s no need to post-process it (e.g. crop, scale, etc.) in other software.

Use the graphics devices (jpeg(), tiff(), postscript(), etc.), set the height and width to whatever you want the finished product to be, and then create the graph.

tiff("~/Manuscript/Figs/Fig1.tiff", width =2, height =2, units ="in", res = 600)
plot(dist ~ speed, cars) # cars is a base R dataset - data(cars)
dev.off()

The first argument to a graphics device such as tiff or jpeg is the filename, which can include the path. So that I don’t have to worry about the order of arguments, I include the argument names. Width and height specify the dimensions of the graphic in the units provided, in this case inches, but pixels, cm or mm can also be used. The res argument tells R the resolution of the graphic: the higher the better in terms of quality, but if you go too high you may find you have problems with file sizes. I find 600 a nice compromise: nice crisp lines, smooth curves and sharp letters. You can then import the file straight into MS Word or whatever other word processor you use, or upload it to go with that manuscript you’ve worked so hard on. You could, though, find yourself with BIG files; a 4×6 inch figure I made recently was 17.1 MB! For the web, a resolution of 100 or 200 is probably enough.

This technique also gives you the same output every time, which is not the case if you adjust the size of the default device window produced by plot().

2) Don’t be afraid to change the default settings!

Personally, I find that a 1 inch margin at the base of the graphic is a bit generous. I also find the ticks a bit long and the gap between the ticks and the axis labels a bit big. Luckily, these things are easy to change!

jpeg("t1.jpeg", width=3, height=3, units="in", res=100)
plot(dist~speed, cars) 
dev.off()

The above code produces this figure.

See what I mean about the margins?

Here’s how to change it!

par(mai=c(0.5, 0.5, 0.1, 0.1) )
plot(dist ~ speed, cars, tck = -0.01, las=1, mgp=c(1.4,0.2,0))

That call to par changes the “MArgin in Inches” setting. The tck argument (here passed to plot, which forwards it to par) deals with TiCK length (negative for outside, positive for inside), while mgp controls on which margin line certain things are printed (titles are the first element, axis labels the second and the axis line itself the third). The las argument controls the rotation of the labels (1 for horizontal, 2 for perpendicular and 3 for vertical).

This leads me nicely to number 3: Don’t be afraid to have separate lines for different parts of your plot.

This allows far more freedom and flexibility than setting par arguments within the plot call. You can have different mgp settings for each axis, for instance.

par(mai=c(0.4, 0.5, 0.1, 0.1))
plot(dist ~ speed, cars, xaxt="n", mgp=c(1.4,0.2,0), las=1, tck=-0.01)
axis(side=1, tck = -0.01, las=1, mgp=c(0.5,0.2,0))
mtext("speed", side=1, line= 1)

This plots the same graph, but allows different distances for the x and y axes, in terms of margin and where the title is situated. The axis function places an axis on the side determined by its side argument and mtext places Margin TEXT, again at the side in its argument and in this case on line 1.


WVPlots now at version 1.0.0 on CRAN!

By John Mount


(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

Nina Zumel and I have been working on packaging our favorite graphing techniques in a more reusable way that emphasizes the analysis task at hand over the steps needed to produce a good visualization. We are excited to announce that WVPlots is now at version 1.0.0 on CRAN!

The idea is: we sacrifice some of the flexibility and composability inherent to ggplot2 in R for a menu of prescribed presentation solutions. This is a package to produce plots while you are in the middle of another task.

For example, the plot below, showing both an observed discrete empirical distribution (as stems) and a matching theoretical distribution (as bars), is a built-in “one-liner.”

set.seed(52523)
d <- data.frame(wt=100*rnorm(100))
WVPlots::PlotDistCountNormal(d,'wt','example')

The graph above is actually the product of a number of presentation decisions:

  • Using a discrete histogram approach to summarize data (instead of a kernel density approach) to create a presentation more familiar to business partners.
  • Using a Cleveland-style dot-and-stem plot instead of wide bars to emphasize that the stem heights represent total counts (and to avoid the usual accidental misapprehension that bar areas represent totals).
  • Automatically fitting and rendering the matching (properly count-scaled) normal distribution as thin translucent bars for easy comparison (again to try and de-emphasize area).

All of these decisions are triggered by choosing which plot to use from the WVPlots library. In this case we chose WVPlots::PlotDistCountNormal. For an audience of analysts we might choose an area/density based representation (by instead specifying WVPlots::PlotDistDensityNormal) which is shown below:

WVPlots::PlotDistDensityNormal(d,'wt','example')

Switching the chosen plot simultaneously changes many of the details of the presentation. WVPlots is designed to make this change simple by insisting on a very simple unified calling convention. The plot calls all insist on roughly the following arguments:

  • frame: data frame containing the data to be presented.
  • xvar: name of the x variable column in the data frame.
  • yvar: name of the y variable column in the data frame (not part of the shown density plots!).
  • title: text title for the plot.

This rigid calling interface is easy to remember and makes switching between plot types very easy. We have also made title a required argument, as we feel all plots should be labeled.
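
For example, the two distribution plots shown earlier in this post differ only in the function chosen; the arguments are identical:

# same data, same arguments, different presentation decisions
WVPlots::PlotDistCountNormal(d, 'wt', 'example')
WVPlots::PlotDistDensityNormal(d, 'wt', 'example')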

What we are trying to do is separate the specification of exactly what plot we want from the details of how to produce it. We find this separation of concerns and encapsulation of implementation allows us to routinely use rich annotated graphics. Below are a few more examples:

set.seed(34903490)
x = rnorm(50)
y = 0.5*x^2 + 2*x + rnorm(length(x))
frm = data.frame(x=x,y=y,yC=y>=as.numeric(quantile(y,probs=0.8)))
frm$absY <- abs(frm$y)
frm$posY = frm$y > 0
WVPlots::ScatterHist(frm, "x", "y", smoothmethod="lm", 
                     title="Example Linear Fit")
set.seed(34903490)
y = abs(rnorm(20)) + 0.1
x = abs(y + 0.5*rnorm(20))

frm = data.frame(model=x, value=y)

frm$costs=1
frm$costs[1]=5
frm$rate = with(frm, value/costs)

frm$isValuable = (frm$value >= as.numeric(quantile(frm$value, probs=0.8)))
gainx = 0.10  # get the top 10% most valuable points as sorted by the model

# make a function to calculate the label for the annotated point
labelfun = function(gx, gy) {
  pctx = gx*100
  pcty = gy*100
  
  paste("The top ", pctx, "% most valuable points by the modeln",
        "are ", pcty, "% of total actual value", sep='')
}

WVPlots::GainCurvePlotWithNotation(frm, "model", "value", 
                                   title="Example Gain Curve with annotation", 
                          gainx=gainx,labelfun=labelfun) 
set.seed(52523)
d = data.frame(meas=rnorm(100))
threshold = 1.5
WVPlots::ShadedDensity(d, "meas", threshold, tail="right", 
                       title="Example shaded density plot, right tail")
set.seed(34903490)
frm = data.frame(x=rnorm(50),y=rnorm(50))
frm$z <- frm$x + frm$y
WVPlots::ScatterHistN(frm, "x", "y", "z", title="Example Joint Distribution")
set.seed(34903490)
x = rnorm(50)
y = 0.5*x^2 + 2*x + rnorm(length(x))
frm = data.frame(x=x,yC=y>=as.numeric(quantile(y,probs=0.8)))
WVPlots::ROCPlot(frm, "x", "yC", TRUE, title="Example ROC plot")

We know this collection doesn’t rise to the standard of a complete “grammar of graphics.” But it can become (through accumulation) a re-usable repository of a number of specific graphing tasks done well. It is also a chance to eventually document presentation design decisions (though we haven’t gotten far on that yet). The complete set of graphs is shown in the WVPlots_example vignette.


Reflections on the ROpenSci Unconference

By David Smith


(This article was first published on Revolutions, and kindly contributed to R-bloggers)

I had an amazing time this week participating in the 2018 ROpenSci Unconference, the sixth annual ROpenSci hackathon bringing together people to advance the tools and community for scientific computing with R. It was so inspiring to be among such a talented and dedicated group of people; special kudos go to the organizing committee for curating such a great crowd. (I heard there were over 200 nominations from which the 65 or so attendees were selected.)

The idea behind the unconference is to spend two full days hacking on projects of interest to the community. Before the conference begins, the participants suggest projects as Github issues and begin discussions there. On the first day of the conference (after an icebreaker), the participants vote for projects they’d be interested in working on, and then form up into groups of 2-6 people or so to work on them. And then everyone gets to work! You can get a sense of the activity by looking at the #runconf18 hashtag on Twitter (and especially the photos).

I joined the “Tensorflow Probability for R” team, where we worked mainly with the greta package, which uses Tensorflow Probability to implement a Markov chain Monte Carlo system for finding solutions to complex statistical models. I hadn’t used greta before, so I focused on trying out some simple examples to understand how greta works. In the process I learned that greta is a really powerful package: it solves many of the same problems as stan, but with a really elegant R interface that generalizes beyond Bayesian models. (Over the next couple of weeks I’ll elaborate on the examples for the greta package and write a more detailed blog post.)

At the end of the second day, everyone gets together to “report out” on their projects: a three minute presentation to review the progress from the hackathon. You can browse the list of projects here: follow the links to the Github repositories and check out the Readme.md files for details on each project.

A sincere thank you to all participants in #runconf18

This thread includes links to all project repos: https://t.co/2PhAz4zSuK #rstats pic.twitter.com/8SICcWkQ0v

— rOpenSci (@rOpenSci) May 25, 2018

On a personal note, it was also a great joy that my team could sponsor the unconference and provide the venue for this year’s event. The Microsoft Reactor in Seattle turned out to be a great place to hold a hackathon like this, with plenty of space for everyone to form into small teams and find a comfortable place to work. This kind of event is exactly why we are opening Reactor spaces around the world, as spaces for the community to meet, gather and work, and it was great to see that vision realized in such a great way this week.

Thanks once again to all of the participants and especially the organizers (with a special shout-out to Stefanie Butland) for such a wonderful event. I can’t wait for the next one!


How to plot with patchwork

By Euthymios Kasvikis

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

INTRODUCTION

The goal of patchwork is to make it simple to combine separate ggplots into the same graphic. As such it tries to solve the same problem as gridExtra::grid.arrange() and cowplot::plot_grid but using an API that incites exploration and iteration.

Installation
You can install patchwork from github with:

# install.packages("devtools")
devtools::install_github("thomasp85/patchwork")

The usage of patchwork is simple: just add plots together!

library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars) + geom_point(aes(mpg, disp))
p2 <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))

p1 + p2

you are of course free to also add the plots together as part of the same plotting operation:

ggplot(mtcars) +
geom_point(aes(mpg, disp)) +
ggplot(mtcars) +
geom_boxplot(aes(gear, disp, group = gear))

layouts can be specified by adding a plot_layout() call to the assemble. This lets you define the dimensions of the grid and how much space to allocate to the different rows and columns

p1 + p2 + plot_layout(ncol = 1, heights = c(3, 1))

If you need to add a bit of space between your plots you can use plot_spacer() to fill a cell in the grid with nothing

p1 + plot_spacer() + p2

You can make nested plot layouts by wrapping part of the plots in parentheses or braces; in these cases the layout is scoped to the different nesting levels:

p3 <- ggplot(mtcars) + geom_smooth(aes(disp, qsec)) # any ggplots work; these follow the package README
p4 <- ggplot(mtcars) + geom_bar(aes(carb))

p4 + {
p1 + {
p2 +
p3 +
plot_layout(ncol = 1)
}
} +
plot_layout(ncol = 1)

Advanced features
In addition to adding plots and layouts together, patchwork defines some other operators that might be of interest. The - operator will behave like + but puts the left and right side on the same nesting level (as opposed to putting the right side into the left side’s nesting level). Observe:

p1 + p2 + p3 + plot_layout(ncol = 1)

this is basically the same as without braces (just like standard math arithmetic): the plots are added sequentially to the same nesting level. Now look:

p1 + p2 - p3 + plot_layout(ncol = 1)

Now p1 + p2 and p3 are on the same nesting level.

Often you are interested in just putting plots besides or on top of each other. patchwork provides both | and / for horizontal and vertical layouts respectively. They can of course be combined for a very readable layout syntax:

(p1 | p2 | p3) /
p4

There are two additional operators that are used for a slightly different purpose, namely to reduce code repetition. Consider the case where you want to change the theme for all plots in an assemble. Instead of modifying all plots individually you can use & or * to add elements to all subplots. The two differ in that * will only affect the plots on the current nesting level:

(p1 + (p2 + p3) + p4 + plot_layout(ncol = 1)) * theme_bw()

whereas & will recurse into nested levels:

p1 + (p2 + p3) + p4 + plot_layout(ncol = 1) & theme_bw()

Note that the parentheses are required in the former case, due to the higher precedence of the * operator. The latter case is the most common, so it gets the easier syntax.


Programmatically creating text output in R – Exercises

By sindri

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

In the age of Rmarkdown and Shiny, or when making any custom output from your data, you want your output to look consistent and neat. Often you also want it to follow a specific (decorative) format defined by the html or LaTeX engine. These exercises are an opportunity to refresh our memory of functions such as paste, sprintf and formatC, which are convenient tools to achieve these ends. All of the example solutions rely partly on the ultra-flexible sprintf(), but there are no doubt many ways to solve the exercises with other functions; feel free to share your solutions in the comment section.

Example solutions are available here.
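
As a warm-up, here is a quick refresher on those tools (illustrative only, not the exercise solutions):

sprintf("$%.2f", 14.3409087337707)                # "$14.34", fixed decimal places
sprintf("file_%03d.txt", 25)                      # "file_025.txt", zero padding
formatC(0.921313 * 100, format = "f", digits = 2) # "92.13"
paste0("<h1>", "A header", "</h1>")               # wrap text in an html tag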

Exercise 1

Print out the following vector as prices in dollars (to the nearest cent): c(14.3409087337707, 13.0648270623048, 3.58504267621646, 18.5077076398145, 16.8279241011882). Example: $14.34

Exercise 2

Using these numbers c(25, 7, 90, 16) make a vector of filenames in the following format: file_025.txt. That is, left pad the numbers so they are all three digits.

Exercise 3

Actually, if we are only dealing with numbers less than one hundred, file_25.txt would have been enough. Change the code from the last exercise so that the padding is programmatically decided by the biggest number in the vector.

Exercise 4

Print out the following haiku on three lines, right aligned, with the help of cat. c("Stay the patient course.", "Of little worth is your ire.", "The network is down.").

Exercise 5

Write a function that converts a number to its hexadecimal representation. This is a useful skill when converting bmp colors from one representation to another. Example output:

      tohex(12)
      [1] "12 is c in hexadecimal"

Exercise 6

Take a string and programmatically surround it with the html header tag h1

Exercise 7

Back to the poem from exercise 4: let R convert it to an html unordered list, so that it appears like the following in a browser:

  • Stay the patient course.
  • Of little worth is your ire.
  • The network is down.

Exercise 8

Here is a list of the current top 6 movies on imdb.com in terms of rating: c("The Shawshank Redemption", "The Godfather", "The Godfather: Part II", "The Dark Knight", "12 Angry Men", "Schindler's List"). Convert them into a list compatible with written text.

Example output:

[1] "The top ranked films on imdb.com are The Shawshank Redemption, The Godfather, The Godfather: Part II, The Dark Knight, 12 Angry Men and Schindler's List"

Exercise 9

Now you should be able to solve this quickly: write a function that converts a proportion to a percentage that takes as input the number of decimal places. Input of 0.921313 and 2 decimal places should return "92.13%"

Exercise 10

Improve the function from the last exercise so that the percentage consistently takes up 10 characters by doing some left padding. Raise an error if the percentage already happens to be longer than 10 characters.



2018-05 Selective Raster Graphics

By pmur002

(This article was first published on R – Stat Tech, and kindly contributed to R-bloggers)

This report explores ways to render specific components of an R plot in raster format, when the overall format of the plot is vector. For example, we demonstrate ways to draw raster data symbols within a PDF scatter plot. A general solution is provided by the grid.rasterize function from the R package ‘rasterize’.

Paul Murrell

Download


A Comparative Review of the RKWard GUI for R

By Bob Muenchen

(This article was first published on R – r4stats.com, and kindly contributed to R-bloggers)

Introduction

RKWard is a free and open source Graphical User Interface for the R software, one that supports beginners looking to point-and-click their way through analyses, as well as advanced programmers. You can think of it as a blend of the menus and dialog boxes that R Commander offers combined with the programming support that RStudio provides. RKWard is available on Windows, Mac, and Linux.

This review is one of a series which aims to help non-programmers choose the Graphical User Interface (GUI) that is best for them. However, I do include a cursory overview of how RKWard helps you work with code. In most sections, I’ll begin with a brief description of the topic’s functionality and how GUIs differ in implementing it. Then I’ll cover how RKWard does it.

Figure 1. RKWard’s main control screen containing an open data editor window (big one), an open dialog box (right) and its output window (lower left).

Terminology

There are various definitions of user interface types, so here’s how I’ll be using these terms:

GUI = Graphical User Interface specifically using menus and dialog boxes to avoid having to type programming code. I do not include any assistance for programming in this definition. So GUI users are people who prefer using a GUI to perform their analyses. They often don’t have the time required to become good programmers.

IDE = Integrated Development Environment which helps programmers write code. I do not include point-and-click style menus and dialog boxes when using this term. IDE users are people who prefer to write R code to perform their analyses.

Installation

The various user interfaces available for R differ quite a lot in how they’re installed. Some, such as jamovi or BlueSky Statistics, install in a single step. Others install in multiple steps, such as R Commander and Deducer. Advanced computer users often don’t appreciate how lost beginners can become while attempting even a single-step installation. I work at the University of Tennessee, and our HelpDesk is flooded with such calls at the beginning of each semester!

Installing RKWard on Windows is done in a single step, since its installation file contains both R and RKWard. However, Mac and Linux users have a two-step process: installing R first, then downloading RKWard, which links up to the most recent version of R that it finds. Regardless of their operating system, RKWard users never need to learn how to start R, execute the install.packages function, and then load a library. Installers for all three operating systems are available here.

The RKWard installer obtains the appropriate version of R, simplifying the installation and ensuring complete compatibility. However, if you already had a copy of R installed, depending on its version, you could end up with a second copy.

RKWard minimizes the size of its download by waiting to install some R packages until you actually try to use them for the first time. Then it prompts you, offering default settings that will get the package you need.

On Windows, the installation file is 136 megabytes in size.

Plug-ins

When choosing a GUI, one of the most fundamental questions is: what can it do for you? What the initial software installation of each GUI gets you is covered in the Graphics, Analysis, and Modeling section of this series of articles. Regardless of what comes built-in, it’s good to know how active the development community is. They contribute “plug-ins” which add new menus and dialog boxes to the GUI. This level of activity ranges from very low (RKWard, BlueSky, Deducer) through moderate (jamovi) to very active (R Commander).

Currently all plug-ins are included with the initial installation. You can see them using the menu selection Settings> Configure Packages> Manage RKWard Plugins. There are only brief descriptions of what they do, but once installed, you can access the help files with a single click.

RKWard add-on modules are part of standard R packages and are distributed on CRAN. Their package descriptions include a field labeled “enhances: rkward”. You can sort packages by that field in RKWard’s package installation dialog, where they are displayed with the RKWard icon.

Continued here…


EARL London 2018 – Agenda announced

By Mango Solutions

(This article was first published on Mango Solutions, and kindly contributed to R-bloggers)

The EARL London 2018 agenda is now available. Explore the schedule of keynotes, presentations, and lightning talks that cover a huge range of topics, including the best uses of Shiny, R in healthcare, using R for good, and R in finance. The brilliant lineup of speakers, who represent a wide range of industries, is sure to provide inspiration for R users of all levels.

We have surveyed the Mango team to find out what talks they are most looking forward to:

Lung cancer detection with deep learning in R – David Smith, Microsoft
David will be taking us through an end-to-end example of building a deep learning model to predict lung cancer from image data. Anything that helps to improve healthcare is a fascinating subject.

Using network analysis of colleague relationships to find interesting new investment managers – Robin Penfold, Willis Towers Watson
Using the tidyverse, tidygraph and ggraph, Robin has established a way to save time and money by researching the backgrounds of all institutional investors in one go, as opposed to one at a time.

As right as rain: developing a sales forecasting model for Dyson using R – Dan Erben, Dyson
We love hearing about how organisations have adopted a data-driven approach using R. Dan’s talk will summarise how Dyson developed a statistical model for sales forecasting and the path they have taken to adopt it in the business.

Interpretable Machine Learning with LIME: now and tomorrow – Kasia Kulma, Aviva
Explaining models is a really valuable tool in communicating with the business. Kasia will be using breast cancer data as a specific case scenario.

Text mining in child social care – James Lawrence, The Behavioural Insight Team
James will be sharing how The Behavioural Insight Team have been able to use R’s text mining tools on unstructured data to improve child social services actions, significantly reducing the potential for harm without overburdening an already stretched social service provider.

Experience of using R in the productive environment of Boehringer Ingelheim – Matthias Trampisch, Boehringer Ingelheim
Boehringer Ingelheim have been making headway with R and Matthias will be sharing their journey.

Using R to Drive Revenue for your Online Business – Catherine Gamble, Marks and Spencer
Online business is fast-paced and competitive. Catherine will share how Marks and Spencer have been using R to gain the upper hand and make a difference to their bottom line.

Don’t miss out on early bird tickets

Buy your tickets before 31 July to make the most of the early bird ticket prices: a full conference pass (which includes workshops, access to the conference presentations, and a ticket to the evening event at the Imperial War Museum) is only £800. Buy tickets now.
