News of ggtree

By R on G. Yu

(This article was first published on R on G. Yu, and kindly contributed to R-bloggers)

A new version of ggtree that works with ggplot2 (version >= 2.0.0) is now available.

new layers

Some functions (add_legend, hilight, annotation_clade and annotation_clade2) were removed. Instead we provide the layer functions geom_treescale, geom_hilight and geom_cladelabel. You can use the + operator to add layers using these functions.

In addition, we provide geom_point2, geom_text2 and geom_segment2, which work exactly like geom_point, geom_text and geom_segment except that they allow ggtree users to do subsetting.

rescale tree

Most phylogenetic trees are scaled by evolutionary distance (substitutions/site). In ggtree we can re-scale a phylogenetic tree by any numerical variable inferred by evolutionary analysis, for example using substitution rate to re-scale a time-scaled tree inferred by BEAST, or using dN/dS to re-scale a tree analyzed by CodeML.

New Hampshire eXtended format (NHX) supported

This is a feature request from a ggtree user. The NHX format is also commonly used in phylogenetic software (e.g. PHYLODOG, RevBayes, etc).

The original vignette was too long, so I split it into several short ones. These vignettes should be easier to follow. You can access them via http://guangchuangyu.github.io/ggtree/. More examples and detailed explanations are provided. If there is something that I didn’t explain well, please don’t hesitate to let me know. I will appreciate your help in improving the ggtree vignettes.

To leave a comment for the author, please follow the link and comment on their blog: R on G. Yu.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Source:: R News

Fun with Heatmaps and Plotly

By Riddhiman

(This article was first published on Modern Data » R, and kindly contributed to R-bloggers)

Just because we all like numbers doesn’t mean we can’t have some fun.

Here’s wishing everyone a very Happy New Year!

# install.packages("jpeg") 

library(jpeg)
library(plotly)

# Download a jpeg file from imgur
URL <- "http://i.imgur.com/FWsFq6r.jpg"
file <- tempfile()
download.file(URL, file, mode = "wb")

# Read in JPEG file
j <- readJPEG(file)
j <- j[,,1]

# Create an empty matrix
img.mat <-  mat.or.vec(nrow(j), ncol(j))

# Identify elements where there is data
idx <- j > 0

# Add some glitter like effect
img.mat[idx] <-  sample(x = seq(0,1,by = 0.1), size = sum(idx), replace = TRUE)

# Add some glitter to background
idx <-  j == 0
img.mat[idx] <-  sample(seq(0.7,0.9,0.01), size = sum(idx), replace = TRUE)

# Invert the matrix or else it prints upside down
img.mat[nrow(img.mat):1,] <- img.mat[1:nrow(img.mat),]

# Plot !!!
x.axisSettings <- list(
  title = "Learn from yesterday, live for today, hope for tomorrow. The important thing is not to stop questioning. -Albert Einstein",
  titlefont = list(
    family = 'Arial, sans-serif',
    size = 12,
    color = 'black'
  ),
  zeroline = FALSE,
  showline = FALSE,
  showticklabels = FALSE,
  showgrid = FALSE,
  ticks = ""
)

y.axisSettings <- list(
  title = "",
  zeroline = FALSE,
  showline = FALSE,
  showticklabels = FALSE,
  showgrid = FALSE,
  ticks = ""
)


bordercolor = "#ffa64d"
borderwidth = 20

nCol = ncol(img.mat)
nRow = nrow(img.mat)

plot_ly(z = img.mat, colorscale = "Hot", type = "heatmap", showscale = FALSE, hoverinfo = "none") %>%
  layout(xaxis = x.axisSettings,
         yaxis = y.axisSettings,

         # Add a border
         shapes = list(

           # left border
           list(type = 'rect', fillcolor = bordercolor, line = list(color = bordercolor),
                x0 = 0, x1 = borderwidth,
                y0 = 0, y1 = nRow),

           # Right border
           list(type = 'rect', fillcolor = bordercolor, line = list(color = bordercolor),
                x0 = nCol - borderwidth, x1 = nCol,
                y0 = 0, y1 = nRow),

           # Top border
           list(type = 'rect', fillcolor = bordercolor, line = list(color = bordercolor),
                x0 = 0, x1 = nCol,
                y0 = nRow, y1 = nRow - borderwidth),

           # Bottom border
           list(type = 'rect', fillcolor = bordercolor, line = list(color = bordercolor),
                x0 = 0, x1 = nCol,
                y0 = 0, y1 = borderwidth)))
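
The row-reversal step in the script above is easy to sanity-check on a small matrix. The sketch below is not part of the original post; `m`, `flipped` and `in_place` are illustrative names. It compares the in-place assignment used above with a shorter indexing form, which should give the same result.

```r
# Tiny stand-in for the image matrix (illustrative data, not from the post)
m <- matrix(1:6, nrow = 3)

# Shorter form: index the rows in reverse order
flipped <- m[nrow(m):1, , drop = FALSE]

# Form used in the script: assign the original rows into reversed positions
in_place <- m
in_place[nrow(in_place):1, ] <- in_place[1:nrow(in_place), ]

print(flipped)
print(identical(flipped, in_place))
```

The in-place form works because R evaluates the right-hand side in full before assigning, so no row is overwritten before it is read.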


An R function return and assignment puzzle

By John Mount

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

Here is an R programming puzzle. What does the following code snippet actually do? And even harder: what does it mean? (See here for some material on the difference between what code does and what code means.)

f <- function() { x <- 5 }
f()

In R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree" the code appears to call the function f() and return nothing (nothing is printed). When teaching I often state that you should explicitly use a non-assignment expression as your return value. You should write code such as the following:

f <- function() { x <- 5; x }
f()
## [1] 5

(We are showing an R output as being prefixed with ##.)

But take a look at this:

f <- function() { x <- 5 }
print(f())
## [1] 5

It prints! Read further for what is really going on.

What is going on is: in R in the absence of an explicit return() statement functions always return the value of the last statement executed. Also in R assignment is itself a value returning expression (returning the value assigned). So the original function f <- function() { x <- 5 } is in fact returning a 5. We just don’t see it. The 5 returned is “invisible” (see the “return values” section of Advanced R, Hadley Wickham, CRC 2015 for details).

As we said: R assignments return values. So you can return them and you can chain them like so:

a <- b <- c <- 5
print(a)
## [1] 5

What happens is the assignment x <- 5 returns a value (in this case 5), but that value has an attribute marking it invisible. This is why when you assign a value to a variable in R you don’t see printing as a side effect. For example we don’t see anything printed when we type the following:

x <- 5

We can remove the invisible attribute by adding parentheses as follows:

( x <- 5 )
## [1] 5

Assignment also strips the invisible attribute, so we can write code like the following:

f <- function() { x <- 5 }
z <- f()
z
## [1] 5

(We can think of the expression z <- f() as removing the invisible attribute from the 5 stored in the variable z and then returning a new value 5 that is again invisible. So we don’t see any printing during the assignment, but the value stored in z is now visible. Likely all of the visibility notes are stored in a reference handle and not actually in the values to allow efficient re-use of values.)
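
One way to check the visibility behaviour described above, rather than inferring it from what does or does not print, is base R's `withVisible()`, which evaluates an expression and returns both its value and a visibility flag. This is a small sketch added for illustration; `res`, `res_paren` and `g` are names introduced here, not from the original post.

```r
f <- function() { x <- 5 }

# withVisible() returns a list with the value and whether it would auto-print
res <- withVisible(f())
print(res$value)     # the 5 is returned even though f() prints nothing
print(res$visible)

# Wrapping the call in parentheses makes the result visible
res_paren <- withVisible((f()))
print(res_paren$visible)

# A function ending in a non-assignment expression returns a visible value
g <- function() { x <- 5; x }
print(withVisible(g())$visible)
```

Note that `withVisible()` reports the flag alongside the value rather than on it, which fits the suspicion above that visibility is not stored in the value itself.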

This is subtle and strange, and one of the reasons it can be hard to first approach R. R has fairly subtle semantics, but that is part of why it is so safe to program in and so powerful to use.

In language design I tend to prefer more transparency (the user reliably seeing something more directly related to what is going on- something vitally important for learning and debugging) and would have opted for assignment not returning a value (another way to suppress needless printing).


Creating multi-tab reports with R and jQuery UI

By David Smith

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

by Matt Parker, Data Scientist at Microsoft

One of the great advantages of R’s openness is its extensibility. R’s abundant
packages are the most conspicuous example of that extensibility, and
Revolution R Enterprise is a powerful example of how far it can stretch.

But R is also part of an entire ecosystem of open tools that can be linked
together. For example,
Markdown,
Pandoc, and
knitr
combine to make R an incredible tool for dynamic reporting and reproducible
research. If your chosen output format is HTML, you’ve linked into yet another
open ecosystem with countless further extensions.

One of those extensions – and the focus of this post – is
jQuery UI.
jQuery UI makes a set of JavaScript’s most useful moves available to developers
as a robust, easy-to-implement toolkit ideal for adding a bit of interactivity
to your knitr reports.

Tabs

For example: it’s easy to use
jQuery UI’s Tabs widget
to split a long report across several tabs of a webpage. Tabs are great for
splitting complex reports up by topic, or for providing different types of
users with customized views of the results.

To get a sense of what this conversion might look like, compare
a simple R-Markdown report without tabs (Rmd source)
with the same report with tabs (source).

Here’s how I added tabs to the report.

1) First, I downloaded jQuery UI. Picking the right place to store the library
can be tricky, but as long as the jQuery UI files are accessible to knitr when
it’s building the report, you’ll be okay. For this demo report, I just unzipped
the files right next to the .rmd source.

2) Next, I added a few lines to the <head> element of the report. Every
webpage has a <head> element. knitr would typically build this for you, but
in this case we need to write it manually to be sure that the jQuery UI scripts
and CSS are linked in the HTML output.

<head>
  <meta charset="utf-8">
  <title>Reported Active Tuberculosis Cases in the United States, 1993-2013</title>
  <link rel="stylesheet" href="jquery-ui/jquery-ui.min.css">
  <script src="jquery-ui/external/jquery/jquery.js"></script>
  <script src="jquery-ui/jquery-ui.js"></script>
  <script>
  $(function() {
    $( "#tabs" ).tabs();
  });
  </script>
</head>

3) Next, I created the navigation bar by creating an HTML chunk (div) with a
list inside of it (ul). Each item in that list (li) represents one tab
that I’d like the page to have. Finally, I made each of those list items a link
with a unique tag (for example, <a href="#nation">),
and gave the link a title (Nationally, By State, Treatment Completion).

<div id="tabs">
<ul>
<li><a href="#nation">Nationally</a></li>
<li><a href="#states">By State</a></li>
<li><a href="#treatment">Treatment Completion</a></li>
</ul>

Don’t worry if you don’t understand the HTML syntax here – you can just copy
and edit the code above.
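
If the list of tabs is long, or comes from data, the same navigation markup could also be generated from R. The following is a minimal sketch, assuming a hypothetical helper `make_tab_nav()` that is not part of knitr or jQuery UI; it only pastes together the HTML shown above from a named character vector (names are the ids, values are the tab titles).

```r
# Hypothetical helper (not from the original post): build the tab navigation
# markup from a named character vector of id = "Tab title" pairs.
make_tab_nav <- function(tabs) {
  items <- sprintf('<li><a href="#%s">%s</a></li>', names(tabs), tabs)
  # The <div id="tabs"> is deliberately left open, as in the snippet above
  paste(c('<div id="tabs">', "<ul>", items, "</ul>"), collapse = "\n")
}

nav <- make_tab_nav(c(nation    = "Nationally",
                      states    = "By State",
                      treatment = "Treatment Completion"))
cat(nav, "\n")
```

In an R-Markdown document, a chunk with the `results='asis'` option could emit this string with `cat()` so that it passes through to the HTML output unchanged.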

4) Finally, I marked out which sections of R-Markdown I wanted to put on each
tab by surrounding that section with a div:

<div id="nation">

## Reported Active TB Cases in the United States, 1993-2013

```{r nation}

tbstats %>% 
    group_by(Year) %>% 
    summarise(n_cases = sum(Count)) %>% 
    ggplot(aes(x = Year, y = n_cases)) +
        geom_line(size = 2) +
        labs(x = "Year Reported",
             y = "Number of Cases",
             title = "Reported Active TB Cases in the United States") +
        expand_limits(y = 0)


```

</div>

There are two crucial details here:
– the div has an id that corresponds to one of the tabs I’ve created
(href="#nation" corresponds to <div id="nation">)
– the div is closed with a </div> tag. Without this, the entire report
would be included on the first tab.

5) Click the “Knit HTML” button! knitr will convert your R-Markdown into plain
Markdown, and then call Pandoc to complete the conversion into gloriously-tabbed
HTML.

Tabs are very handy for reporting – but the whole HTML/CSS/JavaScript ecosystem
is at your disposal. If you’ve seen other good reporting tricks in HTML, let us
know in the comments below.


“Mapmaking in R with Choroplethr” is now out!

By Ari Lamstein

(This article was first published on R – AriLamstein.com, and kindly contributed to R-bloggers)

I am happy to announce that my new course, Mapmaking in R with Choroplethr, is now available. In honor of its launch I am offering a 15% discount to anyone who purchases before January 1. You will also be able to attend a Q&A webinar with me on January 5th.

Creating this course has been a full time job over the last few weeks. The course has 25 lessons and is in the format of screencasts with downloadable code. The lessons are organized into 5 sections:

  1. Introduction. This section makes sure that you have the software necessary to complete the rest of the course, and shows you how to make a simple map.
  2. Exploratory Data Analysis. This section explains basic tools that R provides for exploratory data analysis, and how choroplethr complements them when analyzing spatial data.
  3. United States Maps goes into detail about the 4 US maps (states, counties, ZIPs and tracts) that choroplethr ships with. It explains where they come from, how they are represented in R, and how to use them on your own projects. Many examples are provided.
  4. International Maps explains the 216 international maps that choroplethr ships with. This includes state / province (Administrative Level 1) maps of 215 countries as well as a world map of 171 countries. This section explains where the maps come from and how they are represented in R. It also explains how to convert your data into the format necessary for it to work with choroplethr. Several examples are provided.
  5. Conclusion. This is my favorite section. I walk you through dozens of datasets that I have either already analyzed in R (e.g. US Census, US Unemployment, World Bank) or am aware of and would like to map in the future. I provide links so that you can start analyzing and mapping them immediately. I also explain several ways to get help with your project.

Click here to learn more.

Remember, these bonuses are only available for a limited time. To get the discount and the webinar you need to purchase before January 1.

The post “Mapmaking in R with Choroplethr” is now out! appeared first on AriLamstein.com.


R recommended usage for professional developers

By Derek Jones

(This article was first published on The Shape of Code » R, and kindly contributed to R-bloggers)

R is not one of those languages where there is only one way of doing something; the language is blessed/cursed with lots of ways of doing the same thing.

Teaching R to professional developers is easy in the sense that their fluency with other languages will enable them to soak up this small language like a sponge, on the day they learn it. The problems will start a few days after they have been programming in another language and go back to using R; what they learned about R will have become entangled in their general language knowledge and they will be reduced to trial and error, to figure out how things work in R (a common problem I often have with languages I have not used in a while, is remembering whether the if-statement has a then keyword or not).

My Empirical software engineering book uses R and is aimed at professional developers; I have been trying to create a subset of R specifically for professional developers. The aims of this subset are:

  • behave like other languages the developer is likely to know,
  • not require knowing which way round the convention is in R, e.g., are 2-D arrays indexed in row-column or column-row order,
  • reduce the likelihood that developers will play with the language (there is a subset of developers who enjoy exploring the nooks and crannies of a language, creating completely unmaintainable code in the process).

I am running a workshop based on the book in a few weeks and plan to teach them R in 20 minutes (the library will take somewhat longer).

Here are some of the constructs in my subset:

  • Use subset to extract rows meeting some condition. Indexing requires remembering to do it in row-column order and weird things happen when commas accidentally get omitted.
  • Always call read.csv with the argument as.is=TRUE. Computers now have lots of memory and this factor nonsense needs to be banished to history.
  • Try not to use for loops. These will probably contain array/data.frame indexing, which provides ample opportunities for making mistakes; use the *apply or *ply functions instead (which have the added advantage of causing code to die quickly and horribly when a mistake is made, making it easier to track down problems).
  • Use head to remove the last N elements from an object, e.g., head(x, -1) returns x with the last element removed. Indexing with the length minus one is a disaster waiting to happen.
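
The constructs in the list above can be sketched in a few lines of base R. The data and variable names below (`scores`, `people`, `one`) are made up for illustration and are not from the original post.

```r
# Illustrative data (not from the original post)
scores <- data.frame(id = 1:4, score = c(10, 25, 7, 31))

# subset() extracts rows meeting a condition without [row, column] indexing
high <- subset(scores, score > 9)
print(high)

# as.is = TRUE keeps text columns as character vectors rather than factors
people <- read.csv(text = "name,age\nann,31\nbob,27", as.is = TRUE)
print(class(people$name))

# sapply() instead of a for loop with manual indexing
squares <- sapply(scores$score, function(s) s^2)
print(squares)

# head(x, -1) drops the last element, even for a length-one vector
one <- 42
print(head(one, -1))              # numeric(0)
print(one[1:(length(one) - 1)])   # 1:0 is c(1, 0), so this returns 42
```

The last two lines show the kind of disaster the final bullet warns about: for a one-element vector the length-minus-one index silently returns the element instead of an empty vector.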

It’s a shame that R does not have any mechanism for declaring variables. Experience with other languages has shown that requiring variables to be declared before use catches lots of coding errors (this could be an optional feature so that those who want their ‘freedom’ can have it).

We now know that support for case-sensitive identifiers is a language design flaw, but many in my audience will not have used a language that behaves like this and I have no idea how to help them out.

There are languages in common use whose array bounds start at one. I will introduce R as a member of this club. Not much I can do to help out here, except the general suggestion not to do array indexing.

Suggestions based on readers’ experiences welcome.


R / Shiny Poll Results

By C

(This article was first published on R-Chart, and kindly contributed to R-bloggers)

A few days ago I posted a poll directed towards R (and Shiny) users. Thank you to all who participated for your time and thoughtful responses. An RMarkdown report (code on GitHub) highlights the results of the easy-to-summarize questions. A few interesting insights:

  • 78% of R users use Windows. The largest group of Windows users uses Windows alone, followed by Windows and Linux, Windows and Mac, and all three platforms.
  • Shiny’s appeal is evident among those polled as 1/3 report no experience with web technologies. There is a tech heavy presence however since the remaining 2/3 are conversant with web technologies – almost 20% of those polled could be described as “full-stack developers”.
  • Most of the response to the poll occurred over about 3 days… though responses continued to be taken for several more days.

There are a ton of fascinating insights in the free-form responses. The R community remains a diverse, difficult to categorize group of individuals that share a common appreciation for R. Shiny has generated a lot of excitement. New developments for the platform largely seem to line up with the interests of the community who want to see easier development, more interactivity and additional options for deployment.

I had hoped to do additional analysis (and still might) but figured folks would be interested in the results. I’d be interested to see what other insights folks might derive from the Raw Data in the Google Spreadsheet.


BH 1.60.0-1

By Thinking inside the box

(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

A new release of BH is now on CRAN. BH provides a large part of the Boost C++ libraries as a set of template headers for use by R, possibly with Rcpp as well as other packages.

This release both upgrades the version of Boost to the recently-released upstream version Boost 1.60.0 and also adds Boost Phoenix.

A brief summary of changes from the NEWS file is below.

Changes in version 1.60.0-1 (2015-12-24)

  • Upgraded to Boost 1.60 installed directly from upstream source

  • Added Boost Phoenix as discussed in GH ticket #19

Courtesy of CRANberries, there is also a diffstat report for the most recent release.

Comments and suggestions are welcome via the mailing list or the issue tracker at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.


World Map Panel Plots with ggplot2 2.0 & ggalt

By hrbrmstr

(This article was first published on R – rud.is, and kindly contributed to R-bloggers)

James Austin (@awhstin) made some #spiffy 4-panel maps with base R graphics but also posited he didn’t use ggplot2 because:

ggplot2 and maps currently do not support world maps at this point, which does not give us a great overall view.

That is certainly a box I would not put ggplot2 into, especially with the newly updated R maps (et al) packages, ggplot2 2.0 and my (still in development) ggalt package (though this was all possible before ggplot2 2.0 and ggalt). NOTE: I have no idea why I get so defensive about ggplot2 besides the fact that it’s one of the best visualization tools ever created.

Here’s all you need to use the built-in facet options of ggplot2 to make the 4-panel plot (as James points out, you can get the data file, CLIWOC15.csv, from here: http://www.austinwehrwein.com/wp-content/uploads/2015/12/CLIWOC15.csv):

library(ggplot2)  # FYI you need v2.0
library(dplyr)    # yes, i could have not done this and just used 'subset' instead of 'filter'
library(ggalt)    # devtools::install_github("hrbrmstr/ggalt")
library(ggthemes) # theme_map and tableau colors
 
world <- map_data("world")
world <- world[world$region != "Antarctica",] # intercourse antarctica
 
dat <- read.csv("CLIWOC15.csv")        # having factors here by default isn't a bad thing
dat <- filter(dat, Nation != "Sweden") # I kinda feel bad for Sweden but 4 panels look better than 5 and it doesn't have much data
 
gg <- ggplot()
gg <- gg + geom_map(data=world, map=world,
                    aes(x=long, y=lat, map_id=region),
                    color="white", fill="#7f7f7f", size=0.05, alpha=1/4)
gg <- gg + geom_point(data=dat, 
                      aes(x=Lon3, y=Lat3, color=Nation), 
                      size=0.15, alpha=1/100)
gg <- gg + scale_color_tableau()
gg <- gg + coord_proj("+proj=wintri")
gg <- gg + facet_wrap(~Nation)
gg <- gg + theme_map()
gg <- gg + theme(strip.background=element_blank())
gg <- gg + theme(legend.position="none")
gg

You can use a separate shapefile if you want, but this is quite minimalist (a feature James suggests is desirable) and emphasizes the routes quite nicely IMO.


ggplot2 version 2 adds extensibility and other improvements

By David Smith

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Despite the ggplot2 project — the most popular data visualization package for R — being in maintenance mode, RStudio’s Hadley Wickham has given the R community a surprise gift with a version 2.0.0 update for ggplot2. According to Hadley this is a “huge” update with more than 100 fixes and improvements.

The most significant addition is that ggplot2 now has a formal extension mechanism, which means that package authors can now create their own geoms (plot types), statistics (data aggregation/transformation methods) and themes. There are also a number of smaller improvements, including making it easy to draw curved lines between points with geom_curve, a way to suppress overlapping text labels, and a way to add labels with rounded enclosing boxes to plots. There are also a few minor changes to the appearance of plots, with some changes to colors and text sizes to improve readability and (my personal favorite) the elimination of the diagonal line that used to appear in the color boxes in legends.
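
To make the smaller additions concrete, here is a minimal sketch using `geom_curve`, `geom_label` (labels with enclosing boxes) and the `check_overlap` argument of `geom_text` (suppressing overlapping labels). The data frames `pts` and `arc` are made up for illustration, and the example assumes ggplot2 2.0.0 or later is installed.

```r
library(ggplot2)  # assumes version >= 2.0.0

# Illustrative data (not from the original post)
pts <- data.frame(x = c(1, 2, 3), y = c(1, 3, 2), name = c("a", "b", "c"))
arc <- data.frame(x = 1, y = 1, xend = 3, yend = 2)

# Points, one curved line between two of them, and boxed labels
p <- ggplot(pts, aes(x, y)) +
  geom_point() +
  geom_curve(data = arc,
             aes(x = x, y = y, xend = xend, yend = yend),
             curvature = 0.3) +
  geom_label(aes(label = name), nudge_y = 0.2)

# Plain text labels, dropping any label that would overlap an earlier one
q <- ggplot(pts, aes(x, y)) +
  geom_text(aes(label = name), check_overlap = TRUE)

# Building the plots checks that the layers are valid without opening a device
built_p <- ggplot_build(p)
built_q <- ggplot_build(q)
```

Printing `p` or `q` at the console would draw the plots.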

If you’re new to ggplot2, getting to grips with the Grammar of Graphics system can be a steep learning curve, but it’s well worth it in terms of long-term productivity in creating beautiful graphics with R. Hadley Wickham’s book, Elegant Graphics for Data Analysis, is a great place to get started. Experienced ggplot2 users will also appreciate the freshly-updated ggplot2 cheatsheet from the RStudio team.

This new update is mostly compatible with older versions of ggplot2, but it may require updates to existing code in some cases. (If you want to try the new ggplot2 but still have access to old versions for compatibility, take a look at the checkpoint package.) The updated ggplot2 package is now available for download via CRAN with install.packages("ggplot2"), or via the ggplot2 GitHub repository.

RStudio Blog: ggplot2 2.0.0
