Why R? 2018 Conference – Registration and Call for Papers Opened

By Marcin Kosiński

(This article was first published on http://r-addict.com, and kindly contributed to R-bloggers)

The first edition of the Polish R Users Conference, Why R?, took place on 27–29 September 2017
at the Warsaw University of Technology – Faculty of Mathematics and Information Science. The event was so successful that we’ve decided to launch a second edition of the conference.

About the Why R? 2018 conference

We are pleased to announce that the Why R? 2018 conference will be organized by STWUR (Wroclaw R Users Group). The second official meeting of Polish R enthusiasts will be held in Wroclaw on 2–5 July 2018. As the meeting is held in English, we are happy to invite R users from other countries.

The main topic of this conference is very strongly based around the mlr R package for machine learning (over 4,500 downloads per month). The creator of the package, Bernd Bischl, will be an invited speaker at Why R? 2018, and more people involved in the project will conduct workshops and give specialized talks on the mlr ecosystem. With that strong focus on machine learning, we hope to gather a broader audience, including people for whom R is a side interest but who are keen on learning more about data science.

Important dates

Registration

  • 09.03.2018: EARLY BIRD REGISTRATION OPENS
  • 06.05.2018: EARLY BIRD REGISTRATION ENDS
  • EARLY BIRD FEE: 450PLN/100EUR
  • STUDENT FEE: 200PLN/50EUR
  • REGULAR FEE: 650PLN/150EUR

Calls

  • 09.03.2018: ALL CALLS OPEN
  • 30.04.2018: WORKSHOP CALL CLOSES
  • 25.05.2018: PRESENTATION CALL CLOSES
  • 01.06.2018: LIGHTNING TALKS CALL CLOSES

Abstract submissions are format-free, but please do not exceed 400 words and clearly state the chosen call. The abstract submission form is available here or during registration.

Keynotes

Alongside the multiple workshops we are planning to host, the following keynote speakers have confirmed their talks at Why R? 2018: Bernd Bischl (Ludwig-Maximilians-University of Munich), Tomasz Niedzielski (University of Wroclaw), Thomas Petzoldt (Dresden University of Technology), Maciej Eder (Pedagogical University of Cracow) and Leon Eyrich Jessen (Technical University of Denmark).

Programme

The following events will be hosted during the Why R? 2018 conference:

  • plenary lectures of invited speakers,
  • lightning talks,
  • poster session,
  • community session,
  • presentation of different R enthusiasts’ groups,
  • Why R? paRty,
  • session of sponsors,
  • workshops – blocks of several small to medium-sized courses (up to 50 people) for R users at different levels of proficiency.

Pre-meetings

We are organizing pre-meetings in many European cities to cultivate the R experience of knowledge sharing. You are more than welcome to visit an upcoming event and to check photos and presentations from previous ones. A few more meetings are being organized that are not yet added to the map. If you are interested in co-organizing a Why R? pre-meeting in your city, let us know (at kontakt_at_whyr.pl) and the Why R? Foundation can provide speakers for the venue!

Past event

The Why R? 2017 edition, organized in Warsaw, gathered 200 participants. The Facebook reach of the conference page exceeds 15,000 users, with almost 800 subscribers. Our official web page had over 8,000 unique visitors and over 12,000 visits in total. To learn more about Why R? 2017, see the conference aftermovie (https://vimeo.com/239259242).

To leave a comment for the author, please follow the link and comment on their blog: http://r-addict.com.


Source:: R News

Amsterdam in an R leaflet nutshell

By Longhow Lam

[Animated GIF: Amsterdam panorama images]

(This article was first published on R – Longhow Lam’s Blog, and kindly contributed to R-bloggers)

The municipal services of Amsterdam (the Netherlands) provide open panorama images (see here and here). A camera car has driven around the city, and you can now download these images.

For each neighborhood of Amsterdam, I randomly sampled 20 images, put them into an animated GIF using the R magick package, and then put the GIF on an interactive leaflet map.
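A minimal sketch of that workflow, assuming a vector panorama_urls of image URLs for one neighborhood (the object name and the coordinates are hypothetical):

library(magick)
library(leaflet)

# sample 20 panorama images for one neighborhood
imgs <- sample(panorama_urls, 20)

# read the images and combine them into an animated GIF
gif <- image_animate(image_join(lapply(imgs, image_read)), fps = 1)
image_write(gif, "neighborhood.gif")

# place the neighborhood on an interactive leaflet map,
# embedding the GIF in a marker popup
leaflet() %>%
  addTiles() %>%
  addMarkers(lng = 4.9041, lat = 52.3676,
             popup = "<img src='neighborhood.gif' width='300'/>")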

Before you book your tickets to Amsterdam, have a quick look here on the leaflet first 🙂

To leave a comment for the author, please follow the link and comment on their blog: R – Longhow Lam’s Blog.


Source:: R News

Big changes behind the scenes in R 3.5.0

By David Smith

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

A major update to R is now available. The R Core group has announced the release of R 3.5.0, and binary versions for Windows and Linux are now available from the primary CRAN mirror. (The Mac release is forthcoming.)

Probably the biggest change in R 3.5.0 will be invisible to most users — except through the performance improvements it brings. The ALTREP project has now been rolled into R to use more efficient representations of many vectors, resulting in less memory usage and faster computations in many common situations. For example, the sequence vector 1:1000000 is now represented just by its start and end value, instead of allocating a vector of a million elements as earlier versions of R would do. So while a long sequence assignment like x <- 1:1000000000 takes R 3.4.3 about 1.5 seconds on my laptop, it's instantaneous in R 3.5.0.
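One way to see this in action is R's internal inspect() debugging utility (unexported and subject to change, so the exact output may vary by build):

x <- 1:1000000
.Internal(inspect(x))
## @7f8... 13 INTSXP g0c0 [NAM(3)]  1 : 1000000 (compact)
## the "compact" tag shows that only the start and end values are stored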

There have been improvements in other areas too, thanks to ALTREP. The output of the sort function has a new representation: it includes a flag indicating that the vector is already sorted, so that sorting it again is instantaneous. As a result, running x <- sort(x) is now free the second and subsequent times you run it, unlike in earlier versions of R. This may seem like a contrived example, but operations like this happen all the time in the internals of R code. Another good example is converting a numeric to a character vector: as.character(x) is now also instantaneous (the coercion to character is deferred until the character representation is actually needed). This has a significant impact on R's statistical modelling functions, which carry around with the design matrix a long character vector — the row names — that usually contains just numbers. As a result, a calculation along these lines:

d <- lm(y ~ x, data = data.frame(x = 1:1000000, y = rnorm(1000000)))

runs about 4x faster on my system. (It also uses a lot less memory: running the equivalent command with 10x more rows failed for me in R 3.4.3 but succeeded in 3.5.0.)

The ALTREP system is designed to be extensible, but in R 3.5.0 it is used exclusively for the internal operations of R. Nonetheless, if you'd like a sneak peek at how you might be able to use ALTREP yourself in future versions of R, you can take a look at this vignette (with the caveat that the interface may change when it's finally released).

There are many other improvements in R 3.5.0 beyond the ALTREP system, too. You can find the full details in the announcement, but here are a few highlights:

  • All packages are now byte-compiled on installation. R’s base and recommended packages, and packages on CRAN, were already byte-compiled, so this will have the effect of improving the performance of packages installed from GitHub and from private sources.
  • R’s performance is better when many packages are loaded, and more packages can be loaded at the same time on Windows (when packages use compiled code).
  • Improved support for long vectors, by functions including object.size, approx and spline.
  • Reading in text data with readLines and scan should be faster, thanks to buffering on text connections.
  • R should handle some international data files better, with several bugs related to character encodings having been resolved.

Because R 3.5.0 is a major release, you will need to reinstall any R packages you use. (The installr package can help with this.) On my reading of the release notes, there haven’t been any major backward-incompatible changes, so your old scripts should continue to work. Nonetheless, given the significant changes behind the scenes, it might be best to wait for a maintenance release before using R 3.5.0 for production applications. But for development and data science work, I recommend jumping over to R 3.5.0 right away, as the benefits are significant.

You can find the details of what’s new in R 3.5.0 at the link below. As always, many thanks go to the R Core team and the other volunteers who have contributed to the open source R project over the years.

R-announce mailing list: R 3.5.0 is released

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.


Source:: R News

The current state of the Stan ecosystem in R

By Jonah

(This article was first published on R – Statistical Modeling, Causal Inference, and Social Science, and kindly contributed to R-bloggers)

This post is by Jonah.

Last week I posted here about the release of version 2.0.0 of the loo R package, but there have been a few other recent releases and updates worth mentioning. At the end of the post I also include some general thoughts on R package development with Stan and the growing number of Stan users who are releasing their own packages interfacing with rstan or one of our other packages.

Interfaces

rstanarm and brms: Version 2.17.4 of rstanarm and version 2.2.0 of brms were both released to provide compatibility with the new features in loo v2.0.0. Two of the new vignettes for the loo package show how to use it with rstanarm models, and we have also just released a draft of a vignette on how to use loo with brms and rstan for many “non-factorizable” models (i.e., observations not conditionally independent). brms is also now officially supported by the Stan Development Team (welcome Paul!) and there is a new category for it on the Stan Forums.

rstan: The next release of the rstan package (v2.18) is not out yet (we need to get Stan 2.18 out first), but it will include a loo() method for stanfit objects in order to save users a bit of work. Unfortunately, we can’t save you the trouble of having to compute the pointwise log-likelihood in your Stan program, though! There will also be some new functions that make it a bit easier to extract HMC/NUTS diagnostics.
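In the meantime, the workflow is only a couple of lines with loo itself. A sketch, assuming a fitted stanfit object whose Stan program defines a generated quantity named log_lik:

library(rstan)
library(loo)

# extract the pointwise log-likelihood, keeping chains separate
ll <- extract_log_lik(stanfit, parameter_name = "log_lik", merge_chains = FALSE)

# relative effective sample sizes, then PSIS-LOO
r_eff <- relative_eff(exp(ll))
loo(ll, r_eff = r_eff)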

Visualization

bayesplot: A few weeks ago we released version 1.5.0 of the bayesplot package (mc-stan.org/bayesplot), which also integrates nicely with loo 2.0.0. In particular, the diagnostic plots using the leave-one-out cross-validated probability integral transform (LOO-PIT) from our paper Visualization in Bayesian Workflow (preprint on arXiv, code on GitHub) are easier to make with the latest bayesplot release. Also, TJ Mahr continues to improve the bayesplot experience for ggplot2 users by adding (among other things) more functions that return the data used for plotting in a tidy data frame.
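For instance, a LOO-PIT overlay takes roughly this shape (a sketch assuming the data y, the posterior predictive draws yrep, and the pointwise log-likelihood ll are already computed):

library(bayesplot)
library(loo)

# Pareto smoothed importance sampling weights from the pointwise log-likelihood
psis_obj <- psis(-ll)
lw <- weights(psis_obj)  # smoothed log-weights

# the LOO-PITs should look approximately uniform for a well-calibrated model
ppc_loo_pit_overlay(y = y, yrep = yrep, lw = lw)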

shinystan: Unfortunately, there hasn’t been a shinystan release in a while because I’ve been busy with all of these other packages, papers, and various other Stan-related things. We’ll try to get out a release with a few bug fixes soon. (If you’re annoyed by the lack of new features in shinystan recently let me know and I will try to convince you to help me solve that problem!)

Other tools

projpred: Version 0.8.0 of the projpred package (I still need to make its website) for projection predictive variable selection for GLMs was also released shortly after the loo update, in order to take advantage of the improvements to the Pareto smoothed importance sampling algorithm. projpred can already be used quite easily with rstanarm models, and we are working on improving its compatibility with other packages for fitting Stan models.

rstantools: Unrelated to the loo update, we also released version 1.5.0 of the rstantools package (mc-stan.org/rstantools), which provides functions for setting up R packages interfacing with Stan. The major changes in this release are that usethis::create_package() is now called to set up the package (instead of utils::package.skeleton), fewer manual changes to files are required by users after calling rstan_package_skeleton(), and we have a new vignette walking through the process of setting up a package (thanks Stefan Siegert!). Work is being done to keep improving this process, so be on the lookout for more updates soonish.
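Getting started is a single call. A sketch (the package name is hypothetical, and argument details may differ across rstantools versions):

library(rstantools)

# create the skeleton of an R package interfacing with Stan
rstan_package_skeleton(path = "mystanpkg")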

Stan related R packages from other developers

There are now well over fifty packages on CRAN that depend in some way on one of our R packages mentioned above! You can find most of them by looking at the “Reverse dependencies” section on the CRAN page for rstan, but that doesn’t count the ones that depend on bayesplot, shinystan, loo, etc., but not rstan.

Unfortunately, given the growing number of these packages, we haven’t been able to look at each one of them in detail. For obvious reasons we prioritize giving feedback to developers who reach out to us directly to ask for comments and to those who make an effort to follow our recommendations for developers of R packages interfacing with Stan (included with the rstantools package since its initial release in 2016). If you are developing one of these packages and would like feedback, please let us know on the Stan Forums. Our time is limited, but we really do make a serious effort to answer every single question asked on the forums.

My primary feelings about this trend of developing Stan-based R packages are ones of excitement and gratification. It’s really such an honor to have so many people developing these packages based on all the work we’ve done! There are also a few things I’ve noticed that I hope will change going forward. I’ll wrap up this post by highlighting two of these issues that I hope developers will take seriously:

(1) Unit testing

(2) Naming user-facing functions

The number of these packages that have no unit tests (or very scant testing) is a bit scary. Unit tests won’t catch every possible bug (we have lots of tests for our packages and people still find bugs all the time), but there is really no excuse for not unit testing a package that you want other people to use. If you care enough to do everything required to create your package and get it on CRAN, and if you care about your users, then I think it’s fair to say that you should care enough to write tests for your package. And there’s really no excuse these days, with the availability of packages like testthat to make this process easier than it used to be! Can anyone think of a reasonable excuse for not unit testing a package before releasing it to CRAN and expecting people to use it? (Not a rhetorical question. I really am curious, given that it seems to be relatively common, or at least not uncommon.) I don’t mean to be too negative here. There are also many packages that seem to have strong testing in place! My motivation for bringing up this issue is that it is in the best interest of our users.
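For illustration, a minimal testthat test might look like this (posterior_mean() is a hypothetical function your package would export):

library(testthat)

test_that("posterior_mean() recovers the mean of the draws", {
  draws <- matrix(rnorm(4000, mean = 2), ncol = 4)
  expect_equal(posterior_mean(draws), mean(draws), tolerance = 1e-8)
})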

Regarding function naming: this isn’t nearly as big a deal as unit testing; it’s just something that developers (including myself) of packages in the Stan R ecosystem can do to make the experience better for our users. rstanarm and brms both import the generic functions included with rstantools in order to be able to define methods with consistent names. For example, whether you fit a model with rstanarm or with brms, you can call log_lik() on the fitted model object to get the pointwise log-likelihood (it’s true that we still have a bit left to do to make the names across rstanarm and brms more standardized, but we’re actively working on it). If you are developing a package that fits models using Stan, we hope you will join us in trying to make it as easy as possible for users to navigate the Stan ecosystem in R.
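As a sketch of what that consistency buys the user (the model formulas are chosen arbitrarily):

library(rstanarm)
library(brms)

fit1 <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0)  # rstanarm
fit2 <- brm(mpg ~ wt, data = mtcars)                    # brms

# the same generic works on both fitted objects
ll1 <- log_lik(fit1)  # draws x observations matrix
ll2 <- log_lik(fit2)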

The post The current state of the Stan ecosystem in R appeared first on Statistical Modeling, Causal Inference, and Social Science.

To leave a comment for the author, please follow the link and comment on their blog: R – Statistical Modeling, Causal Inference, and Social Science.


Source:: R News

R 3.5.0 is released! (major release with many new features)

By Tal Galili


(This article was first published on R – R-statistics blog, and kindly contributed to R-bloggers)

R 3.5.0 (codename “Joy in Playing”) was released yesterday. You can get the latest binaries from here (or the .tar.gz source code from here).

This is a major release with many new features and bug fixes; the full list is provided below.

Upgrading R on Windows and Mac

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code in Rgui:

install.packages("installr") # install 
setInternet2(TRUE) # only for R versions older than 3.3.0
installr::updateR() # updating R.
# If you wish it to go faster, run: installr::updateR(T)

Running updateR() will detect if there is a new R version available and, if so, download and install it. There is also a step-by-step tutorial (with screenshots) on how to upgrade R on Windows using the installr package. If you only see the option to upgrade to an older version of R, change your mirror or try again in a few hours (it usually takes around 24 hours for all CRAN mirrors to get the latest version of R).

If you are using Mac you can easily upgrade to the latest version of R using Andrea Cirillo’s updateR package. The package is not on CRAN, so you’ll need to run the following code in Rgui:

install.packages("devtools")
devtools::install_github("AndreaCirilloAC/updateR")
updateR(admin_password = "PASSWORD") # Where "PASSWORD" stands for your system password

Later this year Andrea and I intend to merge the updateR package into installr so that the updateR() function will work seamlessly on both Windows and Mac. Stay tuned!

CHANGES IN R 3.5.0

SIGNIFICANT USER-VISIBLE CHANGES

  • All packages are by default byte-compiled on installation. This makes the installed packages larger (usually marginally so) and may affect the format of messages and tracebacks (which often exclude .Call and similar).

NEW FEATURES

  • factor() now uses order() to sort its levels, rather than sort.list(). This allows factor() to support custom vector-like objects if methods for the appropriate generics are defined. It has the side effect of making factor() succeed on empty or length-one non-atomic vector(-like) types (e.g., "list"), where it failed before.
  • diag() gets an optional names argument: this may require updates to packages defining S4 methods for it.
  • chooseCRANmirror() and chooseBioCmirror() no longer have a useHTTPS argument, not needed now all R builds support https:// downloads.
  • New summary() method for warnings() with a (somewhat experimental) print() method.
  • (methods package.) .self is now automatically registered as a global variable when registering a reference class method.
  • tempdir(check = TRUE) recreates the tempdir() directory if it is no longer valid (e.g. because some other process has cleaned up the ‘/tmp‘ directory).
  • New askYesNo() function and "askYesNo" option to ask the user binary response questions in a customizable but consistent way. (Suggestion of PR#17242.)
  • New low level utilities ...elt(n) and ...length() for working with ... parts inside a function.
  • isTRUE() is more tolerant and now true in

       x <- rlnorm(99)
       isTRUE(median(x) == quantile(x)[["50%"]])

    New function isFALSE() defined analogously to isTRUE().

  • The default symbol table size has been increased from 4119 to 49157; this may improve the performance of symbol resolution when many packages are loaded. (Suggested by Jim Hester.)
  • line() gets a new option iter = 1.
  • Reading from connections in text mode is buffered, significantly improving the performance of readLines(), as well as scan() and read.table(), at least when specifying colClasses.
  • order() is smarter about picking a default sort method when its arguments are objects.
  • available.packages() has two new arguments which control if the values from the per-session repository cache are used (default true, as before) and, if so, how old cached values can be to be used (default one hour). These arguments can be passed from install.packages(), update.packages() and functions calling that; to enable this, available.packages(), packageStatus() and download.file() gain a ... argument.
  • packageStatus()‘s upgrade() method no longer ignores its ... argument but passes it to install.packages().
  • installed.packages() gains a ... argument to allow arguments (including noCache) to be passed from new.packages(), old.packages(), update.packages() and packageStatus().
  • factor(x, levels, labels) now allows duplicated labels (not duplicated levels!). Hence you can map different values of x to the same level directly.
  • Attempting to use names on an S4 derivative of a basic type no longer emits a warning.
  • The list method of within() gains an option keepAttrs = FALSE for some speed-up.
  • system() and system2() now allow the specification of a maximum elapsed time (‘timeout’).
  • debug() supports debugging of methods on any object of S4 class "genericFunction", including group generics.
  • Attempting to increase the length of a variable containing NULL using length(x) <- n still has no effect on the target variable, but now triggers a warning.
  • type.convert() becomes a generic function, with additional methods that operate recursively over list and data.frame objects. Courtesy of Arni Magnusson (PR#17269).
  • lower.tri(x) and upper.tri(x) only needing dim(x) now work via new functions .row() and .col(), so no longer call as.matrix() by default in order to work efficiently for all kind of matrix-like objects.
  • print() methods for "xgettext" and "xngettext" now use encodeString() which keeps, e.g., "\n" visible. (Wish of PR#17298.)
  • package.skeleton() gains an optional encoding argument.
  • approx(), spline(), splinefun() and approxfun() also work for long vectors.
  • deparse() and dump() are more useful for S4 objects, dput() now using the same internal C code instead of its previous imperfect workaround R code. S4 objects now typically deparse perfectly, i.e., can be recreated identically from deparsed code. dput(), deparse() and dump() now print the names() information only once, using the more readable (tag = value) syntax, notably for list()s, i.e., including data frames.

    These functions gain a new control option "niceNames" (see .deparseOpts()), which when set (as by default) also uses the (tag = value) syntax for atomic vectors. On the other hand, without deparse options "showAttributes" and "niceNames", names are no longer shown also for lists. as.character(list(c(one = 1))) now includes the name, as as.character(list(list(one = 1))) has always done.

    m:n now also deparses nicely when m > n.

    The "quoteExpressions" option, also part of "all", no longer quote()s formulas as that may not re-parse identically. (PR#17378)

  • If the option setWidthOnResize is set and TRUE, R run in a terminal using a recent readline library will set the width option when the terminal is resized. Suggested by Ralf Goertz.
  • If multiple on.exit() expressions are set using add = TRUE then all expressions will now be run even if one signals an error.
  • mclapply() gets an option affinity.list which allows more efficient execution with heterogeneous processors, thanks to Helena Kotthaus.
  • The character methods for as.Date() and as.POSIXlt() are more flexible via new arguments tryFormats and optional: see their help pages.
  • on.exit() gains an optional argument after with default TRUE. Using after = FALSE with add = TRUE adds an exit expression before any existing ones. This way the expressions are run in a first-in last-out fashion. (From Lionel Henry.)
  • On Windows, file.rename() internally retries the operation in case of error to attempt to recover from possible anti-virus interference.
  • Command line completion on :: now also includes lazy-loaded data.
  • If the TZ environment variable is set when date-time functions are first used, it is recorded as the session default and so will be used rather than the default deduced from the OS if TZ is subsequently unset.
  • There is now a [ method for class "DLLInfoList".
  • glm() and glm.fit() get the same singular.ok = TRUE argument that lm() has had forever. As a consequence, in glm(*, method = <your_own>), user-specified methods need to accept a singular.ok argument as well.
  • aspell() gains a filter for Markdown (‘.md‘ and ‘.Rmd‘) files.
  • intToUtf8(multiple = FALSE) gains an argument to allow surrogate pairs to be interpreted.
  • The maximum number of DLLs that can be loaded into R e.g. via dyn.load() has been increased up to 614 when the OS limit on the number of open files allows.
  • Sys.timezone() on a Unix-alike caches the value at first use in a session: inter alia this means that setting TZ later in the session affects only the current time zone and not the system one. Sys.timezone() is now used to find the system timezone to pass to the code used when R is configured with --with-internal-tzcode.
  • When tar() is used with an external command which is detected to be GNU tar or libarchive tar (aka bsdtar), a different command-line is generated to circumvent line-length limits in the shell.
  • system(*, intern = FALSE), system2() (when not capturing output), file.edit() and file.show() now issue a warning when the external command cannot be executed.
  • The “default” ("lm" etc) methods of vcov() have gained new optional argument complete = TRUE which makes the vcov() methods more consistent with the coef() methods in the case of singular designs. The former (back-compatible) behavior is given by vcov(*, complete = FALSE).
  • coef() methods (for lm etc) also gain a complete = TRUE optional argument for consistency with vcov().
    For "aov", both coef() and vcov() methods remain back-compatibly consistent, using the other default, complete = FALSE.
  • attach(*, pos = 1) is now an error instead of a warning.
  • New function getDefaultCluster() in package parallel to get the default cluster set via setDefaultCluster().
  • str(x) for atomic objects x now treats both cases of is.vector(x) similarly, and hence much less often prints "atomic". This is a slight non-back-compatible change producing typically both more informative and shorter output.
  • write.dcf() gets optional argument useBytes.
  • New, partly experimental packageDate() which tries to get a valid "Date" object from a package ‘DESCRIPTION‘ file, thanks to suggestions in PR#17324.
  • tools::resaveRdaFiles() gains a version argument, for use when packages should remain compatible with earlier versions of R.
  • ar.yw(x) and hence by default ar(x) now work when x has NAs, mostly thanks to a patch by Pavel Krivitsky in PR#17366. The ar.yw.default()‘s AIC computations have become more efficient by using determinant().
  • New warnErrList() utility (from package nlme, improved).
  • By default the (arbitrary) signs of the loadings from princomp() are chosen so the first element is non-negative.
  • If --default-packages is not used, then Rscript now checks the environment variable R_SCRIPT_DEFAULT_PACKAGES. If this is set, then it takes precedence over R_DEFAULT_PACKAGES. If default packages are not specified on the command line or by one of these environment variables, then Rscript now uses the same default packages as R. For now, the previous behavior of not including methods can be restored by setting the environment variable R_SCRIPT_LEGACY to yes.
  • When a package is found more than once, the warning from find.package(*, verbose=TRUE) lists all library locations.
  • POSIXt objects can now also be rounded or truncated to month or year.
  • stopifnot() can be used alternatively via new argument exprs which is nicer and useful when testing several expressions in one call.
  • The environment variable R_MAX_VSIZE can now be used to specify the maximal vector heap size. On macOS, unless specified by this environment variable, the maximal vector heap size is set to the maximum of 16GB and the available physical memory. This is to avoid having the R process killed when macOS over-commits memory.
  • sum(x) and sum(x1, x2, .., x) with many or long logical or integer vectors no longer overflows (and returns NA with a warning), but returns double numbers in such cases.
  • Single components of "POSIXlt" objects can now be extracted and replaced via [ indexing with 2 indices.
  • S3 method lookup now searches the namespace registry after the top level environment of the calling environment.
  • Arithmetic sequences created by 1:n, seq_along, and the like now use compact internal representations via the ALTREP framework. Coercing integer and numeric vectors to character also now uses the ALTREP framework to defer the actual conversion until first use.
  • Finalizers are now run with interrupts suspended.
  • merge() gains new option no.dups and by default suffixes the second of two duplicated column names, thanks to a proposal by Scott Ritchie (and Gabe Becker).
  • scale.default(x, center, scale) now also allows center or scale to be “numeric-alike”, i.e., such that as.numeric(.) coerces them correctly. This also eliminates a wrong error message in such cases.
  • par*apply and par*applyLB gain an optional argument chunk.size which allows to specify the granularity of scheduling.
  • Some as.data.frame() methods, notably the matrix one, are now more careful in not accepting duplicated or NA row names, and by default produce unique non-NA row names. This is based on new function .rowNamesDF(x, make.names = *) where the logical argument make.names allows to specify how invalid row names rNms are handled. .rowNamesDF() is a “workaround” compatible default.
  • R has new serialization format (version 3) which supports custom serialization of ALTREP framework objects. These objects can still be serialized in format 2, but less efficiently. Serialization format 3 also records the current native encoding of unflagged strings and converts them when de-serialized in R running under different native encoding. Format 3 comes with new serialization magic numbers (RDA3, RDB3, RDX3). Format 3 can be selected by version = 3 in save(), serialize() and saveRDS(), but format 2 remains the default for all serialization and saving of the workspace. Serialized data in format 3 cannot be read by versions of R prior to version 3.5.0.
  • The "Date" and “date-time” classes "POSIXlt" and "POSIXct" now have a working `length<-` method, as wished in PR#17387.
  • optim(*, control = list(warn.1d.NelderMead = FALSE)) allows to turn off the warning when applying the default "Nelder-Mead" method to 1-dimensional problems.
  • matplot(.., panel.first = .) etc now work, as log becomes explicit argument and ... is passed to plot() unevaluated, as suggested by Sebastian Meyer in PR#17386.
  • Interrupts can be suspended while evaluating an expression using suspendInterrupts. Subexpressions can be evaluated with interrupts enabled using allowInterrupts. These functions can be used to make sure cleanup handlers cannot be interrupted.
  • R 3.5.0 includes a framework that allows packages to provide alternate representations of basic R objects (ALTREP). The framework is still experimental and may undergo changes in future R releases as more experience is gained. For now, documentation is provided in https://svn.r-project.org/R/branches/ALTREP/ALTREP.html.

UTILITIES

  • install.packages() for source packages now has the possibility to set a ‘timeout’ (elapsed-time limit). For serial installs this uses the timeout argument of system2(): for parallel installs it requires the timeout utility command from GNU coreutils.
  • It is now possible to set ‘timeouts’ (elapsed-time limits) for most parts of R CMD check via environment variables documented in the ‘R Internals’ manual.
  • The ‘BioC extra’ repository which was dropped from Bioconductor 3.6 and later has been removed from setRepositories(). This changes the mapping for 6–8 used by setRepositories(ind=).
  • R CMD check now also applies the settings of environment variables _R_CHECK_SUGGESTS_ONLY_ and _R_CHECK_DEPENDS_ONLY_ to the re-building of vignettes.
  • R CMD check with environment variable _R_CHECK_DEPENDS_ONLY_ set to a true value makes test-suite-management packages available and (for the time being) works around a common omission of rmarkdown from the VignetteBuilder field.

INSTALLATION on a UNIX-ALIKE

  • Support for a system Java on macOS has been removed — install a fairly recent Oracle Java (see ‘R Installation and Administration’ §C.3.2).
  • configure works harder to set additional flags in SAFE_FFLAGS only where necessary, and to use flags which have little or no effect on performance. In rare circumstances it may be necessary to override the setting of SAFE_FFLAGS.
  • C99 functions expm1, hypot, log1p and nearbyint are now required.
  • configure sets a -std flag for the C++ compiler for all supported C++ standards (e.g., -std=gnu++11 for the C++11 compiler). Previously this was not done in a few cases where the default standard passed the tests made (e.g. clang 6.0.0 for C++11).

C-LEVEL FACILITIES

  • ‘Writing R Extensions’ documents macros MAYBE_REFERENCED, MAYBE_SHARED and MARK_NOT_MUTABLE that should be used by package C code instead of NAMED or SET_NAMED.
  • The object header layout has been changed to support merging the ALTREP branch. This requires re-installing packages that use compiled code.
  • ‘Writing R Extensions’ now documents the R_tryCatch, R_tryCatchError, and R_UnwindProtect functions.
  • NAMEDMAX has been raised to 3 to allow protection of intermediate results from (usually ill-advised) assignments in arguments to BUILTIN functions. Package C code using SET_NAMED may need to be revised.

DEPRECATED AND DEFUNCT

  • Sys.timezone(location = FALSE) is defunct, and is ignored (with a warning).
  • methods:::bind_activation() is defunct now; it typically has been unneeded for years. The undocumented ‘hidden’ objects .__H__.cbind and .__H__.rbind in package base are deprecated (in favour of cbind and rbind).
  • The declaration of pythag() in ‘Rmath.h‘ has been removed — the entry point has not been provided since R 2.14.0.

BUG FIXES

  • printCoefmat() now also works without column names.
  • The S4 methods on Ops() for the "structure" class no longer cause infinite recursion when the structure is not an S4 object.
  • nlm(f, ..) for the case where f() has a "hessian" attribute now computes LL’ = H + µI correctly. (PR#17249).
  • An S4 method that “rematches” to its generic and overrides the default value of a generic formal argument to NULL no longer drops the argument from its formals.
  • Rscript can now accept more than one argument given on the #! line of a script. Previously, one could only pass a single argument on the #! line in Linux.
  • Connections are now written correctly with encoding "UTF-16LE". (PR#16737).
  • Evaluation of ..0 now signals an error. When ..1 is used and ... is empty, the error message is more appropriate.
  • (Windows mainly.) Unicode code points which require surrogate pairs in UTF-16 are now handled. All systems should properly handle surrogate pairs, even those systems that do not need to make use of them. (PR#16098)
  • stopifnot(e, e2, ...) now evaluates the expressions sequentially and in case of an error or warning shows the relevant expression instead of the full stopifnot(..) call.
  • path.expand() on Windows now accepts paths specified as UTF-8-encoded character strings even if not representable in the current locale. (PR#17120)
  • line(x, y) now correctly computes the medians of the left and right group’s x-values and in all cases reproduces straight lines.
  • Extending S4 classes with slots corresponding to special attributes like dim and dimnames now works.
  • Fix for legend() when fill has multiple values the first of which is NA (all colours used to default to par(fg)). (PR#17288)
  • installed.packages() did not remove the cached value for a library tree that had been emptied (but would not use the old value, just waste time checking it).
  • The documentation for installed.packages(noCache = TRUE) incorrectly claimed it would refresh the cache.
  • aggregate() no longer uses spurious names in some cases. (PR#17283)
  • object.size() now also works for long vectors.
  • packageDescription() tries harder to solve re-encoding issues, notably seen in some Windows locales. This fixes the citation() issue in PR#17291.
  • poly(<matrix>, 3) now works, thanks to prompting by Marc Schwartz.
  • readLines() no longer segfaults on very large files with embedded '\0' (aka ‘nul’) characters. (PR#17311)
  • ns() (package splines) now also works for a single observation. interpSpline() gives a more friendly error message when the number of points is less than four.
  • dist(x, method = "canberra") now uses the correct definition; the result may only differ when x contains values of differing signs, e.g. not for 0-1 data.
  • methods:::cbind() and methods:::rbind() avoid deep recursion, thanks to Suharto Anggono via PR#17300.
  • Arithmetic with zero-column data frames now works more consistently; issue raised by Bill Dunlap. Arithmetic with data frames gives a data frame for ^ (which previously gave a numeric matrix).
  • pretty(x, n) for large n or large diff(range(x)) now works better (though it was never meant for large n); internally it uses the same rounding fuzz (1e-10) as seq.default() — as it did up to 2010-02-03 when both were 1e-7.
  • Internal C-level R_check_class_and_super() and hence R_check_class_etc() now also consider non-direct super classes and hence return a match in more cases. This e.g., fixes behaviour of derived classes in package Matrix.
  • Reverted unintended change in behavior of return calls in on.exit expressions introduced by stack unwinding changes in R 3.3.0.
  • Attributes on symbols are now detected and prevented; attempt to add an attribute to a symbol results in an error.
  • fisher.test(*, workspace = <n>) now may also increase the internal stack size, which allows larger problems to be solved, fixing PR#1662.
  • The methods package no longer directly copies slots (attributes) into a prototype that is of an “abnormal” (reference) type, like a symbol.
  • The methods package no longer attempts to call length on NULL (during the bootstrap process).
  • The methods package correctly shows methods when there are multiple methods with the same signature for the same generic (still not fully supported, but at least the user can see them).
  • sys.on.exit() is now always evaluated in the right frame. (From Lionel Henry.)
  • seq.POSIXt(*, by = " DSTdays") now should work correctly in all cases and is faster. (PR#17342)
  • .C() when returning a logical vector now always maps values other than FALSE and NA to TRUE (as documented).
  • Subassignment with zero length vectors now coerces as documented (PR#17344).
    Further, x now signals an error ‘replacement has length zero‘ (or a translation of that) instead of doing nothing.
  • (Package parallel.) mclapply(), pvec() and mcparallel() (when mccollect() is used to collect results) no longer leave zombie processes behind.
  • R CMD INSTALL now produces the intended error message when, e.g., the LazyData field is invalid.
  • as.matrix(dd) now works when the data frame dd contains a column which is a data frame or matrix, including a 0-column matrix/d.f. .
  • mclapply(X, mc.cores) now follows its documentation and calls lapply() in case mc.cores = 1 also in the case mc.preschedule is false. (PR#17373)
  • aggregate(, drop=FALSE) no longer calls the function on parts but sets corresponding results to NA. (Thanks to Suharto Anggono’s patches in PR#17280).
  • The duplicated() method for data frames is now based on the list method (instead of string coercion). Consequently unique() is better distinguishing data frame rows, fixing PR#17369 and PR#17381. The methods for matrices and arrays are changed accordingly.
  • Calling names() on an S4 object derived from "environment" behaves (by default) like calling names() on an ordinary environment.
  • read.table() with a non-default separator now supports quotes following a non-whitespace character, matching the behavior of scan().
  • parLapplyLB and parSapplyLB have been fixed to do load balancing (dynamic scheduling). This also means that results of computations depending on random number generators will now really be non-reproducible, as documented.
  • Indexing a list using dollar and empty string (l$"") returns NULL.
  • Using \usage{ data(<name>, package="<pkg>") } no longer produces R CMD check warnings.
  • match.arg() more carefully chooses the environment for constructing default choices, fixing PR#17401 as proposed by Duncan Murdoch.
  • Deparsing of consecutive ! calls is now consistent with deparsing unary - and + calls and creates code that can be reparsed exactly; thanks to a patch by Lionel Henry in PR#17397. (As a side effect, this uses fewer parentheses in some other deparsing involving ! calls.)


To leave a comment for the author, please follow the link and comment on their blog: R – R-statistics blog.


Source:: R News

Semantic dashboard – new open source R Shiny package

By Appsilon Data Science Blog

(This article was first published on Appsilon Data Science Blog, and kindly contributed to R-bloggers)

Semantic dashboard is on!

Are you fed up with the ordinary shinydashboard look?

Give your app a new life with Semantic UI support. It cannot be any easier: install semantic.dashboard and load it in your app in place of shinydashboard. It’s compatible with classical shinydashboard, so you don’t have to start from scratch:

library(shiny)
library(shinydashboard)  # swap in library(semantic.dashboard) for the Semantic UI look

ui <- dashboardPage(
  dashboardHeader(title = "Basic dashboard"),
  dashboardSidebar(sidebarMenu(
      menuItem(tabName = "home", text = "Home", icon = icon("home")),
      menuItem(tabName = "another", text = "Another Tab", icon = icon("heart"))
  )),
  dashboardBody(
    fluidRow(
      box(plotOutput("plot1", height = 250)),
      box(
        title = "Controls",
        sliderInput("slider", "Number of observations:", 1, 100, 50)
      )
    )
  )
)

server <- function(input, output) {
  set.seed(122)
  histdata <- rnorm(500)
  output$plot1 <- renderPlot({
    data <- histdata[seq_len(input$slider)]
    hist(data)
  })
}

shinyApp(ui, server)

We strive to deliver the most awesome Shiny apps for our clients. In the past we had to work around the limitations of ordinary Shiny dashboards. A couple of months back we decided that it was time to take the next step, so we created our own dashboard package with full integration of Semantic UI.

semantic.dashboard offers the basic functions for creating dashboards, and more: you can select from many Semantic UI themes and easily adjust the look of your dashboard.
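For example, switching themes should be a one-argument change. A minimal sketch, assuming dashboardPage() accepts a Semantic UI theme name via its theme argument (check the package documentation for the exact interface and the available theme names):

library(semantic.dashboard)

ui <- dashboardPage(
  dashboardHeader(title = "Themed dashboard"),
  dashboardSidebar(sidebarMenu()),
  dashboardBody(),
  theme = "cerulean"  # assumed: one of the Semantic UI theme names
)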

For specific installation guidelines and more examples, visit the dashboard’s GitHub page, or simply install version 0.1.1 from CRAN and check the documentation:

install.packages("semantic.dashboard")

The semantic.dashboard engine is based on our other successful package, shiny.semantic, which helps you introduce Semantic UI elements to all kinds of Shiny apps. You might want to check it out as well here, or get familiar with the whole family of our open source projects. semantic.dashboard is the next step in our mission to make Shiny apps awesome!

Unleash your imagination and let us know what you have achieved!

Read the original post at
Appsilon Data Science Blog.

Follow Appsilon Data Science

To leave a comment for the author, please follow the link and comment on their blog: Appsilon Data Science Blog.


Source:: R News

An Introduction to Greta

By R Views

(This article was first published on R Views, and kindly contributed to R-bloggers)

I was surprised by greta. I had assumed that the tensorflow and reticulate packages would eventually enable R developers to look beyond deep learning applications and exploit the TensorFlow platform to create all manner of production-grade statistical applications. But I wasn’t thinking Bayesian. After all, Stan is probably everything a Bayesian modeler could want. Stan is a powerful, production-level probability distribution modeling engine with a slick R interface, deep documentation, and a dedicated development team.

But greta lets users write TensorFlow-based Bayesian models directly in R! What could be more charming? greta removes the barrier of learning an intermediate modeling language while still promising to deliver high-performance MCMC models that run anywhere TensorFlow can go.

In this post, I’ll introduce you to greta with a simple model used by Richard McElreath in section 8.3 of his iconoclastic book: Statistical Rethinking: A Bayesian Course with Examples in R and Stan. This model seeks to explain the log of a country’s GDP based on a measure of terrain ruggedness while controlling for whether or not the country is in Africa. I am going to use it just to illustrate MCMC sampling with greta. The extended example in McElreath’s book, however, is a meditation on the subtleties of modeling interactions, and is well worth studying.

First, we load the required packages and fetch the data. DiagrammeR is for plotting the TensorFlow flow diagram of the model, and bayesplot is used to plot trace diagrams of the Markov chains. The rugged data set, which provides 52 variables for 234 countries, is fairly interesting, but we will use a trimmed-down version with only 170 countries and three variables.

library(rethinking)
library(greta)
library(DiagrammeR)
library(bayesplot)
library(ggplot2)

# Example from section 8.3 Statistical Rethinking
data(rugged)
d <- rugged
d$log_gdp <- log(d$rgdppc_2000)
d <- d[complete.cases(d$rgdppc_2000), c("log_gdp", "rugged", "cont_africa")]
head(d)
##     log_gdp rugged cont_africa
## 3  7.492609  0.858           1
## 5  8.216929  3.427           0
## 8  9.933263  0.769           0
## 9  9.407032  0.775           0
## 10 7.792343  2.688           0
## 12 9.212541  0.006           0
set.seed(1234)

In this section of code, we set up the TensorFlow data structures. The first step is to move the data into greta arrays. These data structures behave similarly to R arrays in that they can be manipulated with functions. However, greta doesn’t immediately calculate values for new arrays. It works out the size and shape of the result and creates a place-holder data structure.

# data
g_log_gdp     <- as_data(d$log_gdp)
g_rugged      <- as_data(d$rugged)
g_cont_africa <- as_data(d$cont_africa)

In this section, we set up the Bayesian model. All parameters need prior probability distributions. Note that the parameters a, bR, bA, bAR, sigma, and mu are all new greta arrays that don’t contain any data. a is a 1 x 1 array and mu is a 170 x 1 array with one slot for each observation.

The distribution() function sets up the likelihood function for the model.

# Variables and Priors

a     <- normal(0, 100)  # weakly informative priors (exact scale values assumed)
bR    <- normal(0, 10)
bA    <- normal(0, 10)
bAR   <- normal(0, 10)
sigma <- cauchy(0, 2, truncation = c(0, Inf))
a
## greta array (variable following a normal distribution)
## 
##      [,1]
## [1,]  ?

# operations
mu <- a + bR * g_rugged + bA * g_cont_africa + bAR * g_rugged * g_cont_africa
dim(mu)
## [1] 170   1

# likelihood
distribution(g_log_gdp) = normal(mu, sigma)

The model() function does all of the work. It builds the model and produces a fairly complicated object organized as three lists that contain, respectively, the R6 class, TensorFlow structures, and the various greta data arrays.

# defining the model
mod <- model(a, bR, bA, bAR, sigma)
str(mod, max.level = 1)
## List of 3
##  $ dag                 :Classes 'dag_class', 'R6' 
##   Public:
##     adjacency_matrix: function () 
##     build_dag: function (greta_array_list) 
##     clone: function (deep = FALSE) 
##     compile: TRUE
##     define_gradients: function () 
##     define_joint_density: function () 
##     define_tf: function () 
##     example_parameters: function (flat = TRUE) 
##     find_node_neighbours: function () 
##     get_tf_names: function (types = NULL) 
##     gradients: function (adjusted = TRUE) 
##     initialize: function (target_greta_arrays, tf_float = tf$float32, n_cores = 2L, 
##     log_density: function (adjusted = TRUE) 
##     make_names: function () 
##     n_cores: 4
##     node_list: list
##     node_tf_names: variable_1 distribution_1 data_1 data_2 operation_1 oper ...
##     node_types: variable distribution data data operation operation oper ...
##     parameters_example: list
##     send_parameters: function (parameters, flat = TRUE) 
##     subgraph_membership: function () 
##     target_nodes: list
##     tf_environment: environment
##     tf_float: tensorflow.python.framework.dtypes.DType, python.builtin.object
##     tf_name: function (node) 
##     trace_values: function ()  
##  $ target_greta_arrays :List of 5
##  $ visible_greta_arrays:List of 9

Plotting mod produces the TensorFlow flow diagram that shows the structure of the underlying TensorFlow model, which is simple for this model and easily interpretable.

# plotting
plot(mod)

Next, we use the greta function mcmc() to sample from the posterior distributions defined in the model.

# sampling
draws <- mcmc(mod, n_samples = 1000)
summary(draws)
## 
## Iterations = 1:1000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##          Mean      SD Naive SE Time-series SE
## a      9.2225 0.13721 0.004339       0.004773
## bR    -0.2009 0.07486 0.002367       0.002746
## bA    -1.9485 0.23033 0.007284       0.004435
## bAR    0.3992 0.13271 0.004197       0.003136
## sigma  0.9527 0.04892 0.001547       0.001744
## 
## 2. Quantiles for each variable:
## 
##          2.5%     25%     50%     75%    97.5%
## a      8.9575  9.1284  9.2306  9.3183  9.47865
## bR    -0.3465 -0.2501 -0.1981 -0.1538 -0.05893
## bA    -2.3910 -2.1096 -1.9420 -1.7876 -1.50781
## bAR    0.1408  0.3054  0.3954  0.4844  0.66000
## sigma  0.8616  0.9194  0.9520  0.9845  1.05006

Now that we have the samples of the posterior distributions of the parameters in the model, it is straightforward to examine them. Here, we plot the posterior distribution of the interaction term.

mat <- as.matrix(draws)
ggplot(as.data.frame(mat), aes(x = bAR)) + geom_density(fill = "skyblue")

Finally, we examine the trace plots for the MCMC samples using the bayesplot function mcmc_trace(). The plots for each parameter appear to be stationary (flat, i.e., centered on a constant value) and well-mixed (there is no obvious correlation between points). mcmc_intervals() plots the uncertainty intervals for each parameter, computed from posterior draws with all chains merged.

mcmc_trace(draws)

mcmc_intervals(draws)

So there it is – a Bayesian model using Hamiltonian Monte Carlo sampling built in R and evaluated by TensorFlow.

For an expert discussion of the model, have a look at McElreath’s book, described at the link above. For more on greta, see the package documentation. And please, do take the time to read about greta‘s namesake: Grete Hermann, a remarkable woman – mathematician, philosopher, educator, social activist, and theoretical physicist – who found the error in John von Neumann’s “proof” of the “No hidden variables theorem” of Quantum Mechanics.

To leave a comment for the author, please follow the link and comment on their blog: R Views.


Source:: R News

Forecasting in NYC: 25-27 June 2018

By R on Rob J Hyndman

(This article was first published on R on Rob J Hyndman, and kindly contributed to R-bloggers)

In late June, I will be in New York to teach my 3-day workshop on Forecasting using R. Tickets are available at Eventbrite.
This is the first time I’ve taught this workshop in the US, having previously run it in the Netherlands and Australia. It will be based on the 2nd edition of my book “Forecasting: Principles and Practice” with George Athanasopoulos. All participants will get a print version of the book.

To leave a comment for the author, please follow the link and comment on their blog: R on Rob J Hyndman.


Source:: R News

Mapping earthquakes off the Irish Coast using the leaflet package in R

By S. Walsh


(This article was first published on Environmental Science and Data Analytics, and kindly contributed to R-bloggers)

Ireland is typically a stable region from a seismic-activity perspective, as it is distant from the major plate boundaries where subduction and sea-floor spreading occur. However, in reading the following article, I was surprised to discover that earthquakes occur quite frequently both off the Irish coast and within the country itself. Most of these events result from the release of tension and pressure in the crustal rock.

The Irish National Seismic Network (INSN) maintains a dataset of 144 local earthquakes dating back to February 1980. The variables in the dataset are event date, event time, latitude, longitude, magnitude, region and subjective intensity (with levels such as “felt”, “feltIII”, “feltIV”).

Distribution of local earthquake magnitudes

The median magnitude recorded since 1980 is 1.5 on the Richter scale. The largest event recorded was magnitude 5.4 on 19 July 1984 in the Lleyn Peninsula region of Wales. It occurred near the village of Llanaelhaearn, the location of the biggest earthquake in the past 50 years.

Average waiting time between events

How frequently do seismic events occur around Ireland and its coast? It turns out that the median time between events is just 25 days! The greatest interval (1,335 days) separates an event recorded in New Ross, Co. Waterford on 19 April 2002 and a magnitude-2.8 event in the Irish Sea on 14 December 2005. The distribution of waiting times is plotted below with outliers included and removed.

[Boxplots: waiting times between events, with outliers included and removed]
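The waiting-time calculation behind these summaries might look like this (a sketch; the data frame insn and its event_date column are assumed names):

library(dplyr)

insn %>%
  arrange(event_date) %>%
  mutate(wait_days = as.numeric(difftime(event_date, lag(event_date),
                                         units = "days"))) %>%
  summarise(median_wait = median(wait_days, na.rm = TRUE),
            max_wait    = max(wait_days, na.rm = TRUE))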

Geospatial mapping of earthquake distribution

The leaflet package for R was used to map the data. This wonderful package allows one to create excellent interactive data visualisations. The map below is a static .png with no interactivity, but it shows the distribution well.

In the interactive version, each data point can be investigated by clicking on it to bring up a box containing additional information about the event. Zooming in and out is also possible, as is changing the base map layer and other aesthetics.

[Map: earthquake locations around Ireland and its coast]
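The call that produces such a map looks roughly like this (a sketch; the data frame insn and its column names are assumed):

library(leaflet)

leaflet(insn) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~longitude, lat = ~latitude,
    radius = ~magnitude * 2,
    popup = ~paste0("Date: ", event_date,
                    "<br/>Magnitude: ", magnitude,
                    "<br/>Region: ", region)
  )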

Packages used

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009.

Hadley Wickham, Romain Francois, Lionel Henry and Kirill Müller (2017). dplyr: A Grammar of Data Manipulation. R package version 0.7.4. https://CRAN.R-project.org/package=dplyr

Hadley Wickham and Jennifer Bryan (2017). readxl: Read Excel Files. R package version 1.0.0. https://CRAN.R-project.org/package=readxl

Joe Cheng, Bhaskar Karambelkar and Yihui Xie (2017). leaflet: Create Interactive Web Maps with the JavaScript ‘Leaflet’ Library. R package version 1.1.0. https://CRAN.R-project.org/package=leaflet

Ramnath Vaidyanathan, Yihui Xie, JJ Allaire, Joe Cheng and Kenton Russell (2018). htmlwidgets: HTML Widgets for R. R package version 1.0. https://CRAN.R-project.org/package=htmlwidgets

To leave a comment for the author, please follow the link and comment on their blog: Environmental Science and Data Analytics.


Source:: R News

Predatory Journals and R

By Marcelo S. Perlin

(This article was first published on Marcelo S. Perlin, and kindly contributed to R-bloggers)

My paper about the penetration of predatory journals in Brazil, “Is predatory publishing a real threat? Evidence from a large database study”, has just been published in Scientometrics! The working paper version is available on SSRN.

This is a nice example of a data-intensive scientific work cycle, from gathering data to reporting results. Everything was done in R, using web scraping algorithms, parallel processing, tidyverse packages and more. This was a special project for me, given its implications for science making in Brazil. It took me nearly one year to produce and execute the whole code. It is also a nice showcase of the capabilities of package ggplot2 in producing publication-ready figures. As a side output, our database of predatory journals is available as a shiny app.

More details about the study are available in the paper. Our abstract is as follows:

Using a database of potential, possible, or probable predatory scholarly open-access journals, the objective of this research is to study the penetration of predatory publications in the Brazilian academic system and the profile of authors in a cross-section empirical study. Based on a massive amount of publications from Brazilian researchers of all disciplines during the 2000–2015 period, we were able to analyze the extent of predatory publications using an econometric modeling. Descriptive statistics indicate that predatory publications represent a small overall proportion, but grew exponentially in the last 5 years. Departing from prior studies, our analysis shows that experienced researchers with a high number of non-indexed publications and PhD obtained locally are more likely to publish in predatory journals. Further analysis shows that once a journal regarded as predatory is listed in the local ranking system, the Qualis, it starts to receive more publications than non-predatory ones.

To leave a comment for the author, please follow the link and comment on their blog: Marcelo S. Perlin.


Source:: R News