Machine Learning Results in R: one plot to rule them all!

By Bernardo Lares

(This article was first published on R Programming – DataScience+, and kindly contributed to R-bloggers)

To automate model selection and evaluate the results visually, I have added some functions to my personal library, and today I’m sharing the code with you. I use them to evaluate and compare Machine Learning models as quickly and easily as possible. Currently, they are designed to evaluate the results of binary classification models. Before we start, let me show you the final outcome so you know what we are trying to achieve here with just a simple R function:

So, let’s start!

The results object

First of all, we need a single list with all the results to make the next steps easier. I am assuming at this point that you have already built a model and can calculate predictions for your test set. So, my list contains the following objects:

  • Project name (e.g. Fraud Score)
  • Model (the object with our model)
  • Test Scores:
    • Index (row id, it can be a user_id, email, lead_id…)
    • Tag (known label)
    • Score (calculated with the model we are studying)
  • Datasets:
    • Train set
    • Test set
  • Parameters:
    • nfolds, ntrees, max_depth, seed, sample_rate….
  • Variable importance
  • Metrics:
    • log_loss
    • auc
  • Notes (anything you’d like to write to give you a reference later on)
Once we automate our results object, we can start with our beautiful plots!
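Before moving on to the plots, here is a minimal sketch of how the results list described above could be assembled. Everything in it is an assumption made for illustration: the toy data, the `glm` model, the element names (`scores_test`, `datasets`, and so on) and the crude coefficient-based stand-in for variable importance.

```r
# Toy data standing in for a real project (all values are simulated)
set.seed(123)
n <- 300
data <- data.frame(id = seq_len(n), x1 = rnorm(n), x2 = rnorm(n))
data$tag <- rbinom(n, 1, plogis(1.5 * data$x1 - 0.5 * data$x2))
train <- data[1:200, ]
test <- data[201:300, ]

# Any binary classifier would do; logistic regression keeps the sketch small
model <- glm(tag ~ x1 + x2, data = train, family = binomial())
score <- predict(model, newdata = test, type = "response")

# Log loss computed by hand; scores are clipped to avoid log(0)
eps <- 1e-15
clipped <- pmin(pmax(score, eps), 1 - eps)
log_loss <- -mean(test$tag * log(clipped) + (1 - test$tag) * log(1 - clipped))

# AUC as the share of (positive, negative) pairs ranked correctly (ties count half)
pos <- score[test$tag == 1]
neg <- score[test$tag == 0]
auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))

results <- list(
  project = "Fraud Score",
  model = model,
  scores_test = data.frame(index = test$id, tag = test$tag, score = score),
  datasets = list(train = train, test = test),
  parameters = list(seed = 123),
  # Crude stand-in: absolute coefficients, not a real importance measure
  importance = data.frame(variable = c("x1", "x2"),
                          importance = unname(abs(coef(model)[-1]))),
  metrics = list(log_loss = log_loss, auc = auc),
  notes = "Toy example on simulated data"
)
```

Keeping the tag and the score side by side in `results$scores_test` matters most, since the plots below only need those two columns.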

    Density Plot

    I have always given importance to the density plot because it gives us visual information on skewness, distribution and our model’s ability to distinguish between the classes. Here we can see how the model has distributed both our categories, our whole test set and the cumulative distribution of each category (the more separate, the better).
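As a rough illustration of the idea, the two views (density by tag and cumulative distribution by tag) could be drawn with ggplot2 as below. This is a sketch on simulated scores, not the function from my library; the data frame `scores` and its `tag` and `score` columns are assumed names.

```r
library(ggplot2)

# Simulated scores: tag 0 tends to score low, tag 1 tends to score high
set.seed(42)
scores <- data.frame(
  tag = rep(c(0, 1), each = 500),
  score = c(rbeta(500, 2, 5), rbeta(500, 5, 2))
)

# Density of the score for each tag; less overlap means better separation
density_by_tag <- ggplot(scores, aes(x = score, fill = as.character(tag))) +
  geom_density(alpha = 0.6) +
  theme_minimal() +
  labs(title = "Score distribution by tag",
       x = "Score", y = "Density", fill = "Tag")

# Empirical cumulative distribution of the score for each tag
cumulative_by_tag <- ggplot(scores, aes(x = score, colour = as.character(tag))) +
  stat_ecdf() +
  theme_minimal() +
  labs(title = "Cumulative score distribution by tag",
       x = "Score", y = "Cumulative share", colour = "Tag")
```

Calling `print(density_by_tag)` or `print(cumulative_by_tag)` draws each panel.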


    Gives this plot:

    ROC Curve

    The ROC curve gives us an idea of how our model performs on our test set. You should know by now that if the AUC is close to 50%, the model is no better than a random selector; on the other hand, if the AUC is near 100%, you have a “perfect model” (which usually means that, intentionally or not, you have been giving the model the answer this whole time!). So it is always good to check this plot and confirm that we are getting a reasonable Area Under the Curve with a nice, narrow 95% confidence interval.
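To show what the curve and its area actually measure, here is a small base R sketch that builds the ROC points by sweeping a threshold over simulated scores and integrates them with the trapezoidal rule. The data and variable names are assumptions, and the sweep assumes there are no tied scores.

```r
# Simulated scores: tag 0 tends to score low, tag 1 tends to score high
set.seed(42)
tag <- rep(c(0, 1), each = 500)
score <- c(rbeta(500, 2, 5), rbeta(500, 5, 2))

# Sweep the threshold from the highest score down to the lowest
ord <- order(score, decreasing = TRUE)
tpr <- c(0, cumsum(tag[ord] == 1) / sum(tag == 1))  # true positive rate
fpr <- c(0, cumsum(tag[ord] == 0) / sum(tag == 0))  # false positive rate

# Area under the curve by the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

plot(fpr, tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate",
     main = paste0("ROC curve (AUC = ", round(100 * auc, 1), "%)"))
abline(0, 1, lty = 2)  # diagonal = random selector
```

The confidence interval is not covered by this sketch; a dedicated package such as pROC (`roc()` together with `ci.auc()`) is one way to obtain it.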

    # ROC Curve

    Gives this plot:

    Cuts by quantile

    If we had to cut the scores into n equal-sized buckets, what would the score cuts be? Is the result a ladder (as it should be), or a huge wall, or a valley? Is our score distribution linear and easy to split?
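To make the question concrete, the cut points can be computed directly with `quantile()` and drawn as bars. The sketch below uses simulated scores, and names such as `cuts_df` and `cuts_plot` are made up for the example.

```r
library(ggplot2)

set.seed(42)
score <- c(rbeta(500, 2, 5), rbeta(500, 5, 2))
splits <- 10

# Upper score limit of each of the `splits` equal-sized buckets
cuts <- quantile(score, probs = seq(1 / splits, 1, length.out = splits))
cuts_df <- data.frame(
  bucket = factor(seq_len(splits)),
  cut = round(100 * as.numeric(cuts), 1)
)

# A steady staircase suggests the scores are easy to split
cuts_plot <- ggplot(cuts_df, aes(x = bucket, y = cut, label = cut)) +
  geom_col() +
  geom_text(vjust = -0.5, size = 3) +
  theme_minimal() +
  labs(title = "Score cuts by quantile",
       x = "Bucket", y = "Upper score cut (x100)")
```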

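    mplot_cuts <- function(score, splits = 10) {  # argument list reconstructed
      if (splits > 25) {
        stop("You should try with less splits!")
      }
      # ... (the plotting body of this function is not preserved in this copy)
    }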

    Gives this plot:

    Split and compare quantiles

    This plot is the easiest one to sell to the C-level guys. “Did you know that with this model, if we had chopped the worst 20% of leads, we would have avoided 60% of the frauds and lost only 8% of our sales?” That's what this plot will give you:

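    library(dplyr)
    library(ggplot2)

    # Reconstructed from a partially garbled listing: the argument list, the
    # `df`, `npersplit` and `names` assignments and the final subtitle check
    # are best guesses based on the surviving fragments.
    mplot_splits <- function(tag, score, splits = 5, facet = NA, subtitle = NA) {
      if (splits > 10) {
        stop("You should try with less splits!")
      }
      df <- data.frame(tag, score, facet)
      npersplit <- round(nrow(df) / splits)
      # Score range covered by each quantile, used as the legend label
      names <- df %>%
        mutate(quantile = ntile(score, splits)) %>%
        group_by(quantile) %>%
        summarise(n = n(),
                  max_score = round(100 * max(score), 1),
                  min_score = round(100 * min(score), 1)) %>%
        mutate(quantile_tag = paste0(quantile, " (", min_score, "-", max_score, ")"))
      # Percentage of each tag that falls into each score quantile
      p <- df %>%
        mutate(quantile = ntile(score, splits)) %>%
        group_by(quantile, facet, tag) %>% tally() %>%
        ungroup() %>% group_by(facet, tag) %>%
        arrange(desc(quantile)) %>%
        mutate(p = round(100 * n / sum(n), 2),
               cum = cumsum(100 * n / sum(n))) %>%
        left_join(names, by = c("quantile")) %>%
        ggplot(aes(x = as.character(tag), y = p, label = as.character(p),
                   fill = as.character(quantile_tag))) + theme_minimal() +
        geom_col(position = "stack") +
        geom_text(size = 3, position = position_stack(vjust = 0.5), check_overlap = TRUE) +
        xlab("Tag") + ylab("Total Percentage by Tag") +
        guides(fill = guide_legend(title = paste0("~", npersplit, " p/split"))) +
        labs(title = "Tag vs Score Splits Comparison") +
        scale_fill_brewer(palette = "Spectral")
      if (!is.na(subtitle)) {
        p <- p + labs(subtitle = subtitle)
      }
      p
    }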

    Gives this plot:
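The percentages in a pitch like the one quoted above can also be computed directly from the tag and the score. The following base R sketch does that on simulated data, assuming that tag 1 marks a fraud, tag 0 marks a good sale, and a higher score means a worse lead.

```r
# Simulated leads: frauds (tag 1) tend to get higher scores
set.seed(42)
tag <- rep(c(0, 1), each = 500)
score <- c(rbeta(500, 2, 5), rbeta(500, 5, 2))

# Drop the 20% of leads with the highest (worst) scores
cutoff <- quantile(score, 0.80)
dropped <- score > cutoff

# Share of frauds and of good sales that fall in the dropped group
frauds_avoided <- 100 * sum(tag == 1 & dropped) / sum(tag == 1)
sales_lost <- 100 * sum(tag == 0 & dropped) / sum(tag == 0)

cat(sprintf("Dropping the worst 20%% avoids %.1f%% of frauds and loses %.1f%% of sales\n",
            frauds_avoided, sales_lost))
```

The stacked bars in the plot carry the same information for every quantile at once, instead of a single cutoff.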

    Finally, let's plot our results

    Once we have defined the functions above, we can create a new one that brings everything together into one single plot. If you pay attention to the variables needed to create this dashboard, you will notice it actually only needs two: the label or tag, and the score. You can customize the splits for the upper right plot, set a subtitle, define the model's name, save it in a new folder, and change the image's name.
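As a rough sketch of how several panels can be combined into one figure from just the tag and the score, the example below uses `arrangeGrob()` from the gridExtra package. The helper `dashboard_sketch()` is hypothetical and only arranges two simplified panels; it is not the dashboard function described in this post.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical helper: arranges two panels built from tag and score
dashboard_sketch <- function(tag, score, splits = 10, model_name = "Model") {
  df <- data.frame(tag = as.character(tag), score = score)

  p_density <- ggplot(df, aes(x = score, fill = tag)) +
    geom_density(alpha = 0.6) +
    theme_minimal() +
    labs(title = "Score distribution by tag",
         x = "Score", y = "Density", fill = "Tag")

  cuts <- quantile(score, probs = seq(1 / splits, 1, length.out = splits))
  cuts_df <- data.frame(bucket = factor(seq_len(splits)), cut = as.numeric(cuts))
  p_cuts <- ggplot(cuts_df, aes(x = bucket, y = cut)) +
    geom_col() +
    theme_minimal() +
    labs(title = "Score cuts by quantile", x = "Bucket", y = "Upper score cut")

  # arrangeGrob() builds the combined layout without drawing it
  arrangeGrob(p_density, p_cuts, ncol = 2, top = model_name)
}

set.seed(42)
tag <- rep(c(0, 1), each = 500)
score <- c(rbeta(500, 2, 5), rbeta(500, 5, 2))
dash <- dashboard_sketch(tag, score, splits = 10, model_name = "Toy model")
```

`grid::grid.draw(dash)` renders the figure, and `ggsave("dashboard.png", dash)` would write it to disk.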


    That's it. This dashboard will give us almost everything we need to visually evaluate our model's performance on the test set.

    One bonus tip for these plots: you can set the subtitle and subdirectory before you plot everything so you don't have to change them whenever you are trying a new model.


    Bonus: Variables Importance

    If you are working with an ML algorithm that lets you see the importance of each variable, you can use the following function to see the results:
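A minimal sketch of such a plot with ggplot2 is shown below. The `importance` data frame, its column names and its values are invented for the example; in practice they would come from whatever your model reports.

```r
library(ggplot2)

# Hypothetical importance table; real values would come from the model object
importance <- data.frame(
  variable = c("amount", "country", "device", "hour", "age"),
  importance = c(0.42, 0.25, 0.15, 0.11, 0.07)
)

# Horizontal bars sorted so the most important variable ends up on top
importance_plot <- ggplot(importance,
                          aes(x = reorder(variable, importance), y = importance)) +
  geom_col() +
  coord_flip() +
  theme_minimal() +
  labs(title = "Variable importance", x = NULL, y = "Relative importance")
```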


    Gives this plot:

    I hope you guys enjoyed this post; any further comments or suggestions are more than welcome. Not a programmer here, but I surely enjoy sharing my code and ideas! Feel free to connect with me on LinkedIn and/or write below in the comments.
