DART: Dropout Regulation in Boosting Ensembles

By statcompute

(This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers)

The dropout approach developed by Hinton has been widely employed in the context of deep learnings to prevent the deep neural network from over-fitting, as shown in https://statcompute.wordpress.com/2017/01/02/dropout-regularization-in-deep-neural-networks.

In the paper http://proceedings.mlr.press/v38/korlakaivinayak15.pdf, the dropout is also proposed to address the over-fitting in tree boosting ensembles, e.g. MART, caused by the so-called “over-specialization”. In particular, while first few trees added at the beginning of ensembles would dominate the model performance, the rest trees added later can only contribute to learning from a small subset, which increases the risk of over-fittings. The idea of DART is to build an ensemble by randomly dropping boosting tree members. The percentage of dropouts would determine the degree of regularization for tree ensembles.

Below is a demonstration showing the implementation of DART in the R xgboost package. First of all, after importing the data, we divided it into two pieces, one for training and the other for testing.

pkgs 

For the comparison purpose, we first developed a tree boosting ensemble without dropouts, as shown below. For the simplicity, all parameters were chosen heuristically. The max_depth is set to 3 due to the fact that the boosting tends to work well with weak learners. While the ROC for the training set can be as high as 0.95, the ROC for the testing set is only 0.60 in our case, implying the over-fitting issue.

mart.parm 

With the same set of parameters, we refitted the ensemble with dropouts, e.g. DART. As shown below, by dropping 10% tree members, the ROC for the testing set can be improved from 0.60 to 0.65. In addition, the performance disparity between training and testing sets with DART decreases significantly.

dart.parm 

Besides rate_drop = 0.1, a wide range of dropout rates have also been tested. In most cases, DART outperforms its counterpart without the dropout regulation.

To leave a comment for the author, please follow the link and comment on their blog: S+/R – Yet Another Blog in Statistical Computing.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

Source:: R News

Leave a Reply

Your email address will not be published. Required fields are marked *

Time limit is exhausted. Please reload CAPTCHA.