XGBoost: A Scalable Tree Boosting System - Explained

1. XGBoost: A Scalable Tree Boosting System
Simon Lia-Jonassen

2. Motivation
- Used by the majority of winning solutions on Kaggle; the 2nd most popular method after DNNs.
- Also used by the 10 best teams in KDDCup’15.
- Applies to classification, regression and learning-to-rank tasks.
- Usually outperforms alternatives in an out-of-the-box setting.
- Combines a good theoretical foundation with a highly efficient implementation.
So, how does it work?

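3. Decision Tree Boosting
The model predicts with a sum of trees: $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$, where
- $K$ is the number of trees,
- $f_k$ is a tree function, which maps an instance to one of a set of leaf weights,
- $x_i$ are the instance features.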
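The additive prediction on this slide can be sketched in a few lines of Python. This is a minimal illustration, not XGBoost's internal data structure: the `Node` class, the "go left when the feature is below the threshold" rule, and the function names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """Hypothetical binary tree node; leaves carry a weight."""
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    weight: float = 0.0  # only meaningful at a leaf

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def tree_predict(root: Node, x: Sequence[float]) -> float:
    """Route x down the tree and return the weight of the leaf it lands in."""
    node = root
    while not node.is_leaf():
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.weight


def ensemble_predict(trees: List[Node], x: Sequence[float]) -> float:
    """Sum of the K tree outputs for one instance."""
    return sum(tree_predict(t, x) for t in trees)


# Two depth-1 trees (stumps) on a two-feature instance.
trees = [
    Node(0, 1.0, Node(weight=-0.5), Node(weight=0.5)),
    Node(1, 2.0, Node(weight=0.1), Node(weight=0.3)),
]
prediction = ensemble_predict(trees, [0.0, 3.0])  # -0.5 from tree 1, 0.3 from tree 2
```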

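4. Regularized Learning Objective
$\mathcal{L} = \sum_i l(y_i, \hat{y}_i) + \sum_k \Omega(f_k)$, with $\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$, where
- $l(y_i, \hat{y}_i)$ is the prediction loss,
- $\Omega(f_k)$ is the complexity penalty,
- $T$ is the number of leaves,
- $\frac{1}{2} \lambda \lVert w \rVert^2$ is the L2 regularization on the leaf weights.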
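As a rough sketch of how the two parts of this objective combine, the snippet below evaluates a prediction loss plus a per-tree complexity penalty. The squared-error loss and all function names are assumptions chosen for illustration; the slide itself does not fix a particular loss.

```python
from typing import Callable, List, Sequence


def omega(leaf_weights: Sequence[float], gamma: float, lam: float) -> float:
    """Complexity penalty of one tree: gamma * (number of leaves) + 0.5 * lam * ||w||^2."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def squared_error(y_true: float, y_pred: float) -> float:
    """Example loss (an assumption); any differentiable convex loss could be used."""
    return (y_pred - y_true) ** 2


def objective(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    leaf_weights_per_tree: List[Sequence[float]],
    gamma: float,
    lam: float,
    loss: Callable[[float, float], float] = squared_error,
) -> float:
    """Prediction loss summed over instances plus penalty summed over trees."""
    data_term = sum(loss(y, p) for y, p in zip(y_true, y_pred))
    penalty = sum(omega(w, gamma, lam) for w in leaf_weights_per_tree)
    return data_term + penalty


# One tree with two leaves: the penalty is 0.5*2 + 0.5*2*(1 + 4) = 6, the loss is 1.
value = objective([0.0, 2.0], [1.0, 2.0], [[1.0, -2.0]], gamma=0.5, lam=2.0)
```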

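5. Regularized Learning Objective
By additive definition, $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$, so the objective at round $t$ and its second-order approximation are:
$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t) \approx \sum_{i=1}^{n} [ l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) ] + \Omega(f_t)$
Where:
- $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)})$ is the first order gradient of the loss function,
- $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)})$ is the second order gradient of the loss function.
However, for example: [formula not preserved in the transcript]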
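To make the first and second order gradients concrete, here is a small sketch for two common losses. These are illustrations chosen for this note and not necessarily the example shown on the original slide; the function names are hypothetical.

```python
import math
from typing import Tuple


def squared_error_grad_hess(y_true: float, y_pred: float) -> Tuple[float, float]:
    """For l = (y_pred - y_true)^2: gradient 2*(y_pred - y_true), second derivative 2."""
    return 2.0 * (y_pred - y_true), 2.0


def logistic_grad_hess(y_true: float, margin: float) -> Tuple[float, float]:
    """For the log loss on a raw margin, with p = sigmoid(margin):
    gradient p - y_true, second derivative p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-margin))
    return p - y_true, p * (1.0 - p)


g_sq, h_sq = squared_error_grad_hess(y_true=1.0, y_pred=3.0)
g_log, h_log = logistic_grad_hess(y_true=1.0, margin=0.0)
```

Both derivatives are taken with respect to the previous round's prediction, so they can be computed once per instance before the next tree is built.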

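6. Regularized Learning Objective
By expansion, dropping the constant term and grouping instances by the leaf they fall into ($g_i$ and $h_i$ are the first and second order gradients of the loss, $I_j$ is the set of instances in leaf $j$, and $w_j$ is its weight):
$\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} [ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) ] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 = \sum_{j=1}^{T} [ (\sum_{i \in I_j} g_i) w_j + \frac{1}{2} (\sum_{i \in I_j} h_i + \lambda) w_j^2 ] + \gamma T$
- $\sum_{i=1}^{n}$ runs over each instance,
- $\sum_{j=1}^{T}$ runs over each leaf,
- $\sum_{i \in I_j}$ runs over each instance in the leaf.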
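The regrouping from a per-instance sum to a per-leaf sum can be checked numerically. The sketch below evaluates the approximate objective both ways and should give the same value; the `leaf_of` list (the leaf index of each instance) and the function names are assumptions made for the example.

```python
from typing import Sequence


def objective_by_instance(g: Sequence[float], h: Sequence[float], leaf_of: Sequence[int],
                          w: Sequence[float], gamma: float, lam: float) -> float:
    """Sum over each instance, then add the penalty gamma*T + 0.5*lam*sum(w_j^2)."""
    data_term = sum(
        g[i] * w[leaf_of[i]] + 0.5 * h[i] * w[leaf_of[i]] ** 2 for i in range(len(g))
    )
    return data_term + gamma * len(w) + 0.5 * lam * sum(wj * wj for wj in w)


def objective_by_leaf(g: Sequence[float], h: Sequence[float], leaf_of: Sequence[int],
                      w: Sequence[float], gamma: float, lam: float) -> float:
    """Sum over each leaf, using per-leaf gradient sums G_j and H_j."""
    G = [0.0] * len(w)
    H = [0.0] * len(w)
    for gi, hi, j in zip(g, h, leaf_of):  # each instance in the leaf contributes to G_j, H_j
        G[j] += gi
        H[j] += hi
    return sum(G[j] * w[j] + 0.5 * (H[j] + lam) * w[j] ** 2 for j in range(len(w))) + gamma * len(w)


g = [1.0, -2.0, 0.5]
h = [1.0, 1.0, 2.0]
leaf_of = [0, 1, 0]   # instances 0 and 2 fall into leaf 0, instance 1 into leaf 1
w = [0.3, -0.4]
a = objective_by_instance(g, h, leaf_of, w, gamma=0.1, lam=1.0)
b = objective_by_leaf(g, h, leaf_of, w, gamma=0.1, lam=1.0)
```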

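7. Regularized Learning Objective
Optimal leaf weight for a fixed structure: $w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$
By substitution: $\tilde{\mathcal{L}}^{(t)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{(\sum_{i \in I_j} g_i)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$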
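A short sketch of the optimal leaf weight and the resulting structure score, given per-leaf gradient sums. Here `G[j]` and `H[j]` stand for the sums of first and second order gradients in leaf `j`; these names and the helper functions are assumptions for illustration.

```python
from typing import List, Sequence


def optimal_leaf_weights(G: Sequence[float], H: Sequence[float], lam: float) -> List[float]:
    """w_j* = -G_j / (H_j + lam) for a fixed tree structure."""
    return [-Gj / (Hj + lam) for Gj, Hj in zip(G, H)]


def structure_score(G: Sequence[float], H: Sequence[float], gamma: float, lam: float) -> float:
    """Objective value after substituting the optimal weights (lower is better)."""
    return -0.5 * sum(Gj * Gj / (Hj + lam) for Gj, Hj in zip(G, H)) + gamma * len(G)


G = [1.5, -2.0]
H = [3.0, 1.0]
weights = optimal_leaf_weights(G, H, lam=1.0)        # [-0.375, 1.0]
score = structure_score(G, H, gamma=0.1, lam=1.0)    # -1.08125

# Cross-check: plugging the optimal weights into the per-leaf objective gives the same score.
plugged = sum(G[j] * weights[j] + 0.5 * (H[j] + 1.0) * weights[j] ** 2 for j in range(2)) + 0.1 * 2
```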

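8. Gradient Tree Boosting
The gain of splitting a node with instance set $I$ into $I_L$ and $I_R$:
$\mathcal{L}_{split} = \frac{1}{2} [ \frac{(\sum_{i \in I_L} g_i)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{(\sum_{i \in I_R} g_i)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{(\sum_{i \in I} g_i)^2}{\sum_{i \in I} h_i + \lambda} ] - \gamma$
- the $I$ term is the score before we split,
- the $I_L$ term is the score of the left split,
- the $I_R$ term is the score of the right split,
- $\gamma$ is the split penalty.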
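The split gain can be written as a small function of the left and right gradient sums. This is a sketch under the assumption that `G_L`, `H_L`, `G_R`, `H_R` are the summed first and second order gradients of the two children; the function name is hypothetical.

```python
def split_gain(G_L: float, H_L: float, G_R: float, H_R: float,
               gamma: float, lam: float) -> float:
    """0.5 * (left score + right score - score before the split) - split penalty."""
    def score(G: float, H: float) -> float:
        return G * G / (H + lam)

    before = score(G_L + G_R, H_L + H_R)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - before) - gamma


# Gradients of opposite sign on the two sides make the split worthwhile.
gain = split_gain(G_L=-2.0, H_L=2.0, G_R=2.0, H_R=2.0, gamma=0.5, lam=0.0)
```

A split whose gain is not positive does not pay for its penalty, which is how the penalty term acts as a pruning threshold.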

9. Gradient Tree Boosting
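The split gain from the previous slide is typically evaluated in a scan over sorted feature values, in the spirit of the exact greedy algorithm described in the paper. The sketch below handles a single feature; the function name, the midpoint threshold, and the tie handling are assumptions made for this illustration rather than a description of XGBoost's implementation.

```python
from typing import Optional, Sequence, Tuple


def best_split(x: Sequence[float], g: Sequence[float], h: Sequence[float],
               gamma: float, lam: float) -> Tuple[float, Optional[float]]:
    """Return (best gain, threshold) for one feature, or (0.0, None) if no split helps."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    before = G * G / (H + lam)           # score before we split
    G_L = H_L = 0.0
    best_gain, best_threshold = 0.0, None
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        G_L += g[i]                       # move instance i to the left side
        H_L += h[i]
        if x[i] == x[nxt]:
            continue                      # cannot split between equal values
        G_R, H_R = G - G_L, H - H_L
        gain = 0.5 * (G_L ** 2 / (H_L + lam) + G_R ** 2 / (H_R + lam) - before) - gamma
        if gain > best_gain:
            best_gain, best_threshold = gain, 0.5 * (x[i] + x[nxt])
    return best_gain, best_threshold


result = best_split(
    x=[1.0, 2.0, 3.0, 4.0],
    g=[-1.0, -1.0, 1.0, 1.0],
    h=[1.0, 1.0, 1.0, 1.0],
    gamma=0.0,
    lam=0.0,
)
```

In this toy input the gradients change sign between 2.0 and 3.0, so the scan picks the threshold 2.5.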

10. Optimizations
- Shrinkage: more trees
- Column subsampling: prevents over-fitting
- Approximate split finding: faster AUC convergence
- Sparsity-aware split finding: visit only non-missing values
- Cache-aware parallel column block access: fewer misses on large datasets
- Block compression and sharding: faster I/O for out-of-core computation
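Of the optimizations listed, sparsity-aware split finding is the easiest to sketch: only non-missing values are visited, and missing values are sent as a group to whichever side gives the higher gain. The code below is a simplified single-feature illustration of that idea; using `None` for a missing value, the midpoint threshold, and the function name are assumptions for the example.

```python
from typing import Optional, Sequence, Tuple


def sparse_best_split(x: Sequence[Optional[float]], g: Sequence[float], h: Sequence[float],
                      gamma: float, lam: float) -> Tuple[float, Optional[float], Optional[str]]:
    """Return (gain, threshold, default direction for missing values)."""
    G, H = sum(g), sum(h)                 # totals include instances with missing values
    before = G * G / (H + lam)
    present = sorted((i for i in range(len(x)) if x[i] is not None), key=lambda i: x[i])
    best = (0.0, None, None)

    def gain(G_L: float, H_L: float) -> float:
        G_R, H_R = G - G_L, H - H_L
        return 0.5 * (G_L ** 2 / (H_L + lam) + G_R ** 2 / (H_R + lam) - before) - gamma

    # Pass 1: missing values go right. Accumulate left statistics in ascending order.
    G_L = H_L = 0.0
    for pos in range(len(present) - 1):
        i, nxt = present[pos], present[pos + 1]
        G_L += g[i]
        H_L += h[i]
        if x[i] == x[nxt]:
            continue
        cand = gain(G_L, H_L)
        if cand > best[0]:
            best = (cand, 0.5 * (x[i] + x[nxt]), "right")

    # Pass 2: missing values go left. Accumulate right statistics in descending order.
    G_R = H_R = 0.0
    for pos in range(len(present) - 1, 0, -1):
        i, prv = present[pos], present[pos - 1]
        G_R += g[i]
        H_R += h[i]
        if x[i] == x[prv]:
            continue
        cand = gain(G - G_R, H - H_R)
        if cand > best[0]:
            best = (cand, 0.5 * (x[i] + x[prv]), "left")
    return best


sparse_result = sparse_best_split(
    x=[1.0, 2.0, None, None],
    g=[-1.0, 1.0, 1.0, 1.0],
    h=[1.0, 1.0, 1.0, 1.0],
    gamma=0.0,
    lam=0.0,
)
```

Both passes loop only over the non-missing entries, so the cost scales with the number of present values rather than with the number of instances.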

15. Optimizations

16. Further reading
- The paper: https://arxiv.org/pdf/1603.02754.pdf
- XGBoost tutorial: http://xgboost.readthedocs.io/en/latest/model.html
- A great deck of slides: https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
- A simple usage example: https://www.kaggle.com/kevalm/xgboost-implementation-on-iris-dataset-python
- DataCamp mini-course: https://campus.datacamp.com/courses/extreme-gradient-boosting-with-xgboost
