How to Effectively Combine
Numerical Features and Categorical Features
June 14th, 2017
Liangjie Hong
Head of Data Science, Etsy Inc.
• Head of Data Science
- Etsy Inc. in NYC, NY (2016 – Present)
- Search & Discovery; Personalization and Recommendation; Computational Advertising
• Senior Manager of Research
- Yahoo Research in Sunnyvale, CA (2013 – 2016)
Leading science efforts for personalization and search
• Published papers in SIGIR, WWW, KDD, CIKM, AAAI, WSDM, RecSys and ICML
• 3 Best Paper Awards, 2000+ Citations with H-Index 18
• Program committee member for KDD, WWW, SIGIR, WSDM, AAAI, EMNLP, ICWSM, ACL, CIKM,
and IJCAI; reviewer for various journals
Liangjie Hong
About This Paper
• Authors
Qian Zhao, PhD Student from University of Minnesota
Yue Shi, Research Scientist at Facebook
Liangjie Hong, Head of Data Science at Etsy Inc.
• Paper Venue
Full Research Paper in The 26th International World Wide Web Conference, 2017 (WWW 2017)
High-Level Takeaways
• A new family of models to handle categorical features and numerical features well by combining
embedding models and tree-based models
• A simple learning algorithm that can be easily built on top of existing data mining and machine learning
toolkits
• State-of-the-art performance on major datasets
Why we need GB-CENT
Why we need GB-CENT
Motivations
• Real-World Data
Categorical features: user ids, item ids, words, document ids, ...
Numerical features: dwell time, average purchase prices, click-through-rate,...
• Ideas
Converting categorical features into numerical ones (e.g., statistics, embedding methods, topic models...)
Converting numerical features into categorical ones (e.g., bucketizing, binary codes, sigmoid transformation...); both directions are sketched below
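To make the two conversion directions concrete, here is a minimal pandas/numpy sketch on a toy DataFrame with one categorical and one numerical column; all column names and values are illustrative, not from the paper.

```python
import numpy as np
import pandas as pd

# Toy data: one categorical feature and one numerical feature (illustrative only).
df = pd.DataFrame({
    "item_id": ["a", "b", "a", "c"],
    "dwell_time": [1.2, 30.5, 4.7, 120.0],
})

# Categorical -> numerical: replace each category with a statistic of a
# numerical column (here, the mean dwell time per item).
df["item_mean_dwell"] = df.groupby("item_id")["dwell_time"].transform("mean")

# Numerical -> categorical: bucketize into equal-frequency bins,
# or squash through a sigmoid.
df["dwell_bucket"] = pd.qcut(df["dwell_time"], q=2, labels=False)
df["dwell_sigmoid"] = 1.0 / (1.0 + np.exp(-df["dwell_time"]))

print(df)
```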
Why we need GB-CENT
Motivations
Two Families of Powerful Practical Data Mining and Machine Learning Tools
• Tree-based Models
Decision Trees, Random Forest, Gradient Boosted Decision Trees…
• Matrix-based Embedding Models
Matrix Factorization, Factorization Machines…
Why we need GB-CENT: Tree-based Models
• Pros:
Interpretability for simple trees
Effectiveness in certain tasks: IR ranking models
Simple and easy to train
Handle numerical features well
…
• Cons:
Need one-hot-encoding to handle categorical features and therefore
cannot easily handle features with large cardinality*
For complex trees, features might appear multiple times in a tree – hard
to explain
…
* Tianqi Chen and Carlos Guestrin. XGBoost: A Scalable Tree Boosting System. KDD '16.
Why we need GB-CENT: Embedding-based Models
• Pros:
Predictive power
Effectiveness in certain tasks: recommender systems
Handle categorical features well through one-hot-encoding
…
• Cons:
Numerical features usually need preprocessing and are hard to handle directly
Hard to interpret in general
…
Why we need GB-CENT
Tree-based models are good at numerical features.
Embedding models are good at categorical features.
Why not combine the two?
What is GB-CENT
What is GB-CENT
In a nutshell, GB-CENT is Gradient Boosted Categorical Embedding and Numerical Trees, which combines
• Factorization Machines (a matrix-based embedding model)
Handle large-cardinality categorical features…
• Gradient Boosted Decision Trees (a tree-based model)
Handle numerical features…
What is GB-CENT
CAT-E (Factorization Machines)
• Bias term for each categorical feature
• Embedding for each categorical feature
• Interactions between meaningful categorical groups
e.g., users, items, age groups, gender...
No numerical features
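To make the CAT-E component concrete, here is a minimal numpy sketch of an FM-style scorer over categorical features only: one bias and one embedding per feature, plus dot products between the embeddings of active features. The paper restricts interactions to meaningful categorical groups; this sketch simplifies to all pairs, and all sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 1000   # total cardinality across all categorical fields (illustrative)
dim = 20            # latent dimensionality, as in the experiments

bias = rng.normal(0, 0.01, size=n_features)          # one bias per categorical feature
embed = rng.normal(0, 0.01, size=(n_features, dim))  # one embedding per categorical feature

def cat_e_score(active):
    """FM-style CAT-E score for one instance.

    `active` holds the indices of the categorical features that are "on"
    (e.g., [user_id_index, item_id_index, genre_index]). The score is the
    sum of their biases plus pairwise dot products of their embeddings.
    The paper limits interactions to meaningful groups; we use all pairs.
    """
    s = bias[active].sum()
    for i in range(len(active)):
        for j in range(i + 1, len(active)):
            s += embed[active[i]] @ embed[active[j]]
    return s

print(cat_e_score([3, 42, 777]))
```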
What is GB-CENT
CAT-NT (Gradient Boosted Decision Trees)
• One tree per categorical feature (potentially)
• For each tree, the training data consists of all instances that contain this particular categorical
feature, with their numerical features as input.
No categorical features
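A minimal sketch of the CAT-NT idea with scikit-learn: one shallow regression tree per categorical feature, fit only on that feature's supporting instances, with numerical features as the only input. The residual target and the data here are toy placeholders, not the paper's exact procedure.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy data: each instance has one categorical feature id and two numerical features.
cat_ids = rng.integers(0, 5, size=500)   # e.g., an item_id with 5 values (illustrative)
X_num = rng.normal(size=(500, 2))        # e.g., year, runtime
residual = rng.normal(size=500)          # residual target left over from CAT-E

trees = {}
for c in np.unique(cat_ids):
    support = cat_ids == c               # supporting instances of this feature
    if support.sum() < 50:               # minTreeSupport
        continue
    tree = DecisionTreeRegressor(max_depth=3, min_samples_split=50)  # maxTreeDepth, minNodeSplit
    tree.fit(X_num[support], residual[support])
    trees[c] = tree

def cat_nt_score(c, x_num):
    """Tree output for one instance's categorical feature (a single id here
    for simplicity; in general the outputs of all active features' trees sum)."""
    t = trees.get(c)
    return 0.0 if t is None else float(t.predict(x_num.reshape(1, -1))[0])

print(cat_nt_score(cat_ids[0], X_num[0]))
```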
What is GB-CENT
CAT-E (Factorization Machines)
• generalizes categorical features by embedding them into a low-dimensional space.
CAT-NT (Gradient Boosted Decision Trees)
• memorizes each categorical feature’s peculiarities.
Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado,
Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. Wide & Deep
Learning for Recommender Systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender
Systems (DLRS 2016). ACM, New York, NY, USA, 7-10.
What is GB-CENT
Different from GBDT:
• The number of trees in GB-CENT depends on the cardinality of categorical features in the data set, while GBDT has a
pre-specified number of trees M.
• Each tree in GB-CENT only takes numerical features as input while GBDT takes in both categorical and numerical
features.
• Learning a tree for GBDT uses all N instances in the data set while the tree for a categorical feature in GB-CENT only
involves its supporting instances.
What is GB-CENT
Training GB-CENT:
• Train the CAT-E part first using Stochastic Gradient Descent (SGD)
• Then train the CAT-NT part:
-- 1) Sort categorical features by their support (the number of data instances containing them)
-- 2) Check whether a feature meets minTreeSupport
-- 3) Fit a tree using maxTreeDepth and minNodeSplit
-- 4) Use minTreeGain to decide whether to keep the tree (see the sketch below)
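Putting the two stages together, a minimal end-to-end training sketch under the slide's hyperparameters. The CAT-E SGD step is elided (a zero predictor stands in for it), and the gain test is a plain squared-error reduction, which is an assumption for illustration rather than the paper's exact criterion.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hyperparameters from the slide.
MIN_TREE_SUPPORT, MIN_TREE_GAIN, MIN_NODE_SPLIT, MAX_TREE_DEPTH = 50, 0.0, 50, 3

# Toy data: one categorical feature per instance (illustrative).
cat_ids = rng.integers(0, 5, size=500)
X_num = rng.normal(size=(500, 2))
y = rng.normal(size=500)

# Stage 1: train the CAT-E part with SGD (elided; a zero predictor stands in).
cat_e_pred = np.zeros(500)
residual = y - cat_e_pred

# Stage 2: train the CAT-NT part.
# 1) sort categorical features by support, most supported first
support_counts = Counter(cat_ids)
trees = {}
for c, support in sorted(support_counts.items(), key=lambda kv: -kv[1]):
    # 2) check minTreeSupport (remaining features have even less support)
    if support < MIN_TREE_SUPPORT:
        break
    mask = cat_ids == c
    # 3) fit a tree with maxTreeDepth and minNodeSplit
    tree = DecisionTreeRegressor(max_depth=MAX_TREE_DEPTH, min_samples_split=MIN_NODE_SPLIT)
    tree.fit(X_num[mask], residual[mask])
    # 4) keep the tree only if it reduces squared error by more than minTreeGain
    before = np.square(residual[mask]).sum()
    after = np.square(residual[mask] - tree.predict(X_num[mask])).sum()
    if before - after > MIN_TREE_GAIN:
        trees[c] = tree
        residual[mask] -= tree.predict(X_num[mask])  # boosting: update residuals

print(f"kept {len(trees)} trees")
```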
How does GB-CENT perform
How does GB-CENT perform
• Datasets
MovieLens
Statistics: 240K users, 33K movies, 22M instances, 5-star ratings
Categorical features: user_id, item_id, genre, language, country, grade
Numerical features: year, runTime, imdbVotes, imdbRating, metaScore
RedHat
Statistics: 151K customers, 7 activity categories, 2M instances, binary response
Categorical features: people_id, activity_category
Numerical features: activity characteristics
How does GB-CENT perform
• Datasets
MovieLens
Evaluation Metric: Root Mean Squared Error (RMSE)
RedHat
Evaluation Metric: Area Under the Curve (AUC)
80% train / 10% validation / 10% test split
We also compare empirical training time.
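For reference, a minimal sketch of the two evaluation metrics with scikit-learn; the arrays are toy placeholders, not results from the paper.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, roc_auc_score

# MovieLens-style task: real-valued ratings -> RMSE
y_true_ratings = np.array([4.0, 3.5, 5.0, 2.0])
y_pred_ratings = np.array([3.8, 3.0, 4.6, 2.5])
rmse = np.sqrt(mean_squared_error(y_true_ratings, y_pred_ratings))

# RedHat-style task: binary response -> AUC
y_true_binary = np.array([1, 0, 1, 1, 0])
y_score = np.array([0.9, 0.2, 0.7, 0.6, 0.4])
auc = roc_auc_score(y_true_binary, y_score)

print(f"RMSE={rmse:.3f}  AUC={auc:.3f}")
```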
How does GB-CENT perform
• Baselines
GB-CENT variants:
1) CAT-E
2) CAT-NT
3) GB-CENT
GBDT variants:
1) GBDT-OH: GBDT + One-hot-encoding for categorical features
2) GBDT-CE: Fit CAT-E first and then feed its output into GBDT
FM variants:
1) FM-S: Transform numerical features by sigmoid and feed into FM
2) FM-D: Transform numerical features by discretizing them and feed into FM
SVDFeature variants:
1) SVDFeature-S: Transform numerical features by sigmoid and feed into SVDFeature
2) SVDFeature-D: Transform numerical features by discretizing them and feed into SVDFeature
Latent dimensionality is 20 for all embedding models. For GB-CENT, minTreeSupport = 50, minTreeGain = 0.0, minNodeSplit = 50 and
maxTreeDepth = 3.
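A minimal sketch of the numerical-feature transforms that the "-S" and "-D" baseline variants rely on. Standardizing before the sigmoid and using equal-frequency buckets are assumptions for illustration; the paper's exact choices may differ.

```python
import numpy as np
import pandas as pd

x = np.array([1.2, 30.5, 4.7, 120.0, 15.0])  # a numerical feature, e.g., dwell time

# "-S" variants: squash each numerical feature into (0, 1) with a sigmoid
# (standardizing first is our assumption), then feed it to FM / SVDFeature.
x_sigmoid = 1.0 / (1.0 + np.exp(-(x - x.mean()) / x.std()))

# "-D" variants: discretize into equal-frequency buckets and treat each
# bucket id as a categorical (one-hot) feature.
x_bucket = pd.qcut(x, q=5, labels=False)

print(x_sigmoid.round(3), x_bucket)
```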
How does GB-CENT perform
[Result tables and charts from the slides are not reproduced in this transcript]
Main takeaway: learn many shallow, small trees
Summary
GB-CENT
• Combines Factorization Machines (handling categorical features) and GBDT (handling numerical features)
• Combines interpretable results with high predictive power
• Achieves high performance on real-world datasets
Questions