Michael Levin - MatrixNet Applications at Yandex
MatrixNet
Michael Levin
Chief Data Scientist
Yandex Data Factory
› Created in 2014
› Machine Learning for other industries
› Computing resources
› Machine Learning infrastructure
› Data scientists
What is MatrixNet?
› Gradient Boosting over Decision Trees
› Classification, Regression, Ranking
› Strong results with default parameters
› Easy to use
› Highly optimized
› Training can be local or parallelized on a cluster
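To make "Gradient Boosting over Decision Trees" concrete, here is a minimal sketch of the general technique, not MatrixNet itself. It assumes a single numeric feature, depth-1 trees (stumps) and the MSE objective; the names fit_stump, fit_gbdt and predict are hypothetical and chosen only for this illustration.

```python
def fit_stump(xs, residuals):
    """Pick the threshold on one numeric feature that minimizes squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]  # (threshold, left_value, right_value)


def fit_gbdt(xs, ys, n_trees=50, learning_rate=0.3):
    """Each new stump is fitted to the residuals y - prediction,
    which are the negative gradient of the squared-error loss."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [p + learning_rate * (left_value if x <= threshold else right_value)
                 for x, p in zip(xs, preds)]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict(model, x):
    total = sum(lv if x <= t else rv for t, lv, rv in model["stumps"])
    return model["base"] + model["learning_rate"] * total


# Toy data (made up for illustration): a step function of one feature.
model = fit_gbdt([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
print(predict(model, 2), predict(model, 5))
```

Every stump only corrects a fraction (the learning rate) of the remaining error, which is why many small trees are added one after another.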
MatrixNet applications at Yandex
› Web search ranking
› Ads click prediction
› External projects of Yandex Data Factory (YDF)
› Recommendations
› Bot detection
› Resolving homonymy
› User segmentation
› …
Some tricks & features
› Oblivious trees
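Regular vs oblivious trees
[Figure: a regular decision tree (splits F1>3, F2>3, F1>6) beside an oblivious tree (F1>3 at the first level, F2>3 at every node of the second level), each shown with its partition of the F1-F2 feature plane.]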
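A short sketch of why oblivious trees are convenient: every node on a level uses the same split, so the leaf is addressed by one bit per level. The splits F1>3 and F2>3 come from the figure; the leaf values and function names are hypothetical, and this is an illustration of the general idea rather than MatrixNet's implementation.

```python
def oblivious_leaf_index(features, splits):
    """splits holds one (feature_index, threshold) pair per tree level.

    Each level contributes one bit, so a tree of depth d has 2**d leaves.
    """
    index = 0
    for feature_index, threshold in splits:
        bit = 1 if features[feature_index] > threshold else 0
        index = (index << 1) | bit
    return index


def oblivious_predict(features, splits, leaf_values):
    return leaf_values[oblivious_leaf_index(features, splits)]


# Level 1 tests F1 > 3, level 2 tests F2 > 3 (features are [F1, F2]).
splits = [(0, 3.0), (1, 3.0)]
leaf_values = [0.1, 0.4, -0.2, 0.7]  # hypothetical values, one per leaf

print(oblivious_predict([5.0, 1.0], splits, leaf_values))  # F1>3, F2<=3: leaf 0b10
```

Evaluating such a tree needs no branching on earlier results: all comparisons can be computed independently and packed into the index.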
Some tricks & features
› Oblivious trees
› Leaf regularization
› Gradual increase of model complexity
› Different objectives: MSE, Log-loss, combinations and non-standard
› Feature binarization
› Estimation of feature importance and correlation
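Feature binarization can be sketched as follows. The slides do not say how MatrixNet chooses its borders, so picking them at quantile positions is an assumption made for this example, and the function names are hypothetical.

```python
def quantile_borders(values, n_bins):
    """Pick up to n_bins - 1 borders at evenly spaced positions in the sorted values.

    Assumption for this sketch: quantile-based borders; duplicates are dropped.
    """
    ordered = sorted(values)
    borders = []
    for k in range(1, n_bins):
        border = ordered[(k * len(ordered)) // n_bins]
        if not borders or border > borders[-1]:
            borders.append(border)
    return borders


def binarize(value, borders):
    """Turn one numeric value into binary features 'value > border'."""
    return [1 if value > border else 0 for border in borders]


borders = quantile_borders([1, 2, 3, 4, 5, 6, 7, 8], n_bins=4)
print(borders, binarize(6, borders))
```

After this step a tree split is simply a choice of one binary feature, which makes candidate splits cheap to enumerate.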
Ranking
› Train on judged (query, document) pairs
› Judgment grades: Excellent, Good, Moderate, Bad, Stupid
› Features: query, document, query-document, URL, host, link, …
› Multiclass classification: objective = cross-entropy
› Regression: Excellent = 1, Good = 0.8, Stupid = 0; objective = MSE
› Ranking: objective = nDCG
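For reference, a small sketch of nDCG over graded judgments. It assumes one common formulation (linear gain, log2 position discount); the slides do not state which variant is used. Only the three gains given on the slide are used in the example, and the function names are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at position i (0-based) is divided by log2(i + 2)."""
    return sum(gain / math.log2(position + 2) for position, gain in enumerate(gains))


def ndcg(gains_in_ranked_order):
    """DCG of the produced ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(gains_in_ranked_order, reverse=True))
    return dcg(gains_in_ranked_order) / ideal if ideal > 0 else 0.0


# Gains taken from the slide: Excellent = 1, Good = 0.8, Stupid = 0.
GAIN = {"Excellent": 1.0, "Good": 0.8, "Stupid": 0.0}

perfect = [GAIN[g] for g in ("Excellent", "Good", "Stupid")]
reversed_order = [GAIN[g] for g in ("Stupid", "Good", "Excellent")]
print(ndcg(perfect), ndcg(reversed_order))
```

The metric depends only on the order of the documents, so small changes in model scores that do not swap any two documents leave it unchanged.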
Ranking
› nDCG is non-smooth, so it gives no useful gradient for gradient boosting
› Approximate it with a smooth ranking objective
› Alternative: pairwise approach
› P(r(d_ij) < r(d_ik)) = σ(M(d_ij) - M(d_ik))
› Maximize likelihood of data given predictions
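The pairwise objective can be written out as a negative log-likelihood with an explicit gradient, which is what gradient boosting needs. This is a sketch under the assumption that each pair lists the index of the document judged better first; the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pairwise_loss_and_grads(scores, pairs):
    """scores[i] is the model output M(d_i); pairs holds (better, worse) index pairs.

    Returns the negative log-likelihood of the observed preferences and
    its derivative with respect to every score.
    """
    loss = 0.0
    grads = [0.0] * len(scores)
    for better, worse in pairs:
        p = sigmoid(scores[better] - scores[worse])  # P(better is ranked above worse)
        loss -= math.log(p)
        grads[better] -= 1.0 - p  # raising the better document's score lowers the loss
        grads[worse] += 1.0 - p   # raising the worse document's score increases it
    return loss, grads


loss, grads = pairwise_loss_and_grads([0.0, 0.0], [(0, 1)])
print(loss, grads)
```

Unlike nDCG, this loss changes smoothly with the scores, so each boosting step has a well-defined direction.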
Ads click prediction
› Search ads
› User enters a query or clicks a link; advertiser enters keywords
› Match query and keywords, then show the best ads
› Which are the best?
› Relevant ads that maximize revenue
› Expected money = P(click) * Bid
› Goodness = P(click) * Bid * Relevance
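A tiny worked example of the Goodness formula. The ads, their numbers and the dictionary keys are made up for illustration; only the formula itself comes from the slide.

```python
def goodness(ad):
    """Goodness = P(click) * Bid * Relevance, as on the slide."""
    return ad["p_click"] * ad["bid"] * ad["relevance"]


def rank_ads(ads):
    """Order candidate ads from best to worst by goodness."""
    return sorted(ads, key=goodness, reverse=True)


# Hypothetical candidates for one query.
ads = [
    {"name": "A", "p_click": 0.10, "bid": 2.0, "relevance": 0.5},
    {"name": "B", "p_click": 0.05, "bid": 5.0, "relevance": 0.9},
    {"name": "C", "p_click": 0.20, "bid": 1.0, "relevance": 0.3},
]
print([ad["name"] for ad in rank_ads(ads)])
```

Note that ad C has the highest click probability but still ranks last, because a low bid and low relevance outweigh it.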
Ads click prediction
› Need to estimate the probability of a click
› Solution: use log-loss
› P(click) = σ(M(ad))
› Maximizes likelihood of data given the predictions
› But if the ranking doesn’t change, there is no point in better predictions
› Don’t waste model resources on approximating probabilities
› Use a combination of classification and ranking objectives
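One way to read "combination of classification and ranking objective" is a weighted sum of log-loss and a pairwise ranking loss. The slides do not give the actual form, so the weighted sum, the weight alpha and all function names below are assumptions for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(scores, clicks):
    """Mean negative log-likelihood of clicks, with P(click) = sigmoid(score)."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for score, clicked in zip(scores, clicks):
        p = min(max(sigmoid(score), eps), 1.0 - eps)
        total -= clicked * math.log(p) + (1 - clicked) * math.log(1.0 - p)
    return total / len(scores)


def pairwise_loss(scores, clicks):
    """Mean -log sigmoid(s_i - s_j) over all (clicked i, not clicked j) pairs."""
    pairs = [(i, j) for i, ci in enumerate(clicks) for j, cj in enumerate(clicks)
             if ci == 1 and cj == 0]
    if not pairs:
        return 0.0
    return sum(-math.log(sigmoid(scores[i] - scores[j])) for i, j in pairs) / len(pairs)


def combined_loss(scores, clicks, alpha=0.5):
    """Assumed combination: alpha weights calibration, 1 - alpha weights ordering."""
    return alpha * log_loss(scores, clicks) + (1.0 - alpha) * pairwise_loss(scores, clicks)


clicks = [1, 0]
# Shifting all scores by a constant keeps the ranking loss but changes log-loss.
print(pairwise_loss([1.0, -1.0], clicks), pairwise_loss([3.0, 1.0], clicks))
print(log_loss([1.0, -1.0], clicks), log_loss([3.0, 1.0], clicks))
```

The two printed pairs illustrate the slide's point: the ranking term ignores changes that leave score differences intact, while log-loss still reacts to them.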
Churn prediction in telecom
› Which telecom users are going to switch after a week?
› A week is needed to prepare a churn prevention campaign
› Compared with the telecom’s in-house model
› Metric: Lift-10%
› Won by 18.7% on cross-validation (CV), 11.5% on test data (churn rate grew 2x)
› Got most of this delta with the first application of MatrixNet
MatrixNet limitations
› Multiple category features
› Sparse features
› Can’t “divide” discretized features
› Continuous dependency
› “Golden feature”
Conclusions
› MatrixNet is GBDT with bells and whistles
› Handles numeric and categorical features
› Almost no tuning is needed
› Training is parallelized and optimized
› Often superior to other available models
› Some careful feature preparation is needed because of limitations
Contacts
mlevin@yandex-team.com
Michael Levin
Chief Data Scientist,
Yandex Data Factory
