Michael Levin - MatrixNet Applications at Yandex

MatrixNet
Michael Levin
Chief Data Scientist

Yandex Data Factory
› Created in 2014
› Machine Learning for other industries
› Computing resources
› Machine Learning infrastructure
› Data scientists

What is MatrixNet?
› Gradient Boosting over Decision Trees
› Classification, Regression, Ranking
› Strong results with default parameters
› Easy to use
› Highly optimized
› Training can be local or parallelized on a cluster
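
As a rough illustration of gradient boosting over decision trees, the sketch below fits depth-1 trees to the residuals of an MSE objective on a single numeric feature. It is not MatrixNet's implementation, which the slides do not detail; the function names, the learning rate and the tree depth are all illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Pick the threshold t (split is x > t) that minimizes squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave one side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = sum((r - left_mean) ** 2 for r in left)
        err += sum((r - right_mean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return t, left_mean, right_mean


def fit_gbdt(xs, ys, n_trees=50, learning_rate=0.1):
    """Boosting loop: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        preds = [
            p + learning_rate * (right_mean if x > t else left_mean)
            for x, p in zip(xs, preds)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + sum(
        learning_rate * (right_mean if x > t else left_mean)
        for t, left_mean, right_mean in stumps
    )


# Hypothetical toy data: the target jumps from 0 to 1 between x = 3 and x = 4.
model = fit_gbdt([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```

Each added tree moves the predictions a fraction of the way towards the targets, which is why the number of trees and the learning rate trade off against each other.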

MatrixNet applications at Yandex
› Web search ranking
› Ads click prediction
› External projects of YDF
› Recommendations
› Bot detection
› Resolving homonymy
› User segmentation
› …

Some tricks & features
› Oblivious trees

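Regular vs oblivious trees
[Figure: two panels, "Decision Tree" and "Oblivious Trees", each with the resulting partition of the F1/F2 plane. Decision tree splits shown: F1>3, F2>3, F1>6. Oblivious tree splits shown: F1>3 at the first level, F2>3 at both nodes of the second level.]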
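
In an oblivious tree every node at a given depth uses the same split, so evaluating the tree amounts to computing one bit per level and using the bits as a leaf index. The sketch below shows this for the two splits named on the slide (F1>3, then F2>3); the function name and the leaf values are hypothetical.

```python
def oblivious_tree_predict(splits, leaf_values, features):
    """Evaluate an oblivious tree.

    splits: list of (feature_index, threshold), one entry per tree level.
    leaf_values: list of 2 ** len(splits) values, indexed by the split bits.
    features: list of feature values for one example.
    """
    index = 0
    for feature_index, threshold in splits:
        bit = 1 if features[feature_index] > threshold else 0
        index = (index << 1) | bit  # the first level becomes the highest bit
    return leaf_values[index]


# Level 1 tests F1 > 3, level 2 tests F2 > 3 (F1 is index 0, F2 is index 1).
splits = [(0, 3.0), (1, 3.0)]
leaf_values = [0.1, 0.4, -0.2, 0.7]  # hypothetical leaf values

prediction = oblivious_tree_predict(splits, leaf_values, [5.0, 1.0])
```

Because there is no branching on the tree structure itself, the loop is the same for every example, which is one reason this layout is cheap to evaluate.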

Some tricks & features
› Oblivious trees
› Leaf regularization
› Gradually increase model complexity
› Different objectives: MSE, Log-loss, combinations and non-standard
› Feature binarization
› Estimates feature importance and correlation
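
The slides do not say how feature binarization is done. One common approach, assumed here purely for illustration, is to pick a few borders per numeric feature from its quantiles and replace the feature with indicators of the form "value > border". The function names and the border-selection rule are hypothetical.

```python
def quantile_borders(values, n_borders):
    """Pick up to n_borders thresholds at evenly spaced positions of the sorted values."""
    ordered = sorted(values)
    borders = []
    for k in range(1, n_borders + 1):
        position = k * len(ordered) // (n_borders + 1)
        borders.append(ordered[position])
    return sorted(set(borders))  # drop duplicate borders from repeated values


def binarize(value, borders):
    """Turn one numeric value into a list of 0/1 indicators, one per border."""
    return [1 if value > border else 0 for border in borders]


borders = quantile_borders([1, 2, 3, 4, 5, 6, 7, 8], n_borders=3)
bits = binarize(6, borders)
```

After this step a tree can only split at the chosen borders, so two values that fall between the same pair of borders are indistinguishable to the model.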

Ranking
› Train based on judged (query, document) pairs
› Judgments: Excellent, Good, Moderate, Bad, Stupid
› Features: query, document, query-document, url, host, link,…
› Multiclass classification: objective = cross-entropy
› Regression: Excellent = 1, Good = 0.8, Stupid = 0, objective = MSE
› Ranking: objective = nDCG
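
For reference, nDCG can be computed as below. The slides do not state which DCG variant is used; this sketch assumes the linear-gain form with a log2 position discount, and it uses only the three numeric gains given on the slide (Excellent, Good, Stupid). The example ranking is hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of gains listed in ranked order (best position first)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains_in_ranked_order):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains_in_ranked_order, reverse=True))
    return dcg(gains_in_ranked_order) / ideal if ideal > 0 else 0.0


# Gains taken from the slide; the other judgment levels are not given there.
GAIN = {"Excellent": 1.0, "Good": 0.8, "Stupid": 0.0}

ranked_labels = ["Good", "Excellent", "Stupid"]  # hypothetical model output order
score = ndcg([GAIN[label] for label in ranked_labels])
```

The score depends on the model only through the order of the documents, so a small change in the model's scores either leaves nDCG unchanged or makes it jump.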

Ranking
› nDCG is non-smooth, so there is no gradient for gradient boosting
› Approximate it with a smooth ranking objective
› Alternative: pairwise approach
› P(r(d_ij) < r(d_ik)) = σ(M(d_ij) - M(d_ik))
› Maximize likelihood of data given predictions
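
The pairwise likelihood on the slide can be turned into a loss and a gradient as sketched below, for a single pair in which the first document is judged better than the second. The function names are illustrative; how pairs are sampled and weighted in MatrixNet is not described in the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pairwise_nll(score_better, score_worse):
    """Negative log-likelihood that the better document is ranked above the worse one."""
    return -math.log(sigmoid(score_better - score_worse))


def pairwise_gradients(score_better, score_worse):
    """Gradient of pairwise_nll with respect to (score_better, score_worse)."""
    p = sigmoid(score_better - score_worse)
    return (p - 1.0, 1.0 - p)


loss_tied = pairwise_nll(0.0, 0.0)
grad_tied = pairwise_gradients(0.0, 0.0)
```

Unlike nDCG, this loss is smooth in the model scores, so it provides the gradients that boosting needs.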

Ads click prediction
› Search ads
› User enters query or clicks a link, advertiser enters keywords
› Match query and keywords, then show the best ads
› Which are the best?
› Relevant ads which maximize revenue
› Expected money = P(click) * Bid
› Goodness = P(click) * Bid * Relevance
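
A minimal sketch of ordering candidate ads by the "Goodness" formula above. The ads, their click probabilities, bids and relevance values are made-up numbers for illustration only.

```python
def goodness(p_click, bid, relevance):
    """Goodness = P(click) * Bid * Relevance, as on the slide."""
    return p_click * bid * relevance


# Hypothetical candidates that matched the query.
ads = [
    {"id": "a", "p_click": 0.05, "bid": 2.0, "relevance": 0.9},
    {"id": "b", "p_click": 0.10, "bid": 0.8, "relevance": 0.5},
    {"id": "c", "p_click": 0.02, "bid": 6.0, "relevance": 0.7},
]

ranked = sorted(
    ads,
    key=lambda ad: goodness(ad["p_click"], ad["bid"], ad["relevance"]),
    reverse=True,
)
ranked_ids = [ad["id"] for ad in ranked]
```

In this toy example the ad with the highest bid does not come first, because its low click probability outweighs the bid.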

Ads click prediction
› Need to estimate probability of click
› Solution: use log-loss
› P(click) = σ(M(ad))
› Maximizes likelihood of data given the predictions
› But if ranking doesn’t change, no point in better predictions
› Don’t waste model resources on approximating probabilities
› Use combination of classification and ranking objective
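
The slides do not specify how the classification and ranking objectives are combined. The sketch below assumes a simple weighted sum of log-loss and a pairwise ranking loss over (clicked, not clicked) pairs; the weight `alpha` and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(score, clicked):
    """Log-loss for one ad, with P(click) = sigmoid(score)."""
    p = sigmoid(score)
    return -math.log(p if clicked else 1.0 - p)


def combined_loss(scores, clicks, alpha=0.5):
    """alpha * mean log-loss + (1 - alpha) * mean pairwise loss (assumed form)."""
    n = len(scores)
    classification = sum(log_loss(s, c) for s, c in zip(scores, clicks)) / n
    # Every (clicked, not clicked) pair should have the clicked ad scored higher.
    pairs = [(i, j) for i in range(n) for j in range(n) if clicks[i] and not clicks[j]]
    ranking = 0.0
    if pairs:
        ranking = sum(
            -math.log(sigmoid(scores[i] - scores[j])) for i, j in pairs
        ) / len(pairs)
    return alpha * classification + (1.0 - alpha) * ranking


good_order = combined_loss([2.0, -2.0], [1, 0])
bad_order = combined_loss([-2.0, 2.0], [1, 0])
```

With `alpha = 1.0` this reduces to plain log-loss, and with `alpha = 0.0` only the relative order of clicked and non-clicked ads matters.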

Churn prediction in telecom
› Which telecom users are going to switch after a week?
› A week is needed to prepare churn prevention campaign
› Compared with telecom’s in-house model
› Metric: Lift-10%
› Won by 18.7% on CV, 11.5% on test data (churn rate grew 2x)
› Got most of this delta with first application of MatrixNet
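
The slides do not define Lift-10%. The sketch below assumes the usual definition: the churn rate among the 10% of users with the highest predicted scores, divided by the overall churn rate. The function name and the toy data are hypothetical.

```python
def lift_at(scores, labels, fraction=0.10):
    """Churn rate in the top `fraction` of users by score, relative to the overall rate.

    labels are 1 for churned users and 0 otherwise; assumes at least one churner.
    """
    n_top = max(1, int(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical toy data: 20 users, 4 churners, the two highest scores are both churners.
scores = [0.9, 0.8] + [0.1] * 18
labels = [1, 1] + [0] * 16 + [1, 1]
lift = lift_at(scores, labels)
```

A lift of 1 means the model's top 10% is no better than a random sample of users, so higher is better.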

MatrixNet limitations
› Multiple category features
› Sparse features
› Can’t “divide” discretized features
› Continuous dependency
› “Golden feature”

Conclusions
› MatrixNet is GBDT with bells and whistles
› Handles numeric and categorical features
› Almost no tuning is needed
› Training is parallelized and optimized
› Often superior to other available models
› Some careful feature preparation is needed because of limitations

Contacts
mlevin@yandex-team.com
Michael Levin
Chief Data Scientist,
Yandex Data Factory
