Iterative Methodology for Personalization Models Optimization
[Figure: architecture diagram, Serving Layer]
● User Profile Lab
● Model Learning Framework
[Figure: diagram with components Listings, Real Listings, Clicks, User Profile, Ad Profile]
Why User Profile Is Important
● Personalization
● Lookalike modeling
● Interest targeting
● And more…
User Profile Data Model
Offline Profile:
DocId | Timestamp | Feature | Confidence
12 | 0 | Sport Cars | 1
42 | -1 | Sport Cars | 1
43 | -21 | World Cup | 1
55 | -21 | World Cup | 1
User Profile Data Model
Offline Profile:
DocId | Timestamp | Feature | Confidence
12 | 0 | Sport Cars | 1
42 | -1 | Sport Cars | 1
43 | -21 | World Cup | 1
55 | -21 | World Cup | 1

Serving Profile:
Category | Confidence
Sports Cars | 2
Soccer | 2
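The step from offline profile to serving profile can be read as a group-by over categories. A minimal sketch, assuming a hypothetical `FEATURE_TO_CATEGORY` mapping (Sport Cars → Sports Cars, World Cup → Soccer) and that a category's confidence is the sum of its documents' confidences; it reproduces the numbers above but is not necessarily the production logic.

```python
from collections import defaultdict

# Offline profile rows from the slide: (doc_id, timestamp, feature, confidence).
OFFLINE_PROFILE = [
    (12, 0, "Sport Cars", 1.0),
    (42, -1, "Sport Cars", 1.0),
    (43, -21, "World Cup", 1.0),
    (55, -21, "World Cup", 1.0),
]

# Hypothetical feature -> serving category mapping (an assumption for illustration).
FEATURE_TO_CATEGORY = {"Sport Cars": "Sports Cars", "World Cup": "Soccer"}


def to_serving_profile(rows, weight=lambda ts: 1.0):
    """Sum per-document confidences into per-category confidences.

    `weight` maps a document timestamp to a multiplier; the default (1.0)
    gives the plain, time-agnostic aggregation.
    """
    profile = defaultdict(float)
    for _doc_id, ts, feature, conf in rows:
        profile[FEATURE_TO_CATEGORY[feature]] += conf * weight(ts)
    return dict(profile)


print(to_serving_profile(OFFLINE_PROFILE))  # {'Sports Cars': 2.0, 'Soccer': 2.0}
```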
User Profile - Boost Recent X2
Offline Profile:
DocId | Timestamp | Feature | Confidence
12 | 0 | Sport Cars | 1
42 | -1 | Sport Cars | 1
43 | -21 | World Cup | 1
55 | -21 | World Cup | 1

Serving Profile:
Category | Confidence
Sports Cars | 2
Soccer | 1
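One reading of "Boost Recent X2" that reproduces these numbers: a recent document counts twice as much as an old one, which here is implemented by halving the weight of older documents. A sketch under stated assumptions: timestamps are days relative to now, and `RECENT_DAYS` is a hypothetical cutoff not given on the slide.

```python
from collections import defaultdict

# Offline profile rows from the slide: (doc_id, timestamp, feature, confidence).
OFFLINE_PROFILE = [
    (12, 0, "Sport Cars", 1.0),
    (42, -1, "Sport Cars", 1.0),
    (43, -21, "World Cup", 1.0),
    (55, -21, "World Cup", 1.0),
]
# Hypothetical feature -> category mapping (assumption).
FEATURE_TO_CATEGORY = {"Sport Cars": "Sports Cars", "World Cup": "Soccer"}
RECENT_DAYS = 7  # hypothetical recency cutoff, in days


def recency_weight(ts, boost=2.0):
    """Recent docs weigh `boost` times more than old docs (old docs get 1 / boost)."""
    return 1.0 if ts >= -RECENT_DAYS else 1.0 / boost


def boosted_serving_profile(rows):
    """Aggregate confidences per category, applying the recency boost."""
    profile = defaultdict(float)
    for _doc_id, ts, feature, conf in rows:
        profile[FEATURE_TO_CATEGORY[feature]] += conf * recency_weight(ts)
    return dict(profile)


print(boosted_serving_profile(OFFLINE_PROFILE))  # {'Sports Cars': 2.0, 'Soccer': 1.0}
```

Multiplying recent documents by 2 instead would give Sports Cars 4 and Soccer 2, the same ratio; halving the old ones just keeps the scale of the unboosted profile.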
Motivation - User Profile Tweaks
● Is this hypothesis true?
● What is the decay scheme?
● Linear in time?
● Exponential in time?
● Potentially many trial & error cycles
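The two candidate decay schemes can be written as weight functions of document age, which makes each trial-and-error cycle a parameter change. A minimal sketch; `horizon_days` and `half_life_days` are hypothetical parameters, and age is assumed to be the negated slide timestamp (e.g. -21 → 21 days old).

```python
def linear_decay(age_days, horizon_days=30.0):
    """Linear in time: weight falls from 1 at age 0 to 0 at `horizon_days` (clipped at 0)."""
    return max(0.0, 1.0 - age_days / horizon_days)


def exponential_decay(age_days, half_life_days=7.0):
    """Exponential in time: weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


# Compare both schemes on the ages that appear in the example profile.
for age in (0, 1, 21):
    print(age, round(linear_decay(age), 3), round(exponential_decay(age), 3))
```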
Profile Lab - Basic Flow
● Static dataset of offline profiles
● Sequence of docids for each user
● Static feature mapping for all docids
● Only a lean algorithm block needs to be implemented
● Transform the offline profile to an online profile
● Apply the algorithm block to generate the online profile
● Generate KPIs
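The basic flow keeps the data fixed and swaps only the algorithm block. A minimal sketch with toy data; `FEATURE_MAP`, `OFFLINE_PROFILES`, `sum_algo`, `run_lab`, and the `avg_profile_size` KPI are all hypothetical names, since the slides do not say which KPIs the lab reports.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, float, int]  # (feature, confidence, timestamp in days relative to now)
Algo = Callable[[List[Event]], Dict[str, float]]

# Static feature mapping: docid -> [(feature, confidence)] (toy data).
FEATURE_MAP: Dict[int, List[Tuple[str, float]]] = {
    12: [("Sport Cars", 1.0)],
    42: [("Sport Cars", 1.0)],
    43: [("World Cup", 1.0)],
    55: [("World Cup", 1.0)],
}

# Static offline profiles: user -> sequence of (docid, timestamp) (toy data).
OFFLINE_PROFILES: Dict[str, List[Tuple[int, int]]] = {
    "user_a": [(12, 0), (42, -1), (43, -21), (55, -21)],
    "user_b": [(43, -2)],
}


def sum_algo(events: List[Event]) -> Dict[str, float]:
    """Baseline algorithm block: sum confidences per feature, ignoring time."""
    profile: Dict[str, float] = defaultdict(float)
    for feature, conf, _ts in events:
        profile[feature] += conf
    return dict(profile)


def run_lab(algo: Algo):
    """Join docids to features, apply the algorithm block per user, compute KPIs."""
    online = {}
    for user, docs in OFFLINE_PROFILES.items():
        events = [
            (feature, conf, ts)
            for doc_id, ts in docs
            for feature, conf in FEATURE_MAP.get(doc_id, [])
        ]
        online[user] = algo(events)
    # Hypothetical KPI: average number of features per online profile.
    kpis = {"avg_profile_size": sum(len(p) for p in online.values()) / len(online)}
    return online, kpis


online, kpis = run_lab(sum_algo)
```

Testing a new hypothesis (for example a decayed variant) means writing another function with the same signature as `sum_algo` and calling `run_lab` again.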
Profile Lab - Cross User KPI
● Run the cross-user test 20K times
● Average and normalize result frequencies
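The slides do not define the cross-user test itself, so the sketch below only shows the repeat-and-normalize harness: run a randomized trial 20K times and turn outcome counts into frequencies. `run_cross_user_test`, `TOP_FEATURE`, and the example trial `same_top_feature` are hypothetical, for illustration only.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable


def run_cross_user_test(
    trial: Callable[[random.Random], Hashable], n_runs: int = 20_000, seed: int = 0
) -> Dict[Hashable, float]:
    """Run `trial` n_runs times and return normalized outcome frequencies."""
    rng = random.Random(seed)  # seeded so repeated lab runs are comparable
    counts = Counter(trial(rng) for _ in range(n_runs))
    return {outcome: count / n_runs for outcome, count in counts.items()}


# Hypothetical trial: do two random users share the same top feature?
TOP_FEATURE = {"u1": "Soccer", "u2": "Soccer", "u3": "Sports Cars"}


def same_top_feature(rng: random.Random) -> bool:
    a, b = rng.sample(sorted(TOP_FEATURE), 2)
    return TOP_FEATURE[a] == TOP_FEATURE[b]


freqs = run_cross_user_test(same_top_feature)
```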
Profile Lab - Advanced Flow (WIP)
Uuid, click, adid, document ids of interactions
11233455, true, 99837377, [11234342, 13424234, 3254534]
56456546, false, 3434888, [11234342, 23432444, 1213333, 23432423]
34564363, true, 11113333, [35245555, 463321111, 19938222]
…
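Records in this layout can be parsed into typed rows before building supervised datasets. A minimal sketch, assuming one record per line exactly as shown (three comma-separated fields, then a bracketed list of document ids); the `Interaction` class and `parse_line` helper are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Interaction:
    uuid: int
    click: bool
    ad_id: int
    doc_ids: List[int]  # documents the user interacted with


def parse_line(line: str) -> Interaction:
    """Parse 'uuid, click, adid, [docid, ...]' into an Interaction."""
    head, _, tail = line.partition("[")
    uuid, click, ad_id = (field.strip() for field in head.strip().rstrip(",").split(","))
    doc_ids = json.loads("[" + tail)  # the bracketed id list is valid JSON
    return Interaction(int(uuid), click.lower() == "true", int(ad_id), doc_ids)


row = parse_line("34564363, true, 11113333,[35245555, 463321111, 19938222]")
```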
Profile Lab - Advanced Flow (WIP)
● Static supervised datasets of offline profiles
● Personalization: label = click
● LAL (lookalike): label = in/out of segment
● A model is built
● Model performance KPIs (AUC, etc.)
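AUC, the KPI named above, is the probability that a randomly chosen positive (a click, or an in-segment user) is scored above a randomly chosen negative. A minimal pairwise sketch for small evaluation sets; in practice `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.

```python
from typing import Sequence


def auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(auc([True, False, True, False], [0.9, 0.8, 0.3, 0.1]))  # 0.75
```

The pairwise loop is quadratic in dataset size; it is written this way only to make the definition explicit.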
 Iterative Methodology for Personalization Models Optimization
Motivation: Named Entities & Wikitags
● Very high cardinality (~300-400K)
● Capture precise user taste
● Big potential for personalization
● Big revenue potential in user segmentation
● Hard to leverage as is
Concept | Confidence
Machine Learning | High
Applied Statistics | High
Student Loans | Medium
Teaching | Medium
Technical Tutorial | High
Recurrent Neural Nets | High
Andrew Ng | High
Motivation: Hard To Leverage
Entity | Gender | Royal | Age | Tech | Rich
Prince Harry | 1 | 0.9 | -0.05 | 0.3 | 0.9
Queen Elizabeth | -1 | 0.99 | 0.9 | -0.8 | 0.9
Apple inc. | 0.1 | -0.03 | 0.5 | 0.9 | 0.9
Machine Learning Students | -0.7 | 0.02 | -0.5 | 0.8 | -0.6
Facebook | 0 | 0.01 | 0.2 | 0.9 | 0.87
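Dense vectors like these make relatedness computable. As a worked example using the slide's illustrative numbers, cosine similarity finds each entity's nearest neighbor; under these values Facebook's nearest neighbor is Apple inc. (cosine ≈ 0.97). The `nearest` helper is hypothetical.

```python
import math

# Illustrative vectors from the slide: (Gender, Royal, Age, Tech, Rich).
VECTORS = {
    "Prince Harry": [1, 0.9, -0.05, 0.3, 0.9],
    "Queen Elizabeth": [-1, 0.99, 0.9, -0.8, 0.9],
    "Apple inc.": [0.1, -0.03, 0.5, 0.9, 0.9],
    "Machine Learning Students": [-0.7, 0.02, -0.5, 0.8, -0.6],
    "Facebook": [0, 0.01, 0.2, 0.9, 0.87],
}


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(name):
    """Entity with the highest cosine similarity to `name`, excluding itself."""
    others = (other for other in VECTORS if other != name)
    return max(others, key=lambda other: cosine(VECTORS[name], VECTORS[other]))


print(nearest("Facebook"))  # Apple inc.
```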
Embeddings: Dense Representation
● Given a high-confidence concept in a doc
● Its context is the other concepts in the doc
● Lots of training data in our DocStore
● Many existing libraries: word2vec, GloVe, StarSpace, etc.
● Good embedding model
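The training signal described above can be generated as (target, context) pairs, which is the form skip-gram style trainers such as word2vec consume. A minimal sketch; the `high_conf` threshold and the numeric confidences in the example are hypothetical (the slides show only High/Medium).

```python
from typing import Dict, List, Tuple


def concept_pairs(doc: Dict[str, float], high_conf: float = 0.8) -> List[Tuple[str, str]]:
    """Pair each high-confidence concept (target) with every other concept in the doc (context)."""
    pairs = []
    for target, conf in doc.items():
        if conf < high_conf:
            continue  # only confident concepts act as targets
        pairs.extend((target, ctx) for ctx in doc if ctx != target)
    return pairs


pairs = concept_pairs({"Machine Learning": 0.9, "Andrew Ng": 0.85, "Teaching": 0.5})
```

Low-confidence concepts such as "Teaching" still appear as context, but never as a target.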
Embeddings Based Models
● Major change in the production model architecture
● High dev costs
● Potential issues with Elasticsearch
Static embedding clusters are a good fallback
Clustering Phase
● Cluster embedding vectors
● Cluster id = doc feature
● Concept vector => cluster id
● Easy integration with current architecture
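The clustering phase turns a dense vector back into a single categorical feature. A minimal sketch, assuming plain k-means (Lloyd's algorithm, written out so the block has no dependencies; `sklearn.cluster.KMeans` is the usual choice) and a hypothetical `wec_<id>` feature naming; the 2-d toy vectors are made up.

```python
import math
import random
from typing import Dict, List, Sequence


def kmeans(points: List[Sequence[float]], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster id per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return assign


# Toy 2-d concept embeddings (hypothetical values).
EMBEDDINGS = {
    "Prince Harry": [0.9, 0.8],
    "Queen Elizabeth": [1.0, 0.9],
    "Apple inc.": [-0.8, 0.1],
    "Facebook": [-0.9, 0.2],
}
concepts = sorted(EMBEDDINGS)
CONCEPT_TO_CLUSTER: Dict[str, int] = dict(
    zip(concepts, kmeans([EMBEDDINGS[c] for c in concepts], k=2))
)


def doc_features(doc_concepts: List[str]) -> List[str]:
    """Replace each concept by its cluster id so it fits the existing feature pipeline."""
    return sorted({f"wec_{CONCEPT_TO_CLUSTER[c]}" for c in doc_concepts if c in CONCEPT_TO_CLUSTER})
```

Because the output is an ordinary categorical feature, the serving side needs no vector support, which is the "easy integration" point above.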
Clustering - Many Hyperparameters
● Train an embedding model: |D| docs, |C| coordinates
● Quick sanity check of the embedding model
● Select the N most frequent concepts
● Apply scikit-learn clustering method A
● Benchmark clusters using common metrics
● Does qualitative cluster analysis look good?
● No: try different |D|, |C|, N, A
● Yes: implement a test and run it in the lab
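The slides do not name the benchmark metrics; the silhouette coefficient is one common choice and is sketched here as an assumption. It compares each point's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b). `sklearn.metrics.silhouette_score` is the library equivalent.

```python
import math
from typing import Hashable, List, Sequence


def silhouette(points: Sequence[Sequence[float]], labels: List[Hashable]) -> float:
    """Mean silhouette coefficient with Euclidean distance (singleton clusters score 0)."""
    n = len(points)
    clusters = set(labels)
    if not 2 <= len(clusters) <= n - 1:
        raise ValueError("silhouette needs 2 <= number of clusters <= n - 1")
    total = 0.0
    for i in range(n):
        same = [math.dist(points[i], points[j]) for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(same) / len(same)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, [0, 0, 1, 1])  # tight, well-separated clusters: close to 1
bad = silhouette(points, [0, 1, 0, 1])   # clusters mixing distant points: negative
```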
Clustering - Many Hyperparameters
● StarSpace embeddings / Word2Vec
● Embedding dim: 50, 100, 300
● Number of clusters: 1000, 2000, 5000, 10000
● Clustering algorithms: k-means, DBSCAN
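Enumerating this grid shows how many train/cluster/benchmark cycles it implies. A minimal sketch with hypothetical names; one design choice worth noting is that DBSCAN does not take a number of clusters (it is driven by `eps` and `min_samples`), so the cluster-count axis applies only to k-means.

```python
from itertools import product

EMBEDDING_MODELS = ["starspace", "word2vec"]
EMBEDDING_DIMS = [50, 100, 300]
NUM_CLUSTERS = [1000, 2000, 5000, 10000]
ALGORITHMS = ["kmeans", "dbscan"]


def configs():
    """Yield every experiment configuration in the hyperparameter grid."""
    for model, dim, algo in product(EMBEDDING_MODELS, EMBEDDING_DIMS, ALGORITHMS):
        if algo == "dbscan":
            # DBSCAN infers the number of clusters itself; no k to sweep.
            yield {"model": model, "dim": dim, "algo": algo, "k": None}
        else:
            for k in NUM_CLUSTERS:
                yield {"model": model, "dim": dim, "algo": algo, "k": k}


all_configs = list(configs())
```

This yields 30 configurations (2 models × 3 dims × (4 k-means settings + 1 DBSCAN)), before DBSCAN's own parameters are tuned.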
Clustering Phase
● Pakistani Cricketers
● Filipino Celebrities
● Norwegian Politics
● Badminton
● Potato Dishes
● Spanish Football
Results - WEC vs WikiTags
First WEC model:
● Word2Vec trained on 3 days of data
● Only features with conf > 0.3
● No long-tail clustering (only concepts with freq > 100)
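The two filters of the first WEC model can be applied before training. A minimal sketch; it assumes frequency is counted as the number of docs a concept appears in after the confidence filter, which the slide does not specify, and `select_concepts` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def select_concepts(docs: List[Dict[str, float]], min_conf: float = 0.3, min_freq: int = 100) -> List[List[str]]:
    """Keep concepts with conf > min_conf, then drop the long tail (doc frequency <= min_freq)."""
    confident = [[c for c, conf in doc.items() if conf > min_conf] for doc in docs]
    freq = Counter(c for doc in confident for c in doc)
    keep = {c for c, n in freq.items() if n > min_freq}
    return [[c for c in doc if c in keep] for doc in confident]


docs = [{"Soccer": 0.9, "Badminton": 0.2}] * 150 + [{"Potato Dishes": 0.8}] * 50
result = select_concepts(docs)
```

Here "Badminton" is dropped by the confidence filter and "Potato Dishes" (50 docs) by the frequency filter.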
Results - WEC vs Categories
Training with Modeling Framework
Thank You

Uploaded by Sonya Liberman
