Learned Embeddings for
Search and Discovery at Instacart
Sharath Rao
Data Scientist / Manager
Search and Discovery
Collaborators: Angadh Singh
Talk Outline
• Quick overview of Search and Discovery at Instacart
• Word2vec for product recommendations
• Extending Word2vec embeddings to improve search ranking
About me
• Worked on several ML products in catalog, search and
personalization at Instacart since 2015
• Currently leading full stack search and discovery team
@sharathrao
Search and Discovery @ Instacart
Help customers find what they are looking for
and discover what they might like
Grocery Shopping in “Low Dimensional Space”
Search
Restock
Discover
Search: “Here is what I want.” The query has easily accessible information content.
Discovery: “Here I am, what next?” The context is the query, and recommendations vary as contexts do.
Entities we try to model
items
products
aisles
departments
retailers
queries
customers
brands
Most of our data products are
about modeling relationships
between them
(part of which is learning embeddings)
Common paradigm for search/discovery problems
1st phase: Candidate Generation
2nd phase: Reranking
• Select top 100s from among potentially millions
• Must be fast and simple
• often not even a learned model
• Recall oriented
Candidate Generation
• Ranks fewer products, but uses richer models/features
• Tuned towards high precision
• Often happens in real-time
Reranking
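To make the two-phase pattern concrete, here is a minimal sketch in Python; the candidate list and the scoring function are hypothetical placeholders, not Instacart's actual candidate generator or ranking model.

```python
from typing import Callable, Iterable, List, Tuple

def rerank(candidates: Iterable[str], score: Callable[[str], float], k: int = 20) -> List[str]:
    """Phase 2: precision-oriented reranking of a small candidate set."""
    scored: List[Tuple[str, float]] = [(item, score(item)) for item in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored[:k]]

# Phase 1 (candidate generation) would cheaply select a few hundred items out of
# millions, e.g. via boolean matching or an ANN lookup; here we just pretend it
# returned this short list and score it with a dummy function.
candidates = ["prod_123", "prod_45", "prod_6789"]
print(rerank(candidates, score=lambda item: float(len(item)), k=2))
```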
Word2vec for Product Recommendations
“Frequently bought with” recommendations
Help customers shop for the next item. Some recommended pairs are probably consumed together; others are not necessarily consumed together.
Quick Tour of Word2vec
(simplified, with a view to developing some intuition)
• We need to represent words as features in a model/task
• Naive representation => one hot encoding
• Problem: Every word is dissimilar to every other word => not intuitive
One-hot encodings are good, but not good enough
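As a tiny illustration of the problem (the toy vocabulary and indices are assumptions, not from the talk), every pair of distinct one-hot vectors is equally dissimilar:

```python
import numpy as np

# Toy vocabulary; the word-to-index mapping is arbitrary.
vocab = {"stadium": 0, "field": 1, "restaurant": 2}

def one_hot(word: str) -> np.ndarray:
    vec = np.zeros(len(vocab))
    vec[vocab[word]] = 1.0
    return vec

# The dot product (and hence cosine similarity) between any two distinct words is 0,
# so one-hot encodings cannot express that "stadium" is closer to "field"
# than to "restaurant".
print(np.dot(one_hot("stadium"), one_hot("field")))       # 0.0
print(np.dot(one_hot("stadium"), one_hot("restaurant")))  # 0.0
```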
What one-hot encodings fail to capture
Data:
Observation 1: “game at the stadium”
Observation 2: “game at the field”
Observation 3: “ate at the restaurant”
Observation 4: “met at the game”
What models must learn:
• stadium, field and restaurant are in some sense similar
• stadium and field are more similar than stadium and restaurant
• game has another meaning/sense that is similar to stadium/field/restaurant
None of these are easy to learn with one-hot encoded representations without supplementing with hand-engineered features
“You shall know a word by the company it keeps” - Firth, J.R., 1957
Core motivation behind semantic word representations
• Learns a feature vector per word
• for 1M vocabulary and 100 dimensional vector, we learn 1M vectors => 100M numbers
• Vectors must be such that words appearing in similar contexts are closer than any random pair of words
• Must work with unlabeled training data
• Must scale to large (unlabeled) datasets
• Embedding space itself must be general enough that word representations are broadly applicable
Word2vec is a scalable way to learn semantic word representations
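A minimal sketch of learning such representations with gensim's Word2Vec (gensim 4.x API assumed; the corpus and hyperparameters are illustrative only):

```python
from gensim.models import Word2Vec

# Tiny stand-in corpus of tokenized sentences; in practice this is a large
# unlabeled text stream.
sentences = [
    ["game", "at", "the", "stadium"],
    ["game", "at", "the", "field"],
    ["ate", "at", "the", "restaurant"],
    ["met", "at", "the", "game"],
]

# Skip-gram, 100-dimensional vectors: one learned vector per vocabulary word.
model = Word2Vec(sentences, vector_size=100, sg=1, window=5, min_count=1, workers=4)

# Words that appear in similar contexts end up closer than random pairs.
print(model.wv.most_similar("stadium", topn=3))
```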
Word2vec beyond text and words
On graphs where random walks are the sequences
On songs where plays are the sequences
Even the emojis weren’t spared!
You shall know a product by what it's purchased with
(adapting Firth: “word” becomes “product”, and “the company it keeps” becomes “what it's purchased with”)
We applied word2vec to purchase sequences
Typical purchasing session contains tens of cart adds
Sequences of products added to cart are the ‘sentences’
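A minimal sketch of the same idea applied to cart-add sequences, with hypothetical product ids standing in for words (gensim 4.x API assumed; hyperparameters are placeholders):

```python
from gensim.models import Word2Vec

# Each shopping session becomes one "sentence" of product ids, in cart-add order.
cart_sequences = [
    ["prod_123", "prod_456", "prod_789"],
    ["prod_456", "prod_222", "prod_123"],
    # ... millions more sessions in practice
]

# A real run would prune rare products with a higher min_count; these settings
# just keep the toy example runnable.
product_model = Word2Vec(cart_sequences, vector_size=100, sg=1, window=5,
                         min_count=1, workers=8)
```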
“Frequently Bought With” Recommendations
Event Data → Extract Training Sequences → Learn word2vec representations → Approximate Nearest Neighbors → Eliminate substitute products → Cache recommendations
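For the approximate nearest neighbors step, one option is an Annoy index over the learned product vectors. This sketch assumes the `product_model` from the previous snippet and is not necessarily the library used at Instacart:

```python
from annoy import AnnoyIndex

dim = product_model.vector_size
index = AnnoyIndex(dim, "angular")  # angular distance ~ cosine similarity

product_ids = product_model.wv.index_to_key
for i, pid in enumerate(product_ids):
    index.add_item(i, product_model.wv[pid])
index.build(50)  # number of trees; more trees = better recall, more memory

# Top-k neighbors are the raw "frequently bought with" candidates; substitutes
# are filtered out downstream and the results cached for serving.
neighbor_idx = index.get_nns_by_item(product_ids.index("prod_123"), 10)
print([product_ids[i] for i in neighbor_idx])
```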
Word2vec for product recommendations
Surfaces
complementary
products
Next step is to make recommendations contextual
• Not ideal if the customer already shopped for sugar recently
• Inappropriate if the customer is allergic to walnuts
• Not ideal if the customer's favorite brand of butter isn't otherwise popular
We see word2vec recommendations as a candidate generation step
Word2vec for Search Ranking
The Search Ranking Problem
In response to a query, rank products in the order that
maximizes the probability of purchase
The Search Ranking Problem
Candidate Generation: Boolean matching, BM25 ranking, synonym/semantic expansion
(Online) Reranking with a Learning to Rank model: matching features, historical aggregates, personalization features
Search Ranking - Training
Event Data → Label Generation with Implicit Feedback → Generate training features → Learning to Rank Training → Model Repository
Search Ranking - Scoring
query → processed query → Top N products (candidate generation) → Online reranker (model loaded from the Model Repository) → Reranked products → Final ranking
Features for Learning to Rank
Historical aggregates => normalized purchase counts over keys such as:
[query, product]
[query, product, user]
[query, brand]
[query, aisle]
[user, brand]
[product]
These keys trade off coverage against precision: broader keys have high coverage but low precision, narrower keys have low coverage but more precision.
But historical aggregates alone are not enough, because of sparsity and cold start.
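A minimal sketch of one such historical aggregate, a normalized purchase count per (query, product) key, computed with pandas from a hypothetical conversion log:

```python
import pandas as pd

# Hypothetical conversion log: one row per purchase attributed to a search.
log = pd.DataFrame({
    "query":   ["bread", "bread", "bread", "milk"],
    "product": ["p1",    "p1",    "p2",    "p3"],
})

agg = log.groupby(["query", "product"]).size().rename("purchases").reset_index()
agg["query_total"] = agg.groupby("query")["purchases"].transform("sum")
agg["normalized_purchase_count"] = agg["purchases"] / agg["query_total"]
print(agg)
```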
Learning word2vec embeddings from search logs
• Representations learnt from Wikipedia or petabyte-scale text corpora aren't ideal for Instacart
• No constraint that word2vec models must be learnt on temporal or spatial sequences
• We constructed training data for word2vec from search logs
Training Contexts from Search Logs
• Create contexts that are observed and desirable
<query1> <purchased description of converted product1>
<query2> <purchased description of converted product2>
..
..
• Learn embeddings for each word in the unified vocabulary of query
tokens and catalog product descriptions
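A minimal sketch of how such contexts might be assembled (the event shape and tokenization here are assumptions):

```python
# Each converted search contributes one training "sentence": the query tokens
# followed by the purchased product's description tokens.
converted_searches = [
    ("bread", "Oroweat Whole Wheat Bread"),
    ("mushrooms", "Organic Cremini Mushrooms"),
]

def to_context(query, product_description):
    return query.lower().split() + product_description.lower().split()

training_contexts = [to_context(q, d) for q, d in converted_searches]
# These contexts are fed to word2vec exactly like text sentences, yielding one
# shared embedding space over query tokens and catalog description tokens.
```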
Examples of nearest neighbors in embedding space
bread: orowheat, bonaparte, zissel, ryebread
raisins: sunmade, monukka, raisinettes, fruitsource
mushrooms: cremini, portabello, shitake
OK, so now we have word representations. Matching features require product features in the same space.
Deriving other features from word embeddings
• One option is to create contexts with product identifiers in the training sequences
• Promising, but that would give us a product cold start problem
Instead, use the learnt word embeddings to derive features for other entities such as products, brand names, queries and users.
Simple averaging of word representations works well
Word: representation learnt from converted search logs
Query: average embeddings of the words in the query
Product: average embeddings of the words in the product description
Brand: average embeddings of the products sold by the brand
User: average embeddings of the products purchased by the user
Aisle/Department: average embeddings of the products in the aisle/department
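A minimal sketch of the averaging step; the toy word vectors stand in for embeddings learnt from converted search logs:

```python
import numpy as np

# Toy 2-dimensional word vectors standing in for the learnt embeddings.
wv = {
    "whole":   np.array([0.1, 0.3]),
    "wheat":   np.array([0.2, 0.1]),
    "bread":   np.array([0.4, 0.0]),
    "oroweat": np.array([0.3, 0.2]),
}

def average_embedding(tokens, vectors, dim=2):
    """Average the vectors of in-vocabulary tokens; zero vector if none match."""
    vecs = [vectors[t] for t in tokens if t in vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

query_vec   = average_embedding("whole wheat bread".split(), wv)
product_vec = average_embedding("oroweat whole wheat bread".split(), wv)
# Brand, user and aisle/department vectors are in turn averages of product
# vectors, e.g. a user's vector averages the vectors of products they purchased.
```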
Wait, so we have 2 different representations for products?
• Embeddings learnt from purchase sequences: products are similar if they are bought together
• Embeddings derived from words in search logs: products are similar if their descriptions are semantically similar
Product representations in different embedding spaces
Embeddings learnt from
purchase sequences
Products are similar if they are bought together
Product representations in different embedding spaces
Embeddings derived from words
from search logs
Products are similar if
their descriptions are
semantically similar
Examples of nearest neighbors for products
Cinnamon Toast Crunch Cereal, Golden Grahams Cereal
Not much in common between the product names
Examples of nearest neighbors for brands
Using word2vec features in search ranking
• Construct similarity scores between different entities as features
Matching Features: [query, product], [query, brand], [query, aisle], [query, department]
Personalization Features: [user, product], [user, brand], [user, aisle], [user, department]
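A minimal sketch of turning entity vectors into these features; cosine similarity and the toy vectors are assumptions, with real entity vectors coming from the averaging described earlier:

```python
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

# Toy entity vectors; in practice these are averaged word2vec embeddings.
query_vec, product_vec = np.array([0.2, 0.4]), np.array([0.3, 0.3])
user_vec, brand_vec    = np.array([0.1, 0.5]), np.array([0.4, 0.2])

features = {
    "query_product_w2v": cosine(query_vec, product_vec),  # matching feature
    "query_brand_w2v":   cosine(query_vec, brand_vec),    # matching feature
    "user_product_w2v":  cosine(user_vec, product_vec),   # personalization feature
    "user_brand_w2v":    cosine(user_vec, brand_vec),     # personalization feature
}
```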
We saw significant improvement with word2vec features
[Chart: relative improvement with word2vec features over the baseline, measured by AUC and Recall@10; y-axis 98% to 104%]
word2vec features rank high among top features
[Feature importance chart; features shown include:]
[query, aisle] word2vec
[query, product] historical aggregate
[query, department] word2vec
[query, product] historical aggregate
[user, product] word2vec
lda model 2
lda model 1
query length
[user, brand] word2vec
[query, product] word2vec
bm25
position
Model: XGBoost, logistic loss, with early stopping
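A minimal sketch of such a setup with the xgboost Python API; the feature matrices, labels and hyperparameters are placeholders, not the production configuration:

```python
import numpy as np
import xgboost as xgb

# Placeholder training/validation data: rows are (query, product) candidates,
# columns are ranking features, labels indicate purchase (implicit feedback).
rng = np.random.default_rng(0)
X_train, y_train = rng.random((1000, 12)), rng.integers(0, 2, 1000)
X_valid, y_valid = rng.random((200, 12)),  rng.integers(0, 2, 200)

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

params = {"objective": "binary:logistic", "eval_metric": "auc",
          "eta": 0.1, "max_depth": 6}
model = xgb.train(params, dtrain, num_boost_round=500,
                  evals=[(dvalid, "valid")],
                  early_stopping_rounds=50)  # stop once validation AUC plateaus
```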
Other contextual recommendation problems
Broad based discovery oriented recommendations
Including from stores customers may have never shopped from
Run out of X?
Rank products by
repurchase probability
Introduce customers to products new in the catalog
Also mitigates product cold start problems
Replacement Product Recommendations
Mitigate adverse impact of
last-minute out of stocks
Data Products in Search and Discovery
Search:
• query autocorrection
• query spell correction
• query expansion
• deep matching/document expansion
• search ranking
• search advertising
Discovery:
• substitute/replacement products
• frequently bought with products
• next basket recommendations
• guided discovery
• interpretable recommendations
Thank you!
We are
hiring!
Senior Machine Learning Engineer http://bit.ly/2kzHpcg