Learned Embeddings for
Search and Discovery at Instacart
Sharath Rao
Data Scientist / Manager
Search and Discovery
Collaborators: Angadh Singh
Talk Outline
• Quick overview of Search and Discovery at Instacart
• Word2vec for product recommendations
• Extending Word2vec embeddings to improve search ranking
About me
• Worked on several ML products in catalog, search and
personalization at Instacart since 2015
• Currently leading full stack search and discovery team
@sharathrao
Search and Discovery @ Instacart
Help customers find what they are looking for
and discover what they might like
Grocery Shopping in “Low Dimensional Space”
Search
Restock
Discover
Search: “Here is what I want.” The query has easily accessible information content.
Discovery: “Here I am, what next?” The context is the query, and recommendations vary as contexts do.
Entities we try to model
items
products
aisles
departments
retailers
queries
customers
brands
Most of our data products are
about modeling relationships
between them
(part of which is learning embeddings)
Common paradigm for search/discovery problems
1st phase: Candidate Generation
2nd phase: Reranking
• Select top 100s from among potentially millions
• Must be fast and simple
• often not even a learned model
• Recall oriented
Candidate Generation
• Ranks fewer products, but uses richer models/features
• Tuned towards high precision
• Often happens in real-time
Reranking
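To make the two-phase pattern concrete, here is a minimal sketch in Python; the candidate list and the scoring function are hypothetical placeholders, not Instacart's actual candidate generator or ranking model.

```python
from typing import Callable, Iterable, List, Tuple

def rerank(candidates: Iterable[str], score: Callable[[str], float], k: int = 20) -> List[str]:
    """Phase 2: precision-oriented reranking of a small candidate set."""
    scored: List[Tuple[str, float]] = [(item, score(item)) for item in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored[:k]]

# Phase 1 (candidate generation) would cheaply select a few hundred items out of
# millions, e.g. via boolean matching or an ANN lookup; here we just pretend it
# returned this short list and score it with a dummy function.
candidates = ["prod_123", "prod_45", "prod_6789"]
print(rerank(candidates, score=lambda item: float(len(item)), k=2))
```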
Word2vec for Product Recommendations
“Frequently bought with” recommendations
Help customers shop for the next item. Some recommended pairs are probably consumed together; others are not necessarily consumed together.
Quick Tour of Word2vec
(simplified, with a view to developing some intuition)
• We need to represent words as features in a model/task
• Naive representation => one hot encoding
• Problem: Every word is dissimilar to every other word => not intuitive
One-hot encodings are good, but not good enough
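As a tiny illustration of the problem (the toy vocabulary and indices are assumptions, not from the talk), every pair of distinct one-hot vectors is equally dissimilar:

```python
import numpy as np

# Toy vocabulary; the word-to-index mapping is arbitrary.
vocab = {"stadium": 0, "field": 1, "restaurant": 2}

def one_hot(word: str) -> np.ndarray:
    vec = np.zeros(len(vocab))
    vec[vocab[word]] = 1.0
    return vec

# The dot product (and hence cosine similarity) between any two distinct words is 0,
# so one-hot encodings cannot express that "stadium" is closer to "field"
# than to "restaurant".
print(np.dot(one_hot("stadium"), one_hot("field")))       # 0.0
print(np.dot(one_hot("stadium"), one_hot("restaurant")))  # 0.0
```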
What one-hot encodings fail to capture
Data:
Observation 1: “game at the stadium”
Observation 2: “game at the field”
Observation 3: “ate at the restaurant”
Observation 4: “met at the game”
What models must learn:
• stadium, field and restaurant are in some sense similar
• stadium and field are more similar than stadium and restaurant
• game has another meaning/sense that is similar to stadium/field/restaurant
None of these are easy to learn with one-hot encoded representations without supplementing with hand-engineered features
“You shall know a word by the company it keeps” - Firth, J.R., 1957
Core motivation behind semantic word representations
• Learns a feature vector per word
• for 1M vocabulary and 100 dimensional vector, we learn 1M vectors => 100M numbers
• Vectors must be such that words appearing in similar contexts are closer than any random pair of words
• Must work with unlabeled training data
• Must scale to large (unlabeled) datasets
• Embedding space itself must be general enough that word representations are broadly applicable
Word2vec is a scalable way to learn semantic word representations
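A minimal sketch of learning such representations with gensim's Word2Vec (gensim 4.x API assumed; the corpus and hyperparameters are illustrative only):

```python
from gensim.models import Word2Vec

# Tiny stand-in corpus of tokenized sentences; in practice this is a large
# unlabeled text stream.
sentences = [
    ["game", "at", "the", "stadium"],
    ["game", "at", "the", "field"],
    ["ate", "at", "the", "restaurant"],
    ["met", "at", "the", "game"],
]

# Skip-gram, 100-dimensional vectors: one learned vector per vocabulary word.
model = Word2Vec(sentences, vector_size=100, sg=1, window=5, min_count=1, workers=4)

# Words that appear in similar contexts end up closer than random pairs.
print(model.wv.most_similar("stadium", topn=3))
```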
Word2vec beyond text and words
On graphs where random walks are the sequences
On songs where plays are the sequences
Even the emojis weren’t spared!
You shall know a product by what it's purchased with
(adapting Firth: “word” becomes “product”, and “the company it keeps” becomes “what it's purchased with”)
We applied word2vec to purchase sequences
Typical purchasing session contains tens of cart adds
Sequences of products added to cart are the ‘sentences’
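A minimal sketch of the same idea applied to cart-add sequences, with hypothetical product ids standing in for words (gensim 4.x API assumed; hyperparameters are placeholders):

```python
from gensim.models import Word2Vec

# Each shopping session becomes one "sentence" of product ids, in cart-add order.
cart_sequences = [
    ["prod_123", "prod_456", "prod_789"],
    ["prod_456", "prod_222", "prod_123"],
    # ... millions more sessions in practice
]

# A real run would prune rare products with a higher min_count; these settings
# just keep the toy example runnable.
product_model = Word2Vec(cart_sequences, vector_size=100, sg=1, window=5,
                         min_count=1, workers=8)
```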
“Frequently Bought With” Recommendations
Event Data → Extract Training Sequences → Learn word2vec representations → Approximate Nearest Neighbors → Eliminate substitute products → Cache recommendations
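For the approximate nearest neighbors step, one option is an Annoy index over the learned product vectors. This sketch assumes the `product_model` from the previous snippet and is not necessarily the library used at Instacart:

```python
from annoy import AnnoyIndex

dim = product_model.vector_size
index = AnnoyIndex(dim, "angular")  # angular distance ~ cosine similarity

product_ids = product_model.wv.index_to_key
for i, pid in enumerate(product_ids):
    index.add_item(i, product_model.wv[pid])
index.build(50)  # number of trees; more trees = better recall, more memory

# Top-k neighbors are the raw "frequently bought with" candidates; substitutes
# are filtered out downstream and the results cached for serving.
neighbor_idx = index.get_nns_by_item(product_ids.index("prod_123"), 10)
print([product_ids[i] for i in neighbor_idx])
```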
Word2vec for product recommendations
Surfaces
complementary
products
Next step is to make recommendations contextual
• Not ideal if the customer already shopped for sugar recently
• Inappropriate if the customer is allergic to walnuts
• Not ideal if the customer's favorite brand of butter isn't otherwise popular
We see word2vec recommendations as a candidate generation step
Word2vec for Search Ranking
The Search Ranking Problem
In response to a query, rank products in the order that
maximizes the probability of purchase
The Search Ranking Problem
Candidate Generation: Boolean matching, BM25 ranking, synonym/semantic expansion
(Online) Reranking with a Learning to Rank model: matching features, historical aggregates, personalization features
Search Ranking - Training
Event Data → Label Generation with Implicit Feedback → Generate training features → Learning to Rank Training → Model Repository
Search Ranking - Scoring
query → processed query → Top N products (candidate generation) → Online reranker (model loaded from the Model Repository) → Reranked products → Final ranking
Features for Learning to Rank
Historical aggregates => normalized purchase counts over keys such as:
[query, product]
[query, product, user]
[query, brand]
[query, aisle]
[user, brand]
[product]
These keys trade off coverage against precision: broader keys have high coverage but low precision, narrower keys have low coverage but more precision.
But historical aggregates alone are not enough, because of sparsity and cold start.
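A minimal sketch of one such historical aggregate, a normalized purchase count per (query, product) key, computed with pandas from a hypothetical conversion log:

```python
import pandas as pd

# Hypothetical conversion log: one row per purchase attributed to a search.
log = pd.DataFrame({
    "query":   ["bread", "bread", "bread", "milk"],
    "product": ["p1",    "p1",    "p2",    "p3"],
})

agg = log.groupby(["query", "product"]).size().rename("purchases").reset_index()
agg["query_total"] = agg.groupby("query")["purchases"].transform("sum")
agg["normalized_purchase_count"] = agg["purchases"] / agg["query_total"]
print(agg)
```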
Learning word2vec embeddings from search logs
• Representations learnt from Wikipedia or petabyte-scale text corpora aren't ideal for Instacart
• No constraint that word2vec models must be learnt on temporal or spatial sequences
• We constructed training data for word2vec from search logs
Training Contexts from Search Logs
• Create contexts that are observed and desirable
<query1> <purchased description of converted product1>
<query2> <purchased description of converted product2>
..
..
• Learn embeddings for each word in the unified vocabulary of query
tokens and catalog product descriptions
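A minimal sketch of how such contexts might be assembled (the event shape and tokenization here are assumptions):

```python
# Each converted search contributes one training "sentence": the query tokens
# followed by the purchased product's description tokens.
converted_searches = [
    ("bread", "Oroweat Whole Wheat Bread"),
    ("mushrooms", "Organic Cremini Mushrooms"),
]

def to_context(query, product_description):
    return query.lower().split() + product_description.lower().split()

training_contexts = [to_context(q, d) for q, d in converted_searches]
# These contexts are fed to word2vec exactly like text sentences, yielding one
# shared embedding space over query tokens and catalog description tokens.
```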
Examples of nearest neighbors in embedding space
bread: orowheat, bonaparte, zissel, ryebread
raisins: sunmade, monukka, raisinettes, fruitsource
mushrooms: cremini, portabello, shitake
OK, so now we have word representations. Matching features require product features in the same space.
Deriving other features from word embeddings
• One option is to create contexts with product identifiers in the training sequences
• Promising, but that would give us a product cold start problem
Instead, use the learnt word embeddings to derive features for other entities such as products, brand names, queries and users.
Simple averaging of word representations works well
Word: representation learnt from converted search logs
Query: average embeddings of the words in the query
Product: average embeddings of the words in the product description
Brand: average embeddings of the products sold by the brand
User: average embeddings of the products purchased by the user
Aisle/Department: average embeddings of the products in the aisle/department
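A minimal sketch of the averaging step; the toy word vectors stand in for embeddings learnt from converted search logs:

```python
import numpy as np

# Toy 2-dimensional word vectors standing in for the learnt embeddings.
wv = {
    "whole":   np.array([0.1, 0.3]),
    "wheat":   np.array([0.2, 0.1]),
    "bread":   np.array([0.4, 0.0]),
    "oroweat": np.array([0.3, 0.2]),
}

def average_embedding(tokens, vectors, dim=2):
    """Average the vectors of in-vocabulary tokens; zero vector if none match."""
    vecs = [vectors[t] for t in tokens if t in vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

query_vec   = average_embedding("whole wheat bread".split(), wv)
product_vec = average_embedding("oroweat whole wheat bread".split(), wv)
# Brand, user and aisle/department vectors are in turn averages of product
# vectors, e.g. a user's vector averages the vectors of products they purchased.
```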
Wait, so we have 2 different representations for products?
• Embeddings learnt from purchase sequences: products are similar if they are bought together
• Embeddings derived from words in search logs: products are similar if their descriptions are semantically similar
Product representations in different embedding spaces
Embeddings learnt from
purchase sequences
Products are similar if they are bought together
Product representations in different embedding spaces
Embeddings derived from words
from search logs
Products are similar if
their descriptions are
semantically similar
Examples of nearest neighbors for products
Cinnamon Toast Crunch Cereal, Golden Grahams Cereal
Not much in common between the product names
Examples of nearest neighbors for brands
Using word2vec features in search ranking
• Construct similarity scores between different entities as features
Matching Features: [query, product], [query, brand], [query, aisle], [query, department]
Personalization Features: [user, product], [user, brand], [user, aisle], [user, department]
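A minimal sketch of turning entity vectors into these features; cosine similarity and the toy vectors are assumptions, with real entity vectors coming from the averaging described earlier:

```python
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

# Toy entity vectors; in practice these are averaged word2vec embeddings.
query_vec, product_vec = np.array([0.2, 0.4]), np.array([0.3, 0.3])
user_vec, brand_vec    = np.array([0.1, 0.5]), np.array([0.4, 0.2])

features = {
    "query_product_w2v": cosine(query_vec, product_vec),  # matching feature
    "query_brand_w2v":   cosine(query_vec, brand_vec),    # matching feature
    "user_product_w2v":  cosine(user_vec, product_vec),   # personalization feature
    "user_brand_w2v":    cosine(user_vec, brand_vec),     # personalization feature
}
```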
We saw significant improvement with word2vec features
[Chart: relative improvement with word2vec features over the baseline, measured by AUC and Recall@10; y-axis 98% to 104%]
word2vec features rank high among top features
[Feature importance chart; features shown include:]
[query, aisle] word2vec
[query, product] historical aggregate
[query, department] word2vec
[query, product] historical aggregate
[user, product] word2vec
lda model 2
lda model 1
query length
[user, brand] word2vec
[query, product] word2vec
bm25
position
Model: XGBoost, logistic loss, with early stopping
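A minimal sketch of such a setup with the xgboost Python API; the feature matrices, labels and hyperparameters are placeholders, not the production configuration:

```python
import numpy as np
import xgboost as xgb

# Placeholder training/validation data: rows are (query, product) candidates,
# columns are ranking features, labels indicate purchase (implicit feedback).
rng = np.random.default_rng(0)
X_train, y_train = rng.random((1000, 12)), rng.integers(0, 2, 1000)
X_valid, y_valid = rng.random((200, 12)),  rng.integers(0, 2, 200)

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

params = {"objective": "binary:logistic", "eval_metric": "auc",
          "eta": 0.1, "max_depth": 6}
model = xgb.train(params, dtrain, num_boost_round=500,
                  evals=[(dvalid, "valid")],
                  early_stopping_rounds=50)  # stop once validation AUC plateaus
```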
Other contextual recommendation problems
Broad based discovery oriented recommendations
Including from stores customers may have never shopped from
Run out of X?
Rank products by
repurchase probability
Introduce customers to products new in the catalog
Also mitigates product cold start problems
Replacement Product Recommendations
Mitigate adverse impact of
last-minute out of stocks
Data Products in Search and Discovery
Search:
• query autocorrection
• query spell correction
• query expansion
• deep matching/document expansion
• search ranking
• search advertising
Discovery:
• substitute/replacement products
• frequently bought with products
• next basket recommendations
• guided discovery
• interpretable recommendations
Thank you!
We are
hiring!
Senior Machine Learning Engineer http://bit.ly/2kzHpcg