Fast, Lenient, and Accurate
Building Personalized Instant Search Experience at LinkedIn
Ganesh Venkataraman, Abhi Lad, Lin Guo, Shakti Sinha
LinkedIn
Agenda
● LinkedIn
● LinkedIn Search
○ Navigational vs Exploratory searches
○ Typeahead vs SERP
● Big picture and problem statement
● Instant search – Search-as-you-type
○ Query autocomplete
○ Entity-aware suggestions
○ Instant results
● Conclusions & Future work
LinkedIn – Professional Identity
LinkedIn – Professional Graph
LinkedIn – Jobs
LinkedIn – And much more...
Companies
Skills
Professional Content
LinkedIn – Massive Scale
LinkedIn Search
Navigational Search
Looking for someone specific by name.
Query has a single correct result.
Exploratory Search
Finding people that match a given set of criteria.
Multiple results match the user’s query.
Instant Search – Search-as-you-type
Satisfy navigational searches:
Show instant search results.
Help frame exploratory searches:
Complete the user’s query and show search suggestions.
Big Picture
[Diagram components: partial query; instant results; autocomplete; search suggestions; query tagger; full-text search; search results; manually entered query]
Focus today:
● Autocomplete
● Search suggestions
● Instant results
Problem Statement
How can we build an instant search experience that scales to 450+ million members, and is fast, lenient, and accurate?
● Instant search = Query autocomplete + search suggestions + instant results
● Fast = Search-as-you-type latencies
● Lenient = Handle spelling errors and common variations
● Accurate = Highly relevant and personalized results
Query Tagging
[Figure: example query with tagged segments PERSON, TITLE (ID=126), COMPANY (ID=1337)]
Entity types identified:
Person name, job title, company, school, skills, locations.
Key part of query processing!
Impacts: autocomplete, spelling correction, search suggestions, query rewriting, ranking.
Sequential prediction model
(CRF – Conditional Random Fields)
Training data:
● Standardized dictionaries (people names, companies, schools, titles, skills, locations)
● Query logs
● Clickthrough (CTR) data
● Crowdsourced labels
Query Autocomplete
● Fast
● Relevant and contextual
● Resilient to spelling errors
Query Autocomplete – Offline processing
Query logs:
linkedin software engineer
software engineer
big data
data scientist
data engineer
expert systems
...
Entities: [linkedin] [software engineer]
Index:
FST – Finite State Transducers
Compact + fast retrieval + fuzzy match (via Levenshtein automata)
Query Autocomplete – Online processing
Two step process:
1. Retrieval (Candidate generation)
User’s query: [big data e]
Candidates = C(big data e) ∪ C(data e) ∪ C(e)
= big data engineer,
big data expert systems,
big data entry,
...
Query logs: linkedin software engineer, software engineer, big data, data scientist, data engineer, expert systems, ...
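The candidate-generation step above can be sketched as follows. This is a minimal illustration, not the production retrieval: a sorted Python list with binary search stands in for the FST, and the `COMPLETIONS` vocabulary and the function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical completion vocabulary, standing in for the FST built offline.
COMPLETIONS = sorted([
    "big data", "data engineer", "data entry", "data scientist",
    "engineer", "entry", "expert systems",
    "linkedin software engineer", "software engineer",
])

def complete(prefix, vocab=COMPLETIONS):
    """Return every entry of the sorted vocabulary that starts with prefix."""
    out = []
    for entry in vocab[bisect_left(vocab, prefix):]:
        if not entry.startswith(prefix):
            break  # sorted order: no later entry can match
        out.append(entry)
    return out

def candidates(query):
    """C(w1..wn) ∪ C(w2..wn) ∪ ... ∪ C(wn).

    Each suffix of the query is completed on its own, then the words that
    were dropped from the front are re-attached to the completion.
    """
    words = query.split()
    seen, result = set(), []
    for i in range(len(words)):
        head = " ".join(words[:i])
        tail = " ".join(words[i:])
        for completion in complete(tail):
            full = (head + " " + completion).strip()
            if full not in seen:
                seen.add(full)
                result.append(full)
    return result

print(candidates("big data e"))
```

Backing off to shorter suffixes lets the system suggest "big data expert systems" even if that exact query never appears in the logs.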
Query Autocomplete – Online processing
Two step process:
2. Scoring (Ranking)
User’s query: [big data e]
Candidate completions: “big data engineer”, “big data expert”, “big data entry”
Score(“big data engineer”):
P(s1, s2, s3, …) ≈ P(s1)·P(s2|s1)·P(s3|s2)·… // Bigram language model
Use entities : P([engineer] | [big data])
Fall back to words : P(engineer | data)·P(data | big)
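A minimal sketch of bigram scoring, assuming queries have already been segmented into tokens. A token can be an entity such as "big data" or a single word, so the same function covers both the entity-level model and the word-level fallback. The add-alpha smoothing, the `<s>` start marker, and the function names are assumptions for illustration; the slides do not say how probabilities are estimated.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-query marker

def train_bigram(queries):
    """Count unigrams and bigrams over tokenized queries."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in queries:
        padded = [START] + list(tokens)
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def log_score(tokens, unigrams, bigrams, alpha=1.0):
    """log P(s1)·P(s2|s1)·P(s3|s2)… with add-alpha smoothing."""
    vocab_size = len(unigrams)
    total, prev = 0.0, START
    for tok in tokens:
        numerator = bigrams[(prev, tok)] + alpha
        denominator = unigrams[prev] + alpha * vocab_size
        total += math.log(numerator / denominator)
        prev = tok
    return total

# Toy entity-segmented query log (made-up counts).
LOG = (
    [["big data", "engineer"]] * 3
    + [["big data", "expert systems"]] * 2
    + [["big data", "entry"]]
)
uni, bi = train_bigram(LOG)
ranked = sorted(
    [["big data", "engineer"], ["big data", "expert systems"], ["big data", "entry"]],
    key=lambda t: log_score(t, uni, bi),
    reverse=True,
)
print(ranked)
```

Scores from an entity-level model and a word-level model are not directly comparable, so a real system would need a consistent backoff scheme when mixing the two.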
Query Suggestions – Autocomplete + query tagger
“linke” ⇒ “Linkedin” ⇒ COMPANY
“had” ⇒ “Hadoop” ⇒ SKILL
Instant Results
● Fast retrieval over 450+ million members
● Highly personalized
● Balance personalization & popularity
● Resilient to spelling variations
Instant Results – Indexing
● Prefix-based tokenization:
DOCID 4
NAME: richard
PREFIX: r, ri, ric, rich, richa, ...
NAME: branson
PREFIX: b, br, bra, bran, brans, ...
● Inverted Index (maps each token to the posting list of docs that contain it):
NAME:richard => [1, 4, 10, 15, …] // Everyone named “richard”
PREFIX:ri => [1, 2, 4, 7, 10, 15, …] // Everyone whose name starts with “ri”
…
● Retrieval approach
User’s query – richard b
Rewritten query – +NAME:richard +PREFIX:b
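The indexing and retrieval scheme above can be sketched in a few lines. This is an illustrative toy, assuming an in-memory dictionary as the inverted index; the sample members and the function names are hypothetical.

```python
from collections import defaultdict

def tokens_for(name):
    """NAME:<word> for each word, plus PREFIX:<p> for every prefix of it."""
    toks = set()
    for word in name.lower().split():
        toks.add(f"NAME:{word}")
        for i in range(1, len(word) + 1):
            toks.add(f"PREFIX:{word[:i]}")
    return toks

def build_index(docs):
    """Map each token to an ascending posting list of doc ids."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for tok in tokens_for(docs[doc_id]):
            index[tok].append(doc_id)
    return index

def rewrite_query(query):
    """'richard b' -> ['NAME:richard', 'PREFIX:b']; all clauses are required (+)."""
    words = query.lower().split()
    if not words:
        return []
    return [f"NAME:{w}" for w in words[:-1]] + [f"PREFIX:{words[-1]}"]

def retrieve(index, query):
    """Intersect the posting lists of all required clauses."""
    postings = [set(index.get(tok, ())) for tok in rewrite_query(query)]
    return sorted(set.intersection(*postings)) if postings else []

# Hypothetical members.
DOCS = {1: "Richard Burton", 2: "Rita Bose", 4: "Richard Branson", 10: "Richard Smith"}
INDEX = build_index(DOCS)
print(retrieve(INDEX, "richard b"))
```

Only the last word is treated as a prefix because it is the one the user is still typing; earlier words are assumed complete.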
Instant Results – Indexing
● Connections Index:
DOCID 4
CONN: 1, 10, 15
● Inverted Index
CONN:4 => [1, 10, 15] // Everyone connected to Richard Branson
CONN:1 => [4, ...]
CONN:10 => [4, ...]
...
● Retrieval approach
User’s query – richard b
Rewritten query – +NAME:richard +PREFIX:b +CONN:1
(Everyone named richard b… and connected to User:1)
Instant Results – Indexing
Early Termination
Problem: A query like [PREFIX:ri] might retrieve too many candidate documents.
How can we retrieve the most promising documents first so that we don’t need to score all of them?
Static Rank: Order documents based on their prior (query independent) likelihood of relevance:
A combination of:
● Profile views
● Spam and security related scores
● Editorial rules (Celebrities, influencers, …)
numToScore: The number of documents to retrieve and score for any query
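A minimal sketch of early termination, under the assumption that doc ids are assigned in static-rank order (id 1 is the most promising document), so every posting list is already sorted best-first. The slides do not state how the ordering is stored; the function names and sample lists here are hypothetical.

```python
from itertools import islice

def intersect_sorted(postings):
    """Lazily yield doc ids present in every ascending posting list."""
    if not postings:
        return
    iters = [iter(p) for p in postings]
    current = []
    for it in iters:
        first = next(it, None)
        if first is None:
            return  # an empty list means no document can match
        current.append(first)
    while True:
        target = max(current)
        if min(current) == target:
            yield target
            target += 1  # force every cursor past the match
        for i, it in enumerate(iters):
            while current[i] < target:
                nxt = next(it, None)
                if nxt is None:
                    return  # one list is exhausted: no more matches
                current[i] = nxt

def retrieve_top(postings, num_to_score):
    """Stop after num_to_score matches instead of scanning all postings."""
    return list(islice(intersect_sorted(postings), num_to_score))

prefix_ri = [1, 2, 4, 7, 10, 15]
name_richard = [1, 4, 10, 15]
print(retrieve_top([prefix_ri, name_richard], num_to_score=2))
```

Because the generator is consumed lazily, the tail of each posting list is never read once enough candidates have been found.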
Instant Results – Retrieval
Balancing Popularity and Personalization
Query: richard b…
Are you looking for Richard Branson, or a colleague named Richard Burton?
(Assume searcher’s ID = 1)
Rewritten Query:
● +NAME:richard +PREFIX:b +CONN:1 // Too restrictive: finds only the searcher’s connections.
● +NAME:richard +PREFIX:b ?CONN:1[50%] // Try to retrieve 50% of results from the searcher’s connections
Custom search operator: “Weighted OR”
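The slides do not describe how the Weighted OR operator works internally. The sketch below is one possible reading of `?CONN:1[50%]`, offered only as an assumption: reserve up to half of the `num_to_score` slots for documents that also match the optional clause, and fill the rest from the remaining candidates in static-rank order. All names and sample ids are hypothetical.

```python
def weighted_or(candidates, optional_matches, num_to_score, weight):
    """Pick up to num_to_score docs, aiming for `weight` of them to match
    the optional clause.

    candidates: doc ids matching the required (+) clauses, best-first.
    optional_matches: set of doc ids matching the optional (?) clause.
    """
    quota = int(num_to_score * weight)
    hits = [d for d in candidates if d in optional_matches]
    misses = [d for d in candidates if d not in optional_matches]
    chosen = hits[:quota]
    chosen += misses[:num_to_score - len(chosen)]
    # If there were too few non-matching docs, backfill with extra hits.
    chosen += hits[quota:][:num_to_score - len(chosen)]
    position = {d: i for i, d in enumerate(candidates)}
    return sorted(chosen, key=position.__getitem__)

# Docs matching +NAME:richard +PREFIX:b, best static rank first (made up).
matching = [4, 7, 9, 12, 15, 20, 31, 42]
connections_of_searcher = {15, 31, 42}
print(weighted_or(matching, connections_of_searcher, num_to_score=4, weight=0.5))
```

With a plain cutoff the top four would be `[4, 7, 9, 12]` and none of the searcher's connections would reach the scoring stage; the quota keeps popular and personal candidates in the pool together.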
Instant Results – Spelling Variations
weiner ⇔ wiener
catherine ⇔ kathryn
dipak ⇔ deepak
Name Clusters
Offline process to cluster together similar sounding or similarly spelt names.
Two step process:
1. Coarse clustering (optimized for broad coverage)
Normalization: repeated chars, accented chars, common phonetic variations (c ⇔ k, ph ⇔ f)
Combination of edit distance & double metaphone (sound)
E.g. (dipak = deepak), (wiener = weiner), (catherine = kathryn), (jeff = joff)
2. Fine-grained clustering (optimized for precision)
Split up clusters based on more sophisticated rules
Position and character-aware edit distance
Query reformulation data (q1 → q2 → click)
E.g. (jeff ≠ joff)
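A rough sketch of the coarse clustering step, assuming a greedy single-pass grouping on normalized names with plain Levenshtein distance. Double metaphone is left out because it needs a third-party library, so phonetic pairs such as catherine/kathryn are not caught by this toy version. The threshold, the normalization rules, and the function names are assumptions.

```python
import re
import unicodedata

def normalize(name):
    """Strip accents, map common phonetic variants, collapse repeated chars."""
    s = unicodedata.normalize("NFKD", name.lower())
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    s = s.replace("ph", "f").replace("c", "k")
    return re.sub(r"(.)\1+", r"\1", s)

def edit_distance(a, b):
    """Classic Levenshtein distance with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def coarse_clusters(names, max_dist=2):
    """Greedy clustering: join the first cluster whose key is close enough."""
    clusters = []  # list of (normalized key of first member, members)
    for name in names:
        key = normalize(name)
        for rep_key, members in clusters:
            if edit_distance(key, rep_key) <= max_dist:
                members.append(name)
                break
        else:
            clusters.append((key, [name]))
    return [members for _, members in clusters]

print(coarse_clusters(["dipak", "deepak", "wiener", "weiner", "jeff", "joff", "richard"]))
```

The loose threshold deliberately over-merges (jeff and joff land together), which is what the fine-grained second step is meant to undo.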
Instant Results – Spelling Variations
Indexing time:
NAME: kathryn
CLUSTER: katharine
Query time:
Potential queries:
katherine
kathryn
katharine
catharine
Rewritten queries:
?NAME:katherine ?CLUSTER:katharine
?NAME:kathryn ?CLUSTER:katharine
?NAME:katharine ?CLUSTER:katharine
?NAME:catharine ?CLUSTER:katharine
Either match original query term or match the name cluster
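A small sketch of how the name cluster is used on both sides, assuming the offline clustering produced a name-to-cluster lookup. The `NAME_TO_CLUSTER` table, the fallback for unknown names, and the function names are hypothetical.

```python
# Hypothetical output of the offline clustering step: name -> cluster id.
NAME_TO_CLUSTER = {
    "katherine": "katharine",
    "kathryn": "katharine",
    "katharine": "katharine",
    "catharine": "katharine",
}

def index_tokens(first_name):
    """Indexing time: store the literal name and its cluster."""
    name = first_name.lower()
    # Assumption: a name with no known cluster forms its own cluster.
    return {f"NAME:{name}", f"CLUSTER:{NAME_TO_CLUSTER.get(name, name)}"}

def rewrite_name(term):
    """Query time: optional (?) clauses for the literal term and its cluster."""
    term = term.lower()
    return [f"?NAME:{term}", f"?CLUSTER:{NAME_TO_CLUSTER.get(term, term)}"]

def matches(doc_tokens, clauses):
    """A document matches if it satisfies at least one optional clause."""
    return any(clause.lstrip("?") in doc_tokens for clause in clauses)

doc = index_tokens("Kathryn")
print(rewrite_name("catharine"), matches(doc, rewrite_name("catharine")))
```

Keeping the literal `NAME` clause alongside the cluster clause lets a ranker prefer exact spelling matches over cluster-only matches.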
Instant Results – Scoring
Learning to Rank (Machine-learned ranking)
Training data
● Click data from previous typeahead sessions
● <searcher, query, doc> ⇒ positive/negative
Clicked result treated as positive (+).
All other shown results treated as negative (–).
Since this is navigational search, we assume there’s only 1 correct result => low presentation bias.
Features / signals
● Textual match against various fields
● Network distance, number of shared connections
● Global popularity
● Compound features
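The labeling rule above can be sketched as a small helper that turns one typeahead session into training examples. The tuple layout and the decision to skip sessions without a usable click are assumptions; feature extraction and the ranking model itself are out of scope here.

```python
def label_session(searcher_id, query, shown_doc_ids, clicked_doc_id):
    """Turn one typeahead session into <searcher, query, doc, label> rows.

    The clicked result is labeled 1 (positive); every other shown result
    is labeled 0 (negative). Sessions whose click is not among the shown
    results are skipped (an assumption, not stated in the slides).
    """
    if clicked_doc_id not in shown_doc_ids:
        return []
    return [
        (searcher_id, query, doc_id, 1 if doc_id == clicked_doc_id else 0)
        for doc_id in shown_doc_ids
    ]

rows = label_session(1, "richard b", [4, 15, 31, 42], clicked_doc_id=15)
print(rows)
```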
Conclusions
● Instant search experience
○ Directly satisfy navigational search uses in typeahead via Instant Results
○ Help the user frame exploratory search queries via Query Autocomplete & Search Suggestions
● Combination of techniques
○ Query tagger for entity extraction – “Things not Strings”
○ FST-based query completion
○ Inverted index-based instant results + Early termination + Weighted OR
○ Name clusters for fuzzy name matching
Future Work
● Personalized query completions
○ m ⇒ machine learning
○ m ⇒ machinist
● Multi-entity query suggestions
○ Now : [linkedin] ⇒ “Find people who work at LinkedIn”
○ Future : [linkedin data scientist] ⇒ “Find data scientists at LinkedIn”
● Better blending
○ Autocomplete + query suggestions + instant results
○ Query features – what does the query mean?
○ Results features – what results come back from each system?
Thank You!
LinkedIn – The Economic Graph
LinkedIn Search – SERP (Jobs)
LinkedIn Search – Typeahead
LinkedIn Search – SERP

