Trey Grainger
Chief Algorithms Officer
Balancing the Dimensions of User Intent
October 28, 2019
Trey Grainger
Chief Algorithms Officer
• Previously: SVP of Engineering @ Lucidworks; Director of Engineering @ CareerBuilder
• Georgia Tech – MBA, Management of Technology
• Furman University – BA, Computer Science, Business, & Philosophy
• Stanford University – Information Retrieval & Web Search
Other fun projects:
• Co-author of Solr in Action, plus numerous research publications
• Advisor to Presearch, the decentralized search engine
• Lucene / Solr contributor
About Me
• About Lucidworks
• What is AI-powered Search?
• The Dimensions of User Intent
• Content Understanding:
• Keyword Search
• User Understanding:
• Collaborative Recommendations
• Content Understanding + User Understanding:
• Personalized Search
• Domain Understanding:
• Knowledge Graphs
• Domain Understanding + User Understanding:
• Domain-aware Matching
• Content Understanding + Domain Understanding:
• Semantic Search
• Balancing Approaches:
• Keyword vs. Vector vs. Knowledge Graph Search
• Vector Search
• Knowledge Graph Search
• Combining it all together
Agenda
Who are we?
300+ CUSTOMERS ACROSS THE
FORTUNE 1000
400+ EMPLOYEES
OFFICES IN
San Francisco, CA (HQ)
Raleigh-Durham, NC
Cambridge, UK
Bangalore, India
Hong Kong
The Search & AI Conference
COMPANY BEHIND
DEVELOPMENT, HOSTING, & SUPPORT
Proudly built with open-source
tech at its core: Apache Solr &
Apache Spark
Personalizes search
with applied
machine learning
Proven on the
world’s biggest
information systems
AI-Powered Search
What is
?
http://aiPoweredSearch.com
... is my new book!
(Haystack discount code: ctwhay19)
AI-powered Search
AI-powered Search
Question / Answer
Systems
Virtual Assistants
• Signals Boosting Models
• Learning to Rank
• Semantic Search
• Collaborative Filtering
• Personalized Search
• Content Clustering
• NLP / Entity Resolution
• Semantic Knowledge Graphs
• Document Classification
• etc.
• Neural Search
• Word Embeddings
• Vector Search
• Image / Voice Search
• etc.
• Question / Answer Systems
• Virtual Assistants
• Chatbots
• Rules-based Relevancy
• etc.
We have a big toolbox - great!
But how do we properly apply
those tools?
Dimensions of User Intent
Content
Understanding
Domain
Understanding
User
Understanding
User Intent
Keyword
Search
Dimensions of User Intent
Content
Understanding
Domain
Understanding
User
Understanding
User Intent
/solr/collection/select/?q=apache solr
Term      Documents
…         …
apache    doc1, doc3, doc4, doc5
…         …
lucene    doc2, doc4, doc6
…         …
solr      doc1, doc3, doc4, doc7, doc8
…         …
[Venn diagram: "apache" only: doc5; "solr" only: doc7, doc8; "apache solr" (both terms): doc1, doc3, doc4]
Matching queries to documents
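The lookup pictured above can be sketched in a few lines of Python. This is a minimal illustration, not how Solr stores its index: the postings are copied from the slide's table, and the helper names match_all and match_any are hypothetical.

```python
# Postings lists copied from the slide's term -> documents table.
INDEX = {
    "apache": {"doc1", "doc3", "doc4", "doc5"},
    "lucene": {"doc2", "doc4", "doc6"},
    "solr": {"doc1", "doc3", "doc4", "doc7", "doc8"},
}

def postings_for(index, query):
    """Look up the postings set for each whitespace-separated query term."""
    return [index.get(term, set()) for term in query.lower().split()]

def match_all(index, query):
    """AND semantics: documents that contain every query term."""
    postings = postings_for(index, query)
    return set.intersection(*postings) if postings else set()

def match_any(index, query):
    """OR semantics: documents that contain at least one query term."""
    postings = postings_for(index, query)
    return set.union(*postings) if postings else set()

both = match_all(INDEX, "apache solr")    # the overlap of the Venn diagram
either = match_any(INDEX, "apache solr")  # everything inside either circle
print(sorted(both), sorted(either))
```

Matching only decides which documents are candidates; ranking them is the job of the scoring function on the next slide.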
BM25 (Relevance Scoring between Query and Documents)
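$$\mathrm{Score}(q, d) = \sum_{t \in q} \mathrm{idf}(t) \cdot \frac{\mathrm{tf}(t \text{ in } d) \cdot (k + 1)}{\mathrm{tf}(t \text{ in } d) + k \cdot \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)}$$
Where:
t = term; d = document; q = query; i = index
$\mathrm{tf}(t \text{ in } d) = \sqrt{\text{numTermOccurrencesInDocument}}$
$\mathrm{idf}(t) = 1 + \log\left(\frac{\text{numDocs}}{\text{docFreq} + 1}\right)$
$|d| = \sum_{t \in d} 1$
$\mathrm{avgdl} = \frac{\sum_{d \in i} |d|}{\sum_{d \in i} 1}$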
k = Free parameter. Usually ~1.2 to 2.0. Increases term frequency saturation point.
b = Free parameter. Usually ~0.75. Increases impact of document normalization.
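As a rough illustration, the formula on this slide can be transcribed directly into Python. The tf and idf definitions follow the slide as written and may differ from what a given Lucene or Solr version computes; the four-document corpus and the function name bm25_score are made up for the example.

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k=1.2, b=0.75):
    """Score one document against a query using the slide's BM25 variant."""
    num_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / num_docs  # average document length
    # Length normalization: longer-than-average documents are penalized.
    norm = 1 - b + b * len(doc_terms) / avgdl
    score = 0.0
    for t in query_terms:
        occurrences = doc_terms.count(t)
        if occurrences == 0:
            continue  # a term absent from the document contributes nothing
        tf = math.sqrt(occurrences)
        doc_freq = sum(1 for d in corpus if t in d)
        idf = 1 + math.log(num_docs / (doc_freq + 1))
        score += idf * (tf * (k + 1)) / (tf + k * norm)
    return score

# Hypothetical corpus, tokenized on whitespace.
corpus = [
    "apache solr search server".split(),
    "apache lucene library".split(),
    "solr in action book about apache solr".split(),
    "cooking recipes".split(),
]
scores = [bm25_score(["apache", "solr"], doc, corpus) for doc in corpus]
print([round(s, 3) for s in scores])
```

In this toy corpus the first document matches both query terms and is exactly average length, so it outscores the second document, which matches only "apache".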
ipad
Keyword
Search
Dimensions of User Intent
Content
Understanding
Domain
Understanding
Collaborative
Recommendations User
Understanding
User Intent
Collaborative Filtering (Recommendations)
User
Searches
User
Sees
Results
User
takes an
action
Users’ actions
inform system
improvements
User    Query    Results
Alonzo  ipad     doc10, doc22, doc12, …
Elena   printer  doc84, doc2, doc17, …
Ming    ipad     doc10, doc22, doc12, …
…       …        …
User Action Document
Alonzo click doc22
Elena click doc17
Ming click doc12
Alonzo purchase doc22
Ming click doc22
Ming purchase doc12
Elena click doc2
… … …
User Item Weight
Alonzo doc22 1.0
Alonzo doc12 0.4
… … …
Ming doc12 0.9
Ming doc22 0.6
… … …
ipad ⌕
Matrix Factorization
Recommendations for Alonzo:
• doc22: “iPad Pro”
• doc12: “Kindle Fire”
…
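A minimal sketch of the matrix factorization step, assuming plain stochastic gradient descent over the observed user-item weights. The Alonzo and Ming weights come from the table above; the Elena row, the hyperparameters, and the function names are hypothetical, and production systems typically use a library implementation such as ALS instead.

```python
import random

def factorize(weights, n_factors=2, epochs=2000, lr=0.05, reg=0.01, seed=42):
    """Learn latent factors P (users) and Q (items) so that P[u] . Q[i] ~ weight."""
    rng = random.Random(seed)
    users = sorted({u for u, _ in weights})
    items = sorted({i for _, i in weights})
    P = {u: [rng.uniform(0.1, 0.5) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(0.1, 0.5) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for (u, i), w in weights.items():
            err = w - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factor vectors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, user, item):
    """Estimated affinity of a user for an item, observed or not."""
    return sum(pu * qi for pu, qi in zip(P[user], Q[item]))

weights = {
    ("Alonzo", "doc22"): 1.0,
    ("Alonzo", "doc12"): 0.4,
    ("Ming", "doc12"): 0.9,
    ("Ming", "doc22"): 0.6,
    ("Elena", "doc22"): 0.8,  # hypothetical signal, not on the slide
}
P, Q = factorize(weights)
print(round(predict(P, Q, "Alonzo", "doc22"), 2))
print(round(predict(P, Q, "Elena", "doc12"), 2))  # unobserved pair, filled in by the model
```

The useful part is the last line: the model produces a score for a user-item pair that never appeared in the signals, which is what makes it a recommender rather than a lookup table.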
Recommendations (User-Item, Item-Item, Query-Item)
User Item Weight
Alonzo doc22 1.0
Alonzo doc12 0.4
… … …
Ming doc12 0.9
Ming doc22 0.6
… … …
Recommendations for Alonzo:
• doc22: “iPad Pro”
• doc12: “Kindle Fire”
…
Item Item Weight
doc22 doc22 1.0
doc22 doc12 0.85
… … …
doc12 doc12 1.0
doc12 doc22 0.83
… … …
Query Item Weight
ipad doc22 0.98
ipad doc12 0.6
… … …
kindle doc12 0.96
apple doc22 0.90
… … …
Recommendations for Doc22:
• doc22: “iPad Pro”
• doc12: “Kindle Fire”
…
Recommendations for “ipad”:
• doc22: “iPad Pro”
• doc12: “Kindle Fire”
…
Matrix Factorization
ipad
ipad
Keyword
Search
Knowledge Graph
Dimensions of User Intent
Content
Understanding
Domain
Understanding
Collaborative
Recommendations User
Understanding
User Intent
What is a Knowledge Graph?
(vs. Ontology vs. Taxonomy vs. Synonyms, etc.)
Overly Simplistic Definitions
Alternative Labels: Substitute words with identical meanings
[ CTO => Chief Technology Officer; specialise => specialize ]
Synonyms List: Provides substitute words that can be used to represent
the same or very similar things
[ human => homo sapiens, mankind; food => sustenance, meal ]
Taxonomy: Classifies things into Categories
[ john is Human; Human is Mammal; Mammal is Animal ]
Ontology: Defines relationships between types of things
[ animal eats food; human is animal ]
Knowledge Graph: Instantiation of an
Ontology (contains the things that are related)
[ john is human; john eats food ]
A Knowledge Graph subsumes the other types.
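One way to see why the knowledge graph subsumes the other structures is to store everything as subject-predicate-object triples. The sketch below uses only the examples from this slide; the helper names and the inference rule (an entity inherits the relationships of its categories) are assumptions made for illustration.

```python
# Taxonomy, ontology, and instance facts from the slide, all as triples.
TRIPLES = [
    ("john", "is", "human"),      # knowledge graph instance
    ("human", "is", "mammal"),    # taxonomy
    ("mammal", "is", "animal"),   # taxonomy
    ("animal", "eats", "food"),   # ontology relationship between types
]

# Synonyms list from the slide: substitute words for the same thing.
SYNONYMS = {"human": ["homo sapiens", "mankind"], "food": ["sustenance", "meal"]}

def categories_of(entity, triples):
    """Follow 'is' edges transitively, nearest category first."""
    found, frontier = [], [entity]
    while frontier:
        current = frontier.pop(0)
        for subject, predicate, obj in triples:
            if subject == current and predicate == "is" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found

def holds(entity, predicate, obj, triples):
    """True if the fact is stated for the entity or for any of its categories."""
    candidates = [entity] + categories_of(entity, triples)
    return any((c, predicate, obj) in triples for c in candidates)

print(categories_of("john", TRIPLES))
print(holds("john", "eats", "food", TRIPLES))
print(SYNONYMS["human"])
```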
Keyword
Search
Knowledge Graph
User Intent
Personalized
Search
Dimensions of User Intent
Content
Understanding
Domain
Understanding
Collaborative
Recommendations User
Understanding
Keyword Search
(Completely User-specified)
Traditional
Recommendations
(Completely driven by
user behavior)
Keyword Search
(Completely User-specified)
User-guided
Recommendations
(Mostly driven by user profile,
partially user-specified)
Traditional
Recommendations
(Completely driven by
user behavior)
Keyword Search
(Completely User-specified)
Personalized
Queries
(Mostly user-specified,
partially driven by user profile)
Personalized
Queries
(Mostly user-specified,
partially driven by user profile)
Keyword Search
(Completely User-specified)
User-guided
Recommendations
(Mostly driven by user profile,
partially user-specified)
Traditional
Recommendations
(Completely driven by
user behavior)
Personalized Search
Personalization
Regular Search Results:
Personalized Search Results:
User:
Nice - personalization is awesome!
Let’s roll it out everywhere!
Ugh…
Keyword
Search
Knowledge Graph
User Intent
Personalized
Search
Domain-aware
Matching
Dimensions of User Intent
Content
Understanding
Domain
Understanding
Collaborative
Recommendations User
Understanding
Knowledge Graph
(Understanding conceptual
and logical relationships
between domain-specific entities)
Collaborative
Recommendations
(Completely driven by
user behavior)
Personas / User Profiles
(User attributes and preferences in
knowledge graph)
Multimodal Recommendations
(Recommendations combining
collaborative filtering plus user-based
profile attribute matching/ranking)
Knowledge Graph
(Understanding conceptual
and logical relationships
between domain-specific entities)
Collaborative
Recommendations
(Completely driven by
user behavior)
Personas / User Profiles
(User attributes and preferences in
knowledge graph)
Multimodal Recommendations
(Recommendations combining
collaborative filtering plus user-based
profile attribute matching/ranking)
Knowledge Graph
(Understanding conceptual
and logical relationships
between domain-specific entities)
Collaborative
Recommendations
(Completely driven by
user behavior)
Domain-aware Matching
http://localhost:8983/solr/jobs/select/?
fl=jobtitle,city,state,salary&
q=(
jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10
)
AND (
(city:"Boston" AND state:"MA")^15
OR state:"MA")
AND _val_:"map(salary, 40000, 60000,10, 0)"
AND similar_users:{!terms}u99,u1,u50,u2311,u253,u70,u99
*Example derived from chapter 16 of Solr in Action
Multimodal Recommendations
Jane is a nurse educator in Boston seeking between $40K and $60K
She has interacted with the same content as the following users:
u99,u1,u50,u2311,u253,u70,u99
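The query above can be assembled from a user profile rather than written by hand. The following sketch only builds the request string; the profile dictionary, the function name, and the deduplicated user list are assumptions, and whether the similar_users clause parses as shown depends on the Solr version and query parser in use.

```python
from urllib.parse import urlencode

def build_recommendation_params(profile, similar_users):
    """Turn profile attributes into the boosted clauses shown on the slide."""
    title = profile["jobtitle"]
    city, state = profile["city"], profile["state"]
    low, high = profile["min_salary"], profile["max_salary"]
    clauses = [
        # Exact phrase match on the title is boosted above a loose term match.
        '(jobtitle:"%s"^25 OR jobtitle:(%s)^10)' % (title, title),
        # Same city and state beats same state only.
        '((city:"%s" AND state:"%s")^15 OR state:"%s")' % (city, state, state),
        # map() scores 10 inside the salary range and 0 outside it.
        '_val_:"map(salary, %d, %d, 10, 0)"' % (low, high),
        # Collaborative part: documents touched by behaviorally similar users.
        "similar_users:{!terms}" + ",".join(similar_users),
    ]
    return {"fl": "jobtitle,city,state,salary", "q": " AND ".join(clauses)}

# Jane's profile as described on the slide. Values are not escaped here,
# so real input would need Solr query escaping first.
jane = {
    "jobtitle": "nurse educator",
    "city": "Boston",
    "state": "MA",
    "min_salary": 40000,
    "max_salary": 60000,
}
params = build_recommendation_params(jane, ["u99", "u1", "u50", "u2311", "u253", "u70"])
url = "http://localhost:8983/solr/jobs/select?" + urlencode(params)
print(params["q"])
```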
Keyword
Search
Knowledge Graph
User Intent
Personalized
Search
Semantic
Search
Domain-aware
Matching
Dimensions of User Intent
Content
Understanding
Domain
Understanding
Collaborative
Recommendations User
Understanding
Keyword Search
(Finding and
Ranking Keyword)
Knowledge Graph
(Understanding conceptual and
logical relationships between
domain-specific entities)
Language Understanding
(Understanding syntax
and query structure)
Keyword Search
(Finding and
Ranking Keyword)
Terminology Understanding
(Understanding domain-specific
terms and conceptual meaning)
Knowledge Graph
(Understanding conceptual and
logical relationships between
domain-specific entities)
Language Understanding
(Understanding syntax
and query structure)
Terminology Understanding
(Understanding domain-specific
terms and conceptual meaning)
Keyword Search
(Finding and
Ranking Keyword)
Knowledge Graph
(Understanding conceptual and
logical relationships between
domain-specific entities)
Semantic Search
Sentence Embeddings:
[ 2, 3, 2, 4, 2, 1, 5, 3 ]
[ 5, 3, 2, 3, 4, 0, 3, 4 ]
. . .
Document Embedding:
[ 4, 1, 4, 2, 1, 2, 4, 3 ]
Word Embeddings:
[ 5, 1, 3, 4, 2, 1, 5, 3 ]
[ 4, 1, 3, 0, 1, 1, 4, 2 ]
. . .
Paragraph Embeddings:
[ 5, 1, 4, 1, 0, 2, 4, 0 ]
[ 1, 1, 4, 2, 1, 0, 0, 0 ]
. . .
Thought Vectors
apple caffeine cheese coffee drink donut food juice pizza tea water … term N
cappuccino 0 0 0 0 0 0 0 0 0 0 0 ...
apple 1 0 0 0 0 0 0 0 0 0 0 ...
juice 0 0 0 0 0 0 0 1 0 0 0 ...
cheese 0 0 1 0 0 0 0 0 0 0 0 ...
pizza 0 0 0 0 0 0 0 0 1 0 0 ...
donut 0 0 0 0 0 1 0 0 0 0 0 ...
green 0 0 0 0 0 0 0 0 0 0 0 ...
tea 0 0 0 0 0 0 0 0 0 1 0 ...
bread 0 0 1 0 0 0 0 0 0 0 0 ...
sticks 0 0 0 0 0 0 0 0 0 0 0 ...
query: exact term lookup in inverted index
Single Term Searches (as a Vector)
Combined Vector
query
Multi-term Query Vectors
juice 0 0 0 0 0 0 0 1 0 0 0 ...
apple 1 0 0 0 0 0 0 0 0 0 0 ...
+
apple juice 1 0 0 0 0 0 0 1 0 0 0 ...
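The addition shown above is simple enough to write out. This sketch uses the eleven vocabulary terms from the slide's column header; the function names are hypothetical.

```python
# Vocabulary in the column order used on the slide.
VOCAB = ["apple", "caffeine", "cheese", "coffee", "drink", "donut",
         "food", "juice", "pizza", "tea", "water"]

def one_hot(term):
    """A single term as a vector with one dimension per vocabulary term."""
    return [1 if term == v else 0 for v in VOCAB]

def query_vector(query):
    """Sum the one-hot vectors of every term in the query."""
    combined = [0] * len(VOCAB)
    for term in query.lower().split():
        combined = [a + b for a, b in zip(combined, one_hot(term))]
    return combined

print(query_vector("apple juice"))
print(query_vector("latte"))  # not in the vocabulary, so all zeros
```

The second call shows the weakness of this representation: "latte" and "cappuccino" both map to all-zero vectors, so nothing about their similarity is captured.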
apple caffeine cheese coffee drink donut food juice pizza tea water … term N
latte 0 0 0 0 0 0 0 0 0 0 0 ...
cappuccino 0 0 0 0 0 0 0 0 0 0 0 ...
apple juice 1 0 0 0 0 0 0 1 0 0 0 ...
cheese pizza 0 0 1 0 0 0 0 0 1 0 0 ...
donut 0 0 0 0 0 1 0 0 0 0 0 ...
soda 0 0 0 0 0 0 0 0 0 0 0 ...
green tea 0 0 0 0 0 0 0 0 0 1 0 ...
water 0 0 0 0 0 0 0 0 0 0 1 ...
cheese bread sticks 0 0 1 0 0 0 0 0 0 0 0 ...
cinnamon sticks 0 0 0 0 0 0 0 0 0 0 0 ...
query: exact term lookup in inverted index
Multi-term Searches
food drink dairy bread caffeine sweet calories healthy
apple juice 0 5 0 0 0 4 4 3
cappuccino 0 5 3 0 4 1 2 3
cheese bread sticks 5 0 4 5 0 1 4 2
cheese pizza 5 0 4 4 0 1 5 2
cinnamon bread sticks 5 0 1 5 0 3 4 2
donut 5 0 1 5 0 4 5 1
green tea 0 5 0 0 2 1 1 5
latte 0 5 4 0 4 1 3 3
soda 0 5 0 0 3 5 5 0
water 0 5 0 0 0 0 0 5
Dimensionality Reduction
Phrase: Vector:
apple juice: [ 0, 5, 0, 0, 0, 4, 4, 3 ]
cappuccino: [ 0, 5, 3, 0, 4, 1, 2, 3 ]
cheese bread sticks: [ 5, 0, 4, 5, 0, 1, 4, 2 ]
cheese pizza: [ 5, 0, 4, 4, 0, 1, 5, 2 ]
cinnamon bread sticks: [ 5, 0, 1, 5, 0, 3, 4, 2 ]
donut: [ 5, 0, 1, 5, 0, 4, 5, 1 ]
green tea: [ 0, 5, 0, 0, 2, 1, 1, 5 ]
latte: [ 0, 5, 4, 0, 4, 1, 3, 3 ]
soda: [ 0, 5, 0, 0, 3, 5, 5, 0 ]
water: [ 0, 5, 0, 0, 0, 0, 0, 5 ]
Ranked Results: Green Tea
0.94 water
0.85 cappuccino
0.80 latte
0.78 apple juice
0.60 soda
… …
0.19 donut
Vector Similarity Scores:
Vector Similarity (a, b):
$$\cos(\theta) = \frac{a \cdot b}{|a| \times |b|}$$
Ranked Results: Cheese Pizza
0.99 cheese bread sticks
0.91 cinnamon bread sticks
0.89 donut
0.47 latte
0.46 apple juice
… …
0.19 water
Vector Similarity Scoring
Vector Similarity Scores:
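The ranked lists can be reproduced with a short cosine similarity sketch over the eight-dimension vectors from the dimensionality reduction table (food, drink, dairy, bread, caffeine, sweet, calories, healthy). The function names are hypothetical. Scores involving apple juice may come out slightly different from the slide, since the apple juice vector in the later Solr indexing example starts with 1.0 rather than 0.

```python
import math

# Vectors from the dimensionality reduction table.
VECTORS = {
    "apple juice": [0, 5, 0, 0, 0, 4, 4, 3],
    "cappuccino": [0, 5, 3, 0, 4, 1, 2, 3],
    "cheese bread sticks": [5, 0, 4, 5, 0, 1, 4, 2],
    "cheese pizza": [5, 0, 4, 4, 0, 1, 5, 2],
    "cinnamon bread sticks": [5, 0, 1, 5, 0, 3, 4, 2],
    "donut": [5, 0, 1, 5, 0, 4, 5, 1],
    "green tea": [0, 5, 0, 0, 2, 1, 1, 5],
    "latte": [0, 5, 4, 0, 4, 1, 3, 3],
    "soda": [0, 5, 0, 0, 3, 5, 5, 0],
    "water": [0, 5, 0, 0, 0, 0, 0, 5],
}

def cosine(a, b):
    """cos(theta) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_name, vectors):
    """Score every other phrase against the query, most similar first."""
    query = vectors[query_name]
    scored = [(name, cosine(query, vec))
              for name, vec in vectors.items() if name != query_name]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

for name, score in rank("cheese pizza", VECTORS)[:3]:
    print(f"{score:.2f} {name}")
```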
Performance Considerations
Problem: Vector Scoring is Slow
• Unlike keyword search, which looks up pre-indexed answers to queries, Vector Search must instead calculate
similarities between the query vector and every document’s vectors to determine best matches, which is
slow at scale.
Solution: Quantized Vectors
• “Quantization” is the process of mapping vector features to discrete values.
• Creating “tokens” that map to a similar vector space enables matching on those tokens to perform an ANN
(Approximate Nearest Neighbor) search.
• This enables converting vector scoring into a search problem (term lookup and scoring), which is fast again,
at the expense of some recall and scoring accuracy
Recommended Approach: Quantized Vector Search + Vector Similarity Reranking
• Combine the best of both worlds by running an initial ANN search on a quantized vector representation, and
then re-rank the top-N results using full Vector similarity scoring.
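The recommended two-phase approach can be sketched with random-hyperplane locality-sensitive hashing, which is one common quantization scheme and not necessarily the one used by any particular Solr plugin. The number of hyperplanes, the band size, the token format, and all function names below are assumptions chosen for illustration.

```python
import math
import random

VECTORS = {
    "donut": [5, 0, 1, 5, 0, 4, 5, 1],
    "cheese pizza": [5, 0, 4, 4, 0, 1, 5, 2],
    "green tea": [0, 5, 0, 0, 2, 1, 1, 5],
    "water": [0, 5, 0, 0, 0, 0, 0, 5],
    "soda": [0, 5, 0, 0, 3, 5, 5, 0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def make_hyperplanes(dim, n_planes=16, seed=7):
    """Random hyperplanes through the origin; each contributes one signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

def lsh_tokens(vector, planes, band_size=4):
    """Quantize a vector into indexable tokens of the form '<band>_<bits>'."""
    bits = ["1" if sum(p * v for p, v in zip(plane, vector)) >= 0 else "0"
            for plane in planes]
    tokens = set()
    for band, start in enumerate(range(0, len(bits), band_size)):
        tokens.add(str(band) + "_" + "".join(bits[start:start + band_size]))
    return tokens

PLANES = make_hyperplanes(dim=8)
INDEX = {name: lsh_tokens(vec, PLANES) for name, vec in VECTORS.items()}

def search(query, index, top_n=3):
    # Phase 1: cheap token match (ANN); keep anything sharing at least one token.
    query_tokens = lsh_tokens(query, PLANES)
    candidates = [name for name, tokens in index.items() if tokens & query_tokens]
    # Phase 2: exact cosine similarity, computed only for the candidates.
    reranked = sorted(candidates, key=lambda n: cosine(query, VECTORS[n]), reverse=True)
    return [(name, round(cosine(query, VECTORS[name]), 2)) for name in reranked[:top_n]]

results = search(VECTORS["donut"], INDEX)
print(results)
```

Phase 1 can miss true neighbors whose tokens all differ from the query's, which is the recall cost mentioned above; more bands or shorter bands trade speed for recall.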
Solr Implementation Options
Option 1: Streaming Expressions
curl -X POST -H "Content-Type: application/json" \
  "http://localhost:8983/solr/food/update?commit=true" \
  --data-binary ' [
{"id": "1", "name_s":"donut", "vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]},
{"id": "2", "name_s":"apple juice",
"vector_fs":[1.0,5.0,0.0,0.0,0.0,4.0,4.0,3.0]},
{"id": "3", "name_s":"cappuccino",
"vector_fs":[0.0,5.0,3.0,0.0,4.0,1.0,2.0,3.0]},
{"id": "4", "name_s":"cheese pizza",
"vector_fs":[5.0,0.0,4.0,4.0,0.0,1.0,5.0,2.0]},
{"id": "5", "name_s":"green tea",
"vector_fs":[0.0,5.0,0.0,0.0,2.0,1.0,1.0,5.0]},
{"id": "6", "name_s":"latte", "vector_fs":[0.0,5.0,4.0,0.0,4.0,1.0,3.0,3.0]},
{"id": "7", "name_s":"soda", "vector_fs":[0.0,5.0,0.0,0.0,3.0,5.0,5.0,0.0]},
{"id": "8", "name_s":"cheese bread sticks",
"vector_fs":[5.0,0.0,4.0,5.0,0.0,1.0,4.0,2.0]},
{"id": "9", "name_s":"water", "vector_fs":[0.0,5.0,0.0,0.0,0.0,0.0,0.0,5.0]},
{"id": "10", "name_s":"cinnamon bread sticks",
"vector_fs":[5.0,0.0,1.0,5.0,0.0,3.0,4.0,2.0]}
] '
Send Documents to Solr:
Streaming Expressions
Option 2:
Streaming Expressions Query Parser
http://localhost:8983/solr/food/select?q=*:*&fl=id,name_s&
fq={!streaming_expression}top(
select(
search(food, q="*:*", fl="id,vector_fs", sort="id asc"),
cosineSimilarity(vector_fs, array(5.1,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as cos, id),
n=5, sort="cos desc"
)
{ "responseHeader":{
… },
"response":{"numFound":5,"start":0,"docs":[
{ "name_s":"donut", "id":"1"},
{ "name_s":"apple juice", "id":"2"},
{ "name_s":"cheese pizza", "id":"4"},
{ "name_s":"cheese bread sticks", "id":"8"},
{ "name_s":"cinnamon bread sticks", "id":"10"}]
}}
Request:
Response:
Streaming Expressions Query Parser
Option 3:
Solr Vector Scoring Plugin
Send Documents to Solr:
curl -X POST -H "Content-Type: application/json" \
  "http://localhost:8983/solr/{your-collection-name}/update?commit=true" \
  --data-binary '
[
{"name":"example 0", "vector":"0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33"},
{"name":"example 1", "vector":"0|3.54 1|0.4 2|4.16 3|4.88 4|4.28 5|4.25"},
{"name":"example 2", "vector":"0|1.11 1|0.6 2|1.47 3|1.99 4|2.91 5|1.01"},
{"name":"example 3", "vector":"0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9"},
{"name":"example 4", "vector":"0|4.01 1|3.69 2|2 3|4.36 4|1.09 5|0.1"},
{"name":"example 5", "vector":"0|0.64 1|3.95 2|1.03 3|1.65 4|0.99 5|0.09"}
]'
Solr Vector Scoring Plugin
Request:
Response:
http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector
vector="0.1,4.75,0.3,1.2,0.7,4.0"}
{ "responseHeader":{ "status":0, "QTime":1},
"response":{ "numFound":6,"start":0,"maxScore":0.99984086,
"docs":[
{ "name":["example 3"], "vector":["0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9 "],
"score":0.99984086},
{ "name":["example 0"], "vector":["0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33 "], "score":0.7693964},
{ "name":["example 5"], "vector":["0|0.64 1|3.95 2|1.03 3|1.65 4|0.99 5|0.09 "], "score":0.76322395},
{ "name":["example 4"], "vector":["0|4.01 1|3.69 2|2 3|4.36 4|1.09 5|0.1 "], "score":0.5328145},
{ "name":["example 1"], "vector":["0|3.54 1|0.4 2|4.16 3|4.88 4|4.28 5|4.25 "], "score":0.48513117},
{ "name":["example 2"], "vector":["0|1.11 1|0.6 2|1.47 3|1.99 4|2.91 5|1.01 "], "score":0.44909418}]
}}
Solr Vector Scoring Plugin
Option 4:
Solr Vector Scoring + LSH Plugin
Send Documents to Solr:
Solr Vector Scoring + LSH Plugin
curl -X POST -H "Content-Type: application/json" \
  "http://localhost:8983/solr/{your-collection-name}/update?update.chain=LSH&commit=true" \
  --data-binary '
[
{"id":"1", "vector":"1.55,3.53,2.3,0.7,3.44,2.33"},
{"id":"2", "vector":"3.54,0.4,4.16,4.88,4.28,4.25"}
]'
http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector
vector="1.55,3.53,2.3,0.7,3.44,2.33" lsh="true"
reRankDocs="5"}&fl=name,score,vector,_vector_,_lsh_hash_
Request:
Response:
Solr Vector Scoring + LSH Plugin
{
"responseHeader":{ "status":0, "QTime":8 },
"response":{"numFound":1,"start":0,"maxScore":36.65736,
"docs":[
{ "id": "1", "vector":"1.55,3.53,2.3,0.7,3.44,2.33",
"_vector_":"/z/GZmZAYeuFQBMzMz8zMzNAXCj2QBUeuA==",
"_lsh_hash_":["0_8", "1_35", "2_7", "3_10", "4_2", "5_35", "6_16", "7_30", "8_27", "9_12", "10_7",
"11_32", "12_48", "13_36", "14_10", "15_7", "16_42", "17_5", "18_3", "19_2", "20_1",
"21_0", "22_24", "23_18", "24_42", "25_31", "26_35", "27_8", "28_1", "29_24", "30_47",
"31_14", "32_22", "33_39", "34_0", "35_34", "36_34", "37_39", "38_27", "39_27",
"40_45", "41_10", "42_21", "43_34", "44_41", "45_9", "46_31", "47_0", "48_4", "49_43"],
"score":36.65736}
] } }
Option 5 (Work in Progress):
First-class Vector Fields in Lucene/Solr
Now In Progress
ANN Benchmarks
(Approximate Nearest Neighbor)
https://github.com/erikbern/ann-benchmarks
Vector Encoders
• Take queries, documents, sentences, paragraphs, etc. and
transform them into vectors.
• Usually leverage deep learning, which can discover rich language
usage rules and map them to combinations of features in the
vector
• Popular Libraries:
• BERT
• ELMo
• Universal Sentence Encoder
• Word2Vec
• Sentence2Vec
• GloVe
• fastText
• many more …
Vector Encoders
Query Type Likely Outcome
Obscure keyword combinations
Q. (software OR hardware) AND enginee*
• Keyword search succeeds
• Vector Search fails
Natural Language Queries
Q. Can my wife drive on my insurance?
• Keyword search might get
lucky, but probably fails
• Vector Search succeeds
Fuzzy Language Queries
Q. famous french tower
• Keyword search mismatch
yields poor results
• Vector Search succeeds
Structured Relationship Queries
Q. popular bbq near Activate
• Keyword search fails
• Vector search fails
• Need a Knowledge Graph!
Keyword Search vs. Vector Search
Giant Graph of Relationships...
Trey Grainger works for Lucidworks.
He spoke at the Activate 2019
conference.
#Activate19
(Activate) was held in Washington, DC
September 9-12, 2019.
Trey got his master's degree from
Georgia Tech.
Trey’s Voicemail
Semantic Knowledge Graph
id: 1
job_title: Software Engineer
desc: software engineer at a
great company
skills: .Net, C#, java
id: 2
job_title: Registered Nurse
desc: a registered nurse at
hospital doing hard work
skills: oncology, phlebotomy
id: 3
job_title: Java Developer
desc: a software engineer or a
java engineer doing work
skills: java, scala, hibernate
field doc term
desc
1
a
at
company
engineer
great
software
2
a
at
doing
hard
hospital
nurse
registered
work
3
a
doing
engineer
java
or
software
work
job_title 1
Software
Engineer
… … …
Terms-Docs Inverted Index | Docs-Terms Forward Index | Documents
Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. “The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016.
Knowledge
Graph
field term postings
list
doc pos
desc
a
1 4
2 1
3 1, 5
at
1 3
2 4
company 1 6
doing
2 6
3 8
engineer
1 2
3 3, 7
great 1 5
hard 2 7
hospital 2 5
java 3 6
nurse 2 3
or 3 4
registered 2 2
software
1 1
3 2
work
2 8
3 9
job_title java developer 3 1
… … … …
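Both index structures can be derived from the three example documents in a few lines. The sketch below covers only the desc field, tokenizes on whitespace, and counts positions from 1 to match the postings table; it illustrates the data structures rather than Lucene's actual on-disk format.

```python
from collections import defaultdict

# The desc field of the three example documents.
DOCS = {
    1: "software engineer at a great company",
    2: "a registered nurse at hospital doing hard work",
    3: "a software engineer or a java engineer doing work",
}

def build_indexes(docs):
    """Return (forward, inverted): doc -> terms, and term -> {doc: [positions]}."""
    forward = {}
    inverted = defaultdict(dict)
    for doc_id, text in docs.items():
        terms = text.lower().split()
        forward[doc_id] = sorted(set(terms))
        for position, term in enumerate(terms, start=1):
            inverted[term].setdefault(doc_id, []).append(position)
    return forward, inverted

forward, inverted = build_indexes(DOCS)
print(forward[1])
print(inverted["engineer"])

# Traversal in both directions is what makes the index usable as a graph:
# term -> documents (inverted index), then documents -> terms (forward index).
related = set()
for doc_id in inverted["java"]:
    related.update(forward[doc_id])
print(sorted(related))
```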
Related term vector (for query concept expansion)
http://localhost:8983/solr/stack-exchange-health/skg
Disambiguation by Category Example
Meaning 1: Restaurant => bbq, brisket, ribs, pork, …
Meaning 2: Outdoor Equipment => bbq, grill, charcoal, propane, …
Example Query:
Demo!
Demo Data
Places (also includes geonames database)
Entities (includes search commands)
Text Content
[ Web crawl of restaurant and product reviews sites ]
Solr Knowledge Graph Traversal Query
"bbq",
Why this Semantic Nuance Matters
popular barbeque near Activate
(popular same as "good", "top", "best")
Hotels near Haystack EU
hotels near popular BBQ in Berlin
BBQ near airports near Berlin
hotels near movie theaters in Berlin …
Other Knowledge Graph Search examples:
Keyword
Search
Knowledge Graph
User Intent
Personalized
Search
Semantic
Search
Domain-aware
Matching
Dimensions of User Intent
Content
Understanding
Domain
Understanding
Collaborative
Recommendations User
Understanding
News search: popularity and freshness drive relevance
Restaurant search: geographical proximity and price range are critical
Ecommerce: likelihood of a purchase is key
Movie search: more popular titles are generally more relevant
Job search: category of job, salary range, and geographical proximity matter
The right ranking algorithm is domain and context-dependent
Example Combining Content + Domain + User Context
News website:
/select?
fq={!cache=false v=$keywords}&
q= {!func}scale(query($keywords),0,25)
{!func}scale(geodist(),0,25)
{!func}recip(rord(publicationDate),1,25,0)
{!func}scale(popularity,0,25)&
keywords="fall festival"&
sfield=location&
pt=33.748,-84.391
25%
25%
25%
25%
*Example from chapter 16 of Solr in Action
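The idea of giving each signal an equal 25% share can be sketched outside Solr. In the example below every signal is rescaled to the range 0 to 25 and summed, so the total is out of 100. The three articles and their values are invented, the scale helper only imitates min-max scaling over the candidate set, and distance is negated so that closer articles score higher, which is an assumption about the intended behavior.

```python
def scale(values, low=0.0, high=25.0):
    """Min-max rescale a list of numbers into [low, high]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [high for _ in values]
    return [low + (v - vmin) * (high - low) / (vmax - vmin) for v in values]

# Hypothetical candidates: (title, keyword score, distance in km,
# recency rank where 1 is newest, popularity).
articles = [
    ("fall festival downtown", 8.0, 2.0, 1, 50),
    ("fall festival statewide guide", 9.0, 120.0, 3, 90),
    ("old festival recap", 4.0, 5.0, 2, 10),
]

keyword = scale([a[1] for a in articles])
proximity = scale([-a[2] for a in articles])  # negated: nearer is better
freshness = [25.0 / a[3] for a in articles]   # same shape as recip(x, 1, 25, 0)
popularity = scale([a[4] for a in articles])

totals = [
    (articles[n][0], keyword[n] + proximity[n] + freshness[n] + popularity[n])
    for n in range(len(articles))
]
ranked = sorted(totals, key=lambda pair: pair[1], reverse=True)
for title, total in ranked:
    print(f"{total:6.2f} {title}")
```

Changing the upper bound passed to scale for one signal changes its share of the total, which is the knob the next slides propose learning automatically.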
But how do we figure out the right
balance of weights?
Learning to Rank
User
Searches
User
Sees
Results
User
takes an
action
Users’ actions
inform system
improvements
User    Query    Re…
Alonzo  ipad     do…, do…, do…
Elena   printer  do…, do…, do…
Ming    ipad     do…, do…, do…
…       …        …
User Action Document
Alonzo click doc22
Elena click doc17
Ming click doc12
Alonzo purchase doc22
Ming click doc22
Ming purchase doc22
Elena click doc2
… … …
Feature Weight
title_match_all_terms 15.25
exact_phrase_match 10
signal_boost 9.5
content_age 9.2
user_geo_distance 6.5
personalization_cat_1 2.8
doc_popularity 2.75
… …
ipad ⌕
Initial Results:
1) doc1
2) doc2
3) doc3
Build Ranking Classifier
(from Implicit Relevance Judgements)
Final Results:
1) doc3
2) doc1
3) doc2
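Once a model has been trained, applying it at query time amounts to re-scoring the initial results. The sketch below assumes a simple linear model: the weights are the ones listed in the feature table, while the per-document feature values are invented so that the outcome mirrors the slide. Training the weights from click and purchase signals is the hard part and is not shown.

```python
# Learned weights, copied from the feature table on the slide.
WEIGHTS = {
    "title_match_all_terms": 15.25,
    "exact_phrase_match": 10,
    "signal_boost": 9.5,
    "content_age": 9.2,
    "user_geo_distance": 6.5,
    "personalization_cat_1": 2.8,
    "doc_popularity": 2.75,
}

# Hypothetical feature values for the query "ipad"; missing features count as 0.
FEATURES = {
    "doc1": {"title_match_all_terms": 1, "signal_boost": 0.2, "doc_popularity": 0.5},
    "doc2": {"signal_boost": 0.1, "content_age": 0.5, "doc_popularity": 0.3},
    "doc3": {"title_match_all_terms": 1, "exact_phrase_match": 1,
             "signal_boost": 0.9, "doc_popularity": 0.8},
}

def model_score(features, weights):
    """Linear ranking model: weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())

def rerank(initial_results, features, weights):
    """Re-order the first-pass results by model score, best first."""
    return sorted(initial_results,
                  key=lambda doc: model_score(features[doc], weights),
                  reverse=True)

initial = ["doc1", "doc2", "doc3"]
final = rerank(initial, FEATURES, WEIGHTS)
print(final)
```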
Facet,
Topic &
Cluster
Query Rule
Matching
Natural
Language
Machine
Learning
Boosted
Results
Signals
Content
Index
System Generated
Human Generated
Application Generated
Solution
Data
We operationalize AI for the
largest businesses on the planet.
Questions?
Trey Grainger
trey@lucidworks.com
@treygrainger
Other presentations:
http://www.treygrainger.com
40% Discount code: ctwhay19
http://aiPoweredSearch.com
http://solrinaction.com
Books:
Thank You!

More Related Content

PDF
Thought Vectors and Knowledge Graphs in AI-powered Search
PDF
The Next Generation of AI-powered Search
PDF
The Python Cheat Sheet for the Busy Marketer
PPTX
BrightonSEO March 2021 | Dan Taylor, Image Entity Tags
PDF
A beginner's guide to machine learning for SEOs - WTSFest 2022
PPTX
How to Build a Semantic Search System
PDF
[우리가 데이터를 쓰는 법] 좋다는 건 알겠는데 좀 써보고 싶소. 데이터! - 넘버웍스 하용호 대표
PDF
성장을 좋아하는 사람이, 성장하고 싶은 사람에게
Thought Vectors and Knowledge Graphs in AI-powered Search
The Next Generation of AI-powered Search
The Python Cheat Sheet for the Busy Marketer
BrightonSEO March 2021 | Dan Taylor, Image Entity Tags
A beginner's guide to machine learning for SEOs - WTSFest 2022
How to Build a Semantic Search System
[우리가 데이터를 쓰는 법] 좋다는 건 알겠는데 좀 써보고 싶소. 데이터! - 넘버웍스 하용호 대표
성장을 좋아하는 사람이, 성장하고 싶은 사람에게

What's hot (20)

PDF
Antifragility in Digital Marketing
PPTX
Entity Seo Mastery
PDF
Duck Duck Go
PDF
Visualising Data with Code
PDF
MOBILITY X DATA : 모빌리티 산업의 도전 과제
PDF
Log File Analysis
PDF
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
PDF
Little Big Data #1. 바닥부터 시작하는 데이터 인프라
PDF
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
PPTX
BrightonSEO - Exploring Cognitive Load (1).pptx
PPTX
검색 서비스 간략 교육
PDF
The complete guide to X-raying LinkedIn for Sourcing
PDF
Automating Google Lighthouse
PPTX
Opinion-based Article Ranking for Information Retrieval Systems: Factoids and...
PDF
데이터 분석가는 어떤 SKILLSET을 가져야 하는가? - 데이터 분석가 되기
PPTX
GREE 流!AWS をお得に使う方法
PDF
Semantic Search Engine: Semantic Search and Query Parsing with Phrases and En...
PPTX
Python for SEO
PDF
アサヒのデータ活用基盤を支えるデータ仮想化技術
Antifragility in Digital Marketing
Entity Seo Mastery
Duck Duck Go
Visualising Data with Code
MOBILITY X DATA : 모빌리티 산업의 도전 과제
Log File Analysis
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Little Big Data #1. 바닥부터 시작하는 데이터 인프라
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
BrightonSEO - Exploring Cognitive Load (1).pptx
검색 서비스 간략 교육
The complete guide to X-raying LinkedIn for Sourcing
Automating Google Lighthouse
Opinion-based Article Ranking for Information Retrieval Systems: Factoids and...
데이터 분석가는 어떤 SKILLSET을 가져야 하는가? - 데이터 분석가 되기
GREE 流!AWS をお得に使う方法
Semantic Search Engine: Semantic Search and Query Parsing with Phrases and En...
Python for SEO
アサヒのデータ活用基盤を支えるデータ仮想化技術
Ad

Similar to Balancing the Dimensions of User Intent (20)

PDF
AI, Search, and the Disruption of Knowledge Management
PPTX
The Next Generation of AI-Powered Search
PDF
Scaling Recommendations, Semantic Search, & Data Analytics with solr
PDF
SDSC18 and DSATL Meetup March 2018
PDF
Beyond User Research
PDF
Everything You Wish You Knew About Search
PPTX
Measuring Impact: Towards a data citation metric
KEY
8 Information Architecture Better Practices
PDF
Before Code: How to Plan & Visualize Your Project
PDF
SEO Do's and Dont's - Search in 2018
PDF
A Journey With Microsoft Cognitive Services II
PDF
Using sharepoint to solve business problems #spsnairobi2014
PPTX
Crowd Sourced Reflected Intelligence for Solr and Hadoop
KEY
Adaptable Information Workshop slides
PPTX
Webinar: Scaling MongoDB
PDF
Reflected intelligence evolving self-learning data systems
PPTX
Ordering the chaos: Creating websites with imperfect data
PDF
Open Source Needs Design
PPTX
Navigating the Mess of a Shared drive Migration to SharePoint
KEY
Search Analytics for Content Strategists
AI, Search, and the Disruption of Knowledge Management
The Next Generation of AI-Powered Search
Scaling Recommendations, Semantic Search, & Data Analytics with solr
SDSC18 and DSATL Meetup March 2018
Beyond User Research
Everything You Wish You Knew About Search
Measuring Impact: Towards a data citation metric
8 Information Architecture Better Practices
Before Code: How to Plan & Visualize Your Project
SEO Do's and Dont's - Search in 2018
A Journey With Microsoft Cognitive Services II
Using sharepoint to solve business problems #spsnairobi2014
Crowd Sourced Reflected Intelligence for Solr and Hadoop
Adaptable Information Workshop slides
Webinar: Scaling MongoDB
Reflected intelligence evolving self-learning data systems
Ordering the chaos: Creating websites with imperfect data
Open Source Needs Design
Navigating the Mess of a Shared drive Migration to SharePoint
Search Analytics for Content Strategists
Ad

More from Trey Grainger (20)

PDF
Reflected Intelligence: Real world AI in Digital Transformation
PDF
Natural Language Search with Knowledge Graphs (Chicago Meetup)
PDF
Natural Language Search with Knowledge Graphs (Activate 2019)
PDF
Measuring Relevance in the Negative Space
PDF
Natural Language Search with Knowledge Graphs (Haystack 2019)
PDF
The Future of Search and AI
PPTX
The Relevance of the Apache Solr Semantic Knowledge Graph
PPTX
Searching for Meaning
PPTX
The Intent Algorithms of Search & Recommendation Engines
PPTX
The Apache Solr Semantic Knowledge Graph
PPTX
Building Search & Recommendation Engines
PPTX
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
PPTX
Self-learned Relevancy with Apache Solr
PPTX
The Apache Solr Smart Data Ecosystem
PPTX
South Big Data Hub: Text Data Analysis Panel
PPTX
The Semantic Knowledge Graph
PPTX
Reflected Intelligence: Lucene/Solr as a self-learning data system
PPTX
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
PPTX
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
PDF
Semantic & Multilingual Strategies in Lucene/Solr
Reflected Intelligence: Real world AI in Digital Transformation
Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Activate 2019)
Measuring Relevance in the Negative Space
Natural Language Search with Knowledge Graphs (Haystack 2019)
The Future of Search and AI
The Relevance of the Apache Solr Semantic Knowledge Graph
Searching for Meaning
The Intent Algorithms of Search & Recommendation Engines
The Apache Solr Semantic Knowledge Graph
Building Search & Recommendation Engines
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Self-learned Relevancy with Apache Solr
The Apache Solr Smart Data Ecosystem
South Big Data Hub: Text Data Analysis Panel
The Semantic Knowledge Graph
Reflected Intelligence: Lucene/Solr as a self-learning data system
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Semantic & Multilingual Strategies in Lucene/Solr

Balancing the Dimensions of User Intent

  • 1. Trey Grainger Chief Algorithms Officer Balancing the Dimensions of User Intent October 28, 2019
  • 2. Trey Grainger Chief Algorithms Officer • Previously: SVP of Engineering @ Lucidworks; Director of Engineering @ CareerBuilder • Georgia Tech – MBA, Management of Technology • Furman University – BA, Computer Science, Business, & Philosophy • Stanford University – Information Retrieval & Web Search Other fun projects: • Co-author of Solr in Action, plus numerous research publications • Advisor to Presearch, the decentralized search engine • Lucene / Solr contributor About Me
  • 3. • About Lucidworks • What is AI-powered Search? • The Dimensions of User Intent • Content Understanding: • Keyword Search • User Understanding: • Collaborative Recommendations • Content Understanding + User Understanding: • Personalized Search • Domain Understanding: • Knowledge Graphs • Domain Understanding + User Understanding: • Domain-aware Matching • Content Understanding + Domain Understanding: • Semantic Search • Balancing Approaches: • Keyword vs. Vector vs. Knowledge Graph Search • Vector Search • Knowledge Graph Search • Combining it all together Agenda
  • 4. Who are we? 300+ CUSTOMERS ACROSS THE FORTUNE 1000 400+EMPLOYEES OFFICES IN San Francisco, CA (HQ) Raleigh-Durham, NC Cambridge, UK Bangalore, India Hong Kong The Search & AI Conference COMPANY BEHIND D E V E L O P M E N T, H O S T I N G , & S U P P O R T
  • 5. Proudly built with open-source tech at its core: Apache Solr & Apache Spark Personalizes search with applied machine learning Proven on the world’s biggest information systems
  • 7. http://aiPoweredSearch.com ... is my new book! (Haystack discount code: ctwhay19)
  • 10. AI-powered Search Question / Answer Systems Virtual Assistants • Signals Boosting Models • Learning to Rank • Semantic Search • Collaborative Filtering • Personalized Search • Content Clustering • NLP / Entity Resolution • Semantic Knowledge Graphs • Document Classification • etc. • Neural Search • Word Embeddings • Vector Search • Image / Voice Search • etc. • Question / Answer Systems • Virtual Assistants • Chatbots • Rules-based Relevancy • etc.
  • 11. We have a big toolbox - great!
  • 12. But how do we properly apply those tools?
  • 13. Dimensions of User Intent Content Understanding Domain Understanding User Understanding User Intent
  • 14. Keyword Search Dimensions of User Intent Content Understanding Domain Understanding User Understanding User Intent
  • 15. Matching queries to documents
      Query: /solr/collection/select/?q=apache solr
      Term     Documents
      …        …
      apache   doc1, doc3, doc4, doc5
      …
      lucene   doc2, doc4, doc6
      …        …
      solr     doc1, doc3, doc4, doc7, doc8
      …        …
      Venn diagram: "apache" only: doc5; "solr" only: doc7, doc8; "apache" and "solr": doc1, doc3, doc4
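The lookup on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than how Solr implements matching: the `inverted_index` dictionary copies the three postings lists shown on the slide, and the `match` helper and its `operator` argument are hypothetical names.

```python
# Toy inverted index copied from the slide (term -> set of document ids).
inverted_index = {
    "apache": {"doc1", "doc3", "doc4", "doc5"},
    "lucene": {"doc2", "doc4", "doc6"},
    "solr": {"doc1", "doc3", "doc4", "doc7", "doc8"},
}


def match(query, operator="OR"):
    """Return sorted ids of documents matching the whitespace-separated terms."""
    postings = [inverted_index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return []
    # AND intersects the postings lists, OR unions them.
    combine = set.intersection if operator == "AND" else set.union
    return sorted(combine(*postings))


print(match("apache solr", "AND"))  # documents containing both terms
print(match("apache solr", "OR"))   # documents containing either term
```

The AND result corresponds to the overlap of the two circles in the Venn diagram, and the OR result to their union.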
  • 16. BM25 (Relevance Scoring between Query and Documents)
      $$\mathrm{Score}(q, d) = \sum_{t \,\in\, q} \mathrm{idf}(t) \cdot \frac{\mathrm{tf}(t \text{ in } d) \cdot (k + 1)}{\mathrm{tf}(t \text{ in } d) + k \cdot \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)}$$
      Where: t = term; d = document; q = query; i = index
      $\mathrm{tf}(t \text{ in } d) = \mathrm{numTermOccurrencesInDocument}^{1/2}$
      $\mathrm{idf}(t) = 1 + \log\left(\mathrm{numDocs} / (\mathrm{docFreq} + 1)\right)$
      $|d| = \sum_{t \,\in\, d} 1$
      $\mathrm{avgdl} = \left(\sum_{d \,\in\, i} |d|\right) / \left(\sum_{d \,\in\, i} 1\right)$
      k = Free parameter. Usually ~1.2 to 2.0. Increases term frequency saturation point.
      b = Free parameter. Usually ~0.75. Increases impact of document normalization.
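To make the formula concrete, here is a small Python sketch that follows the slide's definitions term by term, including the square-root tf and the `1 + log(...)` idf. It is not Lucene's implementation; the `bm25_score` function and the three-document `corpus` are hypothetical and only serve to exercise the equation.

```python
import math


def bm25_score(query_terms, doc_terms, corpus, k=1.2, b=0.75):
    """Score one tokenized document against a query using the slide's formula."""
    num_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / num_docs  # average document length
    score = 0.0
    for term in query_terms:
        occurrences = doc_terms.count(term)
        if occurrences == 0:
            continue  # a term that is absent contributes nothing
        tf = math.sqrt(occurrences)  # tf as defined on the slide
        doc_freq = sum(1 for d in corpus if term in d)
        idf = 1 + math.log(num_docs / (doc_freq + 1))
        length_norm = 1 - b + b * len(doc_terms) / avgdl
        score += idf * (tf * (k + 1)) / (tf + k * length_norm)
    return score


# Hypothetical toy corpus of tokenized documents.
corpus = [
    ["apache", "solr", "search"],
    ["apache", "lucene", "library", "for", "search", "indexing"],
    ["solr", "in", "action"],
]
for doc in corpus:
    print(round(bm25_score(["apache", "solr"], doc, corpus), 3), doc)
```

With these inputs the document containing both query terms scores highest, and of the two single-match documents the shorter one scores higher, which is the effect of the `b` length normalization.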
  • 17. ipad
  • 18. Keyword Search Dimensions of User Intent Content Understanding Domain Understanding Collaborative Recommendations User Understanding User Intent
  • 20. Collaborative Filtering (Recommendations)
      User Searches → User Sees Results → User takes an action → Users’ actions inform system improvements
      ipad ⌕
      User    Query    Results
      Alonzo  ipad     doc10, doc22, doc12, …
      Elena   printer  doc84, doc2, doc17, …
      Ming    ipad     doc10, doc22, doc12, …
      …       …        …
      User    Action    Document
      Alonzo  click     doc22
      Elena   click     doc17
      Ming    click     doc12
      Alonzo  purchase  doc22
      Ming    click     doc22
      Ming    purchase  doc12
      Elena   click     doc2
      …       …         …
      User    Item   Weight
      Alonzo  doc22  1.0
      Alonzo  doc12  0.4
      …       …      …
      Ming    doc12  0.9
      Ming    doc22  0.6
      …       …      …
      Matrix Factorization
      Recommendations for Alonzo:
      • doc22: “iPad Pro”
      • doc12: “Kindle Fire”
      …
  • 21. Recommendations (User-Item, Item-Item, Query-Item)
      Matrix Factorization
      User    Item   Weight
      Alonzo  doc22  1.0
      Alonzo  doc12  0.4
      …       …      …
      Ming    doc12  0.9
      Ming    doc22  0.6
      …       …      …
      Recommendations for Alonzo:
      • doc22: “iPad Pro”
      • doc12: “Kindle Fire”
      …
      Item   Item   Weight
      doc22  doc22  1.0
      doc22  doc12  0.85
      …      …      …
      doc12  doc12  1.0
      doc12  doc22  0.83
      …      …      …
      Recommendations for Doc22:
      • doc22: “iPad Pro”
      • doc12: “Kindle Fire”
      …
      Query   Item   Weight
      ipad    doc22  0.98
      ipad    doc12  0.6
      …       …      …
      kindle  doc12  0.96
      apple   doc22  0.90
      …       …      …
      Recommendations for “ipad”:
      • doc22: “iPad Pro”
      • doc12: “Kindle Fire”
      …
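As a rough illustration of how user-item weights can drive both user-item and item-item recommendations, the Python sketch below uses only the four weights visible on the slide. It swaps in plain cosine similarity between item columns instead of the matrix factorization named on the slide, so its item-item number is not expected to match the slide's 0.85 and 0.83; the `recommend` and `item_similarity` helpers are hypothetical names.

```python
import math

# User-item weights copied from the slide (only the rows that are shown).
user_item = {
    "Alonzo": {"doc22": 1.0, "doc12": 0.4},
    "Ming": {"doc12": 0.9, "doc22": 0.6},
}


def recommend(user):
    """User-item: rank a user's items by weight, highest first."""
    weights = user_item[user]
    return sorted(weights, key=weights.get, reverse=True)


def item_vector(item):
    """Weights each user gave to one item, in a fixed user order."""
    return [user_item[user].get(item, 0.0) for user in sorted(user_item)]


def item_similarity(item_a, item_b):
    """Item-item: cosine similarity between two item columns."""
    a, b = item_vector(item_a), item_vector(item_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


print(recommend("Alonzo"))
print(round(item_similarity("doc22", "doc12"), 2))
```

A production system would learn latent factors from many more signals; the point here is only that the same weight table can be read by row (per user) or by column (per item).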
  • 22. ipad
  • 24. ipad
  • 25. Keyword Search Knowledge Graph Dimensions of User Intent Content Understanding Domain Understanding Collaborative Recommendations User Understanding User Intent
  • 26. What is a Knowledge Graph? (vs. Ontology vs. Taxonomy vs. Synonyms, etc.)
  • 27. Overly Simplistic Definitions
      Alternative Labels: Substitute words with identical meanings [ CTO => Chief Technology Officer; specialise => specialize ]
      Synonyms List: Provides substitute words that can be used to represent the same or very similar things [ human => homo sapiens, mankind; food => sustenance, meal ]
      Taxonomy: Classifies things into Categories [ john is Human; Human is Mammal; Mammal is Animal ]
      Ontology: Defines relationships between types of things [ animal eats food; human is animal ]
      Knowledge Graph: Instantiation of an Ontology (contains the things that are related) [ john is human; john eats food ]
      A Knowledge Graph subsumes the other types.
  • 30. Keyword Search Knowledge Graph User Intent Personalized Search Dimensions of User Intent Content Understanding Domain Understanding Collaborative Recommendations User Understanding
  • 32. Keyword Search (Completely User-specified) User-guided Recommendations (Mostly driven by user profile, partially user-specified) Traditional Recommendations (Completely driven by user behavior) Keyword Search (Completely User-specified) Personalized Queries (Mostly user-specified, partially driven by user profile)
  • 33. Personalized Queries (Mostly user-specified, partially driven by user profile) Keyword Search (Completely User-specified) User-guided Recommendations (Mostly driven by user profile, partially user-specified) Traditional Recommendations (Completely driven by user behavior) Personalized Search
  • 37. Regular Search Results: Personalized Search Results: User:
  • 38. Nice - personalization is awesome! Let’s roll it out everywhere!
  • 40. Keyword Search Knowledge Graph User Intent Personalized Search Domain-aware Matching Dimensions of User Intent Content Understanding Domain Understanding Collaborative Recommendations User Understanding
  • 41. Knowledge Graph (Understanding conceptual and logical relationships between domain-specific entities) Collaborative Recommendations (Completely driven by user behavior)
  • 42. Personas / User Profiles (User attributes and preferences in knowledge graph) Multimodal Recommendations (Recommendations combining collaborative filtering plus user-based profile attribute matching/ranking) Knowledge Graph (Understanding conceptual and logical relationships between domain-specific entities) Collaborative Recommendations (Completely driven by user behavior)
  • 43. Personas / User Profiles (User attributes and preferences in knowledge graph) Multimodal Recommendations (Recommendations combining collaborative filtering plus user-based profile attribute matching/ranking) Knowledge Graph (Understanding conceptual and logical relationships between domain-specific entities) Collaborative Recommendations (Completely driven by user behavior) Domain-aware Matching
  • 48. Multimodal Recommendations
      Jane is a nurse educator in Boston seeking between $40K and $60K
      She has interacted with the same content as the following users: u99,u1,u50,u2311,u253,u70,u99
      http://localhost:8983/solr/jobs/select/?
        fl=jobtitle,city,state,salary&
        q=(
          jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10
        )
        AND (
          (city:"Boston" AND state:"MA")^15 OR state:"MA"
        )
        AND _val_:"map(salary, 40000, 60000,10, 0)"
        AND similar_users:{!terms}u99,u1,u50,u2311,u253,u70,u99
      *Example derived from chapter 16 of Solr in Action
  • 49. Keyword Search Knowledge Graph User Intent Personalized Search Semantic Search Domain-aware Matching Dimensions of User Intent Content Understanding Domain Understanding Collaborative Recommendations User Understanding
  • 50. Keyword Search (Finding and Ranking Keyword) Knowledge Graph (Understanding conceptual and logical relationships between domain-specific entities)
  • 51. Language Understanding (Understanding syntax and query structure) Keyword Search (Finding and Ranking Keyword) Terminology Understanding (Understanding domain-specific terms and conceptual meaning) Knowledge Graph (Understanding conceptual and logical relationships between domain-specific entities)
  • 52. Language Understanding (Understanding syntax and query structure) Terminology Understanding (Understanding domain-specific terms and conceptual meaning) Keyword Search (Finding and Ranking Keyword) Knowledge Graph (Understanding conceptual and logical relationships between domain-specific entities) Semantic Search
  • 55. Sentence Embeddings: [ 2, 3, 2, 4, 2, 1, 5, 3 ] [ 5, 3, 2, 3, 4, 0, 3, 4 ] . . . Document Embedding: [ 4, 1, 4, 2, 1, 2, 4, 3 ] Word Embeddings: [ 5, 1, 3, 4, 2, 1, 5, 3 ] [ 4, 1, 3, 0, 1, 1, 4, 2 ] . . . Paragraph Embeddings: [ 5, 1, 4, 1, 0, 2, 4, 0 ] [ 1, 1, 4, 2, 1, 0, 0, 0 ] . . . Thought Vectors
  • 56. Single Term Searches (as a Vector)
      Each query is looked up as an exact term in the inverted index.
      query       apple caffeine cheese coffee drink donut food juice pizza tea water … term N
      cappuccino  0 0 0 0 0 0 0 0 0 0 0 ...
      apple       1 0 0 0 0 0 0 0 0 0 0 ...
      juice       0 0 0 0 0 0 0 1 0 0 0 ...
      cheese      0 0 1 0 0 0 0 0 0 0 0 ...
      pizza       0 0 0 0 0 0 0 0 1 0 0 ...
      donut       0 0 0 0 0 1 0 0 0 0 0 ...
      green       0 0 0 0 0 0 0 0 0 0 0 ...
      tea         0 0 0 0 0 0 0 0 0 1 0 ...
      bread       0 0 1 0 0 0 0 0 0 0 0 ...
      sticks      0 0 0 0 0 0 0 0 0 0 0 ...
  • 57. Multi-term Query Vectors
      query          vector
      apple          1 0 0 0 0 0 0 0 0 0 0 ...
      + juice        0 0 0 0 0 0 0 1 0 0 0 ...
      Combined Vector:
      apple juice    1 0 0 0 0 0 0 1 0 0 0 ...
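The combination on this slide is an element-wise sum of one-hot term vectors. A minimal Python sketch, assuming the eleven-term vocabulary shown in the column headers of the previous slide (the real index has many more terms) and hypothetical helper names:

```python
# Vocabulary taken from the column headers on the slides (truncated at "water").
VOCABULARY = ["apple", "caffeine", "cheese", "coffee", "drink", "donut",
              "food", "juice", "pizza", "tea", "water"]


def term_vector(term):
    """One-hot vector: 1 in the position of the term, 0 everywhere else."""
    return [1 if term == entry else 0 for entry in VOCABULARY]


def query_vector(query):
    """Sum the one-hot vectors of every term in the query."""
    combined = [0] * len(VOCABULARY)
    for term in query.lower().split():
        combined = [c + t for c, t in zip(combined, term_vector(term))]
    return combined


print(query_vector("apple juice"))
print(query_vector("cappuccino"))  # not in the vocabulary, so all zeros
```

A term outside the vocabulary, such as "cappuccino" here, produces an all-zero vector, which is why an exact-term lookup cannot relate it to "coffee" or "caffeine".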
  • 58. Multi-term Searches
      Each query is looked up as exact terms in the inverted index.
      query                apple caffeine cheese coffee drink donut food juice pizza tea water … term N
      latte                0 0 0 0 0 0 0 0 0 0 0 ...
      cappuccino           0 0 0 0 0 0 0 0 0 0 0 ...
      apple juice          1 0 0 0 0 0 0 1 0 0 0 ...
      cheese pizza         0 0 1 0 0 0 0 0 1 0 0 ...
      donut                0 0 0 0 0 1 0 0 0 0 0 ...
      soda                 0 0 0 0 0 0 0 0 0 0 0 ...
      green tea            0 0 0 0 0 0 0 0 0 1 0 ...
      water                0 0 0 0 0 0 0 0 0 0 1 ...
      cheese bread sticks  0 0 1 0 0 0 0 0 0 0 0 ...
      cinnamon sticks      0 0 0 0 0 0 0 0 0 0 0 ...
  • 59. Dimensionality Reduction
      phrase                 food  drink  dairy  bread  caffeine  sweet  calories  healthy
      apple juice            0     5      0      0      0         4      4         3
      cappuccino             0     5      3      0      4         1      2         3
      cheese bread sticks    5     0      4      5      0         1      4         2
      cheese pizza           5     0      4      4      0         1      5         2
      cinnamon bread sticks  5     0      1      5      0         3      4         2
      donut                  5     0      1      5      0         4      5         1
      green tea              0     5      0      0      2         1      1         5
      latte                  0     5      4      0      4         1      3         3
      soda                   0     5      0      0      3         5      5         0
      water                  0     5      0      0      0         0      0         5
  • 60. Vector Similarity Scoring
      Phrase:                 Vector:
      apple juice:            [ 0, 5, 0, 0, 0, 4, 4, 3 ]
      cappuccino:             [ 0, 5, 3, 0, 4, 1, 2, 3 ]
      cheese bread sticks:    [ 5, 0, 4, 5, 0, 1, 4, 2 ]
      cheese pizza:           [ 5, 0, 4, 4, 0, 1, 5, 2 ]
      cinnamon bread sticks:  [ 5, 0, 1, 5, 0, 3, 4, 2 ]
      donut:                  [ 5, 0, 1, 5, 0, 4, 5, 1 ]
      green tea:              [ 0, 5, 0, 0, 2, 1, 1, 5 ]
      latte:                  [ 0, 5, 4, 0, 4, 1, 3, 3 ]
      soda:                   [ 0, 5, 0, 0, 3, 5, 5, 0 ]
      water:                  [ 0, 5, 0, 0, 0, 0, 0, 5 ]
      Vector Similarity (a, b): $\cos(\theta) = \frac{a \cdot b}{|a| \times |b|}$
      Vector Similarity Scores:
      Ranked Results: Green Tea
        0.94 water
        0.85 cappuccino
        0.80 latte
        0.78 apple juice
        0.60 soda
        … …
        0.19 donut
      Ranked Results: Cheese Pizza
        0.99 cheese bread sticks
        0.91 cinnamon bread sticks
        0.89 donut
        0.47 latte
        0.46 apple juice
        … …
        0.19 water
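The cosine formula above can be checked directly against the vectors from the Dimensionality Reduction table. The Python sketch below is illustrative only; the `VECTORS` dictionary copies that table, and `cosine` and `rank` are hypothetical helper names.

```python
import math

# Dimensions: food, drink, dairy, bread, caffeine, sweet, calories, healthy.
VECTORS = {
    "apple juice": [0, 5, 0, 0, 0, 4, 4, 3],
    "cappuccino": [0, 5, 3, 0, 4, 1, 2, 3],
    "cheese bread sticks": [5, 0, 4, 5, 0, 1, 4, 2],
    "cheese pizza": [5, 0, 4, 4, 0, 1, 5, 2],
    "cinnamon bread sticks": [5, 0, 1, 5, 0, 3, 4, 2],
    "donut": [5, 0, 1, 5, 0, 4, 5, 1],
    "green tea": [0, 5, 0, 0, 2, 1, 1, 5],
    "latte": [0, 5, 4, 0, 4, 1, 3, 3],
    "soda": [0, 5, 0, 0, 3, 5, 5, 0],
    "water": [0, 5, 0, 0, 0, 0, 0, 5],
}


def cosine(a, b):
    """cos(theta) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank(query_name):
    """Score every other phrase against the query phrase, best match first."""
    query = VECTORS[query_name]
    scored = [(cosine(query, vec), name)
              for name, vec in VECTORS.items() if name != query_name]
    return sorted(scored, reverse=True)


for score, name in rank("cheese pizza")[:3]:
    print(f"{score:.2f} {name}")
```

Note that every document vector is scored for every query, which is the cost that the approximate nearest neighbor techniques later in the deck try to avoid.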
  • 61. Vector Similarity Scores: Performance Considerations
      Problem: Vector Scoring is Slow
      • Unlike keyword search, which looks up pre-indexed answers to queries, Vector Search must instead calculate similarities between the query vector and every document’s vectors to determine best matches, which is slow at scale.
      Solution: Quantized Vectors
      • “Quantization” is the process of mapping vector features to discrete values.
      • Creating “tokens” that map to a similar vector space makes it possible to match on those tokens and perform an ANN (Approximate Nearest Neighbor) search.
      • This converts vector scoring into a search problem (term lookup and scoring), which is fast again, at the expense of some recall and scoring accuracy.
      Recommended Approach: Quantized Vector Search + Vector Similarity Reranking
      • Combine the best of both worlds by running an initial ANN search on a quantized vector representation, and then re-rank the top-N results using full Vector similarity scoring.
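The two-stage approach recommended above can be sketched as follows. This is one possible quantization scheme (random-hyperplane locality-sensitive hashing), chosen here for brevity and not necessarily the scheme used by any of the plugins in this deck; the `LshIndex` class, its parameters, and the token format are all hypothetical.

```python
import math
import random
from collections import defaultdict

# A subset of the vectors from the Dimensionality Reduction slide.
DOCS = {
    "donut": [5, 0, 1, 5, 0, 4, 5, 1],
    "cheese pizza": [5, 0, 4, 4, 0, 1, 5, 2],
    "cheese bread sticks": [5, 0, 4, 5, 0, 1, 4, 2],
    "green tea": [0, 5, 0, 0, 2, 1, 1, 5],
    "latte": [0, 5, 4, 0, 4, 1, 3, 3],
    "water": [0, 5, 0, 0, 0, 0, 0, 5],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class LshIndex:
    """Hypothetical random-hyperplane LSH index with several hash tables."""

    def __init__(self, dims, tables=8, bits=3, seed=7):
        rng = random.Random(seed)
        self.planes = [[[rng.gauss(0, 1) for _ in range(dims)] for _ in range(bits)]
                       for _ in range(tables)]
        self.buckets = defaultdict(set)  # token -> names of docs with that token
        self.vectors = {}

    def tokens(self, vector):
        # One token per table: the table number plus one sign bit per hyperplane.
        for number, table in enumerate(self.planes):
            bits = "".join(
                "1" if sum(p * v for p, v in zip(plane, vector)) >= 0 else "0"
                for plane in table)
            yield f"{number}_{bits}"

    def add(self, name, vector):
        self.vectors[name] = vector
        for token in self.tokens(vector):
            self.buckets[token].add(name)

    def search(self, query, top_n=3):
        # Stage 1: cheap token lookup gives an approximate candidate set.
        candidates = set()
        for token in self.tokens(query):
            candidates |= self.buckets.get(token, set())
        # Stage 2: exact cosine re-ranking of the candidates only.
        scored = [(cosine(query, self.vectors[name]), name) for name in candidates]
        return sorted(scored, reverse=True)[:top_n]


index = LshIndex(dims=8)
for name, vector in DOCS.items():
    index.add(name, vector)
print(index.search(DOCS["donut"]))
```

Stage 1 is an ordinary term lookup, so it can be served by an inverted index; documents that share no token with the query are never scored, which is where the loss of recall mentioned on the slide comes from.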
  • 63. Option 1: Streaming Expressions
  • 64. Send Documents to Solr: Streaming Expressions
      curl -X POST -H "Content-Type: application/json" \
        "http://localhost:8983/solr/food/update?commit=true" \
        --data-binary '
      [
       {"id": "1", "name_s":"donut", "vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]},
       {"id": "2", "name_s":"apple juice", "vector_fs":[1.0,5.0,0.0,0.0,0.0,4.0,4.0,3.0]},
       {"id": "3", "name_s":"cappuccino", "vector_fs":[0.0,5.0,3.0,0.0,4.0,1.0,2.0,3.0]},
       {"id": "4", "name_s":"cheese pizza", "vector_fs":[5.0,0.0,4.0,4.0,0.0,1.0,5.0,2.0]},
       {"id": "5", "name_s":"green tea", "vector_fs":[0.0,5.0,0.0,0.0,2.0,1.0,1.0,5.0]},
       {"id": "6", "name_s":"latte", "vector_fs":[0.0,5.0,4.0,0.0,4.0,1.0,3.0,3.0]},
       {"id": "7", "name_s":"soda", "vector_fs":[0.0,5.0,0.0,0.0,3.0,5.0,5.0,0.0]},
       {"id": "8", "name_s":"cheese bread sticks", "vector_fs":[5.0,0.0,4.0,5.0,0.0,1.0,4.0,2.0]},
       {"id": "9", "name_s":"water", "vector_fs":[0.0,5.0,0.0,0.0,0.0,0.0,0.0,5.0]},
       {"id": "10", "name_s":"cinnamon bread sticks", "vector_fs":[5.0,0.0,1.0,5.0,0.0,3.0,4.0,2.0]}
      ]'
  • 65. 8983
  • 67. Streaming Expressions Query Parser
      Request:
      http://localhost:8983/solr/food/select?q=*:*&fl=id,name_s&
      fq={!streaming_expression}top(
        select(
          search(food, q="*:*", fl="id,vector_fs", sort="id asc"),
          cosineSimilarity(vector_fs, array(5.1,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as cos,
          id),
        n=5,
        sort="cos desc"
      )
      Response:
      {
        "responseHeader":{ … },
        "response":{"numFound":5,"start":0,"docs":[
          { "name_s":"donut", "id":"1"},
          { "name_s":"apple juice", "id":"2"},
          { "name_s":"cheese pizza", "id":"4"},
          { "name_s":"cheese bread sticks", "id":"8"},
          { "name_s":"cinnamon bread sticks", "id":"10"}]
      }}
  • 68. Option 3: Solr Vector Scoring Plugin
  • 69. Solr Vector Scoring Plugin
      Send Documents to Solr:
      curl -X POST -H "Content-Type: application/json" \
        "http://localhost:8983/solr/{your-collection-name}/update?commit=true" \
        --data-binary '
      [
       {"name":"example 0", "vector":"0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33"},
       {"name":"example 1", "vector":"0|3.54 1|0.4 2|4.16 3|4.88 4|4.28 5|4.25"},
       {"name":"example 2", "vector":"0|1.11 1|0.6 2|1.47 3|1.99 4|2.91 5|1.01"},
       {"name":"example 3", "vector":"0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9"},
       {"name":"example 4", "vector":"0|4.01 1|3.69 2|2 3|4.36 4|1.09 5|0.1"},
       {"name":"example 5", "vector":"0|0.64 1|3.95 2|1.03 3|1.65 4|0.99 5|0.09"}
      ]'
  • 70. Solr Vector Scoring Plugin
      Request:
      http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector vector="0.1,4.75,0.3,1.2,0.7,4.0"}
      Response:
      {
        "responseHeader":{ "status":0, "QTime":1},
        "response":{ "numFound":6,"start":0,"maxScore":0.99984086, "docs":[
          { "name":["example 3"], "vector":["0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9 "], "score":0.99984086},
          { "name":["example 0"], "vector":["0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33 "], "score":0.7693964},
          { "name":["example 5"], "vector":["0|0.64 1|3.95 2|1.03 3|1.65 4|0.99 5|0.09 "], "score":0.76322395},
          { "name":["example 4"], "vector":["0|4.01 1|3.69 2|2 3|4.36 4|1.09 5|0.1 "], "score":0.5328145},
          { "name":["example 1"], "vector":["0|3.54 1|0.4 2|4.16 3|4.88 4|4.28 5|4.25 "], "score":0.48513117},
          { "name":["example 2"], "vector":["0|1.11 1|0.6 2|1.47 3|1.99 4|2.91 5|1.01 "], "score":0.44909418}]
      }}
  • 71. Option 4: Solr Vector Scoring + LSH Plugin
  • 72. Solr Vector Scoring + LSH Plugin
      Send Documents to Solr:
      curl -X POST -H "Content-Type: application/json" \
        "http://localhost:8983/solr/{your-collection-name}/update?update.chain=LSH&commit=true" \
        --data-binary '
      [
       {"id":"1", "vector":"1.55,3.53,2.3,0.7,3.44,2.33"},
       {"id":"2", "vector":"3.54,0.4,4.16,4.88,4.28,4.25"}
      ]'
      Request:
      http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector vector="1.55,3.53,2.3,0.7,3.44,2.33" lsh="true" reRankDocs="5"}&fl=name,score,vector,_vector_,_lsh_hash_
  • 73. Solr Vector Scoring + LSH Plugin
      Request:
      http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector vector="1.55,3.53,2.3,0.7,3.44,2.33" lsh="true" reRankDocs="5"}&fl=name,score,vector,_vector_,_lsh_hash_
      Response:
      {
        "responseHeader":{ "status":0, "QTime":8},
        "response":{"numFound":1,"start":0,"maxScore":36.65736, "docs":[
          { "id": "1",
            "vector":"1.55,3.53,2.3,0.7,3.44,2.33",
            "_vector_":"/z/GZmZAYeuFQBMzMz8zMzNAXCj2QBUeuA==",
            "_lsh_hash_":["0_8", "1_35", "2_7", "3_10", "4_2", "5_35", "6_16", "7_30", "8_27", "9_12", "10_7", "11_32", "12_48", "13_36", "14_10", "15_7", "16_42", "17_5", "18_3", "19_2", "20_1", "21_0", "22_24", "23_18", "24_42", "25_31", "26_35", "27_8", "28_1", "29_24", "30_47", "31_14", "32_22", "33_39", "34_0", "35_34", "36_34", "37_39", "38_27", "39_27", "40_45", "41_10", "42_21", "43_34", "44_41", "45_9", "46_31", "47_0", "48_4", "49_43"],
            "score":36.65736}
        ]}
      }
  • 74. Option 5 (Work in Progress): First-class Vector Fields in Lucene/Solr
  • 76. ANN Benchmarks (Approximate Nearest Neighbor) https://github.com/erikbern/ann-benchmarks
  • 78. Vector Encoders
      • Take queries, documents, sentences, paragraphs, etc. and transform them into vectors.
      • Usually leverage deep learning, which can discover rich language usage rules and map them to combinations of features in the vector
      • Popular Libraries:
        • BERT
        • ELMo
        • Universal Sentence Encoder
        • Word2Vec
        • Sentence2Vec
        • GloVe
        • fastText
        • many more …
  • 80. Keyword Search vs. Vector Search
      Query Type: Obscure keyword combinations
        Q. (software OR hardware) AND enginee*
        Likely Outcome:
        • Keyword search succeeds
        • Vector Search fails
      Query Type: Natural Language Queries
        Q. Can my wife drive on my insurance?
        Likely Outcome:
        • Keyword search might get lucky, but probably fails
        • Vector Search succeeds
      Query Type: Fuzzy Language Queries
        Q. famous french tower
        Likely Outcome:
        • Keyword search mismatch yields poor results
        • Vector Search succeeds
      Query Type: Structured Relationship Queries
        Q. popular bbq near Activate
        Likely Outcome:
        • Keyword search fails
        • Vector search fails
        • Need a Knowledge Graph!
  • 81. Giant Graph of Relationships... Trey Grainger works for Lucidworks. He spoke at the Activate 2019 conference. #Activate19 (Activate) was held in Washington, DC September 9-12, 2019. Trey got his master's degree from Georgia Tech. Trey’s Voicemail
  • 83. Knowledge Graph
      Documents:
        id: 1
        job_title: Software Engineer
        desc: software engineer at a great company
        skills: .Net, C#, java
        id: 2
        job_title: Registered Nurse
        desc: a registered nurse at hospital doing hard work
        skills: oncology, phlebotomy
        id: 3
        job_title: Java Developer
        desc: a software engineer or a java engineer doing work
        skills: java, scala, hibernate
      Docs-Terms Forward Index:
        field      doc  term
        desc       1    a at company engineer great software
                   2    a at doing hard hospital nurse registered work
                   3    a doing engineer java or software work
        job_title  1    Software Engineer
        …          …    …
      Terms-Docs Inverted Index:
        field      term            postings list (doc: pos)
        desc       a               1: 4;  2: 1;  3: 1, 5
                   at              1: 3;  2: 4
                   company         1: 6
                   doing           2: 6;  3: 8
                   engineer        1: 2;  3: 3, 7
                   great           1: 5
                   hard            2: 7
                   hospital        2: 5
                   java            3: 6
                   nurse           2: 3
                   or              3: 4
                   registered      2: 2
                   software        1: 1;  3: 2
                   work            2: 10;  3: 9
        job_title  java developer  3: 1
        …          …               …
      Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. “The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016.
  • 84. Related term vector (for query concept expansion) http://localhost:8983/solr/stack-exchange-health/skg
  • 85. Disambiguation by Category Example Meaning 1: Restaurant => bbq, brisket, ribs, pork, … Meaning 2: Outdoor Equipment => bbq, grill, charcoal, propane, …
  • 93. Demo!
  • 94. Demo Data Places (also includes geonames database) Entities (includes search commands) Text Content [ Web crawl of restaurant and product reviews sites ]
  • 95. Solr Knowledge Graph Traversal Query "bbq",
  • 97. Why this Semantic Nuance Matters
  • 98. Other Knowledge Graph Search examples:
      popular barbeque near Activate (popular same as "good", "top", "best")
      Hotels near Haystack EU
      hotels near popular BBQ in Berlin
      BBQ near airports near Berlin
      hotels near movie theaters in Berlin
      …
  • 99. Keyword Search Knowledge Graph User Intent Personalized Search Semantic Search Domain-aware Matching Dimensions of User Intent Content Understanding Domain Understanding Collaborative Recommendations User Understanding
  • 100. The right ranking algorithm is domain and context-dependent
      News Search: popularity and freshness drive relevance
      Restaurant Search: geographical proximity and price range are critical
      Ecommerce: likelihood of a purchase is key
      Movie search: More popular titles are generally more relevant
      Job search: category of job, salary range, and geographical proximity matter
  • 101. Example Combining Content + Domain + User Context
      News website (each of the four function clauses is labeled 25%):
      /select?
        fq={!cache=false v=$keywords}&
        q=
          {!func}scale(query($keywords),0,25)
          {!func}scale(geodist(),0,25)
          {!func}recip(rord(publicationDate),1,25,0)
          {!func}scale(popularity,0,25)&
        keywords="fall festival"&
        sfield=location&
        pt=33.748,-84.391
      *Example from chapter 16 of Solr in Action
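The equal 25% weighting can be illustrated outside Solr with a short Python sketch. The `scale` and `recip` helpers mirror the behavior of the Solr functions of the same name (min-max scaling into a target range, and a / (m·x + b)); everything else is assumed for illustration: the per-document feature values are invented, and `nearness` is a hypothetical stand-in where larger means closer, which simplifies the geodist() clause.

```python
def scale(values, low, high):
    """Min-max scale a list of numbers into [low, high], like Solr's scale()."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        return [low for _ in values]
    return [low + (v - smallest) / (largest - smallest) * (high - low) for v in values]


def recip(x, m, a, b):
    """Reciprocal function a / (m * x + b), like Solr's recip()."""
    return a / (m * x + b)


# Hypothetical documents: keyword score, nearness, recency rank (1 = newest), popularity.
docs = {
    "festival-guide": {"keyword": 8.0, "nearness": 9.0, "recency_rank": 1, "popularity": 700},
    "old-review": {"keyword": 6.0, "nearness": 2.0, "recency_rank": 3, "popularity": 100},
    "city-news": {"keyword": 1.0, "nearness": 5.0, "recency_rank": 2, "popularity": 400},
}

names = list(docs)
keyword = scale([docs[n]["keyword"] for n in names], 0, 25)
nearness = scale([docs[n]["nearness"] for n in names], 0, 25)
popularity = scale([docs[n]["popularity"] for n in names], 0, 25)
recency = [recip(docs[n]["recency_rank"], 1, 25, 0) for n in names]

# Each component is capped at 25, so the combined score is at most 100.
totals = {n: keyword[i] + nearness[i] + recency[i] + popularity[i]
          for i, n in enumerate(names)}
for name in sorted(totals, key=totals.get, reverse=True):
    print(f"{totals[name]:6.2f} {name}")
```

Because every component is mapped into the same 0 to 25 range before summing, no single signal can dominate the others, which is the point of the 25% labels on the slide.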
  • 102. But how do we figure out the right balance of weights?
  • 103. Learning to Rank
      User Searches → User Sees Results → User takes an action → Users’ actions inform system improvements
      ipad ⌕
      User    Query    Re
      Alonzo  ipad     do do do
      Elena   printer  do do do
      Ming    ipad     do do do
      …       …        …
      User    Action    Document
      Alonzo  click     doc22
      Elena   click     doc17
      Ming    click     doc12
      Alonzo  purchase  doc22
      Ming    click     doc22
      Ming    purchase  doc22
      Elena   click     doc2
      …       …         …
      Build Ranking Classifier (from Implicit Relevance Judgements)
      Feature                Weight
      title_match_all_terms  15.25
      exact_phrase_match     10
      signal_boost           9.5
      content_age            9.2
      user_geo_distance      6.5
      personalization_cat_1  2.8
      doc_popularity         2.75
      …                      …
      Initial Results: 1) doc1 2) doc2 3) doc3
      Final Results: 1) doc3 2) doc1 3) doc2
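To show how learned feature weights re-order an initial result list, here is a deliberately simplified Python sketch. It treats the model as a plain weighted sum, which is an assumption (the slide only says a ranking classifier is built), uses three of the feature weights from the slide, and invents the per-document feature values so that the outcome matches the slide's before and after ordering.

```python
# Three of the learned feature weights shown on the slide.
WEIGHTS = {
    "title_match_all_terms": 15.25,
    "exact_phrase_match": 10,
    "signal_boost": 9.5,
}

# Hypothetical feature values for the three documents in the initial results.
FEATURES = {
    "doc1": {"title_match_all_terms": 1, "exact_phrase_match": 0, "signal_boost": 0.2},
    "doc2": {"title_match_all_terms": 0, "exact_phrase_match": 1, "signal_boost": 0.1},
    "doc3": {"title_match_all_terms": 1, "exact_phrase_match": 1, "signal_boost": 0.9},
}


def model_score(doc_id):
    """Weighted sum of the document's feature values (assumed linear model)."""
    return sum(WEIGHTS[name] * value for name, value in FEATURES[doc_id].items())


def rerank(initial_results):
    """Re-order an initial result list by descending model score."""
    return sorted(initial_results, key=model_score, reverse=True)


initial = ["doc1", "doc2", "doc3"]
print(rerank(initial))
```

In practice the weights come from training on the click and purchase signals shown in the tables, and the model is usually applied only to the top results of a cheaper first-pass query.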
  • 105. We operationalize AI for the largest businesses on the planet.
  • 107. Thank You! Trey Grainger trey@lucidworks.com @treygrainger Other presentations: http://www.treygrainger.com Books: http://aiPoweredSearch.com http://solrinaction.com 40% Discount code: ctwhay19