Hybrid Search with Apache Solr
Reciprocal Rank Fusion
Speaker: Alessandro Benedetti, Director @ Sease
BERLIN BUZZWORDS 2024 - 10/06/2024
‣ Born in Tarquinia (ancient Etruscan city in Italy)
‣ R&D Software Engineer
‣ Director
‣ Master's degree in Computer Science
‣ PC member for ECIR, SIGIR and Desires
‣ Apache Lucene/Solr PMC member/committer
‣ Elasticsearch/OpenSearch expert
‣ Passionate about Semantic search, NLP and Machine Learning technologies
‣ Beach Volleyball player and Snowboarder
ALESSANDRO BENEDETTI
WHO AM I?
‣ Headquarters in London, distributed team
‣ Open-source Enthusiasts
‣ Apache Lucene/Solr experts
‣ Elasticsearch/OpenSearch experts
‣ Community Contributors
‣ Active Researchers
Hot Trends :
● Large Language Models Applications
● Vector-based (Neural) Search
● Natural Language Processing
● Learning To Rank
● Document Similarity
● Search Quality Evaluation
● Relevance Tuning
SEArch SErvices
www.sease.io
Limitations of Vector-Based Search
Reciprocal Rank Fusion
APIs
Internals
Explainability
Overview
Vector-based (Neural) Search Workflow
Similarity between a Query and a Document is translated to distance in a vector space
Low Explainability
● High Dimensionality - vectors are long sequences of numerical values (768, 1536, …)
● Dimensions - in many cases each feature (element in the vector) has no clear semantics (slightly different from sparse vectors or explicit feature vectors)
● Values - it is not obvious how a single value impacts relevance (is higher better?)
● Similarity - to explain why a search result is retrieved in the top-K, the vector distance is the only information you have
Research is happening, but it's still an open problem
Lexical matches?
● Search users still expect lexical matches to happen
● You can't enforce that with vector-based search
"Why is the document with the keyword in the title not coming up?" cit.
Low Diversity
● Vector-based search just returns the top-K results ordered by vector similarity
● Unless you add more logic on top, you should expect low diversity by definition
Hybrid Search
● Mitigation of current vector-search problems - Is it here to stay?
● Combine traditional keyword-based (lexical) search with vector-based (neural)
search
● Retrieval of two sets of candidates:
○ one set of results coming from lexical matches with the query keywords
○ a set of results coming from the K-Nearest Neighbours search with the query
vector
● Ranking of the candidates
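As a minimal sketch of the two retrieval steps above, assuming a toy in-memory corpus: the lexical side ranks documents by how many query keywords they contain, and the vector side ranks them by cosine similarity to the query vector. The corpus, the embeddings and the helper names (`lexical_search`, `knn_search`) are invented for illustration and are not Solr APIs.

```python
import math

# Hypothetical toy corpus: id -> (text, embedding). All values are invented.
DOCS = {
    "id1": ("apache solr hybrid search", [0.9, 0.1, 0.0]),
    "id2": ("neural search with vectors", [0.7, 0.6, 0.1]),
    "id3": ("beach volleyball rules", [0.0, 0.2, 0.9]),
    "id4": ("solr vector search", [0.8, 0.5, 0.0]),
}


def lexical_search(keywords, top_k):
    """Rank documents by the number of query keywords they contain."""
    scored = []
    for doc_id, (text, _) in DOCS.items():
        matches = len(set(keywords) & set(text.split()))
        if matches > 0:  # only documents with a lexical match are candidates
            scored.append((doc_id, matches))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [doc_id for doc_id, _ in scored[:top_k]]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def knn_search(query_vector, top_k):
    """Rank all documents by cosine similarity to the query vector."""
    scored = [(doc_id, cosine(query_vector, vec)) for doc_id, (_, vec) in DOCS.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [doc_id for doc_id, _ in scored[:top_k]]


lexical_candidates = lexical_search(["solr", "search"], top_k=3)
vector_candidates = knn_search([0.8, 0.6, 0.0], top_k=3)
print(lexical_candidates)
print(vector_candidates)
```

The two lists overlap but are ordered differently, which is exactly the situation the ranking step has to resolve.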
Reciprocal Rank Fusion
● Multiple ranked lists -> unified result set.
● Reciprocal rank -> the inverse of the rank of a document in a ranked list of search results.
● Higher ranking -> higher contribution
● RRF was first introduced by Cormack et al. [1].
[1] Cormack, Gordon V. et al. "Reciprocal rank fusion outperforms Condorcet and individual rank learning methods." Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (2009)
Reciprocal Rank Fusion
score = 0.0
for q in queries:
    if d in result(q):
        score += 1.0 / (k + rank(result(q), d))
return score
Reciprocal Rank Fusion
● e.g. with k = 10

Ranked list 1:
1 id10
2 id7
3 id9
4 id5
5 id3

Ranked list 2:
1 id7
2 id5
3 id9
4 id4
5 id10

Combined ranking:
id7 (0.174 = 1/(10+2) + 1/(10+1))
id10 (0.158 = 1/(10+1) + 1/(10+5))
id5 (0.155 = 1/(10+4) + 1/(10+2))
id9 (0.154 = 1/(10+3) + 1/(10+3))
id4 (0.071 = 1/(10+4))
id3 (0.067 = 1/(10+5))
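The example above can be reproduced with a few lines of Python. This is a sketch of the general technique, not the Solr implementation: the function name `reciprocal_rank_fusion` is hypothetical, ranks are taken as 1-based positions in each list, and ties are broken by document id only to keep the output deterministic.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document ids; ranks are 1-based positions."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            # A document absent from a list simply gets no contribution from it.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


list_1 = ["id10", "id7", "id9", "id5", "id3"]
list_2 = ["id7", "id5", "id9", "id4", "id10"]

fused = reciprocal_rank_fusion([list_1, list_2], k=10)
for doc_id, score in fused:
    print(f"{doc_id} {score:.3f}")
```

Documents that appear in both lists (id7, id10, id5, id9) end up above those that appear in only one (id4, id3), even when the single appearance is at a comparable rank.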
Open Pull Request
https://github.com/apache/solr/pull/2489
Challenges
N.B. If you are not a full-time contributor, the Lucene/Solr split hits
hard!
JSON APIs
{
  "queries": {
    "lexical1": {
      "lucene": {
        "query": "id:(10^=2 OR 2^=1 OR 4^=0.5)"
      }
    },
    "lexical2": {
      "lucene": {
        "query": "id:(2^=2 OR 4^=1 OR 3^=0.5)"
      }
    }
  },
  "limit": 10,
  "fields": "[id,score]",
  "params": {
    "combiner": true,
    "combiner.upTo": 5
  }
}
Trivia: "queries" support was originally introduced to be used in faceting for domain/exclusion
https://issues.apache.org/jira/browse/SOLR-12490
JSON APIs
org.apache.solr.common.params.CombinerParams
String COMBINER = "combiner";
String COMBINER_ALGORITHM = COMBINER + ".algorithm";
String RECIPROCAl_RANK_FUSION = "rrf";
String COMBINER_UP_TO = COMBINER + ".upTo";
int COMBINER_UP_TO_DEFAULT = 100;
String COMBINER_RRF_K = COMBINER + "." + RECIPROCAl_RANK_FUSION + ".k";
int COMBINER_RRF_K_DEFAULT = 60; // from original paper
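A small sketch of how a client might assemble such a request body. The parameter names are taken from the `CombinerParams` constants above; the helper `build_combined_request` is hypothetical, the `fields` value is written as a JSON list here, and since this API lives in an open pull request it may change before any release.

```python
import json


def build_combined_request(queries, up_to=100, rrf_k=60, limit=10):
    """Build a JSON request body asking Solr to combine several queries.

    `queries` maps a query key to a query definition, as on the slide.
    Defaults mirror COMBINER_UP_TO_DEFAULT and COMBINER_RRF_K_DEFAULT.
    """
    if len(queries) < 2:
        raise ValueError("combining needs at least two queries")
    return {
        "queries": queries,
        "limit": limit,
        "fields": ["id", "score"],
        "params": {
            "combiner": True,
            "combiner.algorithm": "rrf",
            "combiner.upTo": up_to,
            "combiner.rrf.k": rrf_k,
        },
    }


body = build_combined_request(
    {
        "lexical1": {"lucene": {"query": "id:(10^=2 OR 2^=1 OR 4^=0.5)"}},
        "lexical2": {"lucene": {"query": "id:(2^=2 OR 4^=1 OR 3^=0.5)"}},
    },
    up_to=5,
)
print(json.dumps(body, indent=2))
```

Spelling out `combiner.algorithm` and `combiner.rrf.k` is optional in this sketch; it only makes the defaults visible in the request.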
JSON Request
solr/core/src/java/org/apache/solr/request/json/RequestUtil.java
● Just store the query keys as a Solr param, so that they can be accessed later
{
  "queries": {
    "lexical1": {
      …
    },
    "lexical2": {
      …
    }
  },
  …
}
Stored query keys: ["lexical1","lexical2"]
Query Component Prepare
Its role is to prepare the variables and structures to use at processing time
● get the queries to combine from params
○ save them (unparsed query string) in the response builder
● parse the queries
○ save them parsed in the response builder
○ save the query parsers (class name) in the response builder
org.apache.solr.handler.component.QueryComponent#prepare
Query Component Process
Its role is to combine the search results (without breaking single-query scenarios!)
● get the parsed queries to combine from response builder
○ retrieve the result sets (one for each query)
● combine the results
○ save the combined set in the response builder
org.apache.solr.handler.component.QueryComponent#process
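The prepare/process split described above can be pictured with a simplified Python analogue. This is not Solr code: `ResponseBuilder`, `prepare`, `process` and the injected `parse`, `search` and `combine` callables are hypothetical stand-ins that only mirror the flow on the slides, and the combiner used here is a trivial interleave rather than RRF, to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseBuilder:
    """Hypothetical stand-in for the state shared between prepare and process."""
    query_strings: dict = field(default_factory=dict)   # key -> unparsed query
    parsed_queries: dict = field(default_factory=dict)  # key -> parsed query
    results: list = field(default_factory=list)         # combined ranked ids


def prepare(params, rb, parse):
    """Read the queries to combine from params and store them, raw and parsed."""
    for key in params["queries_to_combine"]:
        raw = params["queries"][key]
        rb.query_strings[key] = raw
        rb.parsed_queries[key] = parse(raw)


def process(rb, search, combine):
    """Run each parsed query, then combine the per-query result lists."""
    ranked_lists = [search(query) for query in rb.parsed_queries.values()]
    if len(ranked_lists) == 1:
        rb.results = ranked_lists[0]  # single-query requests behave as before
    else:
        rb.results = combine(ranked_lists)


def interleave(ranked_lists):
    """Toy combiner: take one id from each list in turn, skipping duplicates."""
    seen, merged = set(), []
    for position in range(max(len(ranked) for ranked in ranked_lists)):
        for ranked in ranked_lists:
            if position < len(ranked) and ranked[position] not in seen:
                seen.add(ranked[position])
                merged.append(ranked[position])
    return merged


# Toy wiring: "parsing" splits on whitespace and "searching" returns the ids as is.
params = {
    "queries": {"lexical1": "10 2 4", "lexical2": "2 4 3"},
    "queries_to_combine": ["lexical1", "lexical2"],
}
rb = ResponseBuilder()
prepare(params, rb, parse=str.split)
process(rb, search=lambda parsed: parsed, combine=interleave)
print(rb.results)
```

Passing the combiner in as a callable mirrors the idea of abstracting the combination algorithm away from the component that drives the request.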
Queries Combiner
org.apache.solr.search.combining.QueriesCombiner
Its role is to combine result sets, abstracting the algorithm
● init the combined query results
○ keeping track of additional attributes such as Partial Results
● offer abstract methods for
○ combination
○ explainability
public abstract QueryResult combine(QueryResult[] rankedLists);

public abstract NamedList<Explanation> getExplanations(
    String[] queryKeys,
    List<Query> queries,
    List<DocList> resultsPerQuery,
    SolrIndexSearcher searcher,
    IndexSchema schema);
Reciprocal Rank Fusion
org.apache.solr.search.combining.ReciprocalRankFusion
Its role is to implement the specific algorithm
● combine the search results (>=2)
○ keeping track of additional attributes such as Partial Results
● explain them
○ how the combination happened, based on the rankings
○ how the original score was calculated
Limitations
This is a first step; the current limitations include:
● Field collapsing/grouping is not supported
● Distributed search support is included, but the combination happens per node, which may not be ideal.
Debug Component
We have two layers of explainability:
● Query Parsing
● Results explanation
○ Combiner - how the multiple result sets were combined
○ Original Scores - how the original scores were obtained
Query Debug
<lst name="debug">
  <lst name="queriesToCombine">
    <lst name="lexical">
      <str name="querystring">id:(10^=2 OR 2^=1)</str>
      <str name="queryparser">LuceneQParser</str>
      <str name="parsedquery">ConstantScore(id:10)^2.0 ConstantScore(id:2)^1.0</str>
      <str name="parsedquery_toString">(ConstantScore(id:10))^2.0 (ConstantScore(id:2))^1.0</str>
    </lst>
    <lst name="vector-based">
      <str name="querystring">[1.0, 2.0, 3.0, 4.0]</str>
      <str name="queryparser">KnnQParser</str>
      <str name="parsedquery">KnnFloatVectorQuery(KnnFloatVectorQuery:vector[1.0,...][5])</str>
      <str name="parsedquery_toString">KnnFloatVectorQuery:vector[1.0,...][5]</str>
    </lst>
  </lst>
Results Debug
'0.032522473 =
1/(60+1) + 1/(60+2) because its ranks were:
1 for query(lexical), 2 for query(lexical2)'
+
Original Scores
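A sketch of how an explanation string in this shape could be produced from the per-query ranks of one document. The function name `explain_rrf` and its signature are hypothetical; only the layout of the message follows the debug line above, and the score is rounded to six decimals here.

```python
def explain_rrf(ranks_by_query, k=60):
    """Build an RRF explanation from {query_key: 1-based rank} for one document."""
    score = sum(1.0 / (k + rank) for rank in ranks_by_query.values())
    terms = " + ".join(f"1/({k}+{rank})" for rank in ranks_by_query.values())
    ranks = ", ".join(
        f"{rank} for query({key})" for key, rank in ranks_by_query.items()
    )
    return f"{score:.6f} = {terms} because its ranks were: {ranks}"


explanation = explain_rrf({"lexical": 1, "lexical2": 2})
print(explanation)
```

Because RRF only depends on ranks, this part of the explanation stays readable even when one of the combined queries is vector-based; the original per-query score explanations can then be attached alongside it.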
Future Works
PR: https://github.com/apache/solr/pull/2489
- Distributed tests
- Documentation
------------------------
- better distributed support
- field collapsing/grouping
- Additional algorithms?
THANK YOU!
@seaseltd @sease-ltd @seaseltd @sease_ltd
