Online Testing Learning to Rank
with Solr Interleaving

Alessandro Benedetti, Director
24th March 2022
‣ Born in Tarquinia (an ancient Etruscan city in Italy)
‣ R&D Software Engineer
‣ Director
‣ Master in Computer Science
‣ PC member for ECIR, SIGIR and Desires
‣ Apache Lucene/Solr PMC member/committer
‣ Elasticsearch expert
‣ Passionate about semantic, NLP and machine learning technologies
‣ Beach Volleyball player and Snowboarder
Who We Are
Alessandro Benedetti
‣ Headquarter in London/distributed
‣ Open Source Enthusiasts
‣ Apache Lucene/Solr experts
‣ Elasticsearch experts
‣ Community Contributors
‣ Active Researchers
‣ Hot Trends: Neural Search,
Natural Language Processing,
Learning To Rank,
Document Similarity,
Search Quality Evaluation,
Relevancy Tuning
SEArch SErvices
www.sease.io
Overview
Learning to Rank
Online Testing for Business
Interleaving
Apache Solr Implementation
DEMO
Learning from users' implicit/explicit feedback
To
Rank documents (sensu lato)
These types of models focus on the relative ordering of items rather than on an
individual label (classification) or score (regression), and are
categorized as Learning To Rank models.
What is it?
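Concretely, the training data pairs a relevance label with a feature vector per query/document sample. A hypothetical snippet in the widely used RankLib/SVMrank format (labels, query ids and feature values invented here purely for illustration):

4 qid:1 1:1.0 2:0.9 3:0.0 # doc_a
0 qid:1 1:0.2 2:0.1 3:1.0 # doc_b
2 qid:2 1:0.7 2:0.4 3:0.0 # doc_c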
• [sci-fi] A sentient system that learns by itself
“Machine Learning stands for that, doesn’t it?” Unknown
• [Integration] Easy to set up and tune -> it takes patience, time and multiple
experiments
• [Explainability] Easy to give a human-understandable explanation of why the model
operates in certain ways
What it is not
• Ranking is a central part of many information retrieval problems, such as document
retrieval, collaborative filtering, sentiment analysis, and online advertising.
Application
“Learning to rank is the application of machine
learning, typically supervised, semi-supervised or
reinforcement learning, in the construction of
ranking models for information retrieval
systems.” Wikipedia
Learning To Rank

[Diagram: the Learning To Rank feedback loop. The UI logs users' interactions, a Judgement Collector turns the logged interactions into training data, and training produces the ranking model.]
Overview
Learning to Rank
Online Testing for Business
Interleaving
Apache Solr Implementation
DEMO
There are several problems that are hard to detect with an offline evaluation:

An incorrect or imperfect test set yields model evaluation results that
don't reflect the real model improvements/regressions.
e.g.
We may get an extremely high evaluation metric offline, but only because
we designed the test improperly, even though the model is unfortunately not a
good fit. Typical test set design pitfalls:
one sample per query group
one relevance label for all the samples of a query group
which interactions are considered for the data set creation

It is hard to find a direct correlation between the offline
evaluation metrics and the parameters used for the online
model performance evaluation (e.g. revenues, click-through rate…).

Offline evaluation is based on generated relevance labels that don't always
reflect the real user need.

[Online] A Business Perspective
Using online testing brings many advantages:

The reliability of the results: we directly observe the user
behaviour.

The interpretability of the results: we directly observe the
impact of the model in terms of the online metrics the business
cares about.

The possibility to observe the model behaviour: we can see
how users interact with the model and figure out how to
improve it.

[Online] Business Advantages
Overview
Learning to Rank
Online Testing for Business
Interleaving
Apache Solr Implementation
DEMO
[Online] A/B testing

[Diagram: traffic is split 50%/50% between a control (model A) and a variation (model B), and their online metrics are compared, e.g. 20% vs 40% on the two groups.]

Interleaving

[Diagram: 100% of users receive a single result list that interleaves the rankings of Model A and Model B, with the top documents of both models blended into one list.]
[Online] Interleaving results estimator

Δ > 0 → winner A
Δ < 0 → winner B
Δ = 0 → tie
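A common way to compute such a statistic, sketched here from the interleaving literature (e.g. Chapelle et al., 2012) since the slide does not spell out the exact estimator, is:

Δ(A, B) = (wins_A + ½ · ties) / (wins_A + wins_B + ties) − ½

where wins_A counts the queries in which users clicked more documents contributed by model A than by model B (and symmetrically for wins_B).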
● It reduces the problem of users' variance due to their
separation into groups (group A and group B).
● It is more sensitive when comparing models.
● It requires less traffic.
● It requires less time to achieve reliable results.
● It doesn't necessarily expose a bad model to a sub-population
of users.
Interleaving Advantages
There are different types of interleaving:

Balanced Interleaving: alternate insertion, with one model
having the priority (decided at the beginning of the
interleaving process).

Balanced Interleaving
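A minimal sketch of the balanced insertion, assuming the classic formulation (Joachims, 2003); this is illustrative Java, not the Solr implementation:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of Balanced Interleaving, NOT Solr's code.
// 'priorityToA' is the priority decided at the beginning of the process.
public final class BalancedInterleavingSketch {

  public static List<String> interleave(List<String> a, List<String> b, boolean priorityToA) {
    List<String> interleaved = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    int ka = nextUnseen(a, 0, seen);
    int kb = nextUnseen(b, 0, seen);
    while (ka < a.size() && kb < b.size()) {
      // Take from the model whose next candidate sits higher in its own list;
      // the priority model wins ties.
      boolean pickA = ka < kb || (ka == kb && priorityToA);
      String doc = pickA ? a.get(ka) : b.get(kb);
      interleaved.add(doc);
      seen.add(doc);
      ka = nextUnseen(a, ka, seen);
      kb = nextUnseen(b, kb, seen);
    }
    return interleaved;
  }

  // Returns the index of the first document of 'ranking' not yet interleaved.
  private static int nextUnseen(List<String> ranking, int from, Set<String> seen) {
    while (from < ranking.size() && seen.contains(ranking.get(from))) {
      from++;
    }
    return from;
  }
}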
DRAWBACK
When comparing two very similar models:
Model A: lA = (a, b, c, d)
Model B: lB = (b, c, d, a)
the comparison phase will lead Model B to win more often than
Model A. This happens regardless of the model chosen as prior.
This drawback arises due to:
the way in which the evaluation of the results is done;
the fact that Model B ranks every document except a higher
than Model A does.

Balanced Interleaving
Team-Draft Interleaving: the method of team captains in
team matches: like two captains alternately picking players, the two
models take turns adding their highest-ranked document not yet
present in the interleaved list, and each document is tagged with
the model (team) that picked it.

Team-Draft Interleaving
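A minimal Java sketch of the team-captain idea, following Radlinski et al. (2008); illustrative only, not Solr's own class (which lives in org.apache.solr.ltr.interleaving.algorithms):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Illustrative sketch of Team-Draft Interleaving, NOT Solr's code.
public final class TeamDraftInterleavingSketch {

  private static final Random RND = new Random();

  public static List<String> interleave(List<String> rankedA, List<String> rankedB) {
    List<String> interleaved = new ArrayList<>();
    Set<String> picked = new HashSet<>();
    int picksA = 0;
    int picksB = 0;
    while (hasUnpicked(rankedA, picked) || hasUnpicked(rankedB, picked)) {
      // The "captain" with fewer picks goes next; ties are broken by a coin flip.
      boolean aTurn = picksA < picksB || (picksA == picksB && RND.nextBoolean());
      if (aTurn && hasUnpicked(rankedA, picked)) {
        picked.add(pickTop(rankedA, picked, interleaved));
        picksA++; // a full implementation also records that team A picked this doc
      } else if (hasUnpicked(rankedB, picked)) {
        picked.add(pickTop(rankedB, picked, interleaved));
        picksB++;
      } else { // B is exhausted, fall back to A
        picked.add(pickTop(rankedA, picked, interleaved));
        picksA++;
      }
    }
    return interleaved;
  }

  // Appends the highest-ranked not-yet-picked document of 'ranking' and returns it.
  private static String pickTop(List<String> ranking, Set<String> picked, List<String> out) {
    for (String doc : ranking) {
      if (!picked.contains(doc)) {
        out.add(doc);
        return doc;
      }
    }
    throw new IllegalStateException("no unpicked document left");
  }

  private static boolean hasUnpicked(List<String> ranking, Set<String> picked) {
    for (String doc : ranking) {
      if (!picked.contains(doc)) {
        return true;
      }
    }
    return false;
  }
}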
DRAWBACK
When comparing two very similar models:
Model A: lA = (a, b, c, d)
Model B: lB = (b, c, d, a)
Suppose c is the only relevant document.
With this approach we can obtain four different interleaved lists
(each document tagged with the team that picked it):
lI1 = (aA, bB, cA, dB)
lI2 = (bB, aA, cB, dA)
lI3 = (bB, aA, cA, dB)
lI4 = (aA, bB, cB, dA)
All of them put c at the same rank.
Tie!
But Model B should be chosen
as the best model!
[Online] Team-Draft Interleaving
Probabilistic Interleaving: relies on probability distributions.
Every document has a non-zero probability of being added to
the interleaved result list.

DRAWBACK
The use of probability distributions could lead to a worse user
experience: less relevant documents could be placed higher.

[Online] Probabilistic Interleaving
https://sease.io/2020/05/online-testing-for-learning-to-rank-interleaving.html
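A minimal Java sketch of the probabilistic approach, assuming the softmax-over-ranks distribution of Hofmann et al. (2011) with decay tau = 3; this algorithm is not implemented in Solr, so the sketch is purely educational:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Illustrative sketch of Probabilistic Interleaving, NOT Solr's code.
public final class ProbabilisticInterleavingSketch {

  private static final double TAU = 3.0; // rank decay used by Hofmann et al.
  private static final Random RND = new Random();

  public static List<String> interleave(List<String> a, List<String> b, int k) {
    List<String> interleaved = new ArrayList<>();
    Set<String> used = new HashSet<>();
    while (interleaved.size() < k) {
      // Flip a coin to pick the ranker, then sample one of its documents from a
      // softmax over ranks: p(doc) proportional to 1 / rank(doc)^tau, so every
      // document keeps a non-zero probability of entering the interleaved list.
      List<String> source = RND.nextBoolean() ? a : b;
      String doc = sample(source, used);
      if (doc == null) {
        doc = sample(source == a ? b : a, used); // this ranker is exhausted, try the other
      }
      if (doc == null) {
        break; // both rankers exhausted
      }
      interleaved.add(doc);
      used.add(doc);
    }
    return interleaved;
  }

  private static String sample(List<String> ranking, Set<String> used) {
    double[] weights = new double[ranking.size()];
    double total = 0.0;
    for (int rank = 0; rank < ranking.size(); rank++) {
      if (!used.contains(ranking.get(rank))) {
        weights[rank] = 1.0 / Math.pow(rank + 1, TAU);
        total += weights[rank];
      }
    }
    if (total == 0.0) {
      return null;
    }
    double u = RND.nextDouble() * total;
    for (int rank = 0; rank < ranking.size(); rank++) {
      u -= weights[rank];
      if (weights[rank] > 0.0 && u <= 0.0) {
        return ranking.get(rank);
      }
    }
    for (String doc : ranking) { // numerical fallback: first unused document
      if (!used.contains(doc)) {
        return doc;
      }
    }
    return null;
  }
}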
Overview
Learning to Rank
Online Testing for Business
Interleaving
Apache Solr Implementation
DEMO
• Include the required contrib JARs. Note that by default paths are relative to the Solr core, so they may need
adjusting to your configuration, or an explicit specification of $solr.install.dir.
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-ltr-\d.*\.jar" />
• Declaration of the ltr query parser.
<queryParser name="ltr" class="org.apache.solr.ltr.search.LTRQParserPlugin"/>
• Configuration of the feature values cache.
<cache name="QUERY_DOC_FV"
class="solr.search.LRUCache"
size="4096"
initialSize="2048"
autowarmCount="4096"
regenerator="solr.search.NoOpRegenerator" />
Minimum Requirements
• Declaration of the [features] transformer.
<transformer name="features"
class="org.apache.solr.ltr.response.transform.LTRFeatureLoggerTransformerFactory">
<str name="fvCacheName">QUERY_DOC_FV</str>
</transformer>
• Declaration of the [interleaving] transformer.
<transformer name="interleaving"
class="org.apache.solr.ltr.response.transform.LTRInterleavingTransformerFactory"/>
Minimum Requirements
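With the plugin configured, feature and model definitions (JSON) are uploaded to the collection's managed stores. A minimal example, assuming the techproducts collection and placeholder file paths:

curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' --data-binary "@/path/myFeatures.json" -H 'Content-type:application/json'

curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' --data-binary "@/path/myModel.json" -H 'Content-type:application/json'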
Implements the ranking function:

General form            | Class                                     | Specific examples
------------------------|-------------------------------------------|------------------------------
Linear                  | LinearModel                               | RankSVM, Pranking
Multiple Additive Trees | MultipleAdditiveTreesModel                | LambdaMART, Gradient Boosted
                        |                                           | Regression Trees (GBRT)
Neural Network          | NeuralNetworkModel                        | RankNet
(wrapper)               | DefaultWrapperModel                       | (not applicable)
(custom)                | (custom class extending AdapterModel)     | (not applicable)
(custom)                | (custom class extending LTRScoringModel)  | (not applicable)

Models
‣ Computes the scores using a dot product
https://lucene.apache.org/solr/8_8_0/solr-ltr/org/apache/solr/ltr/model/LinearModel.html
Example configuration:
{
"class" : "org.apache.solr.ltr.model.LinearModel",
"name" : "myModelName",
"features" : [
{ "name" : "userTextTitleMatch" },
{ "name" : "originalScore" },
{ "name" : "isBook" }
],
"params" : {
"weights" : {
"userTextTitleMatch" : 1.0,
"originalScore" : 0.5,
"isBook" : 0.1
}
}
}
Linear
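As a quick sanity check of the dot product (feature values invented here for illustration): a document with userTextTitleMatch = 1.0, originalScore = 20.0 and isBook = 1.0 would score 1.0 × 1.0 + 0.5 × 20.0 + 0.1 × 1.0 = 11.1 under the weights above.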
‣ Computes scores based on the summation of multiple weighted trees
https://lucene.apache.org/solr/8_8_0/solr-ltr/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.html
{
  "class" : "org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
  "name" : "multipleadditivetreesmodel",
  "features" : [
    { "name" : "userTextTitleMatch" },
    { "name" : "originalScore" }
  ],
  "params" : {
    "trees" : [
      {
        "weight" : "1",
        "root" : {
          "feature" : "userTextTitleMatch",
          "threshold" : "0.5",
          "left" : { "value" : "-100" },
          "right" : {
            "feature" : "originalScore",
            "threshold" : "10.0",
            "left" : { "value" : "50" },
            "right" : { "value" : "75" }
          }
        }
      },
      {
        "weight" : "2",
        "root" : { "value" : "-10" }
      }
    ]
  }
}
Multiple Additive Trees Model
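A hypothetical worked example (feature values invented, and assuming a node branches left when the feature value is at or below its threshold): a document with userTextTitleMatch = 1.0 and originalScore = 5.0 follows the first tree right (1.0 > 0.5) and then left (5.0 ≤ 10.0), scoring 50; the second tree is a single leaf scoring −10. The weighted sum is 1 × 50 + 2 × (−10) = 30.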
‣ Computes scores using a neural network.
https://lucene.apache.org/solr/8_8_0/solr-ltr/org/apache/solr/ltr/model/NeuralNetworkModel.html
{
"class" : "org.apache.solr.ltr.model.NeuralNetworkModel",
"name" : "rankNetModel",
"features" : [
{ "name" : "documentRecency" },
{ "name" : "isBook" },
{ "name" : "originalScore" }
],
"params" : {
"layers" : [
{
"matrix" : [ [ 1.0, 2.0, 3.0 ],
[ 4.0, 5.0, 6.0 ],
[ 7.0, 8.0, 9.0 ],
[ 10.0, 11.0, 12.0 ] ],
"bias" : [ 13.0, 14.0, 15.0, 16.0 ],
"activation" : "sigmoid"
},
{
"matrix" : [ [ 17.0, 18.0, 19.0, 20.0 ],
[ 21.0, 22.0, 23.0, 24.0 ] ],
"bias" : [ 25.0, 26.0 ],
"activation" : "relu"
},
{
"matrix" : [ [ 27.0, 28.0 ],
[ 29.0, 30.0 ] ],
"bias" : [ 31.0, 32.0 ],
"activation" : "leakyrelu"
},
{
"matrix" : [ [ 33.0, 34.0 ],
[ 35.0, 36.0 ] ],
"bias" : [ 37.0, 38.0 ],
"activation" : "tanh"
},
{
"matrix" : [ [ 39.0, 40.0 ] ],
"bias" : [ 41.0 ],
"activation" : "identity"
}
]
}
}
Neural Network
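In this configuration each layer computes activation(matrix × input + bias): the first matrix has 4 rows and 3 columns because the model consumes the 3 declared feature values and produces 4 hidden values, and the final 1×2 layer with the identity activation collapses the network output to the single ranking score.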
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr
model=myModel reRankDocs=100}&fl=id,score
To obtain the feature values computed during reranking, add [features] to the fl parameter, for
example:
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr
model=myModel reRankDocs=100}&fl=id,score,[features]
To rerank using external feature information:
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr
model=myEfiModel efi.text=test efi.preferredManufacturer=Apache
efi.fromMobile=0 efi.answer=13}&fl=id,cat,manu,score,
[features]
Reranking
http://localhost:8983/solr/books/query?q=subjects%3A(%22England%20--%20Fiction%22%20OR%20%22Mystery%20fiction%22)&rq=%7B!ltr%20model=linearModel1%20reRankDocs=100%20efi.favouriteSubject=%22Mystery%20fiction%22%20efi.fromMobile=1%20efi.age=25%20efi.userLanguage=%22en%22%7D&fl=id,title,subjects,downloads,score,[features]&debug=results
Hands On: Run a reranking query
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr
model=myModelA model=myModelB reRankDocs=100}&fl=id,score
To obtain the model that interleaving picked for a search result, computed during reranking,
add [interleaving] to the fl parameter, for example:
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr
model=myModelA model=myModelB reRankDocs=100}&fl=id,score,
[interleaving]
Reranking with Interleaving
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"test",
"fl":"id,score,[interleaving]",
"rq":"{!ltr model=myModelA model=myModelB
reRankDocs=100}"}},
"response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[
{
"id":"GB18030TEST",
"score":1.0005897,
"[interleaving]":"myModelB"},
{
"id":"UTF8TEST",
"score":0.79656565,
"[interleaving]":"myModelA"}]
}}
Reranking with Interleaving
http://localhost:8983/solr/books/query?q=subjects%3A(%22England%20--%20Fiction%22%20OR%20%22Mystery%20fiction%22)&rq={!ltr%20model=linearModel1%20model=_OriginalRanking_%20reRankDocs=100%20efi.favouriteSubject=%22Mystery%20fiction%22%20efi.fromMobile=1%20efi.age=25%20efi.userLanguage=%22en%22}&fl=id,title,subjects,downloads,score,[features],[interleaving]&debug=results
"response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs
":[
{
"id":"GB18030TEST",
"score":1.0005897,
"[interleaving]":"_OriginalRanking_"},
{
"id":"UTF8TEST",
"score":0.79656565,
"[interleaving]":"myModel"}]
}}
Interleaving with Original Score
Overview
Learning to Rank
Online Testing for Business
Interleaving
Apache Solr Implementation
DEMO
Future Works
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB
reRankDocs=100 interleavingAlgorithm=TeamDraft}&fl=id,score
Currently the only (and default) algorithm supported is 'TeamDraft'.
How to contribute
Do you want to contribute a new Interleaving Algorithm?
You just need to:
● implement the solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/Interleaving.java interface in a new class
● add the new algorithm to the package org.apache.solr.ltr.interleaving.algorithms
● add a reference to the new algorithm in org.apache.solr.ltr.interleaving.Interleaving#getImplementation
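A hypothetical skeleton of such a contribution (the method signature below is an assumption based on the Solr 8.x LTR module; verify it against the actual Interleaving interface in the source tree):

package org.apache.solr.ltr.interleaving.algorithms;

import org.apache.lucene.search.ScoreDoc;
import org.apache.solr.ltr.interleaving.Interleaving;
import org.apache.solr.ltr.interleaving.InterleavingResult;

// Hypothetical sketch: the signature is assumed, check the real interface before coding.
public class MyCustomInterleaving implements Interleaving {

  @Override
  public InterleavingResult interleave(ScoreDoc[] rerankedA, ScoreDoc[] rerankedB) {
    // Build the blended list and record which model picked each document,
    // so the [interleaving] transformer can report it per search result.
    // TODO: implement the mixing policy.
    return null; // placeholder
  }
}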
Limitations
● [Distributed Search] Sharding is not supported
Thanks!