Lecture 09
Information Retrieval
The Vector Space Model for Scoring
Variant tf-idf Functions
 A number of alternatives to tf and tf-idf have been considered for assigning a weight to each term in each document.
 Sublinear tf scaling:
 It seems unlikely that twenty occurrences of a term in a document
truly carry twenty times the significance of a single occurrence.
Accordingly, there has been considerable research into variants of
term frequency that go beyond counting the number of
occurrences of a term.
 A common modification is to use instead the logarithm of the term
frequency, which assigns a weight given by:
$$\mathrm{wf}_{t,d} = \begin{cases} 1 + \log_{10} \mathrm{tf}_{t,d}, & \text{if } \mathrm{tf}_{t,d} > 0 \\ 0, & \text{otherwise} \end{cases}$$
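As a quick sanity check, the piecewise definition above translates directly into Python (a sketch, not part of the original slides):

```python
import math

def wf(tf: int) -> float:
    """Sublinear tf scaling: 1 + log10(tf) for tf > 0, else 0."""
    return 1 + math.log10(tf) if tf > 0 else 0.0

# twenty occurrences weigh only about 2.3x a single occurrence, not 20x
print(wf(1), wf(20))
```

This is exactly the intuition of sublinear scaling: the weight still grows with tf, but far more slowly than the raw count.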
 In this form, we may replace tf by some other function wf as in
(6.13), to obtain:
$$\mathrm{wf\text{-}idf}_{t,d} = \mathrm{wf}_{t,d} \times \mathrm{idf}_t$$
 Maximum tf normalization:
 One well-studied technique is to normalize the tf weights of all terms occurring in a document by the maximum tf in that document. For each document d, let $\mathrm{tf}_{\max}(d) = \max_{\tau \in d} \mathrm{tf}_{\tau,d}$, where $\tau$ ranges over all terms in d.
 Then, we compute a normalized term frequency for each term t in
document d by:
$$\mathrm{ntf}_{t,d} = a + (1-a)\,\frac{\mathrm{tf}_{t,d}}{\mathrm{tf}_{\max}(d)}$$
 where 𝒂 is a value between 0 and 1 and is generally set to 0.4,
although some early work used the value 0.5.
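A minimal sketch of this normalization in Python (the document and its term counts here are illustrative, not from the slides):

```python
from collections import Counter

def ntf(tf: int, tf_max: int, a: float = 0.4) -> float:
    """Maximum tf normalization with smoothing term a (default 0.4)."""
    return a + (1 - a) * tf / tf_max

# illustrative document; tf_max(d) is the count of its most frequent term
counts = Counter(["jaguar", "speed", "jaguar", "jaguar", "fast"])
tf_max = max(counts.values())  # 3, for "jaguar"
weights = {t: ntf(c, tf_max) for t, c in counts.items()}
```

Note that the most frequent term always receives weight 1, and every term receives at least a = 0.4, so modest tf differences produce only modest weight differences.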
 The term 𝒂 is a smoothing term whose role is to damp the contribution
of the second term – which may be viewed as a scaling down of tf by
the largest tf value in d.
 We will encounter smoothing further when discussing classification.
 The basic idea is to avoid a large swing in $\mathrm{ntf}_{t,d}$ from modest changes in $\mathrm{tf}_{t,d}$ (say from 1 to 2).
 The main idea of maximum tf normalization is to mitigate the following
anomaly: we observe higher term frequencies in longer documents,
merely because longer documents tend to repeat the same words over
and over again. To appreciate this, consider the following extreme
example:
 Suppose we were to take a document d and create a new document d′ by simply appending a copy of d to itself. While d′ should be no more relevant to any query than d is, the use of $\mathrm{Score}(d,q) = \sum_{t \in q} \mathrm{tf\text{-}idf}_{t,d}$ would assign d′ twice as high a score as d.
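This doubling effect is easy to verify; the sketch below uses made-up tokens and idf values purely for illustration:

```python
from collections import Counter

def score(query, doc_tokens, idf):
    """Score(d, q) = sum over t in q of tf_{t,d} * idf_t, with raw tf."""
    tf = Counter(doc_tokens)
    return sum(tf[t] * idf.get(t, 0.0) for t in query)

doc = ["jaguar", "speed", "jaguar", "fast"]
doubled = doc + doc                       # d' = d appended to itself
idf = {"jaguar": 1.5, "speed": 2.0}       # illustrative idf values
query = ["jaguar", "speed"]

assert score(query, doubled, idf) == 2 * score(query, doc, idf)
```

With raw tf, d′ scores exactly twice as high as d; maximum tf normalization (or cosine length normalization) removes this artifact.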
 Maximum tf normalization does suffer from the following issues:
1. The method is unstable in the following sense: a change in the
stop word list can dramatically alter term weightings (and
therefore ranking). Thus, it is hard to tune.
2. A document may contain an outlier term with an unusually large
number of occurrences of that term, not representative of the
content of that document.
3. More generally, a document in which the most frequent term
appears roughly as often as many other terms should be treated
differently from one with a more skewed distribution.
Document and Query Weighting Schemes
 The cosine-scoring equation is fundamental to information retrieval systems that use any form of vector space scoring.
 Variations from one vector space scoring method to another hinge on
the specific choices of weights in the vectors V(d) and V(q), together
with a mnemonic for representing a specific combination of weights;
this system of mnemonics is sometimes called SMART notation,
following the authors of an early text retrieval system.
 The mnemonic for representing a combination of weights takes the
form ddd.qqq where the first triplet gives the term weighting of the
document vector, while the second triplet gives the weighting in the
query vector.
 The first letter in each triplet specifies the term frequency component
of the weighting, the second the document frequency component, and
the third the form of normalization used.
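As a rough illustration, the triplets can be decoded mechanically; the letter meanings below follow the common SMART conventions and are a partial sketch rather than an exhaustive table:

```python
# common SMART letter meanings (a partial sketch)
TF = {"n": "natural", "l": "logarithm", "a": "augmented", "b": "boolean"}
DF = {"n": "no idf", "t": "idf", "p": "probabilistic idf"}
NORM = {"n": "none", "c": "cosine", "u": "pivoted unique"}

def describe(mnemonic: str):
    """Expand a ddd.qqq mnemonic such as 'lnc.ltc' into its components."""
    doc, query = mnemonic.split(".")
    expand = lambda t: (TF[t[0]], DF[t[1]], NORM[t[2]])
    return expand(doc), expand(query)

print(describe("lnc.ltc"))
```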
 It is quite common to apply different normalization functions to V(d) and V(q).
 For example, a very standard weighting scheme is lnc.ltc:
 the document vector has log-weighted term frequency, no idf (for both effectiveness and efficiency reasons), and cosine normalization,
 while the query vector uses log-weighted term frequency, idf weighting, and cosine normalization.
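A self-contained sketch of lnc.ltc scoring follows; in practice the idf dictionary would come from collection statistics, so its values here are placeholders:

```python
import math
from collections import Counter

def log_tf(tf: int) -> float:
    return 1 + math.log10(tf) if tf > 0 else 0.0

def cosine_normalize(weights: dict) -> dict:
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else weights

def lnc_ltc(doc_tokens, query_tokens, idf):
    # document vector: log tf, no idf, cosine normalization ("lnc")
    d = cosine_normalize({t: log_tf(c) for t, c in Counter(doc_tokens).items()})
    # query vector: log tf, idf weighting, cosine normalization ("ltc")
    q = cosine_normalize({t: log_tf(c) * idf.get(t, 0.0)
                          for t, c in Counter(query_tokens).items()})
    # dot product of the two unit-length vectors
    return sum(w * d.get(t, 0.0) for t, w in q.items())
```

Because both vectors are cosine-normalized, the score is the cosine of the angle between them and lies in [0, 1].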
Evaluation in information retrieval
 How do we know which of the previously discussed IR techniques
are effective in which applications?
 Should we use stop lists?
 Should we use stemming?
 Should we use inverse document frequency weighting?
 To measure ad hoc information retrieval effectiveness in the
standard way, we need a test collection consisting of three things:
1. Document collection
2. A test suite of information needs, expressible as queries
3. A set of relevance judgments, standardly a binary assessment of either
relevant or nonrelevant for each query-document pair.
 A document in the test collection is given a binary classification as
either relevant or nonrelevant.
 This decision is referred to as the gold standard or ground truth
judgment of relevance.
 Relevance is assessed relative to an information need, not a query. This means that a document is relevant if it addresses the stated information need, not merely because it happens to contain all the words in the query.
 Information need vs. query: the original slides show, as a sequence of search-result screenshots, how successive query reformulations express the same underlying information need, and how an ambiguous term (Jaguar the animal vs. the car) can derail retrieval:
 Query = Jaguar
 Query = Jaguar Speed
 Query = Speed of Jaguar
 Query = Speed of Jaguar animal
 Query = Speed of impala
 Query = Speed of impala animal
 Query = impala
Standard Test Collections
1. Cranfield Collection:
 Abstracts of 1398 articles
 A set of 225 queries, and their respective relevance judgments.
2. TREC (Text Retrieval Conference):
 6 CDs containing 1.89 million documents
 Relevance judgments for 450 information needs, which are called
topics and specified in detailed text passages.
3. CLEF (Cross Language Evaluation Forum):
 This evaluation series has concentrated on European languages and
cross-language information retrieval.
 Precision: What fraction of the returned results are relevant to
the information need?
$$P = \frac{\#(\text{relevant documents retrieved})}{\#(\text{documents retrieved})}$$
 Recall: What fraction of the relevant documents in the collection
were returned by the system?
$$R = \frac{\#(\text{relevant documents retrieved})}{\#(\text{relevant documents in collection})}$$
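These two definitions translate directly into code; a minimal sketch over sets of document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query, given sets of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# system returns 4 documents; 3 documents in the collection are relevant
p, r = precision_recall({1, 2, 3, 4}, {2, 4, 5})  # p = 2/4, r = 2/3
```

Note the tension between the two: returning every document in the collection drives recall to 1 while precision collapses.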
 Offline Evaluation a.k.a. Manual Judgments
 Ask experts or users to explicitly evaluate your retrieval system.
 Online Evaluation a.k.a. Observing Users
 Observe how normal users interact with your retrieval system in the course of ordinary use.
General Types of Evaluation
 Offline Evaluation involves:
 Select queries to evaluate on
 Get results for those queries
 Assess the relevance of those results to the queries
 Compute your offline metric
 Online Evaluation involves:
 Capture users’ search behavior:
 Search queries
 Results and Clicks
 Mouse Movement
 Assess the relevance of those results to the queries
 Compute your online metric