The Comparative Study of Information Retrieval Models Used in Search Engines
By
Muhammad Fawad (01-243181-011)
Bilal Hussain (01-243181-09)
MS(CS)-2A
Bahria University, Islamabad Campus
Outline
• Introduction to information retrieval
• Information retrieval models
• Evaluation criteria for unranked retrieval
• Evaluation criteria for ranked retrieval
• Introduction to search engines
• Conclusion and findings
Information Retrieval
• Find out what the user needs and deliver it efficiently
• Collection: a set of documents
• Why: data comes in many different formats
• Goal: to retrieve information relevant to the user's needs
• Applications: text search, image/video search, email search, etc.
• Issues: relevance [useful, topic-related, new, interesting, authentic, up-to-date]
• Understanding query syntax
• Understanding search engines
Information retrieval models
• The process of storing and retrieving structured and unstructured data
• A model predicts what is relevant and what is not
• A particular way of "looking at things"
• Components: document representation, query representation, retrieval function
Boolean Model:
• Exact match; uses Boolean logic to combine terms
• Operators: AND, OR, NOT; set intersection and union
• Simple and easy to implement
• Exact matching may retrieve too few or too many documents
• Retrieved documents are not ranked
• Uses: commercial document database systems
• Example:
  • User need: I am interested in learning about vitamins other than vitamin E that are antioxidants
  • User's Boolean query: antioxidant AND vitamin AND NOT "vitamin e"
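The Boolean query from the example can be sketched with an inverted index and set operations. This is a minimal illustration, not the slides' own implementation: the four documents are invented, tokenization is a plain whitespace split, and `build_index` and `phrase_match` are hypothetical helper names. "vitamin e" is treated as a phrase, which is one possible reading of the query.

```python
from collections import defaultdict

# Hypothetical toy collection; the texts are made up for illustration.
DOCS = {
    1: "vitamin c is an antioxidant vitamin found in citrus fruit",
    2: "vitamin e is a fat soluble antioxidant vitamin",
    3: "antioxidant compounds in green tea",
    4: "vitamin d supports bone health",
}


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def phrase_match(docs, phrase):
    """Return the ids of documents containing the exact word sequence."""
    words = phrase.split()
    hits = set()
    for doc_id, text in docs.items():
        tokens = text.split()
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                hits.add(doc_id)
                break
    return hits


index = build_index(DOCS)

# antioxidant AND vitamin AND NOT "vitamin e"
# AND is set intersection (&); AND NOT is set difference (-).
result = (index["antioxidant"] & index["vitamin"]) - phrase_match(DOCS, "vitamin e")
print(sorted(result))  # prints [1]
```

The result is an unordered set: a document either satisfies the expression or it does not, which is why the Boolean model cannot rank its output.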
Information retrieval models (continued)
Vector Space Model:
• Documents and queries are represented as vectors
• Each term is assigned a weight
• The weights determine the degree of similarity between the documents stored in the system and the user's query
• Retrieves partially matching documents and ranks them
• Similarity between vectors is computed with the inner (dot) product
• Becomes more complex for longer documents
• Index terms are assumed to be mutually independent
Probabilistic Model:
• Retrieves exact or partial matches
• Documents are ranked in decreasing order of their probability of relevance
• Does not take into account the frequency with which an index term occurs inside a document
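The vector space model described above can be sketched as follows. The slides do not name a weighting scheme, so this example assumes TF-IDF weights with a smoothed IDF of log(N/df) + 1 and normalizes the dot product by the vector lengths (cosine similarity). The documents and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy collection.
DOCS = {
    "d1": "vitamin c is an antioxidant",
    "d2": "vitamin e is an antioxidant vitamin",
    "d3": "search engines rank documents",
}


def tf_idf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for every document."""
    tokenized = {d: text.split() for d, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for tokens in tokenized.values() for t in set(tokens))
    # Assumed weighting: idf = log(N / df) + 1.
    idf = {t: math.log(n_docs / count) + 1.0 for t, count in df.items()}
    vectors = {
        d: {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for d, tokens in tokenized.items()
    }
    return vectors, idf


def dot(u, v):
    """Inner product of two sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def cosine(u, v):
    """Dot product normalized by the lengths of both vectors."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by decreasing similarity."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms that never occur in the collection are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(query.split()).items() if t in idf}
    scores = {d: cosine(q, vec) for d, vec in vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for doc_id, score in rank("antioxidant vitamin", DOCS):
    print(doc_id, round(score, 3))
```

Unlike the Boolean model, every document receives a score, so partial matches are returned and ordered. Here `d3` shares no term with the query and scores 0.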
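For the probabilistic model, the slides do not specify a formula. One common instance is the binary independence model with no relevance feedback, where each query term found in a document contributes log((N - df + 0.5) / (df + 0.5)). The sketch below assumes that formula; the collection and the name `bim_rank` are hypothetical. Terms are recorded only as present or absent, matching the point that within-document frequency is not used.

```python
import math

# Hypothetical toy collection.
DOCS = {
    "d1": "vitamin c antioxidant citrus",
    "d2": "green tea antioxidant",
    "d3": "vitamin d bone health",
    "d4": "search engine ranking",
    "d5": "boolean query syntax",
    "d6": "vector space model",
}


def bim_rank(query, docs):
    """Rank documents by summed term weights under the assumed binary model."""
    # Binary representation: a term is either in a document or not.
    doc_terms = {d: set(text.split()) for d, text in docs.items()}
    n_docs = len(doc_terms)
    query_terms = set(query.split())

    # Weight of each query term from its document frequency.
    weights = {}
    for term in query_terms:
        df = sum(1 for terms in doc_terms.values() if term in terms)
        weights[term] = math.log((n_docs - df + 0.5) / (df + 0.5))

    # A document's score is the sum of the weights of the query terms it contains.
    scores = {
        d: sum(weights[t] for t in query_terms if t in terms)
        for d, terms in doc_terms.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for doc_id, score in bim_rank("antioxidant vitamin", DOCS):
    print(doc_id, round(score, 3))
```

Note that this weight turns negative for terms that occur in more than half of the collection, a known property of the formula rather than a bug in the sketch.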
Comparison of Classical Models

Sr. No | Parameter | Boolean Model | Vector Space Model | Probabilistic Model
-------|-----------|---------------|--------------------|--------------------
1 | Concept | Evaluates queries as Boolean expressions | Uses weighted index terms and partial matching to match a document to a query | Evaluates queries using an ideal answer set and probabilistic index terms
2 | Representation | Weights are binary. A document is either relevant or irrelevant | Index terms are weighted, so a ranking is created from these weights | Weights are binary. Initially a document either belongs to the ideal set or is considered irrelevant
3 | Type of information | Does not consider any semantic information | Considers semantic information | Considers semantic information
4 | Advantage | Simple to evaluate based on the query and the document | Simple | Not restricted to words only, since it replaces "keywords" with "concepts"
5 | Disadvantage | Does not rank the documents, and performance is not that good | More complex than the binary approach, as the index term weights need to be computed | The most complex model, since neither the weights nor the ideal set is initially defined
6 | Word occurrence | Does not report the number of occurrences | Reports the number of occurrences | Records term occurrence through the term-document matrix
7 | Output | Exact match | Best match | Best match
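EVALUATION OF UNRANKED RETRIEVAL SYSTEM
• Unranked retrieval system: returns a set of documents with no ordering
• Confusion matrix: relevant/irrelevant documents against retrieved/not retrieved
• Recall [relevant documents retrieved : total relevant documents]
• Precision [relevant documents retrieved : total documents retrieved]
• Inverse recall [irrelevant documents not retrieved : total irrelevant documents]
• F-measure [harmonic mean of precision and recall: 2 × Precision × Recall / (Precision + Recall)]
• Prevalence [relevant documents : total documents]
• Accuracy [relevant documents retrieved + irrelevant documents not retrieved : total documents]
• Error rate [irrelevant documents retrieved + relevant documents not retrieved : total documents]
• Fallout [irrelevant documents retrieved (false positives) : total irrelevant documents]
• Miss rate [relevant documents not retrieved (false negatives) : total relevant documents]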
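The unranked measures can all be derived from the four cells of the confusion matrix. The sketch below uses the standard definitions, with tp = relevant and retrieved, fp = irrelevant but retrieved, fn = relevant but not retrieved, and tn = irrelevant and not retrieved. The counts in the example call are invented, and `ratio` and `unranked_metrics` are hypothetical names.

```python
def ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def unranked_metrics(tp, fp, fn, tn):
    """Compute set-based retrieval measures from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "recall": recall,
        "precision": precision,
        "inverse_recall": ratio(tn, tn + fp),
        # F-measure is the harmonic mean of precision and recall.
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "prevalence": ratio(tp + fn, total),
        "accuracy": ratio(tp + tn, total),
        "error_rate": ratio(fp + fn, total),
        "fallout": ratio(fp, fp + tn),
        "miss_rate": ratio(fn, tp + fn),
    }


# Invented counts: 100 documents, 50 of them relevant, 40 retrieved.
metrics = unranked_metrics(tp=30, fp=10, fn=20, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, precision is 0.75 and recall is 0.6. Accuracy and error rate always sum to 1, as do recall and miss rate, and inverse recall and fallout.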
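EVALUATION CRITERIA FOR RANKED RETRIEVAL SYSTEM
• Ranked retrieval system: returns documents ordered by estimated relevance
• Precision at K and R-precision [precision at K is the fraction of the top K results that are relevant, which suits web search, where users look only at the first few pages; R-precision is precision at rank R, where R is the number of relevant documents for the query]
• Average precision [precision is calculated at every position in the ranked list where a relevant document appears, and these values are averaged]
• Mean average precision [the mean of average precision over a set of queries; useful when the user wants a large number of relevant documents at the same time, for example research papers]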
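The ranked measures can be sketched as below, using their usual definitions. The ranked lists and relevance judgments are invented, and the function names are hypothetical. Average precision here divides by the total number of relevant documents, so relevant documents that are never retrieved lower the score.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(ranked, relevant, len(relevant)) if relevant else 0.0


def average_precision(ranked, relevant):
    """Average of the precision values at each rank holding a relevant document."""
    hits = 0
    total = 0.0
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Invented results for two queries.
ranked_1, relevant_1 = ["d3", "d1", "d7", "d2", "d5"], {"d3", "d2", "d9"}
ranked_2, relevant_2 = ["a", "b", "c"], {"a"}

print(precision_at_k(ranked_1, relevant_1, 3))
print(r_precision(ranked_1, relevant_1))
print(average_precision(ranked_1, relevant_1))  # prints 0.5
print(mean_average_precision([(ranked_1, relevant_1), (ranked_2, relevant_2)]))  # prints 0.75
```

For the first query, relevant documents appear at ranks 1 and 4, giving precisions of 1/1 and 2/4; dividing their sum by the three relevant documents yields 0.5.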
SEARCH ENGINES
• Provides a user interface
• Gathers data
• Presents information to the end user after applying suitable sorting algorithms
• Traditional search engine: based on grammar rules and easy to understand, but with low recall and low precision
• Semantic search engine: intelligent, interpreting the meaning behind a query
Conclusion
• Search engine performance should be maximized by using a hybrid search engine model
