Information Retrieval: 20
Divergence from Randomness
Prof Neeraj Bhargava
Vaibhav Khanna
Department of Computer Science
School of Engineering and Systems Sciences
Maharshi Dayanand Saraswati University Ajmer
Divergence from Randomness
• A distinct probabilistic model has been
proposed by Amati and van Rijsbergen
• The idea is to compute term weights by
measuring the divergence between a term
distribution produced by a random process
and the actual term distribution
• Hence the name divergence from randomness
• The model is based on two fundamental
assumptions, as follows.
First assumption:
• Not all words are equally important for describing
the content of the documents
• Words that carry little information are assumed to
be randomly distributed over the whole
document collection C
• Given a term ki, its probability distribution over
the whole collection is referred to as P(ki|C)
• The amount of information associated with this
distribution is given by −log P(ki|C)
• By modifying this probability function, we can
implement distinct notions of term randomness
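A minimal sketch of the first assumption's information measure. It assumes base-2 logarithms (the slide leaves the base unspecified), and the probabilities are made up for illustration. The function name `inf1` is hypothetical. The point it shows is that the less likely a term is under the random model, the more information it carries.

```python
import math

def inf1(p_collection: float) -> float:
    """Information content -log2 P(ki|C) of a term under the random model.

    Hypothetical helper; base 2 is an assumption (the slide writes -log).
    p_collection is the probability the randomness model assigns to the
    term over the whole collection C and must lie in (0, 1].
    """
    if not 0.0 < p_collection <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p_collection)

# A term the random model expects often carries little information...
print(round(inf1(0.5), 2))
# ...while a term the random model finds very unlikely carries much more.
print(round(inf1(1e-6), 2))
```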
Second assumption:
• A complementary term distribution can be obtained by
considering just the subset of documents that contain
term ki
• This subset is referred to as the elite set
• The corresponding probability distribution, computed
with regard to document dj, is referred to as P(ki|dj)
• The smaller the probability of observing term ki in
document dj, the rarer and more important the term
is considered to be
• Thus, the amount of information associated with the
term in the elite set is defined as 1 − P(ki|dj)
Divergence from Randomness
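Combining the two assumptions, Amati and van Rijsbergen define the weight of term ki in document dj as the product of the two information amounts above (labelled Inf1 and Inf2 here for convenience):

```latex
w_{i,j} = \mathit{Inf}_1(k_i) \times \mathit{Inf}_2(k_i, d_j)
        = \bigl[-\log P(k_i \mid C)\bigr] \times \bigl[1 - P(k_i \mid d_j)\bigr]
```

Inf1 rewards terms whose observed frequency is unlikely under randomness, and Inf2 scales that reward according to the term's behaviour in the elite set.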
Random Distribution
• To compute the distribution of terms in the collection,
distinct probability models can be considered
• For instance, suppose Bernoulli trials are used to
model the occurrences of a term in the collection: each
occurrence falls in a given document with probability
p = 1/N, where N is the number of documents
• To illustrate, consider a collection with 1,000 documents
and a term ki that occurs 10 times in the collection
• Then, the probability of observing 4 occurrences of term
ki in a document is given by
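Each of the 10 occurrences is a Bernoulli trial that lands in the given document with probability p = 1/1000, so the number of occurrences in that document is binomially distributed:

```latex
P(k_i \mid C) = \binom{10}{4} \left(\frac{1}{1000}\right)^{4} \left(\frac{999}{1000}\right)^{6}
\approx 2.09 \times 10^{-10}
```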
Random Distribution
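In general, writing N for the number of documents, Fi for the total number of occurrences of ki in the collection and tf for its occurrences in the document at hand, the binomial model gives P(ki|C) = C(Fi, tf) p^tf q^(Fi − tf), with p = 1/N and q = 1 − p. A minimal Python sketch that reproduces the 1,000-document example; the function and parameter names are hypothetical.

```python
import math

def binomial_prob(tf: int, total_freq: int, num_docs: int) -> float:
    """P(ki|C) under the binomial (Bernoulli-trials) randomness model.

    Hypothetical helper.
    tf:         occurrences of the term in the document of interest
    total_freq: Fi, occurrences of the term in the whole collection
    num_docs:   N, number of documents in the collection
    """
    p = 1.0 / num_docs          # chance that one occurrence lands in this document
    q = 1.0 - p                 # chance that it lands in some other document
    return math.comb(total_freq, tf) * p**tf * q**(total_freq - tf)

prob = binomial_prob(tf=4, total_freq=10, num_docs=1000)
print(f"{prob:.3e}")                    # 2.087e-10
print(f"{-math.log2(prob):.2f} bits")   # 32.16 bits, i.e. -log2 P(ki|C)
```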
Random Distribution
• Under these conditions, we can approximate
the binomial distribution by a Poisson distribution,
which yields
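With λ = p·Fi = Fi/N, the Poisson approximation is P(ki|C) ≈ e^(−λ) λ^tf / tf!. The sketch below (hypothetical function names) compares it with the exact binomial value. The approximation is accurate when Fi is large and p is small; in the 10-occurrence toy example Fi is tiny, so the two values differ by roughly a factor of two, whereas with the same λ and many more trials they agree closely.

```python
import math

def binomial_prob(tf: int, total_freq: int, num_docs: int) -> float:
    """Exact binomial P(ki|C): each of the Fi occurrences falls in the document with p = 1/N."""
    p = 1.0 / num_docs
    return math.comb(total_freq, tf) * p**tf * (1.0 - p) ** (total_freq - tf)

def poisson_prob(tf: int, total_freq: int, num_docs: int) -> float:
    """Poisson approximation e^(-lam) * lam^tf / tf!, with lam = Fi / N."""
    lam = total_freq / num_docs
    return math.exp(-lam) * lam**tf / math.factorial(tf)

# Toy example from the slides: Fi = 10, N = 1,000 (lambda = 0.01), tf = 4.
small = (4, 10, 1000)
# Same lambda with many more trials (hypothetical numbers): Fi = 10,000, N = 1,000,000.
large = (4, 10_000, 1_000_000)

for args in (small, large):
    b, p = binomial_prob(*args), poisson_prob(*args)
    print(f"Fi={args[1]:>6}: binomial={b:.4e} poisson={p:.4e} ratio={p / b:.3f}")
```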
Distribution over the Elite Set
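One common way to estimate P(ki|dj) in the DFR framework is Laplace's law of succession, P(ki|dj) = tf/(tf + 1), which gives 1 − P(ki|dj) = 1/(tf + 1). A minimal sketch under that assumption, with a hypothetical function name. Because every document in the elite set contains the term, tf is at least 1.

```python
def inf2_laplace(tf: float) -> float:
    """1 - P(ki|dj), assuming P(ki|dj) = tf / (tf + 1) (Laplace's law of succession).

    Hypothetical helper. tf is the (possibly normalized) frequency of the
    term in document dj; it must be positive because dj is in the elite set.
    """
    if tf <= 0:
        raise ValueError("documents in the elite set contain the term, so tf > 0")
    p_elite = tf / (tf + 1.0)
    return 1.0 - p_elite

for tf in (1, 4, 20):
    print(tf, round(inf2_laplace(tf), 3))
```

The factor shrinks as tf grows: once a term has already been seen many times in a document, each further occurrence is less surprising.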
Normalization
Normalization
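A widely used length normalization in the DFR framework, often called normalization 2, rescales the raw frequency by document length before it is used in Inf1 and Inf2: tfn = tf · log2(1 + c · avg_len / len(dj)), where c is a free parameter. The sketch below assumes that normalization with c = 1, the Poisson model for Inf1 (using the log-gamma function so that a non-integer tfn is allowed, a choice made here rather than taken from the slides) and Laplace's law for Inf2. All function names and example numbers are hypothetical.

```python
import math

def normalize_tf(tf: float, doc_len: float, avg_doc_len: float, c: float = 1.0) -> float:
    """Normalization 2: tfn = tf * log2(1 + c * avg_doc_len / doc_len).

    Hypothetical helper; c = 1.0 is an illustrative default, not from the slides.
    """
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)

def inf1_poisson(tfn: float, total_freq: int, num_docs: int) -> float:
    """-log2 of e^(-lam) * lam^tfn / Gamma(tfn + 1), with lam = Fi / N.

    Assumption: lgamma replaces tf! so that non-integer tfn is allowed.
    """
    lam = total_freq / num_docs
    ln_prob = -lam + tfn * math.log(lam) - math.lgamma(tfn + 1.0)
    return -ln_prob / math.log(2.0)

def inf2_laplace(tfn: float) -> float:
    """1 - tfn / (tfn + 1), assuming Laplace's law of succession."""
    return 1.0 / (tfn + 1.0)

def dfr_weight(tf, doc_len, avg_doc_len, total_freq, num_docs, c=1.0):
    """w = Inf1 * Inf2, both computed on the length-normalized frequency."""
    tfn = normalize_tf(tf, doc_len, avg_doc_len, c)
    return inf1_poisson(tfn, total_freq, num_docs) * inf2_laplace(tfn)

# Same raw tf = 4 in a short and in a long document (hypothetical numbers).
short_doc = dfr_weight(tf=4, doc_len=100, avg_doc_len=300, total_freq=10, num_docs=1000)
long_doc = dfr_weight(tf=4, doc_len=900, avg_doc_len=300, total_freq=10, num_docs=1000)
print(f"short: {short_doc:.3f}  long: {long_doc:.3f}")
```

With these numbers the short document receives the larger weight, because the same raw count is stronger evidence in a shorter text.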
Assignment
• Explain the Information Retrieval Model of
Divergence from Randomness