IMPLEMENTATION OF
INFORMATION RETRIEVAL
  SYSTEMS VIA RDBMS
Relational Database: Definitions

  Relational database: a set of relations.
  Relation: made up of two parts:
      Instance: a table, with rows and columns.
        #rows = cardinality, #fields = degree (arity).
      Schema: specifies the name of the relation, plus the name and type of
        each column.
          E.g., Students(sid: string, name: string, login: string,
                age: integer, gpa: real).
  A relation can be thought of as a set of rows or tuples (i.e.,
  all rows are distinct).
Example Instance of Students Relation


        sid     name      login            age   gpa
       53666    Jones jones@cs             18    3.4
       53688    Smith smith@eecs           18    3.2
       53650    Smith smith@math           19    3.8

Cardinality = 3, degree = 5, all rows distinct
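
The cardinality and degree of this instance can be checked mechanically. The following is a minimal sketch, assuming Python's standard sqlite3 module and an in-memory database (neither is part of the lecture); the column types are SQLite approximations of the schema above.

```python
import sqlite3

# In-memory database holding the Students instance from the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@eecs", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)

cur = conn.execute("SELECT * FROM Students")
degree = len(cur.description)        # number of fields (columns)
rows = cur.fetchall()
cardinality = len(rows)              # number of rows
all_distinct = len(set(rows)) == cardinality

print(cardinality, degree, all_distinct)
```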
Relational Query Languages

  A major strength of the relational model: it supports
  simple, powerful querying of data.
  Queries can be written intuitively, and the DBMS is
  responsible for efficient evaluation.
The SQL Query Language

  Developed by IBM (System R) in the 1970s.
  A standard is needed because SQL is used by many vendors.
  Standards:
      SQL-86
      SQL-89 (minor revision)
      SQL-92 (major revision, current standard)
      SQL-99 (major extensions)
The SQL Query Language

  To find all 18-year-old students, we can write:

    SELECT *
    FROM Students S
    WHERE S.age=18

    sid     name    login        age   gpa
    53666   Jones   jones@cs     18    3.4
    53688   Smith   smith@eecs   18    3.2

  To find just names and logins, replace the first line:
    SELECT S.name, S.login
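
Both forms of the query can be tried against the Students instance. This is a sketch that assumes Python's sqlite3 module as the DBMS; the ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@eecs", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)

# All fields of the 18-year-old students.
eighteen = conn.execute(
    "SELECT * FROM Students S WHERE S.age = 18 ORDER BY S.sid"
).fetchall()

# Only names and logins: the first line of the query is replaced.
names_logins = conn.execute(
    "SELECT S.name, S.login FROM Students S WHERE S.age = 18 ORDER BY S.sid"
).fetchall()

print(eighteen)
print(names_logins)
```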
Querying Multiple Relations
     sid          cid   grade
    53831   Carnatic101  C
    53831   Reggae203    B
    53650   Topology112  A
    53666   History105   B

    SELECT S.name, E.cid
    FROM Students S, Enrolled E
    WHERE S.sid=E.sid AND E.grade='A'



    S.name E.cid
    Smith  Topology112
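
The join can be reproduced with the two instances shown so far. A minimal sketch, assuming Python's sqlite3 module; only the columns needed by the query are created for Students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT)")
conn.execute("CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?)",
    [("53666", "Jones"), ("53688", "Smith"), ("53650", "Smith")],
)
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [
        ("53831", "Carnatic101", "C"),
        ("53831", "Reggae203", "B"),
        ("53650", "Topology112", "A"),
        ("53666", "History105", "B"),
    ],
)

# Pair each student with the courses in which that student received an A.
a_grades = conn.execute(
    "SELECT S.name, E.cid "
    "FROM Students S, Enrolled E "
    "WHERE S.sid = E.sid AND E.grade = 'A'"
).fetchall()

print(a_grades)
```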
Creating Relations in SQL
  Creates the Students relation. Observe that the type (domain) of each
  field is specified, and enforced by the DBMS whenever tuples are added
  or modified.

      CREATE TABLE Students
           (sid CHAR(20),
            name CHAR(20),
            login CHAR(10),
            age INTEGER,
            gpa REAL)

  As another example, the Enrolled table holds information about courses
  that students take.

      CREATE TABLE Enrolled
           (sid CHAR(20),
            cid CHAR(20),
            grade CHAR(2))
Combining Separate Systems

  Use an IR system and an RDBMS that are independent of each other.
  Divide the query into two parts:
      Structured part for the RDBMS
      Unstructured (text) part for the IR system
  Combine the results from the IR system and the RDBMS.
  Good for letting each vendor develop its own system.
  Bad for data integrity, recovery, portability, and
  performance.
User Defined Operators

  Allow users to extend SQL by adding their own functions.
  Some vendors used this approach (such as the IBM DB2 Text
  Extender).
  Lynch and Stonebraker defined "user-defined operators" to
  implement information retrieval in 1988.
      -- Retrieves documents that contain Term1, Term2, and Term3
      SELECT Doc_Id
      FROM Doc
      WHERE SEARCH-TERM(Text, Term1, Term2, Term3)

      -- Retrieves documents that contain Term1, Term2, and Term3
      -- within a window of 5 terms
      SELECT Doc_Id
      FROM Doc
      WHERE PROXIMITY(Text, 5, Term1, Term2, Term3)
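
The idea of a user-defined operator can be illustrated with SQLite, which lets an application register its own SQL functions. This is only a sketch of the general approach, not the Lynch and Stonebraker design or the DB2 implementation: the function name SEARCH_TERM (with an underscore, since a hyphen is not valid in a SQL identifier), the whitespace tokenization, and the sample documents are all assumptions.

```python
import sqlite3


def search_term(text, *terms):
    """Return 1 if every term occurs as a whitespace-separated token of text."""
    tokens = set(text.lower().split())
    return int(all(term.lower() in tokens for term in terms))


conn = sqlite3.connect(":memory:")
# narg = -1 lets the function accept any number of arguments.
conn.create_function("SEARCH_TERM", -1, search_term)

conn.execute("CREATE TABLE Doc (Doc_Id INTEGER, Text TEXT)")
conn.executemany(
    "INSERT INTO Doc VALUES (?, ?)",
    [
        (1, "vice president visits campus"),
        (2, "president of the university"),
        (3, "the vice president of the university"),
    ],
)

# Documents that contain both terms.
hits = conn.execute(
    "SELECT Doc_Id FROM Doc WHERE SEARCH_TERM(Text, ?, ?) ORDER BY Doc_Id",
    ("vice", "president"),
).fetchall()

print(hits)
```

Note that such a function scans the text of every document; it does not use an inverted index.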
Non-First Normal Form Approaches

  Capture the many-to-many relationships into sets via nested
  relations
  Hard to implement ad-hoc queries
  No standard yet
Using RDBMS for IR

  Benefits:
      Recovery
      Performance
      Data migration
      Concurrency Control
      Access control mechanism
      Logical and physical data independence
Using RDBMS for IR


  Example: A bibliography that includes both structured and
  unstructured information
      DIRECTORY (name, institution): affiliation of the author
      AUTHOR (name, DocId): authorship information
      INDEX (name, DocId): terms that are used to index a document
Using RDBMS for IR

   Preprocessing
       SGML, a standard for defining the parts of a document, can be
        used as a starting point.

 <DOC>
 <DOCNO> WSJ834234234 </DOCNO>
  <HL> How to make students suffer in IR Course </HL>
 <DD> 03/23/87</DD>
 <DATELINE> Sabanci, Turkey </DATELINE>
 <TEXT>
 Crawler HW, Inverted Index, Querying
 </TEXT>
 </DOC>
Using RDBMS for IR
   Preprocessing
       SGML, a standard for defining the parts of a document, can be
        used as a starting point.
       Use a parser together with a hash function to identify terms.
       Use a STOP_TERM table for referencing stop words.
       Produce three output tables:
          INDEX (DocId, Term, TermFrequency): models the inverted index
          DOC (DocId, DocName, PubDate, DateLine): document metadata
          TERM (Term, Idf): stores the weight (idf) of each term

 -- Construct the TERM table; N is the total number of documents.
 -- The 1.0 factor forces real-valued division before taking the logarithm.
 INSERT INTO TERM
 SELECT Term, LOG(N * 1.0 / COUNT(*))
 FROM INDEX
 GROUP BY Term
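
The TERM construction can be tried on a toy inverted index. A minimal sketch, assuming Python's sqlite3 module; INDEX is quoted because it is a reserved word in SQLite, the natural-log function LN is registered from Python because not every SQLite build ships math functions, N is taken to be the number of distinct DocIds in the index, and the sample rows are invented.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.create_function("LN", 1, math.log)  # natural logarithm, supplied by Python

conn.execute('CREATE TABLE "INDEX" (DocId INTEGER, Term TEXT, TermFrequency INTEGER)')
conn.execute("CREATE TABLE TERM (Term TEXT, Idf REAL)")
conn.executemany(
    'INSERT INTO "INDEX" VALUES (?, ?, ?)',
    [
        (1, "sabanci", 2), (1, "university", 1),
        (2, "university", 3),
        (3, "crawler", 1), (3, "university", 1),
        (4, "sabanci", 1),
    ],
)

# N: total number of documents (assumed here to be those present in the index).
n_docs = conn.execute('SELECT COUNT(DISTINCT DocId) FROM "INDEX"').fetchone()[0]

# One row per (DocId, Term), so COUNT(*) per term is its document frequency.
conn.execute(
    'INSERT INTO TERM SELECT Term, LN(? * 1.0 / COUNT(*)) FROM "INDEX" GROUP BY Term',
    (n_docs,),
)

idf = dict(conn.execute("SELECT Term, Idf FROM TERM"))
print(idf)
```

Rare terms such as "crawler" receive the largest weight, and a term that occurs in every document would receive 0.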
Using RDBMS for IR
 An offset can be stored with each term occurrence in order to answer proximity
    queries. For example, for the query "Vice President", the two terms should
    occur together in a relevant document.

 INDEX_PROX (DocId, Term, OffSet)

 -- Construct the INDEX table from INDEX_PROX; COUNT(*) is the term frequency
 INSERT INTO INDEX
 SELECT DocId, Term, COUNT(*)
 FROM INDEX_PROX
 GROUP BY DocId, Term
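
Deriving term frequencies from per-occurrence rows can be sketched as follows, assuming Python's sqlite3 module. INDEX and Offset are quoted because they are SQLite keywords; the sample occurrences are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE INDEX_PROX (DocId INTEGER, Term TEXT, "Offset" INTEGER)')
conn.execute('CREATE TABLE "INDEX" (DocId INTEGER, Term TEXT, TermFrequency INTEGER)')

# One row per term occurrence: document 1 contains "vice" twice.
conn.executemany(
    "INSERT INTO INDEX_PROX VALUES (?, ?, ?)",
    [
        (1, "vice", 0), (1, "president", 1), (1, "vice", 2), (1, "chair", 3),
        (2, "president", 0),
    ],
)

# Collapse occurrences into one row per (DocId, Term) with its frequency.
conn.execute(
    'INSERT INTO "INDEX" '
    "SELECT DocId, Term, COUNT(*) FROM INDEX_PROX GROUP BY DocId, Term"
)

index_rows = conn.execute(
    'SELECT DocId, Term, TermFrequency FROM "INDEX" ORDER BY DocId, Term'
).fetchall()
print(index_rows)
```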
Using RDBMS for IR

   A query can be modeled as a relation as well, which is useful when
   the query is itself a long document.
       QUERY (Term, TermFreq)


   Ex: "Find all news documents written on 03/03/2005 about
   Sabanci University"
       Data will be extracted from the structured fields.
       Terms will be extracted using the inverted index.


SELECT d.DocId
FROM DOC d, INDEX i
WHERE i.Term IN ('Sabanci', 'University') AND d.PubDate = '03/03/2005'
      AND d.DocId = i.DocId
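
A sketch of this combined structured and text query, assuming Python's sqlite3 module and invented sample rows. DISTINCT is added here as an assumption about the intended result: without it, a document that contains both terms is returned once per matching term.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DOC (DocId INTEGER, DocName TEXT, PubDate TEXT, DateLine TEXT)")
conn.execute('CREATE TABLE "INDEX" (DocId INTEGER, Term TEXT, TermFrequency INTEGER)')

conn.executemany(
    "INSERT INTO DOC VALUES (?, ?, ?, ?)",
    [
        (1, "news-a", "03/03/2005", "Istanbul"),
        (2, "news-b", "03/03/2005", "Ankara"),
        (3, "news-c", "03/04/2005", "Istanbul"),
    ],
)
conn.executemany(
    'INSERT INTO "INDEX" VALUES (?, ?, ?)',
    [
        (1, "Sabanci", 1), (1, "University", 2),
        (2, "Crawler", 1),
        (3, "Sabanci", 1),
    ],
)

# Structured condition on DOC.PubDate, text condition on the inverted index.
matches = conn.execute(
    'SELECT DISTINCT d.DocId FROM DOC d, "INDEX" i '
    "WHERE i.Term IN ('Sabanci', 'University') "
    "AND d.PubDate = '03/03/2005' AND d.DocId = i.DocId"
).fetchall()

print(matches)
```

Document 2 has the right date but neither term, and document 3 has a matching term but the wrong date, so only document 1 qualifies.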
Using RDBMS for IR

    Boolean queries: consist of terms combined with Boolean operators
    (AND, OR, and NOT).
    For a single inputTerm: retrieve the texts of the documents that contain
    that term.

SELECT d.Text
FROM DOC d
WHERE d.DocId IN
     (SELECT DISTINCT i.DocId
      FROM INDEX i
      WHERE i.Term = inputTerm)


Note that the text part of a document can be stored as a BLOB or CLOB
(Binary or Character Large Object).
Using RDBMS for IR

   Boolean Queries that contain OR

SELECT DISTINCT (i.DocId)
FROM INDEX i
WHERE i.Term = inputTerm1 OR
      i.Term = inputTerm2 OR
      …..
      i.Term = inputTermn
Using RDBMS for IR

     Boolean Queries that contain AND

SELECT DISTINCT (i.DocId)
FROM INDEX i
WHERE i.Term = inputTerm1 AND
      i.Term = inputTerm2 AND
      …..
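      i.Term = inputTermn

?? This returns no rows when n > 1: within a single INDEX row, i.Term
cannot equal two different terms at the same time.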
Using RDBMS for IR

   Boolean Queries that contain AND (Previous Answer Was
   Wrong)

SELECT DISTINCT (i1.DocId)
FROM INDEX i1, INDEX i2, INDEX i3, …. INDEX in
WHERE i1.Term = inputTerm1 AND
       i2.Term = inputTerm2 AND
      …..
      in.Term = inputTermn AND
      i1.DocId = i2.DocId AND
      i2.DocId = i3.DocId AND
      …
      in-1.DocId = in.DocId

Alternatively, use INTERSECT to combine the single-term queries.
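
The three formulations of a two-term AND can be compared on a toy index. A minimal sketch, assuming Python's sqlite3 module; INDEX is quoted because it is a reserved word in SQLite, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "INDEX" (DocId INTEGER, Term TEXT)')
conn.executemany(
    'INSERT INTO "INDEX" VALUES (?, ?)',
    [
        (1, "vice"), (1, "president"),
        (2, "president"),
        (3, "vice"), (3, "president"),
    ],
)
terms = ("vice", "president")

# Naive single-scan AND: no single row can match both terms.
naive = conn.execute(
    'SELECT DISTINCT i.DocId FROM "INDEX" i WHERE i.Term = ? AND i.Term = ?',
    terms,
).fetchall()

# Self-join: one copy of INDEX per query term, tied together on DocId.
joined = conn.execute(
    'SELECT DISTINCT i1.DocId FROM "INDEX" i1, "INDEX" i2 '
    "WHERE i1.Term = ? AND i2.Term = ? AND i1.DocId = i2.DocId "
    "ORDER BY i1.DocId",
    terms,
).fetchall()

# INTERSECT of the single-term queries.
intersected = conn.execute(
    'SELECT DocId FROM "INDEX" WHERE Term = ? '
    "INTERSECT "
    'SELECT DocId FROM "INDEX" WHERE Term = ? '
    "ORDER BY DocId",
    terms,
).fetchall()

print(naive, joined, intersected)
```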
Using RDBMS for IR

  Boolean Queries that contain AND
  Commercial DBMSs are not able to process more than a fixed number
  of joins.
  Solution


   SELECT i.DocId
   FROM INDEX i, QUERY q
   WHERE i.Term = q.Term
   GROUP BY i.DocId
   HAVING COUNT(i.Term) = (SELECT COUNT(*) FROM QUERY)

   Works only when the INDEX contains a single row for each term of a
   document, together with its frequency (no proximity is recorded).
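
The join-free AND can be sketched as follows, assuming Python's sqlite3 module. INDEX and QUERY are quoted because they are SQLite keywords; the sample rows are invented, and the index holds exactly one row per (DocId, Term) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "INDEX" (DocId INTEGER, Term TEXT, TermFrequency INTEGER)')
conn.execute('CREATE TABLE "QUERY" (Term TEXT, TermFreq INTEGER)')
conn.executemany(
    'INSERT INTO "INDEX" VALUES (?, ?, ?)',
    [
        (1, "vice", 1), (1, "president", 1),
        (2, "president", 3),
        (3, "vice", 2), (3, "president", 1), (3, "university", 1),
    ],
)
conn.executemany('INSERT INTO "QUERY" VALUES (?, ?)', [("vice", 1), ("president", 1)])

# A document qualifies when it matches as many rows as there are query terms.
# The number of joins stays fixed no matter how many terms the query has.
and_docs = conn.execute(
    'SELECT i.DocId FROM "INDEX" i, "QUERY" q '
    "WHERE i.Term = q.Term "
    "GROUP BY i.DocId "
    'HAVING COUNT(i.Term) = (SELECT COUNT(*) FROM "QUERY") '
    "ORDER BY i.DocId"
).fetchall()

print(and_docs)
```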
Using RDBMS for IR

  Boolean Queries that contain AND
  Commercial DBMSs are not able to process more than a fixed number
  of joins.
  Solution for terms appearing more than once per document in the INDEX


   SELECT i.DocId
   FROM INDEX i, QUERY q
   WHERE i.Term = q.Term
   GROUP BY i.DocId
   HAVING COUNT(DISTINCT i.Term) = (SELECT COUNT(*) FROM QUERY)

   This is slower since DISTINCT typically requires a sort for duplicate
   elimination.
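
Why DISTINCT matters can be seen on an index that stores one row per occurrence. A sketch, assuming Python's sqlite3 module and invented rows: document 2 contains "vice" twice and never "president".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "INDEX" (DocId INTEGER, Term TEXT)')
conn.execute('CREATE TABLE "QUERY" (Term TEXT)')
conn.executemany(
    'INSERT INTO "INDEX" VALUES (?, ?)',
    [(1, "vice"), (1, "president"), (2, "vice"), (2, "vice")],
)
conn.executemany('INSERT INTO "QUERY" VALUES (?)', [("vice",), ("president",)])

template = (
    'SELECT i.DocId FROM "INDEX" i, "QUERY" q '
    "WHERE i.Term = q.Term GROUP BY i.DocId "
    'HAVING {count} = (SELECT COUNT(*) FROM "QUERY") '
    "ORDER BY i.DocId"
)

# Plain COUNT counts occurrences, so document 2 is a false positive.
plain = conn.execute(template.format(count="COUNT(i.Term)")).fetchall()

# COUNT(DISTINCT ...) counts different terms, so only document 1 qualifies.
distinct = conn.execute(template.format(count="COUNT(DISTINCT i.Term)")).fetchall()

print(plain, distinct)
```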
Using RDBMS for IR

  Boolean Queries that contain AND
  Commercial DBMSs are not able to process more than a fixed number
  of joins.
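  Implementation of TAND (Threshold AND) is also simple: a document
  qualifies when it contains more than k of the query terms
  (use >= k to require at least k).


   SELECT i.DocId
   FROM INDEX i, QUERY q
   WHERE i.Term = q.Term
   GROUP BY i.DocId
   HAVING COUNT(DISTINCT i.Term) > k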
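
A sketch of TAND with three query terms and k = 1, so a document must contain more than one distinct query term. It assumes Python's sqlite3 module; the rows are invented, and k is passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "INDEX" (DocId INTEGER, Term TEXT)')
conn.execute('CREATE TABLE "QUERY" (Term TEXT)')
conn.executemany(
    'INSERT INTO "INDEX" VALUES (?, ?)',
    [
        (1, "vice"), (1, "president"),
        (2, "president"), (2, "president"),   # one term, two occurrences
        (3, "vice"), (3, "president"), (3, "university"),
    ],
)
conn.executemany(
    'INSERT INTO "QUERY" VALUES (?)',
    [("vice",), ("president",), ("university",)],
)

k = 1
tand_docs = conn.execute(
    'SELECT i.DocId FROM "INDEX" i, "QUERY" q '
    "WHERE i.Term = q.Term "
    "GROUP BY i.DocId "
    "HAVING COUNT(DISTINCT i.Term) > ? "
    "ORDER BY i.DocId",
    (k,),
).fetchall()

print(tand_docs)
```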
Using RDBMS for IR

  Proximity Queries for terms within a specific window width


 SELECT a.DocId
 FROM INDEX_PROX a, INDEX_PROX b
 WHERE a.Term IN (SELECT q.Term FROM QUERY q) AND
        b.Term IN (SELECT q.Term FROM QUERY q) AND
        a.DocId = b.DocId AND
        (a.offset - b.offset) BETWEEN 0 AND (width-1)
 GROUP BY a.DocId, b.DocId, a.Term, a.offset
 HAVING COUNT(DISTINCT b.Term) = (SELECT COUNT(*) FROM QUERY)
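
The proximity query can be traced on a few invented documents with a window width of 2. A sketch, assuming Python's sqlite3 module; QUERY and Offset are quoted because they are SQLite keywords, and SELECT DISTINCT is added as an assumption so that a document with several matching windows is reported once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE INDEX_PROX (DocId INTEGER, Term TEXT, "Offset" INTEGER)')
conn.execute('CREATE TABLE "QUERY" (Term TEXT)')
conn.executemany(
    "INSERT INTO INDEX_PROX VALUES (?, ?, ?)",
    [
        (1, "vice", 0), (1, "president", 1),                              # adjacent
        (2, "vice", 0), (2, "visits", 1), (2, "the", 2), (2, "president", 3),
        (3, "president", 0), (3, "vice", 1),                              # reversed order
    ],
)
conn.executemany('INSERT INTO "QUERY" VALUES (?)', [("vice",), ("president",)])

width = 2
# Each group is one occurrence "a" that ends a window; the window qualifies
# when the occurrences "b" inside it cover every query term.
near_docs = conn.execute(
    """
    SELECT DISTINCT a.DocId
    FROM INDEX_PROX a, INDEX_PROX b
    WHERE a.Term IN (SELECT q.Term FROM "QUERY" q) AND
          b.Term IN (SELECT q.Term FROM "QUERY" q) AND
          a.DocId = b.DocId AND
          (a."Offset" - b."Offset") BETWEEN 0 AND (? - 1)
    GROUP BY a.DocId, b.DocId, a.Term, a."Offset"
    HAVING COUNT(DISTINCT b.Term) = (SELECT COUNT(*) FROM "QUERY")
    ORDER BY a.DocId
    """,
    (width,),
).fetchall()

print(near_docs)
```

Document 2 contains both terms but three positions apart, so it falls outside the window. Document 3 qualifies because the query tests closeness, not word order.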
Using RDBMS for IR

  Calculating Relevance

   -- Score: sum over shared terms of (query tf * idf) * (document tf * idf)
   SELECT i.DocId, SUM(q.TermFreq * t.Idf * i.TermFrequency * t.Idf)
   FROM QUERY q, INDEX i, TERM t
   WHERE q.Term = t.Term AND i.Term = t.Term
   GROUP BY i.DocId
   ORDER BY 2 DESC
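
The ranking can be checked by hand on a small example. A sketch, assuming Python's sqlite3 module; the idf values are invented round numbers rather than computed logarithms, so that each score is easy to verify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "INDEX" (DocId INTEGER, Term TEXT, TermFrequency INTEGER)')
conn.execute("CREATE TABLE TERM (Term TEXT, Idf REAL)")
conn.execute('CREATE TABLE "QUERY" (Term TEXT, TermFreq INTEGER)')

conn.executemany("INSERT INTO TERM VALUES (?, ?)", [("vice", 2.0), ("president", 1.0)])
conn.executemany('INSERT INTO "QUERY" VALUES (?, ?)', [("vice", 1), ("president", 1)])
conn.executemany(
    'INSERT INTO "INDEX" VALUES (?, ?, ?)',
    [
        (1, "vice", 1), (1, "president", 2),   # 1*2*1*2 + 1*1*2*1 = 6
        (2, "president", 1),                   # 1*1*1*1           = 1
        (3, "vice", 2),                        # 1*2*2*2           = 8
    ],
)

# Inner product of the query and document tf-idf vectors, best match first.
ranking = conn.execute(
    "SELECT i.DocId, SUM(q.TermFreq * t.Idf * i.TermFrequency * t.Idf) "
    'FROM "QUERY" q, "INDEX" i, TERM t '
    "WHERE q.Term = t.Term AND i.Term = t.Term "
    "GROUP BY i.DocId "
    "ORDER BY 2 DESC"
).fetchall()

print(ranking)
```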

2005 fall cs523_lecture_4
