Context based Indexing in Search Engines using Ontology
SHIKHA GUPTA
ASSISTANT PROFESSOR
ADVANCED EDUCATIONAL INSTITUTE
www.advanced.edu.in
Flow of lesson
1. Search Engine
2. Indexing
3. Architecture
4. Various Components
5. Conclusion
What is a Search Engine?
Aim
To provide the most relevant documents to users in the minimum possible time.
Major issue
Providing efficient and fast access to the index.
Existing architecture
The index is built on the basis of the terms in each document.
Indexing
A Web content mining process
Indexer
Extracts the large amount of information that contains a given term.
Inverted file (IF)
Keeps a count of the occurrences of each term within every document, maintained in an index.
Occupies little space.
Efficient at resolving keyword-based queries.
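A minimal sketch of such an inverted file, assuming whitespace tokenization and a tiny hypothetical collection (the function name `build_inverted_file` and the sample documents are illustrative, not from the slides):

```python
from collections import Counter, defaultdict


def build_inverted_file(documents):
    """Map each term to {doc_id: number of occurrences in that document}."""
    inverted = defaultdict(dict)
    for doc_id, text in documents.items():
        # Count how often each term occurs in this document.
        counts = Counter(text.lower().split())
        for term, count in counts.items():
            inverted[term][doc_id] = count
    return dict(inverted)


# Hypothetical two-document collection, keyed by document identifier.
documents = {
    1: "apple releases new iphone",
    2: "apple pie needs a fresh apple",
}
inverted_file = build_inverted_file(documents)
print(inverted_file["apple"])  # {1: 1, 2: 2}
```

A keyword query is then a dictionary lookup on the term, which is why this structure resolves keyword-based queries quickly.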
Term Based Index
Less efficient due to two information retrieval problems:
Polysemy: one word has multiple meanings.
Synonymy: multiple words have the same meaning.
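Both problems can be seen with a toy term-based lookup. This is only an illustration: the index contents, the document topics in the comments and the function name `term_lookup` are assumptions.

```python
def term_lookup(index, term):
    """Return the sorted document ids listed under exactly this term."""
    return sorted(index.get(term, set()))


# Hypothetical term -> document ids index.
# Doc 1 is about the fruit, doc 2 about the company,
# doc 3 says "car" and doc 4 says "automobile".
index = {
    "apple": {1, 2},
    "car": {3},
    "automobile": {4},
}

# Polysemy: one term returns documents about two different meanings.
polysemy_hits = term_lookup(index, "apple")

# Synonymy: doc 4 is relevant but is missed because it uses another word.
synonymy_hits = term_lookup(index, "car")

print(polysemy_hits, synonymy_hits)  # [1, 2] [3]
```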
Context Based Index
Improves the relevance of search results.
Uses the concept of ontology.
Ontology
An ontology for a subject area includes a vocabulary for referring to the terms in that area and logical statements that describe the relationships among the terms.
Example: a simple ontology for "apple"
Set of concepts
C_apple = {apple, computer device, fruit, eatable, iphone}
Set of relationships
R_apple = {brandname_of(apple, iphone), type_of(apple, fruit)}
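One possible way to hold the apple ontology in code is a set of concepts plus a set of (relation, subject, object) triples. The triple layout and the helper name `related` are assumptions made for this sketch; the slides do not prescribe a storage format.

```python
# Concepts and relationships copied from the apple example.
concepts = {"apple", "computer device", "fruit", "eatable", "iphone"}
relationships = {
    ("brandname_of", "apple", "iphone"),
    ("type_of", "apple", "fruit"),
}


def related(concept, relationships):
    """Return (relation, other concept) pairs that mention the concept."""
    result = []
    for relation, subject, obj in sorted(relationships):
        if subject == concept:
            result.append((relation, obj))
        elif obj == concept:
            result.append((relation, subject))
    return result


print(related("apple", relationships))
# [('brandname_of', 'iphone'), ('type_of', 'fruit')]
```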
Architecture of Context based Indexing
The crawler gathers Web documents and stores them in a large repository.
Every Web page has an associated ID number, called the document identifier, which is assigned whenever a new URL is parsed out of a Web page.
The indexer takes the Web pages collected by the crawler and parses them into a highly efficient index.
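The document identifier step can be sketched as a small registry that hands out a new ID the first time a URL is seen. The class name `DocumentRegistry`, the sequential numbering and the example URLs are assumptions for illustration only.

```python
class DocumentRegistry:
    """Assigns a document identifier the first time a URL is seen."""

    def __init__(self):
        self._ids = {}
        self._next_id = 1

    def doc_id_for(self, url):
        # A new URL gets the next free identifier; a known URL keeps its own.
        if url not in self._ids:
            self._ids[url] = self._next_id
            self._next_id += 1
        return self._ids[url]


registry = DocumentRegistry()
first = registry.doc_id_for("http://example.com/a")
second = registry.doc_id_for("http://example.com/b")
again = registry.doc_id_for("http://example.com/a")
print(first, second, again)  # 1 2 1
```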
Description of various Components
Repository of Web pages:
A database containing the set of documents collected by the crawler.
Indexer:
After the crawler has gathered the documents, the indexer maintains an index of them in the form of posting lists, which contain the term, the identifiers of the documents containing it, and other related information.
Preprocessing of document:
Involves the removal of stop words. A stop word is a word that carries little or no semantic content. Common stop words are prepositions and articles.
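Stop word removal can be sketched as a simple filter. The stop word list below is a short assumed sample of articles and prepositions, not a complete list, and tokenization is by whitespace only.

```python
# Assumed sample of articles and prepositions; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "to", "at", "by", "for", "with"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace and drop stop words."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]


tokens = remove_stop_words("The price of an apple in the market")
print(tokens)  # ['price', 'apple', 'market']
```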
Thesaurus:
A dictionary of words, available on the World Wide Web from thesaurus.com, that contains the words together with their multiple meanings.
Context Repository:
A database that holds several types of context data. New contexts derived from the thesaurus are also stored in it.
Ontology Repository:
A database of ontologies that contains concepts from various domains together with the relationships among them.
Context of the document:
The context represents the theme of the document, extracted using the context repository, the thesaurus and the ontology repository.
Index:
The final index, constructed after the context of the document has been extracted. It has the context as its first field, the term as its next field and, finally, the document identifiers of the relevant documents.
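The three-field layout (context, term, document identifiers) could be held as a nested mapping. This is a sketch under the assumption that each document's context has already been extracted; the function name `build_context_index` and the sample data are hypothetical.

```python
from collections import defaultdict


def build_context_index(docs):
    """docs: {doc_id: (context, [terms])} -> {context: {term: [doc_ids]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id in sorted(docs):
        context, terms = docs[doc_id]
        # Each distinct term is filed under the document's context.
        for term in sorted(set(terms)):
            index[context][term].append(doc_id)
    return {context: dict(terms) for context, terms in index.items()}


# Hypothetical documents whose context is assumed to be known already.
docs = {
    1: ("fruit", ["apple", "pie"]),
    2: ("computer device", ["apple", "iphone"]),
    3: ("fruit", ["apple", "orchard"]),
}
context_index = build_context_index(docs)
print(context_index["fruit"]["apple"])            # [1, 3]
print(context_index["computer device"]["apple"])  # [2]
```

Because the context is the first field, the same term "apple" leads to different document lists depending on the context under which it is looked up.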
Searcher:
Receives user queries via the user interface, searches the index and returns the results to the user.
Search Interface:
The user interface through which the user types the query along with the specified context.
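A searcher over a context, term, document identifier index might look like the sketch below. The index contents, the function name `search` and the rule that a document must contain every query term are assumptions; the slides do not specify how multi-term queries are combined.

```python
def search(index, context, query):
    """Return ids of documents under `context` that contain every query term."""
    postings = index.get(context, {})
    terms = query.lower().split()
    if not terms:
        return []
    result = None
    for term in terms:
        ids = set(postings.get(term, []))
        # Intersect posting lists so only documents with all terms remain.
        result = ids if result is None else result & ids
    return sorted(result)


# Hypothetical index in the layout {context: {term: [doc_ids]}}.
index = {
    "fruit": {"apple": [1, 3], "pie": [1], "orchard": [3]},
    "computer device": {"apple": [2], "iphone": [2]},
}

print(search(index, "fruit", "apple"))            # [1, 3]
print(search(index, "computer device", "apple"))  # [2]
print(search(index, "fruit", "apple pie"))        # [1]
```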
CONCLUSION
The indexing structure is constructed on the basis of the context of the document.
It uses ontology for context based index building.
It enables retrieval from the index on the basis of context rather than keywords.
It improves the quality of the retrieved results.
It performs better than the existing system.
Shikha Gupta
Assistant Professor
Advanced Educational Institute
Advanced Educational Institutions,
70 km Milestone,
Delhi-Mathura Road, Dist. Palwal, Haryana-121105
+91–1275–398400, 302222
Shikha.0909@gmail.com
www.advance.edu.in