Introduction to Search Engines
ENTERPRISE SEARCH: an introduction
Web Search, Desktop Search, Enterprise Search
so what is a Search Engine?
Software that builds an index on text and answers queries using that index.
Any search application has two major components:
- INDEXING component: of importance to us developers (read: headache)
- SEARCH component: of importance to the users
Data is indexed into INDEX FILES by the INDEXING component. The user sends a search query to the SEARCH component and receives search results.
Let’s start with INDEXING
is it easy to search here  . . .
or  here  . . .
That's information like garbage: no structure, and it comes in all kinds of shapes, sizes, and formats.
And this is what indexing does: it puts data into a structured format that is easily accessible through search.
So what needs to be indexed and searched?
Various FILE FORMATS: text files, HTML, PDF, MS Word, PPT
Coming from various DATA SOURCES: emails, CMS, file system, database, web pages
Data (documents): the text that should be indexed is fed to the Analyzer.
Analyzer:
- removes stop words such as "a" or "the"
- converts all text to lowercase letters for case-insensitive searching
- applies stemming (a stemming algorithm reduces the words "fishing", "fished", "fish", and "fisher" to the root word "fish")
The tokenized text goes to the Index Writer, which writes the INDEX FILES.
The user sends a search query and receives search results.
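The analyzer steps above can be sketched in a few lines of Python. This is only a toy illustration: the stop-word list and the suffix-stripping `stem` function are assumptions made for the example and are far cruder than what a real analyzer (or a real stemming algorithm) does.

```python
import re

# Hypothetical stop-word list; real analyzers ship much longer ones.
STOP_WORDS = {"a", "an", "the", "of", "to", "and"}

# Toy suffix list for illustration only; this is not a real stemming algorithm.
SUFFIXES = ("ing", "ed", "er", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lowercase, tokenize, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(analyze("The fisher fished a fish while fishing"))
```

The order matters here: lowercasing happens before the stop-word check, so "The" and "the" are treated alike.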
Document 1: Coffee isn't my cup of tea.
Document 2: Chocolate, men, coffee - some things are better rich.
INDEX
coffee - 1,2
cup - 1
tea - 1
chocolate - 2
men - 2
things - 2
better - 2
rich - 2
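As a rough sketch, an index like this (an inverted index: each term maps to the documents that contain it) could be built for the two example documents as follows. The stop-word set is an assumption, chosen so that only the terms kept on the slide survive.

```python
import re
from collections import defaultdict

# Assumed stop words, chosen so the result keeps the same terms as the slide.
STOP_WORDS = {"isn't", "my", "of", "some", "are"}

DOCUMENTS = {
    1: "Coffee isn't my cup of tea.",
    2: "Chocolate, men, coffee - some things are better rich.",
}


def build_index(documents):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z']+", text.lower()):
            if token not in STOP_WORDS:
                postings[token].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


INDEX = build_index(DOCUMENTS)

if __name__ == "__main__":
    for term, ids in INDEX.items():
        print(term, "-", ",".join(map(str, ids)))
```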
And now the SEARCH component
The user sends a search query; its search terms are looked up in the INDEX FILES (built when the data is indexed), and the user receives search results.
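A minimal sketch of that lookup, assuming a small hand-written index (term to document ids) and assuming that a multi-term query should match only documents containing every term. Both the sample data and the AND semantics are choices made for this example.

```python
# Hand-written postings for illustration: term -> ids of documents containing it.
INDEX = {
    "coffee": {1, 2},
    "cup": {1},
    "tea": {1},
    "chocolate": {2},
    "rich": {2},
}


def search(query, index):
    """Return ids of documents that contain every term in the query (AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        # Intersect with the postings of each further term.
        result &= index.get(term, set())
    return sorted(result)


if __name__ == "__main__":
    print(search("Coffee rich", INDEX))
```

Because the index already groups documents by term, answering a query is a matter of set intersection rather than scanning every document.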
Search Request Terms
-> Spelling Index: Correct Search Terms + Incorrect Search Terms
-> Taxonomy: Search Terms + Related Terms from Taxonomy + Concept IDs
-> Search engine (INDEX)
-> Results' Page: search results with
   1) Actual location of the result
   2) Rank
   3) Details
   4) Facet categorization
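The spelling and taxonomy stages can be pictured as a query-expansion step that runs before the index is consulted. The sketch below is one possible reading of the flow; the `SPELLING` and `TAXONOMY` tables and their contents are hypothetical stand-ins for a real spelling index and taxonomy.

```python
# Hypothetical lookup tables; a real system would derive these from a
# spelling index and a curated taxonomy.
SPELLING = {"cofee": "coffee", "choclate": "chocolate"}
TAXONOMY = {"coffee": ["espresso", "latte"], "chocolate": ["cocoa"]}


def expand_query(raw_query):
    """Correct each term, then append related terms from the taxonomy."""
    corrected = [SPELLING.get(t, t) for t in raw_query.lower().split()]
    expanded = list(corrected)
    for term in corrected:
        for related in TAXONOMY.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


if __name__ == "__main__":
    print(expand_query("Cofee cup"))
```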
introducing LUCENE
- Full-text search library
- Open source
- Documents in XML format
- Can operate on its own or via Solr
Ways of storing the fields of any document:
- Indexed: the field is searchable.
- Stored: the content can be displayed in the search results; you may choose not to make such a field searchable. Example: "summary associated with a page".
- Tokenized: the field is run through an Analyzer, which converts the content into a sequence of tokens.
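To make the three options concrete, here is a small Python model of them. It is not Lucene's API; `FieldOptions`, `SCHEMA`, and the field names are hypothetical, and whitespace splitting stands in for a real Analyzer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldOptions:
    indexed: bool    # searchable
    stored: bool     # original value can be shown in results
    tokenized: bool  # run through an analyzer before indexing


# Hypothetical schema for a web page.
SCHEMA = {
    "title": FieldOptions(indexed=True, stored=True, tokenized=True),
    "body": FieldOptions(indexed=True, stored=False, tokenized=True),
    "summary": FieldOptions(indexed=False, stored=True, tokenized=False),
}


def index_terms(doc):
    """Collect searchable terms: only indexed fields contribute."""
    terms = set()
    for name, value in doc.items():
        opts = SCHEMA[name]
        if not opts.indexed:
            continue
        # Tokenized fields are split into words; others are kept as one term.
        terms.update(value.lower().split() if opts.tokenized else [value])
    return terms


def stored_view(doc):
    """Return only the fields that may be displayed in search results."""
    return {name: value for name, value in doc.items() if SCHEMA[name].stored}
```

With this schema, a page's summary can be shown next to a hit without being searchable, while its body is searchable without being kept for display.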
introducing SOLR
Solr -> Lucene -> Index
- open source
- handles index and query requests to Lucene via HTTP and XML (also JSON)
- manages document update, add, and delete requests to Lucene
- straightforward schema and config files
- comprehensive HTML admin interfaces
- highly configurable
Adding Documents to SOLR
HTTP POST to /update
<add>
  <doc boost="2">
    <field name="type">05991</field>
    <field name="from">Apache Solr</field>
    <field name="subject">An intro...</field>
    <field name="category">search</field>
    <field name="category">lucene</field>
    <field name="body">Solr is a full...</field>
  </doc>
</add>
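A message like this can be assembled programmatically rather than by string concatenation, which takes care of XML escaping. The sketch below uses only the Python standard library; the host and port in the default URL are an assumption borrowed from the query examples later in the deck, and the request is prepared but deliberately not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_message(fields, boost=None):
    """Build an <add><doc>...</doc></add> message from (name, value) pairs."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if boost is not None:
        doc.set("boost", str(boost))
    for name, value in fields:
        # Repeating a name yields a multi-valued field, as with "category".
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


def build_update_request(xml_body, url="http://localhost:8983/solr/update"):
    """Prepare (but do not send) the HTTP POST carrying the message."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


FIELDS = [
    ("type", "05991"),
    ("from", "Apache Solr"),
    ("category", "search"),
    ("category", "lucene"),
]
MESSAGE = build_add_message(FIELDS, boost=2)
REQUEST = build_update_request(MESSAGE)
```

Passing `REQUEST` to `urllib.request.urlopen` would submit it, assuming a Solr instance is listening at that address.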
schema.xml: field indexing and display definitions
<field name="subject" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="genus_species" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="language" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="creator" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="control_num" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="title_sort" type="string" indexed="true" stored="false"/>
solrconfig.xml: defines cache sizes, faceted field types, and request handler customization
Deleting Documents
Delete by Id:
<delete><id>05591</id></delete>
Delete by Query (multiple documents):
<delete><query>manufacturer:microsoft</query></delete>
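Both delete messages are small enough to generate with a helper. A possible sketch, using the standard library's XML builder so that special characters in an id or query are escaped correctly (the function names are made up for this example):

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Build the message that deletes a single document by its id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Build the message that deletes every document matching a query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


if __name__ == "__main__":
    print(delete_by_id("05591"))
    print(delete_by_query("manufacturer:microsoft"))
```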
Search Results
http://localhost:8983/solr/select?q=video&start=0&rows=2&fl=name,price
Default Parameters
param   default     description
q                   The query
start   0           Offset into the list of matches
rows    10          Number of documents to return
fl      *           Stored fields to return
qt      standard    Query type; maps to query handler
df      (schema)    Default field to search
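As an illustration, the request URL can be assembled from these parameters with the defaults listed above filled in. The base URL is copied from the example request; `select_url` is a hypothetical helper, and only `q`, `start`, `rows`, and `fl` are covered.

```python
from urllib.parse import urlencode

# Assumed base URL, copied from the example request on the slide.
BASE_URL = "http://localhost:8983/solr/select"


def select_url(q, start=0, rows=10, fl="*"):
    """Build a select URL; unspecified parameters fall back to the defaults."""
    params = {"q": q, "start": start, "rows": rows, "fl": fl}
    # Keep "," and "*" readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe=",*")


if __name__ == "__main__":
    print(select_url("video", rows=2, fl="name,price"))
```

Spaces in the query are encoded as `+`, so a two-word query still produces a valid URL.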
<response>
  <responseHeader>
    <status>0</status>
    <QTime>1</QTime>
  </responseHeader>
  <result numFound="16173" start="0">
    <doc>
      <str name="name">Apple 60 GB iPod with Video</str>
      <float name="price">399.0</float>
    </doc>
    <doc>
      <str name="name">ASUS Extreme N7800GTX/2DHTV</str>
      <float name="price">479.95</float>
    </doc>
  </result>
</response>
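On the client side, a response of this shape can be turned into plain Python values with the standard library's XML parser. A minimal sketch, assuming the response contains only `str` and `float` elements as in the sample; `parse_response` is a name invented for this example.

```python
import xml.etree.ElementTree as ET

RESPONSE = """
<response>
  <responseHeader><status>0</status><QTime>1</QTime></responseHeader>
  <result numFound="16173" start="0">
    <doc>
      <str name="name">Apple 60 GB iPod with Video</str>
      <float name="price">399.0</float>
    </doc>
    <doc>
      <str name="name">ASUS Extreme N7800GTX/2DHTV</str>
      <float name="price">479.95</float>
    </doc>
  </result>
</response>
"""


def parse_response(xml_text):
    """Return (total number of matches, list of documents as dicts)."""
    root = ET.fromstring(xml_text.strip())
    result = root.find("result")
    docs = []
    for doc in result.findall("doc"):
        fields = {}
        for child in doc:
            value = child.text
            if child.tag == "float":
                value = float(value)  # the element name carries the type
            fields[child.get("name")] = value
        docs.append(fields)
    return int(result.get("numFound")), docs


if __name__ == "__main__":
    total, documents = parse_response(RESPONSE)
    print(total, documents)
```

Note that `numFound` reports all matches, while only the requested `rows` documents are returned in the body.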
Solr architecture:
- HTTP Request Servlet (search requests hit here)
- Update Servlet (new documents to be added here)
- Admin Interface
- Request handlers: Standard Request Handler, Disjunction Max Request Handler, Custom Request Handler
- XML Update Interface
- XML Response Writer
- Solr Core: Config, Schema, Caching, Analysis, Concurrency, Update Handler, Replication
- Lucene
 
