Solr

Solr Windows Installation

• Download and install Tomcat for Windows using the MSI installer. Install it with the tcnative.dll file. The steps below assume it is installed in c:\tomcat\
• Check that Tomcat is installed correctly by going to http://localhost:8080/
• Edit c:\tomcat\conf\server.xml to add the URIEncoding attribute to the Connector element, as shown below.

   <Connector port="8080" maxHttpHeaderSize="8192"
              URIEncoding="UTF-8"
              maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
              enableLookups="false" redirectPort="8443" acceptCount="100"
              connectionTimeout="20000" disableUploadTimeout="true" />

• Download and unzip the Solr distribution zip file into (say) c:\temp\solrZip\

Solr Windows Installation

• Make a directory called solr where you intend the application server to function, say c:\web\solr\
• Copy the contents of the example\solr directory, c:\temp\solrZip\example\solr\, to c:\web\solr\
• Stop the Tomcat service.
• Copy the *solr*.war file from c:\temp\solrZip\dist\ to the Tomcat webapps directory, c:\tomcat\webapps\
• Rename the *solr*.war file to solr.war
• Use the system tray icon to configure Tomcat to start with the following Java option: -Dsolr.solr.home=c:\web\solr
• As an alternative to the previous step, go to C:\tomcat\conf\Catalina\localhost and create a file named "solr.xml" containing this line of code (see below in the notes). To run the server this way, do not keep solr.war in Tomcat's webapps folder; keep it in some other folder. In this case I have kept it in ${catalina.home}/newsolr/solr.war.
• Start the Tomcat service.
• Go to the Solr admin page to verify that the installation is working. It will be at http://localhost:8080/solr/admin
In Solr and Lucene, an index is built of one or more Documents. A Document consists of one or more Fields. A Field consists of a name, content, and metadata telling Solr how to handle the content. For instance, Fields can contain strings, numbers, booleans, or dates, as well as any types you wish to add. A Field can be described using a number of options that tell Solr how to treat the content during indexing and searching.

Field attributes

Attribute name   Description
--------------   -----------
indexed          Indexed Fields are searchable and sortable. You can also run Solr's analysis process on indexed Fields, which can alter the content to improve or change results. The following section provides more information about Solr's analysis process.
stored           The contents of a stored Field are saved in the index. This is useful for retrieving and highlighting the contents for display but is not necessary for the actual search. For example, many applications store pointers to the location of contents rather than the actual contents of a file.
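
The difference between the two attributes can be illustrated with a toy in-memory index. This is only a sketch of the idea, not how Solr or Lucene store data: the names FieldSpec and ToyIndex are hypothetical, and the "analysis" is just lowercasing and splitting on whitespace.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in for a schema field definition."""
    name: str
    indexed: bool
    stored: bool


class ToyIndex:
    def __init__(self, specs):
        self.specs = {spec.name: spec for spec in specs}
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids
        self.stored = {}                  # doc id -> {field: original value}

    def add(self, doc_id, doc):
        kept = {}
        for name, value in doc.items():
            spec = self.specs[name]
            if spec.indexed:
                # Indexed fields become searchable terms.
                for term in value.lower().split():
                    self.postings[(name, term)].add(doc_id)
            if spec.stored:
                # Stored fields keep the original value for display.
                kept[name] = value
        self.stored[doc_id] = kept

    def search(self, field, term):
        ids = sorted(self.postings.get((field, term.lower()), set()))
        return [(doc_id, self.stored[doc_id]) for doc_id in ids]


index = ToyIndex([
    FieldSpec("title", indexed=True, stored=True),
    FieldSpec("body", indexed=True, stored=False),   # searchable, not returned
    FieldSpec("path", indexed=False, stored=True),   # returned, not searchable
])
index.add("1", {
    "title": "Apache Solr",
    "body": "Solr is a full text search server",
    "path": "/docs/solr.txt",
})

hits = index.search("body", "server")
misses = index.search("path", "/docs/solr.txt")
print(hits)
print(misses)
```

The body field matches the search but is absent from the returned document, while the path field is returned but cannot be searched. This mirrors the "store a pointer rather than the contents" pattern described in the table.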

Example "Solr Home" Directory
=============================

This directory is provided as an example of what a "Solr Home" directory should look like.

It's not strictly necessary that you copy all of the files in this directory when setting up a new instance of Solr, but it is recommended.

Basic Directory Structure
-------------------------

The Solr Home directory typically contains the following subdirectories...

   conf/
        This directory is mandatory and must contain your solrconfig.xml and schema.xml. Any other optional configuration files would also be kept here.

   data/
        This directory is the default location where Solr will keep your index, and is used by the replication scripts for dealing with snapshots. You can override this location in the solrconfig.xml and scripts.conf files. Solr will create this directory if it does not already exist.

   lib/
        This directory is optional. If it exists, Solr will load any Jars found in this directory and use them to resolve any "plugins" specified in your solrconfig.xml or schema.xml (ie: Analyzers, Request Handlers, etc...)

   bin/
        This directory is optional. It is the default location used for keeping the replication scripts.

What Is Solr

• SOLR is a REST layer for Lucene
• Began life at CNET to provide a robust search system
• Joined Apache Incubator in January 2006
• Graduated to Lucene sub-project status in January 2007
• A full text search server based on Lucene
• XML/HTTP Interfaces
• Loose Schema to define types and fields
• Web Administration Interface
• Extensive Caching
• Index Replication
• Extensible Open Architecture
• Written in Java 5, deployable as a WAR

Why use SOLR?

• Easy to set up and get started
• Powerful full text searching
• Cross platform: Java and REST
• Under active development
• Fast
• Adds extra functionality on top of Lucene:
    • replication
    • CSV importing
    • JSON results
    • results highlighting
    • synonym support
 

Adding Documents

HTTP POST to /update

   <add><doc boost="2">
     <field name="article">05991</field>
     <field name="title">Apache Solr</field>
     <field name="subject">An intro...</field>
     <field name="category">search</field>
     <field name="category">lucene</field>
     <field name="body">Solr is a full...</field>
   </doc></add>
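
Building the add message by string concatenation is error-prone because field values must be XML-escaped. The sketch below generates the same message with Python's standard xml.etree.ElementTree module. The helper name build_add_message is hypothetical, and the sketch only builds the payload; it does not contact a server.

```python
import xml.etree.ElementTree as ET


def build_add_message(fields, boost=None):
    """Build an <add><doc>...</doc></add> message from (name, value) pairs.

    A list of pairs is used instead of a dict so that a field name can
    repeat, as "category" does in the slide.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if boost is not None:
        doc.set("boost", str(boost))
    for name, value in fields:
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


payload = build_add_message(
    [
        ("article", "05991"),
        ("title", "Apache Solr"),
        ("subject", "An intro..."),
        ("category", "search"),
        ("category", "lucene"),
        ("body", "Solr is a full..."),
    ],
    boost=2,
)
print(payload)
```

The resulting string would be the body of the HTTP POST to /update. The change only becomes visible to searches after a commit.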

Deleting Documents

Delete by Id

   <delete><id>05591</id></delete>

Delete by Query (multiple documents)

   <delete>
     <query>manufacturer:microsoft</query>
   </delete>
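
The two delete messages can be generated the same way. This is a minimal sketch using the standard library; the function names delete_by_id and delete_by_query are hypothetical, and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Build a message that deletes the single document with this unique key."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Build a message that deletes every document matching the query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


by_id = delete_by_id("05591")
by_query = delete_by_query("manufacturer:microsoft")
escaped = delete_by_query('name:"R&D"')  # the ampersand is escaped for XML
print(by_id)
print(by_query)
print(escaped)
```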

Commit

<commit/> makes changes visible
  • closes IndexWriter
  • removes duplicates
  • opens new IndexSearcher
      • newSearcher/firstSearcher events
      • cache warming
      • "register" the new IndexSearcher

<optimize/> does the same as commit, and also merges all index segments.

Lucene syntax

• Required search term: use a "+"
      +ipod +belkin
• Field-specific searching: use fieldName:
      name:ipod manu:belkin
• Wildcard searching: use * or ?
      ip?d
      belk*
      *deo (currently requires modifying solr source)
• Range searching
      timestamp:[2006-07-16T12:30:00Z TO *]
      Time needs to be full ISO
• Proximity searching: use a "~"
      "video ipod"~3 (up to 3 words apart)
• Fuzzy searches: use a "~"
      ipod~ will find ipod and ipods
      belkin~0.7 will find words with close spellings

Default Query Syntax

Lucene Query Syntax [; sort specification]

1. mission impossible; releaseDate desc
2. +mission +impossible -actor:cruise
3. "mission impossible" -actor:cruise
4. title:spiderman^10 description:spiderman
5. description:"spiderman movie"~10
6. +HDTV +weight:[0 TO 100]
7. Wildcard queries: te?t, te*t, test*

Full control panel interface

• Start row/max rows: pagination
• Output type: standard (xml), python, json, ruby, xslt
• Enable highlighting
    • fields to highlight
    • works on wildcard matches

Default Parameters

Query Arguments for HTTP GET/POST to /select
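
Query strings that use Lucene syntax contain characters such as "+", ":" and spaces, so they need URL-encoding before they are sent to /select. The sketch below builds such a URL with the standard library. The parameter names q, start, rows and fl are the ones that appear in this deck's example request; the helper name select_url and the base URL are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def select_url(base, q, start, rows, fl=None):
    """Build a /select URL with properly encoded query arguments."""
    params = [("q", q), ("start", start), ("rows", rows)]
    if fl:
        # fl is a comma-separated list of fields to return
        params.append(("fl", ",".join(fl)))
    return base + "/select?" + urlencode(params)


url = select_url(
    "http://localhost:8983/solr",  # assumed base URL
    "+mission +impossible -actor:cruise",
    start=0,
    rows=2,
    fl=["name", "price"],
)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
```

Without encoding, a literal "+" in a URL query is read as a space, which would silently turn required terms into optional ones.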

Search Results

http://localhost:8983/solr/select?q=video&start=0&rows=2&fl=name,price

   <response><responseHeader><status>0</status>
     <QTime>1</QTime></responseHeader>
     <result numFound="16173" start="0">
       <doc>
         <str name="name">Apple 60 GB iPod with Video</str>
         <float name="price">399.0</float>
       </doc>
       <doc>
         <str name="name">ASUS Extreme N7800GTX/2DHTV</str>
         <float name="price">479.95</float>
       </doc>
     </result>
   </response>
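
A client can read this response with any XML parser. The following sketch parses the exact response shown above using Python's standard library; the function name parse_results is hypothetical, and only the str and float element types that appear in this example are handled.

```python
import xml.etree.ElementTree as ET

RESPONSE = """<response><responseHeader><status>0</status>
<QTime>1</QTime></responseHeader>
<result numFound="16173" start="0">
  <doc>
    <str name="name">Apple 60 GB iPod with Video</str>
    <float name="price">399.0</float>
  </doc>
  <doc>
    <str name="name">ASUS Extreme N7800GTX/2DHTV</str>
    <float name="price">479.95</float>
  </doc>
</result>
</response>"""


def parse_results(xml_text):
    """Return (total number of matches, list of documents as dicts)."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = []
    for doc in result.findall("doc"):
        fields = {}
        for child in doc:
            # The element tag carries the type, the name attribute the field.
            value = float(child.text) if child.tag == "float" else child.text
            fields[child.get("name")] = value
        docs.append(fields)
    return int(result.get("numFound")), docs


num_found, docs = parse_results(RESPONSE)
print(num_found)
for doc in docs:
    print(doc["name"], doc["price"])
```

Note that numFound is the total number of matching documents, while the number of doc elements returned is limited by the rows parameter.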

Caching

IndexSearcher's view of an index is fixed
  • Aggressive caching possible
  • Consistency for multi-query requests

filterCache: unordered set of document ids matching a query
resultCache: ordered subset of document ids matching a query
documentCache: the stored fields of documents
userCaches: application specific, custom query handlers

Warming for Speed

• Lucene IndexReader warming
    • field norms, FieldCache, tii (the term index)
• Static Cache warming
    • Configurable static requests to warm new Searchers
• Smart Cache Warming (autowarming)
    • Using MRU items in the current cache to prepopulate the new cache
• Warming in parallel with live requests
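
The autowarming idea can be sketched with a small LRU cache: when a new searcher is opened, the most recently used keys of the old cache are replayed against the new index to prepopulate a fresh cache. This is an illustration of the technique only, not Solr's implementation. LRUCache, autowarm and the regenerate callback are hypothetical names, and the "index" is a plain dict.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # least recently used first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

    def most_recent_keys(self, n):
        return list(reversed(self.items))[:n]


def autowarm(old_cache, regenerate, count):
    """Prepopulate a new cache from the old cache's MRU keys.

    Values are recomputed with `regenerate` rather than copied, because
    cached results belong to the old view of the index.
    """
    new_cache = LRUCache(old_cache.capacity)
    # Insert the oldest of the selected keys first to preserve recency order.
    for key in reversed(old_cache.most_recent_keys(count)):
        new_cache.put(key, regenerate(key))
    return new_cache


old = LRUCache(3)
old.put("q:ipod", [1, 2])
old.put("q:belkin", [3])
old.put("q:video", [4, 5])
old.get("q:ipod")  # "q:ipod" is now the most recently used entry

# Results as they would be computed against the new view of the index.
new_index = {"q:ipod": [1, 2, 9], "q:belkin": [3], "q:video": [4, 5]}
warmed = autowarm(old, new_index.__getitem__, count=2)
print(list(warmed.items))
print(warmed.get("q:ipod"))
```

Only the two most recently used queries are carried over, and their results reflect the new index. The least recently used entry is dropped and would be recomputed on demand.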
Smart Cache Warming

Schema

• Lucene has no notion of a schema
    • Sorting: string vs. numeric
    • Ranges: is val:42 included in val:[1 TO 5]?
    • Lucene QueryParser has date-range support, but must guess.
• Defines fields, their types, properties
• Defines unique key field, default search field, Similarity implementation
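
The sorting and range problems come from comparing numbers as text. The short sketch below reproduces both in plain Python, with no Solr involved, to show why a typed schema matters. The helper names are hypothetical.

```python
values = ["10", "9", "2", "42"]

# Without type information, values compare character by character.
sorted_as_strings = sorted(values)
# With a numeric type, they compare by magnitude.
sorted_as_numbers = sorted(values, key=int)


def in_range_as_string(value, low, high):
    return low <= value <= high


def in_range_as_number(value, low, high):
    return int(low) <= int(value) <= int(high)


print(sorted_as_strings)
print(sorted_as_numbers)
# As text, "42" falls between "1" and "5" because "4" sorts between them.
print(in_range_as_string("42", "1", "5"))
print(in_range_as_number("42", "1", "5"))
```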

Field Definitions

Field Attributes: name, type, indexed, stored, multiValued, omitNorms

   <field name="id" type="string" indexed="true" stored="true"/>
   <field name="sku" type="textTight" indexed="true" stored="true"/>
   <field name="name" type="text" indexed="true" stored="true"/>
   <field name="reviews" type="text" indexed="true" stored="false"/>
   <field name="category" type="text_ws" indexed="true" stored="true" multiValued="true"/>

Dynamic Fields, in the spirit of Lucene!

   <dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
   <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
   <dynamicField name="*_t" type="text" indexed="true" stored="true"/>
Search Relevancy

Configuring Relevancy

   <fieldtype name="text" class="solr.TextField">
     <analyzer>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
       <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
       <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
     </analyzer>
   </fieldtype>
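
The analyzer above is a pipeline: a tokenizer splits the text, and each filter transforms the token stream in the configured order. The sketch below imitates that flow in plain Python so the effect of each stage is visible. It is a simplification under stated assumptions: the synonym, stopword and protected-word sets are made-up sample data, and naive_stem is a crude stand-in that only strips a trailing "s", not the Porter stemmer.

```python
def whitespace_tokenize(text):
    return text.split()


def lowercase(tokens):
    return [token.lower() for token in tokens]


def expand_synonyms(tokens, synonyms):
    expanded = []
    for token in tokens:
        expanded.append(token)
        expanded.extend(synonyms.get(token, []))
    return expanded


def remove_stopwords(tokens, stopwords):
    return [token for token in tokens if token not in stopwords]


def naive_stem(tokens, protected):
    # Crude placeholder for a real stemmer: drop one trailing "s",
    # but leave protected words untouched.
    stemmed = []
    for token in tokens:
        if (token not in protected and len(token) > 2
                and token.endswith("s") and not token.endswith("ss")):
            token = token[:-1]
        stemmed.append(token)
    return stemmed


# Sample data standing in for synonyms.txt, stopwords.txt and protwords.txt.
SYNONYMS = {"tvs": ["televisions"]}
STOPWORDS = {"the", "and"}
PROTECTED = {"tvs"}


def analyze(text):
    tokens = whitespace_tokenize(text)
    tokens = lowercase(tokens)
    tokens = expand_synonyms(tokens, SYNONYMS)
    tokens = remove_stopwords(tokens, STOPWORDS)
    return naive_stem(tokens, PROTECTED)


tokens = analyze("The iPods and TVs")
print(tokens)
```

Filter order matters: lowercasing runs before the synonym and stopword lookups here, so the sample word lists only need lowercase entries.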

copyField

Copies one field to another at index time

Use case: Analyze same field different ways
  • copy into a field with a different analyzer
  • boost exact-case, exact-punctuation matches
  • language translations, thesaurus, soundex

   <field name="title" type="text"/>
   <field name="title_exact" type="text_exact" stored="false"/>
   <copyField source="title" dest="title_exact"/>

Use case: Index multiple fields into single searchable field
 
 
 
 

Web Admin Interface

• Show Config, Schema, Distribution info
• Query Interface
• Statistics
    • Caches: lookups, hits, hitratio, inserts, evictions, size
    • RequestHandlers: requests, errors
    • UpdateHandler: adds, deletes, commits, optimizes
    • IndexReader: open-time, index-version, numDocs, maxDocs
• Analysis Debugger
    • Shows tokens after each Analyzer stage
    • Shows token matches for query vs index

Selling Points

• Fast
• Powerful & Configurable
• High Relevancy
• Mature Product
• Same features as software costing $$$
• Leverage Community
    • Lucene committers, IR experts
    • Free consulting: shared problems & solutions

Where are we going?

• OOTB Simple Faceted Browsing
• Automatic Database Indexing
• Federated Search
• HA with failover
• Alternate output formats (JSON, Ruby)
• Highlighter integration
• Spellchecker
• Alternate APIs (Google Data, OpenSearch)
 

