Workshop
Yasas Senarath
Visiting Instructor & Research Assistant
Dept. of Computer Science and Engineering,
University of Moratuwa
Solr
Introduction [Recall]
● Search Platform
● Open-Source
● Search Applications
● Built on top of Lucene
● Why…
○ Enterprise-ready
○ Fast
○ Highly Scalable
● Search + NoSQL
○ Non Relational Data Storage
Features of Apache Solr [Recall]
● RESTful APIs
○ No Java programming skills Required
● Full text search
○ tokens, phrases, spell check, wildcard, and auto-complete
● Enterprise ready
● Flexible and Extensible
● NoSQL database
● Admin Interface
● Highly Scalable
● Text-Centric and Sorted by Relevance
How do Search Engines Work?
Installing Solr
● Go to the Solr website and download the binary release of Solr 8.1.1 (latest version of Solr)
● Extract the downloaded archive to your system
● In a terminal, change directory to the extracted Solr folder and run:
○ Unix*: bin/solr start
○ Windows: bin\solr.cmd start
● Go to http://localhost:8983/
Techproducts Example
● Starting Solr with Example
○ Unix*: bin/solr -e techproducts
○ Windows: bin\solr.cmd -e techproducts
● To verify that Solr is running, you can do this:
○ Unix*: bin/solr status
○ Windows: bin\solr.cmd status
● Access Admin Panel
○ http://localhost:8983/solr/
Adding Documents
● Open example/exampledocs/sd500.xml
● Add files to Solr using post.jar
○ cd example/exampledocs
○ java -Dc=techproducts -jar post.jar sd500.xml
● 2 main ways
○ HTTP
○ Native client
<add><doc>
<field name="id">9885A004</field>
<field name="name">Canon PowerShot SD500</field>
<field name="manu">Canon Inc.</field>
...
<field name="inStock">true</field>
</doc></add>
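The HTTP route can be sketched in Python. This is a minimal illustration, assuming a local Solr with the techproducts core; the helper names build_add_xml and build_update_request are made up for the example, and the request is only prepared, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_add_xml(doc):
    """Serialize one document (field name -> value) as Solr <add><doc> XML."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

def build_update_request(core, xml_body, base="http://localhost:8983/solr"):
    """Prepare (but do not send) an HTTP POST to the core's update handler."""
    return urllib.request.Request(
        f"{base}/{core}/update?commit=true",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )

# A shortened version of the sd500.xml document shown above.
xml_body = build_add_xml(
    {"id": "9885A004", "name": "Canon PowerShot SD500", "inStock": "true"}
)
request = build_update_request("techproducts", xml_body)
print(xml_body)
# With Solr running, the request could be sent with urllib.request.urlopen(request).
```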
Searching Overview
● Select API Command
○ http://localhost:8983/solr/techproducts/select?q=sd500&wt=json
● Need only Name and ID of all elements?
○ http://localhost:8983/solr/techproducts/select?q=inStock:false&wt=json&fl=id,name
● Shutdown
○ Unix*: bin/solr stop
○ Windows: bin\solr.cmd stop
● Delete Collection
○ Unix*: bin/solr delete -c techproducts
○ Windows: bin\solr.cmd delete -c techproducts
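The select URLs above can also be assembled programmatically, which avoids hand-encoding mistakes. A minimal sketch, assuming the same local host and port; select_url is a hypothetical helper name, not part of Solr.

```python
from urllib.parse import urlencode

def select_url(core, base="http://localhost:8983/solr", **params):
    """Build a select URL; urlencode percent-encodes characters such as ':' and ','."""
    return f"{base}/{core}/select?{urlencode(params)}"

print(select_url("techproducts", q="sd500", wt="json"))
print(select_url("techproducts", q="inStock:false", wt="json", fl="id,name"))
```

The q parameter carries the query, wt the response format, and fl the list of fields to return.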
Basic Solr Concepts
● Inverted Index
● Index consists of one or more Documents
● Document consists of one or more Fields
● Every field has a Field Type
● Schema
○ Before adding documents to Solr, you need to specify the schema (very important!)
○ Schema File: schema.xml
● Schema declares
○ what kinds of fields there are
○ which field should be used as the unique/primary key
○ which fields are required
○ how to index and search each field
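The inverted index idea can be illustrated with a toy Python version. This is only a sketch of the concept, not how Lucene stores its index; the sample documents and the helper names build_index and search are invented for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for value in fields.values():
            for term in str(value).lower().split():
                index[term].add(doc_id)
    return index

def search(index, term):
    """Return the sorted ids of the documents that contain the term."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical documents: each document is a set of named fields.
docs = {
    "1": {"title": "Toy Story", "tagline": "The adventure takes off"},
    "2": {"title": "The Mask", "tagline": "From zero to hero"},
}
index = build_index(docs)
print(search(index, "the"))
print(search(index, "toy"))
```

Looking up a term is a single dictionary access, which is why searching an inverted index is fast compared with scanning every document.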
Basic Solr Concepts [Contd..]
● Field Types
○ float
○ long
○ double
○ date
○ Text
● Define new field types!
<fieldType name="phonetic" stored="false" indexed="true" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>
Basic Solr Concepts [Contd..]
● Defining a Field
○ name: Name of the field
○ type: Field type
○ indexed: Should this field be added to the inverted index?
○ stored: Should the original value of this field be stored?
○ multiValued: Can this field have multiple values?
<field name="id" type="text" indexed="true" stored="true" multiValued="true"/>
Example Documents
● Use your own project corpus
● Movie Dataset: https://bit.ly/2JhpEhF
Create a Collection
● Start Solr
○ Unix*: bin/solr start
○ Windows: bin\solr.cmd start
● Create Collection
○ Unix*: bin/solr create -c movies
○ Windows: bin\solr.cmd create -c movies
● Defining Schema
○ Two Approaches
■ Schemaless with “field guessing” feature (Managed Schema)
■ Use schema.xml with custom schema
Custom Schema
● Rename the managed-schema file to schema.xml
● schema.xml
○ <field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>
○ <field name="tagline" type="text_general" indexed="true" stored="true" multiValued="false"/>
○ <field name="overview" type="text_general" indexed="true" stored="true" multiValued="false"/>
○ <field name="status" type="text_general" indexed="true" stored="true" multiValued="false"/>
○ <field name="budget" type="plong" indexed="true" stored="true" multiValued="false"/>
○ <field name="popularity" type="pdouble" indexed="true" stored="true" multiValued="false"/>
○ <field name="release_date" type="pdate" indexed="true" stored="true" multiValued="false"/>
○ <field name="revenue" type="plong" indexed="true" stored="true" multiValued="false"/>
○ <field name="runtime" type="pint" indexed="true" stored="true" multiValued="false"/>
○ <field name="vote_average" type="pfloat" indexed="true" stored="true" multiValued="false"/>
○ <field name="vote_count" type="pint" indexed="true" stored="true" multiValued="false"/>
● solrconfig.xml
○ <schemaFactory class="ClassicIndexSchemaFactory"/>
○ ${update.autoCreateFields:false}
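With field guessing switched off, a document that carries a field the schema does not declare is expected to be rejected. A small pre-flight check on a CSV header can catch this before uploading. The sketch below is an assumption-laden illustration: the field set mirrors the list above plus an assumed id field, the sample CSV text is invented, and unknown_columns is a hypothetical helper.

```python
import csv
import io

# Field names declared in schema.xml above, plus an assumed "id" unique key.
SCHEMA_FIELDS = {
    "id", "title", "tagline", "overview", "status", "budget", "popularity",
    "release_date", "revenue", "runtime", "vote_average", "vote_count",
}

def unknown_columns(csv_text, schema_fields):
    """Return the CSV header columns that the schema does not declare."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [column for column in header if column not in schema_fields]

# Invented sample data; "homepage" is not declared in the schema.
sample = "id,title,budget,homepage\n862,Toy Story,30000000,http://example.com\n"
print(unknown_columns(sample, SCHEMA_FIELDS))
```

Columns reported here would need a field definition (or removal from the file) before indexing. Dynamic field rules in the schema are not considered by this check.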
Add Documents
curl "http://localhost:8983/solr/movies/update?commit=true" \
  --data-binary @example/movies/movies_metadata.csv \
  -H "Content-type:application/csv"
Basic Queries
Get All Documents:
http://localhost:8983/solr/movies/select?q=*:*&wt=json
Search Documents Containing “Toy Story” in “title” field:
http://localhost:8983/solr/movies/select?q=title:%22Toy%20Story%22&wt=json
Search Documents Containing “Toy Story”:
http://localhost:8983/solr/movies/select?q=Toy%20Story&wt=json (!)
The Fix… (Copy Field)
● Add a Copy Field
<copyField source="*" dest="_text_"/>
● Is it ok? No!
● Only Few Fields
● Which Fields?
○ Title
○ Tagline
○ Overview
Custom Copy Fields
● Add following to schema.xml
<copyField source="title" dest="_text_"/>
<copyField source="tagline" dest="_text_"/>
<copyField source="overview" dest="_text_"/>
● Note that the destination should be marked multiValued="true"
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
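Why the destination must be multiValued becomes clear when the copy step is imitated in Python. This is a toy model of the behaviour, not Solr's implementation; apply_copy_fields and the sample document are invented for the example.

```python
# (source, destination) pairs mirroring the copyField rules above.
COPY_FIELDS = [("title", "_text_"), ("tagline", "_text_"), ("overview", "_text_")]

def apply_copy_fields(doc, copy_fields):
    """Return a copy of doc with each source value appended to its destination."""
    result = dict(doc)
    for source, dest in copy_fields:
        if source in doc:
            result[dest] = result.get(dest, []) + [doc[source]]
    return result

doc = {"title": "Toy Story", "tagline": "The adventure takes off", "budget": 30000000}
indexed = apply_copy_fields(doc, COPY_FIELDS)
print(indexed["_text_"])
```

Two source fields were present, so the destination ends up holding two values, which a single-valued field could not accept.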
Analyzers
● Analyzers are specified as a child of the <fieldType>
<fieldType name="nametext" class="solr.TextField">
<analyzer class="org.apache.lucene.analysis.core.WhitespaceAnalyzer"/>
</fieldType>
● Using simple processing steps
<fieldType name="nametext" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
Filters
● Create custom Text Field: text_title
● Filters used in Analyzers
○ Tokenize : Tokenizer
<tokenizer class="solr.StandardTokenizerFactory"/>
○ Stopwords : Filter (stopwords.txt)
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
○ LowerCase: Filter
<filter class="solr.LowerCaseFilterFactory"/>
○ Synonyms : Filter (synonyms.txt)
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
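What a tokenizer followed by a chain of filters does to a piece of text can be imitated in Python. This is a rough stand-in, not the behaviour of the real Solr classes: the regular expression only approximates StandardTokenizer, the stopword set is a made-up substitute for stopwords.txt, and the synonym step is left out.

```python
import re

STOPWORDS = {"a", "an", "and", "of", "the", "to"}  # stand-in for stopwords.txt

def tokenize(text):
    """Split on runs of non-alphanumeric characters."""
    return [token for token in re.split(r"[^0-9A-Za-z]+", text) if token]

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop stopwords, comparing case-insensitively (like ignoreCase="true")."""
    return [token for token in tokens if token.lower() not in stopwords]

def lowercase(tokens):
    """Lowercase every token."""
    return [token.lower() for token in tokens]

def analyze(text):
    """Run the chain in order: tokenizer, stop filter, lowercase filter."""
    return lowercase(remove_stopwords(tokenize(text)))

print(analyze("The Lord of the Rings: The Return of the King"))
```

Each filter receives the token stream produced by the previous step, so the order in which filters are listed matters.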
Analysis Phases
● Separate Analyzers for Index and Query
<fieldType name="nametext" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeepWordFilterFactory" words="keepwords.txt"/>
<filter class="solr.SynonymFilterFactory" synonyms="syns.txt"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Synonyms (synonyms.txt)
● Add Some Synonyms to synonyms.txt
○ story, story, tale, fiction
○ heat, heat, hot, warm
○ se7en, se7en, seven, 7
● Spell correction with Synonyms
○ stores => stories
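The two rule styles in synonyms.txt (comma-separated equivalents and explicit `=>` mappings) can be modelled with a short Python sketch. It handles single-word terms only and ignores token positions, so it is a simplification of what the synonym filter does; parse_synonyms and expand are hypothetical names.

```python
def parse_synonyms(lines):
    """Build a token -> replacement list map from synonyms.txt-style rules."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: every term on the left is replaced by the right side.
            left, right = line.split("=>")
            targets = [term.strip() for term in right.split(",")]
            for source in left.split(","):
                mapping[source.strip()] = targets
        else:
            # Equivalent terms: each one expands to the whole group.
            group = []
            for term in (part.strip() for part in line.split(",")):
                if term not in group:
                    group.append(term)
            for term in group:
                mapping[term] = group
    return mapping

def expand(tokens, mapping):
    """Replace each token by its synonyms, keeping unknown tokens unchanged."""
    expanded = []
    for token in tokens:
        expanded.extend(mapping.get(token, [token]))
    return expanded

mapping = parse_synonyms(["story, tale, fiction", "stores => stories"])
print(expand(["toy", "stores"], mapping))
print(expand(["toy", "story"], mapping))
```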
Toy Stories Example
Advanced Queries
● Search title:Mask AND tagline:hero
○ title:Mask AND tagline:hero
○ http://localhost:8983/solr/movies/select?q=title%3AMask%20AND%20tagline%3Ahero
● Search “The Mask” in title, or Mask in title with hero in tagline
○ (title:Mask AND tagline:hero) OR title:"The Mask"
○ http://localhost:8983/solr/movies/select?q=(title%3AMask%20AND%20tagline%3Ahero)%20OR%20title%3A%22The%20Mask%22
● Wildcard matching: Search movies whose title contains a word starting with “the”
○ title:the*
○ http://localhost:8983/solr/movies/select?q=title%3Athe*
● Proximity matching: Search “exorcist spirits” with a proximity of 4 words in the overview field
○ overview:"exorcist spirits"~4
○ http://localhost:8983/solr/movies/select?q=overview%3A%22exorcist%20spirits%22~4
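The percent-encoded URLs can be generated from the readable query strings rather than typed by hand. A minimal sketch using the Python standard library; query_url is a hypothetical helper, and the set of characters left unescaped was chosen to match the URLs shown on this slide.

```python
from urllib.parse import quote

BASE = "http://localhost:8983/solr/movies/select"

def query_url(q):
    """Percent-encode a query string, leaving parentheses, '*' and '~' as they are."""
    return f"{BASE}?q={quote(q, safe='()*~')}"

print(query_url('title:Mask AND tagline:hero'))
print(query_url('(title:Mask AND tagline:hero) OR title:"The Mask"'))
print(query_url('overview:"exorcist spirits"~4'))
```

Spaces become %20, colons %3A and double quotes %22, which is how the boolean, phrase and proximity queries turn into the URLs listed above.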
Advanced Queries [Contd..]
● Range Queries
○ Inclusive Range Query: Square brackets [ & ]
■ budget:[500000 TO *]
○ Exclusive Range Query: Curly brackets { & }
■ budget:{500000 TO *}
● Boosting a Term with ^
○ Want a term to be more relevant?
■ toy^4 story
● For more about Queries:
○ https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html
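The difference between square and curly brackets in a range query comes down to whether the bounds themselves match. A toy Python model, with invented budget values and a hypothetical in_range helper in which None plays the role of the `*` bound:

```python
def in_range(value, lower=None, upper=None, inclusive=True):
    """Toy model of a range query; None stands for an open (*) bound."""
    if inclusive:
        lower_ok = lower is None or value >= lower
        upper_ok = upper is None or value <= upper
    else:
        lower_ok = lower is None or value > lower
        upper_ok = upper is None or value < upper
    return lower_ok and upper_ok

budgets = [100000, 500000, 30000000]  # invented sample values
# Like budget:[500000 TO *]
print([b for b in budgets if in_range(b, lower=500000, inclusive=True)])
# Like budget:{500000 TO *}
print([b for b in budgets if in_range(b, lower=500000, inclusive=False)])
```

A movie with a budget of exactly 500000 matches the inclusive form but not the exclusive one.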
The Schemaless Approach
● Let's do the same in Schemaless Approach
Questions?
Yasas Senarath
Visiting Instructor & Research Assistant
Dept. of Computer Science and Engineering,
University of Moratuwa
