Solr Anti-Patterns
Rafał Kuć, Sematext Group, Inc.
@kucrafal
@sematext
http://sematext.com
About me
Sematext consultant & engineer
Solr.pl co-founder
Father & husband
The (not so) perfect migration
http://en.wikipedia.org/wiki/Bird_migration
http://www.likesbooks.com/aarafterhours/?p=750
From 3.1 to 4.10 (and hopefully not back)
March 2011 (Solr 3.1) to September 2014 (Solr 4.10)
The lonely solrconfig.xml
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />
<requestHandler name="/update/csv" class="solr.CSVRequestHandler" />
<requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" />
<luceneMatchVersion>LUCENE_31</luceneMatchVersion>
<directoryFactory name="DirectoryFactory"
class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
[Diagram: three documents (DOC) being indexed]
And faulty indexing
EXCEPTIONS :)
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">0</int>
</lst>
<lst name="error">
<str name="msg">missing content stream</str>
<int name="code">400</int>
</lst>
</response>
109173 [qtp1223685984-20] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: missing content stream
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)
Let’s make that right
<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<requestHandler name="/update/json" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="stream.contentType">application/json</str>
</lst>
</requestHandler>
<luceneMatchVersion>LUCENE_4.10.0</luceneMatchVersion>
<directoryFactory name="DirectoryFactory"
class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
<nrtMode>true</nrtMode>
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
</updateLog>
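With the unified solr.UpdateRequestHandler, the parser is picked from the request's Content-Type (XML, JSON, CSV or javabin), so one handler replaces the four per-format handlers of the old config. A minimal Python sketch that only builds (does not send) such a JSON update request; the core name collection1 and the document fields are hypothetical:

```python
import json
import urllib.request

# Hypothetical core URL: adjust host, port and core name to your setup.
UPDATE_URL = "http://localhost:8983/solr/collection1/update"

def build_json_update(docs):
    """Build, but do not send, a JSON update request for Solr's /update handler.

    The Content-Type header is what lets the unified UpdateRequestHandler
    choose its JSON loader; no separate /update/json handler is needed.
    """
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_json_update([{"id": "1", "title": "Solr anti-patterns"}])
# urllib.request.urlopen(req) would send it to a running Solr instance.
```

The same request with Content-Type application/xml and an XML body would go to the same /update endpoint.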
The old schema.xml
<fieldType name="int" class="solr.IntField" omitNorms="true"/>
<fieldType name="long" class="solr.LongField" omitNorms="true"/>
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
<fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
The new schema.xml
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
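The precisionStep attribute controls how many extra lower-precision terms each Trie value is indexed with: a smaller step means more terms per value, faster range queries and a larger index. A small Python calculation of terms per value for the types above, assuming the usual bit widths (32 bits for int/float, 64 bits for long/double/date):

```python
import math

# Bit width of the value behind each Trie type.
BITS = {"int": 32, "float": 32, "long": 64, "double": 64, "date": 64}

def terms_per_value(kind: str, precision_step: int) -> int:
    """Number of index terms a Trie field writes for one value.

    Solr treats precisionStep="0" as full precision only, i.e. one term;
    otherwise the value is indexed at shifts 0, step, 2*step, ... below its
    bit width, which gives ceil(bits / step) terms.
    """
    if precision_step <= 0:
        return 1
    return math.ceil(BITS[kind] / precision_step)

# Field types from the new schema.xml above.
for name, kind, step in [("int", "int", 0), ("tint", "int", 8),
                         ("tlong", "long", 8), ("tdate", "date", 6)]:
    print(f"{name}: {terms_per_value(kind, step)} term(s) per value")
```

This is why the plain types (precisionStep="0") suit fields used only for sorting or exact matches, while the t* types pay index size for range-query speed.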
Threads? What threads?
<Set name="ThreadPool">
<New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
<Set name="minThreads">10</Set>
<Set name="maxThreads">200</Set>
<Set name="detailedDump">false</Set>
</New>
</Set>
I see deadlocks
OK, so now we can actually run queries
<Set name="ThreadPool">
<New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
<Set name="minThreads">10</Set>
<Set name="maxThreads">10000</Set>
<Set name="detailedDump">false</Set>
</New>
</Set>
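A rough way to see why 200 threads can run out is Little's law: threads busy at once = arrival rate × time each request holds a thread. In SolrCloud a distributed query also sends sub-requests to the shards, which are served by the same Jetty pools, so if every thread is held by a top-level request waiting for its sub-requests, nothing is left to serve them. A Python sketch; the fan-out model and all numbers are assumptions for illustration, not measurements from the talk:

```python
import math

def threads_in_flight(queries_per_sec: float, latency_sec: float, shards: int) -> int:
    """Little's law estimate (L = lambda * W) of Jetty threads busy at once.

    Assumption: each top-level query holds one thread for latency_sec and
    fans out one sub-request per shard, each holding a thread from the same
    pool for roughly the same time.
    """
    threads_per_query = 1 + shards
    return math.ceil(queries_per_sec * latency_sec * threads_per_query)

MAX_THREADS = 200  # the original jetty.xml setting

# Hypothetical load: 100 queries/s, 0.5 s latency, 4 shards.
needed = threads_in_flight(queries_per_sec=100, latency_sec=0.5, shards=4)
print(needed, "threads needed; pool exhausted:", needed > MAX_THREADS)
```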
The ZooKeeper
The ZooKeeper – production
-DzkHost=zk1:2181,zk2:2181,zk3:2181
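Listing every ensemble member in -DzkHost lets the ZooKeeper client fail over between servers, and ZooKeeper keeps serving only while a majority (quorum) of the ensemble is up. A Python sketch of the quorum arithmetic plus a parser for a zkHost string; the optional /chroot suffix is handled, IPv6 literals and entries without an explicit port are not:

```python
from typing import List, Optional, Tuple

def parse_zk_host(zk_host: str) -> Tuple[List[Tuple[str, int]], Optional[str]]:
    """Split a zkHost value into (host, port) pairs and an optional chroot path.

    Example input: "zk1:2181,zk2:2181,zk3:2181/solr".
    Assumes every entry carries an explicit port.
    """
    hosts_part, sep, chroot = zk_host.partition("/")
    servers = []
    for entry in hosts_part.split(","):
        host, _, port = entry.strip().rpartition(":")
        servers.append((host, int(port)))
    chroot_path = "/" + chroot if sep else None
    return servers, chroot_path

def quorum_size(ensemble_size: int) -> int:
    """Smallest majority of the ensemble."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size: int) -> int:
    """How many ZooKeeper nodes can fail while a quorum survives."""
    return (ensemble_size - 1) // 2

servers, chroot = parse_zk_host("zk1:2181,zk2:2181,zk3:2181")
print(len(servers), "servers, quorum", quorum_size(len(servers)),
      "tolerates", tolerated_failures(len(servers)), "failure(s)")
```

The arithmetic also shows why ensembles are usually odd-sized: a fourth node raises the quorum to 3 but still tolerates only one failure.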
Let’s cache everything
<filterCache class="solr.LRUCache"
size="1048576"
initialSize="1048576"
autowarmCount="524288"/>
<queryResultCache class="solr.LRUCache"
size="1048576"
initialSize="1048576"
autowarmCount="524288"/>
<documentCache class="solr.LRUCache"
size="1048576"
initialSize="1048576"
autowarmCount="0"/>
And now let’s look at the warmup times
OK, show us the way, “Mr. Consultant”
<filterCache class="solr.FastLRUCache"
size="1024"
initialSize="1024"
autowarmCount="512"/>
<queryResultCache class="solr.LRUCache"
size="16000"
initialSize="16000"
autowarmCount="8000"/>
<documentCache class="solr.LRUCache"
size="16384"
initialSize="16384"
autowarmCount="0"/>
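Why the first configuration hurts: in the worst case every filterCache entry is a bitset with one bit per document in the index, and up to autowarmCount entries are re-executed each time a new searcher opens. A Python estimate of the worst-case filterCache size for both configurations; maxDoc = 3,284,000 is an assumption borrowed from the numFound values later in this deck, and Solr stores small filter results more compactly, so treat the numbers as upper bounds:

```python
def filter_cache_worst_case_bytes(entries: int, max_doc: int) -> int:
    """Upper bound on filterCache memory if every entry is a bitset of max_doc bits."""
    bytes_per_entry = (max_doc + 7) // 8
    return entries * bytes_per_entry

MAX_DOC = 3_284_000  # assumption: numFound from the queries later in this deck

configs = {"cache everything": 1_048_576, "after tuning": 1024}
for label, size in configs.items():
    gib = filter_cache_worst_case_bytes(size, MAX_DOC) / 2**30
    print(f"{label}: up to {gib:.1f} GiB of filterCache")
```

Under these assumptions the first setup allows roughly 400 GiB of filters, the tuned one under half a GiB.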
Let’s look at the warmup times again
Bulks are for noobs
[Diagram: three applications, each sending a single document (Doc)]
But let’s use bulks, just in case
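Sending documents in batches means one HTTP request per batch instead of one per document. A minimal Python sketch that groups documents into JSON bodies for the /update handler; the documents are hypothetical and the batch size of 1000 is a placeholder to tune, not a recommendation from the talk:

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List

def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def bulk_update_bodies(docs: Iterable[dict], size: int = 1000) -> Iterator[str]:
    """One JSON array per /update request instead of one request per document."""
    for chunk in batches(docs, size):
        yield json.dumps(chunk)

docs = ({"id": str(i), "title": f"doc {i}"} for i in range(2500))  # hypothetical documents
bodies = list(bulk_update_bodies(docs, size=1000))
print(len(bodies), "requests instead of 2500")
```

Each body would be POSTed to /update with Content-Type application/json.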
We need to refresh and hard commit
<autoCommit>
<maxTime>1000</maxTime>
<openSearcher>true</openSearcher>
</autoCommit>
<autoSoftCommit>
<maxTime>1000</maxTime>
</autoSoftCommit>
Maybe we should only refresh?
<autoCommit>
<maxTime>60000</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
<maxTime>1000</maxTime>
</autoSoftCommit>
OK, let’s go easy with refreshing
<autoCommit>
<maxTime>60000</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
<maxTime>30000</maxTime>
</autoSoftCommit>
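Every soft commit (and every hard commit with openSearcher set to true) opens a new searcher, which then autowarms its caches. If warming takes longer than the commit interval, warming searchers overlap and Solr logs an "Overlapping onDeckSearchers" performance warning. Hard commits with openSearcher=false flush to disk and roll the transaction log without opening a searcher. A Python sketch of the arithmetic, assuming constant indexing; the 5-second warmup time is hypothetical:

```python
MS_PER_HOUR = 3_600_000

def searchers_per_hour(soft_commit_ms: int) -> int:
    """New searchers opened per hour by autoSoftCommit alone, under constant indexing."""
    return MS_PER_HOUR // soft_commit_ms

def warming_overlaps(soft_commit_ms: int, warmup_ms: int) -> bool:
    """True if the next soft commit arrives before the previous searcher finished warming."""
    return warmup_ms > soft_commit_ms

WARMUP_MS = 5000  # hypothetical autowarm time
for interval in (1000, 30000):
    print(interval, "ms:", searchers_per_hour(interval), "searchers/hour,",
          "overlapping warmups" if warming_overlaps(interval, WARMUP_MS) else "no overlap")
```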
But I really need all that data
curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">9418</int>
<lst name="params">
<str name="start">3000000</str>
<str name="q">*:*</str>
<str name="rows">100</str>
</lst>
</lst>
<result name="response" numFound="3284000" start="3000000">
.
.
.
</result>
</response>
But I really need all that data
curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="error">
<str name="msg">java.lang.OutOfMemoryError: Java heap space</str>
<str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:796)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:448)
.
.
.
Caused by: java.lang.OutOfMemoryError: Java heap space
.
.
.
</str>
<int name="code">500</int>
</lst>
</response>
[Diagram: the Query and the Response]
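Why deep paging is expensive: to return rows documents starting at start, each shard has to collect its top start+rows hits, and in a distributed query the coordinating node then merges every shard's list. A Python sketch of that count; the 4-shard layout is hypothetical:

```python
def paging_cost(start: int, rows: int, shards: int = 1) -> dict:
    """Entries Solr must collect to serve one start/rows page.

    Each shard keeps its top (start + rows) hits in a priority queue; with
    several shards the coordinating node then merges shards * (start + rows)
    candidates to pick the final `rows` documents.
    """
    per_shard = start + rows
    return {
        "per_shard": per_shard,
        "merged_on_coordinator": per_shard * shards,
        "returned": rows,
    }

# start/rows from the query above; 4 shards is a hypothetical layout.
print(paging_cost(start=3_000_000, rows=100, shards=4))
```

Collecting about three million entries per shard to return 100 documents is what drives both the 9-second QTime and the heap pressure.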
Use the scroll Luke
curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">189</int>
<lst name="params">
<str name="sort">score desc,id desc</str>
<str name="q">*:*</str>
<str name="cursorMark">*</str>
</lst>
</lst>
<result name="response" numFound="3284000" start="0">
<doc>
...
</doc>
.
.
.
</result>
<str name="nextCursorMark">AoIIP4AAACY5OTk5OTA=</str>
</response>
Use the scroll Luke
curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc&cursorMark=AoIIP4AAACY5OTk5OTA='
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">184</int>
<lst name="params">
<str name="sort">score desc,id desc</str>
<str name="q">*:*</str>
<str name="cursorMark">AoIIP4AAACY5OTk5OTA=</str>
</lst>
</lst>
<result name="response" numFound="3284000" start="0">
<doc>
...
</doc>
.
.
.
</result>
<str name="nextCursorMark">AoIIP4AAACY5OTk5ODE=</str>
</response>
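The cursor loop keeps going until Solr returns a nextCursorMark equal to the cursorMark that was sent, which signals that no more results remain. The sort must include the uniqueKey field as a tie-breaker (id in the queries above), start must be 0 or omitted, and cursor marks should be URL-encoded because they can contain characters such as + or /. A minimal Python sketch; fetch is a hypothetical function that runs the query with the given cursorMark and returns the parsed JSON response:

```python
from typing import Callable, Dict, Iterator, List

def iterate_with_cursor(fetch: Callable[[str], Dict]) -> Iterator[List[dict]]:
    """Yield pages of documents using Solr's cursorMark deep paging.

    fetch(mark) is a hypothetical helper: it should send a query such as
    {"q": "*:*", "sort": "score desc,id desc", "rows": 100,
     "cursorMark": mark, "wt": "json"} (URL-encoded, no start parameter)
    and return the parsed JSON response.
    """
    mark = "*"  # the initial cursorMark
    while True:
        resp = fetch(mark)
        docs = resp["response"]["docs"]
        if docs:
            yield docs
        next_mark = resp["nextCursorMark"]
        if next_mark == mark:  # Solr's signal that no more results remain
            return
        mark = next_mark
```

Because each shard only has to keep rows documents after the cursor position, every page costs roughly as much as the first one, which matches the similar QTime values of the two responses above.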
Limiting faceting, why bother?
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
facet.limit=-1&facet.mincount=0'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">9967</int>
<lst name="params">
...
</lst>
</lst>
<result name="response" numFound="3284000" start="0">
.
.
.
</result>
<lst name="facet_counts">
<lst name="facet_fields">
<lst name="tag">
...
</lst>
</lst>
</lst>
</response>
Limiting faceting, why bother?
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
facet.limit=-1&facet.mincount=0'
<?xml version="1.0" encoding="UTF-8"?>
<response>
.
.
.
<lst name="error">
<str name="msg">Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space</str>
<str name="trace">org.apache.solr.common.SolrException: Error while processing facet fields:
java.lang.OutOfMemoryError: Java heap space
.
.
.
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:685)
.
.
.
</str>
<int name="code">500</int>
</lst>
</response>
Now let’s look at performance
Magic happens with small changes
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
facet.limit=100&facet.mincount=1'
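One reason the change helps: facet.mincount=1 drops terms with zero matches, and facet.limit=100 returns only the top 100 terms instead of every unique term in the field. A Python sketch of those semantics on a term-to-count map; it illustrates the parameters, not how Solr computes the counts internally, and the tag counts are hypothetical:

```python
import heapq
from typing import Dict, List, Tuple

def _by_count_then_term(item: Tuple[str, int]) -> Tuple[int, str]:
    term, count = item
    return (-count, term)  # highest count first, ties in term order

def top_facets(counts: Dict[str, int], limit: int = 100,
               mincount: int = 1) -> List[Tuple[str, int]]:
    """Apply facet.limit / facet.mincount semantics to a term -> count map.

    limit < 0 means "every term", like facet.limit=-1.
    """
    eligible = [(t, c) for t, c in counts.items() if c >= mincount]
    if limit < 0:
        return sorted(eligible, key=_by_count_then_term)
    return heapq.nsmallest(limit, eligible, key=_by_count_then_term)

# Hypothetical counts for the "tag" field.
tag_counts = {"solr": 10, "lucene": 10, "zookeeper": 3, "unused": 0}
print(top_facets(tag_counts, limit=2))
```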
Monitoring in production
http://sematext.com/spm/index.html
And remember…
<luceneMatchVersion>
3.1
</luceneMatchVersion>
Quick summary
http://www.soothetube.com/2013/12/29/thats-all-folks/
We are hiring!
Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig Logging?
Dig working with and in open source?
We’re hiring worldwide!
http://sematext.com/about/jobs.html
Thank you!
Rafał Kuć
@kucrafal
rafal.kuc@sematext.com
Sematext
@sematext
http://sematext.com
http://blog.sematext.com

Solr Anti Patterns

  • 2. Solr Anti - patterns Rafał Kuć, Sematext Group, Inc. @kucrafal @sematext http://sematext.com
  • 3. About me Sematext consultant & engineer Solr.pl co-founder Father & husband
  • 4. The (not so) perfect migration http://en.wikipedia.org/wiki/Bird_migration http://www.likesbooks.com/aarafterhours/?p=750
  • 5. From 3.1 to 4.10 (and hopefully not back) March 2011 September 2014
  • 6. The lonely solrconfig.xml <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" /> <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" /> <requestHandler name="/update/csv" class="solr.CSVRequestHandler" /> <requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" /> <luceneMatchVersion>LUCENE_31</luceneMatchVersion> <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  • 8. And faulty indexing <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">400</int> <int name="QTime">0</int> </lst> <lst name="error"> <str name="msg">missing content stream</str> <int name="code">400</int> </lst> </response> 109173 [qtp1223685984-20] ERROR org.apache.solr.core.SolrCore ľ org.apache.solr.common.SolrException: missing content stream at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Unknown Source)
  • 9. Let’s make that right <requestHandler name="/update" class="solr.UpdateRequestHandler" /> <requestHandler name="/update/json" class="solr.UpdateRequestHandler"> <lst name="defaults"> <str name="stream.contentType">application/json</str> </lst> </requestHandler> <luceneMatchVersion>LUCENE_4.10.0</luceneMatchVersion> <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/> <nrtMode>true</nrtMode> <updateLog> <str name="dir"> ${solr.ulog.dir:} </str> </updateLog>
  • 10. The old schema.xml <fieldType name="int" class="solr.IntField" omitNorms="true"/> <fieldType name="long" class="solr.LongField" omitNorms="true"/> <fieldType name="float" class="solr.FloatField" omitNorms="true"/> <fieldType name="double" class="solr.DoubleField" omitNorms="true"/> <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/> <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
  • 11. <fieldType name="int" class="solr.IntField" omitNorms="true"/> <fieldType name="long" class="solr.LongField" omitNorms="true"/> <fieldType name="float" class="solr.FloatField" omitNorms="true"/> <fieldType name="double" class="solr.DoubleField" omitNorms="true"/> <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/> <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/> The old schema.xml
  • 12. The new schema.xml <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
  • 13. Threads? What threads? <Set name="ThreadPool"> <New class="org.eclipse.jetty.util.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">200</Set> <Set name="detailedDump">false</Set> </New> </Set>
  • 15. Threads? What threads? <Set name="ThreadPool"> <New class="org.eclipse.jetty.util.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">200</Set> <Set name="detailedDump">false</Set> </New> </Set>
  • 16. OK, so now we can actually run queries <Set name="ThreadPool"> <New class="org.eclipse.jetty.util.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">10000</Set> <Set name="detailedDump">false</Set> </New> </Set>
  • 22. The ZooKeeper – production
  • 23. The ZooKeeper – production -DzkHost=zk1:2181,zk2:2181,zk3:2181
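`-DzkHost` lists three servers, i.e. an external ZooKeeper ensemble. ZooKeeper keeps serving only while a strict majority of the ensemble is up, so the ensemble size determines how many server failures the cluster survives. A short sketch of that arithmetic (the function name is hypothetical):

```python
from typing import Tuple


def zookeeper_fault_tolerance(ensemble_size: int) -> Tuple[int, int]:
    """Return (quorum size, servers that can fail) for a ZooKeeper ensemble.

    The ensemble keeps serving only while a strict majority of its servers is up.
    """
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    quorum = ensemble_size // 2 + 1
    return quorum, ensemble_size - quorum


if __name__ == "__main__":
    for size in range(1, 6):
        quorum, tolerated = zookeeper_fault_tolerance(size)
        print(f"{size} server(s): quorum {quorum}, survives {tolerated} failure(s)")
```

A fourth server raises the quorum without tolerating any additional failure, which is why ensembles are usually sized at odd numbers such as the three on the slide.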
  • 27. Let’s cache everything
    <filterCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="524288"/>
    <queryResultCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="524288"/>
    <documentCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="0"/>
  • 28. And now let’s look at the warmup times
  • 30. OK, show us the way, “Mr. Consultant”
    <filterCache class="solr.FastLRUCache" size="1024" initialSize="1024" autowarmCount="512"/>
    <queryResultCache class="solr.LRUCache" size="16000" initialSize="16000" autowarmCount="8000"/>
    <documentCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="0"/>
  • 31. Let’s look at the warmup times again
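A rough way to see why the first cache configuration hurts: a filterCache entry can be a bitset with one bit per document in the index, and each new searcher re-executes up to `autowarmCount` entries from the old cache. The sketch below is a back-of-the-envelope estimate under those assumptions; it borrows 3,284,000 documents (the numFound shown on the later slides) as the index size, ignores per-entry object overhead, and uses a hypothetical function name.

```python
def filter_cache_worst_case_bytes(max_doc: int, entries: int) -> int:
    """Upper bound on filterCache heap if every cached filter is a full bitset.

    Assumes one bit per document in the index, stored in 64-bit words; compact
    sets for rare filters and per-entry object overhead are ignored.
    """
    words_per_entry = (max_doc + 63) // 64  # one 64-bit word covers 64 documents
    return entries * words_per_entry * 8    # 8 bytes per word


if __name__ == "__main__":
    max_doc = 3_284_000  # assumption: numFound for q=*:* on the later slides
    configs = {
        "Let's cache everything": (1_048_576, 524_288),
        "Mr. Consultant": (1_024, 512),
    }
    for label, (size, autowarm) in configs.items():
        gib = filter_cache_worst_case_bytes(max_doc, size) / 2**30
        print(f"{label}: size={size:,} -> up to {gib:,.1f} GiB; "
              f"re-runs up to {autowarm:,} filters per new searcher")
```

Under these assumptions the first filterCache could, in the worst case, ask for roughly 400 GiB, and every searcher reopen would try to re-run up to 524,288 filters; the consultant's settings cap the same worst case below 1 GiB and 512 re-executed filters.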
  • 33. Bulks are for noobs (diagram: three Application boxes and three Doc boxes)
  • 35. But let’s use bulks, just in case
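Sending one document per request pays the HTTP and update-processing overhead once per document. A minimal batching sketch, assuming the core's `/update` handler accepts a JSON array of documents sent with `Content-Type: application/json` (as the Solr 4.x update handler does). The core URL and the function names are hypothetical, and no commit parameter is sent, so visibility is left to the autoCommit settings.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List


def batches(docs: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group documents so that each update request carries many of them."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # the last, possibly smaller, batch


def update_payload(batch: List[dict]) -> bytes:
    """Request body: a JSON array with one object per document."""
    return json.dumps(batch).encode("utf-8")


def send_batch(update_url: str, batch: List[dict]) -> None:
    """POST one batch; no commit parameter, so autoCommit decides visibility."""
    request = urllib.request.Request(
        update_url,
        data=update_payload(batch),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()  # urlopen has already raised HTTPError for an error status


if __name__ == "__main__":
    docs = ({"id": str(i), "tag": "solr"} for i in range(2_500))
    for batch in batches(docs, batch_size=1_000):
        # Hypothetical core URL; uncomment to send to a running Solr.
        # send_batch("http://localhost:8983/solr/collection1/update", batch)
        print(f"would send {len(batch)} documents in one request")
```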
  • 37. We need to refresh and hard commit
    <autoCommit>
      <maxTime>1000</maxTime>
      <openSearcher>true</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
  • 38. Maybe we should only refresh?
    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
  • 39. OK, let’s go easy with refreshing
    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>30000</maxTime>
    </autoSoftCommit>
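Every commit that opens a searcher (a soft commit, or a hard commit with `openSearcher` set to true) discards the caches and starts another round of autowarming. A quick upper-bound count for the three configurations above, assuming a steady stream of updates so each timer fires as often as its `maxTime` allows; with bursty indexing the real number is lower. The function name is hypothetical.

```python
MS_PER_HOUR = 3_600_000


def searchers_per_hour(hard_max_time_ms, hard_opens_searcher, soft_max_time_ms=None):
    """Upper bound on searchers opened per hour by autoCommit + autoSoftCommit.

    Assumes documents arrive continuously, so every timer fires as often as its
    maxTime (in milliseconds, as in solrconfig.xml) allows.
    """
    opened = 0
    if hard_opens_searcher:
        opened += MS_PER_HOUR // hard_max_time_ms
    if soft_max_time_ms:
        opened += MS_PER_HOUR // soft_max_time_ms
    return opened


if __name__ == "__main__":
    print(searchers_per_hour(1000, True, 1000))     # slide 37: 7200
    print(searchers_per_hour(60000, False, 1000))   # slide 38: 3600
    print(searchers_per_hour(60000, False, 30000))  # slide 39: 120
```

Keeping `openSearcher` false on hard commits and stretching the soft-commit interval is what takes the count from thousands of cache rebuilds per hour down to about a hundred.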
  • 40. But I really need all that data
    curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
  • 41. But I really need all that data
    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">9418</int>
        <lst name="params">
          <str name="start">3000000</str>
          <str name="q">*:*</str>
          <str name="rows">100</str>
        </lst>
      </lst>
      <result name="response" numFound="3284000" start="3000000">
        . . .
      </result>
    </response>
  • 42. But I really need all that data
    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="error">
        <str name="msg">java.lang.OutOfMemoryError: Java heap space</str>
        <str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
          at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:796)
          at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:448)
          . . .
          Caused by: java.lang.OutOfMemoryError: Java heap space
          . . .
        </str>
        <int name="code">500</int>
      </lst>
    </response>
  • 43. But I really need all that data (diagram: Query)
  • 46. But I really need all that data (diagram: Response)
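Why the deep page is slow and eventually exhausts the heap: with start/rows paging, each shard has to collect its own top `start + rows` documents (any of them could fall on the requested page), and the coordinating node merges all of those candidates only to return `rows`. A back-of-the-envelope sketch; the shard count is a hypothetical value, since the slides do not state it, and the function name is hypothetical too.

```python
def sort_queue_entries(start: int, rows: int, shards: int) -> dict:
    """Approximate documents collected and merged for one start/rows request.

    Every shard keeps its own top (start + rows) hits, and the coordinating
    node merges all shards' candidates to return just `rows` of them.
    """
    per_shard = start + rows
    return {"per_shard": per_shard, "merged": shards * per_shard, "returned": rows}


if __name__ == "__main__":
    shards = 4  # hypothetical: the slides do not say how many shards were used
    deep = sort_queue_entries(start=3_000_000, rows=100, shards=shards)
    first = sort_queue_entries(start=0, rows=100, shards=shards)
    print(f"start=3000000: {deep['merged']:,} candidates merged for {deep['returned']} docs")
    print(f"start=0:       {first['merged']:,} candidates merged for {first['returned']} docs")
```

The cost grows linearly with `start` even though the page size stays at 100, which matches the multi-second QTime and the OutOfMemoryError on the slides.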
  • 47. Use the scroll, Luke
    curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'
  • 48. Use the scroll, Luke
    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">189</int>
        <lst name="params">
          <str name="sort">score desc,id desc</str>
          <str name="q">*:*</str>
          <str name="cursorMark">*</str>
        </lst>
      </lst>
      <result name="response" numFound="3284000" start="0">
        <doc> ... </doc>
        . . .
      </result>
      <str name="nextCursorMark">AoIIP4AAACY5OTk5OTA=</str>
    </response>
  • 49. Use the scroll, Luke
    curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc&cursorMark=AoIIP4AAACY5OTk5OTA='
  • 50. Use the scroll, Luke
    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">184</int>
        <lst name="params">
          <str name="sort">score desc,id desc</str>
          <str name="q">*:*</str>
          <str name="cursorMark">AoIIP4AAACY5OTk5OTA=</str>
        </lst>
      </lst>
      <result name="response" numFound="3284000" start="0">
        <doc> ... </doc>
        . . .
      </result>
      <str name="nextCursorMark">AoIIP4AAACY5OTk5ODE=</str>
    </response>
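The two requests above generalize into a loop: send `cursorMark=*`, then keep sending the returned `nextCursorMark` until it comes back unchanged, which is how Solr signals that there are no more results. The sort has to include the uniqueKey field (here `id`) as a tie-breaker. A minimal sketch; `iterate_with_cursor` and `solr_page_fetcher` are hypothetical names, the select URL is an assumption, and the live fetcher asks for `wt=json`.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List, Tuple

# A page fetcher takes a cursorMark and returns (docs, nextCursorMark).
PageFetcher = Callable[[str], Tuple[List[dict], str]]


def iterate_with_cursor(fetch_page: PageFetcher) -> Iterator[dict]:
    """Yield every document, following nextCursorMark until it stops changing."""
    cursor = "*"  # the initial cursorMark, as in the first request above
    while True:
        docs, next_cursor = fetch_page(cursor)
        yield from docs
        if next_cursor == cursor:  # an unchanged mark means the results are exhausted
            return
        cursor = next_cursor


def solr_page_fetcher(select_url: str, query: str, sort: str, rows: int) -> PageFetcher:
    """Build a fetcher that queries a live Solr select handler with wt=json."""
    def fetch(cursor: str) -> Tuple[List[dict], str]:
        params = urllib.parse.urlencode(
            {"q": query, "sort": sort, "rows": rows, "cursorMark": cursor, "wt": "json"})
        with urllib.request.urlopen(f"{select_url}?{params}") as response:
            body = json.load(response)
        return body["response"]["docs"], body["nextCursorMark"]
    return fetch
```

The loop can be tried without a server by passing any function that maps a cursor to `(docs, next cursor)`; against Solr it would be something like `iterate_with_cursor(solr_page_fetcher("http://localhost:8983/solr/select", "*:*", "score desc,id desc", 100))`. Each request only has to look past the last sort value seen, so the cost stays flat instead of growing with the page number.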
  • 51. Limiting faceting, why bother?
    curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0'
  • 52. Limiting faceting, why bother?
    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">9967</int>
        <lst name="params"> ... </lst>
      </lst>
      <result name="response" numFound="3284000" start="0">
        . . .
      </result>
      <lst name="facet_counts">
        <lst name="facet_fields">
          <lst name="tag"> ... </lst>
        </lst>
      </lst>
    </response>
  • 53. Limiting faceting, why bother?
    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      . . .
      <lst name="error">
        <str name="msg">Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space</str>
        <str name="trace">org.apache.solr.common.SolrException: Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space
          . . .
          Caused by: java.lang.OutOfMemoryError: Java heap space
          at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:685)
          . . .
        </str>
        <int name="code">500</int>
      </lst>
    </response>
  • 54. Now let’s look at performance
  • 59. Magic happens with small changes curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=100&facet.mincount=1'
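A toy model of what the two faceting requests ask for. With `facet.mincount=0` every term indexed in the field comes back, zero-count ones included, and `facet.limit=-1` removes the cap; with `facet.limit=100&facet.mincount=1` only the 100 most frequent matching terms are returned. The toy always orders by count (in Solr that is the default only when `facet.limit` is greater than 0; with -1 the default is index order) and breaks ties alphabetically. Names and data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def facet_counts(all_terms: Iterable[str], matching_tags: Iterable[str],
                 limit: int, mincount: int) -> List[Tuple[str, int]]:
    """Toy model of facet.limit / facet.mincount on one field, ordered by count.

    all_terms: every distinct term indexed in the field (zero-count terms are
    only returned when mincount is 0); matching_tags: the field values of the
    documents that match the query.
    """
    counts = Counter(matching_tags)
    rows = [(term, counts[term]) for term in set(all_terms) | set(counts)]
    rows = [row for row in rows if row[1] >= mincount]
    rows.sort(key=lambda row: (-row[1], row[0]))  # highest count first, ties by term
    return rows if limit < 0 else rows[:limit]


if __name__ == "__main__":
    vocabulary = ["solr", "lucene", "zookeeper", "jetty", "tika"]
    matching = ["solr", "solr", "lucene", "solr", "jetty"]
    print(facet_counts(vocabulary, matching, limit=-1, mincount=0))  # every term, zeros too
    print(facet_counts(vocabulary, matching, limit=2, mincount=1))   # only the top terms
```

For a `tag` field with millions of unique terms, the first form returns all of them for every query, while the second keeps the response bounded by `facet.limit`.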
  • 70. We are hiring! Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig Logging? Dig working with and in open source? We’re hiring worldwide! http://sematext.com/about/jobs.html