Administering and Monitoring SolrCloud Clusters

Rafał Kuć – Sematext Group, Inc.
@kucrafal @sematext sematext.com
Ta me…
Sematext consultant & engineer
Solr.pl co-founder
Father and husband 
SolrCloud Concepts
[Diagram: an application talking to four Solr servers, which host Shard1, Shard2, and one replica of each]
Local SolrCloud Cluster
java -Dbootstrap_confdir=./solr/revolution/conf \
  -Dcollection.configName=revolution -DzkRun -DnumShards=1 \
  -jar start.jar

Runs embedded ZooKeeper
Bootstraps collection with 1 shard
Starts Solr
Starting Solr Cluster
[Diagram: four Solr servers, none hosting a collection yet, all connected to a three-node ZooKeeper ensemble]
Each Solr server is started with the full ensemble address (host order differs per node):
-DzkHost=192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181
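Every node on the slide gets the same three ZooKeeper addresses, only in a different order. A minimal Python sketch of that idea follows; the helper names `zk_host_arg` and `same_ensemble` are made up for illustration and are not part of Solr.

```python
def zk_host_arg(hosts, port=2181):
    """Build the -DzkHost JVM argument from a list of ZooKeeper host names."""
    return "-DzkHost=" + ",".join(f"{h}:{port}" for h in hosts)


def same_ensemble(arg_a, arg_b):
    """True when two -DzkHost arguments list the same servers, in any order."""
    def members(arg):
        return set(arg.split("=", 1)[1].split(","))
    return members(arg_a) == members(arg_b)


# Two nodes from the slide, with the hosts listed in a different order.
node1 = zk_host_arg(["192.168.1.1", "192.168.1.2", "192.168.1.3"])
node2 = zk_host_arg(["192.168.1.3", "192.168.1.1", "192.168.1.2"])
print(node1)
print(same_ensemble(node1, node2))
```

A check like this can catch a node that was started with a stale or incomplete ensemble list.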
Uploading Collection Configuration
./zkcli.sh -cmd upconfig -zkhost 192.168.1.1:2181 \
  -confdir ./conf/ -confname revolution

[Diagram: collection configuration, three ZooKeeper nodes, Solr]
Collections API
Create
Delete
Reload
Split
Create Alias
Delete Alias
Shard Creation/Deletion

http://wiki.apache.org/solr/SolrCloud
Collection Creation
curl 'http://solrhost:8983/solr/admin/collections?action=CREATE&name=revolution&numShards=3&replicationFactor=4'

name
numShards
replicationFactor
maxShardsPerNode

createNodeSet
collection.configName
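The CREATE call above can be assembled from its parameters, and it helps to know how many cores it will ask the cluster for. A small Python sketch under stated assumptions: `create_collection_url` and `cores_needed` are hypothetical helpers, and nothing is sent to Solr.

```python
from urllib.parse import urlencode


def create_collection_url(base, name, num_shards, replication_factor, **extra):
    """Build a Collections API CREATE URL; extra holds optional parameters."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    params.update(extra)
    return f"{base}/admin/collections?{urlencode(params)}"


def cores_needed(num_shards, replication_factor):
    """Each shard is hosted replicationFactor times, one core per copy."""
    return num_shards * replication_factor


url = create_collection_url("http://solrhost:8983/solr", "revolution", 3, 4)
print(url)
print(cores_needed(3, 4))
```

For the slide's values this comes to 12 cores. With maxShardsPerNode left at 1, that would need 12 nodes, so a smaller cluster has to raise maxShardsPerNode, for example by passing `maxShardsPerNode=3` as an extra parameter.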
Collection Split Example

$ curl 'http://solr1:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1'
Collection Split Example

$ curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1'
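SPLITSHARD divides the hash range of shard1 between two new sub-shards. The Python sketch below only illustrates the arithmetic. It is a simplification, not Solr's own partitioning code, and it assumes that shard1 of a two-shard collection covers the range 80000000-ffffffff.

```python
def split_range(lo, hi):
    """Split an inclusive signed 32-bit hash range into two halves."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


def as_hex(value):
    """Format a signed 32-bit integer as eight hex digits."""
    return format(value & 0xFFFFFFFF, "08x")


# Assumed range of shard1 in a two-shard collection: 80000000-ffffffff.
shard1 = (-2**31, -1)
for name, (lo, hi) in zip(("shard1_0", "shard1_1"), split_range(*shard1)):
    print(name, f"{as_hex(lo)}-{as_hex(hi)}")
```

Each sub-shard ends up owning half of the parent's documents, and the parent shard1 goes inactive once the split completes.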
Getting Deeper – CoreAdmin API
curl 'http://solrhost:8983/solr/admin/cores?action=CREATE&name=newcore&collection=revolution&shard=shard2'

collection
shard
numShards

collection.configName
Schema – the API
Reading (Solr 4.2)
Fields
Dynamic fields
Types
Copy fields
Name (4.3)
Version (4.3)
Unique Key (4.3)
Similarity (4.3)

Writing (Solr 4.4)
Adding new fields
Adding copy fields
Reading Your Schema
curl -XGET 'http://solrhost:8983/solr/rev/schema/fields/name'
{
"responseHeader" : {
"status" : 0,
"QTime" : 5 },
"field" : {
"name" : "name",
"type" : "text_general",
"indexed" : true,
"stored" : true }
}

Full reference: http://wiki.apache.org/solr/SchemaRESTAPI
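The JSON returned by the schema read call is easy to consume from a script. A short Python sketch, using the response body from the slide as sample input; `describe_field` is a hypothetical helper name.

```python
import json

# Response body copied from the slide above.
response_body = """
{
  "responseHeader": {"status": 0, "QTime": 5},
  "field": {"name": "name", "type": "text_general",
            "indexed": true, "stored": true}
}
"""


def describe_field(body):
    """Return (name, type, indexed, stored), or raise on a non-zero status."""
    doc = json.loads(body)
    if doc["responseHeader"]["status"] != 0:
        raise RuntimeError("schema request failed")
    field = doc["field"]
    return field["name"], field["type"], field["indexed"], field["stored"]


print(describe_field(response_body))
```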
Dynamic Schema Modifications
<schemaFactory class="ManagedIndexSchemaFactory">
<bool name="mutable">true</bool>
<str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
curl -XPUT 'http://solrhost:8983/solr/rev/schema/fields/content' -d
'{
"type" : "text",
"stored" : "false",
"copyFields" : ["catchAll"]
}'
curl -XPOST 'http://solrhost:8983/solr/rev/schema/copyFields' -d
'[
{
"source" : "name",
"dest" : [ "text", "personal" ]
}
]'
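The two write calls above can also be prepared from Python. The sketch below only builds the requests and sends nothing. The helper names are hypothetical, and the JSON Content-Type header is an assumption, since the slide's curl calls do not set one.

```python
import json
from urllib.request import Request

BASE = "http://solrhost:8983/solr/rev/schema"  # host and core from the slide
JSON_HEADERS = {"Content-Type": "application/json"}  # assumption, not on the slide


def add_field_request(name, definition):
    """Prepare a PUT of a single field definition to /schema/fields/<name>."""
    return Request(
        f"{BASE}/fields/{name}",
        data=json.dumps(definition).encode("utf-8"),
        headers=JSON_HEADERS,
        method="PUT",
    )


def add_copy_fields_request(rules):
    """Prepare a POST of a list of copy field rules to /schema/copyFields."""
    return Request(
        f"{BASE}/copyFields",
        data=json.dumps(rules).encode("utf-8"),
        headers=JSON_HEADERS,
        method="POST",
    )


field_req = add_field_request(
    "content", {"type": "text", "stored": "false", "copyFields": ["catchAll"]}
)
copy_req = add_copy_fields_request([{"source": "name", "dest": ["text", "personal"]}])
# Nothing is sent here; pass a request to urllib.request.urlopen() to execute it.
print(field_req.get_method(), field_req.full_url)
print(copy_req.get_method(), copy_req.full_url)
```

Both calls only work when the schema is managed and mutable, as in the schemaFactory configuration above.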
The Right Directory
StandardDirectory
SimpleFSDirectory
NIOFSDirectory
MMapDirectory
NRTCachingDirectory
RAMDirectory

[Diagram: index segment files _0.fdt, _0.fdx, _0.fnm, _0.nvd and _1.fdt, _1.fdx, _1.fnm, _1.nvd]

<directoryFactory name="DirectoryFactory"
class="solr.NRTCachingDirectoryFactory" />
Segment Merging
[Diagram: segments a, b, f at Level 0; segments c, c, d, e, g at Level 1]
Segment Merge Under Control
Merge policy
Merge scheduler

Merge factor
Merge policy configuration

https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
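To get a feel for what the merge factor controls, here is a toy Python model of level-based merging: once a level holds merge-factor segments, they are merged into one segment on the next level. This is a deliberately simplified sketch, not how Lucene's merge policies are implemented, and the merge factor of 3 is an arbitrary choice.

```python
def add_segment(levels, merge_factor):
    """Add one freshly flushed segment at level 0 and cascade merges upward."""
    levels[0] = levels.get(0, 0) + 1
    level = 0
    while levels.get(level, 0) >= merge_factor:
        # merge_factor segments on this level become one segment on the next.
        levels[level] -= merge_factor
        levels[level + 1] = levels.get(level + 1, 0) + 1
        level += 1
    return levels


levels = {}
for _ in range(10):
    add_segment(levels, merge_factor=3)
print(levels)  # segment count per level after ten flushes
```

In this model a lower merge factor means fewer segments to search but more merge work during indexing, and a higher one means the opposite.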
Autocommit or Not?
Automatic data flush (hard commit)

Automatic index view refresh (soft commit)

<autoCommit>
<maxTime>15000</maxTime>
<maxDocs>1000</maxDocs>
<openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
<maxTime>1000</maxTime>
</autoSoftCommit>
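The two blocks are easier to reason about once the thresholds are read out as plain values. A minimal Python sketch that parses the settings from the slide; wrapping them in `updateHandler` reflects where they sit in solrconfig.xml, and `commit_settings` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The two elements from the slide, wrapped in a root so they parse as one document.
config = """
<updateHandler>
  <autoCommit>
    <maxTime>15000</maxTime>
    <maxDocs>1000</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
"""


def commit_settings(xml_text):
    """Extract hard and soft commit thresholds as plain Python values."""
    root = ET.fromstring(xml_text)
    hard, soft = root.find("autoCommit"), root.find("autoSoftCommit")
    return {
        "hard_max_ms": int(hard.findtext("maxTime")),
        "hard_max_docs": int(hard.findtext("maxDocs")),
        "hard_opens_searcher": hard.findtext("openSearcher") == "true",
        "soft_max_ms": int(soft.findtext("maxTime")),
    }


settings = commit_settings(config)
print(settings)
```

Read this way: data is flushed to disk after at most 15 seconds or 1000 documents without opening a new searcher, while the soft commit makes new documents searchable roughly every second.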
Caches
Refreshed with IndexSearcher
Configurable

Different purposes
Different implementations

Solr Cache
Monitoring Importance
What to Pay Attention to?
Cluster State
Health
Shards and replica status
Shard placement
Failing nodes
Indexing Related Metrics
Index throughput
Document distribution

I/O subsystem metrics
Merging
Search-related Metrics
Count

Latency
Distribution among nodes

Anomalies and spikes
Monitoring Memory and GC
Heap details
Pool size
Pool utilization
Garbage collection count
Garbage collection time
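Garbage collection count and time are usually exposed as cumulative counters, so the interesting number is the difference between two readings. A small Python sketch of that calculation; the sample values are hypothetical and `gc_overhead` is an illustrative helper, not part of any monitoring tool.

```python
def gc_overhead(sample_a, sample_b):
    """Return (collections, fraction of wall-clock time spent in GC).

    Each sample is (timestamp_ms, cumulative_gc_count, cumulative_gc_time_ms).
    """
    t0, count0, gc0 = sample_a
    t1, count1, gc1 = sample_b
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    return count1 - count0, (gc1 - gc0) / elapsed


# Hypothetical readings taken one minute apart.
collections, share = gc_overhead((0, 120, 4_500), (60_000, 135, 7_500))
print(collections, f"{share:.1%}")
```

A steadily growing share is typically a stronger warning sign than a single long pause.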
Monitoring OS-Related Metrics
CPU details
Load
I/O activity
Network usage
Solr Administration Panel
Solr & JMX
<jmx />
java -Dcom.sun.management.jmxremote -jar start.jar
SPM
Index statistics
Request # and latency
Caches and warmup
CPU
JVM Memory and OS Memory
Garbage collector
OS related statistics
SPM Dashboard
Other Monitoring Tools
Ganglia
http://ganglia.sourceforge.net/

New Relic
http://www.newrelic.com/

Opsview
http://www.opsview.com
Too much is too much
Too hot
Caches
We Are Hiring!
Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig working with and in open source?
We’re hiring worldwide!
http://sematext.com/about/jobs.html
Thank You!
Rafał Kuć
@kucrafal
rafal.kuc@sematext.com
Sematext
@sematext
http://sematext.com
http://blog.sematext.com
SPM discount code:

LR2013SPM20

@ Sematext booth ;)
