Solr metrics
Andrzej Białecki
Lucidworks

whoami
• Lucene / Solr user, contributor, committer
• Author of Luke – The Index Toolbox
• Lucidworks Fusion developer

Agenda
• Motivation – why Solr needs metrics
• Design – how Solr collects metrics and what data is collected
• Implementation – key components
• Configuration – what to collect and how to report it
• Examples of metrics and integrations with external systems
• Future development

Motivation

Why metrics?
• DevOps need tools for monitoring system behavior
• Production troubleshooting, e.g. FD leaks, outlier requests
  - Profiling is not an option in production deployments
• Optimization of the system on various levels
  - OS, collection, node, shard, doc routing, …
• Especially useful in locked-down deployments

JIRA
• Long-standing request – first created in 2013!
• Many contributors (and watchers!)
• “metrics” JIRA component
  - Now containing approx. 60 issues
• Key JIRAs:
  - SOLR-4735: initial framework
  - SOLR-9812: /admin/metrics handler
• First released in Solr 6.4.0
  - With important bug fixes released in 6.4.2

Design

Dropwizard Metrics
• High-performance lightweight metrics framework
• Metric types
  - counter: monotonically increasing counter
    - number of processed docs
  - meter: counter + moving average (rate), 1-, 5- and 15-minute
    - system load average, rate of requests
  - histogram: histogram of values (exponentially decaying by default)
    - result sizes, IO read sizes
  - timer: meter and histogram of event durations
    - commit times, query times, request times
  - gauge: instantaneous reading of a value
    - current heap size, number of cores, TLOG buffer size
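
To make the difference between some of these metric types concrete, here is a minimal Python sketch. It is not Dropwizard's implementation: the class and method names are illustrative assumptions, and meters and histograms are left out for brevity.

```python
import time


class Counter:
    """A count that is only ever incremented."""

    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n


class Gauge:
    """Reads a value on demand instead of storing it."""

    def __init__(self, read):
        self._read = read

    @property
    def value(self):
        return self._read()


class Timer:
    """Records event durations; a real timer would also track rates and percentiles."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.durations = []

    def time(self, fn, *args):
        # Measure how long fn takes, even if it raises.
        start = self._clock()
        try:
            return fn(*args)
        finally:
            self.durations.append(self._clock() - start)

    @property
    def count(self):
        return len(self.durations)
```

A counter is updated by the code being measured, whereas a gauge is only evaluated when somebody reads it, which is why gauges suit values such as the current heap size.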

Where is the data collected from?
• JVM metrics
  - GC, heap, threads, class loading, OS load / mem / FDs, etc.
• Jetty / HTTP metrics
  - connections, thread pools, …
• Container metrics
  - number of cores, data paths, admin handler metrics
• Per-SolrCore metrics
  - All RequestHandlers: request counters and timers
  - Searcher and cache stats
  - Replication
  - Index-level timers and histograms
  - Other components
• SolrCloud metrics (optional)
  - Aggregated from SolrCloud nodes
[Diagram: a Solr instance with JVM, Jetty / HTTP, CoreContainer, SolrCore, … and Components]

Registries
• Metric groups for each major aspect of a Solr instance:
  - jvm, jetty, node (CoreContainer), core (SolrCore)
  - see SolrInfoBean.Group
• One registry per group, and one for each SolrCore
  - Easier to manage core metrics throughout the core life-cycle
• No persistence across node restarts
  - SolrCore metrics persist across core reloads
[Diagram: registries solr.jvm, solr.node, solr.core.collection1]

Registry names
• Hierarchical, dot-separated
• Always prefixed with solr.
• Overridable using system properties:
  -Dsolr.core.collection1=solr.myCollection
  - This is useful e.g. to collapse per-replica registries into one registry with aggregated metrics
• SolrCloud “core” registry name example:

  SolrCore name: collection1_shard1_replica_n3
  Registry name: solr.core.collection1.shard1.replica_n3
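
The mapping from a SolrCloud core name to its registry name can be sketched in Python as follows. This is only an illustration of the example above, not Solr's own code: the function name and the assumed core-name layout (`<collection>_<shard>_<replica>`) are hypothetical, and system property overrides are not modeled.

```python
import re

# Assumed core-name layout: <collection>_<shard>_<replica>, as in the example above.
_CLOUD_CORE = re.compile(
    r"^(?P<collection>.+)_(?P<shard>shard\d+)_(?P<replica>replica_[a-z]\d+)$"
)


def core_registry_name(core_name: str) -> str:
    """Return the metric registry name for a core (illustrative only)."""
    match = _CLOUD_CORE.match(core_name)
    if match is None:
        # Name does not follow the cloud layout: use solr.core.<core name>.
        return f"solr.core.{core_name}"
    return "solr.core.{collection}.{shard}.{replica}".format(**match.groupdict())
```

Turning the underscores into dots is what makes the registry name hierarchical, so tools that display dotted names as a tree can group replicas under their shard and collection.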

Metric names
• Hierarchical, dot-separated
• By convention, names start with the component category
  - e.g. CONTAINER, CORE, QUERY, …
  - see SolrInfoBean.Category
• Request handler metrics follow this naming:
  <category>.<handler name or scope>.<metric name>
• Examples:
  QUERY./select.requestTimes
  UPDATE.updateShardHandler.threadPool.recoveryExecutor.completed
[Diagram: registries solr.jvm, solr.node and solr.core.collection1, with example metrics CORE.fs.totalSpace, SEARCHER.new, QUERY./get.requests, CACHE.core.fieldCache]
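
As a rough illustration of the naming scheme, the following Python sketch splits a metric name into category, scope and metric name. The helper and its splitting rule (first dot and last dot) are assumptions made for this example; Solr itself does not necessarily parse names this way.

```python
from typing import NamedTuple


class MetricName(NamedTuple):
    category: str
    scope: str
    name: str


def split_metric_name(full_name: str) -> MetricName:
    """Split '<category>.<scope>.<metric name>' on its first and last dots."""
    category, _, rest = full_name.partition(".")
    # Everything between the category and the final segment is treated as scope.
    scope, _, name = rest.rpartition(".")
    return MetricName(category, scope, name)
```

Splitting on the last dot rather than on every dot keeps multi-part scopes such as `updateShardHandler.threadPool.recoveryExecutor` together.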
  

Metric properties
• Simple numeric / string value, or nested JSON maps
  - Numeric counter: QUERY./select.requests
  - Timer: QUERY./select.requestTimes
    - Properties: count, meanRate, 1minRate, 5minRate, 15minRate, min_ms, max_ms, mean_ms, p75_ms, …
  - A data structure: CACHE.core.fieldCache
    - Properties: total_size, entries_count, entry#0, entry#1, …
  - Arbitrary map: system.properties
• It’s possible to retrieve only selected properties via the /admin/metrics handler
  - E.g. key=solr.jvm:system.properties:user.name
[Diagram: the same registries and metrics, with CACHE.core.fieldCache expanded into its properties total_size, entries_count, entry#0]
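
The `registry:metric:property` structure of a key can be illustrated with a small Python lookup over nested dictionaries. This is a sketch only: `resolve_key` and the sample data (including the value `"solr"`) are made up for the example, and escaped colons inside names are not handled.

```python
def resolve_key(registries: dict, key: str):
    """Look up 'registry:metric[:property]' in nested dictionaries."""
    registry, metric, *prop = key.split(":")
    value = registries[registry][metric]
    # A third segment, if present, selects one property of the metric.
    return value[prop[0]] if prop else value


# Hypothetical sample data shaped like the nested maps described above.
sample = {"solr.jvm": {"system.properties": {"user.name": "solr"}}}
user = resolve_key(sample, "solr.jvm:system.properties:user.name")
```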
  

Implementation

Components
• SolrMetricManager
  - One central component to manage registries and reporters
• MetricRegistry (Dropwizard API)
  - Type-safe Map keeping related metric instances and their names
• SolrMetricProducer (interface)
  - Creates and registers metric instances
  - Many existing Solr components now implement this interface
• SolrMetricReporter (abstract class)
  - Reports collected metrics to external agents and/or files
  - Several implementations available out of the box
• MetricsHandler (at /admin/metrics)
  - Provides access to all local metric registries

[Diagram: SolrMetricManager inside a Solr instance, with CoreContainer and SolrCore1–3, registries solr.jvm, solr.jetty, solr.node and solr.core.collection1–3, and SolrMetricReporter (JMX, SLF4J, Ganglia, Graphite) plus /admin/metrics feeding the UI and other reporting tools]
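
The relationship between a central manager, its named registries and the reporters can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Solr API: `MetricManager`, `registry` and `report` are hypothetical names, and metrics are reduced to plain values.

```python
class MetricManager:
    """Hypothetical stand-in for a central manager of registries and reporters."""

    def __init__(self):
        self.registries = {}  # registry name -> {metric name: value}
        self.reporters = []   # callables invoked with (registry name, metrics)

    def registry(self, name):
        # Create the registry on first use, then keep returning the same one.
        return self.registries.setdefault(name, {})

    def report(self):
        # Hand every reporter a copy of each registry's current metrics.
        for name, metrics in self.registries.items():
            for reporter in self.reporters:
                reporter(name, dict(metrics))


manager = MetricManager()
manager.registry("solr.core.collection1")["QUERY./select.requests"] = 5
seen = []
manager.reporters.append(lambda name, metrics: seen.append((name, metrics)))
manager.report()
```

Components only ever talk to a registry, and reporters only ever read from one, so new producers and new reporters can be added independently of each other.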
  

/admin/metrics handler
• Shows metrics from all or selected registries
• Flexible selection criteria:
  - select registries by group (e.g. jetty, node) or by registry name (e.g. solr.core.collection1)
  - filter metrics by a list of prefixes (or regexes)
  - retrieve only some properties using the property parameter
  - retrieve single metrics / properties using a fully-qualified key (7.1)

Example /admin/metrics
http://localhost:8983/solr/admin/metrics?group=core&prefix=SEARCHER

Example /admin/metrics
http://localhost:8983/solr/admin/metrics?group=core&regex=QUERY./select.*Times&property=max_ms

Example /admin/metrics
http://localhost:8983/solr/admin/metrics?group=node&prefix=CONTAINER
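
Request URLs like the ones above can be assembled programmatically. The Python helper below is a minimal sketch using only the standard library; `metrics_url` is a hypothetical name, and the base URL is the localhost address from the examples. It only builds the URL and does not contact a server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def metrics_url(base="http://localhost:8983/solr", **params):
    """Build an /admin/metrics URL; params become the query parameters."""
    # safe="/" keeps handler paths such as /select readable in the query string.
    return f"{base}/admin/metrics?{urlencode(params, safe='/')}"


searcher_url = metrics_url(group="core", prefix="SEARCHER")
times_url = metrics_url(group="core", regex="QUERY./select.*Times", property="max_ms")
# Decode the query string again to check that the regex survives URL encoding.
times_params = parse_qs(urlsplit(times_url).query)
```

Letting `urlencode` escape characters such as `*` avoids hand-building query strings that break on regex metacharacters.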

Metrics vs. Solr 6.x MBeans
• Naming of categories and groups has slightly changed
  - More fitting categories for some components
• Solr 6.x still uses independent implementations for Metrics and MBeans
  - Some statistics are unavailable in one API or the other, or are reported differently
• Solr 7.x uses only the Metrics API to report MBean stats
  - This also includes nested and non-numeric values
  - The <jmx> element in solrconfig.xml is no longer supported – instead use SolrJmxReporter in solr.xml
    - Automatically added if missing and when an MBeanServer is detected

[Screenshot: Admin UI in 7.x]

Configuration

Metrics collection
• Already happening :)
  - Minimal overhead, on the order of μs per request and < 0.5 MB per core
• New section in solr.xml: <solr><metrics>
  - Reporter configuration
  - Custom metric implementations
  - Some debug configuration
    - Detailed histograms of index and TLOG processing times, per core

Reporters
• Extend SolrMetricReporter
• Configured in solr.xml: <solr><metrics><reporter>
• Several implementations provided:
  - JMX: fully hierarchical view in e.g. JConsole
  - Ganglia, Graphite, SLF4J: send periodic reports of selected metrics
  - Easy API – create new ones!
    - https://github.com/vthacker/solr-metrics-influxdb
• Created for each selected registry, using the group and/or registry list attributes
  - If neither is present, the reporter is created for all registries

Reporter configuration details
• Required attributes: name (unique per registry), class (FQCN)
• Optional attributes:
  - group – comma-separated list of registry groups, e.g. core,jvm
  - registry – comma-separated list of registry prefixes, e.g. solr.node,solr.core.coll
• Optional initArgs:
  - filter – report only metrics with that prefix, e.g. QUERY./select
  - period – how often metrics will be reported, in seconds
  - ... others, depending on the implementation, e.g. logger name for SLF4J
* NOTE: for a given configuration, separate reporter instances are created for each matching registry
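
How the group and registry attributes select registries can be sketched in Python. The matching rules here are assumptions made for illustration (a group `g` selects `solr.g` and everything below it, and the two attributes are combined as a union); the function name is hypothetical and Solr's actual matching may differ in detail.

```python
def matching_registries(registries, group=None, registry=None):
    """Return the registries a reporter configuration applies to (illustrative rules)."""
    if group is None and registry is None:
        return list(registries)  # neither attribute: report on every registry
    groups = [g.strip() for g in (group or "").split(",") if g.strip()]
    prefixes = [p.strip() for p in (registry or "").split(",") if p.strip()]

    def selected(name):
        # Assumption: group "core" selects "solr.core" and everything below it.
        in_group = any(
            name == f"solr.{g}" or name.startswith(f"solr.{g}.") for g in groups
        )
        return in_group or any(name.startswith(p) for p in prefixes)

    return [name for name in registries if selected(name)]


known = ["solr.jvm", "solr.jetty", "solr.node", "solr.core.collection1", "solr.core.other"]
by_group = matching_registries(known, group="core,jvm")
by_prefix = matching_registries(known, registry="solr.node,solr.core.coll")
```

One reporter instance would then be created per returned registry, which is why the reporter name only has to be unique within a registry.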

solr.xml
<solr>
  ...
  <metrics>
    <!-- JMX reporter for all registries (no group or registry attribute) -->
    <reporter name="global"
              class="org.apache.solr.metrics.reporters.SolrJmxReporter"/>
    <!-- SLF4J reporter for each core registry, reporting every 60 seconds -->
    <reporter name="perCore" group="core"
              class="org.apache.solr.metrics.reporters.SolrSlf4jReporter">
      <int name="period">60</int>
      <str name="logger">metricsLogger</str>
    </reporter>
  </metrics>
</solr>

Advanced configuration
• Limitations of the default histogram / timer in Dropwizard Metrics
  - Uses ExponentiallyDecayingReservoir (EDR) sampling BUT assumes a normal distribution
    - If the distribution is skewed then rare outliers may never be captured or retained long enough
  - EDR is tuned to prefer the last 5 minutes of data – but keeps only 1028 random samples
  - May LOSE critical min / max / percentile data under a higher rate of updates
  - May report obsolete values to a snapshot because retained data is replaced randomly
  - Internal values are “decayed” only during updates – no updates means values are stuck!
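
Why a fixed-size random sample tends to miss rare outliers can be seen with plain uniform reservoir sampling (Algorithm R). Note that this is a simpler, swapped-in technique and not the time-biased sampling EDR uses; the function name and the sample data are assumptions for the example, and only the size of 1028 comes from the text above.

```python
import random


def uniform_reservoir(stream, size=1028, rng=None):
    """Keep a uniform random sample of at most `size` values (Algorithm R)."""
    rng = rng or random.Random()
    sample = []
    for i, value in enumerate(stream):
        if i < size:
            sample.append(value)
        else:
            # Keep the new value with probability size / (i + 1).
            j = rng.randrange(i + 1)
            if j < size:
                sample[j] = value
    return sample


latencies = [1.0] * 200_000
latencies[100_000] = 5000.0  # a single rare outlier
sample = uniform_reservoir(latencies, rng=random.Random(42))
print(len(sample), max(sample))
```

With uniform sampling each value survives with probability 1028 / 200,000, roughly 0.5%, so the reported maximum will usually not reflect the outlier at all.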

Advanced configuration
• Custom parameters and implementations for metrics
  - <solr><metrics><suppliers> section in solr.xml
  - Users can provide their own implementations of counters, meters, timers and histograms
• Solving the issue with EDR
  - Use a different reservoir size, or a different reservoir implementation
    - Several other implementations available, with tradeoffs, e.g. SlidingTimeWindowReservoir
  - Use your own histogram implementations
    - http://github.com/vladimir-bukhtoyarov/rolling-metrics

* NOTE: metric reporters retrieve metric snapshots concurrently and at arbitrary times,
DO NOT use implementations that reset to 0 after each snapshot!

Example advanced configuration (solr.xml)
• Different reservoir implementation:
<solr>
  <metrics>
    <suppliers>
      <histogram>
        <int name="window">300</int>
        <str name="reservoir">com.codahale.metrics.SlidingTimeWindowReservoir</str>
      </histogram>
    </suppliers>
  </metrics>
</solr>
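
The idea behind a sliding time window reservoir can be sketched in Python as follows. This is a simplified illustration, not the Dropwizard class: `TimeWindowReservoir` is a hypothetical name, the window is assumed to be in seconds, and the clock is injectable only to make the behavior easy to demonstrate.

```python
import time
from collections import deque


class TimeWindowReservoir:
    """Keeps every value recorded during the last `window` seconds."""

    def __init__(self, window=300.0, clock=time.monotonic):
        self.window = window
        self._clock = clock
        self._values = deque()  # (timestamp, value) pairs, oldest first

    def update(self, value):
        self._values.append((self._clock(), value))

    def snapshot(self):
        # Expire old entries on read, so idle periods cannot leave stale values.
        cutoff = self._clock() - self.window
        while self._values and self._values[0][0] < cutoff:
            self._values.popleft()
        return [value for _, value in self._values]


now = [0.0]
reservoir = TimeWindowReservoir(window=300, clock=lambda: now[0])
reservoir.update(1)
now[0] = 200.0
reservoir.update(2)
```

Because nothing inside the window is discarded, min, max and percentiles are exact for that window; the tradeoff is that memory use grows with the update rate.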

SolrCloud metrics (7.x)
• Shard metrics
  - Reported from replicas to shard leaders
• Node metrics
  - Reported from multiple registries on each node to the Overseer
• Partially aggregated (simple sum, avg, mean, stddev, string lists)
  - Some aggregations wouldn’t make sense, e.g. histograms
• Automatically collected by the /metrics/collector handler
• Configured in solr.xml using special shard and cluster groups
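
The kind of partial aggregation listed above can be illustrated with a short Python sketch. The function name, the result keys and the choice of population standard deviation are assumptions for this example and do not describe Solr's actual aggregate format.

```python
from statistics import mean, pstdev


def aggregate(values):
    """Combine one metric's values as reported by several replicas or nodes."""
    numbers = [
        v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numbers and len(numbers) == len(values):
        return {
            "count": len(numbers),
            "sum": sum(numbers),
            "mean": mean(numbers),
            "stddev": pstdev(numbers),  # population standard deviation
        }
    # Non-numeric values cannot be summed, so they are only listed.
    return {"count": len(values), "values": [str(v) for v in values]}


request_counts = aggregate([10, 20, 30])
```

Simple values aggregate cleanly this way, whereas a histogram reported by each replica cannot be merged by summing or averaging its percentiles, which is why such metrics are left out.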

Example outputs
[Screenshot: example JConsole view]
[Screenshot: example JConsole view in 6.x]
[Screenshot: example Graphite view]
[Screenshot: example Graphite view]
[Screenshot: example SLF4J view]

Future

Metrics in 7.x
• Adding more configurability
• Better defaults for reservoirs
• Autoscaling framework
  - Autoscaling actions are largely based on metrics, e.g.
    - freedisk, sysLoadAvg, cores, heapUsage, system properties
  - May use any metric value, e.g. metrics:solr.node:CONTAINER.fs.usableSpace
• Using metrics for feedback control in Solr clusters
  - Support for modeling and simulation of dynamic behavior (SOLR-11285)

Summary
• Metrics are a lightweight mechanism for collecting detailed insights into Solr operation
  - Now provided by most Solr components
  - Easy to add new metrics
• Metrics can be reported to external systems in multiple formats and protocols
  - Several popular systems already supported
  - Easy to add new reporters
• Metrics provide key data for SolrCloud autoscaling
• How do you want to use metrics?

Q & A
Thank You

