Scaling Lucene
The advent of Elasticsearch
Stéphane Gamard
Scalability
Scalability is defined along 3 main axes:
• Index Size - The number of entries upon which we act
• QPS - Number of requests serviced per second
• Time to operation - Time taken to be operational
Lucene
In a nutshell, Lucene does not scale. Why?
• IR library - Purely focused on TF-IDF
• Bounded by native resources - Vertical scaling
• NRT Inverse Lookup - Segments
Lucene
Segments: the Lucene storage
just a “bunch of files”
Lucene Indexing
In a “document” perspective
Documents:
{#hello, #world}
{#there, #is, #a, #brown, #fox}
{#the, … , #kitchen}
…
Dictionary:
#a T1
#is T2
…
#fox T45
…
Inverse Lookup:
T1 {#1, #33}
T2 {#2, … , #87}
…
T45 {#2, …}
…
Segment
Lucene Indexing
Factors of growth
Dictionary:
#a T1
#is T2
…
#fox T45
…
Inverse Lookup:
T1 {#1, #33}
T2 {#2, … , #87}
…
T45 {#2, …}
…
• Dictionary Size - NLP*
• New Inverse Entries
Segment
Lucene Indexing
In a storage perspective
Segment
IndexReader(s)
IndexWriter
Lucene Index
Lucene Indexing
The wonderful world of merging segments
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
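A minimal sketch of what a merge does conceptually: several immutable segments are combined into one new segment, and deleted documents are dropped along the way. It assumes each segment is a plain dict of term to sorted document ids with globally unique ids, which is a simplification (Lucene renumbers documents per segment). `merge_segments` is a hypothetical helper, not a Lucene API.

```python
import heapq

def merge_segments(segments, deleted=frozenset()):
    """Merge toy segments (term -> sorted doc ids) into one new segment."""
    merged = {}
    terms = set().union(*(seg.keys() for seg in segments))
    for term in sorted(terms):
        # Each input posting list is already sorted, so a k-way merge suffices.
        runs = [seg[term] for seg in segments if term in seg]
        docs = [d for d in heapq.merge(*runs) if d not in deleted]
        if docs:
            merged[term] = docs
    return merged

seg_a = {"fox": [1, 4], "is": [1]}
seg_b = {"fox": [2], "brown": [2, 3]}
print(merge_segments([seg_a, seg_b], deleted={4}))
# {'brown': [2, 3], 'fox': [1, 2], 'is': [1]}
```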
Lucene Wrap-up
A Lucene Index is:
• A collection of segments
• One or multiple IndexReader
• A single IndexWriter
Lucene Wrap-up
A single Lucene Index scales to:
• Index Size - Available HDD/RAM for segments
• QPS - Number of IndexReader threads
• T-to-Op - Speed at which IndexWriter can ingest (IOPS)
It can only scale vertically!!!
Elasticsearch
Also known as the commodity scaling of Lucene ;)
There is no magic…
It’s about partitioning,
Using an index of indexes as its index.
Elasticsearch
A shard is the magic sauce of web scale
Lucene Lucene Lucene Lucene Lucene
Elasticsearch Index
Elasticsearch
Document Indexing
Lucene Lucene Lucene Lucene Lucene
• Distributed
• Routing
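Routing can be sketched as hashing a document key and taking it modulo the number of shards, so that each document lands on exactly one Lucene index. Elasticsearch documents its routing as a hash of the routing value modulo the number of primary shards; the CRC32 hash below is only a stand-in, and `route` and `index_document` are hypothetical names.

```python
import zlib

def route(doc_id: str, num_shards: int) -> int:
    """Pick a shard for a document. CRC32 is a stand-in hash (assumption)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def index_document(shards, doc_id, terms):
    """Write the document to the single shard that owns its id."""
    shard_no = route(doc_id, len(shards))
    shards[shard_no][doc_id] = terms
    return shard_no

# Five shards, as in the diagram: each dict stands for one Lucene index.
shards = [{} for _ in range(5)]
for i in range(20):
    index_document(shards, f"doc-{i}", ["hello", "world"])

print([len(s) for s in shards])  # documents held by each shard
```

Because the shard is derived from the key alone, the same document always goes to the same shard, and each shard keeps its own single writer.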
Elasticsearch
Request
Lucene Lucene Lucene Lucene Lucene
• Parallel
• Aggregated
{search: {…}}
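The "parallel, then aggregated" request can be sketched as a scatter-gather: every shard computes its own top hits, and a coordinator merges the partial results. The scoring here is a plain term count, chosen only to keep the sketch short; `search_shard` and `search` are hypothetical names, not Elasticsearch APIs.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def search_shard(shard, term, k):
    """Local top-k on one shard, scored by raw term count (toy scoring)."""
    hits = [(terms.count(term), doc_id)
            for doc_id, terms in shard.items() if term in terms]
    return heapq.nlargest(k, hits)

def search(shards, term, k=3):
    """Query every shard in parallel, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, term, k), shards))
    return heapq.nlargest(k, (hit for part in partials for hit in part))

shards = [
    {"d1": ["fox", "fox"], "d2": ["dog"]},
    {"d3": ["fox"], "d4": ["fox", "fox", "fox"]},
]
print(search(shards, "fox", k=2))  # [(3, 'd4'), (2, 'd1')]
```

Each shard only needs to return its own k best hits, since no document outside a shard's local top k can appear in the global top k.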
Elasticsearch
In a nutshell
• Distributed - One IndexWriter per shard
• Parallel - Requests run in parallel on one IndexReader per shard
Clustering
How to leverage ES to scale Lucene
Lucene
Sample sizing for xM indexed documents
• 2 Threads - 1 searcher, 1 writer
• 2G RAM - Lucene Cache
• 30G disk - Index size
Elasticsearch Index
Clustering
Lucene
2T/2G/30G
Lucene
2T/2G/30G
Lucene
2T/2G/30G
Lucene
2T/2G/30G
Single machine scope: 8 cores, 16G RAM, 500G HDD
can sustain 4 * xM documents
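The factor of 4 follows from the scarcest resource. A small sketch of that arithmetic, assuming one thread per core and the 2T/2G/30G shard sizing above (`shards_per_machine` is a hypothetical helper):

```python
def shards_per_machine(cores, ram_gb, disk_gb,
                       threads_per_shard=2, ram_per_shard_gb=2,
                       disk_per_shard_gb=30):
    """Number of shards one machine can host, bounded by its scarcest resource."""
    by_cpu = cores // threads_per_shard       # assumes one thread per core
    by_ram = ram_gb // ram_per_shard_gb
    by_disk = disk_gb // disk_per_shard_gb
    return min(by_cpu, by_ram, by_disk)

# 8 cores, 16G RAM, 500G HDD: CPU allows 4, RAM allows 8, disk allows 16.
print(shards_per_machine(cores=8, ram_gb=16, disk_gb=500))  # 4
```

With this sizing the machine runs out of threads first, so it hosts 4 shards of xM documents each.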
Clustering
# Documents
QPS
1 machine -> 4 * xM documents
Clustering
2 machines -> 2 * 4 * xM documents
# Documents
QPS
• 4 Threads - 3 searcher, 1 writer
• 4G ram - Lucene Cache
• 60G disk - Index size
Clustering
# Documents
QPS
4 machines -> 2 * 4 * xM documents
twice the QPS
Clustering
# Documents
QPS
Is there a limit to this scalability?
Clustering
# Documents
QPS
• 8 Threads - 7 searcher, 1 writer
• 8G ram - Lucene Cache
• 120G disk - Index size
4 machines -> 4 * 4 * xM documents
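The machine counts above trade document capacity against QPS. One way to read them, and this reading is an assumption rather than something the slides state, is that holding two copies of every shard halves the number of distinct shards and doubles the readers available per shard. `cluster_capacity` is a hypothetical helper.

```python
def cluster_capacity(machines, shards_per_machine=4, copies=1):
    """Return (document capacity in units of xM, QPS multiplier).

    Assumption: `copies` is the number of copies kept of each shard;
    every copy can serve reads, so QPS scales with it.
    """
    slots = machines * shards_per_machine
    return slots // copies, copies

print(cluster_capacity(1))            # (4, 1)  -> 4 * xM documents
print(cluster_capacity(2))            # (8, 1)  -> 2 * 4 * xM documents
print(cluster_capacity(4, copies=2))  # (8, 2)  -> 2 * 4 * xM, twice the QPS
print(cluster_capacity(4))            # (16, 1) -> 4 * 4 * xM documents
```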
Clustering
Rules of thumb
• Threads - are the core of the scalability factors
• IOPS - is generally the limiting factor of horizontal scaling
• RAM - is generally the limiting factor of vertical scaling
ES is generally excellent with its parameters
Clustering
Health
• Redundancy - auto-balance shards for best possible HA
• Timing - Warmup and Commit points
• Latency - Result merging (especially on remote aggregations)

From Lucene to Elasticsearch, a short explanation of horizontal scalability