whoami
Fram Souza
I have been working in IT for about 6 years;
I have a BSc in Computer Networks;
I have been working with Elasticsearch for about 2 years;
Currently, I am an IT Specialist at Nextel.
Migration strategies for a mission-critical cluster
Current architecture
Environment
● Elasticsearch version 2.4.6
● 5 instances, each with all roles (master, data, ingest)
● 2 availability zones to distribute the shards
● Lambda used to ingest documents
● very large cluster state
● a single large index (2.4 TB)
The problems
● Very large inverted index
○ Very high response times
● Outdated version
○ It is important to keep the environment on a current version
● No purge policy was ever created
○ Some documents are no longer needed
● The instances do not have dedicated roles
○ This is bad for this environment
Two major problems
● Very large shards
● Many rejected searches
Requirement
no downtime.
The plan
Step 01 [ Definition ]
● Define the business requirements
○ Define purge policies (keep only one year of documents)
● Understand the whole process (insert and search)
Step 02 [ Definition ]
● Size the new cluster (Elastic Rally)
○ measure search/index rates
○ run load tests
○ measure infrastructure requirements
● Install and configure the new environment (v. 6.3) (infrastructure as code + automation)
The plan
Step 03 [ Implementation ]
● Define the index structure
○ one index per month
● Create templates/mappings in the new cluster
● Define a plan for monitoring/alerting
Step 04 [ Implementation ]
● Remote reindex (with a query)
Step 05 [ Implementation to avoid downtime ]
● Add two outputs to the Lambda (old and new cluster) and add Logstash to the stack
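Step 04 can be sketched as one `_reindex` request per monthly index, each restricted by a range query. This is only a sketch: the host, the index names, and the timestamp field below are hypothetical placeholders, and the old cluster would also need to be listed in `reindex.remote.whitelist` on the new cluster.

```python
import json
from datetime import date

# All four values are assumptions for illustration, not taken from the slides.
OLD_CLUSTER = "http://old-cluster.example:9200"
OLD_INDEX = "documents"
NEW_PREFIX = "docs-"
TIME_FIELD = "@timestamp"


def month_bounds(year, month):
    """Return ISO dates for the first day of a month and of the next month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.isoformat(), end.isoformat()


def remote_reindex_body(year, month):
    """Build a _reindex body that copies one month from the old cluster."""
    start, end = month_bounds(year, month)
    return {
        "source": {
            "remote": {"host": OLD_CLUSTER},
            "index": OLD_INDEX,
            # Only documents inside the month are pulled from the old cluster.
            "query": {"range": {TIME_FIELD: {"gte": start, "lt": end}}},
        },
        "dest": {"index": f"{NEW_PREFIX}{year:04d}.{month:02d}"},
    }


if __name__ == "__main__":
    # Each body would be sent as POST _reindex to the new cluster.
    print(json.dumps(remote_reindex_body(2018, 12), indent=2))
```

Splitting by month keeps each reindex task small and lets the one-year purge policy be applied simply by not copying older months.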
Avoiding downtime
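The dual-output idea from Step 05 might look roughly like the sketch below. The original slide is a diagram and the real Lambda code is not shown, so the function and sink names are hypothetical; the sinks are plain callables so that the logic can be tried without a cluster.

```python
from typing import Callable, Dict, List

Sink = Callable[[dict], None]


def dual_write(doc: dict, sinks: Dict[str, Sink]) -> List[str]:
    """Send one document to every sink and return the names of sinks that failed."""
    failed = []
    for name, send in sinks.items():
        try:
            send(doc)
        except Exception:
            # A failing cluster must not block writes to the other one.
            failed.append(name)
    return failed


if __name__ == "__main__":
    old_cluster, new_cluster = [], []  # in-memory stand-ins for the two outputs
    failures = dual_write(
        {"id": 1, "message": "hello"},
        {"old": old_cluster.append, "new": new_cluster.append},
    )
    print(failures, len(old_cluster), len(new_cluster))
```

While both outputs receive every new document, the remote reindex can backfill history, and searches can be switched to the new cluster once it has caught up.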
New logical structure
Cluster details
● version 6.3
● X-Pack basic
● dedicated roles
○ 1 master (why just one master?) / 6 data nodes (500 GB disk / 32 GB RAM / 16 GB heap / 16 cores) / 1 Logstash
● 6 shards and 1 replica
● One index per month
○ index: 100 GB/month
■ shard: 16 GB/shard
● Keep one year of data
○ in 12 months:
■ 2.4 TB data
■ 144 total shards
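The numbers on this slide can be checked with a few lines of arithmetic. The sketch assumes that 100 GB/month is primary data only and that the replica doubles it, which is the reading under which the 2.4 TB and 144-shard totals agree.

```python
PRIMARY_SHARDS = 6
REPLICAS = 1
INDEX_GB_PER_MONTH = 100  # assumed to be primary data only
RETENTION_MONTHS = 12
DATA_NODES = 6
DISK_GB_PER_NODE = 500

copies = 1 + REPLICAS  # each primary plus its replica
shard_gb = INDEX_GB_PER_MONTH / PRIMARY_SHARDS
total_shards = PRIMARY_SHARDS * copies * RETENTION_MONTHS
total_data_tb = INDEX_GB_PER_MONTH * copies * RETENTION_MONTHS / 1000
raw_disk_tb = DATA_NODES * DISK_GB_PER_NODE / 1000
shards_per_node = total_shards / DATA_NODES

print(f"shard size      : {shard_gb:.1f} GB")
print(f"total shards    : {total_shards}")
print(f"total data      : {total_data_tb} TB")
print(f"raw disk        : {raw_disk_tb} TB")
print(f"shards per node : {shards_per_node:.0f}")
```

Under these assumptions a year of data fills about 80% of the 3 TB of raw disk across the data nodes.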
Cluster advanced details
● dynamic templates
● aliases
● Shrink API
● query performance using filters
● Curator
● shard allocation awareness
● disk watermark changed to 98% (2.94 TB)
● index refresh interval (30s)
● Logstash persistent queue
● notifications
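A few of these items can be written down as request bodies. The sketch below shows index settings with the 30s refresh interval, a search that uses filter context, and the arithmetic behind the 2.94 TB watermark figure. The field names are hypothetical; the shard and replica counts are the ones from the cluster details slide.

```python
import json

# Index-level settings: 6 primary shards, 1 replica, refresh every 30 seconds.
index_settings = {
    "settings": {
        "index": {
            "number_of_shards": 6,
            "number_of_replicas": 1,
            "refresh_interval": "30s",
        }
    }
}

# Filter context: clauses under bool.filter are not scored and may be cached.
search_body = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"customer_id": "42"}},  # hypothetical field
                {"range": {"@timestamp": {"gte": "now-30d/d"}}},  # hypothetical field
            ]
        }
    }
}

# 98% of the raw data-node disk (6 nodes x 500 GB), expressed in TB.
watermark_tb = round(0.98 * 6 * 500 / 1000, 2)

if __name__ == "__main__":
    print(json.dumps(index_settings, indent=2))
    print(json.dumps(search_body, indent=2))
    print(watermark_tb)
```

A longer refresh interval trades search freshness for cheaper indexing, which suits a cluster that is being bulk-loaded by a reindex.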
example - dynamic template
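The template shown on the original slide is not part of this transcript. As a stand-in, here is a minimal sketch of a 6.x index template with one dynamic template that maps every new string field to `keyword`; the index pattern and the template name are hypothetical.

```python
import json

index_template = {
    "index_patterns": ["docs-*"],  # hypothetical pattern for the monthly indices
    "mappings": {
        "_doc": {
            "dynamic_templates": [
                {
                    "strings_as_keywords": {
                        # Applies to any previously unseen field holding a string.
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword", "ignore_above": 256},
                    }
                }
            ]
        }
    },
}

if __name__ == "__main__":
    # The body would be sent as PUT _template/<name> to the new cluster.
    print(json.dumps(index_template, indent=2))
```

Because the template matches by index pattern, each new monthly index picks up the same mappings without any manual step.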
example - awareness
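The configuration from the original slide is not in this transcript. A minimal sketch of shard allocation awareness: each node declares a custom attribute in `elasticsearch.yml`, and the cluster is told to spread shard copies across the values of that attribute. The attribute name `zone`, the node names, and the zone names are hypothetical.

```python
import json

# Hypothetical layout: six data nodes split over two availability zones.
NODES = {
    "data-1": "zone-a", "data-2": "zone-a", "data-3": "zone-a",
    "data-4": "zone-b", "data-5": "zone-b", "data-6": "zone-b",
}


def node_yaml(name, zone):
    """Return the elasticsearch.yml lines that tag one node with its zone."""
    return f"node.name: {name}\nnode.attr.zone: {zone}\n"


# Cluster-wide setting: keep a primary and its replica in different zones.
cluster_settings = {
    "persistent": {
        "cluster.routing.allocation.awareness.attributes": "zone"
    }
}

if __name__ == "__main__":
    for name, zone in NODES.items():
        print(node_yaml(name, zone))
    # The body would be sent as PUT _cluster/settings.
    print(json.dumps(cluster_settings, indent=2))
```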
example - persistent queue
● When an input has events ready to process, it writes them to the queue;
● When the write to the queue is successful, the input can send an acknowledgement to its data source;
● An event is recorded as processed if, and only if, the event has been processed completely by the Logstash pipeline.
[Diagram labels: ACK, ok, ACK - event completed]
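The acknowledgement rules above can be illustrated with a toy model. This is an in-memory sketch of the semantics only, not how Logstash implements its queue; in Logstash itself the feature is enabled with `queue.type: persisted` in `logstash.yml`. The class and method names are hypothetical.

```python
from collections import deque


class ToyPersistentQueue:
    """Toy model: an event leaves the queue only after the pipeline finishes it."""

    def __init__(self):
        self._pending = deque()  # stands in for the on-disk queue pages
        self.processed = []

    def write(self, event):
        """Store the event; True tells the input it may ACK its data source."""
        self._pending.append(event)
        return True

    def run_pipeline(self, pipeline):
        """Process events in order; a failing event stays queued for a retry."""
        while self._pending:
            event = self._pending[0]
            pipeline(event)  # may raise, in which case nothing is removed
            self._pending.popleft()
            self.processed.append(event)

    def __len__(self):
        return len(self._pending)


if __name__ == "__main__":
    queue = ToyPersistentQueue()
    queue.write({"id": 1})
    queue.write({"id": 2})
    queue.run_pipeline(lambda event: None)
    print(queue.processed, len(queue))
```

If the pipeline crashes halfway through, the unfinished event is still in the queue after a restart, which gives at-least-once delivery rather than silent loss.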
example - notifications
improvements
● disaster recovery plan
● coordinating nodes
● Kafka / Kinesis
● upgrade the X-Pack license
long-term planning
● tune Logstash queue pages
● control the circuit breakers (why?)
● hot/warm architecture
structure v2
Finish :D
Contact: fram.souza14@gmail.com
LinkedIn: https://www.linkedin.com/in/francismarasouza/
