Migration strategies for a mission-critical cluster

whoami
Fram Souza
I’ve been working in IT for about 6 years;
I have a BSc in Computer Networks;
I’ve been working with Elasticsearch for about 2 years;
Currently, I’m an IT Specialist at Nextel.

Environment
● Elasticsearch version 2.4.6
● 5 instances (all sharing the master, data and ingest roles)
● 2 availability zones to distribute the shards across
● using Lambda to ingest documents
● very large cluster state
● just one big index (2.4 TB)

The problems
● Very large inverted index
○ Very high response times
● Outdated version
○ It is important to keep the environment on a current version
● No purge policy was ever created
○ There are documents that are no longer needed
● The instances don’t have dedicated roles
○ This is bad for this environment

Two major problems
● Very large shards
● Many rejected searches

The plan
1. Step 01 [ Definition ]
● Define business requirements
○ Define purge policies (keep only one year of documents)
● Understand the whole process (insert and search)
1. Step 02 [ Definition ]
● Size the new cluster (Elastic Rally)
○ measure search/index rate
○ load test
○ measure infrastructure requirements
● Install and configure the new environment (v. 6.3) (infrastructure as code + automation)

The plan
1. Step 03 [ Implementation ]
● Define index structure
○ one index per month
● Create templates/mappings in the new cluster
● Define a plan for monitoring/alerting
1. Step 04 [ Implementation ]
● Remote reindex (with query)
1. Step 05 [ Implementation to avoid downtime ]
● Add two outputs to the Lambda (old and new cluster) and add Logstash to the stack
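
As a rough sketch of Step 04, the remote reindex can be sliced by month so that each slice lands in its own monthly index and only the last 12 months are migrated (which also applies the one-year purge policy). The host, the index names, the date field and the reference date below are hypothetical placeholders, not values from the talk. Each body is meant for POST _reindex on the new cluster, which also needs the old cluster listed in reindex.remote.whitelist.

```python
import json
from datetime import date

# Hypothetical values, for illustration only (not from the talk).
OLD_CLUSTER = "http://old-cluster.example:9200"
SOURCE_INDEX = "documents"
DEST_PREFIX = "documents-"
DATE_FIELD = "@timestamp"


def month_starts(today, months=12):
    """First day of each of the last `months` months, oldest first."""
    year, month = today.year, today.month
    starts = []
    for _ in range(months):
        starts.append(date(year, month, 1))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return list(reversed(starts))


def next_month(day):
    """First day of the month after `day`."""
    if day.month == 12:
        return date(day.year + 1, 1, 1)
    return date(day.year, day.month + 1, 1)


def reindex_body(start):
    """Body for POST _reindex that copies one month from the old cluster."""
    return {
        "source": {
            "remote": {"host": OLD_CLUSTER},
            "index": SOURCE_INDEX,
            "query": {
                "range": {
                    DATE_FIELD: {
                        "gte": start.isoformat(),
                        "lt": next_month(start).isoformat(),
                    }
                }
            },
        },
        "dest": {"index": DEST_PREFIX + start.strftime("%Y.%m")},
    }


# One reindex request per month; the reference date is arbitrary.
bodies = [reindex_body(start) for start in month_starts(date(2018, 8, 15))]
print(json.dumps(bodies[0], indent=2))
```

Running one request per month keeps each reindex task small and makes it easy to retry a single month if it fails.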

Cluster details
● version 6.3
● X-Pack basic
● dedicated roles
○ 1 master (why just one master?) / 6 data nodes (500GB disk / 32GB RAM / 16GB heap / 16 cores) / 1 Logstash
● 6 shards and 1 replica
● One index per month
○ index: 100GB/month
■ shard: 16GB/shard
● Keep one year of data
○ in 12 months:
■ 2.4TB data
■ 144 total shards
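
The numbers above can be checked with a few lines of arithmetic. This is only a sketch: it assumes that 100GB/month is the primary data size (before replication) and that every monthly index uses the same 6 shards and 1 replica.

```python
# Figures taken from the slide; the interpretation of 100GB/month as
# primary (pre-replica) data is an assumption.
MONTHLY_INDEX_GB = 100
PRIMARY_SHARDS = 6
REPLICAS = 1
RETENTION_MONTHS = 12
DATA_NODES = 6
DISK_PER_NODE_GB = 500

copies = 1 + REPLICAS                                   # primary + replica
shard_gb = MONTHLY_INDEX_GB / PRIMARY_SHARDS            # size of one shard
total_shards = RETENTION_MONTHS * PRIMARY_SHARDS * copies
total_data_gb = RETENTION_MONTHS * MONTHLY_INDEX_GB * copies
shards_per_node = total_shards / DATA_NODES
disk_total_gb = DATA_NODES * DISK_PER_NODE_GB

print(f"shard size:      {shard_gb:.1f} GB")
print(f"total shards:    {total_shards}")
print(f"total data:      {total_data_gb / 1000:.1f} TB")
print(f"shards per node: {shards_per_node:.0f}")
print(f"disk usage:      {total_data_gb / disk_total_gb:.0%} of {disk_total_gb / 1000:.0f} TB")
```

Under these assumptions the 16GB/shard figure is 100/6 rounded down, and one year of data fills about 80% of the raw disk on the data nodes.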

Cluster advanced details
● dynamic templates
● aliases
● Shrink API
● query performance using filters
● Curator
● shard allocation awareness
● disk watermark changed to 98% (2.94 TB)
● index refresh interval (30s)
● Logstash persistent queue
● notifications
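
A small sketch of how some of these settings might look as request bodies. The attribute name "zone" is a hypothetical choice (it assumes each node sets node.attr.zone to its availability zone), and reading "awareness" as shard allocation awareness is itself an assumption. The slide does not say which disk watermark was raised, so only the 98% arithmetic is shown for it.

```python
import json

# Disk figures from the cluster details: 6 data nodes with 500GB each.
DATA_NODES = 6
DISK_PER_NODE_GB = 500
WATERMARK_PERCENT = 98

total_disk_gb = DATA_NODES * DISK_PER_NODE_GB
watermark_gb = total_disk_gb * WATERMARK_PERCENT / 100

# Body for PUT <index>/_settings: refresh every 30s instead of the 1s default.
index_settings = {"index": {"refresh_interval": "30s"}}

# Body for PUT _cluster/settings: spread shard copies across zones.
# "zone" is a hypothetical node attribute name.
cluster_settings = {
    "persistent": {
        "cluster.routing.allocation.awareness.attributes": "zone"
    }
}

print(f"98% of {total_disk_gb} GB = {watermark_gb / 1000:.2f} TB")
print(json.dumps(index_settings))
print(json.dumps(cluster_settings))
```

A longer refresh interval trades search freshness for indexing throughput, which suits a cluster that ingests continuously.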

example - persistent queue
● When an input has events ready to process, it writes them to the queue;
● When the write to the queue is successful, the input can send an acknowledgement to its data
source;
● An event is recorded as processed if, and only if, the event has been processed completely
by the Logstash pipeline.
(Diagram labels: ACK, ok, ACK - event completed)
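
The acknowledgement flow described above can be illustrated with a toy in-memory model. This is not Logstash's implementation; the class and function names are invented for the example. It only shows the two rules: the input acknowledges the source after the queue write succeeds, and an event stays pending until the pipeline has fully processed it.

```python
class ToyPersistentQueue:
    """Toy stand-in for a persistent queue (illustration only)."""

    def __init__(self):
        self._events = {}      # sequence number -> event
        self._processed = set()
        self._next_seq = 0

    def write(self, event):
        """Store the event and return its sequence number."""
        seq = self._next_seq
        self._events[seq] = event
        self._next_seq += 1
        return seq

    def mark_processed(self, seq):
        """Record that the pipeline finished processing this event."""
        self._processed.add(seq)

    def pending(self):
        """Events written but not fully processed (replayed after a crash)."""
        return [e for s, e in sorted(self._events.items())
                if s not in self._processed]


queue = ToyPersistentQueue()
acked_to_source = []


def receive(event):
    """Input side: write to the queue first, then acknowledge the source."""
    seq = queue.write(event)
    acked_to_source.append(event)
    return seq


first = receive("doc-1")
second = receive("doc-2")
queue.mark_processed(first)   # only doc-1 made it through the pipeline

print(acked_to_source)   # ['doc-1', 'doc-2']
print(queue.pending())   # ['doc-2']
```

In this model both documents were acknowledged to the source, but doc-2 would still be replayed after a restart because it was never marked as processed.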

improvements
● disaster recovery plan
● coordinating nodes
● Kafka / Kinesis
● upgrade X-Pack license
long-term planning
● tune Logstash queue page settings
● circuit breaker control (why?)
● hot/warm architecture

Finish :D
Contact: fram.souza14@gmail.com
LinkedIn: https://www.linkedin.com/in/francismarasouza/
