ElasticSearch on Compute Engine
Best practices for Elasticsearch in GCE
12+ Year Cloud Journey with Google
Running ElasticSearch on Google Compute Engine in Production
Who am I?
Searce – Bangalore
Linkedin.com/rbhuvanesh
Twitter.com/@BhuviTheDataGuy
Medium.com/@BhuviTheDataGuy
https://TheDataGuy.in
Bhuvanesh
Database Architect
Searce
Agenda
Short intro to GCE
ElasticSearch Terms
Capacity Planning & Architecture
Best Practices for Production Grade ES Cluster
Compute Engine
Compute Engine delivers configurable virtual machines running in Google’s data centers with access to high-performance
networking infrastructure and block storage.
Live migration for VMs
Compute Engine virtual machines can live-migrate between host systems without
rebooting, which keeps your applications running even when host systems require
maintenance.
Preemptible VMs
Run batch jobs and fault-tolerant workloads on preemptible VMs to reduce your vCPU
and memory costs by up to 80% while still getting the same performance and capabilities
as regular VMs.
Sole-tenant nodes
Sole-tenant nodes are physical Compute Engine servers dedicated exclusively for your
use. Sole-tenant nodes simplify deployment for bring your own license (BYOL)
applications. Sole-tenant nodes give you access to the same machine types and VM
configuration options as regular compute instances.
What is Elasticsearch?
• First release 2010
• Open source search and analytics engine
• Elasticsearch is the central component of the Elastic Stack
• Distributed processing
• Works with all types of data (textual, numerical, geospatial, structured, and unstructured)
• Powerful REST API
• And everything is indexed
Where can we use Elasticsearch?
Use cases
• Logging and log analytics
• Infrastructure metrics and container monitoring
• Application performance monitoring
• Geospatial data analysis and visualization
• Security analytics
• Enterprise search
• Website search
• And more….
Elastic Stack
ES Terms
Master Node:
• Master Node controls the Cluster.
• Responsible for maintaining the metadata about the cluster.
• Decides where data is placed and when it is relocated.
• Multiple nodes can hold the master role.
• Elasticsearch elects only one of them as the active master at any time.
• In the event of failure, a new elastic master will be selected from the available nodes.
ES Terms
Data Node
• All of your data is stored here.
• Responsible for managing the stored data.
• Performs the operations when it is queried.
Ingest Node
• Pre-processes documents before they are indexed.
• The ingest node intercepts bulk and index requests, applies transformations, and then passes the
documents back to the index or bulk APIs.
ES Index
Design For Failure
Capacity Planning
Memory
Elasticsearch uses memory in two ways:
1. Java heap
2. Other processes
“More memory – more time on garbage collection”
CPU
Don’t choose the number of CPU cores based on arbitrary calculations.
DISK
• Standard Persistent Disk
• SSD Persistent Disk
• Local SSD
1 GB of SSD persistent disk = 30 IOPS
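The 30 IOPS per GB figure above turns disk sizing into simple arithmetic. A minimal sketch, assuming the ratio scales linearly with disk size and ignoring the per-instance IOPS limits that GCE also applies (the function names are illustrative and not part of any Google API):

```python
import math

# Ratio quoted on the slide: 1 GB of SSD persistent disk = 30 IOPS.
IOPS_PER_GB_SSD_PD = 30


def ssd_pd_iops(size_gb: int) -> int:
    """Estimated IOPS for an SSD persistent disk of the given size in GB."""
    return size_gb * IOPS_PER_GB_SSD_PD


def ssd_pd_size_for_iops(target_iops: int) -> int:
    """Smallest disk size in GB whose estimated IOPS meets the target."""
    return math.ceil(target_iops / IOPS_PER_GB_SSD_PD)


if __name__ == "__main__":
    print("IOPS for a 500 GB disk:", ssd_pd_iops(500))
    print("GB needed for 10,000 IOPS:", ssd_pd_size_for_iops(10_000))
```

Sizing by IOPS rather than by capacity often means provisioning a larger disk than the data alone would need.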
Disk Cont…
Network
From GCP Docs,
The egress traffic from a given VM instance is subject to maximum network egress throughput caps. These
caps are dependent on the number of vCPUs that the VM instance has. Each vCPU is subject to a 2 Gbps
cap for peak performance. Each additional vCPU increases the network cap, up to a theoretical maximum of
32 Gbps for each instance. The actual performance you experience will vary depending on your workload.
All caps are meant as maximum possible performance, and not sustained performance.
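The egress rule quoted above can be written as a one-line calculation. A rough sketch, assuming only the two numbers from the quote (2 Gbps per vCPU, 32 Gbps ceiling); real limits depend on machine type and may differ from this estimate:

```python
GBPS_PER_VCPU = 2      # per-vCPU cap from the quoted GCP text
MAX_EGRESS_GBPS = 32   # theoretical per-instance maximum from the quoted GCP text


def egress_cap_gbps(vcpus: int) -> int:
    """Peak egress cap in Gbps for a VM with the given vCPU count."""
    if vcpus < 1:
        raise ValueError("a VM needs at least one vCPU")
    return min(vcpus * GBPS_PER_VCPU, MAX_EGRESS_GBPS)


if __name__ == "__main__":
    for vcpus in (2, 4, 8, 16, 32):
        print(f"{vcpus} vCPUs: up to {egress_cap_gbps(vcpus)} Gbps")
```

Under these numbers the cap stops growing at 16 vCPUs, which matters when choosing data node sizes for shard recovery and replication traffic.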
How to identify the right VM size?
1. Simulate your workload and do the load test.
2. Or use Rally (https://github.com/elastic/rally)
Swapping
• Memory-based operations are very fast, but we can’t give the server unlimited memory.
• The OS will swap out application memory that is not in active use.
• That’s bad for performance.
Prevent Swapping
1. From the OS level (temporarily) - sudo swapoff -a
2. Configure swappiness in the kernel - vm.swappiness=1
3. Enable memory locking in elasticsearch.yml - bootstrap.memory_lock: true
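After applying the steps above, it is worth confirming that no swap device is still active. A minimal sketch, assuming a Linux host where the kernel lists active swap areas in /proc/swaps (one header line, then one line per device); the helper name is illustrative:

```python
from pathlib import Path
from typing import List


def active_swap_devices(proc_swaps_text: str) -> List[str]:
    """Return the swap devices listed in the contents of /proc/swaps."""
    lines = proc_swaps_text.strip().splitlines()
    # The first line is the column header; every other line is one device.
    return [line.split()[0] for line in lines[1:] if line.strip()]


if __name__ == "__main__":
    swaps = Path("/proc/swaps")
    if swaps.exists():
        devices = active_swap_devices(swaps.read_text())
        print("active swap devices:", devices if devices else "none")
    else:
        print("/proc/swaps not found; this check only applies to Linux")
```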
JVM Heap
• By default, Elasticsearch tells the JVM to use a heap with a minimum and maximum size of 1 GB.
• When moving to production, it is important to configure heap size to ensure that Elasticsearch has
enough heap available.
• Set the heap size to no more than 50% of total memory.
“The more heap available to Elasticsearch, the more memory it can use for its
internal caches, larger heaps can cause longer garbage collection pauses” –
From Elastic
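The 50% rule can be turned into a small helper that produces the heap flags. A sketch under two assumptions that are not on the slide: the heap is also kept at or below 31 GB (a commonly cited ceiling for compressed object pointers), and minimum and maximum heap are set to the same value. The function names are illustrative:

```python
from typing import List

# Assumption, not from the slides: stay under the ~32 GB compressed-oops threshold.
COMPRESSED_OOPS_CEILING_GB = 31


def recommended_heap_gb(total_memory_gb: float) -> int:
    """Heap size in whole GB: at most half of RAM, capped at the ceiling above."""
    half = int(total_memory_gb // 2)
    return max(1, min(half, COMPRESSED_OOPS_CEILING_GB))


def jvm_heap_flags(heap_gb: int) -> List[str]:
    """JVM flags that set minimum and maximum heap to the same size."""
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


if __name__ == "__main__":
    for ram_gb in (8, 16, 64, 128):
        heap = recommended_heap_gb(ram_gb)
        print(f"{ram_gb} GB RAM -> {' '.join(jvm_heap_flags(heap))}")
```

The memory left outside the heap is not wasted: the OS uses it for the filesystem cache, which Lucene relies on heavily.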
ulimit
ulimit controls per-process resource limits, including the maximum number of open file descriptors.
vi /etc/security/limits.conf
elasticsearch - nofile 65535
--For Ubuntu
vi /etc/pam.d/su
session required pam_limits.so
--For systemd
vi /usr/lib/systemd/system/elasticsearch.service
LimitMEMLOCK=infinity
sudo systemctl daemon-reload
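To verify that the limit took effect, the current file descriptor limit can be read from Python's standard `resource` module. A minimal sketch, assuming a Unix host; note that it reports the limit of the process running the script, so it should be run as the elasticsearch user (or adapted) to be meaningful:

```python
import resource

# Value configured in limits.conf above.
REQUIRED_NOFILE = 65535


def nofile_ok(soft_limit: int, required: int = REQUIRED_NOFILE) -> bool:
    """True if the soft open-files limit is unlimited or at least `required`."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= required


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"nofile soft={soft} hard={hard} ok={nofile_ok(soft)}")
```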
MMAP
Elasticsearch uses an mmapfs directory by default to store its indices.
sysctl -w vm.max_map_count=262144
To keep the setting across reboots, add it to /etc/sysctl.conf:
vm.max_map_count = 262144
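A quick way to audit a node is to parse its sysctl configuration and check the value. A minimal sketch, assuming the usual `key = value` format with `#` or `;` comment lines; the helper names are illustrative:

```python
from typing import Dict

# Value used in the sysctl command above.
REQUIRED_MAX_MAP_COUNT = 262144


def parse_sysctl_conf(text: str) -> Dict[str, str]:
    """Parse sysctl.conf-style text into a dict, skipping comments and blanks."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def max_map_count_ok(text: str, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """True if vm.max_map_count is present and at least `required`."""
    value = parse_sysctl_conf(text).get("vm.max_map_count")
    return value is not None and value.isdigit() and int(value) >= required


if __name__ == "__main__":
    sample = "# Elasticsearch\nvm.max_map_count = 262144\n"
    print("max_map_count ok:", max_map_count_ok(sample))
```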
Some common questions while setting up the Elasticsearch cluster
CPU Platform
Operating System & File System
• Windows
• Debian
• Ubuntu
• CentOS
• RedHat
• Windows - NTFS
• Linux – ext4 (for less than 1 TB of data), XFS (for more than 1 TB of data)
Some parameters for a generic workload (setting names vary by Elasticsearch version; verify them against your release)
indices.memory.index_buffer_size: 40%
indices.query.cache.enabled: false
thread_pool.bulk.queue_size: 3000
thread_pool.index.queue_size: 3000
store.throttle.type: 'none'
index.refresh_interval: "1m"
SSD vs Local SSD
Persistent SSD Local SSD
Local SSD
• Maximum size of one Local SSD disk = 375 GB
• You can attach up to 8 Local SSDs per instance (3 TB)
• You can’t reboot/stop the VM
• In case of maintenance – replace the node
How many nodes
• Master – 3 nodes
• Ingest – 2 nodes
• Data – 2-3 nodes (for a fresh setup)
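The choice of three master nodes follows from majority voting: a cluster can elect a master only while more than half of the master-eligible nodes are reachable. A minimal sketch of that arithmetic, assuming a simple majority quorum (the function names are illustrative):

```python
def master_quorum(master_eligible_nodes: int) -> int:
    """Smallest number of master-eligible nodes that forms a majority."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def tolerated_master_failures(master_eligible_nodes: int) -> int:
    """How many master-eligible nodes can fail while a majority remains."""
    return master_eligible_nodes - master_quorum(master_eligible_nodes)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} masters: quorum={master_quorum(n)}, "
              f"tolerated failures={tolerated_master_failures(n)}")
```

Two master nodes tolerate no failures at all, which is why three is the smallest count that survives the loss of one node.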
Rally for the benchmark tests
What is Rally?
You want to benchmark Elasticsearch? Then Rally is for you. It can help you with the following tasks:
• Setup and teardown of an Elasticsearch cluster for benchmarking
• Management of benchmark data and specifications even across Elasticsearch versions
• Running benchmarks and recording results
• Finding performance problems by attaching so-called telemetry devices
• Comparing performance results
pip3 install esrally
How to run esrally
esrally --track=nyc_taxis \
  --target-hosts=10.20.4.157:9200 \
  --pipeline=benchmark-only \
  --challenge=append-no-conflicts-index-only \
  --on-error=continue \
  --report-format=markdown \
  --report-file=/opt/report.md
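When the same benchmark is repeated against several clusters, it can help to build the command from parameters instead of editing it by hand. A minimal sketch that uses only the flags shown above; the function name is illustrative, and recent Rally releases may expect a `race` subcommand after `esrally`, so check the help output of your installed version:

```python
import shlex
from typing import List


def build_esrally_command(track: str, target_hosts: List[str],
                          challenge: str, report_file: str) -> List[str]:
    """Assemble the esrally argument list for a benchmark-only run."""
    return [
        "esrally",
        f"--track={track}",
        f"--target-hosts={','.join(target_hosts)}",  # comma-separated host:port list
        "--pipeline=benchmark-only",                 # benchmark an existing cluster
        f"--challenge={challenge}",
        "--on-error=continue",
        "--report-format=markdown",
        f"--report-file={report_file}",
    ]


if __name__ == "__main__":
    cmd = build_esrally_command(
        track="nyc_taxis",
        target_hosts=["10.20.4.157:9200"],
        challenge="append-no-conflicts-index-only",
        report_file="/opt/report.md",
    )
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```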
Esrally cont…
Some use cases with ES
Use cases cont…
Thank You
