Stop Exhausting Yourself in Operating Multiple Elasticsearch Clusters
Hokuto Kagaya
Development Center 2
Game Platform Service Development Office
PION C Team
WHAT'S PION?
In-game Community / Marketing Platform
WHAT’S Elasticsearch?
• As a time series DB
• As a search engine
• As a log store
WE HAVE MULTIPLE CLUSTERS FOR..
Purpose: Logging, Event Processing, Service
Environment: Development, Real, Sandbox
MULTIPLE CLUSTERS WILL CAUSE..
For example..
“Which clusters did I install which plugins on?”
Basically our clusters are provisioned by Ansible, BUT…
Someone: “Hey, let’s try the XXX plugin on node YYY of cluster ZZZ in the DEV environment!”
They forgot to record XXX, YYY, ZZZ…
It easily descends into chaos!
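One way to recover a lost record is to ask each node what it actually has installed. The following is a minimal sketch in Java, assuming the plain-text rows returned by the `_cat/plugins` API (node name, plugin component, version, separated by whitespace). The class name `PluginInventory` and the sample node and plugin names are hypothetical, and fetching the text over HTTP is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PluginInventory {
    /** Groups "_cat/plugins"-style rows (node, component, version, ...) by node name. */
    static Map<String, List<String>> pluginsByNode(String catPluginsOutput) {
        Map<String, List<String>> byNode = new TreeMap<>();
        for (String line : catPluginsOutput.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2) continue; // skip blank or malformed rows
            byNode.computeIfAbsent(cols[0], k -> new ArrayList<>()).add(cols[1]);
        }
        return byNode;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows; real input would be the response body of GET /_cat/plugins.
        String sample = "node-1 analysis-kuromoji 6.2.4\n"
                      + "node-1 analysis-icu 6.2.4\n"
                      + "node-2 analysis-icu 6.2.4\n";
        System.out.println(pluginsByNode(sample));
    }
}
```

Running the same grouping against every cluster and comparing the results would show which node carries a plugin that the Ansible playbooks do not know about.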
WE NEED A MANAGEMENT TOOL!
EXISTING TOOLS
• ElasticHQ (OSS)
• Kibana (by Elastic)
• cerebro (OSS)
Most of these are built for a single cluster.
One of ElasticHQ's strengths is that it can support multiple clusters. OK, let’s use this!
However, its main purpose is also deep management of a single cluster, not browsing a cluster list.
ANOTHER PROBLEM WITH Elasticsearch
It is not easy to:
• monitor an Elasticsearch cluster
• alert us properly to an abnormal status based on the monitoring results
Kibana and many OSS tools are very nice, but:
• Some detailed metrics (like 95th-percentile latency) cannot be retrieved directly
• We cannot see the metrics when Elasticsearch is under too heavy a load
COMPARISON
Kibana
• Multiple clusters? partial support (cross cluster search, dedicated separate cluster for monitoring)
• Monitoring? partial support (server side metrics)
• Alerting? ✔ (with Watcher)
ElasticHQ
• Multiple clusters? ✔ (not for browsing)
• Monitoring? partial support (server side metrics)
• Alerting? ✘
What we need
• Multiple clusters? ✔ (w/ high browsability)
• Monitoring? ✔
• Alerting? ✔
OK, let’s make it ourselves!
RUBBER BAND - TOOLKIT FOR ES MANAGEMENT
Screenshots
Rubber Band UI, Health Watcher, Client - architecture
TWO OPTIONS FOR MONITORING
Monitor clusters’ states directly
• /_cat/***
• /_cluster/health
• /_nodes/***
Monitor client-side metrics
• can compute detailed metrics
• can access even when a cluster is highly loaded (via our tool)
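To make "detailed metrics" concrete: a client that records the latency of each request can derive a 95th percentile on its own, without asking the cluster. This is a minimal sketch in Java using the nearest-rank definition. `LatencyPercentile` and the sample values are hypothetical, and a real client would more likely delegate this to a metrics library than keep raw samples.

```java
import java.util.Arrays;

final class LatencyPercentile {
    /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
    static long percentile(long[] latenciesMillis, double p) {
        if (latenciesMillis.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = latenciesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical request latencies in milliseconds, with one slow outlier.
        long[] samples = {12, 15, 11, 240, 14, 13, 16, 12, 18, 17};
        System.out.println("p95 = " + percentile(samples, 95) + " ms");
    }
}
```

Because the samples are taken in the application, the figure stays available even when the cluster is too loaded to answer its own stats APIs.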
HOW TO ALERT ON A CLUSTER STATUS?
Watcher, which is available with the X-Pack Gold license, can also be used to check the cluster health out of the box!
{
  "trigger" : {
    "schedule" : { "interval" : "10s" }
  },
  "input" : {
    "http" : {
      "request" : {
        "host" : "localhost",
        "port" : 9200,
        "path" : "/_cluster/health"
      }
    }
  }
}
It uses the cluster health API!
We can also use that API ourselves :)
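Checking the cluster health API ourselves can be sketched roughly as follows. This is not the Rubber Band Health Watcher itself, only an illustration in Java 11+. It assumes an unauthenticated cluster reachable over HTTP, extracts the `status` field of the `/_cluster/health` response with a regular expression rather than a JSON library, and treats anything other than `green` as worth alerting on. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ClusterHealthCheck {
    private static final Pattern STATUS = Pattern.compile("\"status\"\\s*:\\s*\"(\\w+)\"");

    /** Fetches the raw JSON body of GET {baseUrl}/_cluster/health. */
    static String fetchHealth(String baseUrl) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(3))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/_cluster/health"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Returns the "status" value (green, yellow or red), or "unknown" if it is absent. */
    static String parseStatus(String healthJson) {
        Matcher m = STATUS.matcher(healthJson);
        return m.find() ? m.group(1) : "unknown";
    }

    /** Policy assumed for this sketch: alert on anything that is not green. */
    static boolean shouldAlert(String status) {
        return !"green".equals(status);
    }

    public static void main(String[] args) {
        // Sample body; in practice it would come from fetchHealth("http://localhost:9200").
        String body = "{\"cluster_name\":\"dev\",\"status\":\"yellow\",\"number_of_nodes\":3}";
        String status = parseStatus(body);
        System.out.println(status + " -> alert=" + shouldAlert(status));
    }
}
```

Run on a schedule from a process outside the cluster, a check like this can also treat a timed-out or failed request as a signal in its own right.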
EXAMPLES OF ALERTS FROM HEALTH WATCHER
MILESTONE
PHASE 1
• Rubber Band UI
• Rubber Band Health Watcher
• Rubber Band Client (Simple REST client wrapper)
PHASE 2
• Rubber Band Curator (Centralized wrapper of curator)
• Open to the other internal teams
PHASE 3
• Publish it as OSS
KEY TAKEAWAYS
How can we manage multiple clusters without any chaos?
• Our toolkit: Rubber Band
• A simple UI with information aggregation and appropriate delegation
How can we do proper monitoring and alerting?
• Use both direct server states and client metrics
• Implement a simple health-check server ourselves
And..
WE ARE HIRING!
THANK YOU
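Wrap the official HighLevelRESTClient
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.stereotype.Component;

@Component
public class ElasticsearchClientWrapper {
    private final RestHighLevelClient elasticsearchClient;
    private final MeterRegistry meterRegistry;

    public ElasticsearchClientWrapper(RestHighLevelClient elasticsearchClient,
                                      MeterRegistry meterRegistry) {
        this.elasticsearchClient = elasticsearchClient;
        this.meterRegistry = meterRegistry;
    }

    public void searchAndGetAggregationAsync(SearchRequest searchRequest) {
        // Start timing on the client side, before the request leaves the application.
        Timer.Sample sample = Timer.start(meterRegistry);
        // Newer client versions expect a RequestOptions argument (e.g. RequestOptions.DEFAULT) before the listener.
        elasticsearchClient.searchAsync(searchRequest, new ActionListener<SearchResponse>() {
            @Override
            public void onResponse(SearchResponse searchResponse) {
                // Micrometer tags are key/value pairs; the key "result" is an illustrative choice.
                sample.stop(meterRegistry.timer("metrics.timer", "result", "success"));
                // do stuff..
            }

            @Override
            public void onFailure(Exception e) {
                sample.stop(meterRegistry.timer("metrics.timer", "result", "failure"));
                // do fallback..
            }
        });
    }
}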
See also: "Elasticsearch を検索エンジンとして利用する際のポイント" (Key points when using Elasticsearch as a search engine)
https://engineering.linecorp.com/ja/blog/detail/99
