ES Hero @ Instacart
Nick Elser
Elasticsearch
• Denormalized product catalog
• Relatively dynamic queries + fast lookups
• Multi-tenant
• Hosted via Elastic Cloud
• 18 clusters under management
• Primary catalog is > 1.5TB (RAM), 3 primary clusters
Multi-tenant datastore
Multiple teams leveraging Elasticsearch
Each with their own access pattern
The Biggest Problem
Lack of visibility
You can’t fix things if you
don’t know what’s broken
Existing Monitoring
• Slow Query Log
• 3rd Party APM
• Error reporting
• Marvel
• Raw logs
Problems
1. Which specific queries are causing problems?
2. Which API endpoints are causing problems?
3. How can we prioritize important calls?
These added up to
huge problems
Repeated outages during peak hours
ESHero to the rescue!
Client-side instrumentation, aggregated
How do we anonymize
and group like queries?
Remove all variables
in queries
InstacartSearch::QueryAnonymizer
Example fingerprint: 09b62d742049466249e782bacf5bb48518179828614928a3fc503160c694e091
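The deck does not show how `InstacartSearch::QueryAnonymizer` works internally. The following is a minimal Python sketch of the idea, assuming queries are Elasticsearch JSON bodies. Every leaf value is replaced with a placeholder, repeated list entries are collapsed, and the result is hashed in canonical form. The choice of SHA-256 is an assumption, made only because the hash shown on the slide is 64 hex characters long. `anonymize` and `fingerprint` are hypothetical names.

```python
import hashlib
import json
from typing import Any

PLACEHOLDER = "?"


def anonymize(node: Any) -> Any:
    """Replace every leaf value with a placeholder, keeping keys and nesting."""
    if isinstance(node, dict):
        return {key: anonymize(value) for key, value in node.items()}
    if isinstance(node, list):
        # Collapse repeated shapes so [1, 2, 3] and [9] both become ["?"].
        shapes: list = []
        for item in node:
            shape = anonymize(item)
            if shape not in shapes:
                shapes.append(shape)
        return shapes
    return PLACEHOLDER  # numbers, strings, booleans and null all become "?"


def fingerprint(query: dict) -> str:
    """SHA-256 hex digest of the anonymized query in canonical JSON form."""
    canonical = json.dumps(anonymize(query), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this approach, two queries that differ only in IDs or search terms share a fingerprint, and that is what makes grouping by fingerprint possible. Field names are kept, so a query that builds field names dynamically would still produce distinct fingerprints.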
Collected Attributes
• caller location
• cluster name
• user-level data
• query owner
• execution time
• status code
• query fingerprint
• request UUID
• controller#action
• application
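One way to picture a single collected event is as a flat record with one field per attribute listed above. This Python sketch is illustrative only: the field names are hypothetical, `user_id` stands in for the unspecified "user-level data", and the deck does not say which serialization format is used. JSON is assumed here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class QueryEvent:
    """One instrumented Elasticsearch call. Field names are hypothetical."""

    caller_location: str       # file:line that issued the query
    cluster_name: str
    user_id: Optional[int]     # stand-in for "user-level data"
    owner: str                 # team that owns the query
    execution_time_ms: float
    status_code: int
    query_fingerprint: str
    request_uuid: str
    controller_action: str     # Rails-style "controller#action"
    application: str

    def to_json(self) -> str:
        """Serialize to a flat JSON object, e.g. for an event stream."""
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping the record flat makes every attribute a candidate dimension for group-by queries downstream.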
Collecting all this data
[Diagram: app, HTTP middleware + query anonymizer, Production ES, Eventer, Kinesis, Druid, ESHero]
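The diagram places an HTTP middleware with the query anonymizer between the app and Production ES, with Eventer, Kinesis and Druid also shown. The following Python sketch shows client-side instrumentation of that kind under several assumptions. `send` stands in for whatever performs the HTTP call. `emit` stands in for the event pipeline; in production it might hand records to a stream such as Kinesis, but here it is just any callable. `fingerprint` is injected. All three, and the class name, are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple

Query = Dict[str, Any]
Send = Callable[[Query], Tuple[int, Dict[str, Any]]]  # returns (HTTP status, response body)
Emit = Callable[[Dict[str, Any]], None]


class InstrumentedClient:
    """Hypothetical client-side middleware: times each call and emits one event per call."""

    def __init__(self, send: Send, emit: Emit, fingerprint: Callable[[Query], str],
                 cluster_name: str, application: str) -> None:
        self._send = send
        self._emit = emit
        self._fingerprint = fingerprint
        self._cluster_name = cluster_name
        self._application = application

    def search(self, query: Query, request_uuid: str, controller_action: str) -> Dict[str, Any]:
        start = time.perf_counter()
        status = 0  # stays 0 if the transport raises before any HTTP status arrives
        try:
            status, body = self._send(query)
            return body
        finally:
            # Runs on success and on failure, so errors are recorded too.
            try:
                self._emit({
                    "cluster_name": self._cluster_name,
                    "application": self._application,
                    "request_uuid": request_uuid,
                    "controller_action": controller_action,
                    "query_fingerprint": self._fingerprint(query),
                    "status_code": status,
                    "execution_time_ms": (time.perf_counter() - start) * 1000.0,
                })
            except Exception:
                pass  # telemetry must never break the production request
```

Because the instrumentation is in the client, each event can carry application context (request UUID, controller#action) that the Elasticsearch server never sees.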
Back to the production
problems
1. Which specific queries are causing problems?
2. Which API endpoints are causing problems?
3. How can we prioritize important calls?
A few bad queries can
really hurt the cluster
Grouped by fingerprint!
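In the deck this aggregation runs in Druid and is viewed in ESHero. The following in-memory Python version is only a stand-in to show the grouping logic. Ranking by total execution time is an assumption: it surfaces both a cheap query that runs constantly and an expensive one that runs rarely. `top_fingerprints` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def top_fingerprints(events: Iterable[Dict[str, Any]], limit: int = 10) -> List[Tuple[str, int, float]]:
    """Return (fingerprint, call_count, total_ms), heaviest total time first."""
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        fp = event["query_fingerprint"]
        counts[fp] += 1
        totals[fp] += event["execution_time_ms"]
    ranked = sorted(totals, key=totals.__getitem__, reverse=True)
    return [(fp, counts[fp], totals[fp]) for fp in ranked[:limit]]
```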
1. Which specific queries are causing problems?
2. Which API endpoints are causing problems?
3. How can we prioritize important calls?
Group by API request
UUID!
A single page load can
result in hundreds of
queries (N+1)
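Grouping by request UUID makes N+1 patterns visible as the same fingerprint repeated many times inside one request. The following Python sketch flags such fingerprints per (request UUID, controller#action) pair, which also points at the API endpoint responsible. The function name and the default threshold of 20 are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, Tuple

RequestKey = Tuple[str, str]  # (request_uuid, controller_action)


def n_plus_one_suspects(events: Iterable[Dict[str, Any]],
                        threshold: int = 20) -> Dict[RequestKey, Dict[str, int]]:
    """Fingerprints repeated at least `threshold` times within a single request."""
    per_request: Dict[RequestKey, Counter] = defaultdict(Counter)
    for event in events:
        key = (event["request_uuid"], event["controller_action"])
        per_request[key][event["query_fingerprint"]] += 1
    suspects: Dict[RequestKey, Dict[str, int]] = {}
    for key, counter in per_request.items():
        repeated = {fp: n for fp, n in counter.items() if n >= threshold}
        if repeated:
            suspects[key] = repeated
    return suspects
```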
1. Which specific queries are causing problems?
2. Which applications are causing problems?
3. How can we prioritize important work?
Arbitrary retries are
extremely dangerous
Extremely fast queries that
are likely to succeed are
safe to retry
*Manually whitelisted, for now
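Blind retries multiply load on a cluster that is already struggling, so the deck limits retries to a manually whitelisted set of fast, likely-to-succeed queries. The following Python sketch shows that policy. It keys the whitelist by query fingerprint, which is an assumption, as are `TransientError`, `call_with_retry` and the default of two attempts.

```python
from typing import Callable, FrozenSet, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as a timeout."""


def call_with_retry(fingerprint: str, call: Callable[[], T],
                    whitelist: FrozenSet[str], max_attempts: int = 2) -> T:
    """Retry only whitelisted fingerprints; everything else gets exactly one attempt."""
    if fingerprint not in whitelist:
        return call()
    attempt = 1
    while True:
        try:
            return call()
        except TransientError:
            if attempt >= max_attempts:
                raise  # give up and surface the last failure
            attempt += 1
```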
What’s next for
ESHero?
• Open source
• Bindings for other languages
• Proxy support?
Your feedback is really helpful!
We are hiring! 😀
Nick Elser
github.com/nickelser
Original slides by Jon Phillips
twitter.com/elguapo1611
Thanks to John Meagher for feedback + content.
Thank You!
Questions?
