Eventual Consistency @WalmartLabs with Kafka, SolrCloud and Hadoop
Ayon Sinha
asinha@walmartlabs.com
Introductions
• @WalmartLabs – Building Walmart Global eCommerce from the ground up
• Data Foundation Team – Build, manage and provide tools for all OLTP operations
Large Scale eCommerce problems
• Our customers love to shop online 24x7 and we love them for that
• Reads are many orders of magnitude more than writes, and reads have to be blazing fast (every millisecond has a monetary value attached to it, according to some studies)
• Scaling up only takes you so far; you have to scale out
• Low-latency analytics absolutely canNOT be on OLTP data stores
• No full table scans
• Too many RDBMS column indexes lead to slow writes
Data Foundation Architecture
(architecture diagram)
Very large scale and always available means..
• There is really NO way around Brewer’s CAP theorem
Source: http://blog.mccrory.me/2010/11/03/cap-theorem-and-the-clouds/
• Embrace “eventual” consistency and asynchrony
• Clearly articulate “eventual” to business stakeholders. Computer “eventual” and human “eventual” are on entirely different scales.
EC Use cases
Typical data flow into EC data stores
(diagram: Web Service Clients call the IC and EC Web Services through an Orchestrator Service backed by resource tiers; writes flow into Kafka; an event-driven updater and Kafka consumers for Solr and Hadoop feed SolrCloud and Hadoop; a batch layer processes data on Hadoop and loads it into the faster serving datastore, with “fire job and pull results” semantics; reads against the EC stores are 70-80% of total load)
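The flow above can be sketched end to end. This is a minimal in-memory sketch, not WalmartLabs code: `events`, `solr_index`, and `hadoop_log` are illustrative stand-ins for the Kafka topic, the SolrCloud collection, and the Hadoop ingest path.

```python
import json
import queue

# In-memory stand-ins (illustrative names, not real infrastructure):
events = queue.Queue()   # plays the role of a Kafka topic
solr_index = {}          # plays the role of the SolrCloud collection
hadoop_log = []          # plays the role of the Hadoop ingest path

def publish(event: dict) -> None:
    """Write path: services publish change events instead of dual-writing."""
    events.put(json.dumps(event))

def drain_updater() -> int:
    """Event-driven updater: consume events and apply them to both sinks."""
    applied = 0
    while not events.empty():
        event = json.loads(events.get())
        solr_index[event["id"]] = event   # near-real-time serving store
        hadoop_log.append(event)          # batch/analytics store
        applied += 1
    return applied

publish({"id": "item-1", "price": 9.99})
publish({"id": "item-1", "price": 8.99})  # later update wins in the serving store
applied = drain_updater()
```

Both sinks see every event, but readers of `solr_index` only ever see the latest state per id, which is exactly the eventual-consistency contract the serving tier offers.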
Challenges
• Messaging System: Kafka was already being used and supported by our Big Fast Data team
• Virtualization
– Sharing CPU and memory among compute tenants is generally bad for search engine infrastructure. If your use case takes off, you will eventually move to dedicated hardware.
– We started with big dedicated bare-metal hardware
– Virtualization requires complete lifecycle management
• Serialization format
– Our choice: Avro (schema + data)
• Hierarchical object to flat
– If you are familiar with Elasticsearch, you’d say “No problem.. maybe”
– If you are already using HBase, Cassandra, or similar, you’d say “No problem.. maybe”
– For Solr people, let’s talk about schema.xml and plugin-based flattening
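The hierarchical-to-flat step is conceptually simple: walk the nested record and emit dotted field names. A minimal sketch (the dotted-name convention is one common choice, not necessarily the one used in the talk's plugin):

```python
def flatten(obj: dict, prefix: str = "", sep: str = ".") -> dict:
    """Recursively flatten a nested dict into dotted, Solr-friendly field names."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

# A hierarchical record as it might arrive from an Avro-serialized event:
doc = {"item": {"id": 42, "price": {"amount": 9.99, "currency": "USD"}}}
flat = flatten(doc)
# flat == {"item.id": 42, "item.price.amount": 9.99, "item.price.currency": "USD"}
```

The hard cases this sketch omits (arrays of nested objects, field-name collisions) are precisely why the flattening lives in a plugin rather than in every client.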
SolrCloud 101
• Solr is the web app wrapper on Lucene
• SolrCloud is the distributed search mode, where a bunch of Solr nodes coordinate using ZooKeeper
Source: SolrCloud Wiki
Solr schema.xml choices
• Let each team build their own schema.xml from scratch
– This would require each customer team to intimately learn search engines, Solr, etc.
– This would also mean that each time there is a change in schema.xml, everything must be re-indexed.
• Leverage Solr’s dynamic fields and create a naming convention
– This gives the customer a kick-start
– schema.xml doesn’t need to change often and can be used mostly unchanged from team to team
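The dynamic-field convention can be sketched as a type-to-suffix mapping, modeled on the stock dynamic fields Solr ships with (e.g. `<dynamicField name="*_i" type="int" .../>` in schema.xml). The exact suffixes here follow Solr's defaults; the `to_dynamic_fields` helper is illustrative, not an API from the talk:

```python
from datetime import datetime

# Suffix convention modeled on Solr's stock dynamic fields.
SUFFIX_BY_TYPE = {bool: "_b", int: "_i", float: "_d", str: "_s", datetime: "_dt"}

def to_dynamic_fields(doc: dict) -> dict:
    """Rename each field with a type suffix so one schema.xml serves every team."""
    out = {}
    for name, value in doc.items():
        suffix = SUFFIX_BY_TYPE[type(value)]  # KeyError = unsupported type
        out[name + suffix] = value
    return out

fields = to_dynamic_fields({"sku": "ABC123", "qty": 7, "price": 8.99})
# fields == {"sku_s": "ABC123", "qty_i": 7, "price_d": 8.99}
```

Because `*_s`, `*_i`, etc. are already declared as dynamic fields, a team can add new fields at will without ever touching schema.xml or re-indexing.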
Best possible (unrealistic) scenario 
• No writes 
• No scoring, sorting, faceting 
• 100% document cache hit ratio 
• 99.6% of 192GB physical memory usage 
• 2000+ select/sec 
• 0.3 ms/query 
We even got.. 
Initial Solr Settings 
Getting Worse.. 
• Hundreds of ms/query with close to 100% Doc cache hit ratio 
Most common causes of slowdowns 
• GC pauses. Cure: trial-and-error with help from experts 
More naïve mistakes..
• ZooKeeper on the same machine as Solr
– We did not experience this, as we knew this going in
• Frequent commits (in our case DB-style: 1 doc/update + commit)
– DON’T commit after every update. A Solr commit is very different from a DBMS commit: it opens up a new searcher and warms it up in the background. The “Too many on-deck searchers” warning is a telltale sign
– Batch as many docs as your application can tolerate into a single update post
– We chose to batch docs for 1 sec
• IO contention (log level too high)
– Easy fix
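The 1-second batching can be sketched as a small buffer in front of the indexer. This is an illustrative sketch with an injectable clock for testing, not the production updater; `flush_fn` stands in for whatever posts a batch to Solr's `/update` handler:

```python
import time

class BatchingIndexer:
    """Buffer docs and flush at most once per window instead of per update."""

    def __init__(self, flush_fn, window_secs=1.0, clock=time.monotonic):
        self.flush_fn = flush_fn        # e.g. posts one batch to /update
        self.window_secs = window_secs
        self.clock = clock              # injectable for testing
        self.buffer = []
        self.last_flush = clock()

    def add(self, doc):
        self.buffer.append(doc)
        if self.clock() - self.last_flush >= self.window_secs:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)  # one update post instead of many
            self.buffer = []
        self.last_flush = self.clock()

batches = []
now = [0.0]  # fake clock so the example is deterministic
idx = BatchingIndexer(batches.append, window_secs=1.0, clock=lambda: now[0])
idx.add({"id": 1})
idx.add({"id": 2})   # still within the window: buffered, no flush
now[0] = 1.5
idx.add({"id": 3})   # window elapsed: all three go out as one batch
```

A real version would also flush on a size cap and on shutdown, but the core idea is the same: Solr sees one update post per window, not one commit per doc.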
ZooKeeper
• Prefer an odd number of nodes for the ensemble, as quorum is N/2 + 1
• More nodes are not necessarily better
– 3 nodes is too low, as you can handle only 1 failure
– 5 nodes is a good balance between HA and write speed. More nodes mean slower writes and slower quorums.
– We had to go with 9: 3 nodes in each of 3 clouds, which protects us from a complete outage in one cloud.
• Pay close attention to ZooKeeper availability, as SolrCloud will only function for a little while after ZK is dead
• CloudSolrServer (the SolrJ client) relies completely on ZooKeeper for talking to SolrCloud
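The ensemble-sizing arithmetic above is worth making explicit. A quick sketch of the majority-quorum math:

```python
def quorum(n: int) -> int:
    """Majority quorum size for an n-node ZooKeeper ensemble: floor(n/2) + 1."""
    return n // 2 + 1

def failures_tolerated(n: int) -> int:
    """Nodes that can fail while the ensemble still reaches quorum."""
    return n - quorum(n)

# Even ensembles buy nothing: 4 nodes tolerate no more failures than 3.
tolerance = {n: failures_tolerated(n) for n in (3, 4, 5, 9)}
# tolerance == {3: 1, 4: 1, 5: 2, 9: 4}
```

The 9-node layout works because losing one cloud costs 3 nodes, and 9 - 3 = 6 survivors still exceed the quorum of 5.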
How do you do Disaster Recovery?
• SolrCloud is a CP model (CAP theorem)
• You should not add a replica from another data center; every write will get excruciatingly slow
• Use Kafka or another messaging system to send data cross-DC
• Get used to cross-DC eventual consistency. Monitor for tolerance thresholds
Metrics Monitoring
• We poll metrics from MBeans and push them to Graphite servers
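The MBean-to-Graphite push boils down to formatting each polled value as a line of Graphite's plaintext protocol (`path value timestamp`) and writing the lines to the Graphite server (normally over TCP, port 2003 by default). A sketch of the formatting step; the metric names and values are hypothetical examples, not Solr's actual MBean names:

```python
import time

def graphite_lines(metrics: dict, prefix: str, now=None) -> str:
    """Format polled MBean values as Graphite plaintext: 'path value timestamp'."""
    ts = int(now if now is not None else time.time())
    return "".join(f"{prefix}.{name} {value} {ts}\n"
                   for name, value in sorted(metrics.items()))

# Hypothetical values as they might be read from a node's JMX MBeans:
payload = graphite_lines(
    {"queryAvgTimePerRequest": 12.4, "docCacheHitRatio": 0.98},
    prefix="solr.shard1",
    now=1400000000)
# Each line in `payload` is ready to send to the Graphite server.
```

Sorting the names keeps the payload deterministic, which makes the poller easy to test; the actual poll loop just repeats this once per interval per node.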
Real Query Performance 
Real Update Performance 
Real Customer Results 
Q&A
