CONFIDENTIAL – Do Not Distribute
Retail Core Technology
Storm in Retail Context
Catalog data processing using Kafka, Storm & Micro-services
Karthik Deivasigamani
@WalmartLabs
Retail Brick & Mortar
Product Catalog
• Normalization
• Taxonomy
• Product Matching
• Shelving
• Attributes
• Grouping
Product Catalog
Normalization
• Attribute Normalization
– clothing_size, clothing_size_type, shoe_size, rug_size, shirt_size, baby_clothing_size, ring_size, bed_size, pet_size, pant_size, sock_size, eyewear_frame_size, serving_size, table_size, waist_size, … => size
• Value Normalization
– e.l.f. cosmetics, e.l.f. Cosmetics, e.l.f, elf cosmetics, E.L.F. cosmetics, ELF Cosmetics => elf Cosmetics
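The two normalization steps above could be sketched as lookup-based mappings. This is a minimal illustration, not the catalog's actual implementation: the alias tables, the function names `normalize_attribute`, `fold`, and `normalize_value`, and the folding rule (lowercase, drop punctuation and spaces) are all assumptions.

```python
import re

# Hypothetical alias table: source-specific attribute names -> canonical name.
ATTRIBUTE_ALIASES = {
    "clothing_size": "size",
    "shoe_size": "size",
    "ring_size": "size",
    "waist_size": "size",
}

# Hypothetical canonical values, keyed by a folded form of the raw value.
CANONICAL_VALUES = {
    "elfcosmetics": "elf Cosmetics",
    "elf": "elf Cosmetics",
}


def normalize_attribute(name: str) -> str:
    """Map a source attribute name to its canonical name (unchanged if unknown)."""
    return ATTRIBUTE_ALIASES.get(name.strip().lower(), name)


def fold(value: str) -> str:
    """Lowercase the value and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def normalize_value(value: str) -> str:
    """Map a raw value to its canonical spelling (unchanged if unknown)."""
    return CANONICAL_VALUES.get(fold(value), value)
```

Folding before lookup lets one table entry absorb the spelling variants listed on the slide, at the cost of possible collisions between unrelated values that fold to the same key.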
Product Catalog
Taxonomy
Classification => Product Type Category => Shelves
Product Catalog
Attributes
Product Title
Description
Brand
Color
Manufacturer
Model Number
Dimensions
Product Catalog
Product Matching
• UPC, GTIN, PLU, ISBN
• Algorithms
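Identifier-based matching can be illustrated with the GS1 mod-10 check digit, which UPC-A, EAN-13, ISBN-13 and GTIN-14 codes share. The sketch below validates a code and pads it to 14 digits so that the same product listed under a UPC and an EAN compares equal. The function names are hypothetical, PLU codes are not covered, and nothing here is claimed to be the matching logic used in the pipeline.

```python
from typing import Optional


def gtin_check_digit(body: str) -> int:
    """Check digit for the GTIN digits that precede it (GS1 mod-10 scheme)."""
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    for i, ch in enumerate(reversed(body)):
        total += int(ch) * (3 if i % 2 == 0 else 1)
    return (10 - total % 10) % 10


def to_gtin14(code: str) -> Optional[str]:
    """Return the zero-padded 14-digit form, or None if the code is invalid."""
    digits = code.replace("-", "").replace(" ", "")
    if not (digits.isascii() and digits.isdigit()):
        return None
    if len(digits) not in (8, 12, 13, 14):
        return None
    if gtin_check_digit(digits[:-1]) != int(digits[-1]):
        return None
    # Leading zeros add nothing to the weighted sum, so padding keeps validity.
    return digits.zfill(14)


def same_product(code_a: str, code_b: str) -> bool:
    """True when both codes are valid and normalize to the same GTIN-14."""
    a, b = to_gtin14(code_a), to_gtin14(code_b)
    return a is not None and a == b
```

Items without a usable identifier would fall through to the algorithmic matching the slide mentions, which this sketch does not attempt.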
Product Catalog
Grouping
Variants
Bundles
Sources for catalog
• Marketplace Sellers
• Content Providers
• Suppliers
• Merchants
• Legacy Catalogs
[Diagram: sources feeding the Product Catalog]
Characteristics of ingestion pipeline
• Zero message loss
• Fault Tolerance
• Source based Priority Queue
• Scale to millions of product updates per hour
• Product updates in near real time (NRT)
• Checkpoint at various stages
Processing source data
• Choice of language
• Teams operate independently
• Platform with pluggable services
[Diagram label: Bolt]
Source Pipeline
Stages shown in the pipeline diagram:
• Kafka Spout
• Validate
• Persist
• Normalization
• Classification
• Attribute Extraction
• Matching
• Source Variant Grouping
• Validate
• Persist
• Publish
Product Pipeline
Stages shown in the pipeline diagram:
• Kafka Spout
• Validate
• Merge
• Shelve
• Attribute Extraction
• Product Variant Grouping
• Validate
• Persist
• Publish
Micro-batched Grouping Pipeline
Components shown in the pipeline diagram:
• Kafka Spout
• Router Bolt
• Micro-Batching Bolt
• Product Group Emitter Bolt
• Validate
• Persist
• Publish
Grouping shown in the diagram: Field Grouping
Kafka Payload Sample:
{
  "variant_product_id" : "1234",
  "product_group_id" : "ABC"
}
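One way to picture the micro-batching step: messages are routed by a grouping field so that equal keys reach the same task, and each task buffers variants per product group until a batch is full. The sketch below is an assumption-laden illustration in plain Python, not Storm code. `route` and `MicroBatcher` are hypothetical names, CRC32 stands in for whatever hash the real grouping uses, and the slide does not say which field the grouping is keyed on (`product_group_id` is assumed here).

```python
import zlib
from collections import defaultdict


def route(product_group_id: str, num_tasks: int) -> int:
    """Pick a task index from the grouping field, so equal keys share a task."""
    return zlib.crc32(product_group_id.encode("utf-8")) % num_tasks


class MicroBatcher:
    """Buffers variant ids per product group and flushes them as one batch."""

    def __init__(self, max_batch_size: int):
        self.max_batch_size = max_batch_size
        self.pending = defaultdict(list)  # product_group_id -> variant ids
        self.count = 0

    def add(self, message: dict) -> dict:
        """Buffer one message; return the flushed batch once the buffer is full."""
        group = message["product_group_id"]
        self.pending[group].append(message["variant_product_id"])
        self.count += 1
        if self.count >= self.max_batch_size:
            return self.flush()
        return {}

    def flush(self) -> dict:
        """Hand back everything buffered so far and start a new batch."""
        batch = dict(self.pending)
        self.pending = defaultdict(list)
        self.count = 0
        return batch
```

A real bolt would presumably also flush on a timer so that a quiet group does not wait indefinitely, and would ack the buffered tuples only after the batch has been emitted.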
Back Pressure
• Message loss
• Spout stops emitting
Knobs
• Spout parallelism
• Kafka message fetch size
• max.spout.pending (topology.max.spout.pending) = maximum number of tuples that can be unacked per spout task at any given time
• Worker parallelism
• Bolt parallelism
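The effect of the max.spout.pending knob can be shown with a toy spout that refuses to emit while too many tuples are still unacked. This is a plain-Python simulation under stated assumptions, not the Storm spout API: `ThrottledSpout`, `next_tuple`, and `ack` are illustrative names.

```python
class ThrottledSpout:
    """Toy spout that stops emitting once too many tuples are unacked."""

    def __init__(self, messages, max_pending: int):
        self.messages = iter(messages)
        self.max_pending = max_pending
        self.pending = {}  # message id -> payload awaiting ack
        self.next_id = 0

    def next_tuple(self):
        """Emit the next message, or None when throttled or out of input."""
        if len(self.pending) >= self.max_pending:
            return None  # downstream is behind: hold off instead of flooding it
        try:
            payload = next(self.messages)
        except StopIteration:
            return None
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id) -> None:
        """Mark a tuple as fully processed, freeing one pending slot."""
        self.pending.pop(msg_id, None)
```

A small limit protects slow bolts but caps throughput; a large one keeps the pipeline busy at the risk of timeouts and replays, which is why it is tuned together with the parallelism knobs.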
Failures
• Data Errors
• Service timeouts
• Service outages
• Fatal Errors
• Validations at various stages
• Async IO using RxJava, Hystrix, Retries
• Hystrix Circuit Breaker
• Failing Tuples
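The retry and circuit-breaker ideas listed above can be sketched without any library. The code below is a simplified stand-in and does not use or reproduce the Hystrix or RxJava APIs: `CircuitBreaker`, `CircuitOpenError`, and `call_with_retries` are hypothetical names, and the thresholds are arbitrary.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock  # injectable so the behaviour can be tested
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("service call skipped")
            # Reset window elapsed: let one trial call through (half-open).
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def call_with_retries(breaker, fn, *args, attempts=3):
    """Retry a failing call; stop immediately if the breaker is open."""
    last_error = None
    for _ in range(attempts):
        try:
            return breaker.call(fn, *args)
        except CircuitOpenError:
            raise
        except Exception as err:
            last_error = err
    raise last_error
```

In a topology, a bolt that exhausts its retries or hits an open circuit could fail the tuple so that it is replayed later, which ties this back to the "Failing Tuples" item.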
Characteristics of ingestion pipeline
• Zero message loss
– Anchoring and Failing Tuple, maxOffsetBehind = Long.MAX_VALUE
• Product updates in NRT
• Priority Queue
– Partition based and topic based
• Scale to millions of product updates in an hour.
• Fault Tolerance
– Worker and node failures are handled by Storm
– Nimbus and Supervisors are stateless, fail-fast
• Checkpoint at various stages
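A topic-based priority queue, as named above, might look like the following: one queue per priority level, always drained from the highest priority down. This is only a sketch of the idea. The deques stand in for Kafka topics, the topic names are invented, and the slides do not describe how priorities are actually enforced.

```python
from collections import deque


class PriorityPoller:
    """Polls several in-memory 'topics' in strict priority order."""

    def __init__(self, topics_by_priority):
        # Highest-priority topic name first; each topic is modelled as a deque.
        self.order = list(topics_by_priority)
        self.queues = {name: deque() for name in self.order}

    def publish(self, topic: str, message) -> None:
        self.queues[topic].append(message)

    def poll(self):
        """Return (topic, message) from the highest-priority non-empty topic."""
        for name in self.order:
            if self.queues[name]:
                return name, self.queues[name].popleft()
        return None
```

Strict ordering like this can starve low-priority sources under sustained load, so a weighted scheme may be preferable in practice.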
What we monitor
• Kafka Lag
• Bolt Capacity
• JVM – heap, threads
• Service SLA
• Acked and Failed Tuples
• Data Errors and System Errors
• OS Metrics
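Two of the metrics above reduce to simple arithmetic. Kafka lag is the gap between the newest offset in each partition and the offset the consumer has committed. Bolt capacity, as the Storm UI reports it to the best of our understanding, is the fraction of a time window an executor spent executing tuples. The function names and the treatment of partitions with no committed offset (counted from 0) are assumptions.

```python
def kafka_lag(log_end_offsets: dict, committed_offsets: dict) -> int:
    """Total messages not yet consumed, summed over partitions."""
    return sum(
        # Assumption: a partition with no committed offset is counted from 0.
        max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    )


def bolt_capacity(executed: int, avg_execute_latency_ms: float, window_ms: float) -> float:
    """Fraction of the window a bolt executor spent executing tuples."""
    return executed * avg_execute_latency_ms / window_ms
```

For example, 30,000 tuples at 10 ms each over a 10-minute window gives a capacity of 0.5. A value approaching 1.0 suggests the bolt is saturated and needs more parallelism or faster downstream services.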
Tools For Monitoring
• Kafkamon – Monitor lag in the pipeline
• Guano – Dump and restore ZK state
• Storm UI
• Elastic & Kibana – Async logging using log4j2, scribe
• Grafana to monitor service latency
• Druid for tracking and analytics
• FIT – Fault Injection Tool
Storm Cluster – Product Catalog
• 2 Nimbus
• 7 Supervisors
• 320 Cores
• 2 TB Memory
• 35 Slots
• 14 Topologies
• 150M Kafka Messages
• 6481 Executors
• 360M Network IO Microservice
Storm Cluster – Audit / Tracking
• 1 Nimbus
• 5 Supervisors
• 160 Cores
• 1 TB Memory
• 155 Slots
• 94 Topologies
• 1B+ Kafka Messages
• 1396 Executors
Holiday Season
• A few thousand sellers
• 100M+ seller SKUs
• 6x traffic
How we prepared
• Upgraded to Storm 1.0.2 – HA Nimbus, improved performance, improved backpressure handling
• Change detection
• Improved our monitoring, periodic fault injection
• Fast track / Priority Queue for top items
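Change detection is only named on the slide. One common way to do it, shown here purely as an assumption, is to fingerprint each incoming payload and skip updates whose fingerprint matches the last one processed. `fingerprint` and `ChangeDetector` are hypothetical names, and a real deployment would keep the fingerprints in a shared store rather than in memory.

```python
import hashlib
import json


def fingerprint(product: dict) -> str:
    """Stable hash of a payload, independent of key order."""
    canonical = json.dumps(product, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ChangeDetector:
    def __init__(self):
        self.seen = {}  # product id -> fingerprint of last processed payload

    def has_changed(self, product_id: str, product: dict) -> bool:
        """True if the payload differs from the last one seen for this id."""
        digest = fingerprint(product)
        if self.seen.get(product_id) == digest:
            return False
        self.seen[product_id] = digest
        return True
```

Dropping unchanged updates at the head of the pipeline reduces the load on every downstream stage, which matters most when sellers resend full catalogs.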
Lessons learnt
• Things will fail
• Monitor everything
• Automation
• Scale is not a feature
• Storm works well with large payloads
• Logs don’t lie
• Micro services come at a cost
Path ahead
• Stateful stream processing
• Storm 1.1.0
– Streaming SQL
– Druid integration
– PMML (Predictive Model Markup Language) support
Team
Yes, we are hiring!
http://www.walmartlabs.com/jobs/
