Taming Distributed Systems
Key Insights from Wix's Large-Scale Experience
April 2024
Hi,
I’m Natan
Backend Infra Tech Lead @Wix
Yoga enthusiast
Speaker
Blogger
natansilnitsky
www.natansil.com
@NSilnitsky
Taming Distributed Systems @NSilnitsky Taming Distributed Systems
~4B daily HTTP transactions
4000± microservices in production
~70B Kafka messages a day
Agenda
Event sourcing & CQRS
Standardized CRUD services
1. Dev velocity
2. Scalability
3. Performance
4. Resilience
* stores
Event Sourcing & CQRS
Wix Stores Example
Add/Update product QueryProduct
Product Catalog
Event sourcing and CQRS
WriteProduct: append-only events
Product 123:
1. Product Created
2. Product Changed (price)
3. Product Changed (description)
4. Product Changed (stock-level)
ReadProduct: snapshot (Name, Price, Description, Stock-level)
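The write and read sides on this slide can be sketched as a fold over the append-only events. This is a minimal illustration, not Wix's implementation: the event names `ProductCreated` and `ProductChanged`, the `ProductSnapshot` fields, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProductSnapshot:
    # Read model built from events (field names are illustrative).
    name: str = ""
    price: float = 0.0
    description: str = ""
    stock_level: int = 0


def apply_event(snapshot, event):
    """Return a new snapshot with a single event applied."""
    kind, payload = event
    if kind == "ProductCreated":
        return ProductSnapshot(**payload)
    if kind == "ProductChanged":
        return replace(snapshot, **payload)
    raise ValueError(f"unknown event type: {kind}")


def replay(events):
    """Fold the append-only event list into the current snapshot."""
    snapshot = ProductSnapshot()
    for event in events:
        snapshot = apply_event(snapshot, event)
    return snapshot


# Hypothetical event stream for product 123.
events = [
    ("ProductCreated", {"name": "Mug", "price": 4.0}),
    ("ProductChanged", {"price": 6.0}),
    ("ProductChanged", {"description": "Ceramic mug"}),
    ("ProductChanged", {"stock_level": 12}),
]
snapshot = replay(events)
```

The write side only ever appends to `events`; the read side never mutates them and derives its view by replaying.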
Event sourcing and CQRS
CreateProduct: Catalog Write API
Product Created
Event Store
Product 123:
1. Product Created
2. Product Changed (price)
3. Product Changed (description)
4. Product Changed (stock-level)
Event sourcing and CQRS
CreateProduct: Catalog Write API
Product Created
GetProduct: Catalog Read API
Replay events
Event Store
Product 123:
1. Product Created
2. Product Changed (price)
3. Product Changed (description)
4. Product Changed (stock-level)
Event sourcing and CQRS - Advantages
Debug with “time travel”
GetProduct: Catalog Read API
Replay events
CreateProduct: Catalog Write API
Product Created
Event Store
Product 123:
1. Product Created
2. Product Changed (price)
3. Product Changed (description)
4. Product Changed (stock-level)
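"Time travel" debugging can be sketched as replaying only the events up to a chosen sequence number. The `(sequence, changes)` event shape and the sample values below are assumptions for illustration, and the events are assumed to be stored in sequence order.

```python
def state_at(events, up_to_seq):
    """Rebuild the product state as it was right after event `up_to_seq`."""
    state = {}
    for seq, changes in events:
        if seq > up_to_seq:
            break  # events are assumed to be ordered by sequence number
        state.update(changes)
    return state


# Hypothetical events for product 123.
events = [
    (1, {"name": "Mug", "price": 4}),
    (2, {"price": 6}),
    (3, {"description": "Ceramic mug"}),
    (4, {"stock_level": 12}),
]

before_description_change = state_at(events, 2)
current = state_at(events, 4)
```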
Event sourcing and CQRS
Event Store
Product 123:
1. Product Created
2. Product Changed (price)
3. Product Changed (description)
4. Product Changed (stock-level)
5. ..
30. Product Changed (stock-level)
Snapshot Repository
Product 123 Snapshot
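As the event list grows (30 events on this slide), replaying from the start on every read gets expensive, which is what the snapshot repository addresses. A rough sketch, assuming a snapshot is stored as `(last_applied_seq, state)` and only newer events are replayed on top of it; the helper names are hypothetical.

```python
def apply_changes(state, changes):
    """Return a new state dict; the input state is left untouched."""
    new_state = dict(state)
    new_state.update(changes)
    return new_state


def load_product(snapshot, events):
    """Start from a stored snapshot and replay only the events after it.

    snapshot: (last_applied_seq, state)
    events:   list of (seq, changes), ordered by seq
    Returns the new snapshot and the number of events actually replayed.
    """
    last_seq, state = snapshot
    applied = 0
    for seq, changes in events:
        if seq <= last_seq:
            continue  # already folded into the snapshot
        state = apply_changes(state, changes)
        last_seq = seq
        applied += 1
    return (last_seq, state), applied


# Hypothetical stream of 30 events, mostly stock-level changes.
events = [(1, {"name": "Mug", "stock_level": 0})]
events += [(seq, {"stock_level": seq}) for seq in range(2, 31)]

full, full_applied = load_product((0, {}), events)
snapshot_28, _ = load_product((0, {}), events[:28])
resumed, resumed_applied = load_product(snapshot_28, events)
```

Both paths end in the same state; the snapshot path replays 2 events instead of 30.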
Event sourcing and CQRS - Advantages
Catalog Read API
Inventory snapshot
Product Snapshot
Product 123:
1. Product Created
2. Product Changed (price)
3. Product Changed (description)
4. Product Changed (stock-level)
Event sourcing and CQRS - Disadvantages
Catalog CRUD
Create/Read Product
Product DB
Catalog Write API
CreateProduct: Product Created
Event Store
Snapshot Repository
Product 123 Snapshot
Catalog Read API
Event sourcing and CQRS - Disadvantages
Catalog Write API
Product 123:
1. Product Created (price 4$)
2. Product Changed (price 6$) (Delayed)
3. Product Changed (descriptionA)
4. Product Changed (descriptionB)
Product snapshot Consumer
Snapshot Repository:
1. Product Created (price 4$)
2. Product Changed (descriptionB)
3. Product Changed (descriptionA)
Product 123 Snapshot: Product, 4$, DescriptionA
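One reading of this slide sequence is that a snapshot consumer applying events in arrival order ends up with a stale snapshot when one event is delayed and others are reordered. The sketch below reproduces that outcome under this assumption; the arrival order is hypothetical and the function names are illustrative.

```python
def apply_in_arrival_order(arrivals):
    """Naive consumer: overwrite snapshot fields in whatever order events arrive."""
    snapshot = {}
    for _seq, changes in arrivals:
        snapshot.update(changes)
    return snapshot


def apply_in_sequence_order(events):
    """Reference result: apply every written event in sequence order."""
    return apply_in_arrival_order(sorted(events, key=lambda event: event[0]))


# Events as written by the Catalog Write API (values taken from the slide).
written = [
    (1, {"price": 4}),
    (2, {"price": 6}),
    (3, {"description": "descriptionA"}),
    (4, {"description": "descriptionB"}),
]

# Assumed arrival at the consumer: event 2 is delayed, event 4 overtakes event 3.
arrived = [written[0], written[3], written[2]]

corrupted = apply_in_arrival_order(arrived)
expected = apply_in_sequence_order(written)
```

Here `corrupted` holds price 4 and descriptionA, matching the snapshot shown on the slide, while the event store says price 6 and descriptionB.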
Read your own writes
Event sourcing and CQRS - Disadvantages
Event sourcing and CQRS - Disadvantages
Complexity
Eventual consistency only
Massive scale
Corrupted snapshot
Read your own writes
CRUD APIs + Standardization
CRUD - platformized
Catalog API: CreateProduct, ReadProduct, UpdateProduct, DeleteProduct
Product Document Store
Unified!
Wix’s Open Platform
Was: independent “startups” (CRUD, CRUD + Event sourcing)
Now: Single Open Platform (API First; APIs - TDD + FE driven)
API First - platformized CRUD
Product Catalog API
RPC: Cart Service (Internal Wix Developer)
HTTP/SDK: PoS App (External App Developer)
JavaScript method: Custom Filter (External Wix Site Developer, Velo)
Wix Stores on Wix’s Open Platform
Wix Cart service
API: 3rd party PoS app; Site Code extensions “Velo”
SPI: 3rd party Tax Calculator; Wix Product catalog service
Cart Item Added: 3rd party Analytics app; Site Code extensions “Velo”
API First - platformized CRUD
Stores, Bookings, Events, Forms, Loyalty Rewards, Tickets Policies, Checkout, Time Slots, Schemas, Submissions, Guests, Coupons, Calendar, Orders, Waitlist, Cart, Catalog, Programs
Internal Wix Developer
External App Developer
External Wix Site Developer (Velo)
API First - platformized CRUD
Catalog API: CreateProduct, ReadProduct, UpdateProduct, DeleteProduct
service ProductService {
  option (service_entity).message = "...v3.Product";
  rpc CreateProduct (CreateProductRequest) returns (CreateProductResponse) ...
  rpc GetProduct (GetProductRequest) returns (GetProductResponse) ...
  rpc UpdateProduct (UpdateProduct) returns (UpdateProduct) ...
}
message Product {
  option (entity) = {fqdn: "...v3.product"};
  google.protobuf.StringValue id = 1;
  google.protobuf.Int64Value revision = 2;
  ...
  repeated Inventory inventory = 25;
  ...
}
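The `revision` field on `Product` suggests revision-based optimistic concurrency, although the slides do not say how Wix uses it. The sketch below shows that general technique under this assumption, with a hypothetical in-memory `ProductStore`: an update succeeds only if the caller presents the current revision.

```python
class RevisionConflict(Exception):
    """Raised when an update is based on a stale revision."""


class ProductStore:
    # In-memory stand-in for a document store (illustrative only).
    def __init__(self):
        self._docs = {}

    def create(self, product_id, fields):
        if product_id in self._docs:
            raise KeyError(f"product {product_id} already exists")
        doc = {**fields, "id": product_id, "revision": 1}
        self._docs[product_id] = doc
        return dict(doc)

    def get(self, product_id):
        return dict(self._docs[product_id])

    def update(self, product_id, expected_revision, fields):
        current = self._docs[product_id]
        if current["revision"] != expected_revision:
            raise RevisionConflict(
                f"expected revision {expected_revision}, found {current['revision']}"
            )
        updated = {**current, **fields, "revision": expected_revision + 1}
        self._docs[product_id] = updated
        return dict(updated)


store = ProductStore()
created = store.create("123", {"price": 4})
updated = store.update("123", created["revision"], {"price": 6})
```

A second writer still holding revision 1 would get `RevisionConflict` and has to re-read before retrying.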
Event Driven Architecture
Automatic Domain Events FTW
EDA - Domain Events
Catalog Service
CreateProduct: Product Created
UpdateProduct: Product Updated
DeleteProduct: Product Deleted
Product Document Store
Product SDL (Simple Data Layer)
* DE describes
* SDL is document-based; no direct SQL
service ProductService {
  ...
  rpc CreateProduct (CreateProductRequest) returns (CreateProductResponse) {
    ...
    option (callback) = {
      event_type: CREATED
    };
  }
}
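The idea of automatic domain events can be sketched as a CRUD layer that publishes an event for every write, so service code never emits events by hand. This is only an illustration: the class name, the event shape, and the `UPDATED` and `DELETED` type names (by analogy with `CREATED` in the proto option) are assumptions, and the sketch ignores the atomicity of store update and publish.

```python
class CrudWithDomainEvents:
    """In-memory CRUD layer that emits a domain event on every write."""

    def __init__(self, publish):
        self._docs = {}
        self._publish = publish  # callable taking one event dict

    def create(self, entity_id, doc):
        self._docs[entity_id] = dict(doc)
        self._publish({"type": "CREATED", "id": entity_id, "entity": dict(doc)})

    def update(self, entity_id, changes):
        self._docs[entity_id].update(changes)
        self._publish(
            {"type": "UPDATED", "id": entity_id, "entity": dict(self._docs[entity_id])}
        )

    def delete(self, entity_id):
        del self._docs[entity_id]
        self._publish({"type": "DELETED", "id": entity_id})


published = []  # stands in for a message broker
catalog = CrudWithDomainEvents(published.append)
catalog.create("123", {"price": 4})
catalog.update("123", {"price": 6})
catalog.delete("123")
```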
EDA - Domain Events
Catalog Service: CreateProduct, UpdateProduct, DeleteProduct
Product SDL
Product Created, Product Updated, Product Deleted
Consumers: Data warehouse/Lake, ebay-bridge Service
* debugging corruption
Data consistency in Wix’s EDA
Resilient Producers and consumers
Make DB Update & Event Producing Atomic
Catalog Service
Ebay Bridge Service
* atomic, otherwise
Resilient Producer
Catch Unsent Events: Fallback to S3 and Heal
Catalog Service: Produce event to S3
Healer Service: Poll, Produce to Kafka
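The fallback-and-heal flow can be sketched as follows. This is a simplified model under stated assumptions, not Wix's producer: a plain list stands in for S3, `FakeBroker` stands in for Kafka, and `ConnectionError` is the assumed failure signal.

```python
class FakeBroker:
    """Stand-in for Kafka that can be switched off to simulate an outage."""

    def __init__(self):
        self.available = True
        self.delivered = []

    def send(self, event):
        if not self.available:
            raise ConnectionError("broker unavailable")
        self.delivered.append(event)


class ResilientProducer:
    def __init__(self, send_to_broker, fallback_store):
        self._send = send_to_broker
        self._fallback = fallback_store  # a list standing in for S3

    def produce(self, event):
        """Try the broker first; park the event in the fallback store on failure."""
        try:
            self._send(event)
            return True
        except ConnectionError:
            self._fallback.append(event)
            return False


def heal(fallback_store, send_to_broker):
    """Healer: re-send parked events and keep the ones that still fail."""
    remaining = []
    healed = 0
    for event in list(fallback_store):
        try:
            send_to_broker(event)
            healed += 1
        except ConnectionError:
            remaining.append(event)
    fallback_store[:] = remaining
    return healed


broker = FakeBroker()
parked = []
producer = ResilientProducer(broker.send, parked)

producer.produce("Product Created")
broker.available = False
producer.produce("Product Updated")  # parked instead of lost
broker.available = True
healed_count = heal(parked, broker.send)
```

Healed events can reach the broker later than events produced after them, so consumers in such a design need to tolerate reordering.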
Make DB Update & Event Producing Atomic
Consumer retries + DLQ
Catalog Service
Ebay Bridge Service
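On the consuming side, retries with a dead-letter queue (DLQ) can be sketched like this. The retry limit, the message values, and the handler are assumptions for illustration; a real consumer would typically also add a delay between attempts.

```python
def consume(messages, handler, max_attempts=3):
    """Call handler for each message, retrying on failure.

    Messages that still fail after max_attempts are returned as dead letters
    together with the last error text.
    """
    dead_letters = []
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(message)
                break
            except Exception as error:
                if attempt == max_attempts:
                    dead_letters.append((message, str(error)))
    return dead_letters


attempts = {}
processed = []


def handler(message):
    # Hypothetical handler: "flaky" fails once, "poison" always fails.
    attempts[message] = attempts.get(message, 0) + 1
    if message == "poison":
        raise ValueError("cannot parse")
    if message == "flaky" and attempts[message] < 2:
        raise TimeoutError("downstream timeout")
    processed.append(message)


dlq = consume(["ok", "flaky", "poison"], handler)
```

The transient failure is absorbed by a retry, while the message that can never succeed is set aside instead of blocking the rest.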
Alternative - use outbox pattern and/or CDC
Catalog Service: Write to database
Database transaction: Product Table (Insert, Update, Delete) and Outbox Table (Insert)
Instant read-your-own-writes consistency in Catalog service
CDC: Kafka Connect Debezium connector reads from Outbox Table and publishes messages to brokers
Kafka Broker
Eventually consistent data exchange with Ebay Bridge Service
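The outbox pattern can be sketched with SQLite: the product row and the outbox row are written in one transaction, and a separate relay publishes outbox rows afterwards. The table layout and function names are assumptions, and the polling `relay` is a simplified stand-in for the Debezium connector, which reads the database log instead.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id TEXT PRIMARY KEY, price REAL)")
conn.execute(
    "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
)


def create_product(conn, product_id, price):
    """Insert the product and its event in one transaction."""
    event = {"type": "ProductCreated", "id": product_id, "price": price}
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute(
            "INSERT INTO product (id, price) VALUES (?, ?)", (product_id, price)
        )
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay(conn, publish):
    """Publish pending outbox rows in order, deleting each after publishing."""
    rows = conn.execute("SELECT seq, payload FROM outbox ORDER BY seq").fetchall()
    for seq, payload in rows:
        publish(json.loads(payload))  # at-least-once: publish before delete
        with conn:
            conn.execute("DELETE FROM outbox WHERE seq = ?", (seq,))
    return len(rows)


create_product(conn, "123", 4.0)
published = []  # stands in for the Kafka broker
relayed = relay(conn, published.append)
```

Because both inserts share a transaction, there is never a product without its event or an event without its product.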
Data Projection & query optimization
Materializer
Query latency - naive
Multi-step (1, 2)
FilterProductWithInventory: price < 100 and stock > 4
Catalog Service: CreateProduct, ReadProduct, DeleteProduct; Product SDL
Inventory Service: Inventory SDL
RPC
Query latency - DB join
FilterProductWithInventory
Catalog Service: CreateProduct, ReadProduct, DeleteProduct; Product SDL
Inventory Service: Inventory SDL
DB level Join
Query Latency - Materializer
FilterProductWithInventory
Catalog Service
Materializer: Product + Inventory
Inventory Service: Inventory, Inventory updated Event
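A materializer of this kind can be sketched as a consumer that keeps a joined Product + Inventory view up to date from events, so the filter query needs neither an RPC nor a cross-service join. The event shapes, class name, and sample data are assumptions for the example.

```python
class ProductInventoryMaterializer:
    """Maintains a joined product + inventory view from two event streams."""

    def __init__(self):
        self.view = {}  # product id -> {"price": ..., "stock": ...}

    def _row(self, product_id):
        return self.view.setdefault(product_id, {"price": None, "stock": 0})

    def on_product_event(self, event):
        self._row(event["id"])["price"] = event["price"]

    def on_inventory_updated(self, event):
        self._row(event["product_id"])["stock"] = event["stock"]

    def filter_products(self, max_price, min_stock):
        """Single local query: price < max_price and stock > min_stock."""
        return sorted(
            product_id
            for product_id, row in self.view.items()
            if row["price"] is not None
            and row["price"] < max_price
            and row["stock"] > min_stock
        )


materializer = ProductInventoryMaterializer()
for product_event in [
    {"id": "A", "price": 50},
    {"id": "B", "price": 150},
    {"id": "C", "price": 80},
]:
    materializer.on_product_event(product_event)
for inventory_event in [
    {"product_id": "A", "stock": 10},
    {"product_id": "B", "stock": 20},
    {"product_id": "C", "stock": 2},
]:
    materializer.on_inventory_updated(inventory_event)

# The query from the slides: price < 100 and stock > 4.
matches = materializer.filter_products(max_price=100, min_stock=4)
```

The trade-off is that the view is only as fresh as the last event it consumed.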
Comparing Event sourcing to Wix’s CRUD based solution
Event Sourcing
CRUD
SDL+Domain Events
Materializer
Simplicity
Onboarding new team member
Write performance
Read performance
Consistency
Audit log/time machine
Projections/queries
* no consistency
Summary
Wix successfully shifted its vast distributed system entirely to CRUD-based microservices, moving away from a CRUD/event sourcing hybrid.
This transformation was driven by a commitment to standardization, managed infrastructure with automated code generation, and a decoupled architecture.
Advanced tools were also implemented to boost development speed, ensure system resilience, and optimize for scale and performance: Domain Events, Resilient Producer, Materializer, Simple Data Layer.
Q & A
Thank you
natansilnitsky www.natansil.com
@NSilnitsky

Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - Devoxx Greece

  • 1. Taming Distributed Systems April 2024 Key Insights from Wix's Large-Scale Experience
  • 2. Taming Distributed Systems Hi, I’m Natan Backend Infra Tech Lead @Wix Yoga enthusiast Speaker Blogger natansilnitsky www.natansil.com @NSilnitsky
  • 3. Taming Distributed Systems @NSilnitsky Taming Distributed Systems
  • 4. Taming Distributed Systems @NSilnitsky ~4B Daily HTTP Transactions 4000± Microservices in production ~70B Kafka messages a day
  • 5. Taming Distributed Systems @NSilnitsky Agenda Event sourcing & CQRS Standardized CRUD services Dev velocity 1 Scalability 2 Performance 3 Resilience 4 * stores
  • 6. Taming Distributed Systems @NSilnitsky Event Sourcing & CQRS Wix Stores Example
  • 7. Taming Distributed Systems @NSilnitsky Add/Update product QueryProduct Product Catalog
  • 8. Taming Distributed Systems @NSilnitsky WriteProduct Event sourcing and CQRS Product 123: Product Created 1 Product Changed (price) 2 Product Changed (description) 3 Product Changed (stock-level) 4 ReadProduct Name Price Description Stock-level Append-Only Events Snapshot
  • 9. Taming Distributed Systems @NSilnitsky Event Store CreateProduct: Catalog Write API Event sourcing and CQRS Product Created
  • 10. Taming Distributed Systems @NSilnitsky Event sourcing and CQRS Product 123: Product Created 1 Product Changed (price) 2 Product Changed (description) 3 Product Changed (stock-level) 4 Event Store CreateProduct: Catalog Write API Product Created
  • 11. Taming Distributed Systems @NSilnitsky Replay GetProduct Catalog Read API Event sourcing and CQRS Replay events Product 123: Product Created 1 Product Changed (price) 2 Product Changed (description) 3 Product Changed (stock-level) 4 Event Store CreateProduct: Catalog Write API Product Created
  • 12. Taming Distributed Systems @NSilnitsky Event sourcing and CQRS – Advantages Debug with “time travel” GetProduct Catalog Read API Replay events Product 123: Product Created 1 Product Changed (price) 2 Product Changed (description) 3 Product Changed (stock-level) 4 Event Store CreateProduct: Catalog Write API Product Created
  • 13. Taming Distributed Systems @NSilnitsky Product 123: Product Created 1 Product Changed (price) 2 Product Changed (description) 3 Product Changed (stock-level) 4 
.. 5 Product Changed (stock-level) 30 Event Store Event sourcing and CQRS
  • 14. Taming Distributed Systems @NSilnitsky Product 123: Product Created 1 Product Changed (price) 2 Product Changed (description) 3 Product Changed (stock-level) 4 
.. 5 Product Changed (stock-level) 30 Event Store Product 123 Snapshot Snapshot Repository Event sourcing and CQRS
  • 15. Taming Distributed Systems @NSilnitsky Event sourcing and CQRS - Advantages Catalog Read API Inventory snapshot Product Snapshot Product Created Product Changed (price) 2 Product Changed (description) 3 Product Changed (stock-level) Product 123: 1 4
  • 16. Taming Distributed Systems @NSilnitsky Event sourcing and CQRS - Disadvantages Product DB Create/Read Product Catalog CRUD Event Store CreateProduct: Catalog Write API Product Created Product 123 Snapshot Snapshot Repository Catalog Read API
  • 17. Taming Distributed Systems @NSilnitsky Event sourcing and CQRS - Disadvantages Product Created (price 4$) 1 Product Changed (price 6$) 2 Product 123: Snapshot Repository Delayed Product snapshot Consumer Catalog Write API
  • 18. Taming Distributed Systems @NSilnitsky Event sourcing and CQRS - Disadvantages Product 123: Snapshot Repository Product Created (price 4$) 1 Product snapshot Consumer Catalog Write API Product Changed 2 Product Created (price 4$) 1 Product Changed (price 6$) 2 Delayed
  • 19. Taming Distributed Systems @NSilnitsky Event sourcing and CQRS - Disadvantages Product Changed (descriptionA) 3 Product Changed (descriptionB) 4 Product 123: Snapshot Repository Product Created (price 4$) 1 Product Changed (descriptionB) 2 Catalog Write API Product Changed (descriptionA) 3 Product Created (price 4$) 1 Product Changed (price 6$) 2 Delayed Product snapshot Consumer
  • 20. Taming Distributed Systems @NSilnitsky Event sourcing and CQRS - Disadvantages Product Changed (descriptionA) 3 Product Changed (descriptionB) 4 Product 123: Snapshot Repository Product Created (price 4$) 1 Product 123 Snapshot Product Changed (descriptionB) 2 Product 4$ DescriptionA Product Changed (descriptionA) 3 Catalog Write API Product Created (price 4$) 1 Product Changed (price 6$) 2 Delayed Product snapshot Consumer
  • 21. Taming Distributed Systems @NSilnitsky Read your own writes Event sourcing and CQRS - Disadvantages
  • 22. Taming Distributed Systems Event sourcing and CQRS - Disadvantages Complexity Eventual consistency only Massive scale Corrupted snapshot Read your own writes
  • 23. Taming Distributed Systems @NSilnitsky CRUD APIs + Standardization
  • 24. Taming Distributed Systems @NSilnitsky CreateProduct Product Document Store Catalog API CRUD - platformized ReadProduct UpdateProduct DeleteProduct Unified!
  • 25. Taming Distributed Systems @NSilnitsky Wix’s Open Platform CRUD CRUD + Event sourcing Was Independent “startups” Now Single Open Platform
  • 26. Taming Distributed Systems @NSilnitsky Wix’s Open Platform CRUD CRUD + Event sourcing API First APIs - TDD + FE driven Was Independent “startups” Now Single Open Platform
  • 27. Taming Distributed Systems @NSilnitsky RPC Product Catalog API Cart Service Internal Wix Developer API First - platformized CRUD
  • 28. Taming Distributed Systems @NSilnitsky HTTP/SDK Product Catalog API Internal Wix Developer PoS App External App Developer API First - platformized CRUD
  • 29. Taming Distributed Systems @NSilnitsky JavaScript method Product Catalog API Internal Wix Developer Custom Filter External App Developer External Wix Site Developer (Velo) API First - platformized CRUD
  • 30. Taming Distributed Systems @NSilnitsky Wix Cart service 3rd party PoS app API Wix Stores on Wix’s Open Platform Site Code extensions “Velo” API
  • 31. Taming Distributed Systems @NSilnitsky Wix Cart service 3rd party Tax Calculator SPI SPI Wix Product catalog service Wix Stores on Wix’s Open Platform
  • 32. Taming Distributed Systems @NSilnitsky Wix Stores on Wix’s Open Platform Wix Cart service 3rd party Analytics app Cart Item Added Site Code extensions “Velo”
  • 33. Taming Distributed Systems @NSilnitsky Wix Stores on Wix’s Open Platform Wix Cart service 3rd party PoS app 3rd party Analytics app 3rd party Tax Calculator SPI API Cart Item Added SPI Wix Product catalog service Site Code extensions “Velo” API
  • 34. Taming Distributed Systems @NSilnitsky Stores Bookings Events Forms Loyalty Rewards Tickets Policies Checkout Time Slots Schemas Sub missions Guests Coupons Calendar Orders Waitlist Cart Catalog Programs API First - platformized CRUD Internal Wix Developer External App Developer External Wix Site Developer (Velo)
  • 35. Taming Distributed Systems @NSilnitsky CreateProduct ReadProduct UpdateProduct DeleteProduct Catalog API API First - platformized CRUD
  • 36. Taming Distributed Systems @NSilnitsky CreateProduct ReadProduct UpdateProduct DeleteProduct Catalog API API First - platformized CRUD
  • 37. Taming Distributed Systems @NSilnitsky CreateProduct ReadProduct UpdateProduct DeleteProduct Catalog API service ProductService { option (service_entity).message = "...v3.Product"; rpc CreateProduct (CreateProductRequest) returns (CreateProductResponse) ... rpc GetProduct (GetProductRequest) returns (GetProductResponse) ... rpc UpdateProduct (UpdateProduct) returns (UpdateProduct) ... 
 } message Product { option (entity) = {fqdn: "...v3.product"}; google.protobuf.StringValue id = 1; google.protobuf.Int64Value revision = 2; ... repeated Inventory inventory = 25; ... } API First - platformized CRUD
  • 38. Taming Distributed Systems @NSilnitsky API First - platformized CRUD CreateProduct ReadProduct UpdateProduct DeleteProduct Catalog API service ProductService { option (service_entity).message = "...v3.Product"; rpc CreateProduct (CreateProductRequest) returns (CreateProductResponse) ... rpc GetProduct (GetProductRequest) returns (GetProductResponse) ... rpc UpdateProduct (UpdateProduct) returns (UpdateProduct) ... 
 } message Product { option (entity) = {fqdn: "...v3.product"}; google.protobuf.StringValue id = 1; google.protobuf.Int64Value revision = 2; ... repeated Inventory inventory = 25; ... }
  • 39. Taming Distributed Systems @NSilnitsky API First - platformized CRUD
  • 40. Taming Distributed Systems @NSilnitsky API First - platformized CRUD
  • 41. Taming Distributed Systems @NSilnitsky Event Driven Architecture Automatic Domain Events FTW
  • 42. Taming Distributed Systems @NSilnitsky EDA - Domain Events CreateProduct UpdateProduct DeleteProduct Product Created Product Updated Product Deleted Product Document Store Catalog Service * DE describes

  • 43. Taming Distributed Systems @NSilnitsky EDA - Domain Events CreateProduct UpdateProduct DeleteProduct Product Created Product Updated Product Deleted Catalog Service Product SDL * SDL is document based. no direct SQL
  • 44. Taming Distributed Systems @NSilnitsky EDA - Domain Events CreateProduct UpdateProduct DeleteProduct Product Created Product Updated Product Deleted Product SDL Catalog Service Product SDL service ProductService { ... rpc CreateProduct (CreateProductRequest) returns (CreateProductResponse) { ... option (callback) = { event_type: CREATED }; } }
  • 45. EDA - Domain Events. Consumer: Data warehouse/Lake. Commands: CreateProduct, UpdateProduct, DeleteProduct. Events: Product Created, Product Updated, Product Deleted. Components: Catalog Service, Product SDL. * debugging corruption
  • 46. EDA - Domain Events. Consumer: ebay-bridge Service. Commands: CreateProduct, UpdateProduct, DeleteProduct. Events: Product Created, Product Updated, Product Deleted. Components: Catalog Service, Product SDL.
  • 47. EDA - Domain Events. Commands: CreateProduct, UpdateProduct, DeleteProduct. Events: Product Created, Product Updated, Product Deleted. Components: Catalog Service, Product SDL.
  • 48. Data consistency in Wix’s EDA: Resilient producers and consumers
  • 49. Make DB Update & Event Producing Atomic. Components: Catalog Service, Ebay Bridge Service. * atomic, otherwise
  • 50. Resilient Producer - Catch Unsent Events. Catalog Service: produce event to S3.
  • 51. Resilient Producer - Fallback to S3 and Heal. Steps: Produce to Kafka; Produce event to S3; Poll. Components: Catalog Service, Healer Service.
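The fallback-and-heal flow on this slide can be sketched as follows, assuming the producer parks an event in a fallback store when the broker rejects it and a separate healer later polls that store and re-sends. `FlakyBroker` and `FallbackStore` are hypothetical in-memory stand-ins for Kafka and S3, not real client APIs.

```python
class BrokerUnavailable(Exception):
    """Raised by the stand-in broker while it is down."""


class FlakyBroker:
    """Hypothetical stand-in for a Kafka producer client."""

    def __init__(self):
        self.available = True
        self.log = []

    def send(self, topic, message):
        if not self.available:
            raise BrokerUnavailable(topic)
        self.log.append((topic, message))


class FallbackStore:
    """Hypothetical stand-in for an S3 bucket of unsent events."""

    def __init__(self):
        self.pending = []

    def put(self, topic, message):
        self.pending.append((topic, message))

    def drain(self):
        items, self.pending = self.pending, []
        return items


class ResilientProducer:
    def __init__(self, broker, fallback):
        self.broker = broker
        self.fallback = fallback

    def produce(self, topic, message):
        """Try the broker first; park the event in the fallback store on failure."""
        try:
            self.broker.send(topic, message)
            return "sent"
        except BrokerUnavailable:
            self.fallback.put(topic, message)
            return "parked"


class Healer:
    def __init__(self, broker, fallback):
        self.broker = broker
        self.fallback = fallback

    def heal_once(self):
        """One polling pass: re-send parked events, re-park any that still fail."""
        healed = 0
        for topic, message in self.fallback.drain():
            try:
                self.broker.send(topic, message)
                healed += 1
            except BrokerUnavailable:
                self.fallback.put(topic, message)
        return healed
```

One consequence of this design is that healed events can arrive after newer events that were sent directly, so consumers should not assume strict ordering across a broker outage.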
  • 52. Make DB Update & Event Producing Atomic - Consumer retries + DLQ. Components: Catalog Service, Ebay Bridge Service.
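A minimal sketch of consumer retries with a dead-letter queue (DLQ), assuming a bounded number of in-process attempts per message. The function name and the `max_attempts` parameter are illustrative, and the lists stand in for real topics.

```python
def consume_with_retries(messages, handler, max_attempts=3):
    """Process each message, retrying failures up to max_attempts times.

    Messages that still fail after the last attempt are moved to a
    dead-letter list so one poison message does not block the rest.
    """
    processed, dead_letters = [], []
    for message in messages:
        last_error = None
        for _attempt in range(max_attempts):
            try:
                handler(message)
                processed.append(message)
                break
            except Exception as error:  # broad on purpose: any failure is retried
                last_error = error
        else:
            # The loop finished without a successful attempt.
            dead_letters.append((message, repr(last_error)))
    return processed, dead_letters
```

Retrying in place blocks the messages behind the failing one; production consumers often move retries to separate topics with a delay between attempts, which this sketch does not model.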
  • 53. Alternative - use outbox pattern and/or CDC. The Catalog Service writes to the database in one transaction (Product table: insert, update, delete; Outbox table: insert), giving instant read-your-own-writes consistency in the Catalog service. CDC: Kafka Connect with the Debezium connector reads from the Outbox table and publishes messages to the Kafka broker, giving eventually consistent data exchange with the Ebay Bridge Service.
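The outbox pattern above can be sketched with SQLite, assuming the entity row and the outbox row are written in the same transaction and a relay forwards unpublished rows afterwards. The table layout, the `ProductCreated` event name, and the polling `relay_outbox` function are assumptions for illustration; on the slide the relay role is played by CDC (Kafka Connect with the Debezium connector), not by polling.

```python
import json
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE product (
            id TEXT PRIMARY KEY,
            name TEXT NOT NULL,
            price REAL NOT NULL
        );
        CREATE TABLE outbox (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return db


def create_product(db, product_id, name, price):
    """Insert the product and its event in one transaction."""
    payload = json.dumps({"id": product_id, "name": name, "price": price})
    with db:  # commits on success, rolls back both inserts on any error
        db.execute(
            "INSERT INTO product (id, name, price) VALUES (?, ?, ?)",
            (product_id, name, price),
        )
        db.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("ProductCreated", payload),
        )


def relay_outbox(db, publish):
    """Forward unpublished outbox rows in order; return how many were sent."""
    rows = db.execute(
        "SELECT seq, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

Because a row is marked as published only after `publish` returns, a crash between the two steps re-sends the event on the next pass. Delivery is therefore at-least-once, and consumers need to tolerate duplicates.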
  • 54. Data Projection & query optimization: Materializer
  • 55. Query latency - naive. FilterProductWithInventory reaches the Catalog Service (CreateProduct, ReadProduct, DeleteProduct; Product SDL) and the Inventory Service (Inventory SDL) over RPC, in two numbered steps.
  • 56. Multi-step Query latency - naive. FilterProductWithInventory reaches the Catalog Service (CreateProduct, ReadProduct, DeleteProduct; Product SDL) and the Inventory Service (Inventory SDL) over RPC, in two numbered steps. Example filter: price < 100 and stock > 4.
  • 57. Query latency - DB join. FilterProductWithInventory uses a DB level Join across the Catalog Service (CreateProduct, ReadProduct, DeleteProduct; Product SDL) and the Inventory Service (Inventory SDL).
  • 58. Query Latency - Materializer. Components: Product + Inventory Materializer, Inventory, Catalog Service, Inventory Service. Event: Inventory updated Event. Query: FilterProductWithInventory.
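A minimal sketch of a materializer, assuming it consumes product and inventory events and maintains a pre-joined view so that a combined filter needs no cross-service RPC at query time. The class name, event shapes, and field names are hypothetical; the example filter (price < 100 and stock > 4) comes from the earlier query-latency slide.

```python
class ProductInventoryMaterializer:
    """Keeps a joined product + inventory view, updated from events."""

    def __init__(self):
        self.view = {}  # product id -> {"id", "price", "stock"}

    def _row(self, product_id):
        return self.view.setdefault(
            product_id, {"id": product_id, "price": None, "stock": None}
        )

    def on_product_event(self, event):
        # Assumed event shape: {"type": ..., "id": ..., "price": ...}
        if event["type"] == "ProductDeleted":
            self.view.pop(event["id"], None)
            return
        self._row(event["id"])["price"] = event["price"]

    def on_inventory_updated(self, event):
        # Assumed event shape: {"product_id": ..., "stock": ...}
        self._row(event["product_id"])["stock"] = event["stock"]

    def filter_products(self, max_price, min_stock):
        """Return ids with price < max_price and stock > min_stock."""
        return sorted(
            row["id"]
            for row in self.view.values()
            if row["price"] is not None
            and row["stock"] is not None
            and row["price"] < max_price
            and row["stock"] > min_stock
        )
```

The view is only as fresh as the last event consumed, so a query served from it is eventually consistent with the Catalog and Inventory services.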
  • 59. Comparing Event sourcing to Wix’s CRUD based solution. Approaches: Event Sourcing; CRUD (SDL+Domain Events, Materializer). Criteria: simplicity; onboarding new team member; write performance; read performance; consistency; audit log/time machine; projections/queries.
  • 63. Comparing Event sourcing to Wix’s CRUD based solution. Approaches: Event Sourcing; CRUD (SDL+Domain Events, Materializer). Criteria: simplicity; onboarding new team member; write performance; read performance; consistency; audit log/time machine; projections/queries. * no consistency
  • 67. Summary: Wix successfully shifted its vast distributed system entirely to CRUD-based microservices, moving away from a CRUD/event sourcing hybrid. This transformation was driven by a commitment to standardization, managed infrastructure with automated code generation, and a decoupled architecture.
  • 68. Summary: Advanced tools were also implemented to boost development speed, ensure system resilience, and optimize for scale and performance: Domain Events, Resilient Producer, Materializer, Simple Data Layer.
  • 69. Q & A. Thank you. natansilnitsky, www.natansil.com, @NSilnitsky