Serverless and Streaming
Building ‘eBay’ by turning the database inside out
Neil Avery, @avery_neil
Office of the CTO
Kafka Summit, San Francisco, October 2018
Agenda
- Building the auction platform
- Event driven architectures on Kafka
- FaaS for streaming
- Event driven patterns
- Putting it in action
Building the Auction platform
Top down approach:
● What does it look like?
● What qualities?
● How many items?
● How many users?
The grand plan: what are we building?
users
the world
selling
buying
the sketch
A stack view
from stateful to stateless
{...}
{...}
{...}
FaaS
Kafka cluster
stream processing
Kafka Streams
KSQL
Streaming on the cloud
Think: Streaming First
Stream processors versus FaaS processors
FaaS - GCP Fn, Azure Fn, AWS Lambda etc
- Stateless
- Out of band (generally)
- Edge (in band)
- Ad hoc
- Super-elastic
Kafka Streams & KSQL
- Stateful (and stateless)
- In band
- Dataflow - control plane
- Not ad hoc (except KSQL)
- Elastic
Think: data-model
Think: fire and forget
Event driven architectures with Kafka
Events
What is an event?
FACT!
SOMETHING
HAPPENED!
Events
- A Sale
- An Invoice
- A Trade
- A Customer Experience
Events
Why do you care?
Loose coupling, autonomy, evolvability, scalability, resilience, traceability, replayability
...more importantly...
BEING EVENT-FIRST CHANGES HOW YOU THINK ABOUT WHAT YOU ARE BUILDING
Events versus commands
Events:
- Fact
- Something happened
- Immutable
- When
- Behavioural
Commands:
- Intent
- Contract
- Do something
- Coupling
- Structural
EVENTS ARE USED TO MODEL BEHAVIOUR IN A DOMAIN
EVENT-SOURCING CAPTURES THAT BEHAVIOUR
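The distinction between events and commands can be sketched in a few lines. This is a minimal illustration, not code from the talk: the names `BidPlaced`, `PlaceBid`, `handle` and the `minimum_bid` rule are hypothetical, chosen only to show an immutable fact next to an intent that may be refused.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import List


@dataclass(frozen=True)
class BidPlaced:
    """Event: an immutable fact about something that happened, with a 'when'."""
    user: int
    item: int
    amount: int
    occurred_at: str


@dataclass
class PlaceBid:
    """Command: an intent, addressed to a handler that may refuse it."""
    user: int
    item: int
    amount: int


def handle(command: PlaceBid, minimum_bid: int, now: str) -> List[BidPlaced]:
    """Accepting a command records an event; refusing it records nothing."""
    if command.amount < minimum_bid:
        return []
    return [BidPlaced(command.user, command.item, command.amount, now)]


# Hypothetical values, loosely based on the bid event shown later in the deck.
accepted = handle(PlaceBid(user=100, item=389, amount=125), minimum_bid=100,
                  now="2018-10-16T10:00:00Z")
refused = handle(PlaceBid(user=100, item=389, amount=50), minimum_bid=100,
                 now="2018-10-16T10:00:05Z")
```

The command couples the sender to one handler and its contract, while the event can be read by any number of consumers after the fact.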
What is a company?
A business is a series of events and reactions to those events.
All your data is a stream of events
Events and data pipelines: Events as data
Diagram labels: Databases; Customer Data Updates; Unified 360 Merged Customer Profiles; “Who bought what” Events
Events, Streams, Partitions, Tables
Diagram labels: producers; brokers; consumer; stream processors (KSQL, Kafka Streams); CDC streams
Turning the database inside out
• Model the transaction log as a Stream
• Use stream processors to materialize tables
• Scale out using many stream processors
• Record every fact in a log (Kafka!)
• Replay every fact from the log
Kafka: the log
M. Kleppmann, 2015: Turning the database inside out
Diagram labels: Stream processor (x4); materialized view (x4)
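As a rough sketch of "use stream processors to materialize tables" and "replay every fact from the log", the snippet below folds a change log into a table. The log here is a plain Python list standing in for a Kafka topic, and the `put`/`delete` operations and item keys are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[str, str, Optional[dict]]  # (operation, key, value)

# Hypothetical change log; in the talk's architecture this would be a topic.
log: List[Change] = [
    ("put", "item-389", {"title": "mtb", "reserve": 100}),
    ("put", "item-390", {"title": "road bike", "reserve": 250}),
    ("put", "item-389", {"title": "mtb", "reserve": 120}),
    ("delete", "item-390", None),
]


def materialize(changes: List[Change], upto: Optional[int] = None) -> Dict[str, dict]:
    """Replay the first `upto` changes (all of them if None) into a table."""
    table: Dict[str, dict] = {}
    for operation, key, value in changes[:upto]:
        if operation == "put":
            table[key] = value
        elif operation == "delete":
            table.pop(key, None)
    return table


current_view = materialize(log)
earlier_view = materialize(log, upto=2)
```

Because the table is derived, it can be dropped and rebuilt at any time, or rebuilt as of an earlier point in the log.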
You need both tables and streams
Streaming correctness (stateful)
• Preservation of order
• Support late arriving data
• Tolerance for out-of-sequence data
• Support machine failure scenarios
• Exactly once --- plus the ability to reason about time
{sensor:100}
{sensor:150}
{sensor:190}
Stream
processor
Resilient state storage
{delta:50}
{delta:50}
{delta:40}
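A minimal sketch of why the stateful case depends on order: each delta needs the previous reading, so the processor must keep state and see readings in sequence. The starting state of 50 is an assumption (the slide does not state it); the `state` dict stands in for the resilient state storage.

```python
from typing import Dict, Iterable, Iterator


def deltas(readings: Iterable[dict], state: Dict[str, int]) -> Iterator[dict]:
    """Emit the change since the previous reading, remembering the last value."""
    for reading in readings:
        current = reading["sensor"]
        previous = state.get("last", current)
        state["last"] = current
        yield {"delta": current - previous}


state = {"last": 50}  # assumed earlier reading, not given on the slide
in_order = list(deltas([{"sensor": 100}, {"sensor": 150}, {"sensor": 190}], state))

# The same readings arriving out of sequence produce different deltas.
out_of_order = list(
    deltas([{"sensor": 150}, {"sensor": 100}, {"sensor": 190}], {"last": 50})
)
```

With the assumed starting state, the in-order run yields deltas of 50, 50 and 40, while the reordered run yields 100, -50 and 90 from the same data.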
Streaming correctness (stateless)
Stream
processor
{sensor:100}
{sensor:150}
{sensor:190} good stream
bad stream
Data Modelling
Think events not commands,
-> streams of events,
-> series of streams to model the domain
{
user: 100
type: bid
item: 389
cat: bikes/mtb
region: dc-east
}
/bikes/mtb by item-id
/bikes/ by dc-east‘-’item-id
- Keyspace
- Throughput (events per sec)
- Throughput (historic)
- Parallelism
- Replicas
- Retention
- Data evolution
Data Modelling
{
user: 100
type: bid
item: 389
cat: bikes/mtb
region: dc-east
}
/bikes/mtb by item-id
- Keyspace
- Throughput (events per sec)
- Throughput (historic)
- Parallelism
- Replicas
- Retention
- Data evolution
key#
partition
topic
key space
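The key, partition and topic relationship can be illustrated with a small sketch. It assumes a CRC32 hash as a stand-in (Kafka's Java client hashes keys with murmur2 by default), a hypothetical partition count of 12, and a hypothetical `compound_key` helper mirroring the region-plus-item key shown on the previous slide.

```python
import zlib


def compound_key(region: str, item_id: int) -> str:
    """Build a 'region-item' key, e.g. 'dc-east-389'."""
    return f"{region}-{item_id}"


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition; CRC32 is a stand-in hash for illustration."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


event = {"user": 100, "type": "bid", "item": 389, "cat": "bikes/mtb", "region": "dc-east"}

# Keying by item-id keeps all bids for one item in one partition, in order.
by_item = partition_for(str(event["item"]), num_partitions=12)

# Keying by region and item spreads the key space differently.
by_region_item = partition_for(compound_key(event["region"], event["item"]), num_partitions=12)
```

The choice of key fixes both the ordering guarantee (per key) and the available parallelism (per partition), which is why keyspace and throughput appear together in the list above.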
Stateless versus Stateful
Stateful
- Window
- Aggregate
- Join (stream-table, stream-stream etc)
Think: remembers events!
Stateless
- Filtering
- Transform
- Projection
Think: fire and forget!
/stream-1
/stream-2
join
Stream
processor
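The contrast can be shown with a short sketch: the filter and projection look at one event at a time, while the aggregate has to remember what it has seen. The function and class names are hypothetical, and a dict stands in for a real state store.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator


def bids_only(events: Iterable[dict]) -> Iterator[dict]:
    """Stateless filter: each event is judged on its own."""
    return (e for e in events if e["type"] == "bid")


def project(events: Iterable[dict]) -> Iterator[dict]:
    """Stateless projection: keep only the fields downstream needs."""
    return ({"item": e["item"], "user": e["user"]} for e in events)


class BidsPerItem:
    """Stateful aggregate: the running count must survive between events."""

    def __init__(self) -> None:
        self.counts: Dict[int, int] = defaultdict(int)

    def process(self, event: dict) -> dict:
        self.counts[event["item"]] += 1
        return {"item": event["item"], "bids": self.counts[event["item"]]}


events = [
    {"user": 100, "type": "bid", "item": 389},
    {"user": 101, "type": "view", "item": 389},
    {"user": 102, "type": "bid", "item": 389},
]
aggregate = BidsPerItem()
running = [aggregate.process(e) for e in project(bids_only(events))]
```

Losing the stateless stages costs nothing on restart; losing `BidsPerItem.counts` means replaying the stream to rebuild it.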
Winning with Event-driven streaming architectures
the secret sauce
the log, event sourcing, source of truth, CQRS, event collaboration, replayability, at-least once, exactly once, evolutionary architectures, data-virtualization and more
FaaS and Stream processing
{...}
What is FaaS?
- Bring your own compute
- Elastic
- Pay per use
- Stateless
- Cloud native
Any language, concurrent, sync or async
{...}
FaaS apps
1. Single site, events in any order; async chained, stateless
2. Non-stream oriented, non-time critical
3. Edge processing (in or out)
4. Enrich incoming events (stateless ETL) or outgoing: email users, enrich image, ad-hoc requests
{...}
FaaS BUTs
- Sync chaining FaaS anti-pattern
- Complicated on different platforms
- Really really granular - like micronano-services
- Not cross cloud interoperable yet (without Kafka or a serverless framework)
- Testability sucks
- Automation sucks
{...}
… but …
FaaS requires the event-driven paradigm in order to be successful.
You cannot throw away 25 years of event-driven legacy and go back to RPC (fail!)
It’s not ‘event driven’ unless you use event-sourcing
FaaS and event-sourcing
{...}
{...}
{...}
Recap:
- Events are behavioural
- FaaS is meant to be event driven
- Event-sourcing ensures replayability + others
the log the log
Event sourcing:
How did we get here?
FaaS and streaming correctness
{...}
FaaS qualities
- Elastic
- Concurrency
- Sync or async
- Short-lived
{...}
{...}
{...}
bad stream
FaaS for Stream processing
{...}
Needs...
- throughput
- ordering
- concurrency
- async
Works when...
- throughput per stream > invocation
- stateless
- not-historic processing
- async
Otherwise...
- pin streams to individual faas processors
- only 1 faas per stream
- problematic for scale :(
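The ordering concern can be illustrated with a deterministic simulation rather than real FaaS calls. Everything here is an assumption made for the sketch: one invocation per event, hypothetical per-invocation latencies, and output emitted in completion order.

```python
from typing import Dict, List


def concurrent_invocations(events: List[int], latency: Dict[int, int]) -> List[int]:
    """One invocation per event, all in flight at once.

    Event i arrives at tick i and finishes at tick i + latency[event], so the
    output order is the completion order, not the arrival order.
    """
    finished = sorted((tick + latency[e], tick, e) for tick, e in enumerate(events))
    return [e for _, _, e in finished]


def pinned_processor(events: List[int]) -> List[int]:
    """A single processor pinned to the stream handles events one at a time."""
    return list(events)


readings = [100, 150, 190]
latency = {100: 5, 150: 1, 190: 1}  # hypothetical per-invocation durations
bad_stream = concurrent_invocations(readings, latency)
good_stream = pinned_processor(readings)
```

Pinning restores order at the price of parallelism within that stream, which is the scaling problem noted above.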
FaaS and the auction platform
- Enrich users on signup (address validation)
- Geo-enrich items on placement (city, state, lat-lon cell identification)
- Notify users on item sold or reserve not met
- Perform analytics on auction when item ‘completes’
- Notify user of items-of-interest from their history when browsing
- Ad-placement analytics (watched items, interested items, users-purchased)
- Monte-carlo auction simulations to guide users and calc item trending scores
In-band but edge
Out of band, edge
ad-hoc
Patterns
Patterns for Infrastructure
1. Ops: (Observability, instrumentation, metrification) = monitoring patterns, dead-letter-queues, error-queues, audit-logs, application-logging, data-lineage
2. Stream-based: Worker Queue (compute grid or faas), event-backbone
3. Data-based: K/V store, Queryable Data fabric, Data virtualization (via connect cdc streams) → Data Fabric
4. FaaS: fire and forget, event sourcing, unit-of-work
Patterns for Infrastructure: Ops
Observability, instrumentation, metrification
dead-letter-queues, error-queues, audit-logs, application-logging, data-lineage
Stream
processor
/dead-letter/bid/region/processor
/auction/region/bid
/audit/sec-ops/category/region/
/ops/logs/category/region/
Ops Queues
/ops/metrics/elastic_search
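A dead-letter queue can be sketched as follows: events that cannot be processed are diverted with their error instead of stopping the processor. The topic name is taken from the slide; the parsing rule, the `process_bids` function and the list standing in for the topic are assumptions.

```python
from typing import Iterable, List

DEAD_LETTER = "/dead-letter/bid/region/processor"  # topic name from the slide


def process_bids(raw_events: Iterable[dict], dead_letters: List[dict]) -> List[dict]:
    """Parse bids; anything unparseable goes to the dead-letter list."""
    parsed = []
    for raw in raw_events:
        try:
            bid = {"item": raw["item"], "amount": int(raw["amount"])}
        except (KeyError, TypeError, ValueError) as error:
            dead_letters.append(
                {"topic": DEAD_LETTER, "event": raw, "error": type(error).__name__}
            )
            continue
        parsed.append(bid)
    return parsed


dead_letters: List[dict] = []
bids = process_bids(
    [{"item": 389, "amount": "125"}, {"item": 389}, {"item": 390, "amount": "lots"}],
    dead_letters,
)
```

Keeping the original event alongside the error makes it possible to fix the processor and replay the dead-letter topic later.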
Patterns for Infrastructure: Stream based
Events as a backbone: “digital nervous system”
Dept 1 Dept 2 Dept 3 Dept 4
WARNING: Avoid ESB anti-patterns
Patterns for Infrastructure: Stream based
Worker Queue (compute grid or faas)
Stream
processor
/priority-10
/priority-9
/priority-8
worker (x6)
Worker Queue
millions of atomic events with differing priority
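One way to read the worker-queue diagram is that workers always drain the highest-priority topic that has work. The sketch below assumes that policy; the topic names follow the slide, the task strings are invented, and deques stand in for topics.

```python
from collections import deque
from typing import Deque, Dict, Optional


def priority_of(topic: str) -> int:
    """Read the number out of a topic name such as '/priority-10'."""
    return int(topic.rsplit("-", 1)[1])


def next_task(queues: Dict[str, Deque[str]]) -> Optional[str]:
    """Take the next task from the highest-priority non-empty queue."""
    for topic in sorted(queues, key=priority_of, reverse=True):
        if queues[topic]:
            return queues[topic].popleft()
    return None


queues = {
    "/priority-8": deque(["recalculate trends"]),
    "/priority-10": deque(["notify winner", "notify seller"]),
    "/priority-9": deque(["geo-enrich item"]),
}

drained = []
while (task := next_task(queues)) is not None:
    drained.append(task)
```

A strict policy like this can starve low-priority topics under sustained load, so a real deployment would likely add some fairness rule.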
Patterns for Infrastructure: Stream based
Payment processing flow: millions of concurrent payments
Diagram labels: /payment (*from, to); DEBIT check [from]; /payment-inflight [from, to]; /payment (from, *to); CREDIT [to]; /payment-complete [from, to]; /payment (from, to); Balance in-flight; Balance conf’d
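The debit, in-flight, credit sequence can be sketched as two steps connected by an in-flight queue. This is one possible reading of the diagram, not the author's implementation: the account names and amounts are invented, and lists stand in for the `/payment-inflight` and `/payment-complete` topics.

```python
from typing import Dict, List


def debit(payment: dict, balances: Dict[str, int], inflight: List[dict]) -> bool:
    """DEBIT step: check and reduce the sender's balance, then mark in-flight."""
    if balances.get(payment["from"], 0) < payment["amount"]:
        return False
    balances[payment["from"]] -= payment["amount"]
    inflight.append(payment)
    return True


def credit(inflight: List[dict], balances: Dict[str, int], complete: List[dict]) -> None:
    """CREDIT step: move every in-flight payment to the receiver and complete it."""
    while inflight:
        payment = inflight.pop(0)
        balances[payment["to"]] = balances.get(payment["to"], 0) + payment["amount"]
        complete.append(payment)


balances = {"alice": 100, "bob": 20}
inflight: List[dict] = []
complete: List[dict] = []

ok = debit({"from": "alice", "to": "bob", "amount": 60}, balances, inflight)
rejected = debit({"from": "bob", "to": "alice", "amount": 500}, balances, inflight)
in_flight_total = sum(p["amount"] for p in inflight)
credit(inflight, balances, complete)
```

Between the two steps the money exists only as an in-flight record, so confirmed balances plus in-flight amounts always add up to the original total.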
Patterns for Infrastructure: Data
K/V store, Queryable Data fabric,
Stream
processor
/stream-1
/stream-2
/stream-3
worker
worker
worker
worker
worker
Clients
Queryable data fabric
Stream
processor
query
interactive
query
Patterns for Infrastructure: Data
Data virtualization (via connect cdc streams) → Data Fabric
Stream
processor
/items
/users
/bids
worker
worker
worker
worker
worker
Client
requests
Kafka connect: data virtualization
Stream
processor
items
users
bids
Patterns for Infrastructure: FaaS
Fire-and-forget
FaaS Connector
/items
{...}
{...}
{...}
cloud-events
FaaS
WORK IN PROGRESS
AWS Lambda connector
Patterns for Infrastructure: FaaS
Event-sourcing
FaaS Connector
/items
{...}
{...}
{...}
cloud-events
FaaS
WORK IN PROGRESS
/item-analytics
Patterns for Infrastructure: FaaS
/items
Unit-of-work
{...}
{...}
{...}
‘N’
cloud-events
FaaS
(calculation)
WORK IN PROGRESS
/item-analytics
{...}
FaaS
(aggregate)
Building the Auction Platform
Auction functionality
Search, Bid, Item-Complete, marketplace analytics
● Search: Find an item to bid against
● Bid: Compete against others to win the auction
● Item-Complete: When the time has expired tell users to pay or relist
● Marketplace analytics: Educate users in order to drive competition
Auction functionality: BID
search, bid, item-complete, analytics
1. Items being placed (auction items)
2. Bids against item ‘duran duran’: stream-table join against ‘duran duran’
3. Item ‘duran duran’
4. CDC Stream
{ item:duran res: 100 bid: 125 buyer: michael }
{ item:duran res: 100 bid: 120 buyer: nick }
{ item:duran res: 100 bid: 121 buyer: damian }
{ item:duran res: 100 bid: 115 buyer: andy }
{ item:duran res: 100 bid: 110 buyer: andy }
KSQL: SELECT * FROM bidding_stream WHERE item = 'duran duran';
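The stream-table join in the bid flow can be sketched with a dict as the item table and a list as the bid stream. Field names (`res`, `bid`, `buyer`) follow the records on the slide; the inner-join behaviour and the "unknown item" bid are assumptions for the example.

```python
from typing import Dict, Iterable, Iterator

# Table side: the latest known state of each item.
item_table: Dict[str, dict] = {"duran duran": {"res": 100}}

# Stream side: bids in arrival order.
bid_stream = [
    {"item": "duran duran", "bid": 110, "buyer": "andy"},
    {"item": "duran duran", "bid": 115, "buyer": "andy"},
    {"item": "unknown item", "bid": 5, "buyer": "nick"},
    {"item": "duran duran", "bid": 121, "buyer": "damian"},
]


def join_bids(bids: Iterable[dict], items: Dict[str, dict]) -> Iterator[dict]:
    """Enrich each bid with its item's reserve; drop bids with no matching item."""
    for bid in bids:
        item = items.get(bid["item"])
        if item is None:
            continue
        yield {"item": bid["item"], "res": item["res"], "bid": bid["bid"], "buyer": bid["buyer"]}


joined = list(join_bids(bid_stream, item_table))
highest = max(joined, key=lambda row: row["bid"])
```

Each joined record carries both the reserve and the bid, which is the shape of the records shown in the bidding stream.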
Auction functionality: SEARCH
search, bid, item-complete, analytics
1. Search: Identify set of topics (topic: /auction/records/80s)
2. Locate KTables
3. Table: Materialized view
4. KStream: Interactive query
Auction functionality: ITEM COMPLETE
search, bid, Item-complete, analytics
1. Stream processor runs ‘future’ on local-state of item-bids
2. Rewrite ‘completed’ item status to retrigger join
3. Rejoined removes item
4. Stream triggers FaaS complete item processor
5. FaaS Notify all bidders
Diagram labels: auction items; item status; bid-history; Stream table join; bid-notifications
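A compressed sketch of the item-complete flow: find items whose time has expired, rewrite their status, and produce one notification per bidder. It assumes integer timestamps, a `closes_at` field, and the outcome labels shown; none of these come from the slides, and the stream processor and FaaS stages are collapsed into one function.

```python
from typing import Dict, List


def complete_items(items: Dict[str, dict], bid_history: Dict[str, List[dict]],
                   now: int) -> List[dict]:
    """Mark expired items completed and build one notification per bidder."""
    notifications = []
    for item_id, item in items.items():
        if item["status"] != "open" or item["closes_at"] > now:
            continue
        item["status"] = "completed"  # the rewritten item status
        bids = bid_history.get(item_id, [])
        best = max(bids, key=lambda b: b["bid"], default=None)
        sold = best is not None and best["bid"] >= item["res"]
        for buyer in sorted({b["buyer"] for b in bids}):
            if not sold:
                outcome = "reserve not met"
            elif buyer == best["buyer"]:
                outcome = "won"
            else:
                outcome = "lost"
            notifications.append({"item": item_id, "to": buyer, "outcome": outcome})
    return notifications


items = {"duran duran": {"status": "open", "closes_at": 10, "res": 100}}
bid_history = {"duran duran": [
    {"bid": 110, "buyer": "andy"}, {"bid": 115, "buyer": "andy"},
    {"bid": 121, "buyer": "damian"}, {"bid": 120, "buyer": "nick"},
    {"bid": 125, "buyer": "michael"},
]}
first_pass = complete_items(items, bid_history, now=10)
second_pass = complete_items(items, bid_history, now=11)
```

Because completion rewrites the item status, a second pass finds nothing to do, so bidders are notified once.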
Auction functionality: ANALYTICS
search, bid, item-complete, analytics
How is the item trending? Banding on condition (new, as-new, used, worn, for-parts)
Indicative pricing bands: reserve versus final bid?
What’s the usual bid-offer spread?
Frequency of sale?
Percentiles?
{...}
{...}
{...}
{ bid:100; sold:1000;}
/bid-history
{
stream-lib.TDigest(values[])
}
{
stream-lib.TDigest(values[])
}
{
stream-lib.TDigest(values[])
}
Calculating percentiles using the ‘unit-of-work’ pattern
{
digest.merge(digest)
}
{...}
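The unit-of-work idea for percentiles can be sketched as: compute a mergeable summary per chunk, then merge the summaries. The slides use stream-lib's TDigest; the `ExactDigest` class below is a deliberately simple stand-in that keeps every value (so it is exact but not space-efficient), and the chunking of bids is invented for the example.

```python
import math
from functools import reduce
from typing import Iterable, List


class ExactDigest:
    """Mergeable summary that keeps all values; a stand-in for an approximate digest."""

    def __init__(self, values: Iterable[float] = ()) -> None:
        self.values: List[float] = sorted(values)

    def merge(self, other: "ExactDigest") -> "ExactDigest":
        return ExactDigest(self.values + other.values)

    def quantile(self, q: float) -> float:
        """Nearest-rank quantile for 0 < q <= 1."""
        if not self.values:
            raise ValueError("empty digest")
        rank = max(1, math.ceil(q * len(self.values)))
        return self.values[rank - 1]


def calculate(unit: Iterable[float]) -> ExactDigest:
    """Calculation step: one digest per unit of work."""
    return ExactDigest(unit)


def aggregate(digests: Iterable[ExactDigest]) -> ExactDigest:
    """Aggregate step: merge the per-unit digests into one."""
    return reduce(lambda left, right: left.merge(right), digests, ExactDigest())


units = [[110, 115], [120, 121], [125]]  # hypothetical chunks of bid history
merged = aggregate(calculate(unit) for unit in units)
median = merged.quantile(0.5)
```

The calculation steps are independent and can run as separate function invocations; only the aggregate step needs to see all the partial results.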
Auction functionality: SYSTEM
Stream processing and FaaS
/auction/items
{...}
{ item:100;}
Stream
processor
1. Items
{ bid:100;}
{...} Stream
processor
/auction/bids
2. Bidding
{...}
notify bidders & seller
{ item:100; offer:101}
/auction/bids/history
{...}
analytics for bidders
3. Processing
Wrapping it up!
Key takeaways:
● Model: Events as the API, model the use-cases, model for scale, evolve the data-model: DDD
● As Streams: Streams are the database, tables are materialized views, architect for evolutionary apps by using events
● App Infra: Build patterns to underpin higher order models (metrics, k/v etc)
● FaaS: Ad-hoc and edge stream processing or pinned processors
Key takeaways:
Event-first forces you to think about behaviour of the system
Event sourcing captures that behaviour
Serverless and Streaming: Building ‘eBay’ by ‘Turning the Database Inside Out’
The future
● Stream processing: more powerful = KStreams, KSQL, UDFs
● Streaming platform: more accessible, complete experience, cross-cloud
● Closer affinity: between Streaming and FaaS for streaming processing
● FaaS: CNCF working group: CloudEvents, Event function flow
Thank You!
Check out my ‘serverless and stream processing’ blog via twitter: @avery_neil
As of September 2018, eBay announced that it was replatforming onto Kafka
Kafka Summit - New York (April 2) - CFP soon
Kafka Summit - London (May 13-14)
Rate this talk on the Summit-App ;)
Kafka Summit SF 2018 - Download the Mobile App
● Search App store for “Kafka Summit”
● https://guidebook.com/g/kafkasummitsf
● See speakers and schedules
● Personalize your agenda
● Rate speakers and sessions!
● Network with fellow attendees
● Share comments and photos on social wall
● Turn on notifications to receive up to date info