Building Real-time
Data-driven Products
Øyvind Løkling & Lars Albertsson
Version 1.3 - 2016.10.12
Øyvind Løkling
Staff Software Engineer
Schibsted Product & Technology
Oslo - Stockholm - London - Barcelona - Krakow
http://jobs.schibsted.com/
Lars Albertsson
Independent Consultant
Mapflat.com
.. and more
Presentation goals
● Spark your interest in building data-driven products.
● Give an overview of components and how these relate.
● Suggest technologies and approaches that can be used in practice.
● Event Collection
● The Unified Log
● Stream Processing
● Serving Results
● Schema
Data-driven products
• Services and applications
primarily driven by capturing
and making sense of data
• Health trackers
• Recommendations
• Analytics
CC BY © https://500px.com/tommrazek
Data-driven products
• Hosted services need to
• Handle large volumes of data
• Clean and structure data
• Serve individual users fast
CC BY © https://500px.com/tommrazek
Big Data, Fast Data, Smart Data
• Accelerating data volumes and
speeds
• Internet of Things
• A/B testing and experiments
• Personalised products
CC BY © https://500px.com/erfgorini
Big Data, Fast Data, Smart Data
A need to make sense of data and
act on it fast
• Faster development cycle
• Data-driven organisations
• Data-driven products
CC BY © https://500px.com/erfgorini
Time scales - what parts become obsolete?
Credits: Confluent
Credits: Netflix
The Log
• Other common names
• Commit log
• Journal
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Jay Kreps - I <3 Logs
The state machine replication principle:
If two identical, deterministic processes begin in
the same state and get the same inputs in the
same order, they will produce the same output
and end in the same state.
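A minimal Python sketch of the principle, assuming a hypothetical event shape `(operation, key, value)` and a made-up `apply_event` transition function: replaying the same log from the same empty state always yields the same state, while changing the order of the events can change the result.

```python
from typing import Dict, List, Tuple

# Hypothetical event shape: (operation, key, value).
Event = Tuple[str, str, int]


def apply_event(state: Dict[str, int], event: Event) -> Dict[str, int]:
    """Deterministic transition: the result depends only on the old state and the event."""
    op, key, value = event
    new_state = dict(state)
    if op == "set":
        new_state[key] = value
    elif op == "add":
        new_state[key] = new_state.get(key, 0) + value
    else:
        raise ValueError(f"unknown operation: {op}")
    return new_state


def replay(log: List[Event]) -> Dict[str, int]:
    """Start from the same (empty) state and apply the events in log order."""
    state: Dict[str, int] = {}
    for event in log:
        state = apply_event(state, event)
    return state


log = [("set", "alice", 100), ("add", "alice", -30), ("set", "bob", 5)]
replica_a = replay(log)
replica_b = replay(log)
print(replica_a == replica_b)  # True: same inputs, same order, same state
```

Because `"set"` and `"add"` do not commute, replaying the same events in reverse order ends in a different state, which is why the log's ordering matters as much as its content.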
The Unified Log
Simple idea: all of the organisation's data,
available in one unified log that is:
• Unified: the one source of truth
• Append only: Data items are immutable
• Ordered: records are addressable by an offset that is unique per partition
• Fast and scalable: able to handle thousands of messages per second
• Distributed, robust and fault tolerant.
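The properties above can be sketched as a toy in-memory partitioned log. This is an illustration, not Kafka's implementation: the `PartitionedLog` class is hypothetical, and CRC32 is a stand-in for whatever hash a real partitioner uses. Records are immutable, appends return a `(partition, offset)` address, and records with the same key keep their relative order because they always land in the same partition.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Record:
    """Frozen dataclass: a record cannot be modified once appended."""
    key: str
    value: str


@dataclass
class PartitionedLog:
    """Toy append-only log with offsets that are unique per partition."""
    num_partitions: int
    partitions: List[List[Record]] = field(init=False)

    def __post_init__(self) -> None:
        self.partitions = [[] for _ in range(self.num_partitions)]

    def partition_for(self, key: str) -> int:
        # Stable hash: the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return its address as (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append(Record(key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int) -> List[Record]:
        """Return all records in `partition` from `offset` onwards."""
        return self.partitions[partition][offset:]
```

Ordering is only guaranteed within a partition, so the key choice matters: events that must be processed in order should share a key.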
Kafka
• “Kafka is used for
building real-time data
pipelines and streaming
apps. It is horizontally
scalable, fault-tolerant,
wicked fast, and runs in
production in
thousands of
companies.”
But… :-) http://www.confluent.io/kafka-summit-2016-101-ways-to-configure-kafka-badly
© 2016 Apache Software Foundation
A streaming data product’s anatomy
Figure: Ingress (DB, service) -> Pub/sub unified log (topics) -> Stream processing (jobs forming a pipeline) -> Egress (service, export, visualisation, DB)
Architectural patterns
The Unified Log and Lambda, Kappa architectures
Lambda Architecture
Example: Recommendation Engine @ Schibsted
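As a rough sketch of the Lambda idea, assuming a hypothetical stream of `(timestamp, item_id)` view events like those a recommendation engine might count: the batch layer recomputes a complete but stale view up to a cutoff, the speed layer incrementally counts what came after, and queries merge the two. The function and class names are illustrative only.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event shape: (timestamp, item_id).
Event = Tuple[int, str]


def batch_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute view counts from scratch for all events before `cutoff`."""
    return Counter(item for ts, item in events if ts < cutoff)


class SpeedLayer:
    """Speed layer: incrementally count the events the last batch run did not cover."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.counts: Counter = Counter()

    def on_event(self, event: Event) -> None:
        ts, item = event
        if ts >= self.cutoff:
            self.counts[item] += 1


def query(batch: Counter, speed: SpeedLayer, item: str) -> int:
    """Serving layer: merge the complete-but-stale batch view with the fresh speed view."""
    return batch[item] + speed.counts[item]


events = [(1, "a"), (2, "b"), (3, "a"), (10, "a")]
batch = batch_view(events, cutoff=5)
speed = SpeedLayer(cutoff=5)
for e in events:
    speed.on_event(e)
print(query(batch, speed, "a"))  # 3
```

The counting logic exists twice, once per layer; keeping the two in step is the cognitive overhead of maintaining two approaches in parallel.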
Kappa Architecture
A software architecture pattern… where the canonical data
store is an append-only immutable log. From the log, data is
streamed through a computational system and fed into
auxiliary stores for serving.
A Lambda architecture system with batch processing
removed.
https://www.oreilly.com/ideas/questioning-the-lambda-architecture
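A minimal sketch of reprocessing in a Kappa setup, roughly following the approach in the linked article: instead of a batch layer, a new version of the job replays the log from offset 0 into a new output table, and readers are switched over once it has caught up. The job versions, table names and event shape are hypothetical.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]  # hypothetical event shape, e.g. {"page": "Home"}


def run_job(
    log: List[Event], key_fn: Callable[[Event], str], from_offset: int = 0
) -> Dict[str, int]:
    """A streaming job: consume the log from an offset and build a table of counts."""
    table: Dict[str, int] = {}
    for event in log[from_offset:]:
        key = key_fn(event)
        table[key] = table.get(key, 0) + 1
    return table


def page_v1(event: Event) -> str:
    """Job version 1: count views per raw page name."""
    return event["page"]


def page_v2(event: Event) -> str:
    """Job version 2 (a bug fix): normalise page names before counting."""
    return event["page"].strip().lower()


log: List[Event] = [{"page": "Home"}, {"page": "home "}, {"page": "About"}]

# The running v1 job has been serving from its own table.
tables = {"page_counts_v1": run_job(log, page_v1)}
# Reprocess: run v2 from offset 0 into a new table ...
tables["page_counts_v2"] = run_job(log, page_v2, from_offset=0)
# ... then point readers at the new table; the v1 job and table can be retired.
active = "page_counts_v2"
print(tables[active])  # {'home': 2, 'about': 1}
```

This only works if the log retains enough history to replay, which is where cold storage or long retention comes in.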
Kappa architecture
Lambda vs Kappa
● Lambda
○ Leverages existing batch infrastructure
○ Cognitive overhead maintaining two approaches in parallel
● Kappa
○ Real-time processing is inherently approximate, less powerful, and more
lossy than batch processing. True?
○ Simpler model
https://www.oreilly.com/ideas/questioning-the-lambda-architecture
Cold Storage
• Source of truth for replay in case of failure
• Available for ad-hoc batch querying (Spark)
• Wishlist: fast writes, reliable, cheap
• Cloud storage: S3 (with Glacier)
• Traditional: HDFS, SAN
• Consider what read performance you need for
a) error recovery
b) bootstrapping new deployments
Capturing the data
Event collection
Figure: Service -> Kafka (unified log) -> Secor / Camus -> Cluster storage
• Service: reliable, simple, write available
• Kafka: event bus with history
• Unified log: immutable events, append-only, source of truth
• Cluster storage: HDFS (NFS, S3, Google CS, C*)
Immediate handoff to append-only replicated log.
Once in the log, events eventually arrive in storage.
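A sketch of this flow with in-memory stand-ins (a list for the log, a dict of paths for cluster storage): the service only appends to the log, and a separate archiver, in the spirit of Secor or Camus but not their actual behaviour or file layout, copies records into date-partitioned files and remembers how far it has got. The path layout and function names are assumptions.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List

# In-memory stand-ins: a list for the unified log, a dict of path -> lines for storage.
log: List[dict] = []
storage: Dict[str, List[str]] = {}


def collect(event: dict) -> None:
    """Service side: immediate handoff to the append-only log, nothing else."""
    log.append(event)


def archive(from_offset: int) -> int:
    """Copy log records from `from_offset` into date-partitioned files.

    Returns the offset to resume from next time, so the archiver can lag
    behind the log and catch up later.
    """
    for event in log[from_offset:]:
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).strftime("%Y-%m-%d")
        path = f"events/date={day}/part-00000.jsonl"  # hypothetical layout
        storage.setdefault(path, []).append(json.dumps(event, sort_keys=True))
    return len(log)
```

Keeping the service's job this small is what makes it reliable and write available; everything slower happens behind the log.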
Event collection - guarantees
Figure: Service (unimportant) and Service (important) both hand events to replicated topics in the unified log. Events are safe from there.
Non-critical data: asynchronous fire-and-forget handoff
Critical data: synchronous, replicated, with acknowledgement
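The two handoff styles can be sketched as follows. The classes are hypothetical and loosely mirror the trade-off behind Kafka's producer `acks` setting (for example `acks=0` versus `acks=all`) without modelling Kafka itself: fire-and-forget enqueues and returns immediately, while the synchronous send only returns once every replica holds the event.

```python
import queue
import threading
from typing import List


class ReplicatedTopic:
    """Toy topic with several replicas; a write is acknowledged once all replicas have it."""

    def __init__(self, replicas: int) -> None:
        self.replicas: List[List[str]] = [[] for _ in range(replicas)]
        self._lock = threading.Lock()

    def write(self, event: str) -> bool:
        with self._lock:
            for replica in self.replicas:
                replica.append(event)
        return True  # the acknowledgement


class Producer:
    def __init__(self, topic: ReplicatedTopic) -> None:
        self.topic = topic
        self.outbox: "queue.Queue[str]" = queue.Queue()
        # Background thread that forwards fire-and-forget events to the topic.
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self) -> None:
        while True:
            event = self.outbox.get()
            self.topic.write(event)
            self.outbox.task_done()

    def send_fire_and_forget(self, event: str) -> None:
        """Non-critical data: return at once; the event is lost if the process dies first."""
        self.outbox.put(event)

    def send_sync(self, event: str) -> bool:
        """Critical data: block until the event is on every replica and acknowledged."""
        return self.topic.write(event)
```

The synchronous path costs latency on every call; the asynchronous path trades that latency for a window in which buffered events can be lost.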
Event collection
• Source Types
• Firehose APIs
• Mobile apps and websites
• Internet of things / embedded sensors
• Event sourcing from existing systems
Event collection
• Considerations
• Can you control the data flow, e.g. ask the sender to wait?
• Can clients be expected to have their logic updated?
• Can you afford to lose some data, make trade-offs and
still solve your task?
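Two of these answers can be sketched as overflow policies for a bounded buffer in front of the log (a hypothetical `Collector`, built on Python's standard `queue.Queue`): if you can ask the sender to wait, block until there is room (back pressure); if you can afford to lose data, drop and count.

```python
import queue


class Collector:
    """Bounded buffer between senders and the log, with two overflow policies."""

    def __init__(self, capacity: int, block_when_full: bool) -> None:
        self.buffer: "queue.Queue[dict]" = queue.Queue(maxsize=capacity)
        self.block_when_full = block_when_full
        self.dropped = 0

    def offer(self, event: dict, timeout: float = 1.0) -> bool:
        if self.block_when_full:
            # Back pressure: make the sender wait (up to `timeout`) for free space.
            try:
                self.buffer.put(event, timeout=timeout)
                return True
            except queue.Full:
                return False  # the sender should retry later
        try:
            self.buffer.put_nowait(event)
            return True
        except queue.Full:
            # Lossy: drop the event and count it, trading completeness for availability.
            self.dropped += 1
            return False
```

Blocking only helps if senders can actually wait and retry; clients you cannot update, such as old mobile app versions, often leave the lossy policy as the only option.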
Stream Processing
Pipeline graph
• Parallelised jobs read and write to Kafka topics
• Jobs are completely decoupled
• Downstream jobs do not impact upstream
• Usually an acyclic graph
CC BY © https://500px.com/thakurdalipsingh
Illustration from Apache Samza Doc - Concepts
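A sketch of how such a graph can be derived, assuming each job declares the topics it reads and writes (the job and topic names are hypothetical, and each topic is assumed to have one writer). Jobs only share topics and never call each other, so the graph is computed from topic names alone; Python's `graphlib` (3.9+) raises `CycleError` if the graph turns out not to be acyclic.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set, Tuple

# Hypothetical pipeline: job name -> (topics it reads, topics it writes).
JOBS: Dict[str, Tuple[List[str], List[str]]] = {
    "clean": (["raw_events"], ["clean_events"]),
    "enrich": (["clean_events"], ["enriched_events"]),
    "sessionize": (["enriched_events"], ["sessions"]),
    "count_views": (["enriched_events"], ["view_counts"]),
}


def upstream_jobs(jobs: Dict[str, Tuple[List[str], List[str]]]) -> Dict[str, Set[str]]:
    """Map each job to the jobs that write the topics it reads (one writer per topic)."""
    writer_of = {topic: name for name, (_, outputs) in jobs.items() for topic in outputs}
    return {
        name: {writer_of[t] for t in inputs if t in writer_of}
        for name, (inputs, _) in jobs.items()
    }


def data_flow_order(jobs: Dict[str, Tuple[List[str], List[str]]]) -> List[str]:
    """Upstream jobs before downstream ones; raises graphlib.CycleError on a cycle.

    Since jobs are decoupled through topics, this order describes data flow;
    it is not a requirement to start the jobs one after another.
    """
    return list(TopologicalSorter(upstream_jobs(jobs)).static_order())
```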
Stream processing components
● Building blocks
○ Aggregate
■ Calculate time windows
■ Aggregate state (database/in memory)
○ Filter
■ Slim down stream
■ Privacy and security concerns
○ Join
■ Enrich by joining with datasets (geoip)
○ Transform
■ Bring data into the same “shape” (schema)
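The four building blocks can be sketched as composable Python generators over dict events. Everything here is illustrative: the event fields, the tiny `GEOIP` table standing in for a real geoip dataset, and the tumbling (fixed, non-overlapping) window. A real stream processor would also need to handle late events and decide when a window is complete, which this sketch ignores.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical lookup table standing in for a geoip dataset.
GEOIP: Dict[str, str] = {"10.0.0.1": "NO", "10.0.0.2": "SE"}


def filter_events(events: Iterable[dict]) -> Iterator[dict]:
    """Filter: keep only page views and drop a field we must not pass on (email)."""
    for e in events:
        if e.get("type") == "page_view":
            yield {k: v for k, v in e.items() if k != "email"}


def transform(events: Iterable[dict]) -> Iterator[dict]:
    """Transform: one shape for all records (int timestamps, `client_ip` renamed to `ip`)."""
    for e in events:
        yield {"ts": int(e["ts"]), "ip": e.get("ip") or e.get("client_ip"), "page": e["page"]}


def join_geoip(events: Iterable[dict]) -> Iterator[dict]:
    """Join: enrich each event with the country looked up from its IP address."""
    for e in events:
        yield {**e, "country": GEOIP.get(e["ip"], "unknown")}


def tumbling_counts(events: Iterable[dict], window_s: int) -> Dict[Tuple[int, str], int]:
    """Aggregate: count events per (window start, country) in fixed, non-overlapping windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for e in events:
        window_start = e["ts"] - e["ts"] % window_s
        counts[(window_start, e["country"])] += 1
    return dict(counts)
```

The stages chain as `tumbling_counts(join_geoip(transform(filter_events(raw))), window_s=60)`; filtering first keeps the later, more expensive stages small.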
Stream Processing Platforms
• Spark Streaming
• Ideal if you already use Spark: same programming model
• Bridges the gap between data scientists and data
engineers, and between batch and stream
• Kafka Streams
• A library; new, and positions itself as a lightweight
alternative
• Tightly coupled to Kafka
• Others
○ Storm, Flink, Samza, Google Dataflow, AWS
Lambda
http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
Schemas
Schemas
• You always have a schema
• Schema on write
• Requires upfront schema design before data can be
received
• Synchronised deployment of the whole pipeline
• Schema on read
• Allows data to be captured as is.
• Suitable for a “data lake”
• Often requires cleaning and transforms downstream to bring
datasets into consistency.
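The difference can be sketched with a hypothetical two-field schema: schema on write rejects a record before it is stored, while schema on read stores the raw payload and applies parsing and cleaning when the data is read. The function names and the cleaning rule are illustrative assumptions.

```python
import json
from typing import List, Optional

# Hypothetical schema: required field name -> expected Python type.
SCHEMA = {"user_id": str, "ts": int}


def validate(record: dict) -> bool:
    return all(isinstance(record.get(name), typ) for name, typ in SCHEMA.items())


def write_with_schema(store: List[str], record: dict) -> None:
    """Schema on write: reject non-matching records before they are stored."""
    if not validate(record):
        raise ValueError(f"record does not match schema: {record!r}")
    store.append(json.dumps(record))


def write_as_is(store: List[str], raw: str) -> None:
    """Schema on read: capture the raw payload untouched, data lake style."""
    store.append(raw)


def read_with_schema(raw: str) -> Optional[dict]:
    """Schema on read: parse and clean at read time; return None for unusable data."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    ts = record.get("ts")
    if isinstance(ts, str) and ts.isdigit():
        record["ts"] = int(ts)  # a downstream cleaning step
    return record if validate(record) else None
```

With schema on write the producer pays the cost up front; with schema on read every consumer has to repeat (or share) the cleaning logic.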
Schema on read or write?
Figure: DBs and services feeding export and business intelligence, annotated “Change agility important here” and “Production stability important here”.
Schemas
• Options in streaming applications
• Schema bundled with every record
• Schema registry + id in record
• Schema formats
• JSON Schema
• Avro
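A sketch of the “schema registry + id in record” option. The in-memory `SchemaRegistry` is hypothetical; the framing (one magic byte, then a 4-byte big-endian schema id, then the payload) is loosely modelled on Confluent's Schema Registry wire format, with JSON standing in for Avro binary encoding.

```python
import json
import struct
from typing import Dict, Tuple

MAGIC = 0


class SchemaRegistry:
    """In-memory stand-in for a schema registry service: schema id <-> schema."""

    def __init__(self) -> None:
        self.by_id: Dict[int, dict] = {}

    def register(self, schema: dict) -> int:
        for sid, existing in self.by_id.items():
            if existing == schema:
                return sid  # an identical schema keeps its id
        sid = len(self.by_id) + 1
        self.by_id[sid] = schema
        return sid


def encode(schema_id: int, record: dict) -> bytes:
    """Prefix the payload with the magic byte and a 4-byte big-endian schema id."""
    return struct.pack(">BI", MAGIC, schema_id) + json.dumps(record).encode("utf-8")


def decode(registry: SchemaRegistry, data: bytes) -> Tuple[dict, dict]:
    """Look up the writer's schema by id, then parse the payload."""
    magic, schema_id = struct.unpack(">BI", data[:5])
    if magic != MAGIC:
        raise ValueError("unknown framing")
    return registry.by_id[schema_id], json.loads(data[5:].decode("utf-8"))
```

Compared with bundling the full schema in every record, the per-record overhead is five bytes, at the cost of a registry lookup (usually cached) when decoding.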
Evolving Schemas
• Declare a schema version even if there is no
guarantee. It captures the intention of the
source.
• Be prepared for bad and
non-validating data.
• Decide on a strategy for bringing schema
versions into alignment.
• Maintain an upgrade path through
transforms.
• What are the needs of the consumer?
• Data exploration vs. stable services.
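A sketch of a declared version plus an upgrade path through transforms. The schema versions and their changes are hypothetical: each step upgrades a record by one version, records are walked up to the latest version, and records that cannot be upgraded are routed aside rather than crashing the job.

```python
from typing import Callable, Dict, List, Tuple


def v1_to_v2(r: dict) -> dict:
    # Hypothetical change in v2: "name" split into "first_name" / "last_name".
    first, _, last = r.pop("name").partition(" ")
    return {**r, "first_name": first, "last_name": last, "schema_version": 2}


def v2_to_v3(r: dict) -> dict:
    # Hypothetical change in v3: "country" added with a default value.
    return {**r, "country": r.get("country", "unknown"), "schema_version": 3}


UPGRADES: Dict[int, Callable[[dict], dict]] = {1: v1_to_v2, 2: v2_to_v3}
LATEST = 3


def upgrade(record: dict) -> dict:
    """Walk the upgrade path from the declared version to the latest one."""
    version = record.get("schema_version")
    if not isinstance(version, int) or not 1 <= version <= LATEST:
        raise ValueError(f"unknown schema version: {version!r}")
    while version < LATEST:
        record = UPGRADES[version](dict(record))  # copy: never mutate the input
        version = record["schema_version"]
    return record


def process(records: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Upgrade what we can; route bad or non-validating records aside."""
    good, bad = [], []
    for r in records:
        try:
            good.append(upgrade(r))
        except (ValueError, KeyError, AttributeError):
            bad.append(r)
    return good, bad
```

The `bad` list is a natural candidate for a separate dead-letter topic, so that exploration use cases can still inspect what stable services had to skip.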
Results
Serving Results
● As streams
○ Internal Consumer
○ External Consumer bridges
■ e.g. REST POST to an external ingest endpoint
● As Views
○ Serving index, NoSQL
○ SQL / cubes for BI
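A sketch of serving results as a view: a consumer applies a stream of `(key, value)` changes, such as per-user recommendations, to a key-value table that services can query, and tracks the stream offset so it can resume. The change format (with `None` meaning delete) and the in-memory table are assumptions standing in for a serving index or NoSQL store.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical change format: (key, new value), where None means "delete the key".
Change = Tuple[str, Optional[dict]]


class ServingView:
    """Materialise a stream of changes into a key-value view that services can query."""

    def __init__(self) -> None:
        self.table: Dict[str, dict] = {}
        self.next_offset = 0  # how far into the stream the view has been built

    def consume(self, stream: List[Change]) -> None:
        for key, value in stream[self.next_offset:]:
            if value is None:
                self.table.pop(key, None)
            else:
                self.table[key] = value  # latest value wins
        self.next_offset = len(stream)

    def get(self, key: str) -> Optional[dict]:
        return self.table.get(key)
```

Because the view can be rebuilt from the stream at any time, it can be treated as disposable: a new index can be built alongside the old one and swapped in.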
Reactive Streams
• [...] an initiative to provide a standard for asynchronous stream processing with
non-blocking back pressure. [...] aimed at runtime environments (JVM and
JavaScript) as well as network protocols.
• The scope [...] is to find a minimal set of interfaces, methods and protocols that will
describe the necessary operations and entities to achieve this goal.
• “Glue” between libraries.
Reactive Kafka ->
Akka Stream ->
RxJava
https://imgflip.com/memetemplate/17759370/dog-meditation-funny
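To make “non-blocking back pressure” concrete, here is a simplified, synchronous Python sketch of the protocol. The method names mirror the Reactive Streams interfaces (`subscribe`, `onSubscribe`, `onNext`, `onComplete`, `request(n)`, `cancel`) in snake_case, but error signalling (`onError`), threading and most of the specification's rules are left out. The hypothetical subscriber asks for a fixed batch of items and only requests more once it has handled them, so the publisher never pushes more than was requested.

```python
from typing import Iterable, List, Optional


class Subscription:
    """Links one publisher to one subscriber and tracks outstanding demand."""

    def __init__(self, items: Iterable[int], subscriber: "BatchingSubscriber") -> None:
        self._source = iter(items)
        self._subscriber = subscriber
        self._demand = 0
        self._emitting = False
        self._finished = False  # completed or cancelled

    def request(self, n: int) -> None:
        """Signal demand for n more items; nothing is pushed beyond what was requested."""
        if n <= 0:
            raise ValueError("request(n) requires n > 0")
        self._demand += n
        if self._emitting:
            return  # re-entrant call from on_next: the running loop picks up the demand
        self._emitting = True
        try:
            while self._demand > 0 and not self._finished:
                try:
                    item = next(self._source)
                except StopIteration:
                    self._finished = True
                    self._subscriber.on_complete()
                    break
                self._demand -= 1
                self._subscriber.on_next(item)
        finally:
            self._emitting = False

    def cancel(self) -> None:
        self._finished = True


class Publisher:
    def __init__(self, items: Iterable[int]) -> None:
        self._items = items

    def subscribe(self, subscriber: "BatchingSubscriber") -> None:
        subscriber.on_subscribe(Subscription(self._items, subscriber))


class BatchingSubscriber:
    """Requests `batch` items at a time and asks for more only after handling them."""

    def __init__(self, batch: int) -> None:
        self.batch = batch
        self.received: List[int] = []
        self.requests: List[int] = []  # each request(n) made, to make demand visible
        self.completed = False
        self._subscription: Optional[Subscription] = None
        self._handled_in_batch = 0

    def on_subscribe(self, subscription: Subscription) -> None:
        self._subscription = subscription
        self._request_batch()

    def _request_batch(self) -> None:
        self.requests.append(self.batch)
        self._handled_in_batch = 0
        if self._subscription is not None:
            self._subscription.request(self.batch)

    def on_next(self, item: int) -> None:
        self.received.append(item)
        self._handled_in_batch += 1
        if self._handled_in_batch == self.batch:
            self._request_batch()

    def on_complete(self) -> None:
        self.completed = True


subscriber = BatchingSubscriber(batch=2)
Publisher(range(5)).subscribe(subscriber)
print(subscriber.received, subscriber.requests)  # [0, 1, 2, 3, 4] [2, 2, 2]
```

The `_emitting` flag turns re-entrant `request` calls from inside `on_next` into extra demand for the running loop instead of unbounded recursion.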
Thank you!
Schemas
• You always have a schema
• Even if you are “Schemaless”
• Build tooling and workflows for handling schema changes
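One piece of such tooling is an automated compatibility check that runs before a new schema version is deployed. The sketch below uses a hypothetical schema representation and a simplified rule in the spirit of Avro-style backward compatibility: fields kept from the old schema must keep their type, and newly added fields need a default.

```python
from typing import Dict, List

# Hypothetical schema representation: field name -> {"type": ..., optional "default": ...}.
Schema = Dict[str, dict]


def backward_compatibility_errors(old: Schema, new: Schema) -> List[str]:
    """Can consumers using `new` still read records written with `old`?

    Simplified rule: fields present in both must keep their type, and fields that
    only exist in `new` need a default to fill in when reading old records.
    """
    errors = []
    for name, spec in new.items():
        if name in old:
            if old[name]["type"] != spec["type"]:
                errors.append(f"{name}: type changed {old[name]['type']} -> {spec['type']}")
        elif "default" not in spec:
            errors.append(f"{name}: new field without default")
    return errors
```

Run in CI, a non-empty result can block the deployment and force an explicit decision about the upgrade path.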
