The Future of Data Pipelines | All Things Open 2020 | 2020-02-24
The Future of Data Pipelines
John Hammink
Developer Advocate, Evangelist @ Aiven
john@aiven.io
All Things Open Meetup 2020,
Raleigh NC
The Future of Data Pipelines
● Let’s start with how the future looked about a year ago
● Let’s assume that pub-sub, and particularly technologies like Kafka, have created a pivot point
● Let’s follow up with a discussion of trends we’re all seeing
The Future of Data Pipelines
● Background
● Learnings
● Some numbers on data
● Functionality
● Design
● Compliance
● Reliability/Performance/Usability/Other
● Conclusions
● Q & A
An arbitrary timeline of data
What have we learned?
The Future of Data Pipelines | All Things Open 2019 | 2019-10-14
Pipeline data has grown/transformed in terms of:
Core-to-edge computing
Functionality
● can autoscale, shard, tolerate partitions
● can capture, fix and requeue errored events
● can be troubleshot and configured on the fly
● data can be used for AI/ML
● agnostic: accommodates all possible formats
● can self-perpetuate and automate continuous improvements
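As a minimal sketch of the "capture, fix and requeue errored events" bullet, the Python below uses in-memory queues in place of a real broker. The function names (`process`, `fix`, `run_pipeline`), the requirement that each event carry an `id` field, and the quote-swapping repair step are all assumptions made for illustration; none of them come from the talk or from any Kafka API.

```python
import json
from collections import deque


def process(raw: str) -> dict:
    """Parse one event; raise ValueError if it is malformed."""
    event = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(event, dict) or "id" not in event:
        raise ValueError("event must be an object with an 'id' field")
    return event


def fix(raw: str) -> str:
    """Hypothetical repair step: turn single quotes into double quotes."""
    return raw.replace("'", '"')


def run_pipeline(events, max_retries=1):
    """Process events; requeue repaired failures, dead-letter the rest."""
    queue = deque((raw, 0) for raw in events)
    processed, dead_letters = [], []
    while queue:
        raw, attempts = queue.popleft()
        try:
            processed.append(process(raw))
        except ValueError:
            if attempts < max_retries:
                # Capture the errored event, attempt a fix, and requeue it.
                queue.append((fix(raw), attempts + 1))
            else:
                # Out of retries: park it for inspection instead of dropping it.
                dead_letters.append(raw)
    return processed, dead_letters


if __name__ == "__main__":
    ok, dead = run_pipeline(['{"id": 1}', "{'id': 2}", "not json"])
    print(ok)
    print(dead)
```

In a real deployment the dead-letter list would more likely be a separate topic or queue, so that failed events survive a restart and can be replayed once the fix is deployed.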
Source: kafka.apache.org
Source: Martin Kleppmann’s Kafka Summit 2018 presentation, “Is Kafka a Database?”
Other design considerations
● Kill switch - a way to stem the data flow when things go wrong.
● Query on the streams - like KSQL, Apache Flink, SQL on Amazon Kinesis, etc. Ultimately: one query interface for all data.
● Need to accommodate - core-to-edge, growing data volume, near real-time velocity, bi-directionality
● Distributed ledgers - pipelines can support these as tunably consistent distributed datastores
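One way to picture the kill switch from the list above is a shared flag that the forwarding loop checks before every send. This is only a sketch: the `KillSwitch` class, the `forward` function, and the rule that a `None` event counts as "something went wrong" are hypothetical and not taken from the talk.

```python
import threading


class KillSwitch:
    """A thread-safe flag that any component can trip to stop the flow."""

    def __init__(self):
        self._tripped = threading.Event()

    def trip(self):
        self._tripped.set()

    def reset(self):
        self._tripped.clear()

    @property
    def tripped(self):
        return self._tripped.is_set()


def forward(events, sink, switch, is_bad=lambda event: event is None):
    """Send events to sink until the switch trips; return what was held back."""
    pending = list(events)
    sent = 0
    for event in pending:
        if is_bad(event):
            switch.trip()  # assumed policy: one bad event halts everything
        if switch.tripped:
            break
        sink.append(event)
        sent += 1
    return pending[sent:]


if __name__ == "__main__":
    switch = KillSwitch()
    sink = []
    held_back = forward([1, 2, None, 4], sink, switch)
    print(sink, held_back, switch.tripped)
```

Returning the unsent remainder rather than discarding it keeps the switch reversible: after `reset()`, the held-back events can be inspected and forwarded again.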
Compliance
Reliability/Performance/Usability/Other
● MTTF → ∞
● Components: from interpreted to native?
● Metadata: will continue to transform, to offer even more space savings.
● Pipelines: from hard-coded to drag and drop designs?
● Which data is left volatile and which is stored?
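The "MTTF → ∞" bullet can be read against the standard steady-state availability formula. MTTR (mean time to repair) does not appear on the slide and is introduced here as an assumption, so treat this as one possible reading rather than the speaker's own derivation.

```latex
A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}},
\qquad
\lim_{\mathrm{MTTF} \to \infty} A = 1 \quad \text{for any fixed, finite } \mathrm{MTTR}.
```

For example, with MTTF = 999 hours and MTTR = 1 hour, A = 999/1000 = 0.999. Availability approaches 1 as failures become rarer, and shrinking MTTR has the same effect.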
Wrapping up
Check out The Future of Data Pipelines at
https://aiven.io/blog/the-future-of-data-pipelines/
Discussion