1
Riding the Streaming Wave DIY style:
Using & Building Kafka Connect
Plugins with Confluent Open Source
Konstantine Karantasis, Software Engineer
2
Intro
Contributor:
• Apache Kafka
• Confluent Open Source
• Confluent Enterprise
3
Challenge …
4
Challenge …
5
… and Answer
Streaming data platform as the central nervous system of data architecture.
6
Kafka Connect: An Overview
13
Kafka Connect: Out-of-the-box
Get for free:
• Distributed deployment that scales.
• Lean multi-tenancy with packaging/classloading isolation.
• REST API to manage plugins.
• Easy-to-use interfaces for automatic and periodic, as well as
manual, tracking of progress.
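As a rough illustration of the REST API bullet above: a Connect worker accepts a JSON document on POST /connectors to create a connector. The sketch below only builds such a request and does not send it. The worker address (localhost:8083, the usual default port), the connector name, the input file and the topic are all assumptions chosen for illustration.

```python
import json
import urllib.request


def build_create_connector_request(base_url, name, config):
    """Build (but do not send) a POST /connectors request for a Connect worker."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: worker address, connector name, input file and topic.
request = build_create_connector_request(
    "http://localhost:8083",
    "file-source-demo",
    {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/demo-input.txt",
        "topic": "demo-topic",
    },
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would actually submit it, which requires a running worker.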
14
Kafka Connect: Out-of-the-box
more:
• Native integration with monitoring
platforms such as Confluent
Control Center.
• Metrics (more coming up soon).
• Continuous open source
development.
15
Kafka Connect Plugins: A Developer’s Perspective
An ecosystem of plugins:
• Connectors
• Transforms
• Converters
How data flows through the Connect API:
16
Kafka Connect Plugin Types
Connectors are the richest plugins. Two types:
• Source Connectors
• Sink Connectors
Connectors may support both structured and unstructured data:
• Converters and the Schema Registry
Transforms:
• Is the data useful as is? Or could it use some basic transformations?
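To make the idea of basic transformations concrete, here is a minimal, language-neutral model of a transform chain: each transform takes one record and returns a modified copy, or None to drop it. This is not the Connect Transformation interface; the function names, record layout and field names are hypothetical.

```python
def mask_field(field):
    """Return a transform that blanks one field of the record's value."""
    def transform(record):
        value = dict(record["value"])  # copy, so the input record is untouched
        if field in value:
            value[field] = None
        return {**record, "value": value}
    return transform


def route_to(topic):
    """Return a transform that rewrites the record's destination topic."""
    def transform(record):
        return {**record, "topic": topic}
    return transform


def apply_chain(record, transforms):
    """Apply transforms in order; a transform returning None drops the record."""
    for transform in transforms:
        record = transform(record)
        if record is None:
            return None
    return record


original = {"topic": "rsvps", "value": {"name": "x", "email": "y"}}
result = apply_chain(original, [mask_field("email"), route_to("rsvps-clean")])
print(result)
```

Keeping each transform small and side-effect free is what makes them easy to chain per record.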
17
Where to start
Sounds great! How do I start?
(or else: Why Confluent Open Source for Connect plugin dev)
18
Where to start
Iterate fast during development with Confluent CLI:
19
Classloading Isolation: Development with peace of mind
Use the plugin.path worker configuration property
my-plugins (included in the plugin.path )
kafka-connect-foo-connector
JAR files, sample configs, licenses,
etc.
20
Classloading Isolation: Development with peace of mind
Workers isolate the JARs for each connector, transform,
and converter to prevent conflicts.
my-plugins (included in the plugin.path )
kafka-connect-foo-connector
kafka-connect-bar-connector
kafka-connect-baz-uber.jar
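The layout above can be modelled with a short sketch: every immediate subdirectory and every top-level JAR under a plugin.path entry is treated as one isolated plugin location. This is a simplification of what a worker does, not its actual implementation; the function name and the extra README.txt file are assumptions added for illustration.

```python
import tempfile
from pathlib import Path


def discover_plugin_locations(plugin_path_entry):
    """List plugin locations directly under one plugin.path entry.

    Each immediate subdirectory and each top-level JAR counts as one
    location that would get its own isolated classloader.
    """
    root = Path(plugin_path_entry)
    locations = []
    for child in sorted(root.iterdir()):
        if child.is_dir() or child.suffix == ".jar":
            locations.append(child.name)
    return locations


# Recreate the directory layout from the slide in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    my_plugins = Path(tmp) / "my-plugins"
    (my_plugins / "kafka-connect-foo-connector").mkdir(parents=True)
    (my_plugins / "kafka-connect-bar-connector").mkdir()
    (my_plugins / "kafka-connect-baz-uber.jar").touch()
    (my_plugins / "README.txt").touch()  # neither a directory nor a JAR: skipped
    found = discover_plugin_locations(my_plugins)

print(found)
```

In a real deployment the worker configuration would point at the parent directory, for example plugin.path=/path/to/my-plugins.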
21
Kafka Connect in Action
Let’s build a simple stream of data with Kafka Connect.
• Find a dev-friendly public feed (e.g. meetup rsvps)
• Start a simple source connector (here: file source connector)
Demo with Confluent CLI
22
Source Connector Dev: Basic Concepts
• Keep track of Source Connector’s progress: User-defined
source offsets.
• It’s a distributed system. Design for multiple workers/multiple
tasks per worker.
• Map your data to topic-partitions efficiently.
• At-least-once semantics for Source Connectors.
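The interplay of user-defined source offsets and at-least-once delivery can be sketched with a toy model. It does not use the Connect API: the class, the list standing in for a topic and the variable standing in for offset storage are all hypothetical. The task attaches a resume position to every record, and a crash after writing but before the offset commit causes a replay.

```python
class LineSourceTask:
    """Toy source task: emits lines, each with a user-defined source offset."""

    def __init__(self, lines, committed_offset=None):
        self.lines = lines
        # Resume after the last committed position, or start from the top.
        self.position = 0 if committed_offset is None else committed_offset["position"]

    def poll(self, max_records=2):
        batch = []
        while self.position < len(self.lines) and len(batch) < max_records:
            line = self.lines[self.position]
            self.position += 1
            # The offset records where to resume *after* this record.
            batch.append({"value": line, "source_offset": {"position": self.position}})
        return batch


lines = ["a", "b", "c", "d"]
topic = []        # stands in for a Kafka topic
committed = None  # stands in for the framework's offset storage

task = LineSourceTask(lines, committed)
first = task.poll()
topic.extend(r["value"] for r in first)
committed = first[-1]["source_offset"]    # offsets committed after the first batch

second = task.poll()
topic.extend(r["value"] for r in second)  # written, but the crash precedes the commit

# Restart from the last committed offset: the second batch is produced again.
task = LineSourceTask(lines, committed)
replayed = task.poll()
topic.extend(r["value"] for r in replayed)
print(topic)
```

No line is lost, but "c" and "d" appear twice, which is exactly what at-least-once means for downstream consumers.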
23
Kafka Connect in Action
• Next, start a sink connector to extend the stream
(here: s3 sink connector)
Demo with Confluent CLI
24
Sink Connector Dev: Basic Concepts
• Sink Connectors utilize Kafka consumer offsets to track
progress.
• By default, at-least-once semantics with automatic and periodic
offset commits.
• But if the Sink allows for determinism and idempotence, sink
connectors can be exactly-once!
• Examples: S3 and HDFS Connectors.
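A minimal sketch of why determinism plus idempotence gives exactly-once results: if the name of each output object depends only on the topic, partition and starting offset of its batch, then a redelivered batch overwrites the same object instead of creating a duplicate. The dict standing in for an object store, the key layout and the function names are assumptions for illustration, not the actual S3 or HDFS connector code.

```python
def object_key(topic, partition, start_offset):
    """Deterministic object name derived only from the batch's coordinates."""
    return f"{topic}/partition={partition}/{topic}+{partition}+{start_offset:010d}.txt"


def flush_batches(records, store, topic, partition, batch_size):
    """Write full fixed-size batches; a trailing partial batch is not written.

    records is a list of (offset, value) pairs in offset order. Writing the
    same batch twice overwrites the same key, so the write is idempotent.
    """
    full = len(records) - len(records) % batch_size
    for i in range(0, full, batch_size):
        batch = records[i:i + batch_size]
        key = object_key(topic, partition, batch[0][0])
        store[key] = "\n".join(value for _, value in batch)


records = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
store = {}  # stands in for the object store

flush_batches(records, store, "demo", 0, batch_size=2)
# Simulated failure: the consumer offset for the second batch was never
# committed, so records from offset 2 onward are delivered again.
flush_batches(records[2:], store, "demo", 0, batch_size=2)

for key in sorted(store):
    print(key)
```

The redelivery leaves the store with the same two objects it had before, so readers of the sink see each record once even though the connector processed some records twice.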
25
Debugging Kafka Connect and Connect Plugins
This looked nice, but also pretty ideal for Connector development.
How do I debug my Connect plugins?
• Set the right environment variables for Confluent CLI
export CONNECT_DEBUG=y;
export DEBUG_SUSPEND_FLAG=y;
• also: export CONFLUENT_CURRENT=/your/fav/dir/location
(doesn’t have to be /tmp)
• Restart Kafka Connect worker
• Attach a remote debugger using your favorite IDE
26
Package and Publish your connector
So you built your Kafka Connect plugin!
• Currently, follow a commonly used pattern using the
maven-assembly-plugin or the shade plugin
• maven-kafka-connect-plugin coming up soon.
• confluent.io/connectors
27
Summary
Why develop with Kafka Connect?
• Lots of functionality for free (scalability, multi-tenancy, mgmt, monitoring)
• Quick on-boarding
• Active community
How to develop your Kafka Connect plugins?
• Use Confluent CLI
• Use classloading isolation with plugin.path
• Debug your connector with your favorite IDE
• Extend existing open source plugins or build your own!
• Stay tuned, the best is yet to come!
28
Thank You!