Simplifying migration from Kafka to Pulsar
Andrey Yegorov
Senior Software Engineer at DataStax
Committer at Apache BookKeeper
Contributor at Apache Pulsar
Pulsar Virtual Summit North America 2021
Agenda
Thank you!
Problem
Goal
Diagrams are important
[Diagram: Incoming Data → Pulsar → Kafka Connect Adaptor Sink (wrapping a Kafka Connect Sink) → Outgoing Data → Third-party system]
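The diagram describes an adapter: the Kafka Connect Adaptor Sink runs as a Pulsar sink and hands each record to an unmodified Kafka Connect Sink, which writes to the third-party system. The stdlib-only Java sketch below illustrates that shape under stated assumptions: `PulsarRecord`, `KafkaSinkRecord`, `KafkaSinkTask` and `AdaptorSink` are hypothetical stand-ins, not the real Pulsar IO or Kafka Connect types, and the simple offset counter is an illustrative assumption rather than how the adaptor derives offsets.

```java
import java.util.Collection;
import java.util.List;

// Sketch of the adapter pattern in the diagram. All types are hypothetical stand-ins,
// not the real Pulsar IO (Sink/Record) or Kafka Connect (SinkTask/SinkRecord) APIs.
final class AdaptorSketch {

    // Stand-in for a record consumed from a Pulsar topic.
    static final class PulsarRecord {
        final String topic;
        final int partitionIndex;
        final String key;
        final String value;

        PulsarRecord(String topic, int partitionIndex, String key, String value) {
            this.topic = topic;
            this.partitionIndex = partitionIndex;
            this.key = key;
            this.value = value;
        }
    }

    // Stand-in for the record shape a Kafka Connect sink task consumes.
    static final class KafkaSinkRecord {
        final String topic;
        final int partition;
        final String key;
        final String value;
        final long offset;

        KafkaSinkRecord(String topic, int partition, String key, String value, long offset) {
            this.topic = topic;
            this.partition = partition;
            this.key = key;
            this.value = value;
            this.offset = offset;
        }
    }

    // Stand-in for the Kafka-side task contract: it only ever sees Kafka-shaped records.
    interface KafkaSinkTask {
        void put(Collection<KafkaSinkRecord> records);
    }

    // The adaptor: a Pulsar-side sink that converts each record and delegates to the Kafka-side task.
    static final class AdaptorSink {
        private final KafkaSinkTask task;
        // Hypothetical offset source: a per-sink counter. The real adaptor's mapping may differ.
        private long nextOffset = 0;

        AdaptorSink(KafkaSinkTask task) {
            this.task = task;
        }

        void write(PulsarRecord r) {
            KafkaSinkRecord converted =
                    new KafkaSinkRecord(r.topic, r.partitionIndex, r.key, r.value, nextOffset++);
            task.put(List.of(converted));
        }
    }
}
```

The design point is that the Kafka-side code never sees Pulsar types; only the adaptor knows both sides.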
Prerequisite work
A lot of work went into enabling development of the KCA (Kafka Connect Adaptor) Sink (kudos to my colleague Enrico Olivelli):
● Implement GenericObject - Allow GenericRecord to wrap any Java Object https://github.com/apache/pulsar/pull/10057
● Pulsar IO: Allow to develop Sinks that support Schema but without setting it at build time (Sink<GenericObject>) https://github.com/apache/pulsar/pull/10034
● Add Schema.getNativeSchema https://github.com/apache/pulsar/pull/10076
● GenericObject - support KeyValue in Message#getValue() https://github.com/apache/pulsar/pull/10107
● GenericObject: handle KeyValue with SEPARATED encoding https://github.com/apache/pulsar/pull/10186
● Sink<GenericObject> unwrap internal AutoConsumeSchema and allow to handle topics with KeyValue schema https://github.com/apache/pulsar/pull/10211
● And others
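Several of these changes concern KeyValue schemas. With INLINE encoding, key and value are serialized together in the message payload; with SEPARATED encoding, the key travels as the message key and the payload holds only the value. A sink that receives a GenericObject has to present the same key/value pair either way. The plain-Java sketch below illustrates that idea; the class and method names are hypothetical, and the INLINE byte layout (a 4-byte length before the key and before the value) is an assumption made for illustration, not a statement of Pulsar's wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: recover the same (key, value) pair from the two KeyValue encodings.
// ASSUMPTION: the INLINE layout is [int keyLength][key bytes][int valueLength][value bytes].
final class KeyValueSketch {

    static final class KeyValue {
        final String key;
        final String value;

        KeyValue(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // INLINE: key and value share the payload, each prefixed by its length.
    static byte[] encodeInline(String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + k.length + 4 + v.length)
                .putInt(k.length).put(k)
                .putInt(v.length).put(v)
                .array();
    }

    static KeyValue decodeInline(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        byte[] k = new byte[buf.getInt()];
        buf.get(k);
        byte[] v = new byte[buf.getInt()];
        buf.get(v);
        return new KeyValue(new String(k, StandardCharsets.UTF_8), new String(v, StandardCharsets.UTF_8));
    }

    // SEPARATED: the key comes from the message key, the payload carries only the value.
    static KeyValue decodeSeparated(byte[] messageKeyBytes, byte[] payload) {
        return new KeyValue(
                new String(messageKeyBytes, StandardCharsets.UTF_8),
                new String(payload, StandardCharsets.UTF_8));
    }
}
```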
Kafka Connect Adaptor Sink work
Done and work in progress, so far:
● Add getPartitionIndex() to the Record<> https://github.com/apache/pulsar/pull/9947
● Exposed SubscriptionType in the SinkContext https://github.com/apache/pulsar/pull/10446
● SinkContext: ability to seek/pause/resume consumer for a topic https://github.com/apache/pulsar/pull/10498
● Add ability to use Kafka's sinks as pulsar sinks https://github.com/apache/pulsar/pull/9927
● Kafka connect sink adaptor to support non-primitive schemas https://github.com/apache/pulsar/pull/10410
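Kafka Connect sink tasks address data by (topic, partition) and can ask the framework to pause or resume individual partitions, for example when the downstream system falls behind. On the Pulsar side, each partition of a partitioned topic is its own topic named `<topic>-partition-<index>`. Exposing the partition index on the Record and adding seek/pause/resume to the SinkContext is what lets an adaptor bridge the two. A minimal bookkeeping sketch, assuming a hypothetical `PauseTracker` that is not part of the adaptor's actual code:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: translate Kafka-style pause/resume requests on (topic, partition)
// into the Pulsar partition topics whose consumption would have to be paused.
final class PauseTracker {
    private final Set<String> pausedPulsarTopics = new HashSet<>();

    // Pulsar names the partitions of a partitioned topic "<topic>-partition-<index>".
    static String pulsarPartitionTopic(String topic, int partitionIndex) {
        return topic + "-partition-" + partitionIndex;
    }

    void pause(String topic, int partition) {
        pausedPulsarTopics.add(pulsarPartitionTopic(topic, partition));
    }

    void resume(String topic, int partition) {
        pausedPulsarTopics.remove(pulsarPartitionTopic(topic, partition));
    }

    // A consumer loop would skip receiving from a partition while it is paused.
    boolean isPaused(String topic, int partition) {
        return pausedPulsarTopics.contains(pulsarPartitionTopic(topic, partition));
    }
}
```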
Demo
Plan
Set up mock Kinesis
$ brew install awscli
$ aws configure
When prompted, enter "mock-kinesis-access-key" as the access key and "mock-kinesis-secret-key" as the secret key.
Follow these steps, adapted from https://github.com/etspaceman/kinesis-mock:
$ docker pull ghcr.io/etspaceman/kinesis-mock:0.0.4
$ docker run -p 443:4567 -p 4568:4568 ghcr.io/etspaceman/kinesis-mock:0.0.4
Note the port mapping: container port 4567 (the mock's HTTP/2 service) is published on host port 443, the default HTTPS port, so https://localhost/ reaches it without an explicit port. The container log still reports the original ports, something like:
k.m.KinesisMockService - Starting Kinesis Http2 Mock Service on port 4567
k.m.KinesisMockService - Starting Kinesis Http1 Plain Mock Service on port 4568
Create the Kinesis stream:
$ aws kinesis create-stream --endpoint-url https://localhost/ --no-verify-ssl --stream-name test-kinesis --shard-count 1
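The 443 mapping matters because an HTTPS endpoint written without a port, such as `https://localhost/` in the AWS CLI commands, resolves to the scheme's default port, 443. The connector configuration sets `kinesisEndpoint` to plain `localhost` and presumably relies on the same default. This JDK-only helper (`EndpointPort` is a hypothetical name) shows the resolution rule:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper: which port a client actually connects to for a given endpoint URL.
final class EndpointPort {
    static int effectivePort(String endpoint) {
        try {
            URL url = URI.create(endpoint).toURL();
            // getPort() is -1 when no port is written; fall back to the scheme default (443 for https).
            return url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + endpoint, e);
        }
    }
}
```

With `-p 443:4567`, `effectivePort("https://localhost/")` returns 443, which is exactly where the mock's port 4567 is published.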
Build AWS Kinesis-Kafka Connector
Get the code from https://github.com/awslabs/kinesis-kafka-connector
Make it skip certificate verification (required for the Kinesis mock):
diff --git a/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java b/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java
index f86f3fd..2920fb8 100644
--- a/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java
+++ b/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java
@@ -359,6 +359,8 @@ public class AmazonKinesisSinkTask extends SinkTask {
// The namespace to upload metrics under.
config.setMetricsNamespace(metricsNameSpace);
+ config.setVerifyCertificate(false);
+ return new KinesisProducer(config);
}
Build it and install it into the local Maven repository:
$ mvn clean install -DskipTests
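`config.setVerifyCertificate(false)` turns off server certificate checks in the Kinesis Producer Library so that it accepts the mock's certificate. The KPL applies this setting in its own native component; the plain-JDK sketch below only illustrates the same idea for Java clients, using a hypothetical `InsecureTls` helper. Use something like this only against a local mock, never against real endpoints.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// FOR LOCAL TESTING AGAINST A MOCK ONLY. Hypothetical helper: an SSLContext that accepts
// any certificate. Mirrors the intent of KPL's setVerifyCertificate(false), not its implementation.
final class InsecureTls {
    static SSLContext trustAllContext() throws GeneralSecurityException {
        TrustManager[] trustAll = {
            new X509TrustManager() {
                @Override
                public void checkClientTrusted(X509Certificate[] chain, String authType) {
                    // No checks: every client certificate is accepted.
                }

                @Override
                public void checkServerTrusted(X509Certificate[] chain, String authType) {
                    // No checks: every server certificate is accepted (this is the insecure part).
                }

                @Override
                public X509Certificate[] getAcceptedIssuers() {
                    return new X509Certificate[0];
                }
            }
        };
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, trustAll, null); // default key managers and default SecureRandom
        return context;
    }
}
```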
Package
Build a NAR that includes the Kinesis connector:
diff --git a/pulsar-io/kafka-connect-adaptor-nar/pom.xml b/pulsar-io/kafka-connect-adaptor-nar/pom.xml
index ea9bedbd056..c7fa9a1ebca 100644
--- a/pulsar-io/kafka-connect-adaptor-nar/pom.xml
+++ b/pulsar-io/kafka-connect-adaptor-nar/pom.xml
@@ -36,6 +36,11 @@
<artifactId>pulsar-io-kafka-connect-adaptor</artifactId>
<version>${project.version}</version>
</dependency>
+ <dependency>
+ <groupId>com.amazonaws</groupId>
+ <artifactId>amazon-kinesis-kafka-connector</artifactId>
+ <version>0.0.9-SNAPSHOT</version>
+ </dependency>
</dependencies>
Build it:
$ mvn -f pulsar-io/kafka-connect-adaptor-nar/pom.xml clean package -DskipTests
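The connector has to be packaged into the NAR because the sink configuration names it only as a string (`kafkaConnectorSinkClass`), so the adaptor presumably loads it reflectively at runtime, and loading fails if the class is missing. The hypothetical `ConnectorLoader` below sketches that lookup with `Class.forName`. It is a simplification: it uses the caller's class loader, whereas Pulsar loads NAR contents through its own NAR class loader.

```java
// Hypothetical sketch: instantiate a class that configuration names only by its string name.
final class ConnectorLoader {
    static <T> T instantiate(String className, Class<T> expectedType) {
        try {
            Class<?> cls = Class.forName(className);
            if (!expectedType.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(className + " is not a " + expectedType.getName());
            }
            // Assumes a public no-argument constructor, as reflectively loaded plugins typically need.
            return expectedType.cast(cls.getDeclaredConstructor().newInstance());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class not found (is it packaged in the NAR?): " + className, e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```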
Let’s roll
Start Pulsar in standalone mode:
$ bin/pulsar standalone
Run the sink locally:
$ bin/pulsar-admin sinks localrun -a ./pulsar-io/kafka-connect-adaptor-nar/target/pulsar-io-kafka-connect-adaptor-nar-2.8.0-SNAPSHOT.nar --name kwrap --namespace public/default/ktest --parallelism 1 -i my-topic --sink-config-file ~/sink-kinesis.yaml
Config
$ cat ~/sink-kinesis.yaml
processingGuarantees: "EFFECTIVELY_ONCE"
configs:
  "topic": "my-topic"
  "offsetStorageTopic": "kafka-connect-sink-offset-kinesis"
  "pulsarServiceUrl": "pulsar://localhost:6650/"
  "kafkaConnectorSinkClass": "com.amazon.kinesis.kafka.AmazonKinesisSinkConnector"
  "kafkaConnectorConfigProperties":
    "name": "test-kinesis-sink"
    'connector.class': "com.amazon.kinesis.kafka.AmazonKinesisSinkConnector"
    "tasks.max": "1"
    "topics": "my-topic"
    "kinesisEndpoint": "localhost"
    "region": "us-east-1"
    "streamName": "test-kinesis"
    "singleKinesisProducerPerPartition": "true"
    "pauseConsumption": "true"
    "maxConnections": "1"
The entries under "kafkaConnectorConfigProperties" are the properties passed to the Kafka Connect Sink.
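Kafka Connect passes configuration to a connector as string key/value pairs (`Connector.start(Map<String, String>)`), which is presumably why every value under `kafkaConnectorConfigProperties` is quoted, including `"1"` and `"true"`. A hypothetical helper that normalizes a parsed YAML section, whose unquoted values might otherwise come back as numbers or booleans, into that shape:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: normalize a parsed config section into the Map<String, String>
// shape that Kafka Connect hands to Connector.start(...).
final class ConnectorProps {
    static Map<String, String> toStringMap(Map<String, ?> parsed) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps the original key order
        for (Map.Entry<String, ?> entry : parsed.entrySet()) {
            if (entry.getValue() == null) {
                continue; // skip empty values rather than passing the string "null"
            }
            // Unquoted scalars may parse as Integer/Boolean; String.valueOf turns 1 into "1", true into "true".
            props.put(entry.getKey(), String.valueOf(entry.getValue()));
        }
        return props;
    }
}
```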
Action!
Produce a message to the Pulsar topic:
$ bin/pulsar-client produce my-topic --messages "Hello"
Read the data from Kinesis:
# Get a shard iterator for the stream, then pass it to get-records:
$ aws kinesis get-shard-iterator --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON --stream-name test-kinesis --endpoint-url https://localhost/ --no-verify-ssl
$ aws kinesis get-records --endpoint-url https://localhost/ --no-verify-ssl --shard-iterator <SHARD_ITERATOR_HERE>
The output contains the record:
{"SequenceNumber": "49618471738282782665106189312850320303184854662386810882",
"ApproximateArrivalTimestamp": "2021-05-21T14:08:35-07:00",
"Data": "SGVsbG8=",
"PartitionKey": "0",
"EncryptionType": "NONE"}
The "Data" field is Base64-encoded: https://www.base64decode.org/ tells us that “SGVsbG8=” is “Hello”.
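The decoding step can also be done locally with the JDK's `java.util.Base64`; `KinesisData` is a hypothetical helper name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Decode the Base64 "Data" field that `aws kinesis get-records` prints for each record.
final class KinesisData {
    static String decodeUtf8(String base64Data) {
        return new String(Base64.getDecoder().decode(base64Data), StandardCharsets.UTF_8);
    }
}
```

For example, `KinesisData.decodeUtf8("SGVsbG8=")` returns `"Hello"`, the message produced to Pulsar.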
Thank you!
THE END

Editor's Notes

  • #3: Thank you; Goal; Prerequisite work; Kafka Connect Adaptor Sink work; Demo.
  • #4: Thanks to the Pulsar community, everyone who reviewed the code and contributed ideas, DataStax, and all the people whose memes I “borrowed”.
  • #5: Complex or large-scale deployments of OSS systems, Kafka included, involve customizations and in-house tools and plugins. Moving from one system to another is a complicated process, and making it iterative increases the chance of success.
  • #6: Simplify the move from Kafka to Pulsar for Kafka power users who rely on Kafka's integrations with other systems: postpone rewriting custom Kafka Connect Sinks as native Pulsar Sinks; enable Pulsar integrations where no Pulsar Sink exists but a Kafka Connect Sink does; and enable Pulsar integrations where an existing Pulsar Sink's behavior or functionality does not match what the integration relies on.
  • #11: Let’s use something more exciting than a simple FileStreamSinkConnector: the AmazonKinesisSinkConnector. For simplicity, let’s use a mock Kinesis and run everything locally.
  • #18: We took a Kafka Connect Sink, packaged it for use with Pulsar, configured it to send messages to Kinesis, and sent a message to Pulsar. The message appeared in Kinesis!