Kafka and Kafka Connect
How to install and Configure
6/5/2017
By Touraj Ebrahimi
Senior Java Developer and Java Architect
github: toraj58
bitbucket: toraj58
twitter: @toraj58
youtube channel: https://www.youtube.com/channel/UCcLcw6sTk_8G6EgfBr0E5uA
LinkedIn: https://www.linkedin.com/in/touraj-ebrahimi-956063118/
In this document we go over how to install and run Kafka and Zookeeper, implement a Producer & Consumer scenario, then extend that scenario into a cluster, and finally configure and test Kafka Connect.
Step 1: Download the code
After downloading the desired Kafka version, untar it as follows (on Linux):
> tar -xzf kafka_desiredversion.tgz
> cd kafka_desiredversion
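If you have not downloaded an archive yet, a release can be fetched from the Apache archive. The version below (0.10.2.0 built for Scala 2.11) is only an example; substitute whatever release you actually use:
> wget https://archive.apache.org/dist/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz
> tar -xzf kafka_2.11-0.10.2.0.tgz
> cd kafka_2.11-0.10.2.0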
Step 2: Start the server
Kafka uses Zookeeper, so we first need to start a Zookeeper server. To start a single Zookeeper node we can use the script that ships with the Kafka package.
> bin/zookeeper-server-start.sh config/zookeeper.properties
[2017-05-01 14:01:37,495] INFO Reading configuration from:
config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
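As an optional sanity check, you can send ZooKeeper the ruok four-letter command; a healthy server answers imok (this assumes netcat is installed and ZooKeeper is listening on the default port):
> echo ruok | nc localhost 2181
imok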
The default contents of zookeeper.properties are shown below; as you can see, the default port is 2181:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# the directory where the snapshot is stored.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0
Next we start the Kafka server as follows:
> bin/kafka-server-start.sh config/server.properties
[2017-05-01 15:01:47,028] INFO Verifying properties
(kafka.utils.VerifiableProperties)
[2017-05-01 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to
1048576 (kafka.utils.VerifiableProperties)
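To confirm that the broker has registered itself in ZooKeeper, you can list the registered broker ids with the zookeeper-shell script bundled with Kafka (adjust the ZooKeeper address to your setup, e.g. 172.17.0.3:2181 when ZooKeeper runs in Docker as above); with the configuration shown below, broker id 1 should appear:
> bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids
[1]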
The default contents of server.properties are shown below; as you can see, the default port is 9092:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
# Switch to enable topic deletion or not, default value is false
#delete.topic.enable=true
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma seperated list of directories under which to store log files
log.dirs=/kafka/data/1
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=172.17.0.3:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
host.name=172.17.0.4
log4j.opts=-dlog4j.configuration=
port=9092
advertised.host.name=172.17.0.4
advertised.port=9092
Step 3: Create a topic
To create a topic named test with one partition and one replica, run:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1
--partitions 1 --topic test
To list the topics we use the command below; if Zookeeper is running inside Docker, we must supply the name of the running Docker image (container) instead of localhost:
> bin/kafka-topics.sh --list --zookeeper localhost:2181
test
Note: if the topic has not been created manually beforehand, the brokers can be configured so that when a message is sent to a topic that does not exist, the topic is created automatically.
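For reference, automatic topic creation is controlled by broker settings in server.properties such as the following; the values are only an illustrative sketch (auto.create.topics.enable is true by default, and num.partitions / default.replication.factor determine how auto-created topics look):
auto.create.topics.enable=true
num.partitions=1
default.replication.factor=1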
Step 4: Send some messages
Kafka ships with a command-line client that takes input from a file or from standard input and sends it as messages to the Kafka cluster. By default, each line is sent as a separate message.
Simply run the producer and type a few messages into the console to send them to the server:
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message
Wolfenstein was here
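The console producer can also read its input from a file instead of the terminal; messages.txt below is just a hypothetical file with one message per line:
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < messages.txt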
Step 5: Start a consumer
Kafka also has a command-line consumer that prints messages to standard output; it can be run as follows:
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message
This is another message
Wolfenstein was here
If you run the producer and the consumer in two separate terminals, you can type messages in one terminal and see them appear in the other.
Step 6: Setting up a multi-broker cluster
In the examples above we used only one broker, which does not show off Kafka's clustering and distribution capabilities. To get a better feel for how Kafka behaves, we create a cluster with 3 nodes, all on a single local machine so it is easy to test. To do this, we first create a config file for each broker.
> cp config/server.properties config/server-1.properties
> cp config/server.properties config/server-2.properties
Then edit the files copied above with your editor of choice, as follows:
config/server-1.properties:
broker.id=1
listeners=PLAINTEXT://:9093
log.dir=/tmp/kafka-logs-1
config/server-2.properties:
broker.id=2
listeners=PLAINTEXT://:9094
log.dir=/tmp/kafka-logs-2
broker.id must be unique and acts as the permanent name of each node in the cluster. Because we are building the cluster on a single machine, we must also change the port and the log directory in the configs above so that the brokers do not try to register on the same port or overwrite each other's data.
Since we already have Zookeeper and one Kafka node running, we only need to start two new nodes:
> bin/kafka-server-start.sh config/server-1.properties &
...
> bin/kafka-server-start.sh config/server-2.properties &
...
Now we create a topic with a replication factor of 3, since we are running 3 brokers:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3
--partitions 1 --topic my-replicated-topic
To see what each broker is doing, we can use the describe topics command:
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: my-replicated-topic Partition: 0 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
The first line is a summary of all partitions; each additional line gives information about one partition. Since the topic my-replicated-topic has only one partition, only one such line is printed above.
The fields are explained as follows:
 "leader" is the node responsible for all reads and writes for the given partition. Each node will
be the leader for a randomly selected portion of the partitions.
 "replicas" is the list of nodes that replicate the log for this partition regardless of whether they
are the leader or even if they are currently alive.
 "isr" is the set of "in-sync" replicas. This is the subset of the replicas list that is currently alive
and caught-up to the leader.
Note: in our example, node 1 is the leader for the only partition of the topic.
We can also run the --describe command for the topic test as shown below and inspect the output:
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
Topic:test PartitionCount:1 ReplicationFactor:1 Configs:
Topic: test Partition: 0 Leader: 0 Replicas: 0 Isr: 0
The output above shows that the topic test has no replicas and lives on server 0, the only node in the cluster at the time the topic was created.
Now we send a few messages to the new topic:
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
my test message 1
my test message 2
^C
Now we consume the messages that were sent:
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
...
my test message 1
my test message 2
^C
Now let's test out fault-tolerance. Broker 1 was acting as the leader so let's kill it:
> ps aux | grep server-1.properties
7564 ttys002 0:15.91
/System/Library/Frameworks/JavaVM.framework/Versions/1.8/Home/bin/java...
> kill -9 7564
Leadership has switched to one of the slaves and node 1 is no longer in the in-sync replica set:
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: my-replicated-topic Partition: 0 Leader: 2 Replicas: 1,2,0 Isr: 2,0
But the messages are still available for consumption even though the leader that took the writes
originally is down:
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
...
my test message 1
my test message 2
^C
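If you restart the killed broker, it should rejoin the cluster and, once it has caught up, reappear in the in-sync replica set; the output below is only a sketch of what to expect (the leader does not necessarily move back to broker 1):
> bin/kafka-server-start.sh config/server-1.properties &
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: my-replicated-topic Partition: 0 Leader: 2 Replicas: 1,2,0 Isr: 2,0,1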
Step 7: Use Kafka Connect to import/export data
Writing data from the console and writing it back to the console is a convenient place to start, but
you'll probably want to use data from other sources or export data from Kafka to other systems. For
many systems, instead of writing custom integration code you can use Kafka Connect to import or
export data.
Kafka Connect is a tool included with Kafka that imports and exports data to Kafka. It is an
extensible tool that runs connectors, which implement the custom logic for interacting with an
external system. In this quickstart we'll see how to run Kafka Connect with simple connectors that
import data from a file to a Kafka topic and export data from a Kafka topic to a file.
First, we'll start by creating some seed data to test with:
> echo -e "foo\nbar" > test.txt
Next, we'll start two connectors running in standalone mode, which means they run in a single, local,
dedicated process. We provide three configuration files as parameters. The first is always the
configuration for the Kafka Connect process, containing common configuration such as the Kafka
brokers to connect to and the serialization format for data. The remaining configuration files each
specify a connector to create. These files include a unique connector name, the connector class to
instantiate, and any other configuration required by the connector.
> bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
These sample configuration files, included with Kafka, use the default local cluster configuration you
started earlier and create two connectors: the first is a source connector that reads lines from an
input file and produces each to a Kafka topic and the second is a sink connector that reads
messages from a Kafka topic and produces each as a line in an output file.
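For reference, the two shipped sample files are only a few lines each; the contents below reflect the defaults in the Kafka distribution and may differ slightly between versions:
config/connect-file-source.properties:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test
config/connect-file-sink.properties:
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink.txt
topics=connect-test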
During startup you'll see a number of log messages, including some indicating that the connectors
are being instantiated. Once the Kafka Connect process has started, the source connector should
start reading lines from test.txt and producing them to the topic connect-test, and the sink connector
should start reading messages from the topic connect-test and write them to the file test.sink.txt. We
can verify the data has been delivered through the entire pipeline by examining the contents of the
output file:
> cat test.sink.txt
foo
bar
Note that the data is being stored in the Kafka topic connect-test, so we can also run a console
consumer to see the data in the topic (or use custom consumer code to process it):
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"foo"}
{"schema":{"type":"string","optional":false},"payload":"bar"}
...
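The schema/payload wrapping in the output above comes from the JSON converter that connect-standalone.properties enables by default; the relevant settings look roughly like this (exact defaults depend on the version):
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true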
The connectors continue to process data, so we can add data to the file and see it move through the
pipeline:
> echo "Another line" >> test.txt
You should see the line appear in the console consumer output and in the sink file.
Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other
systems. It makes it simple to quickly define connectors that move large collections of data into and
out of Kafka. Kafka Connect can ingest entire databases or collect metrics from all your application
servers into Kafka topics, making the data available for stream processing with low latency. An
export job can deliver data from Kafka topics into secondary storage and query systems or into
batch systems for offline analysis. Kafka Connect features include:
 A common framework for Kafka connectors - Kafka Connect standardizes integration of
other data systems with Kafka, simplifying connector development, deployment, and
management
 Distributed and standalone modes - scale up to a large, centrally managed service
supporting an entire organization or scale down to development, testing, and small
production deployments
 REST interface - submit and manage connectors to your Kafka Connect cluster via an easy
to use REST API
 Automatic offset management - with just a little information from connectors, Kafka
Connect can manage the offset commit process automatically so connector developers do
not need to worry about this error prone part of connector development
 Distributed and scalable by default - Kafka Connect builds on the existing group management protocol. More workers can be added to scale up a Kafka Connect cluster.
 Streaming/batch integration - leveraging Kafka's existing capabilities, Kafka Connect is an
ideal solution for bridging streaming and batch data systems
Running Kafka Connect in distributed mode:
> bin/connect-distributed.sh config/connect-distributed.properties
The difference is in the class which is started and the configuration parameters which change how
the Kafka Connect process decides where to store configurations, how to assign work, and where to
store offsets. In particular, the following configuration parameters are critical to set before starting
your cluster:
 group.id (default connect-cluster) - unique name for the cluster, used in forming the Connect
cluster group; note that this must not conflict with consumer group IDs
 config.storage.topic (default connect-configs) - topic to use for storing connector and task
configurations; note that this should be a single partition, highly replicated topic
 offset.storage.topic (default connect-offsets) - topic to use for storing offsets; this topic should have many
partitions and be replicated
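A minimal connect-distributed.properties therefore looks roughly like the sketch below; exact keys depend on the Kafka version (newer releases also require status.storage.topic):
bootstrap.servers=localhost:9092
group.id=connect-cluster
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
# newer versions also use a status topic:
# status.storage.topic=connect-status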
Note that in distributed mode the connector configurations are not passed on the command line.
Instead, use the REST API described below to create, modify, and destroy connectors.
Connector configurations are simple key-value mappings. For standalone mode these are defined in
a properties file and passed to the Connect process on the command line. In distributed mode, they
will be included in the JSON payload for the request that creates (or modifies) the connector. Most
configurations are connector dependent, so they can't be outlined here. However, there are a few
common options:
 name - Unique name for the connector. Attempting to register again with the same name will
fail.
 connector.class - The Java class for the connector
 tasks.max - The maximum number of tasks that should be created for this connector. The
connector may create fewer tasks if it cannot achieve this level of parallelism.
Sink connectors also have one additional option to control their input:
 topics - A list of topics to use as input for this connector
Since Kafka Connect is intended to be run as a service, it also supports a REST API for managing
connectors. By default this service runs on port 8083. The following are the currently supported
endpoints:
 GET /connectors - return a list of active connectors
 POST /connectors - create a new connector; the request body should be a JSON object
containing a string name field and an object config field with the connector configuration
parameters
 GET /connectors/{name} - get information about a specific connector
 GET /connectors/{name}/config - get the configuration parameters for a specific connector
 PUT /connectors/{name}/config - update the configuration parameters for a specific
connector
 GET /connectors/{name}/tasks - get a list of tasks currently running for a connector
 DELETE /connectors/{name} - delete a connector, halting all tasks and deleting its
configuration
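For example, the file source connector used earlier in standalone mode could be created on a distributed cluster with a request like the following (assuming the Connect REST API is listening on localhost:8083):
> curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
    "name": "local-file-source",
    "config": {
      "connector.class": "FileStreamSource",
      "tasks.max": "1",
      "file": "test.txt",
      "topic": "connect-test"
    }
  }'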