Kafka and Kafka Connect
How to install and Configure
6/5/2017
By Touraj Ebrahimi
Senior Java Developer and Java Architect
github: toraj58
bitbucket: toraj58
twitter: @toraj58
youtube channel: https://www.youtube.com/channel/UCcLcw6sTk_8G6EgfBr0E5uA
LinkedIn: https://www.linkedin.com/in/touraj-ebrahimi-956063118/
In this document we go over how to install and run Kafka and Zookeeper, implement a Producer & Consumer scenario, then extend that scenario into a cluster, and finally configure and test Kafka Connect.
Step 1: Download the code
After downloading the desired Kafka version, untar it as follows (on Linux):
> tar -xzf kafka_desiredversion.tgz
> cd kafka_desiredversion
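If you have not downloaded an archive yet, a release can be fetched from the Apache archive. The version below (0.10.2.0 built for Scala 2.11) is only an example; substitute whatever release you actually use:
> wget https://archive.apache.org/dist/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz
> tar -xzf kafka_2.11-0.10.2.0.tgz
> cd kafka_2.11-0.10.2.0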
Step 2: Start the server
Kafka uses Zookeeper, so we first need to start a Zookeeper server. To start a single Zookeeper node we can use the script that ships with the Kafka package.
> bin/zookeeper-server-start.sh config/zookeeper.properties
[2017-05-01 14:01:37,495] INFO Reading configuration from:
config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
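As an optional sanity check, you can send ZooKeeper the ruok four-letter command; a healthy server answers imok (this assumes netcat is installed and ZooKeeper is listening on the default port):
> echo ruok | nc localhost 2181
imok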
The default contents of zookeeper.properties are shown below; as you can see, the default port is 2181:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# the directory where the snapshot is stored.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0
Next we start the Kafka server as follows:
> bin/kafka-server-start.sh config/server.properties
[2017-05-01 15:01:47,028] INFO Verifying properties
(kafka.utils.VerifiableProperties)
[2017-05-01 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to
1048576 (kafka.utils.VerifiableProperties)
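To confirm that the broker has registered itself in ZooKeeper, you can list the registered broker ids with the zookeeper-shell script bundled with Kafka (adjust the ZooKeeper address to your setup, e.g. 172.17.0.3:2181 when ZooKeeper runs in Docker as above); with the configuration shown below, broker id 1 should appear:
> bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids
[1]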
The default contents of server.properties are shown below; as you can see, the default port is 9092:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
# Switch to enable topic deletion or not, default value is false
#delete.topic.enable=true
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma seperated list of directories under which to store log files
log.dirs=/kafka/data/1
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=172.17.0.3:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
host.name=172.17.0.4
log4j.opts=-dlog4j.configuration=
port=9092
advertised.host.name=172.17.0.4
advertised.port=9092
Step 3: Create a topic
To create a topic named test with one partition and one replica, run:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1
--partitions 1 --topic test
To list the topics we use the command below; if Zookeeper is running inside Docker, we must supply the name of the running Docker image (container) instead of localhost:
> bin/kafka-topics.sh --list --zookeeper localhost:2181
test
Note: if the topic has not been created manually beforehand, the brokers can be configured so that when a message is sent to a topic that does not exist, the topic is created automatically.
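For reference, automatic topic creation is controlled by broker settings in server.properties such as the following; the values are only an illustrative sketch (auto.create.topics.enable is true by default, and num.partitions / default.replication.factor determine how auto-created topics look):
auto.create.topics.enable=true
num.partitions=1
default.replication.factor=1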
Step 4: Send some messages
Kafka ships with a command-line client that takes input from a file or from standard input and sends it as messages to the Kafka cluster. By default, each line is sent as a separate message.
Simply run the producer and type a few messages into the console to send them to the server:
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message
Wolfenstein was here
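The console producer can also read its input from a file instead of the terminal; messages.txt below is just a hypothetical file with one message per line:
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test < messages.txt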
Step 5: Start a consumer
Kafka also has a command-line consumer that prints messages to standard output; it can be run as follows:
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message
This is another message
Wolfenstein was here
If you run the producer and the consumer in two separate terminals, you can type messages in one terminal and see them appear in the other.
Step 6: Setting up a multi-broker cluster
In the examples above we used only one broker, which does not show off Kafka's clustering and distribution capabilities. To get a better feel for how Kafka behaves, we create a cluster with 3 nodes, all on a single local machine so it is easy to test. To do this, we first create a config file for each broker.
> cp config/server.properties config/server-1.properties
> cp config/server.properties config/server-2.properties
Then edit the files copied above with your editor of choice, as follows:
config/server-1.properties:
broker.id=1
listeners=PLAINTEXT://:9093
log.dir=/tmp/kafka-logs-1
config/server-2.properties:
broker.id=2
listeners=PLAINTEXT://:9094
log.dir=/tmp/kafka-logs-2
broker.id must be unique and acts as the permanent name of each node in the cluster. Because we are building the cluster on a single machine, we must also change the port and the log directory in the configs above so that the brokers do not try to register on the same port or overwrite each other's data.
Since we already have Zookeeper and one Kafka node running, we only need to start two new nodes:
> bin/kafka-server-start.sh config/server-1.properties &
...
> bin/kafka-server-start.sh config/server-2.properties &
...
Now we create a topic with a replication factor of 3, since we are running 3 brokers:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3
--partitions 1 --topic my-replicated-topic
To see what each broker is doing, we can use the describe topics command:
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: my-replicated-topic Partition: 0 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
The first line is a summary of all partitions; each additional line gives information about one partition. Since the topic my-replicated-topic has only one partition, only one such line is printed above.
The fields are explained as follows:
 "leader" is the node responsible for all reads and writes for the given partition. Each node will
be the leader for a randomly selected portion of the partitions.
 "replicas" is the list of nodes that replicate the log for this partition regardless of whether they
are the leader or even if they are currently alive.
 "isr" is the set of "in-sync" replicas. This is the subset of the replicas list that is currently alive
and caught-up to the leader.
Note: in our example, node 1 is the leader for the only partition of the topic.
We can also run the --describe command for the topic test as shown below and inspect the output:
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
Topic:test PartitionCount:1 ReplicationFactor:1 Configs:
Topic: test Partition: 0 Leader: 0 Replicas: 0 Isr: 0
The output above shows that the topic test has no replicas and lives on server 0, the only node in the cluster at the time the topic was created.
Now we send a few messages to the new topic:
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
my test message 1
my test message 2
^C
Now we consume the messages that were sent:
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
...
my test message 1
my test message 2
^C
Now let's test out fault-tolerance. Broker 1 was acting as the leader so let's kill it:
> ps aux | grep server-1.properties
7564 ttys002 0:15.91
/System/Library/Frameworks/JavaVM.framework/Versions/1.8/Home/bin/java...
> kill -9 7564
Leadership has switched to one of the slaves and node 1 is no longer in the in-sync replica set:
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: my-replicated-topic Partition: 0 Leader: 2 Replicas: 1,2,0 Isr: 2,0
But the messages are still available for consumption even though the leader that took the writes
originally is down:
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
...
my test message 1
my test message 2
^C
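If you restart the killed broker, it should rejoin the cluster and, once it has caught up, reappear in the in-sync replica set; the output below is only a sketch of what to expect (the leader does not necessarily move back to broker 1):
> bin/kafka-server-start.sh config/server-1.properties &
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: my-replicated-topic Partition: 0 Leader: 2 Replicas: 1,2,0 Isr: 2,0,1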
Step 7: Use Kafka Connect to import/export data
Writing data from the console and writing it back to the console is a convenient place to start, but
you'll probably want to use data from other sources or export data from Kafka to other systems. For
many systems, instead of writing custom integration code you can use Kafka Connect to import or
export data.
Kafka Connect is a tool included with Kafka that imports and exports data to Kafka. It is an
extensible tool that runs connectors, which implement the custom logic for interacting with an
external system. In this quickstart we'll see how to run Kafka Connect with simple connectors that
import data from a file to a Kafka topic and export data from a Kafka topic to a file.
First, we'll start by creating some seed data to test with:
> echo -e "foo\nbar" > test.txt
Next, we'll start two connectors running in standalone mode, which means they run in a single, local,
dedicated process. We provide three configuration files as parameters. The first is always the
configuration for the Kafka Connect process, containing common configuration such as the Kafka
brokers to connect to and the serialization format for data. The remaining configuration files each
specify a connector to create. These files include a unique connector name, the connector class to
instantiate, and any other configuration required by the connector.
> bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
These sample configuration files, included with Kafka, use the default local cluster configuration you
started earlier and create two connectors: the first is a source connector that reads lines from an
input file and produces each to a Kafka topic and the second is a sink connector that reads
messages from a Kafka topic and produces each as a line in an output file.
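For reference, the two shipped sample files are only a few lines each; the contents below reflect the defaults in the Kafka distribution and may differ slightly between versions:
config/connect-file-source.properties:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test
config/connect-file-sink.properties:
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink.txt
topics=connect-test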
During startup you'll see a number of log messages, including some indicating that the connectors
are being instantiated. Once the Kafka Connect process has started, the source connector should
start reading lines from test.txt and producing them to the topic connect-test, and the sink connector
should start reading messages from the topic connect-test and write them to the file test.sink.txt. We
can verify the data has been delivered through the entire pipeline by examining the contents of the
output file:
> cat test.sink.txt
foo
bar
Note that the data is being stored in the Kafka topic connect-test, so we can also run a console
consumer to see the data in the topic (or use custom consumer code to process it):
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"foo"}
{"schema":{"type":"string","optional":false},"payload":"bar"}
...
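The schema/payload wrapping in the output above comes from the JSON converter that connect-standalone.properties enables by default; the relevant settings look roughly like this (exact defaults depend on the version):
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true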
The connectors continue to process data, so we can add data to the file and see it move through the
pipeline:
> echo "Another line" >> test.txt
You should see the line appear in the console consumer output and in the sink file.
Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other
systems. It makes it simple to quickly define connectors that move large collections of data into and
out of Kafka. Kafka Connect can ingest entire databases or collect metrics from all your application
servers into Kafka topics, making the data available for stream processing with low latency. An
export job can deliver data from Kafka topics into secondary storage and query systems or into
batch systems for offline analysis. Kafka Connect features include:
 A common framework for Kafka connectors - Kafka Connect standardizes integration of
other data systems with Kafka, simplifying connector development, deployment, and
management
 Distributed and standalone modes - scale up to a large, centrally managed service
supporting an entire organization or scale down to development, testing, and small
production deployments
 REST interface - submit and manage connectors to your Kafka Connect cluster via an easy
to use REST API
 Automatic offset management - with just a little information from connectors, Kafka
Connect can manage the offset commit process automatically so connector developers do
not need to worry about this error prone part of connector development
 Distributed and scalable by default - Kafka Connect builds on the existing group management protocol. More workers can be added to scale up a Kafka Connect cluster.
 Streaming/batch integration - leveraging Kafka's existing capabilities, Kafka Connect is an
ideal solution for bridging streaming and batch data systems
Running Kafka Connect in distributed mode:
> bin/connect-distributed.sh config/connect-distributed.properties
The difference is in the class which is started and the configuration parameters which change how
the Kafka Connect process decides where to store configurations, how to assign work, and where to
store offsets. In particular, the following configuration parameters are critical to set before starting
your cluster:
 group.id (default connect-cluster) - unique name for the cluster, used in forming the Connect
cluster group; note that this must not conflict with consumer group IDs
 config.storage.topic (default connect-configs) - topic to use for storing connector and task
configurations; note that this should be a single partition, highly replicated topic
 offset.storage.topic (default connect-offsets) - topic to use for storing offsets; this topic should have many
partitions and be replicated
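A minimal connect-distributed.properties therefore looks roughly like the sketch below; exact keys depend on the Kafka version (newer releases also require status.storage.topic):
bootstrap.servers=localhost:9092
group.id=connect-cluster
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
# newer versions also use a status topic:
# status.storage.topic=connect-status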
Note that in distributed mode the connector configurations are not passed on the command line.
Instead, use the REST API described below to create, modify, and destroy connectors.
Connector configurations are simple key-value mappings. For standalone mode these are defined in
a properties file and passed to the Connect process on the command line. In distributed mode, they
will be included in the JSON payload for the request that creates (or modifies) the connector. Most
configurations are connector dependent, so they can't be outlined here. However, there are a few
common options:
 name - Unique name for the connector. Attempting to register again with the same name will
fail.
 connector.class - The Java class for the connector
 tasks.max - The maximum number of tasks that should be created for this connector. The
connector may create fewer tasks if it cannot achieve this level of parallelism.
Sink connectors also have one additional option to control their input:
 topics - A list of topics to use as input for this connector
Since Kafka Connect is intended to be run as a service, it also supports a REST API for managing
connectors. By default this service runs on port 8083. The following are the currently supported
endpoints:
 GET /connectors - return a list of active connectors
 POST /connectors - create a new connector; the request body should be a JSON object
containing a string name field and an object config field with the connector configuration
parameters
 GET /connectors/{name} - get information about a specific connector
 GET /connectors/{name}/config - get the configuration parameters for a specific connector
 PUT /connectors/{name}/config - update the configuration parameters for a specific
connector
 GET /connectors/{name}/tasks - get a list of tasks currently running for a connector
 DELETE /connectors/{name} - delete a connector, halting all tasks and deleting its
configuration
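For example, the file source connector used earlier in standalone mode could be created on a distributed cluster with a request like the following (assuming the Connect REST API is listening on localhost:8083):
> curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
    "name": "local-file-source",
    "config": {
      "connector.class": "FileStreamSource",
      "tasks.max": "1",
      "file": "test.txt",
      "topic": "connect-test"
    }
  }'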