Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Introduction to Kafka - Jesse Anderson, CEO, Smoking Hand

Kafka is a distributed publish/subscribe system
It uses a commit log to track changes
Kafka was originally created at LinkedIn
Open sourced in 2011
Graduated to a top-level Apache project in 2012
Many Big Data projects are open source implementations of closed-source products
Unlike Hadoop, HBase, or Cassandra, Kafka actually isn't a clone of an existing closed-source product
What Is Kafka?
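A commit log is an append-only sequence in which every record gets a sequential offset and existing records are never changed. The following is a minimal in-memory sketch of that idea, not Kafka's on-disk implementation; the class `CommitLog` and its methods `append`, `readFrom`, and `endOffset` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, in-memory illustration of an append-only commit log.
// Real Kafka logs are persisted to disk in segment files; this only models offsets.
class CommitLog {
    private final List<String> entries = new ArrayList<>();

    // Appends a record to the end of the log and returns its offset.
    long append(String record) {
        entries.add(record);
        return entries.size() - 1;
    }

    // Returns every record from the given offset to the end of the log.
    // Existing entries are never modified, so readers can re-read old offsets.
    List<String> readFrom(long offset) {
        if (offset < 0 || offset > entries.size()) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        return Collections.unmodifiableList(
                new ArrayList<>(entries.subList((int) offset, entries.size())));
    }

    // The offset the next appended record will receive.
    long endOffset() {
        return entries.size();
    }
}
```

Appending "a", "b", and "c" returns offsets 0, 1, and 2, and `readFrom(1)` then returns `[b, c]`; because nothing is overwritten, any reader can go back to an earlier offset.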

A publish/subscribe system is used to move data
Also known as a producer/consumer system
The publisher creates data
Can be from any source
Can be binary or text
The subscriber consumes the publisher's data
The subscriber will use the data for its algorithms
Pub/Sub
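The publisher/subscriber relationship can be illustrated with a hypothetical hub class, `SimplePubSub`, that is not part of Kafka: the publisher only calls `publish`, never addressing a subscriber directly, and each subscriber registered with `subscribe` gets its own copy of every message. This sketch pushes messages through callbacks for brevity; Kafka consumers pull data instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, minimal publish/subscribe hub (not Kafka code).
// The publisher only knows the hub, never the subscribers,
// and every subscriber sees every published message.
// Delivery is a synchronous push via callbacks; Kafka consumers pull instead.
class SimplePubSub {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    // Registers a subscriber callback.
    void subscribe(Consumer<String> handler) {
        subscribers.add(handler);
    }

    // Hands the message to each registered subscriber in registration order.
    void publish(String message) {
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }
}
```

One subscriber could store messages with `hub.subscribe(list::add)` while another transforms them, and neither needs to know the other exists.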

Decoupling removes the need for each part of a system to know how data flows through it
A highly coupled system breaks when a simple change is made
A highly coupled system needs to know all configurations and destinations
A decoupled system is resilient to change
It does not break during a change
It does not need extensive knowledge about the rest of the system
Decoupling

Kafka is proven with Big Data
Kafka decouples systems
Becoming common in enterprise data flows
The same codebase has been used at LinkedIn for years, which answers the questions:
Does it scale?
Is it fast?
Is it robust?
Is it production ready?
Kafka supports the traditional publish/subscribe features
Why Use Kafka?

We will now demonstrate how Kafka works with Legos
Concepts shown:
Publish/Subscribe
Topics
Partitioning
Commit Logs
Log compaction
DEMO: Kafka With Legos
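Two of the concepts in the demo, partitioning and log compaction, can be modeled in plain Java. This is a simplified sketch under stated assumptions, with hypothetical names (`PartitionedLog`, `send`, `compact`): a record's partition is its key's hash modulo the partition count (Kafka's default partitioner uses a murmur2 hash of the serialized key; `String.hashCode` stands in here), offsets are counted per partition, and compaction keeps only the newest record for each key while leaving the survivors' offsets unchanged. Real Kafka compacts in the background and does not touch the active log segment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of key-based partitioning plus log compaction.
class PartitionedLog {
    // One stored record: its per-partition offset, key, and value.
    static final class Record {
        final long offset;
        final String key;
        final String value;

        Record(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    private final List<List<Record>> partitions = new ArrayList<>();
    private final long[] nextOffset;

    PartitionedLog(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        nextOffset = new long[numPartitions];
    }

    // The same key always maps to the same partition, which keeps per-key order.
    // String.hashCode stands in for Kafka's murmur2 hash of the serialized key.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Appends to the key's partition and returns the record's offset there.
    long send(String key, String value) {
        int p = partitionFor(key);
        long offset = nextOffset[p]++;
        partitions.get(p).add(new Record(offset, key, value));
        return offset;
    }

    // Compaction: keep only the newest record per key in each partition.
    // Survivors keep their original offsets, so the offsets end up with gaps.
    void compact() {
        for (int p = 0; p < partitions.size(); p++) {
            Map<String, Long> newest = new HashMap<>();
            for (Record r : partitions.get(p)) {
                newest.put(r.key, r.offset);
            }
            List<Record> kept = new ArrayList<>();
            for (Record r : partitions.get(p)) {
                if (newest.get(r.key) == r.offset) {
                    kept.add(r);
                }
            }
            partitions.set(p, kept);
        }
    }

    List<Record> partition(int p) {
        return partitions.get(p);
    }
}
```

Sending the key `user1` three times with values `a`, `b`, and `c` puts all three records in the same partition at offsets 0, 1, and 2; after `compact()`, only the record with value `c` at offset 2 remains.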

Producers create the data and publish it to the cluster
All producer data is sent over the network to the Kafka cluster
All producer data is sent as keys and values
The keys and values can be binary or text
Publisher
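Whether a key or value starts as text or binary, it travels to the cluster as a byte array. The helper below is a hypothetical sketch of that conversion, not Kafka's `Serializer` API: text is encoded as UTF-8 (the default encoding of Kafka's `StringSerializer`), and a `long` is shown as one example of binary data packed into 8 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers illustrating that keys and values become bytes
// before they are sent over the network.
final class KeyValueBytes {
    private KeyValueBytes() {}

    // Text: encode as UTF-8 bytes (null stays null).
    static byte[] fromText(String s) {
        return s == null ? null : s.getBytes(StandardCharsets.UTF_8);
    }

    // Text: decode UTF-8 bytes back into a String.
    static String toText(byte[] b) {
        return b == null ? null : new String(b, StandardCharsets.UTF_8);
    }

    // Binary: a long packed into 8 big-endian bytes (ByteBuffer's default order).
    static byte[] fromLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Binary: unpack 8 big-endian bytes back into a long.
    static long toLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }
}
```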

Consumers receive the producer's data
The consumers actually pull the data from the Kafka cluster
The consumers receive the keys and values sent by the producer
Subscriber
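Pull-based consumption means the cluster never pushes records; each consumer remembers how far it has read and asks for the next batch when it is ready. The `PullingConsumer` class below is a hypothetical in-memory sketch of that loop, with a plain `List` standing in for a topic and `maxBatch` standing in for a per-poll record limit (loosely analogous to Kafka's `max.poll.records` setting).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of pull-based consumption. The log never pushes data;
// each consumer remembers its own position and asks for the next batch.
class PullingConsumer {
    private final List<String> log;   // shared, append-only record list
    private final int maxBatch;       // most records returned per poll
    private int position = 0;         // offset of the next record to read

    PullingConsumer(List<String> log, int maxBatch) {
        this.log = log;
        this.maxBatch = maxBatch;
    }

    // Returns up to maxBatch records starting at the current position and
    // advances past them. Returns an empty list when caught up.
    List<String> poll() {
        int end = Math.min(log.size(), position + maxBatch);
        List<String> batch = new ArrayList<>(log.subList(position, end));
        position = end;
        return batch;
    }

    int position() {
        return position;
    }
}
```

Two consumers reading the same list keep independent positions, so a slow consumer does not hold back a fast one, and a consumer that has caught up simply gets an empty batch until more records are appended.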

Topics are a way of grouping data together
Publishers send data to a topic
Consumers receive all of their data from a topic
The topic name must match exactly on both the publisher and the consumer
Topics
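Because a topic is identified by its name, "matching exactly" amounts to an exact string comparison. In this hypothetical `TopicRegistry` sketch (not Kafka code), a record published to `my_topic` is invisible to a reader asking for `My_Topic`, since Java `String` keys are compared case-sensitively.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: topics as named, independent streams of records.
// Topic names are plain String map keys, so they must match exactly.
class TopicRegistry {
    private final Map<String, List<String>> topics = new HashMap<>();

    // Publisher side: append a record to the named topic (created on first use here).
    void publish(String topic, String record) {
        topics.computeIfAbsent(topic, t -> new ArrayList<>()).add(record);
    }

    // Consumer side: return a copy of every record on exactly this topic name.
    List<String> read(String topic) {
        return new ArrayList<>(topics.getOrDefault(topic, new ArrayList<>()));
    }
}
```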

There are various ways to access Kafka
The most common way is to use the Java API
It is the only first-class citizen
Other languages have API implementations, but they aren't part of the Apache Kafka project
The REST interface allows many languages to use Kafka
This requires access to the REST Server
Kafka Connect allows general-purpose integrations
Data can be ingested into Hadoop
Data can be added to an RDBMS
Accessing Kafka

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
public class MyProducer {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Configure brokers to connect to
    props.put("bootstrap.servers", "broker1:9092");
    // Configure serializer classes
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    KafkaProducer<String, String> producer =
        new KafkaProducer<String, String>(props);
    // Create ProducerRecord and send it
    String key = "mykey";
    String value = "myvalue";
    ProducerRecord<String, String> record =
        new ProducerRecord<String, String>("my_topic", key, value);
    // send() is asynchronous; close() waits for previously sent records to complete
    producer.send(record);
    producer.close();
  }
}
Creating a Publisher

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
public class MyConsumer {
  private KafkaConsumer<String, String> consumer;
  public void createConsumer() {
    String topic = "hello_topic";
    Properties props = new Properties();
    // Configure the bootstrap servers used to locate the cluster
    props.put("bootstrap.servers", "broker1:9092");
    // Configure consumer group
    props.put("group.id", "group1");
    // Configure key and value deserializers
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    // Create the consumer and subscribe to the topic
    consumer = new KafkaConsumer<String, String>(props);
    consumer.subscribe(Arrays.asList(topic));
Creating a Consumer (1/2)

    while (true) {
      // Poll for ConsumerRecords, waiting up to 100 ms
      // (poll(Duration) needs kafka-clients 2.0+; older clients use poll(100))
      ConsumerRecords<String, String> records =
          consumer.poll(java.time.Duration.ofMillis(100));
      // Process the ConsumerRecords, if any, that came back
      for (ConsumerRecord<String, String> record : records) {
        String key = record.key();
        String value = record.value();
        // Do something with message
      }
    }
  }
  public void close() {
    if (consumer != null) {
      consumer.close();
    }
  }
  public static void main(String[] args) {
    MyConsumer consumer = new MyConsumer();
    try {
      // createConsumer() polls forever, so close() runs only if it throws
      consumer.createConsumer();
    } finally {
      consumer.close();
    }
  }
}
Creating a Consumer (2/2)

Current: Instructor, Thought Leader, Monkey Tamer
Previously:
Curriculum Developer and Instructor @ Cloudera
Senior Software Engineer @ Intuit
Covered by, Spoke at, and Published in:
GigaOM, Ars Technica, Pragmatic Programmers, Strata, OSCON,
Wall Street Journal, CNN, BBC, NPR
See Me On:
@jessetanderson
http://tiny.smokinghand.com/linkedin
http://tiny.smokinghand.com/youtube
About Me
