mypipe: Buffering and consuming
MySQL changes via Kafka
with
-=[ Scala - Avro - Akka ]=-
Hisham Mardam-Bey
GitHub: mardambey
Twitter: @codewarrior
Overview
● Who is this guy? + Quick Mate1 Intro
● Quick Tech Intro
● Motivation and History
● Features
● Design and Architecture
● Practical applications and usages
● System diagram
● Future work
● Q&A
Who is this guy?
● Linux and OpenBSD user and developer
since 1996
● Started out with C followed by Ruby
● Working with the JVM since 2007
● “Lately” building and running distributed
systems, and doing Scala
GitHub: mardambey
Twitter: @codewarrior
Mate1: quick intro
● Online dating, since 2003, based in Montreal
● Initially a team of 3, around 30 now
● Engineering team has 12 geeks / geekettes
○ Always looking for talent!
● We own and run our own hardware
○ fun!
○ mostly…
https://github.com/mate1
Super Quick Tech Intro
● MySQL: relational database
● Avro: data serialization system
● Kafka: publish-subscribe messaging
rethought as a distributed commit log
● Akka: toolkit and runtime simplifying the
construction of concurrent and distributed
applications
● Actors: universal primitives of concurrent
computation using message passing
● Schema repo / registry: holds versioned
Avro schemas
Motivation
● Initially, wanted:
○ MySQL triggers outside the DB
○ MySQL fan-in or fan-out replication (data cubes)
○ MySQL to “Hadoop”
● And then:
○ Cache or data store consistency with DB
○ Direct integration with big-data systems
○ Data schema evolution support
○ Turning MySQL inside out
■ Bootstrapping downstream data systems
History
● 2010: Custom Perl scripts to parse binlogs
● 2011/2012: Guzzler
○ Written in Scala, uses mysqlbinlog command
○ Simple to start with, difficult to maintain and control
● 2014: Enter mypipe!
○ Initial prototyping begins
Feature Overview (1/2)
● Emulates MySQL slave via binary log
○ Writes MySQL events to Kafka
● Uses Avro to serialize and deserialize data
○ Generically via a common schema for all tables
○ Specifically via per-table schema
● Modular by design
○ State saving / loading (files, MySQL, ZK, etc.)
○ Error handling
○ Event filtering
○ Connection sources
Feature Overview (2/2)
● Transaction and ALTER TABLE support
○ Includes transaction information within events
○ Refreshes schema as needed
● Can publish to any downstream system
○ Currently, we have Kafka
○ Initially, we started with Cassandra for the prototype
● Can bootstrap a MySQL table into Kafka
○ Transforms entire table into Kafka events
○ Useful with Kafka log compaction
● Configurable
○ Kafka topic names
○ whitelist / blacklist support
● Console consumer, Dockerized dev env
Project Structure
● mypipe-api: API for MySQL binlogs
● mypipe-avro: binary protocol, mutation
serialization and deserialization
● mypipe-producers: push data downstream
● mypipe-kafka: Serializer & Decoder
implementations
● mypipe-runner: pipes and console tools
● mypipe-snapshotter: import MySQL tables
(beta)
MySQL Binary Logging
● Foundation of MySQL replication
● Statement- or row-based
● Represents a journal / change log of data
● Allows applications to spy / tune in on
MySQL changes
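For reference, a minimal my.cnf sketch that turns on row-based binary logging; mypipe's exact requirements may differ, and the values below are purely illustrative:
[mysqld]
server-id        = 1
log-bin          = mysql-bin
binlog_format    = ROW    # row events carry the actual column data, not just the SQL text
binlog_row_image = FULL   # include every column in the before/after row images
expire_logs_days = 7      # keep binlogs long enough for consumers to catch up
The user mypipe connects as also needs the REPLICATION SLAVE and REPLICATION CLIENT privileges, like any other replica.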
MySQLBinaryLogConsumer
● Uses behavior from an abstract class
● Modular design; in this case, it uses config-based implementations
● Uses HOCON for ease and availability
case class MySQLBinaryLogConsumer(config: Config)
extends AbstractMySQLBinaryLogConsumer
with ConfigBasedConnectionSource
with ConfigBasedErrorHandlingBehaviour
with ConfigBasedEventSkippingBehaviour
with CacheableTableMapBehaviour
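A rough usage sketch; the config path below is hypothetical, but the "localhost" consumer block it points at is the one shown in the config slides later on:
import com.typesafe.config.ConfigFactory

// hypothetical wiring: hand the "localhost" consumer block from the HOCON config to the consumer
val config   = ConfigFactory.load().getConfig("mypipe.consumers.localhost")
val consumer = MySQLBinaryLogConsumer(config)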
AbstractMySQLBinaryLogConsumer
● Maintains connection to MySQL
● Primarily handles
○ TABLE_MAP
○ QUERY (BEGIN, COMMIT, ROLLBACK, ALTER)
○ XID
○ Mutations (INSERT, UPDATE, DELETE)
● Provides an enriched binary log API
○ Looks up table metadata and includes it
○ Scala-friendly, case-class and Option-driven(*) API for speaking MySQL binlogs
(*) constant work in progress (=
TABLE_MAP and table metadata
● Provides table metadata
○ Precedes mutation events
○ But no column names!
● MySQLMetadataManager
○ One actor per database
○ Uses “information_schema”
○ Determines column metadata and primary key
● TableCache
○ Wraps metadata actor providing a cache
○ Refreshes tables “when needed”
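For context, this is the kind of information_schema lookup involved; the exact query mypipe issues isn't shown here, so treat this as a minimal sketch:
// columns, their types, and whether they belong to the primary key (column_key = 'PRI')
val columnMetadataSql =
  """SELECT column_name, data_type, column_key
    |FROM information_schema.columns
    |WHERE table_schema = ? AND table_name = ?
    |ORDER BY ordinal_position""".stripMargin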
Mutations
case class ColumnMetadata(name: String, colType: ColumnType.EnumVal, isPrimaryKey: Boolean)
case class PrimaryKey(columns: List[ColumnMetadata])
case class Column(metadata: ColumnMetadata, value: java.io.Serializable)
case class Table(id: Long, name: String, db: String, columns: List[ColumnMetadata], primaryKey:
Option[PrimaryKey])
case class Row(table: Table, columns: Map[String, Column])
case class InsertMutation(timestamp: Long, table: Table, rows: List[Row], txid: UUID)
case class UpdateMutation(timestamp: Long, table: Table, rows: List[(Row, Row)], txid: UUID)
case class DeleteMutation(timestamp: Long, table: Table, rows: List[Row], txid: UUID)
● Fully enriched with table metadata
● Contain column types, data and txid
● Mutations can be serialized and deserialized
from and to Avro
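Given those case classes, a downstream handler can pattern match on mutations; a minimal sketch, assuming a common Mutation supertype and ignoring error handling:
// illustrative only: react to enriched mutation events
def onMutation(m: Mutation): Unit = m match {
  case InsertMutation(ts, table, rows, txid) =>
    println(s"INSERT into ${table.db}.${table.name}: ${rows.size} row(s), tx $txid")
  case UpdateMutation(ts, table, rows, txid) =>
    rows.foreach { case (oldRow, newRow) =>
      println(s"UPDATE ${table.name}: before=${oldRow.columns} after=${newRow.columns}") }
  case DeleteMutation(ts, table, rows, txid) =>
    println(s"DELETE ${rows.size} row(s) from ${table.name}, tx $txid")
}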
Kafka Producers
● Two modes of operation:
○ Generic Avro beans
○ Specific Avro beans
● Producers decoupled from SerDE
○ Recently started supporting Kafka serializers and
decoders
○ Currently we only support: http://schemarepo.org/
○ Very soon we can integrate with systems such as
Confluent Platform’s schema registry.
Kafka Message Format
-------------------
| MAGIC | 1 byte  |
|-----------------|
| MTYPE | 1 byte  |
|-----------------|
| SCMID | N bytes |
|-----------------|
| DATA  | N bytes |
-------------------
● MAGIC: magic byte, for protocol version
● MTYPE: mutation type, a single byte
○ indicating insert (0x1), update (0x2), or delete (0x3)
● SCMID: Avro schema ID, N bytes
● DATA: the actual mutation data as N bytes
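A minimal Scala sketch of writing this envelope with java.nio.ByteBuffer; the slide only says SCMID is “N bytes”, so the 2-byte schema ID below is purely an assumption for illustration:
import java.nio.ByteBuffer

def encode(magic: Byte, mutationType: Byte, schemaId: Short, avroBytes: Array[Byte]): Array[Byte] = {
  val buf = ByteBuffer.allocate(1 + 1 + 2 + avroBytes.length)
  buf.put(magic)          // MAGIC: protocol version
  buf.put(mutationType)   // MTYPE: 0x1 insert, 0x2 update, 0x3 delete
  buf.putShort(schemaId)  // SCMID: Avro schema ID (width assumed here)
  buf.put(avroBytes)      // DATA: Avro-serialized mutation
  buf.array()
}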
Generic Message Format
3 Avro beans
○ InsertMutation, DeleteMutation, UpdateMutation
○ Hold data for new and old columns (for updates)
○ Group data by type into Avro maps
{
"name": "old_integers",
"type": {"type": "map", "values": "int"}
},
{
"name": "new_integers",
"type": {"type": "map", "values": "int"}
},
{
"name": "old_strings",
"type": {"type": "map", "values": "string"}
},
{
"name": "new_strings",
"type": {"type": "map", "values": "string"}
} ...
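For example, an update to a hypothetical users(id INT, username VARCHAR) row gets bucketed by column type roughly like this (an illustrative JSON view of the resulting Avro record):
{
  "old_integers": { "id": 42 },
  "new_integers": { "id": 42 },
  "old_strings":  { "username": "alice" },
  "new_strings":  { "username": "bob" }
}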
Specific Message Format
Requires 3 Avro beans per table
○ Insert, Update, Delete
○ Specific fields can be used in the schema
{
"name": "UserInsert",
"fields": [
{
"name": "id",
"type": ["null", "int"]
},
{
"name": "username",
"type": ["null", "string"]
},
{
"name": "login_date",
"type": ["null", "long"]
},...
]
},
ALTER TABLE support
● ALTER TABLE queries are intercepted
○ Producers can handle this event specifically
● Kafka serializer and deserializer
○ They inspect Avro beans and refresh schema if
needed
● Avro evolution rules must be respected
○ Or mypipe can’t properly encode / decode data
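For instance, adding a nullable field with a default is the kind of change that evolves cleanly; a hypothetical addition to the UserInsert schema shown earlier:
{
  "name": "email",
  "type": ["null", "string"],
  "default": null
}
Dropping a field that has no default, or changing a field's type incompatibly, would leave mypipe unable to decode older messages.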
Pipes
● Join consumers to producers
● Use configurable, time-based checkpointing and flushing
○ File based, MySQL based, ZK based, Kafka based
schema-repo-client = "mypipe.avro.schema.SchemaRepo"
consumers {
localhost {
# database "host:port:user:pass" array
source = "localhost:3306:mypipe:mypipe"
}
}
producers {
stdout {
class = "mypipe.kafka.producer.stdout.StdoutProducer"
}
kafka-generic {
class = "mypipe.kafka.producer.KafkaMutationGenericAvroProducer"
}
}
pipes {
stdout {
consumers = ["localhost"]
producer { stdout {} }
binlog-position-repo {
#class="mypipe.api.repo.ConfigurableMySQLBasedBinaryLogPositionRepository"
class = "mypipe.api.repo.ConfigurableFileBasedBinaryLogPositionRepository"
config {
file-prefix = "stdout-00" # required if binlog-position-repo is specified
data-dir = "/tmp/mypipe/data"
}
}
}
kafka-generic {
enabled = true
consumers = ["localhost"]
producer {
kafka-generic {
metadata-brokers = "localhost:9092"
}
}
}
Practical Applications
● Cache coherence
● Change logging and auditing
● MySQL to:
○ HDFS
○ Cassandra
○ Spark
● Once Confluent Schema Registry is integrated
○ Kafka Connect
○ KStreams
● Other reactive applications
○ Real-time notifications
System Diagram
[Diagram: multiple pipes (Pipe 1, Pipe 2, … Pipe N). In each pipe, a MySQL BinaryLog Consumer (or a Select Consumer reading directly from MySQL) feeds a Kafka Producer, which registers schemas with the Schema Registry and publishes to per-table Kafka topics (db1_tbl1, db1_tbl2, db2_tbl1, db2_tbl2). Event Consumers read those topics and feed Hadoop, Cassandra, Dashboards, and Users.]
Future Work
● Finish MySQL -> Kafka snapshot support
● Move to Kafka 0.10
● MySQL global transaction identifier (GTID)
support
● Publish to Maven
● More tests, we have a good amount, but you
can’t have enough!
Fin!
That’s all folks (=
Thanks!
Questions?
https://github.com/mardambey/mypipe