1
© OCTO 2015
Event Driven Architecture
bluck
2
© OCTO 2015
The first problem was how to transport data between systems.
The second part of this problem was the need to do richer
analytical data processing with very low latency.
3
© OCTO 2015
The pipeline for log data was
scalable
but lossy and
could only deliver data with high latency.
The pipeline between Oracle instances was
fast,
exact, and
real-time,
but not available to any other systems.
4
© OCTO 2015
The pipeline of Oracle data for Hadoop was
periodic CSV dumps—
high throughput,
but batch.
The pipeline of data to our search system was
low latency,
but unscalable
and tied directly to the database.
The messaging systems were
low latency
but unreliable
and unscalable.
5
© OCTO 2015
6
© OCTO 2015
As we added data centers geographically distributed around the world, we had to
build out geographic replication for each of these data flows.
The data was always unreliable:
our reports were untrustworthy,
derived indexes and stores were questionable, and
everyone spent a lot of time battling data quality issues of all kinds.
At the same time, we weren't just shipping data from place to place; we also
wanted to do things with it.
Hadoop had given us a platform for batch processing, data archival, and ad hoc
processing, and this had been enormously successful, but we lacked an
analogous platform for low-latency processing.
7
© OCTO 2015
Stream Data Platform
8
© OCTO 2015
Stream Data Platform
9
© OCTO 2015
Your database stores the current state of your data. But the current state is
always caused by some actions that took place in the past. The actions are the
events.
Much of what people refer to when they talk about "big data" is really the act of
capturing these events that previously weren't recorded anywhere and putting
them to use for analysis, optimization, and decision making.
Event streams are an obvious fit for log data and for things like "orders", "sales",
"clicks" or "trades" that are inherently event-like (see the sketch below).
The Rise of Events and Event Streams
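To make the state-vs-events distinction concrete, here is a minimal Java sketch (the account example and all names are ours, not from the slides): the current state a database row would hold is just a fold over the events that produced it.

```java
import java.util.List;

// Hypothetical event type; a database row would only keep the resulting balance.
record AccountEvent(long accountId, String action, long amount) {}

public class StateVsEvents {
    public static void main(String[] args) {
        // The actions that took place in the past — the events:
        List<AccountEvent> events = List.of(
                new AccountEvent(42, "DEPOSIT", 100),
                new AccountEvent(42, "WITHDRAW", 30));

        // The current state is derived by replaying the events:
        long balance = events.stream()
                .mapToLong(e -> e.action().equals("DEPOSIT") ? e.amount() : -e.amount())
                .sum();
        System.out.println("balance = " + balance); // 70
    }
}
```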
10
© OCTO 2015
Data in databases can also be thought of as an event stream. There are two ways to create a
backup or standby copy of a database:
to dump out the contents, or
to take a "diff" of what has changed.
Change capture: if we take our diffs more and more frequently, what we are left with is a
continuous sequence of single-row changes (sketched after this slide).
By publishing the database changes into the stream data platform you add this to the other
set of event streams. You can use these streams to synchronize other systems like
a Hadoop cluster,
a replica database, or
a search index, or
you can feed these changes into applications
or stream processors to directly compute new things off the changes.
Databases Are Event Streams
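A minimal sketch of what one captured row change might look like (the field names are hypothetical; real change-capture tools also carry transaction and offset metadata):

```java
import java.util.Map;

// Hypothetical shape of a single-row change event; before/after row images
// are a common way to represent an UPDATE in change capture.
record RowChange(String table, String op,        // "INSERT" | "UPDATE" | "DELETE"
                 Map<String, Object> before,     // row image before the change
                 Map<String, Object> after) {}   // row image after the change

public class ChangeCaptureSketch {
    public static void main(String[] args) {
        RowChange change = new RowChange(
                "users", "UPDATE",
                Map.of("id", 7, "email", "old@example.com"),
                Map.of("id", 7, "email", "new@example.com"));
        // Publishing a continuous sequence of such records turns the database
        // into one more event stream on the platform.
        System.out.println(change);
    }
}
```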
11
© OCTO 2015
A stream data platform has two primary uses:
Data Integration: The stream data platform captures streams of events or data changes and
feeds these to other data systems such as relational databases, key-value stores, Hadoop, or
the data warehouse.
Stream processing: It enables continuous, real-time processing and transformation of these
streams and makes the results available system-wide.
The stream data platform is a central hub for data streams.
It also acts as a buffer between these systems—the publisher of data doesn't need to be
concerned with the various systems that will eventually consume and load the data. This
means consumers of data can come and go, fully decoupled from the source (see the
producer sketch below).
What Is a Stream Data Platform For?
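As a sketch of that decoupling with Kafka's Java producer (the broker address, topic name, and payload here are made up for illustration): the publisher writes to a named stream and knows nothing about who will consume it.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PageViewPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The producer only knows the stream ("page-views"); whether Hadoop,
            // a search index, or a cache consumes it is invisible from here.
            producer.send(new ProducerRecord<>("page-views", "user-42",
                    "{\"page\": \"/pricing\", \"ts\": 1430000000}"));
        }
    }
}
```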
12
© OCTO 2015
Hadoop wants to be able to maintain a full copy of all the data in your organization and act
as a "data lake" or "enterprise data hub".
Directly integrating each data source with HDFS is a hugely time-consuming proposition,
and the end result only makes that data available to Hadoop.
This type of data capture isn't suitable for real-time processing or syncing other real-time
applications.
This same pipeline can run in reverse: Hadoop and the data warehouse environment can
publish out results that need to flow into appropriate systems for serving in customer-facing
applications.
What Is a Stream Data Platform For? Zoom
Hadoop
13
© OCTO 2015
The stream processing use case plays off the data integration use case.
The results of the stream processing are just a new, derived stream.
Stream processing is both a way to develop applications that need low-latency
transformations and a direct part of the data integration usage itself:
integrating systems often requires some munging of data streams in between
(see the consume-transform-produce sketch below).
What Is a Stream Data Platform For? Zoom
ETL
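A consume-transform-produce sketch using the plain Kafka Java clients (topic names, group id, and the trivial "enrichment" are placeholders): the output is itself just a new, derived stream that anything else can subscribe to.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DerivedStream {
    public static void main(String[] args) {
        Properties cons = new Properties();
        cons.put("bootstrap.servers", "broker1:9092");
        cons.put("group.id", "click-enricher");
        cons.put("key.deserializer",
                 "org.apache.kafka.common.serialization.StringDeserializer");
        cons.put("value.deserializer",
                 "org.apache.kafka.common.serialization.StringDeserializer");

        Properties prod = new Properties();
        prod.put("bootstrap.servers", "broker1:9092");
        prod.put("key.serializer",
                 "org.apache.kafka.common.serialization.StringSerializer");
        prod.put("value.serializer",
                 "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> in = new KafkaConsumer<>(cons);
             KafkaProducer<String, String> out = new KafkaProducer<>(prod)) {
            in.subscribe(List.of("clicks"));
            while (true) {
                for (ConsumerRecord<String, String> rec : in.poll(Duration.ofMillis(500))) {
                    String enriched = rec.value().toUpperCase(); // stand-in transformation
                    // The result is published as a new stream, "clicks-enriched".
                    out.send(new ProducerRecord<>("clicks-enriched", rec.key(), enriched));
                }
            }
        }
    }
}
```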
14
© OCTO 2015
A stream data platform is similar to an enterprise messaging system—it receives
messages and distributes them to interested subscribers. There are three
important differences:
Messaging systems are typically run in one-off deployments for different
applications. The purpose of the stream data platform is very much to be a central data
hub.
Messaging systems do a poor job of supporting integration with batch systems, such
as a data warehouse or a Hadoop cluster, as they have limited data storage
capacity.
Messaging systems do not provide semantics that are easily compatible with rich
stream processing.
How Does a Stream Data Platform Relate To Existing Things?
15
© OCTO 2015
In other words, a stream data platform is a messaging system whose role
has been rethought at company-wide scale.
How Does a Stream Data Platform Relate To Existing Things?
16
© OCTO 2015
A stream data platform is a true platform that any other system can choose to
tap into and many applications can build around.
By making data available in a uniform format in a single place with a common
stream abstraction, many of the routine data clean-up tasks can be avoided
entirely.
Data Integration Tools
17
© OCTO 2015
The advantage of a stream data platform is that transformation is fundamentally
decoupled from the stream itself.
This code can live in applications or stream processing tasks, allowing teams to
iterate at their own pace without a central bottleneck for application
development.
Enterprise Service Buses
18
© OCTO 2015
Databases have long had similar log-based mechanisms, such as Oracle GoldenGate.
However these mechanisms are limited to database changes only and are not a
general purpose event capture platform.
Change Capture Systems
19
© OCTO 2015
A stream data platform doesn't replace your data warehouse; in fact, quite the
opposite: it feeds it data.
Data Warehouses and Hadoop
20
© OCTO 2015
They attempt to add richer processing semantics to subscribers and can make
implementing data transformation easier.
Stream Processing Systems
21
© OCTO 2015
Everything from user activity to database changes to administrative actions like
restarting a process is captured in real-time streams that are subscribed to and
processed in real time.
What Does This Look Like In Practice?
22
© OCTO 2015
Part of the promise of this approach to data management is having a central repository with
the full set of data streams your organization generates. This works best when data is all in
the same place, simplifying the system architecture:
fewer integration points for data consumers,
fewer things to operate,
lower incremental cost for adding new applications, and
easier reasoning about data flow.
But there are several reasons you may end up with multiple clusters:
To keep activity local to a datacenter
For security reasons
For SLA control.
Recommendations: Limit The Number of Clusters
23
© OCTO 2015
Apache Kafka does not enforce any particular data format.
If each individual or application chooses a representation of their own preference—say
some use JSON, others XML, and others CSV—the result is that any system or process
which uses multiple data streams has to munge and understand each of these.
Local optimization—choosing your favorite format for data you produce—leads to huge
global sub-optimization since now each system needs to write N adaptors, one for each
format it wants to ingest.
Imagine how useless the Unix toolchain would be if each tool invented its own format: you
would have to translate between formats every time you wanted to pipe one command to
another.
Recommendations: Pick A Single Data Format
24
© OCTO 2015
Connecting all systems directly would look
something like this
Whereas having this central stream data platform
looks something like this
Recommendations: Pick A Single Data Format
25
© OCTO 2015
We think Avro is the best choice for a number of reasons:
1. It has a direct mapping to and from JSON.
2. It has a very compact format. The bulk of JSON, repeating every field name with every single
record, is what makes JSON inefficient for high-volume usage.
3. It is very fast.
4. It has great bindings for a wide variety of programming languages so you can generate Java
objects that make working with event data easier, but it does not require code generation so
tools can be written generically for any data stream.
5. It has a rich, extensible schema language defined in pure JSON.
6. It has the best notion of compatibility for evolving your data over time.
Recommendations: Use Avro as Your Data Format
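For illustration, a small hypothetical Avro schema (pure JSON, per point 5 above); the record name and fields are ours:

```json
{
  "type": "record",
  "name": "PageView",
  "namespace": "com.example.events",
  "fields": [
    {"name": "user_id",  "type": "long"},
    {"name": "page",     "type": "string"},
    {"name": "referrer", "type": ["null", "string"], "default": null},
    {"name": "ts",       "type": "long", "doc": "epoch millis"}
  ]
}
```

The nullable field with a default is what enables compatible evolution (point 6): consumers reading old records get the default, and older consumers can ignore the new field.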
26
© OCTO 2015
Isn't the modern world of big data all about unstructured data, dumped in whatever form is
convenient, and parsed later when it is queried?
One of the primary advantages of this type of architecture where data is modeled as
streams is that applications are decoupled. Applications produce a stream of events
capturing what occurred without knowledge of which things subscribe to these streams.
The Need For Schemas
27
© OCTO 2015
Whenever you see a common activity across multiple systems try to use a common schema
for this activity.
An example of this that is common to all businesses is application errors (a sketch of such a shared schema follows this slide).
Share Event Schemas
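A sketch of what a shared application-error schema might look like; the fields are our guess at a reasonable common denominator, not a prescription:

```json
{
  "type": "record",
  "name": "ApplicationError",
  "namespace": "com.example.events",
  "fields": [
    {"name": "service",    "type": "string"},
    {"name": "host",       "type": "string"},
    {"name": "level",      "type": {"type": "enum", "name": "Severity",
                                    "symbols": ["WARN", "ERROR", "FATAL"]}},
    {"name": "message",    "type": "string"},
    {"name": "stacktrace", "type": ["null", "string"], "default": null},
    {"name": "ts",         "type": "long"}
  ]
}
```

With every system publishing errors in this one shape, a single monitoring consumer can cover the whole organization.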
28
© OCTO 2015
MODELING SPECIFIC DATA TYPES IN KAFKA