© 2016 MapR Technologies L1-1®
Streaming Patterns, Revolutionary Architectures
Carol McDonald
Agenda
Streams Core Components
•  Topics, Partitions
•  Fault Tolerance
•  High Availability
Patterns
•  Event Sourcing
•  Duality of Streams and Databases
•  Command Query Responsibility Segregation
•  Polyglot Persistence, Multiple Materialized Views
•  Turning the Database Upside Down
Real World Examples
•  Fraud Detection
•  Healthcare Exchange
Which products are we discussing?
Streams Core Components
What’s a Stream?
A stream is an unbounded sequence of events carried
from a set of producers to a set of consumers.
Producers -> Events_Stream -> Consumers
What is Streaming Data? Got Some Examples?
Data Collection
Devices
Smart Machinery Phones and Tablets Home Automation
RFID Systems Digital Signage Security Systems Medical Devices
Why Streams?
Trigger Events:
•  Stock Prices
•  User Activity
•  Sensor Data
Many Big Data sources are Event Oriented
Event Data -> Streams (Topics) -> Real-Time Analytics
Analyze Data
What if you need to analyze data as it arrives?
Batch Processing with HDFS
6:01 P.M.: 72°
6:02 P.M.: 75°
6:03 P.M.: 77°
6:04 P.M.: 85°
6:05 P.M.: 90°
6:06 P.M.: 85°
6:07 P.M.: 77°
6:08 P.M.: 75°
Analyze: 90°
It was hot at 6:05 yesterday!
Event Processing with Streams
6:05 P.M.: 90°
Stream, Topic: Temperature
Turn on the air
conditioning!
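The difference between the batch slide and the event slide can be sketched in plain Java. This is only an illustration, not MapR Streams code: the class TemperatureMonitor, the method onEvent, and the 88° threshold are assumptions made for the example. Each reading is handled as it arrives, so the 6:05 reading triggers an action right away instead of showing up in a report the next day.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: react to each temperature event as it arrives.
final class TemperatureMonitor {
    // Assumed threshold; the slides only show that 90° warrants action.
    static final int THRESHOLD = 88;

    // Actions triggered so far, in the order they were triggered.
    final List<String> actions = new ArrayList<>();

    // Called once per event, in arrival order.
    void onEvent(String time, int degrees) {
        if (degrees >= THRESHOLD) {
            actions.add(time + ": turn on the air conditioning");
        }
    }

    public static void main(String[] args) {
        TemperatureMonitor monitor = new TemperatureMonitor();
        String[] times = {"6:01", "6:02", "6:03", "6:04", "6:05", "6:06", "6:07", "6:08"};
        int[] temps = {72, 75, 77, 85, 90, 85, 77, 75};
        for (int i = 0; i < temps.length; i++) {
            monitor.onEvent(times[i], temps[i]);
        }
        // Prints: 6:05: turn on the air conditioning
        monitor.actions.forEach(System.out::println);
    }
}
```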
Organize Data
What if you need to organize data as it arrives?
Integrating Many Data Sources and Applications
Sources
(Producers)
Applications
(Consumers)
Unorganized, Complicated, and Tightly Coupled.
Organize Data into Topics with MapR Streams
Topics Organize Events into Categories and Decouple Producers from Consumers
Consumers
MapR Cluster
Topic: Pressure
Topic: Temperature
Topic: Warnings
Consumers
Consumers
Kafka API Kafka API
Process High Volume of Data
What if you need to process a high volume of data as it arrives?
What if BP had detected problems before the oil hit the water?
•  1M samples/sec
•  High performance at
scale is necessary!
Legacy Messaging
Millions of Sources -> Legacy Message Queue -> Hundreds of Destinations
Message rate: <100K/s
Publish: insert, acks
Consume: acks, delete
Mechanisms for Decoupling
Traditional message queues?
•  Huge performance hit for persistence:
•  Message acknowledgement per message per consumer
•  Lots of non-sequential disk I/O when messages are added or removed
Scalable Messaging with MapR Streams
Server 1
Partition1: Topic - Pressure
Partition1: Topic - Temperature
Partition1: Topic - Warning
Server 2
Partition2: Topic - Pressure
Partition2: Topic - Temperature
Partition2: Topic - Warning
Server 3
Partition3: Topic - Pressure
Partition3: Topic - Temperature
Partition3: Topic - Warning
Topics are partitioned for throughput and scalability
Scalable Messaging with MapR Streams
Producers are load balanced between partitions
Kafka API
Scalable Messaging with MapR Streams
Consumers
Consumers
Consumers
Consumer groups can read in parallel
Kafka API
Core Components: Partitions
Partitions:
–  Messages are appended in order
Offset:
–  Sequential id of a message in a partition
Producers -> MapR Cluster -> Consumers
Topic: Admission / Server 1, Partition 1: 6 5 4 3 2 1
Topic: Admission / Server 2, Partition 2: 3 2 1
Topic: Admission / Server 3, Partition 3: 5 4 3 2 1
New message: 6, old message: 1
Read Cursors
•  Read cursor: offset ID of most recent read message
•  Producers Append New messages to tail
•  Consumers Read from head
Producers -> MapR Cluster: 6 5 4 3 2 1
Read cursors: one per consumer group
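The partition, offset, and read cursor ideas can be sketched as a small in-memory model. This is an illustration only, not the MapR Streams or Kafka API: the class PartitionLog and its methods are hypothetical, offsets are zero-based here, and each cursor stores the offset of the next unread message (one past the most recently read one).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of one partition; not the MapR Streams API.
final class PartitionLog {
    private final List<String> messages = new ArrayList<>();
    // One read cursor per consumer group: the offset of the next message to read.
    private final Map<String, Integer> cursors = new HashMap<>();

    // Producers append to the tail; the returned offset is the message's sequential id.
    int append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Returns every message the group has not read yet and advances its cursor.
    // Messages stay in the log, so other groups can still read them.
    List<String> poll(String group) {
        int from = cursors.getOrDefault(group, 0);
        List<String> unread = new ArrayList<>(messages.subList(from, messages.size()));
        cursors.put(group, messages.size());
        return unread;
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append("m1");
        log.append("m2");
        System.out.println(log.poll("groupA")); // [m1, m2]
        log.append("m3");
        System.out.println(log.poll("groupA")); // [m3]
        System.out.println(log.poll("groupB")); // [m1, m2, m3]
    }
}
```

Because reading only moves a cursor and never removes a message, a second consumer group sees the full history independently of the first.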
Events are delivered in the order they are received, like a queue.
Partitioned, Sequential Access = High Performance
Unlike a queue, events are persisted even after they’re delivered
Messages remain on the partition, available to other consumers
Minimizes non-sequential disk reads and writes
MapR Cluster (1 Server), Topic: Warning, Partition 1: 3 2 1 (unread events)
Consumer -> Poll -> Client Library -> Get Unread -> 3 2 1
Considering a Messaging Platform
Kafka-esque Logs?
•  Sequential writing/reading disk:
•  Messages are persisted sequentially as produced, and read sequentially when consumed
•  Performance plus persistence
•  performance of up to a billion messages per second at millisecond-level delivery times.
Kafka model is BLAZING fast
•  Kafka 0.9 API with message sizes at 200 bytes
•  MapR Streams on a 5 node cluster sustained 18 million events / sec
•  Throughput of 3.5GB/s and over 1.5 trillion events / day
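The benchmark figures above can be cross-checked with simple arithmetic. A minimal sketch follows; the class and method names are illustrative. At 18 million events per second, a day holds about 1.55 trillion events, and 200-byte messages give about 3.6 GB/s of payload, which is in line with the quoted 3.5GB/s. How the source measured or rounded the throughput figure is not stated.

```java
// Back-of-the-envelope check of the benchmark figures quoted on the slide.
final class ThroughputCheck {
    // Events per day at a sustained per-second rate.
    static long eventsPerDay(long eventsPerSecond) {
        return eventsPerSecond * 60L * 60L * 24L;
    }

    // Payload bytes per second for a given rate and message size.
    static long bytesPerSecond(long eventsPerSecond, long bytesPerEvent) {
        return eventsPerSecond * bytesPerEvent;
    }

    public static void main(String[] args) {
        long rate = 18_000_000L; // events per second, from the slide
        System.out.println(eventsPerDay(rate));        // 1555200000000
        System.out.println(bytesPerSecond(rate, 200)); // 3600000000
    }
}
```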
When Are Messages Deleted?
•  Messages can be persisted forever
Or
•  Older messages can be deleted automatically based on time to live
MapR Cluster (1 Server), Partition 1: 6 5 4 3 2 1 (1 = older message)
Parallelism When Reading
To read messages from the same Topic in parallel:
•  create consumer groups
•  consumers with same group.id
•  partitions assigned dynamically round-robin
Consumer group: Oil Wells
Consumer A
Consumer B
Consumer C
MapR Cluster
Partition 4: Warning
Partition 3: Warning
Partition 2: Warning
Partition 1: Warning
Partition 5: Warning
Fault Tolerance Consumption: Partitions Re-Assigned Dynamically
If a consumer goes offline, its partitions are re-assigned
Consumer group.id: Oil Wells
Consumer A
Consumer C
MapR Cluster
Partition4: Warning
Partition3: Warning
Partition2: Warning
Partition1: Warning
Partition5: Warning
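Round-robin assignment and re-assignment can be sketched as follows. This is a simplified illustration, not the actual assignment logic of MapR Streams or Kafka: the class RoundRobinAssignment and the method assign are hypothetical, and a real client library performs this step for you when consumers share a group.id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of round-robin partition assignment within one consumer group.
final class RoundRobinAssignment {
    // Deals partitions 1..partitionCount to the consumers in turn.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p + 1);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three consumers share the five partitions of the Warning topic.
        System.out.println(assign(Arrays.asList("A", "B", "C"), 5)); // {A=[1, 4], B=[2, 5], C=[3]}
        // Consumer B goes offline: all five partitions are dealt out to A and C.
        System.out.println(assign(Arrays.asList("A", "C"), 5));      // {A=[1, 3, 5], C=[2, 4]}
    }
}
```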
Processing Same Message for Different Views
Consumers
Consumers
Consumers
Producers
Producers
Producers
MapR-FS
Kafka API Kafka API
Pub Sub: Multiple Consumers, Multiple Destinations
Partition Fault Tolerance
Message Recovery
What if you need to recover messages in case of server failure?
Partitions are Replicated for Fault Tolerance
Producer
Producer
Server 2 Partition2: Topic - Warning
Producer
Server 1 Partition1: Topic - Warning
Server 3 Partition3: Topic - Warning
Server 2
Server 3
Server 1
Server 3
Server 1
Server 2
Partitions are Replicated for Fault Tolerance
Producers -> Server 1, Server 2, Server 3
Partition1: Warning, with a replica on each of the other two servers
Partition2: Warning, with a replica on each of the other two servers
Partition3: Warning, with a replica on each of the other two servers
Consumers: Security Investigation & Event Management, Operational Intelligence, Real-time Analytics
Streams and High Availability
Core Components: Streams
•  Stream:
–  collection of topics managed together
•  Manage stream:
–  replication
–  security
–  time-to-live
–  number of partitions
Replication: Stream (Pressure, Temperature, Warning) to Stream (Pressure, Temperature, Warning)
Producers and Consumers on each stream
Real-time Access
What if you need real-time access to live data distributed across multiple clusters
and multiple data centers?
Lack of Global Replication
Topic: C
Streams and Replication
Streams:
•  are a collection of topics
•  can be replicated worldwide
Topic: A
Topic: B
Topic: C
Topic: A
Topic: B
Topic: C
Replicating to another cluster
Streams and Replication
Topic: A
Topic: B
Topic: C
Fail Over
Streams:
•  high availability
•  disaster recovery
Replicating Streams: Master-Slave Replication
Venezuela_HA
Cluster
Metrics Stream
Metrics
Producers
Venezuela
Cluster
Metrics Stream
Metrics
Consumers
High Availability Backup for Venezuela
Master, Slave
Replicating Streams: Many-to-One Replication
Houston: Metrics Stream, Metrics topic, Consumers
Venezuela: Metrics Stream, Metrics topic, Producers, Consumers
Mexico: Metrics Stream, Metrics topic, Producers, Consumers
Many to One: analyze all data from Houston
Replicating Streams: Multi-Master Replication
Seoul: Metrics Stream, Metrics topic, Producers, Consumers
San Francisco: Metrics Stream, Metrics topic, Producers, Consumers
Both send and receive updates
Stream Replication
WAN
Stream
Pressure
Temperature
Warning
Stream
Pressure
Temperature
Warning
Stream
Pressure
Temperature
Warning
Ship picks up containers…
Singapore
Arrives at destination…
Tokyo
While en route to next destination…
Washington
Where does the data live…
Singapore Washington
Tokyo
What is important about this?
Data is generated on the ship
•  Must have an easy way (i.e. foolproof) to move the data off the ship
Each port stores the data from the ship
•  Moving data between locations
•  Analytics could happen at any location
This is a multi-data center time series data use case
•  Events from sensors = metrics
•  Same concepts as data center monitoring
Patterns
Event Sourcing
Imagine each event as a change to an entry in a database.
Change log (credit, debit events):
1: WillO : Deposit : 100.00
2: BradA : Deposit : 50.00
3: BradA : Withdraw : 30.00
4: WillO : Withdraw : 20.00
Updates -> current account balances:
Account Id    Balance
WillO         80.00
BradA         20.00
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Replication
Duality of Streams and Tables:
Database: captures data at rest
Stream: captures data change
Master: Append writes -> Change Log (3 2 1) -> Slave: Apply writes in order
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Which Makes a Better System of Record?
Which of these can be used to reconstruct the other?
1: WillO : Deposit : 100.00
2: BradA : Deposit : 50.00
3: BradA : Withdraw : 30.00
4: WillO : Withdraw: 20.00
Account Id Balance
WillO 80.00
BradA 20.00
Change Log
3 2 1
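The question above has a concrete answer: the change log can rebuild the balances table, but the table cannot rebuild the log. A minimal sketch of the replay follows. The class AccountReplay and its Event type are hypothetical, and double is used for amounts only to keep the example short; real monetary code would normally use BigDecimal or integer cents.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rebuild the balances table by replaying the change log in order.
final class AccountReplay {
    // One immutable credit or debit event.
    static final class Event {
        final String account;
        final String type; // "Deposit" or "Withdraw"
        final double amount;

        Event(String account, String type, double amount) {
            this.account = account;
            this.type = type;
            this.amount = amount;
        }
    }

    // Applies every event in log order and returns the resulting balances.
    static Map<String, Double> replay(List<Event> log) {
        Map<String, Double> balances = new LinkedHashMap<>();
        for (Event e : log) {
            double delta = e.type.equals("Deposit") ? e.amount : -e.amount;
            balances.merge(e.account, delta, Double::sum);
        }
        return balances;
    }

    public static void main(String[] args) {
        List<Event> log = Arrays.asList(
                new Event("WillO", "Deposit", 100.00),
                new Event("BradA", "Deposit", 50.00),
                new Event("BradA", "Withdraw", 30.00),
                new Event("WillO", "Withdraw", 20.00));
        System.out.println(replay(log)); // {WillO=80.0, BradA=20.0}
    }
}
```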
Rewind: Reprocessing Events
Producers -> MapR Cluster: 6 5 4 3 2 1
Consumer: reprocess from oldest message
Create new view, index, cache
Rewind: Reprocessing Events
Producers -> MapR Cluster: 6 5 4 3 2 1
Consumer: reprocess up to newest message to build the new view
Read from new view
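Rewinding can be sketched as a consumer that starts from a chosen offset and builds a view the original application never planned for. The example below is an assumption-laden illustration: the class RewindSketch, the "account:type:amount" message format, and the choice of an events-per-account view are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a new consumer rewinds to the oldest message to build a new view.
final class RewindSketch {
    // Builds an "events per account" view by reading the log from fromOffset to the newest message.
    // Each entry is assumed to look like "account:type:amount".
    static Map<String, Integer> buildCountView(List<String> log, int fromOffset) {
        Map<String, Integer> view = new TreeMap<>();
        for (int offset = fromOffset; offset < log.size(); offset++) {
            String account = log.get(offset).split(":")[0];
            view.merge(account, 1, Integer::sum);
        }
        return view;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "WillO:Deposit:100.00", "BradA:Deposit:50.00",
                "BradA:Withdraw:30.00", "WillO:Withdraw:20.00");
        // Offset 0 is the oldest retained message, so the view covers the full history.
        System.out.println(buildCountView(log, 0)); // {BradA=2, WillO=2}
    }
}
```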
Event Sourcing, Command Query Responsibility Segregation:
Turning the Database Upside Down
Events -> Updates -> Key-Val, Document, Graph, Wide Column, Time Series, Relational, ???
What Else Do I Use My Stream For?
Lineage - “how did BradA’s balance get so low?”
Auditing - “who deposited/withdrew from BradA’s account?”
History - “what was the status of the accounts last year?”
Integrity - “can I trust this data hasn’t been tampered with?”
•  Yup - Streams are immutable
0: WillO : Deposit : 100.00
1: BradA : Deposit : 50.00
2: BradA : Withdraw : 30.00
3: WillO : Withdraw: 20.00
What Do I Need For This to Work?
Infinitely persisted events
A way to query your persisted stream data
An integrated security model across the stream and databases
Fraud Detection
Point of Sale -> Data Center: is the transaction fraud?
•  Lots of requests
•  Need answer within ~50 to 100 milliseconds
Point of Sale -> Data Center: location, time, card#
Data Center -> Point of Sale: fraud yes/no?
Traditional Solution
POS 1..n -> Fraud detector -> Last card use
1.  Look up last card use
2.  Compute the card velocity:
•  Subtract last location, time from
current location, time
3.  Update last card use
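Step 2 can be sketched as a distance-over-time calculation. The slides do not say how locations are represented, so this example assumes latitude and longitude and uses the haversine formula; the class CardVelocity, its method names, and the 1000 km/h threshold are all hypothetical.

```java
// Hypothetical sketch of the card-velocity check; representation and threshold are assumptions.
final class CardVelocity {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance in kilometres between two latitude/longitude points (haversine formula).
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Speed in km/h implied by two uses of the same card (times in milliseconds).
    static double velocityKmh(double lat1, double lon1, long timeMillis1,
                              double lat2, double lon2, long timeMillis2) {
        double hours = (timeMillis2 - timeMillis1) / 3_600_000.0;
        return distanceKm(lat1, lon1, lat2, lon2) / hours;
    }

    // Assumed rule: no card holder travels faster than a commercial flight.
    static boolean looksFraudulent(double velocityKmh) {
        return velocityKmh > 1000.0;
    }

    public static void main(String[] args) {
        // Same card used in San Francisco, then in Tokyo 30 minutes later (approximate coordinates).
        double v = velocityKmh(37.77, -122.42, 0L, 35.68, 139.69, 30 * 60 * 1000L);
        System.out.println(looksFraudulent(v)); // true
    }
}
```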
What Happens Next?
POS 1..n -> Fraud detector (three POS groups, three fraud detectors)
All fraud detectors share one Last card use store
1.  Look up last card use
2.  Compute the card velocity
3.  Update last card use
Bottleneck!
Service Isolation: Separate Read from Write
POS 1..n -> Fraud detector -> card activity -> Updater -> Last card use
Fraud detector: Read last card use
Separate Read Model from the Write Model:
Command Query Responsibility Segregation
POS 1..n -> Fraud detector -> card activity -> Updater -> Last card use
Fraud detector: read last card use
Fraud detector: publish last card use event to card activity
Updater: write last card use
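The read/write split can be sketched in a single-threaded example. Everything here is hypothetical: the class CqrsSketch, the CardUse type, and the in-memory queue that stands in for the card activity stream. The point it illustrates is that the fraud detector only reads the view and publishes events, while the updater is the only component that writes the view.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical single-threaded sketch; an in-memory queue stands in for the card activity stream.
final class CqrsSketch {
    static final class CardUse {
        final String card;
        final String location;
        final long time;

        CardUse(String card, String location, long time) {
            this.card = card;
            this.location = location;
            this.time = time;
        }
    }

    final Queue<CardUse> cardActivity = new ArrayDeque<>();   // stream of events (write side)
    final Map<String, CardUse> lastCardUse = new HashMap<>(); // materialized view (read side)

    // Fraud detector: reads the view, then publishes an event. It never writes the view.
    CardUse check(CardUse current) {
        CardUse previous = lastCardUse.get(current.card);
        cardActivity.add(current);
        return previous;
    }

    // Updater: the only writer of the view; applies events in the order they were published.
    void runUpdater() {
        CardUse event;
        while ((event = cardActivity.poll()) != null) {
            lastCardUse.put(event.card, event);
        }
    }

    public static void main(String[] args) {
        CqrsSketch s = new CqrsSketch();
        s.check(new CardUse("1234", "Houston", 1L)); // no previous use yet
        s.runUpdater();
        CardUse previous = s.check(new CardUse("1234", "Tokyo", 2L));
        System.out.println(previous.location); // Houston
    }
}
```

One consequence of this design is that the view lags the stream until the updater runs, so the detector may briefly see a slightly stale last card use.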
Event Sourcing: New Uses of Data
Processing Same Message for Multiple Views
POS 1..n -> Fraud detector -> card activity
card activity -> Updater -> Last card use
card activity -> Card location history
card activity -> Other
Scaling Through Isolation allows Multiple Consumers
Multiple fraud detectors can use the same message queue
POS 1..n -> Fraud detector -> card activity -> Updater -> Last card use
(two POS groups, two fraud detectors, and two updaters share one card activity stream)
•  De-coupling and isolation are key
•  Propagate events, not table updates
Decoupled Architecture
Producers -> Activity Handler -> Historical, Interesting Data, Real-time Analysis, Results Dashboard, Anomaly Detection
More than one component can make use of the same stream of messages for a variety of uses.
Lessons
De-coupling and isolation are key
Propagate events, not table updates
Building Enterprise Software vs Internet Companies
Enterprise Software:
Complexity of domain =>
Business logic, Business rules
Banking, Healthcare, Telecom
Compliance=>
Security
Internet Companies:
Volume of data =>
Complex data infrastructure
Large Scale Availability, Recovery
Reference Martin Kleppmann
Building Enterprise Software vs Internet Companies
Enterprise Software:
Event Sourcing
Internet Companies:
Stream Processing
Reference Martin Kleppmann
Real World Solution
Credit Card Fraud Model Building
Fraud Stream Processing Architecture
Source -> Data Ingest -> Stream Processing -> NoSQL Storage -> Serve
MapR-FS
MapR-DB
Topic: A
Topic: B
Topic: C
Fraud Processing
Streams Messaging: topics raw, enriched, alerts
Stream Processing: process, derive features, model
Batch Processing: build model, update model
Storage: MapR-FS, MapR-DB (raw, enriched, alerts, model)
Fraud Event Processing
Streams Messaging (Raw, Enriched, Fraud) -> Stream Processing -> NoSQL Storage (MapR-FS, MapR-DB)
1.  Parse raw event
2.  Read card holder profile from MapR-DB
3.  Derive features
4.  Get prediction from model with features
5.  Publish not fraud to enriched topic
6.  Publish fraud to fraud topic
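The six steps can be sketched end to end with in-memory stand-ins. Nothing here is the deck's actual implementation: lists stand in for the enriched and fraud topics, a map stands in for the card holder profiles in MapR-DB, the raw event format "cardId,amount" is assumed, and predict is a placeholder rule rather than a trained model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the six steps; lists stand in for topics and a map for MapR-DB.
final class FraudPipeline {
    final Map<String, Double> averageSpendByCard = new HashMap<>(); // stand-in for card holder profiles
    final List<String> enrichedTopic = new ArrayList<>();
    final List<String> fraudTopic = new ArrayList<>();

    // Assumed raw event format: "cardId,amount".
    void process(String rawEvent) {
        String[] fields = rawEvent.split(",");                           // 1. parse raw event
        String card = fields[0];
        double amount = Double.parseDouble(fields[1]);
        double average = averageSpendByCard.getOrDefault(card, amount);  // 2. read card holder profile
        double spendRatio = average == 0 ? 0 : amount / average;         // 3. derive a feature
        boolean fraud = predict(spendRatio);                             // 4. get prediction from model
        String enriched = rawEvent + "," + spendRatio;
        if (fraud) {
            fraudTopic.add(enriched);                                    // 6. publish fraud
        } else {
            enrichedTopic.add(enriched);                                 // 5. publish not fraud
        }
    }

    // Placeholder for a trained model: flags spend above 10x the card's average.
    static boolean predict(double spendRatio) {
        return spendRatio > 10.0;
    }

    public static void main(String[] args) {
        FraudPipeline pipeline = new FraudPipeline();
        pipeline.averageSpendByCard.put("1234", 10.0);
        pipeline.process("1234,50.0");  // 5x the average: enriched topic
        pipeline.process("1234,500.0"); // 50x the average: fraud topic
        System.out.println(pipeline.enrichedTopic.size() + " enriched, "
                + pipeline.fraudTopic.size() + " fraud"); // 1 enriched, 1 fraud
    }
}
```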
Fraud Processing Same Message for Different Views
Partition1: Topic - Raw Trans
Partition1: Topic - Enriched
Partition1: Topic - Fraud Alert
Partition2: Topic - Raw Trans
Partition2: Topic - Enriched
Partition2: Topic - Fraud Alert
Partition3: Topic - Raw Trans
Partition3: Topic - Enriched
Partition3: Topic - Fraud Alert
Consumers -> MapR-FS, MapR-DB
Real World Solution
Transforming the Health Care Ecosystem
Electronic Medical Records -> JSON DB (MapR-DB), Graph DB (Titan on MapR-DB), Search Engine (Elasticsearch)
“The Stream is the System of Record”
–Brad Anderson, VP Big Data Informatics
Liaison ALLOY™ Platform
Data Integration
ingest, syndicate, transform
Data Management
master
deduplicate
harmonize
relate
merge
tokenize
store / persist
analyze
summarize
report
distill
recommend
explore
query
sandbox
batch transform
learn
traverse
Use Case: Streaming System of Record for Healthcare
Objective:
•  Build a flexible, secure healthcare exchange
Records Analysis
Applications
Challenges:
•  Many different data models
•  Security and privacy issues
•  HIPAA compliance
Records
ALLOY Health:
Exchange State HIE
Clinical Data Viewer
Analytics queries like:
What are the outcomes in the entire state on diabetes?
Are there doctors that are doing this better than others?
Clinical Data
Financial Data
Provider
Organizations
2000+ Practices 200 + Labs 30,000 + Clinicians
OrdersAnywhere
PORTAL (no EHR)
EHR with
HL7 ONLY
EHR with WORKFLOW
INTEGRATION
RADIOLOGY
LAB
This is a PAIN!
COMPLIANCE
SECURITY CONTROLS
COMPLIANCE FEATURES
PRIVACY
PCI DSS 3.0
21 CFR Part 11
SSAE16 / SOC2
HIPAA/HITECH
WHY NOW?
http://bit.ly/29aBatK
WHY NOW?
2014 FQ4 profit
$ -440 M
Total Cost Estimate
$ -12 B
Why Now? The Relational database is not the only tool
Patient 1234
  patient_id: 1234
  Name: Jon Smith
  Age: 50
Patient 999
  patient_id: 999
  Name: Jonathan Smith
  DOB: Jun 1965
Provider 86
  provider_id: 86
  Name: Dr. Nora Paige
  Specialty: Diabetes
Prescription 9876
  rx_id: 9876
  Name: Sitagliptin
  Dosage: 325mg
Relationships: Visited, Prescribed, WasPrescribed
Context and Relationships
WHY NOW? Mind the Gap
Streaming System of Record for Healthcare
Streaming System of Record: Stream, Topic (6 5 4 3 2 1)
Micro Services -> Materialized Views: Search, Graph DB, JSON, HBase
API: Records, Applications
The Promised Land
App -> API -> pre-processor -> Immutable Log (Raw Data): Document Log (MapR-FS)
workflow -> Key/Value (MapR-DB) materialized view
workflow -> Search Engine materialized view
workflow -> Graph (ArangoDB) materialized view
workflow -> Time Series (OpenTSDB) materialized view
CEP
micro services -> Apps
Compliance Auditor
The Promised Land
Auditor smiley faces
•  Data Lineage
•  Audit Logging
•  Wire-level encryption
•  At Rest encryption
Replication
•  Disaster Recovery
•  EU – data can’t leave
Non-Stream / Non-“Big Data”
•  Software Development Lifecycle
•  System Hardening
•  Separation of Concerns
-  Dev vs Ops
•  Patch Management
Compliance Auditor
Solution
Design/architecture solved some
•  Streams
•  Data Lineage/System of Record
•  Kappa Architecture (Kreps/Kleppmann)
MapR solved others
•  Unified Security
•  Replication DC to DC
•  Converge Kafka/HBase/Hadoop to one cluster
•  Multi-tenancy (lots of topics, for lots of tenants)
API
Sample Producer: All Together
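import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SampleProducer {
    // MapR Streams topic name: <stream path>:<topic>
    public static String topic = "/streams/pump:warning";
    public static KafkaProducer<String, String> producer;

    public static void main(String[] args) {
        producer = setUpProducer();
        for (int i = 0; i < 3; i++) {
            String txt = "msg " + i;
            ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, txt);
            producer.send(rec);
            System.out.println("Sent msg number " + i);
        }
        producer.close();
    }

    // Not shown on the slide; assumes String keys and values (Apache Kafka brokers also need bootstrap.servers).
    private static KafkaProducer<String, String> setUpProducer() {
        Properties props = new Properties();
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<String, String>(props);
    }
}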
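Sample Consumer: All Together
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MyConsumer {
    // Same topic as SampleProducer so the consumer sees its messages
    public static String topic = "/streams/pump:warning";
    public static KafkaConsumer<String, String> consumer;
    public static long pollTimeOut = 1000; // milliseconds; assumed value

    public static void main(String[] args) {
        configureConsumer(args);
        consumer.subscribe(Arrays.asList(topic));
        try {
            while (true) {
                ConsumerRecords<String, String> msg = consumer.poll(pollTimeOut);
                for (ConsumerRecord<String, String> record : msg) {
                    System.out.println("read " + record.toString());
                }
            }
        } finally {
            consumer.close(); // reached only if poll() throws
        }
    }

    // Not shown on the slide; the group.id and String deserializers are assumptions.
    private static void configureConsumer(String[] args) {
        Properties props = new Properties();
        props.put("group.id", "pump-monitors");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<String, String>(props);
    }
}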
Summary
Can we get “Extreme”?
1+ Trillion Events
•  per day
Millions of Producers
•  Billions of events per second
Multiple Consumers
•  Potentially for every event
Multiple Data Centers
•  Plan for success
•  Plan for drastic failure
Think that is crazy? Consider having 100 servers and performing:
Monitoring and Application logs…
•  100 metrics per server
•  60 samples per minute
•  50 metrics per request
•  1,000 log entries per request (abnormally small, depends on level)
•  1 million requests per day
~ 2 billion events per day, for one small(ish) use case
Extreme Average Reality
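The "~ 2 billion events per day" figure follows directly from the numbers on the slide. A minimal sketch of the arithmetic follows; the class and method names are illustrative. Monitoring contributes 100 servers x 100 metrics x 60 samples per minute x 1,440 minutes, and request handling contributes 1 million requests x (50 metrics + 1,000 log entries).

```java
// Worked arithmetic for the "Average Reality" column; inputs come from the slide.
final class EventsPerDay {
    // Metric samples per day from server monitoring.
    static long monitoringEvents(long servers, long metricsPerServer, long samplesPerMinute) {
        return servers * metricsPerServer * samplesPerMinute * 60L * 24L;
    }

    // Metrics plus log entries per day from request handling.
    static long requestEvents(long requestsPerDay, long metricsPerRequest, long logEntriesPerRequest) {
        return requestsPerDay * (metricsPerRequest + logEntriesPerRequest);
    }

    public static void main(String[] args) {
        long monitoring = monitoringEvents(100, 100, 60);      // 864,000,000
        long requests = requestEvents(1_000_000L, 50, 1_000);  // 1,050,000,000
        System.out.println(monitoring + requests);             // 1914000000, about 2 billion
    }
}
```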
Stream Processing
Building a Complete Data Architecture
MapR File System
(MapR-FS)
MapR Converged Data Platform
MapR Database
(MapR-DB)
MapR Streams
Sources/Apps Bulk Processing
Find my slides & other related materials to this talk here:
bit.ly/jjug-aug2016
or search:
MapR Blog
• https://www.mapr.com/blog/
…helping you put data technology to work
●  Find answers
●  Ask technical questions
●  Join on-demand training course discussions
●  Follow release announcements
●  Share and vote on product ideas
●  Find Meetup and event listings
Connect with fellow Apache Hadoop and Spark professionals
community.mapr.com

More Related Content

PDF
Advanced Threat Detection on Streaming Data
PDF
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
PPTX
NoSQL Application Development with JSON and MapR-DB
PDF
How Big Data is Reducing Costs and Improving Outcomes in Health Care
PDF
Build a Time Series Application with Apache Spark and Apache HBase
PDF
Apache Spark Overview
PDF
Streaming patterns revolutionary architectures
PDF
Applying Machine Learning to Live Patient Data
Advanced Threat Detection on Streaming Data
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
NoSQL Application Development with JSON and MapR-DB
How Big Data is Reducing Costs and Improving Outcomes in Health Care
Build a Time Series Application with Apache Spark and Apache HBase
Apache Spark Overview
Streaming patterns revolutionary architectures
Applying Machine Learning to Live Patient Data

What's hot (19)

PDF
Fast Cars, Big Data How Streaming can help Formula 1
PPTX
When Streaming Becomes Strategic
PDF
Introduction to Spark on Hadoop
PDF
Free Code Friday - Machine Learning with Apache Spark
PPTX
How Spark is Enabling the New Wave of Converged Cloud Applications
PDF
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
PPTX
Deep Learning vs. Cheap Learning
PDF
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
PPTX
MapR 5.2 Product Update
PDF
MapR 5.2: Getting More Value from the MapR Converged Data Platform
PPTX
Evolving from RDBMS to NoSQL + SQL
PPTX
MapR and Cisco Make IT Better
PDF
IoT Use Cases with MapR
PDF
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
PDF
Predicting Flight Delays with Spark Machine Learning
PPTX
Dealing with an Upside Down Internet
PPTX
Big Data Everywhere Chicago: SQL on Hadoop
PDF
MapR & Skytree:
PPTX
Spark & Hadoop at Production at Scale
Fast Cars, Big Data How Streaming can help Formula 1
When Streaming Becomes Strategic
Introduction to Spark on Hadoop
Free Code Friday - Machine Learning with Apache Spark
How Spark is Enabling the New Wave of Converged Cloud Applications
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Deep Learning vs. Cheap Learning
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
MapR 5.2 Product Update
MapR 5.2: Getting More Value from the MapR Converged Data Platform
Evolving from RDBMS to NoSQL + SQL
MapR and Cisco Make IT Better
IoT Use Cases with MapR
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
Predicting Flight Delays with Spark Machine Learning
Dealing with an Upside Down Internet
Big Data Everywhere Chicago: SQL on Hadoop
MapR & Skytree:
Spark & Hadoop at Production at Scale
Ad

Viewers also liked (20)

PPTX
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
PDF
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
PDF
Real-Time Data Feeds Using the Streaming API
PPTX
Leveraging Mesos to manage container workloads at Samsung SAMI
PPTX
SAMI - Samsung Developer Conference - Nov 2014
PDF
What is your PaaS
PDF
Micro Gateways are a Big Deal
ODP
Interoperable Web Services with JAX-WS and WSIT
PDF
Making Scrum Work Inside Small Businesses
PPTX
PDF
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
PDF
Big Data Streams Architectures. Why? What? How?
PPTX
Apache Spark Machine Learning Decision Trees
PDF
Innovation in the Data Warehouse - StampedeCon 2016
PPTX
Building a Node.js API backend with LoopBack in 5 Minutes
PDF
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
PDF
Rapid API Development with LoopBack/StrongLoop
PDF
NoSQL HBase schema design and SQL with Apache Drill
PDF
ASPgems - kappa architecture
PDF
Real time data ingestion and Hybrid Cloud
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Real-Time Data Feeds Using the Streaming API
Leveraging Mesos to manage container workloads at Samsung SAMI
SAMI - Samsung Developer Conference - Nov 2014
What is your PaaS
Micro Gateways are a Big Deal
Interoperable Web Services with JAX-WS and WSIT
Making Scrum Work Inside Small Businesses
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Big Data Streams Architectures. Why? What? How?
Apache Spark Machine Learning Decision Trees
Innovation in the Data Warehouse - StampedeCon 2016
Building a Node.js API backend with LoopBack in 5 Minutes
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Rapid API Development with LoopBack/StrongLoop
NoSQL HBase schema design and SQL with Apache Drill
ASPgems - kappa architecture
Real time data ingestion and Hybrid Cloud
Ad

Similar to Streaming Patterns Revolutionary Architectures with the Kafka API (20)

PPTX
Design Patterns for working with Fast Data in Kafka
PPTX
Design Patterns for working with Fast Data
PPTX
How Spark is Enabling the New Wave of Converged Applications
PDF
Handling the Extremes: Scaling and Streaming in Finance
PPTX
Designing and Implementing your IOT Solutions with Open Source
PPTX
Evolving Beyond the Data Lake: A Story of Wind and Rain
PPTX
Map r seattle streams meetup oct 2016
PPTX
Kafka talk
PPSX
Event Sourcing & CQRS, Kafka, Rabbit MQ
PDF
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
PDF
HUG Italy meet-up with Fabian Wilckens, MapR EMEA Solutions Architect
PPTX
Event Detection Pipelines with Apache Kafka
PDF
Open Source Bristol 30 March 2022
PPTX
Stream data from Apache Kafka for processing with Apache Apex
PDF
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
PDF
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
PPTX
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
PPTX
Spark Streaming the Industrial IoT
PPTX
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
PPSX
MyHeritage Kakfa use cases - Feb 2014 Meetup
Design Patterns for working with Fast Data in Kafka
Design Patterns for working with Fast Data
How Spark is Enabling the New Wave of Converged Applications
Handling the Extremes: Scaling and Streaming in Finance
Designing and Implementing your IOT Solutions with Open Source
Evolving Beyond the Data Lake: A Story of Wind and Rain
Map r seattle streams meetup oct 2016
Kafka talk
Event Sourcing & CQRS, Kafka, Rabbit MQ
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
HUG Italy meet-up with Fabian Wilckens, MapR EMEA Solutions Architect
Event Detection Pipelines with Apache Kafka
Open Source Bristol 30 March 2022
Stream data from Apache Kafka for processing with Apache Apex
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
Spark Streaming the Industrial IoT
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
MyHeritage Kakfa use cases - Feb 2014 Meetup

Streaming Patterns Revolutionary Architectures with the Kafka API

  • 1. © 2016 MapR Technologies L1-1® Streaming Patterns, Revolutionary Architectures. Carol McDonald
  • 2. © 2016 MapR Technologies L1-2® Agenda Streams Core Components •  Topics, Partitions •  Fault Tolerance •  High Availability Patterns •  Event Sourcing •  Duality of Streams and Databases •  Command Query Responsibility Separation •  Polyglot Persistence, Multiple Materialized Views •  Turning the Database Upside Down Real World Examples •  Fraud Detection •  Healthcare Exchange
  • 3. © 2016 MapR Technologies L1-3® Which products are we discussing?
  • 4. © 2016 MapR Technologies L1-4® Streams Core Components
  • 5. © 2016 MapR Technologies L1-5® What’s a Stream? A stream is an unbounded sequence of events carried from a set of producers to a set of consumers. Diagram labels: Producers, Events, Stream, Consumers.
  • 6. © 2016 MapR Technologies L1-6® What is Streaming Data? Got Some Examples? Data Collection Devices Smart Machinery Phones and Tablets Home Automation RFID Systems Digital Signage Security Systems Medical Devices
  • 7. © 2016 MapR Technologies L1-7® Why Streams? Many Big Data sources are event oriented. Trigger events: •  Stock Prices •  User Activity •  Sensor Data. Diagram labels: Event Data, Stream, Topic, Real-Time Analytics.
  • 8. © 2016 MapR Technologies L1-8® Analyze Data What if you need to analyze data as it arrives?
  • 9. © 2016 MapR Technologies L1-9® Batch Processing with HDFS. Stored readings: 6:01 P.M.: 72°, 6:02 P.M.: 75°, 6:03 P.M.: 77°, 6:04 P.M.: 85°, 6:05 P.M.: 90°, 6:06 P.M.: 85°, 6:07 P.M.: 77°, 6:08 P.M.: 75°. Analyze: 90°. “It was hot at 6:05 yesterday!”
  • 10. © 2016 MapR Technologies L1-10® Event Processing with Streams. Stream topic: Temperature. Event: 6:05 P.M.: 90°. “Turn on the air conditioning!”
  • 11. © 2016 MapR Technologies L1-11® Organize Data What if you need to organize data as it arrives?
  • 12. © 2016 MapR Technologies L1-12® Integrating Many Data Sources and Applications Sources (Producers) Applications (Consumers) Unorganized, Complicated, and Tightly Coupled.
  • 13. © 2016 MapR Technologies L1-13® Organize Data into Topics with MapR Streams Topics Organize Events into Categories and Decouple Producers from Consumers Consumers MapR Cluster Topic: Pressure Topic: Temperature Topic: Warnings Consumers Consumers Kafka API Kafka API
  • 14. © 2016 MapR Technologies L1-14® Process High Volume of Data What if you need to process a high volume of data as it arrives?
  • 15. © 2016 MapR Technologies L1-15® What if BP had detected problems before the oil hit the water ? •  1M samples/sec •  High performance at scale is necessary!
  • 16. © 2016 MapR Technologies L1-16® Legacy Messaging. Millions of sources, hundreds of destinations. Legacy message queue: message rate <100K/s. Diagram labels: insert, delete, Publish, Consume, Acks.
  • 17. © 2016 MapR Technologies L1-17® Mechanisms for Decoupling. Traditional message queues? •  Huge performance hit for persistence: •  message acknowledgement per message per consumer •  Lots of non-sequential disk I/O when messages are added/removed
  • 18. © 2016 MapR Technologies L1-18® Scalable Messaging with MapR Streams: topics are partitioned for throughput and scalability. Server 1 holds Partition 1 of the Pressure, Temperature, and Warning topics; Server 2 holds Partition 2 of each; Server 3 holds Partition 3 of each.
  • 19. © 2016 MapR Technologies L1-19® Scalable Messaging with MapR Streams: producers are load balanced between partitions (Kafka API). Same layout: Partitions 1-3 of the Pressure, Temperature, and Warning topics.
  • 20. © 2016 MapR Technologies L1-20® Scalable Messaging with MapR Streams: consumer groups can read in parallel (Kafka API). Same layout: Partitions 1-3 of the Pressure, Temperature, and Warning topics, each read by consumers.
  • 21. © 2016 MapR Technologies L1-21® Core Components: Partitions. Partitions: messages are appended in order. Offset: sequential id of a message in a partition. Diagram: topic Admission in a MapR cluster, with Partition 1 on Server 1 (offsets 1-6), Partition 2 on Server 2 (offsets 1-3), and Partition 3 on Server 3 (offsets 1-5); producers write to each partition and consumers read from each; low offsets are old messages, high offsets are new messages.
  • 22. © 2016 MapR Technologies L1-22® Read Cursors •  Read cursor: offset ID of most recent read message •  Producers append new messages to tail •  Consumers read from head. Diagram: producers append to a partition (offsets 1-6) in a MapR cluster; two consumer groups each keep their own read cursor.
  • 23. © 2016 MapR Technologies L1-23® Events are delivered in the order they are received, like a queue. Partitioned, sequential access = high performance. Diagram: same Admission topic layout as slide 21 (three partitions on Servers 1-3, producers and consumers per partition).
  • 24. © 2016 MapR Technologies L1-24® Unlike a queue, events are persisted even after they’re delivered. Messages remain on the partition, available to other consumers. Minimizes non-sequential disk reads and writes. Diagram: MapR cluster (1 server), topic Warning, Partition 1 (offsets 1-3); the consumer polls through the client library, which gets the unread events.
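The partition behaviour described on slides 21-24 (append in order, sequential offsets, one read cursor per consumer group, messages kept after delivery) can be illustrated with a toy in-memory log. This is a minimal sketch, not the MapR Streams or Kafka implementation: the class `PartitionLog`, its methods, and the 0-based offsets are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory partition: an append-only log with one read cursor per consumer group. */
class PartitionLog {
    private final List<String> messages = new ArrayList<>();
    // Next offset to read for each consumer group (hypothetical bookkeeping).
    private final Map<String, Integer> cursors = new HashMap<>();

    /** Appends to the tail and returns the offset assigned to the message (0-based here). */
    int append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    /** Returns the messages this group has not read yet and advances only its own cursor. */
    List<String> poll(String groupId) {
        int from = cursors.getOrDefault(groupId, 0);
        List<String> unread = new ArrayList<>(messages.subList(from, messages.size()));
        cursors.put(groupId, messages.size());
        return unread;
    }

    /** Messages are never removed by reading, so the size only grows. */
    int size() {
        return messages.size();
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append("warning-1");
        log.append("warning-2");
        System.out.println("analytics reads " + log.poll("analytics"));
        log.append("warning-3");
        System.out.println("analytics reads " + log.poll("analytics"));
        // A second group starts from the oldest message: delivery did not delete anything.
        System.out.println("audit reads " + log.poll("audit"));
    }
}
```

Because reading only moves a cursor, a second consumer group can be added later and still see every message, which is the property the later event-sourcing slides rely on.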
  • 25. © 2016 MapR Technologies L1-25® Considering a Messaging Platform Kafka-esque Logs? •  Sequential writing/reading disk: •  Messages are persisted sequentially as produced, and read sequentially when consumed •  Performance plus persistence •  performance of up to a billion messages per second at millisecond-level delivery times. Kafka model is BLAZING fast •  Kafka 0.9 API with message sizes at 200 bytes •  MapR Streams on a 5 node cluster sustained 18 million events / sec •  Throughput of 3.5GB/s and over 1.5 trillion events / day
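The throughput figures on slide 25 can be cross-checked with simple arithmetic. The sketch below assumes the quoted 18 million events per second is sustained for a full day and that every event is exactly 200 bytes; the class and method names are illustrative only.

```java
/** Back-of-the-envelope check of the slide's throughput numbers. */
class ThroughputCheck {
    /** Bytes moved per second at a given event rate and event size. */
    static long bytesPerSecond(long eventsPerSecond, long bytesPerEvent) {
        return eventsPerSecond * bytesPerEvent;
    }

    /** Events per day if the per-second rate is sustained for 24 hours. */
    static long eventsPerDay(long eventsPerSecond) {
        return eventsPerSecond * 60L * 60L * 24L;
    }

    public static void main(String[] args) {
        long eventsPerSecond = 18_000_000L; // from the slide
        long bytesPerEvent = 200L;          // from the slide
        System.out.println(bytesPerSecond(eventsPerSecond, bytesPerEvent) + " bytes/s");
        System.out.println(eventsPerDay(eventsPerSecond) + " events/day");
    }
}
```

Under these assumptions the rate works out to 3.6 billion bytes per second and about 1.56 trillion events per day, in line with the slide's "3.5GB/s" and "over 1.5 trillion events / day".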
  • 26. © 2016 MapR Technologies L1-26® When Are Messages Deleted? •  Messages can be persisted forever, or •  Older messages can be deleted automatically based on time to live. Diagram: MapR cluster (1 server), Partition 1 (offsets 1-6); offset 1 is the older message.
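The two retention options on slide 26 can be sketched as a log that drops messages from its oldest end once they exceed a time to live. This is only an illustration of the idea: `RetentionLog`, its explicit `expire` call, and the convention that a non-positive TTL means "keep forever" are assumptions, not MapR Streams settings.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy log with time-based retention: the oldest entries are dropped once older than the TTL. */
class RetentionLog {
    private static final class Entry {
        final long timestampMs;
        final String value;

        Entry(long timestampMs, String value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final long ttlMs; // assumption for this sketch: a value <= 0 means "persist forever"

    RetentionLog(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    /** Appends a message stamped with the caller-supplied time. */
    void append(long nowMs, String value) {
        entries.addLast(new Entry(nowMs, value));
    }

    /** Removes entries older than the TTL, oldest first, and returns how many were dropped. */
    int expire(long nowMs) {
        if (ttlMs <= 0) {
            return 0;
        }
        int removed = 0;
        while (!entries.isEmpty() && nowMs - entries.peekFirst().timestampMs > ttlMs) {
            entries.removeFirst();
            removed++;
        }
        return removed;
    }

    int size() {
        return entries.size();
    }
}
```

Since messages are appended in time order, expiry only ever touches the head of the log, so it stays a sequential operation.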
  • 27. © 2016 MapR Technologies L1-27® Parallelism When Reading To read messages from the same Topic in parallel: •  create consumer groups •  consumers with same group.id •  partitions assigned dynamically round-robin Consumer group: Oil Wells Consumer A Consumer B Consumer C MapR Cluster Partition 4: Warning Partition 3: Warning Partition 2: Warning Partition 1: Warning Partition 5: Warning
  • 28. © 2016 MapR Technologies L1-28® Fault Tolerance Consumption: Partitions Re-Assigned Dynamically If consumer goes offline, partitions re-assigned Consumer group.id: Oil Wells Consumer A Consumer C MapR Cluster Partition4: Warning Partition3: Warning Partition2: Warning Partition1: Warning Partition5: Warning
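Slides 27 and 28 describe partitions being assigned round-robin across the consumers in a group and re-assigned when a consumer goes offline. The sketch below shows one simple way such an assignment could be computed; it is an assumption-laden illustration (the class `GroupAssignment` is hypothetical, and real client libraries may distribute partitions differently).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Round-robin assignment of numbered partitions to the live consumers of one group. */
class GroupAssignment {
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return result; // nobody to assign to
        }
        for (int partition = 1; partition <= partitionCount; partition++) {
            String owner = consumers.get((partition - 1) % consumers.size());
            result.get(owner).add(partition);
        }
        return result;
    }

    public static void main(String[] args) {
        // Group "Oil Wells" with three consumers reading five Warning partitions.
        System.out.println(assign(Arrays.asList("A", "B", "C"), 5));
        // Consumer B goes offline: recompute the assignment over the survivors.
        System.out.println(assign(Arrays.asList("A", "C"), 5));
    }
}
```

Recomputing the assignment over the surviving consumers is the simplest form of re-assignment; every partition still has exactly one reader in the group.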
  • 29. © 2016 MapR Technologies L1-29® Processing Same Message for Different Views Consumers Consumers Consumers Producers Producers Producers MapR-FS Kafka API Kafka API Pub Sub: Multiple Consumers, Multiple Destinations
  • 30. © 2016 MapR Technologies L1-30® © 2016 MapR Technologies© 2016 MapR Technologies Partition Fault Tolerance
  • 31. © 2016 MapR Technologies L1-31® Message Recovery What if you need to recover messages in case of server failure?
  • 32. © 2016 MapR Technologies L1-32® Partitions are Replicated for Fault Tolerance Producer Producer Server 2 Partition2: Topic - Warning Producer Server 1 Partition1: Topic - Warning Server 3 Partition3: Topic - Warning Server 2 Server 3 Server 1 Server 3 Server 1 Server 2
  • 33. © 2016 MapR Technologies L1-33® Partition1: Warning Partition2: Warning Replica Partition3: Warning Replica Partition1: Warning Replica Partition3: Warning Replica Partition1: Warning Replica Partition2: Warning Replica Partition3: Warning Producer Producer Producer Server 1 Server 2 Server 3 Security Investigation & Event Management Operational Intelligence Real-time Analytics Partition2: Warning Partitions are Replicated for Fault Tolerance
  • 34. © 2016 MapR Technologies L1-34® Partitions are Replicated for Fault Tolerance Producer Producer Producer Security Investigation & Event Management Operational Intelligence Real-time Analytics Partition1: Warning Partition2: Warning Replica Partition3: Warning Replica Partition1: Warning Replica Partition3: Warning Replica Partition1: Warning Replica Partition2: Warning Replica Partition3: Warning Server 1 Server 2 Server 3 Partition2: Warning
  • 35. © 2016 MapR Technologies L1-35® Partitions are Replicated for Fault tolerance Producer Producer Producer Security Investigation & Event Management Operational Intelligence Real-time Analytics Partition1: Warning Partition2: Warning Replica Partition3: Warning Replica Partition1: Warning Replica Partition3: Warning Replica Partition1: Warning Replica Partition2: Warning Replica Partition3: Warning Server 1 Server 2 Server 3 Partition2: Warning
  • 36. © 2016 MapR Technologies L1-36® © 2016 MapR Technologies© 2016 MapR Technologies Streams and High Availability
  • 37. © 2016 MapR Technologies L1-37® Core Components: Streams •  Stream: –  collection of topics managed together •  Manage stream: –  replication –  security –  time-to-live –  number of partitions. Diagram: a stream with topics Pressure, Temperature, and Warning, with producers and consumers, replicated to a second stream with the same topics and its own consumers.
  • 38. © 2016 MapR Technologies L1-38® Real-time Access What if you need real-time access to live data distributed across multiple clusters and multiple data centers?
  • 39. © 2016 MapR Technologies L1-39® Lack of Global Replication Topic: C
  • 40. © 2016 MapR Technologies L1-40® Streams and Replication Streams: •  are a collection of topics •  can be replicated worldwide Topic: A Topic: B Topic: C Topic: A Topic: B Topic: C Replicating to another cluster
  • 41. © 2016 MapR Technologies L1-41® Streams and Replication Topic: A Topic: B Topic: C Fail Over Streams: •  high availability •  disaster recovery
  • 42. © 2016 MapR Technologies L1-42® Replicating Streams: Master-Slave Replication. Venezuela_HA cluster: high-availability backup for Venezuela. Diagram labels: Venezuela cluster, Venezuela_HA cluster, Metrics stream, Metrics topic, Producers, Consumers, Master, Slave.
  • 43. © 2016 MapR Technologies L1-43® Replicating Streams: Many-to-One Replication. The Venezuela and Mexico Metrics streams (many) replicate to the Houston Metrics stream (one); each site has producers and consumers. Analyze all data from Houston.
  • 44. © 2016 MapR Technologies L1-44® Replicating Streams: Multi-Master Replication. The Seoul and San Francisco Metrics streams both send and receive updates; each site has producers and consumers.
  • 45. © 2016 MapR Technologies L1-45® Stream Replication WAN Stream Pressure Temperature Warning Stream Pressure Temperature Warning Stream Pressure Temperature Warning
  • 46. © 2016 MapR Technologies L1-46® Ship picks up containers… Singapore
  • 47. © 2016 MapR Technologies L1-47® Arrives at destination… Tokyo
  • 48. © 2016 MapR Technologies L1-48® While enroute to next destination… Washington
  • 49. © 2016 MapR Technologies L1-49® Where does the data live… Singapore Washington Tokyo
  • 50. © 2016 MapR Technologies L1-50® What is important about this? Data is generated on the ship •  Must have an easy way (i.e. foolproof) to move the data off the ship Each port stores the data from the ship •  Moving data between locations •  Analytics could happen at any location This is a multi-data center time series data use case •  Events from sensors = metrics •  Same concepts as data center monitoring
  • 51. © 2016 MapR Technologies L1-51® © 2016 MapR Technologies© 2016 MapR Technologies Patterns
  • 52. © 2016 MapR Technologies L1-52® Event Sourcing. Imagine each event as a change to an entry in a database. Change log (credit, debit events): 1: WillO : Deposit : 100.00; 2: BradA : Deposit : 50.00; 3: BradA : Withdraw : 30.00; 4: WillO : Withdraw : 20.00. Updates produce the current account balances (Account Id, Balance): WillO 80.00; BradA 20.00. https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  • 53. © 2016 MapR Technologies L1-53® Duality of Streams and Tables. Database: captures data at rest. Stream: captures data change. Replication via change log (entries 1-3): Master: append writes. Slave: apply writes in order. https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  • 54. © 2016 MapR Technologies L1-54® Which Makes a Better System of Record? Which of these can be used to reconstruct the other? 1: WillO : Deposit : 100.00 2: BradA : Deposit : 50.00 3: BradA : Withdraw : 30.00 4: WillO : Withdraw: 20.00 Account Id Balance WillO 80.00 BradA 20.00 Change Log 3 2 1
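The question on slide 54 can be answered by replaying: the balance table can be rebuilt from the change log, but the log cannot be recovered from the balances. A minimal sketch follows, using the four events from the slide; the class `AccountView`, the `Event` type, and storing amounts as integer cents are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Rebuilds the "current balances" table by replaying the change log in order. */
class AccountView {
    enum Type { DEPOSIT, WITHDRAW }

    static final class Event {
        final String accountId;
        final Type type;
        final long cents; // amounts kept in cents to avoid floating-point rounding

        Event(String accountId, Type type, long cents) {
            this.accountId = accountId;
            this.type = type;
            this.cents = cents;
        }
    }

    /** The four events shown on the slide. */
    static List<Event> slideLog() {
        return Arrays.asList(
            new Event("WillO", Type.DEPOSIT, 10_000),
            new Event("BradA", Type.DEPOSIT, 5_000),
            new Event("BradA", Type.WITHDRAW, 3_000),
            new Event("WillO", Type.WITHDRAW, 2_000));
    }

    /** Applies every event in log order and returns balance (in cents) per account. */
    static Map<String, Long> replay(List<Event> changeLog) {
        Map<String, Long> balances = new LinkedHashMap<>();
        for (Event e : changeLog) {
            long delta = e.type == Type.DEPOSIT ? e.cents : -e.cents;
            balances.merge(e.accountId, delta, Long::sum);
        }
        return balances;
    }

    public static void main(String[] args) {
        // Replaying the slide's log gives WillO 80.00 and BradA 20.00.
        System.out.println(replay(slideLog()));
    }
}
```

The table is a derived view; any number of other views can be derived from the same log in the same way.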
  • 55. © 2016 MapR Technologies L1-55® Rewind: Reprocessing Events. Producers append to a partition (offsets 1-6) in a MapR cluster. A consumer reprocesses from the oldest message to create a new view, index, or cache.
  • 56. © 2016 MapR Technologies L1-56® Rewind: Reprocessing Events. The consumer reprocesses the partition (offsets 1-6) up to the newest message to build the new view; reads are then served from the new view.
  • 57. © 2016 MapR Technologies L1-57® Event Sourcing, Command Query Responsibility Separation: Turning the Database Upside Down. Events (updates) feed multiple stores: Key-Val, Document, Graph, Wide Column, Time Series, Relational, ???
  • 58. © 2016 MapR Technologies L1-58® What Else Do I Use My Stream For? Lineage - “how did BradA’s balance get so low?” Auditing - “who deposited/withdrew from BradA’s account?” History – to see the status of the accounts last year Integrity - “can I trust this data hasn’t been tampered with?” •  Yup - Streams are immutable 0: WillO : Deposit : 100.00 1: BradA : Deposit : 50.00 2: BradA : Withdraw : 30.00 3: WillO : Withdraw: 20.00
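The lineage and history questions on slide 58 amount to replaying only a prefix of the immutable log. The sketch below uses the slide's four events with its 0-based offsets; the class `BalanceHistory`, the signed-cents encoding, and the method name `balanceAsOf` are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Answers "what was the balance after offset N?" by replaying a prefix of the log. */
class BalanceHistory {
    static final class Event {
        final String account;
        final long signedCents; // deposits positive, withdrawals negative

        Event(String account, long signedCents) {
            this.account = account;
            this.signedCents = signedCents;
        }
    }

    /** Offsets 0..3, matching the numbering on the slide. */
    static final List<Event> LOG = Arrays.asList(
        new Event("WillO", 10_000),  // 0: WillO deposit 100.00
        new Event("BradA", 5_000),   // 1: BradA deposit 50.00
        new Event("BradA", -3_000),  // 2: BradA withdraw 30.00
        new Event("WillO", -2_000)); // 3: WillO withdraw 20.00

    /** Balance of one account after applying events up to and including lastOffset. */
    static long balanceAsOf(List<Event> log, String account, int lastOffset) {
        long balance = 0;
        for (int offset = 0; offset <= lastOffset && offset < log.size(); offset++) {
            Event e = log.get(offset);
            if (e.account.equals(account)) {
                balance += e.signedCents;
            }
        }
        return balance;
    }

    public static void main(String[] args) {
        // "How did BradA's balance get so low?": compare the state before and after offset 2.
        System.out.println(balanceAsOf(LOG, "BradA", 1)); // balance in cents after offset 1
        System.out.println(balanceAsOf(LOG, "BradA", 2)); // balance in cents after offset 2
    }
}
```

A table that stores only current balances cannot answer these questions, because the intermediate states were overwritten.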
  • 59. © 2016 MapR Technologies L1-59® What Do I Need For This to Work? Infinitely persisted events A way to query your persisted stream data An integrated security model across the stream and databases
  • 60. © 2016 MapR Technologies L1-60® Fraud Detection. Point of Sale -> Data Center: is the transaction fraud? •  Lots of requests •  Need answer within ~50-100 milliseconds. The point of sale sends location, time, card# to the data center, which answers fraud yes/no.
  • 61. © 2016 MapR Technologies L1-61® Traditional Solution POS 1..n Fraud detector Last card use 1.  Look up last card use 2.  Compute the card velocity: •  Subtract last location, time from current location, time 3.  Update last card use
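Step 2 of the traditional solution on slide 61, the card velocity, is the distance between the last and current use divided by the elapsed time. A minimal sketch follows; the great-circle (haversine) distance, the km/h unit, and the class `CardVelocity` are assumptions, since the slide does not say how distance is measured or what threshold is applied.

```java
/** Card velocity: how fast the card would have had to travel between two uses. */
class CardVelocity {
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance between two latitude/longitude points (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
            + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
            * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    /** Implied speed in km/h between the last use and the current use of a card. */
    static double velocityKmh(double lastLat, double lastLon, long lastMs,
                              double curLat, double curLon, long curMs) {
        double km = distanceKm(lastLat, lastLon, curLat, curLon);
        double hours = (curMs - lastMs) / 3_600_000.0;
        if (hours <= 0) {
            // No elapsed time: any movement at all is physically impossible.
            return km > 0 ? Double.POSITIVE_INFINITY : 0.0;
        }
        return km / hours;
    }

    /** Flags a transaction when the implied speed exceeds a caller-chosen threshold. */
    static boolean suspicious(double velocityKmh, double thresholdKmh) {
        return velocityKmh > thresholdKmh;
    }
}
```

For example, a card used one degree of longitude apart on the equator within one hour implies roughly 111 km/h, which a threshold around airliner speed would not flag.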
  • 62. © 2016 MapR Technologies L1-62® What Happens Next? POS 1..n Fraud detector Last card use POS 1..n Fraud detector POS 1..n Fraud detector 1.  Look up last card use 2.  Compute the card velocity 3.  Update last card use Bottleneck !
  • 63. © 2016 MapR Technologies L1-63® Service Isolation: Separate Read from Write POS 1..n Fraud detector Last card use Updater card activity Read Read last card use
  • 64. © 2016 MapR Technologies L1-64® Separate Read Model from the Write Model: Command Query Responsibility Separation POS 1..n Fraud detector Last card use Updater card activity Read Event last card use Write last card use
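The separation on slides 63 and 64 can be sketched as three roles: the point of sale only appends card-activity events, an updater consumes them and maintains the "last card use" read model, and the fraud detector only reads that model. This is an in-memory illustration; `CardActivityCqrs` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy CQRS split: writes go to an event log, reads come from a separately updated view. */
class CardActivityCqrs {
    static final class CardEvent {
        final String card;
        final String location;
        final long timeMs;

        CardEvent(String card, String location, long timeMs) {
            this.card = card;
            this.location = location;
            this.timeMs = timeMs;
        }
    }

    // Write model: the append-only "card activity" stream.
    private final List<CardEvent> cardActivity = new ArrayList<>();
    // Read model: "last card use" table, owned by the updater.
    private final Map<String, CardEvent> lastCardUse = new HashMap<>();
    private int updaterCursor = 0;

    /** Command path: the point of sale publishes an event and does nothing else. */
    void publish(CardEvent event) {
        cardActivity.add(event);
    }

    /** Updater: consumes events it has not seen yet and refreshes the read model. */
    void runUpdater() {
        while (updaterCursor < cardActivity.size()) {
            CardEvent event = cardActivity.get(updaterCursor++);
            lastCardUse.put(event.card, event);
        }
    }

    /** Query path: the fraud detector reads the view and never writes to it. */
    CardEvent lastUse(String card) {
        return lastCardUse.get(card);
    }
}
```

The read model lags the log until the updater runs, so queries are eventually consistent; in exchange, the detector no longer contends with writers on a shared table.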
  • 65. © 2016 MapR Technologies L1-65® Event Sourcing: New Uses of Data Processing Same Message for Multiple Views POS 1..n Fraud detector Last card use Updater Card location history Other card activity
  • 66. © 2016 MapR Technologies L1-66® Scaling Through Isolation allows Multiple Consumers POS 1..n Last card use Updater POS 1..n Last card use Updater card activity Fraud detector Fraud detector Multiple fraud detectors can use the same message queue •  De-coupling and isolation are key •  Propagate events, not table updates
  • 67. © 2016 MapR Technologies L1-67® Decoupled Architecture Producer Activity Handler Producer Producer Historical Interesting Data Real-time Analysis Results Dashboard Anomaly Detection more than one component can make use of the same stream of messages for a variety of uses
  • 68. © 2016 MapR Technologies L1-68® Lessons De-coupling and isolation are key Propagate events, not table updates
  • 69. © 2016 MapR Technologies L1-69® Building Enterprise Software vs Internet Companies Enterprise Software: Complexity of domain => Business logic, Business rules Banking, Healthcare, Telecom Compliance=> Security Internet Companies: Volume of data => Complex data infrastructure Large Scale Availability, Recovery Reference Martin Kleppmann
  • 70. © 2016 MapR Technologies L1-70® Building Enterprise Software vs Internet Companies Enterprise Software: Event Sourcing Internet Companies: Stream Processing Reference Martin Kleppmann
  • 71. © 2016 MapR Technologies L1-71® Real World Solution
  • 72. © 2016 MapR Technologies L1-72® Credit Card Fraud Model Building
  • 73. © 2016 MapR Technologies L1-73® Fraud Stream Processing Architecture. Stages shown: Source, Data Ingest, Stream Processing, NoSQL Storage, Serve. Components shown: stream topics A, B, C; MapR-FS; MapR-DB.
  • 74. © 2016 MapR Technologies L1-74® Streams Messaging Fraud Processing Stream Processing Derive features Model raw enriched alerts process Batch Processing MapR-FS MapR-DB MapR-DB raw enriched alerts Model build model update model
  • 75. © 2016 MapR Technologies L1-75® Streams Messaging Fraud Event Processing Stream Processing NoSQL Storage MapR-FS MapR-DB Raw Enriched Fraud 1.  Parse raw event 2.  read card holder profile from MapR-DB 3.  Derive features 4.  Get prediction from model with features 5.  Publish not fraud to enriched topic 6.  Publish fraud to fraud topic
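The six steps on slide 75 can be sketched as a single processing function that routes each event to one of two output topics. Everything concrete here is an assumption for illustration: the raw event format `card,amount`, the in-memory map standing in for MapR-DB card holder profiles, the single `ratioToAverage` feature, and the `Predicate` standing in for the trained model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy version of the slide's steps: parse, look up profile, derive features, predict, route. */
class FraudRouter {
    /** Hypothetical feature set derived from the raw event plus the card holder profile. */
    static final class Features {
        final String card;
        final double amount;
        final double ratioToAverage;

        Features(String card, double amount, double ratioToAverage) {
            this.card = card;
            this.amount = amount;
            this.ratioToAverage = ratioToAverage;
        }
    }

    final Map<String, Double> averageSpendByCard = new HashMap<>(); // stand-in for MapR-DB profiles
    final List<String> enrichedTopic = new ArrayList<>();           // stand-in for the enriched topic
    final List<String> fraudTopic = new ArrayList<>();              // stand-in for the fraud topic
    private final Predicate<Features> model;                        // stand-in for the trained model

    FraudRouter(Predicate<Features> model) {
        this.model = model;
    }

    void process(String rawEvent) {
        // 1. Parse the raw event (assumed format: "card,amount").
        String[] parts = rawEvent.split(",");
        String card = parts[0].trim();
        double amount = Double.parseDouble(parts[1].trim());
        // 2. Read the card holder profile; unknown cards fall back to the current amount.
        double average = averageSpendByCard.getOrDefault(card, amount);
        // 3. Derive features.
        Features features = new Features(card, amount, average == 0 ? 0 : amount / average);
        // 4. Get a prediction from the model.
        boolean fraud = model.test(features);
        // 5 and 6. Publish to the enriched topic or to the fraud topic.
        String enriched = card + "," + amount + "," + features.ratioToAverage;
        (fraud ? fraudTopic : enrichedTopic).add(enriched);
    }
}
```

A caller might construct it as `new FraudRouter(f -> f.ratioToAverage > 5)`, where the threshold of 5 is an arbitrary placeholder rather than anything from the talk.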
  • 76. © 2016 MapR Technologies L1-76® Fraud Processing Same Message for Different Views Partition1: Topic – Raw Trans Partition1: Topic – Enriched Partition1: Topic – Fraud Alert Partition2: Topic – Raw Trans Partition2: Topic - Enriched Partition2: Topic – Fraud Alert Partition3: Topic – Raw Trans Partition3: Topic - Enriched Partition3: Topic – Fraud Alert Consumers MapR-FS MapR-DB Consumers Consumers Consumers MapR-FS MapR-DB Consumers Consumers Consumers MapR-FS MapR-DB Consumers Consumers
  • 77. © 2016 MapR Technologies L1-77® Real World Solution
  • 78. © 2016 MapR Technologies L1-78® JSON DB (MapR-DB) Graph DB (Titan on MapR-DB) Search Engine (Elastic-Search) Transforming the Health Care Ecosystem Electronic Medical Records “The Stream is the System of Record” –Brad Anderson VP Big Data Informatics
  • 79. © 2016 MapR Technologies L1-79® Liaison ALLOY™ Platform. Data Integration: ingest, syndicate, transform. Data Management: master, deduplicate, harmonize, relate, merge, tokenize, store / persist, analyze, summarize, report, distill, recommend, explore, query, sandbox, batch transform, learn, traverse.
  • 80. © 2016 MapR Technologies L1-80® Use Case: Streaming System of Record for Healthcare Objective: •  Build a flexible, secure healthcare exchange Records Analysis Applications Challenges: •  Many different data models •  Security and privacy issues •  HIPAA compliance Records
  • 81. © 2016 MapR Technologies L1-81® ALLOY Health: Exchange State HIE Clinical Data Viewer Analytics queries like: What are the outcomes in the entire state on diabetes? Are there doctors that are doing this better than others? Clinical Data Financial Data Provider Organizations
  • 82. © 2016 MapR Technologies L1-82® 2000+ Practices 200 + Labs 30,000 + Clinicians OrdersAnywhere PORTAL (no EHR) EHR with HL7 ONLY EHR with WORKFLOW INTEGRATION RADIOLOGY LAB
  • 83. © 2016 MapR Technologies L1-83® This is a PAIN! COMPLIANCE: security controls, compliance features, privacy. PCI DSS 3.0, 21 CFR Part 11, SSAE16 / SOC2, HIPAA/HITECH
  • 84. © 2016 MapR Technologies L1-84® WHY NOW? http://bit.ly/29aBatK
  • 85. © 2016 MapR Technologies L1-85® WHY NOW? 2014 FQ4 profit $ -440 M Total Cost Estimate $ -12 B
  • 86. © 2016 MapR Technologies L1-86® Why Now? The relational database is not the only tool. Context and Relationships: Patient 1234 (patient_id 1234, Name Jon Smith, Age 50); Patient 999 (patient_id 999, Name Jonathan Smith, DOB Jun 1965); Provider 86 (provider_id 86, Name Dr. Nora Paige, Specialty Diabetes); Prescription 9876 (rx_id 9876, Name Sitagliptin, Dosage 325mg). Relationships: Visited, Prescribed, WasPrescribed.
  • 87. © 2016 MapR Technologies L1-87® WHY NOW? Mind the Gap
  • 88. © 2016 MapR Technologies L1-88® Streaming System of Record for Healthcare Stream Topic Records Applications 6 5 4 3 2 1 Search Graph DB JSON HBase Micro Service Micro Service Micro Service Micro Service Micro Service Micro Service A P I Streaming System of Record Materialized Views
  • 89. © 2016 MapR Technologies L1-89® The Promised Land. Diagram labels: Apps, API, pre-processor, log API, Immutable Log (raw data), Document Log (MapR-FS), CEP, workflows feeding materialized views in Key/Value (MapR-DB), Search Engine, Graph (ArangoDB), and Time Series (OpenTSDB), micro services, Compliance Auditor.
  • 90. © 2016 MapR Technologies L1-90® The Promised Land. Auditor smiley faces •  Data Lineage •  Audit Logging •  Wire-level encryption •  At Rest encryption. Replication •  Disaster Recovery •  EU – data can’t leave. Non-Stream / Non-”Big Data” •  Software Development Lifecycle •  System Hardening •  Separation of Concerns -  Dev vs Ops •  Patch Management
  • 91. © 2016 MapR Technologies L1-91® Solution. Design/architecture solved some •  Streams •  Data Lineage/System of Record •  Kappa Architecture (Kreps/Kleppmann). MapR solved others •  Unified Security •  Replication DC to DC •  Converge Kafka/HBase/Hadoop to one cluster •  Multi-tenancy (lots of topics, for lots of tenants)
  • 92. © 2016 MapR Technologies L1-92® API
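  • 93. © 2016 MapR Technologies L1-93® Sample Producer: All Together

        import java.util.Properties;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;

        public class SampleProducer {
            // Stream path and topic name, written as <stream path>:<topic>.
            static String topic = "/streams/pump:warning";
            public static KafkaProducer<String, String> producer;

            public static void main(String[] args) {
                producer = setUpProducer();
                for (int i = 0; i < 3; i++) {
                    String txt = "msg " + i;
                    ProducerRecord<String, String> rec =
                        new ProducerRecord<String, String>(topic, txt);
                    producer.send(rec);
                    System.out.println("Sent msg number " + i);
                }
                producer.close();
            }

            // Not shown on the slide: a minimal assumed configuration.
            // Plain Apache Kafka would also need bootstrap.servers.
            static KafkaProducer<String, String> setUpProducer() {
                Properties props = new Properties();
                props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
                props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
                return new KafkaProducer<String, String>(props);
            }
        }

  • 94. © 2016 MapR Technologies L1-94® Sample Consumer: All Together

        import java.util.Arrays;
        import java.util.Properties;
        import org.apache.kafka.clients.consumer.ConsumerRecord;
        import org.apache.kafka.clients.consumer.ConsumerRecords;
        import org.apache.kafka.clients.consumer.KafkaConsumer;

        public class MyConsumer {
            public static String topic = "/stream/pump:warning";
            public static KafkaConsumer<String, String> consumer;
            static long pollTimeOut = 1000; // milliseconds; value assumed, not on the slide

            public static void main(String[] args) {
                configureConsumer(args);
                consumer.subscribe(Arrays.asList(topic));
                try {
                    while (true) {
                        ConsumerRecords<String, String> msg = consumer.poll(pollTimeOut);
                        for (ConsumerRecord<String, String> record : msg) {
                            System.out.println("read " + record.toString());
                        }
                    }
                } finally {
                    consumer.close(); // reached only if poll() throws
                }
            }

            // Not shown on the slide: a minimal assumed configuration.
            static void configureConsumer(String[] args) {
                Properties props = new Properties();
                props.put("group.id", "sample-group"); // hypothetical group name
                props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
                consumer = new KafkaConsumer<String, String>(props);
            }
        }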
  • 95. © 2016 MapR Technologies L1-95® Summary
  • 96. © 2016 MapR Technologies L1-96® Can we get “Extreme”? Extreme: 1+ trillion events per day. Millions of producers: billions of events per second. Multiple consumers: potentially for every event. Multiple data centers: plan for success, plan for drastic failure. Average reality: Think that is crazy? Consider having 100 servers and performing monitoring and application logs… •  100 metrics per server •  60 samples per minute •  50 metrics per request •  1,000 log entries per request (abnormally small, depends on level) •  1 million requests per day. ~2 billion events per day, for one small(ish) use case.
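The "~2 billion events per day" estimate on slide 96 can be reproduced from the listed inputs. The sketch below assumes that each of the 100 metrics on each server is sampled 60 times per minute around the clock, and that per-request metrics and log entries each count as one event; the class name is illustrative.

```java
/** Reproduces the slide's "average reality" estimate from its own inputs. */
class DailyEventEstimate {
    /** Events per day from periodic server metrics. */
    static long monitoringEvents(long servers, long metricsPerServer, long samplesPerMinute) {
        long minutesPerDay = 24L * 60L;
        return servers * metricsPerServer * samplesPerMinute * minutesPerDay;
    }

    /** Events per day generated by handling requests. */
    static long requestEvents(long requestsPerDay, long metricsPerRequest, long logEntriesPerRequest) {
        return requestsPerDay * (metricsPerRequest + logEntriesPerRequest);
    }

    public static void main(String[] args) {
        long monitoring = monitoringEvents(100, 100, 60);
        long requests = requestEvents(1_000_000, 50, 1_000);
        System.out.println("monitoring: " + monitoring);
        System.out.println("requests:   " + requests);
        System.out.println("total:      " + (monitoring + requests));
    }
}
```

Under these assumptions the monitoring side contributes 864 million events and the request side 1.05 billion, for about 1.9 billion events per day, which rounds to the slide's figure.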
  • 97. © 2016 MapR Technologies L1-97® Stream Processing Building a Complete Data Architecture MapR File System (MapR-FS) MapR Converged Data Platform MapR Database (MapR-DB) MapR Streams Sources/Apps Bulk Processing
  • 98. © 2016 MapR Technologies L1-98®
  • 99. © 2016 MapR Technologies L1-99®
  • 100. © 2016 MapR Technologies L1-100® Find my slides & other related materials to this talk here: bit.ly/jjug-aug2016 or search:
  • 101. © 2016 MapR Technologies L1-101® MapR Blog • https://www.mapr.com/blog/
  • 102. © 2016 MapR Technologies L1-102® community.mapr.com …helping you put data technology to work. Connect with fellow Apache Hadoop and Spark professionals ●  Find answers ●  Ask technical questions ●  Join on-demand training course discussions ●  Follow release announcements ●  Share and vote on product ideas ●  Find Meetup and event listings