1© 2016 MapR Technologies 1© 2016 MapR Technologies
Evolving Beyond the Data Lake
A Story of Wind and Rain
2© 2016 MapR Technologies 2
Industry Leaders Are Investing in Disruptive Technology Now
Innovating and reducing costs at the same time
Source: IDC, Gartner; Analysis & Estimates: MapR
Next-gen consists of cloud, big data, software and hardware related expenses
[Chart: Investment in Next-Gen vs. Legacy Technologies for Data, 2013 to 2020, in billions of dollars. Series: Total $ Growth of IT Market, Next-Gen Growth, Legacy Market Growth/Shrink in $]
90% of data is on next-gen technology in just four years
3© 2016 MapR Technologies 3
Application Development and Deployment
2005:
Desktop Browser (JavaScript, jQuery)
HTML, CSS, JS
IIS, ASP.NET
SQL
SQL Server
Microsoft Reporting Service
Today:
Desktop Browser (JavaScript, 20+ Frameworks), Tablet, Native Android, Native iOS
JSON; JSON, CSS, HTML, JS
Backend for Frontend (Java)
Microservice (.NET), Microservice (NodeJS), Microservice (Java)
SQL Server, NoSQL, Graph DB, Events (Kafka)
Oracle, Bulk Load, Data Lake, Machine Learning, Predictive Modeling, BI / Reporting, Insights DB, Customer Insights
4© 2016 MapR Technologies 4
Application Development and Deployment
(Same 2005 vs. Today diagram as the previous slide.)
5© 2016 MapR Technologies 5
Messaging platforms
6© 2016 MapR Technologies 6
Producers Consumers
A stream is an unbounded sequence of events carried
from a set of producers to a set of consumers.
What’s a Stream?
Producers and consumers don’t have to be aware of
each other, instead they participate in shared topics.
This is called publish/subscribe.
/Events:Topic
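The publish/subscribe idea on this slide can be sketched in a few lines of Python. This is a toy in-memory model and not the MapR Streams or Kafka API: the `Topic` class, its `publish` and `poll` methods, and the consumer names are all illustrative assumptions.

```python
from collections import defaultdict


class Topic:
    """Toy in-memory topic: an append-only list of events plus one read position per consumer."""

    def __init__(self, name):
        self.name = name
        self.events = []                    # unbounded in principle; grows as producers publish
        self.positions = defaultdict(int)   # consumer name -> index of the next event to read

    def publish(self, event):
        # Producers only know the topic, never the consumers.
        self.events.append(event)

    def poll(self, consumer):
        # Each consumer reads independently from its own position.
        start = self.positions[consumer]
        batch = self.events[start:]
        self.positions[consumer] = len(self.events)
        return batch


topic = Topic("/Events:Topic")
topic.publish({"source": "sensor-1", "value": 21.5})
topic.publish({"source": "mobile-app", "value": "login"})

dashboard_batch = topic.poll("dashboard")   # sees both events
alerting_batch = topic.poll("alerting")     # sees the same two events, independently
topic.publish({"source": "sensor-1", "value": 22.0})
dashboard_next = topic.poll("dashboard")    # sees only the newly published event
```

Neither producer call mentions a consumer, and adding the "alerting" consumer required no change on the producing side, which is the decoupling the slide describes.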
7© 2016 MapR Technologies 7
Publishers and Subscribers (pub-sub)
/Events:Topic Analytics
Consumers
Stream Processors
Social Platforms
Servers
(Logs, Metrics)
Sensors
Mobile Apps
Other Apps &
Microservices
Alerting Systems
Stream Processing
Frameworks
Databases &
Search Engines
Dashboards
Other Apps &
Microservices
8© 2016 MapR Technologies 8
Considering a Messaging Platform
• 50-100k messages per second used to be good
– Not good enough to handle decoupled communication between services
• Kafka model is BLAZING fast
– Kafka 0.9 API with message sizes at 200 bytes
– MapR Streams on a 5-node cluster sustained 18 million events / sec
– Throughput of 3.5 GB/s and over 1.5 trillion events / day
• Manual sharding is not a “great” solution
– Adding more servers should be easy and foolproof, not painful
– Yes, I have lived through this
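The throughput figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes decimal gigabytes and counts message payload only (no keys, headers, or replication); those two choices are assumptions, not something the slide states.

```python
EVENTS_PER_SECOND = 18_000_000   # from the slide: sustained rate on the 5 node cluster
MESSAGE_SIZE_BYTES = 200         # from the slide: message size used in the test
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_second = EVENTS_PER_SECOND * MESSAGE_SIZE_BYTES
gigabytes_per_second = bytes_per_second / 1e9    # decimal GB (assumption)
events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY

print(f"{gigabytes_per_second:.1f} GB/s of payload")
print(f"{events_per_day / 1e12:.2f} trillion events / day")
```

Payload alone works out to about 3.6 GB/s and about 1.56 trillion events per day, which is in line with the quoted 3.5 GB/s and "over 1.5 trillion events / day".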
9© 2016 MapR Technologies 9
Goals
• Real-time or near-time
– Includes situations with deadlines
– Also includes situations where delay is simply undesirable
– Even includes situations where delay is just fine
• Microservices
– Streaming is a convenient idiom for design
– Microservices … you know we wanted it
– Service isolation is a key requirement
10© 2016 MapR Technologies 10
Advantages of Messaging and Real-time Enablement
• Fewer moving parts
– Fewer things to go wrong
• Better resource utilization
– Scale any application up or down on demand
• Common deployment model (new isolation model)
– Repeatability between environments (dev, QA, production)
• Improved integration testing
– Listen to production streams in dev and QA (** this is a BIG DEAL! **)
• Shared file system
– Get at the data anywhere in the cluster
– Simplifies business continuity
11© 2016 MapR Technologies 11
A microservice is
loosely coupled
with bounded context
12© 2016 MapR Technologies 12
How to Couple Services and Break micro-ness
• Shared schemas, relational stores
• Ad hoc communication between services
• Enterprise service buses
• Brittle protocols
• Poor protocol versioning
Don’t do this!
13© 2016 MapR Technologies 13
How to Decouple Services
• Use self-describing data
• Private databases
• Infrastructural communication between services
• Use modern protocols
• Adopt future-proof protocol practices
• Use shared storage where necessary due to scale
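One way to read "self-describing data" and "future-proof protocol practices" is a reader that tolerates additive changes: new fields may appear, and existing fields never change type. The sketch below uses JSON; the event fields (`card`, `location`, `channel`, `device`) and the `read_card_event` helper are hypothetical names chosen for illustration.

```python
import json


def read_card_event(raw):
    """Parse a JSON event, ignoring unknown fields and defaulting missing optional ones."""
    record = json.loads(raw)
    return {
        "card": record["card"],                       # required since the first version
        "location": record["location"],               # required since the first version
        "channel": record.get("channel", "unknown"),  # added later; old producers omit it
    }


# An event from an old producer and one from a newer producer that added two fields.
old_event = '{"card": "1234", "location": "store-7"}'
new_event = '{"card": "1234", "location": "store-7", "channel": "web", "device": "tablet"}'

old_parsed = read_card_event(old_event)   # "channel" falls back to "unknown"
new_parsed = read_card_event(new_event)   # "device" is ignored by this reader
```

Because the reader neither rejects unknown fields nor requires new ones, producers and consumers can be upgraded at different times.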
14© 2016 MapR Technologies 14
Decoupled Architecture
Producer (×3)
Activity Handler
Historical
Interesting Data
Real-time Analysis
Results
Dashboard
Anomaly Detection
15© 2016 MapR Technologies 15
Mechanisms for Decoupling
• Traditional message queues?
– Message queues are the classic answer
– Key feature/flaw is out-of-order acknowledgement
– Many implementations
– You pay a huge performance hit for persistence
• Kafka-esque Logs?
– Logs are like queues, but with ordering
– Out-of-order consumption is possible, acknowledgement not so much
– Canonical base implementation is Kafka
– Performance plus persistence
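The difference between a queue and a log can be illustrated with a toy append-only log in which a consumer's progress is a single offset. This is a simplified sketch, not Kafka's implementation; the `Log` and `Consumer` classes and their methods are hypothetical.

```python
class Log:
    """Toy append-only log: records keep their order and are never removed by reads."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1            # offset of the record just written

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks progress with one committed offset instead of per-message acknowledgements."""

    def __init__(self, log):
        self.log = log
        self.committed = 0

    def poll(self):
        batch = self.log.read(self.committed)
        self.committed += len(batch)             # acknowledging means advancing one number
        return batch

    def seek(self, offset):
        self.committed = offset                  # replay or skip by moving the offset


log = Log()
for n in range(5):
    log.append(f"event-{n}")

consumer = Consumer(log)
first = consumer.poll()      # all five events, in order
consumer.seek(3)             # jump back to offset 3
replayed = consumer.poll()   # event-3 and event-4 again
```

The consumer can jump around with `seek`, so out-of-order consumption is possible, but there is no way to acknowledge message 4 while leaving message 2 outstanding, which matches the trade-off listed above.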
16© 2016 MapR Technologies 16
Shared Resources
17© 2016 MapR Technologies 17
Fraud Detection
?
POS 1
location, t, card #
yes/no?
POS 2
location, t, card #
yes/no?
18© 2016 MapR Technologies 18
Traditional Solution
POS
1..n
Fraud
detector
Last card
use
19© 2016 MapR Technologies 19
What Happens Next?
POS
1..n
Fraud
detector
Last card
use
POS
1..n
Fraud
detector
POS
1..n
Fraud
detector
20© 2016 MapR Technologies 20
What Happens Next?
POS
1..n
Fraud
detector
Last card
use
POS
1..n
Fraud
detector
POS
1..n
Fraud
detector
21© 2016 MapR Technologies 21
How to Get Service Isolation
POS
1..n
Fraud
detector
Last card
use
Updater
card activity
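The isolation pattern in this diagram can be sketched as follows: the fraud detector answers the POS and publishes each event to a "card activity" stream, while a separate updater consumes that stream and maintains the "last card use" table. Everything here is an illustrative assumption, including the in-memory list standing in for the stream and the 60-second, different-location rule.

```python
card_activity = []          # stands in for the "card activity" stream
last_card_use = {}          # table maintained only by the updater
updater_state = {"applied": 0}   # updater's position in the stream


def fraud_detector(card, location, t):
    """Answer yes/no for a POS request, then publish the event to the stream."""
    previous = last_card_use.get(card)
    # Hypothetical rule: the same card in a different location within 60 seconds is suspicious.
    suspicious = (
        previous is not None
        and previous["location"] != location
        and t - previous["t"] < 60
    )
    card_activity.append({"card": card, "location": location, "t": t})
    return not suspicious


def updater():
    """Consume new card activity and refresh the last-use table."""
    for event in card_activity[updater_state["applied"]:]:
        last_card_use[event["card"]] = {"location": event["location"], "t": event["t"]}
    updater_state["applied"] = len(card_activity)


ok_first = fraud_detector("1234", "Denver", t=0)    # no history yet, approved
updater()
ok_second = fraud_detector("1234", "Paris", t=30)   # different location 30 s later, declined
updater()
ok_third = fraud_detector("1234", "Paris", t=45)    # same location as last use, approved
```

The detector never writes to the table directly, so other consumers can read the same stream later without touching the detector or its database.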
22© 2016 MapR Technologies 22
New Uses of Data
POS
1..n
Fraud
detector
Last card
use
Updater
Card
location
history
Other
card activity
23© 2016 MapR Technologies 23
Scaling Through Isolation
POS
1..n
Last card
use
Updater
POS
1..n
Last card
use
Updater
card activity
Fraud
detector
Fraud
detector
24© 2016 MapR Technologies 24
Use Cases
25© 2016 MapR Technologies 25
Event-based Data Drives Applications
Failure
Alerts
Real-time application
& network monitoring
Trending
now
Web
Personalized Offers
Real-time Fraud Detection
Ad optimization
Supply Chain Optimization
26© 2016 MapR Technologies 26
Classifiers
Fighting Fraudulent Web Traffic
Activity Stream
Click Stream
Deviation from Normal
Blacklist Activities
Whitelist Activities
User Activity Profile
Known Bad Classifier
All OK Classifier
Session Alteration
Stream Notify Security
27© 2016 MapR Technologies 27
Similarities between Marketing and Fraud?
Website Fraud
• Build a user profile
– What are their normal usage patterns
• Build “segmented” profiles
– What do real users normally do
• Dynamically alter website
– Prevent user functionality
• Kick-off external workflows
– Notify security team
Customer 360
• Build a user profile
– What type of content do they like
• Build “segmented” profiles
– Company affiliation
• Dynamically alter website
– Show alternate content
• Kick-off external workflows
– Nurture emails
28© 2016 MapR Technologies 28
Message
Bus
Specialized Storage
Operational Applications
J2EE
AppServer
Relational
Database
Legacy Business Platforms
• IT must integrate all the products
• Inability to operationalize the insight rapidly
• Can’t deal with high speed data ingestion and processing
• Scale up architecture leads to high cost
Specialized Storage
Analytical Applications
Analytic
Database
ETL Tool BI Tool
29© 2016 MapR Technologies 29
Converged Data Platform
Analytical
Applications
Operational
Applications
Converged Applications
Complete Access to Real-time and
Historical Data in One Platform
Developers
Creating Database
and Event Based
Applications
(Bottom Line Initiatives) (Top Line Initiatives)
Analysts
Creating BI Reports
and KPIs on Data
Warehouse
Historical Data Current Data
30© 2016 MapR Technologies 30
Web-Scale Storage
MapR-FS MapR-DB
Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace, High Availability
MapR Streams
Event Streaming, Database
MapR Platform Services: Open API Architecture
Assures Interoperability, Avoids Lock-in
HDFS
API
POSIX
NFS
SQL,
HBase
API
JSON
API
Kafka
API
31© 2016 MapR Technologies 31
Converged Application Benefits
• Consumers scale horizontally with partitions
• 1:1 mapping between consumer and partition
• Enables predictable scaling as production needs grow
• Data can be seamlessly replicated to another cluster
• Enables HA with zero code changes
• Data is indexed dynamically according to receivers, senders
• Scales beyond the capabilities of Kafka
• Snapshots can be taken to capture state
• Enables faster testing and deployment of
applications
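The first bullets, consumers scaling with partitions and a 1:1 consumer-to-partition mapping, can be illustrated with a small sketch. The CRC32-based `partition_for` function and the round-robin `assign` function are assumptions made for illustration; they are not the partitioner or assignment strategy that Kafka or MapR Streams actually use.

```python
import zlib


def partition_for(key, num_partitions):
    """Pick a partition from a stable hash of the key so one key always lands in one partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions, consumers):
    """Round-robin partitions over consumers; with equal counts this is a 1:1 mapping."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = list(range(4))
four = assign(partitions, ["c0", "c1", "c2", "c3"])   # one partition per consumer
two = assign(partitions, ["c0", "c1"])                # each consumer takes two partitions
chosen = partition_for("card-1234", len(partitions))  # stable for a given key
```

With four consumers each partition has exactly one reader; with two, each consumer picks up two partitions, so capacity follows the number of partitions in a predictable way.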
32© 2016 MapR Technologies 32
Not All Data Platforms are the Same
33© 2016 MapR Technologies 33
@kingmesal
jscott@mapr.com
Engage with us!
kingmesal

Editor's Notes

  • #2: Great news, I have 467 slides today …. Hahah… I’m just kidding… I only have 465…
  • #3: Over the next four years, companies will experience flat IT spending. But underneath that will be a steady decrease in legacy spend accompanied by a corresponding increase in spend behind next-gen technologies. But this chart also provides insight into the solution. The key to reducing costs while driving innovation is the data. [CLICK] In fact, the forecast also shows that within four years 90% of data will be on next-gen technology.
  • #4: It’s important to realize how Application development has changed dramatically in the past 10 years… The complexity was driven by the difficulties in dealing with separate silos of data...
  • #9: Real time – means you have a choice, now or when you are ready I can’t emphasize enough that this capability allows you to feed your production data stream into Dev and QA for testing – Many people would give an arm for that capability.
  • #10: Much more than just traditional real-time… Not FINITE! It is a stream... There is no explicit end
  • #11: Real time – means you have a choice, now or when you are ready
  • #13: Anyone ever sit in a LONG meeting to discuss changing a database schema? Data warehouse? You know, those 30 minute meetings that run an hour long with no agreed upon answer? Add fields, DO NOT CHANGE a field type…
  • #14: Gentle migrations… like JSON... Not sharing your database with everyone helps Avro, binary json....
  • #15: Message driven architectures are fundamentally sound, but in the past the cost to scale the messaging layer was cost prohibi
  • #16: Reading and acknowledging a message is much like a database transaction, and within message queues these acknowledgements are a major factor in performance. With a log, all IO is in a continuous, sweeping motion, which increases throughput.
  • #21: Either due to meetings! Or perhaps one application dominating the use of the shared database.
  • #26: Event based data drives applications.. Whether it’s collecting machine sensors to predict and prevent failures, or providing key offers to customers, or identifying and preventing fraud before it happens. All these use cases are enabled by event based data flows and a converged platform.
  • #27: Bad Actors! / Fraudsters Deviation from normal
  • #28: For those who may be more familiar with a Customer 360 let me explain just how similar the software to support both models really is.
  • #30: 1 line of code Spyglass
  • #32: blueprint for converged applications event driven microservices
  • #33: This table isn’t meant to show you that we do everything better than Cloudera and Hortonworks; it is meant to show you that not all platforms are built for the same purpose, and how truly different we are from Cloudera and Hortonworks. We compete with a number of other companies, such as IBM (DB2 and MQ), Oracle (DW and Database), and Tibco (MQ), that each cover different parts of your business. We just chose to build it into a single homogeneous platform.