What's The Role of Machine Learning In Fast Data and Streaming Applications?
WEBINAR
Emre Velipasaoglu, Ph.D.
Principal Data Scientist
What is Machine Learning?
A computer program is a set of explicit
instructions that produce an output for a
given input.
Machine Learning (ML) is about how to
program computers to improve
automatically with experience rather
than explicit instructions.
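Explicit instructions:
// 10^12 does not fit in an Int, so the rule is written as a function over Long
def y(x: Long): Double = x / 2.0 + 1
Learning from experience:
1 -> 1.5
2 -> 2.0
=> 10^12 -> ?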
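The contrast on this slide can be sketched in a few lines: instead of writing the rule by hand, fit it to the two example pairs and then ask for the value at 10^12. This is only an illustration, assuming the hidden rule is a straight line and using ordinary least squares; the names `LearnFromExamples` and `fitLine` are hypothetical and not from the talk.

```scala
object LearnFromExamples {
  // Ordinary least squares for a line y = slope * x + intercept.
  // Assumes at least two examples with distinct x values.
  def fitLine(examples: Seq[(Double, Double)]): (Double, Double) = {
    val n = examples.size.toDouble
    val meanX = examples.map(_._1).sum / n
    val meanY = examples.map(_._2).sum / n
    val sxy = examples.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = examples.map { case (x, _) => (x - meanX) * (x - meanX) }.sum
    val slope = sxy / sxx
    (slope, meanY - slope * meanX)
  }

  def main(args: Array[String]): Unit = {
    // The two examples from the slide: 1 -> 1.5 and 2 -> 2.0.
    val (slope, intercept) = fitLine(Seq(1.0 -> 1.5, 2.0 -> 2.0))
    // Predict the unseen input 10^12 with the learned rule.
    println(slope * 1e12 + intercept)
  }
}
```

With only two examples the fit is exact (slope 0.5, intercept 1), which matches the hand-written rule x/2.0 + 1; real data would be noisy and the learned rule only approximate.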
Why does everyone want to use it?
A lot of the recent transformative technologies are based on ML:
• optical character recognition
• speech recognition
• fraud detection
• web search
• personalized marketing and advertising
• Computer-Aided Diagnosis (CADx)
Why does everyone want to use it?
Emerging trends that will leverage ML:
• Internet of Things (IoT)
• Augmented Reality (AR) and Virtual Reality (VR)
• Autonomous vehicles
• Customer service chat bots
• Security: Face/voice/biometrics recognition
• Healthcare: drug discovery, outcome prediction, personalized care
• Democratization of ML and the long tail of ML applications
What are some of the use cases?
Machines do certain tasks better than humans.
• IBM Deep Blue in chess
• IBM Watson in Jeopardy!
• Google AlphaGo in Go
• Lip reading: LipNet 93% vs. humans 65%
What are some of the use cases?
Machines are more cost efficient in certain tasks:
• Transcribing: Microsoft 89% vs. humans 89%
• Computer-aided diagnosis: e.g. "Dermatologist-level classification of skin cancer with deep neural networks", Esteva et al., published in Nature, February 2017.
What are some of the use cases?
Machines are the only way to scale up processing in certain tasks.
• Previewing video: Clarifai can analyze 3.5 minutes of video in 10 seconds for detecting objects.
• Commercial loan agreements review: AI in seconds vs. humans in 360,000 hours.
Why should you care?
Information gives competitive advantage.
ML unlocks information from your data.
It is your product, your data, your operations.
ML is being democratized.
It is not for a handful of giant tech companies anymore.
How does it work?
E.g. Augmented Reality (AR) Shopping Personalization:
• Shopping is one area where AR is expected to have an impact.
• IBM CeBIT 2013 app:
  • scans a shelf
  • recognizes products
  • overlays nutritional info
• Add a recommender system, tailored to
  • your customers,
  • your product catalog.
Learning a Recommender
User Rating Matrix
A B C D
Alan 5 1 1
Emre 4 2 3
Vishal 5 1
Matrix Factorization
User Latent Factor Model
          f1     f2
Alan      1.63   0.89
Emre      0.89   2.10
Vishal    2.03   1.01
Item Latent Factor Model
      A      B      C      D
f1    2.21   1.88   -0.24  0.33
f2    1.68   1.22   1.51   0.74
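A rough sketch of how such factor tables can be learned from a sparse rating matrix follows. It uses plain stochastic gradient descent with L2 regularization, which is one common way to do matrix factorization and is not necessarily what produced the numbers on the slide. The rating placements in `exampleRatings` are an assumption (the slide's empty cells did not survive extraction), and every name and default here (`MatrixFactorization`, `train`, `stepSize`, `reg`, `epochs`, `seed`) is hypothetical.

```scala
import scala.util.Random

object MatrixFactorization {
  final case class Rating(user: Int, item: Int, value: Double)

  // Assumed placement of ratings: users 0..2 (Alan, Emre, Vishal), items 0..3 (A..D).
  val exampleRatings: Seq[Rating] = Seq(
    Rating(0, 0, 5.0), Rating(0, 2, 1.0), Rating(0, 3, 1.0),
    Rating(1, 1, 4.0), Rating(1, 2, 3.0), Rating(1, 3, 2.0),
    Rating(2, 1, 5.0), Rating(2, 3, 1.0)
  )

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(i => a(i) * b(i)).sum

  // Root mean squared error over the observed ratings only.
  def rmse(rs: Seq[Rating], u: Array[Array[Double]], v: Array[Array[Double]]): Double =
    math.sqrt(rs.map(r => math.pow(r.value - dot(u(r.user), v(r.item)), 2)).sum / rs.size)

  // Returns (userFactors, itemFactors); each row holds k latent factors.
  def train(rs: Seq[Rating], users: Int, items: Int, k: Int = 2,
            stepSize: Double = 0.01, reg: Double = 0.02, epochs: Int = 5000,
            seed: Long = 42L): (Array[Array[Double]], Array[Array[Double]]) = {
    val rng = new Random(seed)
    val u = Array.fill(users, k)(rng.nextDouble())
    val v = Array.fill(items, k)(rng.nextDouble())
    for (_ <- 1 to epochs; r <- rs) {
      val err = r.value - dot(u(r.user), v(r.item))
      for (f <- 0 until k) {
        val uf = u(r.user)(f)
        val vf = v(r.item)(f)
        // Gradient step on the squared error with L2 regularization.
        u(r.user)(f) += stepSize * (err * vf - reg * uf)
        v(r.item)(f) += stepSize * (err * uf - reg * vf)
      }
    }
    (u, v)
  }
}
```

The number of latent factors (`k`), the step size and the stopping rule (here a fixed epoch count) are the kind of training parameters that get tuned during the iteration phase. The learned factors will generally differ from the slide's values, since many factorizations fit the same ratings.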
Scoring Items
Application sends: User, Items
Query User -> User Latent Factors (from the User Latent Factor Model)
          f1     f2
Vishal    2.03   1.01
Query Item -> Item Latent Factors (from the Item Latent Factor Model)
      A      C      D
f1    2.21   -0.24  0.33
f2    1.68   1.51   0.74
Estimate Ratings -> Ranked Items
     score
A    6.00
D    1.41
C    1.05
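The scoring step amounts to a dot product between the user's latent factors and each candidate item's latent factors, followed by a sort. A minimal sketch, using the factor values printed on the slides; the object and function names (`Scoring`, `score`, `rank`) are hypothetical.

```scala
object Scoring {
  type Factors = Vector[Double]

  // Estimated rating = dot product of user and item latent factors.
  def score(user: Factors, item: Factors): Double =
    user.zip(item).map { case (u, i) => u * i }.sum

  // Candidate items with their estimated ratings, highest first.
  def rank(user: Factors, items: Map[String, Factors]): Seq[(String, Double)] =
    items.toSeq
      .map { case (id, f) => id -> score(user, f) }
      .sortBy { case (_, s) => -s }

  // Values copied from the slides.
  val vishal: Factors = Vector(2.03, 1.01)
  val candidates: Map[String, Factors] = Map(
    "A" -> Vector(2.21, 1.68),
    "C" -> Vector(-0.24, 1.51),
    "D" -> Vector(0.33, 0.74)
  )

  def main(args: Array[String]): Unit =
    rank(vishal, candidates).foreach { case (id, s) => println(f"$id $s%.2f") }
}
```

With these two-decimal factors the ranking comes out as A, D, C, as on the slide. The scores land near the slide's values but need not match them exactly, presumably because the printed factors are rounded.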
Modern ML
• Size of data
  • Does not fit in one node, must distribute
  • E.g. billions of user × item ratings (several orders of magnitude more historical events to aggregate ratings from)
• Size of model
  • Does not fit in one node, must distribute
  • E.g. millions of users, thousands of products
• Learning speed
  • Anywhere from real-time model updates to batch updates in minutes
• Operational latency and throughput
  • Low milliseconds response time for millions of transactions
ML Lifecycle - Development
Early research
• Exploration of modeling techniques
Iterations
• Feature selection
• Training parameter tuning
Productization
• Feature computations, model updating, scoring, caching, optimizations, etc. (e.g. update and query of latent factors)
Example: recommender system
• Modeling techniques: collaborative filtering, content-based filtering, hybrid models, Bayesian networks, clustering, latent semantic models, Markov decision process, …
• Algorithms: singular value decomposition, alternating least squares, non-negative matrix factorization
• Training parameters: number of latent factors, step size, convergence criteria
ML Lifecycle - Management
Monitoring
• Model performance
• Latency
• Throughput
• Model quality
• Drift (e.g. have the user’s tastes changed recently?)
• Security and robustness
Controlling
• Model optimization (for performance)
• Model update (for quality)
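One simple way to watch for drift in model quality is to compare the prediction error on a recent window of feedback against a baseline window and flag the model for an update when the recent error is clearly worse. The sketch below is only one possible check; the names (`DriftMonitor`, `hasDrifted`) and the default `tolerance` ratio of 1.5 are assumptions, not something the talk prescribes.

```scala
object DriftMonitor {
  // Root mean squared error of a window of prediction errors (actual - predicted).
  def rmse(errors: Seq[Double]): Double =
    math.sqrt(errors.map(e => e * e).sum / errors.size)

  // True when the recent window's RMSE exceeds the baseline RMSE
  // by more than the given ratio; empty windows never signal drift.
  def hasDrifted(baseline: Seq[Double], recent: Seq[Double],
                 tolerance: Double = 1.5): Boolean =
    baseline.nonEmpty && recent.nonEmpty &&
      rmse(recent) > tolerance * rmse(baseline)
}
```

A check like this would sit on the monitoring side; the controlling side decides what to do with the signal, for example triggering a model update.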
Which tools are available and what do they do?
Machine Learning
Spark MLlib       ML library for Spark
Flink ML          ML library for Flink
Mahout            Distributed or scalable ML algorithms
TensorFlow        Google's open source deep learning library
Theano            Numerical library for Python, especially for deep learning
Deeplearning4j    Deep learning library in Java
BigDL             Intel’s distributed deep learning library on Spark
scikit-learn      Main ML library for Python
OpenNLP           ML based toolkit for the processing of natural language text
Which tools are available and what do they do?
Streaming
Flink              Stream processing framework with sophisticated handling of late arriving data
Spark Streaming    Dataset based computing framework with mini-batch streaming support
Beam               API for data processing pipelines
Data Ingestion
Kafka              Distributed stream processing for high-throughput, low-latency, real-time data feeds
Flume              Log processing
Which tools are available and what do they do?
Persistence and Storage
HDFS             Hadoop based Distributed File System
Cassandra        Distributed NoSQL database management system
ElasticSearch    Distributed, RESTful search and analytics engine
AWS S3           Cloud based object store
How does the Fast Data Platform tie it all together?
batch: HDFS (User Rating Matrix) -> Spark (Matrix Factorization) -> Cassandra (Latent Item Model, Latent User Model)
streaming: Application -> Kafka (User, Items) -> Flink / Akka Streams (Query Item, Query User -> Item Factors, User Factors -> Score) -> Kafka (Ranked Items) -> Application
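The streaming half of this picture can be sketched without any cluster: an in-memory map stands in for the Cassandra tables written by the batch job, and a plain `Iterator` stands in for the Kafka topics on either side of the scoring stage. None of this is the Fast Data Platform's API; the types and names (`ScoringPipeline`, `Request`, `Ranked`, `ModelStore`, `process`) are hypothetical stand-ins chosen for illustration.

```scala
object ScoringPipeline {
  type Factors = Vector[Double]

  final case class Request(user: String, items: Seq[String])
  final case class Ranked(user: String, items: Seq[(String, Double)])

  // Stand-in for the latent user and item models stored by the batch job.
  final case class ModelStore(users: Map[String, Factors], items: Map[String, Factors])

  def dot(a: Factors, b: Factors): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Stand-in for the streaming stage: one ranked response per request.
  // Users and items without factors in the store are skipped.
  def process(requests: Iterator[Request], store: ModelStore): Iterator[Ranked] =
    requests.flatMap { req =>
      store.users.get(req.user).map { userFactors =>
        val scored = req.items.flatMap { id =>
          store.items.get(id).map(itemFactors => id -> dot(userFactors, itemFactors))
        }
        Ranked(req.user, scored.sortBy { case (_, s) => -s })
      }
    }
}
```

Keeping the model store separate from the scoring stage mirrors the split in the diagram: the batch side can rewrite the factors on its own schedule while the streaming side only reads them.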
What else does FDP provide?
• data persistence & storage
• stream processing
• machine learning
• cluster analysis
• infrastructure
• durable messaging backplane
• microservices
• intelligent management
In Summary
• Machine Learning is the way to build transformative, data-driven products that would otherwise be impossible to build.
• It is not difficult to build Machine Learning based solutions, thanks to new open source tools.
• Lightbend’s Fast Data Platform provides an easy onramp for building, deploying and running Fast Data clusters and services leveraging best-of-breed tools.
Upgrade your grey matter!
Get the free O’Reilly book by Dr. Dean Wampler, VP of Fast Data Engineering at Lightbend
bit.ly/lightbend-fast-data
lightbend.com/fast-data-platform