© 2017 MapR Technologies
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Time Uber Data Using Apache APIs: Kafka, Spark, HBase
Carol McDonald
@caroljmcdonald
Agenda
•  What is AI?
•  Why now?
•  What is Machine Learning?
–  Examples
•  What is Deep Learning?
–  Examples
What is AI?
AI NSA MIT Late 80s
Problems with Hard-Coded Rules
•  Rules are written manually by a human expert
–  difficult to maintain
–  give a one-size-fits-all decision (a 2-times overdose is treated the same as a 38-times overdose)
•  Machine learning uses data and statistics
–  can give ranked probabilities and can precisely match/target individuals
What is Machine Learning?
Data (contains patterns) → Train Algorithm (finds patterns) → Build Model
New Data → Use Model (prediction function f(X), recognizes patterns) → Predictions
Why all the buzz now?
What has changed?
What has changed in the past 10 years?
Distributed computing
Streaming analytics
Improved machine learning
Distribute Computation
Driver Program sends tasks
Data distributed across cluster
Result
Apache Spark Distributed Datasets
Distributed Dataset, partitioned across the executors of a cluster:
Node (Executor): P4
Node (Executor): P1, P3
Node (Executor): P2
Partition 1: 8213034705, 95, 2.927373, jake7870, 0 …
Partition 2: 8213034705, 115, 2.943484, Davidbresler2, 1 …
Partition 3: 8213034705, 100, 2.951285, gladimacowgirl, 58 …
Partition 4: 8213034705, 117, 2.998947, daysrus, 95 …
•  Data read into memory cache
•  Partitioned across a cluster
•  Operated on in parallel
•  Cached in memory for iterations
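The partition-and-process-in-parallel idea above can be sketched in plain Python. This is only an illustration under stated assumptions, not the Spark API: threads on one machine stand in for executors, the column names in the comment are guesses (the slide shows only raw values), and `partition` and `max_bid` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor

# Rows copied from the slide. Assumed columns (not named in the slide):
# (auction_id, bid, bid_time, bidder, bidder_rating).
records = [
    ("8213034705", 95.0, 2.927373, "jake7870", 0),
    ("8213034705", 115.0, 2.943484, "Davidbresler2", 1),
    ("8213034705", 100.0, 2.951285, "gladimacowgirl", 58),
    ("8213034705", 117.0, 2.998947, "daysrus", 95),
]

def partition(data, num_partitions):
    """Split the rows round-robin into num_partitions lists."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(data):
        parts[i % num_partitions].append(row)
    return parts

def max_bid(part):
    """Task run on one partition: the highest bid it holds (None if empty)."""
    return max((row[1] for row in part), default=None)

partitions = partition(records, 2)

# Each worker handles one partition; the "driver" combines the partial results.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_results = list(pool.map(max_bid, partitions))

highest_bid = max(r for r in partial_results if r is not None)
print(highest_bid)
```

In a real cluster the partitions would live on different nodes and stay cached in memory between iterations; the sketch only shows the split, per-partition task, and combine steps.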
Streaming Analytics
GPUs speed up Multi core servers for parallel processing
Cluster of GPUs 1 million times faster than Cray-1
Mythbusters explain Parallel graphics with GPU vs Sequential CPU
•  Painting a smiley face with a sequential paint gun
Mythbusters explain Parallel graphics with GPU
•  Painting a smiling face with one blast from a parallel paint gun!
Machine Learning
Types of Machine learning
Supervised Machine Learning
Machine Learning
Supervised
•  Classification
–  Naïve Bayes
–  SVM
–  Random Decision Forests
•  Regression
–  Linear
–  Logistic
Unsupervised
•  Clustering
–  K-means
•  Dimensionality reduction
–  Principal Component Analysis
–  SVD
Label
Supervised Algorithms use labeled data
Data: features X1, X2 and label Y → Build Model: f(X1, X2) = Y
New Data: features X1, X2 → Use Model → Predict Y
ML Discovery Model Building
Data Discovery, Model Creation: Historical Data (Uber trips) → Feature Extraction → Training Set → Model Training/Building → Test Set → Test Model Predictions → Evaluate Results
Production: New Data (Stream Topic: Uber trips) → Feature Extraction → Deployed Model → Insights
●  Churn Modelling
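The discovery loop above (split historical data, build a model on the training set, evaluate on the test set) can be sketched as follows. The data is made up for illustration, and the "model" is a deliberately trivial threshold rule; `train_test_split`, `train`, `predict` and `accuracy` are hypothetical helper names, not a library API.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle the rows and hold out a fraction of them as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train(training_set):
    """Model building: learn a threshold halfway between the two class means."""
    zeros = [x for x, label in training_set if label == 0]
    ones = [x for x, label in training_set if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def predict(threshold, x):
    """Use the model: label 1 when the feature exceeds the learned threshold."""
    return 1 if x > threshold else 0

def accuracy(threshold, test_set):
    """Evaluate results: fraction of held-out rows predicted correctly."""
    correct = sum(1 for x, label in test_set if predict(threshold, x) == label)
    return correct / len(test_set)

# Made-up historical data: one feature x, label 1 when x >= 10.
historical = [(x, 1 if x >= 10 else 0) for x in range(20)]

training_set, test_set = train_test_split(historical)
model = train(training_set)
print(f"threshold={model:.2f}, test accuracy={accuracy(model, test_set):.2f}")
```

Keeping the test rows out of training is the point of the split: the accuracy then estimates how the deployed model would behave on new data.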
Supervised Machine Learning: Classification & Regression
Classification identifies the category for an item
Classification: Definition
Form of ML that:
•  Identifies which category an item belongs to
•  Uses supervised learning algorithms
–  Data is labeled
Sentiment
If it Walks/Swims/Quacks Like a Duck … Then It Must Be a Duck
Features: walks, swims, quacks
Debit Card Fraud Example
•  What are we trying to predict?
–  This is the Label or Target outcome:
–  Fraud or Not Fraud
•  What are the “if questions” or properties we can use to predict?
–  These are the Features:
–  Is the amount spent today > historical average?
–  Unusual region for card history?
–  Known merchant or not?
Decision Tree For Classification
•  Tree of decisions about features
•  Asks IF THEN ELSE questions about the features
•  Gives probability of a correct decision
Questions (each answered YES / NO):
–  Is the amount spent in 24 hours > average?
–  Is the number of states used from > 2?
–  Are there multiple purchases today from risky merchants?
Leaves: Fraud 90%, Not Fraud 50%, Fraud 90%, Not Fraud 30%
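A decision tree of this kind is just nested IF THEN ELSE tests, as the minimal sketch below shows. The branch layout is an assumption: the slide gives three questions and four leaves, but the extracted text does not say which leaf hangs off which branch. The function and parameter names are hypothetical.

```python
def classify_transaction(spent_24h, historical_avg, num_states, risky_purchases):
    """Walk a hand-built decision tree and return (label, probability).

    Assumed layout: the amount question is the root, the states question
    sits on its YES branch, the risky-merchant question on its NO branch.
    """
    if spent_24h > historical_avg:
        if num_states > 2:
            return ("Fraud", 0.90)
        return ("Not Fraud", 0.50)
    if risky_purchases > 1:  # "multiple purchases" read as more than one
        return ("Fraud", 0.90)
    return ("Not Fraud", 0.30)

print(classify_transaction(spent_24h=500, historical_avg=100,
                           num_states=3, risky_purchases=0))
```

A learning algorithm would choose the questions, their order, and the leaf probabilities from labeled transactions rather than having them written by hand.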
Real Time Credit Card Fraud Detection with Apache Spark Streaming
1.  Get credit card transaction event data
2.  Read card holder profile
3.  Calculate history features
4.  Publish alerts for fraud and enriched events
https://mapr.com/blog/real-time-credit-card-fraud-detection-apache-spark-and-event-streaming/
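The four steps above can be sketched with plain Python generators. This is not Spark Streaming or Kafka code: the event fields, the profile layout, the `enrich` and `publish` helpers and the `simple_rule` stand-in for a trained model are all assumptions made for illustration.

```python
def enrich(transactions, profiles):
    """Steps 1-3: take each event, look up the profile, derive history features."""
    for tx in transactions:                   # 1. get transaction event
        profile = profiles[tx["card_id"]]     # 2. read card holder profile
        yield {                               # 3. calculate history features
            **tx,
            "amount_over_avg": tx["amount"] > profile["avg_amount"],
            "unusual_region": tx["region"] not in profile["regions"],
        }

def publish(enriched_events, is_fraud):
    """Step 4: collect every enriched event, plus alerts for suspected fraud."""
    alerts, enriched = [], []
    for event in enriched_events:
        enriched.append(event)
        if is_fraud(event):
            alerts.append(event)
    return alerts, enriched

def simple_rule(event):
    """Placeholder for a trained model: flag when both features fire."""
    return event["amount_over_avg"] and event["unusual_region"]

# Made-up profile store and event stream.
profiles = {"card-1": {"avg_amount": 50.0, "regions": {"CA", "NV"}}}
transactions = [
    {"card_id": "card-1", "amount": 20.0, "region": "CA"},
    {"card_id": "card-1", "amount": 900.0, "region": "NY"},
]

alerts, enriched = publish(enrich(transactions, profiles), simple_rule)
print(len(alerts), "alert(s) out of", len(enriched), "events")
```

In a streaming deployment the lists would be replaced by topics: events arrive continuously, and alerts and enriched events are written to separate output streams.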
Classification Identifies Category
•  Classification:
–  identifies which category a new item belongs to
•  Who will (buy, churn, get admitted to hospital)?
•  What is the mood of this comment?
•  Retail Example:
–  Which promotion draws more customers?
•  Healthcare Example:
–  Suggest patient diagnosis
–  Identify patients with high readmission risk
Classification Probability: Logistic Regression Example
Predicts the probability that an item belongs to a category
[Plot: Y axis (Label) = probability of fraud, from 0 (Not Fraud) through .5 to 1 (Fraud); X axis = features: transaction amount, type of store, time and location difference since last transaction]
Supervised Learning: Classification Probability
•  Logistic Regression (and other algorithms):
–  Predicts the probability that an item belongs to a category (e.g. probability of fraud)
•  What is the probability someone will (buy, churn, get admitted to hospital)?
•  Probability a customer will renew service
•  Healthcare:
–  Probability of readmission
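Logistic regression turns a weighted sum of features into a probability between 0 and 1 with the sigmoid function. The sketch below shows only the prediction step; the two features, the weights and the bias are invented for illustration and would normally be learned from labeled data.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fraud_probability(features, weights, bias):
    """Probability of the positive class for one item."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical features: [amount / historical average, distance from last transaction].
weights = [0.8, 1.5]   # assumed values, not learned
bias = -3.0            # assumed value, not learned

p_typical = fraud_probability([1.0, 0.0], weights, bias)
p_unusual = fraud_probability([3.0, 2.0], weights, bias)
print(round(p_typical, 3), round(p_unusual, 3))
```

Thresholding the probability at .5 gives the Fraud / Not Fraud label, while keeping the raw probability allows cases to be ranked by risk.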
Regression Predicts Amount, Estimates relationship between X & Y
[Plot: Y (Label) = price of house; X1, X2 (Features) = square feet, number of bedrooms, location; each data point: sum of x, price]
Sales price = intercept + coeff1 * X1 + coeff2 * X2
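The sales price formula can be written directly as a function, and for a single feature the intercept and coefficient have a closed-form least-squares solution. The sketch below uses made-up house data that lies exactly on a line so the fitted values are easy to check; `predict_price` and `fit_simple` are hypothetical names.

```python
def predict_price(intercept, coeff1, coeff2, x1, x2):
    """Sales price = intercept + coeff1 * X1 + coeff2 * X2."""
    return intercept + coeff1 * x1 + coeff2 * x2

def fit_simple(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up training data: square feet and sale price.
square_feet = [1000, 1500, 2000, 2500]
prices = [200000, 250000, 300000, 350000]

intercept, slope = fit_simple(square_feet, prices)
print(intercept, slope)

# Applying the two-feature formula with assumed coefficients.
print(predict_price(100000, 100, 5000, x1=1800, x2=3))
```

With more than one feature the same idea applies, but the coefficients are found by solving a small linear system or by gradient descent, which is what libraries such as Spark MLlib do internally.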
Regression Predicts by estimating the relationship between variables
•  Regression predicts a numeric value (e.g. price)
•  What will be the (revenue, product demand, sales, number of churners)?
•  Retail Example:
–  Sales based on an event
•  Healthcare Example:
–  Days of hospital stay
What is Unsupervised Machine Learning?
Machine Learning
Unsupervised
•  Clustering
–  K-means
•  Dimensionality reduction
–  Principal Component Analysis
–  SVD
Supervised
•  Classification
–  Naïve Bayes
–  SVM
–  Random Decision Forests
•  Regression
–  Linear
–  Logistic
Unsupervised Algorithms use Unlabeled data
Customer purchase data (contains patterns) → Train Algorithm (finds patterns) → Build Model → Customer Groups
New Customer Purchase Data → Use Model (recognizes patterns) → Similar Customer Group
Unsupervised Machine Learning: Clustering
Clustering: group news articles into different categories
Unsupervised Learning
Learning structure from unlabeled examples
NBA Players
http://www.sloansportsconference.com/wp-content/uploads/2012/03/Alagappan-Muthu-EOSMarch2012PPT.pdf
Clustering: Definition
•  Groups objects into clusters of high similarity
–  Customer segmentation
–  Text categorization
–  Recommendations
•  Anomaly detection: find what’s not similar
Clustering Groups objects into Clusters of high similarity
•  What are the groups of (customers, patients, …) with similar (behavior, purchases, symptoms, illness, …)?
•  Healthcare:
–  Patient similarity
•  Retail:
–  Group customers by purchases
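Grouping customers by purchases can be illustrated with a small K-means sketch in plain Python. The customer data is made up, the two features are assumptions, and the starting centroids are picked by hand so the run is deterministic; a library implementation would choose them more carefully.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        centroids = [mean_point(c) if c else old
                     for c, old in zip(clusters, centroids)]
    return centroids, assign(points, centroids)

# Made-up customers: (purchases per month, average basket value).
customers = [(1, 10), (2, 12), (1, 11), (10, 100), (11, 95), (12, 105)]

centroids, clusters = kmeans(customers, centroids=[customers[0], customers[-1]])
print(centroids)
print(clusters)
```

No labels are involved: the algorithm only sees the feature values, and a new customer would be placed in the group whose centroid is closest.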
Bank Customer Segmentation: Bank Products, Card Purchases
Association, Co-Occurrence, Market Basket Recommendations
•  Retail
–  Products which are purchased together
•  Take action:
–  Store layouts
–  Which products to put on specials, promote, coupons …
•  Healthcare
–  Patients like mine cohorts
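Co-occurrence analysis starts by counting how often pairs of products appear in the same basket. A minimal sketch with made-up baskets (the product names and the `pair_counts` helper are illustrative only):

```python
from collections import Counter
from itertools import combinations

# Made-up shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]

def pair_counts(baskets):
    """Count, for every pair of products, the baskets that contain both."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("beer", "diapers").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

Pairs with high counts are candidates for shared shelf placement or joint promotions; production systems usually normalize the counts (for example by support or lift) so that popular items do not dominate.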
Deep Learning
Deep Learning
Multilayered neural networks
The network is trained with images
Neural network neuron or node
Each node takes input data and weights and outputs a confidence score to the next layer
Each node outputs a confidence score to the next layer
Errors are calculated at the output layer
Errors are sent back through the network
This process is repeated, adjusting weights, until the output is correct
This process is repeated with lots of images
Deep Learning
During this process, layers learn the optimal features for the model
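The loop described above (compute an output, measure the error, adjust the weights, repeat) can be shown on the smallest possible network: one sigmoid node. This is a sketch under assumptions, not a deep network: with a single node there are no hidden layers to send errors back through, the training task (logical OR) is chosen only because it is tiny, and the learning rate and epoch count are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def node_output(weights, bias, inputs):
    """A node: weighted sum of the inputs, squashed into a confidence score."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))

def train_node(samples, epochs=2000, learning_rate=0.5):
    """Repeat: compute output, compute error, nudge weights against the error."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = node_output(weights, bias, inputs) - target
            # For a sigmoid node with log loss, the gradient for each
            # weight is error * input, and for the bias it is error.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

# Training examples for logical OR: (inputs, expected output).
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights, bias = train_node(samples)
for inputs, target in samples:
    print(inputs, target, round(node_output(weights, bias, inputs), 3))
```

In a multilayer network the same update is applied layer by layer, with each layer's error derived from the layer after it, which is what "errors are sent back through the network" refers to.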
Deep Learning Features
•  Advantage:
–  Features do not have to be predetermined
•  Disadvantage:
–  Decisions are a black box
[Diagram labels: Feature, ?, Decisions]
Deep Learning in the News!
FINANCE: 3/27/17 - Hedge funds have been trying to teach computers to think like traders for years. (Bloomberg)
AUTON. DRIVING: 4/3/17 - Daimler … to deploy autonomous taxis that customers can hail using a smartphone app by the start of the next decade. (Fortune)
HEALTHCARE: 3/28/17 - deep learning is being applied to processing medical images … eye disease … skin cancer (MIT Tech Review)
VOICE RECOG.: 3/31/17 - IBM research … advancing speech recognition by applying deep learning into acoustic and lang. models (InfoQ)
Deep Neural Networks
•  Classification and
•  Forecasting
Convolutional Neural Networks for Images
•  Insights from image & video files
Ex. PATIENT MORTALITY PREDICTION
Scientific Reports | 7: 1648 | DOI:10.1038/s41598-017-01931-w | www.nature.com/scientificreports
Precision Radiology: Predicting longevity using feature engineering and deep learning methods in a radiomics framework
Luke Oakden-Rayner (1,2), Gustavo Carneiro (3), Taryn Bessen (1), Jacinto C. Nascimento (4), Andrew P. Bradley (5) & Lyle J. Palmer (2)
Precision medicine approaches rely on obtaining precise knowledge of the true state of health of an
individual patient, which results from a combination of their genetic risks and environmental exposures.
This approach is currently limited by the lack of effective and efficient non-invasive medical tests to
define the full range of phenotypic variation associated with individual health. Such knowledge is
critical for improved early intervention, for better treatment decisions, and for ameliorating the steadily
worsening epidemic of chronic disease. We present proof-of-concept experiments to demonstrate how
routinely acquired cross-sectional CT imaging may be used to predict patient longevity as a proxy for
overall individual health and disease status using computer image analysis techniques. Despite the
limitations of a modest dataset and the use of off-the-shelf machine learning methods, our results are
comparable to previous ‘manual’ clinical methods for longevity prediction. This work demonstrates
that radiomics techniques can be used to extract biomarkers relevant to one of the most widely used
outcomes in epidemiological and clinical research – mortality, and that deep learning with convolutional
neural networks can be usefully applied to radiomics research. Computer image analysis applied
to routinely collected medical images offers substantial potential to enhance precision medicine
initiatives.
Measuring phenotypic variation in precision medicine
Precision medicine has become a key focus of modern bioscience and medicine, and involves “prevention and treatment strategies that take individual variability into account”, through the use of “large-scale biologic databases … powerful methods for characterizing patients … and computational tools for analysing large sets of data” [1]. The variation within individuals that enables the identification of patient subgroups for precision medicine strategies is termed the “phenotype”. The observable phenotype reflects both genomic variation and the accumulated lifestyle and environmental exposures that impact biological function - the exposome [2].
Precision medicine relies upon the availability of useful biomarkers, defined as “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacological responses to a therapeutic intervention” [3]. A ‘good’ biomarker has the following characteristics: it is sensitive, specific, predictive, robust, bridges clinical and preclinical health states, and is non-invasive [4].
Genomics can produce good biomarkers useful for precision medicine [5]. There has been significant success in exploring human genetic variation in the field of genomics, where data-driven methods have highlighted the role of human genetic variation in disease diagnosis, prognosis, and treatment response [6]. However, for the chronic and age-related diseases which account for the majority of morbidity and mortality in developed nations [7] and worldwide [8], the majority (70–90%) of observable phenotypic variation is related to non-genetic determinants [9].
1 Department of Radiology, Royal Adelaide Hospital, North Terrace, Adelaide, 5000, Australia. 2 School of Public Health, The University of Adelaide, North Terrace, Adelaide, 5000, Australia. 3 School of Computer Science, The University of Adelaide, North Terrace, Adelaide, 5000, Australia. 4 Instituto Superior Técnico, Lisbon, Portugal. 5 School of Information Technology and Electrical Engineering, The University of Queensland, Building 78, St Lucia, QLD 4072, Queensland, Australia. Correspondence and requests for materials should be addressed to L.O.-R. (email: lukeoakdenrayner@gmail.com)
Received: 8 December 2016
Accepted: 6 April 2017
Published: xx xx xxxx
Oakden-Rayner, et al., Scientific Reports, May 2017
Figure 4. Images at the level of the proximal left anterior descending coronary artery, with the most strongly
predicted mortality and survival cases selected by averaging the predictions from the deep learning and
engineered feature models. The mortality cases (left side) demonstrate prominent visual changes of emphysema,
cardiomegaly, vascular disease and osteopaenia. The survival cases (right side) appear visually less diseased and
frail.
[Figure panels: Mortality (left), Survival (right)]
Example: Exploiting Unstructured Data
http://www.economist.com/news/science-and-technology/21664943-computers-can-recognise-complication-diabetes-can-lead-blindness-now - Sep 19, 2015
Diabetic Retinopathy:
•  Challenging to diagnose from image (84% consensus)
•  Crowd-sourced to Kaggle
•  Deep learning and convolutional NN used to classify image data
•  Winning model showed 85% accuracy rate
Recurrent Neural Networks for Sequenced data
•  Sequence of events and language applications
To Learn More:
•  MapR Quick Start solutions
https://mapr.com/solutions/big-data-and-hadoop-quick-start-solutions/
•  Customer 360, Recommendation Engine, Log Analysis, Risk, Deep Learning
MapR Deep Learning QSS
[Architecture diagram: Training Images (Category 1 … Category N) are stored in MapR-FS; a New Image to Classify yields Category Probabilities. Kubernetes runs a master node and POD 1, POD 2, POD 3 hosting Parameter Server 1, TF Trainer 1 and TF Trainer 2 on the MapR Converged Data Platform]
MapR Data Platform: Enterprise Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)
Global Namespace, High Availability, Data Protection, Multi-tenancy, Unified Security
Fit your business model
Common Use Cases
•  Churn prediction
•  Customer clustering
•  Product recommendation
•  Budget optimization
•  ETA
•  Sales prediction
•  Pricing model
•  …
Cost function -- real business impact
•  Leverage A/B testing
90+% of effort is logistics, not learning
Big Data – Machine Learning Cycle
Big Data: Identify a problem, Prepare Data, Model Data, Get Insight
Machine Learning: Test a Solution, Evaluate, Monitor, Deploy
Reference: head of Machine Learning at Uber
End to End Streaming Analytics Example Application
https://mapr.com/blog/monitoring-real-time-uber-data-using-spark-machine-learning-streaming-and-kafka-api-part-1/
MapR Blog
•  https://www.mapr.com/blog/

helping you put data technology to work
●  Find answers
●  Ask technical questions
●  Join on-demand training course discussions
●  Follow release announcements
●  Share and vote on product ideas
●  Find Meetup and event listings
Connect with fellow Apache Hadoop and Spark professionals
community.mapr.com
We reinvented the data platform for next-gen intelligent applications & Data Science
On-Premise, In the Cloud, Hybrid
NoSQL, Webscale Storage, Messaging, Multiple Processing Engines
Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Streaming
Multiple compute engines and tools operating concurrently
Immediate access to vast amounts of diverse data
Low latency for millisecond responsiveness
Support diverse workloads simultaneously
Able to be a reliable system of record
Enterprise grade reliability
Q&A
ENGAGE WITH US

Demystifying AI, Machine Learning and Deep Learning

  • 1. © 2017 MapR Technologies Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- Time Uber Data Using Apache APIs: Kafka, Spark, HBase Carol McDonald @caroljmcdonald
  • 2. © 2017 MapR Technologies Agenda ‱  What is AI? ‱  Why now? ‱  What is Machine Learning? –  Examples ‱  What is Deep Learning? –  Examples
  • 3. © 2017 MapR Technologies What is AI?
  • 4. © 2017 MapR Technologies AI NSA MIT Late 80s
  • 5. © 2017 MapR Technologies Problems with hard coded Rules ‱  Rules are manual, uses a human expert –  difficult to maintain –  give a one size fits all decision! (2 times overdose same as 38 times) ‱  Machine learning uses data and statistics –  can give sorted probabilty, can precisely match/target individuals
  • 6. © 2017 MapR Technologies What is Machine Learning? Data Build ModelTrain Algorithm Finds patterns New Data Use Model (prediction function) Predictions Contains patterns Recognizes patterns f(X)
  • 7. © 2017 MapR Technologies Why all the buzz now? What has changed?
  • 8. © 2017 MapR Technologies What has changed in the past 10 years? Distributed computing Streaming analytics Improved machine learning
  • 9. © 2017 MapR Technologies Distribute Computation Driver sends Program tasks Data Distributed across Cluster Result
  • 10. © 2017 MapR Technologies Apache Spark Distributed Datasets Distributed Dataset Node Executor P4 Node Executor P1 P3 Node Executor P2 partitioned Partition 1 8213034705, 95, 2.927373, jake7870, 0

 Partition 2 8213034705, 115, 2.943484, Davidbresler2, 1
. Partition 3 8213034705, 100, 2.951285, gladimacowgirl, 58
 Partition 4 8213034705, 117, 2.998947, daysrus, 95
. ‱  Data read into Memory Cache ‱  Partitioned across a cluster ‱  Operated on in parallel ‱  Cached in memory for iterations
  • 11. © 2017 MapR Technologies Streaming Analytics
  • 12. © 2017 MapR Technologies GPUs speed up Multi core servers for parallel processing Cluster of GPUs 1 million times faster than Cray-1
  • 13. © 2017 MapR Technologies Mythbusters explain Parallel graphics with GPU vs Sequential CPU ‱  Painting a smily face with a sequential paint gun
  • 14. © 2017 MapR Technologies Mythbusters explain Parallel graphics with GPU ‱  Painting a smiling face with one blast from a parallel paint gun !
  • 15. © 2017 MapR Technologies Machine Learning
  • 16. © 2017 MapR Technologies Types of Machine learning
  • 17. © 2017 MapR Technologies Supervised Machine Learning Supervised ‱  Classification –  NaĂŻve Bayes –  SVM –  Random Decision Forests ‱  Regression –  Linear –  Logistic Machine Learning Unsupervised ‱  Clustering –  K-means ‱  Dimensionality reduction –  Principal Component Analysis –  SVD Label
  • 18. © 2017 MapR Technologies Supervised Algorithms use labeled data Data features Build Model New Data features Predict Use Model X1, X2 Y f(X1, X2) =Y X1, X2 Y
  • 19. © 2017 MapR Technologies ML Discovery Model Building Model Training/ Building Training Set Test Model Predictions Test Set Evaluate Results Historical Data Deployed Model Insights Data Discovery, Model Creation Production Feature Extraction Feature Extraction ●  Churn Modelling Uber trips Stream TopicUber trips New Data
  • 20. © 2017 MapR Technologies Supervised Machine Learning: Classification & Regression Classification Identifies category for item
  • 21. © 2017 MapR Technologies Classification: Definition Form of ML that: ‱  Identifies which category an item belongs to ‱  Uses supervised learning algorithms –  Data is labeled Sentiment
  • 22. © 2017 MapR Technologies If it Walks/Swims/Quacks Like a Duck 

 Then It Must Be a Duck swims walks quacks Features: walks quacks swims Features:
  • 23. © 2017 MapR Technologies Debit Card Fraud Example ‱  What are we trying to predict? –  This is the Label or Target outcome: –  Fraud or Not Fraud ‱  What are the “if questions” or properties we can use to predict? –  These are the Features: –  Is the amount spent today > historical average? –  Unusual region for card history ? –  Known merchant or not ?
  • 24. © 2017 MapR Technologies Decision Tree For Classification ‱  Tree of decisions about features ‱  Estimates IF THEN ELSE questions ‱  Gives probability of a correct decision Is the amount spent in 24 hours > average Is the number of states used from > 2 Are there multiple Purchases today from risky merchants? YES NO NoYES Fraud 90% Not Fraud 50% Fraud 90% Not Fraud 30% YES No
  • 25. © 2017 MapR Technologies Real Time Credit Card Fraud Detection with Apache Spark Streaming 1.  Get event credit card transaction data 2.  Read card holder profile 3.  Calculate history features 4.  Publish Alerts for fraud and enriched events https://mapr.com/blog/real-time-credit-card-fraud-detection-apache-spark-and-event-streaming/
  • 26. © 2017 MapR Technologies Classification Identifies Category ‱  Classification: –  identifies which category a new item belongs to ‱  Who will ( buy, churn, get admitted to hospital ) ? ‱  What is the mood of this comment? ‱  Retail Example: –  Which promotion draws more customers ? ‱  Healthcare Example: –  Suggest Patient diagnosis –  Identify patients with high readmission risk
  • 27. © 2017 MapR Technologies Label Probabilty of Fraud 1 X Features: trans amount, type of store, Time Location difference last trans. Fraud 0 Not Fraud .5 Classification Probability Logistic Regression Example Predicts probability an item belongs to a category
  • 28. © 2017 MapR Technologies Supervised Learning: Classification Probability ‱  Logistic Regression (and other algorithms) : –  Predicts probability an item belongs to a category (eg probability of fraud) ‱  What is probablity someone will ( buy, churn, get admitted to hospital ) ? ‱  Probability customer will renew service ‱  Healthcare: –  Probability of readmission
  • 29. © 2017 MapR Technologies Label: Price of house Y X1, X2 Features: square feet, number bedrooms, location Data point: sum of x, price Sales price = intercept + coeff * X1 + coeff2 * X2 Regression Predicts Amount, Estimates relationship between X & Y
  • 30. Regression Predicts by estimating the relationship between variables
      • Regression predicts a numeric value (e.g. price)
      • What will be the (revenue, product demand, sales, # churners)?
      • Retail Example:
          – Sales based on an event
      • Healthcare Example:
          – Days of hospital stay
  • 31. What is Unsupervised Machine Learning?
      Machine Learning
      • Unsupervised
          • Clustering
              – K-means
          • Dimensionality reduction
              – Principal Component Analysis
              – SVD
      • Supervised
          • Classification
              – Naïve Bayes
              – SVM
              – Random Decision Forests
          • Regression
              – Linear
              – Logistic
  • 32. Unsupervised Algorithms use Unlabeled data
      [Diagram: Customer purchase data (contains patterns) → Train Algorithm (finds patterns) → Build Model → Customer Groups. New Customer Purchase Data → Use Model (recognizes patterns) → Similar Customer Group.]
  • 33. Unsupervised Machine Learning: Clustering
      Clustering groups news articles into different categories
  • 34. Unsupervised Learning
      Learning structure from unlabeled examples
      Example: NBA Players
      http://www.sloansportsconference.com/wp-content/uploads/2012/03/Alagappan-Muthu-EOSMarch2012PPT.pdf
  • 35. Clustering: Definition
      • Groups objects into clusters of high similarity
          – Customer segmentation
          – Text categorization
          – Recommendations
      • Anomaly detection: find what’s not similar
  • 36. Clustering Groups objects into Clusters of high similarity
      • What are the groups of (customers, patients, …) with similar (behavior, purchases, symptoms, illness, …)?
      • Healthcare:
          – Patient similarity
      • Retail:
          – Group customers by purchases.
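K-means, the clustering algorithm named earlier in the deck, groups points by repeating two steps: assign each point to its nearest centroid, then move each centroid to the mean of its members. Below is a minimal sketch for 2-D points. Seeding the centroids with the first k points is a simplification chosen to keep the example deterministic (library implementations use smarter initialization), and the customer features in the usage example are invented.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments


if __name__ == "__main__":
    # Hypothetical customers: (purchases per month, average basket size).
    customers = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
    print(kmeans(customers, k=2))
```

A new customer is then assigned to the group whose centroid is closest, which is the "Use Model" step for unsupervised learning.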
  • 37. Bank Customer Segmentation: Bank Products, Card Purchases
  • 38. Association, Co-Occurrence, Market Basket Recommendations
      • Retail
          – Products which are purchased together
      • Take action:
          – Store layouts
          – Which products to put on specials, promote, coupons …
      • Healthcare
          – Patients like mine cohorts
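Market basket analysis starts by counting how often pairs of products appear in the same purchase. The sketch below does only that counting step with the standard library; the function name and the sample baskets are illustrative, and a production system would typically add measures such as support or lift on top of the raw counts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "milk", "eggs"],
        ["eggs", "milk"],
    ]
    for pair, n in sorted(cooccurrence_counts(baskets).items()):
        print(pair, n)
```

Pairs with high counts are candidates for the actions on the slide, such as placing the products together or bundling them in a promotion.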
  • 39. Deep Learning
  • 40. Deep Learning: Multilayered neural networks
  • 41. The Network is trained with images
  • 42. Neural network neuron or node: each node takes input data and a weight and outputs a confidence score to the next layer
  • 43. Each node outputs a confidence score to the next layer
  • 44. Errors are calculated at the output layer
  • 45. Errors are sent back through the network
  • 46. This process is repeated, adjusting weights, until correct
  • 47. This process is repeated with lots of images
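The training loop described on slides 42 to 47 can be illustrated with a single sigmoid neuron: compute a confidence score from the inputs and weights, measure the error at the output, adjust the weights against that error, and repeat over many examples. This is a deliberately reduced sketch, not a multilayer network with full backpropagation, and the function names, learning rate, and toy OR dataset are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Forward pass: weighted sum of the inputs squashed to a 0..1 score."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))


def train_neuron(samples, epochs=2000, learning_rate=0.5):
    """Train one sigmoid neuron by gradient descent; returns (weights, bias)."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):                  # repeat the whole process
        for inputs, target in samples:       # over every training example
            output = predict(weights, bias, inputs)
            # Error at the output (gradient of the log loss w.r.t. the sum).
            error = output - target
            # Send the error back: nudge each weight against its gradient.
            for i, x in enumerate(inputs):
                weights[i] -= learning_rate * error * x
            bias -= learning_rate * error
    return weights, bias


if __name__ == "__main__":
    # Toy dataset: logical OR of two inputs.
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
    w, b = train_neuron(data)
    for inputs, _ in data:
        print(inputs, round(predict(w, b, inputs), 3))
```

In a deep network the same error signal is propagated through every layer, so each layer's weights receive their own adjustment.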
  • 48. Deep Learning: During this process layers learn the optimal features for the model
  • 49. Deep Learning Features
      • Advantage:
          – Features do not have to be predetermined
      • Disadvantage:
          – Decisions are a black box
      [Diagram labels: Feature, Decisions, ?]
  • 50. Deep Learning in the News!
      • FINANCE: 3/27/17 – Hedge funds have been trying to teach computers to think like traders for years. (Bloomberg)
      • AUTON. DRIVING: 4/3/17 – Daimler … to deploy autonomous taxis that customers can hail using a smartphone app by the start of the next decade. (Fortune)
      • HEALTHCARE: 3/28/17 – deep learning is being applied to processing medical images … eye disease … skin cancer (MIT Tech Review)
      • VOICE RECOG.: 3/31/17 – IBM research … advancing speech recognition by applying deep learning into acoustic and lang. models (InfoQ)
  • 51. Deep Neural Networks
      • Classification and
      • Forecasting
  • 52. Convolutional Neural Networks for Images
      • Insights from image & video files
  • 53. Ex. PATIENT MORTALITY PREDICTION
      [Screenshot of paper: Oakden-Rayner, et al., Scientific Reports, May 2017]
      Scientific Reports | 7: 1648 | DOI:10.1038/s41598-017-01931-w | www.nature.com/scientificreports
      Received: 8 December 2016. Accepted: 6 April 2017.
      Title: Precision Radiology: Predicting longevity using feature engineering and deep learning methods in a radiomics framework
      Authors: Luke Oakden-Rayner (1,2), Gustavo Carneiro (3), Taryn Bessen (1), Jacinto C. Nascimento (4), Andrew P. Bradley (5) & Lyle J. Palmer (2)
      Affiliations: (1) Department of Radiology, Royal Adelaide Hospital, North Terrace, Adelaide 5000, Australia. (2) School of Public Health, The University of Adelaide, North Terrace, Adelaide 5000, Australia. (3) School of Computer Science, The University of Adelaide, North Terrace, Adelaide 5000, Australia. (4) Instituto Superior Técnico, Lisbon, Portugal. (5) School of Information Technology and Electrical Engineering, The University of Queensland, Building 78, St Lucia, QLD 4072, Queensland, Australia. Correspondence and requests for materials should be addressed to L.O.-R.
      Abstract: Precision medicine approaches rely on obtaining precise knowledge of the true state of health of an individual patient, which results from a combination of their genetic risks and environmental exposures. This approach is currently limited by the lack of effective and efficient non-invasive medical tests to define the full range of phenotypic variation associated with individual health. Such knowledge is critical for improved early intervention, for better treatment decisions, and for ameliorating the steadily worsening epidemic of chronic disease. We present proof-of-concept experiments to demonstrate how routinely acquired cross-sectional CT imaging may be used to predict patient longevity as a proxy for overall individual health and disease status using computer image analysis techniques. Despite the limitations of a modest dataset and the use of off-the-shelf machine learning methods, our results are comparable to previous ‘manual’ clinical methods for longevity prediction. This work demonstrates that radiomics techniques can be used to extract biomarkers relevant to one of the most widely used outcomes in epidemiological and clinical research – mortality, and that deep learning with convolutional neural networks can be usefully applied to radiomics research. Computer image analysis applied to routinely collected medical images offers substantial potential to enhance precision medicine initiatives.
      Measuring phenotypic variation in precision medicine: Precision medicine has become a key focus of modern bioscience and medicine, and involves “prevention and treatment strategies that take individual variability into account”, through the use of “large-scale biologic databases … powerful methods for characterizing patients … and computational tools for analysing large sets of data” [1]. The variation within individuals that enables the identification of patient subgroups for precision medicine strategies is termed the “phenotype”. The observable phenotype reflects both genomic variation and the accumulated lifestyle and environmental exposures that impact biological function - the exposome [2]. Precision medicine relies upon the availability of useful biomarkers, defined as “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacological responses to a therapeutic intervention” [3]. A ‘good’ biomarker has the following characteristics: it is sensitive, specific, predictive, robust, bridges clinical and preclinical health states, and is non-invasive [4]. Genomics can produce good biomarkers useful for precision medicine [5]. There has been significant success in exploring human genetic variation in the field of genomics, where data-driven methods have highlighted the role of human genetic variation in disease diagnosis, prognosis, and treatment response [6]. However, for the chronic and age-related diseases which account for the majority of morbidity and mortality in developed nations [7] and worldwide [8], the majority (70–90%) of observable phenotypic variation is related to non-genetic determinants [9].
      Figure 4. Images at the level of the proximal left anterior descending coronary artery, with the most strongly predicted mortality and survival cases selected by averaging the predictions from the deep learning and engineered feature models. The mortality cases (left side) demonstrate prominent visual changes of emphysema, cardiomegaly, vascular disease and osteopaenia. The survival cases (right side) appear visually less diseased and frail. [Image panels: Mortality, Survival]
  • 54. Example: Exploiting Unstructured Data
      http://www.economist.com/news/science-and-technology/21664943-computers-can-recognise-complication-diabetes-can-lead-blindness-now (Sep 19, 2015)
      Diabetic Retinopathy:
      • Challenging to diagnose from image (84% consensus)
      • Crowd-sourced to Kaggle
      • Deep-learning and convolutional NN used to classify image data
      • Winning model showed 85% accuracy rate
  • 55. Recurrent Neural Networks for Sequenced data
      • Sequence of events and language applications
  • 56. To Learn More:
      • MapR Quick Start solutions https://mapr.com/solutions/big-data-and-hadoop-quick-start-solutions/
      • Customer 360, Recommendation Engine, Log Analysis, Risk, Deep Learning
  • 57. MapR Deep Learning QSS
      [Diagram: Training Images (Category 1 … Category N) and a New Image to Classify go in; Category Probabilities come out. Kubernetes: Master Node, Pod 1, Pod 2, Pod 3, running Parameter Server 1, TF Trainer 1 and TF Trainer 2 on MapR-FS. MapR Converged Data Platform: Enterprise Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams); Global Namespace, High Availability, Data Protection, Multi-tenancy, Unified Security.]
  • 58. Fit your business model
      Common Use Cases
      • Churn prediction
      • Customer clustering
      • Product recommendation
      • Budget optimization
      • ETA
      • Sales prediction
      • Pricing model
      • …
      Cost function -- real business impact
      • Leverage A/B testing
  • 59. 90+% of effort is logistics, not learning
  • 60. Big Data – Machine Learning Cycle
      [Diagram: cycle linking Big Data and Machine Learning. Steps: Identify a problem, Prepare Data, Model Data, Get Insight, Test a Solution, Evaluate, Monitor, Deploy.]
      Reference: head of Machine Learning at Uber
  • 61. End to End Streaming Analytics Example Application https://mapr.com/blog/monitoring-real-time-uber-data-using-spark-machine-learning-streaming-and-kafka-api-part-1/
  • 62. MapR Blog • https://www.mapr.com/blog/
  • 64. … helping you put data technology to work
      • Find answers
      • Ask technical questions
      • Join on-demand training course discussions
      • Follow release announcements
      • Share and vote on product ideas
      • Find Meetup and event listings
      Connect with fellow Apache Hadoop and Spark professionals: community.mapr.com
  • 65. We reinvented the data platform for next-gen intelligent applications & Data Science
      • On-Premise, In the Cloud, Hybrid
      • NoSQL, Webscale Storage, Messaging, Multiple Processing Engines, Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Streaming
      • Multiple compute engines and tools operating concurrently
      • Immediate access to vast amounts of diverse data
      • Low latency for millisecond responsiveness
      • Support diverse workloads simultaneously
      • Able to be a reliable system of record
      • Enterprise grade reliability
  • 66. Q&A: ENGAGE WITH US