David Smith
Revolution Analytics
@revodavid

Real-Time Big Data Analytics
From Deployment to Production
Buzzword Bingo!

REAL TIME
BIG DATA
PREDICTIVE ANALYTICS

Photo: Sarah&Boston (flickr: pocheco) Creative Commons BY-SA 2.0
Predictive Analytics Model

Factors
   User ID
   Browser
   Time/Date / Location
   Any known information
   Previous purchases
   Friend data

Scoring Rules
   Decision Tree
   Logistic Regression
   Neural Network
   Predictive Model
   K-means clustering
   Ensemble Model

Scores
   Product of most interest
   Offer of most likely sale
   Most relevant Selection
   Prediction or link
   Forecast sale value
   Optimal Bid

"IO VAPOURA" by Jaya Prime flickr.com/photos/sanjayaprime/4924462993 CC-BY 2.0
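The flow on this slide (factors in, scoring rule applied, score out) can be sketched in a few lines of Python. This is only an illustration, assuming logistic regression as the scoring rule; the offer names, factor names and coefficients are hypothetical and do not come from the talk.

```python
import math

# Hypothetical per-offer coefficients, as if estimated offline.
# None of these names or values come from the talk.
MODELS = {
    "blender": {"intercept": -2.0, "previous_purchases": 0.5, "friend_purchases": 0.3},
    "toaster": {"intercept": -1.0, "previous_purchases": 0.1, "friend_purchases": 0.0},
}


def score(coefficients, factors):
    """Logistic regression scoring rule: factors in, probability out."""
    z = coefficients["intercept"]
    for name, value in factors.items():
        # Factors the model does not know about contribute nothing.
        z += coefficients.get(name, 0.0) * value
    return 1.0 / (1.0 + math.exp(-z))


def offer_of_most_likely_sale(factors):
    """Score every offer for one visitor and return the best one."""
    scores = {offer: score(c, factors) for offer, c in MODELS.items()}
    return max(scores, key=scores.get), scores
```

For example, `offer_of_most_likely_sale({"previous_purchases": 4, "friend_purchases": 2})` picks the offer with the highest predicted probability for that visitor and also returns the full score table.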
Real-time Deployment
1. Data distillation
2. Model development and validation
3. Model deployment
4. Real-time model scoring
5. Model refresh

"CLOCK" by Heiko Klingele flickr.com/photos/divdax/3458668053/ CC-BY 2.0
1. Data Distillation in Hadoop

Unstructured Data (Log Files, Sensor Streams, Language Text)
   → HDFS Load
   → Map-Reduce (rmr)
   → Structured Data / Analytics Data Mart
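As a rough illustration of the distillation step, the sketch below runs a map, shuffle and reduce pass over a few raw log lines and produces one structured row per user. It is an in-memory Python stand-in, not Hadoop or rmr, and the log format and field names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw log format: "<timestamp> <user_id> <action> <amount>"
RAW_LOG = [
    "2012-06-01T10:00:00 u1 view 0",
    "2012-06-01T10:01:00 u1 purchase 30",
    "2012-06-01T10:02:00 u2 view 0",
    "malformed line",
    "2012-06-01T10:05:00 u1 purchase 20",
]


def map_line(line):
    """Map step: emit (user_id, (action, amount)) pairs, skipping bad lines."""
    parts = line.split()
    if len(parts) != 4:
        return []
    _, user_id, action, amount = parts
    try:
        return [(user_id, (action, float(amount)))]
    except ValueError:
        return []


def shuffle(pairs):
    """Group mapped values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_user(user_id, events):
    """Reduce step: collapse one user's events into a structured row."""
    purchases = [amount for action, amount in events if action == "purchase"]
    return {
        "user_id": user_id,
        "views": sum(1 for action, _ in events if action == "view"),
        "purchases": len(purchases),
        "total_spend": sum(purchases),
    }


def distill(lines):
    """Run map, shuffle and reduce over raw lines; return rows sorted by user."""
    mapped = [pair for line in lines for pair in map_line(line)]
    return [reduce_user(user, events) for user, events in sorted(shuffle(mapped).items())]
```

Calling `distill(RAW_LOG)` yields rows that could be loaded into an analytics data mart; the malformed line is dropped in the map step rather than failing the job.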
2. The Model Development Cycle

Input: Structured Data
Cycle:
   • Feature Selection, Sampling, Aggregation
   • Variable Transformation
   • Model Estimation
   • Model Refinement
   • Model Comparison / Benchmarking
Output: Predictive Model

R White Paper: bit.ly/r-is-hot
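One turn of the cycle (sampling, estimation, comparison) might look like the following sketch. It is a toy example on synthetic data, assuming two hypothetical candidate models, a mean baseline and a least-squares line, benchmarked by RMSE on a held-out sample; none of it is taken from the talk.

```python
import math


def split(rows, holdout_every=4):
    """Sampling: hold out every n-th row for benchmarking."""
    train = [r for i, r in enumerate(rows) if i % holdout_every != 0]
    test = [r for i, r in enumerate(rows) if i % holdout_every == 0]
    return train, test


def fit_mean(train):
    """Baseline model: always predict the training mean."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Estimation: ordinary least squares for y = intercept + slope * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def rmse(model, rows):
    """Root mean squared error of a fitted model on the given rows."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in rows) / len(rows))


def benchmark(rows, candidates):
    """Comparison: fit each candidate on the training rows, rank by holdout RMSE."""
    train, test = split(rows)
    results = {name: rmse(fit(train), test) for name, fit in candidates.items()}
    return min(results, key=results.get), results


# Synthetic data: y = 2x + 1 plus a small deterministic wiggle.
DATA = [(x, 2 * x + 1 + 0.1 * ((x % 3) - 1)) for x in range(20)]
winner, results = benchmark(DATA, {"mean_baseline": fit_mean, "linear": fit_line})
```

Refinement then amounts to adding or changing candidates and rerunning `benchmark`, which is where developer productivity matters as much as raw compute speed.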
3. Deployment Options (Factors → Scores)
 Unknown factors
   SQL / Rules Engine
   Code (C++, Java, R, Hadoop)
   PMML Engine
 Factors known in advance
   Batch Lookup Tables
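The split between the two cases can be sketched as follows: when the factors are known in advance, scores are precomputed in batch and served from a lookup table; otherwise the scoring code runs at request time. The scoring rule, user IDs and factor names below are hypothetical stand-ins, not anything shown in the talk.

```python
import math


def model_score(previous_purchases, hour_of_day):
    """Hypothetical scoring rule standing in for a deployed model."""
    evening_boost = 0.5 if 18 <= hour_of_day <= 22 else 0.0
    z = -1.5 + 0.4 * previous_purchases + evening_boost
    return 1.0 / (1.0 + math.exp(-z))


# Factors known in advance: previous purchases per known user (made-up data).
KNOWN_USERS = {"u1": 3, "u2": 0}


def build_lookup_table(users, hours=range(24)):
    """Batch job: precompute a score for every (user, hour) combination."""
    return {
        (user_id, hour): model_score(purchases, hour)
        for user_id, purchases in users.items()
        for hour in hours
    }


LOOKUP = build_lookup_table(KNOWN_USERS)


def score_request(user_id, hour_of_day, previous_purchases=0):
    """Serve from the table when possible; otherwise score on the fly."""
    cached = LOOKUP.get((user_id, hour_of_day))
    if cached is not None:
        return cached, "lookup"
    return model_score(previous_purchases, hour_of_day), "computed"
```

The lookup path costs one dictionary access per request, while the computed path pays for model evaluation but copes with visitors and factor values that were never seen when the batch job ran.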
Why did I buy that blender?
 Just browsing in the mall
 TV ad / magazine ad
 Coupon in the mail
 “Just moved” promo email
 Webstore recommendation
 Browsing catalog

UpStream: Attribution Modeling




4. Model Scoring
     • Exploratory data analysis
     • Time-to-event models
     • GAM survival models

UPSTREAM DATA FORMAT
     • ETL
     • Marketing channel data
     • Behavioral variables
     • Promotional data
     • Overlay data

CUSTOM VARIABLES (PMML)
     • Scoring for inference
     • Scoring for prediction
     • 5 billion scores per day per retailer
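To make the time-to-event idea concrete, here is a deliberately simplified sketch. It swaps the GAM survival models named on the slide for a constant-hazard (exponential) model, where the event is a purchase, and the coefficients and factor names are hypothetical.

```python
import math

# Hypothetical log-hazard coefficients (per day); not from the talk.
COEFFICIENTS = {"intercept": -4.0, "coupon_sent": 0.7, "recent_site_visits": 0.2}


def daily_hazard(factors):
    """Constant purchase hazard per day: exp(linear predictor)."""
    z = COEFFICIENTS["intercept"]
    for name, value in factors.items():
        z += COEFFICIENTS.get(name, 0.0) * value
    return math.exp(z)


def purchase_probability(factors, horizon_days):
    """P(purchase within horizon) = 1 - S(t), with S(t) = exp(-hazard * t)."""
    return 1.0 - math.exp(-daily_hazard(factors) * horizon_days)


def expected_days_to_purchase(factors):
    """Mean time to event under a constant hazard is 1 / hazard."""
    return 1.0 / daily_hazard(factors)
```

Scoring for prediction would call `purchase_probability` per customer, while scoring for inference would compare results with and without a factor such as `coupon_sent`. At the quoted 5 billion scores per day, a scorer has to sustain roughly 57,900 scores per second on average (5 × 10^9 / 86,400), assuming an even load.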
5. Model refresh
   Factors
   Scores
   Actual Outcomes
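One way to decide when a refresh is due is to compare the scores that were served with the outcomes that later arrived. The sketch below assumes a rolling Brier score as the drift measure; the class name, window size and threshold are illustrative choices, not something described in the talk.

```python
from collections import deque


class RefreshMonitor:
    """Track scored predictions against actual outcomes in a rolling window."""

    def __init__(self, window=1000, threshold=0.25):
        # window and threshold are illustrative; 0.25 is the Brier score
        # of always predicting 0.5.
        self.errors = deque(maxlen=window)
        self.threshold = threshold

    def record(self, score, outcome):
        """score: predicted probability; outcome: 1 if the event happened, else 0."""
        self.errors.append((score - outcome) ** 2)

    def brier_score(self):
        """Mean squared difference between scores and outcomes in the window."""
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_refresh(self):
        """Flag a refresh once the window is full and the error exceeds the threshold."""
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and self.brier_score() > self.threshold
```

Each time an actual outcome arrives, the serving system would call `record(score, outcome)` and re-estimate the model on recent data when `needs_refresh()` turns true.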
Big Data                 Real Time
Kilobytes/Sec            Seconds
Megabytes/Sec            Milliseconds
Gigabytes / Terabytes    Minutes
Petabytes                Minutes
Exabytes                 Hours
PREDICTIVE ANALYTICS
BIG DATA
REAL TIME
Real-Time Big Data Predictive Analytics: From Deployment to Production
David Smith, @revodavid

The leading enterprise provider of software and services for Open Source R

Booth 618 / Office Hours Weds 1:30PM

www.revolutionanalytics.com   +1 650 646 9545   Twitter: @RevolutionR

Editor's Notes

  • #4: Get out your buzzword bingo cards!
  • #5: Data as the “new oil”: a valuable commodity. Big Data is crude oil: messy, hard to get at, with contaminants in it.
  • #6: Start off with stuff we know in real time.
  • #9: Model development process. Not just about computational speed; also about the productivity of the developer.
  • #12: Demographics: consumer, product, market. Actions: web clicks, email clicks, mobile app usage, call center logs, social, search … Outcomes: impressions, touches, orders (retail, online, mobile). Strategic allocation.
  • #13: Outcome is “buying” instead of “dying”
  • #17: From Revolution Analytics. We help companies deploy predictive models created in R to real-time production systems.