Rapid Productionalization of Predictive Models 
In-database Modeling with Revolution Analytics on Teradata 
Skylar Lyon 
Accenture Analytics
Introduction 
Skylar Lyon 
Accenture Analytics 
• 7 years of experience with focus on big data 
and predictive analytics - using discrete choice 
modeling, random forest classification, 
ensemble modeling, and clustering 
• Technology experience includes: Hadoop, 
Accumulo, PostgreSQL, qGIS, JBoss, Tomcat, 
R, GeoMesa, and more 
• Worked from Army installations across the 
nation and also had the opportunity to travel 
twice to Baghdad to deploy solutions 
downrange. 
Copyright © 2014 Accenture. All rights reserved. 2
How we got here 
Project background and my involvement 
• New Customer Analytics team for Silicon Valley Internet eCommerce 
giant 
• Data scientists developing predictive models 
• Deferred focus on productionalization 
• Joined as Big Data Infrastructure and Analytics Lead 
Colleague’s CRAN R model 
Binomial logistic regression 
• 50+ independent variables, including categorical variables encoded as indicator variables 
• Trained on a small sample (many thousands of rows) – not a problem in and of itself 
• Scored across the entire corpus (many hundreds of millions of rows) – slightly more challenging 
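The workload described above (fit on a small sample, then score a much larger table) can be sketched in base R. This is only an illustration on a single machine, not the in-database approach the deck goes on to describe. The data are simulated, and the names `corpus`, `spend`, `segment`, `bought`, and `chunk_size` are hypothetical.

```r
# Hypothetical data: simulated, not from the project described in the slides.
set.seed(42)
n_corpus <- 100000  # small stand-in for a corpus of hundreds of millions of rows
corpus <- data.frame(
  spend   = rnorm(n_corpus),
  segment = factor(sample(c("a", "b", "c"), n_corpus, replace = TRUE))
)
# Simulate a binary outcome that depends on both predictors.
eta <- -1 + 0.8 * corpus$spend + ifelse(corpus$segment == "c", 0.5, 0)
corpus$bought <- rbinom(n_corpus, 1, plogis(eta))

# Train on a small random sample; glm() expands the factor into indicator columns.
training.data <- corpus[sample(n_corpus, 5000), ]
fit <- glm(bought ~ spend + segment, data = training.data, family = "binomial")

# Score the full corpus in fixed-size chunks so memory use stays bounded.
chunk_size <- 20000
starts <- seq(1, n_corpus, by = chunk_size)
scores <- unlist(lapply(starts, function(s) {
  rows <- s:min(s + chunk_size - 1, n_corpus)
  unname(predict(fit, newdata = corpus[rows, ], type = "response"))
}))
```

Chunking keeps memory bounded, but every row still has to travel to the R process, which is the data movement the later slides set out to avoid.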
We optimized the current productionalization process 
We moved compute to data 
[Diagram panels: Before, After] 
Reduced 5+ hour process to 40 seconds 
Benchmarking our optimized process 
5+ hours to 40 seconds: we recommend that this now become 
the de facto productionalization process 
[Chart: axes labeled rows and minutes]
Optimization process 
Recode CRAN R to Rx R 
Before 
trainit <- glm(as.formula(specs[[i]]), data = training.data, 
family='binomial', maxit=iters) 
fits <- predict(trainit, newdata=test.data, type='response') 
After 
trainit <- rxGlm(as.formula(specs[[i]]), data = training.data, 
family='binomial', maxIterations=iters) 
# rxPredict takes the scoring data through its data argument, not newdata 
fits <- rxPredict(trainit, data=test.data, type='response') 
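The "Before" snippet relies on objects the slide does not show (`specs`, `training.data`, `test.data`, `iters`, and a loop index `i`). A minimal runnable sketch of that CRAN R pattern follows, using simulated data and hypothetical formulas. Storing one prediction vector per specification is an assumption made here for clarity, not something the slide states.

```r
# Hypothetical stand-ins for the objects the slide code assumes.
set.seed(1)
make_data <- function(n) {
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
  d$y <- rbinom(n, 1, plogis(0.5 * d$x1 - 0.25 * d$x2))
  d
}
training.data <- make_data(2000)
test.data     <- make_data(500)
iters <- 25
specs <- list("y ~ x1", "y ~ x1 + x2")  # candidate model formulas as strings

# Fit each specification and keep its predicted probabilities.
fits <- vector("list", length(specs))
for (i in seq_along(specs)) {
  trainit <- glm(as.formula(specs[[i]]), data = training.data,
                 family = 'binomial', maxit = iters)
  fits[[i]] <- predict(trainit, newdata = test.data, type = 'response')
}
```

Keeping the formulas as strings in a list is what makes the recode cheap: the fitting and scoring calls are the only lines that change.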
Additional benefits to new process 
The technology expands the data science team’s options and 
opportunities 
• Train in-database on a much larger set – reduces the need to sample 
• Nearly “native” R language – decreases deploy time 
• Hadoop support – score in multiple data warehouses 
Appendix 
Table of Contents 
• Technical Considerations 
Technical considerations 
Environment setup 
• Teradata environment – 4-node, 1700 series appliance server 
• Revolution R Enterprise – version 7.1, running R 3.0.2 

Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on Teradata


Editor's Notes

  • #4: Problem statement
  • #5: Gabi’s binomial logistic regression model. Admittedly, it could be recoded to SQL, but that is not so easy with random forest and more powerful ensemble models
  • #6: Lots of data movement; 6+ hour process
  • #8: Show some CRAN R versus Rx R code
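
Note #5 above mentions that a logistic regression could be recoded to SQL. A minimal sketch of what that might look like is below, assuming numeric predictors only (factor predictors would need CASE expressions). The data are simulated, and the table name `customers` and columns `x1` and `x2` are hypothetical.

```r
# Hypothetical example: column names and the table name are illustrative only.
set.seed(7)
d <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))
d$y <- rbinom(1000, 1, plogis(-0.5 + d$x1 + 0.3 * d$x2))
fit <- glm(y ~ x1 + x2, data = d, family = "binomial")

# Build the linear predictor b0 + b1*x1 + b2*x2 as a SQL expression.
b <- coef(fit)
terms <- sprintf("(%.10g * %s)", b[-1], names(b)[-1])
linear <- paste(c(sprintf("%.10g", b[1]), terms), collapse = " + ")

# The logistic function turns the linear predictor into a probability.
sql <- sprintf("SELECT 1.0 / (1.0 + EXP(-(%s))) AS score FROM customers", linear)
cat(sql, "\n")
```

This works because a fitted logistic regression is just a weighted sum passed through one function. A random forest or an ensemble has no comparably compact closed form, which is the point the note makes.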