A Step Towards Reproducibility in R
H2O World, November 18-19, 2014
R’s popularity is growing rapidly 
IEEE Spectrum Top Programming Languages 
#15: R 
• IEEE Spectrum, July 2014
• RedMonk Programming Language Rankings, 2013
R is used more than other data science tools 
• O’Reilly Strata 2013 Data Science Salary Survey
• KDnuggets Poll: Top Languages for analytics, data mining, data science
R is among the highest-paid IT skills in the US 
• Dice Tech Salary Survey, January 2014
• O’Reilly Strata 2013 Data Science Salary Survey
Companies Using R 
Google 
“The great beauty of R is that you can modify it to do all sorts of things.” — Hal Varian, Chief Economist, Google
“R is really important to the point that it's hard to overvalue it.” — Daryl Pregibon, Head of Statistics, Google
• Advertising Effectiveness
• Economic forecasting
Facebook 
• Exploratory Data Analysis
• Experimental Analysis
“Generally, we use R to move fast when we get a new data set. With R, we don’t need to develop custom tools or write a bunch of code. Instead, we can just go about cleaning and exploring the data.” — Solomon Messing, data scientist at Facebook
Twitter 
“A common pattern for me is that I'll code a MapReduce job in Scala, do some simple command-line munging on the results, pass the data into Python or R for further analysis, pull from a database to grab some extra fields, and so on, often integrating what I find into some machine learning models in the end” — Ed Chen, Data Scientist, Twitter
• Data Visualization
• Semantic clustering
Insurance 
• Risk Analysis
• Marketing Analytics
• Catastrophe Modeling
Finance and Banking 
• Credit Risk Analysis
• Financial Networks
John Deere 
Statistical Analysis: 
• Short Term Demand Forecasting 
• Crop Forecasting 
• Long Term Demand Forecasting 
• Maintenance and Reliability 
• Production Scheduling 
• Data Coordination
Monsanto 
Statistical Analysis: 
• Plant Breeding 
• Fertility mapping 
• Precision Seeding 
• Disease Management 
• Yield forecasting
Public Affairs 
• Casualty estimation in war zones
• Political Analysis
Pharmaceuticals 
“R use at the FDA is completely acceptable and has not caused any problems.” — Dr Jae Brodsky, Office of Biostatistics, Food and Drug Administration
Regulatory Drug Approvals 
• Reproducible research 
• Accurate, reliable and consistent statistical analysis 
• Internal reporting (Section 508 compliance)
Weather and Climate 
• Climate change forecasts
• Flood Warnings
Revolution Analytics 
• Open Source development
– Revolution R Open, RHadoop, ParallelR, DeployR Open, Reproducible R Toolkit
– Project funding
• Community Support
– User Group Sponsorship 
– Meetups 
– Events sponsorship 
– Revolutions Blog
Reproducibility is the ability of an entire experiment or study to be reproduced, either by the researcher or by someone else working independently. It is one of the main principles of the scientific method … (Wikipedia)
Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. (Roger Peng)
Reproducibility – why do we care?
Academic / Research
• Verify results
• Advance Research
Business
• Production code
• Reliability
• Reusability
• Collaboration
• Regulation
www.nytimes.com/2011/07/08/health/research/08genes.html
http://arxiv.org/pdf/1010.1092.pdf
An R Reproducibility Problem 
Adapted from http://xkcd.com/234/ (CC BY-NC 2.5)
Revolution Analytics’ Reproducibility Environment 
• A distribution of R (Revolution R Open, RRO) that points to a static CRAN mirror
• The checkpoint server: the static CRAN mirror
– CRAN packages are fixed with each Revolution R Open update (currently October 1, 2014)
• Daily CRAN snapshots
– Every package version stored since September 2014
– Binaries and sources
– At mran.revolutionanalytics.com/snapshot
• The checkpoint package, available on CRAN
[Diagram labels: CRAN; daily snapshots at http://mran.revolutionanalytics.com/snapshot/ (midnight UTC); checkpoint package: library(checkpoint), checkpoint("2014-09-17"); checkpoint server: CRAN mirror at http://cran.revolutionanalytics.com/]
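Pointing R at a static mirror, as RRO does, comes down to which repository URL install.packages() reads from the repos option. The sketch below assumes, following the slide's URL, that each daily snapshot is reachable at mran.revolutionanalytics.com/snapshot/YYYY-MM-DD, and that 2014-09-17 (the slide's example date) is the earliest snapshot. The helper snapshot_repo() is hypothetical, not part of checkpoint, and the service named on the slide may no longer be online.

```r
# Hypothetical helper (not part of the checkpoint package): build the URL of
# a dated CRAN snapshot. Assumes snapshots live at <base>/YYYY-MM-DD, which
# matches the slide's example but is not guaranteed for any live service.
snapshot_repo <- function(date,
                          base = "http://mran.revolutionanalytics.com/snapshot") {
  d <- as.Date(date)  # stops with an error if the string cannot be parsed
  # Assumption: 2014-09-17 (the slide's example date) is the earliest snapshot.
  if (d < as.Date("2014-09-17")) {
    stop("no daily snapshot is assumed to exist before 2014-09-17")
  }
  paste0(base, "/", format(d, "%Y-%m-%d"))
}

# Point install.packages() and available.packages() at the frozen snapshot
# instead of live CRAN, then restore the previous setting.
old <- options(repos = c(CRAN = snapshot_repo("2014-09-17")))
print(getOption("repos")[["CRAN"]])
options(old)
```

Because every later install in the session resolves package versions against the same dated URL, two people using the same date should get the same versions, which is the property the static mirror is meant to provide.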
Using Revolution Analytics’ Reproducibility Tools 
• Scenario 1: Set up a consistent, company-wide R environment
– Have users download RRO
– All users get the base and recommended packages as of October 1, 2014
– For each project, each R user runs checkpoint to download a consistent set of packages appropriate for that project
• Scenario 2: With or without RRO, share scripts synced to a snapshot
– Have the person you are sharing with put your scripts in a separate project and download the checkpoint package
– Have that person run checkpoint("yyyy-mm-dd") with a date appropriate for your project
– checkpoint automatically downloads the correct versions of the packages used in the scripts
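Scenario 2 relies on checkpoint working out which packages the shared scripts use. The sketch below illustrates the idea with a deliberately simple regular-expression scan of library() and require() calls in every .R file under a project directory. The function find_project_packages() and the demo project are hypothetical, and this is not how the checkpoint package itself parses code; it misses pkg::fun usage and package names computed at run time.

```r
# Hypothetical, simplified scanner (not checkpoint's own algorithm): collect
# package names passed to library() or require() in a project's .R files.
# Regex-based, so it misses pkg::fun usage and names built at run time.
find_project_packages <- function(project_dir) {
  files <- list.files(project_dir, pattern = "\\.[Rr]$",
                      recursive = TRUE, full.names = TRUE)
  # \b keeps e.g. "mylibrary(" from matching; group 2 is the package name,
  # optionally wrapped in single or double quotes.
  pattern <- "\\b(library|require)\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)[\"']?\\s*[,)]"
  pkgs <- character(0)
  for (f in files) {
    code <- readLines(f, warn = FALSE)
    code <- sub("#.*$", "", code)  # naive comment stripping (also cuts '#' inside strings)
    hits <- unlist(regmatches(code, gregexpr(pattern, code, perl = TRUE)))
    pkgs <- c(pkgs, sub(pattern, "\\2", hits, perl = TRUE))
  }
  sort(unique(pkgs))
}

# Demo on a throwaway project containing one script
proj <- file.path(tempdir(), "demo_project")
dir.create(proj, showWarnings = FALSE)
writeLines(c("library(checkpoint)",
             "checkpoint(\"2014-11-14\")",
             "library(h2o)",
             "require(\"igraph\")  # graph tools",
             "# library(notused)"),
           file.path(proj, "analysis.R"))
print(find_project_packages(proj))
```

The commented-out library(notused) line is skipped because comments are stripped before matching, so only checkpoint, h2o and igraph are reported.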
Using checkpoint 
• Easy to use: add two lines to the top of each script
library(checkpoint)
checkpoint("2014-09-17")
• For the script author:
– Uses the package versions available on the chosen date
– Installs packages locally to this project
• Allows different package versions to be used simultaneously
• For a script collaborator:
– Automatically installs required packages
• Detects required packages (no need to install them manually!)
– Uses the same package versions as the script author, to ensure reproducibility
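Installing packages local to each project is what lets two projects use different versions of the same package side by side. In plain R, the underlying mechanism is a per-project library directory placed first on the library search path with .libPaths(); install.packages() installs into the first entry by default. The helper use_project_library() and its directory layout (a folder per snapshot date under tempdir()) are assumptions for illustration; the checkpoint package chooses its own location.

```r
# Hypothetical helper (not the checkpoint package's API): give a project its
# own library directory, keyed by snapshot date, and put it first on the
# library search path so install.packages() installs there and library()
# loads from there before the site-wide library.
use_project_library <- function(snapshot_date,
                                root = file.path(tempdir(), "checkpoint-demo")) {
  lib <- file.path(root, format(as.Date(snapshot_date), "%Y-%m-%d"), "lib")
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  old <- .libPaths()
  .libPaths(c(lib, old))  # .libPaths() silently drops directories that do not exist
  invisible(old)          # hand back the previous paths so they can be restored
}

old_paths <- use_project_library("2014-11-14")
print(.libPaths()[1])     # the project-specific library is now searched first
.libPaths(old_paths)      # restore the original search path
```

Two projects pinned to different dates get different directories, so each can hold its own version of a package without disturbing the other.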
# Create a local checkpoint library 
library(checkpoint) 
checkpoint("2014-11-14") 
> library(checkpoint) 
checkpoint: Part of the Reproducible R Toolkit from Revolution Analytics 
http://projects.revolutionanalytics.com/rrt/
Warning message: 
package ‘checkpoint’ was built under R version 3.1.2 
> checkpoint("2014-11-14") 
Scanning for loaded pkgs 
Scanning for packages used in this project 
Installing packages used in this project 
Warning: dependencies ‘stats’, ‘tools’, ‘utils’, ‘methods’, ‘graphics’, ‘splines’, ‘grid’, ‘grDevices’ are not available 
also installing the dependencies ‘bitops’, ‘stringr’, ‘digest’, ‘jsonlite’, ‘lattice’, ‘RCurl’, ‘rjson’, ‘statmod’, ‘survival’, ‘XML’, ‘httr’, ‘Matrix’
package ‘bitops’ successfully unpacked and MD5 sums checked 
package ‘stringr’ successfully unpacked and MD5 sums checked 
package ‘digest’ successfully unpacked and MD5 sums checked 
package ‘jsonlite’ successfully unpacked and MD5 sums checked 
package ‘lattice’ successfully unpacked and MD5 sums checked 
package ‘RCurl’ successfully unpacked and MD5 sums checked 
package ‘rjson’ successfully unpacked and MD5 sums checked 
package ‘statmod’ successfully unpacked and MD5 sums checked 
package ‘survival’ successfully unpacked and MD5 sums checked 
package ‘XML’ successfully unpacked and MD5 sums checked 
package ‘httr’ successfully unpacked and MD5 sums checked 
package ‘Matrix’ successfully unpacked and MD5 sums checked 
package ‘h2o’ successfully unpacked and MD5 sums checked 
package ‘miniCRAN’ successfully unpacked and MD5 sums checked 
package ‘igraph’ successfully unpacked and MD5 sums checked
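The warning near the top of this log names stats, tools, utils, methods, graphics, splines, grid and grDevices as "not available". These are base packages distributed with R itself rather than as separate CRAN packages, so a CRAN snapshot cannot supply them, and installation carries on regardless. A small sketch of telling them apart from packages that really must be fetched, using base R's installed.packages(priority = "base"); split_base_packages() is a hypothetical name, not part of checkpoint.

```r
# Hypothetical helper: split a vector of package names into those that ship
# with R as base packages (never installed from CRAN) and those that would
# have to come from a CRAN snapshot.
split_base_packages <- function(pkgs) {
  base_pkgs <- rownames(installed.packages(priority = "base"))
  list(base = intersect(pkgs, base_pkgs),
       to_install = setdiff(pkgs, base_pkgs))
}

# Package names taken from the log above
deps <- c("stats", "tools", "utils", "methods", "graphics", "splines",
          "grid", "grDevices", "bitops", "stringr", "RCurl", "h2o")
print(split_base_packages(deps))
```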
MRAN: The Managed R Archive Network 
• Download RRO
• Learn about R and RRO
• Daily CRAN snapshots
• Explore Packages
– and dependencies
• Explore Task Views
Thank You 
Joseph Rickert 
Joseph.rickert@revolutionanalytics.com, @revojoe 
blog.revolutionanalytics.com


Editor's Notes

  • #4: http://blog.revolutionanalytics.com/2014/02/r-salary-surveys.html http://blog.revolutionanalytics.com/2014/01/in-data-scientist-survey-r-is-the-most-used-tool-other-than-databases.html http://blog.revolutionanalytics.com/2013/10/r-usage-skyrocketing-rexer-poll.html http://blog.revolutionanalytics.com/2014/02/r-is-15th-of-top-programming-languages-in-latest-redmonk-ranking.html http://blog.revolutionanalytics.com/2013/09/top-languages-for-data-science.html
  • #5: Dice Tech Salary Survey, January 2014 O’Reilly Strata 2013 Data Science Salary Survey
  • #7: A
  • #9: http://blog.revolutionanalytics.com/2013/05/the-arteries-of-the-world-in-tweets.html http://blog.revolutionanalytics.com/2012/03/r-twitter-and-mcdonalds.html
  • #10: Deloitte: http://www.revolutionanalytics.com/free-webinars/actuarial-analytics-r
  • #11: Credit Suisse: http://blog.revolutionanalytics.com/2013/05/sheftel-on-r-on-the-trading-desk.html
  • #12: http://www.revolutionanalytics.com/free-webinars/order-fulfillment-forecasting-john-deere-how-r-facilitates-creativity-and-flexibility http://blog.revolutionanalytics.com/2012/11/video-how-john-deere-uses-r.html
  • #13: http://blog.revolutionanalytics.com/2013/11/strata-hadoop-world-2013-recap.html