A Statistician Walks into a Tech Company
R at a rapidly scaling healthcare technology startup
Sandy Griffith
Twitter: @sgrifter
sgriffith@flatiron.com
www.flatiron.com
My story
Academic biostatistics → Healthcare tech
© 2016 Flatiron Health, Inc. Proprietary and confidential.
Our Mission
Flatiron’s mission is to serve cancer patients and our
partners by dramatically improving treatment and
accelerating research.
Flatiron Processes EHR Data At Scale
[Diagram: Electronic health record data, including data from outside practices, hospitals, and labs, splits into structured data (demographics, diagnosis, visits, labs, e-prescribing) and unstructured data (pathology reports, discharge notes, radiology reports, physician notes). Each stream has its own processing step; the output is research-grade data, contrasted with standard EHR data.]
Rapidly Scaling
January 2015
Flatiron: ~140
Software Engineers: ~50
Quantitative Sciences team: 1
Now: We are a team of 262
We include…
All Flatiron data and tools are collaboratively built, implemented and maintained by a
cross-disciplinary team that includes oncology, engineering, and quantitative sciences
We come from…
9 Medical oncologists and nurses
70 Software engineers
10 Quantitative scientists
5 Medical informaticists
+ more!
Primary Language: time of hire
Proficiency with R: time of hire
A decision point early on
Cultivate R culture
1. Internal R Package
2. User group
3. Slack channel
4. Trainings
5. Hiring
Proficiency with R: time of hire vs. now
Now we have R users, but when should we use R?
Three scenarios:
1. R for prototyping → !R (not R) in production
2. R as a long-term solution
3. R and !R in parallel
R for prototyping → !R in production
Example: Linking external mortality data
Prototype
● One-time linkage
● Small cohort (10s of thousands)
● RecordLinkage R package
● Probabilistic linkage method using EM algorithm
Production
● Repeated daily at scale
● Large cohort (~5 million patients)
● Code maintained by different team
● Deterministic logic in SQL
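
To make the prototype side concrete, here is a minimal sketch of probabilistic linkage fitted with EM, in the Fellegi-Sunter style. It is written in Python for illustration and is not the RecordLinkage package or Flatiron's code. The field names, starting values, and toy agreement patterns are all assumptions.

```python
# Sketch only: field names and toy data are hypothetical.
FIELDS = ("last_name", "birth_date", "zip_code")


def clamp(x, eps=1e-6):
    """Keep a probability away from exactly 0 or 1 so products stay defined."""
    return min(max(x, eps), 1.0 - eps)


def likelihood(pattern, probs):
    """P(agreement pattern | class), assuming fields agree independently."""
    out = 1.0
    for agrees, p in zip(pattern, probs):
        out *= p if agrees else 1.0 - p
    return out


def match_probability(pattern, prevalence, m, u):
    """Posterior probability that a candidate pair is a true match."""
    pm = prevalence * likelihood(pattern, m)
    pu = (1.0 - prevalence) * likelihood(pattern, u)
    return pm / (pm + pu)


def fit_em(patterns, n_iter=100):
    """Estimate match prevalence plus per-field m- and u-probabilities."""
    k = len(patterns[0])
    prevalence, m, u = 0.1, [0.9] * k, [0.1] * k  # starting guesses
    for _ in range(n_iter):
        # E-step: soft-assign every pair to "match" or "non-match".
        post = [match_probability(p, prevalence, m, u) for p in patterns]
        # M-step: re-estimate parameters from the soft assignments.
        total = sum(post)
        rest = len(patterns) - total
        prevalence = clamp(total / len(patterns))
        m = [clamp(sum(g * p[j] for g, p in zip(post, patterns)) / total)
             for j in range(k)]
        u = [clamp(sum((1 - g) * p[j] for g, p in zip(post, patterns)) / rest)
             for j in range(k)]
    return prevalence, m, u


# Each tuple records whether a candidate pair agrees (1) or not (0) on FIELDS.
pairs = ([(1, 1, 1)] * 8 + [(1, 1, 0)] * 2 + [(0, 0, 0)] * 70
         + [(1, 0, 0)] * 10 + [(0, 1, 0)] * 5 + [(0, 0, 1)] * 5)
prevalence, m, u = fit_em(pairs)
p_all_agree = match_probability((1, 1, 1), prevalence, m, u)
p_none_agree = match_probability((0, 0, 0), prevalence, m, u)
print(f"all fields agree: {p_all_agree:.3f}, none agree: {p_none_agree:.6f}")
```

The fitted m- and u-probabilities and the cutoff applied to the posterior are the kind of knobs a probabilistic method exposes, which a fixed deterministic rule does not have.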
R for prototyping → !R in production
Example: Linking external mortality data
Why this made sense:
● Stable method -- No longer needed rapid iteration
● Tuning parameters
● Similar performance, more transparency
● No R users on team that would be maintaining code
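
For contrast, deterministic linkage in SQL can be as simple as a join on exact field agreement, which is part of why it is easy to read and hand to another team. The sketch below runs such a join through Python's built-in sqlite3 module. The table names, columns, matching rule, and records are all made up for illustration and are not the production logic.

```python
import sqlite3

# Sketch only: schema, matching rule, and records are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (patient_id INTEGER, last_name TEXT,
                       birth_date TEXT, zip_code TEXT);
CREATE TABLE deaths (death_id INTEGER, last_name TEXT,
                     birth_date TEXT, zip_code TEXT, death_date TEXT);
INSERT INTO patients VALUES
  (1, 'Rivera', '1950-03-02', '10001'),
  (2, 'Chen',   '1962-11-30', '10013'),
  (3, 'Okafor', '1948-07-19', '11201');
INSERT INTO deaths VALUES
  (10, 'RIVERA', '1950-03-02', '10001', '2015-06-01'),
  (11, 'Chen',   '1962-11-03', '10013', '2015-08-15');
""")

# Deterministic rule: link only when every field agrees exactly
# (last name compared case-insensitively).
LINK_SQL = """
SELECT p.patient_id, d.death_id, d.death_date
FROM patients AS p
JOIN deaths AS d
  ON UPPER(p.last_name) = UPPER(d.last_name)
 AND p.birth_date = d.birth_date
 AND p.zip_code = d.zip_code
ORDER BY p.patient_id
"""
links = conn.execute(LINK_SQL).fetchall()
print(links)
```

Patient 2 is not linked because the birth dates differ by a transposition. A rule this strict trades some sensitivity for logic that anyone on the maintaining team can audit line by line.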
R as a long-term solution
Example: Rmarkdown QA report
Early version (Jan 2015)
● bash commands for extracting data run from R script using ETL tool
● R script run via command line
● parameters in metafiles manually updated
● Runs a series of Rmd files and renders HTML output
Current version (April 2016)
● linked to data pipeline maintained by software engineering
● metafile generated dynamically
● Plotly survival curves
● Flatly bootstrap theme
● Plan to continue using R indefinitely
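
One way to picture "metafile generated dynamically" is a pipeline step that writes the report parameters itself and then renders each Rmd file with them. The sketch below is a guess at that shape, not the actual pipeline: the metafile layout, cohort name, and file names are hypothetical, and it only builds the `Rscript` commands rather than running them. It assumes each Rmd file declares `cohort` and `data_date` under `params:` in its YAML header.

```python
import json
from datetime import date


def build_metafile(cohort, data_date, rmd_files):
    """Collect the run parameters that used to be edited by hand."""
    return {
        "cohort": cohort,
        "data_date": data_date.isoformat(),
        "reports": sorted(rmd_files),  # render in a stable order
    }


def render_command(rmd_file, meta):
    """Build an Rscript call that renders one Rmd file to HTML."""
    r_call = (
        f"rmarkdown::render('{rmd_file}', "
        f"params = list(cohort = '{meta['cohort']}', "
        f"data_date = '{meta['data_date']}'), "
        "output_format = 'html_document')"
    )
    return ["Rscript", "-e", r_call]


# Hypothetical inputs; a real pipeline would derive these from its own state.
meta = build_metafile(
    "example_cohort", date(2016, 4, 1),
    ["02_labs.Rmd", "01_demographics.Rmd"],
)
commands = [render_command(f, meta) for f in meta["reports"]]

print(json.dumps(meta, indent=2))
for cmd in commands:
    print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run` on a machine with R and the rmarkdown package installed.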
R as a long-term solution
Example: Rmarkdown QA report
Why this made sense:
● Mature product and team
● Quantitative science members remain embedded in team
● Strong support and collaboration with software engineering
● Requirements are dynamic -- continued need for rapid prototyping
R and !R in parallel
Example: Some external collaborations
● Specific research questions
● 2 people code independently in Python/SQL and R
● Compare results
● Language sometimes incidental, more about 2 different perspectives
Why this made sense:
● High stakes or low error tolerance
● Complicated concepts
● Custom projects often involve novel problems
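
The "compare results" step can be sketched as a small reconciliation check over the two independently produced summaries. The version below assumes both analysts export results keyed by cohort with numeric metrics. The structure, metric names, and numbers are invented for illustration.

```python
import math


def compare_results(left, right, rel_tol=1e-9):
    """Return (row, metric, reason) for every disagreement between two
    result sets, each a dict mapping a row key to a dict of metrics."""
    problems = []
    for key in sorted(set(left) | set(right)):
        if key not in left or key not in right:
            problems.append((key, None, "row missing on one side"))
            continue
        for metric in sorted(set(left[key]) | set(right[key])):
            a, b = left[key].get(metric), right[key].get(metric)
            if a is None or b is None:
                problems.append((key, metric, "metric missing on one side"))
            elif not math.isclose(a, b, rel_tol=rel_tol):
                problems.append((key, metric, f"{a} != {b}"))
    return problems


# Hypothetical outputs from the R analyst and the Python/SQL analyst.
r_output = {
    "cohort_a": {"n": 1200, "median_followup_months": 14.2},
    "cohort_b": {"n": 845, "median_followup_months": 9.8},
}
py_output = {
    "cohort_a": {"n": 1200, "median_followup_months": 14.2},
    "cohort_b": {"n": 851, "median_followup_months": 9.8},
}
problems = compare_results(r_output, py_output)
print(problems)
```

A mismatch like the cohort_b count is the useful outcome: it usually points to two different readings of an inclusion rule, which is the kind of discrepancy double coding is meant to surface.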
Thank you
● Melissa Curtis
● Josh Kraut
● Kathi Seidl-Rathkopf
● Cindy Revol
● Rachael Sorg
● Jay Rughani
● Paul You
● Aracelis Torres
● Alphan Kirayoglu
● Ben Birnbaum
● Ann Jaskiw
● James Gippetti
Join our Team!
Drop me a note at sgriffith@flatiron.com, @sgrifter,
or visit flatiron.com/careers
