AGILE DATA
Christopher Bergh
Head Chef,
DataKitchen
OPEN DATA SCIENCE CONFERENCE
BOSTON 2015
@opendatasci
AGENDA
Who Am I?
What Is The Problem?
A Look At Agile Through Data Lens
How To Do Agile Data In Five Shocking Steps
WHO AM I
Algorithm Nerd
Columbia, MIT, NASA-Ames; ATC Automation
Into in 1990: Fuzzy Logic, Neural Networks, Constraint Satisfaction; Unix/C
Software Nerd
CTO, Dir Engineering, VP Product Management
Into in 2000: Management of Software Teams & Startups; PowerPoint
Data Nerd
COO: ETL Engineers, Analysts & Analytic Tool
Into in 2010: W. Edwards Deming, Data, Bootstrapping; Excel Hacking
AGENDA
Who Am I?
What Is The Problem?
A Look At Agile Through Data Lens
How To Do Agile Data In Five Shocking Steps
SO WHAT IS THE PROBLEM?
In one word ….
LOTSA Technologies in Analytics
LOTSA People In Analytic Teams
DATA SCIENTIST
REPORTING ANALYST
ETL ENGINEER
DATABASE ARCHITECT
DEV OPS ENGINEER
Data Governance
LOTSA Data & Analysis
ONE-OFF
REUSE
LOTSA Missed Expectations
[Figure: Business Customer Expectation vs. Analyst Reality, each broken into Communicate, Analyze, and Prepare Data]
The business does not think that Analysts are preparing data
Analysts don’t want to prepare data
Complexity
Another Field, Software Development, Ran into the Same Problems With Complexity ...
… They Used Something Called ‘Agile’ To Solve The Problem
AGENDA
Who Am I?
What Is The Problem?
A Look At Agile Through Data Lens
How To Do Agile Data In Five Shocking Steps
AGILEMANIFESTO.ORG
5/31/2015
analytics
s/software/analytics/
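The substitution above can be tried on one of the Agile Manifesto values. This is only a small sketch: the helper name `to_analytics` is made up for illustration, and Python's `re.sub` stands in for the sed-style expression on the slide.

```python
import re

# One of the four values from agilemanifesto.org.
manifesto_value = "Working software over comprehensive documentation"

def to_analytics(text):
    """Apply the slide's s/software/analytics/ substitution (hypothetical helper)."""
    return re.sub(r"software", "analytics", text)

print(to_analytics(manifesto_value))
```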
PRACTICES THAT ARE EASY TO APPLY
• Development Sprints
• User Stories
• Daily Meetings
• Defined Roles
• Retrospectives
• Pair Programming
• Burn Down Charts
SOME PRACTICES HAVE BEEN DIFFICULT TO APPLY
• Test Driven Development
• Branching And Merging
• Refactoring
• Small Releases
• Frequent Or Continuous Integration
• Experimentation For Learning
• Individual Development Environments
AGILE – WHAT IS UNIQUE TO ANALYTICS?
PUT THE ANALYST AT THE CENTER
AGILE – WHAT IS UNIQUE TO ANALYTICS?
ANALYTICS PERCEIVED VALUE DECAY CURVE
AGENDA
Who Am I?
What Is The Problem?
A Look At Agile Through Data Lens
How To Do Agile Data In Five Shocking Steps
1. MANAGE YOUR WORK LIKE CODE
Why? Your work is just code: models, transforms, etc.
Use a source code control system (like Git) to enable:
Branching
Merging
Diff
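To make the "Diff" point concrete, here is a minimal sketch of what a source control system shows when a transform changes. It uses Python's standard `difflib` rather than Git itself. The SQL, the file name `revenue.sql`, and the branch labels are invented for illustration.

```python
import difflib

# Two hypothetical versions of the same SQL transform, as they might
# appear on two branches of a repository.
old_version = """SELECT customer_id, SUM(amount) AS revenue
FROM orders
GROUP BY customer_id
""".splitlines(keepends=True)

new_version = """SELECT customer_id, SUM(amount) AS revenue
FROM orders
WHERE status = 'complete'
GROUP BY customer_id
""".splitlines(keepends=True)

def diff_transform(old, new):
    """Return a unified diff between two versions of a transform."""
    return list(
        difflib.unified_diff(
            old, new,
            fromfile="revenue.sql@main",      # assumed branch label
            tofile="revenue.sql@feature",     # assumed branch label
        )
    )

changes = diff_transform(old_version, new_version)
print("".join(changes))
```

Once models and transforms live in plain text files, a reviewer sees exactly which line of logic changed before it is merged.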
2. TEST AND CONTAIN
1. Create and monitor tests
2. Test on separate data from production
3. Run tests early and often
4. Target 20% of code for tests
Unit Tests & System Tests … Keep Adding & Improving
1. Break up your work into components
2. Manage the environment for each component (e.g. Docker, AMI)
3. Practice Environment Version Control
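A unit test for one small component might look like the sketch below. The transform `dedupe_customers` and the fixture rows are hypothetical. The point is only that the test runs against a small hand-built data set, not against production data.

```python
def dedupe_customers(rows):
    """Keep the first row seen for each customer_id, preserving order."""
    seen = set()
    result = []
    for row in rows:
        if row["customer_id"] not in seen:
            seen.add(row["customer_id"])
            result.append(row)
    return result

# Small hand-built fixture, kept separate from any production data.
TEST_ROWS = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 1, "name": "Ada (duplicate)"},
]

def test_dedupe_removes_repeats():
    result = dedupe_customers(TEST_ROWS)
    assert [r["customer_id"] for r in result] == [1, 2]
    assert result[0]["name"] == "Ada"

def test_dedupe_handles_empty_input():
    assert dedupe_customers([]) == []

if __name__ == "__main__":
    test_dedupe_removes_repeats()
    test_dedupe_handles_empty_input()
    print("all tests passed")
```

Tests like these are cheap to run on every change, which is what makes "early and often" practical.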
3. PROVIDE SEPARATE ENVIRONMENTS FOR ANALYSTS
Why?
Analysts need their own data to iterate, develop & explore.
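One way to picture a separate environment is a private copy of the database that an analyst can change freely. The sketch below uses in-memory SQLite and the standard `sqlite3.Connection.backup` method. The `sales` table and the helper `make_sandbox` are assumptions for illustration, and a real setup would more likely copy a schema or spin up a container.

```python
import sqlite3

def make_sandbox(production):
    """Copy a database into a fresh in-memory sandbox connection."""
    sandbox = sqlite3.connect(":memory:")
    production.backup(sandbox)  # full copy; later edits do not flow back
    return sandbox

# Stand-in for a production database (hypothetical schema).
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE sales (region TEXT, amount REAL)")
production.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0)],
)
production.commit()

# The analyst experiments in the sandbox only.
sandbox = make_sandbox(production)
sandbox.execute("DELETE FROM sales WHERE region = 'west'")
sandbox.commit()

prod_count = production.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
sandbox_count = sandbox.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(prod_count, sandbox_count)
```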
4. SUPPORT THREE TYPES OF WORKFLOWS
Small Team: Work directly on production
Feature Branch: Merge back to production branch
Data Governance: 3rd party verification before production merge
Review, Test, Approve
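The three workflows can be read as three levels of gating before a change reaches production. The sketch below encodes that reading. The function `may_reach_production`, its parameters, and the rule that a feature branch needs passing tests before merge are assumptions, not something the slide states.

```python
from enum import Enum

class Workflow(Enum):
    SMALL_TEAM = "small team"
    FEATURE_BRANCH = "feature branch"
    DATA_GOVERNANCE = "data governance"

def may_reach_production(workflow, tests_passed=False, reviewed=False, approved=False):
    """Decide whether a change may land in production (hypothetical policy)."""
    if workflow is Workflow.SMALL_TEAM:
        # Work happens directly on production, so there is no gate.
        return True
    if workflow is Workflow.FEATURE_BRANCH:
        # Assumed rule: merge back only when the branch's tests pass.
        return tests_passed
    # Data governance: a third party reviews, tests, and approves first.
    return reviewed and tests_passed and approved

print(may_reach_production(Workflow.DATA_GOVERNANCE, tests_passed=True, reviewed=True))
```

A team can start with the lightest workflow and add gates as more people and more governance requirements arrive.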
5. GIVE ANALYSTS ABILITY TO EDIT DATABASE SAFELY
Best-in-class companies take 12 days to integrate new data sources into their analytical systems; industry average companies take 60 days; and, laggards average 143 days
Source: Aberdeen Group: Data Management for BI: Fueling the analytical engine with high-octane information
Figure out how to do this in minutes
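One possible reading of "safely" is that an edit only sticks if a check still passes afterwards. The sketch below wraps an edit in a SQLite transaction and rolls it back when a validation query finds bad rows. The `prices` table, the helper `safe_edit`, and the validation rule are all hypothetical.

```python
import sqlite3

def safe_edit(conn, edit_sql, check_sql):
    """Apply edit_sql in a transaction; keep it only if check_sql counts 0 bad rows."""
    conn.execute("BEGIN")
    try:
        conn.execute(edit_sql)
        bad_rows = conn.execute(check_sql).fetchone()[0]
        if bad_rows:
            conn.execute("ROLLBACK")  # undo the edit, data is unchanged
            return False
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise

# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN / COMMIT / ROLLBACK statements above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE prices (sku TEXT, price REAL)")
conn.execute("INSERT INTO prices VALUES ('A1', 10.0), ('B2', 4.0)")

check = "SELECT COUNT(*) FROM prices WHERE price < 0"  # assumed validation rule
ok = safe_edit(conn, "UPDATE prices SET price = price - 5", check)    # B2 would go negative
good = safe_edit(conn, "UPDATE prices SET price = price * 1.1", check)
print(ok, good)
```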
CONCLUSION
AGILE DATA Christopher Bergh
cbergh@datakitchen.io
Questions?
Comments?
