Agile Data Science
INFORMS BIG DATA CONFERENCE
6/23/2013
Presented by Joel S Horwitz
Follow me @JSHorwitz
Alpine Data Labs
Agile Manifesto
We are uncovering better ways of developing software by doing it and
helping others do it. Through this work we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on
the left more.
Agile Manifesto (for models)
We are uncovering better ways of developing models by doing it
and helping others do it. Through this work we have come to value:
Individuals and interactions over processes and tools
Working models over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on
the left more.
Linear workflows in non-agile culture
[Figure: a linear handoff across three groups (Business, Analytics, Technologist), passing through the Business Owner, Business Analyst, Data Owner, Data Scientist, and Business Stakeholders.]
Agile is about continuous interactions
Analytics
• Feature Creation
• Model / Scoring
• Evaluation
Technologist
• Wrangling
• Interpretation
• Pipelines
Business
• Presentation
• Deployment
• Productize
Minimally Viable Data Products (MVDP)
• Dashboards: crossfilter.js or D3.js, Tableau
• Product Recommendations
• Also, Product Recommendations
• People Recommendations
One model, many use cases.
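To illustrate "one model, many use cases", here is a minimal sketch in which a single co-occurrence model serves both product and people recommendations. The deck does not say which model was actually used, so the co-occurrence approach, the function names, and all sample data are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_model(groups):
    """Count how often each pair of items appears in the same group."""
    counts = defaultdict(lambda: defaultdict(int))
    for group in groups:
        for a, b in combinations(sorted(set(group)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, item, k=2):
    """Return up to k items most often seen with `item` (ties broken by name)."""
    neighbours = counts.get(item, {})
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Use case 1: product recommendations from made-up purchase baskets.
purchases = [["laptop", "mouse"], ["laptop", "mouse", "bag"], ["mouse", "pad"]]
product_model = cooccurrence_model(purchases)
print(recommend(product_model, "laptop"))

# Use case 2: people recommendations from made-up project memberships.
projects = [["ana", "bo"], ["ana", "bo", "cy"], ["bo", "dee"]]
people_model = cooccurrence_model(projects)
print(recommend(people_model, "ana"))
```

The same two functions are reused unchanged; only the input data differs between the two use cases.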
Agile Data Science Feedback Loop
Instrumentation / Data Collection
Analysis & Design
Results & Interpretation
Cultivating Data Intuition
What do you need?
1. Business Champion
2. Integrated Environment
3. Analytics Ninjas
1. Business Champion(s): Defining the Problem
Executive sponsor(s) with a vested interest… ready to act on the results, which affect their business goals.
• Chief Technology Officer
• SVP of Product
• Chief Marketing Officer
• VP of Sales
Problem statement… Monthly active user growth was stagnating.
Evidence of where to look… acquisition funnel, user engagement, and customer loyalty.
Question to answer… How do I connect each part of the acquisition funnel?
We started with the acquisition funnel… web visits, downloads, installs, and activations.
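One way to connect the parts of the funnel is to compute the conversion rate between consecutive stages. The sketch below assumes one count per stage for a single period; the counts and the function name are hypothetical, since the deck gives no real numbers.

```python
def funnel_conversion(stages):
    """Return (step, rate) pairs for consecutive funnel stages.

    `stages` is an ordered list of (name, count) tuples.
    """
    rates = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        # Guard against an empty upstream stage.
        rate = count / prev_count if prev_count else 0.0
        rates.append((f"{prev_name} -> {name}", rate))
    return rates


# Hypothetical counts for one period.
funnel = [
    ("web visits", 10_000),
    ("downloads", 2_500),
    ("installs", 2_000),
    ("activations", 1_000),
]

for step, rate in funnel_conversion(funnel):
    print(f"{step}: {rate:.1%}")

overall = funnel[-1][1] / funnel[0][1]
print(f"overall: {overall:.1%}")
```

The step with the lowest rate is the first place to look when growth stalls.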
2. Environment: Before
• Many data silos and technology limitations
• Data definitions were poorly specified.
• Multiple data formats.
• No place to build MVDPs at scale.
2. Environment: After
Open Source Technologies: Flume, Sqoop, Oozie, and others
Analytics Sandbox
Relational Database
Minimally Viable Data Products
Selectively include data that is relevant
How was our analytics platform deployed and connected to the network of systems?
First there was FTP… raw log files dumped from web analytics, the content delivery network, and applications.
Second there was MS SQL… most of the data was in Microsoft SQL Server databases.
Third there was Hadoop (with Hive)… Engineering and Development back up their data to Hadoop.
Now there is web-based analytics…
What is Hadoop?
• Overview: Apache Hadoop is a framework for running applications on large clusters built of commodity hardware.
• Storage: Hadoop Distributed File System (HDFS™) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations.
• Applications: Apache Hive is a large-scale data warehouse system and Apache Mahout is a machine learning system.
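The replication idea can be shown with a toy simulation. This is only a sketch: it places replicas round-robin, whereas real HDFS uses a rack-aware placement policy, and the node names and function names are hypothetical. The replication factor of 3 matches the HDFS default.

```python
def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Toy placement only; real HDFS uses a rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one live replica."""
    failed = set(failed_nodes)
    return [b for b, holders in placement.items() if set(holders) - failed]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
placement = place_replicas(num_blocks=6, nodes=nodes)
print(placement[0])
print(len(readable_blocks(placement, failed_nodes=["node-a", "node-b"])))
```

With three replicas on four nodes, every block survives the loss of any two nodes, which is the reliability property the slide refers to.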
3. Analytics Ninjas
Build a TIGER Team… subject matter expert, modeler, technologist, and storyteller
• Curiosity
• Willingness to learn
• Resourceful
• Risk Takers
Schedule Standups… daily is best, but the cadence can vary by project
• Take notes!
• Open dialogue (peer review)
• Share knowledge
Centralize and Version Control Your Work… files, code, knowledge, and data lineage are key to success
• Create a Wiki
• Work backwards from the end goal to the analysis to the data
• Share, share, and share! Stand on the shoulders of giants! Why start from scratch when there is plenty of boilerplate to go around?
What is Analytics?
Analytics is the application of computer technology, operational research, and statistics to solve problems in business and industry.
Historically, Analytics was heavily used in banking for portfolio assessment using social status, geographical location, net value, and many other factors.
Today, Analytics is applied to a vast number of industries and is re-emerging due to the phenomenal explosion of data from our connected world.
Big Data consists of data sets that grow so large and complex that they become awkward to work with using on-hand database management tools.
McKinsey Global Institute estimates that big data analysis could save the American health care system $300 billion per year and the European public sector €250 billion.
Analytics Example: Platform Monetization
Search Lifetime Value
• 3rd Party Distribution
• SEM
• Organic
Traffic Quality Analysis
• PPC
• PPD
• PPI
• PPA
1. Business Champion
• Sales & Product
2. Integrated Environment
• Web analytics / app data and SQL database.
3. Analytics Ninjas
• Search engine marketers, business analysts, and statisticians.
Examples of Analytics: Campaign Management
1. Business Champion
• Sales, Product, Marketing, and Project Managers.
2. Integrated Environment
• Data Warehouse, SQL DB, Hadoop, and Tableau
3. Analytics Ninjas
• Web analytics, product managers, business analysts, and business intelligence.
Examples of Analytics: Customer Sentiment
Keywords by Ratings
1. Business Champion
• Product and Marketing
2. Integrated Environment
• Mobile analytics, app logfiles, and Hadoop.
3. Analytics Ninjas
• Web analytics, product managers, business analysts, mobile developers.
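A "keywords by ratings" view can be sketched as a word count grouped by star rating. The reviews, the stop-word list, and the function name below are made up for illustration; the deck does not describe how the original analysis was built.

```python
import re
from collections import Counter, defaultdict

# Assumed stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "and", "is", "it", "this", "to", "i", "after"}


def keywords_by_rating(reviews, top_n=3):
    """Map each star rating to its most frequent non-stop-word keywords."""
    counts = defaultdict(Counter)
    for rating, text in reviews:
        words = re.findall(r"[a-z']+", text.lower())
        counts[rating].update(w for w in words if w not in STOP_WORDS)
    return {
        rating: [word for word, _ in counter.most_common(top_n)]
        for rating, counter in sorted(counts.items())
    }


# Made-up app reviews as (stars, text) pairs.
reviews = [
    (1, "App crashes after the update"),
    (1, "Crashes and crashes, slow to load"),
    (5, "Fast and easy, love it"),
    (5, "Love the fast search"),
]

print(keywords_by_rating(reviews, top_n=2))
```

Comparing the top keywords for low and high ratings points to what users complain about and what they praise.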
What to avoid
It's all about plugins for web analytics… Google Analytics, App Stores (iOS, Android, others), Social (Twitter, Facebook FQL, others?).
Analysis lifecycle…
1. Give me data… import data and ETL it into submission.
2. What does this data mean… Crunch, Blend, Join, Pivot, Predict, Count, Map, or whatever.
3. Show and Tell… Static (PowerPoint, Excel, etc.) or dynamic (trends, filtering, drilldown). (No more dashboards)...
Known knowns = data puking (dashboards)
Interestingness rocks! Tell me where to look (I'm feeling lucky…)
Get people to "make love to data" to make actual good use of it
Viva la revolucion! Data democracy!
Thank you! Any Questions?
Want to jump start your Agile Data Science project? Head over to
http://start.alpinenow.com
Follow me on @JSHorwitz
