How to Build a Successful Data Team - Florian Douetteau (@Dataiku)
Hi ! I’m FLORIAN DOUETTEAU, CEO of Dataiku
It’s Me !!
It’s our software !!
…and our software is
The most complete Data Science platform
Deployment
Dataiku - Data Tuesday
Meet Hal Alowne
Big Guys
• $10B+ Revenue
• 100M+ Customers
• 100+ Data Scientists
Hal Alowne
BI Manager
Dim’s Private Showroom
“Hey Hal! We need a big data platform like the big guys. Let’s just do as they do!”
Average E-commerce Web Site
• $100M Revenue
• 1 Million Customers
• 1 Data Analyst (Hal Himself)
Dim Sum
CEO & Founder
Dim’s Private Showroom
Big Data
Copy Cat
Project
Technology Disconnect
Welcome to Technoslavia !
LOL PLATFORM ANTI-PATTERN
Test and Invest in Infrastructure == Skilled People
or
Go For Cloud / Packaged Infrastructure
Your Brand New Hadoop Cluster
is perceived as slow, not so used
and not reliable
TECHNO MISMATCH ANTI-PATTERN
Assume Being Polyglot
or
Be a Dictator
VS
VS
The Python
Clan
The R
Tribe
The Old Elephant
Fraternity
The New Elephant
Club
PREDICTIVE ANALYTICS DEPLOYMENT
STRATEGY
Websites: the winners of the 2000s
Companies that were able to release fast
"Artificial Intelligence with Data for
Internet of Things": the winners of the 2010s
Companies able to put intelligence in production
?
Design a way to put “PREDICTIVE MODELS”
IN PRODUCTION
PEOPLE DISCONNECT
Classic Business Intelligence Team Organization
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project
Sponsor BI Solution Architect
Model Designer
ETL Developer
Dashboard / Report Designer
DBA / IT Data Owner
Specs
Data Science Team Organization
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project
Sponsor Data Team Manager
Data Engineer
Data Analyst
Data System Engineer /
Data Architect
Specs
Data Scientist
Built From Scratch
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project
Sponsor
DBA / IT Data Owner
Specs
DATA SCIENTISTS EVERYWHERE
Built From Engineering
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project
Sponsor
Specs
DATA ENGINEERS
DATA ANALYSTS
Built From Analysts
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project
Sponsor
Specs
Manage Expectations
DREAM JOB: Data Engineer / REAL JOB: Data Plumber
DREAM JOB: Data Scientist / REAL JOB: Data Waiter
DREAM JOB: Data Analyst / REAL JOB: Data Cleaner
Perfectly Natural Hidden thoughts
Business Project
Sponsor
Data Team Manager
Data Engineer
Data Analyst
Data Scientist
Managing Extreme Personalities
Data SCIENTIST
Highly Creative
Passionate
Hard to hire ?
Hard to manage ?
Want to take
your job ?
Ambitious
Paired for Data
Data Analyst
Discover Patterns
Fight data entropy
Data Engineer
Make things work
Fight tech entropy
Which do you prefer?
One Analyst,
One Engineer, and
One Data Scientist
who work together?
Or four data scientists?
Data Disconnect
What is the main reason data projects fail?
DATA
NOT
AVAILABLE
BUT FOR ONLY INCREMENTAL GAIN
Contribution to the overall project performance:
• Business Goal Definition and Data: 50%
• Feature Engineering: 30%
• Algorithm: 20%
How to Get Data if you don’t have it
THE GRASSHOPPER, THE SPIDER, THE FOX
The Cicada : Optimistic and Opportunistic Data
THE CICADA
As a startup
- Build a new product using open data
As a group inside a company
- Benefit from the data sharing initiative within your company
- Wait for data to be available in your data lake
The Spider: Power of the Network
THE SPIDER
As a startup
- Create a network of (web trackers | sensors)
- Make it available for free
- Build your service on people’s collected data
As a group inside a company
- Make a web service available to collect data
- Promote it internally so that people use it
The Fox: Hunt for the Big Money first
THE FOX
As a startup
- Hunt for a Business Group within a large company with a problem
- Build a SaaS solution using their data
- Replicate to competitors
As a group inside a company
- Take charge of a critical problem at the CEO’s request
- Build your own integrated tech team to solve it
- Use those resources to reset data services internally
PRODUCT DISCONNECT
What is Big Data about ?
The Age Of Distributed Intelligence
Global, Personalised
and Real Time Data
Driven Services
Data to Visualize or Data to Automate ?
[Chart, 2013 to 2018: “Visualize to Decide” vs. “Automated Decision”]
Moving to a world of automated decision making
Involve product team
Product Feature
Personalised Item Ranking
Product Feature
Notify User Only when Needed
Product Feature:
Historical Data For Path Optimisation
Have Product Management Deeply Involved
In the Data Team
Where is your added value ?
Is the problem at the Core of
my Business Process?
Is it a common problem / with
shared data?
Go for Best of
Breed SAAS
Solution
Can I Solve it on my own ?
Really ?
Build by the
data team
Build by the
data team ?
Build by the
data team
Hire
Consultants
and Learn
Yes
Yes No
I can’t Ok, I can try
Yes!
No!
No
Be aware of the comfort zone
Mission
Critical
Small
Structured
Large
Diverse
Sheer
Curiosity
Reporting
for Finance
in Any Industry
Analyze
Each Tweet
Web Navigation
For E-Merchant
Ticket Data
For Discounts
in Retail
Phone Call
Logs for Security
RTB Data
For Advertising
Customer
Consumption
For Anti-Churn
in Utilities
Optimization
Filings
For Fraud
in Insurance
Not Enough
Data To Learn
From ?
Not Enough
"Hard" Examples
So that you can learn
Create an "API" Culture
Do not share
• Random Piece of Code
• Flat File
Do share
• Reproducible, documented workflows
• Clean, documented APIs
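To make the contrast concrete, here is a minimal sketch of what "share a clean, documented API" could look like in place of a flat file of precomputed totals. The `Order` schema and the `revenue_per_customer` function are hypothetical names invented for this illustration; they do not come from the slides or from any Dataiku product.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Order:
    """One customer order (hypothetical schema, for illustration only)."""
    customer_id: str
    amount: float


def revenue_per_customer(orders: Iterable[Order]) -> dict[str, float]:
    """Return the total order amount per customer.

    The documented input and output types let another team call this
    function directly instead of receiving an undocumented flat file.
    """
    totals: dict[str, float] = {}
    for order in orders:
        # Accumulate each order into its customer's running total.
        totals[order.customer_id] = totals.get(order.customer_id, 0.0) + order.amount
    return totals
```

A consumer only needs the signature and docstring to use it, for example `revenue_per_customer([Order("a", 10.0), Order("b", 5.0)])`.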
Food for thought
www.dataiku.com/blog
Free Data Science Software
www.dataiku.com/dss
THANK YOU !
Data Science
Is no longer a science


Editor's Notes

  • #3: What do we do? We help insurers develop new analytic, data-driven products and platforms without having to pay any technical debt from leaving their existing platforms (which date from the 90s). We help people build Data Labs.