Three Ways to Fail Your Data Lab Implementation
Dataiku DSS
DataLabs
10 M€ in 2014
121 499 M€ in 2014
3 029 M€ in 2015
5 454 M€ in 2014
816 M€ in 2014
10 M€ in 2008
Marketing / Web
✓ Behavioral segmentation
✓ Churn prediction
✓ Sales forecast
✓ Dynamic pricing
Industry & Infrastructure
✓ Predictive maintenance
✓ Logistics optimization
✓ Smart cities
Bank & Insurance
✓ Fraud detection
✓ Risk anticipation
✓ Lifetime moment detection
Why a Data Lab?
• One single workflow: from a segmented workflow to a cross-functional one
• Several use cases: the ability to address many different data-centric topics within a single unit
• Multiple skills: a business-focused approach mixing many different competencies
• End-to-end projects: combining data from different sources to handle several aspects of a single topic
Deployment of the predictions
Dataiku DSS for fraud prediction
Client service
Sensor data
Garage data
Administration
• 1 Project Owner (IT)
• 1 Project Manager (Business)
• 1 in-house data scientist
• 3 data scientists from 3 different firms
• 3 consultants from 3 different firms
• 1 architect (external)
Accepted file
INVESTIGATE!
Transactions are blocked depending on their gap with the business rules and behavioral patterns
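The blocking logic described above can be sketched as follows. This is only an illustration: the `Transaction` fields, the rule ceiling `RULE_MAX_AMOUNT`, the three-sigma scaling, and `BLOCK_THRESHOLD` are all hypothetical values chosen for the example, not details taken from the slides or from Dataiku DSS.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float         # amount of the transaction being checked
    customer_mean: float  # customer's historical average amount (behavioral pattern)
    customer_std: float   # customer's historical standard deviation


RULE_MAX_AMOUNT = 5_000.0  # hypothetical business rule: amounts above this are suspicious
BLOCK_THRESHOLD = 1.0      # hypothetical gap level at which a transaction is blocked


def rule_gap(tx: Transaction) -> float:
    """Excess over the business-rule ceiling, as a fraction of that ceiling."""
    return max(0.0, tx.amount - RULE_MAX_AMOUNT) / RULE_MAX_AMOUNT


def behavior_gap(tx: Transaction) -> float:
    """Distance from the customer's usual amount, scaled so 3 sigma equals 1.0."""
    if tx.customer_std <= 0:
        return 0.0  # no usable history: the behavioral check contributes nothing
    return abs(tx.amount - tx.customer_mean) / (3 * tx.customer_std)


def decide(tx: Transaction) -> str:
    """Block the transaction for investigation when either gap is too large."""
    gap = max(rule_gap(tx), behavior_gap(tx))
    return "INVESTIGATE" if gap >= BLOCK_THRESHOLD else "ACCEPTED"


if __name__ == "__main__":
    print(decide(Transaction(amount=200.0, customer_mean=180.0, customer_std=40.0)))
    print(decide(Transaction(amount=12_000.0, customer_mean=180.0, customer_std=40.0)))
```

Taking the maximum of the two gaps means a transaction is held as soon as it departs strongly from either the rules or the customer's habits; a weighted sum would be an equally plausible design.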
Welcome to Technoslavia!
Focus on the framework, not on the input
Data Acquisition & Understanding → Data Preparation → Model Creation → Evaluation → Deployment
Scored dataset
Scored dataset
Iteration 1
Iteration 2
Iteration n
✓ Read and import raw data
✓ Detect schemas and structure
✓ Analyze distributions
✓ Assess quality: outliers, missing values...
✓ Performance metrics
✓ Robustness & generalization (cross-validation)
✓ Insights (e.g. variable importance)
✓ Create derived and aggregated variables
→ Analytical dataset
→ Report
✓ Feature selection
✓ Compare algorithms
✓ Scoring engine
✓ Publish predictions
✓ Monitor performance
✓ API
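As a rough illustration of the "Assess quality: outliers, missing values" item, the sketch below profiles a single numeric column. The function name `quality_profile`, the 1.5 × IQR outlier rule, and the sample values are assumptions made for this example; the slides do not specify how quality is assessed.

```python
import statistics


def quality_profile(values):
    """Summarize one numeric column: row count, missing values, IQR-based outliers."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles of the non-missing values (needs at least two data points).
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Anything outside the 1.5 * IQR fences is flagged, not removed.
    outliers = [v for v in present if v < low or v > high]
    return {"rows": len(values), "missing": missing, "outliers": outliers}


if __name__ == "__main__":
    column = [10, 12, 11, None, 13, 12, 11, 95, None, 12]  # made-up sample data
    print(quality_profile(column))
```

Flagging rather than dropping outliers keeps the decision with the analyst, which matters when extreme values are exactly what a fraud or maintenance model is looking for.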
Business Understanding
Adapted from the CRISP-DM methodology
Dataset 1
Dataset 2
Dataset n
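One way to read "focus on the framework, not on the input" is that the same evaluation routine runs unchanged on every dataset. The sketch below applies a k-fold cross-validation loop to several datasets. The names `k_fold_indices`, `cross_validated_mae`, and `run_framework` are hypothetical, and the "model" is a deliberately trivial mean predictor standing in for whatever algorithm a real project would compare.

```python
from statistics import mean


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, test
        start += size


def cross_validated_mae(targets, k=3):
    """Mean absolute error of a mean predictor, averaged over k folds."""
    fold_errors = []
    for train, test in k_fold_indices(len(targets), k):
        prediction = mean(targets[i] for i in train)  # toy model: training mean
        fold_errors.append(mean([abs(targets[i] - prediction) for i in test]))
    return mean(fold_errors)


def run_framework(datasets, k=3):
    """Apply the same evaluation to every dataset, whatever its source."""
    return {name: cross_validated_mae(targets, k) for name, targets in datasets.items()}


if __name__ == "__main__":
    datasets = {  # made-up target columns
        "dataset_1": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
        "dataset_2": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
    for name, error in run_framework(datasets).items():
        print(name, round(error, 3))
```

Contiguous folds are used here only to keep the example short; shuffled or stratified folds are usually preferable when row order carries information.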
People and Governance
?
Polyglot vs. dictator
Problems:
• Collaboration between technical and non-technical profiles inside a single project
• Necessary collaboration between business and tech teams to address cross-functional projects accurately
Focus:
• Promote diversity
• …within a workflow-centric environment
End to end, from prototyping to production
Do it your way …
…and scale!
Data Lab Organisation
Data Lab
Lab Environment
Multidisciplinary team:
Direction / Project Management
Business Analysts
Data Miners / Data Scientists
Production Environment
Business needs
Internal data sources
External data sources
Missions:
Prioritisation of the business needs
Prototyping / Agile solution engineering
Support for apps deployment
Business Applications
Marketing campaign automation
Reporting, web analytics
Data as a Service Platform
Design of “DATA PRODUCTS”
Integration of data products
Optimisation engine
Real-time scoring
Data flow
Insights & services
Processing chain
API deployment
Thank you!

Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to Fail your Data Lab Implementation"
