www.globalbigdataconference.com
Twitter : @bigdataconf
How To Implement a Standardized Process
for Execution and Delivery Of Data Science
Solutions: Microsoft’s Team Data Science
Process (TDSP)
Debraj GuhaThakurta
Microsoft AI & Research
Algorithms and Data Science
debraj.guhathakurta@microsoft.com
Date: Aug 29, 2017
• Standardized process: A challenge in Data Science
• Microsoft Team Data Science Process (TDSP)
o Principle and Objective
o Components
o Adoption
• Ongoing & future work
• Summary
• Resources
Note: All relevant links are provided in Resources section
The opportunity and challenge of data science
in enterprises
• Opportunity: 17% … had a well-developed Predictive/Prescriptive
Analytics program in place, while 80% … planned on implementing such a
program within five years – Dataversity 2015 Survey
• Challenge: Only 27% of big data projects are regarded as successful –
Capgemini 2014
While tools & platforms have matured, there is still a major gap
in executing on the potential
Team Data Science Process Presentation (TDSP), Aug 29, 2017
Diversity & growth create process challenges in
enterprise DS teams
Organization
Collaboration
Quality
Knowledge Accumulation
Agility
Global Teams
• Five geographic locations
Team Growth
• Onboard new members rapidly
Varied Use Cases
• Industries and use cases
Diverse DS Backgrounds
• Data scientists have diverse backgrounds and experience with tools and languages
Well-known DS processes are usually high-level
descriptions
Cross-Industry Standard Process for
Data Mining (CRISP-DM)
Knowledge Discovery in Databases
(KDD)
A process to make enterprise
DS teams more efficient
First - why a process?
A process specifies a detailed sequence of activities
necessary to perform specific business tasks
It is used to standardize procedures and
establish best practices
Technology and tools are changing rapidly. A standardized
process can provide continuity and stability of work-flow.
- Based on discussions with Luis Morinigo, Dir. IoT, NewSignature
Data Science can borrow processes from
Software Development Life Cycle (SDLC)
Data Science != Software Development
But we can learn from it,
especially about standardized processes
SDLC has had much more time to evolve, standardize and build
generalized best practices
Guiding principle and objective of TDSP
Objective:
Provide a process to improve the efficiency of development & delivery of DS solutions
Principle:
Integrate key practices of software development with those of a data science
workflow → improve productivity, collaboration and quality
4 Key components of TDSP
Standard DS Lifecycle
Project Structure, Templates & Roles
Shared, Distributed Data Platforms and Servers
(incl. version control server)
Productivity Tools, Shared Utilities
Component I: Standardized DS lifecycle
• 4 major stages:
DS lifecycle stages can be integrated with specific
deliverables & checkpoints
1. Business Understanding
• Project Objective
• Data, Target & Feature Definition
• Data Dictionary
2. Data Acquisition and Understanding
3. Modeling
4. Deployment
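One way to picture stages tied to deliverables and checkpoints is the minimal Python sketch below. It is an illustration only, not part of TDSP itself: the `Stage` class and `missing_deliverables` helper are hypothetical names, and only the Business Understanding deliverables come from the slide; the other stages are left empty because the slide does not list theirs.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One lifecycle stage with the deliverables its checkpoint expects."""
    name: str
    deliverables: list = field(default_factory=list)
    completed: set = field(default_factory=set)


def build_lifecycle():
    # Stage names follow the slide; deliverables are only known for the first stage.
    return [
        Stage(
            "Business Understanding",
            ["Project Objective", "Data, Target & Feature Definition", "Data Dictionary"],
        ),
        Stage("Data Acquisition and Understanding"),
        Stage("Modeling"),
        Stage("Deployment"),
    ]


def missing_deliverables(stage):
    """Return the deliverables still open at this stage's checkpoint, in listed order."""
    return [d for d in stage.deliverables if d not in stage.completed]


if __name__ == "__main__":
    lifecycle = build_lifecycle()
    first = lifecycle[0]
    first.completed.add("Project Objective")
    print(first.name, "still needs:", missing_deliverables(first))
```

A team could extend each stage's deliverable list to match its own templates; the checkpoint logic stays the same.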
Component II: Project structure and templates
Business understanding &
problem scope definition
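A standard project structure can be created with a small scaffolding script. The sketch below is a hedged illustration: the folder and file names in `LAYOUT` are assumptions chosen to echo items the deck mentions (charter, exit report, data dictionary), not the official template contents, and `scaffold` is a hypothetical helper.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: folder and file names are illustrative assumptions,
# not taken from the slides.
LAYOUT = {
    "Code/Data_Acquisition_and_Understanding": [],
    "Code/Modeling": [],
    "Code/Deployment": [],
    "Docs/Project": ["Charter.md", "Exit_Report.md"],
    "Docs/Data_Dictionaries": [],
    "Sample_Data": [],
}


def scaffold(root):
    """Create the folder tree and empty document stubs; return every path created or found."""
    created = []
    for folder, files in LAYOUT.items():
        directory = Path(root) / folder
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
        for name in files:
            stub = directory / name
            if not stub.exists():
                # Seed each document with a title derived from its file name.
                stub.write_text("# " + Path(name).stem.replace("_", " ") + "\n")
            created.append(stub)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(Path(tmp) / "demo_project"):
            print(path.relative_to(tmp))
```

Running the same script for every new repository is what keeps the directory structure identical across projects.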
Component II: Project roles (examples)
• Governance and Project Management
• Data Science and Engineering
Component III: Shared and distributed resources
Can be cloud or on-premises
• Virtual machines (VMs) or clusters are disposable compute, added to projects as needed
• Many-to-many relationships are possible between data scientists, VMs and projects
• Data is typically stored in cloud stores, such as blob storage or a database
• Project artifacts & code are permanently stored in central Git (version control) repositories
Project execution needs compute + data + process
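The many-to-many relationship between data scientists, VMs and projects can be sketched as a set of assignment triples. This is a toy model under stated assumptions: `ResourceRegistry` and all names in the usage example are hypothetical, and releasing a VM is assumed to be safe because code and artifacts live in Git rather than on the VM.

```python
class ResourceRegistry:
    """Tracks which data scientist uses which VM for which project."""

    def __init__(self):
        # Each entry is a (scientist, vm, project) triple.
        self._assignments = set()

    def assign(self, scientist, vm, project):
        self._assignments.add((scientist, vm, project))

    def release_vm(self, vm):
        """Drop a disposable VM together with every assignment that used it."""
        self._assignments = {a for a in self._assignments if a[1] != vm}

    def vms_for_project(self, project):
        return sorted({vm for _, vm, p in self._assignments if p == project})

    def projects_for_scientist(self, scientist):
        return sorted({p for s, _, p in self._assignments if s == scientist})


if __name__ == "__main__":
    registry = ResourceRegistry()
    registry.assign("alice", "vm-1", "churn")
    registry.assign("alice", "vm-2", "fraud")
    registry.assign("bob", "vm-1", "fraud")
    print(registry.vms_for_project("fraud"))
    registry.release_vm("vm-2")
    print(registry.vms_for_project("fraud"))
```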
Component III: Shared cloud compute & storage resources
DSVM: Convenient cloud compute resource for Data Scientists [Optional]
• The Microsoft DSVM is an Azure virtual machine (VM) image pre-installed and configured with several popular DS tools:
o Microsoft R Server Developer Edition
o Anaconda Python distribution
o Jupyter notebook (with R, Python kernels)
o Visual Studio Community Edition
o Power BI desktop
o SQL Server 2016 Developer Edition
o Machine learning and data analytics tools
o Deep learning toolkits
Component III: Shared Git repository for distributed
development and versioning
• Git is a version control system
• Each repo contains the full change history
• Used in a distributed way with a single remote repo and several local repos (on a local machine or a VM)
[Figure: one remote repo shared by four local repos; label: TDSP Git Template]
Component IV: Productivity tools and utilities - Analytics
IDEAR: Interactive data exploration and reporting (R and Python)
o Data quality assessment
o Getting business insights from the data
o Association between variables
o Generating data reports automatically
[Screenshots: clustering, correlations, distribution assessment]
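To make the first and third items concrete, here is a small standard-library sketch of a data quality check (share of missing values) and an association measure (Pearson correlation). It is not IDEAR's implementation; `missing_rate`, `pearson` and `quality_report` are hypothetical helpers, and missing values are assumed to be encoded as `None`.

```python
import math


def missing_rate(column):
    """Fraction of entries in the column that are None."""
    return sum(value is None for value in column) / len(column)


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present.

    Raises ZeroDivisionError if either column is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs))
    return cov / (spread_x * spread_y)


def quality_report(table):
    """Build a plain-text report: missing share per column, then pairwise correlations."""
    lines = [f"{name}: {missing_rate(col):.0%} missing" for name, col in table.items()]
    names = list(table)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            lines.append(f"corr({a}, {b}) = {pearson(table[a], table[b]):+.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    toy = {"age": [25, 32, None, 41, 50], "income": [30, 42, 38, 55, 61]}
    print(quality_report(toy))
```

Generating the same report for every new dataset is the kind of repeatable step that a shared utility standardizes across a team.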
Component IV: Productivity tools and utilities - Analytics
AMAR: Automated modeling and reporting (R)
o Building baseline models quickly
o Evaluating model accuracy
o Generating standardized model reports
o Model explainability (feature importance)
[Screenshots: predicted vs. actual and feature importance, each for different algorithms]
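The idea of quickly building baseline models and comparing their accuracy can be sketched with two very simple models. This is an assumption-laden illustration in Python, not AMAR (which the slide describes as an R utility): `fit_mean`, `fit_line` and `baseline_report` are hypothetical names, the error is measured in-sample for brevity, and feature importance is not covered.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_mean(xs, ys):
    """Baseline 1: always predict the mean of the target."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Baseline 2: one-feature ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def baseline_report(xs, ys):
    """Fit every baseline and return its RMSE (in-sample; use held-out data in practice)."""
    fitters = {"mean": fit_mean, "line": fit_line}
    report = {}
    for name, fit in fitters.items():
        model = fit(xs, ys)
        report[name] = rmse(ys, [model(x) for x in xs])
    return report


if __name__ == "__main__":
    for name, error in baseline_report([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]).items():
        print(f"{name}: RMSE = {error:.3f}")
```

Any later, more complex model then has a documented number to beat.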
Component IV: Productivity tools - Agile planning
Agile-like work planning and execution template for DS
• Use Visual Studio Team Services (VSTS) to create a data science Agile process template
• Use VSTS to track data science activities
• Use a Kanban board to track product backlog items (PBIs) and stories
VSTS helps you customize Agile-derived work templates
TDSP in action on Azure: E2E process templates
• Azure ML
• Spark
• Hadoop/MR
• SQL Server with R and Python
• Azure Data Lake
https://azure.microsoft.com/en-us/documentation/learning-paths/data-science-process/
Adoption: TDSP can be adopted in phases in a DS
organization
Level 1
• Git repository per data science project
• Standard directory structure in each repo
• Use provided document templates like charter, exit reports
• Track work items
Level 2
• Level 1 +
• Customize and standardize key document templates to be used by the team
• Develop shared DS utilities like IDEAR and AMAR
Level 3
• Level 2 +
• Utility repository to share scripts as utilities
Level 4
• Level 3 +
• Link Git branch with work items
• Code review
• Manage all models and data/features (model management)
• Testing
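Because each level builds on the previous one, a team's adoption level can be worked out from the practices it already follows. The sketch below is a rough illustration: the assignment of practices to levels follows the order in which they appear on the slide, and the short practice labels, `adoption_level` and `next_steps` are hypothetical.

```python
# Practices per level, in slide order. The short labels are illustrative assumptions.
LEVEL_PRACTICES = {
    1: {
        "git repo per project",
        "standard directory structure",
        "provided document templates",
        "track work items",
    },
    2: {"customized team templates", "shared DS utilities"},
    3: {"utility repository"},
    4: {"branch linked to work items", "code review", "model management", "testing"},
}


def adoption_level(practices):
    """Highest level whose practices, and those of all lower levels, are all in place."""
    in_place = set(practices)
    level = 0
    for candidate in sorted(LEVEL_PRACTICES):
        if LEVEL_PRACTICES[candidate] <= in_place:
            level = candidate
        else:
            break  # levels are cumulative, so stop at the first gap
    return level


def next_steps(practices):
    """Practices still missing to reach the next level (empty set at the top level)."""
    target = adoption_level(practices) + 1
    if target not in LEVEL_PRACTICES:
        return set()
    return LEVEL_PRACTICES[target] - set(practices)


if __name__ == "__main__":
    team = LEVEL_PRACTICES[1] | {"code review"}
    print("Level:", adoption_level(team))
    print("Next:", sorted(next_steps(team)))
```

Note that a practice from a higher level, such as code review, does not raise the level until the levels below it are complete.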
Adoption: Organizations may choose to customize
while preserving the DS lifecycle and 4 key components
• TDSP lifecycle and components provide a process, but with flexibility
• Customization is possible as per team/org needs
o Specific DS tools, algorithms, etc.
o Heavier or lighter document templates for deliverables
o E.g. consulting orgs’ delivery models, long-term engagements, PoC projects
o Templates for different business verticals
Adoption: Internal and external organizations that
have adopted TDSP
• Microsoft internal
o Microsoft Consulting Services (MCS)
o AI & R Cloud Platform: Algorithm and data sciences team
o Windows Devices DS team
o Technology Evangelism and Delivery DS team
• External partners
o New Signature
o BlueGranite
Ongoing work
• Enabling better project scoping, specification of RoI, criteria for
success at each stage of project
• Further incorporate SDLC features into DS process
• Continuous testing, release
• Instantiation from various data products
• Evolving process for deep learning
• Relevant worked-out templates for vision, NLP and others
• Providing appropriate data platforms guidelines
TDSP Summary: Integrate SDLC + DS workflows
Making DS teams more efficient
[Figure: software development (SWD) practices + data science workflow and reports]
TDSP components aim to solve DS process challenges
• Standard DS Lifecycle → Organization
• Project Structure, Templates & Roles → Organization, collaboration
• Shared, Distributed Data Platforms and Servers (incl. version control server) → Collaboration, quality
• Productivity Tools, Shared Utilities → Collaboration, knowledge accumulation
Result: improved DS project execution & delivery
Confidential under NDA
Resources
• TDSP on GitHub: https://github.com/Azure/Microsoft-TDSP
Public resources, blogs, announcements by Microsoft
• Process Overview and Lifecycle:
o https://docs.microsoft.com/en-us/azure/machine-learning/data-science-process-overview
• Links to TDSP execution on Azure using Azure ML and other data platforms:
o AzureML: https://azure.microsoft.com/en-us/documentation/learning-paths/data-science-process/
o Other data platforms (Spark, SQL-server): https://docs.microsoft.com/en-us/azure/machine-learning/data-science-process-walkthroughs
• Blogs by Microsoft (releases and announcements):
o https://blogs.technet.microsoft.com/machinelearning/2016/10/11/introducing-the-team-data-science-process-from-microsoft/
o https://blogs.technet.microsoft.com/machinelearning/2017/04/05/latest-rev-of-utilities-for-microsoft-team-data-science-process-tdsp-now-available/
Blogs and announcements by Microsoft Partners and Community
• Blogs by Microsoft Partners (not exhaustive):
o https://newsignature.com/articles/new-signature-team-data-science-process/
o https://www.blue-granite.com/blog/getting-more-from-your-data-science-teams-organization-and-process-considerations
• Blogs and announcements by community (not exhaustive):
o https://www.onmsft.com/news/team-data-science-process-hopes-to-help-improve-project-productivity-through-team-learning
o https://mspoweruser.com/microsoft-announces-team-data-science-process-agile-methodology-improve-collaboration/
o http://news.thewindowsclub.com/microsoft-data-science-utilities-88002/
o https://www.mgicomputers.com/tech-news/team-data-science-process-hopes-to-help-improve-project-productivity-through-team-learning
Thank you!
Contact information: debraj.guhathakurta@microsoft.com
