From Proliferation to Productivity with a Machine Learning Data Catalog
Stephanie McReynolds, Vice President of Marketing
@slangenfeld
Data is proliferating
“data creation will swell to a total of 163 zettabytes by 2025”
Enterprises are driving this
“enterprises will create 60% of this data”
Data consumers are growing even faster
• Tableau Users 2014 = 26,000+
• Tableau Users 2018 = 78,000+
Source: IDC Thought Leadership Practice Case Study, Data Age 2025
“90% of the time that knowledge workers
spend in creating new reports is
recreating information that already
exists.”
Source: KMWorld
The social challenge of self-service analytics
• Human data discovery is still in its infancy
- Self-service analytics adoption has increased access to analysis
- Understanding comes next and then data-driven decisions
• We still rely on experts to Find, Understand & Trust data
- Data distributed across 100s – 1,000s of sources
- Nuances of data are often not documented
- Self-service data prep is coming but early
• Decision-maker understanding of visualized analytics is often:
- Incomplete
- Not actionable
- Inaccurate
As a result, data-driven decisions are elusive.
"By 2020, organizations that offer users access to a
curated catalog of internal and external data will
realize 2x the business value from analytics
investments than those that do not."
- Gartner Magic Quadrant for Business Intelligence
and Analytics Platforms, 2017
Machine Learning Data Catalogs
• Find & Understand, Trust, Use
• Technical metadata, business metadata, social metadata
• Feedback loop: data trains, machines guess, humans confirm, machines scale the learnings
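The feedback loop on this slide can be illustrated with a small sketch. It assumes, purely for illustration, that the machine's "guess" is a business glossary term suggested for a column name by token overlap with columns a human has already confirmed. `TermSuggester`, `guess`, `confirm` and `suggest_all` are hypothetical names, not the API of Alation or any other catalog, and real catalogs draw on far richer signals.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def tokens(column_name: str) -> set[str]:
    """Split a column name such as 'cust_email_addr' into lowercase tokens."""
    return {t for t in column_name.lower().replace("-", "_").split("_") if t}


class TermSuggester:
    """Hypothetical human-in-the-loop suggester for business glossary terms."""

    def __init__(self) -> None:
        # term -> token counts from columns a human confirmed for that term
        self.evidence: dict[str, Counter] = defaultdict(Counter)

    def guess(self, column_name: str) -> str | None:
        """Machines guess: pick the term whose confirmed columns share the most tokens."""
        col_tokens = tokens(column_name)
        best_term, best_score = None, 0
        for term, counts in self.evidence.items():
            score = sum(counts[t] for t in col_tokens)
            if score > best_score:
                best_term, best_score = term, score
        return best_term  # None when no confirmed evidence overlaps

    def confirm(self, column_name: str, term: str) -> None:
        """Humans confirm: store the column's tokens as evidence for the term."""
        self.evidence[term].update(tokens(column_name))

    def suggest_all(self, column_names: list[str]) -> dict[str, str | None]:
        """Machines scale: apply what has been learned to many columns at once."""
        return {name: self.guess(name) for name in column_names}


if __name__ == "__main__":
    s = TermSuggester()
    s.confirm("cust_email", "Customer Email")
    s.confirm("order_total_usd", "Order Amount")
    print(s.suggest_all(["customer_email_addr", "order_total", "ship_date"]))
    # A human labels the unmatched column, and the next guess reuses that label.
    s.confirm("ship_date", "Ship Date")
    print(s.guess("ship_dt"))
```

The scoring rule is deliberately naive; the point is the loop itself: every human confirmation becomes evidence that the next round of machine guesses reuses across many more columns.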
Data catalogs deliver breakthrough drugs to market faster
“Data science shouldn’t be
confined to mathematicians.”
- JEFF KEISLING
CHIEF INFORMATION OFFICER
The Pfizer data landscape was fragmented
• 200B+ data points
• 180+ countries
• 1M+ data sources
• 500k+ tables
• 6M+ columns
• 6B+ queries
“Data has become a very powerful currency
for any company, and what we were finding
is that the data was very fragmented.”
- JULIE SCHIFFMAN
VP OF BUSINESS ANALYTICS
The data catalogs linked queries & users by usage
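Linking queries and users by usage can be sketched as a simple aggregation over a query log. The sketch assumes the log has already been parsed into (user, tables referenced) pairs; real catalogs derive this by parsing SQL from database logs, which is out of scope here. `rank_tables_by_usage` and `frequent_users` are hypothetical helper names, not a product API.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def rank_tables_by_usage(
    log: list[tuple[str, list[str]]],
) -> list[tuple[str, int, int]]:
    """Rank tables by query count, then by number of distinct users.

    Each log entry is (user, tables referenced by one query); this input
    format is an assumption for illustration.
    """
    query_counts: Counter = Counter()
    users_by_table: dict[str, set[str]] = defaultdict(set)
    for user, tables in log:
        for table in set(tables):  # count a table once per query
            query_counts[table] += 1
            users_by_table[table].add(user)
    ranked = sorted(
        query_counts,
        # Most-queried first; break ties by user count, then alphabetically.
        key=lambda t: (-query_counts[t], -len(users_by_table[t]), t),
    )
    return [(t, query_counts[t], len(users_by_table[t])) for t in ranked]


def frequent_users(
    log: list[tuple[str, list[str]]], table: str, n: int = 3
) -> list[tuple[str, int]]:
    """Return the n users who query `table` most often (likely people to ask)."""
    counts = Counter(user for user, tables in log if table in tables)
    return counts.most_common(n)


if __name__ == "__main__":
    sample_log = [
        ("ana", ["sales.orders", "sales.customers"]),
        ("ben", ["sales.orders"]),
        ("ana", ["sales.orders"]),
        ("cai", ["hr.staff"]),
    ]
    print(rank_tables_by_usage(sample_log))
    print(frequent_users(sample_log, "sales.orders"))
```

Ranking by usage surfaces the tables people actually rely on, and the per-table user counts point a newcomer toward colleagues who already know the data.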
Breakthrough ideas now come down to simple search…
…collaboration…
…and a diverse team of data-literate employees.
Business outcomes driven by an Analytics Workbench
• Goal: Deliver breakthrough drugs to market faster
• Initiative: Identify markers of rare diseases, such as transthyretin cardiomyopathy, a form of heart failure
• The condition often goes undiagnosed because its symptoms resemble those of more common forms of heart failure
• An Analytics Workbench was used to identify potential candidates
• ML algorithms help with diagnosis and with identifying clinical trial participants
• Solution: Physician education + Targeted clinical trials
Alation + Dataiku + Tableau
And more…
Data Catalog Outcomes to Consider
• Analyst Productivity: Increase analytic productivity by 20-40% for data consumers.
• Agile Stewardship: Produce accurate documentation up to 40% faster with machine learning.
• Infonomics: An interface to quantify the business value of all stored data assets.
For more information, visit us at Booth #473.
Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA CATALOGUES