Donna Burbank, Managing Director, Global Data Strategy, Ltd.
Amnon Drori, CEO, Octopai
July 24th, 2018
Donna Burbank
Donna is a recognised industry expert in
information management with over 20 years
of experience in data strategy, information
management, data modeling, metadata
management, and enterprise architecture.
Her background is multi-faceted across
consulting, product development, product
management, brand strategy, marketing,
and business leadership.
She is currently the Managing Director at
Global Data Strategy, Ltd., an international
information management consulting
company that specializes in the alignment of
business drivers with data-centric
technology. In past roles, she has served in
key brand strategy and product
management roles at CA Technologies and
Embarcadero Technologies for several of the
leading data management products in the
market.
As an active contributor to the data
management community, she is a long-time
DAMA International member, Past President
and Advisor to the DAMA Rocky Mountain
chapter, and was recently awarded the
Excellence in Data Management Award from
DAMA International in 2016.
Donna is also an analyst at the Boulder BI
Brain Trust (BBBT), where she provides advice
and gains insight on the latest BI and
Analytics software in the market. She served on
several review committees of the Object
Management Group for key information
management and process modeling
notations.
She has worked with dozens of Fortune 500
companies worldwide in the Americas,
Europe, Asia, and Africa and speaks regularly
at industry conferences. She has co-authored
two books, Data Modeling for the Business and
Data Modeling Made Simple with ERwin Data
Modeler; was a contributor to the book
Metadata Solutions; and is a regular
contributor to industry publications.
She can be reached at
donna.burbank@globaldatastrategy.com
Donna is based in Boulder, Colorado, USA.
Follow on Twitter @donnaburbank
AMNON DRORI
CO-FOUNDER & CEO, OCTOPAI
Amnon has over 20 years of leadership experience in technology companies. Before
co-founding Octopai he led sales efforts at companies like Panaya, Zend
Technologies, ModusNovo and Alvarion, and also served as the Chief Revenue
Officer at CoolaData, a big data behavioral analytics platform. Amnon studied
Management and Computer Science at the Open University of Tel Aviv.
Octopai was founded in 2015 by business intelligence professionals who saw a real
pain point in the sector. Octopai’s SaaS solution fully automates metadata
management and analysis, enabling enterprise BI groups to quickly, easily and
accurately find and understand their data for improved reporting accuracy,
regulation compliance, data modeling, data quality and data governance.
Today’s Topic
• Companies are drowning in their data — so much data,
from so many different sources.
• They understand that data governance is hugely
important, but many have not grasped the criticality of
metadata in the process.
• Metadata helps with locating data - a must for BI groups
dealing with analytics and business user reporting.
• Automating metadata management for data discovery
and data lineage for BI is critical for enterprise data
governance.
• BI groups use Octopai to locate their data instantly, and to
quickly and accurately visualize and understand the
entire data journey.
The Missing Link in Enterprise Data Governance:
Automated Metadata Management
Agenda
• Data Governance and Metadata: The Critical Link
• The Business Need: Why Metadata is hotter than ever
• New Strategies & Approaches to support the ever-evolving data landscape
• How Octopai can Help
Data Governance & Metadata - the Interdependency
Metadata Management is critical to enforcing Data Governance
Retail
What is Data Governance? (1)
Data Governance is the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets.
What is Metadata? (1)
Metadata “includes information about technical and business processes, data rules and constraints, and logical and physical data structures.”
• Data Governance drives the need for Metadata.
• Metadata provides the means to deliver & enforce Data Governance.
(1) From DAMA DMBOK
Metadata is the “Who, What, Where, Why, When & How” of Data
Who
• Who created this data?
• Who is the Steward of this data?
• Who is using this data?
• Who “owns” this data?
• Who is regulating or auditing this data?
What
• What is the business definition of this data element?
• What are the business rules for this data?
• What is the security level or privacy level of this data?
• What is the abbreviation or acronym for this data element?
• What are the technical naming standards for database implementation?
Where
• Where is this data stored?
• Where did this data come from?
• Where is this data used & shared?
• Where is the backup for this data?
• Are there regional privacy or security policies that regulate this data?
Why
• Why are we storing this data?
• What is its usage & purpose?
• What are the business drivers for using this data?
When
• When was this data created?
• When was this data last updated?
• How long should it be stored?
• When does it need to be purged/deleted?
How
• How is this data formatted? (character, numeric, etc.)
• How many databases or data sources store this data?
Metadata is Data In Context
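The questions above can be captured as a simple record per data element. The sketch below is only an illustration: the class name ElementMetadata, its fields, the unanswered helper, and the sample values are all hypothetical and do not come from the slides or from any particular tool.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional


@dataclass
class ElementMetadata:
    """Hypothetical metadata record for one data element (not from the slides)."""
    name: str
    business_definition: str                 # What does it mean?
    steward: str                             # Who looks after it?
    stored_in: list[str]                     # Where is it stored?
    purpose: str                             # Why do we keep it?
    created_on: Optional[date]               # When was it created?
    data_format: str                         # How is it formatted?
    purge_after_years: Optional[int] = None  # When must it be purged?


def unanswered(record: ElementMetadata) -> list[str]:
    """List the fields that still have no answer (None, empty string, empty list)."""
    return [key for key, value in asdict(record).items() if value in (None, "", [])]


# Sample values are invented for illustration.
brand = ElementMetadata(
    name="Brand",
    business_definition="Marketing name of a product line",
    steward="",
    stored_in=["Customer Database", "Sales Database"],
    purpose="Sales reporting",
    created_on=date(2018, 7, 24),
    data_format="CHAR(10)",
)
print(unanswered(brand))
```

A governance team could use a check like unanswered to find data elements that have no steward or retention rule yet.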
Metadata is Part of a Larger Enterprise Landscape
A Successful Strategy Requires Many Inter-related Disciplines
“Top-Down” alignment with
business priorities
“Bottom-Up” management &
inventory of data sources
Managing the people, process,
policies & culture around data
Coordinating & integrating
disparate data sources
Leveraging & managing data for
strategic advantage
Metadata is Hotter than ever
A Growing Trend
In a recent DATAVERSITY survey, over 80% of
respondents stated that:
Metadata is as important, if not more important,
than in the past.
Metadata Management Use Cases
• Leading Use Cases were similar in 2016 & 2017, according to two recent DATAVERSITY surveys (1):
  • Data Governance
  • Data Quality
  • Data Warehousing (DW) & Business Intelligence (BI)
  • Master Data Management (MDM)
• 2017 saw growth in:
  • Regulation & Audit (e.g. GDPR)
  • Master Data Management
(1) Emerging Trends in Metadata Management, 2016, DATAVERSITY, by Donna Burbank and Charles Roe
Trends in Data Architecture, 2017, DATAVERSITY, by Donna Burbank and Charles Roe
BI Reporting, Data Governance, and Metadata
Total Sales Figures seem wrong
in this report. How were they
calculated?
I need the answer for this
afternoon’s meeting.
Thanks!
Sure!
• With the rise of the data-driven organization, data in business intelligence reports has more visibility than ever.
• This visibility highlights data quality issues and drives the need for data lineage and governance.
Reality Can Be Complicated
[Diagram: Data Source 1, Data Source 2, through Data Source X]
The complexity of most data warehouse and BI systems makes manual documentation
feel like a Rube Goldberg diagram!
Data Lineage Automation
• Automated metadata lineage can help show the path from source system to BI report.
• The good news is that many systems have embedded metadata that can be inferred & captured by
automated tools.
Audit and Traceability
[Diagram: a Sales Report in a BI Tool stating “Total Sales for Customer X this Quarter are $1.5M”, linked through a Dimensional Data Model, a Logical Data Model, Physical Data Models, and ETL Tools to database tables named CUSTOMER, CUST, and TBL_C1]
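As a rough illustration of tracing a report back to its sources, lineage can be held as a graph and walked upstream. This is a minimal sketch, not how any specific tool works: the UPSTREAM edges, the schema prefixes (DW, STAGING, SOURCE), and the function name trace_to_sources are assumptions; only the table and report names echo the slide. It also assumes the lineage graph has no cycles.

```python
# Hypothetical lineage edges: each asset maps to the assets it reads from.
UPSTREAM = {
    "Sales Report": ["DW.CUSTOMER"],
    "DW.CUSTOMER": ["STAGING.CUST"],
    "STAGING.CUST": ["SOURCE.TBL_C1"],
    "SOURCE.TBL_C1": [],
}


def trace_to_sources(asset, upstream):
    """Return every path from `asset` back to a source (an asset with no upstream).

    Assumes the graph is acyclic; a cycle would recurse forever.
    """
    parents = upstream.get(asset, [])
    if not parents:
        return [[asset]]
    paths = []
    for parent in parents:
        for path in trace_to_sources(parent, upstream):
            paths.append([asset] + path)
    return paths


for path in trace_to_sources("Sales Report", UPSTREAM):
    print(" <- ".join(path))
```

Each printed path is one audit trail from the report to a source table, which is the kind of answer the "How were they calculated?" question calls for.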
Unlocking the Details of ETL Mappings
• In addition to high-level data flow mapping, detailed source to target mapping and
transformation is important to business intelligence lineage and governance.
The Devil is Often in the Details
Field to Field Mapping is Complex
Cust_No
Cust_Num
Associate ID Customer_Number
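One way to keep field-level mappings auditable is to store each source-to-target rule as data, next to the transformation it applies. In this sketch the field names Cust_No, Associate ID, and Customer_Number come from the slide; the target name Associate_ID, the transformation rules, and the helper functions are hypothetical.

```python
# Hypothetical source-to-target mappings. The transformations are invented
# for illustration; real ETL tools store this information in their own formats.
MAPPINGS = [
    {
        "source": "Cust_No",
        "target": "Customer_Number",
        "rule": "trim spaces, drop leading zeros",
        "transform": lambda value: value.strip().lstrip("0"),
    },
    {
        "source": "Associate ID",
        "target": "Associate_ID",
        "rule": "copy unchanged",
        "transform": lambda value: value,
    },
]


def apply_mappings(row, mappings):
    """Build the target row by applying each documented transformation."""
    return {m["target"]: m["transform"](row[m["source"]]) for m in mappings}


def describe(mappings):
    """Render the mappings as human-readable lineage lines."""
    return [f'{m["source"]} -> {m["target"]} ({m["rule"]})' for m in mappings]


print(apply_mappings({"Cust_No": " 00042 ", "Associate ID": "A7"}, MAPPINGS))
print("\n".join(describe(MAPPINGS)))
```

Because the same structure drives both the load and the documentation, the description of a mapping cannot silently drift away from what the load actually does.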
Managing & Governing Change
Hey man, for that big marketing launch next week,
we’re changing the product name – it’s really cool.
Could you make sure all the reports and systems
show the new name? Thanks!
Sure!
• Business today moves quickly, and technical systems need to keep pace.
• A change in one simple field can wreak havoc on downstream systems if not managed carefully -> metadata
management can help govern change and impact analysis.
Impact Analysis & Where Used
• Impact Analysis shows the relationship between data sources to assess the impact of a potential change.
• Driving Agility & Responsiveness
• Reducing Risk
• For example, if I change the length & name of a field, what other systems that are referencing that field will be affected?
• With this roadmap in place, it is easier to assess the impact of a proposed change, significantly reducing development and maintenance
time, and improving overall governance.
Proactively Showing the Impact of Change
What happens if I change the name &
length of the “Brand” field?
Brand CHAR(10) -> MyBrand VARCHAR(30)
[Diagram: systems that reference the field: Customer Database (Oracle), Sales Application, Sales Database (DB2), Staging Area, ETL, Data Warehouse, Sales Report]
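Impact analysis is the same lineage graph walked in the other direction. The sketch below uses a breadth-first search to list everything downstream of a changed system. The system names follow the slide, but the edges between them, the DOWNSTREAM dictionary, and the function name impacted_by are assumptions made for illustration.

```python
from collections import deque

# Hypothetical "feeds into" edges between the systems named on the slide.
DOWNSTREAM = {
    "Customer Database": ["Sales Application"],
    "Sales Application": ["Sales Database"],
    "Sales Database": ["Staging Area"],
    "Staging Area": ["Data Warehouse"],
    "Data Warehouse": ["Sales Report"],
}


def impacted_by(changed, downstream):
    """Return every system reachable from `changed`, nearest first."""
    seen = set()
    order = []
    queue = deque(downstream.get(changed, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already recorded through another path
        seen.add(node)
        order.append(node)
        queue.extend(downstream.get(node, []))
    return order


# What is affected if the "Brand" field changes in the Sales Database?
print(impacted_by("Sales Database", DOWNSTREAM))
```

The seen set keeps the walk safe even if two paths converge on the same system, so each affected system is reported once.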
Data Governance – Overarching Framework
• Organization & People
• Process & Workflows
• Data Management & Measures
• Culture & Communication
• Vision & Strategy
• Tools & Technology - Automation is critical
• Business Goals & Objectives
• Data Issues & Challenges
Managing the Complex Interactions between Technology, Process and People
Automated metadata management is a critical foundation for data governance.
Technical Metadata Makes Data Governance Actionable
• Metadata helps align data governance policies and make them actionable in physical systems,
maintaining a lineage & audit trail.
• How was a given field calculated on a report?
• Where is personal information (e.g. PII) used across the organization?
• Etc.
Policies & Procedures -> Business Rules & Definitions -> Technical Implementation -> Audit & Lineage
Technical Metadata Lineage makes Data Governance Policies Actionable.
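A question such as "Where is PII used across the organization?" becomes a simple lookup once columns carry governance tags. The following is a minimal sketch under that assumption; the CATALOG entries, the tag name, and the function where_used are hypothetical.

```python
# Hypothetical catalog of technical metadata with governance tags attached.
CATALOG = [
    {"system": "CRM", "table": "CUSTOMER", "column": "SSN", "tags": {"PII"}},
    {"system": "Data Warehouse", "table": "DIM_CUSTOMER", "column": "Soc_Num", "tags": {"PII"}},
    {"system": "Data Warehouse", "table": "FACT_SALES", "column": "Total_Sales", "tags": set()},
]


def where_used(tag, catalog):
    """Return the sorted (system, table, column) locations carrying `tag`."""
    return sorted(
        (entry["system"], entry["table"], entry["column"])
        for entry in catalog
        if tag in entry["tags"]
    )


for location in where_used("PII", CATALOG):
    print(".".join(location))
```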
Metadata Matters
Even with today’s advanced hardware & storage options, self-service BI tools, and data
science skills & tools, attention needs to be paid to the quality, context, & structure of data
(aka Metadata)
The absence of commonly understood and shared
metadata and data definitions and the lack of data
governance are cited as the main impediments to the
success of Data Lakes.
Source: Radiant Advisors
71% of interviewees surveyed in larger global
organizations expect data-driven digitization to help
their business grow. But…
• 70% say the biggest barrier is finding the right data
• 62% cite inconsistent data. Source: Stibo Systems
If I have to manually map data
lineage in one more spreadsheet,
I’m going to shoot myself.
Types of Metadata Managed in Today’s Organization (1)
Now
• Strong focus on:
  • Data Warehousing
  • Relational Databases
  • Data Models
  • Business Glossaries
  • Business Intelligence
  • ETL Tools
Future
• Focus on Data Warehouse & Relational systems continues.
• With more diverse sources added:
  • Big Data Platforms
  • Machine Learning/AI
  • Semantic Technologies
  • NoSQL Platforms
  • Legacy Platforms (?!) – retirees?
  • Social Media
  • Media Files
  • Etc.
(1) Emerging Trends in Metadata Management, 2016, DATAVERSITY, by Donna Burbank and Charles Roe
Diversity of Sources Makes Metadata Management More Challenging
In the 2017 DATAVERSITY Report on Trends in Data Architecture, two of the top
concerns from respondents around metadata management were:
• Tracking metadata and lineage across heterogeneous environments.
• More automation for metadata management.
Trends in Data Architecture, 2017, DATAVERSITY, by Donna Burbank and Charles Roe
Technical Innovation in Metadata Management
Technical Innovation has not only created new sources of metadata…
…but it has created new ways to manage metadata as well.
Machine Learning & Metadata Discovery
• Machine Learning offers ways to automate tedious tasks that may have been done manually before:
  • e.g. Data Mapping
    • SSN -> Field1_SSN
    • SSN -> Soc_Num
    • Etc.
  • Machine Learning Pattern Matching
    • NNN-NN-NNNN -> Field_X follows this pattern, it must be an SSN
Source: kdnuggets.com
• There is a place for both methods:
• Sometimes you want to define specific mapping rules
• Sometimes you want a pattern-matching, discovery-
style approach.
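The two approaches above can be combined in a few lines. Note that this sketch uses explicit name rules plus a plain regular expression over sample values; it is a stand-in for the discovery idea, not a machine learning model. The rule table, the 0.9 threshold, and the function name classify_column are assumptions.

```python
import re

# Discovery-style check: values shaped like NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")

# Specific mapping rules: known column names mapped to a business term.
NAME_RULES = {"field1_ssn": "SSN", "soc_num": "SSN"}


def classify_column(name, sample_values, threshold=0.9):
    """Return "SSN" if a name rule or the value pattern matches, else None."""
    rule_hit = NAME_RULES.get(name.lower())
    if rule_hit:
        return rule_hit
    values = [v for v in sample_values if v]  # ignore blanks and None
    if values:
        matches = sum(1 for v in values if SSN_PATTERN.fullmatch(v))
        if matches / len(values) >= threshold:
            return "SSN"
    return None


print(classify_column("Soc_Num", []))                              # matched by name rule
print(classify_column("Field_X", ["123-45-6789", "987-65-4321"]))  # matched by pattern
print(classify_column("Field_Y", ["hello", "world"]))              # no match
```

A pattern match is only evidence, since other identifiers can share the same shape, so a result like this is best treated as a suggestion for a data steward to confirm.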
Metadata Discovery Tools
• With the ever-growing sources of data…
• …and increased visibility on data due to regulation and business needs…
• Automation is critical for managing the complexity of today’s metadata environment.
• Automated population from common sources (reports, databases, ETL tools, etc.)
• Machine Learning capabilities facilitate mapping & lineage
• Visual data lineage to understand traceability, audit, and impact of change
[Diagram: Metadata Discovery Tools collect metadata from sources such as ERP, OLAP, Oracle, Teradata, SQL Server, Cognos, Informatica, ETL, SAP BO, and Tableau to provide Automated Lineage, Governance & Traceability]
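Automated population works because most systems already describe themselves. As a small, self-contained illustration, the sketch below reads table and column metadata from a database's own catalog. It uses SQLite from the Python standard library purely so the example can run anywhere; the function name harvest_schema and the sample table are assumptions, and enterprise platforms expose their catalogs through different interfaces.

```python
import sqlite3


def harvest_schema(conn):
    """Return {table: [(column, declared_type), ...]} from the database catalog."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# In-memory database with a sample table, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER (Cust_No INTEGER, Brand CHAR(10))")
print(harvest_schema(conn))
```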
Disrupting Metadata Management with Metadata Automation
JUST SOME OF MANY USE CASES
• Build New Processes
• M&A
• Fix Broken Processes
• Migration/Upgrades
• Fix Reports
• Changes & Impact Analysis
METADATA IS SCATTERED ALL OVER THE PLACE
• DB Sources: Marketing, CRM, ERP, Finance, HR, Other Sources
• ETL Tools: Informatica, DataStage (IBM), SSIS (MS), Others (Talend, etc.)
• Data Warehouse: Oracle, SQL Server (MS), Teradata, Vertica (Big Data), Others (Hadoop, etc.)
• Reporting & Analysis Tools: Cognos (IBM), BO (SAP), QlikView, Tableau, OBIEE (Oracle), SSAS (OLAP), Others (Sisense, etc.)
BI groups invest >50% of time and effort to manually find and understand metadata
Octopai - Cross Platform Metadata Management for BI (in IT org)
Demo
Questions?
Thoughts? Ideas?

The Missing Link in Enterprise Data Governance - Automated Metadata Management

  • 1. Donna Burbank, Managing Director, Global Data Strategy, Ltd. Amnon Drori, CEO, Octopai July 24th , 2018
  • 2. Donna Burbank Donna is a recognised industry expert in information management with over 20 years of experience in data strategy, information management, data modeling, metadata management, and enterprise architecture. Her background is multi-faceted across consulting, product development, product management, brand strategy, marketing, and business leadership. She is currently the Managing Director at Global Data Strategy, Ltd., an international information management consulting company that specializes in the alignment of business drivers with data-centric technology. In past roles, she has served in key brand strategy and product management roles at CA Technologies and Embarcadero Technologies for several of the leading data management products in the market. As an active contributor to the data management community, she is a long time DAMA International member, Past President and Advisor to the DAMA Rocky Mountain chapter, and was recently awarded the Excellence in Data Management Award from DAMA International in 2016. Donna is also an analyst at the Boulder BI Train Trust (BBBT) where she provides advice and gains insight on the latest BI and Analytics software in the market. She was on several review committees for the Object Management Group’s for key information management and process modeling notations. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences. She has co- authored two books: Data Modeling for the Business and Data Modeling Made Simple with ERwin Data Modeler was a contributor to the book Metadata Solutions, and is a regular contributor to industry publications. She can be reached at donna.burbank@globaldatastrategy.com Donna is based in Boulder, Colorado, USA. 2Follow on Twitter @donnaburbank
  • 3. AMNON DRORI CO-FOUNDER & CEO, OCTOPAI 3 Amnon has over 20 years of leadership experience in technology companies. Before co-founding Octopai he led sales efforts at companies like Panaya, Zend Technologies, ModusNovo and Alvarion, and also served as the Chief Revenue Officer at CoolaData, a big data behavioral analytics platform. Amnon studied Management and Computer Science at the Open University of Tel Aviv. Octopai was founded in 2015 by business intelligence professionals that saw a real pain point in the sector, Octopai’s SaaS solution fully automates metadata management and analysis, enabling enterprise BI groups to quickly, easily and accurately find and understand their data for improved reporting accuracy, regulation compliance, data modeling, data quality and data governance.
  • 4. Today’s Topic 4 • Companies are drowning in their data — so much data, from so many different sources. • They understand that data governance is hugely important, but many have not grasped the criticality of metadata in the process. • Metadata helps with locating data - a must for BI groups dealing with analytics and business user reporting. • Automating metadata management for data discovery and data lineage for BI is critical for enterprise data governance. • BI groups use Octopai to locate their data instantly, and to quickly and accurately visualize and understand the entire data journey. The Missing Link in Enterprise Data Governance: Automated Metadata Management
  • 5. Agenda • Data Governance and Metadata: The Critical Link • The Business Need: Why Metadata is hotter than ever • New Strategies & Approaches to support the ever-evolving data landscape • How Octopai can Help 5
  • 6. DataGovernance & Metadata– the Interdependency Metadata Management is critical to enforcing Data Governance Retail What is Data Governance? 1 Data Governance is the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets. Metadata Provides the means to deliver & enforce Data Governance Drives the need for What is Metadata? 1 Metadata “includes information about technical and business processes, data rules and constraints, and logical and physical data structures.” 1 From DAMA DMBOK
  • 7. Metadata is the “Who, What, Where, Why, When & How” of Data 7 Who What Where Why When How Who created this data? What is the business definition of this data element? Where is this data stored? Why are we storing this data? When was this data created? How is this data formatted? (character, numeric, etc.) Who is the Steward of this data? What are the business rules for this data? Where did this data come from? What is its usage & purpose? When was this data last updated? How many databases or data sources store this data? Who is using this data? What is the security level or privacy level of this data? Where is this data used & shared? What are the business drivers for using this data? How long should it be stored? Who “owns” this data? What is the abbreviation or acronym for this data element? Where is the backup for this data? When does it need to be purged/deleted? Who is regulating or auditing this data? What are the technical naming standards for database implementation? Are there regional privacy or security policies that regulate this data? Metadata is Data In Context
  • 8. Metadata is Part of a Larger Enterprise Landscape 8 A Successful Strategy Requires Many Inter-related Disciplines “Top-Down” alignment with business priorities “Bottom-Up” management & inventory of data sources Managing the people, process, policies & culture around data Coordinating & integrating disparate data sources Leveraging & managing data for strategic advantage
  • 9. Metadata is Hotter than ever 9 A Growing Trend In a recent DATAVERSITY survey, over 80% of respondents stated that: Metadata is as important, if not more important, than in the past.
  • 10. Metadata Management Use Cases 10 • Leading Use Cases were similar in 2016 & 2017, according to two recent DATAVERSITY surveys1: • Data Governance • Data Quality • Data Warehousing (DW) & Business Intelligence (BI) • Master Data Management (MDM) • 2017 saw growth in: • Regulation & Audit (e.g. GDPR) • Master Data Management 1 Emerging Trends in Metadata Management, 2016, DATAVERSITY, by Donna Burbank and Charles Roe Trends in Data Architecture, 2017, DATAVERSITY, by Donna Burbank and Charles Roe
  • 11. BI Reporting, Data Governance, and Metadata 11 Total Sales Figures seem wrong in this report. How were they calculated? I need the answer for this afternoon’s meeting. Thanks! Sure! • With the rise of the data-driven organization, data in business intelligence reports has more visibility than ever. • This visibility highlights data quality issues, and drives the need for data lineage and governance
  • 12. Data Source 1 Reality Can Be Complicated 12 Data Source 2 Data Source 1 Data Source X The complexity of most data warehouse and BI systems makes manual documentation feel like a Rube Goldberg diagram!
  • 13. Data Lineage Automation • Automated metadata lineage can help show the path from source system to BI report. • The good news is that many systems have embedded metadata that can be inferred & captured by automated tools. 13 Audit and Traceability Sales Report CUSTOMER Database Table CUST Database Table CUSTOMER Database Table CUSTOMER Database Table TBL_C1 Database Table ETL Tool ETL Tool Physical Data Model Physical Data Model Logical Data Model Dimensional Data Model BI Tool Total Sales for Customer X this Quarter are $1.5M
  • 14. Unlocking the Details of ETL Mappings • In addition to high-level data flow mapping, detailed source to target mapping and transformation is important to business intelligence lineage and governance. 14 The Devil is Often in the Details Field to Field Mapping is Complex Cust_No Cust_Num Associate ID Customer_Number
  • 15. Managing & Governing Change 15 Hey man, for that big marketing launch next week, we’re changing the product name – it’s really cool. Could you make sure all the reports and systems show the new name? Thanks! Sure! • Business today moves quickly, and technical systems need to keep pace. • A change in one simple field can wreak havoc on downstream systems if not managed carefully -> metadata management can help govern change and impact analysis.
  • 16. Impact Analysis & Where Used • Impact Analysis shows the relationship between data sources to assess the impact of a potential change. • Driving Agility & Responsiveness • Reducing Risk • For example, if I change the length & name of a field, what other systems that are referencing that field will be affected? • With this roadmap in place, it is easier to assess the impact of a proposed change, significantly reducing development and maintenance time, and improving overall governance. 16 Proactively Showing the Impact of Change What happens if I change the name & length of the “Brand” field? Brand CHAR(10) MyBrand VARCHAR(30) Customer Database Oracle Sales Application Sales Database DB2 Staging Area ETL Data Warehouse Sales Report
  • 17. Data Governance – Overarching Framework Organization & People Process & Workflows Data Management & Measures Culture & Communication Vision & Strategy Tools & Technology - Automation is critical Business Goals & Objectives Data Issues & Challenges Managing the Complex Interactions between Technology, Process and People Automated metadata management is a critical foundation for data governance.
  • 18. Technical Metadata Makes Data Governance Actionable • Metadata helps align data governance policies and make them actionable in physical systems, maintaining a lineage & audit trail. • How was a given field calculated on a report? • Where is personally identifiable information (PII) used across the organization? • Etc. [Diagram: Policies & Procedures; Business Rules & Definitions; Technical Implementation; Audit & Lineage. Caption: Technical Metadata Lineage makes Data Governance Policies Actionable.]
  • 19. Metadata Matters • Even with today’s advanced hardware & storage options, self-service BI tools, and data science skills & tools, attention needs to be paid to the quality, context, & structure of data (aka Metadata). • The absence of commonly understood and shared metadata and data definitions and the lack of data governance are cited as the main impediments to the success of Data Lakes. (Source: Radiant Advisors) • 71% of interviewees surveyed in larger global organizations expect data-driven digitization to help their business grow. But 70% say the biggest barrier is finding the right data, and 62% cite inconsistent data. (Source: Stibo Systems) [Cartoon caption: “If I have to manually map data lineage in one more spreadsheet, I’m going to shoot myself.”]
  • 20. Types of Metadata Managed in Today’s Organization [1] • Now: strong focus on Data Warehousing, Relational Databases, Data Models, Business Glossaries, Business Intelligence, and ETL Tools. • Future: focus on Data Warehouse & Relational systems continues, with more diverse sources added: Big Data Platforms, Machine Learning/AI, Semantic Technologies, NoSQL Platforms, Legacy Platforms (?!) – retirees?, Social Media, Media Files, etc. • [1] Emerging Trends in Metadata Management, 2016, DATAVERSITY, by Donna Burbank and Charles Roe
  • 21. Diversity of Sources Makes Metadata Management More Challenging • In the 2017 DATAVERSITY Report on Trends in Data Architecture, two of the top concerns from respondents around metadata management were: • Tracking metadata and lineage across heterogeneous environments. • More automation for metadata management. (Source: Trends in Data Architecture, 2017, DATAVERSITY, by Donna Burbank and Charles Roe)
  • 22. Technical Innovation in Metadata Management • Technical innovation has not only created new sources of metadata, it has created new ways to manage metadata as well.
  • 23. Machine Learning & Metadata Discovery • Machine Learning offers ways to automate tedious tasks that may have been done manually before: • e.g. Data Mapping • SSN -> Field1_SSN • SSN -> Soc_Num • Etc. • Machine Learning Pattern Matching • NNN-NN-NNNN -> Field_X follows this pattern, so it must be an SSN • There is a place for both methods: • Sometimes you want to define specific mapping rules • Sometimes you want a pattern-matching, discovery-style approach. (Image source: kdnuggets.com)
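The two approaches on this slide, explicit mapping rules and pattern-based discovery, can be combined in a few lines. The sketch below is a simplified stand-in that uses a fixed regular expression rather than a trained machine learning model; the rule table `NAME_RULES`, the function `classify_column`, and the 90% match threshold are all assumptions chosen for illustration.

```python
import re

# NNN-NN-NNNN, the SSN layout named on the slide.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")

# Explicit mapping rules: known column names mapped to a business term.
# These entries mirror the slide's examples; a real rule set would be larger.
NAME_RULES = {"field1_ssn": "SSN", "soc_num": "SSN"}


def classify_column(name, sample_values, threshold=0.9):
    """Return "SSN" if the column looks like one, otherwise None."""
    # 1. Rule-based: trust an explicit name mapping when one exists.
    rule_hit = NAME_RULES.get(name.lower())
    if rule_hit:
        return rule_hit

    # 2. Discovery-style: profile sampled values against the pattern.
    values = [v for v in sample_values if v]  # ignore empty or missing values
    if not values:
        return None
    matches = sum(1 for v in values if SSN_PATTERN.fullmatch(v))
    if matches / len(values) >= threshold:
        return "SSN"
    return None


print(classify_column("Soc_Num", []))
print(classify_column("Field_X", ["123-45-6789", "987-65-4321"]))
print(classify_column("Notes", ["call back", "123-45-6789"]))
```

The threshold allows for a few dirty values in an otherwise consistent column. A production discovery tool would likely learn patterns from labeled examples instead of relying on one hard-coded expression.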
  • 24. Metadata Discovery Tools • With the ever-growing sources of data… • …and increased visibility on data due to regulation and business needs… • Automation is critical for managing the complexity of today’s metadata environment. • Automated population from common sources (reports, databases, ETL tools, etc.) • Machine Learning capabilities facilitate mapping & lineage • Visual data lineage to understand traceability, audit, and impact of change [Diagram: Metadata Discovery Tools: Automated Lineage, Governance & Traceability. Sources shown: ERP, OLAP, Oracle, Teradata, SQL Server, Cognos, Informatica, ETL, SAP BO, Tableau.]
  • 25. Disrupting Metadata Management with Metadata Automation
  • 26. JUST SOME OF MANY USE CASES • Build New Processes • M&A • Fix Broken Processes • Migration/Upgrades • Fix Reports • Changes & Impact Analysis
  • 27. METADATA IS SCATTERED ALL OVER THE PLACE • BI groups invest >50% of time and effort to manually find and understand metadata • Octopai – Cross Platform Metadata Management for BI (in IT org) • DB Sources: Marketing, CRM, ERP, Finance, HR, Other Sources • ETL Tools: Informatica, DataStage (IBM), SSIS (MS), Others (Talend, etc.) • Data Warehouse: Oracle, SQL Server (MS), Teradata, Vertica (Big Data), Others (Hadoop, etc.) • Reporting & Analysis Tools: Cognos (IBM), BO (SAP), QlikView, Tableau, OBIEE (Oracle), SSAS (OLAP), Others (Sisense, etc.)