Conformed Dimensions of Data Quality –
An Organized Approach to Data Quality Measurement
-Dan Myers (dan@DQMatters.com)
8/15/2017
1
Agenda
Introduction
▪4 W’s
-What are the Dimensions of Data Quality?
-Why do organizations use the dimensions?
-Where to use them in the SDLC?
-Which set do I use?
▪Disagreement in the Industry about Definition & Scope
▪Reasons to Agree Upon a Cross-Industry Standard
▪Conformed Dimensions Website
▪Annual Dimensions of Data Quality Survey & Report
▪Q & A
2
What Are The
Dimensions of
Data Quality?
3
Definition: The Conformed Dimensions of Data Quality are categories used to characterize data and its fitness for use.
Application: These can be applied in any industry to assess,
measure, track and communicate information and data
quality.
Why do I
need the
Dimensions
of Data
Quality?
4
Because they add value:
a) Act as a quick reference, checklist, and guide to quality standards
b) Can be used as a framework to segment DQ efforts
c) Enable people to communicate the current and desired state of data (e.g. survey based)
d) Reuse of existing categories and definitions enables faster implementation times
e) Match dimensions against a business need and prioritize which assessments to complete first¹
f) Understand what you will (and will not) get from assessing each dimension.¹
Where are they used:
• To define DQ measures on scorecards and dashboards (see the sketch after this slide)
• In conversation
• Embedded in instructions or on forms
• Included in Service Level Agreements
• Throughout the SDLC (next slide)
References: 1. McGilvray, 2008 p. 30-31
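To make the scorecard and dashboard use above concrete, here is a minimal sketch (not part of the original deck) of a single DQ measure that a scorecard could display. It is written in Python; the records, the "email" field, and the 95% target are hypothetical examples.

```python
# Minimal sketch: a Completeness (fill rate) measure for a DQ scorecard.
# The records, the "email" field, and the 95% target are hypothetical examples.

records = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 3, "email": "c@example.com"},
    {"customer_id": 4, "email": ""},
]

def fill_rate(rows, field):
    """Share of rows where the field is populated (not null / not blank)."""
    populated = sum(1 for r in rows if r.get(field) not in (None, ""))
    return populated / len(rows) if rows else 0.0

TARGET = 0.95  # hypothetical service-level target shown on the dashboard

score = fill_rate(records, "email")
status = "PASS" if score >= TARGET else "FAIL"
print(f"Completeness (email fill rate): {score:.0%} -> {status}")
```

The same pattern (a named measure, a calculation, and a target) is what typically gets embedded in Service Level Agreements or scorecard tiles.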
Where to use
them in the
SDLC?
5
Ideation
• Innovation starts with the customer and data
• Know thy data (using a quality lens ensures success)

Conceptualization & Initiation
• Improved customer service comes from minor enhancements
• Innovative product development comes from meeting unmet needs (connecting the dots)

Requirements
• New application development (eyes wide open about the data available)
• Discussing why quality is needed -> improved process design and execution

Design
• Data models and levels of abstraction accomplished faster
• Strong focus on error handling inevitably benefits functionality too!

Build
• If sample data is available, unit testing outcomes are improved (see the test sketch after this slide)
• Data inherently forces focus on the desired outcome, not intermediate functionality

Test
• Similar to the build phase for test cases; usability increases with high- and low-quality sample data, and business users are more engaged when they see their own data (use cases) during UAT

Go Live & Support
• Go live is focused on the customer and on improvement if the prior phases include a data quality focus, which means faster response to customers and a more agile organization
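As a rough illustration of the Build and Test points above, here is a minimal sketch (not from the deck) of a data-driven unit test using Python's built-in unittest. The transformation, the high- and low-quality sample rows, and the validity rule are all hypothetical.

```python
# Minimal sketch: using sample data in unit tests, as suggested for the
# Build and Test phases. The transformation, sample rows, and rule are
# hypothetical illustrations, not part of the original deck.
import unittest

def standardize_country(code):
    """Toy transformation under test: normalize a country code to upper case."""
    return code.strip().upper() if isinstance(code, str) and code.strip() else None

GOOD_SAMPLE = ["us", "JP ", "de"]          # high-quality sample data
BAD_SAMPLE = ["", None, "unknown-value!"]  # low-quality sample data

class CountryCodeQualityTest(unittest.TestCase):
    def test_good_sample_is_valid(self):
        # Validity rule: output must be a two-letter upper-case code.
        for code in GOOD_SAMPLE:
            out = standardize_country(code)
            self.assertIsNotNone(out)
            self.assertRegex(out, r"^[A-Z]{2}$")

    def test_bad_sample_is_flagged(self):
        # Low-quality inputs must not silently pass as valid codes.
        for code in BAD_SAMPLE:
            out = standardize_country(code)
            self.assertFalse(
                out is not None and len(out) == 2 and out.isalpha() and out.isupper()
            )

if __name__ == "__main__":
    unittest.main()
```

The point is simply that the low-quality sample exercises the same validity rule the business cares about, well before UAT.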
Which set do I use?
6
• Wang & Strong (1996 or 2002) or JDQ (2006): Solid, but confusion between Timeliness & Currency.
• Danette McGilvray (2008): Practical and integrated with the Ten Steps Methodology™.
• English (2009) & Redman (1996): Strong technical and logical basis. Lacks complete descriptions and a hierarchy.
• ISO/IEC 25012:2008: Interconnected to other ISO standards => strong systems focus. Lacks a hierarchy and is overly context sensitive.

ICIQ research due Oct 6, 2017 – Brian Blake & Dan Myers
Title: An Evaluation of the Conformed Dimensions of Data Quality in Application to an Existing Information Quality Privacy-Trust Research Framework
7
Disagreement in the Industry about Definition & Scope
Confusion about the meaning of each dimension led me to write a series of articles for
Information-Management.com, titled “The Value of Using the Dimensions of Data Quality” (2013)
• Compared six authors' definitions of the dimensions of data quality
• Proposed an initial version of a Conformed set of Dimensions of Data Quality
http://dqm.mx/imdm2013
Reasons to Agree
Upon a Cross-Industry
Standard
8
▪Communication-
-Provide language to communicate DQ requirements
▪Efficiency-
-Enables faster implementation times based on
decreased argument between implementation team
members (local)
-Discourages repetitive philosophical arguments on
the same topic (global)
▪Measurement- if it isn’t measured it can’t be managed
-Consistency between organizations enables
comparisons used to benchmark and improve
-Provides framework to define more detailed
measurements associated with sub-concepts
▪Teaching- Provides a solid framework for teaching
Survey conducted 3 years in a row shows general consistency of:
- 45-55% Very Interested respondents
- 29-39% Somewhat Interested respondents
9
Website
Site: http://DimensionsOfDataQuality.com
Blog: http://DimensionsOfDataQuality.com/blog
Blog Signup: http://eepurl.com/cEgkJj
(Slide shows site screenshots: summary level, detailed level, blog, blog history)
Whitepaper Delivered
After Webinar
10
Available for the last three years: 2015 || 2016 || 2017
http://dqm.mx/cddq-report2017
Survey
Methodology
How was the survey conducted?
11
▪ Web-based survey over a one-month (April 2017) period of time
▪ Advertised via LinkedIn, Twitter, CDDQ Website, referral and prior-
year sign-up
▪ 48 Complete survey responses
▪ Near-zero dollar budget to promote the survey
▪ There is likely a response bias: respondents (organizations) already aware of the dimensions of data quality concept are more likely to feel comfortable completing the survey, so organizations that have already implemented a version of the dimensions may be over-represented.
▪ Where does your organization fit into this picture? Why?
▪ Get a copy of the Conformed Dimensions from the website and start today if
you haven’t already.
12
Results of the
2017 Annual
Dimensions of
Data Quality
Survey
13
▪How well is your set of dimensions governed?
▪Would it help if they were better documented? Consider the
Conformed Dimensions.
Results of
the Survey
▪How many of these dimensions does your organization use?
14
15
Take care not to infer too much meaning from the ranks, given the small sample size of the survey as a whole.
Results of
the Survey
16
Grouping -> List of Industries in Group
Tier 1:
- Finance/Banking/Accounting (19.1%)
- Healthcare/Medical/Pharmaceutical/Biotech (10.6%)
- Insurance/Legal/Real Estate (8.5%)
Tier 2 (Industries at 6%):
- Government – State/Retail/Manufacturing/Software Development/Application Development/Consultant/Business Service/Other
Tier 3 (Industries from 2-4%):
- Utilities/Chemicals/Mining/Petroleum/Textiles/Government – Federal/Media/Entertainment/Transportation/Logistics
Tier 4 (No representation, 0%):
- Entrepreneur/ISP/Web Host/IT Services Outsourcer/Education/Government/Military/Public Administration
Professional Profile
17
Dan Myers is Principal Info Quality Educator at DQMatters, an eLearning organization focused on Information Quality training and consulting.
In previous roles Dan has managed business intelligence teams, led architecture reviews of data management tools (metadata, data quality, etc.), and implemented associated governance programs. In his role at Farmers Insurance, he authored the Finance-led data governance policies for integration/sourcing, metadata, and data quality. Previously Dan worked as an independent Oracle Certified Professional consultant in both front-end and back-end development capacities. Dan's fluency in Japanese enabled him to work in both the public and private sectors in Japan. Dan received his MBA from the USC Marshall School of Business in 2009.
Contact: dan@DQMatters.com
Twitter: @kiwidankun or @dqmatters
18
Q&A
19
Version 3.4.1. http://dimensionsofdataquality.com/alldimensions
The 11 Conformed Dimensions, each with its definition, underlying concepts, and non-standard terminology:

Completeness
- Definition: Completeness measures the degree of population of data values in a data set.
- Underlying Concepts: Record Population, Attribute Population, Truncation, Existence
- Non-Standard Terminology: Fill Rate, Coverage, Usability, Scope

Accuracy
- Definition: Accuracy measures the degree to which data factually represents its associated real-world object, event, or concept, or alternatively matches the agreed-upon source(s).
- Underlying Concepts: Agree with Real-world, Match to Agreed Source
- Non-Standard Terminology: Consistency

Consistency
- Definition: Consistency measures whether or not data is equivalent across systems or locations of storage.
- Underlying Concepts: Equivalence of Redundant or Distributed Data, Format Consistency
- Non-Standard Terminology: Integrity, Concurrence, Coherence

Validity
- Definition: Validity measures whether a value conforms to a preset standard.
- Underlying Concepts: Values in Specified Range, Values Conform to Business Rule, Domain of Predefined Values, Values Conform to Data Type, Values Conform to Format
- Non-Standard Terminology: Accuracy, Integrity, Reasonableness, Compliance

Timeliness
- Definition: Timeliness is a measure of the time between when data is expected versus made available.
- Underlying Concepts: Time Expectation for Availability, Manual Float
- Non-Standard Terminology: Currency, Lag Time, Latency, Information Float

Currency
- Definition: Currency measures how quickly data reflects the real-world concept that it represents.
- Underlying Concepts: Current with World it Models
- Non-Standard Terminology: Timeliness

Integrity
- Definition: Integrity measures the structural or relational quality of data sets.
- Underlying Concepts: Referential Integrity, Uniqueness, Cardinality
- Non-Standard Terminology: Validity, Duplication

Accessibility
- Definition: Accessibility measures how easy it is to acquire data when needed, how long it is retained, and how access is controlled.
- Underlying Concepts: Ease of Obtaining Data, Access Control, Retention
- Non-Standard Terminology: Availability

Precision
- Definition: Precision measures the number of decimal places and rounding of a data value, or the level of aggregation.
- Underlying Concepts: Precision of Data Value, Granularity
- Non-Standard Terminology: Coverage, Detail

Lineage
- Definition: Lineage measures whether factual documentation exists about where data came from, how it was transformed, where it went, and whether an end-to-end graphical illustration exists.
- Underlying Concepts: Source Documentation, Segment Documentation, Target Documentation, End-to-End Graphical Documentation
- Non-Standard Terminology: (none listed)

Representation
- Definition: Representation measures ease of understanding data, consistency of presentation, appropriate media choice, and availability of documentation (metadata).
- Underlying Concepts: Easy to Read & Interpret, Presentation Language, Media Appropriate, Metadata Availability, Includes Measurement Units
- Non-Standard Terminology: Presentation
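As an illustration of how a few of these dimensions can be turned into measurements, here is a minimal Python sketch (not part of the original deck). The data set, rules, dates, and thresholds are hypothetical; in practice each dimension's measurement would be defined by agreed business rules.

```python
# Minimal sketch: scoring three Conformed Dimensions on a toy data set.
# Rows, fields, and rules are hypothetical; they only illustrate the idea
# that each dimension becomes one or more concrete measurements.
from datetime import date

rows = [
    {"id": 1, "age": 34,   "state": "CA", "loaded": date(2017, 8, 1)},
    {"id": 2, "age": None, "state": "XX", "loaded": date(2017, 8, 3)},
    {"id": 3, "age": 230,  "state": "WA", "loaded": date(2017, 8, 2)},
]

# Completeness: attribute population of "age" (share of non-null values).
completeness = sum(r["age"] is not None for r in rows) / len(rows)

# Validity: values in a specified range (0-120) and in a domain of predefined values.
valid_age = sum(r["age"] is not None and 0 <= r["age"] <= 120 for r in rows) / len(rows)
valid_state = sum(r["state"] in {"CA", "WA", "OR"} for r in rows) / len(rows)

# Timeliness: data expected by Aug 2, 2017 versus when it was actually made available.
expected_by = date(2017, 8, 2)
timeliness = sum(r["loaded"] <= expected_by for r in rows) / len(rows)

print(f"Completeness (age populated): {completeness:.0%}")
print(f"Validity (age in range):      {valid_age:.0%}")
print(f"Validity (state in domain):   {valid_state:.0%}")
print(f"Timeliness (loaded on time):  {timeliness:.0%}")
```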
20
Underlying concepts, by Conformed Dimension, with definitions:

Completeness
- Record Population: This measures whether a row is present in a data set (table).
- Attribute Population: This measures whether a value is present (not null) for an attribute (column).
- Truncation: This measures whether the value contains all characters of the correct value.
- Existence: Existence identifies whether a real-life fact has been captured as data.

Accuracy
- Agree with Real-world: The degree to which data factually represents its associated real-world object, event, or concept.
- Match to Agreed Source: The measure of agreement between data and the source of that data. This is used when the data represent intangible objects or transactions that can't be observed visually.

Consistency
- Equivalence of Redundant or Distributed Data: The measure of similarity with other sources of data that represent the same concept.
- Format Consistency: This measures the conformity of format of the same data in different places.
- Logical Consistency: Logical consistency measures whether two attributes of related data are conceptually in agreement, even though they may not record the same characteristic of a fact.

Validity
- Values in Specified Range: Values must be between some lower number and some higher number.
- Values Conform to Business Rule: Validity measures whether values adhere to some declarative formula.
- Domain of Predefined Values: This is a set of permitted values.
- Values Conform to Data Type: Validity measures whether values have a specific characteristic (e.g. Integer, Character, Boolean). Data types restrict what values can exist, the operations that can be used on them, and the way that the data is stored.
- Values Conform to Format: Validity measures whether the data are arranged or composed in a predefined way.

Timeliness
- Time Expectation for Availability: The measure of time between when data is expected versus made available.
- Manual Float: Manual float is a measure of the time from when an observation is made to the point it is recorded in electronic format.

Currency
- Current with World it Models: Data is current if it reflects the present state of the concept it models.

Integrity
- Referential Integrity: Referential integrity measures whether a value (foreign key), when used, references an existing key (primary key) in the parent table.
- Uniqueness: Uniqueness measures whether each fact is uniquely represented.
- Cardinality: Cardinality describes the relationship between one data set and another, such as one-to-one, one-to-many, or many-to-many.

Accessibility
- Ease of Obtaining Data: This measures how easy it is to obtain data.
- Access Control: Access control includes the identification of a person who wants to access data, authentication of their identity, review and approval to access the required data, and lastly auditing the access of that data.
- Retention: Retention refers to the period of time that data is kept before being removed from a database through purge or archive processing.

Precision
- Precision of Data Value: The measure of preciseness of numeric data using decimal places, rounding and truncation.
- Granularity: The detail or summary level of data defines the granularity, measured by the number of attributes used to represent a single concept.

Lineage
- Source Documentation: Source documentation provides data provenance, which describes the origin of the data.
- Segment Documentation: Segment documentation describes how data is transformed and transported from one location to another.
- Target Documentation: Documentation about the target explains where the data moved to and how it is stored.
- End-to-End Graphical Documentation: End-to-end documentation provides a diagrammatic visual representation of how the data flows from beginning to end.

Representation
- Easy to Read & Interpret: Illustrations and charts should be self-explanatory and presented with appropriate labels, providing context.
- Presentation Language: Data that is represented well is simple but elegantly formed, with good grammar, and presented in a standard way.
- Media Appropriate: The appropriate media (e.g. web-based, hardcopy, or audio, etc.) are provided.
- Metadata Availability: Comprehensive descriptions and other information about the characteristics of the data are provided in plain language.
- Includes Measurement Units: Well represented data includes the scale of measurement, such as weight, height, distance, etc.
Version 3.4.1. http://dimensionsofdataquality.com/content/list-underlying-concepts
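To ground two of the Integrity underlying concepts above, here is a minimal Python sketch (not from the deck). The customer and order tables, and the key names, are hypothetical; it simply flags duplicate keys (Uniqueness) and orphaned foreign keys (Referential Integrity).

```python
# Minimal sketch: two Integrity underlying concepts, Uniqueness and
# Referential Integrity, checked against hypothetical parent/child tables.
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 2}]  # parent
orders = [{"order_id": 10, "customer_id": 1},
          {"order_id": 11, "customer_id": 3}]                             # child

# Uniqueness: each fact (customer_id) should be represented only once.
ids = [c["customer_id"] for c in customers]
duplicates = {i for i in ids if ids.count(i) > 1}

# Referential Integrity: every foreign key in orders must exist in customers.
known_ids = set(ids)
orphans = [o["order_id"] for o in orders if o["customer_id"] not in known_ids]

print(f"Duplicate customer_ids: {sorted(duplicates)}")  # -> [2]
print(f"Orders with unknown customer_id: {orphans}")    # -> [11]
```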