© Copyright 2021 by Peter Aiken Slide # 1
peter.aiken@anythingawesome.com +1.804.382.5957 Peter Aiken, PhD
Approaching
Data Quality
Engineering Business Success Stories
Peter Aiken, Ph.D.
• I've been doing this a long time
• My work is recognized as useful
• Associate Professor of IS (vcu.edu)
• Institute for Defense Analyses (ida.org)
• DAMA International (dama.org)
• MIT CDO Society (iscdo.org)
• Anything Awesome (plusanythingawesome.com)
• Experienced w/ 500+ data
management practices worldwide
• Multi-year immersions
– US DoD (DISA/Army/Marines/DLA)
– Nokia
– Deutsche Bank
– Wells Fargo
– Walmart …
• 12 books and
dozens of articles
https://anythingawesome.com
• DAMA International President 2009-2013/2018/2020
• DAMA International Achievement Award 2001
(with Dr. E. F. "Ted" Codd)
• DAMA International Community Award 2005
Approaching Data Quality
Engineering Success Stories
Program
• Approaching Data Quality
– Definitions
– Causes can be difficult to discern
– Data quality challenges are the
root cause of most IT and
business failures
– Must be built on leverage
– Requires a programmatic
approach to be most effective
– Early business cases
often have a dual purpose
– High quality data requires
architecture and engineering
• What do we need to get
better at?
– Systems thinking
– Not looking at data
quality in isolation
– Understanding data ROT
– Not underestimating
the role of culture
– Developing repeatable
capabilities/core data
quality expertise
• How do we get better?
– Refocus the request around
business outcomes
– Leadership
– Program focus
– Math (cost or investment?)
– Storytelling/Practice
• Takeaways and Q&A
https://www.youtube.com/watch?v=uL2PsmlGn9g
Definitions
• Quality Data
– Fit for purpose: meets the requirements of its authors,
users, and administrators (from Martin Eppler)
– Synonymous with information quality, since poor data
quality results in inaccurate information and poor performance
• Data Quality Management
– "Planning, implementation and control activities that apply quality
management techniques to measure, assess, improve, and
ensure data quality"
– Encompasses life cycle activities
– Includes supporting processes such as change management
– Continuous improvement process requiring core capabilities
• Data Quality Engineering
– Recognition that data quality solutions cannot be managed but
must be engineered
– Data quality engineering concepts are generally not known or understood
within IT or business!
Spinach/Popeye story from http://it.toolbox.com/blogs/infosphere/spinach-how-a-data-quality-mistake-created-a-myth-and-a-cartoon-character-10166
DQ Effort Pattern
from The DAMA Guide to the Data Management Body of Knowledge © 2017 by DAMA International
80% time spent 20%
Hidden Data Factories
https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year
Work products are
delivered to
Customers
Customers
Knowledge Workers
80% looking for stuff
20% doing useful work
Department B
1. Check A's work
2. Make any corrections
3. Complete B's work
4. Deliver to Department C
Department A
https://en.wikipedia.org/wiki/Theory_of_constraints
Department C
1. Check B's work
2. Make any corrections
3. Complete C's work
4. Deliver to Customer
5. Deal with consequences
Blind Persons and the Elephant
http://www.dailymirror.lk/print/opinion/editorial-we-need-to-become-channels-of-peace/172-27164
It is like a fan!
It is like a snake!
It is like a wall!
It is like a rope!
It is like a tree!
Events are not always recognized as data quality challenges?
• Letters from a bank
• A very expensive, very
small data rounding error
• Health data story
• The chocolate story
• Covid-19
A congratulations
letter from another
bank
Problems
• Bank did not know it
made an error
• Tools alone could not
have prevented this
error
• Lost confidence in the
ability of the bank to
manage customer
funds
• Needed trench for electrical
cable 2.52" - delivered 2.5"
• $1M required to rent other
facilities while new cable is obtained
• Either rounding or truncation could explain
– "We need to get a summary on all of this," he said.
"How did the mistake occur? Who's at fault? What
are the damages? And how is money going to be recovered?"
Port of Seattle
2.52" ➜ 2.5"
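A minimal sketch of why either rounding or truncation could explain the Port of Seattle error. It assumes the specification was reduced to one decimal place somewhere along the way; the function name `to_one_decimal` is hypothetical. Both operations produce the same 2.5, so the delivered value alone cannot reveal which one was applied.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def to_one_decimal(value: str, mode: str) -> Decimal:
    """Reduce a measurement to one decimal place using the given rounding mode."""
    return Decimal(value).quantize(Decimal("0.1"), rounding=mode)

required = "2.52"  # inches, as stated on the slide

rounded = to_one_decimal(required, ROUND_HALF_UP)   # conventional rounding
truncated = to_one_decimal(required, ROUND_DOWN)    # truncation (drop extra digits)
lost = Decimal(required) - rounded                  # precision lost either way

print(f"rounded={rounded} truncated={truncated} lost={lost}")
```

Keeping the value as a decimal string with its full precision, and rejecting any step that shortens it, is one way to catch this kind of loss before it reaches a supplier.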
This research, published as a letter
this week in the British Medical
Journal, was meant to draw attention
to how much data gets entered
incorrectly in the country’s medical
system. These guys weren’t turning
up at the doctor for pregnancy-
related services. Instead, they were
at their doctor for procedures that
had medical codes similar to those of
midwifery and obstetric services.
With a misplaced keystroke here or
there, an annual physical could
become a consultation with a
midwife.
Why Britain has 17,000 pregnant men
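A cross-field plausibility check is one way such miscoded visits could be surfaced. The sketch below is illustrative only: the procedure codes, the `Visit` record, and the `implausible_visits` function are all hypothetical and are not taken from any real medical coding system.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical procedure codes, invented for illustration only.
OBSTETRIC_CODES = {"OB-501", "OB-560"}

@dataclass
class Visit:
    patient_id: str
    recorded_sex: str      # "M" or "F" as captured at registration
    procedure_code: str

def implausible_visits(visits: List[Visit]) -> List[Visit]:
    """Return visits whose procedure code conflicts with the recorded sex."""
    return [
        v for v in visits
        if v.recorded_sex == "M" and v.procedure_code in OBSTETRIC_CODES
    ]

visits = [
    Visit("p1", "M", "GP-500"),   # annual physical (hypothetical code)
    Visit("p2", "M", "OB-501"),   # likely a keystroke error
    Visit("p3", "F", "OB-560"),
]
flagged = implausible_visits(visits)
print([v.patient_id for v in flagged])
```

A check like this flags records for review; it does not decide which field is wrong.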
Are you gonna tell us the chocolate story again?
Why using Microsoft's tool caused Covid-19 results to be lost
https://www.bbc.com/news/technology-54423988?es_p=12801491
• Since 2007 should have
been forced to use .xlsx
(1,000,000+ rows)
• Used .xls (65,000 rows)
• Additional data was
dropped without
notification
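The failure described above is silent truncation. A hedged sketch of a guard that fails loudly instead: the slide's figures are approximations of the per-sheet row limits, which are 65,536 for .xls and 1,048,576 for .xlsx. The function and exception names are hypothetical.

```python
XLS_MAX_ROWS = 65_536       # per-sheet row limit of the legacy .xls format
XLSX_MAX_ROWS = 1_048_576   # per-sheet row limit of the .xlsx format

class RowLimitExceeded(Exception):
    """Raised instead of silently dropping rows."""

def check_fits(row_count: int, file_format: str) -> int:
    """Return the remaining row headroom, or raise if rows would be lost."""
    limits = {"xls": XLS_MAX_ROWS, "xlsx": XLSX_MAX_ROWS}
    limit = limits[file_format]
    if row_count > limit:
        raise RowLimitExceeded(
            f"{row_count} rows exceed the .{file_format} limit of {limit}; "
            f"{row_count - limit} rows would be lost"
        )
    return limit - row_count

print(check_fits(60_000, "xls"))    # fits, with some headroom
print(check_fits(70_000, "xlsx"))   # fits easily in the newer format
```

Calling `check_fits(70_000, "xls")` raises `RowLimitExceeded`, which is the notification the original process lacked.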
Practice-Oriented
• Failure in rigor when capturing/
manipulating data
• Allowing imprecise or incorrect data to
be collected when requirements specify
otherwise
• Presenting data out of sequence
Structure-Oriented
• Data and metadata arranged
imperfectly
• Data is captured but inaccessible
• When incorrect data is provided as
the correct response
Practice-oriented
activities focus on the
capture and manipulation
of data
Data quality best
practices depend
on both
Structure-oriented
activities focus on the
data implementation
Quality
"Fit for
purpose"
Data
Poor data manifests as multifaceted organizational challenges
Root cause analysis is required to diagnose
[Diagram: business challenges linked to IT systems, IT processes, business systems, and business processes, all leading to poor results]
Many DQ challenges are unique and/or context specific!
Burning Bridge
• Something bad happened
– Imperfect data was to blame
• Someone needs to fix
– Poor quality data
• You currently have
management's attention
– It is wise to ensure you
also have their understanding
• "Do something" often
leads to "Buy something"
– Mostly technology-based
• Get data quality-ing!
– A fool with a tool is still a fool
• Something is accomplished
– Most often all the funding is used up
• Early cases have a dual purpose
– Make the case that this will fix the
immediate challenge
– Illustrate why a programmatic approach
is preferable
Leverage is an Engineering Concept
• Using proper engineering techniques, a human can lift a load that
weighs much more than the human
1 kg
10 kg
11 kg
A holistic approach to obtaining data leverage
Organizational
Data
Knowledge workers
supplemented by
data professionals
Process
Guided by strategy
https://www.computerhope.com/jargon/f/framework.htm
People
Technology
Reducing ROT increases data leverage
Data Leverage is a multi-use concept
• Permits organizations to better manage their data
– Within the organization, and
– With organizational data exchange partners
– In support of the organizational mission
• Leverage
– Obtained by implementation of data-centric
technologies, processes, and human skill sets
– Focus on the non-ROT data
• The bigger the organization, the greater potential leverage exists
• Treating data more asset-like simultaneously
– Lowers organizational IT costs and
– Increases organizational knowledge worker productivity
Concrete example of data leverage
• Reference
– Controls accessible data values
• Master
– Controls access to system capabilities
• Transaction
– Instances of values
Countries where we do business?
Types of accounts available?
Controlled vocabulary items
Are you a member of our premium club?
Authorizing uses/users?
Common/standard data structures
$5
Authorized
Like !
Example based on: Dr. Christopher Bradley of DMAdvisors–he has more, ping him at chris.bradley@dmadvisors.co.uk
Cannot do business overseas?
Cannot determine product origin?
Cannot add a foreign language to the website?
Cannot select a valid menu item?
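The three data types in this example can be sketched in a few lines. This is an illustration under assumptions, not the example's actual design: the country set, customer records, and `authorize` function are hypothetical. Reference data controls which values are acceptable, master data controls access to capabilities, and a transaction is an instance of values checked against both.

```python
# Reference data: the controlled vocabulary of countries where we do business.
REFERENCE_COUNTRIES = {"US", "DE", "JP"}

# Master data: who the customers are and what they may use.
MASTER_CUSTOMERS = {
    "c-1": {"premium": True},
    "c-2": {"premium": False},
}

def authorize(customer_id: str, country: str, amount: int,
              premium_only: bool = False) -> str:
    """Check one transaction against reference and master data."""
    if country not in REFERENCE_COUNTRIES:
        return "rejected: country not in reference data"
    customer = MASTER_CUSTOMERS.get(customer_id)
    if customer is None:
        return "rejected: unknown customer"
    if premium_only and not customer["premium"]:
        return "rejected: premium members only"
    return f"authorized: ${amount}"

print(authorize("c-1", "US", 5))
print(authorize("c-1", "FR", 5))
print(authorize("c-2", "US", 5, premium_only=True))
```

Adding a country to the one reference set changes what every transaction accepts, which is the leverage the slide describes.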
Simple Math
• At the beginning of a project,
• Where the parties know the least about each other
• All are expected to agree on the meaning of price, timing, and
functionalities
• Define X (some resources)
• Define Y (cleaning 1 set of data)
• Define Z (that data will be clean)
If X is invested in Y then outcome Z will result (Z > X)
Simple Math
• Define X ($100)
• Define Y (cleaning 1 set of data)
• Define Z ($1000)
If $100 is invested in cleaning 1 set of data then outcome $1000 will result
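The statement above can be worked as a return-on-investment calculation. A minimal sketch using the slide's figures; the function name is hypothetical and the formula is the standard (outcome minus investment) divided by investment.

```python
def return_on_investment(invested: float, outcome: float) -> float:
    """ROI as a multiple of the amount invested."""
    return (outcome - invested) / invested

x = 100    # X: resources invested, in dollars
z = 1000   # Z: value of the outcome, in dollars

net_gain = z - x
roi = return_on_investment(x, z)
print(f"net gain ${net_gain}, ROI {roi:.0%}")
```

Framed this way, the $100 reads as an investment with a stated return rather than as a cost.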
Data is not a Project
• Durable asset
– An asset that has a usable
life more than one year
• Reasonable project
deliverables
– 90 day increments
– Data evolution is measured in years
• Data
– Evolves - it is not created
– Significantly more stable
• Readymade data architectural components
– Prerequisite to agile development
• Only alternative is to create additional data siloes!
Differences between Programs and Projects
• Programs are Ongoing, Projects End
– Managing a program involves long-term strategic planning and
continuous process improvement, neither of which is required of a project
• Programs are Tied to the Financial Calendar
– Program managers are often responsible for delivering
results tied to the organization's financial calendar
• Program Management is Governance Intensive
– Programs are governed by a senior board that provides direction,
oversight, and control while projects tend to be less governance-intensive
• Programs Have Greater Scope of Financial Management
– Projects typically have a straight-forward budget and project financial
management is focused on spending to budget while program planning,
management and control is significantly more complex
• Program Change Management is an Executive Leadership
Capability
– Projects employ a formal change management process while at the program
level, change management requires executive leadership skills and program
change is driven more by an organization's strategy and is subject to market
conditions and changing business goals
Adapted from http://top.idownloadnew.com/program_vs_project/ and http://management.simplicable.com/management/new/program-management-vs-project-management
Your data quality
program must last at
least as long as your
HR program!
Making a Better Quality Data Sandwich
Data supply
Data literacy
Standard data
Leverage point - high performance automation
This cannot happen without data engineering and architecture!
Quality data engineering/architecture work products
do not happen accidentally!
USS Midway
& Pancakes
Why is this an excellent
example of engineering?
• It is tall
• It has a clutch
• It was built in 1942
• It is cemented to the floor
• It is still in regular use!
Our barn had to pass a foundation inspection
• Before further construction could proceed
• No IT equivalent
What does "data quality program" mean?
• Ongoing commitment
– Permits evolutionary improvement of the approach
• Governance
– Senior level coordination, direction, and control
• Executive leadership capabilities
– Change and risk management
• Data quality approach inherits (above)
– Budget, strategic priorities
– Senior level attention and improving topical facility
– Reasonable timelines/expectations
https://blog.ducenit.com/data-quality-management
Systems Thinking
http://victorianscandal.wordpress.com/picturesque/rachel-olshausen/
• A framework that is based on
the belief that the component
parts of a system can best
be understood in the
context of relationships
with other systems, rather than
in isolation.
• The only way to fully
understand why a problem or
element occurs and persists is
to understand the part in
relation to the whole.
Capra, F. (1996) The web of life: a new scientific understanding of living
systems (1st Anchor Books ed). New York: Anchor Books. p. 30
Process
Input ➜ Process ➜ Output Diagram
Inputs: Dough, Water ➜ Process: Make Crust ➜ Output: Pizza Crust
Input: Pizza Crust ➜ Process: Make Pizza ➜ Output: Pizza
Data Steward Quality
Responsibilities
• Inputs
– Where does each of the
data items I am
responsible for
come from?
– Why are they
produced?
– What level of quality
is required by 'my
processes?'
• Process
– What business
processes use the
data within my
fiduciary
responsibility?
– For what business
purpose do they use
each data item?
– What role does
quality play for my
processes to
contribute?
• Output
– What downstream
business processes
consume data that
was under my
fiduciary care?
– For what purpose
is each data item
consumed?
– What quality
attributes are
required by each
downstream
consumer?
Interdependencies
Data Governance
ERP
Data Quality
Data Management Body of Knowledge (DMBoK V2) Practice Areas
from The DAMA Guide to the Data Management Body of Knowledge 2E © 2017 by DAMA International
Data
Strategy
Data
Governance
BI/
Warehouse
Perfecting
operations in 3
data management
practice areas
1X
1X
1X
Metadata
Data
Quality
from The DAMA Guide to the Data Management Body of Knowledge 2E © 2017 by DAMA International
Separating the Wheat from the Chaff
Is well organized data worth more?
Pre-Information Age Metadata
• Examples of information architecture achievements that
happened well before the information age:
– Page numbering
– Alphabetical order
– Table of contents
– Indexes
– Lexicons
– Maps
– Diagrams
Example from: How to make sense of any mess
by Abby Covert (2014) ISBN: 1500615994
"While we can arrange things
with the intent to communicate
certain information, we can't
actually make information. Our
users do that for us."
https://www.youtube.com/watch?v=60oD1TDzAXQ&feature=emb_logo
https://www.youtube.com/watch?v=r10Sod44rME&t=1s
https://www.youtube.com/watch?v=XD2OkDPAl6s
Remove the structure and things fall apart rapidly
• Better organized data increases in value
Separating the Wheat from the Chaff
• Data that is better organized increases in value
• Poor data management practices are
costing organizations money/time/effort
• 80% of organizational data is ROT
– Redundant
– Obsolete
– Trivial
• The question is which
data to eliminate?
– Most enterprise data is never analyzed
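Deciding which data to eliminate needs explicit criteria. The sketch below is one possible way to label ROT, under loudly stated assumptions: "redundant" means a repeated checksum, "obsolete" means unused for longer than a chosen number of days, and "trivial" means an empty data set. These rules, the field names, and `classify_rot` are hypothetical; a real organization would define its own.

```python
from datetime import date
from typing import Dict, List

def classify_rot(items: List[dict], today: date,
                 obsolete_after_days: int = 3 * 365) -> Dict[str, str]:
    """Label each item as redundant, obsolete, trivial, or keep."""
    seen_checksums = set()
    labels = {}
    for item in items:
        if item["checksum"] in seen_checksums:
            labels[item["name"]] = "redundant"   # same content seen before
        elif (today - item["last_used"]).days > obsolete_after_days:
            labels[item["name"]] = "obsolete"    # not used for too long
        elif item["rows"] == 0:
            labels[item["name"]] = "trivial"     # nothing in it
        else:
            labels[item["name"]] = "keep"
        seen_checksums.add(item["checksum"])
    return labels

inventory = [
    {"name": "customers", "checksum": "x1", "last_used": date(2021, 5, 1), "rows": 100},
    {"name": "customers_copy", "checksum": "x1", "last_used": date(2021, 5, 1), "rows": 100},
    {"name": "fax_numbers", "checksum": "x2", "last_used": date(2015, 1, 1), "rows": 40},
    {"name": "scratch", "checksum": "x3", "last_used": date(2021, 1, 1), "rows": 0},
]
labels = classify_rot(inventory, today=date(2021, 6, 1))
print(labels)
```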
Multiple Sources of Master/Reference Data
• Payroll Application (3rd GL): Payroll Data (database)
• R&D Applications (researcher supported, no documentation): R&D Data (raw)
• Mfg. Applications (contractor supported): Mfg. Data (home grown database)
• Marketing Application (4th GL, query facilities, no reporting, very large): Marketing Data (external database)
• Finance Application (3rd GL, batch system, no source): Finance Data (indexed)
• Personnel App. (20 years old, un-normalized data): Personnel Data (database)
https://www.forbes.com/sites/ciocentral/2019/01/02/what-we-learned-from-top-execs-about-their-big-data-and-ai-initiatives/
[Chart, 2020: % of challenges: technology 10%, % of challenges: people/process 90%]
Culture's impact
• 2019 challenges
– 5% technology
– 95% people/process
• 2020 challenges
– 10% technology
– 90% people/process
Change Management & Leadership
Diagnosing Organizational Readiness
adapted from the Managing Complex Change model by Lippitt, 1987
Culture is the biggest impediment to a
shift in organizational thinking about data!
Consistency Encourages Quality Analysis
[Diagram: business challenges linked to IT systems, IT processes, business systems, and business processes]
Eliminating data debt
requires a team with
specialized skills
deployed to create a
repeatable process
and develop sustained
organizational
skillsets
1. Allow the form of the
problem to guide the
form of the solution
2. Provide a means of
decomposing the problem
3. Feature a variety of tools
simplifying system understanding
4. Offer a set of strategies for evolving a design solution
5. Provide criteria for evaluating the quality of the various solutions
6. Facilitate development of a framework for developing
organizational knowledge.
Programmatic
Data Quality
Engineering
Structured Approaches to Data Quality
• Use organizational challenges to
guide the form of quality remediation
• Decompose implementation in a
manner that will be seen by all as
helping to address specific
challenges
• Aid the implementation using a
variety of techniques (not just tools)
• Develop a series of progressively
stronger strategies for addressing
the challenges
• Provide meaningful feedback on
progress
• Facilitate development of a data-
centric framework for
institutionalizing organizational data
quality knowledge
https://en.wikipedia.org/wiki/Theory_of_constraints
Theory of Constraints (TOC)
• A management paradigm that views any
manageable system as being limited in
achieving more of its goals by a small
number of constraints (Eliyahu M. Goldratt)
• There is always at least one constraint, and
TOC uses a focusing process to identify the
constraint and restructure the rest of the
organization to address it
• TOC adopts the common idiom "a chain
is no stronger than its weakest link":
processes, organizations, etc., are
vulnerable because the weakest
component can damage or break them or
at least adversely affect the outcome
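The weakest-link idea can be shown with a small calculation: the throughput of a chain of stages is bounded by its slowest stage. The stage names and capacities below are hypothetical, chosen only to illustrate the focusing step of finding the constraint.

```python
from typing import Dict, Tuple

def find_constraint(stage_capacity: Dict[str, int]) -> Tuple[str, int]:
    """Return the limiting stage and the throughput it imposes on the whole chain."""
    constraint = min(stage_capacity, key=stage_capacity.get)
    return constraint, stage_capacity[constraint]

# Hypothetical records-per-hour capacity of each stage in a data flow.
pipeline = {
    "capture": 900,
    "validate": 250,
    "load": 600,
}
stage, throughput = find_constraint(pipeline)
print(f"constraint: {stage}, system throughput: {throughput}/hour")
```

Raising the capacity of any stage other than the constraint leaves the system throughput unchanged.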
The DQE Cycle
• Deming cycle
• "Plan-do-study-act" or
"plan-do-check-act"
– Identifying data issues that are
critical to the achievement of
business objectives
– Defining business requirements
for data quality
– Identifying key data quality
dimensions
– Defining business rules critical
to ensuring high quality data
The DQE Cycle: (1) Plan
• Plan for the assessment of
the current state and
identification of key metrics
for measuring quality
• The data quality engineering
team assesses the scope of
known issues
– Determining cost and impact
– Evaluating alternatives for
addressing them
The DQE Cycle: (2) Deploy
• Deploy processes for
measuring and improving
the quality of data:
• Data profiling
– Institute inspections and
monitors to identify data issues
when they occur
– Fix flawed processes that are
the root cause of data errors or
correct errors downstream
– When it is not possible to
correct errors at their source,
correct them at their earliest
point in the data flow
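Data profiling, mentioned above, can start very simply. A minimal sketch, assuming a column arrives as a plain list of values where `None` and the empty string count as missing; `profile_column` is a hypothetical helper, not part of any specific profiling tool.

```python
from collections import Counter
from typing import Any, Dict, List

def profile_column(values: List[Any]) -> Dict[str, Any]:
    """Summarize one column: row count, missing values, distinct values, top value."""
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }

country_column = ["US", "US", "", None, "DE"]
profile = profile_column(country_column)
print(profile)
```

Running a profile like this at each hand-off point is one way to institute the inspections and monitors the slide calls for.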
The DQE Cycle: (3) Monitor
• Monitor the quality of data
as measured against the
defined business rules
• If data quality meets defined
thresholds for acceptability,
the processes are in control
and the level of data quality
meets the business
requirements
• If data quality falls below
acceptability thresholds,
notify data stewards so they
can take action during the
next stage
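The monitoring step can be sketched as measuring each business rule's pass rate against its acceptability threshold. Everything named here is an assumption for illustration: the rules, thresholds, sample records, and the `monitor` and `stewards_to_notify` functions.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Callable[[Record], bool]

def pass_rate(records: List[Record], rule: Rule) -> float:
    """Share of records that satisfy one business rule (1.0 for an empty set)."""
    if not records:
        return 1.0
    return sum(1 for r in records if rule(r)) / len(records)

def monitor(records: List[Record],
            rules: Dict[str, Tuple[Rule, float]]) -> Dict[str, dict]:
    """Compare each rule's pass rate with its threshold."""
    report = {}
    for name, (rule, threshold) in rules.items():
        rate = pass_rate(records, rule)
        report[name] = {"pass_rate": rate, "in_control": rate >= threshold}
    return report

def stewards_to_notify(report: Dict[str, dict]) -> List[str]:
    """Rules that fell below threshold and need a data steward's attention."""
    return sorted(name for name, r in report.items() if not r["in_control"])

records = [
    {"country": "US", "amount": 5},
    {"country": "", "amount": 12},
    {"country": "DE", "amount": -3},
    {"country": "JP", "amount": 7},
]
rules = {
    "country_present": (lambda r: bool(r["country"]), 0.95),
    "amount_non_negative": (lambda r: r["amount"] >= 0, 0.70),
}
report = monitor(records, rules)
print(stewards_to_notify(report))
```

Both rules pass on three of four records, but only the rule with the stricter threshold is out of control, which shows why thresholds must come from business requirements.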
The DQE Cycle: (4) Act
• Act to resolve any identified
issues to improve data
quality and better meet
business expectations
• New cycles begin as new
data sets come under
investigation or as new data
quality requirements are
identified for existing data
sets
Starting
point
for new
system
development
data performance metadata
data architecture
data
architecture and
data models
shared data updated data
corrected
data
architecture
refinements
facts &
meanings
Metadata &
Data Storage
Starting point
for existing
systems
Metadata Refinement
• Correct Structural Defects
• Update Implementation
Metadata Creation
• Define Data Architecture
• Define Data Model Structures
Metadata Structuring
• Implement Data Model Views
• Populate Data Model Views
Data Refinement
• Correct Data Value Defects
• Re-store Data Values
Data Manipulation
• Manipulate Data
• Update Data
Data Utilization
• Inspect Data
• Present Data
Data Creation
• Create Data
• Verify Data Values
Data Assessment
• Assess Data Values
• Assess Metadata
Extended data life cycle model with metadata sources and uses
Data Quality Attributes
Engineers say ➜ Business wants to hear
• Clean some data ➜ Decrease the number of undeliverable targeted marketing ads
• Reorganize the database ➜ Increase the ability of the salesforce to perform their own analyses
• Develop a taxonomy ➜ Create a common vocabulary for the organization
• Optimize a query ➜ Shave 1 second off a task that runs a billion times a day
• Reverse engineer the legacy system ➜ Understand what was good about the old system so it can be formally preserved, and what was bad so it can be improved
Compare the utility of data quality conversation topics
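The query example is worth working through, because the arithmetic is what makes it a business statement. A short sketch using only the slide's figures (1 second saved, a billion runs a day):

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

seconds_saved_per_run = 1
runs_per_day = 1_000_000_000

saved_seconds = seconds_saved_per_run * runs_per_day
saved_days = saved_seconds / SECONDS_PER_DAY   # processing-days saved each calendar day
saved_years = saved_days / 365                 # roughly 31.7 processing-years

print(f"{saved_days:,.0f} processing-days, about {saved_years:.1f} years, saved every day")
```

Stated this way, "optimize a query" becomes decades of processing time recovered per day.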
CDO Agenda
Inventory Data -> uncovering
assets & decreasing ROT
Develop the first version of an
organizational data strategy
Monetize your organization's data
The CDO's goal is to better manage data as an organizational
asset in support of the organizational mission!
Data Asset Inventory (Implementation)
1. Purpose is the goal of understanding, not definitions
– Definitions are passive, purpose statements incorporate strategic elements,
the rationale and justification based on the need for data
2. Inventoried data assets are categorized by how they are shared:
A. Data items that are shared with external organizations
B. Data items that are shared within the organization
C. Data items that are not shared but are used to derive shared data items
D. Data items not shared outside but used to support workgroup activities
E. Organizational data ROT
3. Assign each inventoried data asset to the existing subject area in which
that data item best supports the organizational mission (ex. PAY is part of
BACK OFFICE OPERATIONS) – based on (refine-able) purpose statements,
primary subject-area allegiance is posited
4. Identify, de-dupe and harmonize data assets participating in synonym/
homonym/other challenges - ensure only one item is designated as a
(current) golden source
5. Identify which data items are deemed to be sensitive or personal data items
and what specific controls need to be in place
6. Document all mapping rules for data items in categories 2A and 2B above
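Step 4's rule, that only one item is designated as the current golden source, lends itself to an automated check. A minimal sketch: the inventory layout (`concept`, `system`, `golden`) and the function name are hypothetical, and the check treats both "more than one" and "none" as violations, which is an assumption beyond the slide's wording.

```python
from collections import defaultdict
from typing import Dict, List

def golden_source_violations(assets: List[dict]) -> List[str]:
    """Return concepts that do not have exactly one designated golden source."""
    golden_counts: Dict[str, int] = defaultdict(int)
    for asset in assets:
        if asset["golden"]:
            golden_counts[asset["concept"]] += 1
        else:
            golden_counts[asset["concept"]] += 0   # make sure the concept is listed
    return sorted(c for c, n in golden_counts.items() if n != 1)

inventory = [
    {"concept": "customer_name", "system": "CRM", "golden": True},
    {"concept": "customer_name", "system": "Billing", "golden": False},
    {"concept": "pay_grade", "system": "Payroll", "golden": True},
    {"concept": "pay_grade", "system": "Personnel", "golden": True},
    {"concept": "cost_center", "system": "Finance", "golden": False},
]
violations = golden_source_violations(inventory)
print(violations)
```

Because the inventory is updated over many cycles, a check like this is most useful when it runs every time an item is added.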
Note: this exercise cannot be comprehensively performed in a single cycle, so just as important as the exercise itself is
establishing a process so that, as other data items are inevitably discovered, this inventory can be easily updated
What is Strategy?
• Current use derived from military
- a pattern in a stream of decisions
[Henry Mintzberg]
A thing
Axes: Improve Operations, Innovation
Q1: Organizations without a formalized data quality focus
Q2: Data Quality Focus: Increase organizational efficiencies/effectiveness
Q3: Data Quality Focus: Use data to create strategic opportunities
Q4: Data Quality Focus: both, simultaneously
Initially pick one or the other but not both
Math
• VCU
– $5m 35 year faculty member
– +$20 million in grants/funded
research projects/student
supplemental salaries
• Collaborations
– Range: $0 to +$1.5 billion
documented savings
• My introduction is often:
–Peter is a professor
with a positive
cash flow!
A musical analogy that works for both practice and storytelling
https://www.youtube.com/watch?v=4n1GT-VjjVs&frags=pl%2Cwn
[Chart: hospital beds occupied, Monday through Friday: 3, 6, 12, 24, 48]
Pandemic Math Question? (a very bad week)
• If demand at a 48-bed hospital facility is doubling-daily …
• … at what point does anyone notice that the hospital beds are
becoming scarce?
– Monday 3 beds occupied
– Tuesday 6
– Wednesday – ¾ of all beds were available
– Yesterday – ½ of all beds were available
– Today – zero beds available
– Tomorrow …???
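The doubling pattern in the question can be reproduced in a few lines. This is a minimal sketch: the 48-bed capacity and Monday's 3 occupied beds come from the slide, while the function name and the Monday-to-Friday labels for "Yesterday" and "Today" are assumptions.

```python
CAPACITY = 48  # beds in the facility (from the slide)
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def occupancy_by_day(start: int, capacity: int, labels: list) -> list:
    """Return (day, beds occupied, fraction available) with demand doubling daily."""
    rows = []
    demand = start
    for day in labels:
        filled = min(demand, capacity)  # occupancy cannot exceed capacity
        rows.append((day, filled, (capacity - filled) / capacity))
        demand *= 2
    return rows


rows = occupancy_by_day(3, CAPACITY, DAYS)
for day, filled, available in rows:
    print(f"{day:<9} occupied={filled:>2}  available={available:.0%}")
```

Half of the beds are still free the day before none are, which is why a doubling trend is easy to miss when only the current level is watched.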
© Copyright 2021 by Peter Aiken Slide # 72
https://anythingawesome.com
• Approaching Data Quality
– Definitions
– Causes can be difficult to discern
– Data quality challenges are the
root cause of most IT and
business failures
– Must be built on leverage
– Requires a programmatic
approach to be most effective
– Early business cases
often have a dual purpose
– High quality data requires
architecture and engineering
• What do we need to get
better at?
– Systems thinking
– Not looking at data
quality in isolation
– Understanding data ROT
– Not underestimating
the role of culture
– Developing repeatable
capabilities/core data
quality expertise
• How do we get better?
– Refocus the request around
business outcomes
– Leadership
– Program focus
– Math (cost or investment?)
– Storytelling/Practice
• Takeaways and Q&A
© Copyright 2021 by Peter Aiken Slide # 73
https://anythingawesome.com
Approaching Data Quality
Engineering Success Stories
Program
Famous 1990s Words?
• Question:
– Why haven't organizations taken a
more proactive approach to data quality?
• Answer:
– Fixing data quality problems is not easy
– It is dangerous -- they'll come after you
– Your efforts are likely to be misunderstood
– You could make things worse
– Now you get to fix it
• A single data quality
issue can grow
into a significant,
unexpected
investment
© Copyright 2021 by Peter Aiken Slide # 74
https://anythingawesome.com
© Copyright 2021 by Peter Aiken Slide # 75
https://anythingawesome.com
• Information transparency
• Analytics
• Business Intelligence
• Increasing efficiencies
• Decreasing costs
• Driving holistic decision-making
across the organization
High Quality Data is Critical
Not Helpful
• Information transparency $
• Analytics $
• Business Intelligence $
• Increasing efficiencies $
• Decreasing costs $
• Driving holistic decision-making
across the organization $
Data Quality Dimensions
© Copyright 2021 by Peter Aiken Slide # 76
https://anythingawesome.com
Data Value Quality
© Copyright 2021 by Peter Aiken Slide # 77
https://anythingawesome.com
Data Representation Quality
© Copyright 2021 by Peter Aiken Slide # 78
https://anythingawesome.com
Data Model Quality
© Copyright 2021 by Peter Aiken Slide # 79
https://anythingawesome.com
Data Architecture Quality
© Copyright 2021 by Peter Aiken Slide # 80
https://anythingawesome.com
Upcoming Events
Essential Metadata Strategies
12 October 2021
Necessary Prerequisites to Data Success:
Exorcising the Seven Deadly Data Sins
9 November 2021
Data Management vs. Data Governance Program
14 December 2021
© Copyright 2021 by Peter Aiken Slide # 81
https://anythingawesome.com
Brought to you by:
Time: 19:00 UTC (2:00 PM NYC) | Presented by: Peter Aiken, PhD
Note: In this .pdf, clicking any webinar title opens the registration link
Event Pricing
© Copyright 2021 by Peter Aiken Slide # 82
https://anythingawesome.com
• 20% off
directly from the publisher on
select titles
• My Book Store @
http://plusanythingawesome.com
• Enter the code
"anythingawesome" at the
Technics bookstore checkout
where it says to
"Apply Coupon"
anythingawesome
Peter.Aiken@AnythingAwesome.com +1.804.382.5957
Thank You!
© Copyright 2021 by Peter Aiken Slide # 83
Book a call with Peter to discuss anything - https://anythingawesome.com/OfficeHours.html
+ =
Data Things Happen
Organizational Things Happen
This approach only works if
• We know where the data that needs to be fixed resides
• We can communicate precisely and correctly amongst team
members
• We are adept with the correct technological support
• …
© Copyright 2021 by Peter Aiken Slide # 84
https://anythingawesome.com
© Copyright 2021 by Peter Aiken Slide # 85
https://anythingawesome.com
Winning Cards for data quality program success
1 The project needs to be small
– Projects should not be allowed to begin unless the data requirements for the entire project are verified
2 The product owner or sponsor must be highly skilled
– Few in IT have the requisite data skills and knowledge
3 The process must be agile
– Agile is a construction technique; data requires more planning before construction
4 The agile team must be highly skilled in both the agile process and the technology
– Few agile teams have requisite levels of data skills
5 The organization must be highly skilled at emotional maturity
– Few organizations understand data stuff


Approaching Data Quality

  • 1. © Copyright 2021 by Peter Aiken Slide # 1 peter.aiken@anythingawesome.com +1.804.382.5957 Peter Aiken, PhD Approaching Data Quality Engineering Business Success Stories Peter Aiken, Ph.D. • I've been doing this a long time • My work is recognized as useful • Associate Professor of IS (vcu.edu) • Institute for Defense Analyses (ida.org) • DAMA International (dama.org) • MIT CDO Society (iscdo.org) • Anything Awesome (plusanythingawesome.com) • Experienced w/ 500+ data management practices worldwide • Multi-year immersions – US DoD (DISA/Army/Marines/DLA) – Nokia – Deutsche Bank – Wells Fargo – Walmart … • 12 books and dozens of articles © Copyright 2021 by Peter Aiken Slide # 2 https://anythingawesome.com + • DAMA International President 2009-2013/2018/2020 • DAMA International Achievement Award 2001 (with Dr. E. F. "Ted" Codd • DAMA International Community Award 2005
  • 2. 3 © Copyright 2021 by Peter Aiken Slide # https://anythingawesome.com Approaching Data Quality Engineering Success Stories Program • Approaching Data Quality – Definitions – Causes can be difficult to discern – Data quality challenges are the root cause of most IT and business failures – Must be built on leverage – Requires a programmatic approach to be most effective – Early business cases often have a dual purpose – High quality data requires architecture and engineering • What do we need to get better at? – Systems thinking – Not looking at data quality in isolation – Understanding data ROT – Not underestimating the role of culture – Developing repeatable capabilities/core data quality expertise • How do we get better? – Refocus the request around business outcomes – Leadership – Program focus – Math (cost or investment?) – Storytelling/Practice • Takeaways and Q&A © Copyright 2021 by Peter Aiken Slide # 4 https://anythingawesome.com https://www.youtube.com/watch?v=uL2PsmlGn9g
  • 3. Definitions • Quality Data – Fit for purpose meets the requirements of its authors, users, and administrators (from Martin Eppler) – Synonymous with information quality, since poor data quality results in inaccurate information and poor performance • Data Quality Management – "Planning, implementation and control activities that apply quality management techniques to measure, assess, improve, and ensure data quality" – Encompasses life cycle activities – Include supporting processes from change management, etc. – Continuous improvement process requiring core capabilities • Data Quality Engineering – Recognition that data quality solutions cannot not managed but must be engineered – Data quality engineering concepts are generally not known and understood within IT or business! © Copyright 2021 by Peter Aiken Slide # 5 https://anythingawesome.com Spinach/Popeye story from http://it.toolbox.com/blogs/infosphere/spinach-how-a-data-quality-mistake-created-a-myth-and-a-cartoon-character-10166 DQ Effort Pattern © Copyright 2021 by Peter Aiken Slide # 6 https://anythingawesome.com from The DAMA Guide to the Data Management Body of Knowledge © 2017 by DAMA International 80% time spent 20%
  • 4. Hidden Data Factories © Copyright 2021 by Peter Aiken Slide # 7 https://anythingawesome.com https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year Work products are delivered to Customers Customers Knowledge Workers 80% looking for stuff 20% doing useful work Department B 1. Check A's work 2. Make any corrections 3. Complete B's work 4. Deliver to Department C Department A https://en.wikipedia.org/wiki/Theory_of_constraints Department C 1. Check B's work 2. Make any corrections 3. Complete C's work 4. Deliver to Customer 5. Deal with consequences Blind Persons and the Elephant © Copyright 2021 by Peter Aiken Slide # 8 https://anythingawesome.com http://www.dailymirror.lk/print/opinion/editorial-we-need-to-become-channels-of-peace/172-27164 It is like a fan! It is like a snake! It is like a wall! It is like a rope! It is like a tree!
  • 5. Events are not always recognized as data quality challenges? © Copyright 2021 by Peter Aiken Slide # 9 https://anythingawesome.com • Letters from a bank • A very expensive, very small data rounding error • Health data story • The chocolate story • Covid-19 © Copyright 2021 by Peter Aiken Slide # 10 https://anythingawesome.com A congratulations letter from another bank Problems • Bank did not know it made an error • Tools alone could not have prevented this error • Lost confidence in the ability of the bank to manage customer funds
  • 6. © Copyright 2021 by Peter Aiken Slide # 11 https://anythingawesome.com • Needed trench for electrical cable 2.52" - delivered 2.5" • $1M required to rent other facilities while new cable is obtained • Either rounding or truncation could explain – We need to get a summary on all of this," he said. "How did the mistake occur? Who's at fault? What are the damages? And how is money going to be recovered?" Port of Seattle 2.52" ➜ 2.5" © Copyright 2021 by Peter Aiken Slide # 12 https://anythingawesome.com This research, published as a letter this week in the British Medical Journal, was meant to draw attention to how much data gets entered incorrectly in the country’s medical system. These guys weren’t turning up at the doctor for pregnancy- related services. Instead, they were at their doctor for procedures that had medical codes similar to those of midwifery and obstetric services. With a misplaced keystroke here or there, an annual physical could become a consultation with a midwife. Why Britain has 17,000 pregnant men here keystroke misplaced there
  • 7. © Copyright 2021 by Peter Aiken Slide # 13 https://anythingawesome.com Areyougonnatellusthechocolatestory-again? Why using Microsoft's tool caused Covid-19 results to be lost © Copyright 2021 by Peter Aiken Slide # 14 https://anythingawesome.com https://www.bbc.com/news/technology-54423988?es_p=12801491
  • 8. © Copyright 2021 by Peter Aiken Slide # 15 https://anythingawesome.com https://www.bbc.com/news/technology-54423988?es_p=12801491 • Since 2007 should have been forced to use .xlsx (1,000,000+ rows) • Used .xls (65,000 rows) • Additional data was dropped without notification Practice-Oriented • Failure in rigor when capturing/ manipulating data • Allowing imprecise or incorrect data to be collected when requirements specify otherwise • Presenting data out of sequence Structure-Oriented • Data and metadata arranged imperfectly • Data is captured but inaccessible • When a incorrect data is provided as the correct response Practice-oriented activities focus on the capture and manipulation of data Data quality best practices depend on both © Copyright 2021 by Peter Aiken Slide # 16 https://anythingawesome.com Structure-oriented activities focus on the data implementation Quality "Fit for purpose" Data
  • 9. Poor data manifests as multifaceted organizational challenges © Copyright 2021 by Peter Aiken Slide # 17 https://anythingawesome.com Root cause analysis is required to diagnose © Copyright 2021 by Peter Aiken Slide # 18 https://anythingawesome.com IT System Business Challenge Business Process Business Challenge IT Process Business Challenge Business System Business Challenge IT Process Business Challenge IT System Business Challenge Business Process Business Challenge Poor results
  • 10. Many DQ challenges are unique and/or context specific! © Copyright 2021 by Peter Aiken Slide # 19 https://anythingawesome.com Burning Bridge • Something bad happened – Imperfect data was to blame • Someone needs to fix – Poor quality data • You currently have management's attention – It is wise to ensure you also have their understanding • "Do something" often leads to "Buy something" – Mostly technology-based • Get data quality-ing! – A fool with a tool is still a fool • Something is accomplished – Most often all the funding is used up © Copyright 2021 by Peter Aiken Slide # 20 https://anythingawesome.com • Early cases have a dual purpose – Make the case that this will fix the immediate challenge – Illustrate why a programmatic approach is preferable
  • 11. Leverage is an Engineering Concept • Using proper engineering techniques, a human can lift a bulk that is weighs much more than the human © Copyright 2021 by Peter Aiken Slide # 21 https://anythingawesome.com 1 kg 10 kg 11 kg A wholistic approach to obtaining data leverage © Copyright 2021 by Peter Aiken Slide # 22 https://anythingawesome.com Organizational Data Knowledge workers supplemented by data professionals Process Guided by strategy https://www.computerhope.com/jargon/f/framework.htm People Technology Reducing ROT increases data leverage
  • 12. Data Leverage is a multi-use concept • Permits organizations to better manage their data – Within the organization, and – With organizational data exchange partners – In support of the organizational mission • Leverage – Obtained by implementation of data-centric technologies, processes, and human skill sets – Focus on the non-ROT data • The bigger the organization, the greater potential leverage exists • Treating data more asset-like simultaneously – Lowers organizational IT costs and – Increases organizational knowledge worker productivity © Copyright 2021 by Peter Aiken Slide # 23 https://anythingawesome.com Concrete example of data leverage • Reference – Controls accessible data values • Master – Controls access to system capabilities • Transaction – Instances of values © Copyright 2021 by Peter Aiken Slide # 24 https://anythingawesome.com Countries where we do business? Types of accounts available? Controlled vocabulary items Are you a member of our premium club? Authorizing uses/users? Common/standard data structures $5 Authorized Like ! Example based on: Dr. Christopher Bradley of DMAdvisors–he has more, ping him at chris.bradley@dmadvisors.co.uk Cannot do business overseas? Cannot determine product origin? Cannot add a foreign language to the website? Cannot select a valid menu item?
  • 13. Simple Math • At the beginning of a project, • Where the parties know the least about each other • All are expected to agree on the meaning of price, timing, and functionalities • Define X (some resources) • Define Y (cleaning 1 set of data) • Define Z (that data will be clean) © Copyright 2021 by Peter Aiken Slide # 25 https://anythingawesome.com If X is invested in Y then outcome Z will result (Z > X) Simple Math • Define X ($100) • Define Y (cleaning 1 set of data) • Define Z ($1000) © Copyright 2021 by Peter Aiken Slide # 26 https://anythingawesome.com If $100 is invested in cleaning 1 set of data then outcome $1000 will result
  • 14. Data is not a Project • Durable asset – An asset that has a usable life more than one year • Reasonable project deliverables – 90 day increments – Data evolution is measured in years • Data – Evolves - it is not created – Significantly more stable • Readymade data architectural components – Prerequisite to agile development • Only alternative is to create additional data siloes! © Copyright 2021 by Peter Aiken Slide # https://anythingawesome.com 27 Differences between Programs and Projects • Programs are Ongoing, Projects End – Managing a program involves long term strategic planning and continuous process improvement is not required of a project • Programs are Tied to the Financial Calendar – Program managers are often responsible for delivering results tied to the organization's financial calendar • Program Management is Governance Intensive – Programs are governed by a senior board that provides direction, oversight, and control while projects tend to be less governance-intensive • Programs Have Greater Scope of Financial Management – Projects typically have a straight-forward budget and project financial management is focused on spending to budget while program planning, management and control is significantly more complex • Program Change Management is an Executive Leadership Capability – Projects employ a formal change management process while at the program level, change management requires executive leadership skills and program change is driven more by an organization's strategy and is subject to market conditions and changing business goals © Copyright 2021 by Peter Aiken Slide # https://anythingawesome.com Adapted from http://top.idownloadnew.com/program_vs_project/ and http://management.simplicable.com/management/new/program-management-vs-project-management 28 Your data quality program must last at least as long as your HR program!
  • 15. © Copyright 2021 by Peter Aiken Slide # 29 https://anythingawesome.com Making a Better Quality Data Sandwich Data supply Data literacy Standard data Standard data Leverage point - high performance automation © Copyright 2021 by Peter Aiken Slide # Data literacy 30 https://anythingawesome.com Data supply
  • 16. Leverage point - high performance automation © Copyright 2021 by Peter Aiken Slide # Standard data Data supply Data literacy 31 https://anythingawesome.com Leverage point - high performance automation © Copyright 2021 by Peter Aiken Slide # This cannot happen without engineering and architecture! Quality engineering/ architecture work products do not happen accidentally! 32 https://anythingawesome.com Data supply Data literacy Standard data
  • 17. Leverage point - high performance automation © Copyright 2021 by Peter Aiken Slide # This cannot happen without data engineering and architecture! 33 https://anythingawesome.com Quality data engineering/ architecture work products do not happen accidentally! Data supply Data literacy Standard data USS Midway & Pancakes Why is this an excellent example of engineering? • It is tall • It has a clutch • It was built in 1942 • It is cemented to the floor • It is still in regular use! © Copyright 2021 by Peter Aiken Slide # 34 https://anythingawesome.com
  • 18. Our barn had to pass a foundation inspection • Before further construction could proceed • No IT equivalent © Copyright 2021 by Peter Aiken Slide # 35 https://anythingawesome.com https://plusanythingawesome.com What does is mean "data quality program" • Ongoing commitment – Permits evolutionary improvement of the approach • Governance – Senior level coordination, direction, and control • Executive leadership capabilities – Change and risk management • Data quality approach inherits (above) – Budget, strategic priorities – Senior level attention and improving topical facility – Reasonable timelines/expectations © Copyright 2021 by Peter Aiken Slide # 36 https://anythingawesome.com https://blog.ducenit.com/data-quality-management
  • 19. 37 © Copyright 2021 by Peter Aiken Slide # https://anythingawesome.com Approaching Data Quality Engineering Success Stories Program • Approaching Data Quality – Definitions – Causes can be difficult to discern – Data quality challenges are the root cause of most IT and business failures – Must be built on leverage – Requires a programmatic approach to be most effective – Early business cases often have a dual purpose – High quality data requires architecture and engineering • What do we need to get better at? – Systems thinking – Not looking at data quality in isolation – Understanding data ROT – Not underestimating the role of culture – Developing repeatable capabilities/core data quality expertise • How do we get better? – Refocus the request around business outcomes – Leadership – Program focus – Math (cost or investment?) – Storytelling/Practice • Takeaways and Q&A Systems Thinking © Copyright 2021 by Peter Aiken Slide # 38 https://anythingawesome.com http://victorianscandal.wordpress.com/picturesque/rachel-olshausen/ • A framework that is based on the belief that the component parts of a system can best be understood in the context of relationships with other systems, rather than in isolation. • The only way to fully understand why a problem or element occurs and persists is to understand the part in relation to the whole. Capra, F. (1996) The web of life: a new scientific understanding of living systems (1st Anchor Books ed). New York: Anchor Books. p. 30
  • 20. Process Input ➜ Process ➜ Output Diagram © Copyright 2021 by Peter Aiken Slide # 39 https://anythingawesome.com Inputs Outputs Pizza Make Pizza Dough Water Pizza Crust Make Crust Make Pizza Data Steward Quality Responsibilities • Inputs – From where, do each of these my responsible data items come? – Why are they produced? – What level of quality is required by 'my processes?' • Process – What business processes use the data within my fiduciary responsibility? – For what business purpose do they use each data item? – What role does quality play for my processes to contribute? • Output – What downstream business processes consume data that was under my fiduciary care? – For what purpose are each data items consumed? – What quality attribute are required by each downstream consumer? © Copyright 2021 by Peter Aiken Slide # 40 https://anythingawesome.com
  • 21. Interdependencies © Copyright 2021 by Peter Aiken Slide # 41 https://anythingawesome.com Data Governance ERP Data Quality © Copyright 2021 by Peter Aiken Slide # 42 https://anythingawesome.com Data Management Body of Knowledge (DM BoK V2) Practice Areas from The DAMA Guide to the Data Management Body of Knowledge 2E © 2017 by DAMA International
  • 22. © Copyright 2021 by Peter Aiken Slide # 43 https://anythingawesome.com Data Strategy Data Governance BI/ Warehouse Perfecting operations in 3 data management practice areas 1X 1X 1X Metadata Data Quality from The DAMA Guide to the Data Management Body of Knowledge 2E © 2017 by DAMA International © Copyright 2021 by Peter Aiken Slide # 44 https://anythingawesome.com Separating the Wheat from the Chaff
  • 23. Separating the Wheat from the Chaff
    Is well organized data worth more?
    Pre-Information Age Metadata
    • Examples of information architecture achievements that happened well before the information age:
    – Page numbering
    – Alphabetical order
    – Table of contents
    – Indexes
    – Lexicons
    – Maps
    – Diagrams
    Example from: How to make sense of any mess by Abby Covert (2014) ISBN: 1500615994
    "While we can arrange things with the intent to communicate certain information, we can't actually make information. Our users do that for us."
    https://www.youtube.com/watch?v=60oD1TDzAXQ&feature=emb_logo
    https://www.youtube.com/watch?v=r10Sod44rME&t=1s
    https://www.youtube.com/watch?v=XD2OkDPAl6s
  • 24. Remove the structure and things fall apart rapidly
    • Better organized data increases in value
    Separating the Wheat from the Chaff
    • Data that is better organized increases in value
    • Poor data management practices are costing organizations money/time/effort
    • 80% of organizational data is ROT
    – Redundant
    – Obsolete
    – Trivial
    • The question is: which data to eliminate?
    – Most enterprise data is never analyzed
  • 25. Multiple Sources of Master/Reference Data
    • Payroll Application (3rd GL) / Payroll Data (database)
    • R&D Applications (researcher supported, no documentation) / R&D Data (raw)
    • Mfg. Applications (contractor supported) / Mfg. Data (home grown database)
    • Marketing Application (4th GL, query facilities, no reporting, very large) / Marketing Data (external database)
    • Finance Application (3rd GL, batch system, no source) / Finance Data (indexed)
    • Personnel App. (20 years old, un-normalized data) / Personnel Data (database)
    Culture's impact
    https://www.forbes.com/sites/ciocentral/2019/01/02/what-we-learned-from-top-execs-about-their-big-data-and-ai-initiatives/
    • 2019 challenges
    – 5% technology
    – 95% people/process
    • 2020 challenges
    – 10% technology
    – 90% people/process
  • 26. Change Management & Leadership
    Diagnosing Organizational Readiness
    adapted from the Managing Complex Change model by Lippitt, 1987
    Culture is the biggest impediment to a shift in organizational thinking about data!
  • 27. Consistency Encourages Quality Analysis
    [Diagram: each Business Challenge is paired with one of IT System, Business Process, IT Process, or Business System]
    Eliminating data debt requires a team with specialized skills deployed to create a repeatable process and develop sustained organizational skillsets
    1. Allow the form of the problem to guide the form of the solution
    2. Provide a means of decomposing the problem
    3. Feature a variety of tools simplifying system understanding
    4. Offer a set of strategies for evolving a design solution
    5. Provide criteria for evaluating the quality of the various solutions
    6. Facilitate development of a framework for developing organizational knowledge
    Programmatic Data Quality Engineering
  • 28. Structured Approaches to Data Quality
    • Use organizational challenges to guide the form of quality remediation
    • Decompose implementation in a manner that will be seen by all as helping to address specific challenges
    • Aid the implementation using a variety of techniques (not just tools)
    • Develop a series of progressively stronger strategies for addressing the challenges
    • Provide meaningful feedback on progress
    • Facilitate development of a data-centric framework for institutionalizing organizational data quality knowledge
    Theory of Constraints (TOC)
    https://en.wikipedia.org/wiki/Theory_of_constraints
    • A management paradigm that views any manageable system as being limited in achieving more of its goals by a small number of constraints (Eliyahu M. Goldratt)
    • There is always at least one constraint, and TOC uses a focusing process to identify the constraint and restructure the rest of the organization to address it
    • TOC adopts the common idiom "a chain is no stronger than its weakest link": processes, organizations, etc., are vulnerable because the weakest component can damage or break them or at least adversely affect the outcome
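The "weakest link" idea from the Theory of Constraints slide can be illustrated with a small sketch. It assumes a data flow modeled as a chain of stages, each with a capacity; the stage names and numbers below are hypothetical and are not taken from the presentation.

```python
from typing import Dict, Tuple


def find_constraint(stage_capacity: Dict[str, float]) -> Tuple[str, float]:
    """Return the stage with the lowest capacity and that capacity.

    In a serial chain, overall throughput cannot exceed the capacity
    of the weakest stage, so that stage is the constraint to address first.
    """
    if not stage_capacity:
        raise ValueError("at least one stage is required")
    name = min(stage_capacity, key=stage_capacity.get)
    return name, stage_capacity[name]


# Hypothetical records-per-hour capacities (illustrative numbers only).
stages = {"capture": 900.0, "validate": 250.0, "load": 600.0, "report": 400.0}
constraint, throughput = find_constraint(stages)
print(f"Constraint: {constraint}, chain throughput: {throughput} records/hour")
```

Under these assumptions, improving any stage other than the identified constraint leaves the overall throughput unchanged, which is the focusing argument the slide makes.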
  • 29. The DQE Cycle
    • Deming cycle
    • "Plan-do-study-act" or "plan-do-check-act"
    – Identifying data issues that are critical to the achievement of business objectives
    – Defining business requirements for data quality
    – Identifying key data quality dimensions
    – Defining business rules critical to ensuring high quality data
    The DQE Cycle: (1) Plan
    • Plan for the assessment of the current state and identification of key metrics for measuring quality
    • The data quality engineering team assesses the scope of known issues
    – Determining cost and impact
    – Evaluating alternatives for addressing them
  • 30. The DQE Cycle: (2) Deploy
    • Deploy processes for measuring and improving the quality of data:
    • Data profiling
    – Institute inspections and monitors to identify data issues when they occur
    – Fix flawed processes that are the root cause of data errors or correct errors downstream
    – When it is not possible to correct errors at their source, correct them at their earliest point in the data flow
    The DQE Cycle: (3) Monitor
    • Monitor the quality of data as measured against the defined business rules
    • If data quality meets defined thresholds for acceptability, the processes are in control and the level of data quality meets the business requirements
    • If data quality falls below acceptability thresholds, notify data stewards so they can take action during the next stage
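The Monitor step above (measure against business rules, compare with acceptability thresholds, notify stewards on a breach) might be sketched as follows. This is a minimal illustration under stated assumptions: the rule names, record fields, and threshold values are hypothetical, and "notification" is reduced to returning the breached rules.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Rule = Callable[[Record], bool]


def pass_rate(records: List[Record], rule: Rule) -> float:
    """Fraction of records that satisfy a business rule (1.0 if there are none)."""
    if not records:
        return 1.0
    return sum(1 for r in records if rule(r)) / len(records)


def monitor(records: List[Record],
            rules: Dict[str, Rule],
            thresholds: Dict[str, float]) -> Dict[str, float]:
    """Return the rules whose measured pass rate falls below its threshold."""
    breaches: Dict[str, float] = {}
    for name, rule in rules.items():
        rate = pass_rate(records, rule)
        if rate < thresholds[name]:
            breaches[name] = rate  # a data steward would be notified here
    return breaches


# Hypothetical sample data and rules (assumptions for illustration).
records = [
    {"customer_id": 1, "postal_code": "23284"},
    {"customer_id": 2, "postal_code": ""},
    {"customer_id": 3, "postal_code": "10001"},
    {"customer_id": 4, "postal_code": "94105"},
]
rules = {
    "postal_code_present": lambda r: bool(r.get("postal_code")),
    "customer_id_present": lambda r: r.get("customer_id") is not None,
}
thresholds = {"postal_code_present": 0.9, "customer_id_present": 1.0}

breaches = monitor(records, rules, thresholds)
print(breaches)
```

An empty result corresponds to the "processes are in control" case on the slide; any entry corresponds to the case where stewards take action in the Act stage.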
  • 31. The DQE Cycle: (4) Act
    • Act to resolve any identified issues to improve data quality and better meet business expectations
    • New cycles begin as new data sets come under investigation or as new data quality requirements are identified for existing data sets
    Extended data life cycle model with metadata sources and uses
    [Diagram phases:]
    • Metadata Creation: Define Data Architecture; Define Data Model Structures
    • Metadata Structuring: Implement Data Model Views; Populate Data Model Views
    • Metadata Refinement: Correct Structural Defects; Update Implementation
    • Data Creation: Create Data; Verify Data Values
    • Data Utilization: Inspect Data; Present Data
    • Data Manipulation: Manipulate Data; Update Data
    • Data Assessment: Assess Data Values; Assess Metadata
    • Data Refinement: Correct Data Value Defects; Re-store Data Values
    [Diagram flows: data performance metadata, data architecture, data architecture and data models, shared data, updated data, corrected data, architecture refinements, facts & meanings]
    [Central store: Metadata & Data Storage]
    [Entry points: starting point for new system development; starting point for existing systems]
  • 32. Data Quality Attributes
  • 33. Compare the utility of data quality conversation topics
    Engineers say: | Business wants to hear:
    Clean some data | Decrease the number of undeliverable targeted marketing ads
    Reorganize the database | Increase the ability of the salesforce to perform their own analyses
    Develop a taxonomy | Create a common vocabulary for the organization
    Optimize a query | Shave 1 second off a task that runs a billion times a day
    Reverse engineer the legacy system | Understand what was good about the old system so it can be formally preserved, and what was bad so it can be improved
    CDO Agenda
    • Inventory Data -> uncovering assets & decreasing ROT
    • Develop the first version of an organizational data strategy
    • Monetize your organization's data
    The CDO's goal is to better manage data as an organizational asset in support of the organizational mission!
  • 34. Data Asset Inventory (Implementation)
    1. The goal is to understand purpose, not definitions
    – Definitions are passive; purpose statements incorporate strategic elements: the rationale and justification based on the need for data
    2. Inventoried data assets are categorized by how they are shared:
    A. Data items that are shared with external organizations
    B. Data items that are shared within the organization
    C. Data items that are not shared but are used to derive shared data items
    D. Data items not shared outside but used to support workgroup activities
    E. Organizational data ROT
    3. Assign each inventoried data asset to the existing subject area in which it best supports the organizational mission (ex. PAY is part of BACK OFFICE OPERATIONS)
    – based on (refine-able) purpose statements, primary subject-area allegiance is posited
    4. Identify, de-dupe and harmonize data assets participating in synonym/homonym/other challenges; ensure only one item is designated as the (current) golden source
    5. Identify which data items are deemed to be sensitive or personal data items and what specific controls need to be in place
    6. Document all mapping rules for data items in categories 2A and 2B above
    Note: this exercise cannot be performed comprehensively in a single cycle. Equally important, therefore, is establishing a process so that the inventory can be easily updated as other data items are inevitably discovered.
    What is Strategy?
    • Current use derived from military - a pattern in a stream of decisions [Henry Mintzberg]
    A thing
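Returning to the data asset inventory steps listed above, the sketch below shows one way the sharing categories (2A to 2E), the single-golden-source check (step 4), and the mapping-rule scope (step 6) could be represented. All class, field, and asset names are hypothetical illustrations, not part of the presentation or of any particular tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Sharing(Enum):
    """Sharing categories 2A-2E from the inventory steps."""
    EXTERNAL = "2A"        # shared with external organizations
    INTERNAL = "2B"        # shared within the organization
    DERIVES_SHARED = "2C"  # not shared, but used to derive shared items
    WORKGROUP = "2D"       # supports workgroup activities only
    ROT = "2E"             # redundant, obsolete, or trivial


@dataclass
class DataAsset:
    name: str
    purpose: str            # purpose statement, not a definition (step 1)
    sharing: Sharing        # step 2
    subject_area: str       # step 3
    concept: str            # harmonized concept; synonyms share one concept (step 4)
    golden_source: bool = False
    sensitive: bool = False  # step 5


def concepts_without_single_golden_source(assets: List[DataAsset]) -> List[str]:
    """List concepts (ignoring ROT) that do not have exactly one golden source."""
    counts: Dict[str, int] = {}
    for asset in assets:
        if asset.sharing is Sharing.ROT:
            continue
        counts[asset.concept] = counts.get(asset.concept, 0) + int(asset.golden_source)
    return sorted(c for c, n in counts.items() if n != 1)


def needs_mapping_rules(assets: List[DataAsset]) -> List[str]:
    """Names of assets in categories 2A and 2B, which require documented mapping rules."""
    return [a.name for a in assets if a.sharing in (Sharing.EXTERNAL, Sharing.INTERNAL)]


# Hypothetical inventory entries (illustrative only).
inventory = [
    DataAsset("EMP_PAY_AMT", "Pay employees accurately", Sharing.INTERNAL,
              "BACK OFFICE OPERATIONS", "pay amount", golden_source=True),
    DataAsset("SALARY", "Workgroup budgeting", Sharing.WORKGROUP,
              "BACK OFFICE OPERATIONS", "pay amount"),
    DataAsset("CUST_EMAIL", "Contact customers", Sharing.EXTERNAL,
              "MARKETING", "customer email", sensitive=True),
    DataAsset("OLD_FAX_NO", "No current purpose", Sharing.ROT,
              "MARKETING", "fax number"),
]

print(concepts_without_single_golden_source(inventory))
print(needs_mapping_rules(inventory))
```

Because the inventory cannot be completed in one cycle, a structure like this would be re-run as new data items are discovered; the check simply reports which concepts still lack exactly one designated golden source.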
  • 35. [Quadrant diagram; axis labels: Improve Operations, Innovation]
    Q1 Organizations without a formalized data quality focus
    Q2 Data Quality Focus: Increase organizational efficiencies/effectiveness
    Q3 Data Quality Focus: Use data to create strategic opportunities
    Q4 Data Quality Focus: both, simultaneously
    Initially pick one or the other but not both
    Math
    • VCU
    – $5m 35 year faculty member
    – +$20 million in grants/funded research projects/student supplemental salaries
    • Collaborations
    – Range: $0 to +$1.5 billion documented savings
    • My introduction is often:
    – Peter is a professor with a positive cash flow!
  • 36. A musical analogy that works for both practice and storytelling
    https://www.youtube.com/watch?v=4n1GT-VjjVs&frags=pl%2Cwn
    Pandemic Math Question? (a very bad week)
    • If demand at a 48-bed hospital facility is doubling daily …
    • … at what point does anyone notice that the hospital beds are becoming scarce?
    – Monday: 3 beds occupied
    – Tuesday: 6
    – Wednesday: 12 (¾ of all beds were available)
    – Thursday (yesterday): 24 (½ of all beds were available)
    – Friday (today): 48 (zero beds available)
    – Tomorrow …???
    [Chart: beds occupied by day, Monday through Friday: 3, 6, 12, 24, 48]
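The arithmetic behind the pandemic math question can be reproduced with a few lines. The sketch assumes the slide's figures (48 beds, 3 occupied on Monday, demand doubling each day); the function name is hypothetical.

```python
from typing import List, Tuple


def occupancy_by_day(beds: int, start: int, days: List[str]) -> List[Tuple[str, int, float]]:
    """Return (day, beds occupied, fraction of beds still available) with daily doubling."""
    rows = []
    occupied = start
    for day in days:
        capped = min(occupied, beds)  # occupancy cannot exceed capacity
        rows.append((day, capped, (beds - capped) / beds))
        occupied *= 2
    return rows


week = occupancy_by_day(48, 3, ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"])
for day, occupied, available in week:
    print(f"{day}: {occupied} occupied, {available:.0%} available")
```

The point of the exercise is visible in the output: half of all beds are still free one day before none are, so a threshold-based warning set late in the week leaves almost no time to react.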
  • 37. Famous 1990's Words?
    • Question:
    – Why haven't organizations taken a more proactive approach to data quality?
    • Answer:
    – Fixing data quality problems is not easy
    – It is dangerous: they'll come after you
    – Your efforts are likely to be misunderstood
    – You could make things worse
    – Now you get to fix it
    • A single data quality issue can grow into a significant, unexpected investment
  • 38. High Quality Data is Critical
    • Information transparency
    • Analytics
    • Business Intelligence
    • Increasing efficiencies
    • Decreasing costs
    • Driving holistic decision-making across the organization
    Not Helpful
    • Information transparency $
    • Analytics $
    • Business Intelligence $
    • Increasing efficiencies $
    • Decreasing costs $
    • Driving holistic decision-making across the organization $
    Data Quality Dimensions
  • 39. Data Value Quality
    Data Representation Quality
  • 40. Data Model Quality
    Data Architecture Quality
  • 41. Upcoming Events
    • Essential Metadata Strategies (12 October 2021)
    • Necessary Prerequisites to Data Success: Exorcising the Seven Deadly Data Sins (9 November 2021)
    • Data Management vs. Data Governance Program (14 December 2021)
    Time: 19:00 UTC (2:00 PM NYC) | Presented by: Peter Aiken, PhD
    Event Pricing
    • 20% off directly from the publisher on select titles
    • My Book Store @ http://plusanythingawesome.com
    • Enter the code "anythingawesome" at the Technics bookstore checkout where it says to "Apply Coupon"
  • 42. Thank You!
    Peter.Aiken@AnythingAwesome.com +1.804.382.5957
    Book a call with Peter to discuss anything - https://anythingawesome.com/OfficeHours.html
    Data Things Happen
    Organizational Things Happen
    This approach only works if
    • We know where the data that needs to be fixed resides
    • We can communicate precisely and correctly amongst team members
    • We are adept with the correct technological support
    • …
  • 43. Winning Cards for data quality program success
    1. The project needs to be small
    – Projects should not be allowed to begin unless the data requirements for the entire project are verified
    2. The product owner or sponsor must be highly skilled
    – Few in IT have the requisite data skills and knowledge
    3. The process must be agile
    – Agile is a construction technique; data requires more planning before construction
    4. The agile team must be highly skilled in both the agile process and the technology
    – Few agile teams have requisite levels of data skills
    5. The organization must be highly skilled at emotional maturity
    – Few organizations understand data stuff