© Copyright 2020 by Peter Aiken | paiken@plusanythingawesome.com | +1.804.382.5957
Peter Aiken, PhD
Getting Data Quality Right
Engineering Success Stories
Peter Aiken, Ph.D.
• I've been doing this a long time
• My work is recognized as useful
• Associate Professor of IS (vcu.edu)
• Founder, Data Blueprint (datablueprint.com)
• DAMA International (dama.org)
• MIT CDO Society (iscdo.org)
• Anything Awesome (plusanythingawesome.com)
• 11 books and dozens of articles
• Experienced w/ 500+ data
management practices worldwide
• Multi-year immersions
– US DoD (DISA/Army/Marines/DLA)
– Nokia
– Deutsche Bank
– Wells Fargo
– Walmart …
PETER AIKEN WITH JUANITA BILLINGS
FOREWORD BY JOHN BOTTEGA
MONETIZING
DATA MANAGEMENT
Unlocking the Value in Your Organization’s
Most Important Asset.
Program
Getting Data Quality Right -
Success Stories
• Adopt a broad definition of data quality
• Understand data quality in the
broader context of organizational data use
• Approach data quality as an engineering challenge
• Put a price on data quality
• Savings-based stories
• Innovation-based stories
• Non-monetary stories
• Takeaways and Q&A
How to solve this data quality problem using just tools?
Microwave Oven retail price was $40
4/6/12
Why Britain has 17,000
pregnant men
This research, published as a letter
this week in the British Medical
Journal, was meant to draw attention
to how much data gets entered
incorrectly in the country’s medical
system. These guys weren’t turning up
at the doctor for pregnancy-related
services. Instead, they were at their
doctor for procedures that had medical
codes similar to those of midwifery
and obstetric services. With a
misplaced keystroke here or there, an
annual physical could become a
consultation with a midwife.
https://www.washingtonpost.com/us-policy/2020/06/25/irs-stimulus-checks-dead-people-gao/
https://slate.com/business/2020/06/irs-stimulus-check-dead-people-yawn.html
Payments sent: 160,000,000
Payments to dead taxpayers: 1,100,000
Error rate: ≈0.7% (1,100,000 ÷ 160,000,000)
Amount (substantive?): $1.4 billion
Note: IRS lawyers determined that the IRS "did not have legal authority to
deny payments to those who filed a return for 2019, even if
they were deceased at the time of payment”; checking
with SSA might also have slowed the distribution process down.
From SLATE:
• The real headline here should really
be that the government did its job
pretty well!
• Speed was the priority
• Within two weeks, the IRS delivered
80 million payments electronically
• "Speed is part of the reason that
poorer households were able to start
spending normally again by May
despite an unemployment rate
rivaling the Great Depression’s."
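The error rate follows directly from the two payment counts above. A minimal Python check; the variable names are illustrative, and the average misdirected payment is derived arithmetic, not a figure reported by the IRS or GAO.

```python
# Figures as given on the slide
PAYMENTS_SENT = 160_000_000
PAYMENTS_TO_DEAD = 1_100_000
DOLLARS_TO_DEAD = 1_400_000_000


def error_rate(errors: int, total: int) -> float:
    """Share of payments that went to deceased taxpayers (by count)."""
    return errors / total


rate = error_rate(PAYMENTS_TO_DEAD, PAYMENTS_SENT)
avg_payment = DOLLARS_TO_DEAD / PAYMENTS_TO_DEAD  # derived, not reported

print(f"error rate by count: {rate:.2%}")
print(f"average misdirected payment: ${avg_payment:,.0f}")
```

Measured by dollars instead of by count would need the total amount disbursed, which the slide does not give.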
A congratulations
letter from another
bank
Problems
• Bank did not know it
made an error
• Tools alone could not
have prevented this
error
• Lost confidence in the
ability of the bank to
manage customer
funds
• Likely did not have the
ability to respond
Year 2000 (or Y2K) Bug
• Before the internet
- Computing resources were expensive
- It was worth the tradeoff to represent the year
field using two digits
- 1959 was represented to the computer as 59
- Subtracting 59 from 99 yields the correct answer
40 (for dates prior to 2000/01/01!)
- No one expected those programs to still be in use
- Documentation was poorly created/maintained
• If all these fields were not expanded to
four digits before 2000/01/01, then date
calculations would not give correct results
- Subtracting 59 from 00 yields the incorrect
answer -59 (the correct result is 41)
- No one knew how long this would take or
cost–only when it must be completed!
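The failure is ordinary subtraction applied to truncated years. A minimal Python sketch, assuming an age calculation like the one in the bullets above; the function names are illustrative.

```python
def age_from_two_digit_years(birth_yy: int, current_yy: int) -> int:
    """Legacy-style arithmetic on YY fields (e.g., 59 for 1959)."""
    return current_yy - birth_yy


def age_from_four_digit_years(birth_yyyy: int, current_yyyy: int) -> int:
    """The same calculation after the fields are expanded to YYYY."""
    return current_yyyy - birth_yyyy


print(age_from_two_digit_years(59, 99))       # 40 (1999: correct)
print(age_from_two_digit_years(59, 0))        # -59 (2000 stored as 00: wrong)
print(age_from_four_digit_years(1959, 2000))  # 41 (expanded fields: correct)
```

Expanding the stored field, rather than patching each calculation, is what makes every downstream subtraction correct at once.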
At 1 second BEFORE midnight, the OFFICIAL Clock of the United
States showed:
December 31, 1999 11:59:59
One SECOND later the OFFICIAL Clock of the United
States showed:
January 1, 19100 00:00:01
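"19100" is the classic signature of code that keeps the year as a count of years since 1900 (as the `tm_year` field of C's `struct tm` does) and then prints that count after a hard-coded "19". The source does not say what software drove this clock display, so the Python sketch below illustrates that commonly cited cause rather than diagnosing this specific case; the function names are hypothetical.

```python
def years_since_1900(year: int) -> int:
    """Mimics a tm_year-style counter: 99 for 1999, 100 for 2000."""
    return year - 1900


def buggy_year_label(year: int) -> str:
    # Bug: assumes the counter always has two digits and prefixes "19".
    return "19" + str(years_since_1900(year))


def fixed_year_label(year: int) -> str:
    # Fix: add the offset back numerically instead of concatenating strings.
    return str(1900 + years_since_1900(year))


for y in (1999, 2000):
    print(buggy_year_label(y), fixed_year_label(y))
```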
US DoD Reverse Engineering Program Manager
• "Your first project is to keep me from
having to testify to a Congressional Hearing!"
(Belkis Leong-Hong, former ASD-C3I)
• Problem:
– 37 systems paid personnel within DoD
– How many were needed?
– How many potential losers?
– What do you mean by employee?
• Process modeling
– Inconclusive results
• Data reverse engineering - definitive
– One legged engineer,
working in waist deep waters,
underneath rotating helicopter blades,
on overtime
Why using Microsoft's tool caused Covid-19 results to be lost
https://www.bbc.com/news/technology-54423988?es_p=12801491
• .xlsx (1,048,576 rows per sheet) has
been available since 2007 and
should have been used
• Used .xls (65,536 rows per sheet)
• Additional data was
dropped without
notification
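The results were lost because rows past the format's limit were cut off without any error. A minimal Python sketch contrasting silent truncation with a fail-loudly check; the two limits are the worksheet maximums for .xls (65,536 rows) and .xlsx (1,048,576 rows), while the function names and the 70,000-row batch are hypothetical.

```python
XLS_MAX_ROWS = 65_536      # legacy .xls worksheet limit
XLSX_MAX_ROWS = 1_048_576  # .xlsx worksheet limit (Excel 2007 and later)


def silently_truncate(rows: list, max_rows: int) -> list:
    """What effectively happened: anything past the limit just disappears."""
    return rows[:max_rows]


def checked_export(rows: list, max_rows: int) -> list:
    """Refuse to export if any rows would be lost, and say how many."""
    overflow = len(rows) - max_rows
    if overflow > 0:
        raise ValueError(f"{overflow} rows exceed the {max_rows}-row limit")
    return rows


results = list(range(70_000))  # hypothetical batch of test-result rows
kept = silently_truncate(results, XLS_MAX_ROWS)
print(f"silently dropped: {len(results) - len(kept)} rows")

try:
    checked_export(results, XLS_MAX_ROWS)
except ValueError as err:
    print(f"export blocked: {err}")

print(len(checked_export(results, XLSX_MAX_ROWS)))  # fits in .xlsx
```

The design point is that the check costs one comparison; the tool alone did not make it.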
Reasonable individuals disagree on basic definitions
"Data quality
is part of
data management"
"Data management
is part of
data quality"
Poor data quality manifests as multifaceted organizational challenges
[Figure: IT systems, IT processes, business systems, and business processes each surface as a separate business challenge; together they produce poor results]
The Blind Men and the Elephant
(Source: John Godfrey Saxe's (1816-1887) version of the famous Indian legend)
• It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.
• The First approached the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
"God bless me! but the Elephant
Is very like a wall!"
• The Second, feeling of the tusk,
Cried, "Ho! what have we here,
So very round and smooth and sharp?
To me 'tis mighty clear
This wonder of an Elephant
Is very like a spear!"
• The Third approached the animal,
And happening to take
The squirming trunk within his hands,
Thus boldly up he spake:
"I see," quoth he, "the Elephant
Is very like a snake!"
• The Fourth reached out an eager hand,
And felt about the knee:
"What most this wondrous beast is like
Is mighty plain," quoth he;
"'Tis clear enough the Elephant
Is very like a tree!"
• The Fifth, who chanced to touch the ear,
Said: "E'en the blindest man
Can tell what this resembles most;
Deny the fact who can,
This marvel of an Elephant
Is very like a fan!"
• The Sixth no sooner had begun
About the beast to grope,
Than, seizing on the swinging tail
That fell within his scope,
"I see," quoth he, "the Elephant
Is very like a rope!"
• And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right,
And all were in the wrong!
• Problem:
– Most organizations approach
data quality problems in the same way
that the blind men approached the elephant: people tend to see only the data
that is in front of them
– Little cooperation across boundaries, just as the blind men were unable to
convey their impressions about the elephant to recognize the entire entity
– Leads to confusion, disputes and narrow views
• Solution:
– Data quality engineering can help achieve a more complete picture and facilitate
cross-boundary communications
No universal conception of data
quality exists; instead, many
differing perspectives compete
Definitions
• Quality Data
– Fit for purpose: meets the requirements of its authors, users,
and administrators (adapted from Martin Eppler)
– Synonymous with information quality, since poor data quality
results in inaccurate information and poor business performance
• Data Quality Management
– Planning, implementation and control activities that apply quality
management techniques to measure, assess, improve, and
ensure data quality
– Entails the "establishment and deployment of roles, responsibilities
concerning the acquisition, maintenance, dissemination, and
disposition of data" http://www2.sas.com/proceedings/sugi29/098-29.pdf
✓ Critical supporting process in change management
✓ Continuous process for defining acceptable levels of data quality to meet
business needs and for ensuring that data quality meets these levels
• Data Quality Engineering
– Recognition that data quality solutions cannot merely be managed; they must be engineered
– Engineering is the application of scientific, economic, social, and practical knowledge
in order to design, build, and maintain solutions to data quality challenges
– Engineering concepts are generally not known or understood within IT or the business!
Spinach/Popeye story from http://it.toolbox.com/blogs/infosphere/spinach-how-a-data-quality-mistake-created-a-myth-and-a-cartoon-character-10166
Quality Data is ...
Fit
For
Purpose
Separating the Wheat from the Chaff
• Better organized data increases in value
• Poor data management practices are costing organizations
money/time/effort
• 80% of organizational data is ROT
– Redundant
– Obsolete
– Trivial
• The question is: which data should be eliminated?
– Most enterprise data is never analyzed
– 54% of data is unidentified, plus
32% is ROT, leaving
14% business critical?
A Model Defining 3 Important Concepts
[Figure: Facts combine with Meanings to form Data; Data returned in response to a Request become Information; Information associated with its Strategic Use becomes Intelligence. Built on definitions from Dan Appleton, 1983]
“You can have data without information, but
you cannot have information without data”
— Daniel Keys Moran, Science Fiction Writer
1. Each FACT combines with one or more MEANINGS.
2. Each specific FACT and MEANING combination is referred to as a DATUM.
3. An INFORMATION is one or more DATA that are returned in response to a specific REQUEST
4. INFORMATION REUSE is enabled when one FACT is combined with more than one MEANING.
5. INTELLIGENCE is INFORMATION associated with its STRATEGIC USES.
6. DATA/INFORMATION must be formally arranged into an ARCHITECTURE.
Wisdom & knowledge are
often used synonymously
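The numbered rules above can be read as a small data structure. A minimal Python sketch, assuming string-valued facts and meanings; the class and function names, the example facts, and matching a request by keyword are all hypothetical stand-ins rather than part of the original model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """Rule 2: one specific FACT combined with one MEANING."""
    fact: str
    meaning: str


@dataclass(frozen=True)
class Intelligence:
    """Rule 5: INFORMATION associated with its STRATEGIC USE."""
    information: tuple
    strategic_use: str


def information(data: list, request: str) -> tuple:
    """Rule 3: the DATA returned in response to a specific REQUEST
    (here, hypothetically, every datum whose meaning mentions the request)."""
    return tuple(d for d in data if request in d.meaning)


def reused_facts(data: list) -> set:
    """Rule 4: FACTs combined with more than one MEANING enable reuse."""
    meanings: dict = {}
    for d in data:
        meanings.setdefault(d.fact, set()).add(d.meaning)
    return {fact for fact, ms in meanings.items() if len(ms) > 1}


data = [
    Datum("1959", "employee birth year"),
    Datum("1959", "vehicle model year"),
    Datum("Richmond", "employee office location"),
]
info = information(data, "employee")
intel = Intelligence(info, "workforce retirement planning")
print(len(info), reused_facts(data))
```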
Useful Data
My most profound lesson! (so far)
[Figure: Perfect Model + Poor Quality Data → Poor Results, regardless of the technology applied: Data Warehouse, Machine Learning, Business Intelligence, Block Chain, AI, MDM, Data Governance, Analytics]
[Figure: Perfect Model + Quality Data → Good Results, across Data Warehouse, Machine Learning, Business Intelligence, Block Chain, AI, MDM, Analytics, Data Governance]
Quality In ➜ Quality Out!
Data Footprints
• SQL Server
– 47,000,000,000,000 bytes
– Largest table 34 billion records 3.5 TBs
• Informix
– 1,800,000,000 queries/day
– 65,000,000 tables / 517,000 databases
• Teradata
– 117 billion records
– 23 TBs for one table
• DB2
– 29,838,518,078 daily queries
Repeat 100s, thousands, millions of times ...
Death by 1000 Cuts
Working While Bleeding Profusely
DEATH BY A THOUSAND CUTS
bleeding
unnecessarily
from lots of
cuts
Working While Bleeding
Making a Better
Data Sandwich
Standard data
Data supply
Data literacy
This cannot happen without data engineering and architecture!
Quality data engineering/
architecture work products
do not happen accidentally!
Engineering
Architecture
Engineering/Architecting
Relationship
• Architecting is used to create
and build systems too complex
to be treated by engineering
analysis alone
– Require technical details as the
exception
• Engineers develop the
technical designs
– Engineering/Crafts-persons deliver
components supervised by:
• Manufacturers
• Building Contractors
USS Midway
& Pancakes
What is this?
• It is tall
• It has a clutch
• It was built in 1942
• It is cemented to the floor
• It is still in regular use!
Data should not act as chaff
DQ challenges are context specific!
Hidden Data Factories
• Make explicit the extra steps required to correct costly and time-consuming data errors
https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year
Department A
Department B
1. Check A's work
2. Make any corrections
3. Complete B's work
4. Deliver to customers
5. Deal with consequences
Work products are delivered to customers
Knowledge workers:
80% looking for stuff
20% doing useful work
Hidden Data Factories
Hidden Data Factories are expensive https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year
• Consider these two questions:
– Were your systems explicitly designed to
be integrated or otherwise work together?
– If not then what is the likelihood that they
will just happen to work well together?
• Data must function at the most granular
interaction or it results in things that:
– Take longer (end-of-day job runs 45 hours)
– Cost more (the wrong assets are transferred)
– Deliver less (features are not delivered)
– Present greater risk (billing delayed 30 days, monthly)
• 20-40% of IT budgets are spent evolving data:
– Data migration (changing the location from one place to another)
– Data conversion (changing it into another form, state, or product)
– Data improvement (inspecting, manipulating it, preparing for subsequent use)
"The choice of data structure and algorithm
can make the difference between software
running in a few seconds or many days."
http://slideplayer.com/slide/7664141/
Great inspiration ...
• How to Measure Anything: Finding the Value of
Intangibles in Business by Douglas Hubbard (ISBN: 0470539399)
• If something can be observed then it can be measured
• Measurement is a reduction in uncertainty
• Formalizing stuff forces clarity
• Specific challenge characteristics
– Whatever your measurement problem is, it's been done before
– You have more data than you think
– You need less data than you think
– You probably need different data than you think
– Getting data is more economical than you think
Enrico Fermi (Nobel Prize Physics 1938)
• How many piano tuners in the city of Chicago?
– Without using existing lists such as yellow pages, google ...
– Current population of Chicago (3 million at the time)
– Average number of people per household (2 or 3)
– Share of households with regularly tuned pianos (1 in 3)
– Required frequency of tuning (1/year)
– How many pianos can a tuner tune daily? (4 or 5)
– How many days/year are worked (250)
• Tuners in Chicago ≈ (population ÷ people per household)
× share of households with tuned pianos
× tunings per year
÷ (tunings per tuner per day × workdays per year)
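Plugging the slide's estimates into the formula gives a range rather than a single number. A minimal Python sketch; taking midpoints (2.5 people per household, 4.5 tunings per day) and pairing the extremes to bound the answer are illustrative choices, not part of the original estimate.

```python
def piano_tuners(population: float, people_per_household: float,
                 share_with_tuned_piano: float, tunings_per_year: float,
                 tunings_per_tuner_per_day: float, workdays_per_year: float) -> float:
    households = population / people_per_household
    tunings_needed = households * share_with_tuned_piano * tunings_per_year
    capacity_per_tuner = tunings_per_tuner_per_day * workdays_per_year
    return tunings_needed / capacity_per_tuner


# Inputs from the slide; the low and high cases pair the extremes of each range.
low = piano_tuners(3_000_000, 3, 1 / 3, 1, 5, 250)
mid = piano_tuners(3_000_000, 2.5, 1 / 3, 1, 4.5, 250)
high = piano_tuners(3_000_000, 2, 1 / 3, 1, 4, 250)
print(f"about {low:.0f} to {high:.0f} tuners (midpoint {mid:.0f})")
```

Writing each factor as a named parameter is the point of the exercise: every assumption is visible and can be challenged or refined independently.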
Compare Story Types
Technical | Business
Clean some data | Decrease the number of undeliverable targeted marketing ads
Reorganize the database | Increase the ability of the salesforce to perform their own analyses
Develop a taxonomy | Create a common vocabulary for the organization
Optimize a query | Shaved 1 second off a task that runs a billion times a day
Reverse engineer the legacy system | Understand what was good about the old system, so it can be formally preserved, and what was bad, so it can be improved
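The "billion times a day" story is easiest to appreciate as aggregate time. A short Python check of the arithmetic, using only the two numbers in the table (1 second saved, 10⁹ runs per day) and assuming a 365-day year.

```python
SECONDS_SAVED_PER_RUN = 1
RUNS_PER_DAY = 1_000_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

saved_per_day = SECONDS_SAVED_PER_RUN * RUNS_PER_DAY
hours_per_day = saved_per_day / 3600
years_per_day = saved_per_day / SECONDS_PER_YEAR

print(f"{hours_per_day:,.0f} hours (about {years_per_day:.1f} years) of run time saved per day")
```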
Monetization: Time & Leave Tracking
At least 300 employees are
spending 15 minutes/week
tracking leave/time
Capture Cost of Labor/Category
District-L (example)     Leave Tracking   Time Accounting
Employees                73               50
Number of documents      1,000            2,040
Timesheets/employee      13.7             40.8
Time spent (hours)       0.08             0.25
Hourly cost              $6.92            $6.92
Additive rate            $11.23           $11.23
Cost per timekeeper      $12.31           $114.56
Total timekeeper cost    $898.49          $5,727.89
Monthly cost             $21,563.83       $137,469.40
Compute Labor Costs
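The table's cost rows can be approximately reproduced from its inputs. A minimal Python sketch, assuming (inferred from the numbers, not stated on the slide) that "Time spent" is hours per document and that cost per timekeeper = timesheets per employee × hours × additive rate; the results land within a dollar of the slide, presumably because its inputs are rounded. The monthly row is not modeled because its multiplier is not stated.

```python
def cost_per_timekeeper(documents: int, employees: int,
                        hours_per_document: float, additive_rate: float) -> float:
    timesheets_per_employee = documents / employees
    return timesheets_per_employee * hours_per_document * additive_rate


def total_timekeeper_cost(documents: int, employees: int,
                          hours_per_document: float, additive_rate: float) -> float:
    return employees * cost_per_timekeeper(documents, employees,
                                           hours_per_document, additive_rate)


ADDITIVE_RATE = 11.23  # from the table
leave = total_timekeeper_cost(1000, 73, 0.08, ADDITIVE_RATE)
time_acct = total_timekeeper_cost(2040, 50, 0.25, ADDITIVE_RATE)
print(f"leave tracking: ${leave:,.2f}, time accounting: ${time_acct:,.2f}")

# The headline figure: 300 employees x 15 minutes/week
print(f"{300 * 15 / 60:.0f} staff-hours per week")
```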
Annual Organizational Totals
• Range $192,000 - $159,000/month
• $100,000 Salem
• $159,000 Lynchburg
• $100,000 Richmond
• $100,000 Suffolk
• $150,000 Fredericksburg
• $100,000 Staunton
• $100,000 NOVA
• $800,000/month or $9,600,000/annually
• Awareness of the cost of things considered overhead
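A quick Python check of the roll-up: the listed district figures sum to slightly more than the rounded $800,000/month, and the annual figure is twelve times that rounded monthly amount. The dictionary only restates the slide's numbers.

```python
monthly_by_district = {
    "Salem": 100_000,
    "Lynchburg": 159_000,
    "Richmond": 100_000,
    "Suffolk": 100_000,
    "Fredericksburg": 150_000,
    "Staunton": 100_000,
    "NOVA": 100_000,
}

listed_monthly = sum(monthly_by_district.values())
ROUNDED_MONTHLY = 800_000  # figure used on the slide
annual = ROUNDED_MONTHLY * 12

print(f"listed districts: ${listed_monthly:,}/month")
print(f"annualized: ${annual:,}")
```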
Improving Data Quality during System Migration
• Challenge
– Millions of NSN/SKUs
maintained in a catalog
– Key and other data stored in
clear text/comment fields
– Original suggestion was manual
approach to text extraction
– Left the data structuring problem unsolved
• Solution
– Proprietary, improvable text extraction process
– Converted non-tabular data into tabular data
– Saved a minimum of $5 million
– Literally person-centuries of work
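The actual extraction process was proprietary and is not described. As a hypothetical illustration only, the Python sketch below shows the general idea of turning "KEY: value" fragments buried in a free-text comment field into columns; the comment format, the regular expression, and the field names are invented for the example.

```python
import re

# Hypothetical pattern: an upper-case key (may contain spaces), ':' or '=', then a value up to ';'
FIELD = re.compile(r"(?P<key>[A-Z][A-Z ]*[A-Z])\s*[:=]\s*(?P<value>[^;]+)")


def extract_fields(comment: str) -> dict:
    """Pull KEY: value pairs out of one free-text comment."""
    return {m.group("key").strip(): m.group("value").strip()
            for m in FIELD.finditer(comment)}


def to_table(comments: list) -> tuple:
    """Turn many comments into a header plus rows; missing fields become ''."""
    records = [extract_fields(c) for c in comments]
    header = sorted({key for r in records for key in r})
    rows = [[r.get(key, "") for key in header] for r in records]
    return header, rows


header, rows = to_table([
    "MFR: ACME CORP; PART NO = 12-345; COLOR: OD GREEN",
    "MFR: BOLT WORKS; PART NO = 9-001",
    "see attached drawing",  # nothing extractable: a candidate for manual review
])
print(header)
for row in rows:
    print(row)
```

Rows that come back empty are natural candidates for the "unmatched" bucket tracked week by week in the table that follows.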
Determining Diminishing Returns
Week #   Unmatched Items (% Total)   Ignorable Items (% Total)   Items Matched (% Total)
1        31.47%                      1.34%                       N/A
2        21.22%                      6.97%                       N/A
3        20.66%                      7.49%                       N/A
4        32.48%                      11.99%                      55.53%
…        …                           …                           …
14       9.02%                       22.62%                      68.36%
15       9.06%                       22.62%                      68.33%
16       9.53%                       22.62%                      67.85%
17       9.50%                       22.62%                      67.88%
18       7.46%                       22.62%                      69.92%
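One way to make "diminishing returns" concrete is to compare how fast the matched share grows early versus late. A minimal Python sketch using only the weeks shown in the table (weeks 5 to 13 are elided there); the choice of windows (weeks 4 to 14 versus 14 to 18) and the 0.05-point tolerance are illustrative assumptions.

```python
# (week, unmatched %, ignorable %, matched %) for the weeks with all three columns populated
WEEKS = [
    (4, 32.48, 11.99, 55.53),
    (14, 9.02, 22.62, 68.36),
    (15, 9.06, 22.62, 68.33),
    (16, 9.53, 22.62, 67.85),
    (17, 9.50, 22.62, 67.88),
    (18, 7.46, 22.62, 69.92),
]


def totals_consistent(rows, tolerance=0.05):
    """Each week's three shares should add up to ~100%."""
    return all(abs(u + i + m - 100) <= tolerance for _, u, i, m in rows)


def matched_gain_per_week(rows, start_week, end_week):
    """Average weekly change in the matched share between two listed weeks."""
    by_week = {w: m for w, _, _, m in rows}
    return (by_week[end_week] - by_week[start_week]) / (end_week - start_week)


early = matched_gain_per_week(WEEKS, 4, 14)
late = matched_gain_per_week(WEEKS, 14, 18)
print(totals_consistent(WEEKS), round(early, 2), round(late, 2))
```

A falling weekly gain is the signal to stop refining the extractor and price the remaining items for manual review.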
Before
Time needed to review all NSNs once over the life of the project:
NSNs 2,000,000
Average time to review & cleanse (in minutes) 5
Total Time (in minutes) 10,000,000
Time available per resource over a one year period of time:
Work weeks in a year 48
Work days in a week 5
Work hours in a day 7.5
Work minutes in a day 450
Total Work minutes/year 108,000
Person years required to cleanse each NSN once prior to migration:
Minutes needed 10,000,000
Minutes available person/year 108,000
Total Person-Years 92.6
Resource Cost to cleanse NSNs prior to migration:
Avg Salary for SME year (not including overhead) $60,000.00
Projected Years Required to Cleanse/Total DLA Person Year Saved 93
Total Cost to Cleanse/Total DLA Savings to Cleanse NSNs: $5.5 million
Quantitative Benefits
After
Time needed to review all NSNs once over the life of the project:
NSNs 150,000
Average time to review & cleanse (in minutes) 5
Total Time (in minutes) 750,000
Time available per resource over a one year period of time:
Work weeks in a year 48
Work days in a week 5
Work hours in a day 7.5
Work minutes in a day 450
Total Work minutes/year 108,000
Person years required to cleanse each NSN once prior to migration:
Minutes needed 750,000
Minutes available person/year 108,000
Total Person-Years 7
Resource Cost to cleanse NSNs prior to migration:
Avg Salary for SME year (not including overhead) $60,000.00
Projected Years Required to Cleanse/Total DLA Person Year Saved 7
Total Cost to Cleanse/Total DLA Savings to Cleanse NSNs: $420,000
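The before/after worksheets reduce to one formula. A minimal Python sketch using the slides' inputs; like the slides, it rounds person-years to whole years before pricing them, which gives $5.58 million for the "before" case (reported as $5.5 million; 92.6 × $60,000 is about $5.56 million). The difference between the two cases, about $5.16 million, is consistent with the earlier "saved a minimum of $5 million".

```python
MINUTES_PER_NSN = 5
WORK_MINUTES_PER_YEAR = 48 * 5 * 450  # weeks x days x minutes/day = 108,000
SME_SALARY = 60_000                   # per year, excluding overhead


def person_years(nsns: int) -> float:
    return nsns * MINUTES_PER_NSN / WORK_MINUTES_PER_YEAR


def cleanse_cost(nsns: int) -> int:
    # Follows the slides: round person-years to whole years, then price them.
    return round(person_years(nsns)) * SME_SALARY


before, after = 2_000_000, 150_000
print(f"before: {person_years(before):.1f} person-years, ${cleanse_cost(before):,}")
print(f"after:  {person_years(after):.1f} person-years, ${cleanse_cost(after):,}")
print(f"difference: ${cleanse_cost(before) - cleanse_cost(after):,}")
```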
Why should a knowledge worker
• with a PhD in Chemical Engineering
• have to know whether this product was
Y2K compliant?
International Chemical Company Engine Testing
• $1 billion+ chemical company
• Develops/manufactures additives
enhancing the performance of oils
and fuels ...
• ... to enhance engine/machine
performance
– Helps fuels burn cleaner
– Engines run smoother
– Machines last longer
• Tens of thousands of
tests annually
– Test costs range
up to $250,000
1.Manual transfer of digital data
2.Manual file movement/duplication
3.Manual data manipulation
4.Disparate synonym reconciliation
5.Tribal knowledge requirements
6.Non-sustainable technology
Improving Knowledge Worker Productivity
Data Integration Solution
• Integrated the existing systems to
easily search on and find similar
or identical tests
• Results:
– Reduced expenses
– Improved competitive edge
and customer service
– Time savings and improved operational capabilities
• According to our client’s internal business
case development, they expect to realize
a $25 million gain each year thanks to
this data integration
• 2 person months = 40 person days
• 683 + 1478 ≈ 2,000 attributes
• 2,000 attributes mapped onto 7,000
• 2,000/40 person days = 50/day
or 50/8 hours = 6.25 attributes/hour
and
• 7,000/40 days = 375/person day
or 375/8 hours = 46.875 attributes/
hour
• 6.25 + 46.875 ≈ 53 attributes/hour
• Locate, identify, understand, map,
transform, document
• 1 attribute/minute!
Predicting Engineering Problem Characteristics
Legacy System #1: Payroll
– Platform: Unisys; OS: OS; 1998 age: 21
– Data structure: DMS (network)
– Physical records: 4,950,000; logical records: 250,000
– Relationships: 62; entities: 57; attributes: 1,478
Legacy System #2: Personnel
– Platform: Amdahl; OS: MVS; 1998 age: 15
– Data structure: VSAM/virtual database tables
– Physical records: 780,000; logical records: 60,000
– Relationships: 64; entities: 4/350; attributes: 683
New System
– Platform: WinTel; OS: Win'95; 1998 age: new
– Data structure: client/server RDBMS
– Characteristics (logical / physical): records 250,000 / 600,000;
relationships 1,034 / 1,020; entities 1,600 / 2,706; attributes 15,000 / 7,073
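The mapping-rate arithmetic can be rechecked in a few lines of Python. Note that 7,000 ÷ 40 person-days is 175 per day; the 375/day used above equals 15,000 ÷ 40, which matches the new system's logical attribute count, so the sketch computes both. Which denominator was intended is not stated in the source.

```python
PERSON_DAYS = 40   # 2 person-months
HOURS_PER_DAY = 8


def attributes_per_hour(attributes: int) -> float:
    return attributes / PERSON_DAYS / HOURS_PER_DAY


legacy = attributes_per_hour(2_000)           # 683 + 1,478, rounded to ~2,000
target_physical = attributes_per_hour(7_000)  # ~7,073 physical attributes
target_logical = attributes_per_hour(15_000)  # 15,000 logical attributes

for label, target in (("physical", target_physical), ("logical", target_logical)):
    per_hour = legacy + target
    print(f"{label}: {per_hour:.2f} attributes/hour = {per_hour / 60:.2f}/minute")
```

With the logical count the pace is close to one attribute per minute; with the physical count it is roughly one every two minutes.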
Extreme Data Engineering!
Logistics Company
• Fortune 450
• Room of 100 associates
• Manually correcting every
item on every customer invoice
• Upon noting this to the
responsible manager - the reply:
– This is the best quarter
– Of the best year
– I've ever had
– Perhaps I need
to double the
number in
that room?
Motivations for doing more with data
• Because data
points to where
valuable things are
located
• Because data has
intrinsic value by
itself
• Because data
has inherent
combinatorial
value
• Valuing Data
– Use data to measure
change
– Use data to manage
change
– Use data to motivate
change
• Creating a
competitive
advantage with data
Improve your
organization’s data
Improve the way your
people use its data
Improve the way your
data and your people
support your
organizational strategy
What did Rolls-Royce learn from NASCAR?
• Old model
- Sell jet engines
• New model
- Sell hours of powered thrust
- “Power-by-the-hour”
- No payment for down time
- Wing to wing
- When was this new model invented?
https://www.youtube.com/watch?v=RRy_73ivcms
Fan Blade Sensor
• 1 Sensor
– Probabilistic (generalist) maintenance forecasts
• 100 Sensors
– Establish optimal monitoring targets
– Finer tuned and safer maintenance
– Mission Readiness ???
– Storage $$$
– Handling $$$
– Opportunity $$$
– Systemic $$$
– Maintenance $$$
– Total > $1.5 Billion
Ubiquitous Mystery Object
Data Mapping
Data objects: Mental illness, Deployments, Work History, Soldier Legal Issues, Abuse, Suicide Analysis
Sources: FAPDMSS, G1, DMDC, CID
• Data objects complete?
• All sources identified?
• Best source for each object?
• How to reconcile differences between sources?
MDR
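The four mapping questions can be turned into a small, checkable procedure. The Python sketch below is illustrative only: it assumes each source delivers `(source, soldier_id, data_object, value)` tuples, and the per-object source ranking in `PRECEDENCE` (along with the object keys `deployments`, `legal_issues`, `work_history`) is hypothetical. Only the source names G1, DMDC, and CID come from the slide. For each object it picks the best-ranked source, flags every disagreement between sources for review instead of silently overwriting it, and lists objects that have no agreed source yet.

```python
from collections import defaultdict
from typing import Any

# Hypothetical ranking of sources per data object (most trusted first).
# Agreeing on these rankings is the "best source for each object?" question.
PRECEDENCE: dict[str, list[str]] = {
    "deployments": ["DMDC", "G1"],
    "legal_issues": ["CID", "G1"],
}

Record = tuple[str, str, str, Any]  # (source, soldier_id, data_object, value)


def reconcile(records: list[Record]):
    """Pick one value per (soldier_id, data_object) and report disagreements.

    Returns (golden, conflicts, unresolved):
      golden     -> {(soldier_id, data_object): (chosen_source, value)}
      conflicts  -> [((soldier_id, data_object), {source: value, ...})]
      unresolved -> keys whose sources are not ranked in PRECEDENCE
    Values must be hashable so disagreements can be detected with a set.
    """
    candidates: dict[tuple[str, str], dict[str, Any]] = defaultdict(dict)
    for source, soldier_id, data_object, value in records:
        candidates[(soldier_id, data_object)][source] = value

    golden, conflicts, unresolved = {}, [], []
    for key, by_source in candidates.items():
        # Sources that disagree are flagged for stewards, not silently overwritten.
        if len(set(by_source.values())) > 1:
            conflicts.append((key, dict(by_source)))
        ranking = PRECEDENCE.get(key[1], [])
        chosen = next((s for s in ranking if s in by_source), None)
        if chosen is None:
            unresolved.append(key)  # no agreed best source for this object yet
        else:
            golden[key] = (chosen, by_source[chosen])
    return golden, conflicts, unresolved


records = [
    ("G1", "S1", "deployments", 2),
    ("DMDC", "S1", "deployments", 3),
    ("CID", "S1", "legal_issues", "none"),
    ("G1", "S2", "work_history", "infantry"),
]
golden, conflicts, unresolved = reconcile(records)
```

Here G1 and DMDC disagree on soldier S1's deployment count; DMDC's value is kept because of the assumed ranking, but the conflict is still reported, and `work_history` surfaces as unresolved because no best source has been agreed for it.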
Senior Army Official
• Room full of Stewards
• A very heavy dose of management support
• Advised the group of his opinion on the matter
• Any questions as to future direction
– "They should make an appointment to speak directly with me!"
• Empower the team
– The conversation turned from "can this be done?" to
"how are we going to accomplish this?"
– Mistakes along the way would be tolerated
– Implement a workable solution in prototype form
Managing Data with Guidance?
• Federal employees
• 44 users from whitehouse.gov
• Thousands of military and
government e-mails
• Canadian citizens
• One-fifth of Quebec
Ashley Madison: 37,000,000
OPM: 25,000,000
Target: 70,000,000
How the Government Jeopardized Our National Security for More than a Generation
Target Corporation's Database Contents
• Your age
• Marital status
• Part of town you live in
• How long it takes you to drive to work
• Estimated salary
• If you have recently moved
• Credit cards carried in your wallet
• What websites you visit
• Your ethnicity
• Your job history
• The magazines you read
• Work commute
• Sexual preferences
• If you’ve ever declared bankruptcy or got divorced
• The year you bought (or lost) your house
• Where you went to school(s)
• What kinds of topics you talk about online
• Whether you prefer certain brands of coffee, paper towels, cereal or applesauce
• Your political leanings, reading habits, charitable giving and
• The number of cars you own
Key Findings
• Preventable
• Leadership failed
– To heed repeated recommendations
– To sufficiently respond to growing threats of sophisticated cyber attacks, and
– To prioritize resources for cybersecurity
• 2014 data breaches were likely connected to, and possibly coordinated with, the 2015 data breach
• OPM misled the public on the extent of the damage of the breach and made false statements to Congress
How the Government Jeopardized Our National Security for More than a Generation
https://oversight.house.gov/report/opm-data-breach-government-jeopardized-national-security-generation/
Takeaways
• Quality data requires a context-specific definition
• Most business problems have data challenges (hidden data factories) at their root
• All advanced data practices depend on quality data
• AI/ML efforts are suffering from a lack of training data
• Few 'easy' fixes exist
• Successful data quality stories demonstrate
– Tangible ongoing savings
– Innovative data uses
– Outcomes more important than money
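A context-specific definition means quality is judged against a particular use ("fit for purpose") rather than in the abstract. The Python sketch below is a minimal illustration under stated assumptions: the two purposes, their rules, and the field names are hypothetical and not taken from any real system. It shows the same record judged fit for one use and unfit for another.

```python
from typing import Any, Callable

Record = dict[str, Any]

# Hypothetical, context-specific rules: each use of the data states its own requirements.
RULES: dict[str, Callable[[Record], bool]] = {
    # A mailing campaign needs a non-empty postal code; birth year is irrelevant to it.
    "mailing": lambda r: bool(r.get("postal_code")),
    # An age analysis needs a plausible integer birth year; postal code is irrelevant to it.
    "age_analysis": lambda r: isinstance(r.get("birth_year"), int) and 1900 <= r["birth_year"] <= 2020,
}


def fitness(record: Record) -> dict[str, bool]:
    """Evaluate one record against every purpose's rule."""
    return {purpose: rule(record) for purpose, rule in RULES.items()}


record = {"name": "A. Smith", "postal_code": "12345", "birth_year": None}
print(fitness(record))  # {'mailing': True, 'age_analysis': False}
```

No single "quality score" for the record would be meaningful here: it is good enough to mail but useless for the age analysis, which is why the definition has to start from the intended use.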
References & Recommended Reading
Event Pricing
• 20% off directly from the publisher on select titles
• My Book Store @ http://plusanythingawesome.com
• Enter the code "anythingawesome" at the Technics bookstore checkout where it says to "Apply Coupon"
Upcoming Events (All webinars begin @ 17:00 UTC/2:00 PM NYC)
Necessary Prerequisites to Data Success: Exorcising the Seven Deadly Data Sins
8 December 2020
Data Strategy: Plans Are Useless but Planning is Invaluable
14 January 2021
Data Management Best Practices/Practicing Data Management Better
9 February 2021
Brought to you by:
https://plusanythingawesome.com
peter@plusanythingawesome.com +1.804.382.5957
Questions?
Thank You!
Book a call with Peter to discuss anything: https://plusanythingawesome.com/OfficeHours.html

DataEd Slides: Getting Data Quality Right – Success Stories

  • 1. © Copyright 2020 by Peter Aiken Slide # 1paiken@plusanythingawesome.com+1.804.382.5957 Peter Aiken, PhD Getting Data Quality Right Engineering Success Stories
  • 2. Peter Aiken, Ph.D. • I've been doing this a long time • My work is recognized as useful • Associate Professor of IS (vcu.edu) • Founder, Data Blueprint (datablueprint.com) • DAMA International (dama.org) • MIT CDO Society (iscdo.org) • Anything Awesome (plusanythingawesome.com) • 11 books and dozens of articles • Experienced w/ 500+ data management practices worldwide • Multi-year immersions – US DoD (DISA/Army/Marines/DLA) – Nokia – Deutsche Bank – Wells Fargo – Walmart … © Copyright 2020 by Peter Aiken Slide # 2https://plusanythingawesome.com PETER AIKEN WITH JUANITA BILLINGS FOREWORD BY JOHN BOTTEGA MONETIZING DATA MANAGEMENT Unlocking the Value in Your Organization’s Most Important Asset.
  • 3. 3 Program © Copyright 2020 by Peter Aiken Slide #https://plusanythingawesome.com Getting Data Quality Right - Success Stories • Adopt a broad definition to data quality • Understand data quality in the broader context of organizational data use • Approach data quality as an engineering challenge • Putting a price on data quality • Savings based stories • Innovation based stories • Non-monetary stories • Takeaways and Q&A
  • 4. How to solve this data quality problem using just tools? © Copyright 2020 by Peter Aiken Slide # Microwave Oven retail price was $40 4https://plusanythingawesome.com
  • 5. © Copyright 2020 by Peter Aiken Slide # 5https://plusanythingawesome.com 4/6/12 Why Britain has 17,000 pregnant men This research, published as a letter this week in the British Medical Journal, was meant to draw attention to how much data gets entered incorrectly in the country’s medical system. These guys weren’t turning up at the doctor for pregnancy-related services. Instead, they were at their doctor for procedures that had medical codes similar to those of midwifery and obstetric services. With a misplaced keystroke here or there, an annual physical could become a consultation with a midwife. This research, published as a letter this week in the British Medical Journal, was meant to draw attention to how much data gets entered incorrectly in the country’s medical system. These guys weren’t turning up at the doctor for pregnancy-related services. Instead, they were at their doctor for procedures that had medical codes similar to those of midwifery and obstetric services. With a misplaced keystroke here or there, an annual physical could become a consultation with a midwife.
  • 6. © Copyright 2020 by Peter Aiken Slide #https://www.washingtonpost.com/us-policy/2020/06/25/irs-stimulus-checks-dead-people-gao/ https://slate.com/business/2020/06/irs-stimulus-check-dead-people-yawn.html Payments sent 160,000,000 to dead taxpayers 1,100,000 error rate 0.4% (substantive?) $1.4 billion Note: IRS lawyers determined "did not have legal authority to deny payments to those who filed a return for 2019, even if they were deceased at the time of payment” and checking with SSA might have slowed the distribution process down. From SLATE: • The real headline here should really be that the government did its job pretty well! • Speed was the priority • Within two weeks, the IRS delivered 80 million payments electronically • "Speed is part of the reason that poorer households were able to start spending normally again by May despite an unemployment rate rivaling the Great Depression’s." 6https://plusanythingawesome.com
  • 7. © Copyright 2020 by Peter Aiken Slide # 7https://plusanythingawesome.com A congratulations letter from another bank Problems • Bank did not know it made an error • Tools alone could not have prevented this error • Lost confidence in the ability of the bank to manage customer funds • Likely did not have the ability to respond
  • 8. Year 2000 (or Y2K) Bug © Copyright 2020 by Peter Aiken Slide # 8https://plusanythingawesome.com • Before the internet - Computing resources were expensive - It was worth the tradeoff to represent the year field using two digits - 1959 was represented to the computer as 59 - Subtracting 59 from 99 yields the correct answer 40 (for dates prior to 2000/01/01!) - No one expected those programs to still be in use - Documentation was poorly created/maintained • If all these fields were not expanded to four digits before 2000/01/01 then date calculations will not give correct results - Subtracting 59 from 00 yields the incorrect answer -41 - No one knew how long this would take or cost–only when it must be completed! On the OFFICIAL Clock of the United States at 1 second BEFORE Midnight showed: December 31, 1999 11:59:59 One SECOND later the OFFICIAL Clock of the United States showed: January 1, 19100 00:00:01
  • 9. US DoD Reverse Engineering Program Manager • "Your first project is to keep me from having to testify to a Congressional Hearing!" (Belkis Leon-Hong former ASD-C3I) • Problem: – 37 systems paid personnel within DoD – How many were needed? – How many potential losers? – What do you mean by employee? • Process modeling – Inconclusive results • Data reverse engineering - definitive – One legged engineer, working in waist deep waters, underneath rotating helicopter blades, on overtime © Copyright 2020 by Peter Aiken Slide # 9https://plusanythingawesome.com
  • 10. Why using Microsoft's tool caused Covid-19 results to be lost © Copyright 2020 by Peter Aiken Slide # 10https://plusanythingawesome.com https://www.bbc.com/news/technology-54423988?es_p=12801491 • Since 2007 should have been forced to use .xlsx (1,000,000+ rows) • Used .xls (65,000 rows) • Additional data was dropped without notification
  • 11. Reasonable individuals disagree on basic definitions © Copyright 2020 by Peter Aiken Slide # 11https://plusanythingawesome.com "Data quality is part of data management" "Data management is part of data quality"
  • 12. Poor data manifests as multifaceted organizational challenges © Copyright 2020 by Peter Aiken Slide # 12https://plusanythingawesome.com
  • 13. Poor data quality manifests as multifaceted organizational challenges © Copyright 2020 by Peter Aiken Slide # 13https://plusanythingawesome.com IT System Business Challenge Business Process Business Challenge IT Process Business Challenge Business System Business Challenge IT Process Business Challenge IT System Business Challenge Business Process Business Challenge Poor results
  • 14. The Blind Men and the Elephant (Source: John Godfrey Saxe's ( 1816-1887) version of the famous Indian legend ) • It was six men of Indostan, To learning much inclined, Who went to see the Elephant (Though all of them were blind), That each by observation Might satisfy his mind. • The First approached the Elephant, And happening to fall Against his broad and sturdy side, At once began to bawl: "God bless me! but the Elephant Is very like a wall!" • The Second, feeling of the tusk Cried, "Ho! what have we here, So very round and smooth and sharp? To me `tis mighty clear This wonder of an Elephant Is very like a spear!" • The Third approached the animal, And happening to take The squirming trunk within his hands, Thus boldly up he spake: "I see," quoth he, "the Elephant Is very like a snake!" • The Fourth reached out an eager hand, And felt about the knee: "What most this wondrous beast is like Is mighty plain," quoth he; "'Tis clear enough the Elephant Is very like a tree!" © Copyright 2020 by Peter Aiken Slide # • The Fifth, who chanced to touch the ear, Said: "E'en the blindest man Can tell what this resembles most; Deny the fact who can, This marvel of an Elephant Is very like a fan!" • The Sixth no sooner had begun About the beast to grope, Than, seizing on the swinging tail That fell within his scope. "I see," quoth he, "the Elephant Is very like a rope!" • And so these men of Indostan Disputed loud and long, Each in his own opinion Exceeding stiff and strong, Though each was partly in the right, And all were in the wrong! 14https://plusanythingawesome.com
  • 15. • Problem: – Most organizations approach data quality problems in the same way that the blind men approached the elephant - people tend to see only the data that is in front of them – Little cooperation across boundaries, just as the blind men were unable to convey their impressions about the elephant to recognize the entire entity. – Leads to confusion, disputes and narrow views • Solution: – Data quality engineering can help achieve a more complete picture and facilitate cross boundary communications No universal conception of data quality exists, instead many differing perspective compete © Copyright 2020 by Peter Aiken Slide # 15https://plusanythingawesome.com
  • 16. Definitions • Quality Data – Fit for purpose meets the requirements of its authors, users, and administrators (adapted from Martin Eppler) – Synonymous with information quality, since poor data quality results in inaccurate information and poor business performance • Data Quality Management – Planning, implementation and control activities that apply quality management techniques to measure, assess, improve, and ensure data quality – Entails the "establishment and deployment of roles, responsibilities concerning the acquisition, maintenance, dissemination, and disposition of data" http://www2.sas.com/proceedings/sugi29/098-29.pdf ✓ Critical supporting process from change management ✓ Continuous process for defining acceptable levels of data quality to meet business needs and for ensuring that data quality meets these levels • Data Quality Engineering – Recognition that data quality solutions cannot not managed but must be engineered – Engineering is the application of scientific, economic, social, and practical knowledge in order to design, build, and maintain solutions to data quality challenges – Engineering concepts are generally not known and understood within IT or business! © Copyright 2020 by Peter Aiken Slide # 16https://plusanythingawesome.com Spinach/Popeye story from http://it.toolbox.com/blogs/infosphere/spinach-how-a-data-quality-mistake-created-a-myth-and-a-cartoon-character-10166
  • 17. Quality Data is ... © Copyright 2020 by Peter Aiken Slide # 17https://plusanythingawesome.com Fit For Purpose
  • 18. 18 Program © Copyright 2020 by Peter Aiken Slide #https://plusanythingawesome.com Getting Data Quality Right - Success Stories • Adopt a broad definition to data quality • Understand data quality in the broader context of organizational data use • Approach data quality as an engineering challenge • Putting a price on data quality • Savings based stories • Innovation based stories • Non-monetary stories • Takeaways and Q&A
  • 19. Separating the Wheat from the Chaff • Better organized data increases in value • Data that is better organized increases in value • Poor data management practices are costing organizations money/time/effort • 80% of organizational data is ROT – Redundant – Obsolete – Trivial • The question is which data to eliminate? – Most enterprise data is never analyzed – 54% of data is unidentified (plus) 32% ROT = 14% business critical? © Copyright 2020 by Peter Aiken Slide #https://plusanythingawesome.com 19https://plusanythingawesome.comhttps://plusanythingawesome.com
  • 20. Data Data Data Information Fact Meaning Request [Built on definitions from Dan Appleton 1983] Intelligence Strategic Use Data Data Data Data A Model Defining 3 Important Concepts © Copyright 2020 by Peter Aiken Slide # 20https://plusanythingawesome.com “You can have data without information, but you cannot have information without data” — Daniel Keys Moran, Science Fiction Writer 1. Each FACT combines with one or more MEANINGS. 2. Each specific FACT and MEANING combination is referred to as a DATUM. 3. An INFORMATION is one or more DATA that are returned in response to a specific REQUEST 4. INFORMATION REUSE is enabled when one FACT is combined with more than one MEANING. 5. INTELLIGENCE is INFORMATION associated with its STRATEGIC USES. 6. DATA/INFORMATION must formally arranged into an ARCHITECTURE. Wisdom & knowledge are often used synonymously Useful Data
  • 21. My most profound lesson! (so far) © Copyright 2020 by Peter Aiken Slide # 21https://plusanythingawesome.com +
  • 22. © Copyright 2020 by Peter Aiken Slide # 22https://plusanythingawesome.com Perfect Model Poor Quality Data Poor Results Data Warehouse Machine Learning Business Intelligence Block ChainAIMDM Data Governance AnalyticsTechnology
  • 23. © Copyright 2020 by Peter Aiken Slide # 23https://plusanythingawesome.com Perfect Model Poor Quality Data Poor Results Data Warehouse Machine Learning Block Chain AI MDM Analytics Technology Data Governance Business Intelligence
  • 24. © Copyright 2020 by Peter Aiken Slide # 24https://plusanythingawesome.com Perfect Model Quality Data Good Results Data Warehouse Machine Learning Business Intelligence Block Chain AI MDM Analytics Technology Data Governance Quality In ➜ Quality Out!
  • 25. 25 Program © Copyright 2020 by Peter Aiken Slide #https://plusanythingawesome.com Getting Data Quality Right - Success Stories • Adopt a broad definition to data quality • Understand data quality in the broader context of organizational data use • Approach data quality as an engineering challenge • Putting a price on data quality • Savings based stories • Innovation based stories • Non-monetary stories • Takeaways and Q&A
  • 26. Data Footprints • SQL Server – 47,000,000,000,000 bytes – Largest table 34 billion records 3.5 TBs • Informix – 1,800,000,000 queries/day – 65,000,000 tables / 517,000 databases • Teradata – 117 billion records – 23 TBs for one table • DB2 – 29,838,518,078 daily queries © Copyright 2020 by Peter Aiken Slide # 26https://plusanythingawesome.com
  • 27. Repeat 100s, thousands, millions of times ... © Copyright 2020 by Peter Aiken Slide # 27https://plusanythingawesome.com
  • 28. Death by 1000 Cuts © Copyright 2020 by Peter Aiken Slide # W o r k i n g W h i l e B l e e d i n g P r o f u s e l y D E A T H B Y A T H O U S A N D C U T S 28https://plusanythingawesome.com
  • 29. © Copyright 2020 by Peter Aiken Slide # 29https://plusanythingawesome.com bleeding unnecessarily from a lots of cuts
  • 30. Working While Bleeding © Copyright 2020 by Peter Aiken Slide # 30https://plusanythingawesome.com
  • 31. Making a Better Data Sandwich © Copyright 2020 by Peter Aiken Slide # 31https://plusanythingawesome.com
  • 32. Standard data Data supply Data literacy Making a Better Data Sandwich © Copyright 2020 by Peter Aiken Slide # 32https://plusanythingawesome.com Data literacy Standard data Data supply
  • 33. Making a Better Data Sandwich © Copyright 2020 by Peter Aiken Slide # 33https://plusanythingawesome.com Standard data Data supply Data literacy
  • 34. Making a Better Data Sandwich © Copyright 2020 by Peter Aiken Slide # 34https://plusanythingawesome.com Standard data Data supply Data literacy This cannot happen without engineering and architecture! Quality engineering/ architecture work products do not happen accidentally!
  • 35. Making a Better Data Sandwich © Copyright 2020 by Peter Aiken Slide # 35https://plusanythingawesome.com Standard data Data supply Data literacy This cannot happen without data engineering and architecture! Quality data engineering/ architecture work products do not happen accidentally!
  • 36. Engineering Architecture Engineering/Architecting Relationship • Architecting is used to create and build systems too complex to be treated by engineering analysis alone – Require technical details as the exception • Engineers develop the technical designs – Engineering/Crafts-persons deliver components supervised by: • Manufacturers • Building Contractors © Copyright 2020 by Peter Aiken Slide #https://plusanythingawesome.com 36
  • 37. USS Midway & Pancakes What is this? • It is tall • It has a clutch • It was built in 1942 • It is cemented to the floor • It is still in regular use! © Copyright 2020 by Peter Aiken Slide # 37https://plusanythingawesome.com
  • 38. Data should not act as chaff © Copyright 2020 by Peter Aiken Slide # 38https://plusanythingawesome.com
  • 39. 39 Program © Copyright 2020 by Peter Aiken Slide #https://plusanythingawesome.com Getting Data Quality Right - Success Stories • Adopt a broad definition to data quality • Understand data quality in the broader context of organizational data use • Approach data quality as an engineering challenge • Putting a price on data quality • Savings based stories • Innovation based stories • Non-monetary stories • Takeaways and Q&A
  • 40. © Copyright 2020 by Peter Aiken Slide # DQ challenges are context specific! 40https://plusanythingawesome.com
  • 41. © Copyright 2020 by Peter Aiken Slide # 41https://plusanythingawesome.com DQ challenges are context specific!
  • 42. Hidden Data Factories • Make explicit the extra steps required to correct costly and time- consuming data errors © Copyright 2020 by Peter Aiken Slide # 42https://plusanythingawesome.com https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year Department A Department B 1. Check A's work 2. Make any corrections 3. Complete B's work 4. Deliver to customers 5. Deal with consequences Work products are delivered to CustomersCustomers Knowledge Workers 80% looking for stuff 20% doing useful work
  • 43. © Copyright 2020 by Peter Aiken Slide # 43https://plusanythingawesome.com Hidden Data Factories
  • 44. Hidden Data Factories are expensive https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year • Consider these two questions: – Were your systems explicitly designed to be integrated or otherwise work together? – If not then what is the likelihood that they will just happen to work well together? • Data must function at the most granular interaction or it results in things that: – Take longer (end-of-day job runs 45 hours) – Cost more (the wrong assets are transferred) – Deliver less (features are not delivered) – Present greater risk (billing delayed 30 days, monthly) • 20-40% of IT budgets are spent evolving data: – Data migration (changing the location from one place to another) – Data conversion (changing it into another form, state, or product) – Data improvement (inspecting, manipulating it, preparing for subsequent use) © Copyright 2020 by Peter Aiken Slide # 44https://plusanythingawesome.com "The choice of data structure and algorithm can make the difference between software running in a few seconds or many days." http://slideplayer.com/slide/7664141/
  • 46. Great inspiration ... • How to Measure Anything: Finding the Value of Intangibles in Business by Douglas Hubbard (ISBN: 0470539399) • If something can be observed, then it can be measured • Measurement is a reduction in uncertainty • Formalizing stuff forces clarity • Specific challenge characteristics: – Whatever your measurement problem is, it's been done before – You have more data than you think – You need less data than you think – You probably need different data than you think – Getting data is more economical than you think
  • 47. Enrico Fermi (Nobel Prize in Physics, 1938) • How many piano tuners are there in the city of Chicago? – Without using existing lists such as the Yellow Pages, Google ... – Current population of Chicago (3 million at the time) – Average number of people per household (2 or 3) – Share of households with regularly tuned pianos (1 in 3) – Required frequency of tuning (once per year) – How many pianos can a tuner tune daily? (4 or 5) – How many days per year are worked? (250) • Tuners in Chicago ≈ (population ÷ people per household) × share of households with tuned pianos × tunings per year ÷ (tunings per tuner per day × workdays per year)
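Fermi's decomposition can be written as a small Python function. The slide gives ranges for two inputs (2 or 3 people per household, 4 or 5 tunings per day); pairing them into a high and a low case, as below, is an illustrative choice rather than part of the original estimate. With those pairings the estimate comes out at roughly 267 to 500 tuners.

```python
def estimate_piano_tuners(population: float,
                          people_per_household: float,
                          share_with_tuned_piano: float,
                          tunings_per_piano_per_year: float,
                          tunings_per_tuner_per_day: float,
                          workdays_per_year: float) -> float:
    """Fermi estimate: annual tuning demand divided by one tuner's annual capacity."""
    households = population / people_per_household
    pianos = households * share_with_tuned_piano
    tunings_demanded = pianos * tunings_per_piano_per_year
    tuner_capacity = tunings_per_tuner_per_day * workdays_per_year
    return tunings_demanded / tuner_capacity


# High case: smaller households (so more of them) and slower tuners.
high = estimate_piano_tuners(3_000_000, 2, 1 / 3, 1, 4, 250)
# Low case: larger households (so fewer of them) and faster tuners.
low = estimate_piano_tuners(3_000_000, 3, 1 / 3, 1, 5, 250)

print(f"Roughly {round(low)} to {round(high)} piano tuners")
```

The point is not the exact answer but that each factor is separately checkable, which is the "reduction in uncertainty" idea from Hubbard on the previous slide.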
  • 49. Compare Story Types (technical story → business story) – Clean some data → Decrease the number of undeliverable targeted marketing ads – Reorganize the database → Increase the ability of the salesforce to perform their own analyses – Develop a taxonomy → Create a common vocabulary for the organization – Optimize a query → Shaved 1 second off a task that runs a billion times a day – Reverse engineer the legacy system → Understand what was good about the old system so it can be formally preserved, and what was bad so it can be improved
  • 50. Monetization: Time & Leave Tracking • At least 300 employees are spending 15 minutes/week tracking leave/time
  • 51. Capture Cost of Labor/Category
  • 52. Compute Labor Costs: District-L (as an example) – Columns: Leave Tracking | Time Accounting – Employees: 73 | 50 – Number of documents: 1,000 | 2,040 – Timesheets/employee: 13.7 | 40.8 – Time spent: 0.08 | 0.25 – Hourly cost: $6.92 | $6.92 – Additive rate: $11.23 | $11.23 – Cost per timekeeper: $12.31 | $114.56 – Total timekeeper cost: $898.49 | $5,727.89 – Monthly cost: $21,563.83 | $137,469.40
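The table's arithmetic can be reproduced with a small Python sketch. Two assumptions are made here: the "Time spent" row is read as hours per document (0.08 h is roughly 5 minutes, 0.25 h is 15 minutes), and cost per timekeeper is taken to be documents per employee × time per document × the "Additive rate". Under those assumptions the sketch matches the slide's per-timekeeper and total figures to within about a dollar of rounding. The slide's "Monthly cost" row is about 24 times the total row in both columns; the slide does not state where that multiplier comes from, so the sketch leaves it out.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    employees: int
    documents: int              # documents processed, as listed on the slide
    hours_per_document: float   # ASSUMPTION: "Time spent" is in hours
    loaded_hourly_rate: float   # the slide's "Additive rate"

    def documents_per_employee(self) -> float:
        return self.documents / self.employees

    def cost_per_timekeeper(self) -> float:
        # ASSUMPTION: docs per employee x hours per doc x loaded rate
        return self.documents_per_employee() * self.hours_per_document * self.loaded_hourly_rate

    def total_cost(self) -> float:
        return self.cost_per_timekeeper() * self.employees


leave = Activity("Leave Tracking", 73, 1000, 0.08, 11.23)
time_acct = Activity("Time Accounting", 50, 2040, 0.25, 11.23)

for activity in (leave, time_acct):
    print(f"{activity.name}: {activity.documents_per_employee():.1f} docs/employee, "
          f"${activity.cost_per_timekeeper():.2f}/timekeeper, "
          f"${activity.total_cost():,.2f} total")
```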
  • 53. Annual Organizational Totals • Range $192,000 - $159,000/month • $100,000 Salem • $159,000 Lynchburg • $100,000 Richmond • $100,000 Suffolk • $150,000 Fredericksburg • $100,000 Staunton • $100,000 NOVA • $800,000/month or $9,600,000/annually • Awareness of the cost of things considered overhead
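Adding up the site figures listed on the slide shows where the rounded total comes from. This minimal Python check uses nothing beyond the slide's own numbers: the seven sites sum to $809,000/month ($9,708,000/year), which the slide rounds to $800,000/month and $9,600,000/year.

```python
# Monthly cost per site, as listed on the slide.
monthly_cost_by_site = {
    "Salem": 100_000,
    "Lynchburg": 159_000,
    "Richmond": 100_000,
    "Suffolk": 100_000,
    "Fredericksburg": 150_000,
    "Staunton": 100_000,
    "NOVA": 100_000,
}

monthly_total = sum(monthly_cost_by_site.values())
annual_total = monthly_total * 12

print(f"Unrounded: ${monthly_total:,}/month, ${annual_total:,}/year")
# The slide rounds the monthly figure to $800,000 before annualizing.
print(f"As rounded on the slide: ${800_000:,}/month, ${800_000 * 12:,}/year")
```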
  • 54. Program: Getting Data Quality Right - Success Stories • Adopt a broad definition of data quality • Understand data quality in the broader context of organizational data use • Approach data quality as an engineering challenge • Putting a price on data quality • Savings-based stories • Innovation-based stories • Non-monetary stories • Takeaways and Q&A
  • 55. Improving Data Quality during System Migration • Challenge – Millions of NSN/SKUs maintained in a catalog – Key and other data stored in clear-text/comment fields – Original suggestion was a manual approach to text extraction – Left the data-structuring problem unsolved • Solution – Proprietary, improvable text extraction process – Converted non-tabular data into tabular data – Saved a minimum of $5 million – Literally person-centuries of work
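The slide describes the extraction process only as proprietary and improvable, so what follows is not that process. It is a generic Python sketch of the underlying idea (turning key/value text buried in comment fields into uniform table rows) under the hypothetical assumption that comments follow a "KEY: value; KEY: value" convention. The regular expression, function names, and sample records are all invented for illustration.

```python
import re

# HYPOTHETICAL convention: upper-case keys followed by a colon, pairs separated by ";".
FIELD_PATTERN = re.compile(r"\b([A-Z][A-Z ]*[A-Z])\s*:\s*([^;]+)")


def extract_fields(comment: str) -> dict[str, str]:
    """Pull KEY: value pairs out of one free-text comment."""
    return {key.strip(): value.strip() for key, value in FIELD_PATTERN.findall(comment)}


def to_rows(records: list[tuple[str, str]]) -> tuple[list[str], list[dict[str, str]]]:
    """Turn (item_id, comment) pairs into rows that all share one column set."""
    # The item identifier is added last so a stray "ITEM:" in a comment cannot overwrite it.
    parsed = [{**extract_fields(comment), "ITEM": item_id} for item_id, comment in records]
    columns = sorted({col for row in parsed for col in row})
    # Fields missing from a comment become empty strings, giving a rectangular table.
    rows = [{col: row.get(col, "") for col in columns} for row in parsed]
    return columns, rows


columns, rows = to_rows([
    ("0001", "MFR: ACME CORP; PART NO: 12-345"),
    ("0002", "PART NO: 99-1; COLOR: RED"),
])
print(columns)
for row in rows:
    print(row)
```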
  • 56. Determining Diminishing Returns (Before / After) – Columns: Week # | Unmatched Items (% total) | Ignorable Items (% total) | Items Matched (% total) – 1 | 31.47% | 1.34% | N/A – 2 | 21.22% | 6.97% | N/A – 3 | 20.66% | 7.49% | N/A – 4 | 32.48% | 11.99% | 55.53% – … – 14 | 9.02% | 22.62% | 68.36% – 15 | 9.06% | 22.62% | 68.33% – 16 | 9.53% | 22.62% | 67.85% – 17 | 9.5% | 22.62% | 67.88% – 18 | 7.46% | 22.62% | 69.92%
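One simple way to read the table for diminishing returns is to compare the average weekly gain in matched items across periods. The Python sketch below uses only the weeks shown on the slide (weeks 5-13 are elided there). It gives about 1.28 percentage points per week for weeks 4-14 and about 0.39 for weeks 14-18. The 0.5-point-per-week stopping threshold is a hypothetical choice, not something the slide specifies.

```python
# "Items Matched (% total)" by week, from the slide (weeks 5-13 elided there).
matched_pct = {4: 55.53, 14: 68.36, 15: 68.33, 16: 67.85, 17: 67.88, 18: 69.92}


def gain_per_week(series: dict[int, float], start: int, end: int) -> float:
    """Average change in matched percentage points per week between two weeks."""
    return (series[end] - series[start]) / (end - start)


early = gain_per_week(matched_pct, 4, 14)
late = gain_per_week(matched_pct, 14, 18)

# HYPOTHETICAL stopping rule: stop refining once weekly gains fall below this.
THRESHOLD_POINTS_PER_WEEK = 0.5

print(f"weeks 4-14: {early:.2f} pts/week, weeks 14-18: {late:.2f} pts/week")
print("diminishing returns" if late < THRESHOLD_POINTS_PER_WEEK else "keep refining")
```

Comparing averages over a window, rather than single week-to-week changes, avoids reacting to the small dips between weeks 14 and 17.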
  • 57. Quantitative Benefits – Time needed to review all NSNs once over the life of the project: NSNs 2,000,000; average time to review & cleanse 5 minutes; total time 10,000,000 minutes – Time available per resource over a one-year period: 48 work weeks/year; 5 work days/week; 7.5 work hours/day (450 work minutes/day); total work minutes/year 108,000 – Person-years required to cleanse each NSN once prior to migration: 10,000,000 minutes needed ÷ 108,000 minutes available per person-year = 92.6 person-years – Resource cost to cleanse NSNs prior to migration: average SME salary per year (not including overhead) $60,000.00; projected years required to cleanse/total DLA person-years saved 93 – Total cost to cleanse/total DLA savings to cleanse NSNs: $5.5 million
  • 58. Quantitative Benefits – With 2,000,000 NSNs: 2,000,000 × 5 minutes to review & cleanse = 10,000,000 minutes; 48 work weeks/year × 5 work days/week × 7.5 work hours/day (450 work minutes/day) = 108,000 work minutes/year; 10,000,000 ÷ 108,000 = 92.6 person-years; average SME salary per year (not including overhead) $60,000.00; projected years required to cleanse/total DLA person-years saved 93; total cost to cleanse/total DLA savings: $5.5 million – With 150,000 NSNs: 150,000 × 5 minutes = 750,000 minutes; same 108,000 work minutes/year; 750,000 ÷ 108,000 = 7 person-years; average SME salary per year (not including overhead) $60,000.00; projected years required to cleanse/total DLA person-years saved 7; total cost to cleanse/total DLA savings: $420,000
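The person-year arithmetic on slides 57 and 58 can be parameterized as one short Python function, so both the 2,000,000-NSN and 150,000-NSN cases come from the same formula. The parameter names are illustrative; the defaults are the slides' inputs (5 minutes per NSN; 48 weeks × 5 days × 7.5 hours; $60,000 salary without overhead). Unrounded, the results are about 92.6 person-years (≈ $5.56 million) and 6.9 person-years (≈ $417,000); the slides round these to $5.5 million and, via 7 person-years, $420,000.

```python
def cleanse_effort(nsns: int,
                   minutes_per_nsn: float = 5,
                   weeks_per_year: int = 48,
                   days_per_week: int = 5,
                   hours_per_day: float = 7.5,
                   salary_per_year: float = 60_000) -> tuple[float, float]:
    """Return (person-years, salary cost) to review and cleanse each NSN once by hand."""
    minutes_needed = nsns * minutes_per_nsn
    minutes_per_person_year = weeks_per_year * days_per_week * hours_per_day * 60  # 108,000
    person_years = minutes_needed / minutes_per_person_year
    return person_years, person_years * salary_per_year


for count in (2_000_000, 150_000):
    years, cost = cleanse_effort(count)
    print(f"{count:>9,} NSNs: {years:5.1f} person-years, ${cost:,.0f}")
```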
  • 60. Why should a knowledge worker with a PhD in Chemical Engineering have to know whether this product was Y2K compliant?
  • 61. International Chemical Company: Engine Testing • $1 billion+ chemical company • Develops/manufactures additives that improve the performance of oils and fuels ... • ... to enhance engine/machine performance – Helps fuels burn cleaner – Engines run smoother – Machines last longer • Tens of thousands of tests annually – Test costs range up to $250,000
  • 62. Improving Knowledge Worker Productivity 1. Manual transfer of digital data 2. Manual file movement/duplication 3. Manual data manipulation 4. Disparate synonym reconciliation 5. Tribal knowledge requirements 6. Non-sustainable technology
  • 63. Data Integration Solution • Integrated the existing systems so users can easily search for and find similar or identical tests • Results: – Reduced expenses – Improved competitive edge and customer service – Time savings and improved operational capabilities • According to our client's internal business case, they expect to realize a $25 million gain each year from this data integration
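The slide does not describe how the systems were integrated. As a hedged illustration of one ingredient, disparate synonym reconciliation (item 4 on the previous slide), the Python sketch below maps differently named fields onto canonical names and then groups tests whose normalized parameters are identical. The field names, synonym table, and sample records are all hypothetical.

```python
from collections import defaultdict

# HYPOTHETICAL synonym map: source systems name the same test attribute differently.
SYNONYMS = {
    "engine": "engine_type", "eng_type": "engine_type",
    "oil": "base_oil", "base_stock": "base_oil",
    "hrs": "duration_hours", "duration": "duration_hours",
}


def normalize(record: dict[str, object]) -> dict[str, object]:
    """Rename fields to canonical names; trim and upper-case string values."""
    out: dict[str, object] = {}
    for field, value in record.items():
        canonical = SYNONYMS.get(field.lower(), field.lower())
        out[canonical] = value.strip().upper() if isinstance(value, str) else value
    return out


def find_identical_tests(records: list[dict[str, object]],
                         keys: tuple[str, ...]) -> list[list[str]]:
    """Group test IDs whose normalized key fields match exactly."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for rec in records:
        norm = normalize(rec)
        groups[tuple(norm.get(k) for k in keys)].append(str(norm["test_id"]))
    return [ids for ids in groups.values() if len(ids) > 1]


system_a = [{"test_id": "A-17", "Engine": "diesel ", "Oil": "Group III", "hrs": 100}]
system_b = [
    {"test_id": "B-04", "eng_type": "DIESEL", "base_stock": "group iii", "duration": 100},
    {"test_id": "B-05", "eng_type": "gasoline", "base_stock": "group ii", "duration": 100},
]
matches = find_identical_tests(system_a + system_b,
                               ("engine_type", "base_oil", "duration_hours"))
print(matches)
```

Exact matching on normalized keys is the simplest choice; finding merely "similar" tests would need a looser comparison on top of the same normalization.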
  • 64. Predicting Engineering Problem Characteristics: Extreme Data Engineering! • 2 person months = 40 person days • 683 + 1,478 ≈ 2,000 attributes • 2,000 attributes mapped onto 7,000 • 2,000/40 person days = 50/day, or 50/8 hours = 6.25 attributes/hour, and • 7,000/40 days = 375/person day, or 375/8 hours = 46.875 attributes/hour • 6.25 + 46.875 ≈ 52/60 • Locate, identify, understand, map, transform, document • 1 attribute/minute! • Legacy System #1: Payroll – Platform: Unisys – OS: OS – Age (1998): 21 – Data structure: DMS (network) – Physical records: 4,950,000 – Logical records: 250,000 – Relationships: 62 – Entities: 57 – Attributes: 1,478 • Legacy System #2: Personnel – Platform: Amdahl – OS: MVS – Age (1998): 15 – Data structure: VSAM/virtual database tables – Physical records: 780,000 – Logical records: 60,000 – Relationships: 64 – Entities: 4/350 – Attributes: 683 • New System (logical | physical) – Platform: WinTel – OS: Win'95 – Age (1998): new – Data structure: client/server RDBMS – Records: 250,000 | 600,000 – Relationships: 1,034 | 1,020 – Entities: 1,600 | 2,706 – Attributes: 15,000 | 7,073
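The mapping-rate arithmetic can be checked with a few lines of Python. One detail is worth noting: the slide divides 7,000 target attributes by 40 person days but reports 375/day, which equals 15,000/40 (the new system's logical attribute count); 7,000/40 would be 175/day. The sketch computes both versions. The constants are the slide's figures, including its 8-hour day.

```python
PERSON_DAYS = 40    # "2 person months = 40 person days"
HOURS_PER_DAY = 8   # the slide's divisor


def per_hour(attributes: float) -> float:
    """Attributes handled per hour if spread evenly over the project."""
    return attributes / PERSON_DAYS / HOURS_PER_DAY


source_rate = per_hour(2_000)      # legacy attributes (683 + 1,478, rounded): 6.25/hour
physical_rate = per_hour(7_000)    # ~7,073 physical target attributes: 21.875/hour
logical_rate = per_hour(15_000)    # 15,000 logical target attributes: 46.875/hour

print(f"with 7,000 target attributes:  {(source_rate + physical_rate) / 60:.2f} per minute")
print(f"with 15,000 target attributes: {(source_rate + logical_rate) / 60:.2f} per minute")
```

Either way the pace is on the order of one attribute every one to two minutes, which is the slide's point about how extreme the engineering demand was.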
  • 65. Logistics Company • Fortune 450 • Room of 100 associates • Manually correcting every item on every customer invoice • When this was pointed out to the responsible manager, the reply was: – This is the best quarter – Of the best year – I've ever had – Perhaps I need to double the number in that room?
  • 66. Program: Getting Data Quality Right - Success Stories • Adopt a broad definition of data quality • Understand data quality in the broader context of organizational data use • Approach data quality as an engineering challenge • Putting a price on data quality • Savings-based stories • Innovation-based stories • Non-monetary stories • Takeaways and Q&A
  • 68. Motivations for doing more with data • Because data points to where valuable things are located • Because data has intrinsic value by itself • Because data has inherent combinatorial value • Valuing Data – Use data to measure change – Use data to manage change – Use data to motivate change • Creating a competitive advantage with data • Improve your organization's data • Improve the way your people use its data • Improve the way your data and your people support your organizational strategy
  • 69. What did Rolls-Royce learn from NASCAR? • Old model - Sell jet engines • New model - Sell hours of powered thrust – "Power-by-the-hour" – No payment for downtime – Wing to wing – When was this new model invented? (https://www.youtube.com/watch?v=RRy_73ivcms)
  • 70. Fan Blade Sensor • 1 sensor – Probabilistic (generalist) maintenance forecasts • 100 sensors – Establish optimal monitoring targets – Finer-tuned and safer maintenance – Mission Readiness ??? – Storage $$$ – Handling $$$ – Opportunity $$$ – Systemic $$$ – Maintenance $$$ – Total > $1.5 billion
  • 71. Program: Getting Data Quality Right - Success Stories • Adopt a broad definition of data quality • Understand data quality in the broader context of organizational data use • Approach data quality as an engineering challenge • Putting a price on data quality • Savings-based stories • Innovation-based stories • Non-monetary stories • Takeaways and Q&A
  • 72. Ubiquitous Mystery Object
  • 75. Data Mapping • Data objects: Mental Illness, Deployments, Work History, Soldier, Legal Issues, Abuse, Suicide Analysis • Sources: FAP, DMSS, G1, DMDC, CID • Questions: – Data objects complete? – All sources identified? – Best source for each object? – How to reconcile differences between sources? • MDR
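The questions on the slide amount to coverage checks over an object-to-source mapping. A minimal Python sketch of those checks follows. The object and source names come from the slide; Suicide Analysis is assumed to be the analysis these objects feed, MDR is left out because its role is not stated, and the object-to-source assignments are entirely hypothetical, invented only to exercise the checks.

```python
# Data objects and sources named on the slide (Suicide Analysis treated as the consumer).
OBJECTS = ["Soldier", "Deployments", "Work History", "Mental Illness",
           "Legal Issues", "Abuse"]
SOURCES = ["FAP", "DMSS", "G1", "DMDC", "CID"]

# HYPOTHETICAL assignments for illustration only; the slide does not say
# which source holds which object.
CANDIDATES = {
    "Soldier": ["G1", "DMDC"],
    "Deployments": ["DMDC"],
    "Mental Illness": ["DMSS"],
    "Legal Issues": ["CID"],
    "Abuse": ["FAP", "CID"],
}


def mapping_report(objects, sources, candidates):
    """Answer the slide's checklist questions for a proposed object-to-source mapping."""
    # "Data objects complete?": objects with no source at all.
    unmapped = [obj for obj in objects if not candidates.get(obj)]
    # "All sources identified?": sources that feed no object.
    used = {src for srcs in candidates.values() for src in srcs}
    unused = [src for src in sources if src not in used]
    # "Best source?" / "How to reconcile?": objects offered by more than one source.
    needs_choice = {obj: srcs for obj, srcs in candidates.items() if len(srcs) > 1}
    return unmapped, unused, needs_choice


unmapped, unused, needs_choice = mapping_report(OBJECTS, SOURCES, CANDIDATES)
print("Objects with no source:", unmapped)
print("Sources not mapped to any object:", unused)
print("Objects needing a best-source decision:", needs_choice)
```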
  • 78. Senior Army Official • Room full of Stewards • A very heavy dose of management support • Advised the group of his opinion on the matter • Any questions as to future direction – "They should make an appointment to speak directly with me!" • Empower the team – The conversation turned from "can this be done?" to "how are we going to accomplish this?" – Mistakes along the way would be tolerated – Implement a workable solution in prototype form
  • 80. Managing Data with Guidance? • Federal employees • 44 users from whitehouse.gov • Thousands of military and government e-mails • Canadian citizens • One-fifth of Quebec
  • 81. Ashley Madison: 37,000,000 • OPM: 25,000,000 • Target: 70,000,000 • How the Government Jeopardized Our National Security for More than a Generation
  • 82. Target Corporation's Database Contents • Your age • Marital status • Part of town you live in • How long it takes you to drive to work • Estimated salary • If you have recently moved • Credit cards carried in your wallet • What websites you visit • Your ethnicity • Your job history • The magazines you read • Work commute • Sexual preferences • If you've ever declared bankruptcy or got divorced • The year you bought (or lost) your house • Where you went to school(s) • What kinds of topics you talk about online • Whether you prefer certain brands of coffee, paper towels, cereal or applesauce • Your political leanings, reading habits, and charitable giving • The number of cars you own
  • 83. Key Findings • Preventable • Leadership failed – To heed repeated recommendations – To sufficiently respond to growing threats of sophisticated cyber attacks, and – To prioritize resources for cybersecurity • The 2014 data breaches were likely connected to, and possibly coordinated with, the 2015 data breach • OPM misled the public on the extent of the damage of the breach and made false statements to Congress • How the Government Jeopardized Our National Security for More than a Generation (https://oversight.house.gov/report/opm-data-breach-government-jeopardized-national-security-generation/)
  • 84. Takeaways • Quality data requires a context-specific definition • Most business problems have data challenges (hidden data factories) at their root • All advanced data practices depend on quality data • AI/ML efforts are suffering from a lack of training data • Few 'easy' fixes exist • Successful data quality stories demonstrate – Tangible ongoing savings – Innovative data uses – Outcomes more important than money
  • 85. References & Recommended Reading
  • 86. Event Pricing • 20% off directly from the publisher on select titles • My Book Store @ http://plusanythingawesome.com • Enter the code "anythingawesome" at the Technics bookstore checkout where it says "Apply Coupon"
  • 87. Upcoming Events (all webinars begin @ 17:00 UTC/2:00 PM NYC) • Necessary Prerequisites to Data Success: Exorcising the Seven Deadly Data Sins – 8 December 2020 • Data Strategy: Plans Are Useless but Planning is Invaluable – 14 January 2021 • Data Management Best Practices/Practicing Data Management Better – 9 February 2021
  • 88. Questions? Thank You! • peter@plusanythingawesome.com • +1.804.382.5957 • Book a call with Peter to discuss anything: https://plusanythingawesome.com/OfficeHours.html