A HITCHHIKER'S GUIDE TO
DATA QUALITY
Tatiana Stebakova
The Data & Information Assembly Australia, April 2015
Content
• Evolution of the DQ Governance approach over the past 10 years
• How to make a quantum leap from DQ theory to execution: a personal view
• You’ve done it all by the book, but there is little traction in data quality. DQ and systems thinking. Don’t panic!
Evolution of DQ Governance approach
over the past 10 years
• Data Duplicates – still magic words
• Data Quality Frameworks – from emergence to maturity
• Senior Management Support – a breakthrough
• Senior Architects Support – little change
• Data Quality Governance – from novelty to mainstream
• Data Quality Tools and Technology – from luxury to BAU
• Metadata – from “what is it?” to “the new black”
How to make a quantum leap from DQ theory
to execution, personal view
Step 1. Data Quality Justification
DQ Horror stories
About 6.5 million Americans are 112 or
older. The US Social Security office has 6.5
million people on record as having reached
the age of 112, even though only 42 people
are known to be that old globally
"Studies in cost analysis show that
between 15% to > 20% of a company’s operating
revenue is spent doing things to get around or fix
data quality issues"
Larry English
Option 1 – What can we gain?
Option 2 – Scare technique
Option 3 (my favourite) – Risks
"Poor data is like a dirty windscreen. You can continue driving as your
vision degrades, but at some point you must stop and clear the
windscreen or risk everything"
Ken Orr
Step 2. Build DQ requirements into the solution
architecture and the system development contract
Example of DQ requirements
The ETL solution SHALL have the capability to perform column integrity screening/profiling
The ETL solution SHALL have the capability to perform data structure screening/profiling
The ETL solution SHALL have the capability to perform compliance-to-business-rule screening/profiling
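The three screening capabilities above can be illustrated with a minimal Python sketch. Everything named here is a hypothetical assumption made for illustration and is not taken from the slides: the column names in `EXPECTED_COLUMNS`, the `REVIEW_AGE` threshold and the function names. The business rule is an age plausibility check of the kind that would have flagged the Social Security records mentioned earlier.

```python
from datetime import date

# Hypothetical expected structure of a customer extract: column name -> type.
EXPECTED_COLUMNS = {"customer_id": str, "birth_date": date, "email": str}

# Assumed threshold: ages at or above this value are flagged for review.
REVIEW_AGE = 112


def screen_structure(row):
    """Data structure screening: report missing or unexpected columns."""
    issues = [f"missing column: {c}" for c in EXPECTED_COLUMNS if c not in row]
    issues += [f"unexpected column: {c}" for c in row if c not in EXPECTED_COLUMNS]
    return issues


def screen_columns(row):
    """Column integrity screening: report null values and wrong types."""
    issues = []
    for column, expected_type in EXPECTED_COLUMNS.items():
        if column not in row:
            continue  # already reported by the structure screen
        value = row[column]
        if value is None:
            issues.append(f"null value: {column}")
        elif not isinstance(value, expected_type):
            issues.append(f"wrong type: {column}")
    return issues


def screen_business_rules(row, today):
    """Business rule screening: an illustrative age plausibility rule."""
    issues = []
    birth = row.get("birth_date")
    if isinstance(birth, date):
        # Subtract one year if the birthday has not yet occurred this year.
        had_no_birthday_yet = (today.month, today.day) < (birth.month, birth.day)
        age = today.year - birth.year - had_no_birthday_yet
        if age < 0 or age >= REVIEW_AGE:
            issues.append(f"implausible age: {age}")
    return issues


def screen(row, today):
    """Run all three screens and return the combined list of issues."""
    return (
        screen_structure(row)
        + screen_columns(row)
        + screen_business_rules(row, today)
    )
```

Each screen returns plain issue strings rather than raising, so an ETL job could log them, count them per batch, or reject the row, depending on what the contract requires.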
Quality should be built into the product, and testing alone
cannot be relied on to ensure product quality (FDA, Current
Good Manufacturing Practice)
The … ETL controls solution SHALL perform a periodic full snapshot
of the same data for reconciliation purposes, if Delta files are used.
The … ETL solution SHALL have capability to perform Data
Structure screening/profiling
The … data extract process SHALL support logical data
consistency (temporal relationship of data).
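The reconciliation requirement above (a periodic full snapshot when delta files are used) might be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the delta format, a hypothetical `(operation, key, record)` tuple with operations `"upsert"` and `"delete"`, and both function names are invented for the example.

```python
def apply_deltas(base, deltas):
    """Rebuild the target state from a base load plus a sequence of deltas.

    Each delta is a hypothetical (operation, key, record) tuple where
    operation is "upsert" or "delete".
    """
    state = dict(base)
    for operation, key, record in deltas:
        if operation == "upsert":
            state[key] = record
        elif operation == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown delta operation: {operation}")
    return state


def reconcile(target, snapshot):
    """Compare the delta-built target with a full snapshot of the source."""
    return {
        "missing_in_target": sorted(k for k in snapshot if k not in target),
        "extra_in_target": sorted(k for k in target if k not in snapshot),
        "mismatched": sorted(
            k for k in snapshot if k in target and target[k] != snapshot[k]
        ),
    }
```

An empty result in all three lists means the delta-fed target agrees with the source at snapshot time; anything else points to a lost, duplicated or out-of-order delta file.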
Step 3. Build data quality requirements into the
system operation contract + DQ KPIs
“I’ve never been a good
spectator.
Either I’m playing the
game or I’m not
interested.”
Christiaan Barnard, the surgeon who performed
the first human heart transplant
… solution SHALL have the capability to measure and report on the data quality Key Performance Indicators
(KPIs) as defined by the Governance authority.
KPI Examples:
• customer record uniqueness
• directory currency and accessibility
• information provenance
• uptake rate - coverage
• quality of records per DQ dimensions and characteristics
• response time for typical transactions
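Two of these KPIs are simple enough to sketch in Python. The definitions below are assumptions chosen for illustration, since the slides leave the exact definitions to the Governance authority: uniqueness is taken as distinct keys divided by total records, and completeness (one common DQ dimension) as the share of records with a non-empty value. The field names and function names are hypothetical.

```python
def uniqueness(records, key_fields):
    """Distinct business keys divided by total records (1.0 = no duplicates)."""
    if not records:
        return 1.0
    keys = {tuple(r.get(f) for f in key_fields) for r in records}
    return len(keys) / len(records)


def completeness(records, field):
    """Share of records with a non-empty value in the given field."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def kpi_report(records, key_fields, required_fields):
    """Collect the KPIs into one dictionary suitable for periodic reporting."""
    report = {"customer_record_uniqueness": uniqueness(records, key_fields)}
    for field in required_fields:
        report[f"completeness_{field}"] = completeness(records, field)
    return report
```

Reporting each KPI as a ratio between 0 and 1 makes it easy to set a contractual target (for example, a minimum uniqueness) and to track the trend between reporting periods.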
You’ve done it all by the book, but there
is little traction in data quality.
• Don’t be afraid
• From Hitchhiker to Hijacker
• Become a driver. Apply for architect, project lead or data management jobs
• Drop your “data quality bugs/requirements” anywhere you can
• Look for opportunities. Change your strategy all the time
• Disguise your requirements; do not call them DQ requirements
• Lean on standards
• Do not reference DQ gurus. Reference technology gurus instead
• Befriend architects
• Be patient, keep cool
“Success is not final,
failure is not fatal: it is
the courage to continue
that counts.”
Winston Churchill
Data Quality and systems thinking
• Complex adaptive systems (CAS) are dynamic systems that can adapt to a changing environment; all participants are closely linked with each other, making up an “IT ecosystem” (MIT)
• Within such an ecosystem, change is not so much adaptation as co-evolution with all other related systems
• Rules of flocking:
  – Follow the leader
  – Align with neighbours
  – Avoid overcrowding
Systems thinking – delayed response
• Launch date: 2 March 2004
• Mission duration: 10 years, 11 months and 23 days
• 6.5 billion kilometres
“After 10 years, and a journey of more than six
billion kilometres, the Rosetta spacecraft sent
its fridge-sized Philae lander down to Comet
67P/Churyumov-Gerasimenko.”
Questions

Editor's Notes

  • #4: Marvin, Ford Prefect, Arthur Dent