DWH-Ahsan Abdullah
Data Warehousing
Lecture-21
Introduction to Data Quality Management (DQM)
Virtual University of Pakistan
Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan101@yahoo.com

Introduction to Data Quality Management (DQM)

What is Quality? Informally
Some things are better than others, i.e., they are of higher quality. How much “better” is better?
Is the right item the best item to purchase? What about after the purchase?
What is quality of service? The bank example.

What is Quality? Formally
“Quality is conformance to requirements”
P. Crosby, “Quality is Free” 1979
“Degree of excellence”
Webster’s Third New International Dictionary

What is Quality? Examples from the Auto Industry
Quality means meeting customers’ needs,
not necessarily exceeding them.
Quality means improving the things customers
care about, because that makes their lives
easier and more comfortable.
Why an example from the auto industry?

What is Data Quality?
Muhammad Khan
Height = 5’8”
Weight = 160 lbs
Gender = Male
Age = 35 yrs
Emp_ID = 440
All data is an abstraction of something real
What is Data?
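To make “all data is an abstraction of something real” concrete, the record above can be sketched as a typed Python structure. The class name `Employee`, the field names, and the choice to store height in inches and weight in pounds are assumptions for illustration only. The point is that the record keeps just a handful of measured facts about the real person, and that stating the units explicitly is part of what makes the abstraction usable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    """Hypothetical abstraction of a real person: only a few facts survive."""
    name: str
    height_in: int   # height in inches (5'8" = 68 in); unit choice is an assumption
    weight_lb: int   # weight in pounds
    gender: str
    age_yrs: int
    emp_id: int


def feet_inches(total_inches: int) -> str:
    """Render a height in inches in the feet/inches notation used on the slide."""
    feet, inches = divmod(total_inches, 12)
    return f"{feet}'{inches}\""


khan = Employee(name="Muhammad Khan", height_in=68, weight_lb=160,
                gender="Male", age_yrs=35, emp_id=440)

print(feet_inches(khan.height_in))  # 5'8"
```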

What is Data Quality?
Intrinsic Data Quality
Electronic reproduction of reality.
Realistic Data Quality
Degree of utility or value of data to business.

Data Quality & Organizations
Intelligent learning organization:
High-quality data is an open, shared resource with value-adding processes.
Dysfunctional learning organization:
Low-quality data is a proprietary resource with cost-adding processes.

Orr’s Laws of Data Quality
Law #1 - “Data that is not used cannot be correct!”
Law #2 - “Data quality is a function of its use, not its collection!”
Law #3 - “Data will be no better than its most stringent use!”
Law #4 - “Data quality problems increase with the age of the system!”
Law #5 - “The less likely something is to occur, the more traumatic it will be when it happens!”

Total Quality Management (TQM)
A philosophy of involving everyone in systematic and continuous improvement.
It is customer oriented. Why?
TQM incorporates the concepts of product quality, process control, quality assurance, and quality improvement.
Quality assurance is NOT quality improvement.

Co$t of fixing data quality
[Figure: cost of achieving quality (y-axis) plotted against quality, from lowest to highest (x-axis); the cost shows an exponential rise as quality approaches the highest level.]
• Defect minimization is economical.
• Defect elimination is very, very expensive.
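The shape of that curve can be illustrated with a toy model. This sketch assumes, purely for illustration, that the cost of reaching quality level q (from 0 to 1) grows as `base * exp(k * q)`; the constants `base` and `k` are hypothetical, not measured values. Under any such exponential model each additional percentage point costs more than the previous one, which is the intuition behind "minimize defects, do not try to eliminate them".

```python
import math


def cost_of_quality(q: float, base: float = 1.0, k: float = 10.0) -> float:
    """Hypothetical cost of achieving quality level q in [0, 1] (exponential model)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("quality level must be between 0 and 1")
    return base * math.exp(k * q)


def marginal_cost(q_from: float, q_to: float) -> float:
    """Extra cost of improving quality from q_from to q_to under the toy model."""
    return cost_of_quality(q_to) - cost_of_quality(q_from)


# Compare the cost of one extra percentage point at different quality levels.
for start in (0.50, 0.90, 0.98):
    step = marginal_cost(start, start + 0.01)
    print(f"{start:.0%} -> {start + 0.01:.0%}: extra cost {step:,.1f}")
```

Changing `k` changes how steep the curve is, but not the conclusion that the last few points of quality are the most expensive.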

Co$t of Data Quality Defects
• Controllable costs
  • Recurring costs for analyzing, correcting, and preventing data errors.
• Resultant costs
  • Internal and external failure costs of business opportunities missed.
• Equipment & training costs

Where is data quality critical?
Almost everywhere; some examples:
Marketing communications.
Customer matching.
Retail householding.
Combining MIS systems after an acquisition.
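Customer matching and retail householding both depend on recognizing that differently written records describe the same real-world entity. Below is a minimal sketch, assuming that a household is identified by a normalized street address plus postal code. The field names, the abbreviation table, and the normalization rules are illustrative assumptions, and the customer records are made up; real matching usually needs fuzzier comparison than exact keys.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table used to standardize street suffixes.
ABBREVIATIONS = {"street": "st", "road": "rd", "avenue": "ave"}


def normalize_address(address: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace, standardize suffixes."""
    words = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def group_households(customers: list[dict]) -> dict[tuple[str, str], list[str]]:
    """Group customer names by (normalized address, postal code)."""
    households: dict[tuple[str, str], list[str]] = defaultdict(list)
    for c in customers:
        key = (normalize_address(c["address"]), c["postal_code"].strip())
        households[key].append(c["name"])
    return dict(households)


customers = [
    {"name": "M. Khan", "address": "12 Mall Road", "postal_code": "44000"},
    {"name": "Ayesha Khan", "address": "12, Mall Rd.", "postal_code": "44000"},
    {"name": "Ali Raza", "address": "7 Jinnah Avenue", "postal_code": "44000"},
]

for key, members in group_households(customers).items():
    print(key, members)
```

Here "12 Mall Road" and "12, Mall Rd." normalize to the same key, so the two Khans fall into one household, while the third customer forms a household of their own.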

Characteristics or Dimensions of Data Quality
Accuracy: A qualitative assessment of the lack of error; high accuracy corresponds to small error.
Completeness: The degree to which values are present in the attributes that require them.
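Both dimensions can be turned into simple ratios. This is a minimal sketch, assuming records are Python dictionaries, that the list of required attributes is known, and that a trusted reference copy exists to check accuracy against (in practice such a reference is often unavailable, which is what makes accuracy hard to measure). Completeness is taken as the share of required values that are present, and accuracy as the share of present values that agree with the reference; this is one reasonable way to operationalize the definitions, not the only one. The records and reference values are made up.

```python
def completeness(records: list[dict], required: list[str]) -> float:
    """Fraction of required attribute values that are present (not None or empty)."""
    total = len(records) * len(required)
    present = sum(
        1 for r in records for a in required if r.get(a) not in (None, "")
    )
    return present / total if total else 1.0


def accuracy(records: list[dict], reference: dict, key: str, attrs: list[str]) -> float:
    """Fraction of present values that agree with a trusted reference copy."""
    checked = correct = 0
    for r in records:
        truth = reference.get(r[key], {})
        for a in attrs:
            if r.get(a) in (None, "") or a not in truth:
                continue  # missing values count against completeness, not accuracy
            checked += 1
            if r[a] == truth[a]:
                correct += 1
    return correct / checked if checked else 1.0


employees = [
    {"emp_id": 440, "age": 35, "gender": "Male"},
    {"emp_id": 441, "age": 29, "gender": None},
]
reference = {440: {"age": 35, "gender": "Male"}, 441: {"age": 28, "gender": "Female"}}

print(completeness(employees, ["age", "gender"]))                  # 0.75
print(accuracy(employees, reference, "emp_id", ["age", "gender"]))  # 2 of 3 checked values match
```

Keeping the two measures separate matters: the missing gender lowers completeness, while the wrong age lowers accuracy.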

Completeness vs. Accuracy
95% accurate and 100% complete
OR
100% accurate and 95% complete
Which is better?
It depends on (i) the data quality tolerances, (ii) the corresponding application, and (iii) the cost of achieving that data quality vs. (iv) the business value.
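The dependence on tolerances and application can be made explicit by checking each option against an application's minimum thresholds. In this sketch the two applications and their thresholds are hypothetical. The same two options pass or fail depending on which tolerance is stricter. The cost-versus-business-value comparison, items (iii) and (iv), would still have to be made separately.

```python
from typing import NamedTuple


class Option(NamedTuple):
    accuracy: float      # fraction of values that are correct
    completeness: float  # fraction of required values that are present


def acceptable(option: Option, min_accuracy: float, min_completeness: float) -> bool:
    """True if the option meets both of the application's tolerances."""
    return option.accuracy >= min_accuracy and option.completeness >= min_completeness


options = {
    "95% accurate, 100% complete": Option(0.95, 1.00),
    "100% accurate, 95% complete": Option(1.00, 0.95),
}

# Hypothetical tolerances: a mailing campaign needs every customer present,
# while a billing run cannot tolerate wrong values.
applications = {
    "mailing campaign": (0.90, 1.00),
    "billing": (1.00, 0.90),
}

for app, (min_acc, min_comp) in applications.items():
    passing = [name for name, opt in options.items() if acceptable(opt, min_acc, min_comp)]
    print(f"{app}: {passing}")
```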

Characteristics or Dimensions of Data Quality
Consistency: A measure of the degree to which a set of data satisfies a set of constraints.
Timeliness: A measure of how current or up to date the data is.
Uniqueness: The state of being only one of its kind or being without an equal or parallel.
Interpretability: The extent to which data is in appropriate languages, symbols, and units, and the definitions are clear.
Accessibility: The extent to which data is available, or easily and quickly retrievable.
Objectivity: The extent to which data is unbiased, unprejudiced, and impartial.
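Consistency, timeliness, and uniqueness are the easiest of these dimensions to check automatically. This is a minimal sketch, assuming employee records held as dictionaries. Consistency is measured as the share of records that satisfy every constraint in a list of predicates, timeliness as the share updated within a maximum age, and uniqueness as the share of distinct key values. The constraints, the `last_updated` field, the 30-day window, and the sample records are hypothetical examples, not rules from the lecture.

```python
from datetime import date, timedelta

# Hypothetical constraints: each is a predicate every valid record must satisfy.
CONSTRAINTS = [
    lambda r: 18 <= r["age"] <= 65,
    lambda r: r["gender"] in {"Male", "Female"},
]


def consistency(records: list[dict]) -> float:
    """Fraction of records that satisfy every constraint."""
    ok = sum(1 for r in records if all(check(r) for check in CONSTRAINTS))
    return ok / len(records) if records else 1.0


def timeliness(records: list[dict], today: date, max_age: timedelta) -> float:
    """Fraction of records updated no longer than max_age before today."""
    fresh = sum(1 for r in records if today - r["last_updated"] <= max_age)
    return fresh / len(records) if records else 1.0


def uniqueness(records: list[dict], key: str) -> float:
    """Distinct key values divided by the number of records (1.0 means no duplicates)."""
    return len({r[key] for r in records}) / len(records) if records else 1.0


records = [
    {"emp_id": 440, "age": 35, "gender": "Male", "last_updated": date(2024, 5, 1)},
    {"emp_id": 441, "age": 17, "gender": "Female", "last_updated": date(2024, 1, 10)},
    {"emp_id": 440, "age": 35, "gender": "Male", "last_updated": date(2024, 5, 20)},
]

today = date(2024, 5, 31)
print(consistency(records))                            # 2 of 3 records pass
print(timeliness(records, today, timedelta(days=30)))  # 2 of 3 records are fresh
print(uniqueness(records, "emp_id"))                   # 2 distinct ids in 3 rows
```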
