GiGo
“Fast is fine, but accuracy is everything.”
• Data Quality: How good is our data?
• Importance of Data Quality
How Good is our Data?
• Scale
• ratio of distance on a map to the equivalent distance on the earth's surface
• Primarily an output issue; at what scale do I wish to display?
• Precision or Resolution
• the exactness of measurement or description
• Determined by input; can output at lower (but not higher) resolution
• Accuracy
• the degree of correspondence between data and the real world
• Fundamentally controlled by the quality of the input
• Lineage
• The original sources for the data and the processing steps it has undergone
• Currency
• the degree to which data represents the world at the present moment in time
• Documentation or Metadata
• data about data: recording all of the above
• Standards
• Common or “agreed-to” ways of doing things
• Data built to standards is more valuable since it’s more easily shareable
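Since metadata is described above as recording all the other dimensions, one quick check is whether a metadata record actually fills each of them in. The sketch below is only illustrative: the field names are hypothetical and do not come from any particular metadata standard.

```python
# Hypothetical field names, one per quality dimension listed on this slide.
REQUIRED_FIELDS = ("scale", "resolution", "accuracy", "lineage", "currency", "standard")


def missing_metadata(record: dict) -> list[str]:
    """Return the quality fields that are absent or empty in a metadata record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Illustrative record: accuracy is blank and no standard is named.
example = {
    "scale": "1:20,000",
    "resolution": "0.5 m",
    "accuracy": "",
    "lineage": "digitized from paper sheets",
    "currency": "2001-06-30",
}
print(missing_metadata(example))
```

A real check would map these names onto whatever elements the adopted metadata standard defines.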
Accuracy
Positional Accuracy (sometimes called Quantitative accuracy)
Spatial
• horizontal accuracy: distance from true location
• vertical accuracy: difference from true height
Temporal
• Difference from actual time and/or date
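"Distance from true location" is usually summarised over a set of check points as a root-mean-square error (RMSE). A minimal sketch follows, assuming paired measured and surveyed "true" coordinates in the same units. The NSSDA standard cited later in this deck reports accuracy at the 95% confidence level as 1.7308 x RMSE for horizontal error (when x and y errors are similar) and 1.9600 x RMSE for vertical error; the check-point values here are made up for illustration.

```python
import math


def horizontal_rmse(measured, truth):
    """RMSE of the horizontal distances between paired (x, y) points."""
    squared = [
        (mx - tx) ** 2 + (my - ty) ** 2
        for (mx, my), (tx, ty) in zip(measured, truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


def vertical_rmse(measured_z, true_z):
    """RMSE of the height differences between paired elevations."""
    squared = [(m - t) ** 2 for m, t in zip(measured_z, true_z)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative check points (metres): only the second one is off, by 5 m.
measured = [(100.0, 200.0), (303.0, 404.0), (500.0, 600.0)]
truth = [(100.0, 200.0), (300.0, 400.0), (500.0, 600.0)]

rmse_r = horizontal_rmse(measured, truth)
nssda_horizontal = 1.7308 * rmse_r  # 95% level, assumes RMSE_x is close to RMSE_y
rmse_z = vertical_rmse([10.2, 11.0], [10.0, 11.0])
nssda_vertical = 1.9600 * rmse_z    # 95% level, assumes normally distributed error
print(round(rmse_r, 3), round(nssda_horizontal, 3), round(nssda_vertical, 3))
```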
Attribute Accuracy or Consistency-- the validity concept in experimental design/stat. inf.
• a feature is what the GIS/map purports it to be
• a railroad is a railroad, and not a road
• A soil sample agrees with the type mapped
Completeness--the reliability concept from experimental design/stat. inf.
• Are all instances of a feature the GIS/map claims to include, in fact, there?
• Partially a function of the criteria for including features: when does a road become a track?
• Simply put, how much data is missing?
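"How much data is missing?" can be put as a number when a trusted reference list exists. The sketch below assumes features carry comparable identifiers in both the reference list and the GIS layer; the IDs are invented for the example.

```python
def completeness(reference_ids, dataset_ids):
    """Compare a dataset against a reference list of feature IDs.

    Returns (share of reference features present, omitted IDs, extra IDs).
    """
    reference, dataset = set(reference_ids), set(dataset_ids)
    omitted = reference - dataset   # in the real world / reference, missing from the GIS
    extra = dataset - reference     # in the GIS, not in the reference
    share_present = 1.0 - len(omitted) / len(reference) if reference else 1.0
    return share_present, sorted(omitted), sorted(extra)


share, omitted, extra = completeness(
    ["r1", "r2", "r3", "r4"],   # reference inventory (hypothetical)
    ["r1", "r2", "r4", "r9"],   # features found in the GIS layer (hypothetical)
)
print(share, omitted, extra)
```

The result depends on the inclusion criteria mentioned above: if the reference counts tracks as roads and the GIS does not, the "omissions" are a definitional difference rather than missing data.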
Logical Consistency: whether contradictory relationships are present in the database
• Non-Spatial
• Some crimes recorded at place of occurrence, others at place where report taken
• Data for one country is for 2000, for another it's for 2001
• Annual data series not taken on same day/month etc. (sometimes called lineage error)
• Data uses different source or estimation technique for different years (again, lineage)
• Spatial
• Overshoots and gaps in road networks or parcel polygons
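For the spatial case, a simple first pass is to look for segment endpoints that touch nothing else. The sketch below assumes road segments are given as pairs of (x, y) endpoints and that coordinates are snapped by rounding; it only flags candidates, since a dangling end may be a legitimate dead-end street rather than an overshoot or gap.

```python
from collections import Counter


def dangling_ends(segments, ndigits=3):
    """Return endpoints that are used by exactly one segment."""
    counts = Counter()
    for start, end in segments:
        for x, y in (start, end):
            # Round so that nearly identical coordinates count as the same node.
            counts[(round(x, ndigits), round(y, ndigits))] += 1
    return sorted(point for point, n in counts.items() if n == 1)


# Illustrative network: a closed triangle plus one stub that ends in mid-air.
roads = [
    ((0.0, 0.0), (10.0, 0.0)),
    ((10.0, 0.0), (10.0, 10.0)),
    ((10.0, 10.0), (0.0, 0.0)),
    ((10.0, 0.0), (15.2, 0.0)),
]
print(dangling_ends(roads))
```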
Sources of Error
Error is the inverse of accuracy. It is a discrepancy between the
coded and actual values.
Sources
• Inherent instability of the phenomenon itself
• E.g. random variation of most phenomena (e.g. leaf size)
• Measurement
• E.g. surveyor or instrument error
• Model used to represent data
• E.g. choice of spheroid, or classification
systems
• Data encoding and entry
• E.g. keying or digitizing errors
• Data processing
• E.g. single versus double precision;
algorithms used
• Propagation or cascading from one data set
to another
• E.g. using inaccurate layer as source for
another layer
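The single versus double precision point can be made concrete with a projected coordinate. The sketch below round-trips a value through a 32-bit float using Python's standard `struct` module; the northing is an invented example, chosen only because coordinates in the millions of metres are common in projected systems.

```python
import struct


def as_float32(value: float) -> float:
    """Round-trip a Python float (64-bit) through 32-bit storage."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


northing = 4_000_000.123          # metres, illustrative value
stored = as_float32(northing)     # what single-precision storage keeps
error_m = stored - northing       # rounding error introduced by storage alone

print(stored, error_m)
```

At this magnitude a 32-bit float can only represent steps of 0.25 m, so sub-decimetre survey measurements are lost before any analysis is run; double precision keeps them.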
Example for Positional Accuracy
• choice of spheroid and datum
• choice of map projection and its parameters
• accuracy of measured locations (surveying) of
features on earth
• media stability (stretching, folding, wrinkling of maps, photos)
• human drafting, digitizing or interpretation error
• resolution and/or accuracy of drafting/digitising equipment
• Thinnest visible line: 0.1-0.2 millimeters
• At scale of 1:20,000 = 2-4 m on the ground, or about 6.6-13.1 feet
(20,000 x 0.2 mm = 4,000 mm = 4 m ≈ 13.1 feet)
• registration accuracy of tics
• machine precision: coordinate rounding error in
storage and manipulation
• other unknown
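The line-width arithmetic above generalises to any scale: ground width equals line width times the scale denominator. A small sketch, using the standard conversion of 3.28084 feet per metre:

```python
MM_PER_M = 1000.0
FT_PER_M = 3.28084


def ground_width_m(line_width_mm: float, scale_denominator: float) -> float:
    """Ground distance covered by a drawn line at a map scale of 1:scale_denominator."""
    return line_width_mm * scale_denominator / MM_PER_M


for width_mm in (0.1, 0.2):
    metres = ground_width_m(width_mm, 20_000)
    print(f"{width_mm} mm at 1:20,000 = {metres:.1f} m = {metres * FT_PER_M:.1f} ft")
```

So a 0.2 mm line at 1:20,000 already hides about 4 m (roughly 13.1 feet) of ground, before any other source of positional error is counted.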
Currency: Is my data “up-to-date”?
• data is always relative to a specific point in time, which must be
documented.
• there are important applications for historical data (e.g. analysing trends),
so don’t necessarily trash old data
• “current” data requires a specific plan for on-going maintenance
• may be continuous, or at pre-defined points in time.
• otherwise, data becomes outdated very quickly
• currency is not really an independent quality dimension; it is
simply a factor contributing to lack of accuracy regarding
• consistency: some GIS features do not match those in the real world today
• completeness: some real world features are missing from the GIS database
Many organisations spend substantial amounts acquiring a data set without giving
any thought to how it will be maintained.
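A maintenance plan needs some way to find what has gone stale. The sketch below assumes each feature stores the date it was last verified and that the organisation has chosen a maximum acceptable age; the feature IDs, dates and threshold are hypothetical.

```python
from datetime import date


def stale_features(last_verified: dict, as_of: date, max_age_days: int) -> list[str]:
    """Return IDs of features not verified within max_age_days of as_of."""
    return sorted(
        feature_id
        for feature_id, verified_on in last_verified.items()
        if (as_of - verified_on).days > max_age_days
    )


# Illustrative records: one recently checked valve, one untouched for years.
last_verified = {
    "valve-1": date(2024, 1, 10),
    "valve-2": date(2021, 3, 1),
}
print(stale_features(last_verified, as_of=date(2024, 6, 1), max_age_days=365))
```

Stale records need not be discarded; as noted above, they may still be valuable as historical data once their date is documented.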
Standards: common “agreed-to” ways of doing things
• May exist for:
• Data itself [including process (the way it’s produced) and product (the outcome)]
• Utilities Data Content Standard, FGDC-STD-010-2000
• Accuracy of data
• Geospatial Positioning Accuracy Standards, Part 3: National Standard for Spatial Data Accuracy, FGDC-STD-007.3-1998
• Documentation about the data (metadata)
• Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998
• Transfer of data and its documentation
• Spatial Data Transfer Standard (SDTS), FGDC-STD-002
• For symbology and presentation
• Digital Geologic Map Symbolization
• May address:
• Content (what is recorded)
• Format (how it’s recorded: file format, .tif, shapefile, etc)
• May be a product of:
• An organization’s internal actions [private or organization standards]
• An external government body (Federal Geographic Data Committee) or third sector body (Open GIS
Consortium) [public or de jure standards]
• Laissez-faire market-place-forces leading to one dominant approach e.g. “Wintel standard” [industry or
de facto standards]
http://www.fgdc.gov/standards/
Adopting Standards: What you should do
• Data quality achieved by adoption and use of standards: Do it!
• Common ways of doing things essential for using & sharing data internally and externally
• only federal agencies are required to use FGDC standards; it's optional for any others
(e.g. state, local)
• power of feds often results in adoption by everybody, although there are some noted failures
(e.g. the OSI and GOSIP networking standards of the 1980s failed to displace TCP/IP)
• FGDC or ISO standards provide excellent starting point for local standards, and
should be adopted unless there are compelling reasons otherwise
• Standards for metadata (“documenting your data”) are the most important and
should be first priority.
• Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998
• ISO Document 19115 Geographic Information-Metadata (content) and 19139, Geographic Information—Metadata—
Implementation Specification, (format for storing ISO 19115 metadata in XML format)
• If not one of these standards for metadata, adopt some standard!
Importance of Data Quality: Water Utilities
• Data Access – decision making
• Data Integration – customer information, billing, hydraulic modelling
• Data Usage
• Data Content
QA/QC
• Do not confuse the two: QA builds quality into the process to prevent errors; QC checks the finished data to catch them
• Introduced into Workflows
• Ensure proactiveness
Quality Assurance
• Techniques
• Geodatabase (multiuser) – pros over shapefile
• Encapsulate data and business rules (prevent editing mistakes, e.g. by not allowing invalid
attributes and by enforcing network connectivity and relationships (topology))
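The idea of business rules that reject bad edits can be illustrated outside any particular GIS. The sketch below is a plain Python stand-in, not a geodatabase API: the allowed materials, field names and rules are hypothetical, and in a multiuser geodatabase the equivalent checks would be configured as attribute domains and topology or network rules rather than written by hand.

```python
# Hypothetical attribute domain for a water main's material.
ALLOWED_MATERIALS = {"PVC", "DI", "HDPE"}


def validate_pipe(pipe: dict) -> list[str]:
    """Return a list of rule violations for one pipe record (empty if valid)."""
    problems = []
    if pipe.get("material") not in ALLOWED_MATERIALS:
        problems.append("material not in allowed domain")
    diameter = pipe.get("diameter_mm")
    if not isinstance(diameter, (int, float)) or diameter <= 0:
        problems.append("diameter_mm must be a positive number")
    # Minimal connectivity rule: a pipe has to join two different nodes.
    if pipe.get("from_node") == pipe.get("to_node"):
        problems.append("pipe must connect two different nodes")
    return problems


good = {"material": "PVC", "diameter_mm": 150, "from_node": "n1", "to_node": "n2"}
bad = {"material": "wood", "diameter_mm": -5, "from_node": "n1", "to_node": "n1"}
print(validate_pipe(good))
print(validate_pipe(bad))
```

Running such checks at edit time is the quality assurance side: the invalid record never enters the database, instead of being found later in a quality control pass.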

Data Quality - Garbage In Garbage Out GIGO
