2. • Data Quality: How good is our data?
• Importance of Data Quality
3. How Good is our Data?
• Scale
• ratio of distance on a map to the equivalent distance on the earth's surface
• Primarily an output issue; at what scale do I wish to display?
• Precision or Resolution
• the exactness of measurement or description
• Determined by input; can output at lower (but not higher) resolution
• Accuracy
• the degree of correspondence between data and the real world
• Fundamentally controlled by the quality of the input
• Lineage
• The original sources for the data and the processing steps it has undergone
• Currency
• the degree to which data represents the world at the present moment in time
• Documentation or Metadata
• data about data: recording all of the above
• Standards
• Common or “agreed-to” ways of doing things
• Data built to standards is more valuable since it’s more easily shareable
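As a rough illustration of "data about data", the quality dimensions listed on this slide could be recorded alongside each data set. The sketch below is only an assumption about how that might look: the class name `QualityRecord` and all of its field names are hypothetical and are not taken from any FGDC or ISO metadata schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class QualityRecord:
    """Hypothetical quality summary for one data set (not an FGDC/ISO schema)."""
    name: str
    source_scale: int             # scale denominator, e.g. 24000 for 1:24,000
    resolution_m: float           # exactness of measurement at input, in metres
    horizontal_accuracy_m: float  # stated distance from true location
    as_of: date                   # the moment in time the data represents
    lineage: List[str] = field(default_factory=list)    # sources and processing steps
    standards: List[str] = field(default_factory=list)  # standards the data follows

    def missing_fields(self) -> List[str]:
        """Return the names of the list-valued fields that were left empty."""
        return [k for k in ("lineage", "standards") if not getattr(self, k)]


# Illustrative values only; they do not describe a real data set.
roads = QualityRecord(
    name="roads",
    source_scale=24000,
    resolution_m=1.0,
    horizontal_accuracy_m=12.0,
    as_of=date(2001, 6, 30),
    lineage=["digitized from paper map sheets", "edge-matched between sheets"],
)
print(roads.missing_fields())
```

A record like this makes undocumented dimensions visible: here the `standards` list was never filled in, so it is reported as missing.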
4. Accuracy
Positional Accuracy (sometimes called Quantitative accuracy)
Spatial
• horizontal accuracy: distance from true location
• vertical accuracy: difference from true height
Temporal
• Difference from actual time and/or date
Attribute Accuracy or Consistency: the validity concept in experimental design and statistical inference
• a feature is what the GIS/map purports it to be
• a railroad is a railroad, and not a road
• A soil sample agrees with the type mapped
Completeness: the reliability concept in experimental design and statistical inference
• Are all instances of a feature the GIS/map claims to include, in fact, there?
• Partially a function of the criteria for including features: when does a road become a track?
• Simply put, how much data is missing?
Logical Consistency: The presence of contradictory relationships in the database
• Non-Spatial
• Some crimes recorded at place of occurrence, others at place where report taken
• Data for one country is for 2000, for another it's for 2001
• Annual data series not taken on same day/month etc. (sometimes called lineage error)
• Data uses different source or estimation technique for different years (again, lineage)
• Spatial
• Overshoots and gaps in road networks or parcel polygons
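Two of the measures on this slide lend themselves to a quick calculation: horizontal accuracy as the distance of each mapped point from its true location, and completeness as the share of expected features that are actually present. The following is a minimal sketch under stated assumptions; the check points, feature IDs, and function names are all hypothetical.

```python
import math

# Hypothetical check points: (mapped_x, mapped_y, true_x, true_y) in metres.
check_points = [
    (100.0, 200.0, 103.0, 204.0),
    (500.0, 800.0, 500.0, 800.0),
    (250.0, 400.0, 244.0, 392.0),
]


def horizontal_errors(points):
    """Distance from true location for each check point."""
    return [math.hypot(mx - tx, my - ty) for mx, my, tx, ty in points]


def rmse(errors):
    """Root-mean-square of a list of error distances."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def completeness(mapped_ids, reference_ids):
    """Fraction of the reference features that are present in the data set."""
    return len(mapped_ids & reference_ids) / len(reference_ids)


errors = horizontal_errors(check_points)
rmse_r = rmse(errors)

# Hypothetical feature IDs: what the map claims to include vs. what it holds.
reference = {"r1", "r2", "r3", "r4"}
mapped = {"r1", "r2", "r4"}
share_present = completeness(mapped, reference)

print(errors, round(rmse_r, 2), share_present)
```

Summarising the individual errors as a single root-mean-square value is one common choice, not the only one; the per-point distances remain useful for spotting a single badly placed feature.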
5. Sources of Error
Error is the inverse of accuracy. It is a discrepancy between the
coded and actual values.
Sources
• Inherent instability of the phenomenon itself
• E.g. random variation of most phenomena (e.g. leaf size)
• Measurement
• E.g. surveyor or instrument error
• Model used to represent data
• E.g. choice of spheroid, or classification systems
• Data encoding and entry
• E.g. keying or digitizing errors
• Data processing
• E.g. single versus double precision; algorithms used
• Propagation or cascading from one data set to another
• E.g. using an inaccurate layer as the source for another layer
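The single versus double precision point can be made concrete. The sketch below round-trips a coordinate through 4-byte (single precision) storage using Python's standard `struct` module and measures how far the stored value drifts. The coordinate value and the helper name `as_single` are assumptions chosen for illustration.

```python
import struct


def as_single(value: float) -> float:
    """Round-trip a Python float (double precision) through 4-byte storage."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


northing = 4512345.678  # hypothetical projected coordinate, in metres
stored = as_single(northing)
error_m = abs(stored - northing)

print(stored, round(error_m, 3))
```

Single precision carries roughly seven significant decimal digits, so a seven-digit coordinate in metres has little left for the fractional part; in this example the stored value is off by a little under 0.2 m before any processing has taken place.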
Example for Positional Accuracy
• choice of spheroid and datum
• choice of map projection and its parameters
• accuracy of measured locations (surveying) of features on earth
• media stability (stretching, folding, wrinkling of maps, photos)
• human drafting, digitizing or interpretation error
• resolution and/or accuracy of drafting/digitising equipment
• Thinnest visible line: 0.1-0.2 millimeters
• At scale of 1:20,000 = 6.6 - 13.1 feet (20,000 x 0.2 mm = 4,000 mm = 4 m = 13.1 feet)
• registration accuracy of tics
• machine precision: coordinate rounding error in storage and manipulation
• other unknown
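The line-width arithmetic in the list above can be checked directly: ground distance is the drawn width multiplied by the scale denominator. With 1 ft = 0.3048 m, a 0.2 mm line at 1:20,000 covers 4 m, or about 13.1 ft. A minimal sketch, with the function name `ground_width` chosen here for illustration:

```python
MM_PER_FOOT = 304.8  # 1 ft = 0.3048 m exactly


def ground_width(line_width_mm: float, scale_denominator: int):
    """Return the ground distance covered by a drawn line as (metres, feet)."""
    ground_mm = line_width_mm * scale_denominator
    return ground_mm / 1000.0, ground_mm / MM_PER_FOOT


for width in (0.1, 0.2):
    metres, feet = ground_width(width, 20_000)
    print(f"{width} mm at 1:20,000 covers {metres:.1f} m ({feet:.1f} ft)")
```

The same function shows why scale matters for positional accuracy: at 1:100,000 the same 0.2 mm line covers 20 m on the ground.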
6. Currency: Is my data “up-to-date”?
• data is always relative to a specific point in time, which must be
documented.
• there are important applications for historical data (e.g. analysing trends),
so don’t necessarily trash old data
• “current” data requires a specific plan for on-going maintenance
• may be continuous, or at pre-defined points in time.
• otherwise, data becomes outdated very quickly
• currency is not really an independent quality dimension; it is
simply a factor contributing to lack of accuracy regarding
• consistency: some GIS features do not match those in the real world today
• completeness: some real world features are missing from the GIS database
Many organisations spend substantial amounts acquiring a data set without giving
any thought to how it will be maintained.
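A maintenance plan needs some way to notice when a layer has passed its agreed refresh interval. The sketch below assumes each layer carries a documented "as of" date; the layer names, dates, and the 365-day threshold are hypothetical.

```python
from datetime import date


def is_stale(as_of: date, today: date, max_age_days: int) -> bool:
    """True when the data is older than the agreed refresh interval."""
    return (today - as_of).days > max_age_days


# Hypothetical layers and the date each one represents.
layers = {
    "parcels": date(2023, 1, 15),
    "hydrants": date(2024, 5, 1),
}
today = date(2024, 6, 1)  # fixed here so the example is repeatable

stale = sorted(name for name, as_of in layers.items()
               if is_stale(as_of, today, max_age_days=365))
print(stale)
```

Flagging a layer as stale is not a reason to discard it; older data may still be exactly what a trend analysis needs.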
7. Standards: common “agreed-to” ways of doing things
• May exist for:
• Data itself [including process (the way it’s produced) and product (the outcome)]
• Utilities Data Content Standard, FGDC-STD-010-2000
• Accuracy of data
• Geospatial Positioning Accuracy Standard, Part 3, National Standard for Spatial Data Accuracy, FGDC-STD-007.3-1998
• Documentation about the data (metadata)
• Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998
• Transfer of data and its documentation
• Spatial Data Transfer Standard (SDTS), FGDC-STD-002
• For symbology and presentation
• Digital Geologic Map Symbolization
• May address:
• Content (what is recorded)
• Format (how it’s recorded: file format, .tif, shapefile, etc)
• May be a product of:
• An organization’s internal actions [private or organization standards]
• An external government body (Federal Geographic Data Committee) or third sector body (Open GIS
Consortium) [public or de jure standards]
• Laissez-faire market-place-forces leading to one dominant approach e.g. “Wintel standard” [industry or
de facto standards]
http://www.fgdc.gov/standards/
8. Adopting Standards: What you should do
• Data quality achieved by adoption and use of standards: Do it!
• Common ways of doing things essential for using & sharing data internally and externally
• only federal agencies are required to use FGDC standards; it's optional for all others (e.g. state, local)
• power of feds often results in adoption by everybody, although there are some noted failures
(e.g. The OSI, GOSIP, & POSIX standards in computing in the 1980s failed and were withdrawn)
• FGDC or ISO standards provide excellent starting point for local standards, and
should be adopted unless there are compelling reasons otherwise
• Standards for metadata (“documenting your data”) are the most important and
should be first priority.
• Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998
• ISO Document 19115 Geographic Information-Metadata (content) and 19139, Geographic Information—Metadata—
Implementation Specification, (format for storing ISO 19115 metadata in XML format)
• If not one of these standards for metadata, adopt some standard!
9. Importance of Data Quality- Water Utilities
• Data Access – decision making
• Data Integration – customer information, billing, hydraulic modelling
• Data Usage
• Data Content
10. QA/QC
• Do not confuse the two
• Introduced into Workflows
• Ensure proactiveness
11. Quality Assurance
• Techniques
• Geodatabase (multiuser) – pros over shapefile
• Encapsulate data and business rules to prevent editing mistakes, e.g. by not allowing invalid attributes, network connectivity, or relationships (topology)
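The idea of a business rule that rejects invalid attributes can be sketched in plain Python. This is not a geodatabase API; the allowed materials, the diameter range, and the function name `validate_pipe` are all assumptions made up for illustration of a water utility layer.

```python
# Hypothetical attribute rules for a water main feature class.
PIPE_MATERIALS = {"ductile iron", "PVC", "copper"}
MIN_DIAMETER_MM, MAX_DIAMETER_MM = 50, 1200


def validate_pipe(record: dict) -> list:
    """Return a list of rule violations; an empty list means the edit is accepted."""
    problems = []
    if record.get("material") not in PIPE_MATERIALS:
        problems.append("material not in domain")
    diameter = record.get("diameter_mm")
    if (not isinstance(diameter, (int, float))
            or not MIN_DIAMETER_MM <= diameter <= MAX_DIAMETER_MM):
        problems.append("diameter out of range")
    return problems


good = {"material": "PVC", "diameter_mm": 150}
bad = {"material": "wood", "diameter_mm": 5}

print(validate_pipe(good))
print(validate_pipe(bad))
```

Checking rules at the moment of editing is what makes this quality assurance rather than quality control: the bad record is refused before it ever enters the database.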