FME Driven Metadata and
Data Governance
FME User Conference 2022
Peter Veenstra - Director of Data Architecture
31 years in geo-spatial industry - analyst, programmer, architect, consultant, director
Love my family, hockey, mapping, GIS, data, fly-fishing, drawing, FME & the world we live in …
Pivvot: Location Intelligence, Software as a Service Company (4 years, 20 people, remote)
Terracon: Geotechnical, Environmental, Facilities, Materials Engineering Company (50+ years, 5000+ people, 175 offices/remote)
Making informed decisions
requires excellent data
aPPropriate
PRECISE
Pedigree
Provenance
Informal Poll:
How many people/organizations here actively track metadata for your data
layers?
If yes, is it done at the table level or the row level or something else?
If yes, how do you document it?
If no, why not?
The two things that I would hope you would take away from this presentation are …
1. FME is the data tool (for us)
2. Row-level metadata is a requirement for excellent data …
Proposition …
Excellent data is driven by FME and Metadata …
● Our data environment and how we use FME to
manage data.
● How we create, track, and use Metadata.
Part 1: Our data environment
Pivvot Data in our platform …
The Stack
FME Cloud
AWS RDS, AWS S3, AWS WS, AWS SMS, AWS QuickSight
FME WS
The Rules
Rules around how/why/when/where we add data …
● We do not edit source data
○ We flag completely fubar features and store them someplace else
● We merge datasets
○ From separate sources, keeping the best by numbers, attributes, timeliness
● We deprecate datasets, agencies, sources
● Data is fit-for-purpose from trusted agencies who publish on a regular basis
● Rules, procedures, how-tos all documented in Confluence
● All ingestion performed by FME
● All validation performed by FME
[Diagram: Data Sources (×3) feeding a Pivvot Dataset; labels SCRATCH, WHARF, VERA and DEV, STAGE, PROD]
• Zero Records
• Data Source exists
• Data Load does NOT already exist
• Subtype Validity
• Polygon/Line Crossing 180° longitude
• Invalid Geometry
• Proper SRID assigned
• Missing Metadata
• Invalid geometry format
• POLYGON characteristics
• DEMs
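As an illustration only, a few of the checks listed above could be expressed in plain Python like this. It is a minimal sketch, not the FME workbench logic: the feature layout (`srid`, `metadata_pivvot_id`, `coords`) and the check names are assumptions made for the example, and the 180° test is a simple heuristic.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float]  # (longitude, latitude) in degrees


def crosses_antimeridian(coords: Sequence[Coord]) -> bool:
    """Heuristic: a longitude jump of more than 180 degrees between
    consecutive vertices suggests the shape wraps across the 180° line."""
    return any(abs(b[0] - a[0]) > 180.0 for a, b in zip(coords, coords[1:]))


def validate_load(features: List[Dict], expected_srid: int) -> List[str]:
    """Return the names of failed checks; an empty list means the load passed.

    The dictionary keys used here are hypothetical, chosen for this sketch.
    """
    if not features:
        return ["zero_records"]
    failures = set()
    for feature in features:
        if feature.get("srid") != expected_srid:
            failures.add("srid_not_assigned")
        if not feature.get("metadata_pivvot_id"):
            failures.add("missing_metadata")
        if crosses_antimeridian(feature.get("coords", [])):
            failures.add("crosses_180_longitude")
    return sorted(failures)
```

Returning the names of failed checks, rather than raising on the first failure, lets a load report every problem in one pass.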
Why FME?
What other tool can handle the
types of data we consume?
● 3218+ File Geodatabases in a Directory
● ArcGIS Online Web Services
● Shape Files
● Streaming Data in GeoJSON Format
● Blobs stored in Microsoft Azure
● GeoTiffs and COGTiffs in S3
● Raster Elevation Models
● Lidar Point Clouds
● CSV Files
● KMZ Files
● GeoPackage Files
Or the data we write to … and how?
● GeoTiffs and COGeoTiffs in S3 (support for rasters in Geoserver as layers)
● PostGIS Vector Data (Simplified for the Web) - 5m simplification
● 3 Band PostGIS Rasters (elevation, aspect, slope) from USGS DEM
● Composite_Parcels (Fuzzy Matcher) from Parcels using proximity and owner name
● All in the Cloud
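The slides do not show how the Fuzzy Matcher works. As a rough sketch of the general idea of matching parcels on proximity and owner name, assuming projected coordinates in metres and hypothetical thresholds, one could combine a distance test with a string-similarity ratio from Python's standard library:

```python
from difflib import SequenceMatcher
from math import hypot
from typing import Dict


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def same_composite(
    p1: Dict,
    p2: Dict,
    max_distance_m: float = 50.0,   # hypothetical threshold
    min_name_ratio: float = 0.85,   # hypothetical threshold
) -> bool:
    """Treat two parcels as one composite when they are close together
    and their owner names are nearly the same.

    Parcels are dicts with assumed keys: x, y (metres) and owner.
    """
    distance = hypot(p1["x"] - p2["x"], p1["y"] - p2["y"])
    if distance > max_distance_m:
        return False
    return name_similarity(p1["owner"], p2["owner"]) >= min_name_ratio
```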
A set of workbenches that solve the problems
● Each data source has its own ingestion workbench (947+)
● Each workbench, once proven, is scheduled to run in FME Cloud
TODO
Show fuzzy matcher workbench (for location and name) - highlight metadata ingestion
Show interconnect queue workbench (highlight metadata ingestion)
Show Data Dictionary publication workbench
Show USGS Stream Gauge Livestream workbench (highlight metadata ingestion)
Part 2: Metadata and Excellent Data
Metadata: What is it? Why is it important?
● Metadata describes (for Pivvot) …
○ The Provenance (origin, processing, agency) and,
○ The Pedigree (completeness, accuracy, timeliness) of the Data
● This is important at the ROW level (not the table)
○ Pivvot combines data from different ‘sources’
■ Who did we get the data from, are they reliable, trusted, regular, consistent?
○ Some Data are processed differently at different times to be merged into a single dataset
■ There are FEDERAL Datasets, and STATE Datasets (some show more data, and better attributed data)
○ Understanding, at a minimum, the Source Agency survey and publication dates, and the Pivvot last checked and last downloaded dates
■ Now we can work the 30/90 day re-check: if new data is found, download, process (with FME), validate (with FME), assign a metadata ID to each record, and rip-replace the old agency data with the new data
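The re-check cycle above can be sketched in a few lines. This is an assumption-laden illustration: the slides do not say how the 30 and 90 day intervals are assigned, so the sketch treats the interval as a per-source setting, and both function names are hypothetical.

```python
from datetime import date, timedelta


def is_recheck_due(last_checked: date, interval_days: int, today: date) -> bool:
    """True when at least interval_days (e.g. 30 or 90) have passed
    since the source was last checked."""
    return today - last_checked >= timedelta(days=interval_days)


def needs_reload(agency_publish_dt: date, loaded_publish_dt: date) -> bool:
    """True when the agency has published something newer than the
    version currently loaded, so download/process/validate should run."""
    return agency_publish_dt > loaded_publish_dt
```

Keeping "is it time to look?" separate from "is there something new?" mirrors the distinction between the last-checked and last-downloaded dates.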
Each record in each table is assigned a metadata_pivvot_id (GUID)
● This ID is common to all PIVVOT METADATA tables and identifies:
○ What agency the data came from
○ The File/URL source of the data (one agency can produce many)
○ The table the data will be loaded in
○ The LOAD with which the data was placed into the table, including:
■ The date the data was last checked
■ The date the data was last loaded
[Diagram: metadata_pivvot_id linking AGENCY, SOURCE, LOAD, TABLE]
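To make the idea concrete, here is a minimal sketch of a load record that carries one GUID and stamps it onto every row it loads. The class, field, and function names are assumptions for illustration; the actual Pivvot metadata schema is not shown in the slides.

```python
import uuid
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class DataLoad:
    """One load of one source from one agency into one table (hypothetical model)."""
    agency: str
    source: str          # file path or URL the data came from
    table: str           # destination table
    last_checked: date
    last_loaded: date
    # One GUID per load, shared by every row written in that load.
    metadata_pivvot_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def stamp(rows: List[Dict], load: DataLoad) -> List[Dict]:
    """Return copies of the rows with the load's metadata_pivvot_id attached."""
    return [{**row, "metadata_pivvot_id": load.metadata_pivvot_id} for row in rows]
```

Because every row carries the same ID as its load record, agency, source, and dates can be recovered for any single row by one lookup.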
Metadata Tracking
Show Example of Metadata Pivvot ID and Standard Attributes
Show Transformer
Show FME Cloud Interface for utilizing
Operationally
● Roll-backs
● Data Source Tracing
● Querying the Database faster …
metadata_pivvot_id in
(select pivvot_id from md.vw_data_load
where table_physical_nm = 'gssurgo_chorizon'
and data_source_nm like 'NC%'
order by agency_publish_dt desc limit 1)
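A roll-back built on that filter might look like the following sketch. It runs against an in-memory SQLite database so it is self-contained; the table layouts and sample rows are invented for the demonstration, the `md.` schema prefix is dropped because SQLite has no schemas of that kind, and only the column names come from the query above.

```python
import sqlite3

# Delete every row that belongs to the most recently published load
# for a given source pattern (same filter as the slide's query).
LATEST_LOAD_ROLLBACK = """
DELETE FROM gssurgo_chorizon
WHERE metadata_pivvot_id IN (
    SELECT pivvot_id FROM vw_data_load
    WHERE table_physical_nm = 'gssurgo_chorizon'
      AND data_source_nm LIKE ?
    ORDER BY agency_publish_dt DESC
    LIMIT 1
)
"""


def build_demo_db() -> sqlite3.Connection:
    """Create hypothetical metadata and data tables with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE vw_data_load (
            pivvot_id TEXT, table_physical_nm TEXT,
            data_source_nm TEXT, agency_publish_dt TEXT);
        CREATE TABLE gssurgo_chorizon (id INTEGER, metadata_pivvot_id TEXT);
        INSERT INTO vw_data_load VALUES
            ('old-nc', 'gssurgo_chorizon', 'NC soils', '2021-06-01'),
            ('new-nc', 'gssurgo_chorizon', 'NC soils', '2022-06-01'),
            ('va',     'gssurgo_chorizon', 'VA soils', '2022-07-01');
        INSERT INTO gssurgo_chorizon VALUES
            (1, 'old-nc'), (2, 'new-nc'), (3, 'new-nc'), (4, 'va');
    """)
    return conn


def roll_back_latest_load(conn: sqlite3.Connection, source_pattern: str) -> int:
    """Remove the latest load matching source_pattern; return rows deleted."""
    cursor = conn.execute(LATEST_LOAD_ROLLBACK, (source_pattern,))
    conn.commit()
    return cursor.rowcount
```

With the sample rows, rolling back the pattern `NC%` removes only the two rows from the newest NC load and leaves the older NC load and the VA load in place.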
How does metadata lead to excellent data?
● Assume that the data from the source is good and fit-for-purpose (curation, research)
● Assume that the data has been QA/QC’ed for use in the platform
● Track each load, from each source, for each agency, including processing/transformation/projection as performed and logged by FME
● We have the knowledge of provenance and pedigree of each tuple of data in our database and can extract, delete, roll-back any push to production based on that metadata
= Defensibility
Summary
No data is perfect
Geospatial data is inherently an abstraction of the real world
Our goal is to provide the ‘best’ and most ‘up-to-date’ data from source to platform on a consistent and regular basis.
The only way for us to achieve this at scale, with a growth mindset, for geospatial data is FME and row-level Metadata
aPPropriate
PRECISE
Pedigree
Provenance
Thank You!
pveenstra@pivvot.com

