Taming the Chaos: How to Turn Unstructured Data into Decisions
Meet the Presenters
Sienna Emery, Solutions Specialist
Sanae Mendoza, Solutions Specialist
Agenda
1 Introduction
2 Basics of Prompt Engineering for FME
3 Summarizing Documents
4 Auto-Extract & Organize Info from Unstructured Docs
5 Using OpenAI to Create Embeddings
6 Image Metadata Extraction & Geo-tagging
7 Conclusion
8 Resources & Next Steps
9 Q&A
1
Introduction
“By 2025, IDC estimates there
will be 175 zettabytes of data
globally”... “with 80% of that
data being unstructured.”
- Forbes
Unstructured Data & Why It Matters
● Data without a predefined format: text, images, PDFs, and more.
● Hard to query, organize, or visualize without transformation.
● Found in emails, reports, field photos, customer forms, and more.
● Often holds critical information hidden in plain sight.
Poll:
What’s your main goal for extracting
information from unstructured data?
(Select top 1-2)
Poll:
How often do you work with
unstructured data in your
day-to-day tasks?
Why Unstructured Data Is a Headache
● Manual processing is slow and error-prone.
● Data is inconsistent, non-standard, or incomplete.
● Difficult to search, filter, or analyze.
● High effort for low immediate value, unless automated.
What FME Enables:
● Extract structured meaning from messy sources.
● Automate classification, tagging, and conversion.
● Combine AI tools like OpenAI with geospatial intelligence.
● Power semantic search, dashboards, and maps from raw inputs.
The only All-Data, Any-AI Platform.
Safe & FME: FME Enterprise Integration Platform (fme.safe.com/platform)
● FME Form: Data movement and transformation (“ETL”) workflows are built here.
● FME Flow: Brings life to FME Form workflows.
● FME Flow Hosted: Safe Software managed FME Flow.
● FME Realize: Experience data in real world context, in real time.
Unrivalled Data Support
With 500+ supported data types in FME.
GIS, CAD, Database, XML, Raster, 3D, BIM, Web, Point Cloud, Cloud, Big Data, IoT, Graph, BI, Indoor Mapping, AR/VR, Generative AI, Cloud Native, Tabular
2
Basics of Prompt
Engineering
For FME
Prompt Engineering
1. Be clear and specific in your prompt. The more context you provide, the better the model
performs.
2. Role-based prompts (e.g., “You are a document classification assistant…”) help set
expectations for the model.
3. Use structured output whenever possible. This makes downstream processing much easier,
especially in FME, where structured JSON can be parsed into attributes using the
JSONFlattener.
4. Enumerate options. If you want consistent classification (e.g., fixed categories), provide a
defined list the model can choose from.
5. Ask for justification. Including fields like explanation or confidence_score can help with auditing
and quality assurance.
Bad Prompt
"Classify this text."
● No context or structure
● No role or output format
● No categories or justification
Better Prompt
Role-based: “You are a document classification assistant.”
Specific task: Classify text into one of these categories:
"Financial Report", "Policy Document", "Technical Manual", "Legal Agreement", "Other"
Structured output (JSON): {
"category": "<selected_category>",
"explanation": "<brief reasoning>",
"confidence_score": "<0.0 to 1.0>"
}
Enumerated options: Fixed list ensures consistency.
Auditable: Includes explanation and confidence score.
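The prompt pattern above can be sketched in Python as a prompt builder plus a defensive parser for the model's JSON reply. This is a minimal illustration, not part of the FME workflow shown in the webinar: the function names `build_prompt` and `parse_reply` are hypothetical, and the fallback rules (unknown category becomes "Other", confidence clamped to 0.0 to 1.0) are assumptions about how you might want to guard downstream processing.

```python
import json

# Fixed category list taken from the "Better Prompt" slide.
CATEGORIES = [
    "Financial Report",
    "Policy Document",
    "Technical Manual",
    "Legal Agreement",
    "Other",
]


def build_prompt(text: str) -> str:
    """Assemble a role-based prompt with enumerated options and a JSON format."""
    options = ", ".join(f'"{c}"' for c in CATEGORIES)
    return (
        "You are a document classification assistant.\n"
        f"Classify the text into one of these categories: {options}.\n"
        "Respond with JSON only, using the keys "
        '"category", "explanation", and "confidence_score" (0.0 to 1.0).\n\n'
        f"Text:\n{text}"
    )


def parse_reply(reply: str) -> dict:
    """Parse the reply and enforce the enumerated categories (hypothetical helper)."""
    data = json.loads(reply)
    category = data.get("category")
    if category not in CATEGORIES:
        category = "Other"  # assumption: unknown labels fall back to "Other"
    try:
        score = float(data.get("confidence_score", 0.0))
    except (TypeError, ValueError):
        score = 0.0
    score = min(1.0, max(0.0, score))  # clamp to the documented range
    return {
        "category": category,
        "explanation": str(data.get("explanation", "")),
        "confidence_score": score,
    }
```

Checking the category against the fixed list on the way back in means a model that drifts from the enumerated options cannot silently create new categories.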
3
Summarizing
Documents
FME & Cloud Based AI Services
OpenAIChatGPTConnector
● OpenAI: GPT-4o
● Azure OpenAI Model Catalog: Easy access to 1,672 models
AmazonBedrockConnector
● AI21 Labs
● Claude
● Cohere
● Stable Diffusion
● Amazon Titan
● Llama
GoogleGeminiConnector
● Gemini 1.5 Pro
● Model Garden: Easy access to 165 models
Discover more on the FME Hub!
FME & AI Services: Integrate Embedded AI Services
google-research/albert
● Build a framework to run LLMs locally.
● Complement the cloud-based services.
● Data stays local, reducing security concerns.
Use Case Introduction
● Many organizations rely on lengthy PDF documents for reporting, compliance, and operations.
● Manually summarizing this content is time-consuming and prone to inconsistency.
● With FME, you can automate summarization using AI-powered large language models.
● Integrate the results directly into Excel, databases, dashboards, or alerting tools.
Article: Getting Started with AI in FME: Extracting Insights from Unstructured Documents
OpenAIConnector
The File Search API empowers the
OpenAIConnector to retrieve relevant
information from uploaded files using
combined semantic and keyword search
logic. It “grounds” responses in your own
document base—enhancing accuracy
and domain knowledge at runtime.
Model: Defaults to gpt-5; gpt-4o can also be used.
Instructions: Persistent guidance that
sets the assistant’s behavior, style, or
goals for a conversation or request.
User Prompt: The specific question or
task the user wants answered in that
interaction.
Structured Output: A response format
that returns results in a defined schema
(e.g., JSON), making outputs consistent
and machine-readable.
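To make the four parameters above concrete, here is a small Python sketch that groups them into one request description and checks a reply against the schema's required fields. The dictionary keys, the `SUMMARY_SCHEMA` fields, and both function names are hypothetical and chosen only to mirror the slide; they are not the OpenAIConnector's actual parameter names or the OpenAI API's request format.

```python
import json

# Hypothetical schema for a document summary (JSON Schema style).
SUMMARY_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
        "key_points": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "summary", "key_points"],
}


def build_request(document_text: str, model: str = "gpt-4o") -> dict:
    """Group model, instructions, user prompt, and schema (illustrative keys)."""
    return {
        "model": model,
        # Persistent guidance: behavior, style, goals.
        "instructions": "You summarize technical reports concisely and factually.",
        # The specific task for this interaction.
        "user_prompt": f"Summarize the following document:\n{document_text}",
        # The shape every reply is expected to follow.
        "structured_output": SUMMARY_SCHEMA,
    }


def missing_fields(reply_json: str) -> list:
    """Return the required schema fields that are absent from a reply."""
    data = json.loads(reply_json)
    return [key for key in SUMMARY_SCHEMA["required"] if key not in data]
```

Keeping instructions separate from the user prompt lets the same guidance be reused across every document in a batch while only the prompt text changes.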
Summarizing Documents
● Goal: Automate the extraction of key insights and structured data.
● Block: Valuable insights often stay hidden inside large, messy files.
● Key: AI integrated into automated FME workflows.
● Result: Minutes instead of hours to process reports.
Demo
Lessons Learned
● Use Advanced Settings for Large Files
○ The OpenAIConnector has Advanced Settings where you can increase timeouts.
○ Helpful for processing larger files without hitting request limits.
● Adjust the Temperature Setting
○ Higher values (e.g., 0.8) → more creative, varied, and random output.
○ Lower values (e.g., 0.2) → more consistent, predictable output.
○ Keep it low for automation tasks to ensure repeatable results.
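The effect of temperature can be illustrated with the standard temperature-scaled softmax, where token scores are divided by the temperature before being turned into probabilities. This is a textbook sketch with made-up scores, assuming the usual formulation; it does not claim to reproduce how any particular provider implements sampling.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # probability mass concentrates on the top token
high = softmax_with_temperature(logits, 0.8)  # probability mass spreads across tokens

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the lower temperature the top-scoring token takes almost all of the probability, which is why low values give more repeatable output in automated workflows.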
4
Auto-Extract and
Organize Key Info from
Unstructured Docs
● Directory and File Pathnames Reader: Reads file paths to pass to the OpenAIConnector.
● OpenAIConnector: Processes the files and categorizes them.
● FileCopy Writer: Moves each file to a new location and gives it a proper filename.
Article: Getting Started with AI in FME: Classifying Unstructured PDF Files
Classifying PDFs
● Goal: Automatically classify a folder containing 1,000 unknown PDF files into meaningful categories.
● Block: Without automation, this task would require significant manual time and effort.
● Key: Leverage FME and AI to automatically classify documents.
● Result: Now you have organized files!
Demo
Lessons Learned
● Refine the prompt:
○ Language classification: the model would sometimes return en, ENGLISH, or english for the same language.
○ Suggested filenames would often include file extensions.
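Besides tightening the prompt, inconsistent labels like these can also be cleaned up after the model replies. The Python sketch below shows one possible post-processing step; the alias table, the list of known extensions, and the function names are illustrative assumptions, not values from the demo workspace.

```python
from pathlib import PurePath

# Hypothetical alias table: every accepted spelling maps to one canonical code.
LANGUAGE_ALIASES = {
    "en": "en", "eng": "en", "english": "en",
    "fr": "fr", "fra": "fr", "french": "fr",
}

# Only these suffixes are treated as file extensions (assumed list).
KNOWN_EXTENSIONS = {".pdf", ".docx", ".txt"}


def normalize_language(label: str) -> str:
    """Map variants such as 'en', 'ENGLISH', or 'english' to a single code."""
    return LANGUAGE_ALIASES.get(label.strip().lower(), "unknown")


def strip_extension(filename: str) -> str:
    """Drop a trailing known extension from a model-suggested filename."""
    path = PurePath(filename.strip())
    if path.suffix.lower() in KNOWN_EXTENSIONS:
        return path.stem
    return path.name
```

Restricting the stripping to known extensions avoids cutting names that merely contain a dot, such as a version number.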
5
Using OpenAI to
Create
Embeddings
The Problem with Free-Form Text
● Notes, reports, emails, and logs are full of unstructured text.
● Keyword search is limited — if you don’t guess the exact word, results are
missed (e.g. “corrosion” ≠ “rusting”)
● People want to ask questions in their own words and still find the right
information.
A Smarter Semantic Way to Search
● Find similar meaning: e.g. “pipe is corroded” ≈ “pipe is rusted out”
● Spot anomalies: detect unusual notes or outliers automatically
● Ask in plain language: explore data through simple API queries
● Discover patterns: group related behaviors and track changes over time
FME + AI: Turn messy text into meaningful, searchable insights 🔍
Dictionary Corner
● Embeddings: Numbers that capture the meaning of text
● Embedding Models: AI services that turn text into numbers
○ OpenAI → text-embedding-3-small, text-embedding-3-large
○ Cohere → embed-english-v3.0, embed-multilingual-v3.0
● Vectors: The format those numbers are stored in
● Vector Databases: Store and search vectors (e.g. Pinecone,
Weaviate, Milvus, PostgreSQL)
● Semantic Search: Finds results by meaning, not exact words
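As a small illustration of how "search by meaning" works on vectors, the Python sketch below compares embeddings with cosine similarity. The three-dimensional vectors are made-up toy values chosen for readability; real embedding models return vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1.0 mean similar direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length vector has no direction")
    return dot / (norm_a * norm_b)


# Toy embeddings (hypothetical values, not produced by a real model).
pipe_is_corroded = [0.9, 0.1, 0.2]
pipe_is_rusted_out = [0.8, 0.2, 0.3]
invoice_overdue = [0.1, 0.9, 0.1]

similar = cosine_similarity(pipe_is_corroded, pipe_is_rusted_out)
unrelated = cosine_similarity(pipe_is_corroded, invoice_overdue)
print(round(similar, 3), round(unrelated, 3))
```

A semantic search ranks stored notes by this kind of score against the embedding of the query, so differently worded notes with the same meaning still surface.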
From Text to Vectors: Building a Semantic Search Engine
● Goal: Make messy maintenance notes searchable by meaning.
● Block: Notes are free text in a Word file, hard to filter or analyze.
● Key: Use FME + AI embeddings + Postgres pgvector to store and search.
● Result: Plain language queries return meaningful matches, not just keywords.
Demo
Case Study: Semantic Search with OpenAI and Postgres
Embeddings: Turning Unstructured Notes into Data
Semantic Search: Finding Insights Beyond Keywords 🔍
“Severe conditions requiring immediate shutdown”
Lessons Learned
● Prep your text → Run QA; prep raw data for ingestion
● Prototype small → Use Sampler to save tokens during dev.
● Models matter → Balance size, accuracy, cost, and speed; keep all embeddings from the
same model.
● Normalize, or don’t → Some models do that for you.
● Performance tip → Run FME Engine close to DB to cut latency. Let the DB do the work.
● Pick the right DB types → No native vector type available? Store embeddings as TEXT or JSONB and cast them.
● Be creative → FME can bridge gaps when features aren’t out-of-the-box.
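Two of the tips above, normalizing vectors and storing them as text that is cast at query time, can be sketched in Python. The table `maintenance_notes` and its columns are hypothetical, the `%s` placeholder assumes a psycopg-style driver, and the query assumes the pgvector extension, whose `<=>` operator computes cosine distance.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (some embedding models already do this)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def to_vector_literal(vector):
    """Format numbers as the '[x,y,z]' text form that can be cast to vector."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


# Hypothetical schema: embeddings stored in a TEXT column and cast on the fly.
# The ordering happens inside the database, so only the top rows travel back.
SEARCH_SQL = """
SELECT note_id, note_text
FROM maintenance_notes
ORDER BY embedding_text::vector <=> %s::vector
LIMIT 5;
"""

query_vector = to_vector_literal(normalize([3.0, 4.0]))
print(query_vector)
```

In practice the query vector would come from the same embedding model used for the stored notes, since distances between vectors from different models are not meaningful.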
Better Decisions Through Smarter Search
● Maintenance logs in any industry.
● Customer service chat records.
● Research papers or legal documents.
● Product descriptions for e-commerce search.
● Multimedia captions and metadata.
● What else?
My favorite vector database? RepTileDB… optimized for fly retrieval 🪰
6
Image Metadata
Extraction &
Geo-tagging
The Problem: Metadata You Can’t See
● Photos, PDFs, CAD, images =
unstructured files
● They carry embedded metadata
(time, location, device, authorship)
● Attributes are hidden inside the file
● Without extraction → no way to
trust, validate, or analyze
Why Metadata
Extraction Matters
● Metadata = structured clues inside unstructured files
● FME Readers + Transformers → expose hidden attributes
● Attributes become visible + comparable
● Enables automated QA/QC and confidence scoring
Data Quality Detective Work
🛑 Bad Data Hurts
● Wasted effort
● Lost credibility
⚡ Fix with Metadata
● Automatic cross-checks
● Scalable QA/QC
● Trustworthy reports
(49.2775, -123.1206), Dec 20, 2024
(-33.8586, 151.2139), Jul 3, 2020
JPEG Metadata QA: Survey Verification
● Goal: Verify that reported survey information matches the actual photo metadata.
● Block: Survey photos are unstructured JPEGs with hidden GPS/time data, not visible in raw form.
● Key: Expose and process hidden EXIF attributes, then compare them to reported survey details.
● Result: Confident, automated QA scoring ensures trustworthy survey records at scale.
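The location part of such a check can be sketched in Python. EXIF stores GPS positions as degrees, minutes, and seconds plus a hemisphere reference, so the sketch first converts to decimal degrees and then measures the great-circle distance to the reported position. The function names and the 0.5 km tolerance are illustrative assumptions, not the scoring used in the demo.

```python
import math


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds and N/S/E/W ref to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def location_check(reported, exif, tolerance_km=0.5):
    """Compare reported and EXIF (lat, lon) pairs; tolerance is an assumed threshold."""
    distance = haversine_km(reported[0], reported[1], exif[0], exif[1])
    return {"distance_km": distance, "passed": distance <= tolerance_km}


# The two coordinate pairs shown on the "Data Quality Detective Work" slide.
reported = (49.2775, -123.1206)
exif = (-33.8586, 151.2139)
result = location_check(reported, exif)
print(round(result["distance_km"]), result["passed"])
```

The same pattern extends to timestamps: compare the reported survey date with the EXIF capture time and flag records where the gap exceeds an agreed threshold.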
Demo
Lessons Learned
● Tame the chaos – metadata is messy; rename attributes clearly and keep only what
you need.
● Make it usable – expose and manage key values so downstream tools and AI can find
patterns, context, and even explain obscure fields.
● Handle with caution – metadata isn’t always accurate (e.g. GPS can lie), and sharing
photos can reveal more than you intended.
Metadata Mysteries: Solved
● Asset inspections → confirm repair photos with time & place
● Environmental monitoring → validate sample collection metadata
● Compliance & audits → extract authorship, edit history
● AI workflows → enrich datasets with contextual attributes
7
Conclusion
Summary
● FME makes unstructured data
usable: PDFs, images, forms, and
photos
● Clean, structure, and enrich
metadata for real insights
● Combine FME with AI for
summarization, explanation, and
semantic search
● Practical workflows: categorization,
extraction, enrichment, validation
All Data. Any AI.
● All Data Velocities: Batch (ETL, Reverse ETL, ...), Event (BPA, RPA, ...), Stream
● All Data Locations: Any Cloud, On-premises, Hybrid, Edge, Containers, Embedded, Mixed
● All Data Types: Unstructured, Structured, Spatial, APIs, Web Apps, …
● Any AI Technology: OpenAI, Amazon Bedrock, Google Gemini, Ollama, Deepseek, Composite
30+
30K+
128
140+
25K+
years of solving data
challenges
FME Community
members
countries with
FME customers
organizations worldwide
global partners with
FME services
200K+
users worldwide
8
Resources
● Ebook: Spatial Data for the Enterprise (fme.ly/gzc)
● FME Academy: Guided learning experiences at your fingertips (academy.safe.com)
● Knowledge Base: Check out how-tos & demos in the knowledge base (support.safe.com)
● Webinars: Upcoming & on-demand webinars (safe.com/webinars)
9
Next Steps
● Contact Us: We’d love to help you get started. Get in touch with us at info@safe.com
● Experience the FME Accelerator: A world where data is not just a commodity but a catalyst for real change. (fme.safe.com/accelerator)
Claim Your Community Badge & Dive into the new Community!
● Get community badges for watching webinars
● community.safe.com
● Today’s code: V91J70
● Join bingo! community.safe.com/p/bingo25
Join the Community today!
10
Q&A
Thank You
Recap of Next Steps
1 Follow us on LinkedIn!
2 Contact us
3 Experience the FME Accelerator
Please fill out our
webinar survey

Taming the Chaos: How to Turn Unstructured Data into Decisions
