Evolve 19 | Paul Legan | Going Beyond Metadata: Extracting Meaningful Information from Digital Assets Automatically in AEM

#evolve19
GOING BEYOND METADATA: EXTRACTING MEANINGFUL INFORMATION FROM YOUR DIGITAL ASSETS
PAUL LEGAN
August 7th, 2019

DIGITAL ASSET MANAGEMENT
REALLY, IT MAKES THIS PROCESS EASIER.
• Discovery: Find an existing asset or set of asset artifacts
• Creation: Alter an existing or create a new creative asset
• Automation: Generate variations for different audiences
• Publication: Publish this asset for an appropriate duration

DIGITAL ASSET MANAGEMENT
LET’S START WITH THE BENEFITS
• Supports workflows that allow for content modification
• Reduces costs of asset creation and distribution
• Automates tedious tasks like thumbnail generation
• Increases marketing throughput for content variations and personalization
• Increases creative autonomy

IF IT’S SO GREAT, WHY ISN’T IT EASY?
WE CAN ALL PROBABLY NAME A FEW REASONS.

ISSUE #1: ORGANIZATION
NAMING CONVENTIONS AND FOLDER STRUCTURE
“Let’s all use in-progress folders.” → “We can delete this later.”

ISSUE #2: INCONSISTENCY
TRAINING + USAGE GUIDELINES
• No Validation
• Poor Naming Conventions
• Number Duplication
• Unused Fields

ISSUE #3: MYOPIA
THINK BEYOND THE CURRENT USE CASE
• Tag Redundancy
• Folder Mismatches
• No Scheduled Cleanup

MULTI-TOOL OF CHOICE: METADATA
WE CAN ALL PROBABLY NAME A FEW REASONS.

THE GENRE PROBLEM
ID3, WINAMP, AND ITUNES – UNITE!
(for all of you who totally legally purchased music 20 years ago)

THE HUMBLE SCHEMA
YOUR ASSET DATA LAYER

INGESTION PROCESS
ASSET PROCESSING AT SCALE
Define a Schema (Superset of Properties) → Define Ingestion Process (IPTC, XMP, Validation) → Import Assets (Auto-Tag, Pre-Fill)
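A schema that is a superset of properties only pays off if ingestion validates against it. The sketch below is a minimal illustration under stated assumptions: the `Field` structure, the `SCHEMA` entries (apart from `dc:title`, which mirrors a common AEM property name), and the `validate` helper are all hypothetical, not AEM's schema API. It flags missing required fields, wrong types, and properties outside the superset.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Field:
    """One property in the schema (hypothetical structure, for illustration)."""
    name: str
    kind: type
    required: bool = False


# Hypothetical superset schema: every property any asset type might carry.
# Only "dc:title" mirrors a real AEM property name; the rest are made up.
SCHEMA = [
    Field("dc:title", str, required=True),
    Field("photographer", str),
    Field("shootLocation", str),
    Field("usageRightsExpiry", str),
    Field("pageCount", int),
]


def validate(metadata: dict[str, Any], schema: list[Field] = SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the asset passes."""
    problems = []
    known = {f.name for f in schema}
    for field in schema:
        value = metadata.get(field.name)
        if value is None or value == "":
            if field.required:
                problems.append(f"missing required field: {field.name}")
        elif not isinstance(value, field.kind):
            problems.append(
                f"{field.name}: expected {field.kind.__name__}, "
                f"got {type(value).__name__}"
            )
    # Anything outside the superset is flagged rather than silently stored.
    for name in metadata:
        if name not in known:
            problems.append(f"unknown field: {name}")
    return problems
```

For example, `validate({"dc:title": "Q3 Brochure", "pageCount": "12"})` reports only the string-typed page count, which is the kind of inconsistency that otherwise slips in through manual entry.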

INGESTION PROCESS
Define a Schema
Import Assets
Metadata Profiles (Sensible Defaults)
Smart Organization (Sort, Filter, Variants)
Smart Tags

METADATA PROFILES
SENSIBLE METADATA DEFAULTS
• Level #1 Automation
• Helps alleviate tedious work
• Applying global tags
• Complementing IPTC/XMP data embedded in the binaries, such as:
  • Photoshoot Location
  • Photographer
  • Type of Asset
  • Digital Rights Management
• Easy to apply at the folder or file type level
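To make "sensible defaults at the folder or file type level" concrete, here is a small sketch of the merge logic. It is not AEM's metadata profile implementation: the rule lists, property names, and the precedence (embedded IPTC/XMP values always win over defaults) are assumptions chosen for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical profile rules: (glob pattern, default properties).
# Rules apply in list order, so later, more specific rules override earlier ones.
FOLDER_RULES = [
    ("/content/dam/*", {"assetType": "general"}),
    ("/content/dam/photoshoots/*", {"assetType": "photo", "rights": "internal-use"}),
]
MIME_RULES = [
    ("image/*", {"mediaCategory": "image"}),
    ("application/pdf", {"mediaCategory": "document"}),
]


def apply_defaults(path: str, mime: str, embedded: dict[str, str]) -> dict[str, str]:
    """Merge folder and file-type defaults with metadata embedded in the binary.

    Embedded IPTC/XMP values win: a default only fills a property the file
    does not already carry (a precedence assumption made for this sketch).
    """
    defaults: dict[str, str] = {}
    for pattern, values in FOLDER_RULES:
        if fnmatchcase(path, pattern):
            defaults.update(values)
    for pattern, values in MIME_RULES:
        if fnmatchcase(mime, pattern):
            defaults.update(values)
    return {**defaults, **embedded}
```

A JPEG uploaded under `/content/dam/photoshoots/` picks up `assetType: photo` and a default `rights` value, but if the photographer already embedded a rights statement in the file, that embedded value is kept.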

SMART TAGS
ADOBE I/O SMART CONTENT SERVICE
• Can be trained, and training can be run on a schedule
• Auto-tags based on object recognition

SO… HOW CAN WE GO FURTHER?
LET’S SAY YOU WANT MORE AUTOMATION.

AMAZON TEXTRACT
AN INTRODUCTION
• Uses optical character recognition (OCR) to automatically detect printed text and numbers in a scan or rendering of a document.
• Detects key-value pairs in documents, so the document's inherent context is retained without any manual intervention.
• Returns a confidence score for everything it identifies, so you can make informed decisions about how to use the results.
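Key-value pairs and confidence scores work together: an `AnalyzeDocument` call with the `FORMS` feature returns `KEY_VALUE_SET` blocks that point to a value block and to the `WORD` blocks holding the text. The sketch below walks that structure from an already-parsed response dict and splits pairs into "accept" and "needs review" buckets. The 90% threshold and the rule of taking the lower of the key and value confidences are assumptions for illustration, not Textract defaults.

```python
from typing import Any

Block = dict[str, Any]


def _child_text(block: Block, by_id: dict[str, Block]) -> str:
    """Join the WORD children of a block into a single string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response: dict[str, Any], min_confidence: float = 90.0):
    """Split FORMS key-value pairs into accepted and needs-review buckets.

    A pair's confidence is the lower of its KEY and VALUE block confidences
    (a conservative choice made for this sketch).
    """
    by_id = {b["Id"]: b for b in response["Blocks"]}
    accepted: dict[str, str] = {}
    needs_review: dict[str, tuple[str, float]] = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        value_ids = [
            vid
            for rel in block.get("Relationships", [])
            if rel["Type"] == "VALUE"
            for vid in rel["Ids"]
        ]
        key = _child_text(block, by_id).rstrip(": ")
        value = " ".join(_child_text(by_id[vid], by_id) for vid in value_ids)
        confidence = min([block["Confidence"]] + [by_id[vid]["Confidence"] for vid in value_ids])
        if confidence >= min_confidence:
            accepted[key] = value
        else:
            needs_review[key] = (value, confidence)
    return accepted, needs_review
```

Routing low-confidence pairs to a person, rather than discarding them, keeps the human effort focused on the fields the model is unsure about.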

LOOKING INSIDE WITH OCR
JUDGE ASSETS BY MORE THAN THEIR COVER

STRUCTURED DATA
EMBEDDED DOCUMENT INFORMATION

driver-data.pdf

HOW IT WORKS
TECHNICAL PROCESS
Image Uploaded via API (S3 or Base64 Bytes) → Service Analyzes Input (Sync or Async) → ML Response Sent (JSON Payload)

Request (supply either Bytes or S3Object; the async operations read from S3 only):
{
  "Document": {
    "Bytes": blob,
    "S3Object": {
      "Bucket": "string",
      "Name": "string",
      "Version": "string"
    }
  }
}

Operations:
// SYNC
DetectDocumentText()
AnalyzeDocument()
// ASYNC
StartDocumentTextDetection()
GetDocumentTextDetection()

Response (abridged):
{
  "Blocks": [
    {
      "BlockType": "string",
      "Confidence": number,
      "Text": "string",
      "Geometry": {
        "BoundingBox": { "Width": number, "Height": number, "Left": number, "Top": number }
      },
      "Id": "string"
    }
  ]
}
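The request and response shapes above can be handled with two small helpers: one builds the `Document` argument from either raw bytes or an S3 location (never both), and one pulls `LINE` blocks out of a `DetectDocumentText` response. This is a sketch that works on plain dicts so it runs without AWS credentials; the helper names are hypothetical.

```python
from typing import Any, NamedTuple, Optional


class Line(NamedTuple):
    text: str
    confidence: float
    box: dict[str, float]  # Width, Height, Left, Top as ratios of the page size


def document_param(
    data: Optional[bytes] = None,
    bucket: Optional[str] = None,
    name: Optional[str] = None,
) -> dict[str, Any]:
    """Build the Document argument from either raw bytes or an S3 location."""
    has_bytes = data is not None
    has_s3 = bucket is not None and name is not None
    if has_bytes == has_s3:
        raise ValueError("provide either data or bucket+name, not both or neither")
    if has_bytes:
        return {"Bytes": data}
    return {"S3Object": {"Bucket": bucket, "Name": name}}


def lines(response: dict[str, Any]) -> list[Line]:
    """Collect LINE blocks in response order with their confidence and bounding box."""
    return [
        Line(b["Text"], b["Confidence"], b["Geometry"]["BoundingBox"])
        for b in response["Blocks"]
        if b["BlockType"] == "LINE"
    ]
```

With boto3 the call would look like `boto3.client("textract").detect_document_text(Document=document_param(data=raw_bytes))`; the SDK base64-encodes the bytes itself, so you pass the raw file contents.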

HOW IT FITS IN AEM
TECHNICAL PROCESS
AEM Workflow: ML Response Sent (Sync or Async, JSON Payload) → Properties Saved to JCR (JSON Payload) → Property Validation (Notification, Banner) → XMP Binary Writeback (If Applicable)
→ 3rd-Party DB (Search), Amazon Comprehend (NLP), Amazon Translate (Translation)
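Inside the workflow, the extracted pairs have to be mapped onto metadata properties and validated before anything is written back. The sketch below shows only that mapping and validation logic; in AEM the save itself would happen in a Java workflow step writing to the asset's `jcr:content/metadata` node. The `FIELD_MAP`, the `textract:` namespace, the required set, and the MM/DD/YYYY date format are all hypothetical choices for a driver-license-style document.

```python
from datetime import datetime
from typing import Callable

# Hypothetical mapping from form labels (as read by Textract) to asset metadata
# properties; the "textract:" namespace is invented for this sketch.
FIELD_MAP = {
    "Name": "textract:holderName",
    "License No": "textract:licenseNumber",
    "Expires": "textract:expiryDate",
}
REQUIRED = {"textract:holderName", "textract:licenseNumber"}


def _is_us_date(value: str) -> bool:
    """Accept MM/DD/YYYY dates only (an assumed format for this example)."""
    try:
        datetime.strptime(value, "%m/%d/%Y")
        return True
    except ValueError:
        return False


# Hypothetical per-property checks: each returns True when the value is acceptable.
VALIDATORS: dict[str, Callable[[str], bool]] = {
    "textract:licenseNumber": lambda v: v.replace(" ", "").isalnum(),
    "textract:expiryDate": _is_us_date,
}


def to_properties(pairs: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Map extracted pairs to properties and collect validation notices.

    Unmapped labels are dropped. Problems are returned as notices (for an
    inbox notification or an asset banner) instead of blocking the save.
    """
    props = {FIELD_MAP[k]: v for k, v in pairs.items() if k in FIELD_MAP}
    notices = [f"missing {p}" for p in sorted(REQUIRED - set(props))]
    for prop, check in VALIDATORS.items():
        if prop in props and not check(props[prop]):
            notices.append(f"invalid {prop}: {props[prop]!r}")
    return props, notices
```

Returning notices instead of raising keeps the asset usable while still surfacing a banner or notification for a person to fix, which matches the "Property Validation (Notification, Banner)" step.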

HOW DO THESE TOOLS HELP?
MORE THAN YOU THINK.

BENEFITS & IMPACT
HIGHLIGHTS
• 75% Less Effort by Humans per Ingested Asset
• 60% Reduction in Calls to IT to Deliver Assets
• 80% Year-over-Year Increase in User Adoption Across Departments
• Reduces Margin of Error: Tedious data entry increases the risk of human error.
• Better Discovery: Reduces the time to find assets and lessens the dependency on IT.
• Enterprise Scale: A scalable system is a usable system as adoption increases.
Across Departments

FUTURE POSSIBILITIES
JUST THINKING OUT LOUD
• Process Invoices & Sales Receipts
• Normalize Financial Document Data
• Automatically Redact PII from a Claim
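As a thought experiment for the redaction idea, Textract's bounding boxes already give the location of each line, so redaction reduces to "find matching lines, scale their boxes to pixels". The sketch below uses a single example pattern (the US Social Security number format) and redacts whole lines; both are simplifying assumptions, and a real policy would need a broader, reviewed set of detectors and likely word-level boxes.

```python
import re
from typing import Any

# Example detector only: US Social Security number format. A real redaction
# policy would need a broader, reviewed set of patterns (or an NLP service).
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redaction_boxes(
    response: dict[str, Any],
    page_width: int,
    page_height: int,
    pattern: re.Pattern = SSN,
) -> list[tuple[int, int, int, int]]:
    """Return pixel rectangles (x0, y0, x1, y1) for LINE blocks matching `pattern`.

    Textract bounding boxes are ratios of the page size, so each one is scaled
    to pixels and rounded to the nearest whole pixel.
    """
    boxes = []
    for block in response["Blocks"]:
        if block["BlockType"] != "LINE" or not pattern.search(block["Text"]):
            continue
        bb = block["Geometry"]["BoundingBox"]
        left = bb["Left"] * page_width
        top = bb["Top"] * page_height
        boxes.append((
            round(left),
            round(top),
            round(left + bb["Width"] * page_width),
            round(top + bb["Height"] * page_height),
        ))
    return boxes
```

The rectangles could then be filled in black by any imaging library before the redacted rendition is stored alongside the original asset.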

MORE INFORMATION
GETTING STARTED & BEYOND
Links to Relevant Resources:
- https://aws.amazon.com/textract/
- https://github.com/aws-samples/amazon-textract-code-samples/
- https://github.com/aws-samples/amazon-textract-serverless-large-scale-document-processing
