FIAT/IFTA Media Management Seminar
“Game Changers? From Automation to Curation: Futureproofing AV Content”
IBM AI Overview with several Examples of
Projects in the Media and Lessons Learned
Jakob Rosinski | Lead Architect Video Solutions & Broadcast Industry Europe
Stockholm | 13.06.2018
This talk gives an overview of media archive projects worldwide to which IBM has contributed,
both with its own AI, named Watson, and with its knowledge and integration capabilities. Major topics are
scope definition and use case identification, and the use of cognitive services of different kinds and from
different vendors, including successes and open problems. In such a multi-modal approach, training the services
is also key, and the talk shows how this can be managed from both a human and a machine perspective.
Abstract
Jakob is the Lead Architect for Video Solutions & Broadcast Industry for IBM Services
in Europe. He is also the product owner of IBM AREMA, a workflow and essence
management solution which is widely used at different broadcasters for essence
archives and workflow automation.
Over the last decade Jakob was responsible for various projects in the media industry
at HBO, France24, ORF, SRF, RTL Mediengruppe and Deutsche Bundesliga/Sportcast.
He is an expert in multi-site & multi-tier essence management and workflow
automation for ingest, archive, production & distribution.
He is also known and valued as a subject matter expert for the topics above in the
worldwide IBM M&E community. He is skilled at translating business needs into systems
solutions.
Video Enrichment uses industry leading AI capabilities to analyze textual, audio, and visual data
within multi-media content, and to build easily searchable metadata packages for every asset.
By understanding content in new ways, media companies can improve content discovery,
increase operational efficiency, deliver higher ad revenues, drive viewer engagement and offer
entirely new ways to meet the demands of their businesses.
Enriched content is inherently more searchable. Improved content discovery in your consumer
service leads to increased usage.
Cognitive base services used for content enrichment
Enhanced and automated
understanding of personalities
present in the frame, and objects
Activate decade-old material by
running it through the STT API and
then performing deeper analytics
Deeper understanding of concepts,
recognized entities, keywords, and
relationships
Target: deeply enriched content, second by second
Search image and video data for objects or contexts the services were not trained on.
Visual Recognition
Audiomining & Speech
to Text
NLU & Translation
Videodetection / Speed /
Movement
Pattern Detection &
Similarity Search
A lot of vendors are providing base cognitive
services...
Visual Recognition
Audiomining & Speech to
Text
NLU & Translation
Videodetection / Speed /
Movement
Pattern Detection &
Similarity Search
https://www.foxsports.com/soccer/fifa-world-cup/highlights
©2018 IBM Corporation 27 June 2019 IBM Services10
Customer
MAM or DAM
Enriched metadata is delivered as an open JSON bundle to be
stored and used for search, compliance, recommendation and
other vital use cases.
Assets are acquired, ingested, processed and enriched
using the Watson Media platform.
SEMANTIC SCENE CHAPTERING
Divides the Media into meaningful chunks or chapters that can be more
easily managed by people responsible for editing or producing.
SPEECH TO TEXT
Converts audio into text, by leveraging machine intelligence to combine
information about grammar and language structure with knowledge of
the composition of the audio signal. Trainable.
NATURAL LANGUAGE UNDERSTANDING
Using the textual output of Speech to Text or a closed caption file, NLU derives:
concepts, document-level emotions, sentiment, entities, keywords,
language, and taxonomy. Trainable.
VISUAL RECOGNITION
Detects the contents of an image or video frame, answering the
question: “What is in this image?” Returns class, class description, face
detection, and text recognition. Trainable.
Watson Video
Enrichment Workflow
TONE ANALYZER & PERSONALITY INSIGHTS
Provide additional features that document the Emotional Tone, Writing
Tone, Social Tone of dialogue, as well as the overall personalities of
characters based on their words.
Watson Video Enrichment Workflow
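
The slide above says the enriched metadata is delivered as an open JSON bundle for search and other use cases. As a minimal sketch of what consuming such a bundle could look like: the layout and field names below (`asset_id`, `segments`, `transcript`, `keywords`, `visual_classes`) are assumptions made for illustration, not the actual Watson Media schema.

```python
import json

# Hypothetical bundle layout; field names are illustrative assumptions,
# not the real Watson Media output format.
bundle_json = """
{
  "asset_id": "demo-0001",
  "segments": [
    {"start": 0.0, "end": 12.5,
     "transcript": "Welcome to the evening news.",
     "keywords": ["news", "studio"],
     "visual_classes": ["person", "desk"]},
    {"start": 12.5, "end": 40.0,
     "transcript": "The match ended two to one.",
     "keywords": ["match", "football"],
     "visual_classes": ["stadium", "crowd"]}
  ]
}
"""


def find_segments(bundle, term):
    """Return (start, end) of every segment whose metadata mentions term."""
    term = term.lower()
    hits = []
    for seg in bundle["segments"]:
        # Search across all modalities: speech, text analysis, and vision.
        haystack = [seg["transcript"].lower()]
        haystack += [k.lower() for k in seg["keywords"]]
        haystack += [c.lower() for c in seg["visual_classes"]]
        if any(term in text for text in haystack):
            hits.append((seg["start"], seg["end"]))
    return hits


bundle = json.loads(bundle_json)
print(find_segments(bundle, "stadium"))
```

Because every segment carries its own time range, a MAM or DAM could jump straight to the matching part of the asset instead of returning the whole file.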
Scene Detection
Deep Video-Analysis
• People, object and context detection
• Classification of actors based on 24 emotions
• Classification of scenes based on 22,000 categories
Deep Audio-Analysis
• Background
• Actor sentiment and tone
Analysis of scene composition
• Classification of light and color
Analysis of successful trailers to automatically create a new one
https://www.youtube.com/watch?v=gJEzuYynaiw
Concept and proof of an automatic content
enrichment system for 40+ years of soccer history
• Annotation using a portfolio of cognitive solutions
• Audio: Speech-to-text / Transcript
• Audio: Speaker detection
• Audio: Atmosphere (cheers, whistles, ...)
• Video: Angle/camera & context detection
• Video: Face & object detection
• Domain-trained services, including a training portal
• Sharpening of results through domain knowledge, creation of timelines, and identification of concepts
Link with game and player data
• Optimize content analysis and search based on game and player statistics
• Guided search
Persona-based User Experience
• Personalized Discovery, Suggestions, Design & Projects
Content enrichment for
Bundesliga archive
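
The Bundesliga slide mentions combining several audio and video services and creating timelines from their output. One way to picture this, as a rough sketch only: annotations from different modalities are grouped into fixed time windows, and a window becomes interesting when several modalities fire together. The `Annotation` type, the 10-second window and the sample labels are all assumptions for illustration, not details of the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: float   # seconds from the start of the asset
    end: float
    source: str    # modality, e.g. "speech", "audio", "face"
    label: str


def build_timeline(annotations, window=10.0):
    """Group annotations into fixed windows, keyed by window start time."""
    timeline = {}
    for ann in sorted(annotations, key=lambda a: a.start):
        key = int(ann.start // window) * window
        timeline.setdefault(key, []).append(ann)
    return timeline


def find_moments(timeline, required_sources):
    """Return window starts in which every required modality fired."""
    required = set(required_sources)
    return [start for start, anns in sorted(timeline.items())
            if required <= {a.source for a in anns}]


# Invented sample data: a whistle early on, then speech, cheers and a face
# detection close together about a minute in.
annotations = [
    Annotation(5.0, 6.0, "audio", "whistle"),
    Annotation(62.0, 64.0, "speech", "goal"),
    Annotation(63.0, 70.0, "audio", "cheers"),
    Annotation(65.0, 68.0, "face", "player_a"),
]
timeline = build_timeline(annotations)
print(find_moments(timeline, ["speech", "audio"]))
```

A real system would more likely use overlapping intervals and domain rules (for example game statistics) instead of fixed windows; the fixed window just keeps the sketch short.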
Target: Automatic content enrichment
of 30+ years of show content
Annotation using a portfolio of
cognitive solutions (IBM, OpenCV)
• Audio: Speech-to-text / Transcript / Phrase detection
• Video: Angle/camera & context detection
• Video: Face & object detection
Domain-trained services, including a
training portal
Sharpening of results through domain knowledge,
creation of timelines, and
identification of concepts
Content enrichment for
Brazil's most famous TV show
Architecture of “Captain Caption” Demo
AREMA
Speech to Text
Deep Learning – Sound Recognition
Natural Language Understanding
Conform results into one closed caption file
Translation into target language
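
The step "conform results into one closed caption file" can be sketched as follows. This is an illustration under assumptions, not the demo's implementation: speech segments and sound events are given as simple `(start, end, text)` tuples, sound events are rendered in square brackets, the output uses the SubRip (SRT) layout, and the translation step is left out.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def conform_to_srt(speech, sounds):
    """Merge speech segments and sound events into one SRT document.

    speech: list of (start, end, text) from a speech-to-text service
    sounds: list of (start, end, label) from a sound recognition service
    """
    cues = [(s, e, text) for s, e, text in speech]
    cues += [(s, e, f"[{label}]") for s, e, label in sounds]
    cues.sort(key=lambda c: c[0])  # order all cues by start time
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{i}\n{srt_time(start)} --> {srt_time(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"


# Invented sample input for illustration.
speech = [(0.0, 2.0, "Good evening.")]
sounds = [(2.5, 4.0, "applause")]
print(conform_to_srt(speech, sounds))
```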
Context / Solution
Frame-accurate detection of trained frames of lead-in and lead-out scenes, in order to mark those
scenes in the content and exchange them automatically in the master format without
transcoding (unwrap, cut, wrap) and with appropriate audio track handling, to
enable a fast switch of content between channels.
• Use of an in-house detection component based on OpenCV and Watson VR (Visual Recognition) for
frame-accurate detection of scenes
• Use of AREMA's Dalet Galaxy integration to pull and push content directly from and to the
MAM system; no need to extend Galaxy for this purpose
• Automatic scaling with the AREMA autoscaler in combination with
Kubernetes & Docker
• Use of the AREMA MXF Package for
  • metadata extraction from the source file
  • rewrapping / preparation of the audio track schema of the new scene
  • partial cut of the source file
  • conforming of all parts to the target file
=> very fast, no transcoding or change of audio and video streams
Use Case: “Implement a fully integrated, trained
cognitive service to exchange ident in and out
scenes”
Result:
• Fully automated exchange of scenes, deeply integrated with the existing environment
• Almost unlimited scalability, as all components can run in a Kubernetes/Docker environment. This leads to a significant reduction in time and staff effort and a faster
change of content between programs => from 3 months (2 full-time persons) to days
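
To make the "detect, then cut without transcoding" idea more concrete, here is a small sketch of the bookkeeping only. It assumes a detector has already produced one match score per frame (in the project this came from OpenCV and Watson VR; the scores, the 0.9 threshold and both function names here are hypothetical). The code finds the frame ranges that match an ident and derives the ranges of the source to keep, which a rewrapping step could then conform with the new ident.

```python
def find_runs(scores, threshold=0.9):
    """Return inclusive (first, last) frame ranges with score >= threshold."""
    runs, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                      # a matching run begins
        elif score < threshold and start is not None:
            runs.append((start, i - 1))    # the run ended on the previous frame
            start = None
    if start is not None:                  # run reaches the last frame
        runs.append((start, len(scores) - 1))
    return runs


def cut_list(total_frames, ident_runs):
    """Return the source frame ranges to keep, i.e. everything between idents."""
    keep, cursor = [], 0
    for first, last in ident_runs:
        if first > cursor:
            keep.append((cursor, first - 1))
        cursor = last + 1
    if cursor < total_frames:
        keep.append((cursor, total_frames - 1))
    return keep


# Invented per-frame scores: ident-in on frames 0-1, ident-out on frames 5-6.
scores = [0.95, 0.97, 0.2, 0.1, 0.3, 0.92, 0.99]
runs = find_runs(scores)
print(runs, cut_list(len(scores), runs))
```

Working on frame indices rather than timestamps keeps the cut points exact, which is the precondition for a partial cut and rewrap without re-encoding.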
Each use case of multimodal analysis has different requirements, so the workflows and the
combination of AI services have to be adapted to these requirements
• This is where the following model provides flexibility to adapt to each unique use case of
multimodal analytics
• Vendor-independent usage of cognitive services
• The whole is greater than the sum of its parts (Aristotle), but sometimes particular
“tiny” use cases are also worth evaluating
• Flexible MULTIMODALITY is a must
There is no One Size Fits All
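
Vendor-independent usage of cognitive services can be pictured as a thin common interface in front of each service. The sketch below is an assumption about how one might structure this, not the architecture of any IBM product: `Recognizer`, `VendorA`, `VendorB` and the canned results are hypothetical stand-ins for real service adapters.

```python
from typing import Dict, Protocol, Tuple


class Recognizer(Protocol):
    """Common interface every vendor adapter is expected to satisfy."""
    name: str

    def analyze(self, frame_id: str) -> Dict[str, float]:
        """Return label -> confidence for one frame."""
        ...


class VendorA:
    name = "vendor_a"

    def analyze(self, frame_id: str) -> Dict[str, float]:
        # Canned result standing in for a real API call.
        return {"stadium": 0.91, "crowd": 0.40}


class VendorB:
    name = "vendor_b"

    def analyze(self, frame_id: str) -> Dict[str, float]:
        return {"stadium": 0.75, "referee": 0.66}


def combine(recognizers, frame_id: str) -> Dict[str, Tuple[float, str]]:
    """Keep the highest confidence per label and remember which service gave it."""
    merged: Dict[str, Tuple[float, str]] = {}
    for rec in recognizers:
        for label, conf in rec.analyze(frame_id).items():
            if conf > merged.get(label, (0.0, ""))[0]:
                merged[label] = (conf, rec.name)
    return merged


result = combine([VendorA(), VendorB()], "frame-0001")
print(result)
```

With this shape, swapping or adding a vendor means writing one more adapter class; the workflow that calls `combine` does not change.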
Elemental parts of a content
enrichment platform
Multi-Modality & Training & Vendor Independence
Data Consolidation & Monitoring
Integration & Workflow
Why is training necessary?
- How do we tell Will Ferrell (famous actor) apart from
Chad Smith (famous rock musician)?
- Challenges include:
• Out-of-Plane Rotation: frontal, 45 degree, profile,
upside down
• Presence of beard, mustache, glasses.
• Facial Expressions
• Occlusions by long hair, hand
• In-Plane Rotation
• Image conditions: size, lighting condition, distortion,
noise, compression
Trust me, these are two different, unrelated people!
https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78
A lot of vendors are providing base cognitive services...but without
individual training they do not provide sufficient benefit
[Figure: accuracy (axis ticks from 0 to 90%) plotted against training data size, with learning curves for the base model, the domain-specific model and the customer-adapted model; model tiers shown: Base AI Model, Industry/Domain AI Model, Customized user AI model]
As the domain specializes, learning accelerates
• Public models
• Pre-trained
• Limited accuracy for typical real-life use cases
• Trained with proprietary data
• Data ownership critical for differentiation
Automated TRAINING is a must
Source: Andrej Karp
Cognitive Process with Trainer, Analysis Workflow and Aggregator
[Diagram components: Cognitive Analysis Workflow, Cognitive Trainer, Cognitive Aggregator, Image Classifier Inbox, Taxonomy Database, Image Classifier Repository, Media Ingestion, Metadata Repository (MAM); numbered steps 1 to 6]
1. Configure taxonomy (add classifiers, categories, etc.)
2. Show and organize classifier images
3. Move good classifiers to the repository to optimize training
4. Use the classifier repository to train services and perform custom analysis
5. Move the actual frame to the inbox when confidence is OK
6. Use the taxonomy for rule creation
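
Step 5 of the process above (move a frame to the classifier inbox when the confidence is acceptable) amounts to a simple routing rule. A minimal sketch, assuming results arrive as `(frame_id, label, confidence)` tuples and that 0.8 counts as "OK"; the function name, the threshold and the sample data are illustrative assumptions.

```python
def route_frames(results, taxonomy, threshold=0.8):
    """Split classifier results into inbox candidates and rejects.

    results:  list of (frame_id, label, confidence) from the analysis workflow
    taxonomy: set of labels configured in step 1
    Returns (inbox, rejected); inbox maps each label to its candidate frames.
    """
    inbox, rejected = {}, []
    for frame_id, label, conf in results:
        if label in taxonomy and conf >= threshold:
            # Confident and known label: queue as a new training candidate.
            inbox.setdefault(label, []).append(frame_id)
        else:
            rejected.append(frame_id)
    return inbox, rejected


# Invented sample data for illustration.
taxonomy = {"stadium", "referee"}
results = [
    ("f001", "stadium", 0.93),
    ("f002", "stadium", 0.55),   # known label, confidence too low
    ("f003", "referee", 0.81),
    ("f004", "unicorn", 0.99),   # confident, but not in the taxonomy
]
inbox, rejected = route_frames(results, taxonomy)
print(inbox, rejected)
```

Frames landing in the inbox would still pass through the human review of steps 2 and 3 before they are moved to the repository and used for training.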
Parts of a successful content enrichment
1. By combining trained cognitive services, new valuable metadata can be retrieved from content
2. Automatic creation and use of this metadata must be included in existing processes
3. The quality of cognitive services and processes must be supervised
Information Corpora
- Rule-based configuration
- Batch learning
- Manual labeling
- Cognitive workflow builder
- E2E Broadcast Integration
(MAM, etc.)
- Full integration into AREMA
Operations Dashboards
…
Training
Cognitive Workflow
Orchestration
Cognitive Workflow
Operations
Elementary AI Services
Cognitive Content Media Services
IBM Watson APIs 3rd Party APIs
Speech-to-Text
NLC/NLU*
Visual Recogn. …
General Domain Content Tagging
Domain-specific Content Tagging (3rd party)
Domain-specific Content Tagging (proprietary)
Domain-specific Content Tagging (shared)
Speech
Language
Visual …Watson
Media
Knowledge
Studio
Essence Files Meta Data Public Data
Other Data
sources
…
• Comparing individual cognitive services against each other is not adequate; what matters is a reasonable
combination of services
• The solution approach must start with the given use case, for which the solution is then defined and
customized
• AI will not take over all human work, but will support it in the areas where automation is meaningful
• The process will be a mix of human and AI-based tasks and steps
• Sufficiently good solutions will be created through trial and optimization, not by waiting for the perfect
technology.
Summary
While AI can’t fully
match the human
creative touch, it can
optimize workflows and
media processes to
gain more value from
content.
Notes and Sources
McCaskill, Steve. “Wimbledon 2018: AI Marries Tennis Tradition With Digital Innovation.” Forbes. July 2018.
https://www.forbes.com/sites/stevemccaskill/2018/07/06/wimbledon-marries-innovation-with-tradition-in-use-of-ai/#7686e2d92198
Moore, Mike. “Wimbledon 2018: How IBM Watson is serving up the best viewer experience.” Tech Radar. July 2018.
https://www.techradar.com/news/wimbledon-2018-how-ibm-watson-is-serving-up-the-best-viewer-experience
McCarthy, John. “IBM and Fox Sports lean on AI so fans can generate World Cup highlights packages.” The Drum. June 2018.
https://www.thedrum.com/news/2018/06/06/ibm-and-fox-sports-lean-ai-so-fans-can-generate-world-cup-highlights-packages
Alvarez, Edgar. “Fox Sports’ World Cup Highlight Machine is powered by IBM’s Watson.” Engadget. June 2018.
https://www.engadget.com/2018/06/04/fox-sports-world-cup-highlight-machine-ibm-watson
Chang, Lulu. “IBM’s Watson will make headlines at the Masters tournament.” Digital Trends. April 2018.
https://www.digitaltrends.com/outdoors/ibm-watson-masters
Alexander, Julia. “Watch the first ever movie trailer made by artificial intelligence.” Polygon. September 2016.
https://www.polygon.com/2016/9/1/12753298/morgan-trailer-artificial-intelligence
Smith, John R. “IBM Research takes Watson to Hollywood with the first ‘cognitive movie trailer.’” IBM. August 2016.
https://www.ibm.com/blogs/think/2016/08/cognitive-movie-trailer
“Uncovering Dark Video Data with AI: How Watson Video Enrichment can provide better decision-making data and unlock new business possibilities in
the media industry.” IBM. August 2017. https://public.dhe.ibm.com/common/ssi/ecm/me/en/mew03018usen/uncovering-dark-data_MEW03018USEN.pdf
More Related Content

PDF
Nuno Godinho
PDF
Hi tech it services
DOCX
PDF
Automatic multi-modal metadata annotation based on trained cognitive solution...
PDF
Prior AI consulting use cases
PPT
RichMediaPlatform.ppt RichMediaPlatform.ppt RichMediaPlatform.ppt
PDF
How to prepare a perfect video abstract for your research paper – Pubrica.pdf
PPTX
How to prepare a perfect video abstract for your research paper – Pubrica.pptx
Nuno Godinho
Hi tech it services
Automatic multi-modal metadata annotation based on trained cognitive solution...
Prior AI consulting use cases
RichMediaPlatform.ppt RichMediaPlatform.ppt RichMediaPlatform.ppt
How to prepare a perfect video abstract for your research paper – Pubrica.pdf
How to prepare a perfect video abstract for your research paper – Pubrica.pptx

Similar to Rosinski ibm ai overview with several examples of projects in the media and lessons learned (20)

PDF
Watson API Use Case Demos for the Nittany Watson Challenge
PPTX
Evolve your app’s video experience with Azure: Processing and Video AI at scale
PPTX
InterBEE 2016: クラウドをコアにした「デジタル・トランスフォーメーション」が メディア業界に与えるインパクトとは何か?
PDF
Intro to watson bluemix services
PDF
An Stepped Forward Security System for Multimedia Content Material for Cloud ...
PDF
Revolutionizing Communication with Video Transcription Services.pdf
PDF
Speech Recognition Dataset Spotlight: AMI Meeting Corpus
PDF
Intelligent ChatBot
PDF
Netex learningMaker | Authoring tool for HTML5 e-learning content [EN]
PDF
Deliver high-quality messaging, screen sharing, audio, and video capabilities...
PDF
[DSC Europe 22] On building a video recommendation system and other use-cases...
PDF
Mariana Alupului Inventions
PDF
AI at Scale in Enterprises
PDF
Using the power of Generative AI at scale
PDF
Artificial Intelligence on the AWS Platform
PDF
Video Data Collection Services: Driving Innovation in AI and Analytics
PDF
Maschinelles Lernen auf AWS für Entwickler, Data Scientists und Experten
PDF
Video Data Annotation Techniques for Machine Learning
PDF
Cloud-Native Roadshow - Google - DC
PDF
Netex learningMaker | Dossier [EN]
Watson API Use Case Demos for the Nittany Watson Challenge
Evolve your app’s video experience with Azure: Processing and Video AI at scale
InterBEE 2016: クラウドをコアにした「デジタル・トランスフォーメーション」が メディア業界に与えるインパクトとは何か?
Intro to watson bluemix services
An Stepped Forward Security System for Multimedia Content Material for Cloud ...
Revolutionizing Communication with Video Transcription Services.pdf
Speech Recognition Dataset Spotlight: AMI Meeting Corpus
Intelligent ChatBot
Netex learningMaker | Authoring tool for HTML5 e-learning content [EN]
Deliver high-quality messaging, screen sharing, audio, and video capabilities...
[DSC Europe 22] On building a video recommendation system and other use-cases...
Mariana Alupului Inventions
AI at Scale in Enterprises
Using the power of Generative AI at scale
Artificial Intelligence on the AWS Platform
Video Data Collection Services: Driving Innovation in AI and Analytics
Maschinelles Lernen auf AWS für Entwickler, Data Scientists und Experten
Video Data Annotation Techniques for Machine Learning
Cloud-Native Roadshow - Google - DC
Netex learningMaker | Dossier [EN]
Ad

More from FIAT/IFTA (20)

PPTX
2021 FIAT/IFTA Timeline Survey
PPTX
20211021 FIAT/IFTA Most Wanted List
PPTX
WARBURTON FIAT/IFTA Timeline Survey results 2020
PPTX
OOMEN MEZARIS ReTV
PPTX
BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)
PPTX
CULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉ
PPTX
HULSENBECK Value Use and Copyright Comission initiatives
PPT
WILSON Film digitisation at BBC Scotland
PDF
GOLODNOFF We need to make our past accessible!
PPTX
LORENZ Building an integrated digital media archive and legal deposit
PPTX
BIRATUNGANYE Shock of formats
PPTX
CANTU VT is TV The History of Argentinian Video Art and Television Archives P...
PPTX
BERGER RIPPON BBC Music memories
PDF
AOIBHINN and CHOISTIN Rehash your archive
PDF
HULSENBECK BLOM A blast from the past open up
PDF
PERVIZ Automated evolvable media console systems in digital archives
PPTX
AICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AI
PPTX
VINSON Accuracy and cost assessment for archival video transcription methods
PDF
LYCKE Artificial intelligence, hype or hope?
PDF
AZIZ BABBUCCI Let's play with the archive
2021 FIAT/IFTA Timeline Survey
20211021 FIAT/IFTA Most Wanted List
WARBURTON FIAT/IFTA Timeline Survey results 2020
OOMEN MEZARIS ReTV
BUCHMAN Digitisation of quarter inch audio tapes at DR (FRAME Expert)
CULJAT (FRAME Expert) Public procurement in audiovisual digitisation at RTÉ
HULSENBECK Value Use and Copyright Comission initiatives
WILSON Film digitisation at BBC Scotland
GOLODNOFF We need to make our past accessible!
LORENZ Building an integrated digital media archive and legal deposit
BIRATUNGANYE Shock of formats
CANTU VT is TV The History of Argentinian Video Art and Television Archives P...
BERGER RIPPON BBC Music memories
AOIBHINN and CHOISTIN Rehash your archive
HULSENBECK BLOM A blast from the past open up
PERVIZ Automated evolvable media console systems in digital archives
AICHROTH Systemaic evaluation and decentralisation for a (bit more) trusted AI
VINSON Accuracy and cost assessment for archival video transcription methods
LYCKE Artificial intelligence, hype or hope?
AZIZ BABBUCCI Let's play with the archive
Ad

Recently uploaded (20)

PPTX
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
PPTX
01_intro xxxxxxxxxxfffffffffffaaaaaaaaaaafg
PPTX
Qualitative Qantitative and Mixed Methods.pptx
PPTX
Introduction-to-Cloud-ComputingFinal.pptx
PPTX
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
PPT
Miokarditis (Inflamasi pada Otot Jantung)
PPTX
Introduction to Knowledge Engineering Part 1
PPTX
Computer network topology notes for revision
PPTX
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
PPTX
STUDY DESIGN details- Lt Col Maksud (21).pptx
PDF
Galatica Smart Energy Infrastructure Startup Pitch Deck
PDF
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
PDF
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
PDF
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
PPTX
1_Introduction to advance data techniques.pptx
PDF
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
PDF
Fluorescence-microscope_Botany_detailed content
PPTX
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
PPTX
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
PPTX
Business Ppt On Nestle.pptx huunnnhhgfvu
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
01_intro xxxxxxxxxxfffffffffffaaaaaaaaaaafg
Qualitative Qantitative and Mixed Methods.pptx
Introduction-to-Cloud-ComputingFinal.pptx
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
Miokarditis (Inflamasi pada Otot Jantung)
Introduction to Knowledge Engineering Part 1
Computer network topology notes for revision
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
STUDY DESIGN details- Lt Col Maksud (21).pptx
Galatica Smart Energy Infrastructure Startup Pitch Deck
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
1_Introduction to advance data techniques.pptx
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
Fluorescence-microscope_Botany_detailed content
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
Business Ppt On Nestle.pptx huunnnhhgfvu

Rosinski ibm ai overview with several examples of projects in the media and lessons learned

  • 1. FIAT/IFTA Media Management Seminar “Game Changers? From Automation to Curation: Futureproofing AV Content” IBM AI Overview with several Examples of Projects in the Media and Lessons Learned Jakob Rosinski | Lead Architect Video Solutions & Broadcast Industry Europe Stockholm | 13.06.2018
  • 2. This speech will give you an overview about client projects in the space of media archives worldwide IBM has contributed to with it's own AI - named Watson - but also with it's knowledge and integration capabilities. Major topics are scope definition and use case identification, further the usage of cognitive services of different kinds and vendors - with success and open problems. In such a multi-modal approach training of services is also key, and the speech should show how this can be managed both from a human and machine perspective. Abstract Jakob is the Lead Architect for Video Solutions & Broadcast Industry for IBM Services in Europe. He is also the product owner of IBM AREMA, a workflow and essence management solution which is widely used at different broadcasters for essence archives and workflow automation. Over the last decade Jakob was responsible for various projects in the media industry at HBO, France24, ORF, SRF, RTL Mediengruppe or Deutsche Bundesliga/Sportcast. He is an expert for multi-site & multi-tier essence management and workflow automation for ingest, archive, production & distribution. Further he is known and valued as a subject matter expert for the topics above in the WW IBM M&E community. He is skilled at translating business needs into systems solutions
  • 3. Video Enrichment uses industry leading AI capabilities to analyze textual, audio, and visual data within multi-media content, and to build easily searchable metadata packages for every asset. By understanding content in new ways, media companies can improve content discovery, increase operational efficiency, deliver higher ad revenues, drive viewer engagement and offer entirely new ways to meet the demands of their businesses. Enriched content is inherently more searchable. Improved content discovery in your consumer service leads to increased usage.
  • 4. Cognitive base services used for content enrichment Enhanced and automated understanding of personalities present in the frame, and objects Activate decade-old material by running it through the STT API and then performing deeper analytics Deeper understanding of concepts, recognized entities, keywords, and relationships Target Deeply enriched content second-to- second Search for image and videodata for not trained objects or contexts. Visual Recognition Audiomining & Speech to Text NLU & Translation Videodetection / Speed / Movement Pattern Detection & Similarity Search
  • 5. A lot of vendors are providing base cognitive services... Visual Recognition Audioming & Speech to Text NLU & Translation Videodetection / Speed / Movement Pattern Detection & Similarity Search
  • 7. 7
  • 8. 8
  • 10. ©2018 IBM Corporation 27 June 2019 IBM Services10 Customer MAM or DAM Enriched metadata is delivered as an open JSON bundle to be stored and used for search, compliance, recommendation and other vital use cases. Assets are acquired, ingested, processed and enriched using the Watson Media platform. SEMANTIC SCENE CHAPTERING Divides the Media into meaningful chunks or chapters that can be more easily managed by people responsible for editing or producing. SPEECH TO TEXT Converts audio into text, by leveraging machine intelligence to combine information about grammar and language structure with knowledge of the composition of the audio signal. Trainable. NATURAL LANGUAGE UNDERSTANDING Using the Textual output of S2T or a Close Caption File, NLU derives: Concepts, Document-Level Emotions Sentiment, Entities, Keywords, Language, & Taxonomy. Trainable. VISUAL RECOGNITION Detects the contents of an image or video frame, answering the question: “What is in this image?” Returns class, class description, face detection, and text recognition. Trainable. Watson Video Enrichment Workflow > > > >>> >>>
  • 11. 11 Customer MAM or DAM Enriched metadata is delivered as an open JSON bundle to be stored and used for search, compliance, recommendation and other vital use cases. Assets are acquired, ingested, processed and enriched using the Watson Media platform. SEMANTIC SCENE CHAPTERING Divides the Media into meaningful chunks or chapters that can be more easily managed by people responsible for editing or producing. SPEECH TO TEXT Converts audio into text, by leveraging machine intelligence to combine information about grammar and language structure with knowledge of the composition of the audio signal. Trainable. NATURAL LANGUAGE UNDERSTANDING Using the Textual output of S2T or a Close Caption File, NLU derives: Concepts, Document-Level Emotions Sentiment, Entities, Keywords, Language, & Taxonomy. Trainable. VISUAL RECOGNITION Detects the contents of an image or video frame, answering the question: “What is in this image?” Returns class, class description, face detection, and text recognition. Trainable. TONE ANALYZER & PERSONALITY INSIGHTS Provide additional features that document the Emotional Tone, Writing Tone, Social Tone of dialogue, as well as the overall personalities of characters based on their words. Watson Video Enrichment Workflow > > > >>> >>>
  • 12. 12
  • 13. 13
  • 14. 14
  • 15. Scene Detection Deep Video-Analysis  People-, Object and Context-Detection  Classification of actors based on 24 emotions  Classification of scenes based on 22.000 categories Deep Audio-Analysis  Background  Actor sentiment and tone Analysis of scene composition  Classification of light and color Analysis of succesful trailers to automatically create a new one https://www.youtube.com/watch?v=gJEzuYynaiw 15
  • 16. Concept and proving of an automatic content enrichment system for 40+ years of soccer history  Annotation by usage of a portfolio of cognitive solutions  Audio: Speech-to-text / Transcript  Audio: Speaker-Detection  Audio: Atmosphere (cheers, whistles, ..)  Video: Angle/Camera & Context Detection  Video: Face- & Object Detection  Domain trained services including Traningsportal  Sharpening of results by knowledge of domain and creation of timelines, identifiying of concepts Link with Game- and Playerdata  Optimize content analysis and search based on game and player statistics  Guided search. Persona-based User Experience  Personalized Discovery, Suggestions, Design & Projects Content enrichment for Bundesliga archive 16
  • 17. 17 Target: Automatic content enrichment of 30+ years of show content Annotation by usage of a portfolio of cognitive solutions (IBM, OpenCV)  Audio: Speech-to-text / Transcript / Phrase detection  Video: Angle/Camera & Context Detection  Video: Face- & Object Detection Domain trained services including Traningsportal Sharpening of results by knowledge of domain and creation of timelines, identifiying of concepts Content enrichment for Brazils most famous TV show
  • 18. Architecture of “Captain Caption” Demo AREMA Speech to Text Deep Learning – Sound Recognition Natural Language Understanding Conform results into one Close Caption file Translation into target language L
  • 19. 19 Context / Solution Frame accurate detection of trained frames of lead in and out scenes to mark those scenes in the content and exchange those automatically in master format without transcoding (unwrap, cut, wrap) and with appropriate audio track handling to enable fast channel switch of content. • Usage of own developed detection component using OpenCV and Watson VR for frameaccurate detection of scenes. • Usage of AREMA‘s Dalet Galaxy integration to directly pull and push content to MAM system, no need to extend Galaxy for this purpose • Automatically scalable by using AREMA autoscaler in combination with Kubernetes & Docker • Usage of AREMA MXF Package for • metadata extraction of source file • rewrapping / preparartion audiotrack schema of new scene • partial cut of source file • conforming of all parts to target file => very fast, no transcoding or change of audio and video streams Use Case: “Implement a full integrated, trained cognitive service to exchange ident in and out scenes” Result: • Fully automatized exchange of scenes, deeply integrated with existing environment • Nearly endlessly scalable as all components can run in Kubernetes/Docker environment leads to significant reduce of time and people effort and faster change of content between programs => from 3 months (2 full-time persons) to days
  • 20. Each Use Case of Multimodal Analysis has different requirements so the workflows and the combination of AI Services have to be adopted to these requirements  This is where the following model provides flexibility to adapt to each unique use case of multimodal analytics  Vendor independant usage of cognitive services  The whole is greater than the sum of its parts (Aristoteles), but sometimes also particular „tiny“ use cases are worth to be evaluated  Flexible MULTIMODALITY is a must There is no One Size Fits All
  • 21. 21 Elemental parts of a content enrichment platform Multi-Modality & Training & Vendorindependence Data-Consolidation & Monitoring Integration & Workflow 212121
  • 22. ... Why is training necessary? 22
  • 23. Why is training necessary? - How do we tell Will Ferrell (famous actor) apart from Chad Smith (famous rock musician)? - Challenges include: • Out-of-Plane Rotation: frontal, 45 degree, profile, upside down • Presence of beard, mustache, glasses. • Facial Expressions • Occlusions by long hair, hand • In-Plane Rotation • Image conditions: size, lighting condition, distortion, noise, compression Trust me, these are two non-related different people! https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78 https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face- recognition-with-deep-learning-c3cffc121d78
  • 24. A lot of vendors provide base cognitive services... but without individual training they do not deliver sufficient benefit. [Chart: accuracy (0 to 90%) plotted against training data size, with learning curves for the base model, the domain-specific model and the customer-adapted model, corresponding to the tiers Base AI Model, Industry/Domain AI Model and Customized user AI model] As the domain specializes, learning accelerates • Public models • Pre-trained • Limited accuracy for typical real-life use cases • Trained with proprietary data • Data ownership critical for differentiation. Automated TRAINING is a must
  • 26. Cognitive Process with Trainer, Analysis Workflow and Aggregator. Components: Cognitive Analysis Workflow, Cognitive Trainer, Cognitive Aggregator, Image Classifier Inbox, Taxonomy Database, Image Classifier Repository, Media Ingestion, Metadata Repository (MAM). 1. Configure taxonomy (add classifiers, categories, etc.) 2. Show and organize classifier images 3. Move good classifiers to the repository to optimize training 4. Use the classifier repository to train services and perform custom analysis 5. Move the actual frame to the inbox when confidence is ok 6. Use the taxonomy for rule creation
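Steps 3 and 5 of the loop above can be sketched as two small functions: one that routes analysed frames into a classifier inbox when the confidence is high enough, and one that moves reviewed frames from the inbox into the training repository. This is only an illustration under stated assumptions: the slide does not say what "confidence ok" means, so the 0.8 threshold is assumed, and the names `route_to_inbox` and `promote` as well as the plain dictionaries standing in for inbox and repository are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# (frame id, classifier label, confidence between 0 and 1)
Result = Tuple[str, str, float]


def route_to_inbox(
    results: Iterable[Result],
    threshold: float = 0.8,  # assumed meaning of "confidence ok"
) -> Dict[str, List[str]]:
    """Step 5: collect confidently classified frames per classifier label."""
    inbox: Dict[str, List[str]] = {}
    for frame_id, label, confidence in results:
        if confidence >= threshold:
            inbox.setdefault(label, []).append(frame_id)
    return inbox


def promote(
    inbox: Dict[str, List[str]],
    repository: Dict[str, List[str]],
    approved: Iterable[Tuple[str, str]],
) -> None:
    """Step 3: move frames a reviewer approved from inbox to repository.

    approved holds (label, frame id) pairs; anything not found in the
    inbox is ignored, so only routed frames can reach the training set.
    """
    for label, frame_id in approved:
        if frame_id in inbox.get(label, []):
            inbox[label].remove(frame_id)
            repository.setdefault(label, []).append(frame_id)


analysis = [
    ("frame_0012", "ident_in", 0.93),
    ("frame_0458", "ident_in", 0.41),
    ("frame_0977", "ident_out", 0.88),
]
inbox = route_to_inbox(analysis)
repository: Dict[str, List[str]] = {}
promote(inbox, repository, [("ident_in", "frame_0012")])
```

Keeping a human review between inbox and repository reflects the mix of human and machine tasks the deck describes; the repository is then what step 4 would train on.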
  • 27. Parts of a successful content enrichment: 1. Combining trained cognitive services yields new, valuable metadata from content 2. Automatic creation and use of this metadata must be embedded in existing processes 3. The quality of cognitive services and processes must be supervised. Information Corpora - Rule-based configuration - Batch learning - Manual labeling - Cognitive workflow builder - E2E Broadcast Integration (MAM, etc.) - Full integration into AREMA Operations Dashboards … Training • Cognitive Workflow Orchestration • Cognitive Workflow Operations • Elementary AI Services • Cognitive Content Media Services • IBM Watson APIs • 3rd Party APIs • Speech-to-Text • NLC/NLU* • Visual Recogn. … General Domain Content Tagging • Domain-specific Content Tagging (3rd party) • Domain-specific Content Tagging (proprietary) • Domain-specific Content Tagging (shared) • Speech • Language • Visual … Watson Media Knowledge Studio • Essence Files • Meta Data • Public Data • Other Data sources …
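Point 1 above, combining the output of several cognitive services into one set of metadata, can be illustrated with a small aggregation sketch. It assumes each service returns tags with a confidence between 0 and 1 and merges them by a weighted average, where a service that did not report a tag counts as 0. The deck does not specify how its aggregator combines results, so this rule, the function name `aggregate_tags`, the service names and the weights are all hypothetical.

```python
from typing import Dict, Optional


def aggregate_tags(
    service_results: Dict[str, Dict[str, float]],
    weights: Optional[Dict[str, float]] = None,
    min_score: float = 0.5,  # assumed cut-off for keeping a tag
) -> Dict[str, float]:
    """Merge per-service tag confidences into one score per tag.

    service_results maps a service name to its {tag: confidence} output.
    weights maps a service name to its relative weight (default 1.0 each).
    """
    if not service_results:
        return {}
    weights = weights or {}
    total_weight = sum(weights.get(name, 1.0) for name in service_results)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    combined: Dict[str, float] = {}
    for name, tags in service_results.items():
        weight = weights.get(name, 1.0)
        for tag, confidence in tags.items():
            combined[tag] = combined.get(tag, 0.0) + weight * confidence

    # Normalise by the total weight and drop tags below the cut-off.
    return {
        tag: score / total_weight
        for tag, score in combined.items()
        if score / total_weight >= min_score
    }


merged = aggregate_tags(
    {
        "visual": {"goal": 0.9, "crowd": 0.6},
        "speech": {"goal": 0.7},
    }
)
```

With equal weights, a tag confirmed by both modalities ("goal") survives, while one seen by a single service with moderate confidence ("crowd") falls below the cut-off. Other combination rules are equally plausible and would need to be tuned per use case.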
  • 28. Summary • Comparing single cognitive services with one another is not adequate; what matters is a reasonable combination of services • The solution approach must start from the given use case, for which the solution is then defined and customized • AI will not take over all human work, but will support it in the areas where automation is meaningful • The process will be a mix of human and AI-based tasks and steps • Adequate solutions are created by trial and optimization, not by waiting for the perfect technology. While AI can’t fully match the human creative touch, it can optimize workflows and media processes to gain more value from content.
  • 31. Notes and Sources • McCaskill, Steve. “Wimbledon 2018: AI Marries Tennis Tradition With Digital Innovation.” Forbes. July 2018. https://www.forbes.com/sites/stevemccaskill/2018/07/06/wimbledon-marries-innovation-with-tradition-in-use-of-ai/#7686e2d92198 • Moore, Mike. “Wimbledon 2018: How IBM Watson is serving up the best viewer experience.” Tech Radar. July 2018. https://www.techradar.com/news/wimbledon-2018-how-ibm-watson-is-serving-up-the-best-viewer-experience • McCarthy, John. “IBM and Fox Sports lean on AI so fans can generate World Cup highlights packages.” The Drum. June 2018. https://www.thedrum.com/news/2018/06/06/ibm-and-fox-sports-lean-ai-so-fans-can-generate-world-cup-highlights-packages • Alvarez, Edgar. “Fox Sports’ World Cup Highlight Machine is powered by IBM’s Watson.” Engadget. June 2018. https://www.engadget.com/2018/06/04/fox-sports-world-cup-highlight-machine-ibm-watson • Chang, Lulu. “IBM’s Watson will make headlines at the Masters tournament.” Digital Trends. April 2018. https://www.digitaltrends.com/outdoors/ibm-watson-masters • Alexander, Julia. “Watch the first ever movie trailer made by artificial intelligence.” Polygon. September 2016. https://www.polygon.com/2016/9/1/12753298/morgan-trailer-artificial-intelligence • Smith, John R. “IBM Research takes Watson to Hollywood with the first ‘cognitive movie trailer.’” IBM. August 2016. https://www.ibm.com/blogs/think/2016/08/cognitive-movie-trailer • “Uncovering Dark Video Data with AI: How Watson Video Enrichment can provide better decision-making data and unlock new business possibilities in the media industry.” IBM. August 2017. https://public.dhe.ibm.com/common/ssi/ecm/me/en/mew03018usen/uncovering-dark-data_MEW03018USEN.pdf