Automated metadata generation projects at Yle
Elina Selkälä
Manager, archive publishing and metadata
Yle Archives
elina.selkala@yle.fi
FIAT/IFTA Media Management Seminar
Lugano
8.-9.6.2017
Agenda
• Yle in a nutshell
• Yle Archives, collections and materials
• Production of metadata at Yle
• What we experimented on: examples of automatic content analysis projects
• What we learned
• What is happening next
• What is the role of the information professional in the age of AI
This is Yle
• Public service broadcasting company
• 3 nationwide television & 6 radio channels, 24 regional radio stations
• Extensive online presence: yle.fi, svenska.yle.fi, Yle Areena, Yle Elävä arkisto
• In addition to Finnish and Swedish, has broadcasts in 11 languages, e.g. Sami,
English and Russian
• National programming hours per year:
50,000 hours of radio programming
20,000 hours of TV programming
5,000 hours of audio content online
15,000 hours of video content online
Yle Archives
• Archives and catalogues radio and TV programmes produced and co-produced by Yle
• Fosters and curates the archive collections of Yle
• Offers information services and training for Yle staff
• Publishes archive material online
Collections
• TV and radio materials, photographs, sound effects and music
• Archived in Media Asset Management System ”Metro” (Avid)
• Represents an important part of Finnish cultural heritage
• The archive also holds sheet music, books and online resources, e.g. papers, magazines
and databases
Radio and TV Archive collections
TV materials
• TV programmes and raw material from
1957 onwards, and film materials from 1906
onwards
• Collection consists of around 700,000
programmes and clips
• All Yle productions / co-productions have
been systematically archived since 1984
• Archiving in native digital form since 2009
• Around 10,000 hours of video content is
archived / year
• Relatively good metadata
Radio materials
• Yle-produced programmes and raw
material; oldest surviving clip from 1935
• The collection consists of around 2 million
programmes and clips
• Currently around 10% of radio
transmissions are archived (e.g. News and
works of art)
• Archiving in native digital form from the
beginning of the 2000s
• Around 20,000 hours of audio content is
archived / year
• Metadata of varying quality
Metadata production at Yle
Archived radio and TV programmes
• Yle’s archive materials are widely used as whole programmes (reruns) and clips
• Metadata incomplete or insufficient for many reasons → hinders findability and safe
re-use
• Alongside the digitization of tape collections, the related programme metadata is
updated and improved
• Huge endeavour, therefore prioritization is needed (most used, customer orders)
• Descriptive metadata is produced manually
• Done by Archives’ information specialists (about 15 people)
Metadata production at Yle
New audio and video content
• Metadata production decentralized
Metadata added and stored throughout the production and publishing process
Some metadata from production and publishing systems, descriptive metadata
filled out manually
Done by Yle staff: production coordinators, editors, producers, etc.
• Company-wide Archiving Policy
Defines the responsibilities, contents to be archived, metadata and formats
• Growing amount of published content
• Metadata is used for archiving and reuse purposes, as well as reporting
• New needs for metadata: improve discoverability and visibility on
Automated content analysis projects at Yle
Fall 2016
• Automated content analysis (virtual) team with participants from different parts of Yle
• Improve discoverability on web services (Yle Areena)
• Improve discoverability from archive databases
• New ways to subtitle video content
• Management of raw materials and versions
• Team’s goals were to:
• Learn about AI, machine learning and automatic content analysis methods in
theory and practice
• Carry out pilot projects (PoCs) with some companies
• Find solutions for automated metadata production in practice
Case 1
Automatic content analysis of TV programmes (1/2)
Pilot project with Valossa Labs
Goal
• Test and evaluate the quality and
suitability of automatically produced
(descriptive) metadata in Yle’s metadata
production
Tested methods
• Text analysis of subtitles → tagging,
annotation
• Image recognition: object and face
recognition
• OCR of captions
• Automatic segmentation
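The first tested method, text analysis of subtitles leading to tags, can be illustrated with a minimal sketch. This is not the pilot's actual implementation, which the slides do not describe; it assumes SRT-style subtitles and simply proposes the most frequent non-stopword terms as candidate tags. The names `STOPWORDS`, `strip_srt`, `suggest_tags` and `SAMPLE` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a deliberately tiny English stopword list for illustration only.
# A real list would be language-specific (Finnish, Swedish, ...).
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "on", "for", "at"}


def strip_srt(srt_text: str) -> str:
    """Drop cue numbers and timing lines from SRT-style subtitles, keep the dialogue."""
    kept = []
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue
        kept.append(line)
    return " ".join(kept)


def suggest_tags(srt_text: str, top_n: int = 3) -> list[str]:
    """Propose the most frequent non-stopword terms as candidate tags."""
    # [^\W\d_]+ matches runs of letters, including non-ASCII ones such as ä and ö.
    words = re.findall(r"[^\W\d_]+", strip_srt(srt_text).lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


# Made-up sample subtitles, not taken from any Yle programme.
SAMPLE = """1
00:00:01,000 --> 00:00:03,000
The election results are in.

2
00:00:03,500 --> 00:00:06,000
Election night in Helsinki.

3
00:00:06,500 --> 00:00:09,000
Helsinki voters and the election.
"""

if __name__ == "__main__":
    print(suggest_tags(SAMPLE, top_n=2))
```

Raw frequency counting is only a baseline; a production annotator would more likely map terms to a controlled vocabulary or ontology.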
Case 1
Automatic content analysis of TV programmes (2/2)
Results
• Face recognition works well, object recognition is somewhat unreliable and too detailed
• Subtitles could also be used for content analysis
• Automatic segmentation (scenes, inserts) works well
• The test period was too short, so there was no experience of the system's learning capabilities
• Speech recognition alongside image recognition would probably be beneficial, but the
tested application did not support this feature
Case 2
Automatic content analysis of audio content (1/2)
Pilot project with Lingsoft
Goal
• Test and evaluate the quality of speech &
music recognition and automatic
annotation
Tested methods
• Speech recognition → textual data for text
analysis
• Automatic annotation and indexing
• Music recognition (distinguish music from
speech)
Case 2
Automatic content analysis of audio content (2/2)
Results
• Audio quality and the speaker's manner of speaking have a significant impact on the results
• Accuracy of the transcription is sufficient for annotation → relevant keywords, tags
• Music recognition works fairly well
• Speaker recognition would be useful, but the tested service did not support this feature
Case 3
Automatic content analysis of Yle Areena content (1/3)
Pilot project with Qvik, Valossa Labs and Aalto University
Goal
• Improve findability and usability of audio and video content in Yle Areena online service
Three experiments
• Speech recognition: time-coded transcriptions of audio files
• Image / structure recognition: fast-forwarding past opening & closing credits and inserts
• Text analysis: automatic annotation
[Diagram labels: Yle Areena content; Media; Metadata; Automatic content analysis; New functionalities for the end user]
Case 3
Automatic content analysis of Yle Areena content (2/3)
Speech-to-text & text analysis
• Time-coded transcription and
automatic annotation of audio and
video content
Results
• Transcriptions were added to the Yle
Areena web page and search engines
were able to index the contents →
spoken content became searchable
• Identification of relevant concepts
was successful
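To show why time-coded transcriptions make spoken content searchable, here is a small sketch under stated assumptions: the transcript is available as a list of segments with start and end times in seconds. The `Segment` class and the functions `find_mentions` and `to_timecode` are hypothetical names, not part of Yle Areena or any vendor API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the programme
    end: float
    text: str


def find_mentions(segments: list[Segment], query: str) -> list[tuple[float, str]]:
    """Return (start_time, text) for every segment whose text mentions the query."""
    needle = query.casefold()
    return [(s.start, s.text) for s in segments if needle in s.text.casefold()]


def to_timecode(seconds: float) -> str:
    """Format seconds as HH:MM:SS so a player or a link can jump to that point."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


if __name__ == "__main__":
    # Made-up transcript segments for illustration.
    transcript = [
        Segment(0.0, 4.2, "Welcome to the evening news."),
        Segment(4.2, 9.8, "Tonight: the history of the Finnish sauna."),
        Segment(3725.4, 3731.0, "That was our report on sauna culture."),
    ]
    for start, text in find_mentions(transcript, "sauna"):
        print(to_timecode(start), text)
```

Substring matching is the simplest possible choice here; it stands in for whatever indexing a search engine would apply to the published transcript.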
Case 3
Automatic content analysis of Yle Areena content (3/3)
Identifying the structure of the content
• Automatic segmentation and identification
of recurrent elements (opening & closing
credits)
• Object recognition
Results
• Recurring elements (based on images) and
topics (based on subtitling) can be
identified → intelligent fast forward is
possible (Demo)
• Object recognition is somewhat unreliable
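The "intelligent fast forward" result can be sketched as follows, assuming the segmentation step yields labelled time ranges. The labels, the `LabelledSegment` type and the `skip_target` function are hypothetical; the slides do not describe the format the demo used.

```python
from typing import NamedTuple, Optional


class LabelledSegment(NamedTuple):
    start: float  # seconds
    end: float
    label: str


# Assumption: these label names are invented for the example.
SKIPPABLE = {"opening_credits", "closing_credits"}


def skip_target(segments: list[LabelledSegment], position: float) -> Optional[float]:
    """If the playback position is inside a skippable segment, return where to jump to."""
    for seg in segments:
        if seg.label in SKIPPABLE and seg.start <= position < seg.end:
            return seg.end
    return None


if __name__ == "__main__":
    # Made-up segmentation of a 26-minute programme.
    programme = [
        LabelledSegment(0.0, 45.0, "opening_credits"),
        LabelledSegment(45.0, 1500.0, "content"),
        LabelledSegment(1500.0, 1560.0, "closing_credits"),
    ]
    print(skip_target(programme, 10.0))
    print(skip_target(programme, 100.0))
```

A player could call such a function on each position update and offer a "skip" button whenever it returns a value.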
Lessons learned
Define needs, requirements, and goals
• What is needed and who needs it
• Costs and benefits
Define how success is measured
• Evaluation criteria
Plan the execution of projects
• Time and other resources
Cooperation with outside partners
• Ready-made test material packages
Contract and copyright issues
Share your information
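One possible way to make the "evaluation criteria" lesson concrete is to compare automatically produced tags against tags assigned by an information specialist. The sketch below uses precision and recall, which are an assumption on our part; the slides do not say which measures the pilots used. The function name `precision_recall` is hypothetical.

```python
def precision_recall(automatic: set[str], manual: set[str]) -> tuple[float, float]:
    """Compare automatic tags with manually assigned reference tags.

    precision: share of automatic tags that the specialist also assigned
    recall:    share of the specialist's tags that the machine found
    """
    hits = len(automatic & manual)
    precision = hits / len(automatic) if automatic else 0.0
    recall = hits / len(manual) if manual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up tag sets for a single programme.
    auto_tags = {"election", "helsinki", "night", "weather"}
    manual_tags = {"election", "helsinki", "politics"}
    p, r = precision_recall(auto_tags, manual_tags)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Agreeing on such a measure, and on a threshold for "good enough", before a pilot starts makes the results easier to compare across vendors.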
On-going projects
Production
• Robot journalism, Voitto-robotti (pilot project)
• Automatic annotation of Yle’s web articles (in production)
Publishing
• Automatic metadata production by speech recognition and
image recognition (PoC)
• Speech recognition in subtitling (PoC)
Consumption / use
• Recommendation for Yle Areena content (in production)
• Yle Uutisvahti application, recommendation engine (in
production)
• Automatic moderation of web discussions (PoC)
• Deduction of customer demographics (in production)
Information professionals' changing role
What is the role of information professionals in the age of AI?
• Machine’s teacher
• Quality assessor, quality control manager
• Curator and valuer of metadata
• Customer value assessor
• Publisher of (archived) content
New skills are needed
• Understanding of the methods, in order to assess the available opportunities
• Technical know-how
The information professional and the machine need to coexist