Face Recognition
@
Tim Manders tmanders@beeldengeluid.nl
AM
AI
Artificial Intelligence
AM
AM
Automatic Metadata
HELPFUL?
VARIETY OF TECHNOLOGY
A B C Thesaurus labeling
Speaker voice recognition
Face recognition
IN PRODUCTION AT NISV
THESAURUS LABELING - MAYO
Trump, Donald????
Trump, Melania????
Trump Airlines????
Trump Towers????
TECHNOLOGY NEEDS CONTEXT
SPEAKER VOICE RECOGNITION
ACCURATE?
MISSING PIECES
GAPS
MORE GAPS
TECHNOLOGY BOOST
Adding Face Recognition in the mix
@
PILOT: ACCURACY SCORES
PILOT: QUALITY THRESHOLD
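A minimal sketch of how such a quality threshold could be derived from a manually checked pilot sample, assuming each evaluated label comes with a score and a correct/incorrect judgement. The function name `pick_threshold`, the sample data and the `target_precision` parameter are illustrative assumptions; the 90% target mirrors the goal described in the notes, but this is not necessarily the procedure used at Sound and Vision.

```python
def pick_threshold(evaluated, target_precision=0.90):
    """Return the lowest score cutoff whose accepted labels meet the target precision.

    `evaluated` is a list of (score, is_correct) pairs from a manually checked sample.
    Returns None if no cutoff reaches the target.
    """
    # Sort from highest to lowest score, so each prefix of the list is
    # "every label at or above a given cutoff".
    ranked = sorted(evaluated, key=lambda pair: pair[0], reverse=True)
    best = None
    correct = 0
    for i, (score, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        # Only consider a cutoff at the last item of a group of equal scores.
        is_last_of_tie = i == len(ranked) or ranked[i][0] != score
        if is_last_of_tie and correct / i >= target_precision:
            best = score
    return best


# Hypothetical pilot sample: (score from the service, judged correct by a human).
sample = [
    (0.99, True), (0.97, True), (0.95, True), (0.93, True),
    (0.91, True), (0.90, True), (0.88, True), (0.85, True),
    (0.82, True), (0.80, False), (0.75, True), (0.60, False),
]
threshold = pick_threshold(sample)
print(threshold)
```

Lowering the cutoff accepts more labels but usually lets in more errors, which is the accuracy versus quantity trade-off the presentation returns to in its conclusion.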
FACE MODELS
3 OUT OF 4
3 OUT OF 4
THE KING
POLITICIANS
SOME NUMBERS
• 472,695 automatically labeled thesaurus terms
• 2,408 speaker models & 904,476 fragments labeled with speakers
• 1,326 face models & 94,526 fragments labeled with persons / faces
• 45,278 programmes with automatically generated metadata
• 2,072,141 programmes in our catalogue
• 2% of all programmes have automatically generated metadata
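The 2% figure follows from the two programme counts on this slide. A short check, using only those two numbers (the variable and function names are illustrative):

```python
# Counts taken from the slide.
programmes_with_auto_metadata = 45_278
programmes_in_catalogue = 2_072_141


def coverage_percent(labelled, total):
    """Share of the catalogue that carries automatic metadata, in percent."""
    return 100 * labelled / total


coverage = coverage_percent(programmes_with_auto_metadata, programmes_in_catalogue)
print(f"{coverage:.1f}% of programmes carry automatic metadata")
```

This comes out at roughly 2.2%, which the slide rounds down to 2%.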
SOME FUNNY MISTAKES
OH NO, A DONALD TRUMP CLONE
KIM JONG-UN
99.8% CERTAINTY IT’S THE REAL EPPO
• Bernie Sanders?
LOW SCORE
No worries
Kim Jong Un?
HIGH SCORE
Ouch
LESS FUNNY
REALLY, HE CAN ACT AND PLAY SOCCER?
RAPE SUSPECT
https://imagenet-roulette.paglen.com
https://www.excavating.ai
“As the classifications of humans by AI systems
becomes more invasive and complex, their
biases and politics become apparent. Within
computer vision and AI systems, forms of
measurement easily — but surreptitiously —
turn into moral judgments.” -
http://www.fondazioneprada.org/project/training-humans/?lang=en
AM
CHALLENGES
• Accuracy / Quantity
• New ‘celebrities’ every second
• How to upscale and apply retro
• Privacy and Ethics / A.I. Bias
Tim Manders tmanders@beeldengeluid.nl


MANDERS Harder Better Faster Stronger Adding face recognition in the mix

Editor's Notes

  • #2: “Work it harder, make it better, do it faster, makes us stronger.” It may sound as if I am going to present a glorification of the potential of technological progress and Artificial Intelligence for the archives. In a way I am, but it is no accident that I chose a quote from the magnificent electronic music duo Daft Punk as the title of this presentation about AI in general, and face recognition in particular, at Sound and Vision (S&V). They are very recognizable in their robot personas, yet they always wear helmets to keep their human faces from going public. I too have privacy concerns, and therefore made sure this presentation does not go uncensored on SlideShare (photos of my face are blurred).
  • #3: Artificial Intelligence is very much a buzzword; some might even say a hype.
  • #4: But I am a metadata guy, so I prefer the term automatic metadata, since I am more interested in the metadata as output than in what intelligence is behind it.
  • #5: But maybe my focus on the metadata output is a bit shortsighted. So yes, I praise technology for how it can support humans (archive users are still human after all) in searching through our collections via very fine-grained, segmented access points. I will also show you some humorous and painful examples of where the technology gets it wrong.
  • #6: Nowadays there is a wide range of AI technologies available.
  • #7: In Sound and Vision’s case we have operationalized three technologies since 2015 in an effort to reduce the time spent on manual annotation: topical labeling via an extraction service that distills thesaurus labels from subtitles and production metadata; a speaker voice recognition service that identifies who is speaking, and when; and finally, since 2019, face recognition has been thrown into the mix.
  • #8: To start off with a humorous example: in this episode people were discussing mayonnaise for an hour. What else do Dutch people have to talk about? Since ‘mayonnaise’ is commonly abbreviated as Mayo, our term extraction service was sure this episode was about Mayo, a county in Ireland.
  • #9: Another example of technology lacking the knowledge and skills to interpret the context. While a user can see immediately that the word Trump refers to Donald Trump, a machine doesn’t know who Trump is and blissfully doesn’t care.
  • #10: Moving on to speaker voice recognition. What is it / how does it work? Basically we started with 500 speakers out of our thesaurus and created a speaker model database based on fragments in which they speak. And we compare those models - currently more than 2000 - to daily ingested video and radio programs.
  • #11: Since our end users will not be happy with many inaccurate results, we work with a quality threshold: we only ingest a label if the machine scores it above a certain probability and presses the green button.
  • #12: The consequence of the threshold is that labels with a lower score are left out, even though they might be correct. So we have gaps in our timecoded labeling.
  • #13: Let me illustrate what these gaps might look like.
  • #14: And another gap: we (sometimes) know who’s speaking but we don’t know who is on camera, and that is where face recognition comes to the rescue.
  • #15: These kinds of services have had an enormous technology boost in the last few years thanks to deep learning. Where 3 years ago we could only speaker-label roughly 2 percent of fragments with speech, today it’s closer to 50%, with an accuracy above 90%.
  • #16: And for face labeling it’s even better, so why not add it to the mix? And so we did, this year. Basically it works the same as speaker labeling: our video content is compared to face models of very important persons, each consisting of a bunch of images transformed into vectors, zeroes and ones.
  • #17: In the pilot phase we evaluated below which confidence score the service would produce too many inaccurate results.
  • #18: Then, based on this evaluation, we calculated the minimum score to use as a threshold so that at least 90% of the labels are correct.
  • #19: We created a face model database with about 1000 (national and international) faces to recognize. And we compare our daily ingested videos with the database.
  • #20: The selection of face models is currently done manually. Since there are new celebrities on TV every day, it’s still a tedious process to keep our database up to date.
  • #21: As a result, not all faces will be recognized and we do have gaps in our automatic annotations. As you see on this slide, only 3 out of 4 persons are recognized, simply because we didn’t have a face model for the 4th one yet. Currently we are working on a face model suggestion tool that identifies faces that are often on TV but are not in the database yet.
  • #22: Let me give you an example of how well face recognition can work. In this case the king of the Netherlands is recognized even though he is sitting quite far away in his little car.
  • #23: Another example, of a political debate, illustrates how useful speaker and face labeling can be, not only for producers who are searching for re-usable material but also for researchers and scientists who want to analyze, for instance, how much airtime female and male politicians get.
  • #24: Some impressive numbers on the amount of automatic labels we have today. But they are less impressive when you see that only 2% of our collections have one or more automatic labels, while plenty of collection items have little or no metadata. So there are some challenges to overcome. We have to balance the users’ need for high accuracy against the effect of quality thresholds on the quantity of automatically generated labels. To increase the quantity of labels we also have to update our model databases constantly. And we have to drastically increase our efforts if we want to upscale and apply the service retroactively to a larger part of our collections. That almost brings me to the final part of this presentation, but I will not end before showing some examples of labeling errors.
  • #25: Oh no, Donald Trump again. Yes, him again, even if it’s not him. In this case Dutch journalist Frits Wester was labeled as Trump.
  • #26: Another example: here Kim Jong Un is labeled with 99.9% certainty. But it’s not him, it’s a lookalike.
  • #27: So who can say: is it the real Eppo walking around this conference and being in this image? Or is it a fraud? Can you be more certain than the machine?
  • #28: Less funny: a facial composite of a criminal suspect was labeled as Bernie Sanders. Luckily it scored under our threshold and is therefore disregarded as a label. Not even funny: this woman was labeled as Kim Jong Un, with a score above the threshold. Ouch. We have solved this for now by putting an extra high threshold on controversial persons. But ultimately this is not about avoiding controversy; maybe it should be the opposite: being crystal clear about the fundamental bias behind such technologies.
  • #29: As you see in this example, quite a few black soccer players are recognized as actor Werner Kolf. Not coincidentally, he is also black. Clearly we have some work to do here in terms of diversifying our databases, training sets and algorithms. We have a bias, which is discriminatory, for both gender and race. It reflects white privilege in what is in the media and what is being archived. And we have to move past that, after we acknowledge it and discuss it at conferences like this.
  • #30: Oops, a pic of me with a label according to ImageNet Roulette. It was part of an art exhibition that intentionally aims to provoke. ImageNet is generally bad at recognizing people. It’s mostly an object recognition set, but it has a category for People that contains thousands of subcategories. Some of them are misogynistic or racist. ImageNet Roulette draws upon those categories to shed light on what happens when technical systems are trained on problematic training data.
  • #31: To conclude: we have to find a balance between the need for high accuracy on the one hand and for quantity on the other. And we should be aware that, if we allow mistakes or gaps, some of them are the result of AI bias. In the era of big data and social media, and of conference presentations shared online (do I share my kids’ faces and my own face or not?), it is almost impossible to remain anonymous. And the lines between cultural preservation and commercialization of data are getting thinner and thinner. Therefore privacy and ethics must play a part in our discussions on implementing AI technology for the archives. I hope to have encouraged you to start using these kinds of techniques, without looking away from their difficult aspects.
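
The notes describe face labeling as comparing content against face models stored as vectors, with a quality threshold and an extra high threshold for controversial persons. A minimal sketch of that idea follows, assuming cosine similarity as the comparison and one vector per person. The names `label_face`, `face_models` and `strict_thresholds`, the two-dimensional vectors and the numeric thresholds are all hypothetical; the actual service, its scoring and its vector format are not described in the presentation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long vectors (0.0 if either is all zeroes)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_face(embedding, face_models, default_threshold=0.90, strict_thresholds=None):
    """Return (name, score) for the best matching face model, or None.

    A match is only accepted if its score reaches the threshold for that person;
    persons listed in `strict_thresholds` get their own, higher cutoff.
    """
    strict_thresholds = strict_thresholds or {}
    best_name, best_score = None, -1.0
    for name, model_vector in face_models.items():
        score = cosine_similarity(embedding, model_vector)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None:
        return None  # empty model database
    threshold = strict_thresholds.get(best_name, default_threshold)
    return (best_name, best_score) if best_score >= threshold else None


# Toy model database with made-up two-dimensional vectors.
face_models = {"person_a": [1.0, 0.0], "person_b": [0.0, 1.0]}
detected = [0.95, 0.2]

print(label_face(detected, face_models))
print(label_face(detected, face_models, strict_thresholds={"person_a": 0.99}))
```

With the default cutoff the detected face is labeled as `person_a`; with the stricter per-person cutoff the same match is dropped, which trades a gap in the annotations for a lower risk of a painful mislabel.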