TECHNICAL UNIVERSITY OF KOŠICE
Laboratory of Speech Technologies in Telecommunications
TUKE system for MediaEval 2014 QUESST 
Jozef VAVREK, Peter VISZLAY, Martin LOJKA, Matúš PLEVA, and Jozef JUHÁR 
Department of Electronics and Multimedia Communications 
Technical University of Košice, Slovak Republic 
{Jozef.Vavrek, Peter.Viszlay, Martin.Lojka, Matus.Pleva, Jozef.Juhar}@tuke.sk 
Zero-resource approaches 
TIMIT
ParDat1 
SpeechDat CZ 
SpeechDat SK 
Tab.1 Evaluation of primary low-resource (p-low) and general zero-resource (g-zero) systems (* indicates late submission)
dev 
Cnxe TWV 
(act/min) (act/min) 
0.161/0.162 
0.091/0.091 
0.191/0.191 
0.106/0.107 
Acknowledgments 
eval 
(act/min) (act/min) 
0.959/0.891 
0.973/0.934 
0.947/0.853 
0.970/0.921 
0.154/0.154 
0.075/0.077 
0.168/0.169 
0.102/0.103 
0.960/0.892 
0.974/0.934 
0.948/0.854 
0.971/0.922 
Tab.2 Processing resources measures 
p-low* 
g-zero* 
Searching algorithm (Weighted Fast Sequential DTW, WFS-DTW):
1) a one-step-forward moving strategy, in which each DTW search is carried out
sequentially, block by block, with the block size equal to the query length;
2) a linear time-aligned accumulated distance that speeds up sequential DTW
without considerable loss in retrieval performance;
3) optimization of the global minimum over the set of alignment paths by
means of a weighted cumulative distance (WCD) parameter.
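
The block-by-block search in item 1 can be illustrated with a minimal sketch. It is not the authors' WFS-DTW: it runs plain DTW on consecutive blocks of query length, and two choices are assumptions made for illustration only, namely the local distance (negative log of the inner product of two posterior vectors) and the normalization of the accumulated distance by path length, which stands in for the WCD parameter. The names `frame_distance`, `dtw_cost` and `sequential_search` are hypothetical.

```python
import math

def frame_distance(p, q, eps=1e-10):
    """Assumed local distance: -log of the inner product of two posterior vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))

def dtw_cost(query, segment):
    """Plain DTW; returns the accumulated distance divided by the path length."""
    n, m = len(query), len(segment)
    inf = float("inf")
    # cost[i][j] = (accumulated distance, path length) of the best path ending at (i, j)
    cost = [[(inf, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], segment[j - 1])
            prev = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = (prev[0] + d, prev[1] + 1)
    total, length = cost[n][m]
    return total / length

def sequential_search(query, utterance, hop=None):
    """Run DTW on consecutive blocks of query length; return (cost, start) sorted by cost."""
    n = len(query)
    if n == 0:
        raise ValueError("query must contain at least one frame")
    hop = hop or n  # default: move forward one whole block at a time
    hits = []
    for start in range(0, len(utterance) - n + 1, hop):
        block = utterance[start:start + n]
        hits.append((dtw_cost(query, block), start))
    return sorted(hits)

# Toy posteriorgrams over three acoustic units (illustrative data only).
A, B, C = [0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]
query = [A, B, C]
utterance = [C, C, C, A, B, C, B, B, B]
hits = sequential_search(query, utterance)
print(hits[0])  # the lowest-cost block comes first
```

In this toy utterance the query occurs in the second block, so the block starting at frame 3 receives the lowest cost. A smaller `hop` trades speed for finer localization.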
500 detected 
candidates 
This publication is the result of the Project implementation: University Science Park TECHNICOM for Innovation Applications Supported by 
Knowledge Technology, ITMS: 26220220182, supported by the Research & Development Operational Programme funded by the ERDF (100%). 
MediaEval 2014: Query by Example Search on Speech Task, 16-17 October 2014, Barcelona, Spain 
VAD 
PCA-based 
Posteriorgrams 
GMM-based 
Query 
Abstract 
This paper presents two approaches to a query-by-example (QbE) retrieval system proposed by the Technical University of Košice (TUKE)
for the Query by Example Search on Speech Task (QUESST). Our main interest was to build a QbE system able to retrieve all given
queries both with and without external speech resources. We therefore developed a posteriorgram-based keyword matching system
that uses a novel weighted fast sequential variant of DTW (WFS-DTW) to detect occurrences of each query within a particular
utterance file, using two GMM-based approaches to acoustic unit modeling. The first, referred to as the low-resource approach,
employs language-dependent phonetic decoders to convert queries and utterances into posteriorgrams. The second, the zero-resource
approach, combines unsupervised segmentation and clustering techniques using only the provided utterance files.
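
To make the term posteriorgram concrete, here is a minimal sketch under stated assumptions: each acoustic unit is modeled by a single diagonal-covariance Gaussian with a prior weight, and the posteriorgram is the per-frame posterior probability of each unit. The poster does not specify the model at this level of detail, so the function names, parameters and toy values below are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def posteriorgram(frames, weights, means, variances):
    """Return one posterior vector per frame (each row sums to 1)."""
    rows = []
    for x in frames:
        log_joint = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        peak = max(log_joint)  # subtract the maximum for numerical stability
        unnorm = [math.exp(lj - peak) for lj in log_joint]
        total = sum(unnorm)
        rows.append([u / total for u in unnorm])
    return rows

# Two toy units in a 2-D feature space (illustrative values only).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [2.9, 3.1]]
post = posteriorgram(frames, weights, means, variances)
for row in post:
    print([round(p, 4) for p in row])
```

Each row is a probability distribution over units, which is what allows queries and utterances to be compared frame by frame regardless of how the units were obtained.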
Results 
System Overview 
system Cnxe TWV 
p-low 
g-zero 
system         ISF    SSF      PMUI   PMUS   PL
p-low (dev)    0.61   0.0034   0.05   2.46   0.010
g-zero (dev)   1.50   0.0042   1.40   3.92   0.225
Conclusions and Future Work 
Phonetic 
Decoders 
Utterances 
Type 1 
- 13 MFCC 
- PCA-based feature selection 
- K-means clustering (K=75) 
Type 2 
- 39 MFCC 
- Type 1 + Viterbi seg. & new GMM training
Type 3 
- 39 MFCC 
- flat start training (GMM-based) 
- phone sequences from Type 1 
Type 4 
- 39 MFCC 
- GMM-based seg. (64 GM) 
- EHMM (64 states / 256 GM)
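
The clustering step listed under Type 1 can be sketched as plain Lloyd's K-means. This is a toy illustration, not the authors' pipeline: the PCA-based feature selection is omitted, the data are six 2-D points rather than MFCC vectors, and K is 2 instead of the K=75 used on the poster. The helper names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        for idx, p in enumerate(points):
            labels[idx] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Two well-separated toy clusters (illustrative data only).
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = kmeans(points, k=2)
print(labels)
```

In a zero-resource setting, the cluster indices would play the role of automatically discovered acoustic unit labels for later model training.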
Low-resource approaches 
WFS-DTW 
Score normalization & Fusion 
- scaling 0-1 
- max-score merging fusion 
- z-normalization 
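
The three operations listed above can be sketched as follows. The order of application (scale each subsystem to 0-1, merge by maximum score, then z-normalize the fused scores) is assumed from the order of the list and is not confirmed by the poster. The function names and the toy scores are hypothetical.

```python
from statistics import mean, pstdev

def scale_01(scores):
    """Min-max scaling of a list of scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def z_normalize(scores):
    """Zero-mean, unit-variance normalization (population standard deviation)."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]

def max_score_fusion(systems):
    """Merge detections from several subsystems, keeping the highest score per key."""
    fused = {}
    for detections in systems:
        for key, score in detections.items():
            fused[key] = max(score, fused.get(key, float("-inf")))
    return fused

# Raw scores of one query from two hypothetical subsystems.
system_a = dict(zip(["utt1", "utt2", "utt3"], scale_01([2.0, 4.0, 9.0])))
system_b = dict(zip(["utt1", "utt2", "utt4"], scale_01([1.0, 8.0, 3.0])))
fused = max_score_fusion([system_a, system_b])
final = dict(zip(fused, z_normalize(list(fused.values()))))
print(final)
```

Scaling first puts the subsystems on a common range, so taking the maximum does not simply favor the subsystem with the larger raw scores.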
- Large differences in performance remain between the p-low and g-zero
approaches, even after the score fusion technique was
applied.
- There is also a considerable gap between act and min Cnxe,
despite the fact that the act and max TWV are perfectly
calibrated.
- Improved calibration/fusion models based on affine
transformation and linear regression will be investigated in
the future.
Indexing was done on 2x IBM x3650 (Intel E5530 @ 2.4 GHz, 8
cores), 28 GB RAM, under Debian OS. The searching algorithm ran
on a 52x IBM dx360 M3 cluster (Intel E5645 @ 2.4 GHz, 624 cores), 48 GB
RAM per node, under Scientific Linux 6 and Torque 2.5.13.
time-aligned & labelled segments
new GMM training

