Indexing and Retrieval of 
Audio 
Rachmat Wahid Saleh Insani, S.Kom 
Multimedia Database Management System - Chapter 5
Introduction 
• Audio is classified into three types: speech, music, 
and noise. 
• Different audio types are processed and indexed in 
different ways. 
• Query audio pieces are similarly classified, processed, 
and indexed. 
• Audio pieces are retrieved based on similarity between 
the query index and the audio index in the database. 
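As a rough illustration of similarity-based retrieval, the sketch below ranks database items by Euclidean distance between index vectors. The feature choice, the item names, and the `retrieve` helper are assumptions made for this example, not something the slides specify.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_index, database, k=3):
    """Return the names of the k items whose index vectors are nearest the query."""
    ranked = sorted(database.items(),
                    key=lambda item: euclidean(query_index, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical index: name -> (average energy, zero crossing rate, silence ratio)
database = {
    "news_clip": (0.10, 0.12, 0.40),
    "piano_solo": (0.30, 0.05, 0.05),
    "rock_song": (0.60, 0.09, 0.02),
}

query = (0.28, 0.06, 0.04)  # index vector computed from the query audio
print(retrieve(query, database, k=2))
```

A smaller distance means a closer match, so the first name returned is the best candidate for the query.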
Objectives 
• Main audio properties and features. 
• Audio classification. 
• Main speech recognition techniques. 
• General approach in indexing and retrieval. 
• Temporal and content relationship between media 
types. 
Main Audio Properties and 
Features 
• Time domain 
• Frequency domain 
Features Derived in the 
Time Domain 
A signal is represented as amplitude varying with 
time. 
Features Derived in the 
Time Domain 
• Average energy 
• Zero crossing rate 
• Silence ratio 
Average energy: 
$$E = \frac{1}{N} \sum_{n=0}^{N-1} x(n)^2$$
Zero crossing rate: 
$$ZC = \frac{1}{2N} \sum_{n=1}^{N} \left| \operatorname{sgn} x(n) - \operatorname{sgn} x(n-1) \right|$$
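The three time-domain features can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the sign convention for zero-valued samples, the silence `threshold`, and the minimum run length `min_run` are all assumptions chosen for the example.

```python
def sign(v):
    """Sign function; treating 0 as positive is an assumption of this sketch."""
    return 1 if v >= 0 else -1


def average_energy(x):
    """Mean of the squared sample values."""
    return sum(v * v for v in x) / len(x)


def zero_crossing_rate(x):
    """Sum of |sgn x(n) - sgn x(n-1)| over the available sample pairs, over 2N."""
    n = len(x)
    return sum(abs(sign(x[i]) - sign(x[i - 1])) for i in range(1, n)) / (2 * n)


def silence_ratio(x, threshold=0.05, min_run=3):
    """Fraction of samples lying in runs of at least min_run quiet samples.

    threshold and min_run are hypothetical values, not taken from the slides.
    """
    silent = 0
    run = 0
    for v in x:
        if abs(v) < threshold:
            run += 1
        else:
            if run >= min_run:
                silent += run
            run = 0
    if run >= min_run:  # count a quiet run that reaches the end of the signal
        silent += run
    return silent / len(x)


samples = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
print(average_energy(samples), zero_crossing_rate(samples), silence_ratio(samples))
```

In practice these features are usually computed per short frame rather than over a whole recording, so that they can track how the signal changes over time.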
Features Derived from 
the Frequency Domain 
• Sound spectrum 
Features Derived from 
the Frequency Domain 
• Bandwidth 
• Energy Distribution 
• Harmonicity 
• Pitch
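To make the frequency-domain features concrete, the sketch below computes a power spectrum with a direct DFT and derives two simple measures from it. The function names, the use of the spectral centroid as a summary of energy distribution, and the `floor_ratio` cutoff for bandwidth are assumptions for illustration only.

```python
import cmath
import math


def power_spectrum(x, sample_rate):
    """Return (frequencies, powers) for bins 0..N/2 using a direct O(N^2) DFT."""
    n = len(x)
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * sample_rate / n)
        powers.append(abs(coeff) ** 2)
    return freqs, powers


def spectral_centroid(freqs, powers):
    """Power-weighted mean frequency, one way to summarise energy distribution."""
    return sum(f * p for f, p in zip(freqs, powers)) / sum(powers)


def bandwidth(freqs, powers, floor_ratio=0.01):
    """Highest minus lowest frequency whose power exceeds floor_ratio of the peak."""
    peak = max(powers)
    active = [f for f, p in zip(freqs, powers) if p > floor_ratio * peak]
    return max(active) - min(active)


# Test tone: 1000 Hz plus a weaker 2000 Hz component, sampled at 8000 Hz.
rate, n = 8000, 64
signal = [math.sin(2 * math.pi * 1000 * t / rate)
          + 0.5 * math.sin(2 * math.pi * 2000 * t / rate) for t in range(n)]
freqs, powers = power_spectrum(signal, rate)
print(freqs[powers.index(max(powers))], bandwidth(freqs, powers))
```

A real system would use an FFT and a window function; the direct DFT is used here only to keep the example dependency-free.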
Timbre 
• Quality of a sound. 
Audio Classification 
Why is audio classification important? 
- Different audio types require different processing, indexing, and 
retrieval techniques. 
- Different audio types have different significance to different 
applications. 
- Speech is an important audio type for which successful speech 
recognition techniques are available. 
- The audio type itself is very useful to some applications. 
- The search space after classification is reduced to a particular 
audio class during the retrieval process. 
Audio Classification 
• Classification here focuses on the two main types of sound: speech and music. 
Main Characteristics 
Music 
• Music has a frequency range 
of 16-20,000 Hz. 
• Music has low silence ratio. 
• Music has regular beats. 
Speech 
• Speech has a frequency 
range of 100-7,000 
Hz. 
• Speech has high 
silence ratio. 
• No regular beats.
Audio Classification 
Frameworks 
• Step by Step Classification 
• Feature Vector Based Audio Classification 
Step by Step 
Classification 
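One possible reading of step-by-step classification is a chain of simple tests, each using one feature. The sketch below uses only the music and speech characteristics listed in this chapter; the two-step ordering, the dictionary keys, and both threshold values are hypothetical and would need tuning on real data.

```python
def classify_step_by_step(features, bandwidth_limit_hz=7000.0, silence_limit=0.2):
    """Classify an audio piece as 'music' or 'speech' with two sequential tests.

    features: dict with 'bandwidth_hz' and 'silence_ratio' (assumed key names).
    Both thresholds are illustrative, not taken from the slides.
    """
    # Step 1: speech rarely extends beyond about 7 kHz, so wideband audio
    # is labelled music straight away.
    if features["bandwidth_hz"] > bandwidth_limit_hz:
        return "music"
    # Step 2: music has a low silence ratio, speech a high one.
    if features["silence_ratio"] < silence_limit:
        return "music"
    return "speech"


print(classify_step_by_step({"bandwidth_hz": 4000.0, "silence_ratio": 0.4}))
```

Each step removes the pieces it can decide with confidence, so later and more expensive tests only run on the remaining ambiguous pieces.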
Feature Vector Based 
Audio Classification 
Audio pieces of the same class are located close to 
each other in the feature space and audio pieces of 
different classes are located far apart in the feature 
space. 
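The idea that same-class pieces cluster in feature space can be sketched with a nearest-centroid classifier: each class is summarised by the mean of its training vectors, and a new piece takes the label of the closest mean. The classifier choice, the two features used, and the training values are assumptions for illustration.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def train(labelled):
    """labelled: dict label -> list of feature vectors. Returns label -> centroid."""
    return {label: centroid(vecs) for label, vecs in labelled.items()}


def classify(vector, centroids):
    """Return the label whose class centroid is nearest to the vector."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Hypothetical training data: (silence ratio, zero crossing rate) per piece.
training = {
    "speech": [(0.40, 0.12), (0.35, 0.15)],
    "music": [(0.05, 0.05), (0.08, 0.07)],
}
centroids = train(training)
print(classify((0.30, 0.10), centroids))
```

Features on very different scales would normally be normalised first, otherwise the largest-valued feature dominates the distance.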
Speech Recognition 
and Retrieval 
Automatic 
Speech Recognition 
An ASR system collects models or feature vectors for all 
possible speech units. A speech unit can be a phoneme, 
a word, or a phrase. 
Automatic Speech 
Recognition Factors 
• A phoneme spoken by different speakers, or by the same 
speaker at different times, produces different features in 
terms of duration, amplitude, and frequency 
components. 
• The above differences are exacerbated by the 
background or environmental noise. 
• Normal speech is continuous and difficult to separate 
into individual phonemes. 
• Phonemes vary with their location in a word. 
General ASR System 
Speech Recognition 
Performance 
Speech recognition performance is normally measured by 
recognition error rate. The lower the error rate, the higher the 
performance. 
Performance is affected by the following factors: 
- Subject matter: this may vary from a set of digits to a 
newspaper article or general news. 
- Types of speech: read or spontaneous conversation. 
- Size of the vocabulary: it ranges from dozens to a few 
thousand words. 
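The slides do not say which error rate is meant. One widely used measure is word error rate (WER): the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below assumes WER and a non-empty reference; the function name is chosen for this example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)  # assumes the reference is not empty


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(wer, 3))
```

Here one substitution ("sat" to "sit") and one deletion ("the") give two errors over six reference words.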
Music Indexing and 
Retrieval 
Indexing and Retrieval of Structured 
Music and Sound Effects 
• Structured music is represented by a set of 
commands. 
• The most common structured music format is MIDI. 
• A newer standard for structured audio is MPEG-4 
Structured Audio. 
• These formats contain structure and note 
descriptions. 
Indexing and Retrieval of Structured 
Music and Sound Effects 
Indexing and Retrieval of 
Sample Based Music 
• Based on extracted sound features. 
• Based on pitches of music notes. 
Music Retrieval Based on a 
set of Features 
Music Retrieval Based on 
Pitch 
Multimedia Information IR Using 
Relationships between Audio and Other 
Media 
