Large Language Model in Automatic Speech Recognition
Table of Contents
• Introduction
• Problem Identification
• Literature Review
• Research Gap
• Research Objective
• Proposed Methodology
• Conclusion
• References
INTRODUCTION
• Automatic Speech Recognition (ASR) is a transformative technology
that converts spoken language into text, enabling applications in areas
like virtual assistants, real-time transcription, and accessibility tools.
• Traditional ASR systems rely heavily on separate components,
including acoustic modeling, language modeling, and decoding.
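As a rough illustration of how these separate components interact, the sketch below combines an acoustic score and a language model score for each candidate transcript and lets a decoder pick the best one. The candidate sentences, the log-probability values, and the names `ACOUSTIC_LOGP`, `LM_LOGP`, `decode`, and `lm_weight` are all made up for this example; real systems search over far larger hypothesis spaces.

```python
# Hypothetical acoustic model output: log P(audio | words) per candidate.
ACOUSTIC_LOGP = {
    "recognize speech": -12.0,
    "wreck a nice beach": -11.5,
}

# Hypothetical language model output: log P(words) per candidate.
LM_LOGP = {
    "recognize speech": -4.0,
    "wreck a nice beach": -9.0,
}


def decode(candidates, lm_weight=1.0):
    """Return the candidate with the highest combined log score."""
    def score(words):
        # Decoding adds the acoustic score and the weighted LM score.
        return ACOUSTIC_LOGP[words] + lm_weight * LM_LOGP[words]
    return max(candidates, key=score)


candidates = list(ACOUSTIC_LOGP)
print(decode(candidates))                 # language model included
print(decode(candidates, lm_weight=0.0))  # acoustic score only
```

With these toy numbers, the acoustic score alone slightly prefers the implausible sentence, and the language model score tips the decision toward the plausible one.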
Large Language Models (LLMs)
LLMs, such as OpenAI’s GPT, Google’s T5, and Meta’s LLaMA, are deep
learning models trained on massive text datasets. These models excel at
understanding, generating, and contextualizing text. Their ability to
capture nuance, disambiguate meaning, and perform reasoning tasks
makes them valuable in ASR.
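One way an LLM's ability to disambiguate text might be applied to ASR is to hand it the recognizer's candidate transcripts and ask it to pick the most plausible one. The sketch below is an assumption for illustration, not a method described in the slides: `build_correction_prompt`, `correct_transcript`, the prompt wording, and `fake_llm` (a stand-in for a real model call) are all hypothetical names.

```python
from typing import Callable, Sequence


def build_correction_prompt(hypotheses: Sequence[str]) -> str:
    """Format N-best ASR hypotheses as a numbered list inside an instruction."""
    numbered = "\n".join(f"{i}. {h}" for i, h in enumerate(hypotheses, start=1))
    return (
        "The following are candidate transcripts of the same utterance, "
        "ordered from most to least likely by the recognizer.\n"
        f"{numbered}\n"
        "Reply with the single most plausible transcript and nothing else."
    )


def correct_transcript(hypotheses: Sequence[str], llm: Callable[[str], str]) -> str:
    """Ask the supplied LLM callable to choose a transcript.

    Falls back to the recognizer's top hypothesis if the reply is empty.
    """
    reply = llm(build_correction_prompt(hypotheses)).strip()
    return reply if reply else hypotheses[0]


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call, used only so the sketch runs offline.
    return " I scream for ice cream \n"


nbest = ["eye scream for ice cream", "I scream for ice cream"]
print(correct_transcript(nbest, fake_llm))
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM library; a real client would replace `fake_llm`.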
Literature Review
[1] Focus Area: Overview of LLMs and applications
• Techniques/Methods: Discusses LLM downstream tasks such as text generation, and applications in healthcare, education, etc. Secondary data.
• Limitations: Challenges in domain-specific adaptation, hallucinations, lack of interpretability, ethical concerns, and computational cost.
• Research Gaps: Need for domain-specific LLMs, strategies to reduce biases and hallucinations, and improved interpretability of predictions.
[2] Focus Area: Evaluation of LLMs
• Techniques/Methods: Explores evaluation frameworks, tasks, and benchmarks for LLMs across domains like NLP, ethics, and reasoning. Secondary data.
• Limitations: Limited focus on diverse tasks and underrepresented languages.
• Research Gaps: Development of standardized evaluation protocols addressing safety, reliability, and robustness.
[3] Focus Area: Deep Learning in Audio-Visual Speech Recognition (AVSR)
• Techniques/Methods: Focus on multimodal fusion strategies, pre-processing techniques, and end-to-end AVSR architectures using deep learning.
• Limitations: Real-world noise; lack of large-scale datasets in diverse languages.
• Research Gaps: Need for large-scale multilingual AVSR datasets and robust methods for handling noise and variability in real-world scenarios.
Literature Review (continued)
[4] Focus Area: Deep Learning Techniques for Speech Emotion Recognition (SER)
• Techniques/Methods: Deep learning (LSTM, CNNs, GANs, Autoencoders), use of emotional speech datasets, feature extraction methods.
• Limitations: Lack of real-world SER datasets; limitations in speaker-independent settings; challenges in noisy environments.
• Research Gaps: Need for robust SER systems in noisy and natural settings; exploration of multimodal data integration to enhance emotion recognition accuracy.
[5] Focus Area: Audio-Visual Speech Recognition (AVSR)
• Techniques/Methods: Deep learning for modality fusion, pre-processing, augmentation, and end-to-end AVSR systems.
• Limitations: Limitations in datasets for diverse languages and real-world noise; difficulties in managing variability in speaker characteristics.
• Research Gaps: Development of large-scale multilingual datasets; more robust fusion strategies to handle noise, accents, and diverse conditions in real-world AVSR systems.
