International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 11 Issue: 01 | Jan 2024 www.irjet.net p-ISSN: 2395-0072
© 2024, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 76
A SURVEY ON AI POWERED PERSONAL ASSISTANT
Mrs. Ashwini Bhamre1, Arpita Moharir2, Vaishnavi Kulkarni3, Rashmi Modgi4, Ninad Kulkarni5
1Assistant Professor, Dept. of Information Technology, P.E.S. Modern College of Engineering, Pune, India
2 U.G. Student, Dept. of Information Technology, P.E.S. Modern College of Engineering, Pune, India
3 U.G. Student, Dept. of Information Technology, P.E.S. Modern College of Engineering, Pune, India
4 U.G. Student, Dept. of Information Technology, P.E.S. Modern College of Engineering, Pune, India
5 U.G. Student, Dept. of Information Technology, P.E.S. Modern College of Engineering, Pune, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Web accessibility is vital for inclusivity among
visually impaired individuals. This survey paper delves
into designing a web application tailored for their
enhanced usability. Through voice-driven interfaces and
assistive technologies, the objective is seamless
interaction and navigation. Exploring challenges faced by
visually impaired users accessing web applications, the
paper emphasizes auditory feedback and adaptive
interfaces. It details a user-friendly login/signup process
with voice recognition, providing auditory cues. The home
page offers crucial functions like battery status, search
engines, weather updates, and YouTube accessible via
voice or buttons. Addressing privacy, error handling, and
user feedback, the paper advocates inclusive design and
assistive tech integration. Drawing from academic studies,
it consolidates best practices for developers, fostering an
inclusive digital environment for visually impaired users.
Key Words: Web accessibility, Visually impaired,
Voice-driven interfaces, Voice recognition.
1. INTRODUCTION
The recent evolution of virtual assistants (VAs) marks a
significant shift in user interaction and experience within
the realm of technology. These VAs, primarily voice-based,
have transformed how individuals engage with devices,
enabling tasks like controlling lights, playing music, and
conducting voice-based searches. They operate through
three stages: speech-to-text, text interpretation, and
intention-to-action, and they continue to evolve with AI
advancements.
Studies reveal the growing reliance on VAs; a survey in
2015 highlighted that a substantial percentage of both
teens and adults in the U.S. anticipated the widespread use
of voice search in the future. Foreseeably, enhanced
speech recognition, empowered by robust internet
connectivity and advanced data processing capabilities,
will augment VA capabilities, making them more intuitive
and efficient.
However, the integration of VAs into daily life raises
questions about user preferences and concerns. While
many anticipate increased VA usage, some individuals
remain hesitant to adopt VAs or use them only in limited
ways. Few studies have comprehensively explored the
factors influencing VA acceptance, especially across
different configurations and user requirements.
To delve deeper into user preferences, a study proposes
employing conjoint analysis, a method for measuring how
users weigh product attributes against each other, across
three specific aspects: natural language processing (NLP)
performance, pricing, and privacy concerns. By weighing
these factors against each other, the aim is to identify
user segments and the trade-offs they make in accepting
VAs.
Simultaneously, the integration of smart assistants into
mobile technology is gaining prominence due to its ease of
access and versatility. A smart assistant is essentially a
virtual assistant implemented on personal computers,
facilitating voice or keyboard inputs and internet-based
remote access. These systems comprise components such as
a speech-to-text converter, a query processor, and a
text-to-speech converter. They exploit the efficiency of
voice as a communication channel and are becoming integral
to computing devices.
Moreover, the digital landscape's transformation has seen
voice searches surpassing text searches, with predictions
indicating that half of all searches will be voice-based by
2024. Virtual assistant software, designed for desktop
usage, aids in managing tasks, providing information, and
improving user productivity. Recent research focuses on
recognizing human activities through voice recognition
systems powered by Natural Language Processing (NLP)
algorithms, aiming to understand user commands and
respond accordingly.
Overall, the evolution of VAs, their potential impact on
user interaction, and the exploration of user preferences
regarding acceptance and usage constitute significant
areas of current research and development in technology.
2. LITERATURE REVIEW/SURVEY
The referenced papers collectively illustrate the
progressive advancements in speech recognition
technology and the wide-ranging applications of voice-
controlled assistants. These innovations encompass
diverse areas, from personal voice assistants to smart
home automation and aiding visually impaired individuals.
One notable project, 'JARVIS', amalgamates Artificial
Intelligence with Google platforms, employing markup
language to convert text formats into speech. Emad S.
Othman's project focuses on a personal voice assistant
utilizing Raspberry Pi, demonstrating its architecture
configuration and efficiency in managing various tasks.
Ankit Pandey's smart voice assistant showcases
functionalities such as note-taking, email exchanges, and
calendar scheduling via speech commands, also extending
its utility for appliance monitoring. Subash S. implements
an AI-based virtual assistant adaptable for desktops and
mobiles, translating spoken content into human-readable
data. Yash Mittal et al.'s Smart Home Automation system
utilizes Arduino microcontrollers to process voice
commands for domestic appliances.
Moreover, Rahul Kumar's power-efficient smart home
with a voice assistant focuses on aiding visually impaired
individuals through Raspberry Pi-based hardware.
Jianliang Meng et al. provide an overview of Speech
Recognition Technology, emphasizing the machine's
ability to comprehend vocal input through recognition
patterns and signal processing.
Deepika Sherawat implements a voice-activated desktop
assistant, featuring in-built commands for user queries.
Prof. Emad S. Othman's Voice Controlled Personal
Assistant serves as a surveillance model, catering to blind
individuals and predicting information from spoken input.
Additionally, N. Vignesh's Comparative Study on Voice
Based Chat Bots highlights challenges in chatbot
versatility and proposes an ontology-based approach.
Ankit Pandey, Vaibhav Vashist, Prateek Tiwari, and Sunil's
project focuses on email sending, to-do lists, and web
service tasks, aiming to integrate their voice assistant with
cloud infrastructure for multi-user functionality.
3. METHODOLOGY
Voice assistants are software programs that interpret
verbal commands and respond to user queries. In this
context, the Python programming language serves as the
foundation for building AI-based voice assistants. For
instance, a command like "Play a song" or "Open Facebook
website" triggers the assistant to execute the requested
action.
The functionality of the voice assistant involves several
steps:
1. Listening for pauses after a request to understand when
the user has finished speaking.
2. Parsing the user's request by breaking it down into
distinct commands for better comprehension.
3. Comparing these commands with available requests
stored in the assistant's database.
4. Executing the appropriate action based on the
recognized command.
5. Seeking clarification from the user if the request is
ambiguous, ensuring accurate interpretation.
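The parsing, comparison, execution, and clarification steps above can be illustrated with a minimal Python sketch. The command table, the action names, and the rule of splitting a request on the word "and" are assumptions made for illustration and do not come from the surveyed systems; fuzzy matching uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Hypothetical command table (spoken phrase -> action name), illustration only.
COMMANDS = {
    "play a song": "play_music",
    "play a video": "play_video",
    "open facebook website": "open_facebook",
}

def split_request(utterance):
    """Step 2: break a request into distinct commands (assumed separator: 'and')."""
    parts = utterance.lower().strip().split(" and ")
    return [p.strip() for p in parts if p.strip()]

def handle(utterance):
    """Steps 3 to 5: match each command, execute it, or ask for clarification."""
    results = []
    for phrase in split_request(utterance):
        if phrase in COMMANDS:
            # Step 4: an exact match is executed directly.
            results.append(("execute", COMMANDS[phrase]))
            continue
        # Step 5: anything else is treated as ambiguous; offer close candidates.
        hits = difflib.get_close_matches(phrase, list(COMMANDS), n=2, cutoff=0.6)
        if hits:
            results.append(("clarify", "Did you mean: " + " or ".join(hits) + "?"))
        else:
            results.append(("clarify", "Sorry, I did not understand: " + phrase))
    return results
```

In this sketch, "Play a song and open Facebook website" yields two actions, while an incomplete request such as "play a" produces a clarification prompt that lists the closest stored commands.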
The Automatic Speech Recognition (ASR) system works by
recording speech, creating a wave file, filtering
background noise, normalizing volume, segmenting the
speech into elements, and applying statistical probability
analysis to convert it into text. The ASR process involves
Acoustic Modeling, Pronunciation Modeling, and Language
Modeling, which collectively aid in recognizing and
decoding spoken elements into comprehensible text
commands.
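Two of the preprocessing steps named above, volume normalization and segmentation, can be sketched in plain Python. This is a simplified illustration that assumes the audio is already available as a list of floating-point samples; the frame length, hop size, and energy threshold are arbitrary values chosen for the example, and production systems typically use dedicated signal-processing libraries.

```python
def normalize_peak(samples, target=1.0):
    """Scale the signal so that its largest absolute sample equals `target`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    return [s * target / peak for s in samples]

def frame_signal(samples, frame_len, hop):
    """Segment the signal into fixed-length frames taken every `hop` samples."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def speech_frames(frames, threshold):
    """Indices of frames whose energy exceeds a crude speech/silence threshold."""
    return [i for i, f in enumerate(frames) if frame_energy(f) > threshold]

signal = normalize_peak([0.0, 0.0, 0.5, -0.25, 0.0, 0.0])
frames = frame_signal(signal, frame_len=2, hop=2)
active = speech_frames(frames, threshold=0.1)  # only the middle frame is active
```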
This AI-based system processes recorded speech data
independently, using Artificial Intelligence (AI) algorithms
without human intervention. The processed speech
waveforms are transmitted to the decoder, ultimately
transforming them into text commands for further
execution.
Voice assistants operate on a sequence of sophisticated
algorithms that enable them to comprehend and respond
to user voice commands. The backbone of these systems
lies in programming languages such as Python, providing a
framework for developing AI-based voice assistants. These
assistants are capable of executing diverse tasks, ranging
from playing music or opening websites to managing daily
tasks like setting reminders, sending emails, or fetching
information.
The operation of a voice assistant involves a multi-step
process:
1. Voice Input Interpretation: Upon receiving a verbal
command, the assistant utilizes various techniques to
parse and interpret the spoken words. It leverages natural
language processing (NLP) algorithms to understand the
intent behind the user's query accurately.
2. Command Execution: After parsing the input, the system
matches it with predefined commands or tasks stored in
its database. If a match is found, the assistant executes the
corresponding action or command.
3. Contextual Understanding: Sophisticated voice
assistants are designed to understand context. They
consider previous commands, ongoing conversations, or
user preferences to deliver more personalized and
accurate responses.
4. Learning and Adaptation: AI-powered voice assistants
employ machine learning algorithms to continuously learn
from user interactions, improving their accuracy and
efficiency over time.
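The contextual understanding step can be illustrated with a small sketch in which the assistant remembers the last intent so that a follow-up request refines it. The class name, intent labels, and slot names below are hypothetical and serve only to show the idea; real assistants track context with considerably richer dialogue state.

```python
class DialogueContext:
    """Remembers the last intent and its parameters (slots) across turns."""

    def __init__(self):
        self.intent = None
        self.slots = {}

    def update(self, intent=None, **slots):
        # A new intent starts a fresh topic. A turn without an intent is
        # treated as a follow-up that refines the previous request.
        if intent is not None and intent != self.intent:
            self.intent = intent
            self.slots = {}
        self.slots.update(slots)
        return self.intent, dict(self.slots)

ctx = DialogueContext()
first = ctx.update("weather", city="Pune")   # "What is the weather in Pune?"
follow_up = ctx.update(day="tomorrow")       # "And tomorrow?"
```

Here the follow-up turn carries no intent of its own, so the sketch reuses the remembered "weather" intent and city and only adds the new day.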
Automatic Speech Recognition (ASR) is central to the voice
assistant's ability to interpret and process spoken
commands. ASR involves several stages:
- Audio Capture: The process begins by capturing audio
through a microphone or a similar source.
- Preprocessing: Recorded audio undergoes initial
processing steps, including noise reduction and volume
normalization, to enhance the clarity of the voice signal.
- Feature Extraction: Speech signals are converted into a
mathematical representation, typically using techniques
like Mel-frequency cepstral coefficients (MFCCs), to
extract relevant features for analysis.
- Acoustic Modeling: This phase involves modeling sound
units and their variations to determine possible words or
phrases corresponding to the extracted features.
- Language Modeling: Analyzing word sequences and
predicting the likelihood of specific words or phrases
occurring together, based on statistical language models.
- Decoding: Finally, the system decodes the processed
speech data into text, enabling the voice assistant to
understand and act upon the user's command accurately.
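The interplay of acoustic modeling, language modeling, and decoding can be shown with a toy rescoring example. All probabilities below are invented for illustration, and the candidate list is assumed to be given; real decoders search over far larger hypothesis spaces.

```python
import math

# Hypothetical acoustic log-probabilities for two candidate transcripts.
ACOUSTIC = {
    ("play", "a", "song"): math.log(0.40),
    ("pray", "a", "song"): math.log(0.45),
}

# Hypothetical bigram language model probabilities; "<s>" marks sentence start.
BIGRAM = {
    ("<s>", "play"): 0.20,
    ("<s>", "pray"): 0.01,
    ("play", "a"): 0.30,
    ("pray", "a"): 0.05,
    ("a", "song"): 0.10,
}

def lm_logprob(words, floor=1e-6):
    """Sum of bigram log-probabilities; unseen pairs get a small floor value."""
    total = 0.0
    prev = "<s>"
    for word in words:
        total += math.log(BIGRAM.get((prev, word), floor))
        prev = word
    return total

def decode(candidates, lm_weight=1.0):
    """Pick the transcript with the best combined acoustic and language score."""
    return max(
        candidates,
        key=lambda words: candidates[words] + lm_weight * lm_logprob(words),
    )

best = decode(ACOUSTIC)                          # language model favours "play"
acoustic_only = decode(ACOUSTIC, lm_weight=0.0)  # acoustics alone favour "pray"
```

The acoustic scores slightly prefer "pray a song", but the language model considers "play a song" far more likely, so the combined score selects it.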
These systems continuously evolve with advancements in
AI, machine learning, and natural language processing,
contributing to their increased accuracy and capabilities in
understanding and responding to human speech.
4. EXISTING TECHNOLOGIES
4.1 Accessibility Standards and Guidelines (WCAG):
A) Principles of WCAG: WCAG is structured around four
core principles: Perceivable, Operable, Understandable,
and Robust (POUR). These principles are further detailed
into specific guidelines and success criteria to ensure the
accessibility of web content for all users, including
individuals with disabilities.
B) Challenges in Implementation: While WCAG offers
comprehensive guidelines, there are challenges in fully
implementing them. Factors such as the complexity of
interactions in modern web applications, dynamic content,
and the rapid evolution of technologies pose difficulties in
adherence to existing guidelines.
4.2 Natural Language Processing (NLP):
A) Components of NLP: NLP encompasses several
elements, including tokenization, syntactic analysis,
semantic analysis, named entity recognition, and machine
learning models such as transformers (e.g., BERT, GPT).
These components collaborate to comprehend and
process human language.
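A minimal sketch of three of these components (tokenization, named entity recognition, and a crude form of semantic analysis) is given below. The city list and intent keywords are hypothetical stand-ins for illustration; practical systems rely on trained models rather than lookup tables.

```python
import re

# Hypothetical lookup tables standing in for trained models.
CITIES = {"pune", "mumbai", "delhi"}
INTENT_KEYWORDS = {
    "weather": {"weather", "forecast"},
    "news": {"news", "headlines"},
}

def tokenize(text):
    """Tokenization: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def find_entities(tokens):
    """Named entity recognition via a simple gazetteer lookup."""
    return [(tok, "CITY") for tok in tokens if tok in CITIES]

def detect_intent(tokens):
    """Keyword-based intent detection; returns None when nothing matches."""
    for intent, keywords in INTENT_KEYWORDS.items():
        if keywords & set(tokens):
            return intent
    return None

tokens = tokenize("What's the weather in Pune?")
```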
B) Resolving Ambiguity: One of the primary hurdles in
NLP involves addressing ambiguity in natural language
queries. Techniques such as context-aware language
models and entity disambiguation are utilized to enhance
accuracy in interpreting user intents.
4.3 Machine Learning and Voice Recognition:
A) Algorithms for Voice Recognition: Advancements in
machine learning, notably deep learning methods like
Convolutional Neural Networks (CNNs), Recurrent Neural
Networks (RNNs), and transformer models (e.g., wav2vec
2.0), have significantly bolstered voice recognition
accuracy. Strategies like transfer learning are employed
to adapt pretrained models, particularly when dealing with
limited data.
B) Addressing Biases: Overcoming biases in voice
recognition models is pivotal for creating fair and inclusive
systems. Approaches such as bias detection, data
augmentation, and curated diverse datasets are explored
to mitigate biases.
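One simple form of bias detection is to compare the word error rate (WER) of a recognizer across speaker groups. The sketch below computes WER from the word-level edit distance; the group labels and transcripts are made-up examples, and a large gap between groups would only be a signal that merits further investigation.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]

def wer(pairs):
    """Total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / words

def wer_by_group(samples):
    """Group (group, reference, hypothesis) triples and compute WER per group."""
    groups = {}
    for group, ref, hyp in samples:
        groups.setdefault(group, []).append((ref, hyp))
    return {g: wer(p) for g, p in groups.items()}

# Made-up evaluation data for two hypothetical speaker groups.
rates = wer_by_group([
    ("A", "play a song", "play a song"),
    ("B", "play a song", "pray a song"),
    ("B", "open the door", "open door"),
])
```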
Fig -1: NLP Model
4.4 Multi-Modal Interfaces:
A) Challenges in Integration: Designing interfaces that
seamlessly amalgamate multiple interaction modes (voice,
touch, gesture) necessitates thoughtful consideration of
user preferences, environmental contexts, and
technological capabilities across various devices.
B) Consistency and Usability: Ensuring uniform and user-
friendly interactions across different interaction modes
presents challenges. Incorporating user testing and
iterative design processes are essential for refining multi-
modal interfaces, ensuring enhanced usability and
accessibility.
5. PROPOSED TECHNOLOGIES
5.1 Voice User Interface (VUI) and Speech Recognition:
VUIs enable users to interact naturally with technology
using spoken language, significantly improving
accessibility for visually impaired individuals. This hands-
free method offers an intuitive approach to accessing
information and performing tasks. Unlike traditional
interfaces requiring physical input, VUIs allow users to
control devices or applications without manual
interaction, especially beneficial for those with limited
motor abilities. Accurate speech recognition technology
facilitates faster navigation of applications, web browsing,
and task execution compared to conventional input
methods, potentially boosting productivity for visually
impaired users.
5.2 Assistive Technologies:
Screen readers and text-to-speech systems deliver instant
auditory feedback, granting visually impaired users
immediate access to digital content as it appears on
screens and reducing accessibility barriers. These
technologies empower visually impaired individuals to
independently access digital content, fostering autonomy
and reducing reliance on external support. Assistive
technologies often offer adjustable settings, such as speech
rate, voice options, and navigation preferences,
accommodating individual user needs and preferences.
Fig -2: Speech Recognition Model
5.3 User-Centric Design and Human-Computer Interaction (HCI):
HCI principles in user-centered design ensure interfaces
cater specifically to various visual impairments,
considering factors like contrast, font size, and
compatibility with screen readers, enhancing accessibility.
User-centric design emphasizes usability testing and
feedback integration, resulting in interfaces that are
intuitive, easy to learn, and navigate for visually impaired
users. HCI studies advocate for inclusive design strategies,
considering a diverse range of users' abilities and needs,
aiming to create universally accessible interfaces
accommodating a wide spectrum of users.
In summary, these proposed technologies offer improved
accessibility, natural interaction methods, user
empowerment through independence, and interfaces
tailored to meet the unique requirements of visually
impaired individuals. Their integration signifies continued
efforts to bridge accessibility gaps in digital interactions,
promoting inclusivity and equitable access to information
and services.
Fig -3: Proposed Model
6. RESULTS
The project is intended to deliver a
fully operational web application tailored specifically for
individuals with visual impairments. This application will
feature an intuitive login interface that incorporates voice-
based inputs for entering user credentials such as name
and phone number, thereby facilitating a smooth login or
signup process. Upon successful authentication, users will
gain access to a user-friendly homepage offering multiple
functionalities. These include checking battery
percentages and a versatile "Ask Me" feature, enabling
voice-activated queries for Google/Wikipedia searches,
weather updates, news, jokes, and access to YouTube
content. By integrating Voice User Interface (VUI)
technologies, assistive tools like screen readers and text-
to-speech systems, and implementing principles from
Human-Computer Interaction (HCI), the intended output
aims to significantly improve accessibility. This
enhancement targets granting visually impaired users a
more independent and seamless interaction within the
digital realm, thereby fostering a more inclusive and
accommodating web experience.
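The routing behind an "Ask Me" feature of this kind can be sketched as a function that maps a recognized query to a search URL. The convention of starting a query with "youtube" or "wikipedia" is an assumption made for illustration and is not taken from the project; the URLs follow the public search pages of the respective sites.

```python
from urllib.parse import quote_plus

# Assumed spoken prefixes mapped to the public search URL of each site.
ROUTES = {
    "youtube": "https://www.youtube.com/results?search_query=",
    "wikipedia": "https://en.wikipedia.org/w/index.php?search=",
}
DEFAULT_ROUTE = "https://www.google.com/search?q="

def ask_me(query):
    """Return the URL that the assistant would open for a recognized query."""
    text = query.lower().strip()
    for prefix, base in ROUTES.items():
        if text.startswith(prefix + " "):
            return base + quote_plus(text[len(prefix) + 1:])
    # Anything without a known prefix falls back to a general web search.
    return DEFAULT_ROUTE + quote_plus(text)
```

For example, "YouTube lofi music" is routed to a YouTube search, while "weather in Pune" falls back to a general web search.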
7. CONCLUSION
In summary, this project effectively addresses the pressing
necessity for enhanced accessibility and usability within
web applications designed for visually impaired users.
Through the utilization of cutting-edge technologies such
as VUI, speech recognition, assistive tools, and the
integration of user-centric design principles, the project
strives to narrow the accessibility gap, empowering
visually impaired individuals to interact more seamlessly
with digital platforms.
While each technology introduces its unique challenges,
the project is committed to mitigating these obstacles
through meticulous implementation, iterative design
methodologies, and a keen focus on user input.
It also recognizes the significance of continuous
enhancement and adaptation to accommodate evolving user
requirements and technological progress.
Ultimately, by amalgamating these innovative
technologies and adopting a user-centric design
philosophy, the project aims to foster a more inclusive
digital landscape. This initiative endeavors to provide
visually impaired individuals with equitable opportunities
to access information, independently execute tasks, and
navigate web applications with heightened ease and
efficiency.
REFERENCES
[1] Artificial Intelligence-based Voice Assistant
https://ieeexplore.ieee.org/document/9210344
[2] “Hey, Siri”, “Ok, Google”, “Alexa”. Acceptance - Relevant Factors of Virtual Voice-Assistant
https://ieeexplore.ieee.org/document/8804568
[3] A Literature Review on Smart Assistant
https://www.academia.edu/download/69546476/IRJET_V8I4769.pdf
[4] DESKTOP’S VIRTUAL ASSISTANT USING PYTHON
https://www.researchgate.net/profile/Jegadeesan-Ramalingam/publication/372657833_DESKTOP'S_VIRTUAL_ASSISTANT_USING_PYTHON/links/64c22f0304d6c44bc35d350e/DESKTOPS-VIRTUAL-ASSISTANT-USING-PYTHON.pdf
[5] Intelligent personal assistants: A systematic literature review
https://www.sciencedirect.com/science/article/abs/pii/S0957417420300191

More Related Content

PDF
“SKYE : Voice Based AI Desktop Assistant”
PDF
A Review on Personal Digital Voice Assistant
PDF
SURVEY ON SMART VIRTUAL VOICE ASSISTANT
PDF
“Visual Based Virtual Assistant System”
PDF
Desktop Based Voice Assistant Application Using Machine Learning Approach
PDF
Virtual Personal Assistant
PPTX
VOICE-ASSISTANT-IN-PYTHON-pptx.pptx
PPTX
Voice Assistance Technology for integration with smart home ecosystem
“SKYE : Voice Based AI Desktop Assistant”
A Review on Personal Digital Voice Assistant
SURVEY ON SMART VIRTUAL VOICE ASSISTANT
“Visual Based Virtual Assistant System”
Desktop Based Voice Assistant Application Using Machine Learning Approach
Virtual Personal Assistant
VOICE-ASSISTANT-IN-PYTHON-pptx.pptx
Voice Assistance Technology for integration with smart home ecosystem

Similar to A SURVEY ON AI POWERED PERSONAL ASSISTANT (20)

PPT
NoteGPT_AI_PPT_Maker_ personal voice assistant .ppt
PDF
VOX – A Desktop Voice Assistant
PPTX
ppt project pk.pptx
PDF
Voice Assistant Using Python and AI
PPTX
SAP Marethon.pptx
PDF
REPORT ST.pdf
PDF
A Literature Survey On Voice Assistance
PDF
BUILDING PYTHON APPLICATION FOR WEBMAIL INTERFACES NAVIGATION USING VOICE REC...
PDF
Building Python Application for Webmail Interfaces Navigation using Voice Rec...
PPTX
AIproject_Voice_Assistant_Presentation.pptx
PPTX
PERSONAL VOICE ASSISTANT - Copy.pptx
PDF
A Voice Based Assistant Using Google Dialogflow And Machine Learning
PPTX
final_ppt[1].pptxCCCCCCCCCCCCCCCCCCCCCCCC
PPTX
Personal Voice Assistant using python.pptx
PDF
AI SMART BUDDY “MANAV”
PPTX
Voice Assistant.pptx
PDF
Personal Desktop AI Assistant Using Python ( J.A.R.V.I.S )
PDF
Voice Assistant (1).pdf
PDF
Tackling the Problem of Multilingualism in Voice Assistants
PPTX
Multimodal virtual assistant(2170171).pptx
NoteGPT_AI_PPT_Maker_ personal voice assistant .ppt
VOX – A Desktop Voice Assistant
ppt project pk.pptx
Voice Assistant Using Python and AI
SAP Marethon.pptx
REPORT ST.pdf
A Literature Survey On Voice Assistance
BUILDING PYTHON APPLICATION FOR WEBMAIL INTERFACES NAVIGATION USING VOICE REC...
Building Python Application for Webmail Interfaces Navigation using Voice Rec...
AIproject_Voice_Assistant_Presentation.pptx
PERSONAL VOICE ASSISTANT - Copy.pptx
A Voice Based Assistant Using Google Dialogflow And Machine Learning
final_ppt[1].pptxCCCCCCCCCCCCCCCCCCCCCCCC
Personal Voice Assistant using python.pptx
AI SMART BUDDY “MANAV”
Voice Assistant.pptx
Personal Desktop AI Assistant Using Python ( J.A.R.V.I.S )
Voice Assistant (1).pdf
Tackling the Problem of Multilingualism in Voice Assistants
Multimodal virtual assistant(2170171).pptx
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Ad

Recently uploaded (20)

PPTX
UNIT 4 Total Quality Management .pptx
PDF
Digital Logic Computer Design lecture notes
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
Lecture Notes Electrical Wiring System Components
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPTX
web development for engineering and engineering
DOCX
573137875-Attendance-Management-System-original
PPTX
Geodesy 1.pptx...............................................
PPT
Project quality management in manufacturing
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PDF
PPT on Performance Review to get promotions
PPTX
Sustainable Sites - Green Building Construction
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
UNIT 4 Total Quality Management .pptx
Digital Logic Computer Design lecture notes
Foundation to blockchain - A guide to Blockchain Tech
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
CYBER-CRIMES AND SECURITY A guide to understanding
Lecture Notes Electrical Wiring System Components
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
web development for engineering and engineering
573137875-Attendance-Management-System-original
Geodesy 1.pptx...............................................
Project quality management in manufacturing
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPT on Performance Review to get promotions
Sustainable Sites - Green Building Construction
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf

A SURVEY ON AI POWERED PERSONAL ASSISTANT

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 11 Issue: 01 | Jan 2024 www.irjet.net p-ISSN: 2395-0072 © 2024, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 76 A SURVEY ON AI POWERED PERSONAL ASSISTANT Mrs. Ashwini Bhamre1, Arpita Moharir2, Vaishnavi Kulkarni3, Rashmi Modgi4, Ninad Kulkarni5 1Assistant Professor, Dept. of Information Technology, P.E.S. Modern College of Engineering, Pune, India 2 U.G. Student, Dept. of Information Technology, P.E.S. Modern College of Engineering, Pune, India 3 U.G. Student, Dept. of Information Technology, P.E.S. Modern College of Engineering, Pune, India 4 U.G. Student, Dept. of Information Technology, P.E.S. Modern College of Engineering, Pune, India 5 U.G. Student, Dept. of Information Technology, P.E.S. Modern College of Engineering, Pune, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - Web accessibility is vital for inclusivity among visually impaired individuals. This survey paper delves into designing a web application tailored for their enhanced usability. Through voice-driven interfaces and assistive technologies, the objective is seamless interaction and navigation. Exploring challenges faced by visually impaired users accessing web applications, the paper emphasizes auditory feedback and adaptive interfaces. It details a user-friendly login/signup process with voice recognition, providing auditory cues. The home page offers crucial functions like battery status, search engines, weather updates, and YouTube accessible via voice or buttons. Addressing privacy, error handling, and user feedback, the paper advocates inclusive design and assistive tech integration. Drawing from academic studies, it consolidates best practices for developers, fostering an inclusive digital environment for visually impaired users. 
Key Words: Web accessibility, Visually impaired, Voice-driven interfaces, Voice recognition. 1.INTRODUCTION The recent evolution of virtual assistants (VAs) marks a significant shift in user interaction and experience within the realm of technology. These VAs, primarily voice-based, have transformed how individuals engage with devices, enabling tasks like controlling lights, playing music, and conducting voice-based searches. They operate through stages: text-to-speech, text interpretation, and intention- to-action, constantly evolving with AI advancements. Studies reveal the growing reliance on VAs; a survey in 2015 highlighted that a substantial percentage of both teens and adults in the U.S. anticipated the widespread use of voice search in the future. Foreseeably, enhanced speech recognition, empowered by robust internet connectivity and advanced data processing capabilities, will augment VA capabilities, making them more intuitive and efficient. However, the integration of VAs into daily life raises questions about user preferences and concerns. While many anticipate increased VA usage, there's hesitancy among some individuals or limitations in their usage. Limited research has explored the factors influencing VA acceptance comprehensively, especially regarding various configurations and user requirements. To delve deeper into user preferences, a study proposes employing conjoint analysis—a method to gauge user preferences—regarding three specific aspects: natural language processing (NLP) performance, pricing, and privacy concerns. By weighing these factors against each other, the aim is to unveil user segments and their trade- offs in accepting VAs. Simultaneously, the integration of smart assistants into mobile technology is gaining prominence due to its ease of access and versatility. A smart assistant is essentially a virtual assistant implemented on personal computers, facilitating voice or keyboard inputs and internet-based remote access. 
Comprising components such as speech-to-text converters, query processors, and text-to-speech converters, these systems exploit the efficiency of voice as a means of communication and are integral to computing devices. Moreover, the transformation of the digital landscape has seen voice searches surpassing text searches, with predictions indicating that half of all searches will be voice-based by 2024. Virtual assistant software designed for desktop use helps manage tasks, provides information, and improves user productivity. Recent research focuses on recognizing human activities through voice recognition systems powered by Natural Language Processing (NLP) algorithms, aiming to understand user commands and respond accordingly.

Overall, the evolution of VAs, their potential impact on user interaction, and the exploration of user preferences regarding acceptance and usage constitute significant areas of current research and development.

2. LITERATURE REVIEW/SURVEY

The referenced papers collectively illustrate the progressive advancements in speech recognition technology and the wide-ranging applications of voice-controlled assistants. These innovations span diverse areas, from personal voice assistants to smart home automation and support for visually impaired individuals.
One notable project, 'JARVIS', combines Artificial Intelligence with Google platforms, employing a markup language to convert text into speech. Emad S. Othman's project focuses on a personal voice assistant built on Raspberry Pi, demonstrating its architecture configuration and efficiency in managing various tasks. Ankit Pandey's smart voice assistant showcases functionalities such as note-taking, email exchanges, and calendar scheduling via speech commands, and extends its utility to appliance monitoring. Subash S. implements an AI-based virtual assistant adaptable for desktops and mobiles, translating spoken content into human-readable data. Yash Mittal et al.'s Smart Home Automation system uses Arduino microcontrollers to process voice commands for domestic appliances. Rahul Kumar's power-efficient smart home with a voice assistant focuses on aiding visually impaired individuals through Raspberry Pi-based hardware. Jianliang Meng et al. provide an overview of Speech Recognition Technology, emphasizing the machine's ability to comprehend vocal input through pattern recognition and signal processing. Deepika Sherawat implements a voice-activated desktop assistant featuring built-in commands for user queries. Prof. Emad S. Othman's Voice Controlled Personal Assistant serves as a surveillance model, catering to blind individuals and predicting information from spoken input. Additionally, N. Vignesh's Comparative Study on Voice Based Chat Bots highlights challenges in chatbot versatility and proposes an ontology-based approach.
The project by Ankit Pandey, Vaibhav Vashist, Prateek Tiwari, and Sunil focuses on email sending, to-do lists, and web service tasks, and aims to integrate the voice assistant with cloud infrastructure for multi-user functionality.

3. METHODOLOGY

Voice assistants are implemented in programming languages that let them interpret verbal commands and respond to user queries. In this context, the Python programming language serves as the foundation for building AI-based voice assistants. For instance, a command like "Play a song" or "Open Facebook website" triggers the assistant to execute the requested action.

The functionality of the voice assistant involves several steps:
1. Listening for pauses after a request to determine when the user has finished speaking.
2. Parsing the user's request by breaking it down into distinct commands for better comprehension.
3. Comparing these commands with the available requests stored in the assistant's database.
4. Executing the appropriate action based on the recognized command.
5. Seeking clarification from the user if the request is ambiguous, ensuring accurate interpretation.

The Automatic Speech Recognition (ASR) system works by recording speech, creating a wave file, filtering background noise, normalizing volume, segmenting the speech into elements, and applying statistical probability analysis to convert it into text. The ASR process involves Acoustic Modeling, Pronunciation Modeling, and Language Modeling, which together recognize and decode spoken elements into comprehensible text commands. The system processes recorded speech data using Artificial Intelligence (AI) algorithms without human intervention. The processed speech waveforms are passed to the decoder, which transforms them into text commands for execution.
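Steps 2 to 5 of the command-handling loop described above can be illustrated with a minimal Python sketch. It is not the implementation of any surveyed system: the command table, the stop-word set, and the function names `split_request`, `handle`, and `respond` are hypothetical, and the word-overlap matching rule is only one simple way to detect an ambiguous request. Step 1 (detecting the end of speech) and the ASR stage are assumed to have already produced the text passed in.

```python
import re

# Hypothetical command table: each stored request maps to an action.
# The actions only return a description; a real assistant would call
# a media player, a browser, and so on.
COMMANDS = {
    "play a song": lambda: "Playing a song",
    "open facebook website": lambda: "Opening the Facebook website",
    "open youtube": lambda: "Opening YouTube",
    "check battery status": lambda: "Checking battery status",
}

# Filler words ignored during matching (an illustrative choice).
STOPWORDS = {"a", "an", "the", "please", "me"}


def split_request(text):
    """Step 2: break a transcribed request into distinct commands."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    parts = re.split(r"\b(?:and then|then|and)\b", cleaned)
    return [" ".join(part.split()) for part in parts if part.strip()]


def handle(command):
    """Steps 3-5: match against stored commands, then act or ask."""
    words = set(command.split()) - STOPWORDS
    if not words:
        return "Sorry, I did not catch that. Could you repeat it?"
    # A stored command is a candidate if it contains every spoken word.
    candidates = sorted(
        known for known in COMMANDS if words <= set(known.split())
    )
    if len(candidates) == 1:
        return COMMANDS[candidates[0]]()
    if candidates:
        options = "' or '".join(candidates)
        return f"Did you mean '{options}'?"
    return f"Sorry, I do not know how to '{command}'."


def respond(text):
    """Run every command found in one transcribed request."""
    return [handle(command) for command in split_request(text)]
```

Calling `respond("Play a song and open Facebook website")` runs both actions in order, while the underspecified request `respond("open")` matches two stored commands and therefore produces a clarification question instead of an action.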
Voice assistants rely on a sequence of algorithms that enable them to comprehend and respond to user voice commands. These systems are typically built in programming languages such as Python, which provide a framework for developing AI-based voice assistants. The assistants can execute diverse tasks, ranging from playing music or opening websites to managing daily tasks like setting reminders, sending emails, or fetching information.

The operation of a voice assistant involves a multi-step process:
1. Voice Input Interpretation: Upon receiving a verbal command, the assistant parses and interprets the spoken words. It uses natural language processing (NLP) algorithms to identify the intent behind the user's query.
2. Command Execution: After parsing the input, the system matches it with predefined commands or tasks stored in its database. If a match is found, the assistant executes the corresponding action.
3. Contextual Understanding: More sophisticated voice assistants take context into account. They consider previous commands, ongoing conversations, or user preferences to deliver more personalized and accurate responses.
4. Learning and Adaptation: AI-powered voice assistants employ machine learning algorithms to continuously learn from user interactions, improving their accuracy and efficiency over time.

Automatic Speech Recognition (ASR) is crucial to the voice assistant's ability to interpret and process spoken commands. ASR involves several stages:
- Audio Capture: The process begins by capturing audio through a microphone or a similar source.
- Preprocessing: Recorded audio undergoes initial processing steps, including noise reduction and volume normalization, to enhance the clarity of the voice signal.
- Feature Extraction: Speech signals are converted into a mathematical representation, typically using techniques like Mel-frequency cepstral coefficients (MFCCs), to extract relevant features for analysis.
- Acoustic Modeling: This phase models sound units and their variations to determine the possible words or phrases corresponding to the extracted features.
- Language Modeling: Statistical language models analyze word sequences and predict the likelihood of specific words or phrases occurring together.
- Decoding: Finally, the system decodes the processed speech data into text, enabling the voice assistant to understand and act upon the user's command.

These systems continue to evolve with advancements in AI, machine learning, and natural language processing, which improve their accuracy in understanding and responding to human speech.

4. EXISTING TECHNOLOGIES

4.1 Accessibility Standards and Guidelines (WCAG):

A) Principles of WCAG: The Web Content Accessibility Guidelines (WCAG) are structured around four core principles: Perceivable, Operable, Understandable, and Robust (POUR).
These principles are further detailed into specific guidelines and success criteria to ensure the accessibility of web content for all users, including individuals with disabilities.

B) Challenges in Implementation: While WCAG offers comprehensive guidelines, there are challenges in fully implementing them. Factors such as the complexity of interactions in modern web applications, dynamic content, and the rapid evolution of technologies make it difficult to adhere to existing guidelines.

4.2 Natural Language Processing (NLP):

A) Components of NLP: NLP encompasses several elements, including tokenization, syntactic analysis, semantic analysis, named entity recognition, and machine learning models such as transformers (e.g., BERT, GPT). These components work together to comprehend and process human language.

B) Resolving Ambiguity: One of the primary hurdles in NLP is addressing ambiguity in natural language queries. Techniques such as context-aware language models and entity disambiguation are used to improve accuracy in interpreting user intents.

4.3 Machine Learning and Voice Recognition:

A) Algorithms for Voice Recognition: Advancements in machine learning, notably deep learning methods like Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and transformer models (e.g., BERT), have significantly improved voice recognition accuracy. Strategies like transfer learning are employed to optimize models, particularly when dealing with limited data.

B) Addressing Biases: Overcoming biases in voice recognition models is pivotal for creating fair and inclusive systems. Approaches such as bias detection, data augmentation, and curated diverse datasets are explored to mitigate biases.
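The bias-detection idea mentioned above can be made concrete with a small Python sketch. One common check, assumed here rather than taken from the surveyed papers, is to compare the word error rate (WER) of the recognizer across speaker groups; a large gap suggests the model serves one group worse than another. The group labels and transcripts below are made-up illustration data, and the function names are hypothetical.

```python
from collections import defaultdict


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def wer_by_group(samples):
    """Average WER per group from (group, reference, hypothesis) tuples."""
    totals = defaultdict(list)
    for group, reference, hypothesis in samples:
        totals[group].append(word_error_rate(reference, hypothesis))
    return {group: sum(v) / len(v) for group, v in totals.items()}


# Made-up transcripts for two hypothetical speaker groups.
samples = [
    ("group_a", "play a song", "play a song"),
    ("group_a", "open the door", "open the door"),
    ("group_b", "play a song", "play song"),
    ("group_b", "open the door", "open door"),
]
rates = wer_by_group(samples)
gap = max(rates.values()) - min(rates.values())
```

In this toy data the recognizer drops one word in every `group_b` utterance, so `rates` shows a zero error rate for `group_a` and one third for `group_b`. In practice the gap would be measured on a held-out, demographically labeled test set before deciding whether data augmentation or additional data collection is needed.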
Fig -1: NLP Model

4.4 Multi-Modal Interfaces:

A) Challenges in Integration: Designing interfaces that seamlessly combine multiple interaction modes (voice, touch, gesture) requires careful consideration of user preferences, environmental contexts, and technological capabilities across various devices.

B) Consistency and Usability: Ensuring uniform and user-friendly interactions across different interaction modes presents challenges. User testing and iterative design are essential for refining multi-modal interfaces and improving their usability and accessibility.

5. PROPOSED TECHNOLOGIES

5.1 Voice User Interface (VUI) and Speech Recognition:

VUIs enable users to interact naturally with technology using spoken language, significantly improving accessibility for visually impaired individuals. This hands-free method offers an intuitive approach to accessing information and performing tasks. Unlike traditional interfaces requiring physical input, VUIs allow users to control devices or applications without manual interaction, which is especially beneficial for those with limited motor abilities. Accurate speech recognition allows faster navigation of applications, web browsing, and task execution than conventional input methods, potentially boosting productivity for visually impaired users.

5.2 Assistive Technologies:

Screen readers and text-to-speech systems deliver instant auditory feedback, giving visually impaired users immediate access to digital content as it appears on screen and lowering accessibility barriers. These technologies empower visually impaired individuals to access digital content independently, fostering autonomy and reducing reliance on external support. Assistive technologies often offer adjustable settings, such as speech rate, voice options, and navigation preferences, to accommodate individual needs.
Fig -2: Speech Recognition Model

5.3 User-Centric Design and Human-Computer Interaction (HCI):

HCI principles in user-centered design ensure that interfaces cater specifically to various visual impairments, considering factors like contrast, font size, and compatibility with screen readers. User-centric design emphasizes usability testing and feedback integration, resulting in interfaces that are intuitive and easy for visually impaired users to learn and navigate. HCI studies advocate inclusive design strategies that consider a diverse range of abilities and needs, aiming to create universally accessible interfaces.

In summary, these proposed technologies offer improved accessibility, natural interaction methods, user empowerment through independence, and interfaces tailored to the unique requirements of visually impaired individuals. Their integration reflects continued efforts to bridge accessibility gaps in digital interactions, promoting inclusivity and equitable access to information and services.

Fig -3: Proposed Model

6. RESULTS

The intended outcome of the project is a fully operational web application tailored specifically for individuals with visual impairments. The application will feature an intuitive login interface that accepts voice-based input for user credentials such as name and phone number, facilitating a smooth login or signup process. Upon successful authentication, users will gain access to a user-friendly homepage offering multiple functions. These include checking the battery percentage and a versatile "Ask Me" feature that enables voice-activated queries for Google/Wikipedia searches, weather updates, news, jokes, and access to YouTube content.
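As a rough illustration of how an "Ask Me" style feature might route a transcribed query to a search destination, the Python sketch below builds a search URL from the spoken text. The paper does not specify an implementation, so the keyword rules, the `FILLER` word set, and the function name `build_url` are assumptions made for this example. Weather, news, and jokes would need external data services and are left out.

```python
from urllib.parse import quote_plus

# Keyword -> search URL template. Which sites to support and how to
# detect them is an assumption made for illustration.
SITE_TEMPLATES = {
    "wikipedia": "https://en.wikipedia.org/wiki/Special:Search?search={}",
    "youtube": "https://www.youtube.com/results?search_query={}",
}
DEFAULT_TEMPLATE = "https://www.google.com/search?q={}"

# Words that carry no search meaning once the target site is known.
FILLER = {"search", "for", "on", "in", "open", "play", "find"}


def build_url(query):
    """Return a search URL for a transcribed voice query."""
    words = query.lower().split()
    for site, template in SITE_TEMPLATES.items():
        if site in words:
            # Drop the site name and filler words, keep the topic.
            topic = [w for w in words if w != site and w not in FILLER]
            return template.format(quote_plus(" ".join(topic)))
    # No site mentioned: fall back to a general web search.
    return DEFAULT_TEMPLATE.format(quote_plus(" ".join(words)))
```

A query that names no site, such as "weather in Pune", falls through to the general web search. The resulting URL could then be opened with the standard-library `webbrowser.open`, with the page content read back to the user by a screen reader or text-to-speech component.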
By integrating Voice User Interface (VUI) technologies, assistive tools like screen readers and text-to-speech systems, and principles from Human-Computer Interaction (HCI), the intended output aims to significantly improve accessibility. The goal is to give visually impaired users more independent and seamless interaction with digital content, fostering a more inclusive and accommodating web experience.

7. CONCLUSION

In summary, this project addresses the pressing need for enhanced accessibility and usability within web applications designed for visually impaired users. By using technologies such as VUI, speech recognition, and assistive tools, together with user-centric design principles, the project strives to narrow the accessibility gap, empowering visually impaired individuals to interact more seamlessly with digital platforms.

While each technology introduces its own challenges, the project is committed to mitigating these obstacles through careful implementation, iterative design, and a keen focus on user input. It also recognizes the significance of continuous enhancement and adaptation to accommodate evolving user requirements and technological progress.

Ultimately, by combining these technologies and adopting a user-centric design philosophy, the project aims to foster a more inclusive digital landscape. This initiative endeavors to provide visually impaired individuals with equitable opportunities to access information, execute tasks independently, and navigate web applications with greater ease and efficiency.

REFERENCES

[1] Artificial Intelligence-based Voice Assistant. https://ieeexplore.ieee.org/document/9210344
[2] “Hey, Siri”, “Ok, Google”, “Alexa”. Acceptance-Relevant Factors of Virtual Voice-Assistant. https://ieeexplore.ieee.org/document/8804568
[3] A Literature Review on Smart Assistant. https://www.academia.edu/download/69546476/IRJET_V8I4769.pdf
[4] DESKTOP’S VIRTUAL ASSISTANT USING PYTHON. https://www.researchgate.net/profile/Jegadeesan-Ramalingam/publication/372657833_DESKTOP'S_VIRTUAL_ASSISTANT_USING_PYTHON/links/64c22f0304d6c44bc35d350e/DESKTOPS-VIRTUAL-ASSISTANT-USING-PYTHON.pdf
[5] Intelligent personal assistants: A systematic literature review. https://www.sciencedirect.com/science/article/abs/pii/S0957417420300191