SRI RAMAKRISHNA ENGINEERING COLLEGE
[Educational Service : SNR Sons Charitable Trust]
[Autonomous Institution, Accredited by NAAC with ‘A’ Grade]
[Approved by AICTE and Permanently Affiliated to Anna University, Chennai]
[ISO 9001-2015 Certified and all eligible programmes Accredited by NBA]
VATTAMALAIPALAYAM, N.G.G.O. COLONY POST, COIMBATORE – 641 022.
Department of Electronics and Communication Engineering
20AI202 PRINCIPLES OF ARTIFICIAL INTELLIGENCE
Course Instructor: Mr. J. Judeson Antony Kovilpillai, Asst. Prof./ECE
No. of Credits: 3
VISION OF THE COLLEGE
• To develop into a leading world-class Technological University consisting of Schools of Excellence in various disciplines, with a co-existent Centre for Engineering Solutions Development for a world-wide clientele.
MISSION OF THE COLLEGE
To provide all necessary inputs for the students to grow into knowledge engineers and scientists, attaining:
• Excellence in domain knowledge, both practice and theory.
• Excellence in co-curricular and extracurricular talents.
• Excellence in character and personality.
VISION OF THE DEPARTMENT
• To develop Electronics and Communication Engineers by keeping pace with changing technologies, professionalism, creativity, research and employability.
MISSION OF THE DEPARTMENT
• To provide quality and contemporary education through an effective teaching-learning process that equips the students with adequate knowledge in Electronics and Communication Engineering for a successful career.
• To inculcate in the students problem-solving and lifelong learning skills that will enable them to pursue higher studies and a career in research.
• To produce engineers with effective communication skills, the ability to lead a team adhering to ethical values, and an inclination to serve society.
COURSE OUTCOMES
On successful completion of the course, students will be able to:
CO1 : Understand the basics of Artificial Intelligence and Intelligent Agents.
CO2 : Apply problem-solving strategies to real-life scenario applications.
CO3 : Make use of Machine Learning techniques for problem solving.
CO4 : Analyze the various applications of Artificial Intelligence.
Language Models
• A simple definition of a Language Model: an AI model that has been trained to predict the next word or words in a text based on the preceding words. It is part of the technology that predicts the next word you want to type on your mobile phone, allowing you to complete the message faster.
• The task of predicting the next word(s) is referred to as self-supervised learning: it does not need labels, it just needs lots of text.
• Language Models broadly fall into two main groups:
• Statistical Language Models: These models use traditional statistical techniques like N-grams, Hidden Markov Models (HMMs) and certain linguistic rules to learn the probability distribution of words.
• Neural Language Models: These are the newer players in NLP and have surpassed statistical language models in effectiveness. They use different kinds of Neural Networks to model language.
N-Gram Model
• Let’s begin with the task of computing P(w|H): the probability of word ‘w’ given some history ‘H’.
• Suppose ‘H’ is ‘its water is so transparent that’, and we want to know the probability that the next word is ‘the’: P(the | its water is so transparent that).
• One way to estimate this probability is with relative frequency counts.
• Take a large corpus, count the number of times ‘its water is so transparent that’ occurs, and count the number of times it is followed by ‘the’.
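A minimal sketch of this relative-frequency estimate in Python; the toy corpus and whitespace tokenization are assumptions for illustration:

# Estimate P(the | its water is so transparent that) by relative frequency:
# count(H followed by w) / count(H), over a toy corpus.
corpus = ("its water is so transparent that the fish are visible . "
          "its water is so transparent that the bottom shows . "
          "its water is so transparent that you can see for miles .").split()

history = "its water is so transparent that".split()
word = "the"
n = len(history)

count_h = count_hw = 0
for i in range(len(corpus) - n):
    if corpus[i:i + n] == history:
        count_h += 1
        if corpus[i + n] == word:
            count_hw += 1

print(count_hw / count_h)  # relative frequency estimate, here 2/3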
• While this method of estimating probabilities directly from counts works fine in many cases, it turns out that even the web isn’t big enough to give us good estimates in most cases. Why? Because language is creative: new sentences are created every day, and we won’t be able to count them all.
• For this reason, we need to introduce cleverer ways to calculate the probability of word w given history H.
• To represent the probability of a particular random variable Xi taking on the value “the”, or P(Xi = “the”), we will use the simplification P(the).
• We’ll represent a sequence of n words as w1 . . . wn or w1:n (so the expression w1:n−1 means the string w1, w2, …, wn−1). For the joint probability of each word in a sequence having a particular value, P(X1 = w1, X2 = w2, …, Xn = wn), we’ll use P(w1, w2, …, wn).
• By the chain rule of probability, P(w1, w2, …, wn) = P(w1) P(w2 | w1) … P(wn | w1:n−1); n-gram models approximate each of these factors.
• The intuition of the n-gram model is that instead of computing the probability of a word given its entire history, we can approximate the history by just the last few words.
• In the bigram (n = 2) language model the sentence “I saw the red house” is approximated as:
• P(I, saw, the, red, house) ≈
P(I | ‹s›) P(saw | I) P(the | saw) P(red | the) P(house | red) P(‹/s› | house)
• In a trigram (n = 3) language model it is approximated as:
• P(I, saw, the, red, house) ≈
P(I | ‹s›, ‹s›) P(saw | ‹s›, I) P(the | I, saw) P(red | saw, the)
P(house | the, red) P(‹/s› | red, house)
Note: ‹s› and ‹/s› are markers denoting the beginning and end of the sentence, respectively.
N-Gram Model Example
(The worked example on the original slides consists of figures, not reproduced here.)
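In their place, here is a minimal sketch of how a bigram model scores a sentence; the toy corpus and the ‹s›/‹/s› padding are assumptions for illustration:

from collections import Counter

# Toy corpus of tokenized sentences, padded with sentence markers.
sentences = [["<s>", "I", "saw", "the", "red", "house", "</s>"],
             ["<s>", "I", "saw", "the", "dog", "</s>"]]

unigrams = Counter()
bigrams = Counter()
for sent in sentences:
    unigrams.update(sent[:-1])          # contexts (everything that can precede a word)
    bigrams.update(zip(sent, sent[1:])) # adjacent word pairs

def p(word, prev):
    # Maximum-likelihood bigram estimate: count(prev, word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev]

# P(I saw the red house) under the bigram approximation
sent = ["<s>", "I", "saw", "the", "red", "house", "</s>"]
prob = 1.0
for prev, word in zip(sent, sent[1:]):
    prob *= p(word, prev)
print(prob)  # 0.5: every bigram has probability 1 except P(red | the) = 1/2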
Neural Language Models
• Neural Language Models do two things:
• Step 1: Process the context (model-specific)
 The main idea here is to get a vector representation of the previous context.
 Using this representation, the model predicts a probability distribution for the next token.
 This part can differ depending on the model architecture (e.g., RNN, CNN, whatever you want), but the main point is the same: to encode the context.
• Step 2: Generate a probability distribution for the next token
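The diagrams on the original slides are not reproduced; as a rough sketch of the two steps, assuming PyTorch is available (the RNN choice and all dimensions are illustrative assumptions, not the slides' specific model):

import torch
import torch.nn as nn

class TinyNeuralLM(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=32, hid_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hid_dim, batch_first=True)  # Step 1: encode the context
        self.head = nn.Linear(hid_dim, vocab_size)              # Step 2: score the next token

    def forward(self, context_ids):
        _, h = self.rnn(self.embed(context_ids))   # h: vector representation of the context
        logits = self.head(h[-1])                  # one score per vocabulary word
        return torch.softmax(logits, dim=-1)       # probability distribution over the next token

model = TinyNeuralLM()
context = torch.tensor([[5, 42, 7]])               # token ids of the previous words
probs = model(context)
print(probs.shape)                                 # torch.Size([1, 1000]); rows sum to 1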
Information Retrieval (IR) System
• An information retrieval (IR) system is a set of algorithms that determine the relevance of displayed documents to searched queries.
• In simple words, it sorts and ranks documents based on a user’s query. The query and the text in the documents are represented uniformly so that they can be matched against each other.
• This also allows a matching function to be used effectively to rank documents formally by their Retrieval Status Value (RSV).
• The document contents are represented by a collection of descriptors, known as terms, that belong to a vocabulary V. An IR system also extracts feedback on the usability of the displayed results by tracking the user’s behaviour.
An information retrieval model comprises the following four key elements:
• D − Document Representation.
• Q − Query Representation.
• F − A framework to match and establish a relationship between D and Q.
• R(q, di) − A ranking function that determines the similarity between the query and a document, used to display relevant information in order.
There are three types of Information Retrieval (IR) models:
• Classical IR Model — It is designed upon basic mathematical concepts and is the most widely used of the IR models. Classical Information Retrieval models can be implemented with ease. Examples include the Vector Space, Boolean and Probabilistic IR models. In this system, the retrieval of information depends on documents containing the defined set of query terms. There is no ranking or grading of any kind. The different classical IR models take the Document Representation, Query Representation, and Retrieval/Matching function into account in their modelling.
• Non-Classical IR Model — These differ from classical models in that they are built upon propositional logic. Examples of non-classical IR models include Information Logic, Situation Theory, and Interaction models.
• Alternative IR Model — These take the principles of the classical IR model and enhance them to create more functional models, such as the Cluster model, Alternative Set-Theoretic models (e.g., the Fuzzy Set model), the Latent Semantic Indexing (LSI) model, and Alternative Algebraic models (e.g., the Generalized Vector Space model).
Classical IR Models
• Boolean Model — This model requires information to be translated into a Boolean expression and Boolean queries. The latter are used to determine the information needed, providing the right match when the Boolean expression is found to be true. It uses the Boolean operations AND, OR, NOT to create a combination of multiple terms based on what the user asks.
• Vector Space Model — This model takes documents and queries denoted as vectors and retrieves documents depending on how similar they are, ranking search results by a similarity measure, typically cosine similarity (see the sketch after this list).
• Probability Distribution Model — In this model, the documents are considered as distributions of terms, and queries are matched based on the similarity of these representations. This is made possible using entropy or by computing the probable utility of the document. They are of two types.
• Probabilistic Model — The probabilistic model is rather simple and uses probability ranking to display results. To put it simply, documents are ranked based on the probability of their relevance to a searched query.
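A minimal sketch of the vector space model using scikit-learn; the documents and query are made up for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat",
        "dogs and cats are pets",
        "information retrieval ranks documents"]
query = ["cat on a mat"]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)   # each document becomes a TF-IDF vector
query_vector = vectorizer.transform(query)     # the query is mapped into the same space

scores = cosine_similarity(query_vector, doc_vectors)[0]
for i in scores.argsort()[::-1]:               # rank documents by similarity to the query
    print(f"{scores[i]:.3f}  {docs[i]}")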
Prerequisites for an IR model:
• An indexing system, automated or manually operated, together with the search techniques and procedures used on it.
• A collection of documents in any one of the following formats: text, image or multimedia.
• A set of queries that serve as the input to the system, from a human or a machine.
• An evaluation metric to measure the system’s effectiveness (for instance, precision and recall), i.e., how useful the information displayed to the user is.
Components of an Information Retrieval Model
• Acquisition: The IR system sources documents and multimedia information from a variety of web resources. This data is compiled by web crawlers and sent to database storage systems.
• Representation: The free-text terms are indexed and the vocabulary is sorted, using automated or manual procedures. For instance, a document abstract will contain a summary, meta description, bibliography, and details of the authors or co-authors.
• File Organization: File organization is carried out in one of two ways, sequential or inverted. Sequential file organization stores the data document by document. An inverted file comprises a list of records, organized term by term (see the sketch after this list).
• Query: An IR system is initiated by entering a query. User queries can be formal or informal statements highlighting what information is required. In IR systems, a query does not identify a single object in the database system; it may match several objects, with varying degrees of relevance.
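A minimal sketch of an inverted file (index), mapping each term to the documents that contain it; the documents are made up for illustration:

from collections import defaultdict

docs = {1: "the cat sat on the mat",
        2: "the dog chased the cat",
        3: "dogs and cats are pets"}

inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted[term].add(doc_id)   # term -> set of documents containing it

print(sorted(inverted["cat"]))       # [1, 2]
print(sorted(inverted["dog"]))       # [2]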
Information Extraction
• Information extraction is the process of extracting specific (pre-specified) information from textual sources. One of the most trivial examples is when your email client extracts only the data from a message that you need in order to add an event to your calendar.
• Other free-flowing textual sources from which information extraction can distill structured information are legal acts, medical records, social media interactions and streams, online news, government documents, corporate reports and more.
• By gathering detailed structured data from texts, information extraction enables:
• The automation of tasks such as smart content classification, integrated search, management and delivery;
• Data-driven activities such as mining for patterns and trends, uncovering hidden relationships, etc.
• To elaborate a bit on this minimalist way of describing information extraction, the process involves transforming an unstructured text or a collection of texts into sets of facts (i.e., formal, machine-readable statements of the type “Bukowski is the author of Post Office”) that are then used to populate a database (like an American Literature database).
• Typically, for structured information to be extracted from unstructured texts, the following main subtasks are involved:
• Pre-processing of the text – this is where the text is prepared for processing with the help of computational linguistics tools such as tokenization, sentence splitting, morphological analysis, etc.
• Finding and classifying concepts – this is where mentions of people, things, locations, events and other pre-specified types of concepts are detected and classified.
• Connecting the concepts – this is the task of identifying relationships between the extracted concepts.
• Unifying – this subtask is about presenting the extracted data in a standard form.
• Getting rid of the noise – this subtask involves eliminating duplicate data.
• Enriching your knowledge base – this is where the extracted knowledge is ingested into your database for further use.
Tokenization
• Computers usually won't understand the language we speak or communicate with. Hence, we break the language, basically the words and sentences, into tokens and then load them into a program. The process of breaking down language into tokens is called tokenization.
• For example, consider a simple sentence: "NLP information extraction is fun". This could be tokenized into:
• One-word tokens (sometimes called unigrams): NLP, information, extraction, is, fun
• Two-word phrases (bigram tokens): NLP information, information extraction, extraction is, is fun
• Three-word phrases (trigram tokens): NLP information extraction, information extraction is, extraction is fun
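A minimal sketch of this n-gram tokenization in plain Python; whitespace splitting is an assumption, since real tokenizers also handle punctuation:

def ngrams(tokens, n):
    # Slide a window of size n over the token list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "NLP information extraction is fun".split()
print(ngrams(tokens, 1))  # unigrams: ['NLP', 'information', 'extraction', 'is', 'fun']
print(ngrams(tokens, 2))  # bigrams:  ['NLP information', 'information extraction', ...]
print(ngrams(tokens, 3))  # trigrams: ['NLP information extraction', ...]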
POS Tagging
• Tagging parts of speech (POS) is crucial for information extraction from text, as it helps us understand the context of the text data. We usually refer to text from documents as "unstructured data": data with no defined structure or pattern.
• Hence, with POS tagging we can use techniques that provide the context of words or tokens and use it to categorise them in specific ways.
Dependency Graphs
• Dependency graphs help us find relationships between neighbouring words using directed graphs.
• Each relation provides details about the dependency type (e.g. subject, object, etc.).
• The original slides include a figure showing the dependency graph of a short sentence (not reproduced here).
• In that figure, the arrow directed from the word faster indicates that faster modifies moving, and the label `advmod` assigned to the arrow describes the exact nature of the dependency.
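As a rough sketch of POS tagging and dependency parsing together, assuming spaCy and its small English model are installed (python -m spacy download en_core_web_sm); the example sentence is made up:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The dog is moving faster than the cat.")

for token in doc:
    # word, part of speech, dependency label, and the head word it attaches to
    print(f"{token.text:8} {token.pos_:6} {token.dep_:8} -> {token.head.text}")
# Expected, e.g.: 'faster' tagged ADV with dependency 'advmod' -> 'moving'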
Applications
– Event extraction: Given an input document, output zero or more event templates. For instance, a newspaper article might describe multiple terrorist attacks.
– Named entity recognition: recognition of known entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions, by employing existing knowledge of the domain or information extracted from other sentences.
– Typically the recognition task involves assigning a unique identifier to the extracted entity. A simpler task is named entity detection, which aims at detecting entities without having any existing knowledge about the entity instances.
– For example, in processing the sentence "M. Smith likes fishing", named entity detection would mean detecting that the phrase "M. Smith" refers to a person, but without necessarily having (or using) any knowledge about a certain M. Smith who is (or "might be") the specific person that sentence is talking about.
– Coreference resolution: detection of coreference and anaphoric links between text entities. In IE tasks, this is typically restricted to finding links between previously extracted named entities. For example, "International Business Machines" and "IBM" refer to the same real-world entity. In the two sentences "M. Smith likes fishing. But he doesn't like biking", it would be beneficial to detect that "he" refers to the previously detected person "M. Smith".
– Relationship extraction: identification of relations between entities (a sketch follows after this list), such as:
• PERSON works for ORGANIZATION (extracted from the sentence "Bill works for IBM.")
• PERSON located in LOCATION (extracted from the sentence "Bill is in France.")
– Table extraction: finding and extracting tables from documents.
– Table information extraction: extracting information in a structured manner from tables. This is a more complex task than table extraction: table extraction is only the first step, while understanding the roles of the cells, rows and columns, linking the information inside the table, and understanding the information presented in the table are additional tasks necessary for table information extraction.
– Comments extraction: extracting comments from the actual content of an article in order to restore the link between the author and the content of each sentence.
– Terminology extraction: finding the relevant terms for a given corpus.
– Template-based music extraction: finding relevant characteristics in an audio signal taken from a given repertoire; for instance, time indexes of occurrences of percussive sounds can be extracted to represent the essential rhythmic component of a music piece.
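A rough sketch of rule-based relationship extraction for the PERSON-works-for-ORGANIZATION pattern, again assuming spaCy's small English model; a real system would use richer patterns or a trained classifier, and the entity labels the model assigns may vary:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Bill works for IBM. Mary lives in France.")

for sent in doc.sents:
    people = [e.text for e in sent.ents if e.label_ == "PERSON"]
    orgs = [e.text for e in sent.ents if e.label_ == "ORG"]
    # Naive trigger-word rule: a PERSON and an ORG in a sentence containing "works for"
    if people and orgs and "works for" in sent.text:
        print(f"WORKS_FOR({people[0]}, {orgs[0]})")  # e.g. WORKS_FOR(Bill, IBM)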
NLP
• NLP stands for Natural Language Processing, a field at the intersection of Computer Science, human language, and Artificial Intelligence. It is the technology used by machines to understand, analyse, manipulate, and interpret human languages.
• It helps developers organize knowledge for performing tasks such as translation, automatic summarization, Named Entity Recognition (NER), speech recognition, relationship extraction, and topic segmentation.
• NLP helps users ask questions about any subject and get a direct response within seconds. It also offers exact answers to the question, without unnecessary or unwanted information.
• NLP helps computers communicate with humans in their own languages.
• Many companies use NLP to improve the efficiency and accuracy of documentation processes and to identify information in large databases.
• We, as humans, perform natural language processing (NLP) considerably well, but
even then, we are not perfect. We often misunderstand one thing for another, and we
often interpret the same sentences or words differently.
Applications
• Speech recognition, also called speech-to-text, is the task of reliably converting voice data into text data. Speech recognition is required for any application that follows voice commands or answers spoken questions. What makes speech recognition especially challenging is the way people talk: quickly, slurring words together, with varying emphasis and intonation, in different accents, and often using incorrect grammar.
• Part-of-speech tagging, also called grammatical tagging, is the process of determining the part of speech of a particular word or piece of text based on its use and context. Part-of-speech tagging identifies ‘make’ as a verb in ‘I can make a paper plane,’ and as a noun in ‘What make of car do you own?’
• Word sense disambiguation is the selection of the meaning of a word with multiple meanings through a process of semantic analysis that determines which sense makes the most sense in the given context. For example, word sense disambiguation helps distinguish the meaning of the verb ‘make’ in ‘make the grade’ (achieve) vs. ‘make a bet’ (place).
• Co-reference resolution is the task of identifying if and when two words refer to the same entity. The most common example is determining the person or object to which a certain pronoun refers (e.g., ‘she’ = ‘Mary’), but it can also involve identifying a metaphor or an idiom in the text (e.g., an instance in which ‘bear’ isn't an animal but a large hairy person).
• Sentiment analysis attempts to extract subjective qualities such as attitudes, emotions, sarcasm, confusion, and suspicion from text.
• Natural language generation is sometimes described as the opposite of speech recognition or speech-to-text; it is the task of putting structured information into human language.
• Named entity recognition (NER) identifies words or phrases as useful entities. NER identifies ‘Kentucky’ as a location or ‘Fred’ as a man's name.
• Sentiment analysis is the process of analyzing the emotions within a text and classifying them as positive, negative, or neutral.
• By running sentiment analysis on social media posts, product reviews, NPS surveys, and customer feedback, businesses can gain valuable insights into how customers perceive their brand. (The original slides illustrate this with Zoom customer and product reviews, not reproduced here.)
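A minimal sentiment-analysis sketch using NLTK's VADER analyzer (assumes nltk is installed); the review texts are made up, not the Zoom reviews from the slides:

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

reviews = ["The video quality is fantastic and setup was easy!",
           "Constant disconnects made the meeting unusable."]
for text in reviews:
    scores = sia.polarity_scores(text)       # neg/neu/pos plus a compound score
    label = ("positive" if scores["compound"] > 0.05
             else "negative" if scores["compound"] < -0.05 else "neutral")
    print(f"{label:8} {scores['compound']:+.2f}  {text}")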
Use Cases
• Spam detection: You may not think of spam detection as an NLP solution, but the best spam detection technologies use NLP's text classification capabilities to scan emails for language that often indicates spam or phishing. These indicators can include overuse of financial terms, characteristic bad grammar, threatening language, inappropriate urgency, misspelled company names, and more. Spam detection is one of a handful of NLP problems that experts consider 'mostly solved' (although you may argue that this doesn’t match your email experience).
• Machine translation: Google Translate is an example of widely available NLP technology at work. Truly useful machine translation involves more than replacing words in one language with words of another. Effective translation has to capture accurately the meaning and tone of the input language and translate it to text with the same meaning and desired impact in the output language. Machine translation tools are making good progress in terms of accuracy. A great way to test any machine translation tool is to translate text to one language and then back to the original.
• Virtual agents and chatbots: Virtual agents such as Apple's Siri and Amazon's
Alexa use speech recognition to recognize patterns in voice commands and natural
language generation to respond with appropriate action or helpful comments.
Chatbots perform the same magic in response to typed text entries. The best of these
also learn to recognize contextual clues about human requests and use them to
provide even better responses or options over time. The next enhancement for these
applications is question answering, the ability to respond to our questions—
anticipated or not—with relevant and helpful answers in their own words.
• Social media sentiment analysis: NLP has become an essential business tool for
uncovering hidden data insights from social media channels. Sentiment analysis can
analyze language used in social media posts, responses, reviews, and more to extract
attitudes and emotions in response to products, promotions, and events; information
companies can use in product designs, advertising campaigns, and more.
• Text summarization: Text summarization uses NLP techniques to digest huge
volumes of digital text and create summaries and synopses for indexes, research
databases, or busy readers who don't have time to read full text. The best text
summarization applications use semantic reasoning and natural language generation
(NLG) to add useful context and conclusions to summaries.
Natural Language Processing is separated into two different approaches:
Rule-based Natural Language Processing:
• It uses common-sense reasoning for processing tasks. For instance, freezing temperatures can lead to death, or hot coffee can burn people's skin, along with other common-sense reasoning tasks. However, this process can take much time, and it requires manual effort.
Statistical Natural Language Processing:
• It uses large amounts of data and tries to derive conclusions from it. Statistical NLP uses machine learning algorithms to train NLP models. After successful training on large amounts of data, the trained model will have positive outcomes with deduction.
NLP Components
a. Lexical Analysis:
• With lexical analysis, we divide a whole chunk of text into paragraphs, sentences, and words. It involves identifying and analyzing the structure of words.
b. Syntactic Analysis:
• Syntactic analysis involves the analysis of the words in a sentence for grammar, and arranging the words in a manner that shows the relationships among them. For instance, the sentence “The shop goes to the house” does not pass.
c. Semantic Analysis:
• Semantic analysis draws the exact meaning of the words, and it analyzes whether the text is meaningful. Phrases such as “hot ice-cream” do not pass.
d. Discourse Integration:
• Discourse integration takes into account the context of the text: the meaning of a sentence may depend on the sentences that precede it. For example, in “He works at Google.”, “he” must be resolved using the sentence before it.
e. Pragmatic Analysis:
• Pragmatic analysis deals with the overall communication and interpretation of language. It deals with deriving the meaningful use of language in various situations.
Machine Translation (MT)
• Machine translation (MT) is automated translation: the process by which computer software is used to translate a text from one natural language (such as English) to another (such as Spanish).
• To process any translation, human or automated, the meaning of a text in the original (source) language must be fully rendered in the target language, i.e. the translation. While on the surface this seems straightforward, it is far more complex. Translation is not a mere word-for-word substitution. A translator must interpret and analyze all of the elements in the text and know how each word may influence another. This requires extensive expertise in grammar, syntax (sentence structure), semantics (meanings), etc., in the source and target languages, as well as familiarity with each local region.
• Human and machine translation each have their share of challenges. For example, no two individual translators will produce identical translations of the same text in the same language pair, and it may take several rounds of revisions to meet customer satisfaction. But the greater challenge lies in how machine translation can produce publishable-quality translations.
Rule-Based Machine Translation Technology
• Rule-based machine translation relies on countless built-in linguistic rules and millions of bilingual dictionaries for each language pair.
• The software parses text and creates a transitional representation from which the text in the target language is generated. This process requires extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules. The software uses these complex rule sets and then transfers the grammatical structure of the source language into the target language.
• Translations are built on gigantic dictionaries and sophisticated linguistic rules. Users can improve the out-of-the-box translation quality by adding their terminology into the translation process: they create user-defined dictionaries which override the system's default settings.
• In most cases, there are two steps: an initial investment that significantly increases the quality at a limited cost, and an ongoing investment to increase quality incrementally. While rule-based MT brings companies to the quality threshold and beyond, the quality improvement process may be long and expensive.
• Rule-based MT, the earliest form of MT, has several serious disadvantages, including requiring significant amounts of human post-editing, the need to add languages manually, and low quality in general. It has some uses in very basic situations where a quick understanding of meaning is required.
Statistical Machine Translation
• Statistical machine translation utilizes statistical translation models whose parameters stem from the analysis of monolingual and bilingual corpora.
• Building statistical translation models is a quick process, but the technology relies heavily on existing multilingual corpora.
• A minimum of 2 million words for a specific domain, and even more for general language, are required. Theoretically it is possible to reach the quality threshold, but most companies do not have such large amounts of existing multilingual corpora to build the necessary translation models.
• Additionally, statistical machine translation is CPU-intensive and requires an extensive hardware configuration to run translation models at average performance levels.
Neural Machine Translation
• Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model. It has the ability to persist sequential data over several time steps.
• Encoder: reads the input sequence of words from the source language and encodes that information into a real-valued vector, also known as the hidden state, thought vector or context vector. The thought vector encodes the “meaning” of the input sequence into a single vector. The encoder outputs are discarded and only the hidden or internal states are passed as initial inputs to the decoder.
• Decoder: takes the thought vector from the encoder as an input, along with the start-of-string <START> token as the initial input, to produce an output sequence.
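A minimal encoder-decoder sketch in PyTorch; the vocabulary sizes and dimensions are arbitrary assumptions, and training is omitted:

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, src_vocab=1000, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)

    def forward(self, src_ids):
        # Encoder outputs are discarded; only the final hidden and cell
        # states (the "thought vector") are passed on to the decoder.
        _, (h, c) = self.lstm(self.embed(src_ids))
        return h, c

class Decoder(nn.Module):
    def __init__(self, tgt_vocab=1000, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, tgt_ids, state):
        # Starts from the <START> token, conditioned on the thought vector.
        output, state = self.lstm(self.embed(tgt_ids), state)
        return self.out(output), state

encoder, decoder = Encoder(), Decoder()
src = torch.tensor([[4, 17, 9, 2]])        # source-language token ids
start = torch.tensor([[1]])                # id of the <START> token
logits, _ = decoder(start, encoder(src))   # scores for the first output word
print(logits.shape)                        # torch.Size([1, 1, 1000])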
• LSTM stands for Long Short-Term Memory; it is capable of learning long-term dependencies quickly. LSTMs can learn to bridge time intervals in excess of 1000 steps.
• LSTMs remember information over long spans of time steps. They do this by deciding what to remember and what to forget.
• Cell states play a key role in LSTMs. An LSTM can decide whether to add or remove information from a cell state by determining how much information should flow through.
• The basic units of LSTM networks are LSTM layers that have multiple LSTM cells. Cells have an internal cell state, often abbreviated as "c", and a cell's output is what is called a "hidden state", abbreviated as "h".
Speech Recognition
• Speech recognition refers to a computer interpreting the words spoken by a person and converting them to a format that is understandable by a machine. Depending on the end goal, it is then converted to text or voice or another required format.
• For instance, Apple's Siri and Amazon's Alexa use AI-powered speech recognition to provide voice or text support, whereas voice-to-text applications like Google Dictate transcribe your dictated words to text. Voice recognition is another form of speech recognition where a source sound is recognized and matched to a person's voice.
• Speech recognition AI applications have seen significant growth in numbers in recent times as businesses increasingly adopt digital assistants and automated support to streamline their services. Voice assistants, smart home devices, search engines, etc. are a few examples where speech recognition has seen prominence.
• Using artificial intelligence and machine learning, speech recognition is fast overcoming the challenges of poor recording equipment and background noise, and of variations in people's voices, accents, dialects, semantics, contexts, etc.
• This also includes the challenges of understanding human disposition and varying human language elements like colloquialisms and acronyms.
• Furthermore, it is now an acceptable format of communication given the large
companies that endorse it and regularly employ speech recognition in their
operations. It is estimated that a majority of search engines will adopt voice
technology as an integral aspect of their search mechanism.
• This has been made possible because of improved AI and machine learning
(ML) algorithms which can process significantly large datasets and provide
greater accuracy by self-learning and adapting to evolving changes.
• Machines are programmed to “listen” to accents, dialects, contexts, emotions
and process sophisticated and arbitrary data that is readily accessible for mining
and machine learning purposes.
• From smart home devices and appliances that take instructions, and
can be switched on and off remotely, digital assistants that can set
reminders, schedule meetings, recognize a song playing in a pub, to
search engines that respond with relevant search results to user
queries, speech recognition has become an indispensable part of our
lives.
• Plenty of businesses now include speech-to-text software to enhance
their business applications and streamline the customer experience.
• Using speech recognition and natural language processing,
companies can transcribe calls, meetings, and even translate them.
• Apple, Google, Facebook, Microsoft, and Amazon are among the
tech giants who continue to leverage AI-backed speech recognition
applications to provide an exemplary user experience.
• The best systems also allow organizations to customize and adapt the technology to their specific requirements, everything from language and nuances of speech to brand recognition. For example:
• Language weighting: Improve precision by weighting specific words
that are spoken frequently (such as product names or industry jargon),
beyond terms already in the base vocabulary.
• Speaker labeling: Output a transcription that cites or tags each
speaker’s contributions to a multi-participant conversation.
• Acoustics training: Attend to the acoustical side of the business. Train
the system to adapt to an acoustic environment (like the ambient noise in
a call center) and speaker styles (like voice pitch, volume and pace).
• Profanity filtering: Use filters to identify certain words or phrases and
sanitize speech output.
USE CASES
• Voice-based speech recognition software is now used to initiate purchases, send emails, and transcribe meetings, doctor appointments, court proceedings, etc.
• Virtual assistants or digital assistants and smart home devices use voice recognition software to answer questions, provide weather news, play music, check traffic, place an order, and so on.
• Companies like Venmo and PayPal allow customers to make transactions using voice assistants. Several banks in North America and Canada also provide online banking using voice-based software.
• Ecommerce is significantly powered by voice-based assistants and allows users to make purchases quickly and seamlessly.
• Speech recognition is poised to impact transportation services and streamline scheduling, routing, and navigating across cities.
• It is also used to provide accurate subtitles for videos.
• Podcasts, meetings, and journalist interviews can be transcribed using voice recognition.
• There has been a huge impact on security through voice biometry, where the technology analyses the varying frequencies, tone and pitch of an individual's voice to create a voice profile.
• An example of this is Switzerland's telecom company Swisscom, which has enabled voice authentication technology in its call centres to prevent security breaches.
• Customer care services increasingly use AI-based voice assistants and chatbots to automate repeatable tasks.
• Healthcare: Doctors and nurses leverage dictation applications to
capture and log patient diagnoses and treatment notes.
• Sales: Speech recognition technology has a couple of applications in sales. It can help a call center transcribe thousands of phone calls between customers and agents to identify common call patterns and issues.
• Cognitive bots can also talk to people via a webpage, answering
common queries and solving basic requests without needing to wait for a
contact center agent to be available. In both instances speech recognition
systems help reduce time to resolution for consumer issues.
• Security: As technology integrates into our daily lives, security
protocols are an increasing priority. Voice-based authentication adds a
viable level of security.
• Natural language processing (NLP): While NLP isn't necessarily a specific algorithm used in speech recognition, it is the area of artificial intelligence that focuses on the interaction between humans and machines through language, via speech and text. Many mobile devices incorporate speech recognition into their systems to conduct voice search (e.g. Siri) or provide more accessibility around texting.
• N-grams: This is the simplest type of language model (LM), which assigns probabilities to sentences or phrases. An N-gram is a sequence of N words. For example, "order the pizza" is a trigram or 3-gram and "please order the pizza" is a 4-gram. Grammar and the probability of certain word sequences are used to improve recognition accuracy.
• Neural networks: Primarily leveraged for deep learning algorithms, neural networks
process training data by mimicking the interconnectivity of the human brain through
layers of nodes. Each node is made up of inputs, weights, a bias (or threshold) and an
output.
• If that output value exceeds a given threshold, it “fires” or activates the node, passing
data to the next layer in the network. Neural networks learn this mapping function
through supervised learning, adjusting based on the loss function through the process of
gradient descent. While neural networks tend to be more accurate and can accept more
data, this comes at a performance efficiency cost as they tend to be slower to train
compared to traditional language models.
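A minimal sketch of a single node (neuron) as described above; the input values, weights, bias, and step activation are illustrative assumptions:

# One node: weighted sum of inputs plus a bias; it "fires" (passes data to
# the next layer) only when the sum exceeds the threshold of 0.
inputs = [0.5, 0.3, 0.9]
weights = [0.4, -0.2, 0.7]
bias = -0.5

z = sum(x * w for x, w in zip(inputs, weights)) + bias
output = 1 if z > 0 else 0   # step activation: fire only above the threshold
print(z, output)             # 0.77 - 0.5 = 0.27 -> fires (1)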
More Related Content

PPTX
CS8691 – Artificial Intelligence unit questions
PDF
Language Models for Information Retrieval
PPTX
Tdm information retrieval
PPTX
Language Models for Information Retrieval
PPTX
Chapter 1.pptx
PDF
Chapter 1 Introduction to Information Storage and Retrieval.pdf
PPTX
Information retrival system and PageRank algorithm
PDF
ICDIM 06 Web IR Tutorial [Compatibility Mode].pdf
CS8691 – Artificial Intelligence unit questions
Language Models for Information Retrieval
Tdm information retrieval
Language Models for Information Retrieval
Chapter 1.pptx
Chapter 1 Introduction to Information Storage and Retrieval.pdf
Information retrival system and PageRank algorithm
ICDIM 06 Web IR Tutorial [Compatibility Mode].pdf

Similar to Unit_4- Principles of AI explaining the importants of AI (20)

PPT
Information Retrieval and Storage Systems
PPTX
Information retrieval introduction
PPTX
PPT Unit 5=software- engineering-21.pptx
PDF
Information Retrieval Fundamentals - An introduction
PPT
2-Chapter Two-N-gram Language Models.ppt
PPTX
Introduction to Information Retrieval (concepts and principles)
PPTX
Lecture 01: Machine Learning for Language Technology - Introduction
PPT
Language Modeling Putting a curve to the bag of words
PDF
Is this document relevant probably
PDF
Chapter 1 Introduction to ISR (1).pdf
DOCX
Language Modeling.docx
PPTX
Chapter 1 Intro Information Rerieval.pptx
PPT
Information retrival system it is part and parcel
PPT
information retirval system,search info insights in unsturtcured data
PDF
Natural_Language_processing_Unit_2_notes.pdf
PPTX
information retrival in natural language processing.pptx
PDF
Information_Retrieval_Models_Nfaoui_El_Habib
PDF
CS8080_IRT__UNIT_I_NOTES.pdf
PDF
Information Retrieval and Storage Systems
Information retrieval introduction
PPT Unit 5=software- engineering-21.pptx
Information Retrieval Fundamentals - An introduction
2-Chapter Two-N-gram Language Models.ppt
Introduction to Information Retrieval (concepts and principles)
Lecture 01: Machine Learning for Language Technology - Introduction
Language Modeling Putting a curve to the bag of words
Is this document relevant probably
Chapter 1 Introduction to ISR (1).pdf
Language Modeling.docx
Chapter 1 Intro Information Rerieval.pptx
Information retrival system it is part and parcel
information retirval system,search info insights in unsturtcured data
Natural_Language_processing_Unit_2_notes.pdf
information retrival in natural language processing.pptx
Information_Retrieval_Models_Nfaoui_El_Habib
CS8080_IRT__UNIT_I_NOTES.pdf
Ad

More from VijayAECE1 (7)

PDF
Answer key for pattern recognition and machine learning
PDF
Know this information design for department Library
PPTX
How to use SIMULINK.pptx
PPTX
UJT.pptx
PPT
Tunnel Diode.ppt
PPT
SEMICONDUCTOR PHYSICS.ppt
PPTX
Clipper Circuit.pptx
Answer key for pattern recognition and machine learning
Know this information design for department Library
How to use SIMULINK.pptx
UJT.pptx
Tunnel Diode.ppt
SEMICONDUCTOR PHYSICS.ppt
Clipper Circuit.pptx
Ad

Recently uploaded (20)

PPTX
UNIT 4 Total Quality Management .pptx
PDF
composite construction of structures.pdf
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PDF
Human-AI Collaboration: Balancing Agentic AI and Autonomy in Hybrid Systems
PDF
PPT on Performance Review to get promotions
PPTX
Geodesy 1.pptx...............................................
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
DOCX
573137875-Attendance-Management-System-original
PPTX
Construction Project Organization Group 2.pptx
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PDF
III.4.1.2_The_Space_Environment.p pdffdf
PPTX
Internet of Things (IOT) - A guide to understanding
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PPTX
OOP with Java - Java Introduction (Basics)
UNIT 4 Total Quality Management .pptx
composite construction of structures.pdf
CYBER-CRIMES AND SECURITY A guide to understanding
Human-AI Collaboration: Balancing Agentic AI and Autonomy in Hybrid Systems
PPT on Performance Review to get promotions
Geodesy 1.pptx...............................................
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
573137875-Attendance-Management-System-original
Construction Project Organization Group 2.pptx
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
III.4.1.2_The_Space_Environment.p pdffdf
Internet of Things (IOT) - A guide to understanding
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
OOP with Java - Java Introduction (Basics)

Unit_4- Principles of AI explaining the importants of AI

  • 1. SRI RAMAKRISHNA ENGINEERING COLLEGE Course Instructors: Mr.J.Judeson Antony Kovilpillai, Asst. Prof./ECE No. of Credits : 3 20AI202 PRINCIPLES OF ARTIFICIAL INTELLIGENCE Department of Electronics and Communication Engineering [Educational Service : SNR Sons Charitable Trust] [Autonomous Institution, Accredited by NAAC with ‘A’ Grade] [Approved by AICTE and Permanently Affiliated to Anna University, Chennai] [ISO 9001-2015 Certified and all eligible programmes Accredited by NBA] VATTAMALAIPALAYAM, N.G.G.O. COLONY POST, COIMBATORE – 641 022. 2/23/2024 1
  • 2. VISION OF THE COLLEGE Vision of the College: • To develop into a leading world class Technological University consisting of Schools of Excellence in various disciplines with a co-existent Centre for Engineering Solutions Development for world-wide clientele. 2/23/2024 2
  • 3. MISSION OF THE COLLEGE Mission of the College: To provide all necessary inputs to the students for them to grow into knowledge engineers and scientists attaining. • Excellence in domain knowledge- practice and theory. • Excellence in co-curricular and Extra curricular talents. • Excellence in character and personality. 2/23/2024 3
  • 4. VISION OF THE DEPARTMENT Vision of the Department: • To develop Electronics and Communication Engineers by keeping pace with changing technologies, professionalism, creativity research and employability. 2/23/2024 4
  • 5. MISSION OF THE DEPARTMENT Mission of the Department: • To provide quality an contemporary education through effective teaching- learning process that equips the students with adequate knowledge in Electronics and Communication Engineering for a successful career. • To inculcate the students in problem solving and lifelong learning skills that will enable them to pursue higher studies and career in research. • To produce engineers with effective communication skills, the abilities to lead a team adhering to ethical values and inclination serve the society. 2/23/2024 5
  • 6. On successful completion of the course, students will be able to CO1 : Understand the basics of Artificial Intelligence and Intelligent Agents. CO2 : Apply the problem-solving strategies for real life scenario applications CO3 : Make use of Machine learning techniques for problem solving CO4 : Analyze the various applications of Artificial Intelligence COURSE OUTCOMES 2/23/2024 6
  • 7. • A simple definition of a Language Model is an AI model that has been trained to predict the next word or words in a text based on the preceding words, its part of the technology that predicts the next word you want to type on your mobile phone allowing you to complete the message faster. • The task of predicting the next word/s is referred to as self-supervised learning, it does not need labels it just needs lots of text • There is a broad classification of Language Models that fit into two main groups that are: • Statistical Language Models: These models use traditional statistical techniques like N-grams, Hidden Markov Models (HMM) and certain linguistic rules to learn the probability distribution of words. • Neural Language Models: These are new players in the NLP town and have surpassed the statistical language models in their effectiveness. They use different kinds of Neural Networks to model language. 2/23/2024 7 Language Models
  • 8. • Let’s begin with the task of computing P(w|H) — probability of word ‘w’, given some history ‘H’. • Suppose the ‘H’ is ‘its water is so transparent that’, and we want to know the probability of next word ‘the’: P(the|its water is so transparent that). • One way to estimate this probability — relative frequency counts. • Take a large corpus, count the number of time ‘its water is so transparent that’ and also count the number of times it has been followed by ‘the’. 2/23/2024 8 N- Gram Model
  • 9. • While this method of estimating probabilities directly from counts works fine in many cases, it turns out that even the web isn’t big enough to give us good estimates in most cases. Why? Because language is creative, and there are new sentences added everyday, which we wont be able to count. • For this reason’s we need to introduce cleverer ways to calculate the probability of word w given history H. • To represent the probability of a particular random variable Xi taking on the value “the”, or P(Xi = “the”), we will use the simplification P(the). • We’ll represent a sequence of N words either as w1 . . . wn or wn (so the expression wn−1 means the string w1,w2,…,wn−1). For the joint probability of each word in a sequence having a particular value P(X = w1,Y = w2,Z = w3,…,W = wn) we’ll use P(w1,w2,…,wn). 2/23/2024 9
  • 10. • The intuition of the n-gram model is that instead of computing the probability of a word given its entire history, we can approximate the history by just the last few words. • In the bigram (n = 2) language model the sentence “I saw the red house” is approximated as: • P(I, saw, the, red, house) ≈ P(I|‹s›)P(saw | I)P(the | saw)P(red | the)P(house | red)P(‹/s› | house) • In a trigram (n = 3) language model it will approximate as: • P(I, saw, the, red, house) ≈ P(I|‹s›, ‹s›)P(saw | ‹s›, I)P(the | I, saw)P(red | saw, the) P(house | the, red)P(‹/s› | red, house) Note: ‹s› is a marker denoting the beginning and end of the sentence. 2/23/2024 10
  • 18. • Neural Language Models do two things: • Step 1: Process context → model-specific  The main idea here is to get a vector representation for the previous context.  Using this representation, a model predicts a probability distribution for the next token.  This part could be different depending on model architecture (e.g., RNN, CNN, whatever you want), but the main point is the same - to encode context. • Step 2: Generate a probability distribution for the next token 2/23/2024 18 Neural Language Models
  • 26. • An information retrieval (IR) system is a set of algorithms that facilitate the relevance of displayed documents to searched queries. • In simple words, it works to sort and rank documents based on the queries of a user. There is uniformity with respect to the query and text in the document to enable document accessibility. • This also allows a matching function to be used effectively to rank a document formally using their Retrieval Status Value (RSV). • The document contents are represented by a collection of descriptors, known as terms, that belong to a vocabulary V. An IR system also extracts feedback on the usability of the displayed results by tracking the user’s behaviour. 2/23/2024 26 Information retrieval (IR) system
  • 27. An information retrieval comprises of the following four key elements: • D − Document Representation. • Q − Query Representation. • F − A framework to match and establish a relationship between D and Q. • R (q, di) − A ranking function that determines the similarity between the query and the document to display relevant information. 2/23/2024 27
  • 28. There are three types of Information Retrieval (IR) models: • Classical IR Model — It is designed upon basic mathematical concepts and is the most widely-used of IR models. Classic Information Retrieval models can be implemented with ease. Its examples include Vector-space, Boolean and Probabilistic IR models. In this system, the retrieval of information depends on documents containing the defined set of queries. There is no ranking or grading of any kind. The different classical IR models take Document Representation, Query representation, and Retrieval/Matching function into account in their modelling. • Non-Classical IR Model — They differ from classic models in that they are built upon propositional logic. Examples of non-classical IR models include Information Logic, Situation Theory, and Interaction models. • Alternative IR Model — These take principles of classical IR model and enhance upon to create more functional models like the Cluster model, Alternative Set-Theoretic Models Fuzzy Set model, Latent Semantic Indexing (LSI) model, Alternative Algebraic Models Generalized Vector Space Model, etc. 2/23/2024 28
  • 29. • Boolean Model — This model required information to be translated into a Boolean expression and Boolean queries. The latter is used to determine the information needed to be able to provide the right match when the Boolean expression is found to be true. It uses Boolean operations AND, OR, NOT to create a combination of multiple terms based on what the user asks. • Vector Space Model — This model takes documents and queries denoted as vectors and retrieves documents depending on how similar they are. This can result in two types of vectors which are then used to rank search results either • Probability Distribution Model — In this model, the documents are considered as distributions of terms and queries are matched based on the similarity of these representations. This is made possible using entropy or by computing the probable utility of the document. They are if two types: • Probabilistic Models — The probabilistic model is rather simple and takes the probability ranking to display results. To put it simply, documents are ranked based on the probability of their relevance to a searched query. 2/23/2024 29 Classical IR Models
  • 30. Prerequisites for an IR model: • An automated or manually-operated indexing system used to index and search techniques and procedures. • A collection of documents in any one of the following formats: text, image or multimedia. • A set of queries that serve as the input to a system, via a human or machine. • An evaluation metric to measure or evaluate a system’s effectiveness (for instance, precision and recall). For instance, to ensure how useful the information displayed to the user is. 2/23/2024 30
  • 31. • Acquisition The IR system sources documents and multimedia information from a variety of web resources. This data is compiled by web crawlers and is sent to database storage systems. • Representation The free-text terms are indexed, and the vocabulary is sorted, both using automated or manual procedures. For instance, a document abstract will contain a summary, meta description, bibliography, and details of the authors or co- authors. • File Organization File organization is carried out in one of two methods, sequential or inverted. Sequential file organization involves data contained in the document. The Inverted file comprises a list of records, in a term by term manner. • Query An IR system is initiated on entering a query. User queries can either be formal or informal statements highlighting what information is required. In IR systems, a query is not indicative of a single object in the database system. It could refer to several objects whichever match the query. However, their degrees of relevance may vary. 2/23/2024 31 Components of an Information Retrieval Model
  • 35. • Information extraction is the process of extracting specific (pre- specified) information from textual sources. One of the most trivial examples is when your email extracts only the data from the message for you to add in your Calendar. • Other free-flowing textual sources from which information extraction can distill structured information are legal acts, medical records, social media interactions and streams, online news, government documents, corporate reports and more. • Gathering detailed structured data from texts, information extraction enables: • The automation of tasks such as smart content classification, integrated search, management and delivery; • Data-driven activities such as mining for patterns and trends, uncovering hidden relationships, etc. 2/23/2024 35 Information extraction
  • 37. • To elaborate a bit on this minimalist way of describing information extraction, the process involves transforming an unstructured text or a collection of texts into sets of facts (i.e., formal, machine-readable statements of the type “Bukowski is the author of Post Office“) that are further populated (filled) in a database (like an American Literature database). • Typically, for structured information to be extracted from unstructured texts, the following main subtasks are involved: • Pre-processing of the text – this is where the text is prepared for processing with the help of computational linguistics tools such as tokenization, sentence splitting, morphological analysis, etc. • Finding and classifying concepts – this is where mentions of people, things, locations, events and other pre-specified types of concepts are detected and classified. 2/23/2024 37
  • 38. • Connecting the concepts – this is the task of identifying relationships between the extracted concepts. • Unifying – this subtask is about presenting the extracted data into a standard form. • Getting rid of the noise – this subtask involves eliminating duplicate data. • Enriching your knowledge base – this is where the extracted knowledge is ingested in your database for further use. 2/23/2024 38
  • 39. • Computers usually won't understand the language we speak or communicate with. Hence, we break the language, basically the words and sentences, into tokens and then load it into a program. The process of breaking down language into tokens is called tokenization. • For example, consider a simple sentence: "NLP information extraction is fun''. This could be tokenized into: • One-word (sometimes called unigram token): NLP, information, extraction, is, fun • Two-word phrase (bigram tokens): NLP information, information extraction, extraction is, is fun, fun NLP • Three-word sentence (trigram tokens): NLP information extraction, information extraction is, extraction is fun 2/23/2024 39 Tokenization
POS tagging
• Tagging parts of speech is crucial for information extraction from text, because it helps us understand the context of the text data. Text from documents is usually referred to as "unstructured data", data with no defined structure or pattern.
• With POS tagging we can capture the grammatical context of words or tokens and use it to categorise them in specific ways.
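As a quick illustration, NLTK's off-the-shelf tagger can be used (this assumes nltk is installed and the 'punkt' and 'averaged_perceptron_tagger' resources have been downloaded):

```python
import nltk

tokens = nltk.word_tokenize("NLP information extraction is fun")
print(nltk.pos_tag(tokens))
# e.g. [('NLP', 'NNP'), ('information', 'NN'), ('extraction', 'NN'), ('is', 'VBZ'), ('fun', 'NN')]
```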
Dependency graphs
• Dependency graphs help us find relationships between neighbouring words using directed graphs.
• Each relation carries a dependency type (e.g. subject, object, etc.).
• In the example dependency graph shown on the slide, an arrow directed from the word "faster" indicates that "faster" modifies "moving", and the label `advmod` assigned to the arrow describes the exact nature of the dependency.
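One way to inspect such dependencies in practice is spaCy's parser; a sketch, assuming the en_core_web_sm model has been installed (the sentence is illustrative):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The car is moving faster")
for token in doc:
    # dep_ is the dependency label (e.g. 'advmod'); head is the governing word
    print(f"{token.text:>7} --{token.dep_}--> {token.head.text}")
```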
Applications
• Event extraction: given an input document, output zero or more event templates. For instance, a newspaper article might describe multiple terrorist attacks.
• Named entity recognition: recognition of known entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions, employing existing knowledge of the domain or information extracted from other sentences. Typically the recognition task involves assigning a unique identifier to the extracted entity. A simpler task is named entity detection, which aims to detect entities without any existing knowledge about the entity instances. For example, in processing the sentence "M. Smith likes fishing", named entity detection would detect that the phrase "M. Smith" refers to a person, without necessarily having (or using) any knowledge about the specific M. Smith the sentence is talking about.
• Coreference resolution: detection of coreference and anaphoric links between text entities. In IE tasks, this is typically restricted to finding links between previously extracted named entities. For example, "International Business Machines" and "IBM" refer to the same real-world entity. Given the two sentences "M. Smith likes fishing. But he doesn't like biking", it is beneficial to detect that "he" refers to the previously detected person "M. Smith".
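A short named-entity-recognition sketch with spaCy (same en_core_web_sm assumption; the exact labels produced depend on the model):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("M. Smith works for International Business Machines in New York.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. PERSON, ORG, GPE
```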
• Relationship extraction: identification of relations between entities, such as:
• PERSON works for ORGANIZATION (extracted from the sentence "Bill works for IBM.")
• PERSON located in LOCATION (extracted from the sentence "Bill is in France.")
• Table extraction: finding and extracting tables from documents.
• Table information extraction: extracting information from tables in a structured manner. This is a more complex task than table extraction, which is only the first step; understanding the roles of the cells, rows and columns, linking the information inside the table, and understanding the information the table presents are additional tasks required for table information extraction.
• Comments extraction: separating comments from the actual content of an article in order to restore the link between each comment and its author.
• Terminology extraction: finding the relevant terms for a given corpus.
• Template-based music extraction: finding relevant characteristics in an audio signal taken from a given repertoire; for instance, the time indexes of occurrences of percussive sounds can be extracted to represent the essential rhythmic component of a music piece.
NLP
• NLP stands for Natural Language Processing, a field at the intersection of Computer Science, human language, and Artificial Intelligence. It is the technology used by machines to understand, analyse, manipulate, and interpret human languages.
• It helps developers organize knowledge for performing tasks such as translation, automatic summarization, Named Entity Recognition (NER), speech recognition, relationship extraction, and topic segmentation.
• NLP lets users ask questions about any subject and get a direct response within seconds. It offers exact answers to the question, without unnecessary or unwanted information.
• NLP helps computers communicate with humans in their own languages.
• Many companies use NLP to improve the efficiency and accuracy of documentation processes and to identify information in large databases.
• We, as humans, perform natural language processing (NLP) considerably well, but even then we are not perfect. We often mistake one thing for another, and we often interpret the same sentences or words differently.
Applications
• Speech recognition, also called speech-to-text, is the task of reliably converting voice data into text data. Speech recognition is required for any application that follows voice commands or answers spoken questions. What makes speech recognition especially challenging is the way people talk: quickly, slurring words together, with varying emphasis and intonation, in different accents, and often using incorrect grammar.
• Part-of-speech tagging, also called grammatical tagging, is the process of determining the part of speech of a particular word or piece of text based on its use and context. It identifies 'make' as a verb in 'I can make a paper plane,' and as a noun in 'What make of car do you own?'
• Word sense disambiguation is the selection of the intended meaning of a word with multiple meanings, through a process of semantic analysis that determines which sense makes the most sense in the given context. For example, word sense disambiguation helps distinguish the meaning of the verb 'make' in 'make the grade' (achieve) vs. 'make a bet' (place).
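Word sense disambiguation can be tried with the classic Lesk algorithm shipped in NLTK; a sketch, assuming the 'wordnet' and 'punkt' resources are available (the sentence is invented):

```python
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

context = word_tokenize("I went to the bank to deposit my money")
sense = lesk(context, "bank", "n")   # pick a noun sense of 'bank' given the context
print(sense, "->", sense.definition())
```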
• Co-reference resolution is the task of identifying if and when two words refer to the same entity. The most common example is determining the person or object to which a certain pronoun refers (e.g., 'she' = 'Mary'), but it can also involve identifying a metaphor or an idiom in the text (e.g., an instance in which 'bear' isn't an animal but a large hairy person).
• Sentiment analysis attempts to extract subjective qualities (attitudes, emotions, sarcasm, confusion, suspicion) from text.
• Natural language generation is sometimes described as the opposite of speech recognition or speech-to-text; it is the task of putting structured information into human language.
• Named entity recognition (NER) identifies words or phrases as useful entities: it identifies 'Kentucky' as a location or 'Fred' as a man's name.
• Sentiment analysis is the process of analyzing the emotions within a text and classifying them as positive, negative, or neutral.
• By running sentiment analysis on social media posts, product reviews, NPS surveys, and customer feedback, businesses can gain valuable insights into how customers perceive their brand. Customer and product reviews of a tool such as Zoom (shown on the slide) are a typical example.
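As an illustration, NLTK's VADER analyzer assigns positive/negative/neutral polarity to short texts (assumes the 'vader_lexicon' resource has been downloaded; the example reviews are invented):

```python
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
for review in ["The calls are smooth and reliable.",
               "The audio kept dropping, very frustrating."]:
    c = sia.polarity_scores(review)["compound"]
    label = "positive" if c > 0.05 else "negative" if c < -0.05 else "neutral"
    print(label, "-", review)
```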
Use Cases
• Spam detection: You may not think of spam detection as an NLP solution, but the best spam detection technologies use NLP's text classification capabilities to scan emails for language that often indicates spam or phishing. These indicators can include overuse of financial terms, characteristic bad grammar, threatening language, inappropriate urgency, misspelled company names, and more. Spam detection is one of a handful of NLP problems that experts consider 'mostly solved' (although you may argue that this doesn't match your email experience).
• Machine translation: Google Translate is an example of widely available NLP technology at work. Truly useful machine translation involves more than replacing words in one language with words of another. Effective translation has to accurately capture the meaning and tone of the input language and translate it into text with the same meaning and desired impact in the output language. Machine translation tools are making good progress in accuracy. A great way to test any machine translation tool is to translate text into one language and then back to the original.
• Virtual agents and chatbots: Virtual agents such as Apple's Siri and Amazon's Alexa use speech recognition to recognize patterns in voice commands and natural language generation to respond with appropriate actions or helpful comments. Chatbots perform the same magic in response to typed text entries. The best of these also learn to recognize contextual clues about human requests and use them to provide even better responses or options over time. The next enhancement for these applications is question answering: the ability to respond to our questions, anticipated or not, with relevant and helpful answers in their own words.
• Social media sentiment analysis: NLP has become an essential business tool for uncovering hidden data insights from social media channels. Sentiment analysis can analyze the language used in social media posts, responses, reviews, and more to extract attitudes and emotions in response to products, promotions, and events; companies can use this information in product designs, advertising campaigns, and more.
• Text summarization: Text summarization uses NLP techniques to digest huge volumes of digital text and create summaries and synopses for indexes, research databases, or busy readers who don't have time to read the full text. The best text summarization applications use semantic reasoning and natural language generation (NLG) to add useful context and conclusions to summaries.
Natural Language Processing is commonly separated into two different approaches:
Rule-based Natural Language Processing:
• It uses hand-crafted rules and common-sense reasoning for processing tasks, for instance encoding that freezing temperatures can lead to death, or that hot coffee can burn people's skin. However, this approach takes much time and requires manual effort.
Statistical Natural Language Processing:
• It uses large amounts of data and tries to derive conclusions from them. Statistical NLP uses machine learning algorithms to train NLP models; after successful training on large amounts of data, the trained model can draw sound conclusions from new inputs.
a. Lexical Analysis:
• Lexical analysis divides the whole chunk of text into paragraphs, sentences, and words. It involves identifying and analyzing the structure of words.
b. Syntactic Analysis:
• Syntactic analysis analyzes the words in a sentence for grammar and arranges them in a manner that shows the relationships among the words. For instance, the sentence "The shop goes to the house" does not pass.
c. Semantic Analysis:
• Semantic analysis draws the exact meaning of the words and checks the text for meaningfulness. Sentences such as "hot ice-cream" do not pass.
d. Discourse Integration:
• Discourse integration takes into account the context of the text: the meaning of a sentence may depend on the sentences that come before it. For example, in "He works at Google.", the pronoun "he" depends on a preceding sentence for its reference.
e. Pragmatic Analysis:
• Pragmatic analysis deals with the overall communication and interpretation of language: deriving the meaningful use of language in various situations.
Machine translation (MT)
• Machine translation (MT) is automated translation: the process by which computer software translates a text from one natural language (such as English) to another (such as Spanish).
• To process any translation, human or automated, the meaning of a text in the original (source) language must be fully conveyed in the target language, i.e. the translation. While on the surface this seems straightforward, it is far more complex. Translation is not a mere word-for-word substitution. A translator must interpret and analyze all of the elements in the text and know how each word may influence another. This requires extensive expertise in grammar, syntax (sentence structure), semantics (meanings), etc., in the source and target languages, as well as familiarity with each local region.
• Human and machine translation each have their share of challenges. For example, no two individual translators will produce identical translations of the same text in the same language pair, and it may take several rounds of revision to satisfy the customer. But the greater challenge lies in how machine translation can produce translations of publishable quality.
Rule-Based Machine Translation Technology
• Rule-based machine translation relies on countless built-in linguistic rules and millions of bilingual dictionary entries for each language pair.
• The software parses the text and creates a transitional representation from which the text in the target language is generated. This process requires extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules. The software uses these complex rule sets to transfer the grammatical structure of the source language into the target language.
• Translations are built on gigantic dictionaries and sophisticated linguistic rules. Users can improve the out-of-the-box translation quality by adding their own terminology to the translation process: they create user-defined dictionaries which override the system's default settings.
• In most cases, there are two investment steps: an initial investment that significantly increases quality at a limited cost, and an ongoing investment to increase quality incrementally. While rule-based MT brings companies to the quality threshold and beyond, the quality improvement process may be long and expensive.
• Rule-based MT is the earliest form of MT. It has several serious disadvantages, including the need for significant amounts of human post-editing, the need to add languages manually, and low quality in general. It has some uses in very basic situations where a quick understanding of meaning is required.
Statistical machine translation
• Statistical machine translation utilizes statistical translation models whose parameters stem from the analysis of monolingual and bilingual corpora.
• Building statistical translation models is a quick process, but the technology relies heavily on existing multilingual corpora.
• A minimum of 2 million words is required for a specific domain, and even more for general language. Theoretically it is possible to reach the quality threshold, but most companies do not have such large amounts of existing multilingual corpora from which to build the necessary translation models.
• Additionally, statistical machine translation is CPU-intensive and requires an extensive hardware configuration to run translation models at average performance levels.
Neural machine translation
• Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model. Such a network can persist information about the input sequence over several time steps.
• Encoder: reads the input sequence of words in the source language and encodes that information into a real-valued vector, also known as the hidden state, thought vector, or context vector. The thought vector encodes the "meaning" of the input sequence in a single vector. The encoder outputs are discarded; only the hidden (internal) states are passed as initial inputs to the decoder.
• Decoder: takes the thought vector from the encoder, along with the start-of-string <START> token as its initial input, and produces an output sequence.
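A minimal Keras sketch of this encoder-decoder wiring (the dimensions and vocabulary sizes are illustrative assumptions, and training code is omitted):

```python
from tensorflow import keras
from tensorflow.keras import layers

latent_dim, src_vocab, tgt_vocab = 256, 5000, 6000  # illustrative sizes

# Encoder: its outputs are discarded; only the hidden/cell states are kept
encoder_inputs = keras.Input(shape=(None, src_vocab))
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: initialised with the encoder states (the "thought vector")
decoder_inputs = keras.Input(shape=(None, tgt_vocab))
decoder_lstm = layers.LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = layers.Dense(tgt_vocab, activation="softmax")(decoder_outputs)

model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
```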
• LSTM stands for Long Short-Term Memory; LSTMs are capable of learning long-term dependencies quickly. An LSTM can learn to bridge time intervals in excess of 1,000 steps.
• LSTMs remember information over long stretches of time steps. They do this by deciding what to remember and what to forget.
• Cell states play a key role in an LSTM. An LSTM can decide to add or remove information from a cell state by determining how much information should flow through.
• The basic units of LSTM networks are LSTM layers, which contain multiple LSTM cells. Each cell has an internal cell state, often abbreviated as "c", and the cell's output is called the "hidden state", abbreviated as "h".
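The "h" and "c" states are directly visible in Keras when return_state is requested; a small sketch with made-up input shapes:

```python
import numpy as np
from tensorflow.keras import layers

x = np.random.rand(1, 10, 8).astype("float32")   # (batch, time steps, features)
output, h, c = layers.LSTM(4, return_state=True)(x)
print(h.shape, c.shape)  # hidden state and cell state, both (1, 4)
```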
Speech recognition
• Speech recognition refers to a computer interpreting the words spoken by a person and converting them into a machine-understandable format. Depending on the end goal, this is then converted to text, voice, or another required format.
• For instance, Apple's Siri and Amazon's Alexa use AI-powered speech recognition to provide voice or text support, whereas voice-to-text applications like Google Dictate transcribe your dictated words to text. Voice recognition is another form of speech recognition in which a source sound is recognized and matched to a person's voice.
• Speech recognition AI applications have grown significantly in number in recent times as businesses increasingly adopt digital assistants and automated support to streamline their services. Voice assistants, smart home devices, search engines, etc. are a few examples where speech recognition is prominent.
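A speech-to-text sketch using the third-party SpeechRecognition package (assumes it is installed, that an "audio.wav" file exists, and uses the free Google Web Speech API backend purely for illustration):

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("audio.wav") as source:   # hypothetical recording
    audio = recognizer.record(source)
print(recognizer.recognize_google(audio))   # transcribed text
```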
• Using artificial intelligence and machine learning, speech recognition is fast overcoming challenges such as poor recording equipment, background noise, and variations in people's voices, accents, dialects, semantics, and contexts.
• This also includes the challenges of understanding human disposition and varying human language elements such as colloquialisms, acronyms, etc.
• Furthermore, it is now an accepted format of communication, given the large companies that endorse it and regularly employ speech recognition in their operations. It is estimated that a majority of search engines will adopt voice technology as an integral aspect of their search mechanism.
• This has been made possible by improved AI and machine learning (ML) algorithms which can process significantly larger datasets and provide greater accuracy by self-learning and adapting to evolving changes.
• Machines are programmed to "listen" to accents, dialects, contexts, and emotions, and to process sophisticated and arbitrary data that is readily accessible for mining and machine learning purposes.
• From smart home devices and appliances that take instructions and can be switched on and off remotely, to digital assistants that can set reminders, schedule meetings, or recognize a song playing in a pub, to search engines that respond to user queries with relevant results, speech recognition has become an indispensable part of our lives.
• Plenty of businesses now include speech-to-text software to enhance their business applications and streamline the customer experience.
• Using speech recognition and natural language processing, companies can transcribe calls and meetings, and even translate them.
• Apple, Google, Facebook, Microsoft, and Amazon are among the tech giants that continue to leverage AI-backed speech recognition applications to provide an exemplary user experience.
• The best systems also allow organizations to customize and adapt the technology to their specific requirements: everything from language and nuances of speech to brand recognition. For example:
• Language weighting: improve precision by weighting specific words that are spoken frequently (such as product names or industry jargon), beyond the terms already in the base vocabulary.
• Speaker labeling: output a transcription that cites or tags each speaker's contributions to a multi-participant conversation.
• Acoustics training: attend to the acoustic side of the business. Train the system to adapt to an acoustic environment (like the ambient noise in a call center) and speaker styles (like voice pitch, volume and pace).
• Profanity filtering: use filters to identify certain words or phrases and sanitize speech output.
USE CASES
• Voice-based speech recognition software is now used to initiate purchases, send emails, and transcribe meetings, doctor's appointments, court proceedings, etc.
• Virtual assistants (digital assistants) and smart home devices use voice recognition software to answer questions, provide weather news, play music, check traffic, place orders, and so on.
• Companies like Venmo and PayPal allow customers to make transactions using voice assistants. Several banks in North America and Canada also provide online banking using voice-based software.
• Ecommerce is significantly powered by voice-based assistants and allows users to make purchases quickly and seamlessly.
• Speech recognition is poised to impact transportation services and streamline scheduling, routing, and navigation across cities.
• It is also used to provide accurate subtitles for videos.
• Podcasts, meetings, and journalist interviews can be transcribed using voice recognition.
• Voice biometry has had a huge impact on security: the technology analyses the varying frequencies, tone and pitch of an individual's voice to create a voice profile.
• An example of this is Switzerland's telecom company Swisscom, which has enabled voice authentication technology in its call centres to prevent security breaches.
• Customer care services are increasingly handled by AI-based voice assistants and chatbots that automate repeatable tasks.
• Healthcare: Doctors and nurses leverage dictation applications to capture and log patient diagnoses and treatment notes.
• Sales: Speech recognition technology has a couple of applications in sales. It can help a call center transcribe thousands of phone calls between customers and agents to identify common call patterns and issues.
• Cognitive bots can also talk to people via a webpage, answering common queries and solving basic requests without needing to wait for a contact center agent to be available. In both instances, speech recognition systems help reduce the time to resolution for consumer issues.
• Security: As technology integrates into our daily lives, security protocols are an increasing priority. Voice-based authentication adds a viable level of security.
• Natural language processing (NLP): While NLP isn't necessarily a specific algorithm used in speech recognition, it is the area of artificial intelligence which focuses on the interaction between humans and machines through language, both speech and text. Many mobile devices incorporate speech recognition into their systems to conduct voice search (e.g. Siri) or to provide more accessibility around texting.
• N-grams: This is the simplest type of language model (LM), which assigns probabilities to sentences or phrases. An N-gram is a sequence of N words. For example, "order the pizza" is a trigram or 3-gram and "please order the pizza" is a 4-gram. Grammar and the probability of certain word sequences are used to improve recognition accuracy.
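A toy bigram model in this spirit, estimating P(word | previous word) from raw counts (the corpus is invented):

```python
from collections import Counter

corpus = "please order the pizza please order the salad".split()
bigram_counts = Counter(zip(corpus, corpus[1:]))
unigram_counts = Counter(corpus)

def p(word, prev):
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(p("the", "order"))   # 1.0: 'order' is always followed by 'the' here
print(p("pizza", "the"))   # 0.5: 'the' is followed by 'pizza' half the time
```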
• Neural networks: Primarily leveraged for deep learning algorithms, neural networks process training data by mimicking the interconnectivity of the human brain through layers of nodes. Each node is made up of inputs, weights, a bias (or threshold), and an output.
• If the output value exceeds a given threshold, the node "fires" or activates, passing data to the next layer in the network. Neural networks learn this mapping function through supervised learning, adjusting the weights based on the loss function through gradient descent. While neural networks tend to be more accurate and can accept more data, this comes at a cost in performance efficiency, as they tend to be slower to train than traditional language models.
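The node computation described above, as a few lines of NumPy (a sigmoid activation is chosen here purely for illustration):

```python
import numpy as np

def node(inputs, weights, bias):
    """Weighted sum of inputs plus bias, squashed by a sigmoid activation."""
    z = np.dot(inputs, weights) + bias
    return 1.0 / (1.0 + np.exp(-z))

print(node(np.array([0.5, 0.3]), np.array([0.8, -0.2]), bias=0.1))  # ~0.61
```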