AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt

Unit 3 – NATURAL LANGUAGE PROCESSING

NLP OVERVIEW
• NLP is a part of Artificial Intelligence that deals with the processing of human language by programs.
• It is used by machines to understand, analyse, manipulate, and interpret human languages.
• It helps in performing tasks such as translation, automatic summarization, Named Entity Recognition (NER), speech recognition, relationship extraction, and topic segmentation.

PYTHON IN NLP
NLTK, or Natural Language Toolkit, is a Python package used
in NLP.
NLTK provides a wide range of functionalities and resources for
tasks such as tokenization, stemming, tagging, parsing,
semantic reasoning, and more.
NLTK is widely used in academia and industry for tasks such as
text classification, sentiment analysis, machine translation, and
information extraction.

NLTK INSTALLATION
pip install nltk        # run in a terminal (command prompt), not inside Python
import nltk
nltk.download()         # opens the NLTK Downloader; download all the required packages

1.MORPHOLOGICAL PROCESSING
Morphological processing refers to the analysis and manipulation of the internal
structure of words to understand their grammatical forms and extract meaningful
information.
Morphological processing involves tasks such as:
1. Tokenization
2. Stop Word Removal
3. Stemming
4. N-Gram Language Model
5. Named Entity Recognition (NER)
6. Chunking & Part-of-Speech (POS) Tagging

1. TOKENIZATION
Tokenization in NLP is the process of breaking a sequence of text into smaller
units, called tokens.
Types of tokenization:
Word Tokenization: This type of tokenization breaks text into individual words and punctuation marks, based on whitespace and punctuation.
Example: "I love NLP!" → ["I", "love", "NLP", "!"]
Sentence Tokenization: Sentence tokenization involves splitting a paragraph or text into individual sentences.
Example: "I love NLP. It's fascinating." → ["I love NLP.", "It's fascinating."]

Python Code:
from nltk.tokenize import sent_tokenize, word_tokenize
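data = "All work and no play makes jack a dull boy, all work and no play"
print(word_tokenize(data))
Output:
['All', 'work', 'and', 'no', 'play', 'makes', 'jack', 'a', 'dull', 'boy', ',', 'all', 'work', 'and', 'no', 'play']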
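To make the two kinds of tokenization concrete without NLTK, here is a minimal sketch using only Python's `re` module. The function names `word_tokens` and `sentence_tokens` are hypothetical, and the regular expressions are simplified assumptions, not how `word_tokenize` or `sent_tokenize` work internally.

```python
import re

def word_tokens(text):
    """Split text into words (allowing one internal apostrophe) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

def sentence_tokens(text):
    """Naively split text after '.', '!' or '?' when whitespace follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

print(word_tokens("I love NLP!"))
print(sentence_tokens("I love NLP. It's fascinating."))
```

The naive sentence splitter breaks wrongly after abbreviations such as "Dr.", and the word pattern keeps "It's" as one token, whereas NLTK's `word_tokenize` splits it into "It" and "'s". Handling such cases is why NLTK ships a trained sentence tokenizer model (Punkt).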

2. STOP WORD REMOVAL
In NLP, stop word removal is the process of eliminating commonly used words (such as "a", "the", "and") that carry little meaning on their own.
Stop word removal is performed to reduce the dimensionality of text data, improve computational efficiency, and focus on more informative words.
Removing stop words leaves the more informative words in the text, which can improve the accuracy of NLP tasks.
After stop word removal, the filtered tokens only contain the words that are
not considered stop words.

Python Code:
stopwords.words('english') returns a list of commonly used stop words in the English language.
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
print(stop_words)
To remove the stop words from a given sentence:
data = "All work and no play makes jack a dull boy, all work and no play"
words = [word for word in data.split() if word.lower() not in stop_words]
new_text = " ".join(words)
print(new_text)
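The claim that stop word removal reduces dimensionality can be checked with a small, dependency-free sketch. It uses a short hypothetical stop list instead of NLTK's `stopwords.words('english')` and counts the distinct words (the vocabulary size) before and after filtering.

```python
import re
from collections import Counter

# Hypothetical short stop list; NLTK's English list is much longer
STOP_WORDS = {"a", "an", "the", "and", "or", "no", "not", "all", "is", "to", "of", "in"}

def remove_stop_words(text):
    """Lower-case, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

data = "All work and no play makes jack a dull boy, all work and no play"
before = Counter(re.findall(r"[a-z]+", data.lower()))
after = Counter(remove_stop_words(data))
print(len(before), "distinct words before,", len(after), "after")
print(after.most_common(2))
```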

3.STEMMING
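Stemming is the process of reducing words to their base form (root or stem form); it is a form of word normalization.
Example:
1. John ate Pizza
John → John, ate → eat, Pizza → pizza (strictly, mapping the irregular form "ate" to "eat" is lemmatization; a suffix-stripping stemmer cannot do it)
2. Stemmer, stemming, stemmed → stem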

Types of Stemming:
1. Porter Stemming: The Porter stemmer applies a set of rules to remove common English word suffixes.
2. Snowball Stemming: Snowball enables stemming for various languages beyond English and provides a framework for creating new stemming algorithms.
3. Lancaster Stemming: It applies a set of rules to remove English word suffixes and aims for a more aggressive reduction of words to their stems compared to the Porter algorithm.
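In NLTK, the three stemmers above are available in `nltk.stem` as `PorterStemmer()`, `SnowballStemmer("english")` and `LancasterStemmer()`, each with a `.stem(word)` method. To show the underlying idea without any dependency, here is a deliberately tiny, hypothetical suffix-stripping stemmer (`toy_stem`). It is not the Porter algorithm and will over- and under-stem many words.

```python
# Hypothetical toy rule list, checked in order; real stemmers use many more rules
SUFFIXES = ("ing", "ed", "er", "s")

def toy_stem(word: str) -> str:
    """Strip one common suffix, keeping a stem of at least 3 letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            # Undouble a final consonant left behind (stemm -> stem), except l, s, z,
            # echoing a similar rule in the Porter algorithm (fall stays fall)
            if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeioulsz":
                w = w[:-1]
            break
    return w

for word in ["stemmer", "stemming", "stemmed", "stems", "falling", "Pizza", "ate"]:
    print(word, "->", toy_stem(word))
```

The output also shows why "ate" stays "ate": no suffix rule can map an irregular past tense to "eat", which is the job of lemmatization.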

4. N-GRAM LANGUAGE MODEL
N-grams are useful for creating features from a text corpus for machine learning algorithms such as SVM, Naive Bayes, etc.
Due to their frequent use, n-gram models for n = 1, 2, 3 have specific names: Unigram, Bigram, and Trigram models.
The "N" in N-gram refers to the number of items considered in the
sequence.
N-Grams are useful for creating capabilities like autocorrect,
autocompletion of sentences, text summarization, speech recognition,
etc.

Python code:
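from nltk.util import ngrams
data = "All work and no play makes jack a dull boy, all work and no play"
n = 1  # use n = 2 for bigrams, n = 3 for trigrams
unigrams = ngrams(data.split(), n)
for item in unigrams:
    print(item)
Output: one tuple per line, e.g. ('All',), ('work',), ... for n=1 and ('All', 'work'), ('work', 'and'), ... for n=2.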
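The code above only lists n-grams. To show how they become a language model, here is a minimal bigram sketch using maximum-likelihood estimates, P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1}), with no smoothing. The three-sentence corpus, the `<s>`/`</s>` sentence-boundary markers and the function names are illustrative assumptions.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Count bigrams and how often each word appears as the history w_{i-1}."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            history_counts[w1] += 1
            bigram_counts[(w1, w2)] += 1
    return history_counts, bigram_counts

def bigram_prob(w1, w2, history_counts, bigram_counts):
    """MLE estimate of P(w2 | w1); 0.0 for an unseen history."""
    if history_counts[w1] == 0:
        return 0.0
    return bigram_counts[(w1, w2)] / history_counts[w1]

def sentence_prob(sentence, history_counts, bigram_counts):
    """Multiply the bigram probabilities across the padded sentence."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for w1, w2 in zip(tokens, tokens[1:]):
        p *= bigram_prob(w1, w2, history_counts, bigram_counts)
    return p

def predict_next(w1, bigram_counts):
    """Most frequent word seen after w1 (a simple autocomplete)."""
    followers = {w2: c for (h, w2), c in bigram_counts.items() if h == w1}
    return max(followers, key=followers.get) if followers else None

corpus = ["I love NLP", "I love Python", "I study NLP"]  # hypothetical corpus
hist, bigrams = train_bigram_model(corpus)
print(bigram_prob("i", "love", hist, bigrams))  # "love" follows 2 of the 3 "i"
print(sentence_prob("I love NLP", hist, bigrams))
print(predict_next("i", bigrams))
```

A sentence containing an unseen bigram, such as "I study Python", gets probability zero under this model; practical n-gram models add smoothing (for example add-one / Laplace smoothing) to avoid this.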

5. NAMED ENTITY RECOGNITION (NER)
Named Entity Recognition (NER) is a subtask in NLP that focuses on
identifying and classifying named entities in text into predefined
categories such as person names, organizations, locations, dates, and
more.
The goal of NER is to extract meaningful information from unstructured
text by recognizing and labeling named entities.
Example: "Apple stock prices are going up."
Is "Apple" a fruit or a company? In this context it is a company, so NER should label it as an organization.

Python code:
import nltk
from nltk import sent_tokenize, word_tokenize, pos_tag, ne_chunk
sentence = "Apple Inc. is planning to open a new store in New York on July 15th."
# ne_chunk returns a tree; named entities are subtrees carrying a label (e.g. ORGANIZATION, GPE)
for chunk in ne_chunk(pos_tag(word_tokenize(sent_tokenize(sentence)[0]))):
    if hasattr(chunk, 'label'):
        print(chunk.label(), ' '.join(c[0] for c in chunk))
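NLTK's `ne_chunk` relies on a trained classifier. As a contrast, the simplest form of NER is a dictionary (gazetteer) lookup plus a few hand-written rules. This minimal sketch assumes a hypothetical two-entry gazetteer and one hypothetical date rule; the labels (ORGANIZATION, GPE for geo-political entity, DATE) are illustrative.

```python
import re

# Hypothetical gazetteer: lower-cased token sequence -> entity label
GAZETTEER = {
    ("apple", "inc"): "ORGANIZATION",
    ("new", "york"): "GPE",
}
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}
MAX_LEN = max(len(key) for key in GAZETTEER)

def find_entities(text):
    """Return (label, text) pairs using longest-match gazetteer lookup plus a date rule."""
    tokens = re.findall(r"\w+", text)
    lower = [t.lower() for t in tokens]
    entities, i = [], 0
    while i < len(tokens):
        # Hypothetical rule: a month name followed by a token starting with a digit is a DATE
        if lower[i] in MONTHS and i + 1 < len(tokens) and tokens[i + 1][0].isdigit():
            entities.append(("DATE", " ".join(tokens[i:i + 2])))
            i += 2
            continue
        # Try the longest phrase first so multi-word names are matched whole
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(lower[i:i + n]))
            if label:
                entities.append((label, " ".join(tokens[i:i + n])))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return entities

print(find_entities("Apple Inc. is planning to open a new store in New York on July 15th."))
```

The gazetteer deliberately lists "Apple Inc" rather than bare "Apple": a lookup cannot tell the fruit from the company, which is exactly the ambiguity that context-aware, trained NER models are meant to resolve.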

6. CHUNKING & PART-OF-SPEECH (POS) TAGGING
In chunking, the process typically involves two steps:
1.Part-of-speech (POS) tagging
2.Chunking

PART-OF-SPEECH (POS) TAGS
Each word or token in a sentence is assigned a part-of-speech tag,
indicating its grammatical category (noun, verb, adjective, etc.).
POS tagging helps in identifying the role and function of each word in
the sentence.
It is also called grammatical tagging.

Python Code:
import nltk
from nltk import word_tokenize, pos_tag
d = "The dog ate the cat"
tokenize_text = word_tokenize(d)
print(pos_tag(tokenize_text))   # list of (word, tag) pairs
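`pos_tag` uses a trained tagger. The idea of assigning one tag to each token can be sketched with a hypothetical lexicon lookup, a few suffix rules for unknown words, and a default tag. The tag names follow the Penn Treebank tag set that `pos_tag` outputs (DT, NN, VBD, ...), but the lexicon and rules here are illustrative assumptions.

```python
# Hypothetical mini-lexicon: word -> Penn Treebank tag
LEXICON = {"the": "DT", "a": "DT", "dog": "NN", "cat": "NN", "ate": "VBD"}
# Hypothetical suffix rules for unknown words, checked in order
SUFFIX_RULES = [("ing", "VBG"), ("ed", "VBD"), ("ly", "RB"), ("s", "NNS")]

def tag(tokens):
    """Return a list of (token, tag) pairs."""
    tagged = []
    for tok in tokens:
        word = tok.lower()
        if word in LEXICON:
            tagged.append((tok, LEXICON[word]))
            continue
        for suffix, pos in SUFFIX_RULES:
            if word.endswith(suffix):
                tagged.append((tok, pos))
                break
        else:
            tagged.append((tok, "NN"))  # default: tag unknown words as nouns
    return tagged

print(tag("The dog ate the cat".split()))
print(tag("The dog barked loudly".split()))
```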

CHUNKING
Based on the POS tags, patterns or rules are applied to group
consecutive words into chunks.
Chunking picks up individual pieces of information (words) and groups them into bigger units, such as noun phrases.
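In NLTK, such patterns are usually written as a regular-expression grammar for `nltk.RegexpParser`, for example `"NP: {<DT>?<JJ>*<NN.*>+}"` (an optional determiner, any number of adjectives, one or more nouns). The sketch below hand-codes that single pattern in plain Python over already-tagged tokens; the function name `np_chunk` and the tagged input are hypothetical.

```python
def np_chunk(tagged):
    """Bracket noun phrases matching <DT>?<JJ>*<NN.*>+ in a list of (word, tag) pairs."""
    out, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":                        # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":           # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # one or more nouns (NN, NNS, NNP, ...)
            k += 1
        if k > j:                                       # pattern matched: emit one NP chunk
            out.append("[NP " + " ".join(word for word, _ in tagged[i:k]) + "]")
            i = k
        else:                                           # no match: leave this token outside any chunk
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)

tagged = [("The", "DT"), ("little", "JJ"), ("dog", "NN"),
          ("ate", "VBD"), ("the", "DT"), ("cat", "NN")]  # hypothetical tagged input
print(np_chunk(tagged))
```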

2.SYNTACTIC ANALYSIS or PARSING
It is the process of analysing natural language against the rules of a formal grammar to determine the grammatical structure of a sentence.
Syntax analysis checks whether the arrangement of words is grammatically correct according to the rules of the formal grammar.
Example:
Delhi is the capital of India. (accepted: grammatically correct)
Is Delhi the of India capital. (rejected: the word order breaks the grammar rules)
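To see how a formal grammar accepts one word order and rejects another, here is a minimal top-down recognizer for a hypothetical toy grammar that covers only the example sentences. NLTK provides real grammar tools for this (`nltk.CFG.fromstring` together with a parser such as `nltk.ChartParser`); the code below is a dependency-free illustration of the same idea, not NLTK's implementation.

```python
# Hypothetical toy context-free grammar (no left recursion, so top-down search terminates)
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["NNP"], ["DT", "N", "PP"], ["DT", "N"]],
    "PP": [["P", "NP"]],
    "VP": [["V", "NP"]],
}
# Hypothetical lexicon: word -> pre-terminal category
LEXICON = {"delhi": "NNP", "india": "NNP", "is": "V", "the": "DT", "capital": "N", "of": "P"}

def expand(symbol, tokens, pos):
    """Yield every position where `symbol` can end if it starts at `pos`."""
    if symbol not in GRAMMAR:  # pre-terminal: match one word by its category
        if pos < len(tokens) and LEXICON.get(tokens[pos]) == symbol:
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand_seq(rhs, tokens, pos)

def expand_seq(symbols, tokens, pos):
    """Yield end positions for a sequence of symbols matched one after another."""
    if not symbols:
        yield pos
        return
    for mid in expand(symbols[0], tokens, pos):
        yield from expand_seq(symbols[1:], tokens, mid)

def is_grammatical(sentence):
    """True if some derivation of S covers the whole sentence."""
    tokens = sentence.lower().rstrip(".").split()
    return any(end == len(tokens) for end in expand("S", tokens, 0))

print(is_grammatical("Delhi is the capital of India."))
print(is_grammatical("Is Delhi the of India capital."))
```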

3.SEMANTIC ANALYSIS
The semantic analyser checks the text for meaningfulness.
The goal of semantic analysis is to enable machines to understand and interpret human language in a way that goes beyond the mere surface-level syntactic structure.
Example:
She drank some milk. (meaningful)
She drank some books. (grammatically correct, but meaningless)
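One classic way to flag sentences like "She drank some books" is with selectional restrictions: a verb constrains the semantic type of its object. This minimal sketch assumes the verb and its object have already been identified (for example by parsing); the feature sets and verb constraints are hypothetical and tiny.

```python
# Hypothetical semantic features for nouns
NOUN_FEATURES = {
    "milk": {"liquid", "food"},
    "water": {"liquid"},
    "books": {"physical_object", "readable"},
    "pizza": {"food", "solid"},
}
# Hypothetical selectional restrictions: the feature a verb's object must have
VERB_OBJECT_REQUIRES = {"drank": "liquid", "ate": "food", "read": "readable"}

def is_meaningful(verb, obj):
    """Return False when the object violates the verb's selectional restriction."""
    required = VERB_OBJECT_REQUIRES.get(verb.lower())
    if required is None:
        return True  # no restriction known for this verb, so nothing to reject
    return required in NOUN_FEATURES.get(obj.lower(), set())

print(is_meaningful("drank", "Milk"))   # She drank some milk
print(is_meaningful("drank", "Books"))  # She drank some books
```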

4.DISCOURSE ANALYSIS
Discourse analysis helps us understand how language is used in real-world contexts.
It focuses on analyzing the structure, coherence, and meaning of texts or spoken interactions within their social and cultural contexts.
Example:
Monkeys eat bananas when they wake up.
Who does "they" refer to here? The monkeys.
Monkeys eat bananas when they are ripe.
Who does "they" refer to here? The bananas.
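Resolving "they" in the examples above is a coreference (anaphora) resolution problem. A minimal rule-based sketch: keep only the earlier noun phrases whose semantic features fit the predicate, then fall back to recency. The features, predicate constraints and function name are hypothetical; real resolvers use far richer features or trained models.

```python
# Hypothetical semantic features of the candidate antecedents
FEATURES = {"monkeys": {"animate"}, "bananas": {"fruit"}}
# Hypothetical constraint each predicate places on its subject
PREDICATE_NEEDS = {"wake up": "animate", "are ripe": "fruit"}

def resolve_pronoun(candidates, predicate):
    """Pick the antecedent of 'they' from earlier noun phrases (in order of mention).

    Keep candidates compatible with the predicate; if several remain,
    prefer the most recently mentioned one (recency heuristic).
    """
    need = PREDICATE_NEEDS.get(predicate)
    compatible = [c for c in candidates
                  if need is None or need in FEATURES.get(c, set())]
    return compatible[-1] if compatible else None

mentions = ["monkeys", "bananas"]
print(resolve_pronoun(mentions, "wake up"))   # Monkeys eat bananas when they wake up.
print(resolve_pronoun(mentions, "are ripe"))  # Monkeys eat bananas when they are ripe.
```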

5.PRAGMATIC ANALYSIS
Pragmatic analysis in NLP is the process of extracting the intended meaning of a given text, beyond its literal content.
Pragmatic analysis takes into account the speaker's intention, the listener's understanding, and the social context in which the communication takes place.
Example:
Close the door. → Type: Order
Please, close the door. → Type: Request, Affirmation
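The door examples turn on intent rather than literal form. Here is a minimal rule-based sketch that labels an utterance as an order, a request, a question or a statement; the verb list and rules are hypothetical. The indirect-request rule shows the pragmatic point: "Could you close the door?" is a question in form but a request in intent.

```python
import re

# Hypothetical list of verbs that can start a command
IMPERATIVE_VERBS = {"close", "open", "bring", "stop", "sit"}

def classify_speech_act(utterance):
    """Guess the intended speech act of a single utterance with simple rules."""
    text = utterance.strip()
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return "Unknown"
    if text.endswith("?") and words[:2] in (["could", "you"], ["can", "you"], ["would", "you"]):
        return "Request"  # indirect request: question form, request intent
    if text.endswith("?"):
        return "Question"
    if "please" in words and any(w in IMPERATIVE_VERBS for w in words):
        return "Request"
    if words[0] in IMPERATIVE_VERBS:
        return "Order"
    return "Statement"

for u in ["Close the Door", "Please, close the door", "Could you close the door?"]:
    print(u, "->", classify_speech_act(u))
```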

APPLICATIONS OF NLP
(Diagram: applications grouped into Core Tasks, Industry-Specific, and General Applications)

VIRTUAL AGENTS
Software programs that perform tasks such as managing schedules, handling travel needs, booking appointments, sending reminders, playing music, controlling smart home devices, and resetting passwords are known as virtual assistants.
Their functions are slightly more advanced than those of chatbots.
Example:
Virtual agents are commonly used in applications like CUSTOMER SUPPORT, where they can handle frequently asked questions, troubleshoot issues, or guide users through processes.
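A core step for a virtual agent is recognising what the user wants (intent detection) before acting. Here is a minimal keyword-overlap sketch using intents taken from the tasks listed above; the intent names, keyword sets and the `fallback_to_human` label are hypothetical, and production agents typically use trained classifiers instead.

```python
import re

# Hypothetical intents and trigger keywords
INTENTS = {
    "password_reset": {"password", "reset", "forgot", "login"},
    "book_appointment": {"book", "appointment", "schedule", "meeting"},
    "play_music": {"play", "music", "song"},
}

def detect_intent(utterance, min_overlap=1):
    """Return the intent sharing the most keywords with the utterance, or a fallback."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    best, best_score = None, 0
    for intent, keywords in INTENTS.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = intent, score
    return best if best_score >= min_overlap else "fallback_to_human"

print(detect_intent("I forgot my password"))
print(detect_intent("What is the weather like?"))
```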

VIRTUAL AGENTS ENTERPRISE INDUSTRY CASE STUDY
IBM SOLUTION:
https://www.ibm.com/case-studies/autodesk-inc

PROBLEM STATEMENT:
As the company switched from a desktop licensing model to a SaaS model, its reach improved, but that growth also brought an increase in customer inquiries.
With heavy volume and complex issues, resolving a question sometimes took 1.5 days or more.
Autodesk’s staff of about 350 customer support agents handles roughly one
million customer and partner contacts per year.
About half of these are simple activation code requests, changes of address,
contract problems, and technical issues.
Spratto, Vice President of Operations at Autodesk, said:
“A lot of what my team does is just problem recognition, trying to identify what
the person wants or is asking.”
