Unit 3 – NATURAL LANGUAGE PROCESSING
NLP OVERVIEW
• NLP is a branch of Artificial Intelligence that deals with the
processing of human language by a program.
• Machines use it to understand, analyse, manipulate, and
interpret human languages.
• It helps in performing tasks such as translation, automatic
summarization, Named Entity Recognition (NER), speech
recognition, relationship extraction, and topic segmentation.
PYTHON IN NLP
NLTK, or Natural Language Toolkit, is a Python package used
in NLP.
NLTK provides a wide range of functionalities and resources for
tasks such as tokenization, stemming, tagging, parsing,
semantic reasoning, and more.
NLTK is widely used in academia and industry for tasks such as
text classification, sentiment analysis, machine translation, and
information extraction.
NLTK INSTALLATION
pip install nltk
import nltk
nltk.download()  # opens the NLTK downloader to fetch the required data packages
APPLICATIONS
NLP PROCESS EXPLAINED
1. MORPHOLOGICAL PROCESSING
Morphological processing refers to the analysis and manipulation of the internal
structure of words to understand their grammatical forms and extract meaningful
information.
Morphological processing involves tasks such as:
1. Tokenization
2. Stop Word Removal
3. Stemming
4. N-Gram Language Model
5. Named Entity Recognition (NER)
6. Chunking & Part-of-Speech (POS) Tagging
1. TOKENIZATION
Tokenization in NLP is the process of breaking a sequence of text into smaller
units, called tokens.
Types of tokenization:
Word Tokenization: This type of tokenization breaks text into individual words
based on whitespace or punctuation.
Example: "I love NLP!" → ["I", "love", "NLP", "!"]
Sentence Tokenization: Sentence tokenization involves splitting a paragraph or
text into individual sentences.
Example: "I love NLP. It's fascinating." → ["I love NLP.", "It's fascinating."]
AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
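Python Code:
from nltk.tokenize import word_tokenize

# Requires the tokenizer data fetched with nltk.download()
data = "All work and no play makes jack a dull boy, all work and no play"
print(word_tokenize(data))
Output:
['All', 'work', 'and', 'no', 'play', 'makes', 'jack', 'a', 'dull', 'boy', ',', 'all', 'work', 'and', 'no', 'play']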
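To make the two tokenization types concrete without depending on NLTK data files, here is a minimal regex-based sketch. The function names simple_word_tokenize and simple_sent_tokenize are hypothetical, and the rules are a rough approximation, not the algorithm NLTK uses.

```python
import re

def simple_word_tokenize(text):
    # A token is a run of word characters (optionally with an internal
    # apostrophe, as in "It's") or a single punctuation mark.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

def simple_sent_tokenize(text):
    # Split on whitespace that directly follows '.', '!' or '?'.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

print(simple_word_tokenize("I love NLP!"))
print(simple_sent_tokenize("I love NLP. It's fascinating."))
```

A rule this simple will wrongly split after abbreviations such as "Dr.", which is one reason trained tokenizers are usually preferred.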
2. STOP WORD REMOVAL
In NLP, stop word removal is the process of eliminating commonly used words
that carry little of a text's meaning (for example "the", "is", and "and").
Stop word removal is performed to reduce the dimensionality of text data,
improve computational efficiency, and focus on more informative words.
With stop words removed, the remaining words can improve the accuracy of
some NLP tasks.
After stop word removal, the filtered tokens only contain the words that are
not considered stop words.
Python Code:
stopwords.words('english') returns a list of commonly used stop words in
the English language.
from nltk.corpus import stopwords

# Requires the stopwords corpus fetched with nltk.download()
a = set(stopwords.words('english'))
print(a)
To remove the stop words from a given sentence:
data = "All work and no play makes jack a dull boy, all work and no play"
words = [word for word in data.split() if word.lower() not in a]
new_text = " ".join(words)
print(new_text)
3. STEMMING
Stemming is the process of reducing words to their base form (root or
stem form). It is a kind of word normalization.
Example:
1. John ate Pizza
John → John, Pizza → pizza. Mapping the irregular form ate → eat requires
lemmatization (a dictionary lookup); suffix stripping cannot produce it.
2. Stemmer, stemming, stemmed → stem
Types of Stemming:
1. Porter Stemming: The Porter algorithm applies a set of rules to
remove common English word suffixes.
2. Snowball Stemming: Snowball enables stemming for various
languages beyond English and provides a framework for creating new
stemming algorithms.
3. Lancaster Stemming: It applies a set of rules to remove English word
suffixes and aims for a more aggressive reduction of words to their
stems compared to the Porter algorithm.
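The idea of rule-based suffix stripping can be sketched in a few lines. This is a toy stemmer written for illustration only: the suffix list and the name toy_stem are assumptions, and it is not the Porter, Snowball, or Lancaster algorithm.

```python
# Hypothetical, deliberately tiny suffix list; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "er", "s")

def toy_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        # Strip the suffix only if a reasonably long stem remains.
        if word.endswith(suffix) and len(stem) >= 3:
            # Undo consonant doubling, e.g. "stemm" -> "stem".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word

for w in ["Stemmer", "stemming", "stemmed", "Pizza", "ate"]:
    print(w, "->", toy_stem(w))
```

Note that "ate" is left unchanged: no suffix rule can turn it into "eat", which is the gap that lemmatization fills.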
4. N-GRAM LANGUAGE MODEL
• N-grams are useful for creating features from a text corpus for
machine learning algorithms like SVM, Naive Bayes, etc.
The "N" in N-gram refers to the number of items considered in the
sequence.
Because they are used so often, n-gram models for n = 1, 2, 3 have specific
names: unigram, bigram, and trigram models.
N-grams are useful for building capabilities like autocorrect,
autocompletion of sentences, text summarization, speech recognition,
etc.
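Python code:
from nltk.util import ngrams

data = "All work and no play makes jack a dull boy, all work and no play"
n = 1  # set n = 2 for bigrams
unigrams = ngrams(data.split(), n)
for item in unigrams:
    print(item)
Output (first items, printed one per line):
n=1: ('All',) ('work',) ('and',) ...
n=2: ('All', 'work') ('work', 'and') ('and', 'no') ...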
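The code above only lists n-grams. As a rough sketch of how they become a language model for autocompletion, the following counts which word follows which in the same sentence and predicts the most frequent follower. It uses only the standard library; make_ngrams and predict_next are hypothetical names, and the comma is dropped from the text to keep the tokens clean.

```python
from collections import Counter, defaultdict

text = "all work and no play makes jack a dull boy all work and no play"
tokens = text.split()

def make_ngrams(tokens, n):
    # Slide a window of width n over the token list.
    return list(zip(*(tokens[i:] for i in range(n))))

bigrams = make_ngrams(tokens, 2)

# For every word, count the words observed immediately after it.
following = defaultdict(Counter)
for previous, nxt in bigrams:
    following[previous][nxt] += 1

def predict_next(word):
    counts = following.get(word)
    if not counts:
        return None  # word never seen with a follower
    return counts.most_common(1)[0][0]

print(predict_next("no"))    # most frequent follower of "no"
print(predict_next("work"))  # most frequent follower of "work"
```

With a corpus this small the counts are trivial; a practical model would be trained on far more text and would need smoothing for unseen word pairs.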
5. NAMED ENTITY RECOGNITION (NER)
Named Entity Recognition (NER) is a subtask in NLP that focuses on
identifying and classifying named entities in text into predefined
categories such as person names, organizations, locations, dates, and
more.
The goal of NER is to extract meaningful information from unstructured
text by recognizing and labeling named entities.
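Example: “Apple stock prices are going up”
Does "Apple" refer to the fruit or the company? The surrounding words
("stock prices") point to the company, so NER should label it as an organization.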
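The "Apple" ambiguity can be illustrated with a toy context check. This is only a sketch under strong assumptions: the cue-word sets and the labels are invented for the example, and real NER systems learn such context from annotated data instead of using hand-written lists.

```python
import re

# Hypothetical cue words chosen for this example only.
COMPANY_CUES = {"stock", "shares", "inc", "store", "ceo"}
FRUIT_CUES = {"pie", "juice", "tree", "ripe", "eat"}

def label_apple(sentence):
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if "apple" not in words:
        return None
    company_score = len(words & COMPANY_CUES)
    fruit_score = len(words & FRUIT_CUES)
    if company_score > fruit_score:
        return "ORGANIZATION"
    if fruit_score > company_score:
        return "FRUIT"
    return "AMBIGUOUS"

print(label_apple("Apple stock prices are going up"))
print(label_apple("She baked an apple pie"))
```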
NER – Example 1:
NER – Example 2:
Python code:
import nltk
from nltk import sent_tokenize, word_tokenize, pos_tag, ne_chunk

# Requires the tokenizer, POS tagger, and NE chunker data fetched with nltk.download()
sentence = "Apple Inc. is planning to open a new store in New York on July 15th."
for chunk in ne_chunk(pos_tag(word_tokenize(sent_tokenize(sentence)[0]))):
    # Named-entity subtrees carry a label; plain (word, tag) tuples do not
    if hasattr(chunk, 'label'):
        print(chunk.label(), ' '.join(c[0] for c in chunk))
6. CHUNKING & PART-OF-SPEECH (POS) TAGGING
In chunking, the process typically involves two steps:
1. Part-of-speech (POS) tagging
2. Chunking
PART-OF-SPEECH (POS) TAGS
Each word or token in a sentence is assigned a part-of-speech tag,
indicating its grammatical category (noun, verb, adjective, etc.).
POS tagging helps in identifying the role and function of each word in
the sentence.
It is also called grammatical tagging.
Python Code:
import nltk
from nltk import word_tokenize

# Requires the tokenizer and POS tagger data fetched with nltk.download()
d = "The dog ate the cat"
tokenize_text = word_tokenize(d)
print(nltk.pos_tag(tokenize_text))
CHUNKING
Based on the POS tags, patterns or rules are applied to group
consecutive words into chunks.
In other words, chunking picks up individual pieces of information and
groups them into bigger pieces, such as noun phrases.
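As a small illustration of the second step, the sketch below groups already-tagged words into noun-phrase chunks using one hand-written rule: an optional determiner, any adjectives, then nouns. The tags are written by hand in Penn Treebank style (an assumption, so the example runs without a tagger), and noun_phrase_chunks is a hypothetical helper.

```python
# POS-tagged input, written by hand for "The dog ate the cat".
tagged = [("The", "DT"), ("dog", "NN"), ("ate", "VBD"), ("the", "DT"), ("cat", "NN")]

def noun_phrase_chunks(tagged_tokens):
    chunks, current = [], []

    def flush():
        # Keep the collected words only if they contain at least one noun.
        if any(tag.startswith("NN") for _, tag in current):
            chunks.append(" ".join(word for word, _ in current))
        current.clear()

    for word, tag in tagged_tokens:
        if tag == "DT":
            flush()                      # a determiner starts a new chunk
            current.append((word, tag))
        elif tag.startswith(("JJ", "NN")):
            current.append((word, tag))  # adjectives and nouns extend the chunk
        else:
            flush()                      # any other tag ends the chunk
    flush()
    return chunks

print(noun_phrase_chunks(tagged))
```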
2. SYNTACTIC ANALYSIS or PARSING
It is the process of analysing natural language against the rules of a
formal grammar to determine the grammatical structure of a sentence.
Syntactic analysis checks whether the text is well formed according to
those rules.
Example:
Delhi is the capital of India. (grammatical, accepted)
Is Delhi the of India capital. (ungrammatical, rejected)
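The two Delhi sentences can be checked against a tiny hand-written grammar. This is a minimal sketch, assuming a six-word lexicon and a single sentence pattern (noun phrase, verb, noun phrase); the lexicon, category letters, and is_grammatical are all invented for the example, and a real parser would build a full parse tree.

```python
import re

# Hypothetical lexicon: N = noun, V = verb, D = determiner, P = preposition.
LEXICON = {"delhi": "N", "india": "N", "capital": "N",
           "is": "V", "the": "D", "of": "P"}

# NP -> (D) N (P (D) N)*      S -> NP V NP
NP = r"(?:D )?N(?: P (?:D )?N)*"
SENTENCE = re.compile(rf"{NP} V {NP}")

def is_grammatical(sentence):
    words = re.findall(r"[a-z]+", sentence.lower())
    if not words or any(w not in LEXICON for w in words):
        return False
    # Turn the sentence into a string of categories, e.g. "N V D N P N".
    categories = " ".join(LEXICON[w] for w in words)
    return SENTENCE.fullmatch(categories) is not None

print(is_grammatical("Delhi is the capital of India."))
print(is_grammatical("Is Delhi the of India capital."))
```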
3. SEMANTIC ANALYSIS
The work of the semantic analyser is to check the text for
meaningfulness.
The goal of semantic analysis is to enable machines to understand
and interpret human language in a way that goes beyond the mere
surface-level syntactic structure.
Example:
She drank some milk. (meaningful)
She drank some books. (grammatical, but not meaningful)
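One simple way to picture this check is a selectional restriction: a verb states what kind of object it accepts. The sketch below is an illustration under assumed data; the two dictionaries and the name is_meaningful are hypothetical and cover only the example sentences.

```python
# Hypothetical knowledge: what each verb requires of its object.
SELECTIONAL_RESTRICTIONS = {"drank": "liquid", "read": "readable"}

# Hypothetical features of a few nouns.
FEATURES = {"milk": {"liquid"}, "water": {"liquid"}, "books": {"readable"}}

def is_meaningful(verb, obj):
    required = SELECTIONAL_RESTRICTIONS.get(verb.lower())
    if required is None:
        return True  # no restriction known for this verb
    return required in FEATURES.get(obj.lower(), set())

print(is_meaningful("drank", "Milk"))   # the object is a liquid
print(is_meaningful("drank", "Books"))  # the object is not a liquid
```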
4. DISCOURSE ANALYSIS
Discourse analysis helps us to
understand how language is used in
real-world contexts.
It focuses on analyzing the structure,
coherence, and meaning of texts or
spoken interactions within their
social and cultural contexts.
Example:
Monkeys eat bananas when they
wake up.
Who is "they" here?
The monkeys
Monkeys eat bananas when they
are ripe.
Who is "they" here?
The bananas
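Resolving "they" in the two example sentences can be sketched as a lookup of which earlier noun fits the words that follow the pronoun. This is a toy under stated assumptions: the table of typical predicates is hand-written for these sentences, resolve_they is a hypothetical name, and real coreference resolution relies on much richer context.

```python
import re

# Hypothetical world knowledge: words that typically describe each noun.
TYPICAL_PREDICATES = {
    "monkeys": {"wake", "sleep", "climb"},
    "bananas": {"ripe", "rotten", "peeled"},
}

def resolve_they(sentence):
    words = re.findall(r"[a-z]+", sentence.lower())
    if "they" not in words:
        return None
    position = words.index("they")
    # Candidate antecedents are known nouns mentioned before the pronoun.
    candidates = [w for w in words[:position] if w in TYPICAL_PREDICATES]
    after_pronoun = set(words[position + 1:])
    for candidate in candidates:
        if TYPICAL_PREDICATES[candidate] & after_pronoun:
            return candidate
    return None

print(resolve_they("Monkeys eat bananas when they wake up"))
print(resolve_they("Monkeys eat bananas when they are ripe"))
```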
5. PRAGMATIC ANALYSIS
Pragmatic analysis in NLP (Natural
Language Processing) is the process of
extracting the intended meaning of a text,
beyond its literal content.
Pragmatic analysis takes into account the
speaker's intention, the listener's
understanding, and the social context in which
the communication takes place.
Example:
Close the door.
Type: Order
Please, close the door.
Type: Request, Affirmation
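The difference between an order and a request in the example can be sketched as a tiny rule-based classifier. The word lists and the name speech_act are assumptions made for this illustration; actual pragmatic analysis also depends on the speaker, the listener, and the situation, which no word list captures.

```python
import re

# Hypothetical word lists for the example utterances.
IMPERATIVE_VERBS = {"close", "open", "stop", "bring"}
POLITENESS_MARKERS = {"please", "kindly"}

def speech_act(utterance):
    words = re.findall(r"[a-z]+", utterance.lower())
    if not words:
        return "Unknown"
    has_command = any(w in IMPERATIVE_VERBS for w in words)
    is_polite = any(w in POLITENESS_MARKERS for w in words)
    if has_command and is_polite:
        return "Request"
    if words[0] in IMPERATIVE_VERBS:
        return "Order"
    return "Statement"

print(speech_act("Close the Door"))
print(speech_act("Please, close the door"))
```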
Example of NLP
APPLICATIONS OF NLP
Core Tasks
Industry Specific
General Applications
VIRTUAL AGENTS
Virtual agents (also known as virtual assistants) are software programs that
carry out tasks such as managing schedules, handling travel needs, booking
appointments, sending reminders, playing music, controlling smart home
devices, and resetting passwords.
Their functions are slightly more advanced than those of chatbots.
Example:
Virtual agents are commonly used in applications like customer
support, where they can handle frequently asked questions, troubleshoot
issues, or guide users through processes.
VIRTUAL AGENTS ENTERPRISE INDUSTRY CASE STUDY
IBM SOLUTION:
https://www.ibm.com/case-studies/autodesk-inc
PROBLEM STATEMENT:
As Autodesk switched from a desktop licensing model to a SaaS model, its
reach improved, but with that growth came a surge in customer inquiries.
• With heavy volume and complex issues, the resolution time for
questions was sometimes 1.5 days or more.
Autodesk’s staff of about 350 customer support agents handles roughly one
million customer and partner contacts per year.
About half of these are simple activation code requests, changes of address,
contract problems, and technical issues.
Spratto, Vice President of Operations at Autodesk, said:
“A lot of what my team does is just problem recognition, trying to identify what
the person wants or is asking.”
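The "problem recognition" step described in the quote is often called intent recognition. As a rough sketch, the code below scores a customer message against the four contact types named above using keyword overlap. The keyword sets, the intent names, and the handoff_to_human fallback are assumptions for illustration; this is not how the IBM solution in the case study works.

```python
import re

# Hypothetical keywords for the four contact types named in the case study.
INTENT_KEYWORDS = {
    "activation_code": {"activation", "code", "activate", "license"},
    "change_of_address": {"address", "moved", "relocated"},
    "contract_problem": {"contract", "renewal", "billing"},
    "technical_issue": {"crash", "error", "install", "bug"},
}

def recognize_intent(message):
    words = set(re.findall(r"[a-z]+", message.lower()))
    # Score each intent by how many of its keywords appear in the message.
    scores = {intent: len(words & keywords)
              for intent, keywords in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # With no keyword match, pass the conversation to a human agent.
    return best if scores[best] > 0 else "handoff_to_human"

print(recognize_intent("I need a new activation code for my license"))
print(recognize_intent("We moved, please update our address"))
print(recognize_intent("Hello there"))
```

Routing the simple, high-volume intents automatically is what frees human agents for the complex issues mentioned in the problem statement.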
