Module 13
Natural Language Processing
Version 2 CSE IIT, Kharagpur
13.1 Instructional Objective
•   The students should understand the necessity of natural language processing in
    building an intelligent system
•   Students should understand the difference between natural and formal language and
    the difficulty in processing the former
•   Students should understand the ambiguities that arise in natural language processing
•   Students should understand the kinds of language information required, such as
        o Phonology
        o Morphology
        o Syntax
        o Semantics
        o Discourse
        o World knowledge
•   Students should understand the steps involved in natural language understanding and
    generation
•   The student should be familiar with basic language processing operations like
        o Morphological analysis
        o Parts-of-Speech tagging
        o Lexical processing
        o Semantic processing
        o Knowledge representation

At the end of this lesson the student should be able to do the following:
    • Design the processing steps required for an NLP task
    • Implement the processing techniques.




Lesson 40
Issues in NLP
13.1 Natural Language Processing
Natural Language Processing (NLP) is the process of computer analysis of input provided
in a human language (natural language), and conversion of this input into a useful form of
representation.

The field of NLP is primarily concerned with getting computers to perform useful and
interesting tasks with human languages. The field of NLP is secondarily concerned with
helping us come to a better understanding of human language.

   •   The input/output of an NLP system can be:
           – written text
           – speech
   •   We will mostly be concerned with written text (not speech).
   •   To process written text, we need:
           – lexical, syntactic, semantic knowledge about the language
           – discourse information, real world knowledge
   •   To process spoken language, we need everything required to process written text,
       plus the challenges of speech recognition and speech synthesis.

There are two components of NLP.

   •   Natural Language Understanding
          – Mapping the given input in the natural language into a useful
             representation.
          – Different levels of analysis are required:
             morphological analysis,
             syntactic analysis,
             semantic analysis,
             discourse analysis, …
   •   Natural Language Generation
          – Producing output in the natural language from some internal
             representation.
          – Different levels of synthesis are required:
             deep planning (what to say),
             syntactic generation
   •   NL Understanding is much harder than NL Generation, but both of them are
       hard.

The difficulty in NL understanding arises from the following facts:

   •   Natural language is extremely rich in form and structure, and very ambiguous.
          – How to represent meaning,
          – Which structures map to which meaning structures.
   •   One input can mean many different things. Ambiguity can be at different levels.

          –    Lexical (word level) ambiguity -- different meanings of words
          –    Syntactic ambiguity -- different ways to parse the sentence
          –    Interpreting partial information -- how to interpret pronouns
          –    Contextual information -- context of the sentence may affect the meaning
               of that sentence.
   •   Many inputs can mean the same thing.
   •   Interaction among components of the input is not clear.

The following kinds of language-related information are useful in NLP:

   •   Phonology – concerns how words are related to the sounds that realize them.

   •   Morphology – concerns how words are constructed from more basic meaning
       units called morphemes. A morpheme is the primitive unit of meaning in a
       language.

   •   Syntax – concerns how words can be put together to form correct sentences and
       determines what structural role each word plays in the sentence and what phrases
       are subparts of other phrases.

   •   Semantics – concerns what words mean and how these meanings combine in
       sentences to form sentence meaning. It is the study of context-independent meaning.

   •   Pragmatics – concerns how sentences are used in different situations and how
       use affects the interpretation of the sentence.

   •   Discourse – concerns how the immediately preceding sentences affect the
       interpretation of the next sentence. For example, interpreting pronouns and
       interpreting the temporal aspects of the information.

   •   World Knowledge – includes general knowledge about the world, and what each
       language user must know about the other’s beliefs and goals.
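
To make the morphology item above concrete, here is a minimal sketch that splits a word into a stem and a suffix morpheme. The suffix table, the labels, and the function name split_morphemes are assumptions made for this illustration and are not part of the lesson. Real morphological analyzers also handle spelling changes and irregular forms, which this sketch does not.

```python
# Toy suffix table (hypothetical; chosen for illustration only).
SUFFIXES = {
    "ing": "PROGRESSIVE",
    "ed": "PAST",
    "s": "PLURAL-OR-3SG",
}

def split_morphemes(word):
    """Return (stem, suffix_label) using longest-suffix matching.

    Words with no listed suffix are returned whole with label None.
    """
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Require a stem of at least 3 letters so 'is' or 'red' stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)], SUFFIXES[suffix]
    return word, None

for w in ["ducks", "cooked", "cooking", "her"]:
    print(w, split_morphemes(w))
```

Note that the label for "s" is itself ambiguous (plural noun or third-person verb), which is the kind of ambiguity later processing steps have to resolve.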


13.1.1 Ambiguity

I made her duck.

   •   How many different interpretations does this sentence have?
   •   What are the reasons for the ambiguity?
   •   The categories of knowledge of language can be thought of as ambiguity
       resolving components.
   •   How can each ambiguous piece be resolved?
   •   Does speech input make the sentence even more ambiguous?
           – Yes – deciding word boundaries
   •   Some interpretations of "I made her duck":

           1. I cooked duck for her.
           2. I cooked duck belonging to her.
           3. I created a toy duck which she owns.
           4. I caused her to quickly lower her head or body.
           5. I used magic and turned her into a duck.
   •   duck – morphologically and syntactically ambiguous:
               noun or verb.
   •   her – syntactically ambiguous: dative or possessive pronoun.
   •   make – semantically ambiguous: cook or create.
   •   make – syntactically ambiguous:
           – Transitive – takes a direct object (interpretation 2).
           – Ditransitive – takes two objects (interpretation 5).
           – Takes a direct object and a verb (interpretation 4).
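
The word-level ambiguities listed above multiply when they are combined. The following sketch enumerates every candidate tag sequence for the sentence from a small hand-written lexicon. The tag names and the lexicon entries are assumptions made for this illustration, not a standard tag set.

```python
from itertools import product

# Hypothetical lexicon: each word maps to the categories it may take.
LEXICON = {
    "I": ["PRON"],
    "made": ["VERB"],
    "her": ["PRON-DATIVE", "POSSESSIVE"],
    "duck": ["NOUN", "VERB"],
}

def tag_sequences(sentence):
    """Return every combination of per-word tags as a list of (word, tag) lists."""
    words = sentence.split()
    options = [LEXICON[w] for w in words]
    return [list(zip(words, tags)) for tags in product(*options)]

readings = tag_sequences("I made her duck")
for reading in readings:
    print(reading)
print(len(readings))  # 1 * 1 * 2 * 2 = 4 candidate sequences
```

Not every candidate sequence is a grammatical reading; deciding which ones survive is the job of the tagging and parsing steps.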

Ambiguities are resolved using the following methods.

   •   Models and algorithms are introduced to resolve ambiguities at different levels.
   •   Part-of-speech tagging -- deciding whether duck is a verb or a noun.
   •   Word-sense disambiguation -- deciding whether make means create or cook.
   •   Lexical disambiguation -- part-of-speech tagging and word-sense
       disambiguation are two important kinds of lexical disambiguation.
   •   Syntactic disambiguation -- her duck is an example of syntactic ambiguity, and can be
       addressed by probabilistic parsing.
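
As one possible illustration of word-sense disambiguation, the sketch below picks a sense of make by counting cue words in the surrounding context. This is a simple overlap heuristic swapped in for illustration, not a method prescribed by the lesson, and the cue-word sets and the function name disambiguate_make are hypothetical.

```python
# Hypothetical cue words for two senses of "make".
SENSE_CUES = {
    "cook": {"dinner", "oven", "soup", "kitchen", "recipe"},
    "create": {"toy", "wood", "paper", "model", "clay"},
}

def disambiguate_make(context_words):
    """Return the sense whose cue words overlap most with the context."""
    context = {w.lower() for w in context_words}
    scores = {sense: len(cues & context) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # Report None when no cue word appears at all, rather than guessing.
    return best if scores[best] > 0 else None

print(disambiguate_make("I made her duck out of clay".split()))
print(disambiguate_make("I made her duck in the oven for dinner".split()))
print(disambiguate_make("I made her duck".split()))
```

The last call shows the limit of the heuristic: without extra context, the sentence stays ambiguous.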

13.1.2 Models to represent Linguistic Knowledge

   •   We will use certain formalisms (models) to represent the required linguistic
       knowledge.
   •   State Machines -- FSAs, FSTs, HMMs, ATNs, RTNs
   •   Formal Rule Systems -- Context Free Grammars, Unification Grammars,
       Probabilistic CFGs.
   •   Logic-based Formalisms -- first order predicate logic, some higher order logic.
   •   Models of Uncertainty -- Bayesian probability theory.
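
As a small example of the state-machine formalism, the sketch below implements a finite-state automaton (FSA) that accepts tag sequences of the form optional determiner, any number of adjectives, then one noun. The pattern, the state names, and the tag names are assumptions chosen for this illustration.

```python
# Transition table for a toy noun-phrase pattern: optional DET, any ADJ, one NOUN.
TRANSITIONS = {
    ("start", "DET"): "after_det",
    ("start", "ADJ"): "after_det",
    ("start", "NOUN"): "accept",
    ("after_det", "ADJ"): "after_det",
    ("after_det", "NOUN"): "accept",
}
ACCEPTING = {"accept"}

def accepts(tags):
    """Run the automaton over a list of tags and report acceptance."""
    state = "start"
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING

print(accepts(["DET", "ADJ", "NOUN"]))  # True
print(accepts(["DET"]))                 # False: no noun was seen
```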

13.1.3 Algorithms to Manipulate Linguistic Knowledge

   •   We will use algorithms to manipulate the models of linguistic knowledge to
       produce the desired behavior.
   •   Most of the algorithms we will study are transducers and parsers.
           – These algorithms construct some structure based on their input.
   •   Since language is ambiguous at all levels,
       these algorithms are never simple processes.
   •   Most of the algorithms that will be used fall into the following categories:
           – state space search
           – dynamic programming


13.2 Natural Language Understanding
The steps in natural language understanding are as follows:

           Words

Morphological Analysis

           Morphologically analyzed words (another step: POS tagging)

Syntactic Analysis

           Syntactic Structure

Semantic Analysis

           Context-independent meaning representation

Discourse Processing

            Final meaning representation
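
The flow of data through the steps above can be sketched as a chain of functions, each consuming the previous step's output. Every stage body here is a placeholder assumption that only shows the shape of the data passed along; none of them performs real analysis, and the function names are hypothetical.

```python
def morphological_analysis(words):
    """Placeholder: attach a lowercase base form to each word."""
    return [{"surface": w, "base": w.lower()} for w in words]

def syntactic_analysis(analyzed):
    """Placeholder: a flat structure; a real parser would build a tree."""
    return {"type": "S", "children": analyzed}

def semantic_analysis(tree):
    """Placeholder: collect base forms as a stand-in meaning representation."""
    return {"predicates": [child["base"] for child in tree["children"]]}

def discourse_processing(meaning, context):
    """Placeholder: record the preceding sentences the meaning depends on."""
    return {**meaning, "context": list(context)}

def understand(sentence, context=()):
    """Run the four stages in the order given in the lesson."""
    words = sentence.split()
    analyzed = morphological_analysis(words)
    tree = syntactic_analysis(analyzed)
    meaning = semantic_analysis(tree)
    return discourse_processing(meaning, context)

print(understand("I made her duck", context=["She was hungry."]))
```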



