Natural language processing module 1 chapter 2

Language Modelling
Two Approaches for Language Modelling
•
•
One is to define a grammar that can handle the language
Other is capture the patterns in a grammar language
statistically.
By the above Two primarily
Grammar Based Mode
Statistical language model
•These includes lexical functional grammar,
government and binding ,Paninian and n-
gram based Model

Introduction
• Model is
process.
language.
a description of
some Language
modelis thus
complex
entity or a
description of
• Natural language is a complex entity in order to process
it we need to represent or build a model, this is known as
language modelling.
• Language model can be viewed as a problem of
grammar inference or Problem of probability estimation
Grammar based language model attempts to
distinguish a grammatical sentence from a non
grammatical one , Where probability model estimates
maximum likelihood estimate.
•

Grammar Based Language Models uses grammar to
create the model , attempts to represent syntactic
structure.
***Grammar Consists of hand coded rules defining the
structure and ordering the constituents and utilizes
structures and relations.
**Grammar based models are:
1. Generative Grammars (TG, Chomsky 1957)
2.Hierarchial Grammars (Chomsky 1956)
3.Government and Binding (GB) (Chomsky 1981)
4.Lexical Functional Grammar (LFG)(Kalpan
1982)
5. Paninian Framework (Joshi 1985)

Statistical Language Models(SLM)
• This approach creates a model by training it
from a corpus (it should be large for regularities ).
• SLM is the attempt to capture the regularities of
a natural language for the purpose of improving
the performance of various natural language
applications.– Rosenfield(1994)
• SLM s are fundamental task in many NLP
applications like speech recognition, Spell
correction, machine translation, QA,IR and Text
summarization.
-N-gram Models

I. Generative Grammars
According to Syntactic Structure
•
•
•
• We can generate sentences if we know a collection of words
and rules in a language this point dominated
computational linguistics and is appropriately termed
generative grammar.
If we have a complete set of rules that can generate all
possible sentences in a language those rules provide a model
of that language.
Language is a relation between the sound(or written text)
and its meaning. Thus model of a Lang means it should also
need to deal with syntax and meaning also.
Most of these grammars deals with Perfectly
grammatical but meaningless sentence.
Grammar Based Language Models

II. Hierarchical Grammar
• Chomsky 1956 described classes of a
grammar, where the top layer contained by its
subclasses.
• Type 0 (unrestricted)
• Type 1 (context sensitive)
• Type 2 (context free)
• Type 3 (regular)
R for given classes of formal Grammars, it can be
extended to describe grammars at various levels
such as in a class-sub class relationship.

• Explains how sentences are structured in human
languages using a set of principles and rules.
• GB Theory as a set of tools that help us understand
how words and phrases are arranged in a sentence.
-Government refers to how one word
influences another word.
-Binding refers to how pronouns (he, she,
himself, etc) and noun phrases relate to each
other in a sentence.
III. Government and Binding

III. Government and Binding
• In computational linguistics, structure of a language
can be understood at the level of its meaning to
resolve the structural ambiguity.
Transformational Grammars assume two levels of
existence of sentences one at the surface level other
is at the deep root level.
•
• Government and Binding theories have renamed
them as s-level and d-level and identified two more
levels of representation called Phonetic form and
Logical form.

• GB theories language can be considered for analysis
at the levels shown,
d-structure
|
s-structure
phone
tic Form (PF)
Logical Form (LF)
Fi
g 1: different
levels of
representation
in GB
• If we say language as the representation of some
sound and meaning GB considers LF and PF but GB
concerned with LF rather than PF.

PF: How a sentence is pronounced (sound representation)
-He will go to the park
“He’ll go to the park”
LF: The meaning of a sentence (semantic interpretation)
- Everyone loves someone
Everyone loves at least one person, but not necessarily the
same person
There is one specific person whom everyone loves

• Transformational Grammar have hundreds of rewriting
rules generally language specific and construct-specific
rules for assertive and interrogative sentences in
English or active or passive voice.
GB envisages that we define rules at structural levels
units at the deep level, it will generate any language
•
with few rules. Deep level structures are the
abstractions of Noun Phrase verb phrase and common
to all languages.( eg child Lang : abstract structure enters the
mind & its gives rise to actual phonetic structures)
The existence of deep level, language independent,
abstract structures, and expressions of these rules in
surface level, language specific with simple rules of GB
theories.
•

Surface Structure: represents the final form of the sentence
after transformations
Passive Movement: The object "Mukesh" moves to the subject position.
Insertion of Auxiliary Verb: "Be" is inflected to "was."
Deletion of Object Placeholder (e): The empty category (e) represents the removed
subject.

Deep Structure (D-structure) represents the basic, underlying form of the
sentence before any transformations occur.

In Phrase Structure Grammar (PSG) each constituent consists of two
components:
• the head (the core meaning) and
• the complement (the rest of the constituent that completes the core
meaning).
For example, in verb phrase “[ate icecream ravenously]”, the complement
‘icecream’ is necessary for the verb ‘ate’ while the complement
‘ravenously’ can be omitted. We have to disentangle the compulsory from
the omissible in order to examine the smallest complete meaning of a
constituent. This partition is suggested in Xʹ Theory (Chomsky, 1970).
Xʹ Theory (pronounced ‘X-bar Theory’) states that each constituent
consists of four basic components:
• head (the core meaning),
• complement (compulsory element),
adjunct (omissible element),
and specifier (an omissible phrase marker).
•
•
•

Components of GB
• Government and binding comprises a set of theories
that map structure from d-structure to s-structure.
A
general transformational rule called ‘Move α ’ is
applied to d structure and s structure.
• This can move constituents at any place if it does not
violate the constraints put by several theories and
principles.

Stage Description Example
D-Structure Basic structure before movement "John likes Mary."
Move α Elements are moved to form grammatical sentences "Mary is liked by John."
S-Structure The sentence after transformations "What do you want?"
Move α Again More transformations before final meaning processing
Logical Form (LF) Final meaning interpretation
"Everyone saw a movie." (Did they see
the same movie?)

• GB consists of ‘a series of modules that contain
constraints and principles’ applied at various levels
of its representations and transformation rules,
Move α.
• These modules includes X-bar theory, projection
principle, ø-theory , ø-criterion, command
and government, case theory, empty category
principle (ECP), and binding theory.
• GB considers three levels of representations (d-,s-, and
LF) as syntactic and LF is also related to meaning
or sematic representations .
Eg : Two countries are visited by most travelers

• Important concepts in GB is that of constraints, these
can prohibits certain combinations and movements. GB
creates constraints, cross lingual constraints ‘ a
constituent cannot be moved from position X’ (rules are
language independent).

X Theory
• is one of the central concepts in GB. Instead of
defining several phrase structures & Sentence
Structures with separate set of rules , ¯X
theory defines them both as maximal projections of
some head.
• Entities defined become language independent , Thus , noun
phrase (np), verb phrase (vp), adjective phase(AP),(PP) are
maximal projections of noun (N), verb(V), adjective(A),and
preposition(P) head where X={N,V,A,P}.
• GB envisages semi phrasal level denoted by X bar and the
second maximal projection at the phrasal level denoted by
X

• Move α (Move Alpha) is applied to "Most travellers", moving it to the
front.
• The subscript "i" shows that "most travellers" has moved from its original
position.
• The empty category (e) is left behind, representing the movement.
• This changes the interpretation to:
"For most travellers, there exist two specific countries that they visit.“
• In LF1, the focus is on the countries.
• In LF2, the focus is on the travellers.

Understanding Figure 2.7(a) – X-Bar Theory
General Structure of a Phrase
The X-Bar Theory is a model of phrase structure that explains how words form larger
syntactic units.
• X
̄ (X-bar) Theory suggests that phrases have a hierarchical structure with four
key components:
• Head (X) – The core element that determines the type of phrase (e.g., Noun
for NP, Verb for VP).
• Specifier – A word that modifies or specifies the head (e.g., articles like "the"
in "the food").
• Modifier – Additional elements (adjectives, adverbs, or prepositional
phrases) that modify the phrase.
• Argument – A required element that completes the meaning of the head.
• The Maximal Projection (X
̄ ̄ or XP) is the highest level of the phrase (e.g., NP for a
noun phrase, VP for a verb phrase).

Understanding Figure 2.7(b) – NP Structure
Example: "The food in a dhaba"
This part of the image explains the syntactic structure of the noun phrase
(NP):
Determinant (Det): "the"
Noun (N): "food"
Prepositional Phrase (PP): "in a dhaba" (modifies the noun "food")
Tree Breakdown:
The entire phrase is an NP (Noun Phrase).
"The" (Det) is the Specifier of the noun.
"Food" (N) is the Head of the NP.

VP (Verb Phrase) Structure
•VP (Verb Phrase) is the main phrase.
•Head (V): "ate"
•NP (Noun Phrase) as object: "the food"
•PP (Prepositional Phrase) as modifier:
"in a dhaba"
•VP (Verb Phrase) is the main phrase.
•Head (V): "ate"
•NP (Noun Phrase) as object: "the food"
•PP (Prepositional Phrase) as modifier: "in a dhaba“
This structure shows that "ate" is the main verb, "the food" is the object, and
"in a dhaba" is an optional prepositional phrase modifying the VP.

AP (Adjective Phrase) Structure
•AP (Adjective Phrase) is the main phrase.
•Head (A): "proud"
•Degree Modifier (Deg): "very"
•PP (Prepositional Phrase) as modifier: "of
his country"
Tree Explanation:
• The adjective "proud" is the Head.
• The degree modifier "very" strengthens the adjective.
• The PP "of his country" gives additional information about "proud".
This structure shows that "proud" is the main adjective, "very"
intensifies it, and "of his country" modifies it.
Adjective is a word that modifies
(describes or gives more information
about) a noun or pronoun. It tells us
what kind, how many, or which one. Ex:
She has a beautiful dress.

PP (Prepositional Phrase) Structure
•PP (Prepositional Phrase) is the main
phrase.
•Head (P): "in"
•NP (Noun Phrase) as complement: "a
dhaba"
Tree Explanation:
The preposition "in" is the Head.
The NP "a dhaba" is the Complement.
"a" is the determiner (Det).
"dhaba" is the noun (N).
👉 This structure shows that "in" is the
preposition, which takes "a dhaba" as
its complement.

Maximal Projection of Sentence Structure
S
̅ (S-Bar or CP -
Complementizer
Phrase)
COMP
(Complementizer)
• The word "that" is a complementizer, a word that introduces an embedded
clause (subordinate clause).
• Complementizers like "that", "whether", and "if" are used to connect a
dependent clause to a main clause.
Ex: I know that she ate the food in a dhaba. (Here, "that she ate the food in a
dhaba" is the complement clause of "I know.")

Subcategorization
• GB doesn’t consider traditional phrase
structures it considers maximal projection and
sub categorization.
• Maximal projection can be the argument head
but sub categorization is used to filter to permit
various heads to select a certain subset of
the range of maximal projections.
Eg :
• The verb eat can subcategorize for NP,
• whereas word 'sleep' cannot, so ate food is well-
formed but slept the bed is not.

Subcategorization tells us which grammatical elements (NP, PP, S',
etc.) must or can follow a verb.
It explains why "She ate food" is correct but "She slept the bed" is
wrong.
•Sleep" is intransitive → No NP allowed.
•"Eat" is transitive → NP required.
•"Slept the bed" is wrong because "sleep" doesn’t take an NP.
•"Ate food" is correct because "eat" requires an NP.
•"Slept on the bed" is correct because "on the bed" is a PP, not an NP.

Projection Principle
• This is also an basic notion in GB, places a constraint on
the three syntactic representations and their mapping
from one to the other. All syntactic levels are form lexicon.
❖Theta Theory or The Theory of Thematic relations
Sub Categorizations puts restrictions only on syntactic
categories which a head can accept, GB puts other
restrictions on lexical heads, roles to arguments, the role
assignments are related to ‘semantic relation’

• Theta role and Theta criterion
Thematic roles from which a head can select, theta roles
are mentioned in the lexicon word eat can take (Agent,Theme)
Eg : Mukesh ate food (agent role to mukesh, theme role to food )
❖ Roles are assigned based on the syntactic positions of the
arguments, it is important there should be a match between
the number of roles and number of arguments depicted by
theta criterion
❖ Theta criterion states that ‘each argument bears one and only
one theta role, and each Theta role is assigned to one and
only one argument’

C (Constituent Command)-command and
Government
• C-Constituent is a syntactic relationship that helps define structural
dependencies in a sentence, such as binding, scope, and interpretation
of pronouns and anaphors
• C- command defines scope of maximal projection:
If any word or phrase falls within the scope of and is
determined by a maximal projection, we say that it is
dominated by the maximal projection , there are two
structures α and ß related in such a way that
“ every maximal projection dominating α
dominates ß” iff we say that α C commands ß .
• The def of C command doesn’t include al maximal
projections dominating ß only those dominating α.

Definition of C-Command
A node α C-commands node β if and only if:
1.Every maximal projection that dominates α also
dominates β.
2. α does not dominate β, and β does not dominate α.
This means that α and β must be "sibling-like"
structures, with a common dominating node, but
neither should dominate the other.

S
/
NP VP
/ /
N V NP
John loves Mary
•"John" C-commands "loves" and "Mary", because the NP ("John") and
VP ("loves Mary") are both dominated by S, their closest maximal projection.
•However, "John" does NOT C-command inside the NP ("Mary"),
because "Mary" is inside another maximal projection.

Government, Movement, Empty Category and
Co indexing
• “α governs ß” iff : α C-commands ß
α is an X (head e.g, noun verb preposition adjective
and inflection) and every maximal projection
dominating ß dominates α.
• Movement
In GB move α is described as ‘move anything
anywhere’ though provides restrictions for valid
movement.

•
In GB, active to passive transformation.
wh-movement (Wh-question) and NP- movement
What did Mukesh eat ? [Mukesh INFL eat
[what ]]
• Lexical categoryies (N, V, A) must exisit in all three
levels.
• Existence of an abstract entity called empty category
(invisible elements).
In GB four types of empty categories, two being empty
NP positions called wh-trace and NP-Trace and remaining
two pronouns called small pro and big PRO
With two properties –anaphoric(+a or –a)
pronominal(+p or –p)
Co-Indexing is the indexing of the subject NP and AGR at
d-structure which are preserved by Move α.

1. Wh-trace -a -p
2. NP-trace +a -p
3. Small pro -a +p
4. Big PRO +a +p
Properties of Empty Categories
Empty categories are classified based on two properties:
1.Anaphoric (+a or -a):
1. If +a, the category depends on an antecedent for meaning
(e.g., traces).
2. If -a, it does not require an antecedent (e.g., pro).
2.Pronominal (+p or -p):
1. If +p, the category behaves like a pronoun (e.g., pro, PRO).
2. If -p, it does not behave like a pronoun (e.g., traces).

Classification of Empty Categories in GB Theory
Empty categories (ECs) are syntactic elements that exist in sentence structure but
are not pronounced. They are classified based on two properties:
1.Anaphoricity (+a or -a) → Does it need an antecedent?
•If (+a) → Needs an antecedent (its meaning comes from another element
in the sentence).
•Example: Traces (wh-trace, NP-trace)
•What did John eat tᵥ?
•The trace tᵥ refers back to "what" (the wh-word), so it needs an
antecedent → +a
•If (-a) → Does NOT need an antecedent (its meaning is understood
without a reference).
Example: pro (in pro-drop languages like Kannada
“speaks Kannada fluently”
•pro is understood as "he/she/they," but it doesn't depend on any explicit
antecedent → -a

2. Pronominality (+p or -p) → Does it behave like a pronoun?
This property determines whether an empty category behaves
like a pronoun (i.e., whether it can refer to a person or thing).
•If (+p) → Acts like a pronoun (can take the role of "he," "she," "it," etc.).
•Example: pro, PRO
ಓದುತ್ತಾನೆ
refers to a person (like "he/she"), it behaves like a pronoun → +p
•If (-p) → Does NOT act like a pronoun (doesn’t refer to a person/thing).
•Example: Traces (wh-trace, NP-trace)
The cake was eaten tₙₚ.
The trace tₙₚ is just a placeholder, NOT a pronoun → -p

Wh-trace (t𝑤ℎ) – Wh-Movement
Definition: A wh-trace (t𝑤ℎ) is created when a wh-word (such as
what, who, where) moves to the front of a sentence in Wh-
movement. The original position of the wh-word is left empty,
creating a trace.
Example: John ate what?
Whatidid John eat t ?
𝑤ℎ

NP-trace (tNP) – NP-Movement
•An NP-trace (tNP
) occurs when an NP moves due to
passivization or raising verbs.
•The original position of the NP is left empty, and a trace
is left behind.
Example: Someone ate the cake.
The cake𝑖 was eaten tNP i

Small pro (pronoun)
small pro refers to an empty subject pronoun found in
languages that allow pro-drop, like Kannada, Spanish,
Italian, and Chinese
Example:
ನಾನು ಶಾಲೆಗೆ ಹೋಗುತ್ತೇನೆ → "I go to school."
(Small pro)ಶಾಲೆಗೆ ಹೋಗುತ್ತೇನೆ → "Go to school." (subject is omitted)
The subject (ನಾನು / "I") is not spoken but understood. This missing subject is represented as
small pro in GB theory:

Big PRO (Pronoun)
•The term PRO (uppercase) refers to an empty subject in
control constructions.
•It is called "Big" because it appears in non-finite clauses
(clauses without tense), unlike small pro, which appears in
finite clauses.
•PRO is "big" because it has more syntactic restrictions
than small pro.
Example
The teacher told Vishnu [PRO to study].
The meaning is: The teacher told Vishnu that Vishnu should study
Incorrect: The teacher told Vishnu [he to study]

Binding Theory
• Binding defined as
α binds ß iff :
α C-commands ß
α and ß are Co-indexed.
Eg : Mukesh was killed
[e1 INFL kill Mukesh ]
[Mukesh was killed (by
ei)]
Mukesh was killed.
Empty clause (ei) and
mukesh (Npi) are bound.
Binding theory can be
given as follows:
(a) An anaphor (+a) is
bound in its

1. Principle A (Anaphors: +a, -p)
An anaphor (e.g., "himself", "herself") must be bound in its government
category.
• Example:
• Correct: John saw himself. → "Himself" is bound by "John" in the same clause.
• Incorrect: John said that Mary saw himself. → "Himself" has no local binder
(wrong).
2. Principle B (Pronominals: -a, +p)
A pronominal (e.g., "he", "she") must be free in its government category.
• Example:
• Correct: John said that he left. → "He" is free in the embedded clause.
• Incorrect: John saw him. → "Him" is bound in the same clause (wrong).
3. Principle C (R-expressions: -a, -p)
An R-expression (Referential) (e.g., "Mukesh", "John", "Mary") must be free (not
bound) everywhere.
• Example:
• Correct: John said that Mary left. → "Mary" is not bound.
• Incorrect: He said that John left. → If "He" refers to "John", it's wrong because
"John" must be free.

Empty category Principle (ECP):
α properly governs ß iff :
α governs ß and α is lexical ( i.e, N V A or P) or
Α locally A – binds ß ECP says ‘ A trace must be properly governed’
Example: What did John eat __?“
(Wrong) What do you think that John ate __?
Bounding Theory Case Theory and Case Filter
• In GB case theory deals with the distributions of NPs and mentions that each NP
must assigned a case.
(Case refers to the grammatical role that a noun (or pronoun) plays in a sentence.
Ex: She ate an apple)
• In English we have nominative, objective, genitive etc., cases which are assigned to
NPs at particular positions.
• Indian languages are rich in case markers, which are carried even
during movements.

Case Filter :
An NP is un grammatical if it has phonetic content or if it is an argument and
is not case marked.
Phonetic content here, refers to some physical realization, as opposed to empty
categories. Case filters restricts the NP movement.
LFG Model : Lexical Functional Grammar (LFG) Model:
Two syntactic levels :
constituent structure (c-struct)-Phrase structure representation (tree format)
functional structure (f-struct)- Grammatical function representation (subject, object, etc.)
ATN (Argument Transition Networks )- Computational model linking syntax and meaning
which used phrase structure trees to represent the surface of sentences and underlying
predicate –argument structure.
LFG aimed to C-structure and f-structure

Natural language processing module 1 chapter 2

She saw stars in the sky
•↑ (Up Arrow): Refers to the f-structure of the larger node (the phrase
containing the element).
•↓ (Down Arrow): Refers to the f-structure of the current node (the specific
element itself).
Rule 1 (S → NP VP):
•(↑ subj = ↓) → This means that the f-structure of NP (subject) goes into the
f-structure of the entire sentence (S).
•(↑ = ↓) → This means that the f-structure of VP (verb phrase) directly
assigned to the f-structure of S (since the verb defines the sentence's
action).

Functional Notation in Lexical Functional Grammar (LFG)

The order follows a grammatical hierarchy where features affecting
agreement appear first
f-structure of given sentence

Three key properties of f-structure in linguistic theory:
1.Consistency – Each attribute in an f-structure can have only one
value. If conflicting values are found (e.g., singular and plural for
a noun), the structure is rejected.
2.Completeness – An f-structure must include all the functions
required by its predicate. If a predicate requires an object but
none is provided (e.g., "He saw" without specifying what was
seen), the structure is incomplete.
3.Coherence – All governable functions in an f-structure must be
governed by a predicate. If an object appears where the verb does
not allow it, the structure is rejected.

Lexical Rules in Lexical-Functional Grammar (LFG)
Active Sentences (The subject performs the action, and the object receives it)
Example:
Tara ate the food. → ("Tara" is the subject, "ate" is the verb, and "the
food" is the object.)
Passive Sentences (The object becomes the subject, and the original subject moves to
an optional phrase.)
Example:
• Passive: The food was eaten by Tara. → ("The food" is now the
subject, "was eaten" is the verb, and "by Tara" is an optional agent
phrase.)
Active Structure: Pred = ‘eat < (↑ Subj) (↑ Obj) >’
Passive Structure: Pred = ‘eat < (↑ Obl_ag) (↑ Subj) >’
[oblique agent phrase (Obl_ag): special grammatical element used in passive
sentences to indicate who performed the action. Example: Active Voice: Tara ate
the food. Passive Voice (Sentence Rewritten): The food was eaten by Tara. “by
Tara" is the oblique agent phrase (Obl_ag) because it represents the original subject
(Tara) and it is no longer the main subject of the sentence ]

Causativization (Making Someone Do Something)
Causativization is when an action is caused by someone rather than being done
directly.
Example:
Active: तारा हंसी (Taaraa hansii) → Tara laughed
Pred = ‘Laugh < (↑ Subj) >’
Causative (when someone causes an action to happen):
मोनिका ने तारा को हँसाया (Monika ne Tara ko hansaayaa) → Monika made Tara
laugh
Pred = ‘cause < (↑ Subj) (↑ Obj) (Comp) >’
Subject (Subj): Monika (the causer)
Object (Obj): Tara (the one affected)
Complement (Comp): Tara laugh (this is the action that was caused). (complement
that tells what action is being caused)

Long-Distance Dependencies and Coordination in LFG
Long-distance dependency happens when a word or phrase is moved from its usual position
in a sentence to another position. In English, this often happens with questions (wh-
movement).
Example: Tara likes which picture most?
(Wh-Movement) Which picture does Tara like most?
In GB Theory (Government and Binding Theory), when a word or phrase moves to a
different position, it leaves an empty category (a placeholder for the missing part).
Which picture does Tara like __ most? (invisible placeholder or trace)
LFG does not create empty categories like GB Theory. Instead, it uses functional structures
to maintain connections and Coordination.
The moved phrase (which picture) is still linked to its original position functionally.
Coordination refers to how different sentence elements are linked logically.
Example: Tara likes ‘tea and coffee’

"Which picture does Tara like__ most?“
1. Focus
• Represents the wh-word phrase, which is in focus.
• In this case, "Which picture" is the focus.
2. Pred (‘picture (Obl
⟨ th) ’)
⟩
• The predicate (Pred) represents the main word of the phrase.
• Here, "picture" is the noun, and Oblth (oblique thematic) indicates that the picture is
related to something else (like an owner or subject).
3. Oblth(Oblique Thematic Role)
• It refers to an oblique phrase that provides additional information, such as who the
picture is related to.
• Contains:
• pred ‘PRO’ → PRO stands for a pronoun-like element.
• Refl + → Suggests reflexive or reference to something already mentioned.
4. Subj (Subject)
• pred ‘Tara’ → Identifies "Tara" as the subject of the sentence.
5. Obj (Object)
• The object is left empty, as the wh-word ("which picture") has moved.
6. Pred (‘like (↑Subj) (↑Obj) ’)
⟨ ⟩
• Represents the verb "like", which takes a subject and an object.

Paninian Framework
Its a linguistic model based on Paninian Grammar (PG), which was written by Panini in 500 BC in Sanskrit (originally titled
Astadhyayi). Though originally designed for Sanskrit, this framework can be applied to other Indian languages and some Asian
languages.
Key Features
1. SOV Structure: Unlike English (which follows Subject-Verb-Object [SVO] order), most Asian languages follow Subject-
Object-Verb [SOV] order.
1. Example:
1. English (SVO): Tara likes apples.
2. Hindi (SOV): तारा को सेब पसंद है। (Tara ko seb pasand hai)
2. Inflectional Richness:
1. Many Indian languages rely on inflectional changes to convey grammatical relationships (e.g., tense, case, gender),
instead of relying solely on word order.
2. Example (in Sanskrit):
1. रामः ग्रामं गच्छति (Rāmaḥgrāma gacchati
ṁ ) → Rama goes to the village.
2. रामेण ग्रामः गम्यते (Rāmeṇa grāmaḥgamyate) → The village is gone to by Rama.
3. Here, "Rama" (रामः / रामेण) changes its form based on its grammatical role.
3. Syntactic and Semantic Cues:
1. The Paninian framework focuses on meaning-based analysis rather than just word order, making it useful for analyzing
complex Indian languages.
4. Ongoing Research:
1. The framework is still being explored for its application to Indian languages, as many complexities remain to be
explained.

Some Important features of Indian languages

Layered representation of PG
• General GB considers deep structure , Surface and LF, where LF nearer to Semantics
• Paninion grammar frame work is said to be syntactico- semantic
surface layer to deep semantics by passing to intermediate layers.
•Language as a multi-layered process. You start with spoken words (surface level),
add grammatical roles (vibhakti level), determine who is doing what (karaka level),
and finally understand the real meaning (semantic level).
•Paninian Grammar follows this approach to ensure sentences preserve meaning
even if word order changes.
•This is useful in languages like Hindi and Sanskrit, where word order is flexible,
but meaning remains clear.

• Vibhakti means inflection, but here it refers to
word (noun, verb,or other)groups based
either on case endings, post positions or
compound verbs, or main and auxiliary verbs
etc,.
• Instead of talking NP,VP,AP,PP or … word
groups are formed based on various kinds of
markers. These markers are language specific
but all indian languages can be represented at
Vibhakti Leve.l

• Karaka Level means Case in GB these are theta
criterion etc.,.
• PG has its own way of defining karaka relations, these
relations based on word groups participate in the
activity denoted by the verb group(syntactic &
semantic as well).
KARAKA THEORY
• Central theme of PG framework, relations are assigned
based on the roles played by various participates in the
main activity.
• Roles are reflected in the case markers and post
position markers.
• Case relations we can find in english langauge, richness
of the case endings found in indian languages .

• Karakas such as Karta (subject), karma(object),
Karna(instrument),sampradhana(beneficary),
Apandan(seperation) and Adikhran (locus).

Issues in Panininan Grammar(PG)
•Computational implementation of PG- Translating PG
into a computer-friendly format is complex.
• Adaptation of PG to Indian , other similar
languages.
• Mapping Vibakthi to several semantics

n-gram Model
• n-gram model: is used in statistical language modeling to estimate the
likelihood (probability) of a sentence appearing in a given language.
• The n-gram model helps us calculate this probability using past words in a
sentence.
• Instead of treating the whole sentence as a single unit, the model breaks it
down into smaller parts and calculates probabilities step by step.
• This follows the chain rule of probability, which means the probability of a
sentence P(s) is the product of the probabilities of each word appearing,
given the previous words.
we have a sentence with words:
𝑠=(𝑤1,𝑤2,𝑤3,...,𝑤𝑛) Where: 𝑤1is the first word, 𝑤2 is the second word, and
so on 𝑤𝑛 is the last word

Special Words (Pseudo-Words) is introduced to mark the
start or beginning of the Sentence.
•<s> → Marks the beginning of a sentence.
•</s> → Marks the end of a sentence.
•In trigram models, we use <s1> and <s2> to mark the
start.

How Do We Estimate These Probabilities?
• To train an n-gram model, we use real-world text data
(corpus). We count how often a particular n-gram
appears and divide it by the total occurrences of its
history.
• The formula for estimating probabilities using Maximum
Likelihood Estimation (MLE) is:

In General sum of all n-grams that share first n-1 words is equal to the count
of the common prefix
Therefore P(T/M) where T is training set and M is model

Training Set
1. The Arabian Knights
2. These are the fairy tales of the east
3. The stories of the Arabian knights are translated in many languages
using bigram model.
Formula for bigram model

P(The / <s>) = 0.67
•C(The) = 2 (because "The" appears twice as the first word
in different sentences: "The Arabian Knights" and "The
stories of the Arabian knights...")
•Total number of sentences = 3

P(Arabian / The) = 0.4
•C(The) = 5 (appears 5 times in total)
•C(The, Arabian) = 2 (word pair appears twice: "The
Arabian Knights" and "The stories of the Arabian
knights...")

P(Knights / Arabian) = 1.0
•C(Arabian) = 2 (appears twice)
•C(Arabian, Knights) = 2 (word pair "Arabian Knights"
appears twice)

Add-One Smoothing (Laplace Smoothing)
In n-gram models, we estimate the probability of a word appearing after a
sequence of previous words based on how often that sequence occurs in the
training data.
However, a major problem arises when we encounter word sequences that
never appeared in the training data. If a certain n-gram was not seen in
training, its probability is calculated as zero, which means the model cannot
generate or recognize new sequences.
Ex: The Arabian knights are strong
P(strong | are) = 0.
Solution is smoothing
Smoothing is a technique used to fix the zero-probability problem by
adjusting how probabilities are assigned. It does this by giving a small amount
of probability to unseen n-grams so that nothing has zero probability.
One of the simplest smoothing techniques is Add-One Smoothing, also called
Laplace Smoothing.

What is Add-One Smoothing?
Add-One Smoothing (Laplace Smoothing) is a simple method where
we add 1 to all n-gram counts before normalizing probabilities.
Formula (For n-gram Model):

Good-Turing Smoothing
• Good-Turing Smoothing is a statistical technique proposed by Alan Turing and I.
J. Good (1953) to handle the problem of data sparsity in n-gram language
models. The main idea is to adjust the estimated probabilities of observed and
unseen n-grams by redistributing some probability mass from frequent n-
grams to infrequent and unseen ones.
• Good-Turing modifies the count (frequency) f of an n-gram and replaces it with
an adjusted count f∗
:

Caching Technique
• Caching is an optimization technique that
improves the basic n-gram model by storing
recently seen n-grams and giving them higher
probabilities.
Why is Caching Needed?
•Language is Context-Dependent
• Certain words or phrases occur more often in specific sections of text but
not uniformly across the dataset.
•Standard N-gram Models Ignore Recent History
• A basic n-gram model treats every sentence independently and does not
consider recent occurrences of words.

Natural language processing module 1 chapter 2

More Related Content

Similar to Natural language processing module 1 chapter 2 (20)

Recently uploaded (20)

Natural language processing module 1 chapter 2