Exploring Higher Order Dependency Parsers

             Pranava Swaroop Madhyastha

   Supervised by: Prof. Michael Rosner & RNDr. Daniel Zeman


                   September 6, 2011
Introduction

     ◮   Dependency Grammar.
            ◮   Binary, asymmetric relations between a head and a
                modifier - highly lexical relationships.
      ◮   A quick example: (dependency tree figure)
     ◮   Projective Constraint
      ◮   Graph-Based Dependency Parsing
            ◮   Arc-Factored Parsing (sketched below)
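Under the arc-factored assumption, the score of a candidate tree decomposes into a sum of independent arc scores, which is what makes exact decoding tractable (Eisner's algorithm for projective trees, maximum spanning tree for the non-projective case). A minimal sketch of the decomposition, assuming a hypothetical arc_score helper (one way to score an arc is shown on the Features slide):

```python
# Sketch of arc-factored tree scoring: the tree score is the sum of
# independent arc scores. arc_score is an assumed helper, not part of
# the thesis implementation.
def tree_score(sentence, heads, arc_score):
    """heads[m] is the head index of token m (None for the artificial root)."""
    return sum(arc_score(sentence, h, m)
               for m, h in enumerate(heads) if h is not None)
```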
Problem Description



    ◮   Augmentation of Features
          ◮   Semantic features
          ◮   Morpho-syntactic features
    ◮   Higher-order parsing
          ◮   Context availability
          ◮   Horizontal and vertical context

    ◮   Motivation
          ◮   Semi-supervised dependency parsing and improvements.
                Using well-defined linguistic components.
What is Higher Order Dependency Parsing?
    ◮   First-order model - decomposition of the tree into individual
        head-modifier dependencies (arcs).
    ◮   Second-order models - additionally include either a sibling of
        the modifier (head, modifier, sibling) or a child of the
        modifier (head, modifier, grandchild).
    ◮   Third-order models - extend the context one level further, e.g.
        grand-sibling parts that also include the grandparent (a part
        enumeration is sketched after this slide).




    ◮   An illustration
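To make the part structures concrete, here is a hedged sketch (hypothetical, not the thesis implementation) that enumerates first-order arcs, second-order adjacent-sibling parts, and third-order grand-sibling parts from a head array:

```python
# Hypothetical part enumeration for first-, second- and third-order
# models; the exact part definitions in the implemented parsers may
# differ in detail.
from collections import defaultdict

def enumerate_parts(heads):
    """heads[m] is the head of token m; the artificial root has head None."""
    deps = defaultdict(list)
    for m, h in enumerate(heads):
        if h is not None:
            deps[h].append(m)

    # First-order parts: plain head-modifier arcs.
    first = [(h, m) for m, h in enumerate(heads) if h is not None]

    # Second-order parts: adjacent sibling pairs under the same head,
    # taken outward on each side of the head separately.
    siblings = []
    for h, mods in deps.items():
        left = sorted(m for m in mods if m < h)[::-1]   # innermost first
        right = sorted(m for m in mods if m > h)
        for side in (left, right):
            siblings += [(h, s, m) for s, m in zip(side, side[1:])]

    # Third-order grand-sibling parts add the grandparent of the head.
    grand = [(heads[h], h, s, m) for (h, s, m) in siblings
             if heads[h] is not None]
    return first, siblings, grand
```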
Still Why?
Features


    ◮   For a feature vector φ and a corresponding weight (parameter)
        vector w, each part p of sentence x is scored as

                              Part(x, p) = w · φ(x, p)                  (1)
    ◮   The feature vector is assembled from individual feature
        templates such as (a scoring sketch follows this list):
           ◮   dir.pos(h).pos(m)
           ◮   dir.form(h).pos(m)
           ◮   and so on ...
    ◮   The most basic feature patterns consider the surface form,
        part-of-speech, lemma and other morphosyntactic attributes
        of the head or the modifier of a dependency.
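A minimal sketch of how such templates can be instantiated and combined into the dot-product score of equation (1); the feature strings, weights and token representation here are hypothetical illustrations, not the exact setup used in these experiments:

```python
# Template-based arc features and the dot-product score of equation (1).
from collections import defaultdict

def phi(sentence, h, m):
    """Instantiate templates like dir.pos(h).pos(m) for the arc h -> m."""
    d = "R" if h < m else "L"
    head, mod = sentence[h], sentence[m]
    return [
        f"{d}.pos({head['pos']}).pos({mod['pos']})",
        f"{d}.form({head['form']}).pos({mod['pos']})",
        # ... lemma and other morphosyntactic attributes, and so on
    ]

def part_score(w, sentence, h, m):
    """Part(x, p) = w . phi(x, p) for a first-order (arc) part."""
    return sum(w[f] for f in phi(sentence, h, m))

# Usage with a toy two-token sentence and a single non-zero weight:
w = defaultdict(float, {"R.pos(VBD).pos(NN)": 1.3})
sent = [{"form": "saw", "pos": "VBD"}, {"form": "dog", "pos": "NN"}]
print(part_score(w, sent, 0, 1))  # 1.3
```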
Experiments were carried out with:

    ◮   English - Penn Treebank
          ◮   Sections 2 to 10 as the training set - 15,000 sentences.
          ◮   Random sets of sentences from Sections 15, 17, 19 and 25
              of the Penn Treebank as development data - 1,000
              sentences.
          ◮   The test set was chosen from Sections 0, 1, 21 and 23 of
              the Penn Treebank - 2,000 sentences.
    ◮   Czech - Prague Dependency Treebank
          ◮   The sentences were chosen from the pdt2-full-automorph
              dataset.
          ◮   The training set consisted of the train1 to train5 splits
              - 15,000 sentences.
          ◮   The development set consisted of the train6 and train7
              splits - 1,000 sentences.
          ◮   The test set was made up of the dtest and etest parts -
              2,000 sentences.
Experimentation

    ◮   Fine- and Coarse-Grained Wordsenses
    ◮   Approximation
    ◮   For English:
          ◮   Both fine- and coarse-grained wordsense extraction make
              use of the WordNet::SenseRelate package (a rough analogue
              is sketched after this list).
          ◮   Fine-grained wordsense restricts a word to one particular
              sense, e.g. a noun's first sense as extracted from
              WordNet.
          ◮   Coarse-grained wordsense is a more generic description:
              the WordNet semantic file to which the word belongs.
    ◮   For Czech:
          ◮   Only (approximate) fine-grained wordsense extraction.
          ◮   Extracted using the sempos attribute already annotated in
              the Prague Dependency Treebank.
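A rough analogue of the English fine vs. coarse wordsense features, using NLTK's WordNet interface with the first-sense heuristic (the thesis used the Perl package WordNet::SenseRelate, so this is only an approximation):

```python
# Approximate fine- and coarse-grained wordsense features via NLTK's
# WordNet; first-sense heuristic only, not WordNet::SenseRelate.
from nltk.corpus import wordnet as wn

def fine_sense(word, pos=wn.NOUN):
    """Fine-grained: commit to one particular sense (here the first)."""
    synsets = wn.synsets(word, pos=pos)
    return synsets[0].name() if synsets else None     # e.g. 'dog.n.01'

def coarse_sense(word, pos=wn.NOUN):
    """Coarse-grained: the WordNet lexicographer ('semantic') file."""
    synsets = wn.synsets(word, pos=pos)
    return synsets[0].lexname() if synsets else None  # e.g. 'noun.animal'
```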
Results for the Wordsense augmentation experiment

     ◮   Sibling-based parsers show a statistically significant
         improvement.
     ◮   For English with fine-grained wordsense addition, the
         third-order grand-sibling parser gives an improvement of
         +0.81 percent (unlabeled accuracy score). A closer statistical
         examination showed that sibling interactions close to each
         other have better precision.
     ◮   For English with coarse-grained wordsense addition, the
         second-order sibling parser gives an improvement of
         approximately +1.09 percent.
     ◮   For Czech with fine-grained wordsense augmentation, the
         third-order sibling parser gives an improvement of
         approximately +1.20 percent.
Results for Morphosyntactic augmentation experiment




    ◮   Morphosyntactic features were used directly, by extracting tags
        from the corpus.
    ◮   For Czech, instead of the full 15-position tagset, we tried out
        a subset (Person, Number, PossGender, Tense, Voice and Case),
        as sketched below.
    ◮   For English, we integrated the fine-grained part-of-speech
        tags.
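A hedged sketch of projecting a 15-position PDT tag down to the reduced subset; the slot indices follow the standard PDT positional tagset and should be treated as assumptions to verify against the corpus documentation:

```python
# Project a 15-position PDT tag onto the reduced feature subset.
PDT_SLOTS = {            # 0-based index into the 15-character tag
    "Number": 3, "Case": 4, "PossGender": 5,
    "Person": 7, "Tense": 8, "Voice": 11,
}

def reduced_tag(tag15):
    """Keep only the Person, Number, PossGender, Tense, Voice, Case slots."""
    assert len(tag15) == 15, "PDT positional tags have 15 characters"
    return "".join(tag15[i] for i in PDT_SLOTS.values())

# e.g. reduced_tag("VB-S---3P-AA---") -> "S--3PA"
```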
Results




     ◮    For both English and Czech, there is a significant
          improvement in parsing accuracy with the grandchild-based
          algorithms.
     ◮    For Czech, the third-order grand-sibling algorithm shows an
          improvement of +1.72 percent.
     ◮    For English, the third-order grand-sibling algorithm shows an
          improvement of +1.21 percent.
Conclusion



    ◮   Semantic features work better with sibling-based parsers
        (larger horizontal contexts).
    ◮   Morpho-syntactic features work better with grandchild-based
        parsers (larger vertical contexts).
    ◮   Such features can be instrumental in several tasks, including
        accurate labeling of semantic roles.
    ◮   Linguistic information can be better handled by higher-order
        parsing algorithms.
Future Work




    ◮   Higher-order parsers with labels (we have not yet tested
        labeled accuracy scores).
    ◮   Joint extraction of word senses and semantic roles.
    ◮   Experimentation with lexical clusters.
    ◮   Thorough experimentation with several features.
    ◮   Determining maximum and minimum order requirements.
Thanks
