Codeco: A Grammar Notation for Controlled
Natural Language in Predictive Editors
Tobias Kuhn
Department of Informatics
University of Zurich
Second Workshop on Controlled Natural Language
15 September 2010
Marettimo (Italy)
Introduction
• Problem: Existing grammar frameworks do not work out
particularly well for CNLs.
• Reason:
• CNLs differ in essential ways from other languages, both natural and formal.
• To solve the writability problem, CNLs have to be embedded in
special tools with very specific requirements.
• Error messages and suggestions
• Predictive editors
• Language generation
• Solution: A new grammar notation that is dedicated to CNLs
and predictive editors.
CNL Grammar Requirements
Concreteness: CNL grammars should be fully formalized and
interpretable by computers.
Declarativeness: CNL grammars should not depend on a concrete
algorithm or implementation.
Lookahead Features: CNL grammars should allow for the retrieval
of possible next tokens for a partial text.
Anaphoric References: CNL grammars should allow for the
definition of nonlocal structures like anaphoric
references.
Implementability: CNL grammars should be easy to implement in
different programming languages.
Expressivity: CNL grammars should be sufficiently expressive to describe existing CNLs.
Lookahead Features
Predictive editors need to know which words can follow a partial text.
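For illustration, the lookahead problem can be sketched in Prolog as a brute-force search: a token T can follow a partial text if the partial text extended by T is a prefix of some sentence of the grammar. The toy grammar and the predicate names (next_tokens/2, completion/2) below are invented for this sketch; the tools described later use a chart parser for this instead.

  % Toy grammar (not ACE Codeco):
  s    --> np, vp.
  np   --> det, noun.
  det  --> [a].
  noun --> [man].
  noun --> [woman].
  vp   --> v, np.
  v    --> [helps].

  % A token T is a possible continuation of Partial if Partial followed by T
  % can still be completed to a full sentence of the grammar.
  completion(Partial, T) :-
      append(Partial, [T|_Rest], Sentence),
      phrase(s, Sentence).

  next_tokens(Partial, Tokens) :-
      setof(T, completion(Partial, T), Tokens).

  % ?- next_tokens([a], Ts).              % Ts = [man, woman]
  % ?- next_tokens([a, man, helps], Ts).  % Ts = [a]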
Anaphoric References
• Anaphoric references in CNLs are resolvable in a deterministic
way:
A country contains an area that is not controlled by the country.
If a person X is a relative of a person Y then the person Y is a
relative of the person X.
John protects himself and Mary helps him.
• Anaphoric references that cannot be resolved should be
disallowed:
* Every area is controlled by it.
* The person X is a relative of the person Y.
• Scopes have to be considered too:
Every man protects a house from every enemy and does not
destroy ...
... himself.
... the house.
* ... the enemy.
Existing Grammar Frameworks
• Grammar Frameworks for Natural Languages
• Head-Driven Phrase Structure Grammars
• Lexical-Functional Grammars
• Tree-Adjoining Grammars
• Combinatory Categorial Grammars
• Dependency Grammars
• ... and many more
• Backus-Naur Form (BNF)
• Parser Generators
• Yacc
• GNU Bison
• ANTLR
• Definite Clause Grammars (DCG)
• Grammatical Framework (GF)
How Existing Grammar Frameworks Fulfill our
Requirements for CNL Grammars
                       NL    BNF   PG    DCG   GF
Concreteness           +     +     +     +     +
Declarativeness        +/–   +     –     (+)   +
Lookahead Features     –     +     (+)   (+)   +
Anaphoric References   (+)   –     –     (+)   –
Implementability       –     +     –     –     ?
Expressivity           +     –     +     +     +
(NL = grammar frameworks for natural languages, BNF = Backus-Naur Form,
PG = parser generators, DCG = definite clause grammars, GF = Grammatical Framework)
The Codeco Notation
Codeco = “Concrete and Declarative Grammar Notation for
Controlled Natural Languages”
• Formal and Declarative
• Easy to implement in different programming languages.
• Expressive enough for common CNLs.
• Lookahead features can be implemented in a practical and
efficient way.
• Deterministic anaphoric references can be defined in an adequate
and simple way.
Grammar Rules in Codeco
• Grammatical categories with flat feature structures
• Category Types:
• non-terminal (e.g. vp)
• pre-terminal (e.g. noun)
• terminal (e.g. [ does not ])
• Grammar Rule Examples:
  • vp(num:Num, neg:Neg) :→ v(num:Num, neg:Neg, type:tr) np(case:acc)
  • v(neg:+, type:Type) :→ [ does not ] verb(type:Type)
  • np(noun:Noun) :→ [ a ] noun(text:Noun)
  • noun(text:woman, gender:fem) → [ woman ]
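For illustration only, rules of this kind can be rendered as a plain Prolog DCG in which every feature becomes an argument position. The argument order, the nonterminal arities, and the lexical entry “know” are choices made for this sketch; a faithful translation (such as the Codeco-to-DCG transformation mentioned later) additionally has to take care of references and scopes.

  % vp(num:Num, neg:Neg) :→ v(num:Num, neg:Neg, type:tr) np(case:acc)
  vp(Num, Neg)        --> v(Num, Neg, tr), np(acc, _Noun).

  % v(neg:+, type:Type) :→ [ does not ] verb(type:Type)
  v(_Num, plus, Type) --> [does, not], verb(Type).

  % np(noun:Noun) :→ [ a ] noun(text:Noun)
  np(_Case, Noun)     --> [a], noun(Noun, _Gender).

  % noun(text:woman, gender:fem) → [ woman ]
  noun(woman, fem)    --> [woman].

  verb(tr)            --> [know].     % invented lexical entry

  % ?- phrase(vp(sg, plus), [does, not, know, a, woman]).   % succeeds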
Forward and Backward References
The special categories “>” and “<” can be used to establish
nonlocal dependencies, e.g. for anaphoric references:
np :→ det(def:–) noun(text:Noun) >(type:noun, noun:Noun)
ref :→ det(def:+) noun(text:Noun) <(type:noun, noun:Noun)
(Parse tree of “A country contains an area and does not control the area”: the indefinite noun phrases “A country” and “an area” introduce forward references (>), and the definite noun phrase “the area” is resolved by a backward reference (<).)
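The effect of “>” and “<” can be imitated in an ordinary DCG by threading a list of antecedents through the parse, as in the following sketch (grammar and lexicon invented for illustration; in Codeco this bookkeeping is handled by the notation instead of being spelled out in every rule):

  s              --> np([], A), vps(A).
  vps(A)         --> vp(A, A1), [and], vps(A1).
  vps(A)         --> vp(A, _).
  vp(A0, A)      --> [contains], np(A0, A).
  vp(A, A)       --> [does, not, control], ref(A).
  np(A0, [N|A0]) --> det, noun(N).                         % '>' : register antecedent N
  ref(A)         --> [the], noun(N), { memberchk(N, A) }.  % '<' : resolve against an antecedent
  det            --> [a].
  det            --> [an].
  noun(country)  --> [country].
  noun(area)     --> [area].

  % ?- phrase(s, [a,country,contains,an,area,and,does,not,control,the,area]).  % succeeds
  % ?- phrase(s, [a,country,does,not,control,the,area]).   % fails: no antecedent for 'the area'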
Scopes
• Opening of scopes:
• Scopes in (controlled) English usually open at the position of the
scope triggering structure, or nearby.
• Scope opener category in Codeco: a dedicated category attached to the rule for the scope-triggering word, e.g. for “every”:
  quant(exist:–) :→ [ every ]
• Closing of scopes:
• Scopes in (controlled) English usually close at the end of certain
structures like verb phrases, relative clauses, and sentences.
• Scope-closing rules “~→” in Codeco:
  vp(num:Num) ~→ v(num:Num, type:tr) np(case:acc)
Position Operators
• How to define reflexive pronouns like “herself” that can only attach to the subject?
• With the position operator “#”, position identifiers can be assigned:
  np(id:Id) :→ #Id prop(gender:G) >(id:Id, gender:G, type:prop)
  ref(subj:Subj) :→ [ herself ] <(id:Subj, gender:fem)
(Parse tree of “Mary helps herself”, with position identifiers p0–p3 assigned by “#”; the reflexive pronoun is resolved against the subject “Mary”.)
Negative Backward References
• How to define that the same variable can be introduced only
once?
*A person X knows a person X.
• Negative backward references “≮” succeed only if no matching antecedent is accessible:
  newvar :→ var(text:V) ≮(type:var, var:V) >(type:var, var:V)
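Continuing the antecedent-threading sketch from above, a negative backward reference corresponds to a negated membership test before the new variable is registered (the variable tokens x, y, z are an assumption of this sketch):

  np(A0, [var(V)|A0]) --> [a, person, V],
      { member(V, [x, y, z]),          % the third token is a variable
        \+ memberchk(var(V), A0) },    % negative backward reference: no matching antecedent
      !.
  s --> np([], A), [knows], np(A, _).

  % ?- phrase(s, [a,person,x,knows,a,person,y]).   % succeeds
  % ?- phrase(s, [a,person,x,knows,a,person,x]).   % fails: X would be introduced twice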
Complex Backward References
• How to define pronouns like “him” that cannot attach to the
subject?
*John helps him.
• Complex backward references “<+ ... − ...” have one or more positive feature structures (“+”) and zero or more negative ones (“−”).
• They succeed if there is an antecedent that matches one of the positive feature structures but none of the negative ones:
  ref(subj:Subj, case:acc) :→ [ him ] <+(human:+, gender:masc) −(id:Subj)
• A more complicated (but probably less useful) example:
  ref(subj:Subj) :→ [ this ] <+(hasvar:–, human:–) +(type:relation) −(id:Subj) −(type:prop)
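The matching condition can be sketched as follows, with antecedents represented as ante(Id, Features) terms listed most recent first and ids as plain integers (both representation choices of this sketch, not of Codeco):

  % 'him' needs an antecedent that carries the positive features
  % but does not carry the subject's id.
  him_antecedent(SubjId, Antecedents, Id) :-
      member(ante(Id, Feats), Antecedents),
      memberchk(human(plus), Feats),       % matches the positive feature structure
      memberchk(gender(masc), Feats),
      Id \== SubjId,                       % matches none of the negative ones
      !.                                   % take the closest match

  % ?- him_antecedent(1, [ante(1, [human(plus), gender(masc)])], Id).
  % false.                                 % only the subject matches: '*John helps him.'
  % ?- him_antecedent(1, [ante(2, [human(plus), gender(masc)]),
  %                       ante(1, [human(plus), gender(masc)])], Id).
  % Id = 2.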
Strong Forward References
• How to define that proper names like “Bill” are always accessible?
*Mary does not love a man. Mary hates him.
Mary does not love Bill. Mary hates him.
• Strong forward references “≫” are always accessible:
  np(id:Id) :→ prop(human:H, gender:G) ≫(id:Id, human:H, gender:G, type:prop)
Reference Resolution: Accessibility
Forward references are only accessible if they are not within a scope
that has already been closed before the position of the backward
reference:
(Parse tree of “Every man protects a house from every enemy and does not destroy ...”: the scope opened at “every enemy” is closed at the end of the first verb phrase, so its forward reference is no longer accessible, while “man” and “house” remain accessible to the backward reference in the second verb phrase.)
Reference Resolution: Accessibility
Strong forward references are always accessible:
(Parse tree of “Mary knows every friend of Bill and likes ...”: the proper name “Bill” introduces a strong forward reference inside a scope that is closed after the first verb phrase, yet it remains accessible to the backward reference in the second verb phrase.)
Reference Resolution: Proximity
If a backward reference matches more than one forward reference
then the closest one is taken:
(Parse tree of “If a part of a machine causes an error then it ...”: “a part”, “a machine”, and “an error” all introduce forward references; the pronoun “it” is resolved to the closest one, “an error”.)
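The resolution principles of the last three slides, i.e. scope-based accessibility, always-accessible strong forward references, and proximity, can be sketched as a small Prolog program over an abstract list of events; the event names and the data representation are invented here and are not part of Codeco:

  :- use_module(library(lists)).       % append/2, subset/2
  :- use_module(library(apply)).       % include/3

  % Events: open / close (scopes), fwd(Id, Feats), strong(Id, Feats), back(Feats).
  % Antecedents are kept as a stack of scopes, most recent entries first.
  resolve(Events, Resolutions) :-
      resolve(Events, [[]], Resolutions).

  resolve([], _, []).
  resolve([open|Es], Scopes, Rs) :-
      resolve(Es, [[]|Scopes], Rs).
  resolve([close|Es], [Inner, Outer|Scopes], Rs) :-
      include(is_strong, Inner, Kept),      % accessibility: ordinary antecedents of a
      append(Kept, Outer, Outer1),          % closed scope are discarded, strong ones kept
      resolve(Es, [Outer1|Scopes], Rs).
  resolve([fwd(Id, F)|Es], [S|Scopes], Rs) :-
      resolve(Es, [[ante(Id, F)|S]|Scopes], Rs).
  resolve([strong(Id, F)|Es], [S|Scopes], Rs) :-
      resolve(Es, [[strong(Id, F)|S]|Scopes], Rs).
  resolve([back(F)|Es], Scopes, [Id|Rs]) :-
      append(Scopes, Accessible),           % flattened, most recent antecedent first
      first_match(F, Accessible, Id),       % proximity: the closest match wins
      resolve(Es, Scopes, Rs).

  is_strong(strong(_, _)).

  first_match(F, [A|_], Id)  :- matches(F, A, Id), !.
  first_match(F, [_|As], Id) :- first_match(F, As, Id).

  matches(F, ante(Id, Feats), Id)   :- subset(F, Feats).
  matches(F, strong(Id, Feats), Id) :- subset(F, Feats).

  % Simplified events for 'Every man protects a house from every enemy and does not destroy ...':
  % ?- resolve([fwd(1,[n(man)]), fwd(2,[n(house)]),
  %             open, fwd(3,[n(enemy)]), close, back([n(house)])], R).
  % R = [2].                      % 'the house' is resolvable
  % ?- resolve([fwd(1,[n(man)]), fwd(2,[n(house)]),
  %             open, fwd(3,[n(enemy)]), close, back([n(enemy)])], R).
  % false.                        % 'the enemy' sits in a scope that has been closed
  % ?- resolve([open, strong(1,[n(bill)]), close, back([n(bill)])], R).
  % R = [1].                      % strong forward references (proper names) survive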
Possible Extensions
• Semantics (e.g. with λ-DRSs)
• General feature structures (instead of flat ones)
• ...
Parsers for Codeco
Two parsers with different parsing approaches exist:
• Transformation into Prolog DCG
• fast (1.5 ms per sentence)
• no lookahead features
• ideal for regression tests and parsing of large texts in batch mode
• Execution in a chart parser (Earley parser) implemented in Java
• slower, but still reasonably fast (130 ms per sentence)
• lookahead features
• ideal for predictive editors in Java
ACE in Codeco
• Large subset of ACE in Codeco
• Includes: countable nouns, proper names, intransitive and
transitive verbs, adjectives, adverbs, prepositions, plurals,
negation, comparative and superlative adjectives and adverbs,
of-phrases, relative clauses, modality, numerical quantifiers,
coordination of sentences / verb phrases / relative clauses,
conditional sentences, questions, and anaphoric references (simple
definite noun phrases, variables, and reflexive and irreflexive
pronouns)
• Excludes: Mass nouns, measurement nouns, ditransitive verbs,
numbers and strings as noun phrases, sentences as verb phrase
complements, Saxon genitive, possessive pronouns, noun phrase
coordination, and commands
• 164 grammar rules
• Used in the ACE Editor:
http://attempto.ifi.uzh.ch/webapps/aceeditor/
Evaluation of ACE Codeco
Exhaustive Language Generation:
• Evaluation subgrammar with 97 grammar rules
• Minimal lexicon
• 2’250’869 sentences with 3–10 tokens:
sentence length   number of sentences   growth factor
       3                        6
       4                       87           14.50
       5                      385            4.43
       6                    1’959            5.09
       7                   11’803            6.03
       8                   64’691            5.48
       9                  342’863            5.30
      10                1’829’075            5.33
    3–10                2’250’869
• All are accepted by the ACE parser
→ ACE Codeco is a subset of ACE
• None is generated more than once
→ ACE Codeco is unambiguous
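The counting and uniqueness check behind these two conclusions can be sketched as follows, shown on a toy grammar (the actual evaluation runs over the 97-rule ACE Codeco subgrammar and its minimal lexicon); equal counts of generated and distinct sentences mean that no sentence is generated twice:

  % Toy grammar standing in for the evaluation subgrammar:
  s    --> np, vp.
  np   --> [a], noun.
  noun --> [man].
  noun --> [woman].
  vp   --> [waits].
  vp   --> [sees], np.

  % All sentences with exactly N tokens:
  sentences_of_length(N, Sentences) :-
      length(Template, N),
      findall(Template, phrase(s, Template), Sentences).

  report(N) :-
      sentences_of_length(N, Ss),
      length(Ss, Count),
      sort(Ss, Distinct),                  % sort/2 removes duplicates
      length(Distinct, DistinctCount),
      format("~w tokens: ~w sentences, ~w distinct~n", [N, Count, DistinctCount]).

  % ?- report(3).    % 3 tokens: 2 sentences, 2 distinct
  % ?- report(5).    % 5 tokens: 4 sentences, 4 distinct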
Evaluation of the Codeco notation and its
implementations
Prolog DCG representation versus Java Earley parser:
• Equivalence of the Implementations:
Generate the same set of sentences up to 8 tokens
→ The two implementations process Codeco in the same way
• Performance Tests:
task         grammar                    implementation       seconds/sentence
generation   ACE Codeco eval. subset    Prolog DCG           0.00286
generation   ACE Codeco eval. subset    Java Earley parser   0.0730
parsing      ACE Codeco eval. subset    Prolog DCG           0.000360
parsing      ACE Codeco eval. subset    Java Earley parser   0.0276
parsing      full ACE Codeco            Prolog DCG           0.00146
parsing      full ACE Codeco            Java Earley parser   0.134
parsing      full ACE                   APE                  0.0161
Conclusions
Codeco ...
• ... fulfills our requirements for CNLs in predictive editors.
• ... is suitable to describe a large subset of ACE.
• ... allows for automatic tests.
• ... stands for a principled and engineering-focused approach to CNL.
Thank you for your attention!
Questions & Discussion