NLP & Semantic Computing Group
Different Semantic Perspectives for
Hybrid Question Answering Systems
Andre Freitas
University of Passau
OKBQA, Jeju, 2016
These slides: http://www.slideshare.net/andrenfreitas
Outline
 Multiple Perspectives of Semantic Representation
 Lightweight Semantic Representation
 Knowledge Graph Extraction from Text
 Answering Queries with
Knowledge Graphs
 Reasoning
 Take-away Message
Multiple Perspectives of
Semantic Representation
QA & Semantics
• Question Answering is about managing semantic
representation, extraction, selection
trade-offs.
• And it is about integrating multiple components
in a complex approach.
• Semantic best-effort: systems tolerant to
noisy, inconsistent, vague data.
Semantics for a Complex World
 “Most semantic models have dealt with particular types of
constructions, and have been carried out under very simplifying
assumptions, in true lab conditions.”
 “If these idealizations are removed it is not clear at all that modern
semantics can give a full account of all but the simplest
models/statements.”
Baroni et al. 2013
[Figure labels: Formal World, Real World]
Why Not RDF?
• Follows a more “database-type” of
representation perspective.
• Gap towards representing text.
Choices of Semantic Representation
• Logical
• Frames: verbs | nouns
• Binary relations: binary | n-ary
• Named entities
• Language Models
• Syntactic structures
• Bag-of-words
[Figure labels for the list above: concept-level representation, background knowledge, extraction complexity]
Information Extraction
Representation → Extraction task
• Logical → Semantic parsing
• Frames (verbs | nouns) → Semantic role labeling
• Binary relations (binary | n-ary) → Relation extraction (closed/open)
• Named entities → Named entity recognition
• Syntactic Structures & LMs → Syntactic/N-gram Parsing
• Bag-of-words → Indexing
Use all of them!
Representation focal points
• Types of knowledge to focus on in the
representation:
 Facts vs Definitions vs Opinions
 Temporality
 Spatiality
 Modality
 Polarity
 Rhetorical structures
 …
Lightweight Semantic
Representation
Objective
• Provide a lightweight knowledge
representation model which:
 Can represent textual discourse information.
• Maximizes the capture of textual
information.
 Is convenient to extract from text.
 Is convenient to access (query and browse).
Lightweight Semantic Representation
Representing Texts as Contextualized Entity-Centric
Linked Data Graphs, WebS 2013
Representation Assumptions
• Data integration:
 Named entities (instances)
 Abstract classes (unary predicates)
• Rich taxonomical structures.
• Context representation as a first class citizen.
• Open vocabulary.
• Word instead of sense/concept.
Representation of Complex Relations
General Electric Company, or GE, is an American multinational conglomerate
corporation incorporated in Schenectady, New York
Data Integration points
General Electric Company, or GE, is an American multinational conglomerate
corporation incorporated in Schenectady, New York
Pivot points:
• Named entities are lower entropy integration points.
• Named entities are also low entropy entry points for answering queries.
• Also abstract classes …
• They are also a very convenient way to represent.
Taxonomy Extraction
 Are predicates with more complex compositional patterns which
describe sets.
 Parsing complex nominals.
American multinational conglomerate corporation
  is a multinational conglomerate corporation
  is a conglomerate corporation
  is a corporation
(Pivot points)
On the Semantic Representation and Extraction of Complex
Category Descriptors, NLDB 2014
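The "is a" chain above can be approximated with a very small sketch. It assumes the complex nominal is head-final and already tokenized by whitespace, and it simply strips the leftmost modifier at each step; the function name `head_modifier_chain` is hypothetical, and the NLDB 2014 paper relies on proper parsing rather than this naive rule.

```python
from __future__ import annotations


def head_modifier_chain(nominal: str) -> list[tuple[str, str]]:
    """Return (subclass, superclass) pairs by dropping the leftmost modifier.

    Assumption: the head noun is the last token of the nominal.
    """
    tokens = nominal.split()
    pairs = []
    for i in range(len(tokens) - 1):
        subclass = " ".join(tokens[i:])
        superclass = " ".join(tokens[i + 1:])
        pairs.append((subclass, superclass))
    return pairs


chain = head_modifier_chain("American multinational conglomerate corporation")
for subclass, superclass in chain:
    print(f"{subclass} --is a--> {superclass}")
```

Each generated superclass becomes a candidate pivot point shared by every entity described with the same head and modifiers.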
Context Representation
General Electric Company, or GE, is an American multinational conglomerate
corporation incorporated in Schenectady, New York
• Reification as a first class representation element.
• Temporality, spatiality, modality, rhetorical relations …
Rhetorical Structures using Reification
• cause:
 e.g. “because scraping the bottom with a metal utensil will scratch
the surface.”
• circumstance
 e.g. “After completing your operating system reinstallation,”
• concession
 e.g. “Although the hotel is situated adjacent to a beach,”
• condition
 e.g. “If you can break the $ 1000 dollar investment range,”
• contrast
 e.g. “but you can do better with 2.4ghz or 900mhz phones.”
• purpose
 e.g. “in order for the rear passengers to get in the vehicle.”
• …
Open Vocabulary
• Easier to extract but difficult to consume.
• We pay the price at query time.
• How to operate over large-scale semantically
heterogeneous knowledge graphs?
Words instead of Senses
• Motivation: Disambiguation is a tough problem.
• Sense granularity can be, in many situations,
arbitrary (too context dependent).
• We treat a word as a superposition of senses,
almost in a “quantum mechanical sense”.
Sense Superposition
Coecke et al. (2010): Category theory and
Lambek calculus.
Revisited RDF (for Representing Texts)
• Data Model Types: Instance, Class, Property…
• RDFS: Taxonomic representation.
• Reification for contextual relations (subordinations).
• Blank nodes for n-ary relations.
• Triple.
• Labels over URIs.
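A minimal sketch of how reification can attach context to a relation, using plain Python objects instead of an RDF library. The `Triple` and `Graph` classes and the `time` context predicate are hypothetical names for illustration; only `rdf:subject`, `rdf:predicate` and `rdf:object` come from the RDF reification vocabulary. The example sentence is the Obama sentence used later in the deck, and labels are used in place of URIs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


@dataclass
class Graph:
    triples: list = field(default_factory=list)
    _counter: int = 0

    def add(self, subject, predicate, obj):
        triple = Triple(subject, predicate, obj)
        self.triples.append(triple)
        return triple

    def reify(self, triple):
        """Create a statement node for a triple so context can point at it."""
        self._counter += 1
        node = f"_:stmt{self._counter}"
        self.add(node, "rdf:subject", triple.subject)
        self.add(node, "rdf:predicate", triple.predicate)
        self.add(node, "rdf:object", triple.obj)
        return node


g = Graph()
fact = g.add("Obama", "was reelected", "president")
stmt = g.reify(fact)
# Contextual (here temporal) information is attached to the statement node.
g.add(stmt, "time", "November 2012")
```

The same statement node could carry spatial, modal or rhetorical context without changing the core triple.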
Alternative Representations
• Abstract Meaning Representations (AMR): maximal use of PropBank frame files.
Distributional
Semantics
Distributional Semantic Models
 Semantic Model with low acquisition effort
(automatically built from text)
Simplification of the representation
 Enables the construction of comprehensive
commonsense/semantic KBs
 What is the cost?
Some level of noise
(semantic best-effort)
Limited semantic model
Distributional Semantics as Commonsense Knowledge
[Figure: vector space with the words car, dog, cat, bark, run, leash and an angle θ between vectors. Annotations: “Commonsense is here”, “Semantic Approximation is here”.]
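The angle θ in the figure is usually turned into a relatedness score through the cosine. The sketch below assumes toy co-occurrence counts of dog, cat and car with the context words bark, run and leash; the numbers and the `sem_rel` helper are hypothetical and only illustrate the geometry.

```python
import math

# Hypothetical co-occurrence counts with the context words (bark, run, leash).
VECTORS = {
    "dog": [9.0, 5.0, 7.0],
    "cat": [1.0, 3.0, 5.0],
    "car": [0.0, 6.0, 0.0],
}


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sem_rel(word_a, word_b):
    """Semantic relatedness as the cosine between the two word vectors."""
    return cosine(VECTORS[word_a], VECTORS[word_b])


print(sem_rel("dog", "cat"), sem_rel("dog", "car"))
```

With these toy counts "dog" ends up closer to "cat" than to "car", which is the kind of approximate commonsense signal the slide refers to.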
Beyond Single Word Vector Models:
Compositionality
I find it rather odd that people are already trying to tie the
Commission's hands in relation to the proposal for a
directive, while at the same calling on it to present a Green
Paper on the current situation with regard to optional and
supplementary health insurance schemes.
=?
I find it a little strange to now obliging the Commission to a
motion for a resolution and to ask him at the same time to
draw up a Green Paper on the current state of voluntary
insurance and supplementary sickness insurance.
Compositional Semantics
 Can we extend DS to account for the meaning of
phrases and sentences?
 Compositionality: The meaning of a complex
expression is a function of the meaning of its
constituent parts.
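As a simple baseline for such a composition function, phrase vectors are often built by element-wise addition or multiplication of the word vectors. The sketch below shows both; the vectors are hypothetical, and these baselines are not the compositional model presented in the deck.

```python
def compose_add(u, v):
    """Additive composition: the phrase vector is the sum of its parts."""
    return [a + b for a, b in zip(u, v)]


def compose_mult(u, v):
    """Multiplicative composition: keeps only dimensions both words share."""
    return [a * b for a, b in zip(u, v)]


# Hypothetical three-dimensional vectors for two words of a phrase.
adjective = [1.0, 0.0, 2.0]
noun = [3.0, 1.0, 4.0]

print(compose_add(adjective, noun))
print(compose_mult(adjective, noun))
```

Both operations are symmetric, so they ignore word order; that limitation is one motivation for the function-based view of verbs and adjectives on the next slide.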
Compositional Semantics
Words in which the
meaning is directly
determined by their
distributional behaviour
(e.g., nouns).
Words that act as functions
transforming the
distributional profile of
other words (e.g., verbs,
adjectives, …).
Compositional-Distributional Semantics
Recursive Neural Networks for
Structure Prediction
New Model: Recursive Neural Tensor Network
• Goal: Function that composes two vectors.
• More expressive than any other RNN so far.
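For reference, the composition function of the Recursive Neural Tensor Network can be written as below. The notation follows the paper by Socher et al. (2013) rather than the slide, which only shows a figure: a and b are the d-dimensional child vectors, V^{[1:d]} is a tensor in R^{2d × 2d × d}, W is the weight matrix in R^{d × 2d} of the standard recursive network, and f is an element-wise non-linearity (tanh in the paper).

```latex
p = f\!\left(
  \begin{bmatrix} a \\ b \end{bmatrix}^{\top}
  V^{[1:d]}
  \begin{bmatrix} a \\ b \end{bmatrix}
  + W \begin{bmatrix} a \\ b \end{bmatrix}
\right)
```

Setting V to zero recovers the standard recursive neural network, so the tensor term is what adds the extra expressiveness.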
Socher et al.
Compositional-distributional model for
Categories
Embedding
Knowledge Graphs
The vector space is segmented.
Dimensional reduction mechanism!
A Distributional Structured Semantic Space for
Querying RDF Graph Data, IJSC 2012
Compositional-distributional model for
paraphrases
A Compositional-Distributional Semantic Model for
Searching Complex Entity Categories, *SEM (2016)
Knowledge Graph Extraction from Text
Graphene
Graph Extraction Pipeline
[Pipeline diagram. Stages: Text Transformation, Text Simplification, N-ary Relation Extraction, Taxonomy Extraction, RST Classification, Graph Serialization, Storage. Stages are labelled as rule-based or ML-based.]
Minimalistic Text Transformations
• Co-reference Resolution
 Pronominal co-references.
• Passive
 We have been approached by the investment banker.
 The investment banker approached us.
• Genitive modifier
 Malaysia's crude palm oil output is estimated to have
risen.
 The crude palm oil output of Malaysia is estimated to
have risen.
Text Simplification
Text Simplification for KG
Extraction
“Defeating Republican nominee Mitt Romney,
Obama, who was the first African American to hold
the office, was reelected president in November
2012.”
 relations are spread across clauses
 relations are presented in non-canonical form
• Insertion of a text simplification step
 Obama was reelected president in November 2012.
 Obama was the first African American to hold the office.
 Obama was defeating Mitt Romney.
 Mitt Romney was Republican nominee.
Syntax-driven sentence
simplification approach
Task:
• Reduce the linguistic complexity of a text while retaining the
original information/meaning using a set of syntax-based
rewrite operations (deletion, insertion, reordering, sentence
splitting).
Idea:
• Simplify a sentence by separating out components that supply
only secondary information into simpler stand-alone context
sentences, thus yielding one or more reduced core sentences.
Approach
• Linguistic analysis of sentences from the English Wikipedia
to identify constructs which provide only secondary information:
• non-restrictive relative clauses
• non-restrictive and restrictive appositive phrases
• participial phrases offset by commas
• adjective and adverb phrases delimited by punctuation
• particular prepositional phrases
• lead noun phrases
• intra-sentential attributions
• parentheticals
• conjoined clauses with specific features
• particular punctuation
• Rule-based simplification rules.
Improving Relation Extraction by Syntax-based
Sentence Simplification (2016)
N-ary Relation Extraction
OpenIE, University of Washington
Taxonomy Extraction
Representation and Extraction of Complex
Category Descriptors, NLDB 2014
RST Classification
Rhetorical Structure Extraction
Text-level RST-style discourse parser (Feng and Hirst, 2012)
[Figure: two stages, structure classification and relation classification]
Answering Queries with
Knowledge Graphs
Now our graph supports semantic
approximations as a first-class operation
Approach Overview
[Architecture diagram. Labels: Query, Query Analysis, Query Features, Query Planner, Query Plan; Ƭ-Space (embedding graphs); RDF; Wikipedia; Commonsense knowledge; Explicit Semantic Analysis; core semantic approximation & composition operations.]
Core Principles
 Minimize the impact of Ambiguity, Vagueness,
Synonymy.
 Address the simplest matchings first (semantic pivoting).
 Semantic Relatedness as a primitive operation.
 Distributional semantics models as commonsense
knowledge representation.
 Lightweight syntactic constraints.
Query Pre-Processing
(Question Analysis)
• Step 2: Query NER
 Rule-based: POS Tag + IDF
Who is the daughter of Bill Clinton married to?
Bill Clinton (PROBABLY AN INSTANCE)
• Step 3: Determine answer type
 Rule-based.
Who is the daughter of Bill Clinton married to?
(PERSON)
• Transform natural language queries into a
pseudo-logical form.
“Who is the daughter of Bill Clinton married to?”
Query Pre-Processing
(Question Analysis)
• Step 5: Determine the query pattern
 Rule-based.
• Remove stop words.
• Merge words into entities.
• Reorder structure from core entity position.
Query Features: Bill Clinton (INSTANCE), daughter (PREDICATE), married to (PREDICATE), Person (ANSWER TYPE)
[Slide annotation: QUESTION FOCUS]
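The three rules of Step 5 can be sketched as follows. The stop-word list, the hand-supplied core entity and the list of multi-word phrases are assumptions made for this example; the deck derives entities from POS tags and IDF rather than from a fixed list, and `to_query_pattern` is a hypothetical helper name.

```python
STOP_WORDS = {"who", "what", "is", "the", "of"}  # assumed, minimal stop list


def to_query_pattern(question, core_entity, phrases=()):
    """Remove stop words, merge multi-word units, and put the core entity first."""
    text = question.rstrip("?")
    # Merge words into entities/phrases by joining them with underscores.
    for phrase in (core_entity, *phrases):
        text = text.replace(phrase, phrase.replace(" ", "_"))
    # Remove stop words.
    tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]
    # Reorder the structure so that it starts from the core entity.
    core = core_entity.replace(" ", "_")
    tokens.remove(core)
    ordered = [core] + tokens
    return [t.replace("_", " ") for t in ordered]


pattern = to_query_pattern(
    "Who is the daughter of Bill Clinton married to?",
    core_entity="Bill Clinton",
    phrases=["married to"],
)
print(pattern)
```

Starting the pattern at the named entity matches the idea of using the lowest-entropy element as the entry point into the graph.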
Query Planning
• Map query features into a query plan.
• A query plan contains a sequence of:
 Search operations.
 Navigation operations.
Query Features: (INSTANCE) (PREDICATE) (PREDICATE)
Query Plan:
 (1) INSTANCE SEARCH (Bill Clinton)
 (2) DISAMBIGUATE ENTITY TYPE
 (3) GENERATE ENTITY FACETS
 (4) p1 <- SEARCH RELATED PREDICATE (Bill Clinton, daughter)
 (5) e1 <- GET ASSOCIATED ENTITIES (Bill Clinton, p1)
 (6) p2 <- SEARCH RELATED PREDICATE (e1, married to)
 (7) e2 <- GET ASSOCIATED ENTITIES (e1, p2)
 (8) POST PROCESS (Bill Clinton, e1, p1, e2, p2)
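Steps (4) to (7) of the plan alternate a search operation with a navigation operation. The sketch below runs them over a toy knowledge base built from the entities shown in the following slides. The relatedness table stands in for a distributional model: the three "daughter" scores are the ones shown in the deck, the "married to" score is a made-up placeholder, and all function names are hypothetical.

```python
# Toy KB: entity -> predicate -> list of associated entities.
KB = {
    "Bill_Clinton": {
        "child": ["Chelsea_Clinton"],
        "religion": ["Baptists"],
        "almaMater": ["Yale_Law_School"],
    },
    "Chelsea_Clinton": {"spouse": ["Mark_Mezvinsky"]},
}

# Stand-in for a distributional relatedness measure.
SEM_REL = {
    ("daughter", "child"): 0.054,
    ("daughter", "religion"): 0.004,
    ("daughter", "almaMater"): 0.001,
    ("married to", "spouse"): 0.05,  # hypothetical value
}


def search_related_predicate(entity, term):
    """Search operation: pick the predicate of `entity` most related to `term`."""
    return max(KB[entity], key=lambda p: SEM_REL.get((term, p), 0.0))


def get_associated_entities(entity, predicate):
    """Navigation operation: follow `predicate` from `entity`."""
    return KB[entity][predicate]


def run_plan(pivot, terms):
    frontier, path = [pivot], []
    for term in terms:
        next_frontier = []
        for entity in frontier:
            predicate = search_related_predicate(entity, term)
            path.append((entity, predicate))
            next_frontier.extend(get_associated_entities(entity, predicate))
        frontier = next_frontier
    return frontier, path


answers, path = run_plan("Bill_Clinton", ["daughter", "married to"])
print(answers, path)
```

Each query term is only compared against the predicates of the current pivot, which keeps the matching problem small at every step.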
Core Entity Search
Query: Bill Clinton daughter married to Person
KB: :Bill_Clinton
(Entity search)
Distributional Semantic Search
Query: Bill Clinton daughter married to Person
KB: :Bill_Clinton (PIVOT ENTITY)
(ASSOCIATED TRIPLES)
  :child :Chelsea_Clinton
  :religion :Baptists
  :almaMater :Yale_Law_School
  ...
Distributional Semantic Search
Which properties are semantically related to ‘daughter’?
sem_rel(daughter, child) = 0.054
sem_rel(daughter, religion) = 0.004
sem_rel(daughter, alma mater) = 0.001
Distributional Semantic Search
Query: Bill Clinton daughter married to Person
KB: :Bill_Clinton :child :Chelsea_Clinton (PIVOT ENTITY)
Distributional Semantic Search
Query: Bill Clinton daughter married to Person
KB: :Bill_Clinton :child :Chelsea_Clinton :spouse :Mark_Mezvinsky
Note the lazy disambiguation
Second Query Example
What is the highest mountain?
Query Features: mountain (CLASS), highest (OPERATOR)
PODS: mountain - highest
Entity Search
Query: Mountain highest
KB: :Mountain (PIVOT ENTITY), :typeOf
Extensional Expansion
KB: :Everest :typeOf :Mountain
    :K2 :typeOf :Mountain
    ...
Distributional Semantic Matching
KB: properties of the instances: :elevation, :location, :deathPlaceOf, ...
Get all numerical values
KB: :Everest :elevation 8848 m
    :K2 :elevation 8611 m
Apply operator functional definition
SORT
TOP_MOST
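The last three steps (extensional expansion, collecting the numerical values, applying the operator) can be sketched over the two instances shown on the slides. The triple layout and the function names are assumptions for illustration; the property `:elevation` is taken as already selected by the distributional matching step.

```python
TRIPLES = [
    (":Everest", ":typeOf", ":Mountain"),
    (":K2", ":typeOf", ":Mountain"),
    (":Everest", ":elevation", 8848),  # metres
    (":K2", ":elevation", 8611),  # metres
]


def extensional_expansion(cls):
    """Return the instances of a class (its extension)."""
    return [s for s, p, o in TRIPLES if p == ":typeOf" and o == cls]


def numerical_values(instances, prop):
    """Collect the numerical values of `prop` for the given instances."""
    return {
        s: o
        for s, p, o in TRIPLES
        if s in instances and p == prop and isinstance(o, (int, float))
    }


def top_most(values):
    """Functional definition of 'highest': SORT descending, then take the top."""
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return ranked[0][0]


instances = extensional_expansion(":Mountain")
elevations = numerical_values(instances, ":elevation")
answer = top_most(elevations)
print(answer)
```

The operator "highest" is thus not matched against the graph at all; it is mapped to a function applied to the values found through the matched property.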
Results
StarGraph
• Open source NoSQL platform for building and
interacting with large and sparse knowledge
graphs.
• Semantic approximation as a built-in operation.
• Scalable query execution performance.
Heuristics for the selection of the
semantic pivot are critical!
• Discussed here just superficially:
 Information-theoretical justification.
How hard is the Query? Measuring the Semantic Complexity of Schema-
Agnostic Queries, IWCS (2015).
Schema-agnostic queries over large-schema databases: a distributional
semantics approach, PhD Thesis (2015).
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary
Study, NLIWoD (2015).
Indra
 Multilingual platform for experimentation with
different word vector models.
"Indra's net" is the net of the Vedic god Indra, whose net hangs over
his palace on Mount Meru, the axis mundi of Hindu cosmology and
Hindu mythology. Indra's net has a multifaceted jewel at each vertex,
and each jewel is reflected in all of the other jewels.
In the Avatamsaka Sutra, the image of "Indra's net" is used to describe
the interconnectedness of the universe.
Bridging Structured & Unstructured Data
• NER + Text + Passage Retrieval Ranking
 Simple and powerful QA basis.
• Lazy disambiguation.
Treo Answers Jeopardy Queries (Video)
Ranking Candidate Answers
• But what if there are multiple candidate answers!
Q: Who was Queen Victoria’s second son?
• Answer Type: Person
• Passage:
The Marie biscuit is named after Marie Alexandrovna, the daughter of
Czar Alexander II of Russia and wife of Alfred, the second son of Queen
Victoria and Prince Albert
Dan Jurafsky’s slides
Feature Engineering
Q: Who was Queen Victoria’s second son?
Passage:
The Marie biscuit is named after Marie Alexandrovna, the daughter of
Czar Alexander II of Russia and wife of Alfred, the second son of Queen
Victoria and Prince Albert
Annotations on the candidate answer:
• type = PERSON: matches AnswerType
• followed by a ‘,’
• followed by an apposition
• contains an entity in the query
• has a four-word overlap
Propositionalisation
Passage:
The Marie biscuit is named after Marie Alexandrovna, the daughter of
Czar Alexander II of Russia and wife of Alfred, the second son of Queen
Victoria and Prince Albert
entity       | followedBy(,) | followedByAppositionContainingQueryEntities() | answer | …
e0: Alfred   | true          | true                                          | true   | …
…            | …             | …                                             | …      | …
en           | …             | …                                             | …      | …
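A sketch of how each candidate entity can be turned into a row of boolean features. The candidate list with entity types is supplied by hand here (in practice it would come from a named entity recogniser), the apposition is approximated as the text between the comma after the entity and the next comma, and ranking by the number of true features is a placeholder for a trained model. All names are hypothetical.

```python
PASSAGE = (
    "The Marie biscuit is named after Marie Alexandrovna, the daughter of "
    "Czar Alexander II of Russia and wife of Alfred, the second son of Queen "
    "Victoria and Prince Albert"
)
ANSWER_TYPE = "PERSON"
QUERY_ENTITIES = ["Queen Victoria"]
# Hand-supplied candidates (entity, type); an NER step is assumed upstream.
CANDIDATES = [
    ("Marie Alexandrovna", "PERSON"),
    ("Alexander II", "PERSON"),
    ("Alfred", "PERSON"),
    ("Prince Albert", "PERSON"),
]


def features(passage, entity, entity_type):
    """Propositionalise one candidate into named boolean features."""
    end = passage.index(entity) + len(entity)
    rest = passage[end:]
    followed_by_comma = rest.startswith(",")
    # Crude apposition: text between this comma and the next one.
    apposition = rest[1:].split(",")[0] if followed_by_comma else ""
    return {
        "followedBy(,)": followed_by_comma,
        "followedByAppositionContainingQueryEntities()": any(
            q in apposition for q in QUERY_ENTITIES
        ),
        "matchesAnswerType": entity_type == ANSWER_TYPE,
    }


table = {entity: features(PASSAGE, entity, etype) for entity, etype in CANDIDATES}
# Placeholder ranking: count the features that hold for each candidate.
best = max(table, key=lambda entity: sum(table[entity].values()))
print(best, table[best])
```

In a learned ranker the rows of `table`, together with the `answer` label, would be the training instances.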
Reasoning for Text Entailment
Beyond Word Vector Models
[Figure: vectors for “give birth”, “mother” and “car”, with angle θ]
Distributional semantics can give us a hint about the
concepts’ semantic proximity...
...but it still can’t tell us what exactly the relationship
between them is:
give birth ??? mother
Beyond Word Vector Models:
Intensional Reasoning
 Representing structured intensional-level
knowledge.
 Creation of an intensional-level reasoning
model.
Commonsense Reasoning
 Selective (focussed) reasoning
 - Selecting the relevant facts in the context of the
inference
 Reducing the search space.
Scalability
Extended WordNet (XWN)
Commonsense Data (ConceptNet)
http://conceptnet5.media.mit.edu/
Distributional semantic relatedness as a
Selectivity Heuristic
[Figure labels: distributional heuristics, source, target, answer]
Does John Smith have a degree?
Instance-level: John Smith, occupation: Engineer
A Distributional Semantics Approach for
Selective Reasoning on Commonsense Graph
Knowledge Bases, NLDB (2015).
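The idea of selective reasoning can be sketched as a best-first search from a source node to a target node that only expands neighbours whose relatedness to the target exceeds a threshold. The commonsense graph, the relatedness scores and the threshold below are all hypothetical toy values built around the "Does John Smith have a degree?" example; this is a simplified illustration, not the algorithm from the cited paper.

```python
import heapq

# Hypothetical commonsense graph: node -> list of (relation, neighbour).
GRAPH = {
    "John Smith": [("occupation", "engineer")],
    "engineer": [
        ("is a", "person"),
        ("has", "engineering degree"),
        ("works with", "machine"),
    ],
    "engineering degree": [("is a", "degree")],
    "person": [("has", "name")],
}

# Hypothetical relatedness of each node to the target concept "degree".
REL_TO_TARGET = {
    "engineer": 0.4,
    "engineering degree": 0.9,
    "degree": 1.0,
    "person": 0.1,
    "machine": 0.05,
    "name": 0.02,
}


def selective_path(graph, source, target, relatedness, threshold=0.2):
    """Best-first search that prunes neighbours unrelated to the target."""
    heap = [(-1.0, source, [source])]
    visited = set()
    while heap:
        _, node, path = heapq.heappop(heap)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for relation, neighbour in graph.get(node, []):
            score = relatedness.get(neighbour, 0.0)
            if score >= threshold:  # selectivity: skip unrelated facts
                heapq.heappush(heap, (-score, neighbour, path + [relation, neighbour]))
    return None


path = selective_path(GRAPH, "John Smith", "degree", REL_TO_TARGET)
print(path)
```

Pruning by relatedness is what reduces the search space: the "person" and "machine" branches are never expanded in this toy run.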
Bringing it into the Real World
Semeval 2017
Take-away Message
• Choosing the sweet-spot in terms of semantic
representation is critical for the construction of robust
QA systems.
 Work at a word-based representation instead of a sense
representation.
 Text simplification/clausal disembedding critical for
relation extraction.
 Need for a standardized semantic representation for
relations extracted from texts.
• Text entailment:
 Intensional-level reasoning.
 Natural logic.
 Distributional semantics.
• Distributional semantics:
 Robust, language-agnostic semantic matching.
 Selective reasoning over commonsense KBs.

More Related Content

PDF
Semantics at Scale: A Distributional Approach
PDF
Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge
PDF
Schema-agnositc queries over large-schema databases: a distributional semanti...
PPTX
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study
PPTX
How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ...
PPTX
Word Tagging with Foundational Ontology Classes
PPTX
Categorization of Semantic Roles for Dictionary Definitions
PPTX
Semantic Relation Classification: Task Formalisation and Refinement
Semantics at Scale: A Distributional Approach
Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge
Schema-agnositc queries over large-schema databases: a distributional semanti...
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study
How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ...
Word Tagging with Foundational Ontology Classes
Categorization of Semantic Roles for Dictionary Definitions
Semantic Relation Classification: Task Formalisation and Refinement

What's hot (20)

PPTX
Semantic Perspectives for Contemporary Question Answering Systems
PDF
Question Answering over Linked Data (Reasoning Web Summer School)
PPTX
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
PPTX
Ontology mapping for the semantic web
PDF
Open IE tutorial 2018
PDF
Introduction to Ontology Engineering with Fluent Editor 2014
PPT
Data Integration Ontology Mapping
PDF
Language Models for Information Retrieval
PPTX
Language Models for Information Retrieval
PDF
Datalog+-Track Introduction & Reasoning on UML Class Diagrams via Datalog+-
PDF
Learning ontologies
PDF
Trust Models for RDF Data: Semantics and Complexity - AAAI2015
DOCX
NE7012- SOCIAL NETWORK ANALYSIS
PDF
Ontology-based Classification and Faceted Search Interface for APIs
PPTX
ABSTAT: Ontology-driven Linked Data Summaries with Pattern Minimalization
PDF
RuleML2015 - Tutorial - Powerful Practical Semantic Rules in Rulelog - Funda...
PPTX
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
PDF
Tutorial - Introduction to Rule Technologies and Systems
PPT
Ontology engineering: Ontology alignment
Semantic Perspectives for Contemporary Question Answering Systems
Question Answering over Linked Data (Reasoning Web Summer School)
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...
Ontology mapping for the semantic web
Open IE tutorial 2018
Introduction to Ontology Engineering with Fluent Editor 2014
Data Integration Ontology Mapping
Language Models for Information Retrieval
Language Models for Information Retrieval
Datalog+-Track Introduction & Reasoning on UML Class Diagrams via Datalog+-
Learning ontologies
Trust Models for RDF Data: Semantics and Complexity - AAAI2015
NE7012- SOCIAL NETWORK ANALYSIS
Ontology-based Classification and Faceted Search Interface for APIs
ABSTAT: Ontology-driven Linked Data Summaries with Pattern Minimalization
RuleML2015 - Tutorial - Powerful Practical Semantic Rules in Rulelog - Funda...
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
Tutorial - Introduction to Rule Technologies and Systems
Ontology engineering: Ontology alignment
Ad

Viewers also liked (12)

PPTX
Matchine translation
PPT
Indian Writing in English
PPTX
Event-based MultiMedia Search and Retrieval for Question Answering
PPTX
WiSS Challenge - Day 2
PPTX
WISS QA Do it yourself Question answering over Linked Data
PDF
Open domain Question Answering System - Research project in NLP
PPTX
Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...
PDF
Lecture: Question Answering
PDF
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
PPTX
NLP pipeline in machine translation
KEY
SPARQL - Basic and Federated Queries
PPT
Translation Types
Matchine translation
Indian Writing in English
Event-based MultiMedia Search and Retrieval for Question Answering
WiSS Challenge - Day 2
WISS QA Do it yourself Question answering over Linked Data
Open domain Question Answering System - Research project in NLP
Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...
Lecture: Question Answering
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
NLP pipeline in machine translation
SPARQL - Basic and Federated Queries
Translation Types
Ad




Different Semantic Perspectives for Question Answering Systems

  • 1. NLP & Semantic Computing Group. Different Semantic Perspectives for Hybrid Question Answering Systems. Andre Freitas, University of Passau. OKBQA, Jeju, 2016.
  • 2. These slides: http://www.slideshare.net/andrenfreitas
  • 3. Outline • Multiple Perspectives of Semantic Representation • Lightweight Semantic Representation • Knowledge Graph Extraction from Text • Answering Queries with Knowledge Graphs • Reasoning • Take-away Message
  • 4. Multiple Perspectives of Semantic Representation
  • 5. QA & Semantics • Question Answering is about managing trade-offs among semantic representation, extraction, and selection. • It is also about integrating multiple components into one complex approach. • Semantic best-effort: systems must tolerate noisy, inconsistent, and vague data.
  • 6. Semantics for a Complex World (Formal World vs. Real World) • “Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.” • “If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements.” (Baroni et al. 2013)
  • 7. Why Not RDF? • Follows a more “database-type” representation perspective. • Leaves a gap towards representing text.
  • 8. Choices of Semantic Representation • Logical • Frames: verbs | nouns • Binary relations: binary | n-ary • Named entities • Language Models • Syntactic structures • Bag-of-words [Diagram axes: concept-level representation, background knowledge, extraction complexity]
  • 9. Information Extraction (representation: extraction technique) • Logical: semantic parsing • Frames (verbs | nouns): semantic role labeling • Binary relations (binary | n-ary): relation extraction (closed/open) • Named entities: named entity recognition • Syntactic structures & LMs: syntactic/n-gram parsing • Bag-of-words: indexing
  • 10. Information Extraction (same pairing of representations and extraction techniques as slide 9) Use all of them!
  • 11. Representation focal points • Types of knowledge to focus on in the representation: ◦ Facts vs. Definitions vs. Opinions ◦ Temporality ◦ Spatiality ◦ Modality ◦ Polarity ◦ Rhetorical structures ◦ …
  • 12. Lightweight Semantic Representation
  • 13. Objective • Provide a lightweight knowledge representation model which: ◦ Can represent textual discourse information (maximizes the capture of textual information). ◦ Is convenient to extract from text. ◦ Is convenient to access (query and browse).
  • 14. Lightweight Semantic Representation. Reference: Representing Texts as Contextualized Entity-Centric Linked Data Graphs, WebS 2013
  • 15. Representation Assumptions • Data integration: ◦ Named entities (instances) ◦ Abstract classes (unary predicates) • Rich taxonomical structures. • Context representation as a first-class citizen. • Open vocabulary. • Word instead of sense/concept.
  • 17. Representation of Complex Relations. Example: “General Electric Company, or GE, is an American multinational conglomerate corporation incorporated in Schenectady, New York.”
  • 18. Data Integration Points (GE example sentence from slide 17, pivot points highlighted) Named entities are lower-entropy integration points.
  • 19. Data Integration Points (GE example sentence, pivot points highlighted) Named entities are also low-entropy entry points for answering queries.
  • 20. Data Integration Points (GE example sentence, pivot points highlighted) Also abstract classes …
  • 21. Data Integration Points (GE example sentence, pivot points highlighted) They are also a very convenient way to represent.
  • 23. Taxonomy Extraction • Abstract classes are predicates with more complex compositional patterns which describe sets. • Requires parsing complex nominals. Example (pivot points): American multinational conglomerate corporation is a multinational conglomerate corporation is a conglomerate corporation is a corporation. Reference: On the Semantic Representation and Extraction of Complex Category Descriptors, NLDB 2014
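The is-a chain on the taxonomy slide can be approximated with a deliberately naive sketch. It assumes the nominal is right-headed and that every leading modifier can be dropped one at a time; the cited NLDB 2014 work parses complex nominals properly, so the function name `head_chain` and this stripping rule are illustrative assumptions, not the authors' method.

```python
def head_chain(nominal: str) -> list[tuple[str, str]]:
    """Return (specific, general) is-a pairs by dropping the leftmost modifier each step.

    Assumption: the rightmost token is the head noun (true for the slide's example,
    not for English nominals in general).
    """
    tokens = nominal.split()
    pairs = []
    for i in range(len(tokens) - 1):
        specific = " ".join(tokens[i:])
        general = " ".join(tokens[i + 1:])
        pairs.append((specific, general))
    return pairs


chain = head_chain("American multinational conglomerate corporation")
for specific, general in chain:
    print(f"{specific} --is a--> {general}")
```

Each generated class can then serve as a pivot point for integration and querying, in the sense used on the preceding slides.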
  • 25. Context Representation (GE example sentence from slide 17) Reification as a first-class representation element.
  • 26. Context Representation (GE example sentence from slide 17) Temporality, spatiality, modality, rhetorical relations …
  • 27. Rhetorical Structures using Reification • cause: e.g. “because scraping the bottom with a metal utensil will scratch the surface.” • circumstance: e.g. “After completing your operating system reinstallation,” • concession: e.g. “Although the hotel is situated adjacent to a beach,” • condition: e.g. “If you can break the $ 1000 dollar investment range,” • contrast: e.g. “but you can do better with 2.4ghz or 900mhz phones.” • purpose: e.g. “in order for the rear passengers to get in the vehicle.” • …
  • 29. Open Vocabulary (GE example sentence from slide 17) Temporality, spatiality, modality, rhetorical relations …
  • 30. Open Vocabulary • Easier to extract but difficult to consume. • We pay the price at query time. • How can we operate over large-scale, semantically heterogeneous knowledge graphs?
  • 32. Words instead of Senses • Motivation: Disambiguation is a tough problem. • Sense granularity can be, in many situations, arbitrary (too context-dependent). • We treat a word as a superposition of senses, almost in a “quantum mechanical sense”.
  • 33. Sense Superposition. Coecke et al. (2010): Category theory and Lambek calculus.
  • 34. Revisited RDF (for Representing Texts) • Data Model Types: Instance, Class, Property… • RDFS: Taxonomic representation. • Reification for contextual relations (subordinations). • Blank nodes for n-ary relations. • Triple. • Labels over URIs.
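A minimal sketch of how reification can attach context to a triple, using plain Python tuples rather than an RDF library. The `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` terms are the standard RDF reification vocabulary; the `ctx:time` property, the labels used as node names, and the helper `reify` are hypothetical and only illustrate the "labels over URIs" and "reification for contextual relations" bullets.

```python
from itertools import count

_ids = count(1)


def reify(subject, predicate, obj):
    """Return (statement_node, triples) describing one triple as an rdf:Statement."""
    node = f"_:s{next(_ids)}"  # blank-node style identifier
    triples = [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    ]
    return node, triples


# Labels are used directly as node names (labels over URIs).
graph = [("Obama", "was reelected", "president")]
stmt, reified = reify(*graph[0])
graph += reified
graph.append((stmt, "ctx:time", "November 2012"))  # hypothetical context property

for triple in graph:
    print(triple)
```

The context (here a temporal one) hangs off the statement node, so the core triple stays queryable on its own.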
  • 35. Alternative Representations • Abstract Meaning Representation (AMR): maximal use of PropBank frame files.
  • 36. Distributional Semantics
  • 37. Distributional Semantic Models • Semantic model with low acquisition effort (automatically built from text): simplification of the representation. • Enables the construction of comprehensive commonsense/semantic KBs. • What is the cost? Some level of noise (semantic best-effort) and a limited semantic model.
  • 38. Distributional Semantics as Commonsense Knowledge [Vector-space figure with the words car, dog, cat, bark, run, leash and an angle θ between word vectors. Labels: “Commonsense is here”, “Semantic Approximation is here”.]
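The angle θ in the figure is commonly measured through cosine similarity. The sketch below assumes toy co-occurrence counts over the contexts (bark, run, leash); the numbers are invented for illustration and do not come from the slides.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Invented co-occurrence counts with the context words (bark, run, leash).
vectors = {
    "dog": [8, 5, 7],
    "cat": [0, 4, 1],
    "car": [0, 6, 0],
}

print("dog~cat", round(cosine(vectors["dog"], vectors["cat"]), 3))
print("dog~car", round(cosine(vectors["dog"], vectors["car"]), 3))
```

With these toy counts, dog comes out closer to cat than to car, which is the kind of graded, best-effort signal the slide calls semantic approximation.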
  • 39. Beyond Single Word Vector Models: Compositionality. Are these two sentences equivalent (=?)? (1) “I find it rather odd that people are already trying to tie the Commission's hands in relation to the proposal for a directive, while at the same calling on it to present a Green Paper on the current situation with regard to optional and supplementary health insurance schemes.” (2) “I find it a little strange to now obliging the Commission to a motion for a resolution and to ask him at the same time to draw up a Green Paper on the current state of voluntary insurance and supplementary sickness insurance.”
  • 40. Compositional Semantics • Can we extend DS to account for the meaning of phrases and sentences? • Compositionality: The meaning of a complex expression is a function of the meaning of its constituent parts.
  • 41. Compositional Semantics • Words whose meaning is directly determined by their distributional behaviour (e.g., nouns). • Words that act as functions transforming the distributional profile of other words (e.g., verbs, adjectives, …).
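One common way to make "words that act as functions" concrete is to represent a noun as a vector and an adjective as a matrix applied to it. The sketch below assumes that reading; the two-dimensional vector for "car" and the matrix for "red" are invented values, and `apply_function_word` is a hypothetical helper name.

```python
def apply_function_word(matrix, vector):
    """Matrix-vector product: the function word transforms the argument's vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


# Invented toy values: a noun is a vector, an adjective is a matrix.
car = [1.0, 2.0]
red = [
    [1.0, 0.5],
    [0.0, 2.0],
]

red_car = apply_function_word(red, car)
print(red_car)
```

The composed phrase lives in the same space as the noun, so it can be compared with other nouns or phrases using the same similarity measure.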
  • 42. Compositional-Distributional Semantics
  • 43. Recursive Neural Networks for Structure Prediction
  • 44. New Model: Recursive Neural Tensor Network • Goal: Function that composes two vectors. • More expressive than any other RNN so far. (Socher et al.)
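For reference, the composition function of the Recursive Neural Tensor Network is usually written as below. The symbols are not given on the slide and are assumptions here: a and b are the d-dimensional child vectors, V is a tensor with d slices of size 2d × 2d, W is a d × 2d matrix, and f is an element-wise nonlinearity such as tanh.

```latex
p = f\left(
      \begin{bmatrix} a \\ b \end{bmatrix}^{\top}
      V^{[1:d]}
      \begin{bmatrix} a \\ b \end{bmatrix}
      + W \begin{bmatrix} a \\ b \end{bmatrix}
    \right),
\qquad a, b, p \in \mathbb{R}^{d},\;
V^{[1:d]} \in \mathbb{R}^{2d \times 2d \times d},\;
W \in \mathbb{R}^{d \times 2d}
```

Setting V to zero recovers the composition of a standard recursive neural network, which is one way to read the "more expressive" claim.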
  • 45. Socher et al.
  • 46. Compositional-distributional model for Categories
  • 47. Embedding Knowledge Graphs
  • 48. The vector space is segmented: a dimensional reduction mechanism! Reference: A Distributional Structured Semantic Space for Querying RDF Graph Data, IJSC 2012
  • 49. Compositional-distributional model for paraphrases. Reference: A Compositional-Distributional Semantic Model for Searching Complex Entity Categories, *SEM (2016)
  • 50. Knowledge Graph Extraction from Text
  • 51. Graphene
  • 52. Graph Extraction Pipeline [Pipeline diagram. Stages shown: Text Transformation, N-ary Relation Extraction, Text Simplification, Graph Serialization, Taxonomy Extraction, Storage, RST Classification; individual stages are tagged ML-based or rule-based.]
  • 53. Minimalistic Text Transformations [Pipeline diagram from slide 52 repeated.]
  • 54. Minimalistic Text Transformations • Co-reference resolution: pronominal co-references. • Passive: “We have been approached by the investment banker.” → “The investment banker approached us.” • Genitive modifier: “Malaysia's crude palm oil output is estimated to have risen.” → “The crude palm oil output of Malaysia is estimated to have risen.”
  • 55. Text Simplification [Pipeline diagram from slide 52 repeated.]
  • 56. Text Simplification for KG Extraction. Example: “Defeating Republican nominee Mitt Romney, Obama, who was the first African American to hold the office, was reelected president in November 2012.” • Relations are spread across clauses. • Relations are presented in non-canonical form.
  • 57. Text Simplification for KG Extraction • Insertion of a text simplification step: ◦ Obama was reelected president in November 2012. ◦ Obama was the first African American to hold the office. ◦ Obama was defeating Mitt Romney. ◦ Mitt Romney was Republican nominee.
  • 58. Syntax-driven sentence simplification approach • Task: Reduce the linguistic complexity of a text while retaining the original information/meaning using a set of syntax-based rewrite operations (deletion, insertion, reordering, sentence splitting). • Idea: Simplify a sentence by separating out components that supply only secondary information into simpler stand-alone context sentences, thus yielding one or more reduced core sentences.
  • 59. Approach • Linguistic analysis of sentences from the English Wikipedia to identify constructs which provide only secondary information: ◦ non-restrictive relative clauses ◦ non-restrictive and restrictive appositive phrases ◦ participial phrases offset by commas ◦ adjective and adverb phrases delimited by punctuation ◦ particular prepositional phrases ◦ lead noun phrases ◦ intra-sentential attributions ◦ parentheticals ◦ conjoined clauses with specific features ◦ particular punctuation • Rule-based simplification. Reference: Improving Relation Extraction by Syntax-based Sentence Simplification (2016)
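To illustrate the core/context split for one construct from the list (non-restrictive relative clauses), here is a toy sketch based on a single regular expression. The published approach works on syntactic parses; this surface pattern, the `RELATIVE` name and the `split_relative_clause` helper are assumptions made for illustration and only handle sentences of the form "X, who ..., ...".

```python
import re

# Surface pattern for "<subject>, who <relative clause>, <rest of sentence>".
RELATIVE = re.compile(r"^(?P<subj>[^,]+), who (?P<rel>[^,]+), (?P<core>.+)$")


def split_relative_clause(sentence: str) -> list[str]:
    """Return [core sentence, context sentence], or the input unchanged if no match."""
    match = RELATIVE.match(sentence)
    if not match:
        return [sentence]
    subj = match.group("subj")
    core = f"{subj} {match.group('core')}"
    context = f"{subj} {match.group('rel')}."
    return [core, context]


simplified = split_relative_clause(
    "Obama, who was the first African American to hold the office, "
    "was reelected president in November 2012."
)
for s in simplified:
    print(s)
```

The reduced core sentence and the stand-alone context sentence can then be passed separately to relation extraction.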
  • 60. N-ary Relation Extraction [Pipeline diagram from slide 52 repeated.] Tool: OpenIE, University of Washington
  • 61. Taxonomy Extraction [Pipeline diagram from slide 52 repeated.] Reference: Representation and Extraction of Complex Category Descriptors, NLDB 2014
  • 62. RST Classification [Pipeline diagram from slide 52 repeated.]
  • 63. Rhetorical Structure Extraction • Text-level RST-style discourse parser (Feng and Hirst, 2012): structure classification and relation classification.
  • 64. Answering Queries with Knowledge Graphs
  • 65. Now our graph supports semantic approximations as a first-class operation.
  • 66. Approach Overview [Architecture diagram. Components and labels: Query, Query Analysis, Query Features, Query Planner, Query Plan, core semantic approximation & composition operations, Ƭ-Space (embedding graphs), RDF, Explicit Semantic Analysis, Wikipedia, commonsense knowledge.]
  • 67. Core Principles • Minimize the impact of ambiguity, vagueness, and synonymy. • Address the simplest matchings first (semantic pivoting). • Semantic relatedness as a primitive operation. • Distributional semantic models as commonsense knowledge representation. • Lightweight syntactic constraints.
  • 68. Query Pre-Processing (Question Analysis) • Step 2: Query NER ◦ Rule-based: POS tag + IDF. Example: “Who is the daughter of Bill Clinton married to?” (Bill Clinton: PROBABLY AN INSTANCE)
  • 69. Query Pre-Processing (Question Analysis) • Step 3: Determine answer type ◦ Rule-based. Example: “Who is the daughter of Bill Clinton married to?” (answer type: PERSON)
  • 70. Query Pre-Processing (Question Analysis) • Transform natural language queries into a pseudo-logical form. Example: “Who is the daughter of Bill Clinton married to?”
  • 71. Query Pre-Processing (Question Analysis) • Step 5: Determine the query pattern ◦ Rule-based. • Remove stop words. • Merge words into entities. • Reorder structure from core entity position. Result: Bill Clinton (INSTANCE) | daughter | married to | Person (ANSWER TYPE); the diagram also marks the QUESTION FOCUS.
  • 72. Query Pre-Processing (Question Analysis) • Step 5: Determine the query pattern ◦ Rule-based. • Remove stop words. • Merge words into entities. • Reorder structure from core entity position. Query features: Bill Clinton (INSTANCE) | daughter (PREDICATE) | married to (PREDICATE) | Person
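The three rules of Step 5 can be sketched as follows. This is a simplified illustration, not the system's implementation: the stop-word list and the `KNOWN_ENTITIES` lookup (standing in for the POS tag + IDF entity recognition step) are assumptions.

```python
STOP_WORDS = {"who", "is", "the", "of", "what"}  # assumed, minimal list
KNOWN_ENTITIES = {"Bill Clinton"}  # stand-in for the rule-based NER step


def query_features(question: str) -> list[tuple[str, str]]:
    """Turn a question into (term, feature type) pairs, core entity first."""
    text = question.rstrip("?")
    features = []
    # Merge words into entities and move the core entity to the front.
    for entity in KNOWN_ENTITIES:
        if entity in text:
            features.append((entity, "INSTANCE"))
            text = text.replace(entity, "|")
    # Remove stop words; each remaining chunk becomes a predicate term.
    for chunk in text.split("|"):
        words = [w for w in chunk.split() if w.lower() not in STOP_WORDS]
        if words:
            features.append((" ".join(words), "PREDICATE"))
    return features


features = query_features("Who is the daughter of Bill Clinton married to?")
print(features)
```

The output mirrors the query features on the slide: an instance followed by two predicate terms, ready to be mapped to a query plan.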
  • 73. Query Planning • Map query features into a query plan. • A query plan contains a sequence of: ◦ Search operations. ◦ Navigation operations. Query features: (INSTANCE) (PREDICATE) (PREDICATE). Query plan: (1) INSTANCE SEARCH (Bill Clinton) (2) DISAMBIGUATE ENTITY TYPE (3) GENERATE ENTITY FACETS (4) p1 <- SEARCH RELATED PREDICATE (Bill Clinton, daughter) (5) e1 <- GET ASSOCIATED ENTITIES (Bill Clinton, p1) (6) p2 <- SEARCH RELATED PREDICATE (e1, married to) (7) e2 <- GET ASSOCIATED ENTITIES (e1, p2) (8) POST PROCESS (Bill Clinton, e1, p1, e2, p2)
  • 74. Core Entity Search. Query: Bill Clinton | daughter | married to | Person. KB: entity search finds :Bill_Clinton
  • 75. Distributional Semantic Search. Query: Bill Clinton | daughter | married to | Person. KB: :Bill_Clinton (PIVOT ENTITY) with associated triples :child :Chelsea_Clinton, :religion :Baptists, :almaMater :Yale_Law_School, ...
  • 76. Distributional Semantic Search. Query: Bill Clinton | daughter | married to | Person. KB: :Bill_Clinton with :child :Chelsea_Clinton, :religion :Baptists, :almaMater :Yale_Law_School, ... Which properties are semantically related to ‘daughter’? sem_rel(daughter, child)=0.054, sem_rel(daughter, religion)=0.004, sem_rel(daughter, alma mater)=0.001
  • 77. Distributional Semantic Search. Query: Bill Clinton | daughter | married to | Person. KB: :Bill_Clinton :child :Chelsea_Clinton
  • 78. Distributional Semantic Search. Query: Bill Clinton | daughter | married to | Person. KB: :Bill_Clinton :child :Chelsea_Clinton (PIVOT ENTITY)
  • 79. Distributional Semantic Search. Query: Bill Clinton | daughter | married to | Person. KB: :Bill_Clinton :child :Chelsea_Clinton :spouse :Mark_Mezvinsky. Note the lazy disambiguation.
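The pivot-and-navigate walkthrough above can be sketched with a toy knowledge base and a relatedness lookup table. The triples and the three ‘daughter’ scores are taken from the slides (the 0.004 score is assigned to :religion here, an assumption, since the slide labels two scores with child); the ‘married to’ score, the threshold, and the function names are invented. A real system would compute relatedness from a distributional model rather than read it from a dictionary.

```python
KB = {
    ":Bill_Clinton": {
        ":child": ":Chelsea_Clinton",
        ":religion": ":Baptists",
        ":almaMater": ":Yale_Law_School",
    },
    ":Chelsea_Clinton": {":spouse": ":Mark_Mezvinsky"},
}

SEM_REL = {
    ("daughter", ":child"): 0.054,
    ("daughter", ":religion"): 0.004,   # assumed reading of the slide
    ("daughter", ":almaMater"): 0.001,
    ("married to", ":spouse"): 0.05,    # invented value for illustration
}


def best_predicate(entity, term, threshold=0.01):
    """Pick the predicate of `entity` most related to `term`, if above the threshold."""
    scored = [(SEM_REL.get((term, p), 0.0), p) for p in KB.get(entity, {})]
    score, predicate = max(scored, default=(0.0, None))
    return predicate if score >= threshold else None


def answer(pivot, terms):
    """Navigate from the pivot entity, resolving one query term per hop."""
    entity = pivot
    for term in terms:
        predicate = best_predicate(entity, term)
        if predicate is None:
            return None
        entity = KB[entity][predicate]
    return entity


result = answer(":Bill_Clinton", ["daughter", "married to"])
print(result)
```

Disambiguation is lazy in the sense that each term is only matched against the predicates actually attached to the current pivot, never against the whole vocabulary.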
  • 81. Second Query Example: “What is the highest mountain?” Query features (PODS): mountain (CLASS) - highest (OPERATOR)
  • 82. Entity Search. Query: Mountain | highest. KB: :Mountain (PIVOT ENTITY), :typeOf
  • 83. Extensional Expansion. Query: Mountain | highest. KB: :Everest :typeOf :Mountain (PIVOT ENTITY), :K2 :typeOf :Mountain, ...
  • 84. Distributional Semantic Matching. Query: Mountain | highest. KB: :Everest and :K2 (:typeOf :Mountain) with candidate properties :elevation, :location, ..., :deathPlaceOf
  • 85. Get all numerical values. Query: Mountain | highest. KB: :Everest :elevation 8848 m, :K2 :elevation 8611 m
  • 86. Apply operator functional definition. Query: Mountain | highest. KB: :Everest :elevation 8848 m, :K2 :elevation 8611 m. Operators applied: SORT, TOP_MOST
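A minimal sketch of the operator step, assuming the term ‘highest’ has already been matched to :elevation and that its functional definition is "sort descending, take the top-most". The elevation values come from the slides; the `OPERATORS` table and helper names are hypothetical.

```python
ELEVATION_M = {":Everest": 8848, ":K2": 8611}  # values retrieved in the previous step


def sort_desc(values):
    """SORT: order (entity, value) pairs from largest to smallest value."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)


def top_most(ranked):
    """TOP_MOST: keep the entity ranked first."""
    return ranked[0][0]


# Hypothetical mapping from a query operator term to its functional definition.
OPERATORS = {"highest": lambda values: top_most(sort_desc(values))}

top = OPERATORS["highest"](ELEVATION_M)
print(top)
```

Other superlatives would plug into the same table with a different sort direction or a different matched property.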
  • 87. Results
  • 88. StarGraph • Open source NoSQL platform for building and interacting with large and sparse knowledge graphs. • Semantic approximation as a built-in operation. • Scalable query execution performance.
  • 89. Heuristics for the selection of the semantic pivot are critical! • Discussed here just superficially: ◦ Information-theoretical justification. References: How hard is the Query? Measuring the Semantic Complexity of Schema-Agnostic Queries, IWCS (2015). Schema-agnostic queries over large-schema databases: a distributional semantics approach, PhD Thesis (2015). On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study, NLIWoD (2015).
  • 90. Indra • Multilingual platform for experimentation with different word vector models. "Indra's net" is the net of the Vedic god Indra, whose net hangs over his palace on Mount Meru, the axis mundi of Hindu cosmology and Hindu mythology. Indra's net has a multifaceted jewel at each vertex, and each jewel is reflected in all of the other jewels. In the Avatamsaka Sutra, the image of "Indra's net" is used to describe the interconnectedness of the universe.
  • 92. Indra
  • 94. Bridging Structured & Unstructured Data • NER + Text + Passage Retrieval Ranking ◦ Simple and powerful QA basis. • Lazy disambiguation.
  • 95. Treo Answers Jeopardy Queries (Video)
  • 96. Ranking Candidate Answers • But what if there are multiple candidate answers? Q: Who was Queen Victoria’s second son? • Answer Type: Person • Passage: The Marie biscuit is named after Marie Alexandrovna, the daughter of Czar Alexander II of Russia and wife of Alfred, the second son of Queen Victoria and Prince Albert. (From Dan Jurafsky’s slides)
  • 98. Feature Engineering. Question: Who was Queen Victoria’s second son? Passage: The Marie biscuit is named after Marie Alexandrovna, the daughter of Czar Alexander II of Russia and wife of Alfred, the second son of Queen Victoria and Prince Albert. Candidate features: • followed by a ‘,’ • followed by an apposition • contains an entity in the query • has a four-word overlap • type = PERSON matches AnswerType
  • 99. Propositionalisation. Passage (with entities e0 … en): The Marie biscuit is named after Marie Alexandrovna, the daughter of Czar Alexander II of Russia and wife of Alfred, the second son of Queen Victoria and Prince Albert. Feature table, one row per candidate entity. Columns: e0 | followedBy(,) | followedByAppositionContainingQueryEntities() | answer | … Row: Alfred | true | true | true | …
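A small sketch of propositionalisation: each candidate entity in the passage becomes one row of boolean features that a ranker could consume. The feature names follow the slide; the string-based feature tests, the `candidate_features` helper, and the assumption that entity types arrive from an upstream NER step are illustrative simplifications.

```python
PASSAGE = (
    "The Marie biscuit is named after Marie Alexandrovna, the daughter of "
    "Czar Alexander II of Russia and wife of Alfred, the second son of "
    "Queen Victoria and Prince Albert"
)


def candidate_features(candidate, candidate_type, query_entities, answer_type):
    """Build one feature row for a candidate answer found in PASSAGE."""
    after = PASSAGE.split(candidate, 1)[1] if candidate in PASSAGE else ""
    followed_by_comma = after.startswith(",")
    # Crude apposition: the text between this comma and the next one.
    apposition = after[1:].split(",")[0] if followed_by_comma else ""
    return {
        "followedBy(,)": followed_by_comma,
        "followedByAppositionContainingQueryEntities()": any(
            e in apposition for e in query_entities
        ),
        "matchesAnswerType": candidate_type == answer_type,
    }


alfred = candidate_features("Alfred", "PERSON", ["Queen Victoria"], "PERSON")
marie = candidate_features("Marie Alexandrovna", "PERSON", ["Queen Victoria"], "PERSON")
print(alfred)
print(marie)
```

Both candidates match the answer type, so the apposition feature is what separates Alfred from Marie Alexandrovna in this example.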
  • 100. Reasoning for Text Entailment
  • 101. Beyond Word Vector Models • Distributional semantics can give us a hint about the concepts’ semantic proximity... • ...but it still can’t tell us what exactly the relationship between them is. [Figure: vectors for give birth, mother and car with angle θ; the relation between give birth and mother is marked ???.]
  • 102. Beyond Word Vector Models [Figure: give birth ??? mother]
  • 103. Beyond Word Vector Models: Intensional Reasoning • Representing structured intensional-level knowledge. • Creation of an intensional-level reasoning model.
  • 104. Commonsense Reasoning • Selective (focussed) reasoning: selecting the relevant facts in the context of the inference. • Reducing the search space. • Scalability.
  • 105. Extended WordNet (XWN)
  • 106. Commonsense Data (ConceptNet) http://conceptnet5.media.mit.edu/
  • 107. Distributional semantic relatedness as a Selectivity Heuristic [Graph figure, shown over three build-up slides, labeled: distributional heuristics, source, target, answer.]
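One way to read the selectivity heuristic is as a best-first search over a commonsense graph in which only neighbours sufficiently related to the target concept are expanded. The sketch below assumes that reading. The graph, the relatedness scores, the threshold and the function name are all invented for illustration; they are not taken from ConceptNet or from the cited NLDB 2015 paper.

```python
import heapq

# Invented commonsense graph: node -> neighbouring concepts.
GRAPH = {
    "engineer": ["person", "engineering", "machine"],
    "engineering": ["university", "bridge"],
    "university": ["degree", "campus"],
    "person": ["human"],
}

# Invented distributional relatedness of each concept to the target "degree".
RELATEDNESS = {
    "engineering": 0.6, "university": 0.8, "degree": 1.0, "campus": 0.4,
    "person": 0.2, "machine": 0.1, "bridge": 0.1, "human": 0.1,
}


def selective_path(source, target, threshold=0.3):
    """Best-first search that prunes neighbours weakly related to the target."""
    frontier = [(-1.0, [source])]  # max-heap via negated scores
    seen = {source}
    while frontier:
        _, path = heapq.heappop(frontier)
        node = path[-1]
        if node == target:
            return path
        for nxt in GRAPH.get(node, []):
            score = RELATEDNESS.get(nxt, 0.0)
            if nxt not in seen and score >= threshold:
                seen.add(nxt)
                heapq.heappush(frontier, (-score, path + [nxt]))
    return None


path = selective_path("engineer", "degree")
print(" -> ".join(path))
```

Pruning by relatedness keeps the explored portion of the graph small, which is the scalability argument made on the Commonsense Reasoning slide.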
  • 110. Example (instance level): John Smith, occupation: Engineer. Question: Does John Smith have a degree?
  • 126. Reference: A Distributional Semantics Approach for Selective Reasoning on Commonsense Graph Knowledge Bases, NLDB (2015).
  • 127. Bringing it into the Real World
  • 128. SemEval 2017
  • 129. Take-away Message • Choosing the sweet spot in terms of semantic representation is critical for the construction of robust QA systems. ◦ Work with a word-based representation instead of a sense representation. ◦ Text simplification/clausal disembedding is critical for relation extraction. ◦ There is a need for a standardized semantic representation for relations extracted from texts.
  • 130. Take-away Message • Text entailment: ◦ Intensional-level reasoning. ◦ Natural logic. ◦ Distributional semantics. • Distributional semantics: ◦ Robust, language-agnostic semantic matching. ◦ Selective reasoning over commonsense KBs.