1. ConceptNet - a practical commonsense reasoning tool-kit
H Liu and P Singh
MIT Media Lab
Speaker: Yi-Ching(Janet) Huang
2. Introduction
• ConceptNet
– Freely available commonsense knowledge base
– Natural-language-processing tool-kit
• It supports many practical textual-reasoning tasks over real-world documents
3. Outline
• Comparison of ConceptNet, Cyc, and WordNet
• History, Construction and Structure
• Various contextual reasoning tasks
• Quantitative and Qualitative Analysis
• Conclusion
6. Building ConceptNet
• 3 phases
– Extraction phase
• Extract from OMCS corpus
• English sentence -> binary-relation assertion
– Normalization phase
– Relaxation phase
• Produce “inferred assertion”
• Improve the connectivity of the network
7. Structure of the ConceptNet knowledge base
• 1.6 million assertions (1.25 million are k-lines)
• twenty relation-types
21. Conclusion
• ConceptNet is presently the largest freely available commonsense knowledge base
• Supports many practical textual-reasoning tasks
• Strengths
– Easy to use
– Simple, WordNet-like structure
– Good for practical commonsense reasoning
Editor's Notes
#2:There is a lot of information on the Internet today.
There are e-mails, instant messages, blogs, and a great deal of online news. All of it is text.
If there were a tool that could help us manage and make sense of this information, that would be great.
ConceptNet is such a tool-kit for practical commonsense reasoning.
And it is free for everyone.
#3:This is my outline for today.
First of all, I will compare three knowledge bases: ConceptNet, Cyc, and WordNet.
Secondly, I will present a brief history of ConceptNet and describe how it was built and how it is structured.
Next, I will introduce several contextual reasoning tasks that ConceptNet can support.
Then, I will show the quantitative and qualitative analysis.
Finally, the conclusion.
#4:ConceptNet: generated automatically from the OMCS corpus (contributions from the general public) over about four years.
WordNet and Cyc: manually handcrafted by knowledge engineers (at Cycorp, in Cyc's case) over about 20 years.
ConceptNet is structured like WordNet (a simple-to-use representation) and relationally rich like Cyc (rich content).
#5:Motivation: inspired by the success of distributed and collaborative projects on the Web,
the authors turned to volunteers from the general public to massively distribute the problem of building a commonsense knowledge base.
OMCS (Open Mind Common Sense): 30 different activities, each of which elicits a simple assertion.
CRIS/OMCSNet
CRIS (Commonsense Robust Inference System)
ConceptNet
#6:ConceptNet is built by an automatic process.
In the first phase, it applies extraction rules to the OMCS corpus,
mapping English sentences to binary-relation assertions.
In the next phase, it normalizes the extracted nodes.
The last phase is relaxation.
This phase improves the connectivity of the semantic network:
by considering several existing assertions, it can infer new ones,
and these new ones are called "inferred assertions".
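To make the extraction phase concrete, here is a minimal sketch of the idea, assuming a hand-written list of sentence templates; the patterns, relation names, and helper function are illustrative, not the authors' actual extraction rules.

import re

# Minimal illustration of the extraction idea: each OMCS sentence template
# maps to one binary relation-type (illustrative assumptions only).
PATTERNS = [
    (re.compile(r"^(?:an?|the)?\s*(.+?) is used for (.+?)\.?$", re.I), "UsedFor"),
    (re.compile(r"^(?:an?|the)?\s*(.+?) is a kind of (.+?)\.?$", re.I), "IsA"),
    (re.compile(r"^(?:an?|the)?\s*(.+?) can (.+?)\.?$", re.I), "CapableOf"),
]

def extract_assertion(sentence):
    """Map one English sentence to a (relation, concept1, concept2) assertion, if any template matches."""
    for pattern, relation in PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            concept1, concept2 = (group.strip().lower() for group in match.groups())
            return (relation, concept1, concept2)
    return None  # sentence does not fit any known template

print(extract_assertion("A spoon is used for eating soup."))
# -> ('UsedFor', 'spoon', 'eating soup')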
#7:The ConceptNet knowledge base consists of 1.6 million assertions and 20 relation-types.
"K-lines" denote the generic sorts of conceptual connections.
This picture describes ConceptNet's relational ontology.
(An ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts.)
#8:The database structure is shown in this picture; it looks like a mind map.
There are 1.6 million edges connecting more than 300,000 nodes.
Nodes are semi-structured English fragments, and edges are relation-types.
#9:ConceptNet comes with an integrated NLP engine named MontyLingua.
Given a text document, MontyLingua extracts verb-subject-object-object frames from it.
For example,
Mary ate breakfast this morning.
Verb: ate
Subject: Mary
Object1: breakfast
Object2: this morning
Next, I will introduce ConceptNet's two kinds of reasoning capabilities: node-level and document-level reasoning.
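As a rough sketch, such a frame can be represented as a small record; the class and field names below are my own illustration, not MontyLingua's actual output format.

from dataclasses import dataclass
from typing import Optional

# Illustrative container for one verb-subject-object-object frame
# (not MontyLingua's real output format).
@dataclass
class VSOOFrame:
    verb: str
    subject: str
    object1: Optional[str] = None
    object2: Optional[str] = None

frame = VSOOFrame(verb="ate", subject="Mary",
                  object1="breakfast", object2="this morning")
print(frame)
# VSOOFrame(verb='ate', subject='Mary', object1='breakfast', object2='this morning')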
#10:By performing spreading activation, ConceptNet radiates outward from a source node to find its contextual neighborhood.
This tells you how the neighboring concepts relate to the source node.
Analogy-making finds concepts that share a similar relational structure.
Projection is a transitive mechanism from an origin node to further nodes.
It is useful for goal planning and for predicting possible outcomes and next states.
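Below is a toy sketch of the spreading-activation idea over a miniature ConceptNet-style graph; the graph contents, relation names, decay factor, and function name are illustrative assumptions, not the toolkit's actual API or data.

from collections import defaultdict

# Toy ConceptNet-style graph: node -> list of (relation, neighbor) edges.
graph = {
    "buy food": [("UsedFor", "eat"), ("LocationOf", "grocery store")],
    "eat": [("MotivatedByGoal", "hunger"), ("SubeventOf", "chew")],
    "grocery store": [("LocationOf", "food")],
}

def get_context(source, depth=2, decay=0.5):
    """Radiate activation outward from the source node; nearer nodes score higher."""
    scores = defaultdict(float)
    frontier = {source: 1.0}
    for _ in range(depth):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            for _relation, neighbor in graph.get(node, []):
                next_frontier[neighbor] += energy * decay
        for node, energy in next_frontier.items():
            scores[node] += energy
        frontier = next_frontier
    return sorted(scores.items(), key=lambda item: -item[1])

print(get_context("buy food"))
# [('eat', 0.5), ('grocery store', 0.5), ('hunger', 0.25), ('chew', 0.25), ('food', 0.25)]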
#11:Given a text document, ConceptNet can do topic-gisting to find the document's main ideas. It can also disambiguate word meanings and classify the document into appropriate categories.
Beyond known concepts, it can also learn unknown concepts; this is called "novel-concept identification".
Another striking capability is affect sensing: it can recognize the emotion conveyed by a document.
#13:If we want to know about the complexity of ConceptNet's nodes,
a simple statistic is the histogram of nodal word-lengths.
The shorter the nodes, the simpler they are likely to be.
Looking at this graph, approximately 70% of the nodes have a word-length of three or less.
That means most assertions are simple.
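For concreteness, a minimal sketch of this statistic follows, assuming a made-up sample of node phrases; the real analysis would iterate over all of ConceptNet's nodes.

from collections import Counter

# Made-up sample of node phrases; the real statistic runs over all ConceptNet nodes.
nodes = ["dog", "eat breakfast", "go to the movies", "wake up in morning", "food"]

lengths = Counter(len(node.split()) for node in nodes)
short = sum(count for length, count in lengths.items() if length <= 3)
print(lengths)  # Counter({1: 2, 4: 2, 2: 1})
print(f"{100 * short / len(nodes):.0f}% of nodes have a word-length of 3 or less")  # 60% for this sample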
#14:32% are never used (they are only inferred).
58% are used only once.
So about 90% of assertions are used zero times or only one time.
It is surprising that there is not more overlap.
#15:Connectivity is measured by nodal edge-density.
The graph shows that k-lines improve the connectivity of the semantic network.
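A small sketch of what nodal edge-density could mean in code follows, assuming it is the average number of edges per node; the toy edges and the set of k-line relation names are illustrative assumptions, not the actual ConceptNet data.

# Toy sketch of nodal edge-density, assuming it means the average number of
# edges per node, computed with and without the k-line relation-types.
edges = [
    ("IsA", "cat", "pet"),
    ("CapableOf", "cat", "purr"),
    ("ConceptuallyRelatedTo", "cat", "dog"),   # k-line style edge
    ("ThematicKLine", "pet", "animal care"),   # k-line style edge
]
K_LINE_TYPES = {"ConceptuallyRelatedTo", "ThematicKLine", "SuperThematicKLine"}

def edge_density(edge_list):
    """Average number of edges per node in the given edge list."""
    nodes = {concept for _, a, b in edge_list for concept in (a, b)}
    return len(edge_list) / len(nodes) if nodes else 0.0

print(edge_density(edges))                                           # with k-lines: 0.8
print(edge_density([e for e in edges if e[0] not in K_LINE_TYPES]))  # without k-lines: ~0.67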
#16:The authors ran an experiment with five human judges and asked each judge to rate 100 concepts in ConceptNet 1.2.
#17:There have been many interesting applications built with ConceptNet since 2002.
Many of them were final term projects for a commonsense reasoning course at the MIT Media Lab.
Now I will briefly introduce three applications: ARIA, Emotus Ponens, and MakeBelieve.
#18:Commonsense ARIA observes a user writing an e-mail and suggests photos relevant to the user’s story.
#19:Emotus Ponens is a textual affect-sensing system.
It can analyze text and classify it into six basic emotion categories (happy, sad, angry, fearful, disgusted, and surprised).
EmpathyBuddy is an e-mail client that gives the author automatic affective feedback via an emotive face.
#20:MakeBelieve is a story generator that allows a person to interactively invent a story with the system.
MakeBelieve attempts to continue the story by freely imagining possible sequences of events that might happen to the character the user has chosen.
The agent uses commonsense about causality and how the world works, mined from the Open Mind Common Sense corpus, and combines this with very simple linguistic techniques for story generation to produce pithy but interesting stories. MakeBelieve also uses commonsense to evaluate and critique a story it has written, in order to catch logically inconsistent or incoherent events and actions.
#21:ConceptNet is presently the largest freely available commonsense knowledge base.
It supports many practical textual-reasoning tasks.
Its strengths are that it is easy to use and has the simple structure of WordNet.
And it is good for practical commonsense reasoning.
It helps computers grasp the semantic meaning of textual content.
Finally, the next speaker will introduce ConceptNet's source, the OMCS corpus.