The Road to the Semantic Web
Michael Genkin
SDBI 2010@HUJI
"The Semantic Web is not a separate
Web but an extension of the current
one, in which information is given well-
defined meaning, better enabling
computers and people to work in
cooperation."
Tim Berners-Lee, James Hendler and Ora Lassila; Scientific
American, May 2001
Over 25 billion RDF triples (October 2010)
More than 24 billion web pages (June 2010)
Probably more than one triple per page; a lot more, in fact
How will we populate the
Semantic Web?
 Humans will enter structured data
 Data-store owners will share their data
 Computers will read unstructured data
Read the Web
http://rtw.ml.cmu.edu/rtw/
(or google it)
Roadmap
 Motivation
 Some definitions
 Natural language processing
 Machine learning
 Macro reading the web
 Coupled training
 NELL
 Demo
 Summary
Some Definitions
 Natural Language Processing
 Machine Learning
Natural Language Processing
 Part of Speech Tagging (e.g. noun, verb)
 Noun phrase: a phrase that normally
consists of a (modified) head noun.
 “pre-modified” (e.g. this, that, the red…)
 “post-modified” (e.g. …with long hair,
…where I live)
 Proper noun: a noun which represents a unique entity (e.g. Jerusalem, Michael)
 Common noun: a noun which represents
a class of entities (e.g. car, university)
Learning: What is it?
 Assume there is some knowledge base KB.
 Let A_KB be an algorithm that uses KB to perform a set of tasks T.
 Let Perf be a performance metric.
 We will say that a computer program learns if:
 KB1 > KB2 ⇒ Perf(A_KB1, T) > Perf(A_KB2, T)
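A minimal sketch of this criterion in Python (not from the slides; algorithm, perf, and the KB representation are hypothetical placeholders):

    # Sketch: a program "learns" if a strictly larger knowledge base
    # yields strictly better measured performance on the same tasks.
    def learns(algorithm, perf, tasks, kb_small, kb_large):
        assert kb_small < kb_large  # KBs as Python sets; proper-subset check
        return perf(algorithm(kb_large), tasks) > perf(algorithm(kb_small), tasks)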
Training Methods
Supervised
 We have a set of labeled examples (KB) and a domain (D)
 Examples might be positive or negative
 e.g. for every example (𝑥, 𝑦) ∈ 𝐾𝐵, 𝑓(𝑥) = 𝑦 for some 𝑓.
 The learning algorithm A tries to find such an 𝑓.
 𝑓 is called a classifier (or a regression function)
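As a toy illustration (not from the slides; the candidate functions are assumed given rather than searched for cleverly), a supervised learner picks the f most consistent with the labeled examples:

    # Sketch: choose the candidate f agreeing with the most labeled examples.
    def train_supervised(kb, candidate_fs):
        # kb: list of (x, y) pairs; candidate_fs: iterable of functions
        return max(candidate_fs, key=lambda f: sum(1 for x, y in kb if f(x) == y))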
Unsupervised
 Distinguished from supervised learning in that there are no labeled examples (KB = D).
 The unsupervised learning algorithm A tries to find a classifier that, given some 𝑑 ∈ 𝐷 as input, returns some arbitrary label.
 i.e. the algorithm A analyses the structure of D
Semi-Supervised
 A middle way between supervised and unsupervised.
 Use a minimal amount of labeled examples and a large amount of unlabeled ones.
 Learn the structure of D in an unsupervised manner, but use the labeled examples to constrain the results. Repeat.
 Known as bootstrapping.
Bootstrapping
 Iterative semi-supervised learning
[Diagram: seed cities (Jerusalem, Tel Aviv, Haifa) yield extraction patterns such as "mayor of arg1" and "life in arg1"; the patterns extract new instances (Ness-Ziona, London, Amsterdam) but also drift to patterns like "arg1 is home of" and "traits such as arg1", which extract non-cities (denial, anxiety, selfishness)]
 Under-constrained!
 Semantic drift
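A minimal sketch of the loop pictured above (find_patterns and extract_instances are hypothetical helpers, passed in so the sketch stays self-contained); note that nothing constrains what a pattern may extract, which is exactly what permits semantic drift:

    # Sketch: unconstrained bootstrapping, prone to semantic drift.
    def bootstrap(seeds, corpus, find_patterns, extract_instances, iterations=10):
        instances, patterns = set(seeds), set()
        for _ in range(iterations):
            # Lexical contexts of known instances, e.g. "mayor of arg1"
            patterns |= find_patterns(corpus, instances)
            # Whatever fills those contexts, e.g. Ness-Ziona, but also "denial"
            instances |= extract_instances(corpus, patterns)
            # One overly general pattern ("life in arg1") admits non-cities,
            # which then promote more bad patterns on the next iteration.
        return instances, patterns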
Macro Reading the Web
Populating the Semantic Web by Macro-Reading
Internet Text.
T.M. Mitchell, J. Betteridge, A. Carlson, E.R. Hruschka Jr.,
and R.C. Wang. Invited Paper, In Proceedings of the
International Semantic Web Conference (ISWC), 2009
Problem Specification (1): Input
 Initial ontology that contains:
 Dozens of categories and relations
 (e.g. Company, CompanyHeadquarteredInCity)
 Relations between categories and relations
 (e.g. mutual exclusion, type constraints)
 A few seed examples of each predicate in
ontology
 The web
 Occasional access to a human trainer
Problem Specification (2): The Task
 Run forever (24x7)
 Each day:
 Run over ~500 million web pages.
 Extract new facts and relations from the web to populate the ontology.
 Perform better than the day before
 Populate the semantic web.
A Solution?
 An automatic, learning, macro-reader.
Micro vs. Macro Reading (1)
 Micro-reading: the traditional NLP task of
annotating a single web page to extract
the full body of information contained in
the document.
 NLP is hard!
 Macro-reading: the task of “reading” a large corpus of web pages (e.g. the web) and returning a large collection of facts expressed in the corpus.
 But not necessarily all the facts.
Micro vs. Macro Reading (2)
 Macro-reading is easier than micro-
reading. Why?
 Macro-reading doesn’t require extracting
every bit of information available.
 In text corpora as large as the web, many important facts are stated redundantly, thousands of times, using different wordings.
 Benefit by ignoring complex sentences.
 Benefit by statistically combining evidence
from many fragments to determine a belief in
a hypothesis.
Why an Input Ontology?
 The problem with understanding free text
is that it can mean virtually anything.
 By formulating the problem of macro-
reading as populating an ontology we
allow the system to focus only on relevant
documents.
 The ontology can define meta-properties of its categories and relations.
 Allows populating parts of the Semantic Web for which an ontology is available.
Machine Learning Methods
 Semi-supervised (use an ontology to learn).
 Learn textual patterns for extraction.
 Employ methods such as Coupled Training
to improve accuracy.
 Expand the ontology to improve
performance.
Coupled Training
Bootstrapping – Revised
 Iterative semi-supervised learning
[Diagram repeated: seed cities (Jerusalem, Tel Aviv, Haifa), the patterns "mayor of arg1" and "life in arg1", new extractions (Ness-Ziona, London, Amsterdam), and drift terms (denial, anxiety, selfishness) via "arg1 is home of" and "traits such as arg1"]
Coupled Training
 Couple the training of multiple functions to
make unlabeled data more informative
 Makes the learning task easier by adding constraints
Coupling (1):
Output Constraints
 We wish to train a function 𝑓: 𝑋 → 𝑌
 e.g. 𝑐𝑖𝑡𝑦: 𝑁𝑜𝑢𝑛𝑃ℎ𝑟𝑎𝑠𝑒 → {0,1}
 Assume we have 𝑓1: 𝑋1 → 𝑌, 𝑓2: 𝑋2 → 𝑌, two different functions that assign the label city but receive different inputs.
 Coupling constraint: 𝑓1, 𝑓2 must agree over
unlabeled data.
Coupling (1):
Output Constraints
[Diagram: in "Nir Barkat is the mayor of Jerusalem", two classifiers that receive different inputs (X1, X2) for the same noun phrase must agree on Y=city? (=), while the mutually exclusive Y=country? must be rejected (≠)]
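A sketch of the agreement constraint over unlabeled data (the classifier stubs are hypothetical): predictions are kept only where both classifiers, each seeing its own input, assign the same label:

    # Sketch: enforce that two "city" classifiers agree on unlabeled data.
    def agreed_labels(f1, f2, unlabeled):
        # unlabeled: iterable of (x1, x2), two different inputs describing
        # the same noun phrase occurrence
        return [(x1, x2, f1(x1)) for x1, x2 in unlabeled if f1(x1) == f2(x2)]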
Coupling (2):
Compositional Constraints
 Assume we have 𝑓1: 𝑋1 → 𝑌1, 𝑓2: 𝑋1 × 𝑋2 → 𝑌2
 Assume we have a constraint on valid
𝑦1, 𝑦2 pairs given 𝑥1, 𝑥2.
 Coupling constraint: 𝑓1, 𝑓2 must satisfy the
constraint on 𝑦1, 𝑦2.
 e.g. 𝑓1 “type checks” the first argument of 𝑓2
Coupling (2):
Compositional Constraints
[Diagram: "Nir Barkat is the mayor of Jerusalem" as MayorOf(X1, X2); each argument is checked against candidate categories (city? location? politician?)]
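A sketch of the compositional check, assuming a hypothetical category classifier f1 is available:

    # Sketch: f1 "type checks" the arguments of a MayorOf(x1, x2) candidate.
    def mayor_of_type_checks(f1, x1, x2):
        # f1 maps a noun phrase to a category label such as "politician" or "city"
        return f1(x1) == "politician" and f1(x2) == "city"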
Coupling (3):
Multi-view Agreement
 We have a function 𝑓: 𝑋 → 𝑌
 Suppose X can be partitioned into two “views”: 𝑋 = < 𝑋1, 𝑋2 >.
 Assume 𝑋1 and 𝑋2 can predict Y.
 We wish to learn 𝑓1: 𝑋1 → 𝑌, 𝑓2: 𝑋2 → 𝑌
 Coupling constraint: 𝑓1, 𝑓2 must agree.
Coupling (3):
Multi-view Agreement
 Let Y be a set of possible web page categories
 Let X be a set of web pages
 Assume 𝑋1 represents the words in a page
 Assume 𝑋2 represents the words in
hyperlinks pointing to the page
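A sketch of multi-view agreement on this example (page_words and inbound_anchor_words are hypothetical helpers supplying the two views):

    # Sketch: trust a page's label only when both views agree on it.
    def multi_view_labels(f1, f2, pages, page_words, inbound_anchor_words):
        # f1 classifies a page by its own words (X1); f2 by the words of
        # hyperlinks pointing at it (X2)
        return {p: f1(page_words(p)) for p in pages
                if f1(page_words(p)) == f2(inbound_anchor_words(p))}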
NELL – Never-Ending
Language Learning
Coupled Semi-Supervised Learning for Information Extraction.
A. Carlson, J. Betteridge, R.C. Wang, E.R. Hruschka Jr. and T.M.
Mitchell. In Proceedings of the ACM International Conference on
Web Search and Data Mining (WSDM), 2010.
Never Ending Language Learning
Tom Mitchell's invited talk in the Univ. of Washington CSE
Distinguished Lecture Series, October 21, 2010.
Motivation
 Humans learn many things, for years, and
become better learners over time
 Why not machines?
Coupled Constraints (1)
 Mutual Exclusion:
 Two mutually exclusive predicates can’t both be satisfied by the same input 𝑥.
 Relation argument type checking:
 Ensures that the noun phrases satisfying each relation correspond to the categories defined for that relation (both checks are sketched below).
 e.g. the CompanyIsInEconomicSector relation has arguments of the Company and EconomicSector categories.
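A sketch of both checks (beliefs, mutex, argument_types, and candidates are hypothetical dictionaries standing in for the ontology and the current knowledge base):

    # Sketch: NELL-style coupled constraints on candidate beliefs.
    def passes_mutual_exclusion(x, predicate, beliefs, mutex):
        # x may not already be believed for any predicate that is mutually
        # exclusive with `predicate` (e.g. a city cannot also be a person)
        return all(x not in beliefs[q] for q in mutex[predicate])

    def passes_type_check(x1, x2, relation, argument_types, candidates):
        # both arguments must be candidates for the categories the ontology
        # declares for the relation, e.g. (Company, EconomicSector)
        t1, t2 = argument_types[relation]
        return x1 in candidates[t1] and x2 in candidates[t2]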
Coupled Constraints (2)
 Unstructured and Semi-structured text
features:
 Noun phrases appear on the web in free-text contexts or semi-structured contexts.
 The free-text and semi-structured classifiers will make independent mistakes
 But each is sufficient for classification
Both classifiers must agree.
Coupled Pattern Learner (CPL):
Overview
 Learns to extract category and relation instances.
 Learns high-precision
textual patterns.
 e.g. arg1 scored a
goal for arg2
Coupled Pattern Learner (CPL):
Extracting
 Runs forever; each iteration bootstraps the patterns promoted in the previous iteration to extract new candidate instances.
 Selects the 1000 instances that co-occur with the most patterns.
 A similar procedure extracts patterns, using recently promoted instances.
 Uses PoS heuristics to accomplish extraction
 e.g. a per-category proper/common noun specification; a pattern is a sequence of verbs followed by adjectives, prepositions, or determiners (and optionally preceded by nouns).
Coupled Pattern Learner (CPL):
Filtering and Ranking
 Candidates are filtered to enforce mutual exclusion and type constraints
 A candidate is rejected unless it co-occurs with a promoted pattern at least three times more often than it co-occurs with mutually exclusive predicates.
 Candidates are ranked as follows (both steps are sketched below):
 Instances: by the number of promoted patterns they co-occur with.
 Patterns: by a precision estimate
 Precision(p) = Σ_{i ∈ ℐ} count(i, p) / count(p)
 where ℐ is the set of promoted instances, count(i, p) is the number of times i and p co-occur in the corpus, and count(p) is the number of times p occurs.
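A sketch of the filter and both ranking rules (count_pair, count, cooccur_promoted, and cooccur_mutex are hypothetical corpus statistics):

    # Sketch: CPL-style candidate filter and pattern precision estimate.
    def keep_candidate(c, cooccur_promoted, cooccur_mutex):
        # reject unless c co-occurs with promoted patterns of its predicate at
        # least three times more than with mutually exclusive predicates
        return cooccur_promoted(c) >= 3 * cooccur_mutex(c)

    def pattern_precision(p, promoted_instances, count_pair, count):
        # Precision(p) = sum_i count(i, p) / count(p) over promoted instances i
        return sum(count_pair(i, p) for i in promoted_instances) / count(p)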
Coupled Pattern Learner (CPL):
Promoting Candidates
 For each predicate, promotes at most 100 instances and 5 patterns.
 Highest ranked first.
 Instances and patterns are promoted only if they co-occur with at least two promoted patterns or instances, respectively (see the sketch below).
 Relation instances are promoted only if their arguments are candidates for the specified categories.
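A sketch of the promotion step under these rules (rank and support are hypothetical scoring helpers):

    # Sketch: promote at most 100 instances and 5 patterns per predicate,
    # highest ranked first, each supported by >= 2 promoted items.
    def promote(instances, patterns, rank, support):
        ok_i = sorted((i for i in instances if support(i) >= 2), key=rank, reverse=True)
        ok_p = sorted((p for p in patterns if support(p) >= 2), key=rank, reverse=True)
        return ok_i[:100], ok_p[:5]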
Coupled SEAL (1)
 SEAL is an established wrapper induction
algorithm.
 Creates page-specific extractors
 Independent of language
 Category wrappers defined by prefix and
postfix, relation wrappers defined by infix.
 Wrappers for each predicate are learned independently (a simplified wrapper-application sketch follows).
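For illustration only, a simplified application of a learned category wrapper (this is not SEAL's actual code): because a wrapper is just a prefix/postfix character pair, applying it is plain string matching, which is why the approach is language-independent:

    # Sketch: apply one prefix/postfix category wrapper to one page.
    def apply_wrapper(page_text, prefix, postfix):
        found, i = [], 0
        while True:
            start = page_text.find(prefix, i)
            if start == -1:
                return found
            start += len(prefix)
            end = page_text.find(postfix, start)
            if end == -1:
                return found
            found.append(page_text[start:end])  # candidate instance
            i = end + len(postfix)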
Coupled SEAL (2)
 Coupled SEAL adds mutual exclusion and type-checking constraints to SEAL.
 Bootstraps recently promoted wrappers.
 Filters candidates that violate mutual exclusion or are not of the right type for the relation.
 Uses a single page per domain for ranking.
 Promotes the top 100 instances extracted by at
least two wrappers.
Meta-Bootstrap Learner
 Couples the training of
multiple extraction
techniques.
 Intuition: different
extractors will make
independent errors.
 Replaces the PROMOTE
step of subordinate
extractor algorithms.
 Promotes any instance recommended by all the extractors, as long as mutual exclusion and type checks hold (sketched below).
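A sketch of the shared PROMOTE step (passes_checks is a hypothetical predicate bundling the mutual-exclusion and type checks):

    # Sketch: promote only what every subordinate extractor recommends
    # and the coupling checks allow.
    def meta_promote(candidate_sets, passes_checks):
        agreed = set.intersection(*candidate_sets)  # one set per extractor
        return {x for x in agreed if passes_checks(x)}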
Learning New Constraints
 Data-mine the KB to infer new beliefs.
 Generates probabilistic, first-order Horn clauses.
 Connects previously uncoupled predicates.
 Rules are filtered manually.
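For illustration, a rule of the kind such mining can produce (the predicate names and the probability shown are illustrative, not quoted from NELL's knowledge base):

    0.93  athletePlaysSport(x, s) :- athletePlaysForTeam(x, t), teamPlaysSport(t, s)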
Demo Time
 http://rtw.ml.cmu.edu/rtw/kbbrowser/
Summary
Populating the Semantic Web by using NELL for macro-reading
Populating the Semantic Web
 Many ways to accomplish this.
 Use an initial ontology to focus and constrain the learning task.
 Couple the learning of many, many
extractors.
 Macro Reading: instead of annotating a
single page each time, read many pages
simultaneously.
 A never ending task.
Macro-Reading
 Helps to improve accuracy.
 Still doesn’t help to annotate a single
page, but…
 Many things that are true for a single
page are also true for many pages
 Helps to populate databases with
frequently mentioned knowledge
Future Directions
 Coupling with external sources
 DBpedia, Freebase
 Ontology extension
 New relations through reading, subcategories
 Use a macro-reader to train a micro-reader
 Self-reflection, self-correction
 Distinguishing tokens from entities
 Active learning – crowdsourcing
Questions?
mishagenkin@cs.huji.ac.il
Editor's Notes
  • #2: I'm happy to open the season of graded lectures. After we've heard a brief overview of the field, the technologies, and the research directions, and seen a demonstration of how synthetic data stores can be created for testing semantic systems, I want to take a step back and discuss how we get from the web we know today to realizing the Semantic Web vision.
  • #3: I like to open with a quote. In this case the content of the quote matters less; what matters is its age. The idea of the Semantic Web is not very new (it is almost a decade old).
  • #4: There is still a lot of work ahead of us... We will return to the question that Shuki raised several times: how?
  • #15: We want to find cities: start with a list of cities, find lexical contexts, and repeat.
  • #39: ℐ – promoted instances; count(i,p) – the number of times i and p co-occur in the corpus; count(p) – the number of times p occurs in the corpus
  • #45: New instances per iteration – heat map