SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Reasoning
Tobias Wunner, Unit for Natural Language Processing (UNLP)
firstname.lastname@deri.org
Wednesday, 22nd June 2011, DERI Reading Group
Based on: "SOFIE: A Self-Organizing Framework for Information Extraction"
Authors: Fabian Suchanek, Mauro Sozio, Gerhard Weikum
Published: World Wide Web Conference (WWW), Madrid, 2009
Overview
- Introduction
- SOFIE model and rules
- Excursion: satisfiability
- SOFIE approach
- Evaluation experiments
- Conclusion
Motivation
- Classical IE on text (pattern-based): about 80%
- Semi-structured approach (Wikipedia infoboxes): about 95%
- Idea of the paper: combine text (hypotheses) with an ontology (trusted facts)
Example
Document 1: "Einstein attended secondary school in Germany."
YAGO ontology: familyName(AlbertEinstein, Einstein), bornIn(AlbertEinstein, Germany)
New knowledge: attendedSchoolIn(AlbertEinstein, Germany)
General idea
- Express extraction patterns as facts, e.g. patternOcc("X went to school in Y", Einstein, Switzerland)
- Use rules to understand the usage of terms: patternOcc(Pattern, X, Y) and R(X, Y) ⇒ express(Pattern, R)
- Add restrictions
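A minimal sketch of firing the rule above over a handful of facts. It treats the rule as hard deduction, whereas SOFIE weighs it against other hypotheses; the tuple encoding and the fact attendedSchoolIn(Einstein, Switzerland) are illustrative assumptions, not taken from YAGO.

```python
from typing import Set, Tuple

# Hypothetical encoding: patternOcc facts are (pattern, x, y), KB facts are (relation, x, y).
def infer_express(pattern_occs: Set[Tuple[str, str, str]],
                  kb: Set[Tuple[str, str, str]]) -> Set[Tuple[str, str]]:
    """Fire patternOcc(P, X, Y) and R(X, Y) => express(P, R) as a hard rule."""
    derived = set()
    for pattern, x, y in pattern_occs:
        for relation, kx, ky in kb:
            if (kx, ky) == (x, y):  # the pattern connects a pair the KB already relates
                derived.add((pattern, relation))
    return derived

pattern_occs = {("X went to school in Y", "Einstein", "Switzerland")}
# Illustrative trusted facts (assumed for this sketch)
kb = {("attendedSchoolIn", "Einstein", "Switzerland"), ("bornIn", "Einstein", "Ulm")}
print(infer_express(pattern_occs, kb))  # {('X went to school in Y', 'attendedSchoolIn')}
```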
Contribution
A unified approach to pattern matching, word sense disambiguation and reasoning, at large scale, on unstructured data
Pattern extraction with WICs (words in context)
Extract patterns based on 'interesting' entities.
Document D1: "Einstein was born at Ulm in Württemberg, Germany, on March 14, 1879. When Albert was around four, his father gave him a magnetic compass. When Albert became older, he went to a school in Switzerland. After he graduated, he got a job in the patent office there…"
Knowledge base:
patternOcc("Einstein was born in Ulm", Einstein@D1, Ulm@D1) [1]
patternOcc("Ulm is in Württemberg, Germany", Ulm@D1, Germany@D1) [1]
patternOcc("Albert .. Switzerland", Albert@D1, Switzerland@D1) [1]
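A rough sketch of WIC-based pattern extraction, assuming a naive sentence splitter and a given list of interesting words: every pair of such words in one sentence yields a patternOcc whose pattern is the text between them, and each word becomes a WIC by suffixing the document id. The function name and the "X ... Y" pattern format are hypothetical choices for illustration.

```python
import re
from itertools import combinations

def extract_pattern_occs(doc_id, text, interesting):
    """Emit (pattern, w1@doc, w2@doc) for every pair of interesting words that
    co-occur in one sentence; the pattern is the text between them."""
    occs = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):  # naive sentence splitter
        # (start, end, word) for each interesting word, in textual order
        hits = sorted((m.start(), m.end(), w) for w in interesting
                      for m in re.finditer(rf"\b{re.escape(w)}\b", sentence))
        for (s1, e1, w1), (s2, _, w2) in combinations(hits, 2):
            if s2 < e1:  # overlapping mentions cannot form a pattern
                continue
            pattern = f"X {sentence[e1:s2].strip()} Y"
            occs.append((pattern, f"{w1}@{doc_id}", f"{w2}@{doc_id}"))
    return occs

text = ("Einstein was born at Ulm in Württemberg, Germany, on March 14, 1879. "
        "When Albert became older, he went to a school in Switzerland.")
occs = extract_pattern_occs("D1", text, ["Einstein", "Ulm", "Germany", "Albert", "Switzerland"])
for occ in occs:
    print(occ)
```

Real systems would use a proper tokenizer and limit pattern length; here long spans such as "X was born at Ulm in Württemberg, Y" are kept to keep the sketch short.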
Grounding
Test rules: how? Find an instance that satisfies the formulae.
Rule with variables: bornIn(X, Ulm) ⇒ ¬bornIn(X, Timbuktu); studiedIn(X, Ulm)
Grounded instance: bornIn(Einstein, Ulm) ⇒ ¬bornIn(Einstein, Timbuktu); studiedIn(Einstein, Ulm)
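A small sketch of grounding, assuming that single uppercase letters are variables and that a "world" is just the set of ground atoms taken to be true. It substitutes every constant for X in bornIn(X, Ulm) ⇒ ¬bornIn(X, Timbuktu) and checks each grounded implication; HermannEinstein is a hypothetical second constant.

```python
from itertools import product

# An atom is (negated, predicate, args). Single uppercase letters count as
# variables; this naming convention is an assumption of the sketch.
def is_var(term):
    return len(term) == 1 and term.isupper()

def ground(body, head, constants):
    """Yield (body, head) for every substitution of constants for variables."""
    atoms = body + [head]
    variables = sorted({t for _, _, args in atoms for t in args if is_var(t)})
    for values in product(constants, repeat=len(variables)):
        sub = dict(zip(variables, values))

        def inst(atom):
            neg, pred, args = atom
            return (neg, pred, tuple(sub.get(t, t) for t in args))

        yield [inst(a) for a in body], inst(head)

def holds(grounding, world):
    """A ground implication holds unless the body is true and the head is false."""
    body, head = grounding

    def true(atom):
        neg, pred, args = atom
        return ((pred, args) in world) != neg

    return not all(true(a) for a in body) or true(head)

# bornIn(X, Ulm) => ¬bornIn(X, Timbuktu)
rule_body = [(False, "bornIn", ("X", "Ulm"))]
rule_head = (True, "bornIn", ("X", "Timbuktu"))
groundings = list(ground(rule_body, rule_head, ["Einstein", "HermannEinstein"]))
world = {("bornIn", ("Einstein", "Ulm"))}
print(all(holds(g, world) for g in groundings))  # True
```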
Rules (hypotheses)
- Disambiguation: disambiguatesAs(Albert@D, AlbertEinstein) [?]
- Expresses a new fact: expresses(P, livedIn(Einstein, Switzerland)) [?]
- New facts: CityIn(Ulm, Germany) [?]
New fact rule with disambiguation
"Pattern P expresses relation R when the WICs it connects are disambiguated to entities X and Y that stand in relation R."
patternOcc(P, WX, WY) and disambiguatesAs(WX, X) and disambiguatesAs(WY, Y) and R(X, Y) ⇒ express(P, R)
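The disambiguation-aware rule can be read as a join of three relations. This sketch assumes each WIC has already been mapped to a single entity (a plain dict), which sidesteps the weighing SOFIE performs; the pattern string and the livedIn fact are illustrative.

```python
from collections import defaultdict

def infer_express_disamb(pattern_occs, disamb, kb):
    """patternOcc(P, WX, WY) and disambiguatesAs(WX, X) and
    disambiguatesAs(WY, Y) and R(X, Y) => express(P, R).

    pattern_occs: set of (pattern, wic_x, wic_y)
    disamb: dict WIC -> entity (assumes one chosen meaning per WIC)
    kb: set of (relation, x, y) trusted facts
    """
    relations_between = defaultdict(set)  # (x, y) -> relations holding between them
    for rel, x, y in kb:
        relations_between[(x, y)].add(rel)
    derived = set()
    for pattern, wx, wy in pattern_occs:
        if wx in disamb and wy in disamb:  # both WICs must be disambiguated
            for rel in relations_between.get((disamb[wx], disamb[wy]), ()):
                derived.add((pattern, rel))
    return derived

pattern_occs = {("X went to a school in Y", "Albert@D1", "Switzerland@D1")}
disamb = {"Albert@D1": "AlbertEinstein", "Switzerland@D1": "Switzerland"}
kb = {("livedIn", "AlbertEinstein", "Switzerland")}
print(infer_express_disamb(pattern_occs, disamb, kb))  # {('X went to a school in Y', 'livedIn')}
```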
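Restrictions: disambiguation
The disambiguation prior should influence the choice of disambiguation:
disambPrior(W, X, N) ⇒ disambiguatesAs(W, X)
N can come from any disambiguation function, e.g. the share of the document's words that are related to the candidate entity:
$N = \frac{|\mathrm{words}(D_1) \cap \mathrm{rel}(\mathrm{AlbertEinstein})|}{|\mathrm{words}(D_1)|}$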
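The overlap prior can be computed directly. This sketch assumes words(D) and rel(entity) are sets of distinct lower-cased words; the rel() sets below are invented for illustration, whereas in SOFIE they would be derived from the ontology.

```python
def disamb_prior(doc_words, related_words):
    """|words(D) ∩ rel(entity)| / |words(D)|, over distinct lower-cased words."""
    doc = {w.lower() for w in doc_words}
    if not doc:
        return 0.0
    return len(doc & {w.lower() for w in related_words}) / len(doc)

doc_words = ["Albert", "born", "Ulm", "Germany", "school", "Switzerland", "patent", "office"]
# Hypothetical rel() sets for two candidate entities
rel = {
    "AlbertEinstein": {"ulm", "germany", "switzerland", "patent", "physics"},
    "HermannEinstein": {"ulm", "germany", "munich", "business"},
}
priors = {e: disamb_prior(doc_words, words) for e, words in rel.items()}
print(priors)  # {'AlbertEinstein': 0.5, 'HermannEinstein': 0.25}
```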
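Restrictions: functional relations
R(X, Y) and type(R, function) and different(Y, Z) ⇒ ¬R(X, Z)
Example: "Where was Albert@D1 born?" A functional relation allows only one answer per subject, but Albert@D1 ≠ Albert@D2: the same word in different documents may denote different entities.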
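A sketch that finds the groundings violating the functional restriction, assuming bornIn is declared functional. It groups facts by (relation, subject) and reports subjects with more than one value; because WICs are distinct subjects, Albert@D1 and Albert@D2 do not conflict.

```python
from collections import defaultdict

def functional_violations(facts, functional_relations):
    """Find (R, X) with two different values Y != Z for a functional relation R,
    i.e. groundings that violate R(X, Y) and different(Y, Z) => ¬R(X, Z)."""
    values = defaultdict(set)
    for rel, x, y in facts:
        if rel in functional_relations:
            values[(rel, x)].add(y)
    return {key: ys for key, ys in values.items() if len(ys) > 1}

functional = {"bornIn"}  # assumed functional for this example
# Different WICs (Albert@D1 vs Albert@D2) are different subjects: no conflict.
ok = {("bornIn", "Albert@D1", "Ulm"), ("bornIn", "Albert@D2", "Timbuktu")}
bad = ok | {("bornIn", "Albert@D1", "Timbuktu")}
print(functional_violations(ok, functional))  # {}
print(functional_violations(bad, functional))
```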
SOFIE rules
A framework to test the hypotheses. Question: how can all of them be satisfied?
Rules:
disambPrior(Albert@D1, AlbertEinstein, 10) ⇒ disambiguatesAs(Albert@D1, AlbertEinstein)
disambPrior(Albert@D1, HermannEinstein, 3) ⇒ disambiguatesAs(Albert@D1, HermannEinstein)
patternOcc(P, X, Y) and R(X, Y) ⇒ express(P, R)
Trusted facts: Country(Germany), livedIn(AlbertEinstein, Ulm), …
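SAT / MAX SAT
SAT (satisfiability): decide whether a formula can be made TRUE, i.e. whether some assignment of its variables satisfies it.
Examples: F = (X ∨ Y ∨ Z) ∧ (¬X ∨ Y ∨ Z) ∧ (¬X ∨ ¬Y ∨ ¬Z); G = (X ∨ Y) ∧ (¬X ∨ ¬Y) ∧ (X). The truth table of F has 2^3 rows.
Complexity classes:
- P (good): polynomial time, e.g. N^k
- NP-complete problems such as SAT (bad): no polynomial-time algorithm is known; naive search takes c^N, e.g. for 100 variables 2^100 rows × 10^-10 s per row ≈ 4 × 10^12 years
- Exponential is not always 2^N: Schöning's randomized algorithm solves 3-SAT in about (4/3)^N steps
SAT solvers. MAX SAT: find an assignment that satisfies as many clauses as possible.
Details: Schöning 2010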
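A brute-force sketch of the truth-table approach for the example formulas F and G: it enumerates all 2^n rows, keeps the row satisfying the most clauses (MAX SAT) and treats SAT as the case where that maximum equals the number of clauses. The last lines redo the 100-variable cost estimate, assuming 10^-10 seconds per row.

```python
from itertools import product

# A clause is a list of literals; a literal is (variable, polarity).
F = [[("X", True), ("Y", True), ("Z", True)],
     [("X", False), ("Y", True), ("Z", True)],
     [("X", False), ("Y", False), ("Z", False)]]
G = [[("X", True), ("Y", True)],
     [("X", False), ("Y", False)],
     [("X", True)]]

def variables(cnf):
    return sorted({v for clause in cnf for v, _ in clause})

def count_satisfied(cnf, assignment):
    return sum(any(assignment[v] == pol for v, pol in clause) for clause in cnf)

def max_sat(cnf):
    """Walk the full truth table (2^n rows); keep the row satisfying most clauses."""
    vs = variables(cnf)
    best_score, best_row = -1, None
    for row in product([False, True], repeat=len(vs)):
        assignment = dict(zip(vs, row))
        score = count_satisfied(cnf, assignment)
        if score > best_score:
            best_score, best_row = score, assignment
    return best_score, best_row

def is_sat(cnf):
    """SAT is the special case where every clause can be satisfied at once."""
    return max_sat(cnf)[0] == len(cnf)

print(is_sat(F), is_sat(G))  # True True
print(max_sat(G))            # (3, {'X': True, 'Y': False})

# Cost of the naive truth-table walk for 100 variables at 1e-10 s per row
years = 2 ** 100 * 1e-10 / (365.25 * 24 * 3600)
print(f"{years:.1e} years")  # 4.0e+12 years
```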
Weighted MAX SAT in SOFIE
Back to SOFIE: rules plus trusted facts form a MAX SAT problem, but with weights, i.e. the goal is an assignment that maximizes the total weight of the satisfied clauses.
Trusted facts: Country(Germany), livedIn(AlbertEinstein, Ulm), …
Weighted rules:
disambPrior(Albert@D1, AlbertEinstein, 10) ⇒ disambiguatesAs(Albert@D1, AlbertEinstein)
disambPrior(Albert@D1, HermannEinstein, 3) ⇒ disambiguatesAs(Albert@D1, HermannEinstein)
patternOcc(P, X, Y) and R(X, Y) ⇒ express(P, R)
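An exact weighted MAX SAT sketch on the two disambiguation rules. Because disambPrior facts are trusted, each grounded rule reduces to a unit clause weighted by its prior. The clause ¬a ∨ ¬h with weight 100 ("a word has only one meaning") is an assumption added for the example; enumeration is only viable for tiny instances like this.

```python
from itertools import product

# Weighted clauses: (list of (variable, polarity), weight).
# Hypothetical shorthand for the ground atoms:
#   a = disambiguatesAs(Albert@D1, AlbertEinstein)
#   h = disambiguatesAs(Albert@D1, HermannEinstein)
clauses = [
    ([("a", True)], 10),                  # disambPrior(..., AlbertEinstein, 10) ⇒ a
    ([("h", True)], 3),                   # disambPrior(..., HermannEinstein, 3) ⇒ h
    ([("a", False), ("h", False)], 100),  # assumed: a word has only one meaning
]

def satisfied_weight(clauses, assignment):
    return sum(w for lits, w in clauses if any(assignment[v] == pol for v, pol in lits))

def weighted_max_sat(clauses):
    """Exact weighted MAX SAT by enumerating every assignment."""
    vs = sorted({v for lits, _ in clauses for v, _ in lits})
    best = (-1, None)
    for row in product([False, True], repeat=len(vs)):
        assignment = dict(zip(vs, row))
        w = satisfied_weight(clauses, assignment)
        if w > best[0]:
            best = (w, assignment)
    return best

print(weighted_max_sat(clauses))  # (110, {'a': True, 'h': False})
```

The higher prior wins: choosing AlbertEinstein loses only the weight-3 clause.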
Weighted MAX SAT in SOFIE
- Weighted MAX SAT is NP-hard, so finding the optimal solution is impractical; only approximation algorithms are feasible
- Approximate solvers, e.g. Johnson's algorithm: approximation guarantee of 2/3
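A sketch of Johnson's greedy algorithm for weighted MAX SAT, written in its usual formulation: each clause starts at w / 2^|c|, the weight a uniformly random assignment is expected to miss, and each variable is set to the side carrying more of that weight. The clause encoding and the unit weights on G are assumptions of the example.

```python
def johnson(clauses, variables):
    """Johnson's greedy algorithm for weighted MAX SAT.

    Satisfied clauses are dropped; clauses that lose a literal have their
    working weight doubled, since they now have one literal fewer.
    """
    work = [(list(lits), w / 2 ** len(lits)) for lits, w in clauses]
    assignment = {}
    for x in variables:
        weight_true = sum(w for lits, w in work if (x, True) in lits)
        weight_false = sum(w for lits, w in work if (x, False) in lits)
        value = weight_true >= weight_false
        assignment[x] = value
        remaining = []
        for lits, w in work:
            if (x, value) in lits:
                continue  # clause is satisfied, drop it
            if (x, not value) in lits:
                lits = [lit for lit in lits if lit[0] != x]
                w *= 2  # one literal fewer left to satisfy this clause
            remaining.append((lits, w))
        work = remaining
    return assignment

def satisfied_weight(clauses, assignment):
    return sum(w for lits, w in clauses if any(assignment[v] == pol for v, pol in lits))

# G = (X ∨ Y) ∧ (¬X ∨ ¬Y) ∧ (X), every clause with weight 1
G = [([("X", True), ("Y", True)], 1),
     ([("X", False), ("Y", False)], 1),
     ([("X", True)], 1)]
result = johnson(G, ["X", "Y"])
print(result, satisfied_weight(G, result))  # {'X': True, 'Y': False} 3
```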
Weighted MAX SAT in SOFIE: Functional MAX SAT
- Specialized reasoning with support for functional properties
- Approximation guarantee of 1/2
- Propagates dominating unit clauses (considers only unit clauses)
Weighted clauses: A ∨ B [w1], A ∨ B [w2], B ∨ C [w3], C [w4]
Example: ¬A ∨ B [10], ¬A [10], A [30] ⇒ A = true, since 30 > 10 + 10 (the unit clause A outweighs all clauses containing ¬A)
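A sketch of only the dominating-unit-clause step, not SOFIE's full Functional MAX SAT algorithm (which also handles functional properties and non-unit clauses). The example clauses assume the negations in the slide's example read ¬A ∨ B [10], ¬A [10], A [30].

```python
def propagate_dominating_units(clauses):
    """Fix literal l whenever a unit clause [l] weighs more than all clauses
    containing ¬l together; simplify and repeat until nothing changes.

    clauses: list of (list of (variable, polarity), weight)
    Returns the fixed assignment and the simplified remaining clauses.
    """
    work = [(frozenset(lits), w) for lits, w in clauses]
    fixed = {}
    changed = True
    while changed:
        changed = False
        for lits, w in work:
            if len(lits) != 1:
                continue  # only unit clauses are considered
            ((var, pol),) = lits
            against = sum(cw for cl, cw in work if (var, not pol) in cl)
            if w > against:
                fixed[var] = pol
                # drop satisfied clauses, strike the falsified literal from the rest
                work = [(cl - {(var, not pol)}, cw) for cl, cw in work if (var, pol) not in cl]
                changed = True
                break
    return fixed, work

example = [([("A", False), ("B", True)], 10),
           ([("A", False)], 10),
           ([("A", True)], 30)]
fixed, rest = propagate_dominating_units(example)
print(fixed)  # {'A': True, 'B': True}
```

Once A is fixed, ¬A ∨ B shrinks to the unit clause B, which nothing opposes, so B is fixed too; the falsified ¬A [10] remains as an empty clause whose weight is lost.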
Controlled experiment
Corpus: 100 Wikipedia articles with infoboxes; the semantics (ground truth) is known.
Controlled experiment, large scale
Corpus: 2000 Wikipedia articles, 13 frequent relations from YAGO
Parsing: 87 min; reasoning: 77 min
Unstructured text sources
- 150 newspaper articles
- Relation under test: headquarterOf (a functional relation)
- YAGO modified with relation seeds
- Parsing: 87 min; weighted MAX SAT: 77 min
- The disambiguated entries (with provenance) could be assessed manually
Unstructured text sources, large scale
- 10 biographies for each of 400 US senators
- 5 relations
- Disambiguation was not ideal with YAGO (e.g. 13 different James Watsons)
- Parsing: 7 h; weighted MAX SAT: 9 h
- Results: 4 relations good, 1 bad (misleading patterns)
MAX SAT cannot handle OWL per se (Open World Assumption)
- Reformulate OWL in propositional logic: OWL → FOL → Skolem normal form → propositional logic
- Might find ontologies that are OWL-inconsistent, due to the Open World Assumption
Example: define a student as a subclass of "attends some course":
⇒ ∀x ∃y: attends(x, y) ∧ Course(y) → Student(x)
⇒ ∀x: attends(x, k) ∧ Course(k) → Student(x), with Skolem constant k
⇒ ¬attends(x_i, k_i) ∨ ¬Course(k_i) ∨ Student(x_i), for individuals x_1 … x_n
Inferred ontology: { Student(alex), Student(bob), Student subClassOf attends some Course, attends(alex, SemanticWeb) }
Details: JMC 2010
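The last step of the derivation, grounding over a finite set of individuals, can be sketched as follows. It assumes one fresh Skolem constant k_i per individual x_i, as in the clause schema above; the literal encoding is a hypothetical choice for illustration.

```python
def ground_student_axiom(individuals):
    """Ground ∀x: attends(x, k) ∧ Course(k) → Student(x) into propositional clauses
    ¬attends(x_i, k_i) ∨ ¬Course(k_i) ∨ Student(x_i), one Skolem constant k_i per individual.
    Literals are (predicate, args, polarity)."""
    clauses = []
    for i, x in enumerate(individuals, start=1):
        k = f"k{i}"  # fresh Skolem constant for this individual
        clauses.append([("attends", (x, k), False),
                        ("Course", (k,), False),
                        ("Student", (x,), True)])
    return clauses

clauses = ground_student_axiom(["alex", "bob"])
for clause in clauses:
    print(" ∨ ".join(("" if pol else "¬") + f"{p}({', '.join(args)})" for p, args, pol in clause))
# ¬attends(alex, k1) ∨ ¬Course(k1) ∨ Student(alex)
# ¬attends(bob, k2) ∨ ¬Course(k2) ∨ Student(bob)
```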
Conclusions
- Ontology-based IE (OBIE) reformulated as a weighted MAX SAT problem
- Approximation algorithm with a guarantee of 1/2
- Works and scales (large corpus + YAGO)
Limitations
- Specialized approximation algorithm: accounts for SOFIE rules, NOT OWL
- MAX SAT restriction: propositional logic, not first-order logic
- Ontology population approach (cannot infer new relations)
References
F. Suchanek, M. Sozio, G. Weikum: SOFIE: A Self-Organizing Framework for Information Extraction. Proceedings of the 18th International Conference on World Wide Web (WWW '09), Madrid, 2009.
J. McCrae: Automatic Extraction of Logically Consistent Ontologies from Text. PhD thesis, National Institute of Informatics, Japan, 2009.
U. Schöning: Das SAT-Problem. Informatik Spektrum 33(5): 479-483, 2010.
F. Suchanek: Automated Construction and Growth of a Large Ontology. PhD thesis, Saarland University, Saarbrücken, Germany, 2009.
