Towards Efficient and Effective Semantic Table Interpretation 
Ziqi Zhang 
Department of Computer Science, University of Sheffield
Outline 
•Define semantic table interpretation 
•State-of-the-art and motivation 
•The method – TableMiner 
•Evaluation 
Z. Zhang / Towards Efficient and Effective Semantic Table Interpretation
Semantic Table Interpretation 
•Input 
• Ontology 
• Relational table 
•Goals/Tasks 
• Label columns by concepts 
• Link cells to named entities 
• Connect columns by relations 
[Figure: an ontology fragment with the concepts Thing, Work, Artist, Location, Film, Actor/Actress and Country and entities such as Ent:USA and Ent:UK, aligned with the example table below; relation labels: Rel:performIn]

Table of Best Actor/Actress
     Name             Film          Country
1    Tom Hanks        Philadelphia  USA
2    Jamie Foxx       Ray           USA
3    Kate Winslet     The Reader    UK
…    …                …             …
99   Charlize Theron  Monster       South Africa
Semantic Table Interpretation 
•Goals/Tasks and their usual names 
• Label columns by concepts: column classification / header disambiguation 
• Link cells to named entities: cell disambiguation 
• Connect columns by relations: relation interpretation
Motivation and State-of-the-art 
•154 million relational tables on the Web and growing [Cafarella2008] 
•Classic Information Extraction methods do not work on tables [Limaye2010, Lu2013] 
• They cannot model the complex interdependence among table components 
Motivation and State-of-the-art 
•SoA semantic table interpretation methods, e.g. [Limaye2010, Venetis2011, Mulwad2013] 
•Limitation 1: inference is ‘exhaustive’, which is unnecessary 
• Goal: assign a concept to a column of the Table of Best Actor/Actress 
• Hint: content in the column gives useful clues 
• How much content do we need for inference (99 rows in this example)? 
- Human: SOME (learn by example) 
- SoA: ALL
Motivation and State-of-the-art 
•SoA semantic table interpretation methods, e.g. [Limaye2010, Venetis2011, Mulwad2013] 
•Limitation 2: contextual features for inference 
• SoA: features come only from within the table 
• Context outside the table also gives hints for interpretation. E.g., the words in the paragraph around the Table of Best Actor/Actress are often found in descriptions of actors
TableMiner 
•Two tasks: 
• Column classification 
• Cell disambiguation 
•Non-exhaustive inference in a bootstrapping pattern 
• Phase 1: inference with partial content 
• Phase 2: propagation and update 
•Contextual features both inside and outside tables 
TableMiner – Phase 1 I-Inf 
•Incremental inference with stopping (I-Inf) 
• Tj: a column; Cj: candidate concepts for the column; Ei,j: candidate entities for a cell 
•Each iteration (Itr.1, Itr.2, Itr.3, … until stop) 
• Scored candidate entities for a cell: Ei,j = {<e1,s1>, <e2,s2>, …} 
• Scored concepts of those entities: concepts = {<c1,s1>, <c2,s2>, …} 
• Updated candidate concepts for the column: Cj = {<c1,s1'>, <c2,s2'>, …} 
• Stopping test: |H(Cj) - H(prevCj)| < t? Yes: stop. No: next iteration 
•Example 
• Itr.1: concepts = {<c1,s1>, <c2,s2>, …}; Cj = {<c1,s1'>, <c2,s2'>} 
• Itr.2: concepts = {<c1,s1>, <c3,s3>, …}; Cj = {<c1,s1'>, <c2,s2'>, <c3,s3'>} 
• Itr.3: concepts = {<c11,s11>}; Cj = {<c1,s1'>, <c2,s2'>, <c3,s3'>, …, <c11,s11'>}
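The stopping test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TableMiner implementation: the slides do not define H, so it is taken here to be the Shannon entropy of the normalised concept scores; scores are accumulated by simple summation; `concepts_for_cell` is a hypothetical callback that returns the scored concepts for one cell; and the threshold value is arbitrary.

```python
import math


def entropy(scores):
    """Shannon entropy (base 2) of a dict of non-negative scores, normalised to sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        return 0.0
    return -sum((s / total) * math.log2(s / total) for s in scores.values() if s > 0)


def incremental_inference(cells, concepts_for_cell, threshold=0.01):
    """Process cells one at a time and stop once the entropy of Cj stabilises.

    concepts_for_cell is a hypothetical callback: cell -> {concept: score}.
    Returns (Cj, number of cells actually processed).
    """
    cj = {}          # candidate concepts for the column, with accumulated scores
    prev_h = None    # H(prevCj); undefined before the first iteration
    used = 0
    for cell in cells:
        used += 1
        # Assumption: new evidence is merged into Cj by summing scores.
        for concept, score in concepts_for_cell(cell).items():
            cj[concept] = cj.get(concept, 0.0) + score
        h = entropy(cj)
        # |H(Cj) - H(prevCj)| < t ? Yes: stop. No: next iteration.
        if prev_h is not None and abs(h - prev_h) < threshold:
            break
        prev_h = h
    return cj, used
```

When every cell contributes the same concept distribution, the entropy stops changing after the second cell, so the remaining rows are never looked at. That is the saving the slide contrasts with exhaustive inference.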
TableMiner – Phase 1 I-Inf 
•To compute scores of candidate named entities (e.g. <e1,s1>) and concepts (e.g. <c1,s1'>) 
•Candidate NE 
• Build a feature vector of the candidate using the ontology 
• Build a feature vector of the cell/column header using its context 
• Compute vector similarity 
•Candidate concept: same principle, but the score also depends on the scores of contributing NEs
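As a rough illustration of the three steps for a candidate NE, the sketch below uses bag-of-words vectors and cosine similarity. Both choices are assumptions: the slides only say "feature vector" and "vector similarity". The candidate descriptions stand in for text taken from the ontology and are made-up toy strings, and all function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Very simple feature vector: lower-cased token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_candidates(cell_context, candidate_descriptions):
    """Rank candidate entities for one cell by similarity to the cell's context.

    candidate_descriptions: {entity name: description text from the ontology}.
    Returns a list of (entity, score) pairs, highest score first.
    """
    context_vec = bag_of_words(cell_context)
    scored = [
        (name, cosine(bag_of_words(description), context_vec))
        for name, description in candidate_descriptions.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = score_candidates(
    "best actor film award",
    {"Ray (film)": "film about musician", "Ray (fish)": "cartilaginous fish"},
)
```

Here the film sense of "Ray" ranks first because its description shares a word with the context, while the fish sense scores zero.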
TableMiner – Phase 2 Propagate, Update 
•When I-Inf stops 
• Rank the candidates in Cj = {<c1,s1'>, <c2,s2'>, <c3,s3'>, …, <c11,s11'>} and select the highest scoring concept c+ to label the column 
• Propagate: use c+ as a constraint to disambiguate the remaining cells; candidate NEs not belonging to c+ are discarded 
• Update: 
•Re-compute c+ after all cells are disambiguated 
•If the new c+ is different, revise disambiguation across the entire column with it as the new constraint 
•Repeat until no change 
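The propagate and update loop might look roughly like the following sketch. Several details are assumptions not stated on the slide: a cell with no candidate belonging to c+ falls back to its top-scoring candidate; c+ is re-computed by summing the scores of the chosen entities per concept; ties keep the current c+; and the loop is capped at `max_rounds`. Each candidate is a hypothetical `(entity, score, concepts)` tuple.

```python
from collections import defaultdict


def disambiguate(candidates, constraint):
    """Propagate: per cell, keep the best candidate that belongs to the constraint concept."""
    chosen = []
    for cell_candidates in candidates:
        if not cell_candidates:
            chosen.append(None)  # nothing to choose from for this cell
            continue
        allowed = [c for c in cell_candidates if constraint in c[2]]
        # Assumption: if no candidate belongs to c+, fall back to all candidates.
        pool = allowed or cell_candidates
        chosen.append(max(pool, key=lambda c: c[1]))
    return chosen


def best_concept(chosen, current):
    """Update: re-compute c+ from the entities selected for every cell."""
    totals = defaultdict(float)
    for candidate in chosen:
        if candidate is None:
            continue
        _, score, concepts = candidate
        for concept in concepts:
            totals[concept] += score
    if not totals:
        return current
    top = max(totals.values())
    if totals.get(current) == top:
        return current  # keep c+ on ties so the loop cannot oscillate
    return min(c for c, s in totals.items() if s == top)


def propagate_and_update(candidates, initial_concept, max_rounds=10):
    """Repeat propagate and update until c+ no longer changes."""
    concept = initial_concept
    chosen = []
    for _ in range(max_rounds):
        chosen = disambiguate(candidates, concept)
        new_concept = best_concept(chosen, concept)
        if new_concept == concept:
            break
        concept = new_concept
    return concept, [c[0] if c else None for c in chosen]


column = [
    [("Philadelphia (city)", 0.9, {"Location"}), ("Philadelphia (film)", 0.6, {"Film"})],
    [("Ray (film)", 0.8, {"Film"})],
    [("Monster (film)", 0.7, {"Film"}), ("Monster (band)", 0.3, {"Artist"})],
]
label, entities = propagate_and_update(column, "Location")
```

In this toy column the initial label "Location" is overturned once all cells are considered, and the first cell is then re-disambiguated to the film rather than the city.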
Evaluation 
TableMiner – Evaluation 
•Data 
• Freebase as reference ontology/background knowledge 
• Limaye112 – 112 Web tables from Limaye2010 originally annotated with Wikipedia 
•Cells are automatically mapped to Freebase – some are unmapped 
•Columns are manually annotated 
• IMDB – 7,354 “cast” tables of films mapped to Freebase 
TableMiner – Evaluation 
•Baselines (both use exhaustive inference) 
• Bfirst 
- cell disambiguation: choose the top ranked NE candidate in the Freebase search result 
- column classification: each disambiguated cell casts a vote for the set of concepts its NE belongs to, and the majority wins 
• Bsim 
- cell disambiguation: string similarity + feature vector similarity (in-table context only) 
- column classification: the majority vote method as above + string similarity 
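The majority-vote column classification used by the baselines can be illustrated with a short sketch. The function name and the alphabetical tie-break are assumptions; the slides do not say how ties are resolved.

```python
from collections import Counter


def majority_vote(cell_concepts):
    """Each disambiguated cell votes once for every concept its entity belongs to.

    cell_concepts: one collection of concept names per disambiguated cell.
    Returns the concept with the most votes, or None if there are no votes.
    """
    votes = Counter()
    for concepts in cell_concepts:
        votes.update(set(concepts))  # one vote per concept per cell
    if not votes:
        return None
    top = max(votes.values())
    # Assumption: ties are broken alphabetically to keep the result deterministic.
    return min(c for c, n in votes.items() if n == top)


winner = majority_vote([
    {"Film Actor", "Person"},
    {"Film Actor"},
    {"Politician", "Person"},
    {"Film Actor"},
])
```

Because every cell in the column has to be disambiguated before the vote is counted, this style of classification is exhaustive by construction.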
TableMiner – Evaluation Results 
•Cell disambiguation 
[Results tables not preserved in this transcript] 
• Manual validation of 932 cell annotations in Limaye112 not covered by the above results (i.e., unmapped cells) 
• Results are also reported considering only those cells where at least one system predicts correctly
TableMiner – Evaluation Results 
•Column classification 
• best only: a column is labelled correctly only if the concept is suitable for the data in the column and is specific enough 
• best or ok: a column is labelled correctly if the concept is suitable for the data in the column, even if it is not very specific 
• E.g., ‘Film Actors’ may be the best, while ‘Artist’ or ‘Person’ is OK, but ‘Engineer’ is incorrect 
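The two scoring modes can be made concrete with a small sketch. The metric here is the fraction of labelled columns judged correct, which is an assumption, since the slides do not state which measure was reported. The gold-standard layout and the function name are hypothetical.

```python
def column_accuracy(predictions, gold, accept_ok=False):
    """Fraction of predicted column labels judged correct.

    predictions: {column: predicted concept}
    gold: {column: {"best": set of concepts, "ok": set of concepts}}
    accept_ok=False gives 'best only'; accept_ok=True gives 'best or ok'.
    """
    if not predictions:
        return 0.0
    correct = 0
    for column, concept in predictions.items():
        labels = gold[column]
        if concept in labels["best"] or (accept_ok and concept in labels["ok"]):
            correct += 1
    return correct / len(predictions)


gold = {
    "Name": {"best": {"Film Actors"}, "ok": {"Artist", "Person"}},
    "Country": {"best": {"Country"}, "ok": {"Location"}},
}
predictions = {"Name": "Person", "Country": "Country"}
best_only = column_accuracy(predictions, gold)
best_or_ok = column_accuracy(predictions, gold, accept_ok=True)
```

A prediction of ‘Person’ for the Name column counts only under "best or ok", while ‘Engineer’ would be wrong under both.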
TableMiner – Evaluation Results 
•Efficiency – TableMiner is efficient because 
•Column classification: processes only part of the content of a column (avg. 57% on Limaye112, 43% on IMDB) 
•Cell disambiguation: constrained by column classification, resulting in a smaller NE candidate space (avg. 32% reduction on Limaye112, 24% on IMDB) 
• Fewer candidates => less time spent on retrieval and feature space creation (typically >90% of CPU time in the pipeline [Limaye2010]) 
TableMiner – Conclusion 
•TableMiner take-home messages 
•Message 1: how can it be more effective? 
• Use context both within and outside tables as features for inference 
•Message 2: how can it be more efficient? 
• Perform inference with partial data and follow the bootstrapping pattern of learning 
References 
•[Cafarella2008] Cafarella, M.J., Halevy, A., Wang, D.Z., Wu, E., Zhang, Y. 2008: Webtables: exploring the power of tables on the web. Proceedings of the VLDB Endowment 1(1), 538–549 
•[Limaye2010] Limaye, G., Sarawagi, S., Chakrabarti, S. 2010: Annotating and searching web tables using entities, types and relationships. Proceedings of the VLDB Endowment 3(1-2), 1338–134 
•[Lu2013] Lu, C., Bing, L., Lam, W., Chan, K., Gu, Y. 2013: Web entity detection for semi-structured text data records with unlabeled data. International Journal of Computational Linguistics and Applications 
•[Mulwad2013] Mulwad, V., Finin, T., Joshi, A. 2013: Semantic message passing for generating linked data from tables. In: International Semantic Web Conference (1). pp. 363–378. Lecture Notes in Computer Science, Springer 
•[Venetis2011] Venetis, P., Halevy, A., Madhavan, J., Paşca, M., Shen, W., Wu, F., Miao, G., Wu, C. 2011: Recovering semantics of tables on the web. Proceedings of the VLDB Endowment 4(9), 528–538 
Thank you 