Learning with Partial Data for Semantic Table Interpretation 
Ziqi Zhang 
Department of Computer Science, University of Sheffield
Semantic Table Interpretation 
•Input 
• Ontology 
• Relational table 
•Goals/Tasks 
• Column – classes/concepts 
• Cell – named entities 
• Column, Column – relation 
Ontology (excerpt): Thing; Company; Work; Time Period; … …
Column classes: VideoGameCompany, Video Game, Year
Cell entities: Ent:2kGames, Ent:THQ, …
Column relations: Rel:publishedBy

Table of video games (PC)
     Name             Publisher   Year
1    Gears of War     Microsoft   2006
2    Civilization IV  2k Games    2006
3    Titan Quest      THQ         2006
< … … >
99   Civilization V   2k Games    2010
Z. Zhang / Learning with Partial Data for Semantic Table Interpretation
Motivation 
•State-of-the-art (SoA) semantic table interpretation methods, e.g. [Limaye2010, Venetis2011, Mulwad2013]
Limitation
SoA algorithms are ‘exhaustive’ (they process every row), which is unnecessary
Goal: Assign a concept to this column 
Hint: Content in the column gives useful clues 
How much do we need for inference (99 rows in this example)? 
- Human: SOME (learn by example) 
- SoA: ALL 
Research Questions 
•Can machines ‘learn by example’ 
• inference using only partial data (sample) 
• achieving good accuracy 
•How to choose a sample 
• does it matter (e.g., in terms of accuracy) 
• how to optimize 
Zhang, Z. (2014). Towards efficient and effective semantic table interpretation. In Proceedings of the 13th International Semantic Web Conference, 487-502 
TableMiner 
(contribution of this work) 
Sample Selection
Method 
TableMiner (modified) 
•Incremental inference (I-Inf) to address two tasks 
• Column classification 
•Using some data in the column 
• Cell disambiguation 
•Using column label to constrain disambiguation 
•Incremental inference (I-Inf): notation
• Tj – a column
• Cj – candidate concepts for the column
• Ei,j – candidate entities for a cell
•Iterations 1, 2, 3, … … continue until Cj changes little (convergence)
•Cj = {<c1,s1'>, <c2,s2'>, <c3,s3'>, …, <c11,s11'>}
•Column label (class) used as constraint in selecting candidate entities for disambiguation
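The incremental loop and the label constraint can be sketched as follows. This is a minimal illustration, not TableMiner's actual implementation: the scoring (one vote per candidate concept), the convergence test (total change in the normalised score distribution below a threshold), the function names, and the toy lookup tables are all assumptions made for the example.

```python
from collections import Counter

def normalise(scores):
    """Turn raw concept scores into a distribution that sums to 1."""
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else {}

def classify_column(cells, concepts_of, threshold=0.05, min_cells=2):
    """Process cells in order; stop once the concept distribution is stable."""
    scores, previous, used = Counter(), {}, 0
    for cell in cells:
        used += 1
        for concept in concepts_of(cell):
            scores[concept] += 1.0  # assumed scoring: one vote per candidate
        current = normalise(scores)
        # Assumed convergence measure: summed absolute change per concept.
        change = sum(abs(current.get(c, 0.0) - previous.get(c, 0.0))
                     for c in set(current) | set(previous))
        previous = current
        if used >= min_cells and change < threshold:
            break
    return scores.most_common(), used

def constrain(candidates, types_of, label):
    """Keep only candidate entities whose types include the column label."""
    return [e for e in candidates if label in types_of(e)]

# Hypothetical toy knowledge base, for illustration only.
TOY_CONCEPTS = {
    "Gears of War": ["VideoGame"],
    "Civilization IV": ["VideoGame"],
    "Titan Quest": ["VideoGame", "Film"],
    "Civilization V": ["VideoGame"],
}
TOY_TYPES = {"Ent:TitanQuestGame": ["VideoGame"], "Ent:TitanQuestFilm": ["Film"]}

ranked, used = classify_column(list(TOY_CONCEPTS),
                               lambda c: TOY_CONCEPTS.get(c, []))
label = ranked[0][0]
kept = constrain(["Ent:TitanQuestGame", "Ent:TitanQuestFilm"],
                 lambda e: TOY_TYPES.get(e, []), label)
```

In this toy run the distribution stops changing after the second cell, so only part of the column is read before the label is fixed and used to filter candidate entities.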
Sample Selection – the Principle 
•‘Order matters’ 
• TableMiner processes data in order until convergence 
• Changing the order means 
•(Possibly) Different convergence speed 
•Different data are processed 
•Change the order of cells in a column (and corresponding row) such that 
• cells that are ‘easier’ to disambiguate come to the top 
•because the class for a column depends on cells in the column 
Sample Selection – ‘name length’ hypothesis
•Longer names are easier to disambiguate than shorter names
• e.g., “Manchester” vs. “Manchester United F.C.”
•Method name length (nl): 
•nl(Ti,j) = # of tokens in cell Ti,j 
•Re-order table rows by sorting on column Tj using nl(Ti,j) 
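A minimal sketch of the nl re-ordering, assuming whitespace tokenisation and a descending sort so that longer names come to the top. The function names and the example rows are hypothetical.

```python
def name_length(cell_text):
    """nl: number of tokens in a cell (whitespace tokeniser is an assumption)."""
    return len(cell_text.split())

def reorder_by_name_length(rows, col):
    """Sort rows so the longest names in column `col` come first.

    sorted() is stable, so rows with equal nl keep their original order.
    """
    return sorted(rows, key=lambda row: name_length(row[col]), reverse=True)

# Hypothetical rows, for illustration only.
rows = [
    ("Manchester", "England"),
    ("Manchester United F.C.", "England"),
    ("Leeds United", "England"),
]
ordered = reorder_by_name_length(rows, col=0)
```

Whole rows are moved rather than single cells, so each cell keeps its row context after re-ordering.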
Sample Selection – ‘feature density’ hypothesis
•Names that have a richer feature representation are easier to disambiguate
• Bag-of-words (B.O.W.) representation using row context
• ‘one-sense-per-discourse’ (ospd) (in non-subject columns)
•Method ‘duplicate content cell’ (dup)
• re-arrange the target column and table following one-sense-per-discourse (ospd)
• dup(Ti,j) = # of times the text of Ti,j is duplicated in column Tj
• Re-order table rows by sorting on column Tj using dup(Ti,j)
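A minimal sketch of the dup re-ordering for the ‘duplicate content cell’ method. It assumes that dup is the number of occurrences of the cell text in the column and that rows are sorted in descending order; counting occurrences instead of extra copies gives the same ordering. The function name is hypothetical.

```python
from collections import Counter

def reorder_by_duplicates(rows, col):
    """Sort rows so cells whose text repeats most often in `col` come first."""
    counts = Counter(row[col] for row in rows)  # occurrences of each cell text
    # Stable sort: rows with equal counts keep their original order.
    return sorted(rows, key=lambda row: counts[row[col]], reverse=True)

rows = [
    ("Gears of War", "Microsoft", "2006"),
    ("Civilization IV", "2k Games", "2006"),
    ("Titan Quest", "THQ", "2006"),
    ("Civilization V", "2k Games", "2010"),
]
ordered = reorder_by_duplicates(rows, col=1)
```

On the Publisher column of the example table, the two “2k Games” rows move to the top because that text occurs twice.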
•Method ‘feature representation size’ (rep)
• re-arrange the target column and table following one-sense-per-discourse (ospd)
• rep(Ti,j) = # of tokens in the B.O.W. representation of Ti,j
• Re-order table rows by sorting on column Tj using rep(Ti,j)
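A minimal sketch of the rep re-ordering for the ‘feature representation size’ method. Several details are assumptions made for the example: the bag of words is built from the other cells in the same row, rows whose target cell has identical text share one merged bag (one reading of one-sense-per-discourse), tokens are lower-cased and split on whitespace, and distinct tokens are counted. The function names are hypothetical.

```python
from collections import defaultdict

def bow_sizes(rows, col):
    """Map each distinct cell text in `col` to the size of its bag of words."""
    bows = defaultdict(set)
    for row in rows:
        bag = bows[row[col]]  # identical cell texts share one bag (assumption)
        for j, text in enumerate(row):
            if j != col:
                bag.update(text.lower().split())
    return {name: len(tokens) for name, tokens in bows.items()}

def reorder_by_representation(rows, col):
    """Sort rows so cells with the largest bag of words come first."""
    sizes = bow_sizes(rows, col)
    return sorted(rows, key=lambda row: sizes[row[col]], reverse=True)

rows = [
    ("Gears of War", "Microsoft", "2006"),
    ("Civilization IV", "2k Games", "2006"),
    ("Titan Quest", "THQ", "2006"),
    ("Civilization V", "2k Games", "2010"),
]
sizes = bow_sizes(rows, col=1)
ordered = reorder_by_representation(rows, col=1)
```

Under these assumptions a repeated cell accumulates context from all of its rows, so rep tends to rank duplicated cells highly, in the same direction as dup.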
Evaluation 
Data
• Freebase as reference ontology/background knowledge 
• Limaye200 – 200 Web tables from Limaye2010 originally annotated with Wikipedia 
•Column classes are manually annotated 
• LimayeAll – 6310 Web tables from Limaye2010 
•Names in content cells are automatically mapped to Freebase 
Settings
•Baseline
• TMbs – modified TableMiner that uses all cells in a column for column classification (everything else unchanged)
•Comparison*
• TMmod nl – TableMiner using the name length sample selection method
• TMmod dup – TableMiner using the duplicate content cell sample selection method
• TMmod rep – TableMiner using the feature representation size sample selection method
* The original TableMiner is modified. For details and other settings, see the paper.
Results
•Results in F1
                             TMbs    TMmod nl   TMmod dup   TMmod rep
Classification (Limaye200)   72.1    72.3       72.0        72.1
Disambiguation (LimayeAll)   80.9    81.3       81.22       81.24
•Convergence speed in column classification
                             TMbs    TMmod nl   TMmod dup   TMmod rep
Limaye200                    100%    36.3%      36.1%       35.3%
•Reduced candidate named entities for disambiguation
                             TMbs    TMmod nl   TMmod dup   TMmod rep
Limaye200                    0       32.4%      48.1%       46.8%
Comparable or better accuracy
But uses only partial data for column classification
… and processes much less data for disambiguation
Conclusion 
•Learning with partial data for semantic table interpretation can be both effective and efficient 
•The choice of sample selection methods makes limited difference in terms of accuracy and efficiency 
Thank you 
@ziqizhang_zz http://staffwww.dcs.shef.ac.uk/people/Z.Zhang
