Handling Dynamicity and Temporality of
Web Data
Hady Elsahar
hadyelsahar@gmail.com
Jean Monnet University
Saint-Étienne, France
First try with Question Answering
Weet it : Natural language interface for Linked Data (ElSahar et al. ‘11 )
● Most of the current knowledge bases focus on static facts and ignore
the temporal dimension of facts.
● Aspects of temporality and Dynamicity of Datasets :
○ Aspect 1 : Many facts are valid only during a particular time period.
○ Aspect 2 : Newly extracted facts can contradict, verify or modify existing ones
○ Aspect 3 : Some Facts are collectively induced from a series of Events
Handling Dynamicity of Data
Challenges and Motivations (1) :
Stephen Hawking
Many facts are valid only during a particular time
period.
Use Case : Questions about Temporal facts
● Who was the first wife of Stephen Hawking ?
● Who was the 10th President of France ?
● Who was the previous CEO of Google ?
Extraction and Representation of Temporal Data
Extraction and representation of Temporal Facts and Events
❏ Representation :
  ❏ Keeping only the last updated fact is not enough (DBpedia)
  ❏ Higher-order facts (Kuzey and Weikum ‘11)
    ❏ f1: Bill_Clinton isPresidentOf USA.
    ❏ f2: f1 startedOnDate 20-01-1993
  ❏ Wikidata Qualifiers (Vrandečić ‘12)
❏ Temporal fact and event extraction:
  ❏ Free text and structured data from Wikipedia (patterns and pattern induction) (Kuzey and Weikum ‘11)
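The higher-order representation above can be sketched as a fact record that carries its own validity interval. This is only a minimal illustration, not the cited authors' data model: the `TemporalFact` class, the `holders_at` helper, the end dates, and the second fact are assumptions added for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TemporalFact:
    """A base fact (f1) plus the temporal statements made about it (f2)."""
    subject: str
    predicate: str
    obj: str
    start: date                  # corresponds to "f1 startedOnDate ..."
    end: Optional[date] = None   # assumed extra field; None means still valid


def holders_at(facts: List[TemporalFact], predicate: str, obj: str, when: date) -> List[str]:
    """Return the subjects for which (subject, predicate, obj) was valid on `when`."""
    return [
        f.subject
        for f in facts
        if f.predicate == predicate
        and f.obj == obj
        and f.start <= when
        and (f.end is None or when < f.end)  # end date is exclusive
    ]


# Illustrative data; the start date of the first fact is the one on the slide.
facts = [
    TemporalFact("Bill_Clinton", "isPresidentOf", "USA", date(1993, 1, 20), date(2001, 1, 20)),
    TemporalFact("George_W_Bush", "isPresidentOf", "USA", date(2001, 1, 20), date(2009, 1, 20)),
]

print(holders_at(facts, "isPresidentOf", "USA", date(1995, 6, 1)))
```

Keeping every interval, instead of overwriting the fact with its latest value, is what makes questions about past states answerable.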
Annotation of temporal facts in documents for Question answering
SemEval-2015 Task 5: QA TempEval
SemEval-2015 Task 5: QA TempEval
Question Examples in the Evaluation Dataset :
Yes / No:
● “Did the Indonesian stock market rise again after its last fall ?”
List:
● “What happened after the crash?”
● “What happened between the crash and yesterday?”
When (Factoid):
● “When did the Oscar ceremony end yesterday ?”
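Once events have been annotated with times, the list-type questions above reduce to comparisons on a timeline. The sketch below assumes that annotation already exists; the event names, timestamps, and helper functions are hypothetical and are not taken from the QA TempEval data.

```python
from datetime import datetime

# Hypothetical annotated events: (event name, time it happened).
events = [
    ("market opens", datetime(2015, 1, 5, 9, 0)),
    ("crash", datetime(2015, 1, 5, 11, 0)),
    ("trading halted", datetime(2015, 1, 5, 11, 30)),
    ("market rebounds", datetime(2015, 1, 6, 10, 0)),
]


def time_of(name, events):
    """Look up the timestamp of a named event."""
    return dict(events)[name]


def happened_after(name, events):
    """Events strictly later than the named event, in timeline order."""
    t = time_of(name, events)
    return [n for n, ts in sorted(events, key=lambda e: e[1]) if ts > t]


def happened_between(first, second, events):
    """Events strictly between two named events, in timeline order."""
    t1, t2 = time_of(first, events), time_of(second, events)
    return [n for n, ts in sorted(events, key=lambda e: e[1]) if t1 < ts < t2]


print(happened_after("crash", events))
print(happened_between("crash", "market rebounds", events))
```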
Applications ?
Challenges and Motivations (2) :
Stephen Hawking
In highly dynamic datasets, newly extracted facts
can contradict, verify or modify existing ones.
Existing facts :
Matt Smith is dbo:starring of
■ dbr:Womb_(film)
■ dbr:Lost_River_(film)
■ dbr:Bert_and_Dickie
■ dbr:The_Science_of_Doctor_Who
New Extracted Fact :
“Matt Smith is the doctor”
(Matt Smith, occupation, Medicine) confidence : 0.1
(Frank Sinatra, profession, Singer) confidence : 0.9
(Jared Leto, influenced_by, Frank Sinatra) confidence : 0.8
● People influenced by writers are probably writers as well
● People are probably born in the same place as their siblings
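A soft rule of this kind ("people influenced by an X are probably X as well") can be applied to scored triples to propose a new fact with its own confidence. The sketch below is one possible way to do it, not the method of the talk: multiplying the two confidences by a rule weight, and the value of `RULE_WEIGHT`, are assumptions.

```python
# Scored triples taken from the example above.
facts = {
    ("Frank Sinatra", "profession", "Singer"): 0.9,
    ("Jared Leto", "influenced_by", "Frank Sinatra"): 0.8,
}

RULE_WEIGHT = 0.7  # hypothetical reliability of the "same profession" rule


def infer_professions(facts, rule_weight):
    """Propose (person, profession, X) when the person is influenced by someone with profession X."""
    inferred = {}
    for (person, rel, influencer), conf_influence in facts.items():
        if rel != "influenced_by":
            continue
        for (subject, rel2, profession), conf_profession in facts.items():
            if subject == influencer and rel2 == "profession":
                key = (person, "profession", profession)
                # Assumed combination: product of both confidences and the rule weight.
                score = conf_influence * conf_profession * rule_weight
                inferred[key] = max(inferred.get(key, 0.0), score)
    return inferred


for triple, score in infer_professions(facts, RULE_WEIGHT).items():
    print(triple, round(score, 3))
```

A newly extracted fact that agrees with such an inference gains support, while one that conflicts with it can be flagged for review.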
Evaluation of new facts using Link prediction
Link Prediction
● Add new facts without extra knowledge
● Assess the validity of an unknown fact
Embedding Models for knowledge bases
TransE : Modeling Relations as Translations (Bordes et al. ’13):
● Modeling Facts as translations between vectors of entities
V_Subject + V_Relation ≅ V_Object
● The distance is used to quantify confidence in a fact (smaller distance, higher confidence)
● Training objective: find the representations that minimize the distance for all true facts and
maximize it for “corrupted” facts ( s’ , o’ ), in which the subject or the object has been replaced.
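For one true fact (s, r, o) and one corrupted fact (s’, r, o’), the TransE margin loss is max(0, γ + d(s + r, o) − d(s’ + r, o’)), where γ is a margin and d is a distance. The sketch below computes it on toy vectors; the entity names and the 2-d embeddings are hand-picked assumptions, not learned values, and no training loop is shown.

```python
import math


def distance(subject, relation, obj):
    """L2 distance between (subject + relation) and obj."""
    return math.sqrt(sum((s + r - o) ** 2 for s, r, o in zip(subject, relation, obj)))


def margin_loss(true_triple, corrupted_triple, margin=1.0):
    """Hinge loss for one pair: max(0, margin + d(true) - d(corrupted))."""
    return max(0.0, margin + distance(*true_triple) - distance(*corrupted_triple))


# Toy, hand-picked 2-d embeddings (hypothetical).
entity = {"Paris": [1.0, 1.0], "France": [2.0, 3.0], "Berlin": [5.0, 0.0]}
relation = {"capital_of": [1.0, 2.0]}

true_triple = (entity["Paris"], relation["capital_of"], entity["France"])
# Corrupted fact: the subject is replaced by another entity (s').
corrupted_triple = (entity["Berlin"], relation["capital_of"], entity["France"])

print(distance(*true_triple))        # small distance: plausible fact
print(distance(*corrupted_triple))   # larger distance: implausible fact
print(margin_loss(true_triple, corrupted_triple))
```

The loss is zero once the true fact is closer than the corrupted one by at least the margin, so training only adjusts embeddings for pairs that are not yet separated.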
Other Embedding Models:
● Structured Embeddings (SE) (Bordes et al. ‘11)
● Collective Matrix Factorization (RESCAL) (Nickel et al. ’11)
● Neural Tensor Networks (Socher et al. ‘13)
● TATEC (García-Durán et al. ’14)
Embedding Models for Text + Knowledge bases:
● Joint Learning of Words and Meaning Representations (Bordes et al. ‘12)
● Knowledge Graph and Text Jointly Embedding (Wang et al ‘14)
Link prediction using Embedding Models
Applications ?
● Verification of new Extracted Facts
● Completeness of new added datasets
● Modeling literal datatypes (length, date, etc.), not only relations and
entities.
Other benefits of Embedding Models ? (collaboration potential)
● Entity Disambiguation for Fact Extraction and QA (Bordes et al. ‘12)
● Paraphrase Detection for Questions, (PARALEX) (Fader et al. ‘13)
Challenges and Motivations (3) :
Stephen Hawking
Facts induced from a series of Events
Reasoning with more than one supporting fact
● Reasoning about positions (ex: Geo Data)
● Reasoning about counts
● Reasoning about sizes
Fact 1 : 55 passengers crammed into the smuggler’s boat.
Fact 2 : The boat made it to the Greek island.
Question : Where are the passengers ?
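Answering the passenger question needs both supporting facts chained together. The sketch below does this with a plain symbolic lookup, assuming the two sentences have already been turned into "is located in" links; it illustrates the reasoning step only and is not how Memory Networks work. The `located_in` mapping and the `locate` helper are hypothetical.

```python
def locate(entity, located_in):
    """Follow 'is located in' links until reaching a place with no known container."""
    seen = set()
    current = entity
    while current in located_in and current not in seen:
        seen.add(current)  # guard against cycles in the links
        current = located_in[current]
    return current


# Assumed structured form of the two supporting facts.
located_in = {
    "passengers": "boat",      # Fact 1: the passengers are in the boat
    "boat": "Greek island",    # Fact 2: the boat is at the Greek island
}

print(locate("passengers", located_in))
```

Neither fact alone answers the question: the first gives only "boat" and the second never mentions the passengers.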
● Towards AI-Complete QA: A Set of Prerequisite Toy Tasks (Weston et al. ‘15)
● Memory Networks (Weston et al. ‘14)

WDAqua introduction presentation
