Harvesting Knowledge from Social Networks:
Extracting Typed Relationships among Entities
Andrea Caielli, Marco Brambilla, Stefano Ceri, Florian Daniel
marco.brambilla@polimi.it
marcobrambi
SoWeMine Workshop @ ICWE 2017, Rome, Italy
Agenda
(1) Context
(2) Objectives
(3) Method
(4) Experiments and Validation
(5) Visualization and Exploration
(6) Conclusions
(1) Context
Ontology is the philosophical study of the nature of being, becoming, existence or reality, and the basic categories of being and their relations.
Formalizing new knowledge is hard
Only high frequency emerges
The long tail challenge
Sourcing the Long Tail
Famous Emerging
…
(2) Objective
Objective
Extraction of relationships among entities
Reconstruct a typed graph of entities & relationships
Represent the knowledge contained in social data
No need for a-priori domain knowledge
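As a rough illustration of what a typed graph of entities and relationships could look like in memory, here is a minimal Python sketch. The class names, the type labels ("TVSeason", "Date", "broadcast") and the by_type helper are assumptions made for this example; the slides do not describe the actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str
    type: str  # hypothetical type label, e.g. "TVSeason" or "Date"


@dataclass(frozen=True)
class Relationship:
    source: Entity
    label: str             # surface form found in the text, e.g. "air on"
    target: Entity
    type: str = "untyped"  # to be filled in by a later typing step


@dataclass
class TypedGraph:
    entities: set = field(default_factory=set)
    relationships: list = field(default_factory=list)

    def add(self, rel: Relationship) -> None:
        # Adding a relationship also registers both of its endpoints.
        self.entities.update((rel.source, rel.target))
        self.relationships.append(rel)

    def by_type(self, rel_type: str) -> list:
        # Return all relationships carrying the given type label.
        return [r for r in self.relationships if r.type == rel_type]


graph = TypedGraph()
season = Entity("Season 2", "TVSeason")
date = Entity("May 24th", "Date")
graph.add(Relationship(season, "air on", date, type="broadcast"))
```

Keeping the surface label and the assigned type as separate fields lets the same extracted relation be re-typed later without losing the original wording.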
Knowledge Enrichment Setting
[Diagram: high-frequency (HF) and low-frequency (LF) entities as instances, connected via <<instanceof>> to types (Type1, Type11, Type111, Type2), with properties (Property1, Property2); open questions are marked "??"]
Legend: Seed Entity, Seed Type, Type of interest
Expert inputs
Enrichment problems
Relations HF - LF entities
Relations LF - LF entities
Typing of LF entities
Extraction of new LF entities
Finding attribute values
A Practical Example
Challenge and Innovation
Highly unstructured social data (tweets and Facebook posts)
No reliable grammar structures
(3) Method
Analysis Pipeline
(0) Preprocessing
(1) Entity Extraction
(2) Relationship Extraction
(3) Relationship Aggregation
(4) Relationship Typing
(1) Evolution of work presented in:
M. Brambilla, S. Ceri, E. Della Valle, R. Volonterio, and F. Acero Salazar.
“Extracting Emerging Knowledge from Social Media”, WWW 2017.
Pipeline Summary
(0) Preprocessing
Text cleaning and enrichment
+ Traditional text preprocessing (stemming, …)
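A minimal sketch of the text-cleaning part of this step, assuming that cleaning means dropping URLs and user mentions and keeping hashtag words as plain tokens. These rules and the function name clean_post are assumptions for illustration; the slides do not list the actual cleaning rules, and stemming would normally be delegated to an NLP library.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
SPACE_RE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Remove URLs and mentions, unwrap hashtags, normalize whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)  # "#BlackSails" becomes "BlackSails"
    return SPACE_RE.sub(" ", text).strip()


# Made-up post used only to exercise the function.
post = "Season 2 airs on May 24th! #BlackSails @History http://example.com/x"
print(clean_post(post))  # Season 2 airs on May 24th! BlackSails
```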
(1) Entity Extraction
Entity identification and semantic typing
Exploiting:
Stanford CoreNLP NER
Dandelion API
(2) Relationship Extraction
Baseline with Stanford OpenIE for triple extraction:
Several issues:
- Meaningless relations
- Wrong relations
- Multiple relations
(3) Relationship Aggregation
Sails fans. Season 2 airs on May 24th on History on D Stv Jag Comms
Too many answers for the same question!
Empirical rules
{"entity1":"Season 2",
"relationship":"air on",
"entity2":"May 24th"}
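The slides do not spell out the empirical rules, so the following is only a sketch of one plausible rule, under stated assumptions: candidates that answer the same question (same entity1 and relationship) are grouped, and within each group the candidate whose entity2 is a recognized entity wins, with shorter arguments preferred. The function name aggregate, the ranking rule and the candidate list are hypothetical; the candidates are hand-written and are not actual OpenIE output.

```python
from collections import defaultdict


def aggregate(triples, known_entities):
    """Keep one triple per (entity1, relationship) pair."""
    groups = defaultdict(list)
    for t in triples:
        groups[(t["entity1"], t["relationship"])].append(t)

    def rank(t):
        # Assumed rule: recognized entities first, then fewer tokens.
        is_known = t["entity2"] in known_entities
        return (not is_known, len(t["entity2"].split()))

    return [min(group, key=rank) for group in groups.values()]


candidates = [
    {"entity1": "Season 2", "relationship": "air on",
     "entity2": "May 24th on History"},
    {"entity1": "Season 2", "relationship": "air on",
     "entity2": "May 24th"},
    {"entity1": "Season 2", "relationship": "air on",
     "entity2": "May 24th on History on D Stv Jag Comms"},
]
known = {"Season 2", "May 24th", "History"}

result = aggregate(candidates, known)
```

With these inputs, result holds the single triple shown on the slide (Season 2, air on, May 24th).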
(4) Relationship Typing (A): Synonyms
Exploiting synsets based on WordNet 3.1
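A small sketch of how synonym-based typing could merge relationship labels. To stay self-contained it uses a hand-written table as a stand-in for WordNet synset lookups; the groupings, the counts and the function names are illustrative assumptions and are not taken from WordNet 3.1 or from the experiments.

```python
# Toy stand-in for WordNet synsets: each set lists verbs treated as synonyms.
# These groupings are illustrative only, not copied from WordNet 3.1.
TOY_SYNSETS = [
    {"air", "broadcast", "transmit"},
    {"love", "adore"},
]


def canonical_verb(verb: str) -> str:
    """Map a verb to a deterministic representative of its synonym set."""
    for synset in TOY_SYNSETS:
        if verb in synset:
            return min(synset)  # alphabetically first member
    return verb  # verbs outside the table stay as they are


def merge_synonyms(relation_counts: dict) -> dict:
    """Sum occurrence counts of relations that share a synonym set."""
    merged = {}
    for verb, count in relation_counts.items():
        key = canonical_verb(verb)
        merged[key] = merged.get(key, 0) + count
    return merged


# Made-up counts used only to exercise the function.
counts = {"air": 5, "broadcast": 3, "love": 4, "adore": 1, "watch": 2}
merged = merge_synonyms(counts)
print(merged)  # {'air': 8, 'adore': 5, 'watch': 2}
```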
(4) Relationship Typing (B): Matching
Types
(4) Relationship Typing (C): Linguistics
Based on VerbNet
Groupings of verbs based on syntactic and semantic properties
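The idea of typing relationships by verb class can be sketched as a lookup followed by a count of occurrences per class. The table below is a placeholder: its class labels and memberships are invented for this example, and real identifiers would come from the VerbNet resource itself. The triples are likewise made up.

```python
from collections import Counter

# Placeholder verb-to-class table. Real VerbNet class identifiers and
# memberships would come from the VerbNet resource itself.
TOY_VERB_CLASSES = {
    "love": "class-emotion",
    "adore": "class-emotion",
    "see": "class-perception",
    "watch": "class-perception",
}


def class_occurrences(triples):
    """Count how many extracted relationships fall into each verb class."""
    counts = Counter()
    for _, verb, _ in triples:
        counts[TOY_VERB_CLASSES.get(verb, "unclassified")] += 1
    return counts


# Made-up (entity1, verb, entity2) triples used only for illustration.
triples = [
    ("fans", "love", "show"),
    ("fans", "adore", "cast"),
    ("viewers", "watch", "episode"),
    ("episode", "air", "tonight"),
]
occurrences = class_occurrences(triples)
```

Counting per class rather than per verb is what makes a per-class occurrence chart possible even when individual verbs are rare.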
Pipeline Implementation
(4) Validation
Experiments
TV Series: Black Sails, Teen Wolf, Vikings
Milan Fashion Week
Rugby games
Domains and quality of results - summary
Relationships and Verb Classes
Example: Teen Wolf
[Chart: Teen Wolf synonym classes by number of occurrences (axis from 0 to 800)]
[Chart: Teen Wolf VerbNet classes by number of occurrences]
Overall Quality Indexes of
Entity and Relationships Extraction
(5) Visualization
Motivation
Resulting semantic models extremely large and hard to interpret
Example: Black Sails collection, containing 1243 entities and 2025 relations.
Exploration
Visualization
Filtering (including relationship filtering)
Navigation
Examples
Milano Fashion Week: generated graph
(6) Conclusions
Conclusions
Extraction of relevant emerging relationships is feasible even for extremely unstructured and informal content (social media)
Still a long way to perfect extraction:
• N-ary relations
• Time-dependency
• Poor typing of entities in ontologies
THANKS!
QUESTIONS?
Marco Brambilla @marcobrambi marco.brambilla@polimi.it
http://datascience.deib.polimi.it http://home.deib.polimi.it/marcobrambi
