On the Quest for Changing Knowledge
Marco Brambilla, Stefano Ceri, Florian Daniel, Emanuele Della Valle
@marcobrambi
Data-driven innovation and innovation-driven data
Innovation requires precise, to-the-point, up-to-date, domain-specific information
There are more things
In heaven and earth, Horatio,
Than are dreamt of in your philosophy.
Shakespeare, Hamlet, Act 1, Scene 5
From Data to Wisdom
Formalizing new knowledge is hard
Only high frequency emerges
The long tail challenge
Knowledge Extraction
Text mining
Semantic Web
Search and recommendation systems
None of them specifically targets emerging knowledge
Heaven and Earth
How can we peer through an effective window on the real world?
Social media, our blessing and curse
Domain experts matter
Can we use social networks to discover emerging knowledge?
Beware the streetlamp effect
The bias of the source
The bias of the observer
Famous vs. Emerging
Evolving Knowledge
[Figure: factoids (a, b, c, ¬c) drawn from social content and positioned relative to consolidated knowledge; factoids are marked as potentially emerging or potentially decaying]
Overview
Knowledge Enrichment Setting
[Figure: high-frequency (HF) and low-frequency (LF) entities (instances) linked via <<instanceof>> to types (Type1, Type11, Type111, Type2); expert inputs are the seed entities, the seed type and the type of interest]
Enrichment problems:
Relations HF - LF entities
Relations LF - LF entities
Typing of LF entities
Extraction of new LF entities
Finding attribute values (Property1, Property2)
Emerging Knowledge Harvesting
Domain Types
Types selected by the experts
Relevant for the domain
Seed characterization
Selected by the expert
Belonging to an expert type
Thoroughly described
Social Media Sourcing
Content coming from the seeds’ accounts
Candidate Selection
Potentially any entity extracted from the social streams, resulting in very large candidate sets
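As a rough illustration of candidate selection, the sketch below treats hashtags and @-mentions found in the seeds' posts as candidate entities and counts them. This is an assumption made for illustration only: the slides speak of entity extraction in general, and `extract_candidates` is a hypothetical helper, not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical lightweight extractor: hashtags and @-mentions stand in for
# the entity extraction step (a real pipeline would also run entity linking).
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def extract_candidates(posts, known_seeds):
    """Count candidate tokens in the seeds' posts, excluding the seeds themselves."""
    seeds = {s.lower() for s in known_seeds}
    counts = Counter()
    for text in posts:
        for token in HASHTAG.findall(text) + MENTION.findall(text):
            token = token.lower()
            if token not in seeds:
                counts[token] += 1
    return counts


posts = [
    "Great show by @newbrand at #MFW",
    "Loving the new #newbrand collection with @seedbrand",
]
candidates = extract_candidates(posts, ["seedbrand"])
print(candidates)
```

Even this toy version shows why the candidate set grows quickly: every tag or mention in every seed post becomes a candidate.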
Candidate Typing
Candidate Pruning
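Initial pruning of candidates based on TF-DF (*):
TF-DF := tf · df / (N − df + 1)
where tf is the candidate's term frequency, df the number of documents containing it, and N the total number of documents
(*) A variant of TF-IDF that does not discount document frequency, because here frequent appearance is actually desirable
(we are not looking for information entropy!)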
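A minimal sketch of TF-DF scoring and threshold-based pruning, assuming each document is a list of candidate tokens, that tf is counted over the whole collection, and that pruning keeps candidates above a fixed cut-off. The `tf_df` and `prune` helpers and the threshold value are hypothetical; the slides only give the formula.

```python
from collections import Counter


def tf_df(documents):
    """Score each token with TF-DF = tf * df / (N - df + 1).

    tf: total occurrences across all documents (an assumption),
    df: number of documents containing the token, N: number of documents.
    """
    n = len(documents)
    tf = Counter()
    df = Counter()
    for doc in documents:
        tf.update(doc)        # every occurrence counts towards tf
        df.update(set(doc))   # each document counts at most once towards df
    return {t: tf[t] * df[t] / (n - df[t] + 1) for t in tf}


def prune(scores, threshold):
    """Keep only candidates whose TF-DF reaches a (hypothetical) threshold."""
    return {t: s for t, s in scores.items() if s >= threshold}


docs = [["newbrand", "mfw"], ["newbrand", "newbrand"], ["other"]]
scores = tf_df(docs)
print(prune(scores, 1.0))
```

When df = N the denominator is 1, so a candidate that appears in every document keeps its full tf · df score, whereas TF-IDF would push it towards zero.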
Candidate Ranking
Candidate Vector Space
Purely syntactic features
Semantic features:
Based on entity extraction (DBpedia)
Based on deep learning on images (Clarifai)
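One way to rank candidates in a vector space is to compare each candidate's feature vector with a profile built from the seeds. The sketch below assumes the seed profile is the centroid of the seed vectors and uses cosine similarity; both choices, the `rank_candidates` helper and the toy feature values are illustrative assumptions, since the slides do not say which of the evaluated distances or features performed best.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rank_candidates(seed_vectors, candidate_vectors):
    """Rank candidates by cosine similarity to the seeds' centroid, highest first."""
    profile = centroid(seed_vectors)
    scored = [(name, cosine(vec, profile)) for name, vec in candidate_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy feature vectors (e.g. syntactic and semantic feature values per entity).
seeds = [[1.0, 0.0, 1.0], [1.0, 0.2, 0.8]]
candidates = {"cand_a": [0.9, 0.1, 0.9], "cand_b": [0.0, 1.0, 0.0]}
ranking = rank_candidates(seeds, candidates)
print(ranking)
```

Swapping `cosine` for another distance is enough to compare alternative ranking strategies.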
On the Quest for Changing Knowledge. Capturing emerging entities from social media. WebScience 2016 DDI
Example Analysis
Experiments
Fashion brands
Writers
Painters
Exhibitions
4,400 strategies evaluated
44 alternative feature vectors (12 basic features and 32 aggregations)
9 different weighting values for aggregations
5 levels of recall for entity extraction
3 different distances
Pruning Phase
From 4,400 down to 10 strategies, by eliminating the less relevant parameters
Italian Fashion Brands
Precision @5 = 0.2
Increasing the number of seeds reduces precision
Australian Writers – 22 seeds
Precision @5 = 0.8
Innovative Painters – 21 seeds
Precision @5 = 0.6
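Precision @5 is the fraction of the top five ranked candidates that are judged relevant, so it moves in steps of 0.2. A minimal sketch, assuming relevance judgments are available as a set (for instance, provided by domain experts); `precision_at_k` is a hypothetical helper.

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k ranked candidates that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for candidate in ranked[:k] if candidate in relevant)
    return hits / k


ranked = ["a", "b", "c", "d", "e", "f"]
relevant = {"a", "c", "d", "e"}
print(precision_at_k(ranked, relevant))
```

With one relevant candidate in the top five the score is 1/5 = 0.2; with four it is 0.8.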
Twitter vs. Instagram
P@5 = 1.0 vs. P@5 = 0.8
Fashion: Twitter + Instagram
Writers: Twitter + Instagram
Prec. = 1
Conclusion
It’s about time to build innovation based on data, and to build knowledge based on innovation
Harvesting can be iterative
On the Quest for Changing Knowledge
Contact us
Marco Brambilla, @marcobrambi, marco.brambilla@polimi.it
http://datascience.deib.polimi.it
