KIT – The Research University in the Helmholtz Association
INSTITUTE OF APPLIED INFORMATICS AND FORMAL DESCRIPTION METHODS (AIFB)
www.kit.edu
Linked Data Entity Summarization
Dipl.-Inf. Univ. Andreas Thalhammer 08.12.2016
Institute of Applied Informatics and Formal
Description Methods (AIFB)
Outline
1. Motivation
2. Research Questions
3. Contributions
a) LinkSUM (Contribution 1)
b) SUMMA API (Contribution 3)
4. Related Work
5. Summary and Outlook
Andreas Thalhammer – Linked Data Entity Summarization, 03.10.2018
1. MOTIVATION
Information need versus availability
Information need (in the US*)
More than 40% of all search queries are focused on one specific entity.
579 million searches per day come from home and work devices in the US.
~ 232 million searches for entities (every day; in the US; desktop)
Information availability (Wikidata**)
Wikidata covers 24.5 million entities (growth of 55% in last year).
3.2 million entities have > 10 statements (growth of 78% in last year).
* https://www.comscore.com/Insights/Rankings/comScore-Releases-February-2016-US-Desktop-Search-Engine-Rankings
** https://www.wikidata.org/wiki/Wikidata:Statistics
Growing amount of structured data on the Web
Wikidata entry for Pulp Fiction: ~614 facts
Naïve solution: Entity presentation based on
class summaries
(Source: yahoo.com)
Problems of class summaries
1. The patterns are very static and do not reflect the individual
particularities of entities.
2. A pattern needs to be created for each type and class hierarchies
need to be considered.
3. Some entities are of multiple (distinct) types with unclear main type.
4. Some of the properties can have many values for which no ranking or
cut-off is defined.
[Figure: example entities Arnold Schwarzenegger (types: Person, Athlete, Body builder) and Angkor Wat]
Entity Summarization
Propositions:
Every entity is individual.
For different entities, different properties are of importance.
Entities of the same type do not always have the same attributes.
For each entity, a single property-value pair can be of different
relevance.
Solution:
Focus on individual particularities of each entity:
Entity Summarization
2. RESEARCH QUESTIONS
Challenge #1
RQ1: How can we effectively summarize entities with limited
background information?
RQ1.1: How can we use link analysis effectively in order to derive
summaries of entities?
RQ1.2: How can we use usage data analysis effectively in order to derive
summaries of entities?
RDF data typically does not reflect importance levels in its relations.
Proprietary entity summarization systems have access to a lot of data
(e.g., search queries) and infrastructure (e.g., a full Web index).
Other knowledge panel providers (such as publishers) are lacking that
information and infrastructure.
(Source: google.com)
Challenge #2
RQ2: Is there a minimum set of recurring/common features of entity
summarization systems that allows us to provide a generic API?
Providers of knowledge panels are hiding the original graph structure in
strongly abstracted interfaces.
Standardized programmatic access is desirable (but not available).
(Source: google.com)
(Source: developers.google.com/knowledge-graph)
Challenge #3
RQ3: How can we align duplicate/similar facts about Linked Data
entities on the Web?
Different Web sources provide structured information about a single entity.
The different sources often cover similar information but do not provide
corresponding links or vocabulary mappings.
Alignments are particularly difficult as the sources typically provide data at
different levels of modeling granularity.
(Source: imdb.com)
(Source: wikidata.org)
3. CONTRIBUTIONS
Overview: Research Questions and Contributions
[Figure: architecture overview with the components Knowledge Base(s), Entity Data Fusion, LinkSUM (link structure), UBES (usage data), SUMMA API, and UI, annotated with input/output and the contribution numbers 1 to 4]
RQ1: How can we effectively summarize entities with limited
background information?
RQ1.1: How can we use link analysis effectively in order to
derive summaries of entities? (Contribution 1)
RQ1.2: How can we use usage data analysis effectively in
order to derive summaries of entities? (Contribution 2)
RQ2: Is there a minimum set of recurring/common features of
entity summarization systems that allows us to provide a generic
API? (Contribution 3)
RQ3: How can we align duplicate/similar facts about Linked Data
entities on the Web? (Contribution 4)
Linked Data Entity Summarization
Contribution 1
LinkSUM
Idea: Use link analysis for selecting facts.
Step 1: Select top-k important related resources.
Step 2: Select the most relevant connecting predicate.
Approach: Resource Selection
[Figure: Quentin Tarantino and Pulp Fiction connected by the predicate "director"]
Compute PageRank [5] scores of entities with (un-typed)
links that occur in textual descriptions of entities (pr).
Use “Backlinks” [7] (also called “mutual links”) for finding strong
connections (bl):
Combine scores:
dbpedia:Category:English-language_films 220.961
dbpedia:Quentin_Tarantino 13.7403
dbpedia:John_Travolta 10.5771
dbpedia:Miramax_Films 9.9398
... ...
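The resource selection step can be sketched as follows. The combination formula is not preserved in this transcript, so the sketch assumes a simple linear mix of the max-normalized PageRank score (pr) and a binary backlink indicator (bl), weighted by a hypothetical parameter `alpha`. The PageRank values are the ones listed above; the link sets are illustrative assumptions, not data from the thesis.

```python
from typing import Dict, List, Set, Tuple


def backlink(links: Dict[str, Set[str]], e: str, o: str) -> int:
    """Return 1 if e links to o and o links back to e, else 0."""
    return int(o in links.get(e, set()) and e in links.get(o, set()))


def rank_resources(e: str, links: Dict[str, Set[str]],
                   pr_scores: Dict[str, float], k: int,
                   alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Rank the resources linked from e and return the top k.

    Assumed scoring: alpha * normalized PageRank + (1 - alpha) * backlink.
    """
    candidates = sorted(links.get(e, set()))
    if not candidates:
        return []
    max_pr = max(pr_scores.get(o, 0.0) for o in candidates)
    scored = []
    for o in candidates:
        pr = pr_scores.get(o, 0.0) / max_pr if max_pr > 0 else 0.0
        scored.append((o, alpha * pr + (1.0 - alpha) * backlink(links, e, o)))
    scored.sort(key=lambda pair: -pair[1])
    return scored[:k]


# PageRank scores as listed on the slide.
pr_scores = {
    "Category:English-language_films": 220.961,
    "Quentin_Tarantino": 13.7403,
    "John_Travolta": 10.5771,
    "Miramax_Films": 9.9398,
}
# Hypothetical link structure: only two resources link back to Pulp Fiction.
links = {
    "Pulp_Fiction": set(pr_scores),
    "Quentin_Tarantino": {"Pulp_Fiction"},
    "John_Travolta": {"Pulp_Fiction"},
}
top2 = rank_resources("Pulp_Fiction", links, pr_scores, k=2)
print(top2)
```

Under these assumptions the backlink term lifts the two mutually linked resources above the category page, even though the category has by far the highest PageRank score.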
Approach: Relation Selection
Problem: multiple relations
[Figure: Quentin Tarantino and Pulp Fiction connected by two predicates, "director" and "writer of"]
Approaches:
Frequency (FRQ): #times the predicate is used
Exclusivity (EXC): 1 / (N + M)
Description (DSC): #domain + #range + #label
and combinations of those, e.g., FRQ * EXC
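A minimal sketch of the FRQ and EXC measures and their product follows. The slide does not define N and M; the sketch assumes that, for a triple (s, p, o), N is the number of triples with subject s and predicate p, and M is the number of triples with predicate p and object o. The toy triples and the helper `select_predicate` are hypothetical.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]


def frq(triples: List[Triple], s: str, p: str, o: str) -> float:
    """Frequency: number of times predicate p is used in the dataset."""
    return float(sum(1 for (_, q, _) in triples if q == p))


def exc(triples: List[Triple], s: str, p: str, o: str) -> float:
    """Exclusivity: 1 / (N + M), with N and M as assumed in the lead-in."""
    n = sum(1 for (s2, p2, _) in triples if s2 == s and p2 == p)
    m = sum(1 for (_, p2, o2) in triples if p2 == p and o2 == o)
    return 1.0 / (n + m)


def frq_exc(triples: List[Triple], s: str, p: str, o: str) -> float:
    """Combined measure FRQ * EXC."""
    return frq(triples, s, p, o) * exc(triples, s, p, o)


def select_predicate(triples: List[Triple], a: str, b: str,
                     score: Callable[[List[Triple], str, str, str], float]) -> str:
    """Pick the best-scoring predicate connecting a and b (either direction)."""
    candidates = [t for t in triples if (t[0], t[2]) in ((a, b), (b, a))]
    return max(candidates, key=lambda t: score(triples, *t))[1]


# Hypothetical toy dataset.
triples = [
    ("Pulp_Fiction", "director", "Quentin_Tarantino"),
    ("Quentin_Tarantino", "writerOf", "Pulp_Fiction"),
    ("Reservoir_Dogs", "director", "Quentin_Tarantino"),
    ("Kill_Bill", "director", "Quentin_Tarantino"),
    ("Quentin_Tarantino", "writerOf", "Reservoir_Dogs"),
]
for measure in (frq, exc, frq_exc):
    print(measure.__name__,
          select_predicate(triples, "Pulp_Fiction", "Quentin_Tarantino", measure))
```

On this toy data FRQ favors the frequently used "director", while EXC favors "writerOf", which shows why the measures are also evaluated in combination.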
Quantitative Evaluation: Dataset and Measures
Used reference dataset:
Introduced in Gunaratna et al. [3].
Contains human-created summaries of 50 entities (DBpedia 3.9, outgoing relations).
Includes seven top-5 and seven top-10 summaries for each entity.
The dataset was created by 15 experts from the Semantic Web field.
Used similarity measure:
Reference system:
FACES (introduced in [3]).
Quantitative Evaluation: Results
SO: Subject-Object pairs (predicates not considered).
SPO: Full triple.
SD: Standard deviation.
config-1:
config-2:
Significance with respect to both LinkSUM configurations (p < 0.05).
Significance with respect to the best LinkSUM configuration (p < 0.05).
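The difference between the SO and SPO levels can be illustrated with a small sketch. The similarity measure used in the evaluation is not preserved in this transcript; the code assumes a simple measure, the average overlap between a system summary and each reference summary. All triples are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def overlap(summary: Set[Triple], reference: Set[Triple], level: str = "SPO") -> int:
    """Count shared items: full triples (SPO) or subject-object pairs (SO)."""
    if level == "SPO":
        return len(summary & reference)
    so = lambda ts: {(s, o) for (s, _, o) in ts}
    return len(so(summary) & so(reference))


def quality(summary: Set[Triple], references: List[Set[Triple]],
            level: str = "SPO") -> float:
    """Average overlap of the summary with all reference summaries (assumed measure)."""
    return sum(overlap(summary, r, level) for r in references) / len(references)


# Hypothetical system summary and two hypothetical reference summaries.
system = {("Pulp_Fiction", "director", "Quentin_Tarantino"),
          ("Pulp_Fiction", "starring", "John_Travolta")}
references = [
    {("Pulp_Fiction", "director", "Quentin_Tarantino"),
     ("Pulp_Fiction", "starring", "Uma_Thurman")},
    {("Pulp_Fiction", "writer", "Quentin_Tarantino"),
     ("Pulp_Fiction", "starring", "John_Travolta")},
]
print(quality(system, references, "SPO"), quality(system, references, "SO"))
```

The SO score is at least as high as the SPO score, because a pair still counts when the reference connects the same two resources through a different predicate.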
Qualitative Evaluation: Setup
Scenario: Search Engine Result Page (SERP).
20 users, 10 entities (from the FACES dataset).
Qualitative Evaluation: Results
In some cases the task is subjective.
Reasons for selection:
- The presented related resources are relevant for the entity.
Reasons for rejection:
- Redundancy.
- Related resources do not characterize the entity.
Focus: PageRank (1)
PageRank is not perfect, for example:
# Top five scientists in DBpedia by PageRank score
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX v: <http://purl.org/voc/vrank#>
SELECT ?e ?r
FROM <http://dbpedia.org>
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
WHERE {
  ?e rdf:type dbo:Scientist ;
     v:hasRank/v:rankValue ?r .
}
ORDER BY DESC(?r) LIMIT 5
dbpedia:Carl_Linnaeus 551.791
dbpedia:Charles_Darwin 215.028
dbpedia:Albert_Einstein 186.549
dbpedia:Isaac_Newton 167.811
dbpedia:Sigmund_Freud 140.245
Focus: PageRank (2)
Important parameters (for resources r):
l(r) – returns all pages that link to r.
c(r) – the number of outgoing links of r.
d – the damping factor
Traditional PageRank [5]:
Variant: Weighted Links Rank (WLRank) [6]:
Link weights (lw): relative position of a link in the article.
(Image: [8])
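The two formulas are not preserved in this transcript. As a hedged sketch, the code below uses the standard non-normalized PageRank form from [5], pr(r) = (1 - d) + d * sum of pr(n) / c(n) over all n in l(r), and assumes that the weighted variant replaces 1 / c(n) with the link weight divided by the sum of the weights of all outgoing links of n. The toy graph, the weights, and the function name are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def pagerank(out_links: Dict[str, List[str]], d: float = 0.85,
             iterations: int = 50,
             weights: Optional[Dict[Tuple[str, str], float]] = None) -> Dict[str, float]:
    """Iterative PageRank; with `weights`, a weighted-links variant.

    out_links maps each page to its distinct outgoing link targets.
    """
    nodes = set(out_links) | {t for ts in out_links.values() for t in ts}
    pr = {r: 1.0 for r in nodes}
    for _ in range(iterations):
        new = {r: 1.0 - d for r in nodes}
        for src, targets in out_links.items():
            if not targets:
                continue
            # Uniform weights reproduce the 1 / c(src) share of plain PageRank.
            w = {t: (1.0 if weights is None else weights[(src, t)]) for t in targets}
            total = sum(w.values())
            for t in targets:
                new[t] += d * pr[src] * w[t] / total
        pr = new
    return pr


# Hypothetical toy graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
plain = pagerank(graph)
# Hypothetical weights: the link A -> C appears earlier in the article than A -> B.
lw = {("A", "B"): 0.2, ("A", "C"): 0.8, ("B", "C"): 1.0, ("C", "A"): 1.0}
weighted = pagerank(graph, weights=lw)
print(plain, weighted)
```

In the weighted run, B receives a smaller share of A's score because its incoming link is assumed to sit lower in the article.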
Focus: PageRank (3)
Newly constructed rankings:
ALL – all links from the article text and from the templates.
ATL – article text links.
TEL – template links.
ATL-RP – article text links with WLRank and relative position.
Size of input dataset (# links):
ALL: 159,398,815
ATL: 142,305,605
TEL: 26,460,273
ATL-RP: 143,056,545
Reference rankings (page-view-based):
TOWR-PV – “The Open Wikipedia Ranking”
SUB – SubjectiveEye3D by Paul Houle
Focus: PageRank (4)
Measure: Spearman rank correlation (range: [-1, 1])
Results:
Conclusions:
The poor correlation of TEL with TOWR-PV/SUB is the result of a small input
data set.
Weighting by relative position improves the correlation with SUB. These findings
are supported by [4].
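For reference, the Spearman rank correlation used as the measure here can be computed as the Pearson correlation of the rank vectors. The sketch below is a plain-Python version with average ranks for ties; the two score lists are abstract toy values, not results from the evaluation.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign ranks 1..n (ascending); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation in [-1, 1]: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy scores for five abstract items under two rankings.
link_based = [5.0, 4.0, 3.0, 2.0, 1.0]
view_based = [1.0, 4.0, 5.0, 3.0, 2.0]
rho = spearman(link_based, view_based)
print(rho)
```

A value near 1 means the two rankings order the items almost identically, a value near -1 means they are reversed, and a value near 0 means there is little monotonic relationship.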
Conclusions and Impact
Conclusions:
LinkSUM significantly outperforms the state of the art.
Entity summarization:
Focus should be on selecting relevant resources.
Redundancies at the object level should be avoided.
LinkSUM is lightweight and can be applied in other scenarios, e.g.
Web sites with semantic annotations.
Semantic MediaWikis.
Impact:
Published and presented as full research paper at ICWE 2016.
The PageRank scores are published online and have found many adopters
(e.g., the official DBpedia SPARQL endpoint includes the scores).
In use at the WDAqua project (http://wdaqua.eu/).
Linked Data Entity Summarization
Contribution 3
SUMMA API
Idea: A common API for entity summaries
This enables:
Quantitative evaluation.
Qualitative evaluation.
A/B testing.
Combination of summary services.
Approach: SUMMA API
Parameters:
URI (of the entity e) – the entity needs to be identified
k (number) – an upper limit of facts related to e
Multi-language support
Statement groups (e.g., biographical data)
Restriction to specific properties
Multi-hop search space
SUMMA Vocabulary:
[Figure: SUMMA vocabulary diagram. Classes: summa:Summary, summa:SummaryGroup. Properties: summa:entity, summa:topK, summa:language, summa:fixedProperty, summa:statement, summa:maxHops, summa:group, summa:path. Value types shown: rdfs:Resource, xsd:positiveInteger, xsd:String, rdf:Property, rdf:Statement.]
[Figure: multi-hop example with the nodes PF, JT, VV, a blank node (_:), and the predicates starring, actor, role]
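The parameters above could be serialized into a summary request roughly as follows. This is a sketch, not the reference implementation: it assumes the namespace http://purl.org/voc/summa/ for the `summa:` prefix, treats everything except the entity URI and k as optional, and the helper name `summary_request` is hypothetical.

```python
from typing import Optional, Sequence

SUMMA_NS = "http://purl.org/voc/summa/"  # assumed namespace of the summa: prefix


def summary_request(entity: str, top_k: int,
                    language: Optional[str] = None,
                    fixed_properties: Sequence[str] = (),
                    max_hops: Optional[int] = None) -> str:
    """Build a Turtle description of a summary request (hypothetical helper)."""
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    pairs = [("a", "summa:Summary"),
             ("summa:entity", f"<{entity}>"),
             ("summa:topK", str(top_k))]
    if language is not None:
        pairs.append(("summa:language", f'"{language}"'))
    for prop in fixed_properties:
        pairs.append(("summa:fixedProperty", f"<{prop}>"))
    if max_hops is not None:
        pairs.append(("summa:maxHops", str(max_hops)))
    body = " ;\n  ".join(f"{p} {o}" for p, o in pairs)
    return f"@prefix summa: <{SUMMA_NS}> .\n\n[ {body} ] ."


request = summary_request("http://dbpedia.org/resource/Barack_Obama", 10)
print(request)
```

Optional parameters only appear in the output when they are set, so a minimal request contains just the type, the entity, and the upper limit k.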
Approach: SUMMA API
SUMMA RESTful Interaction:
Client to server:
POST [ a :Summary;
:entity dbpedia:Barack_Obama; :topK 10 ] .
Server to client:
201 CREATED
Location: http://example.com/summary?entity=dbpedia:Barack_Obama&topK=10
@prefix summa: <http://purl.org/voc/summa/> .
...
Client to server:
GET http://example.com/summary?entity=dbpedia:Barack_Obama&topK=10
Server to client:
200 OK
@prefix summa: <http://purl.org/voc/summa/> .
...
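The interaction above can be mimicked with a small in-memory sketch that involves no network access. The class `SummaryServer`, its methods, and the placeholder summary body are hypothetical; only the status codes, the Location URL pattern, and the `@prefix` line follow the slide.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

BASE = "http://example.com/summary"  # example host taken from the slide


class SummaryServer:
    """In-memory stand-in for a SUMMA server (hypothetical sketch)."""

    def __init__(self) -> None:
        self._summaries: Dict[str, str] = {}

    def post(self, entity: str, top_k: int) -> Tuple[int, Dict[str, str]]:
        """Create a summary resource and answer 201 with its Location."""
        query = urlencode({"entity": entity, "topK": top_k}, safe=":")
        location = f"{BASE}?{query}"
        # Placeholder body; a real server would compute the summary here.
        self._summaries[location] = (
            "@prefix summa: <http://purl.org/voc/summa/> .\n"
            f"# summary of {entity} with up to {top_k} statements (elided)"
        )
        return 201, {"Location": location}

    def get(self, url: str) -> Tuple[int, str]:
        """Return 200 with the stored summary, or 404 if it was never created."""
        if url not in self._summaries:
            return 404, ""
        return 200, self._summaries[url]


server = SummaryServer()
status, headers = server.post("dbpedia:Barack_Obama", 10)
get_status, body = server.get(headers["Location"])
print(status, headers["Location"], get_status)
```

Because the Location URL encodes the request parameters, a client can repeat the GET later, or share the URL, without posting the request again.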
Analysis: Setup
Search Engines:
Google Knowledge Graph
Microsoft Bing Satori/Snapshots
Yahoo Knowledge
News Portals (Alexa Top 25 News sites):
Forbes
BBC News
Can the user interfaces be generated with data from the
SUMMA API without changing their layout?
Analysis: Criteria
Features:
1. Property Restriction
2. Statement Groups
3. Multi-hop Search Space
4. Languages
Five entities:
Spain (country)
Dirk Nowitzki (person/athlete)
Ramones (band)
SAP (company/organization)
Inglourious Basterds (movie)
(Source: http://google.com)
Analysis: Results
Which features were required by the respective system?
Conclusions and Impact
Conclusions:
Decouple user interface from actual entity summarization
system by defining a common API.
Light-weight and extensible vocabulary and interaction mechanism.
Reference implementations and their source code are publicly
available.
Empirical analysis demonstrates applicability in real-world scenarios.
Impact:
Published and presented as full research paper at ICWE 2015.
Best Paper Candidate at ICWE 2015.
Best Demo Award at ICWE 2016.
4. RELATED WORK
Related Work
Who else is working on this?
Google [1], Microsoft, Yahoo, etc.
Other researchers in the field of the Semantic Web, e.g.:
Cheng et al. [2]
Gunaratna et al. [3]
What distinguishes the presented work from theirs?
LinkSUM is a lightweight and effective approach.
UBES is the first approach that uses usage data for entity summarization.
SUMMA API: first and currently only API definition that enables the
exchange of entity summaries.
Entity Data Fusion: First approach that focuses on general alignment of
structured entity data on the Web.
[Figure labels: "RDF + lots of background data"; "(Only) RDF data"]
5. SUMMARY AND OUTLOOK
Summary
We provided contributions for Linked Data Entity Summarization.
Impact was created on the levels of research and dataset/system adoption.
Combination with entity linking is possible.
The addressed problem is highly relevant for search and question answering engines.
Outlook
Full integration of the entity data fusion approach.
Addressing literal values.
Personalized/contextualized summaries of entities.
Abstract entity summarization.
Questions?
Publications
Contribution 1
Andreas Thalhammer, Nelia Lasierra, Achim Rettinger: LinkSUM: Using Link Analysis to Summarize Entity Data, In Web Engineering: 16th
International Conference, ICWE 2016. Proceedings, vol. 9671 of Lecture Notes in Computer Science, pages 244–261. Springer, 2016
Andreas Thalhammer and Achim Rettinger: Browsing DBpedia Entities with Summaries. The Semantic Web: ESWC 2014 Satellite Events,
Lecture Notes in Computer Science 2014, pages 511-515, Springer 2014
Andreas Thalhammer and Achim Rettinger: PageRank on Wikipedia: Towards General Importance Scores for Entities. In The Semantic
Web: ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 – June 2, 2016, Revised Selected Papers, pages 227–240. Springer,
2016.
Contribution 2
Andreas Thalhammer, Ioan Toma, Antonio J. Roa-Valverde, Dieter Fensel: Leveraging Usage Data for Linked Data Movie Entity
Summarization. In Proceedings of the 2nd International Workshop on Usage Analysis and the Web of Data (USEWOD’12), 2012.
Andreas Thalhammer, Magnus Knuth, Harald Sack: Evaluating Entity Summarization Using a Game-Based Ground Truth. In International
Semantic Web Conference (2), vol. 7650, pages 350–361. Springer, 2012.
Contribution 3
Antonio Roa-Valverde, Andreas Thalhammer, Ioan Toma, and Miguel-Angel Sicilia: Towards a formal model for sharing and reusing
ranking computations. In Proceedings of the 6th International Workshop on Ranking in Databases In conjunction with VLDB 2012.
Andreas Thalhammer and Steffen Stadtmüller. SUMMA: A Common API for Linked Data Entity Summaries. In P. Cimiano, F. Frasincar,
G.-J. Houben, and D. Schwabe, editors, Engineering the Web in the Big Data Era, vol. 9114, pages 430-446. Springer, 2015.
Andreas Thalhammer, Achim Rettinger: ELES: Combining Entity Linking and Entity Summarization. In Web Engineering: 16th International
Conference, ICWE 2016. Proceedings, vol. 9671 of Lecture Notes in Computer Science, pages 547–550. Springer, 2016
Contribution 4
Andreas Thalhammer, Steffen Thoma, Andreas Harth: Entity-Centric Claim Reconciliation in Web Data, Submitted to WWW 2017.
References
[1] A. Singhal. Introducing the knowledge graph: things, not strings. http://goo.gl/kH1NKq, 2012.
[2] G. Cheng, T. Tran, and Y. Qu. RELIN: relatedness and informativeness-based centrality for entity summarization. In Proc. of the 10th int. conf. on The Semantic Web - Vol. Part I, ISWC’11. Springer, 2011.
[3] K. Gunaratna, K. Thirunarayan, and A. P. Sheth. FACES: diversity-aware entity summarization using incremental hierarchical conceptual clustering. In Proc. of the 29th AAAI Conf. Artificial Intelligence, Austin, Texas, USA, 2015.
[4] D. Dimitrov, P. Singer, F. Lemmerich, M. Strohmaier. What Makes a Link Successful on Wikipedia? https://arxiv.org/abs/1611.02508
[5] S. Brin and L. Page. The Anatomy of a Large-scale Hypertextual Web Search Engine. In Proceedings of the Seventh International Conference on World Wide Web 7, WWW7, pages 107–117. Elsevier Science Publishers B. V., Amsterdam, The Netherlands, 1998.
[6] R. Baeza-Yates and E. Davis. Web Page Ranking Using Link Attributes. In Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters, WWW Alt. ’04, pages 328–329, New York, NY, USA, 2004. ACM.
[7] J. Waitelonis and H. Sack. Towards exploratory video search using linked data. Multimedia Tools and Applications, 59:645–672, 2012. 10.1007/s11042-011-0733-1.
[8] An art draw drawn by Felipe Micaroni Lalli (micaroni@gmail.com).

Linked Data Entity Summarization (PhD defense)

  • 1. KIT – The Research University in the Helmholtz Association INSTITUTE OF APPLIED INFORMATICS AND FORMAL DESCRIPTION METHODS (AIFB) www.kit.edu Linked Data Entity Summarization Dipl.-Inf. Univ. Andreas Thalhammer 08.12.2016
  • 2. Institute of Applied Informatics and Formal Description Methods (AIFB) 2 Outline 1. Motivation 2. Research Questions 3. Contributions a) LinkSUM (Contribution 1) b) SUMMA API (Contribution 3) 4. Related Work 5. Summary and Outlook Andreas Thalhammer – Linked Data Entity Summarization03.10.2018
  • 3. Institute of Applied Informatics and Formal Description Methods (AIFB) 3 Andreas Thalhammer – Linked Data Entity Summarization03.10.2018 1. MOTIVATION
  • 4. Institute of Applied Informatics and Formal Description Methods (AIFB) 4 Information need versus availability Information need (in the US*) More than 40% of all search queries are focused on one specific entity. 579 million searches per day come from home and work devices in the US every day. ~ 232 million searches for entities (every day; in the US; desktop) Information availability (Wikidata**) Wikidata covers 24.5 million entities (growth of 55% in last year). 3.2 million entities have > 10 statements (growth of 78% in last year). Andreas Thalhammer – Linked Data Entity Summarization03.10.2018 * https://www.comscore.com/Insights/Rankings/comScore-Releases-February-2016-US-Desktop-Search-Engine-Rankings ** https://www.wikidata.org/wiki/Wikidata:Statistics
  • 5. Institute of Applied Informatics and Formal Description Methods (AIFB) 5 Wikidata entry for Pulp Fiction ~ 614 facts Andreas Thalhammer – Linked Data Entity Summarization03.10.2018 Growing amount of structured data on the Web
  • 6. Institute of Applied Informatics and Formal Description Methods (AIFB) 6 Naïve solution: Entity presentation based on class summaries Andreas Thalhammer – Linked Data Entity Summarization03.10.2018 (Source: yahoo.com)
  • 7. Institute of Applied Informatics and Formal Description Methods (AIFB) 7 Problems of class summaries 1. The patterns are very static and do not reflect the individual particularities of entities. 2. A pattern needs to be created for each type and class hierarchies need to be considered. 3. Some entities are of multiple (distinct) types with unclear main type. 4. Some of the properties can have many values for which no ranking or cut-off is defined. Andreas Thalhammer – Linked Data Entity Summarization03.10.2018 Person Athlete Body builder Arnold Schwarzenegger Angkor Wat
  • 8. Institute of Applied Informatics and Formal Description Methods (AIFB) 8 Entity Summarization Propositions: Every entity is individual. For different entities, different properties are of importance. Entities of the same type do not always have the same attributes. For each entity, a single property-value pair can be of different relevance. Solution: Focus on individual particularities of each entity: Entity Summarization Andreas Thalhammer – Linked Data Entity Summarization03.10.2018
  • 9. Institute of Applied Informatics and Formal Description Methods (AIFB) 9 Andreas Thalhammer – Linked Data Entity Summarization03.10.2018 2. RESEARCH QUESTIONS
  • 10. Institute of Applied Informatics and Formal Description Methods (AIFB) 10 Challenge #1 Andreas Thalhammer – Linked Data Entity Summarization03.10.2018 RQ1: How can we effectively summarize entities with limited background information? RQ1.1: How can we use link analysis effectively in order to derive summaries of entities? RQ1.2: How can we use usage data analysis effectively in order to derive summaries of entities? RDF data typically does not reflect importance levels in its relations. Proprietary entity summarization systems have access to a lot of data (e.g., search queries) and infrastructure (e.g., a full Web index). Other knowledge panel providers (such as publishers) are lacking that information and infrastructure. (Source: google.com)
  • 11. Challenge #2. RQ2: Is there a minimum set of re-occurring/common features of entity summarization systems that allows us to provide a generic API? Providers of knowledge panels hide the original graph structure behind strongly abstracted interfaces. Standardized programmatic access is desirable (but not available). (Source: google.com) (Source: developers.google.com/knowledge-graph)
  • 12. Challenge #3. RQ3: How can we align duplicate/similar facts about Linked Data entities on the Web? Different Web sources provide structured information about a single entity. The different sources often cover similar information but do not provide corresponding links or vocabulary mappings. Alignments are particularly difficult as the sources typically provide data at different levels of modeling granularity. (Source: imdb.com) (Source: wikidata.org)
  • 13. 3. CONTRIBUTIONS
  • 14. Overview: Research Questions and Contributions. [Overview diagram labels: Knowledge Base(s), Input, Output, UI; contributions 1 LinkSUM (link structure), 2 UBES (usage data), 3 SUMMA API, 4 Entity Data Fusion.] RQ1: How can we effectively summarize entities with limited background information? RQ1.1: How can we use link analysis effectively in order to derive summaries of entities? (Contribution 1) RQ1.2: How can we use usage data analysis effectively in order to derive summaries of entities? (Contribution 2) RQ2: Is there a minimum set of re-occurring/common features of entity summarization systems that allows us to provide a generic API? (Contribution 3) RQ3: How can we align duplicate/similar facts about Linked Data entities on the Web? (Contribution 4)
  • 15. Linked Data Entity Summarization: Contribution 1. [Overview diagram repeated.]
  • 16. LinkSUM. Idea: Use link analysis for selecting facts. Step 1: Select top-k important related resources. Step 2: Select the most relevant connecting predicate.
  • 17. Approach: Resource Selection. [Figure: Pulp Fiction and Quentin Tarantino, connected via director.] Compute PageRank [5] scores of entities with (un-typed) links that occur in textual descriptions of entities (pr). Use “Backlinks” [7] (also called “mutual links”) for finding strong connections (bl) [formula not preserved in this transcript]. Combine scores [formula not preserved in this transcript]. Scores listed on the slide: dbpedia:Category:English-language_films 220.961; dbpedia:Quentin_Tarantino 13.7403; dbpedia:John_Travolta 10.5771; dbpedia:Miramax_Films 9.9398; ...
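
The resource selection step on this slide can be sketched in Python. The formulas for bl and for the score combination did not survive in this transcript, so everything beyond "rank linked resources by PageRank and backlinks" is an assumption: bl is taken as 1 when two entities link to each other, PageRank is normalized by the maximum among the candidates, and the two are mixed linearly with a hypothetical weight `alpha`. The names `select_resources`, `links`, and `alpha` are illustrative, and the link data is a toy example (only the three score values are taken from the slide).

```python
def select_resources(entity, links, pagerank, k=5, alpha=0.5):
    """Rank the resources linked from `entity` and return the top k.

    links:    dict mapping each resource to the set of resources it links to
    pagerank: dict mapping each resource to its PageRank score (pr)
    alpha:    hypothetical mixing weight between pr and bl (assumption)
    """
    candidates = links.get(entity, set())
    if not candidates:
        return []
    # Normalize PageRank by the largest candidate score (assumption).
    max_pr = max(pagerank.get(r, 0.0) for r in candidates) or 1.0
    scored = []
    for r in candidates:
        pr = pagerank.get(r, 0.0) / max_pr
        # Backlink ("mutual link"): r also links back to the entity.
        bl = 1.0 if entity in links.get(r, set()) else 0.0
        scored.append((alpha * pr + (1 - alpha) * bl, r))
    # Highest score first; ties are broken by name for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [r for _, r in scored[:k]]


# Toy link graph (invented for illustration).
links = {
    "Pulp_Fiction": {"Quentin_Tarantino", "John_Travolta", "English_language_films"},
    "Quentin_Tarantino": {"Pulp_Fiction"},
    "John_Travolta": {"Pulp_Fiction"},
    "English_language_films": set(),
}
pagerank = {
    "English_language_films": 220.961,
    "Quentin_Tarantino": 13.7403,
    "John_Travolta": 10.5771,
}
top = select_resources("Pulp_Fiction", links, pagerank, k=2)
```

In this toy setup the category page has by far the highest PageRank but no backlink, so the two mutually linked resources rank above it. That is the kind of effect the backlink signal is meant to produce, though the actual LinkSUM weighting may differ.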
  • 18. Approach: Relation Selection. Problem: multiple relations can connect the same pair of resources (e.g., Quentin Tarantino is both director and writer of Pulp Fiction). Approaches: Frequency (FRQ): number of times the predicate is used. Exclusivity (EXC): 1 / (N + M). Description (DSC): #domain + #range + #label. Combinations of those are also possible, e.g. (FRQ * EXC).
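
A minimal sketch of the relation selection step, assuming the data is available as a list of (subject, predicate, object) triples. The slide does not define N and M; here they are assumed to be the number of triples that share the predicate with the source as subject (N) and with the target as object (M). The function name `rank_predicates` and the toy triples are hypothetical.

```python
from collections import Counter


def rank_predicates(triples, source, target):
    """Return (predicate, score) pairs for predicates connecting source to target,
    sorted by FRQ * EXC in descending order."""
    # FRQ: how often each predicate is used in the whole dataset.
    frq = Counter(p for _, p, _ in triples)
    connecting = {p for s, p, o in triples if s == source and o == target}
    scores = {}
    for p in connecting:
        # Assumed N: triples (source, p, ?). Assumed M: triples (?, p, target).
        n = sum(1 for s, q, _ in triples if s == source and q == p)
        m = sum(1 for _, q, o in triples if o == target and q == p)
        exc = 1.0 / (n + m)  # EXC as given on the slide
        scores[p] = frq[p] * exc
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy triples (invented for illustration).
triples = [
    ("Pulp_Fiction", "director", "Quentin_Tarantino"),
    ("Pulp_Fiction", "writer", "Quentin_Tarantino"),
    ("Kill_Bill", "director", "Quentin_Tarantino"),
    ("Kill_Bill", "writer", "Quentin_Tarantino"),
    ("Jackie_Brown", "director", "Quentin_Tarantino"),
]
ranked = rank_predicates(triples, "Pulp_Fiction", "Quentin_Tarantino")
```

With these toy triples, `director` (FRQ 3, EXC 1/4) scores 0.75 and ranks above `writer` (FRQ 2, EXC 1/3).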
  • 19. Quantitative Evaluation: Dataset and Measures. Used reference dataset: Introduced in Gunaratna et al. [3]. Contains human-created summaries of 50 entities (DBpedia 3.9, outgoing relations). Includes seven top-5 and seven top-10 summaries for each entity. The dataset was created by 15 experts from the Semantic Web field. Used similarity measure: [formula not preserved in this transcript]. Reference system: FACES (introduced in [3]).
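
The similarity measure itself is missing from this transcript. As a rough sketch, assume it is the average overlap between a system summary and each of the human reference summaries; that is an assumption, not the slide's confirmed formula. The `mode` switch mirrors the SO/SPO distinction used in the results (subject-object pairs versus full triples). The name `summary_quality` and the toy summaries are hypothetical.

```python
def summary_quality(system_summary, reference_summaries, mode="SPO"):
    """Average number of facts shared with each reference summary.

    mode="SPO" compares full triples; mode="SO" ignores the predicate.
    """
    def key(triple):
        s, p, o = triple
        return (s, o) if mode == "SO" else (s, p, o)

    system = {key(t) for t in system_summary}
    overlaps = [len(system & {key(t) for t in ref}) for ref in reference_summaries]
    return sum(overlaps) / len(overlaps)


# Toy summaries (invented for illustration).
system = [("PF", "director", "QT"), ("PF", "starring", "JT")]
references = [
    [("PF", "writer", "QT"), ("PF", "starring", "JT")],
    [("PF", "director", "QT"), ("PF", "producer", "LB")],
]
spo = summary_quality(system, references, mode="SPO")
so = summary_quality(system, references, mode="SO")
```

The SO score can only be equal to or higher than the SPO score, since a matching full triple always implies a matching subject-object pair.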
  • 20. Quantitative Evaluation: Results. [Results table not preserved in this transcript.] Legend: SO: subject-object pairs (predicates not considered). SPO: full triple. SD: standard deviation. config-1 and config-2: LinkSUM configurations [definitions not preserved]. Markers indicate significance with respect to both LinkSUM configurations (p < 0.05) and significance with respect to the best LinkSUM configuration (p < 0.05).
  • 21. Qualitative Evaluation: Setup. Scenario: Search Engine Result Page (SERP). 20 users, 10 entities (from the FACES dataset).
  • 22. Qualitative Evaluation: Results. In some cases the task is subjective. Reasons for selection: the presented related resources are relevant for the entity. Reasons for rejection: redundancy; related resources do not characterize the entity.
  • 23. Focus: PageRank (1). PageRank is not perfect, for example:
      PREFIX v: <http://purl.org/voc/vrank#>
      SELECT ?e ?r
      FROM <http://dbpedia.org>
      FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
      WHERE { ?e rdf:type dbo:Scientist; v:hasRank/v:rankValue ?r. }
      ORDER BY DESC(?r) LIMIT 5
    Result: dbpedia:Carl_Linnaeus 551.791; dbpedia:Charles_Darwin 215.028; dbpedia:Albert_Einstein 186.549; dbpedia:Isaac_Newton 167.811; dbpedia:Sigmund_Freud 140.245
  • 24. Focus: PageRank (2). Important parameters (for resources r): l(r): returns all pages that link to r. c(r): the number of outgoing links of r. d: the damping factor. Traditional PageRank [5]: [formula not preserved in this transcript]. Variant: Weighted Links Rank (WLRank) [6]: [formula not preserved in this transcript]. Link weights (lw): relative position of a link in the article [8].
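
Since the formulas on this slide are not preserved, here is a small sketch using the commonly cited non-normalized form of PageRank, PR(r) = (1 - d) + d * sum over p in l(r) of PR(p) / c(p). The weighted variant shown is an assumption about how link weights could enter: each page's score is split among its outgoing links in proportion to lw instead of equally. It is not a verified reproduction of WLRank [6]. The function name `pagerank`, the `weights` parameter, and the toy graphs are illustrative.

```python
def pagerank(out_links, d=0.85, iterations=100, weights=None):
    """Iterative PageRank over a dict: node -> list of nodes it links to.

    weights: optional dict (source, target) -> link weight lw. When omitted,
             every outgoing link gets the same share 1 / c(source).
    """
    nodes = set(out_links) | {t for targets in out_links.values() for t in targets}
    pr = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new = {n: 1.0 - d for n in nodes}
        for src, targets in out_links.items():
            if not targets:
                continue  # pages without outgoing links pass on nothing
            w = [1.0 if weights is None else weights[(src, t)] for t in targets]
            total = sum(w)
            for t, wt in zip(targets, w):
                # Unweighted case: wt / total == 1 / c(src).
                new[t] += d * pr[src] * wt / total
        pr = new
    return pr


# Toy graphs (invented for illustration).
plain = pagerank({"A": ["B"], "B": ["A"], "C": ["A"]})
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
unweighted = pagerank(graph)
weighted = pagerank(graph, weights={("A", "B"): 3.0, ("A", "C"): 1.0,
                                    ("B", "A"): 1.0, ("C", "A"): 1.0})
```

In the second toy graph, B and C receive identical scores without weights; once the link from A to B is weighted three times as heavily as the link from A to C, B ranks higher.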
  • 25. Focus: PageRank (3). Newly constructed rankings: ALL: all links from the article text and from the templates. ATL: article text links. TEL: template links. ATL-RP: article text links with WLRank and relative position. Size of input dataset (# links): ALL 159,398,815; ATL 142,305,605; TEL 26,460,273; ATL-RP 143,056,545. Reference rankings (page-view-based): TOWR-PV: “The Open Wikipedia Ranking”. SUB: SubjectiveEye3D by Paul Houle.
  • 26. Focus: PageRank (4). Measure: Spearman rank correlation (range: [-1, 1]). Results: [table not preserved in this transcript]. Conclusions: The poor correlation of TEL with TOWR-PV/SUB is the result of a small input data set. Weighting by relative position improves correlation to SUB. These findings are supported by [4].
  • 27. Conclusions and Impact. Conclusions: LinkSUM significantly outperforms the state of the art. Entity summarization: The focus should be on selecting relevant resources. Redundancies at the object level should be avoided. LinkSUM is lightweight and can be applied in other scenarios, e.g. Web sites with semantic annotations and Semantic MediaWikis. Impact: Published and presented as a full research paper at ICWE 2016. The PageRank scores are published online and have found many adopters (e.g., the official DBpedia SPARQL endpoint includes the scores). In use at the WDAqua project (http://wdaqua.eu/).
  • 28. Linked Data Entity Summarization: Contribution 3. [Overview diagram repeated.]
  • 29. SUMMA API. Idea: A common API for entity summaries. Use cases listed on the slide: Quantitative evaluation. Qualitative evaluation. A/B testing. Combination of summary services.
  • 30. Approach: SUMMA API. Parameters: URI (of the entity e): the entity needs to be identified. k (number): an upper limit of facts related to e. Multi-language support. Statement groups (e.g., biographical data). Restriction to specific properties. Multi-hop search space. SUMMA Vocabulary (diagram labels): summa:Summary with summa:entity (rdfs:Resource), summa:topK (xsd:positiveInteger), summa:language (xsd:String), summa:fixedProperty (rdf:Property), summa:maxHops (xsd:positiveInteger), summa:statement (rdf:Statement), summa:group (summa:SummaryGroup), and summa:path. [Example graph labels: PF, JT, VV, actor, role, starring, blank node _:]
  • 31. Approach: SUMMA API. SUMMA RESTful Interaction: (1) Client to server: POST [ a :Summary; :entity dbpedia:Barack_Obama; :topK 10 ] . (2) Server to client: 201 CREATED, Location: http://example.com/summary?entity=dbpedia:Barack_Obama&topK=10, body: @prefix summa: <http://purl.org/voc/summa/> . ... (3) Client to server: GET http://example.com/summary?entity=dbpedia:Barack_Obama&topK=10 (4) Server to client: 200 OK, body: @prefix summa: <http://purl.org/voc/summa/> . ...
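
The client side of this interaction can be sketched without any network access. The helpers below only build the POST body shown on the slide and the summary URL that the server answers with in its Location header. How a real SUMMA server derives that URL is an assumption here, prefix declarations are omitted as on the slide, and the function names and the `base_url` value are hypothetical.

```python
from urllib.parse import urlencode


def summary_request_body(entity, top_k):
    """Turtle body of the POST request (prefix declarations omitted, as on the slide)."""
    return f"[ a :Summary; :entity {entity}; :topK {top_k} ] ."


def summary_location(base_url, entity, top_k):
    """URL of the created summary resource, as returned in the Location header.

    Assumption: the server mirrors the request parameters in the query string.
    """
    # safe=":" keeps the prefixed name readable (dbpedia:Barack_Obama).
    query = urlencode({"entity": entity, "topK": top_k}, safe=":")
    return f"{base_url}?{query}"


body = summary_request_body("dbpedia:Barack_Obama", 10)
location = summary_location("http://example.com/summary", "dbpedia:Barack_Obama", 10)
```

A client would POST `body`, read the Location header from the 201 response, and then GET that URL to retrieve the summary. Because the summary lives at its own URL, it can be fetched repeatedly with ordinary GET requests.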
  • 32. Analysis: Setup. Question: Can the user interfaces be generated with data from the SUMMA API without changing their layout? Search engines: Google Knowledge Graph, Microsoft Bing Satori/Snapshots, Yahoo Knowledge. News portals (Alexa Top 25 News sites): Forbes, BBC News.
  • 33. Analysis: Criteria. Features: 1. Property Restriction 2. Statement Groups 3. Multi-hop Search Space 4. Languages. Five entities: Spain (country), Dirk Nowitzki (person/athlete), Ramones (band), SAP (company/organization), Inglourious Basterds (movie). (Source: http://google.com)
  • 34. Analysis: Results. Which features were required by the respective system? [Results table not preserved in this transcript.]
  • 35. Conclusions and Impact. Conclusions: Decouple the user interface from the actual entity summarization system by defining a common API. Light-weight and extensible vocabulary and interaction mechanism. Reference implementations and their source code are publicly available. Empirical analysis demonstrates applicability in real-world scenarios. Impact: Published and presented as a full research paper at ICWE 2015. Best Paper Candidate at ICWE 2015. Best Demo Award at ICWE 2016.
  • 36. 4. RELATED WORK
  • 37. Related Work. Who else is working on this? Google [1], Microsoft, Yahoo, etc. (RDF + lots of background data). Other researchers in the field of the Semantic Web, e.g. Cheng et al. [2], Gunaratna et al. [3] ((only) RDF data). What distinguishes the presented work from theirs? LinkSUM is a lightweight and effective approach. UBES is the first approach that uses usage data for entity summarization. SUMMA API: first and currently only API definition that enables the exchange of entity summaries. Entity Data Fusion: first approach that focuses on general alignment of structured entity data on the Web.
  • 38. 5. SUMMARY AND OUTLOOK
  • 39. Summary. We provided contributions for Linked Data Entity Summarization. Impact was created on the levels of research and dataset/system adoption. Combination with entity linking is possible. The addressed problem is highly relevant for search and question answering engines.
  • 40. Outlook. Full integration of the entity data fusion approach. Addressing literal values. Personalized/contextualized summaries of entities. Abstract entity summarization.
  • 41. Questions?
  • 42. Publications
    Contribution 1
      Andreas Thalhammer, Nelia Lasierra, Achim Rettinger: LinkSUM: Using Link Analysis to Summarize Entity Data. In Web Engineering: 16th International Conference, ICWE 2016. Proceedings, vol. 9671 of Lecture Notes in Computer Science, pages 244–261. Springer, 2016.
      Andreas Thalhammer and Achim Rettinger: Browsing DBpedia Entities with Summaries. In The Semantic Web: ESWC 2014 Satellite Events, Lecture Notes in Computer Science, pages 511–515. Springer, 2014.
      Andreas Thalhammer and Achim Rettinger: PageRank on Wikipedia: Towards General Importance Scores for Entities. In The Semantic Web: ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 – June 2, 2016, Revised Selected Papers, pages 227–240. Springer, 2016.
    Contribution 2
      Andreas Thalhammer, Ioan Toma, Antonio J. Roa-Valverde, Dieter Fensel: Leveraging Usage Data for Linked Data Movie Entity Summarization. In Proceedings of the 2nd International Workshop on Usage Analysis and the Web of Data (USEWOD’12), 2012.
      Andreas Thalhammer, Magnus Knuth, Harald Sack: Evaluating Entity Summarization Using a Game-Based Ground Truth. In International Semantic Web Conference (2), vol. 7650, pages 350–361. Springer, 2012.
    Contribution 3
      Antonio Roa-Valverde, Andreas Thalhammer, Ioan Toma, and Miguel-Angel Sicilia: Towards a formal model for sharing and reusing ranking computations. In Proceedings of the 6th International Workshop on Ranking in Databases, in conjunction with VLDB 2012.
      Andreas Thalhammer and Steffen Stadtmüller: SUMMA: A Common API for Linked Data Entity Summaries. In P. Cimiano, F. Frasincar, G.-J. Houben, and D. Schwabe, editors, Engineering the Web in the Big Data Era, vol. 9114, pages 430–446. Springer, 2015.
      Andreas Thalhammer, Achim Rettinger: ELES: Combining Entity Linking and Entity Summarization. In Web Engineering: 16th International Conference, ICWE 2016. Proceedings, vol. 9671 of Lecture Notes in Computer Science, pages 547–550. Springer, 2016.
    Contribution 4
      Andreas Thalhammer, Steffen Thoma, Andreas Harth: Entity-Centric Claim Reconciliation in Web Data. Submitted to WWW 2017.
  • 43. References
    [1] A. Singhal. Introducing the knowledge graph: things, not strings. http://goo.gl/kH1NKq, 2012.
    [2] G. Cheng, T. Tran, and Y. Qu. RELIN: relatedness and informativeness-based centrality for entity summarization. In Proc. of the 10th Int. Conf. on The Semantic Web, Vol. Part I, ISWC’11. Springer, 2011.
    [3] K. Gunaratna, K. Thirunarayan, and A. P. Sheth. FACES: diversity-aware entity summarization using incremental hierarchical conceptual clustering. In Proc. of the 29th AAAI Conf. on Artificial Intelligence, Austin, Texas, USA, 2015.
    [4] D. Dimitrov, P. Singer, F. Lemmerich, M. Strohmaier. What Makes a Link Successful on Wikipedia? https://arxiv.org/abs/1611.02508
    [5] S. Brin and L. Page. The Anatomy of a Large-scale Hypertextual Web Search Engine. In Proceedings of the Seventh International Conference on World Wide Web 7, WWW7, pages 107–117. Elsevier Science Publishers B. V., Amsterdam, The Netherlands, 1998.
    [6] R. Baeza-Yates and E. Davis. Web Page Ranking Using Link Attributes. In Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters, WWW Alt. ’04, pages 328–329, New York, NY, USA, 2004. ACM.
    [7] J. Waitelonis and H. Sack. Towards exploratory video search using linked data. Multimedia Tools and Applications, 59:645–672, 2012. 10.1007/s11042-011-0733-1.
    [8] A drawing by Felipe Micaroni Lalli (micaroni@gmail.com).

Editor's Notes

  • #2: Good afternoon. I would like to welcome the committee and the audience to my PhD defense. My name is Andreas Thalhammer, and the title of my PhD thesis is “Linked Data Entity Summarization”.
  • #5: Wikidata is a Wikimedia project ...
  • #6: Roughly 600 facts. Now you could say: that’s too much, just show me the top part.
  • #7: Show facts in a common order: release date, rating, ... This seems reasonable. But the second one is missing an important part: “it was the first animated feature film by Walt Disney; it is based on a fairy tale”.
  • #8: Arnold Schwarzenegger – body builder, actor, politician Angkor Wat – tourist attraction, human-built structure, Hindu and Buddhist temple
  • #9: Example: for Snow White the production company is of particular importance; for Pulp Fiction not so much. Ocean: Sri Lanka has one (Indian Ocean); Austria doesn’t. If two movies have John Travolta as an actor, it might be more important for one and not so important for the other.
  • #12: So why is it desirable: exchange, combine and remix summaries. Evaluate summaries in different ways.
  • #25: Baeza-Yates
  • #38: Filling the gap between approaches that have large amounts of background data and those that only use RDF.