Keyword-Based Navigation and Search
over the Linked Data Web
Luca Matteis¹, Aidan Hogan², Roberto Navigli¹
¹ Sapienza University of Rome
² University of Chile
General idea
• Browse the live linked data web using keywords
• Predicate resolution along the navigation to
increase matches
• Results are streamed back to users as quickly as
possible
• We measure how fast relevant triples are found at
each step of the navigation
Navigation
• Navigation starts from a list of starting URIs
• Users/agents provide keywords to search against
and guide the navigation
• Navigation is structured using a streaming
pipeline
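The navigation described above can be sketched as a chain of lazy generators, one per keyword. This is only an illustrative sketch, not the authors' implementation: `TOY_WEB`, `dereference`, `hop` and `navigate` are hypothetical names, the in-memory dictionary stands in for dereferencing URIs on the live web, and substring matching on the predicate is an assumed matching rule.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical in-memory stand-in for the live Linked Data web.
TOY_WEB: Dict[str, List[Triple]] = {
    "ex:film": [("ex:film", "director", "ex:alice"),
                ("ex:film", "title", "A Film")],
    "ex:alice": [("ex:alice", "known for", "ex:film2"),
                 ("ex:alice", "name", "Alice")],
    "ex:film2": [("ex:film2", "title", "Another Film")],
}


def dereference(uri: str) -> List[Triple]:
    """Return the triples found at a URI (empty if nothing is known)."""
    return TOY_WEB.get(uri, [])


def hop(uris: Iterable[str], keyword: str) -> Iterator[Triple]:
    """Yield each matching triple as soon as its source URI is resolved."""
    for uri in uris:
        for s, p, o in dereference(uri):
            if keyword.lower() in p.lower():
                yield (s, p, o)


def navigate(start_uris: Iterable[str], keywords: List[str]) -> Iterator[Triple]:
    """Chain one hop per keyword; the objects of one hop feed the next."""
    if not keywords:
        return iter(())
    uris: Iterator[str] = iter(start_uris)
    for keyword in keywords[:-1]:
        uris = (o for _, _, o in hop(uris, keyword))
    return hop(uris, keywords[-1])


if __name__ == "__main__":
    for triple in navigate(["ex:film"], ["director", "known"]):
        print(triple)
```

Because every stage is a generator, a triple matched by the last keyword can be handed to the user before the earlier stages have finished resolving their remaining URIs.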
Search
• Search occurs at each element of the pipeline
• Several RDF keyword search algorithms can be used
• Predicate resolution is used to increase the number of
matches
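A minimal sketch of why predicate resolution can increase matches, assuming it means looking up a human-readable label for each predicate URI and matching the keyword against that label instead of the raw URI. The label table, the function names and the example triples are all hypothetical; the opaque predicate names are borrowed from the SPARQL example later in the deck.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical labels, as if obtained by dereferencing each predicate URI.
PREDICATE_LABELS: Dict[str, str] = {
    "onto:mov100": "movement",
    "my:lab": "name",
}


def resolve_predicate(predicate: str) -> str:
    """Return the label of a predicate, falling back to the URI itself."""
    return PREDICATE_LABELS.get(predicate, predicate)


def search(triples: List[Triple], keyword: str, resolve: bool = True) -> List[Triple]:
    """Keep triples whose predicate (or its resolved label) contains the keyword."""
    needle = keyword.lower()
    matches = []
    for s, p, o in triples:
        text = resolve_predicate(p) if resolve else p
        if needle in text.lower():
            matches.append((s, p, o))
    return matches


if __name__ == "__main__":
    data = [("ex:artist", "onto:mov100", "ex:cubism"),
            ("ex:cubism", "my:lab", "Cubism")]
    print(search(data, "movement", resolve=False))  # no match on the raw URI
    print(search(data, "movement"))                 # match on the resolved label
```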
SWGET comparison
• SWGET is an implementation of the NautiLOD
navigational language
• It allows triples to be filtered (through SPARQL) at each
step of the navigation
• We show that our pipeline streaming approach
results in faster response times
Results
• Total response time is under 10 seconds (varies
based on the number of keywords)
• Navigation hop time averages ~5 seconds
Discussion
• Results suggest that keyword navigation is
achievable, although somewhat slow.
• Experiments were on the live linked data web!
Servers optimized for concurrency and high
throughput (triple pattern fragments) might yield
faster response times.
Final remarks
• Our approach incentivizes publishers to enrich their
structured data (using predicates with meaningful
descriptions)
• Concurrent resolution of many URIs at runtime to
find answers to queries is becoming more and
more viable; increase in bandwidth is going to
make this even more usable
• Upfront querying may not be the only way we
query the Web of Linked Data
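The idea of resolving many URIs concurrently at runtime can be sketched with Python's standard thread pool. This is an assumption-laden illustration rather than the system used in the experiments: `resolve_concurrently` and `toy_fetch` are hypothetical names, and `toy_fetch` simulates network latency with `time.sleep` instead of issuing HTTP requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical per-URI latencies, in seconds.
TOY_DELAYS = {"ex:slow": 0.05, "ex:fast": 0.0}


def toy_fetch(uri: str) -> List[Triple]:
    """Stand-in for an HTTP dereference; one URI is made to fail."""
    time.sleep(TOY_DELAYS.get(uri, 0.0))
    if uri == "ex:broken":
        raise IOError("unreachable")
    return [(uri, "label", uri.upper())]


def resolve_concurrently(
    uris: Iterable[str],
    fetch: Callable[[str], List[Triple]],
    max_workers: int = 8,
) -> Iterator[Tuple[str, List[Triple]]]:
    """Yield (uri, triples) pairs in completion order, not submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, uri): uri for uri in uris}
        for future in as_completed(futures):
            uri = futures[future]
            try:
                yield uri, future.result()
            except Exception:
                # A failing source should not stall the rest of the navigation.
                yield uri, []


if __name__ == "__main__":
    for uri, triples in resolve_concurrently(["ex:slow", "ex:fast", "ex:broken"], toy_fetch):
        print(uri, triples)
```

Yielding in completion order means a slow or unreachable server delays only its own results.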
Use case
dir suggestions
codirector (8)
redirection (4)
director (1)
nadir (1)
…
Use case
director 1 triple found (view)
Use case
director 1 triple found (view)
know suggestions
known for (17)
knows (6)
knowledge of (5)
…
Use case
director 1 triple found (view)
known for 17 triples found (view)
Use case
director 1 triple found (view)
known for 17 triples found (view)
act suggestions
actor (56)
abstract (48)
…
Use case
director 1 triple found (view)
known for 17 triples found (view)
actor 56 triples found (view)
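The suggestion lists in this use case ("dir" leading to codirector (8), redirection (4), and so on) could be produced along the following lines. The sketch assumes that suggestions are predicate labels containing the typed fragment, counted by number of triples and ordered by descending count; both the matching rule and the ordering are inferred from the example, and `suggest` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def suggest(labels: Iterable[str], fragment: str) -> List[Tuple[str, int]]:
    """Count predicate labels containing the fragment, most frequent first."""
    needle = fragment.lower()
    counts = Counter(label for label in labels if needle in label.lower())
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # One label per triple seen so far; the counts mirror the example above.
    seen = ["codirector"] * 8 + ["redirection"] * 4 + ["director", "nadir", "actor"]
    for label, count in suggest(seen, "dir"):
        print(f"{label} ({count})")
```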
Users don't have to input URIs
(as they do when writing SPARQL)
Nor do they have to know the exact
structure of the underlying dataset
(they simply type keywords)
SELECT * {
  <http://viaf.org/viaf/177603646>
      onto:mov100 ?movement .
  ?movement my:lab ?label .
}
http://viaf.org/viaf/177603646 /
movement /
name
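A keyword path like the one above is easy to split into a starting URI and a list of keyword steps. The sketch below assumes that steps are separated by a slash surrounded by whitespace, which keeps the slashes inside the URI intact; `parse_path` is a hypothetical helper and the example URI is a placeholder.

```python
import re
from typing import List, Tuple


def parse_path(path: str) -> Tuple[str, List[str]]:
    """Split 'URI / keyword / keyword' into the start URI and its keyword steps."""
    # Only a slash with whitespace on both sides separates steps,
    # so the slashes inside the URI are left alone.
    parts = [part.strip() for part in re.split(r"\s+/\s+", path.strip())]
    if not parts or not parts[0]:
        raise ValueError("a keyword path must start with a URI")
    return parts[0], parts[1:]


if __name__ == "__main__":
    start, steps = parse_path("http://example.org/artist/1 /\nmovement /\nname")
    print(start)  # the starting URI
    print(steps)  # the keywords to follow, in order
```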
Query federation is built-in
(we're simply following links)
http://viaf.org/viaf/177603646 /
movement /
same as /
movement of /
born < 1960 /
same as freebase /
name
(path segments are resolved, in order, against VIAF, DBpedia and Freebase)
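To illustrate how following links yields federation for free, here is a toy walk over three made-up datasets. Everything in it is an assumption: the prefixed identifiers and triples are invented, predicates are matched by exact label for simplicity, and a step of the form "key < number" is read as a filter that keeps only the nodes whose value for that key is below the bound.

```python
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Invented data; the prefixes hint at which dataset would serve each URI.
TOY_WEB: Dict[str, List[Triple]] = {
    "viaf:1": [("viaf:1", "movement", "viaf:cubism")],
    "viaf:cubism": [("viaf:cubism", "same as", "dbp:Cubism")],
    "dbp:Cubism": [("dbp:Cubism", "movement of", "dbp:A"),
                   ("dbp:Cubism", "movement of", "dbp:B")],
    "dbp:A": [("dbp:A", "born", "1881"), ("dbp:A", "same as freebase", "fb:a")],
    "dbp:B": [("dbp:B", "born", "1975"), ("dbp:B", "same as freebase", "fb:b")],
    "fb:a": [("fb:a", "name", "Artist A")],
    "fb:b": [("fb:b", "name", "Artist B")],
}

# Matches filter steps such as "born < 1960".
FILTER = re.compile(r"^(?P<key>.+?)\s*<\s*(?P<bound>\d+)$")


def follow(nodes: List[str], step: str) -> List[str]:
    """Apply one path step: either follow a predicate or filter the current nodes."""
    match = FILTER.match(step)
    result: List[str] = []
    for node in nodes:
        triples = TOY_WEB.get(node, [])
        if match:
            key, bound = match["key"], int(match["bound"])
            if any(p == key and o.isdigit() and int(o) < bound for _, p, o in triples):
                result.append(node)
        else:
            result.extend(o for _, p, o in triples if p == step)
    return result


def run(start: str, steps: List[str]) -> List[str]:
    """Walk the path from the start node, crossing datasets wherever links lead."""
    nodes = [start]
    for step in steps:
        nodes = follow(nodes, step)
    return nodes


if __name__ == "__main__":
    path = ["movement", "same as", "movement of", "born < 1960", "same as freebase", "name"]
    print(run("viaf:1", path))
```

No step names a dataset explicitly; the walk moves from one source to the next simply because a "same as" link points there.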
Future work
• Develop a functioning app (browser extension or
add-on to Tabulator)
• Use third-party services to assist the navigation by
matching synonyms or translations (BabelNet,
WordNet)
• Use other third-party services to assist in the
disambiguation of words using the context of the
data acquired along the navigation (Babelfy)
• Better methods for effectively crawling Linked
Datasets at runtime (that don't strain servers and
provide quick response times)
Thanks!
@lmatteis
http://lucaa.org

