Linking Library Data
with Fusepool
Johannes Hercher (Free University Berlin)
June 25, 2014
@jhercher
Fusepool final public workshop!
Brussels, June 25th – Johannes Hercher, Free University Berlin, University Library
Context
I care for metadata.
Ugh! Your OPAC sucks.
We cooperate…
How to link library data with the „oceans“ of the WWW?
The German National Library has published authority data.
Example
A search in the subject index (with GND identifiers) and a search in full text: http://primo.fu-berlin.de
• GND = thesaurus for subject indexing in Germany
• Search with GND is limited to local resources
• Search beyond the local holdings => easier, more reliable
• Suggest content using semantic relations (GND is a thesaurus!)
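Because GND is a thesaurus, a search for one concept can be widened to neighbouring concepts. A minimal sketch of that idea, assuming the broader/related links have already been loaded into Python dictionaries: it expands a concept breadth-first up to a chosen number of hops. The concept IDs are hypothetical placeholders, not real GND records.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical excerpt of thesaurus relations (placeholder IDs, not real GND data).
BROADER: Dict[str, List[str]] = {
    "concept-A": ["concept-B"],
    "concept-B": ["concept-C"],
}
RELATED: Dict[str, List[str]] = {
    "concept-A": ["concept-D"],
}


def expand(concept_id: str, max_depth: int = 1) -> Set[str]:
    """Return concept_id plus every concept reachable within max_depth broader/related hops."""
    seen = {concept_id}
    queue = deque([(concept_id, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the requested distance
        for neighbour in BROADER.get(current, []) + RELATED.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen


print(sorted(expand("concept-A")))
print(sorted(expand("concept-A", max_depth=2)))
```

A depth of 1 keeps suggestions close to the original concept; larger depths pull in ever broader topics, so a recommender would probably rank or cap them.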
You* should use identifiers.
*publishers, authors, aggregators
Reality: assigning IDs is time-consuming.
Vision: assigning IDs is fun.
Questions & Tasks
• Could machines do the subject indexing?
  -> Use SMA to enrich DBpedia pages with GND IDs
• Can we support librarians in subject indexing?
  -> Build an annotator prototype
https://github.com/jhercher/LEE/
Demonstrator
AnnotatorApp: filters stopwords and displays library entities for your text
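The AnnotatorApp's internals are not described on the slides. As a hedged illustration of "filter stopwords, then show library entities", the sketch below lower-cases and tokenizes a text, drops stopwords, and looks the remaining tokens up in a label-to-GND dictionary. The stopword list and the single-token lookup are assumptions; the three GND URIs are the ones shown on the results slide. Note that "sich" is deliberately in both the dictionary and the stopword list, so filtering suppresses that unhelpful match.

```python
import re
from typing import List, Tuple

# Hypothetical stopword list; a real annotator would use a much fuller list.
STOPWORDS = {"der", "die", "das", "des", "im", "bei", "und", "sich"}

# Tiny label -> GND URI dictionary built from labels and IDs on the results slide.
DICTIONARY = {
    "ford": "http://d-nb.info/gnd/4302110-4",
    "arbeitnehmer": "http://d-nb.info/gnd/4002623-1",
    "sich": "http://d-nb.info/gnd/4578282-9",
}


def annotate(text: str) -> List[Tuple[str, str]]:
    """Lower-case and tokenize the text, drop stopwords, then look remaining tokens up."""
    tokens = re.findall(r"\w+", text.lower())
    content_tokens = [t for t in tokens if t not in STOPWORDS]
    return [(t, DICTIONARY[t]) for t in content_tokens if t in DICTIONARY]


print(annotate("Die Arbeitnehmer wehrten sich bei Ford"))
```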
Review concepts and start a search using concept IDs
https://github.com/jhercher/LEE
How to Fusepool
Workflow
1. Select a subset of GND Subject Headings using SPARQL
2. Import Subject Headings
3. Configure SMA dictionary component
4. Import documents (Graph)
5. Batch matching of documents with dictionaries using Fusepool's DLC
6. Review results and build services on top
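Step 5 is done by Fusepool's SMA inside the DLC. The following is not SMA's code, only a plain-Python sketch of what dictionary-based batch matching involves, assuming a greedy longest-match strategy over lower-cased word tokens. The two dictionary entries reuse GND URIs from the results slide; the documents are made up.

```python
import re
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Keep hyphenated words such as "drug-eluting" together.
    return re.findall(r"[\w-]+", text.lower())


def match_document(tokens: List[str], labels: Dict[Tuple[str, ...], str], max_len: int) -> Set[str]:
    """Greedy longest match: at each position, try the longest possible label first."""
    found: Set[str] = set()
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in labels:
                found.add(labels[key])
                i += n  # skip the tokens consumed by this match
                break
        else:
            i += 1  # no label starts here; move on one token
    return found


def batch_match(docs: Dict[str, str], dictionary: Dict[str, str]) -> Dict[str, Set[str]]:
    """Annotate every document in a batch with the GND URIs whose labels occur in it."""
    labels = {tuple(tokenize(label)): uri for label, uri in dictionary.items()}
    max_len = max((len(k) for k in labels), default=1)
    return {doc_id: match_document(tokenize(text), labels, max_len) for doc_id, text in docs.items()}


dictionary = {
    "Drug-eluting Stent": "http://d-nb.info/gnd/7708211-4",
    "Ford": "http://d-nb.info/gnd/4302110-4",
}
docs = {
    "doc1": "Ein Drug-eluting Stent wurde implantiert.",
    "doc2": "Wilder Streik bei Ford",
    "doc3": "Kein Treffer hier",
}
print(batch_match(docs, dictionary))
```

Longest match first prevents a multi-word heading from also being reported as its shorter parts, which is one common design choice for thesaurus matching.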
http://zbw.eu/beta/sparql/gnd
http://d-nb.info/standards/elementset/gnd
NomenclatureInBiologyOrChemistry
SubjectHeadingSensoStricto
ProductNameOrBrandName
HistoricSingleEventOrEra
EthnographicName
GroupOfPersons
SubjectHeading
Language
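Step 1 selects a subset of GND subject headings with SPARQL. A sketch of such a query, assuming that the classes above live in the GND ontology namespace and that gndo:preferredNameForTheSubjectHeading holds the label; both the query shape and the property are assumptions, and the ZBW endpoint from the slide may no longer be online. The request follows the standard SPARQL 1.1 Protocol (GET with a `query` parameter).

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode
from urllib.request import Request, urlopen

GNDO = "http://d-nb.info/standards/elementset/gnd#"

# GND ontology classes listed on the slide.
CLASSES = [
    "NomenclatureInBiologyOrChemistry",
    "SubjectHeadingSensoStricto",
    "ProductNameOrBrandName",
    "HistoricSingleEventOrEra",
    "EthnographicName",
    "GroupOfPersons",
    "SubjectHeading",
    "Language",
]


def build_query(classes: List[str], limit: int = 1000) -> str:
    """SPARQL SELECT for concepts of the given classes and their preferred names (label property assumed)."""
    values = " ".join(f"gndo:{c}" for c in classes)
    return (
        f"PREFIX gndo: <{GNDO}>\n"
        "SELECT ?concept ?label WHERE {\n"
        f"  VALUES ?type {{ {values} }}\n"
        "  ?concept a ?type ;\n"
        "           gndo:preferredNameForTheSubjectHeading ?label .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def run_query(endpoint: str, query: str) -> List[Tuple[str, str]]:
    """Send the query as a SPARQL 1.1 Protocol GET request and return (concept, label) pairs."""
    url = endpoint + "?" + urlencode({"query": query})
    request = Request(url, headers={"Accept": "application/sparql-results+json"})
    with urlopen(request) as response:
        data = json.load(response)
    return [(b["concept"]["value"], b["label"]["value"]) for b in data["results"]["bindings"]]


print(build_query(CLASSES, limit=10))
# run_query("http://zbw.eu/beta/sparql/gnd", build_query(CLASSES))  # needs network access
```

The result pairs (URI, label) are what a string-matching dictionary needs in step 2.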

http://localhost:8080/admin/graphs/
Results
<http://de.dbpedia.org/resource/Wilder_Streik_bei_Ford_(1973)>
    <http://purl.org/dc/elements/1.1/subject>
        <http://d-nb.info/gnd/7708211-4> , # Drug-eluting Stent (syn: DES)
        <http://d-nb.info/gnd/4302110-4> , # Ford
        <http://d-nb.info/gnd/4578282-9> , # sich [„self“@en]
        <http://d-nb.info/gnd/4248646-4> , # Spitzel [„spy“@en] (syn: IM)
        <http://d-nb.info/gnd/4389837-3> , # August (month)
        <http://d-nb.info/gnd/4291333-0> , # Niederlage [„defeat“@en]
        <http://d-nb.info/gnd/4002623-1> . # Arbeitnehmer [„employee“@en]
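The result above is Turtle: one subject, the dc:subject predicate, and a comma-separated object list closed by a period. A minimal sketch of producing that shape with plain string formatting; it does no IRI escaping and assumes comments contain no line breaks, so an RDF library such as rdflib would be the safer choice in practice.

```python
from typing import List, Tuple

DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"


def to_turtle(doc_uri: str, subjects: List[Tuple[str, str]]) -> str:
    """Serialize one document's dc:subject annotations as a single Turtle statement.

    subjects holds (GND URI, comment) pairs; each comment becomes a Turtle '#' comment.
    """
    if not subjects:
        raise ValueError("a Turtle statement needs at least one object")
    lines = [f"<{doc_uri}>", f"    <{DC_SUBJECT}>"]
    for i, (gnd_uri, comment) in enumerate(subjects):
        # ',' continues the object list; '.' ends the statement after the last object.
        separator = "." if i == len(subjects) - 1 else ","
        lines.append(f"        <{gnd_uri}> {separator} # {comment}")
    return "\n".join(lines)


print(to_turtle(
    "http://de.dbpedia.org/resource/Wilder_Streik_bei_Ford_(1973)",
    [("http://d-nb.info/gnd/4302110-4", "Ford"),
     ("http://d-nb.info/gnd/4002623-1", "Arbeitnehmer")],
))
```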
• The GND dictionary includes articles, prepositions, adjectives…
• Acronyms ("IM", "DES") -> activate "Case Sensitivity"
• Not every match is useful in context ("August", "defeat")
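One plausible reading of the acronym problem: in German text the contraction "Im" and the article "des" collide with the acronyms IM and DES when matching ignores case. The sketch below is not SMA's implementation; it illustrates the effect of case-sensitive matching and, as an assumption, applies it only to all-uppercase labels, whereas SMA's "Case Sensitivity" option may cover the whole dictionary. The sentence is made up; the GND URIs are those from the results slide.

```python
import re
from typing import List, Tuple

# Labels and GND URIs from the results slide (IM and DES are listed there as synonyms).
DICTIONARY = {
    "IM": "http://d-nb.info/gnd/4248646-4",   # Spitzel
    "DES": "http://d-nb.info/gnd/7708211-4",  # Drug-eluting Stent
    "Ford": "http://d-nb.info/gnd/4302110-4",
}


def is_acronym(label: str) -> bool:
    return len(label) > 1 and label.isupper()


def match(text: str, case_sensitive_acronyms: bool = True) -> List[Tuple[str, str]]:
    """Return (token, GND URI) pairs; acronym labels optionally require an exact-case match."""
    hits = []
    for token in re.findall(r"\w+", text):
        for label, uri in DICTIONARY.items():
            if case_sensitive_acronyms and is_acronym(label):
                same = token == label
            else:
                same = token.casefold() == label.casefold()
            if same:
                hits.append((token, uri))
    return hits


sentence = "Im August 1973 streikten die Arbeiter des Kölner Ford-Werks"
print(match(sentence, case_sensitive_acronyms=False))  # 'Im' and 'des' match the acronyms
print(match(sentence))                                 # only 'Ford' matches
```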
http://localhost:8080/graph?name=urn:x-localinstance:/dlc/{yourDataset}/enhance.graph
Prototype: GND Annotator
Entity types: Persons, Locations, Topics, Time (manual evaluation only for Topics)
Ratings of the SMA suggestions: ok, ok, not relevant, false, not relevant, ok, not relevant
human (found in GND) = 1
SMA GND suggestions = 7
SMA correct = 3
SMA false = 1
precision = 33%
recall = 100%
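For reference, the standard definitions behind the precision and recall figures are sketched below. The slides do not state which of the counts enter each ratio, so the counts in the usage line are hypothetical and are not meant to reproduce the workshop's numbers.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of the system's suggestions that are correct."""
    suggested = true_positives + false_positives
    return true_positives / suggested if suggested else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of the reference (e.g. human-assigned) subjects that the system found."""
    relevant = true_positives + false_negatives
    return true_positives / relevant if relevant else 0.0


# Hypothetical counts, for illustration only.
print(f"precision = {precision(6, 2):.0%}, recall = {recall(6, 3):.0%}")
```

Whether "not relevant" suggestions count as false positives is an evaluation choice that changes precision considerably.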
Results (1)
Recall: 78%
Precision: 73%
Results (2)
Recall: 90%
Precision: 72%
Fusepool in the wild (1)
http://primo.kobv.de/docId=TN_thieme_articles10.1055/s-0029-1237743
no exact string match; chemical term; geographic; financial; education; too broad
Fusepool in the wild (2)
Abstract, Reviews, TOC (ISBN: 9783642371103)
Drawback: the quality of annotations depends on the text input.
Feedback
Why Fusepool?
1. Ready for the Semantic Web
• can handle graphs (Clerezza, TDB, …)
• data I/O using REST
2. String matching (SMA)
• import & configuration of dictionaries (e.g. a thesaurus)
• batch matching & annotation using the Data Life Center (DLC)
3. Easy to install: builds at http://jenkins.fusepool.info
Conclusion!
• Fusepool: infrastructure to build new services
• … better linking beyond the aquarium(s)
• TODO:
  • build tailored interfaces for annotation, search, and recommendation
  • improve the dictionaries
Thank You!
twitter: @jhercher
github: https://github.com/jhercher/
mail: hercher@ub.fu-berlin.de
