07/09/2016 - TPDL, Hannover
Joffrey Decourselle, Fabien Duchateau, Trond Aalberg,
Naimdjon Takhirov, Nicolas Lumineau
BIB-R: a Benchmark
for the Interpretation of Bibliographic Records
From MARC to… FRBR
020 $c 13,5€
041 $a eng
100 $a Robert Louis Stevenson
245 $a Strange Case of Dr. Jekyll and Mr. Hyde
300 $b Colorful illustrations
MARC Record
Tennant, R. (2002). MARC must die. Library Journal - New York
From MARC to… FRBR
020 $c 13,5€
041 $a eng
100 $a Robert Louis Stevenson
245 $a Strange Case of Dr. Jekyll and Mr. Hyde
300 $b Colorful illustrations
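Work [Strange Case of Dr. Jekyll and Mr. Hyde]
Expression [English]
Manifestation [Illustrations]
Item [13,5€]
Person [Robert Louis Stevenson]
Relationships: Creation (Person to Work), Realization (Work to Expression), Embodiment (Expression to Manifestation), Exemplification (Manifestation to Item)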
Tillett, B. (2005). FRBR and Cataloging for the Future. Cataloging & classification quarterly
MARC Record
FRBR
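The move from the flat MARC record to FRBR entities can be sketched in a few lines of Python. This is only an illustration of the idea on the slide: the `RULES` table and the `frbrise` function are hypothetical names, and the field-to-entity assignments are simply read off the example, not taken from any real FRBRisation tool.

```python
# The MARC record from the slide, keyed by (tag, subfield code).
record = {
    ("020", "c"): "13,5€",
    ("041", "a"): "eng",
    ("100", "a"): "Robert Louis Stevenson",
    ("245", "a"): "Strange Case of Dr. Jekyll and Mr. Hyde",
    ("300", "b"): "Colorful illustrations",
}

# Hypothetical mapping rules: (tag, subfield) -> (FRBR entity, property name).
RULES = {
    ("245", "a"): ("Work", "title"),
    ("041", "a"): ("Expression", "language"),
    ("300", "b"): ("Manifestation", "illustrations"),
    ("020", "c"): ("Item", "price"),
    ("100", "a"): ("Person", "name"),
}

# FRBR relationships as (source entity, relationship, target entity).
RELATIONSHIPS = [
    ("Person", "Creation", "Work"),
    ("Work", "Realization", "Expression"),
    ("Expression", "Embodiment", "Manifestation"),
    ("Manifestation", "Exemplification", "Item"),
]


def frbrise(record):
    """Group the record's values by FRBR entity and link the entities found."""
    entities = {}
    for key, value in record.items():
        if key in RULES:
            entity, prop = RULES[key]
            entities.setdefault(entity, {})[prop] = value
    # Keep only relationships whose two endpoints were actually extracted.
    links = [(s, r, t) for s, r, t in RELATIONSHIPS if s in entities and t in entities]
    return entities, links


entities, links = frbrise(record)
```

A single record yields five entities and four relationships here; real catalogs need far richer rules, which is what the benchmark sets out to evaluate.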
FRBRisation process
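[Figure: a Catalog is processed by a Rule Engine driven by Mapping Rules; the extracted entities (W1, E1, M1, A1 and W2, E2, M2, A2) go through Deduplication, leaving W1, A1, E1, E2, M1 and M2]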
Pre-FRBRization
  - Tuning
  - Preparation
FRBRization
  - Entity/property extraction
  - Deduplication
Post-FRBRization
  - Validation
  - Enrichment
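The FRBRization step (entity extraction followed by deduplication) could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the two input records are made up, the function names `extract` and `deduplicate` are hypothetical, and duplicates are detected by a naive normalised-string key rather than by whatever matching a real tool applies.

```python
# Two made-up records describing the same work in two languages.
records = [
    {"id": 1, "author": "Robert Louis Stevenson",
     "title": "Strange Case of Dr. Jekyll and Mr. Hyde", "language": "eng"},
    {"id": 2, "author": "robert louis  stevenson",
     "title": "Strange case of Dr. Jekyll and Mr. Hyde", "language": "fre"},
]


def normalise(text):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def extract(record):
    """Create one A, W, E and M entity per record, each identified by a key."""
    work_key = normalise(record["title"])
    return [
        ("A", normalise(record["author"])),
        ("W", work_key),
        ("E", work_key, record["language"]),
        ("M", record["id"]),
    ]


def deduplicate(entities):
    """Drop entities whose key was already seen, keeping first-seen order."""
    return list(dict.fromkeys(entities))


extracted = [entity for record in records for entity in extract(record)]
merged = deduplicate(extracted)
```

With these inputs, eight extracted entities collapse to six: the two records share one A and one W, while each keeps its own E and M.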
State of the art of FRBRization techniques
Decourselle, J., Duchateau, F., Lumineau, N. (2015). A Survey of FRBRization Techniques. TPDL
Related Work for evaluating FRBRisation
Process and evaluation metrics for FRBRisation
Takhirov, N., Aalberg, T., Duchateau, F., Žumer, M. (2012). FRBR-ML: A FRBR-based framework for semantic interoperability. Semantic Web.
Requirements for Bibliographic records
Manguinhas, H. M. Á., Freire, N. M. A., Borbinha, J. L. B. (2010). FRBRization of MARC records in multiple catalogs. In JCDL
Challenges of FRBRisation through use-cases
Aalberg, T., & Žumer, M. (2013). The value of MARC data, or, challenges of frbrisation. Journal of documentation
Motivation
Comparison of existing solutions
Need for metrics according to the bibliographic patterns
No qualitative comparison between tools
Datasets for FRBRisation
Too small or simple
Not representative of specific FRBRisation cases
Contributions
Definition of dedicated metrics
Pre-FRBRisation (issues, cataloguing practices, …)
FRBRisation (rules usage, performance, …)
Post-FRBRisation (completeness, consistency, …)
Open datasets with FRBR ground truth
T42 (multiple record collections focused on migration cases)
BIBR-CAT (larger collection representative of a real-world catalog)
Experiments on three recent FRBRisation tools
http://bib-r.github.io/
Metrics – Datasets – Experiments
BIB-R: a Benchmark for the Interpretation of Bibliographic Records
Hidden bibliographic patterns in MARC
[Figure: three example graphs of A, W, E and M entities, labelled Core, Derivation and Aggregation]
Riva, P. (2004). Mapping MARC 21 linking entry fields to FRBR and Tillett's taxonomy of bibliographic relationships. Library resources & technical services
Inconsistencies and cataloguing practices
101 $a no $c en
200 $a Ringenes herre = The Lord Of The Ring
$f J.R.R. Tolkien
$g trans. by Eilev Groven
210 $a Oslo ; Paris $c Tiden Norsk Forlag $d 2006
500 $a
997 $k 1543218621
Manguinhas, H. M. Á., Freire, N. M. A., Borbinha, J. L. B. (2010). FRBRization of MARC records in multiple catalogs. JCDL
Pre-FRBRisation Metrics
Pattern analysis
COR, AUG, AGG, …
Inconsistencies & Cataloguing practices
MID, MPD, MUT, MOT, …
Rules (usage, conflicts)
MR, CR, …
Metrics to compare the specificities of a catalog with the rules of a FRBRisation tool.
Pre-FRBRisation Metrics (examples)
041 $a no $c en
100 $a J.R.R. Tolkien
240 $a The Lord Of The Ring
245 $a Ringenes herre $f
700 $a Roche, Daniel $4 trl
DER: Percentage of records that describe a Derivation pattern
MUT: Percentage of records where the Uniform Title is missing
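As a rough illustration, both example metrics reduce to counting the records that satisfy a predicate. The sketch below makes several assumptions that are not stated on the slide: records are held as nested dictionaries, the uniform title sits in field 240 $a (as in the example record), and a Derivation pattern is approximated by a translator relator code (`700 $4 trl`). The benchmark's actual detection rules may well differ, and the four-record catalog is made up around the slide's example.

```python
def percentage(records, predicate):
    """Share of records (0 to 100) for which the predicate holds."""
    if not records:
        return 0.0
    return 100.0 * sum(1 for record in records if predicate(record)) / len(records)


def missing_uniform_title(record):
    """Assumption: the uniform title is stored in field 240, subfield a."""
    return not record.get("240", {}).get("a", "").strip()


def looks_like_derivation(record):
    """Assumption: a translator entry (700 $4 trl) signals a Derivation."""
    return record.get("700", {}).get("4") == "trl"


# Hypothetical mini-catalog; the first record mirrors the slide's example.
catalog = [
    {"041": {"a": "no", "c": "en"}, "100": {"a": "J.R.R. Tolkien"},
     "240": {"a": "The Lord Of The Ring"}, "245": {"a": "Ringenes herre"},
     "700": {"a": "Roche, Daniel", "4": "trl"}},
    {"100": {"a": "J.R.R. Tolkien"}, "245": {"a": "Ringenes herre"},
     "700": {"a": "Roche, Daniel", "4": "trl"}},
    {"100": {"a": "Robert Louis Stevenson"},
     "245": {"a": "Strange Case of Dr. Jekyll and Mr. Hyde"}},
    {"100": {"a": "J.R.R. Tolkien"}, "240": {"a": "The Lord Of The Ring"},
     "245": {"a": "The Lord Of The Ring"}},
]

mut = percentage(catalog, missing_uniform_title)
der = percentage(catalog, looks_like_derivation)
```

Comparing such percentages with the fields a tool's rules rely on indicates, before any conversion is run, how much of the catalog those rules are likely to miss.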
FRBRisation Metrics
Rules application
NRT: Number of rules applied
Performance
ETC: Execution time of the entity/relationship creation
ETD: Execution time for deduplication
Metrics to evaluate the efficiency of a FRBRisation tool.
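These three metrics can be collected by instrumenting the conversion, as in the hedged sketch below. The `create_entities` and `deduplicate` functions are stand-ins for a tool's internals, and NRT is interpreted here as the number of rule firings, which is an assumption about the metric's exact definition.

```python
import time


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def create_entities(records, rules):
    """Stand-in for entity creation; also counts how many rules fired."""
    entities, rules_applied = [], 0
    for record in records:
        for field, entity_type in rules.items():
            if field in record:
                entities.append((entity_type, record[field]))
                rules_applied += 1
    return entities, rules_applied


def deduplicate(entities):
    """Stand-in for deduplication: drop exact duplicates, keep order."""
    return list(dict.fromkeys(entities))


# Hypothetical rules and records, for illustration only.
rules = {"245": "Work", "100": "Person"}
records = [
    {"100": "J.R.R. Tolkien", "245": "Ringenes herre"},
    {"100": "J.R.R. Tolkien"},
]

(entities, nrt), etc_seconds = timed(create_entities, records, rules)
unique_entities, etd_seconds = timed(deduplicate, entities)
```

Timing the two phases separately matters because deduplication typically scales differently from extraction as the catalog grows.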
Post-FRBRisation Metrics
Completeness
MD, IAD, WSD
Pattern detection
MEND, MRND, ESE, …
Metrics to compare the FRBRisation result with the result expected by a FRBR expert.
Post-FRBRisation Metrics (examples)
Example of related metrics
MEND: Main entity of a specific pattern is not detected
MRND: Main relationship of a specific pattern is not detected
ESE: Secondary element (entity or relationship) is not detected
MD-E/MD-R: Missing entity / relationship
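[Figure: a pattern with entities W1, E1, E2, M1, A1 and A2 and the relationships "translation" and "translator"; the legend distinguishes Missing Relationship, Missing Entity, Main Relationship, Secondary Relationship and Secondary Entity]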
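The missing-data metrics (MD-E, MD-R) amount to a set difference between the expert's ground truth and the tool's output. The sketch below assumes entities and relationships can be compared by identifier, which presupposes that output and ground truth have already been aligned; the identifiers and the relationship list are hypothetical and only loosely follow the figure.

```python
def missing_percentage(expected, produced):
    """Percentage of expected elements that are absent from the output."""
    if not expected:
        return 0.0
    return 100.0 * len(expected - produced) / len(expected)


# Hypothetical ground truth for one translated work.
expected_entities = {"W1", "E1", "E2", "M1", "A1", "A2"}
expected_relationships = {
    ("A1", "creation", "W1"),
    ("W1", "realization", "E1"),
    ("W1", "realization", "E2"),
    ("E1", "translation", "E2"),
    ("A2", "translator", "E2"),
    ("E2", "embodiment", "M1"),
}

# Hypothetical tool output: the translator and its two links are missing.
produced_entities = {"W1", "E1", "E2", "M1", "A1"}
produced_relationships = {
    ("A1", "creation", "W1"),
    ("W1", "realization", "E1"),
    ("W1", "realization", "E2"),
    ("E2", "embodiment", "M1"),
}

md_e = missing_percentage(expected_entities, produced_entities)
md_r = missing_percentage(expected_relationships, produced_relationships)
```

Here one of six entities and two of six relationships are missing. The pattern-level metrics (MEND, MRND, ESE) would additionally need each element tagged as main or secondary for its pattern.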
Datasets
T42
42 tests, 5 categories of bibliographic patterns
1.x for Core pattern, 2.x for Augmentation, …
Each test combines one bibliographic pattern and one inconsistency/cataloguing practice
e.g., 3.5 for Derivation with Missing Uniform Title
BIBR-CAT
One collection closer to real-world catalogs
Mix of bibliographic patterns and issues
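The T42 numbering scheme lends itself to simple programmatic grouping of results by pattern. The sketch below is an assumption-laden illustration: only the three category names that appear in this deck are filled in (1 Core, 2 Augmentation, 3 Derivation), the remaining two are deliberately left unknown, and `parse_test_id` is a hypothetical helper.

```python
# Category names known from the slides; categories 4 and 5 are not named here.
KNOWN_CATEGORIES = {1: "Core", 2: "Augmentation", 3: "Derivation"}


def parse_test_id(test_id):
    """Split an identifier such as '3.5' into (category, variant, pattern name)."""
    category, variant = (int(part) for part in test_id.split("."))
    return category, variant, KNOWN_CATEGORIES.get(category, "unknown")


example = parse_test_id("3.5")
```

Grouping per-test scores by the first component then gives one aggregate per bibliographic pattern.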
Datasets
Files provided in XML formats
MARC21, UNIMARC & FRBR/RDA
Hosted on GitHub: http://bib-r.github.io/
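Reading MARC records from XML might look like the sketch below, using only Python's standard library. It assumes the files follow the Library of Congress MARCXML layout (namespace `http://www.loc.gov/MARC21/slim`); whether the BIB-R files use exactly this layout has not been checked, and the inline sample record is made up from the earlier slide.

```python
import xml.etree.ElementTree as ET

# Assumption: records use the standard MARCXML "slim" namespace.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="100" ind1=" " ind2=" ">
      <subfield code="a">J.R.R. Tolkien</subfield>
    </datafield>
    <datafield tag="245" ind1=" " ind2=" ">
      <subfield code="a">Ringenes herre</subfield>
    </datafield>
    <datafield tag="700" ind1=" " ind2=" ">
      <subfield code="a">Roche, Daniel</subfield>
      <subfield code="4">trl</subfield>
    </datafield>
  </record>
</collection>"""


def read_records(xml_text):
    """Return one dict per record: tag -> list of {subfield code: value}."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind("marc:record", NS):
        fields = {}
        for datafield in record.iterfind("marc:datafield", NS):
            subfields = {
                subfield.get("code"): (subfield.text or "")
                for subfield in datafield.iterfind("marc:subfield", NS)
            }
            # A tag may repeat within a record, so keep a list per tag.
            fields.setdefault(datafield.get("tag"), []).append(subfields)
        records.append(fields)
    return records


records = read_records(SAMPLE)
```

The resulting dictionaries are a convenient input for computing catalog-level metrics without depending on a MARC-specific library.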
FRBRisation Tools
Variations VFRBR (Indiana University)
Hardcoded rules
Washington, M., Notess, M., & Dunn, J. W. (2011). Taking Music Metadata from MARC to FRBR to RDF. International Conference on Dublin Core and Metadata Applications
Extensible Catalog (Organization / Consortium)
Hardcoded rules (harvesting limited to OAI-PMH)
Bowen, J. B. (2010). Moving library metadata toward linked data: Opportunities provided by the eXtensible catalog. International Conference on Dublin Core and Metadata Applications
FRBR-ML (NTNU)
Declarative rules
Takhirov, N., Aalberg, T., Duchateau, F., & Žumer, M. (2012). FRBR-ML: A FRBR-based framework for semantic interoperability. Semantic Web
Experiments
Assessing strengths and weaknesses
Three tools applied to the 42 tests of T42
Metrics from Post-FRBRization
Comparing tools in real-world context
Three tools applied to BIBR-CAT
Metrics from FRBRization & Post-FRBRization
Facilitating the tuning
Only for FRBR-ML (declarative rules) applied to BIBR-CAT
Tuning based on Pre-FRBRization metrics
Experiment 1 (T42)
Evaluating completeness with FRBR-ML
[Chart: percentage of MD (Missing Data) per T42 test number, broken down into E (entity), R (relationship) and P (property)]
Experiment 1 (T42)
Evaluating completeness with VFRBR
[Chart: percentage of MD (Missing Data) per T42 test number, broken down into E (entity), R (relationship) and P (property)]
Experiment 1 (T42)
Incorrectly Added Data with Extensible Catalog
[Chart: percentage of IAD per T42 test number]
Experiment 1 (T42)
(Pattern) Main Entity Not Detected with FRBR-ML
[Chart: percentage of MEND per T42 test number]
Experiment 2 (BIBR-CAT)
Evaluation of the quality (multiple metrics)
[Charts: percentage of each metric, one panel for XC and one for VFRBR]
Experiment 2 (BIBR-CAT)
Summary of evaluation results for the three tools
Experiment 3 (BIBR-CAT with tuned FRBR-ML)
Based on analysis feedback from pre-FRBRisation metrics
Tuning performed by one expert for 4 hours
Discussion
Experiment results: http://bib-r.github.io/experiments.pdf
Analysis of evaluation results
Limited bibliographic pattern detection
Difficulty of implementing some metrics (e.g., IAD, WSD)
Keys for further improvements
Enhanced tuning with pre-FRBRisation metrics
Detection of bibliographic patterns
Visualization and interactions on migration rules
Conclusion
BIB-R benchmark
Definition of new metrics (Pre-FRBRization, FRBRization & Post-FRBRization)
Two open datasets (T42 & BIBR-CAT)
Experimental results with VFRBR, XC & FRBR-ML
Ongoing work
Creation of new datasets with ground truth
Design of a novel FRBRisation solution
Thank you!
To get more details about our projects:
http://liris.cnrs.fr/diricks/
http://www.progilone.fr/en/syrtis
http://bib-r.github.io/
