Benchmark BIB-R @ TPDL 2016

07/09/2016 - TPDL, Hannover
Joffrey Decourselle, Fabien Duchateau, Trond Aalberg,
Naimdjon Takhirov, Nicolas Lumineau
BIB-R: a Benchmark for the Interpretation of Bibliographic Records

From MARC to… FRBR
020 $c 13,5€
041 $a eng
100 $a Robert Louis Stevenson
245 $a Strange Case of Dr. Jekyll and Mr. Hyde
300 $b Colorful illustrations
MARC Record
Tennant, R. (2002). MARC must die. Library Journal - New York

From MARC to… FRBR
FRBR entities derived from the MARC record above:
Work [Strange Case of Dr. Jekyll and Mr. Hyde]
Expression [English]
Manifestation [Illustrations]
Item [13,5€]
Person [Robert Louis Stevenson]
Relationships: Creation (Person to Work), Realization (Work to Expression), Embodiment (Expression to Manifestation), Exemplification (Manifestation to Item)
Tillett, B. (2005). FRBR and Cataloging for the Future. Cataloging & classification quarterly
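
The mapping on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout of the record and the function name `frbrize` are hypothetical and do not come from any of the tools discussed in this talk. The field-to-entity assignments are the ones shown on the slide, and the raw field values are kept without normalization.

```python
# Hypothetical in-memory form of the MARC record from the slide:
# keys are (tag, subfield code) pairs.
marc_record = {
    ("020", "c"): "13,5€",
    ("041", "a"): "eng",
    ("100", "a"): "Robert Louis Stevenson",
    ("245", "a"): "Strange Case of Dr. Jekyll and Mr. Hyde",
    ("300", "b"): "Colorful illustrations",
}

# Field-to-entity assignments as drawn on the slide (illustrative, not a full rule set).
FIELD_TO_ENTITY = {
    ("245", "a"): "Work",
    ("041", "a"): "Expression",
    ("300", "b"): "Manifestation",
    ("020", "c"): "Item",
    ("100", "a"): "Person",
}

# FRBR relationships between entity types, as (source, name, target).
RELATIONSHIPS = [
    ("Person", "Creation", "Work"),
    ("Work", "Realization", "Expression"),
    ("Expression", "Embodiment", "Manifestation"),
    ("Manifestation", "Exemplification", "Item"),
]


def frbrize(record):
    """Return (entities, relationships) for one record."""
    entities = {}
    for field, value in record.items():
        entity = FIELD_TO_ENTITY.get(field)
        if entity is not None:
            entities[entity] = value
    # Keep only relationships whose two endpoints were actually extracted.
    relationships = [r for r in RELATIONSHIPS if r[0] in entities and r[2] in entities]
    return entities, relationships


entities, relationships = frbrize(marc_record)
for name, value in entities.items():
    print(f"{name}: {value}")
```

A record lacking some of these fields simply yields fewer entities and fewer relationships, which is the kind of gap the benchmark metrics are meant to expose.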

FRBRisation process
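[Figure: catalog records pass through a rule engine driven by mapping rules, producing the entities W1, E1, M1, A1 and W2, E2, M2, A2; after deduplication the result contains W1, A1, E1, E2, M1 and M2]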
Pre-FRBRization
 Tuning
 Preparation
FRBRization
 Entity/property extraction
 Deduplication
Post-FRBRization
 Validation
 Enrichment
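
The FRBRization phase (entity extraction followed by deduplication) can be illustrated with a small sketch. It assumes, purely for illustration, that each record yields one work and one agent, and that two entities are duplicates when their normalized labels are equal; real tools rely on richer matching keys. The names `normalize`, `extract` and `deduplicate` are hypothetical.

```python
def normalize(label):
    """Crude matching key: lowercase and collapse whitespace (an assumption)."""
    return " ".join(label.lower().split())


def extract(records):
    """Entity extraction: one (type, label) pair per work and agent in each record."""
    entities = []
    for rec in records:
        entities.append(("W", rec["title"]))
        entities.append(("A", rec["author"]))
    return entities


def deduplicate(entities):
    """Keep the first occurrence of each (type, normalized label) key."""
    seen = {}
    for etype, label in entities:
        seen.setdefault((etype, normalize(label)), (etype, label))
    return list(seen.values())


records = [
    {"title": "Strange Case of Dr. Jekyll and Mr. Hyde", "author": "Robert Louis Stevenson"},
    {"title": "strange case of  Dr. Jekyll and Mr. Hyde", "author": "Robert Louis Stevenson"},
]
extracted = extract(records)
merged = deduplicate(extracted)
print(len(extracted), "entities before deduplication,", len(merged), "after")
```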

State of the art of FRBRization techniques
Decourselle, J., Duchateau, F., Lumineau, N. (2015). A Survey of FRBRization Techniques. TPDL

Related Work for evaluating FRBRisation
Process and evaluation metrics for FRBRisation
Takhirov, N., Aalberg, T., Duchateau, F., Žumer, M. (2012). FRBR-ML: A FRBR-based
framework for semantic interoperability. Semantic Web.
Requirements for Bibliographic records
Manguinhas, H. M. Á., Freire, N. M. A., Borbinha, J. L. B. (2010). FRBRization of MARC
records in multiple catalogs. In JCDL
Challenges of FRBRisation through use-cases
Aalberg, T., & Žumer, M. (2013). The value of MARC data, or, challenges of
frbrisation. Journal of documentation

Motivation
Comparison of existing solutions
Need for metrics according to the bibliographic patterns
No qualitative comparison between tools
Datasets for FRBRisation
Too small or simple
Not representative of specific FRBRisation cases

Contributions
Definition of dedicated metrics
Pre-FRBRisation (issues, cataloguing practices, …)
FRBRisation (rules usage, performance, …)
Post-FRBRisation (completeness, consistency, …)
Open datasets with FRBR ground truth
T42 (multiple record collections focused on migration cases)
BIBR-CAT (larger collection representative of a real-world catalog)
Experiments on three recent FRBRisation tools
http://bib-r.github.io/

Metrics – Datasets – Experiments
BIB-R: a Benchmark for the Interpretation of Bibliographic Records

Hidden bibliographic patterns in MARC
[Figure: three pattern diagrams built from A, W, E and M nodes, labelled Core, Derivation and Aggregation]
Riva, P. (2004). Mapping MARC 21 linking entry fields to FRBR and Tillett's taxonomy of bibliographic relationships. Library resources & technical services

Inconsistencies and cataloguing practices
101 $a no $c en
200 $a Ringenes herre = The Lord Of The Ring
$f J.R.R. Tolkien
$g trans. by Eilev Groven
210 $a Oslo ; Paris $c Tiden Norsk Forlag $d 2006
500 $a
997 $k 1543218621
Manguinhas, H. M. Á., Freire, N. M. A., Borbinha, J. L. B. (2010). FRBRization of MARC records in multiple catalogs. JCDL

Pre-FRBRisation Metrics
Pattern analysis
COR, AUG, AGG, …
Inconsistencies & Cataloguing practices
MID, MPD, MUT, MOT, …
Rules (usage, conflicts)
MR, CR, …
Metrics to compare the specificities of a catalog with
the rules of a FRBRisation tool.

Pre-FRBRisation Metrics (examples)
041 $a no $c en
100 $a J.R.R. Tolkien
240 $a The Lord Of The Ring
245 $a Ringenes herre $f
700 $a Roche, Daniel $4 trl
DER: Percentage of records that describe a Derivation pattern
MUT: Percentage of records where the Uniform Title is missing
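
As an illustration of how such a metric might be computed, here is a minimal sketch of MUT. It assumes that the uniform title is held in MARC 21 field 240 $a, as in the example record, and that records are available as nested dictionaries (tag, then subfield code); both the data layout and the function name `mut` are hypothetical, and the benchmark's exact definition may differ.

```python
def mut(records):
    """Percentage of records with no non-empty 240 $a (uniform title)."""
    if not records:
        return 0.0
    missing = sum(
        1 for rec in records
        if not rec.get("240", {}).get("a", "").strip()
    )
    return 100.0 * missing / len(records)


records = [
    # The example record from the slide (only the fields needed here).
    {"100": {"a": "J.R.R. Tolkien"}, "240": {"a": "The Lord Of The Ring"}, "245": {"a": "Ringenes herre"}},
    # A hypothetical record without a uniform title.
    {"100": {"a": "J.R.R. Tolkien"}, "245": {"a": "Ringenes herre"}},
]
print(f"MUT = {mut(records):.1f}%")
```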

FRBRisation Metrics
Rules application
NRT: Number of rules applied
Performance
ETC: Execution time of the entity/relationship creation
ETD: Execution time for deduplication
Metrics to evaluate the efficiency of a FRBRisation tool.
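
The two performance metrics could be collected by timing each phase separately. The sketch below uses Python's `time.perf_counter`; the two step functions are placeholders standing in for a real tool's entity/relationship creation and deduplication phases, and all names are hypothetical.

```python
import time


def timed(step, *args):
    """Run step(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = step(*args)
    return result, time.perf_counter() - start


# Placeholder steps standing in for a real tool's two phases (hypothetical).
def create_entities(records):
    return [("W", title) for title in records]


def deduplicate(entities):
    return sorted(set(entities))


records = ["Ringenes herre", "Ringenes herre", "Strange Case of Dr. Jekyll and Mr. Hyde"]
entities, etc = timed(create_entities, records)   # ETC: creation time
unique, etd = timed(deduplicate, entities)        # ETD: deduplication time
print(f"ETC = {etc:.6f} s, ETD = {etd:.6f} s, entities: {len(entities)} -> {len(unique)}")
```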

Post-FRBRisation Metrics
Completeness
MD, IAD, WSD
Pattern detection
MEND, MRND, ESE, …
Metrics to compare the FRBRisation result with the ground truth defined by a FRBR expert.

Post-FRBRisation Metrics (examples)
Example of related metrics
MEND: Main entity of a specific pattern is not detected
MRND: Main relationship of a specific pattern is not detected
ESE: Secondary element (entity or relationship) is not detected
MD-E/MD-R: Missing entity / relationship
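[Figure: pattern graph with the entities W1, E1, M1, E2, A1 and A2, a translation relationship and a translator; annotations mark the main relationship, a secondary relationship, a secondary entity, a missing relationship and a missing entity]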
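
A completeness metric such as MD-E or MD-R can be sketched as a set difference between the ground truth and a tool's output. The slides do not give the formula, so this assumes MD is the share of expected elements that the tool failed to produce. The ground truth and tool output below are hypothetical, loosely modelled on the translation example, and `missing_data` is an illustrative name.

```python
def missing_data(expected, produced):
    """Percentage of expected elements that are absent from the produced set."""
    expected = set(expected)
    if not expected:
        return 0.0
    return 100.0 * len(expected - set(produced)) / len(expected)


# Hypothetical ground truth: entities and (source, label, target) relationships.
expected_entities = {"W1", "E1", "E2", "M1", "A1", "A2"}
expected_relationships = {
    ("W1", "realization", "E1"),
    ("W1", "realization", "E2"),
    ("E1", "translation", "E2"),
    ("A2", "translator", "E2"),
}

# Hypothetical tool output that misses the original expression and the translator.
produced_entities = {"W1", "E2", "M1", "A1"}
produced_relationships = {("W1", "realization", "E2")}

md_e = missing_data(expected_entities, produced_entities)
md_r = missing_data(expected_relationships, produced_relationships)
print(f"MD-E = {md_e:.1f}%, MD-R = {md_r:.1f}%")
```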

Datasets
T42
42 tests, 5 categories of bibliographic patterns
1.x for Core pattern, 2.x for Augmentation, …
Each test combines one bibliographic pattern and one
inconsistency/cataloguing practice
e.g., 3.5 for Derivation with Missing Uniform Title
BIBR-CAT
One collection closer to real-world catalogs
Mix of bibliographic patterns and issues
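
The T42 numbering scheme lends itself to a simple lookup. The sketch below covers only the category numbers named on the slide (1 for Core, 2 for Augmentation, 3 for Derivation); the other categories are not listed there and are left unnamed. The function name `parse_test_id` is hypothetical.

```python
# Category numbers stated on the slide; the remaining categories are not listed there.
PATTERNS = {1: "Core", 2: "Augmentation", 3: "Derivation"}


def parse_test_id(test_id):
    """Split an id such as '3.5' into (pattern name, case number)."""
    category, case = (int(part) for part in test_id.split("."))
    return PATTERNS.get(category, f"category {category}"), case


print(parse_test_id("3.5"))
```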

Datasets
Files provided in XML formats
MARC21, UNIMARC & FRBR/RDA
Hosted on GitHub: http://bib-r.github.io/

FRBRisation Tools
Variations VFRBR (Indiana University)
Hardcoded rules
Washington, M., Notess, M., & Dunn, J. W. (2011). Taking Music Metadata from MARC to FRBR to RDF.
International Conference on Dublin Core and Metadata Applications
Extensible Catalog (Organization / Consortium)
Hardcoded rules (harvesting limited to OAI-PMH)
Bowen, J. B. (2010). Moving library metadata toward linked data: Opportunities provided by the eXtensible
catalog. International Conference on Dublin Core and Metadata Applications
FRBR-ML (NTNU)
Declarative rules
Takhirov, N., Aalberg, T., Duchateau, F., & Žumer, M. (2012). FRBR-ML: A FRBR-based framework for semantic
interoperability. Semantic Web

Experiments
Assessing strengths and weaknesses
Three tools applied to the 42 tests of T42
Metrics from Post-FRBRization
Comparing tools in real-world context
Three tools applied to BIBR-CAT
Metrics from FRBRization & Post-FRBRization
Facilitating the tuning
Only for FRBR-ML (declarative rules) applied to BIBR-CAT
Tuning based on Pre-FRBRization metrics

Experiment 1 (T42)
Evaluating completeness with FRBR-ML
MD: Missing Data (E: entity, R: relationship, P: property)
[Figure: percentage of MD for each test number in T42]

Experiment 1 (T42)
Evaluating completeness with VFRBR
MD: Missing Data (E: entity, R: relationship, P: property)
[Figure: percentage of MD]

Experiment 1 (T42)
Incorrectly Added Data with Extensible Catalog
[Figure: percentage of IAD]

Experiment 1 (T42)
(Pattern) Main Entity Not Detected with FRBR-ML
[Figure: percentage of MEND]

Experiment 2 (BIBR-CAT)
Evaluation of the quality (multiple metrics)
[Figure: two panels, XC and VFRBR, each showing the percentage of the metric for each metric]

Experiment 2 (BIBR-CAT)
Summary of evaluation results for the three tools

Experiment 3 (BIBR-CAT with tuned FRBR-ML)
Based on analysis feedback from pre-FRBRisation metrics
Tuning performed by one expert for 4 hours

Discussion
Experimental results: http://bib-r.github.io/experiments.pdf
Analysis of evaluation results
Limited bibliographic pattern detection
Difficulty of implementing some metrics (e.g., IAD, WSD)
Keys for further improvements
Enhanced tuning with pre-FRBRisation metrics
Detection of bibliographic patterns
Visualization and interactions on migration rules

Conclusion
BIB-R benchmark
Definition of new metrics (Pre-FRBRization, FRBRization & Post-FRBRization)
Two open datasets (T42 & BIBR-CAT)
Experimental results with VFRBR, XC & FRBR-ML
Ongoing work
Creation of new datasets with ground truth
Design of a novel FRBRisation solution

Thank you!
To get more details about our projects:
http://liris.cnrs.fr/diricks/ http://www.progilone.fr/en/syrtis
http://bib-r.github.io/
