Tag-Based Browsing of Digital
Collections with Inverted Indexes and
Browsing Cache
Joaquín Gayoso-Cabada, Mercedes Gómez-Albarrán,
José Luis Sierra
Fac. Informática
Universidad Complutense de Madrid
Contents
Introduction
The Tag-Based Browsing Model
Tag-Based Browsing with Inverted
Indexes
Adding a Browsing Cache
Conclusions and Future Work
Introduction
Clavy: an experimental platform for learning object repositories with reconfigurable structures
Clavy makes it possible to rearrange the hierarchical organization of elements in metadata schemata.
These reconfigurations affect functionalities such as learning object presentation and browsing.
In particular, although from a user's point of view Clavy supports a guided browsing paradigm…
… internally it supports freer and more flexible browsing mechanisms…
… able to take into account all the possible ways of browsing the repositories
Introduction
Clavy browsing is internally supported by a tag-based browsing system
Element-value pairs are abstracted as tags
The browsing system maintains:
– A set of active tags
– The set of filtered objects
– The set of additionally selectable tags: those able to further shrink the set of filtered objects without emptying it
Updating the browsing snapshot when the set of active tags changes can be computationally intensive
To mitigate the cost, we proposed a strategy based on inverted indexes and a browsing cache
The Tag-Based Browsing Model
Digital Collections
Resource  Tagging
r1        Cave-Painting, Cantabrian, Prehistoric
r2        Cave-Painting, Levant, Prehistoric
r3        Megalithic, Cantabrian, Prehistoric
r4        Tartesian, Plateau, Protohistoric
r5        Phoenician, Penibaetic, Protohistoric
r6        Punic, Levant, Protohistoric
Resources: content of learning objects
Tags: element-value pairs
The Tag-Based Browsing Model
Browsing
Browsing state:
– $F$: set of selected tags.
– $R_F$: set of filtered resources.
– $S_F$: set of selectable tags.
Browsing actions:
– +t: select the tag t.
– xt: remove the tag t.
Browsing with Inverted Indexes
Inverted Indexes
For each tag t, the inverted index I returns the set I(t) of all the resources tagged with t:
I(Cave-Painting) = {r1, r2}
I(Megalithic) = {r3}
I(Tartesian) = {r4}
I(Phoenician) = {r5}
I(Punic) = {r6}
I(Cantabrian) = {r1, r3}
I(Levant) = {r2, r6}
I(Plateau) = {r4}
I(Penibaetic) = {r5}
I(Prehistoric) = {r1, r2, r3}
I(Protohistoric) = {r4, r5, r6}
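As a minimal sketch of how such an index could be built from the example tagging, the following Python snippet inverts a resource-to-tags mapping. The names `TAGGING`, `INDEX` and `build_inverted_index` are illustrative assumptions, not identifiers taken from Clavy.

```python
from collections import defaultdict

# Example collection from the slides: resource -> set of tags.
TAGGING = {
    "r1": {"Cave-Painting", "Cantabrian", "Prehistoric"},
    "r2": {"Cave-Painting", "Levant", "Prehistoric"},
    "r3": {"Megalithic", "Cantabrian", "Prehistoric"},
    "r4": {"Tartesian", "Plateau", "Protohistoric"},
    "r5": {"Phoenician", "Penibaetic", "Protohistoric"},
    "r6": {"Punic", "Levant", "Protohistoric"},
}


def build_inverted_index(tagging):
    """Return a dict mapping each tag to the set of resources tagged with it."""
    index = defaultdict(set)
    for resource, tags in tagging.items():
        for tag in tags:
            index[tag].add(resource)
    return dict(index)


INDEX = build_inverted_index(TAGGING)
print(sorted(INDEX["Prehistoric"]))  # ['r1', 'r2', 'r3']
print(sorted(INDEX["Levant"]))       # ['r2', 'r6']
```

Storing each entry as a set keeps the intersections needed while browsing cheap to express.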
Browsing with Inverted Indexes
The Browsing Strategy
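+t browsing action:
– $F \leftarrow F \cup \{t\}$
– $R_F \leftarrow R_F \cap I(t)$
– $S_F \leftarrow \{t' \in S_F - \{t\} \mid 0 < |R_F \cap I(t')| < |R_F|\}$
xt browsing action:
– $F \leftarrow F - \{t\}$
– $R_F \leftarrow \bigcap_{t' \in F} I(t')$ (or all the resources if $F = \emptyset$)
– $S_F \leftarrow \{t' \in T - F \mid 0 < |R_F \cap I(t')| < |R_F|\}$
$F = \emptyset$ is managed as a particular case:
– $R_F \leftarrow R$
– $S_F \leftarrow \{t \mid |I(t)| < |R|\}$
Here $I(t)$ is the set of resources that the inverted index returns for tag $t$, $T$ is the set of all tags, and $R$ is the set of all resources.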
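The two browsing actions can be sketched in Python as follows. This is only an illustration under stated assumptions: a browsing state is modelled as a tuple (selected tags, filtered resources, selectable tags), and the names `select`, `remove`, `selectable` and `initial_state` are hypothetical, not part of Clavy.

```python
# Example collection from the slides: resource -> set of tags.
TAGGING = {
    "r1": {"Cave-Painting", "Cantabrian", "Prehistoric"},
    "r2": {"Cave-Painting", "Levant", "Prehistoric"},
    "r3": {"Megalithic", "Cantabrian", "Prehistoric"},
    "r4": {"Tartesian", "Plateau", "Protohistoric"},
    "r5": {"Phoenician", "Penibaetic", "Protohistoric"},
    "r6": {"Punic", "Levant", "Protohistoric"},
}

# Inverted index: tag -> set of resources tagged with it.
INDEX = {}
for resource, tags in TAGGING.items():
    for tag in tags:
        INDEX.setdefault(tag, set()).add(resource)

ALL_RESOURCES = set(TAGGING)
ALL_TAGS = set(INDEX)


def selectable(filtered, candidates):
    """Tags among `candidates` that shrink `filtered` without emptying it."""
    return {t for t in candidates
            if 0 < len(filtered & INDEX[t]) < len(filtered)}


def initial_state():
    """State for the empty selection: every resource is filtered in."""
    return frozenset(), set(ALL_RESOURCES), selectable(ALL_RESOURCES, ALL_TAGS)


def select(state, tag):
    """+t action: narrow the filtered resources with one more tag."""
    active, filtered, sel = state
    new_filtered = filtered & INDEX[tag]
    # Only previously selectable tags can remain selectable.
    return active | {tag}, new_filtered, selectable(new_filtered, sel - {tag})


def remove(state, tag):
    """xt action: recompute the state from the remaining selected tags."""
    active, _, _ = state
    new_active = active - {tag}
    if not new_active:
        return initial_state()
    new_filtered = set.intersection(*(INDEX[t] for t in new_active))
    return new_active, new_filtered, selectable(new_filtered, ALL_TAGS - new_active)


state = select(initial_state(), "Prehistoric")
state = select(state, "Cave-Painting")
print(sorted(state[1]))  # ['r1', 'r2']
print(sorted(state[2]))  # ['Cantabrian', 'Levant']
```

Note the asymmetry: selecting a tag reuses the current filtered set and selectable tags, whereas removing a tag has to intersect the index entries of all remaining tags again, which is the costlier case.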
Adding a Browsing Cache
The browsing cache comprises three stores:
– Filtered resource store: $F \longrightarrow R_F$
– Selectable tag store: $F \longrightarrow S_F$
– Representative store: $R_F \longrightarrow F$
Example (tags numbered in the order of the inverted index: t1 = Cave-Painting, t2 = Megalithic, t6 = Cantabrian, t7 = Levant, t10 = Prehistoric, t11 = Protohistoric):
Step  Action          Selected tags                 Selectable tags                                  Cache
0     (start)         ∅                             all tags
1     +Prehistoric    {Prehistoric}                 {Cave-Painting, Megalithic, Cantabrian, Levant}  CACHE#1
2     +Cave-Painting  {Prehistoric, Cave-Painting}  {Cantabrian, Levant}                             CACHE#2
3     xCave-Painting  {Prehistoric}                 {Cave-Painting, Megalithic, Cantabrian, Levant}  CACHE#3
4     xPrehistoric    ∅                             all tags                                         CACHE#4
5     +Cave-Painting  {Cave-Painting}               {Cantabrian, Levant}                             CACHE#5
Filtered resources:
– $R_F^1 = R_F^0 \cap I(t_{10}) = \{r_1, r_2, r_3\}$
– $R_F^2 = R_F^1 \cap I(t_1) = \{r_1, r_2\}$
– $R_F^5 = R_F^4 \cap I(t_1)$
Selectable tags:
– Step 0: $|I(t_1)| = 2$, $|I(t_2)| = |I(t_3)| = |I(t_4)| = |I(t_5)| = 1$, $|I(t_6)| = |I(t_7)| = 2$, $|I(t_8)| = |I(t_9)| = 1$, $|I(t_{10})| = |I(t_{11})| = 3$; the condition is $|I(t)| < |R|$
– Step 1: $|R_F^1 \cap I(t)|$ is 2 for $t_1$ and $t_6$, 1 for $t_2$ and $t_7$, and 0 for $t_3$, $t_4$, $t_5$, $t_8$, $t_9$ and $t_{11}$; the condition $0 < |R_F^1 \cap I(t)| < |R_F^1|$ yields $\{t_1, t_2, t_6, t_7\}$
– Step 2: $|R_F^2 \cap I(t_2)| = 0$, $|R_F^2 \cap I(t_6)| = 1$, $|R_F^2 \cap I(t_7)| = 1$; the condition $0 < |R_F^2 \cap I(t)| < |R_F^2|$ yields $\{t_6, t_7\}$
Cache contents (the filtered resource and selectable tag stores also hold an entry for the empty tag set):
– After +Prehistoric and +Cave-Painting: filtered resource store $\{t_{10}\} \mapsto R_F^1$, $\{t_{10}, t_1\} \mapsto R_F^2$; representative store $R_F^1 \mapsto \{t_{10}\}$, $R_F^2 \mapsto \{t_{10}, t_1\}$; selectable tag store $\{t_{10}\} \mapsto \{t_1, t_2, t_6, t_7\}$, $\{t_{10}, t_1\} \mapsto \{t_6, t_7\}$
– CACHE#5: the filtered resource store gains $\{t_1\} \mapsto R_F^5$ and the selectable tag store gains $\{t_1\} \mapsto \{t_6, t_7\}$; the representative store is unchanged
– CACHE#6: the stores hold only the entries for the empty tag set and $\{t_{10}\}$
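One possible reading of how the three stores cooperate is sketched below in Python. It is an assumption-laden illustration rather than the authors' implementation: the class `BrowsingCache`, its `lookup` method and the `computed` counter are hypothetical names, and cache eviction is not modelled at all.

```python
# Example collection from the slides: resource -> set of tags.
TAGGING = {
    "r1": {"Cave-Painting", "Cantabrian", "Prehistoric"},
    "r2": {"Cave-Painting", "Levant", "Prehistoric"},
    "r3": {"Megalithic", "Cantabrian", "Prehistoric"},
    "r4": {"Tartesian", "Plateau", "Protohistoric"},
    "r5": {"Phoenician", "Penibaetic", "Protohistoric"},
    "r6": {"Punic", "Levant", "Protohistoric"},
}

# Inverted index: tag -> set of resources tagged with it.
INDEX = {}
for resource, tags in TAGGING.items():
    for tag in tags:
        INDEX.setdefault(tag, set()).add(resource)

ALL_RESOURCES = frozenset(TAGGING)
ALL_TAGS = frozenset(INDEX)


class BrowsingCache:
    """Unbounded cache with the three stores named on the slide (assumed design)."""

    def __init__(self):
        self.filtered = {}        # selected tags -> filtered resources
        self.selectable = {}      # selected tags -> selectable tags
        self.representative = {}  # filtered resources -> one tag set yielding them
        self.computed = 0         # selectable sets computed from scratch

    def lookup(self, active):
        """Return (filtered resources, selectable tags) for a set of selected tags."""
        active = frozenset(active)
        if active not in self.filtered:
            sets = [INDEX[t] for t in active]
            self.filtered[active] = (
                frozenset(set.intersection(*sets)) if sets else ALL_RESOURCES
            )
        resources = self.filtered[active]
        if active not in self.selectable:
            rep = self.representative.get(resources)
            if rep is not None:
                # Same filtered resources as a known state: reuse its selectable tags.
                self.selectable[active] = self.selectable[rep]
            else:
                self.computed += 1
                self.selectable[active] = frozenset(
                    t for t in ALL_TAGS - active
                    if 0 < len(resources & INDEX[t]) < len(resources)
                )
                self.representative[resources] = active
        return resources, self.selectable[active]


cache = BrowsingCache()
cache.lookup([])
cache.lookup(["Prehistoric"])
cache.lookup(["Prehistoric", "Cave-Painting"])
resources, tags = cache.lookup(["Cave-Painting"])
print(sorted(resources), sorted(tags), cache.computed)
# ['r1', 'r2'] ['Cantabrian', 'Levant'] 3
```

Reusing the selectable tags of a representative state is sound here because the selectable set depends only on the filtered resources: any already selected tag covers every filtered resource, so it fails the strict upper bound and is excluded automatically.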
Conclusions
A browsing strategy based on a suitable combination of inverted indexes and multilevel caches has been proposed to speed up the browsing process in Clavy.
We are currently working on the empirical evaluation of our approach in Chasqui, a real-world repository in the field of Pre-Columbian American archeology.
Preliminary experiments suggest that the browsing cache can substantially speed up navigation with respect to a more basic, uncached strategy (based solely on inverted indexes).
The price to pay is the overhead generated by cache management, as well as the higher memory footprint caused by the technique.
However, the experiments also make apparent that: (i) the cache management overhead is compensated for by eliminating the explicit computation of the information associated with many browsing states, and (ii) the cache size stays within reasonable ranges, even when it is not upper-bounded.
Future Work
To improve the cache strategy by combining it with our
previous work on navigation automata.
To generalize the browsing strategy to support navigation
through links among resources.
To combine browsing and search, letting users browse
search results according to the browsing model described.