Improved Discoverability of Digital
Objects in Institutional Repositories
Using Controlled Vocabularies
University of Zambia
Lusaka, ZAMBIA
Bertha Chipangila · Eric Liswaniso · Andrew Mawila
Philomena Mwanza · Daisy Nawila · Robert M’sendo
Mayumbo Nyirenda · Lighton Phiri
ACM/IEEE Joint Conference on Digital Libraries (JCDL 2021)
September 27–30, 2021
About Us (1/2)
About Us (2/2)
● The DataLab research group at
The University of Zambia is
composed of faculty staff and
students—undergraduate and
postgraduate—working in three
main areas
○ Data Mining
○ Digital Libraries
○ Technology-Enhanced Learning
http://datalab.unza.zm
Outline
● Motivation
● Problem Statement
● Methodology
● Results and Discussion
● Conclusion and Future Work
There is an Ever-Increasing Amount of
Scholarly Research Generated
https://scholar.google.com
https://academic.microsoft.com
Discoverability Services Facilitate Findability
of Scholarly Research in IRs
http://open.uct.ac.za
http://dspace.unza.zm
Problem Statement
● There are numerous
inconsistencies with
digital object metadata
elements used to
describe subjects
○ Lack of use of controlled
vocabularies and subject
headings compromises
effective searching and
browsing of scholarly
research
Methodology
● Situational analysis to determine the implications of non-use
of controlled vocabularies
● Identification of subject-specific controlled vocabularies for
various disciplines
● Usability study comparing IRs integrated with controlled
vocabularies against IRs without controlled vocabularies
● Implementation of a multi-label subject classification model
for classifying ACM CCS concepts and arXiv subjects
Methodology: Situational Analysis
● Dublin Core encoded
metadata harvested
from three repositories
○ NDLTD Union Catalog
○ University of Cape
Town Computer
Science Document
Archive
○ University of Zambia
Institutional
Repository
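The subject analysis starts from the `dc:subject` elements of each harvested record. The following is a minimal sketch of that step, assuming records arrive in the common `oai_dc` XML serialisation; the sample record, the helper names and the normalisation rule are hypothetical illustrations, not the authors' harvesting code.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample record; not taken from any of the three repositories.
SAMPLE_RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Thesis</dc:title>
  <dc:subject>Digital libraries</dc:subject>
  <dc:subject> digital Libraries </dc:subject>
  <dc:subject>Metadata</dc:subject>
</oai_dc:dc>
"""


def extract_subjects(record_xml: str) -> list[str]:
    """Return the non-empty dc:subject values of one record, in document order."""
    root = ET.fromstring(record_xml)
    subjects = []
    for element in root.iter(f"{{{DC_NS}}}subject"):
        text = (element.text or "").strip()
        if text:
            subjects.append(text)
    return subjects


def normalise(subject: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(subject.lower().split())


subjects = extract_subjects(SAMPLE_RECORD)
distinct_raw = set(subjects)
distinct_normalised = {normalise(s) for s in subjects}

print(len(distinct_raw), "distinct raw subjects")
print(len(distinct_normalised), "distinct subjects after normalisation")
```

Comparing the raw and normalised counts is one simple way to surface the kind of free-text inconsistency that a controlled vocabulary is meant to prevent.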
Methodology: Identification of Subjects and
Usability Study
● 7 faculty interviewed to
determine appropriate
controlled vocabularies
● DSpace-powered IRs
set up to conduct a
controlled comparative
study
○ IR #1: LCSH subjects
○ IR #2: No subjects
○ System Usability Scale
used to assess usability
● Within-subjects SUS study
conducted with 50
participants
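For readers unfamiliar with the System Usability Scale, the sketch below shows the conventional scoring rule, assuming the standard ten-item questionnaire answered on a 1 to 5 scale. The function name and the response values are hypothetical and are not data from the study.

```python
from statistics import mean


def sus_score(responses: list[int]) -> float:
    """Score one completed SUS questionnaire on the usual 0-100 scale.

    Odd-numbered items are positively worded and contribute (response - 1);
    even-numbered items are negatively worded and contribute (5 - response).
    The summed contributions are multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:  # items 1, 3, 5, 7, 9
            total += response - 1
        else:               # items 2, 4, 6, 8, 10
            total += 5 - response
    return total * 2.5


# Hypothetical within-subjects data: every participant rates both IRs.
ratings = {
    "IR #1 (LCSH subjects)": [[4, 2, 4, 2, 4, 2, 4, 2, 4, 2], [5, 1, 5, 1, 5, 1, 5, 1, 5, 1]],
    "IR #2 (no subjects)": [[3, 3, 3, 3, 3, 3, 3, 3, 3, 3], [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]],
}

for system, questionnaires in ratings.items():
    print(system, mean(sus_score(q) for q in questionnaires))
```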
Methodology: Multi-label Subject Classifier
● Multi-label subject classification model implemented using the
arXiv CoRR dataset and validated using the UCT CS
Document Archive
○ Titles and abstracts used as input features
Results and Discussion: Situational Analysis
(1/2)
● Analysis 1. Metadata preparation
and ingestion workflow is based on
internal policy
● Analysis 2. Subject headings are
used sparingly: 92.1% of tags are
associated with only one publication
● Analysis 3. Domain-specific
subject headings are not used;
internally devised LCSH are used instead
Results and Discussion: Situational Analysis
(2/2)
● Incidentally, the
problem manifests
itself in other
repositories and
downstream services
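A figure such as "92.1% of tags are associated with only one publication" is a share of singleton tags. The sketch below shows one way such a share can be computed, assuming each record has already been reduced to a list of subject strings; the function name and sample records are hypothetical, and the authors' exact counting rules are not stated on the slides.

```python
from collections import Counter


def singleton_tag_share(record_subjects: list[list[str]]) -> float:
    """Fraction of distinct tags that occur in exactly one record."""
    counts = Counter()
    for subjects in record_subjects:
        # set() so that a tag repeated within one record counts once.
        counts.update(set(subjects))
    if not counts:
        return 0.0
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


# Hypothetical records, each a list of free-text subject tags.
records = [
    ["metadata", "zambia"],
    ["metadata", "maize"],
    ["hiv"],
    ["metadata"],
]

print(f"{singleton_tag_share(records):.1%} of tags occur in a single record")
```

A high share suggests that tags rarely link related publications together, which limits their value for subject browsing.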
Results and Discussion: Comparative
Analysis (1/2)
● Mean SUS scores
○ Baseline: 66.2
○ Intervention: 68.9
Results and Discussion: Comparative
Analysis (2/2)
Results and Discussion: Multi-label Subject
Classification Model—Implementation
Input: Title + Abstract
Estimator       Features   F1 Score   Hamming Loss   Jaccard Score
SGDClassifier   TF-IDF     0.540      0.005          0.431
[...]           [...]      [...]      [...]          [...]
● Approaches used: Binary Relevance, Classifier Chains and
One-Versus-Rest
● Estimators: MultinomialNB vs SGDClassifier vs RandomForest
● Features: TF vs TF-IDF; Title vs Abstract vs Title + Abstract
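One of the combinations named above (One-Versus-Rest, SGDClassifier, TF-IDF, Title + Abstract) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' implementation: the toy records and labels are hypothetical, hyperparameters are library defaults, the averaging modes for F1 and Jaccard are assumed because the slides do not state them, and the metrics are computed on the training data only to show the calls.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score, hamming_loss, jaccard_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical "title + abstract" strings and arXiv-style subject labels.
texts = [
    "Neural networks for image classification",
    "Query processing in relational databases",
    "Deep learning for database query optimisation",
    "Indexing structures for fast database retrieval",
    "Convolutional networks for object detection",
    "Learned index structures using neural networks",
]
labels = [
    ["cs.LG", "cs.CV"],
    ["cs.DB"],
    ["cs.LG", "cs.DB"],
    ["cs.DB"],
    ["cs.LG", "cs.CV"],
    ["cs.LG", "cs.DB"],
]

# Turn the label lists into a binary indicator matrix, one column per subject.
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(labels)

# TF-IDF features feeding one independent SGD classifier per subject.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(SGDClassifier(random_state=0)),
)
model.fit(texts, y)
predicted = model.predict(texts)

print("Subjects:", list(binarizer.classes_))
print("F1 (micro):", f1_score(y, predicted, average="micro"))
print("Hamming loss:", hamming_loss(y, predicted))
print("Jaccard (samples):", jaccard_score(y, predicted, average="samples"))
```

In a real experiment the metrics would be computed on held-out records, and the other estimators and feature sets listed above could be swapped into the same pipeline for comparison.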
Results and Discussion: Multi-label Subject
Classification Model—Validation
● Model evaluated using a CS
subject repository that
implements self-archiving
Results and Discussion: Multi-label Subject
Classification Model—Demonstration
● ACM CCS: C.2.4 · D.2.11 · F.1.1 · H.3.4 · H.3.5 ·
H.5.2
● arXiv: Computer Science - Artificial
Intelligence · Computer Science -
Computation and Language ·
Computer Science - General
Literature · Computer Science -
Human-Computer Interaction
● Six (6) ACM CCS and four (4)
arXiv subjects predicted
by the model
Conclusions and Future Work
● Integrating IRs with subject controlled vocabularies can
potentially complement self-archiving and, additionally, has
the benefit of ensuring that IRs are usable and effective
● Potential future work and directions
○ Metadata cleaning, enhancement and augmentation of existing
descriptive metadata
○ Implementation of subject classification models for other
subject-specific controlled vocabularies
○ Automatic generation of subject classes for large-scale
repositories such as the NDLTD Union Catalog
Q & A Session
● Comments, concerns and complaints?
lighton.phiri@unza.zm
http://datalab.unza.zm
http://lis.unza.zm/~lightonphiri