Union Catalog and Knowledge
Engineering for TELDAP
Keh-Jiann Chen
Principal Investigator
Core Platforms for Digital Contents Project, TELDAP
Research Fellow
Research Center for Information Technology Innovation &
Institute of Information Science, Academia Sinica
Outline
• Introduction
• Union catalog
• Databases and metadata for digital contents and websites
• Knowledge engineering
• Future perspective
Introduction
The integration and management of digital content has become
an important issue as the amount of digital content produced by
different projects and institutions increases rapidly.
The goal of our project is to achieve
optimized preservation, retrieval, and
presentation of digital collections.
What is the union catalog?
• It is a catalog and portal for all digital collections of TELDAP.
• It is an integrated platform for browsing and searching the
entire digital content of TELDAP.
• Metadata provides core descriptions and licensing
information for each digital collection.
Figure: Home page of the union catalog, with browsing by topics and search by keywords.
Metadata models for different types of objects
Archived digital items
• Union catalog metadata model: Dublin Core+
Web sites
• DCCAP (Dublin Core Collections Application Profile)
• Fields for internal use only: Unique Identifier, Format, Evaluation, Cataloging History
Documents
• Document metadata: Dublin Core
Metadata for digital items: over 3 million digital items and still increasing
Element               Definition
Title                 A name given to the resource
Creator               An entity primarily responsible for making the content of the resource
Subject and Keywords  The topic of the content of the resource
Description           An account of the content of the resource
Publisher             An entity responsible for making the resource available
Contributor           An entity responsible for making contributions to the content of the resource
Date                  A date associated with an event in the life cycle of the resource
Resource Type         The nature or genre of the content of the resource
Format                The physical or digital manifestation of the resource
Resource Identifier   An unambiguous reference to the resource within a given context
Source                A reference to a resource from which the present resource is derived
Language              A language of the intellectual content of the resource
Relation              A reference to a related resource
Coverage              The extent or scope of the content of the resource
Rights Management     Information about rights held in and over the resource
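As an illustration of the element table above, a single digital item could be held as a record with one field per element. This is only a minimal sketch: the field names follow the fifteen elements in the table, but the class name `DublinCoreRecord`, the helper `missing_elements`, and every sample value are hypothetical and are not taken from the TELDAP system.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DublinCoreRecord:
    """One record with the 15 elements from the table; each element may repeat."""
    title: List[str] = field(default_factory=list)
    creator: List[str] = field(default_factory=list)
    subject: List[str] = field(default_factory=list)
    description: List[str] = field(default_factory=list)
    publisher: List[str] = field(default_factory=list)
    contributor: List[str] = field(default_factory=list)
    date: List[str] = field(default_factory=list)
    resource_type: List[str] = field(default_factory=list)
    resource_format: List[str] = field(default_factory=list)
    identifier: List[str] = field(default_factory=list)
    source: List[str] = field(default_factory=list)
    language: List[str] = field(default_factory=list)
    relation: List[str] = field(default_factory=list)
    coverage: List[str] = field(default_factory=list)
    rights: List[str] = field(default_factory=list)


def missing_elements(record: DublinCoreRecord) -> List[str]:
    """Return the names of elements that have no value yet."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Hypothetical sample item; the values are invented for illustration.
record = DublinCoreRecord(
    title=["Example painting"],
    subject=["painting"],
    identifier=["example:0001"],
    rights=["Hypothetical license statement"],
)
print(missing_elements(record))
```

A check like `missing_elements` is one simple way to see how completely an item has been described before it is added to a shared catalog.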
Metadata for websites
Over 500 websites and still increasing
Metadata
• DCCAP (Dublin Core Collections Application
Profile)
• Total of 19 data fields
Metadata for websites
• The website homepage picture
• URL, Project Information
• Type, Name, Author, Subject, Description, Language, Item Type, Target
• Archived Information: URL, time, authorization
• Copyright, Purpose, Other Information
Figure: http://digitalarchives.tw
Dynamic categorization
• User-oriented categorization
– General, elementary school students, high school students, researchers, etc.
• Topic-based categorization
– Archaeology, painting, animal, plant, document, etc.
• Function-based categorization
– Research, education, business, technology, etc.
• Categorization based on institutions
– Academia Sinica, Taiwan U., Palace Museum, etc.
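One way to read dynamic categorization is that the same website records are regrouped on demand by whichever metadata field the user chooses. The sketch below assumes each website record is a dictionary whose fields hold lists of values. The field names `target`, `subject`, and `purpose` echo the website metadata fields, while `institution`, the function `categorize`, and the three sample sites are hypothetical.

```python
from collections import defaultdict


def categorize(websites, facet):
    """Group website names by the values of one metadata field (facet)."""
    groups = defaultdict(list)
    for site in websites:
        for value in site.get(facet, []):
            groups[value].append(site["name"])
    return dict(groups)


# Hypothetical records, invented for illustration only.
websites = [
    {"name": "Site A", "target": ["General", "Researchers"],
     "subject": ["Archaeology"], "purpose": ["Research"],
     "institution": ["Academia Sinica"]},
    {"name": "Site B", "target": ["Elementary school students"],
     "subject": ["Animal", "Plant"], "purpose": ["Education"],
     "institution": ["Taiwan U."]},
    {"name": "Site C", "target": ["Researchers"],
     "subject": ["Painting"], "purpose": ["Research", "Education"],
     "institution": ["Palace Museum"]},
]

by_purpose = categorize(websites, "purpose")  # function-based view
by_target = categorize(websites, "target")    # user-oriented view
print(by_purpose)
print(by_target)
```

Because the grouping is computed from the metadata at request time, adding a new view (for example by institution) needs no change to the stored records.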
Digitalarchives.tw
Purpose: Education
Target: Elementary school student, Junior high school student, Teacher…
Purpose: Creative applications
Purpose: Academic research
Subject: Animal, Archaeology, Anthropology…
Figure: http://digitalarchives.tw
Metadata for project documents
Over 14,000 documents and still increasing
Metadata: Dublin Core
Construct Teldapwiki: a Wikipedia for TELDAP, http://wiki.teldap.tw/
Plans for building knowledge structures for TELDAP
• Construct metadata models for different objects.
• Establish hyperlinks between contents and objects.
– Develop keyword extraction tools.
– Design automatic hyperlink tagging tools.
• Construct TELDAP ontology and thesaurus.
– Art & Architecture Thesaurus by Getty
– Chinese WordNet
(1) Metadata models for different objects
• Digital collections
– Union catalog metadata model: Dublin Core+
• Web sites
– DCCAP (Dublin Core Collections Application Profile)
– Public fields
– Private fields: Unique Identifier, Format, Evaluation, Cataloging History
• Documents
– Document metadata: Dublin Core
(2) Establish hyperlinks between contents
and objects
• Identify keywords in contents.
• Tag keywords with related object hyperlinks.
Develop hyperlink tagging tools
• Word segmentation tools
– Resolve word segmentation ambiguities and identify
keywords.
– CKIP word segmentation system: http://ckipsvr.iis.sinica.edu.tw/
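To make the notion of a segmentation ambiguity concrete, the sketch below runs two simple dictionary-based baselines, forward and backward maximum matching, and compares their output. This is not the CKIP system's method, which the slides do not describe. The function names, the tiny lexicon, and the sample sentence are all assumptions chosen for illustration.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Scan left to right, always taking the longest lexicon word available."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is accepted as a fallback when nothing matches.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, lexicon, max_len=4):
    """Scan right to left, always taking the longest lexicon word available."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    return words[::-1]


# Hypothetical lexicon: research, graduate student, life, origin.
lexicon = {"研究", "研究生", "生命", "起源"}
sentence = "研究生命起源"  # "research the origin of life"

forward = forward_max_match(sentence, lexicon)    # ['研究生', '命', '起源']
backward = backward_max_match(sentence, lexicon)  # ['研究', '生命', '起源']
print(forward, backward, "ambiguous" if forward != backward else "consistent")
```

When the two scans disagree, as they do here, the text contains an overlapping ambiguity that a real segmenter has to resolve before keywords can be identified reliably.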
Develop hyperlink tagging tools
• TELDAP keyword dictionary
– Extract keywords from metadata and establish object-keyword relations.
▪ Extract text from the XML data for each object.
▪ Classify the text by topic, title, description, author, location, era, etc.
▪ Extract keywords from each class of text by automatic word segmentation, keyword extraction, and manual post-editing.
– The current dictionary contains more than 50,000 keywords.
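The dictionary-building steps above can be sketched roughly as follows. The XML layout, the element names (`object`, `title`, `subject`, `era`), and the object identifiers are hypothetical, since the slides do not show the actual schema. Keyword extraction is replaced here by a plain split on semicolons, standing in for the word segmentation, keyword extraction, and manual post-editing that the slides describe.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical metadata export; not the real TELDAP XML format.
SAMPLE_XML = """
<objects>
  <object id="obj-001">
    <title>Bronze vessel</title>
    <subject>bronze; ritual vessel</subject>
    <era>Shang</era>
  </object>
  <object id="obj-002">
    <title>Bronze mirror</title>
    <subject>bronze; mirror</subject>
    <era>Han</era>
  </object>
</objects>
"""


def extract_text_by_class(xml_text):
    """Return {object_id: {field_name: text}} for every <object> element."""
    root = ET.fromstring(xml_text)
    result = {}
    for obj in root.findall("object"):
        result[obj.get("id")] = {
            child.tag: (child.text or "").strip() for child in obj
        }
    return result


def build_keyword_index(texts, keyword_fields=("subject", "era")):
    """Map each keyword to the set of object ids whose metadata contains it."""
    index = defaultdict(set)
    for object_id, classes in texts.items():
        for name in keyword_fields:
            # Stand-in for real keyword extraction: split on semicolons.
            for keyword in classes.get(name, "").split(";"):
                keyword = keyword.strip().lower()
                if keyword:
                    index[keyword].add(object_id)
    return index


texts = extract_text_by_class(SAMPLE_XML)
keyword_index = build_keyword_index(texts)
print(sorted(keyword_index["bronze"]))
```

Keeping the text separated by class (subject, era, and so on) means each class can later be given its own extraction rules.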
Prototype system for hyperlink tagger
• Identify and select keywords from the input text
• Produce text with keywords and hyperlinks
• Hyperlinks point to the related digital collections
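The tagging behaviour shown in the prototype can be approximated with a longest-match keyword scan. This is a sketch under stated assumptions, not the prototype's implementation: the function `tag_keywords`, the keyword-to-URL mapping, and the `example.org` addresses are hypothetical. The input is English here so that plain substring matching is enough, whereas Chinese text would first need word segmentation.

```python
import html
import re


def tag_keywords(text, keyword_urls):
    """Wrap each known keyword in an HTML link, preferring longer keywords."""
    if not keyword_urls:
        return html.escape(text)
    # Longest keywords first, so "bronze mirror" wins over "bronze".
    ordered = sorted(keyword_urls, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        keyword = match.group(0)
        url = html.escape(keyword_urls[keyword], quote=True)
        parts.append(f'<a href="{url}">{html.escape(keyword)}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


# Hypothetical keyword dictionary pointing at placeholder search URLs.
keyword_urls = {
    "bronze": "https://example.org/search?q=bronze",
    "bronze mirror": "https://example.org/search?q=bronze+mirror",
}
tagged = tag_keywords("A bronze mirror and a bronze bell.", keyword_urls)
print(tagged)
```

Sorting the alternatives by length matters because Python's regex alternation takes the first alternative that matches at a position, not the longest one.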
(3) Construct TELDAP ontology and
thesaurus
Establish association links between
Chinese keywords and Getty AAT.
Merge TELDAP keywords with Chinese
AAT.
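A very simple starting point for such association links is exact label matching. The sketch below assumes a table mapping Chinese thesaurus labels to concept identifiers. The labels, the identifiers `concept-A` and `concept-B`, and both function names are hypothetical and do not represent real AAT records. Keywords with no exact match are kept as candidate terms for manual review.

```python
def link_keywords(keywords, thesaurus_labels):
    """Split keywords into those linked to a concept and those left unlinked."""
    linked, unlinked = {}, []
    for keyword in keywords:
        concept_id = thesaurus_labels.get(keyword.strip())
        if concept_id is None:
            unlinked.append(keyword)
        else:
            linked[keyword] = concept_id
    return linked, unlinked


def merge_vocabularies(thesaurus_labels, unlinked):
    """Return one vocabulary; unlinked keywords get None as a placeholder concept."""
    merged = dict(thesaurus_labels)
    for keyword in unlinked:
        merged.setdefault(keyword, None)
    return merged


# Hypothetical thesaurus labels: bronze ware, painting.
thesaurus_labels = {"青銅器": "concept-A", "繪畫": "concept-B"}
# Hypothetical project keywords: bronze ware, oracle bone script.
project_keywords = ["青銅器", "甲骨文"]

linked, unlinked = link_keywords(project_keywords, thesaurus_labels)
merged = merge_vocabularies(thesaurus_labels, unlinked)
print(linked, unlinked, len(merged))
```

Exact matching will miss variant spellings and near-synonyms, so in practice it would only provide a first pass before human editors review the remaining terms.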
Future Perspective
• Technology development
– Construct multilingual thesauri: extend Getty AAT.
– Maintain the TELDAP keyword-and-object relation database.
– Construct name authority files, gazetteers, and universal calendars.
– Design hyperlink taggers and keyword extension tools.
– Design an authoring tool that automatically provides hyperlinks to keyword-related digital content.
– Design a knowledge-based content retrieval system.
Future Perspective
• Content enrichment
– Within TELDAP:
▪ Standardize the object metadata model and data format.
▪ Provide object metadata in controlled vocabulary.
▪ Write scripts and stories for different topics with a Wiki-like knowledge structure.
▪ Enrich the digital collections.
▪ Establish hyperlinks between textbooks and TELDAP collections.
– Extend the knowledge sources, e.g. Wikipedia