Shenghui Wang
Rob Koopman
Exploring a world of
networked information
built from free-text
metadata
OCLC Research EMEA
ELAG2015
What would you do if you were
interested in a topic?
Difficult to answer these questions:
• What are the different aspects of this topic?
• Are there related aspects missing in my search terms?
• Who are the most prominent authors about this topic?
• Which journals publish most about this topic?
• How have others — e.g. librarians — described and classified
this topic?
Demo
• http://thoth.pica.nl/relate?input=opac
How do we do this?
• OFFLINE: generate a semantic representation
for each entity
• ONLINE: find the most related entities and
display them using multidimensional scaling
Build semantic representation
• Basic assumptions
– An entity can be represented by its context
– Entities which share more context are more likely
to be related
• Context is the textual environment where an
entity occurs
• The effects of state prekindergarten programs on young
children’s school readiness in five states
• [author:jung kwanghee]
• [subject:readiness for school]
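The example record above can be turned into context counts along these lines. This is a minimal sketch, not the authors' pipeline: it assumes the context of an entity is simply the words of the titles it appears with, and the helper names `tokenize` and `add_record` are hypothetical. A real system would likely also filter stop words and restrict the context to the selected topical terms.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase a free-text field and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def add_record(contexts, title, entities):
    """Give every entity on a record the title's words as context.

    Assumption: context = title words. The slides only say that context
    is the textual environment where an entity occurs.
    """
    terms = tokenize(title)
    for entity in entities:
        contexts[entity].update(terms)


# entity -> Counter of context terms
contexts = defaultdict(Counter)
add_record(
    contexts,
    "The effects of state prekindergarten programs on young "
    "children's school readiness in five states",
    ["[author:jung kwanghee]", "[subject:readiness for school]"],
)
```

After this one record, the author and the subject heading have identical context counts. As more records are added, entities that keep appearing with the same words end up with similar rows, which is the second basic assumption on the slide.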
Dataset
● ArticleFirst, 65 million articles
● Selected 4 million entities (topical terms,
authors, ISSNs, Dewey decimal codes)
● Represented by 1 million topical terms
But a matrix of 4M x 1M is too big to process
Dimension reduction based on Random Projection
C: a co-occurrence matrix
R: a random matrix of +/-1
C’: approximation of C
after random projection
-- Semantic matrix
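A dense 4M x 1M matrix would have 4 x 10^12 cells, which is why C is never used directly. The projection C' = C R can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the reduced dimension `k`, the seed, the sparse row layout and the function names are all hypothetical, since the slides do not give them.

```python
import random


def random_sign_matrix(n_terms, k, seed=0):
    """R: an n_terms x k matrix with entries +1 or -1, equally likely."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(k)] for _ in range(n_terms)]


def project(cooc_row, R):
    """Compute one row of C' = C R.

    cooc_row is a sparse row of C: {term_index: co-occurrence count}.
    Only the non-zero terms are visited, so the cost depends on how many
    terms the entity co-occurs with, not on the full vocabulary size.
    """
    k = len(R[0])
    out = [0] * k
    for term, count in cooc_row.items():
        signs = R[term]
        for j in range(k):
            out[j] += count * signs[j]
    return out


# Toy sizes (hypothetical): 1000 terms reduced to 16 dimensions.
R = random_sign_matrix(n_terms=1000, k=16)
row = {3: 2, 42: 1, 999: 5}
vec = project(row, R)
```

Each entity ends up with a vector of length `k` instead of one column per topical term. Because the projection is linear, rows that were similar in C tend to stay similar in C', which is what makes C' usable as the semantic matrix.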
Online interface
• Find mutual nearest neighbors
• Use multidimensional scaling to display
Nearest neighbors
Mutual nearest neighbors
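The difference between nearest neighbors and mutual nearest neighbors can be sketched as follows. The sketch assumes cosine similarity between rows of the semantic matrix and a brute-force search. The slides do not name the similarity measure or the search method, so both are assumptions, and the toy entities `a`, `b`, `x`, `c`, `d` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest(vectors, name, n):
    """Names of the n entities most similar to `name`, excluding itself."""
    scored = [
        (cosine(vectors[name], vec), other)
        for other, vec in vectors.items()
        if other != name
    ]
    # Highest similarity first; ties broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:n]]


def mutual_nearest(vectors, name, n):
    """Keep a neighbor only if `name` is also among that neighbor's n nearest."""
    return [
        other
        for other in nearest(vectors, name, n)
        if name in nearest(vectors, other, n)
    ]


# Toy 2-D vectors placed at the given angles (degrees) on the unit circle.
angles = {"a": 0, "b": 5, "x": 20, "c": 40, "d": 45}
vectors = {
    name: [math.cos(math.radians(deg)), math.sin(math.radians(deg))]
    for name, deg in angles.items()
}
```

In this toy data the nearest neighbor of `x` is `b`, but the nearest neighbor of `b` is `a`, so `x` has no mutual nearest neighbor at `n = 1`, while `a` and `b` are mutual. Requiring the relation to hold in both directions filters out entities that are close to many others without being specifically related to the query.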
Possible applications
• Explorative interface
• Context-based search:
– brain
• Journal finder
– Arctic ice journals
– http://brain.oxfordjournals.org/
• Author name disambiguation
– pre kindergarten
Context matters!
• What does “young” mean in
- ArticleFirst
- WorldCat
- Astrophysics
- Art
Ariadne
(demo) http://thoth.pica.nl/relate
• An extremely fast way of navigating large-scale
heterogeneous entities
• Generalisable to different datasets
– Full WorldCat
– Small but highly curated astrophysics dataset
• Supports explorative information retrieval and
entity disambiguation
References
• Koopman, Rob, and Shenghui Wang. 2014. “Where Should I Publish? Detecting
Journal Similarity Based on What Has Been Published There.” In Proceedings of
Digital Libraries 2014, 483–484. London, United Kingdom. Association for
Computing Machinery. Paper, Poster
• Koopman, Rob, Shenghui Wang, Andrea Scharnhorst, and Gwenn Englebienne.
2015. “Ariadne’s Thread — Interactive Navigation in a World of Networked
Information”. In CHI '15 Extended Abstracts on Human Factors in Computing
Systems. ACM, Seoul, South Korea. Paper, Poster
• Koopman, Rob, Shenghui Wang and Andrea Scharnhorst. 2015. “Contextualization
of topics - browsing through terms, authors, journals and cluster allocations”. In
Proceedings of 15th International Conference on Scientometrics & Informetrics.
Istanbul, Turkey. Paper
Explore. Share. Magnify.
Thank you
Shenghui Wang
Rob Koopman
OCLC Research EMEA
shenghui.wang@oclc.org
rob.koopman@oclc.org


Editor's Notes

  • #7: Opac -> journal -> author -> [author:medeiros norm] -> worldcat. Ambiguous names: [author:balas janet l], [author:balas j l]
  • #16: Journal finder; name disambiguation