Towards Mining Semantic Maturity in Social Bookmarking Systems. Martin Atzmueller (1), Dominik Benz (1), Andreas Hotho (2), Gerd Stumme (1). (1) Knowledge and Data Engineering Group (KDE), University of Kassel, Germany; (2) Data Mining and Information Retrieval Group, University of Würzburg, Germany.
Let it grow! Evidence for emergent semantics within social applications: the meaning of tags can be captured at different stages. → Can we find indicators of "semantic maturity"?
The Story (Atzmueller et al.: Towards Mining Semantic Maturity, SDOW 2011, 2011/10/23). Outline: Social Bookmarking & Emergent Semantics; Maturity Indicators; Mining Maturity Profiles; Evaluation.
Social Tagging. Social tagging: a simple and intuitive way to organize all kinds of resources. Uncontrolled vocabulary: tags are "just strings". Formal model: folksonomy F = (U, T, R, Y) with users U, tags T, resources R, and tag assignments Y ⊆ U × T × R. Example (figure): users Alice and Bob; resources iswc.org and bonn.de; tags semantics, conference, travel.
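
A minimal sketch of the formal model, assuming tag assignments are stored as (user, tag, resource) triples. The slide names the users, tags and resources, but not who tagged what, so the concrete assignments below are hypothetical, as are the names `TagAssignment` and `folksonomy`.

```python
from typing import NamedTuple


class TagAssignment(NamedTuple):
    user: str
    tag: str
    resource: str


# Hypothetical assignments: the slide shows these users, tags and resources,
# but not which user attached which tag to which resource.
Y = {
    TagAssignment("Alice", "semantics", "iswc.org"),
    TagAssignment("Alice", "conference", "iswc.org"),
    TagAssignment("Bob", "conference", "iswc.org"),
    TagAssignment("Bob", "travel", "bonn.de"),
}


def folksonomy(assignments):
    """Derive F = (U, T, R, Y) from a set of tag assignments."""
    users = {a.user for a in assignments}
    tags = {a.tag for a in assignments}
    resources = {a.resource for a in assignments}
    return users, tags, resources, set(assignments)


U, T, R, _ = folksonomy(Y)
```

Deriving U, T and R from Y keeps the three sets consistent with the assignments by construction.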
Capturing Tag Semantics. Co-occurrence distribution → "semantic fingerprint". Capture semantic relatedness / synonyms by cosine similarity. See Cattuto et al.: Semantic Grounding of Tag Relatedness in Social Bookmarking Systems (ISWC 2008).
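
The fingerprint idea can be sketched as follows, assuming each tag's context vector counts the other tags it shares a post with. The exact context definition used by the authors is not given on the slide, and the posts and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(posts):
    """Map each tag to a Counter of the tags it co-occurs with.

    `posts` is an iterable of tag sets, one set per post.
    """
    vectors = defaultdict(Counter)
    for tags in posts:
        for tag in tags:
            for other in tags:
                if other != tag:
                    vectors[tag][other] += 1
    return vectors


def cosine(v, w):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(v[k] * w[k] for k in v.keys() & w.keys())
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    norm_w = math.sqrt(sum(x * x for x in w.values()))
    if norm_v == 0 or norm_w == 0:
        return 0.0
    return dot / (norm_v * norm_w)


# Hypothetical posts, for illustration only.
posts = [
    {"game", "java", "fun"},
    {"games", "java", "fun"},
    {"travel", "bonn"},
]
vecs = context_vectors(posts)
sim_game_games = cosine(vecs["game"], vecs["games"])
sim_game_travel = cosine(vecs["game"], vecs["travel"])
```

In this toy data "game" and "games" never co-occur directly, yet they share the same context, which is the property that makes the fingerprint useful for finding synonyms.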
Semantic Grounding. Compute folksonomy-based relatedness (via context vectors): Sim( , ) = 0.74. Map the tags to WordNet synsets and compute the grounded similarity in the taxonomy: Sim_True( , ) = 0.59 (we used Jiang-Conrath dist.).
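
For readers unfamiliar with the Jiang-Conrath distance, here is a sketch on a toy taxonomy rather than WordNet. It uses the standard definition dist(c1, c2) = IC(c1) + IC(c2) - 2·IC(lcs(c1, c2)), with information content IC(c) = -log p(c). The taxonomy, the concept probabilities and all function names are hypothetical.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and concept probabilities.
PARENT = {
    "entity": None,
    "activity": "entity",
    "artifact": "entity",
    "game": "activity",
    "sport": "activity",
}
PROB = {"entity": 1.0, "activity": 0.4, "artifact": 0.5, "game": 0.1, "sport": 0.2}


def ancestors(concept):
    """Return the concept followed by its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def information_content(concept):
    return -math.log(PROB[concept])


def lowest_common_subsumer(c1, c2):
    seen = set(ancestors(c1))
    for candidate in ancestors(c2):
        if candidate in seen:
            return candidate
    raise ValueError("concepts share no ancestor")


def jiang_conrath(c1, c2):
    lcs = lowest_common_subsumer(c1, c2)
    return (information_content(c1) + information_content(c2)
            - 2 * information_content(lcs))
```

Concepts under the same specific parent ("game", "sport") come out closer than concepts that only meet at the root ("game", "artifact").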
Appendix: Music Genre Taxonomy learned from last.fm.
The Story. Co-occurrence fingerprints capture tag semantics. Still to come: Maturity Indicators; Mining Maturity Profiles; Evaluation.
Maturity Indicators (1): Frequency. Intuition: "the more often used, the more mature". Indicators: resource frequency, user frequency.
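
A small sketch of the two frequency indicators, assuming that user frequency counts the distinct users of a tag and resource frequency counts the distinct resources it is attached to. The slide does not spell out the definitions, so both are assumptions, and the assignments are hypothetical.

```python
from collections import defaultdict


def tag_frequencies(assignments):
    """Return (user_freq, resource_freq) dicts keyed by tag.

    `assignments` is an iterable of (user, tag, resource) triples.
    """
    users = defaultdict(set)
    resources = defaultdict(set)
    for user, tag, resource in assignments:
        users[tag].add(user)
        resources[tag].add(resource)
    user_freq = {tag: len(members) for tag, members in users.items()}
    resource_freq = {tag: len(members) for tag, members in resources.items()}
    return user_freq, resource_freq


# Hypothetical tag assignments, for illustration only.
assignments = [
    ("Alice", "semantics", "iswc.org"),
    ("Alice", "conference", "iswc.org"),
    ("Bob", "conference", "iswc.org"),
    ("Bob", "conference", "bonn.de"),
    ("Bob", "travel", "bonn.de"),
]
user_freq, resource_freq = tag_frequencies(assignments)
```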
Maturity Indicators (2): Centrality. "Importance" within the co-occurrence network G = (V, E). Intuition: the more important, the more mature. Indicators: degree, closeness, betweenness.
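
Two of the three centrality measures can be sketched in plain Python on an unweighted, undirected tag graph. Betweenness is left out for brevity. The normalizations (degree divided by n - 1, closeness as reachable nodes divided by summed distances) are common conventions and not necessarily the ones the authors used. The graph is hypothetical.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {v: (len(nbrs) / (n - 1) if n > 1 else 0.0)
            for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """Number of reachable nodes divided by the sum of BFS distances."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical co-occurrence network: "web" co-occurs with every other tag.
graph = {
    "web": {"java", "python", "semantic"},
    "java": {"web"},
    "python": {"web"},
    "semantic": {"web"},
}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
```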
The Story. Co-occurrence fingerprints capture tag semantics. Frequency and centrality properties serve as maturity indicators. Still to come: Mining Maturity Profiles; Evaluation.
Pattern Mining using Subgroup Discovery. In a nutshell: "Find descriptions of subsets in the data that differ significantly from the total population with respect to a target concept." Pattern: conjunctive description using tag properties, represented as an indicator rule: description → target (with probability).
Finding Maturity Patterns. → Which patterns maximize the target variable?
Tag       Ufreq  Rfreq  Deg   Bet   Clos  TARGET
java      0.13   0.9    0.2   0.1   0.4   0.8
game      0.2    0.3    0.01  0.04  0.02  1.0
games     0.4    0.1    0.1   0.12  0.2   1.0
semantic  0.1    0.2    0.4   0.3   0.35  0.5
web       0.8    0.7    0.6   0.7   0.43  0.6
web2.0    0.4    0.4    0.2   0.25  0.3   0.3
python    0.3    0.2    0.02  0.1   0.05  0.7
Mining Maturity Profiles – Target Variables. First: compute the most related tag t_sim for each tag t (resource context). WordNet Synonym Identification (SYN): binary target variable, true if t_sim is a synonym of t, false otherwise. Grounded WordNet Maturity (MAT): binary target variable based on taxonomic shortest path length, true if sim(t_sim, t) > 0.5, false otherwise.
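
The two target variables can be sketched as below, assuming the folksonomy similarities, the WordNet synonym sets and the grounded similarities are already available as lookup tables. All values and names here are hypothetical stand-ins for those resources.

```python
def most_related(tag, similarity):
    """Return t_sim, the tag with the highest folksonomy similarity to `tag`."""
    candidates = {other: s for (t, other), s in similarity.items() if t == tag}
    return max(candidates, key=candidates.get)


def targets(tag, similarity, synonyms, grounded_sim, threshold=0.5):
    """Return (t_sim, SYN, MAT) for `tag`."""
    t_sim = most_related(tag, similarity)
    syn = t_sim in synonyms.get(tag, set())       # SYN: is t_sim a synonym of t?
    mat = grounded_sim[(tag, t_sim)] > threshold  # MAT: sim(t_sim, t) > 0.5
    return t_sim, syn, mat


# Hypothetical lookup tables, for illustration only.
similarity = {
    ("game", "games"): 0.9,
    ("game", "java"): 0.3,
    ("web", "semantic"): 0.7,
    ("web", "web2.0"): 0.6,
}
synonyms = {"game": {"games"}}
grounded_sim = {("game", "games"): 0.8, ("web", "semantic"): 0.4}

game_targets = targets("game", similarity, synonyms, grounded_sim)
web_targets = targets("web", similarity, synonyms, grounded_sim)
```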
Pattern Mining - Algorithm. Patterns are similar to association rules, BUT with a fixed target concept (of interest), i.e., high maturity. Pattern mining follows a k-best approach: search through the space of descriptions (conjunctions of features), maximizing a quality function, e.g., the increase in target mean/share. Several efficient algorithms exist; we apply an exhaustive one.
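
A rough sketch of an exhaustive k-best search, run on the example rows from the "Finding Maturity Patterns" slide. It is not the authors' implementation. The low/high discretization at 0.25, the minimum subgroup size, the maximum description length and the choice of relative increase in target mean as the quality function are all assumptions made for illustration.

```python
from itertools import combinations

COLUMNS = ("Ufreq", "Rfreq", "Deg", "Bet", "Clos")

# tag -> (indicator values in COLUMNS order, target value)
DATA = {
    "java":     ((0.13, 0.9, 0.2, 0.1, 0.4), 0.8),
    "game":     ((0.2, 0.3, 0.01, 0.04, 0.02), 1.0),
    "games":    ((0.4, 0.1, 0.1, 0.12, 0.2), 1.0),
    "semantic": ((0.1, 0.2, 0.4, 0.3, 0.35), 0.5),
    "web":      ((0.8, 0.7, 0.6, 0.7, 0.43), 0.6),
    "web2.0":   ((0.4, 0.4, 0.2, 0.25, 0.3), 0.3),
    "python":   ((0.3, 0.2, 0.02, 0.1, 0.05), 0.7),
}


def selectors(threshold=0.25):
    """Yield one 'low' and one 'high' selector per indicator (assumed split)."""
    for i, name in enumerate(COLUMNS):
        yield (name, "low"), (lambda row, i=i: row[i] < threshold)
        yield (name, "high"), (lambda row, i=i: row[i] >= threshold)


def subgroup_discovery(data, k=3, max_len=2, min_size=2):
    """Return the k best (quality, description, covered tags) triples."""
    population_mean = sum(target for _, target in data.values()) / len(data)
    all_selectors = list(selectors())
    results = []
    for length in range(1, max_len + 1):
        for combo in combinations(all_selectors, length):
            covered = [tag for tag, (row, _) in data.items()
                       if all(pred(row) for _, pred in combo)]
            # Contradictory conjunctions cover nothing and are dropped here.
            if len(covered) < min_size:
                continue
            mean = sum(data[tag][1] for tag in covered) / len(covered)
            quality = (mean - population_mean) / population_mean
            description = " AND ".join(f"{a}={v}" for (a, v), _ in combo)
            results.append((quality, description, covered))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:k]


top_patterns = subgroup_discovery(DATA)
```

Exhaustive enumeration is feasible here because there are only ten selectors. With more indicators or finer discretization the description space grows combinatorially, which is where the efficient algorithms mentioned on the slide come in.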
The Story. Co-occurrence fingerprints capture tag semantics. Frequency and centrality properties serve as maturity indicators. Discover indicator subgroups which maximize the maturity target. Still to come: Evaluation.
Social Bookmarking Data. Folksonomy crawled from Delicious in 2006, restricted to the top 10,000 tags: 476,378 users; 10,000 tags; 12,660,470 resources; 101,491,722 tag assignments. Preprocessing & filtering: remove tags without a sufficiently similar partner (cos < 0.05); limit to tags with only one sense in WordNet. Number of tags finally considered: 1944.
Direct Correlation to Target Variables. No significant correlation of any individual indicator with the maturity target. Possibly higher correlation through a combination of indicators → consider subgroups.
Target  Ufreq  Rfreq  Deg   Clos  Bet
SYN     0.15   0.15   0.13  0.14  0.12
MAT     0.12   0.15   0.12  0.14  0.09
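
The slide does not say which correlation coefficient was used. As one plausible reading, the sketch below computes the Pearson correlation between an indicator column and a target column. The function name and the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / math.sqrt(var_x * var_y)


# Hypothetical indicator and target values, for illustration only.
indicator = [0.1, 0.2, 0.3, 0.4]
increasing_target = [1.0, 2.0, 3.0, 4.0]
decreasing_target = [4.0, 3.0, 2.0, 1.0]
r_up = pearson(indicator, increasing_target)
r_down = pearson(indicator, decreasing_target)
```

Values around 0.1 to 0.15, as in the table, would indicate only a weak linear relationship between a single indicator and the target.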
Results: Exemplary Patterns (1). Target: Synonym Identification (SYN); mean = 0.13. Small groups show the highest maturity (measured by the increase in synonym discovery rate). Larger group: degree centrality + user frequency, with a synonym discovery rate 128 % higher than for all tags.
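
To make the relative increase concrete, the short sketch below converts it into an absolute rate. It assumes the stated mean of 0.13 is the synonym discovery rate over all tags. The function name is hypothetical.

```python
def rate_after_increase(base_rate, percent_increase):
    """Absolute rate implied by a relative increase over a baseline rate."""
    return base_rate * (1 + percent_increase / 100)


# Baseline 0.13 and +128 % are taken from the slide; under the stated
# assumption, the larger subgroup's rate is 0.13 * 2.28.
subgroup_rate = rate_after_increase(0.13, 128)
```

Under that assumption the larger subgroup identifies a synonym for roughly 30 % of its tags, compared with 13 % overall.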
Results: Exemplary Patterns (2). Target: WordNet Maturity (MAT); mean = 0.59.
Discussion & Implications. In general, centrality and frequency are useful for assessing maturity. Combined evidence of indicators leads to higher-quality patterns. Subgroup discovery is a generally useful technique. Open issues: further maturity indicators? Alternative notions of maturity? Temporal aspects. Mining of "immaturity". …
The Story. Co-occurrence fingerprints capture tag semantics. Frequency and centrality properties serve as maturity indicators. Discover indicator subgroups which maximize the maturity target. Combined evidence of indicators leads to higher-quality patterns. Thanks!