Stop thinking, start tagging: Tag Semantics arise from Collaborative Verbosity
Christian Körner (1), Dominik Benz (2), Andreas Hotho (3), Markus Strohmaier (1), Gerd Stumme (2)
(1) Knowledge Management Institute and Know Center, Graz University of Technology, Austria
(2) Knowledge and Data Engineering Group (KDE), University of Kassel, Germany
(3) Data Mining and Information Retrieval Group, University of Würzburg, Germany
Where do Semantics come from?
- Semantically annotated content is the "fuel" of the next-generation World Wide Web, but where is the petrol station?
- Expert-built semantics → expensive
- Evidence for emergent semantics in Web 2.0 data → built by the crowd!
- Which factors influence the emergence of semantics?
- Do certain users contribute more than others?
The Story
- Emergent Tag Semantics
- Pragmatics of Tagging
- Semantic Implications of Tagging Pragmatics
- Conclusions
Emergent Tag Semantics
- Tagging is a simple and intuitive way to organize all kinds of resources
- Uncontrolled vocabulary: tags are "just strings"
- Formal model: folksonomy F = (U, T, R, Y)
  - Users U, Tags T, Resources R
  - Tag assignments Y ⊆ U × T × R
- Evidence of emergent semantics: tag similarity measures can identify e.g. synonym tags (web2.0, web_two)
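The formal model above can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the `TagAssignment` type, the `posts` helper, and the toy users, tags, and resources are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One element of Y: a user attached a tag to a resource."""
    user: str
    tag: str
    resource: str


# Hypothetical toy data, only for illustration.
Y = {
    TagAssignment("alice", "web2.0", "r1"),
    TagAssignment("alice", "ajax", "r1"),
    TagAssignment("bob", "web_two", "r1"),
    TagAssignment("bob", "ajax", "r2"),
}

# U, T and R are the projections of Y onto its three components.
U = {y.user for y in Y}
T = {y.tag for y in Y}
R = {y.resource for y in Y}


def posts(assignments):
    """Group tag assignments into posts: (user, resource) -> set of tags."""
    grouped = defaultdict(set)
    for y in assignments:
        grouped[(y.user, y.resource)].add(y.tag)
    return dict(grouped)
```

Storing only Y and deriving U, T, and R from it keeps the four components consistent by construction.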
Tag Similarity Measures: Tag Context Similarity
- Tag Context Similarity is a scalable and precise tag similarity measure [Cattuto2008, Markines2009]
- Describe each tag as a context vector
- Each dimension of the vector space corresponds to another tag; the entry denotes the co-occurrence count
- Compute similar tags by cosine similarity
- (Figure: context vector of the tag JAVA over the dimensions design, software, blog, web, programming, …, with example counts 50, 10, 1, 30, 5.)
→ Will be used as an indicator of emergent semantics!
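The context-vector idea can be sketched in a few lines. This is a simplified illustration under stated assumptions: co-occurrence is counted per post, the function names are hypothetical, and the toy posts are invented; the cited papers describe the actual measure.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def context_vectors(posts):
    """Map each tag to a Counter of co-occurrence counts with other tags."""
    vectors = defaultdict(Counter)
    for tags in posts:
        for a, b in permutations(set(tags), 2):
            vectors[a][b] += 1
    return vectors


def cosine(v, w):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(v[k] * w[k] for k in v.keys() & w.keys())
    norm = (math.sqrt(sum(x * x for x in v.values()))
            * math.sqrt(sum(x * x for x in w.values())))
    return dot / norm if norm else 0.0


def most_similar(tag, vectors):
    """Return the other tag whose context vector is closest to that of `tag`."""
    candidates = [t for t in vectors if t != tag]
    return max(candidates, key=lambda t: cosine(vectors[tag], vectors[t]))


# Hypothetical toy posts: each post is the set of tags of one bookmark.
toy_posts = [
    {"java", "programming", "software"},
    {"python", "programming", "software"},
    {"java", "programming"},
    {"recipes", "cooking"},
]
vectors = context_vectors(toy_posts)
print(most_similar("java", vectors))
```

Note that "java" and "python" never share a post here; they come out as similar because they occur in the same contexts, which is what lets the measure find synonym-like tags.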
Assessing the Quality of Tag Semantics
- Mapping: folksonomy tags are mapped to synsets of the WordNet hierarchy (tag = synset)
- Example: TagCont(t, t_sim) = 0.74, JCN(t, t_sim) = 3.68
- Average JCN(t, t_sim) over all tags t: "quality of semantics"
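JCN refers to the Jiang-Conrath distance, which combines the information content (IC) of two concepts with that of their lowest common subsumer: JCN(a, b) = IC(a) + IC(b) - 2 · IC(lcs(a, b)). The sketch below illustrates the averaging step on a hypothetical toy taxonomy with invented IC values; it does not use WordNet, and the simple tree walk for the common subsumer is an assumption that would need adapting to WordNet's graph structure.

```python
# Hypothetical toy taxonomy (child -> parent); not WordNet.
PARENT = {
    "java": "programming_language",
    "python": "programming_language",
    "programming_language": "language",
    "english": "language",
    "language": None,
}

# Hypothetical information content values, i.e. -log p(concept).
IC = {
    "language": 0.0,
    "programming_language": 1.0,
    "java": 3.0,
    "python": 2.5,
    "english": 2.0,
}


def ancestors(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def lcs(a, b):
    """Lowest common subsumer of two concepts in the toy tree."""
    seen = set(ancestors(a))
    for concept in ancestors(b):
        if concept in seen:
            return concept
    raise ValueError("concepts share no ancestor")


def jcn(a, b):
    """Jiang-Conrath distance: lower means semantically closer."""
    return IC[a] + IC[b] - 2 * IC[lcs(a, b)]


def average_jcn(pairs):
    """Average distance over (tag, most similar tag) pairs."""
    return sum(jcn(t, s) for t, s in pairs) / len(pairs)
```

Because JCN is a distance, a lower average indicates that the similar tags found in the folksonomy are semantically closer in the reference hierarchy.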
The Story (recap)
- Emergent Tag Semantics: tag similarity measures can capture emergent tag semantics
- Pragmatics of Tagging
- Semantic Implications of Tagging Pragmatics
- Conclusions
Tagging motivation
- Evidence of different ways HOW users tag (tagging pragmatics)
- Broad distinction by tagging motivation [Strohmaier2009]:
- "Categorizers" …
  - use a small controlled tag vocabulary
  - goal: "ontology-like" categorization by tags, for later browsing
  - tags as a replacement for folders
- "Describers" …
  - tag "verbosely" with freely chosen words
  - vocabulary not necessarily consistent (synonyms, spelling variants, …)
  - goal: describe content, ease retrieval
- (Figure: example tags: donuts, duff, marge, beer, bart, barty, Duff-beer; bev, alc, nalc, beer, wine.)
Tagging Pragmatics: Measures
- How to distinguish between the two types of taggers?
- Intuition: Describers use an open set of many tags, Categorizers use a small set of controlled tags
- Measures: vocabulary size (vocab), tag/resource ratio (trr), average number of tags per post (tpp)
- Describers score high on these measures, Categorizers low
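The formulas for these three measures are not preserved in this transcript. The following is a minimal sketch that assumes the plain reading of each name: vocabulary size as the number of distinct tags of a user, tag/resource ratio as distinct tags divided by distinct resources, and tags per post as the mean number of tags per tagged resource. The function name and the toy users are hypothetical.

```python
from collections import defaultdict


def pragmatic_measures(assignments):
    """Compute vocab, trr and tpp per user from (user, tag, resource) triples."""
    tags = defaultdict(set)                          # user -> distinct tags
    posts = defaultdict(lambda: defaultdict(set))    # user -> resource -> tags
    for user, tag, resource in assignments:
        tags[user].add(tag)
        posts[user][resource].add(tag)

    result = {}
    for user in tags:
        n_resources = len(posts[user])
        result[user] = {
            "vocab": len(tags[user]),
            "trr": len(tags[user]) / n_resources,
            "tpp": sum(len(t) for t in posts[user].values()) / n_resources,
        }
    return result


# Hypothetical toy data: "dana" tags like a describer, "carl" like a categorizer.
assignments = [
    ("dana", "donuts", "r1"), ("dana", "duff", "r1"), ("dana", "beer", "r1"),
    ("dana", "marge", "r2"), ("dana", "bart", "r2"),
    ("dana", "beer", "r3"), ("dana", "donuts", "r3"),
    ("carl", "alc", "r1"), ("carl", "alc", "r2"),
    ("carl", "nalc", "r3"), ("carl", "alc", "r4"),
]
measures = pragmatic_measures(assignments)
```

On this toy data the describer-like user scores higher than the categorizer-like user on all three measures, matching the stated intuition.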
Tagging Pragmatics: Measures
- Next intuition: Describers don't care about "abandoned" tags, Categorizers do
- Orphan ratio (orphan), where R(t) is the set of resources tagged by user u with tag t
- Describers score high, Categorizers low
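The orphan ratio formula itself is missing from this transcript; only the definition of R(t) survives. As a rough sketch, assume a tag counts as orphaned when the user applied it to at most `threshold` resources, and the ratio is the share of such tags in the user's vocabulary. Both the `threshold` parameter and its default are assumptions for illustration, not the authors' definition.

```python
from collections import defaultdict


def orphan_ratio(user_assignments, threshold=1):
    """Share of a user's tags that were used on at most `threshold` resources.

    user_assignments: iterable of (tag, resource) pairs of a single user.
    threshold: hypothetical cut-off for calling a tag "orphaned".
    """
    resources_by_tag = defaultdict(set)   # tag -> R(t)
    for tag, resource in user_assignments:
        resources_by_tag[tag].add(resource)
    if not resources_by_tag:
        return 0.0
    orphans = [t for t, rs in resources_by_tag.items() if len(rs) <= threshold]
    return len(orphans) / len(resources_by_tag)


# Hypothetical toy users.
describer = [("donuts", "r1"), ("duff", "r1"), ("beer", "r1"),
             ("beer", "r2"), ("marge", "r2")]
categorizer = [("alc", "r1"), ("alc", "r2"), ("nalc", "r3"), ("nalc", "r4")]
```

With these toy users, three of the describer's four tags are used only once, while the categorizer reuses every tag.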
Tagging Pragmatics: Limitations of measures
- Real users: no "perfect" Categorizers / Describers, but "mixed" behaviour
- Possibly influenced by user interfaces / recommenders
- Measures are correlated
- But: independent of semantics; measures capture usage patterns
The Story (recap)
- Emergent Tag Semantics: tag similarity measures can capture emergent tag semantics
- Pragmatics of Tagging: measures of tagging pragmatics differentiate users by tagging motivation
- Semantic Implications of Tagging Pragmatics
- Conclusions
Influence of Tagging Pragmatics on Emergent Semantics
- Idea: Can we learn the same (or even better) semantics from the folksonomy induced by a subset of describers / categorizers?
- (Figure: users of the complete folksonomy ordered from extreme Categorizers to extreme Describers, with a subset of 30% categorizers highlighted.)
Experimental setup
- Apply the pragmatic measures vocab, trr, tpp, orphan to each user
- Systematically create "sub-folksonomies" CF_i / DF_i by successively adding i% of Categorizers / Describers (i = 1, 2, …, 25, 30, …, 100)
- Compute similar tags based on each subset (Tag Context Similarity)
- Assess the (semantic) quality of similar tags by the average JCN distance
- (Figure: example sub-folksonomies DF_20 and CF_5, each yielding TagCont(t, t_sim) and JCN(t, t_sim) values.)
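The construction of sub-folksonomies can be sketched as follows. This assumes that users are ranked by one pragmatic measure, that high scores indicate Describers and low scores Categorizers, and that the top i% of the ranking is rounded up to whole users; the function name, the rounding, and the toy scores are hypothetical.

```python
import math


def sub_folksonomy(assignments, scores, percent, kind="describers"):
    """Keep the tag assignments of the `percent`% most extreme users.

    assignments: list of (user, tag, resource) triples.
    scores: user -> value of one pragmatic measure (e.g. trr).
    kind: "describers" takes the highest scores, "categorizers" the lowest.
    """
    ranked = sorted(scores, key=scores.get, reverse=(kind == "describers"))
    k = math.ceil(len(ranked) * percent / 100)
    selected = set(ranked[:k])
    return [a for a in assignments if a[0] in selected]


# Hypothetical toy data.
scores = {"u1": 0.5, "u2": 2.5, "u3": 1.0, "u4": 4.0}
assignments = [
    ("u1", "alc", "r1"), ("u2", "beer", "r1"), ("u2", "duff", "r1"),
    ("u3", "wine", "r2"), ("u4", "donuts", "r3"), ("u4", "marge", "r3"),
]

df_50 = sub_folksonomy(assignments, scores, 50, kind="describers")
cf_25 = sub_folksonomy(assignments, scores, 25, kind="categorizers")
```

Each resulting subset can then be passed to the tag similarity computation and scored against the reference hierarchy, which gives one quality value per measure and percentage.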
Dataset
- From the social bookmarking site Delicious in 2006 → ORIGINAL
- Two filtering steps (to make the measures more meaningful):
  - Restrict to the top 10,000 tags → FULL
  - Keep only users with > 100 resources → MIN100RES

dataset      |T|          |U|        |R|           |Y|
ORIGINAL     2,454,546    667,128    18,782,132    140,333,714
FULL         10,000       511,348    14,567,465    117,319,016
MIN100RES    9,944        100,363    12,125,176    96,298,409
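The two filtering steps can be sketched as below. This assumes that "top" tags are ranked by their number of tag assignments and that the user filter is applied after the tag filter; the slides do not spell out either detail, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def restrict_to_top_tags(assignments, k):
    """Keep only assignments whose tag is among the k most frequent tags."""
    counts = Counter(tag for _, tag, _ in assignments)
    top = {tag for tag, _ in counts.most_common(k)}
    return [a for a in assignments if a[1] in top]


def keep_active_users(assignments, min_resources):
    """Keep only users who tagged more than `min_resources` distinct resources."""
    resources = defaultdict(set)
    for user, _, resource in assignments:
        resources[user].add(resource)
    return [a for a in assignments if len(resources[a[0]]) > min_resources]


# Hypothetical toy data with small thresholds instead of 10,000 and 100.
original = [
    ("u1", "a", "r1"), ("u1", "b", "r2"), ("u1", "a", "r3"),
    ("u2", "a", "r1"), ("u2", "c", "r1"), ("u2", "b", "r4"),
]
full = restrict_to_top_tags(original, k=2)
min_res = keep_active_users(full, min_resources=2)
```

Filtering tags first can remove a few tags again in the second step, which would be consistent with MIN100RES having slightly fewer than 10,000 tags.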
Results: adding Describers (DF_i)
- (Plot: semantic quality against the share of describers included; axis labels "more describers" and "better semantics".)
- Almost all sub-folksonomies are better than randomly picked ones
- 40% of describers according to trr outperform the complete data!
- Optimal performance for 70% describers (trr)
Results: adding Categorizers (CF_i)
- (Plot: semantic quality against the share of categorizers included; axis labels "more categorizers" and "better semantics".)
- Almost all sub-folksonomies are worse than randomly picked ones
- Global optimum for 90% categorizers (tpp) → removing the 10% most extreme describers! (Spammers?)
The Story (recap)
- Emergent Tag Semantics: tag similarity measures can capture emergent tag semantics
- Pragmatics of Tagging: measures of tagging pragmatics differentiate users by tagging motivation
- Semantic Implications of Tagging Pragmatics: sub-folksonomies induced by measures of pragmatics show different semantic qualities
- Conclusions
Summary & Conclusions
- Introduction of measures of users' tagging motivation (Categorizers vs. Describers)
- Evidence for a causal link between tagging pragmatics (HOW people use tags) and tag semantics (WHAT tags mean)
- "Mass matters" for the "wisdom of the crowd", but the composition of the crowd makes a difference ("verbosity" of describers is in general better, but with a limitation)
- Relevant for tag recommendation and ontology learning algorithms
Guess which of the authors is a Categorizer
Thanks for the attention! Questions? Be verbose!
- Tag similarity measures can capture emergent tag semantics
- Measures of tagging pragmatics differentiate users by tagging motivation
- Sub-folksonomies induced by measures of pragmatics show different semantic qualities
- Evidence of a causal link between pragmatics and semantics of tagging!
References
[Cattuto2008] Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme: Semantic Grounding of Tag Relatedness in Social Bookmarking Systems. In: Proc. 7th Intl. Semantic Web Conference (2008), pp. 615-631
[Markines2009] Benjamin Markines, Ciro Cattuto, Filippo Menczer, Dominik Benz, Andreas Hotho, Gerd Stumme: Evaluating Similarity Measures for Emergent Semantics of Social Tagging. In: Proc. 18th Intl. World Wide Web Conference (2009), pp. 641-650
[Strohmaier2009] Markus Strohmaier, Christian Körner, Roman Kern: Why do users tag? Detecting users' motivation for tagging in social tagging systems. Technical Report, Knowledge Management Institute, Graz University of Technology (2009)
