Association Rule Mining
in Social Network Data
PRESENTED BY: HOSSEIN MOBASHER
COURSE: DATA MINING
Contents
• Introduction
• Related Works
• The proposed Framework
• Experimental Evaluation
• Conclusion
Introduction
• The use of social networks has altered the way of life of online communities over
the last decade.
• Social data is used in:
• Academic applications
• E-commerce
• Discovering the habits and interests of online communities in different geographical regions
• Sentiment analysis of users
• Purpose: Support analysts in decision-making and optimal resource
management in businesses as well as web maintenance.
Introduction (continued)
• Social data is a powerful source of data for:
• Gaining knowledge about social communities
• Investigating the behavior and other aspects of online communities
• User-generated content (UGC) helps online organizations enhance
their services based on user perspectives.
• Data mining techniques can be exploited to discover hidden,
interesting and meaningful knowledge in social data.
Related Works
• TwitterEcho
• Collects data through a distributed architecture (Portuguese Twittosphere)
• Uses micro-blogging as a means to predict political sentiment.
• TWICALL
• Discovers important events, categorizes and classifies them
• NIF-T
• Explores data published on micro-blogging websites (e.g. Twitter)
The proposed Framework
• An environment for association rule mining that discovers hidden patterns in
tweets.
Collecting and preprocessing of tweets
• Access tweets using the Twitter API.
• Received tweets are unsuitable for the subsequent processing steps.
• They include information that is not required for the problem under consideration
• Remove unnecessary information and transform the tweets into items and related
contextual features.
Pipeline: Access data using Twitter API → Remove unnecessary information → Transform into suitable format → Map into a transactional database
Collecting and preprocessing of tweets (continued)
• Transformed tweets are then mapped into a transactional database.
• Each transaction is composed of a set of stems
• e.g. “Imagination is more important than knowledge” may be mapped into {imagination,
important, knowledge}
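The mapping from a tweet to a transaction can be sketched roughly as follows. This is a minimal illustration, not the framework's actual preprocessing: the stop-word list, the regular expressions and the function name `tweet_to_transaction` are assumptions, and no stemmer is applied (the slide's example also shows unstemmed words).

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "is", "more", "of", "than", "the", "to"}

def tweet_to_transaction(text: str) -> frozenset:
    """Map one tweet to a set of lowercase keyword items (one transaction)."""
    # Drop URLs and @mentions, which carry no keyword content.
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    # Keep alphabetic tokens only (this also strips the '#' from hashtags).
    tokens = re.findall(r"[a-z]+", text.lower())
    # Remove stop words; what remains is the item set for this tweet.
    return frozenset(t for t in tokens if t not in STOP_WORDS)

tweets = [
    "Imagination is more important than knowledge",
    "@someone Knowledge is power https://example.com/x",  # made-up tweet
]
transactions = [tweet_to_transaction(t) for t in tweets]
```

With this toy stop-word list, the first tweet yields the item set {imagination, important, knowledge} used in the slide's example.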
Discovery of Correlations
• Use the Apriori algorithm for frequent itemset mining.
• An association rule is usually represented as: If Body then Head
• If Body occurs, Head is more likely to occur as well
• The rule expresses the relationship between Body and Head
• The strength of a rule depends on its support and confidence
• The stronger the rule, the stronger the association between its terms.
• Example: imagination ⇒ knowledge
• Support = 40%
• Confidence = 70%
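As a rough sketch of how these measures are computed: support is the fraction of transactions containing every item of the rule, and confidence is support(Body ∪ Head) divided by support(Body). The code below also includes a compact level-wise Apriori search. The transactions are synthetic, constructed only so that imagination ⇒ knowledge reproduces the 40% / 70% figures above; they are not the presentation's data, and the function names are illustrative.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(body, head, transactions):
    """support(body ∪ head) / support(body)."""
    return support(set(body) | set(head), transactions) / support(body, transactions)

def apriori(transactions, min_support):
    """Level-wise search: every subset of a frequent itemset must itself be frequent."""
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items
             if support([i], transactions) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c, transactions) >= min_support}
    return frequent

# Synthetic data: 14 tweets mention both terms, 6 only "imagination", 15 neither.
transactions = (
    [frozenset({"imagination", "knowledge"})] * 14
    + [frozenset({"imagination"})] * 6
    + [frozenset({"power"})] * 15
)
rule_support = support({"imagination", "knowledge"}, transactions)          # 14 / 35
rule_confidence = confidence({"imagination"}, {"knowledge"}, transactions)  # 14 / 20
frequent_itemsets = apriori(transactions, min_support=0.3)
```

The naive `support` rescans all transactions on every call, which is fine for a toy example but not for large tweet collections.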
Taxonomy Generation
• Automatically generates a taxonomy from tweet attributes (i.e. the frequent
keywords generated in the previous phase).
• More general, higher-level concepts and correlations can then be extracted.
• The taxonomy nodes represent distinct terms extracted from tweet contents
• Graph extraction
• Graph partitioning and pruning
Taxonomy Generation (Graph extraction)
• Strong correlations are detected using the results of the previous phase.
• The generated correlations are represented as a graph
• Edges: the implications present in the rules
• Vertices: items of tweet contents
• country ⇒ World
• society, people ⇒ country
• peace ⇒ World
• society ⇒ World
• society ⇒ country
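One way to turn the rules above into a directed graph is sketched below. It assumes that a rule with several body items contributes one edge per body item; the slides do not say how multi-item bodies are handled, so this is an illustrative choice, as is the adjacency-set representation.

```python
# The five rules listed on the slide, as (body, head) pairs.
rules = [
    ({"country"}, "World"),
    ({"society", "people"}, "country"),
    ({"peace"}, "World"),
    ({"society"}, "World"),
    ({"society"}, "country"),
]

def rules_to_graph(rules):
    """Return adjacency sets: graph[item] is the set of items that `item` implies."""
    graph = {}
    for body, head in rules:
        # Register the head as a vertex even if it has no outgoing edges.
        graph.setdefault(head, set())
        for item in body:
            # Assumption: one edge per body item, pointing at the head.
            graph.setdefault(item, set()).add(head)
    return graph

graph = rules_to_graph(rules)
```

Here every distinct term becomes a vertex and `graph["society"]` contains both `country` and `World`, mirroring the two rules with `society` in the body.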
Taxonomy Generation (Graph partitioning and pruning)
• Makes the graph compact
• Prunes edges that do not represent a strong, relevant relationship by performing
vertex labeling. (A label represents a level of the taxonomy)
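The slides do not spell out the labeling rule, so the following is only one plausible reading, stated as an assumption: label each vertex with its longest distance to a root, then keep only the edges that connect a vertex to the level directly above it. The graph is the one implied by the rules on the graph-extraction slide and is assumed to be acyclic.

```python
from functools import lru_cache

# Child -> parents edges, as read off the rules on the graph-extraction slide.
graph = {
    "World": set(),
    "country": {"World"},
    "peace": {"World"},
    "society": {"country", "World"},
    "people": {"country"},
}

def label_levels(graph):
    """Label each vertex with its longest distance to a root (a vertex with no parents).

    Assumes the graph is acyclic; a cycle would cause unbounded recursion.
    """
    @lru_cache(maxsize=None)
    def level(v):
        parents = graph[v]
        return 0 if not parents else 1 + max(level(p) for p in parents)
    return {v: level(v) for v in graph}

def prune(graph, levels):
    """Keep only edges that connect a vertex to the level directly above it."""
    return {v: {p for p in parents if levels[p] == levels[v] - 1}
            for v, parents in graph.items()}

levels = label_levels(graph)
taxonomy = prune(graph, levels)
```

Under this reading, `World` is the root at level 0, `country` and `peace` sit at level 1, and `society` and `people` at level 2; the edge from `society` to `World` skips a level and is pruned.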
Analyzing Correlations
• Selection and ranking of the significant correlations
• The selection is made given:
• A rule schema, e.g. <Keyword, *> ⇒ <Place, *>
• Interesting rule items, e.g. <Keyword, School> ⇒ <Place, London>
• The results are ranked by their support and confidence quality indexes.
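Selection by schema and ranking by quality index might look like the sketch below. Items are modeled as (attribute, value) pairs and `*` matches any value. The `Rule` class, the helper names, the sort order (support first, then confidence) and all rule scores are assumptions made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    body: tuple        # tuple of (attribute, value) pairs
    head: tuple        # tuple of (attribute, value) pairs
    support: float
    confidence: float

def matches(side, pattern):
    """True if `side` has as many items as `pattern` and each item fits; '*' matches any value."""
    return len(side) == len(pattern) and all(
        attr == p_attr and p_val in ("*", val)
        for (attr, val), (p_attr, p_val) in zip(side, pattern))

def select_and_rank(rules, body_pattern, head_pattern):
    """Keep rules matching the schema, strongest first (support, then confidence)."""
    selected = [r for r in rules
                if matches(r.body, body_pattern) and matches(r.head, head_pattern)]
    return sorted(selected, key=lambda r: (r.support, r.confidence), reverse=True)

# Made-up rules and scores, purely for illustration.
rules = [
    Rule((("Keyword", "School"),), (("Place", "London"),), 0.10, 0.80),
    Rule((("Keyword", "Peace"),), (("Place", "Paris"),), 0.25, 0.60),
    Rule((("Keyword", "School"),), (("Keyword", "Exam"),), 0.30, 0.90),
]
by_schema = select_and_rank(rules, [("Keyword", "*")], [("Place", "*")])
by_items = select_and_rank(rules, [("Keyword", "School")], [("Place", "London")])
```

The schema query keeps the two Keyword ⇒ Place rules and drops the Keyword ⇒ Keyword one; the item query keeps only the School ⇒ London rule.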
Experimental Evaluation
• The proposed framework highlights well-known topical subjects (e.g. the European
Union)
• The results include 58 transactions with 209 distinct items (i.e. keywords).
• First, the effectiveness is shown in two scenarios:
• User behavior analysis
• Topic trend analysis
• Second, the effectiveness is shown through the quality of the generated taxonomies.
User Behavior Analysis
• Extracted correlations allow experts to highlight hidden and potentially
interesting user behaviors.
• peace ⇒ World, society ⇒ country, country ⇒ World
• The proposed framework automatically generates the taxonomy from the mined rules.
• The taxonomy clearly highlights the behavior of people towards peace.
Topic Trend Analysis
• Discovery and analysis of current matters of contention on Twitter.
• A domain expert wants to discover subjects of topical interest to Twitter users.
• The taxonomy suggests that society in general, and people in particular, are
concerned with peace in the World.
Quality of generated taxonomies
• Taxonomy generation is evaluated with:
• Global quality (using the geometric average)
• Local quality (degree of correlation between non-leaf and leaf nodes)
• Spread (number of nodes traversed in the taxonomy to move from a node to its root node)
• The results are compared with the approach of
• “Evolutionary Taxonomy Construction from Dynamic Tag Space”, 2010
Quality of generated taxonomies (continued)
• Global quality remained the same in both approaches.
• The proposed approach produced fairly balanced local quality and spread indexes.
• The proposed approach takes slightly less time than the compared approach.
Conclusion
• Presented a mechanism for extracting hidden correlations between contents.
• The generated correlations help in understanding the hidden associations
among the textual and contextual features of the UGC.
• The proposed approach automatically generates a taxonomy.
• The experimental results validate the efficiency and effectiveness of the
proposed framework.
Thanks for your attention
Questions?
