Content Complexity, Similarity, and
Consistency in Social Media:
A Deep Learning Approach
Gene Moo Lee
University of Texas at Arlington
Joint work with
Donghyuk Shin (UT Austin/Amazon), Shu He (UConn),
Andrew B. Whinston (UT Austin)
DSI 2016, Austin TX
Social media: More users
2
Social media: More spending
3
Challenges and opportunities: 78% photos
4
Source: Chang et al. 2014
Research questions
• How can firms optimize social media strategies by
incorporating visual content?
• Specifically, what are the determinants of consumer
engagement in terms of “likes” and “reblogs” (sharing)
actions?
• What roles do visual and textual content play?
• Operationally, how can we construct measures from these
unstructured data sources?
5
Tumblr data
• Tumblr: microblogging platform (acquired by Yahoo!)
• 35,651 posts by 183 companies (May - Oct 2014)
• Automobile, Entertainment, Food, Fashion,
Finance, Leisure, Retail, Tech
• 89.7% photo & text, 6.3% pure text, 4% videos
• Collected “likes” and “reblogs” until Apr 2015
6
Company blogs in Tumblr
7
BMW USA, Vogue, IBM
Data: blog post and engagement
8
Post = Visual Info (Image) + Textual Info (Text, Tags)
Customer engagement = Notes (Likes + Reblogs)
Visual features
• Aesthetics (beautiful photos)
• Adult content
• Celebrity
• Feature complexity (low-level, flashy images)
• Semantic complexity (high-level, complex meaning)
• Number of salient objects
9
Feature complexity (low level)
• Visual complexity theory [Donderi 2006a, Pieters et al. 2010]
• Visually complex (flashy) images (colors, luminance,
shape) get more attention
• This feature complexity can be captured by the
image’s compressed file size [Donderi 2006a; Donderi
2006b; Machado et al. 2015; Forsythe et al. 2011]
• However, this measure captures only low-level
complexity based on pixel values
10
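A minimal sketch of the compressed-size proxy, assuming raw 8-bit grayscale pixels and Python's standard zlib compressor as a stand-in for the image file's own compression (in practice the compressed image file itself, for example a JPEG, would be measured). The function name and the two toy images are hypothetical. A flat image compresses to almost nothing, while random noise barely compresses, so its size ratio is much higher.

```python
import random
import zlib


def compressed_size_ratio(pixels: bytes, level: int = 9) -> float:
    """Ratio of zlib-compressed size to raw size; higher means less compressible (more complex)."""
    if not pixels:
        raise ValueError("empty image")
    return len(zlib.compress(pixels, level)) / len(pixels)


# Hypothetical 64x64 grayscale images stored as raw bytes (one byte per pixel).
width, height = 64, 64
flat = bytes([128]) * (width * height)  # uniform gray: visually simple
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(width * height))  # random noise: visually "busy"

print(f"flat:  {compressed_size_ratio(flat):.3f}")
print(f"noisy: {compressed_size_ratio(noisy):.3f}")
```

Using a ratio rather than the raw byte count keeps images of different resolutions comparable; that normalization is a design choice of this sketch.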
Semantic complexity (high level)
• Recognition-By-Components theory [Biederman 1987]
• Human object recognition is invariant to feature
factors (colors, brightness, edges, positions, etc.)
• Vessel and Rubin (2010) show that visual preferences
are influenced by semantic content in the image
• We posit that semantic complexity matters!
• Operational question: How do we calculate semantics
from unstructured images?
11
Deep learning
• A branch of machine learning inspired by the human brain
• Algorithms to model high-level abstractions with multiple processing
layers of non-linear transformations
• Driven by (1) theoretical breakthroughs, (2) Big Data, and (3) powerful computation
• Successfully applied in image/video/voice recognition, AlphaGo, etc.
12
Semantic complexity via deep learning
• Deep convolutional neural network (CNN) [Jia et al. 2014]
• Model trained with 1.2 million images with tags (ImageNet, Flickr)
• Tested on 53,417 images from brand-generated Tumblr posts
• Each image is represented by a 1,700-dimensional vector, where each
value is the confidence score w.r.t. an object (tag)
• We define semantic complexity as the Shannon Diversity Index (entropy)
of this vector, normalized to a distribution p over d = 1,700 tags:
H(p) = -Σ_i p_i log(p_i)
• max = log(d), if p is uniformly distributed
• min = 0, if p_i = 1 for some i
13
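The entropy definition can be sketched in Python, assuming the per-tag confidence scores are normalized to sum to 1 before computing H(p) with the natural log. The function name is hypothetical, and the uniform and one-hot vectors are toy inputs that reproduce the max = log(d) and min = 0 cases for d = 1,700.

```python
import math
from typing import Sequence


def semantic_complexity(scores: Sequence[float]) -> float:
    """Shannon diversity index H(p) = -sum_i p_i log p_i of normalized label scores."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    h = 0.0
    for s in scores:
        if s < 0:
            raise ValueError("scores must be non-negative")
        p = s / total
        if p > 0:  # convention: 0 * log 0 = 0
            h -= p * math.log(p)
    return h


d = 1700  # number of object labels (tags) in the classifier output
uniform = [1.0] * d  # every label equally likely -> maximum complexity
one_hot = [0.0] * d
one_hot[42] = 1.0  # a single label holds all the mass -> minimum complexity

print(semantic_complexity(uniform), math.log(d))
print(semantic_complexity(one_hot))
```

Because the scores are normalized inside the function, raw (unnormalized) confidence scores can be passed in directly.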
ImageNet: Image DB with tree-structure tags
14
Source: ImageNet
More visual features
• 7th-layer output = robust image representation for computer vision tasks
• Aesthetic/beauty score [Dhar et al. 2011 (CVPR, Vision)]
• Adult-content score [Sengamedu et al. 2011 (MM, Vision)]
• Celebrity (450 celebrities) [Parkhi et al. 2015 (BMVC, Vision)]
• Number of salient objects [Zhang et al. 2015 (CVPR, Vision)]
15
Examples: Visual features
• Visual complexity theory (Attneave 1954,
Donderi 2006, Pieters et al. 2010)
• Visual stimuli are a composite of
colors, luminance, shape, and number of
objects/patterns
16
Textual features
• Two textual sources: text and tags
• Length: # of words, # of tags
• Topic complexity: LDA topic model (text, tags)
• Order complexity: word2vec (for text only)
17
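A minimal sketch of the length features and of one possible topic-complexity measure. The slide does not spell out how topic complexity is computed; here it is assumed, by analogy with semantic complexity, to be the entropy of a post's LDA document-topic distribution. Function names, the post, and the topic vectors are hypothetical; real topic vectors would come from a fitted LDA model.

```python
import math
from typing import Dict, List, Sequence


def length_features(text: str, tags: List[str]) -> Dict[str, int]:
    """Length features: number of whitespace-separated words and number of tags."""
    return {"n_words": len(text.split()), "n_tags": len(tags)}


def topic_complexity(theta: Sequence[float]) -> float:
    """Entropy of a document-topic distribution theta (assumed to sum to 1)."""
    return -sum(p * math.log(p) for p in theta if p > 0)


# Hypothetical post; theta values stand in for an LDA model's output.
post_text = "Meet the new roadster on the open road"
post_tags = ["roadster", "cars", "summer"]
theta_focused = [0.94, 0.02, 0.02, 0.02]  # one dominant topic
theta_mixed = [0.25, 0.25, 0.25, 0.25]  # spread evenly over topics

print(length_features(post_text, post_tags))
print(topic_complexity(theta_focused), topic_complexity(theta_mixed))
```

A post concentrated on a single topic scores low, while one spread evenly over K topics reaches the maximum log(K).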
Examples: Textual features
• Topics
• Word clusters
18
Visual-Textual Content Similarity
• Image: pixels; Text/Tags: characters. We need a common representation!
1. Represent each image as a collection of the labels predicted by
deep learning, forming an “image corpus”
2. Train LDA on both the image and text/tags corpora to obtain topic
distributions for images and for text/tags
3. Compute the cosine similarity between the two corresponding topic
distributions
19
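Steps 1 and 3 of the similarity pipeline can be sketched in Python; step 2 (fitting LDA on the combined corpora) is left to a topic-modeling library and replaced here by hypothetical topic vectors. The confidence cutoff used to build an image's label document, the function names, and the label scores are assumptions for illustration, not the authors' settings.

```python
import math
from typing import Dict, List, Sequence


def image_document(label_scores: Dict[str, float], threshold: float = 0.1) -> List[str]:
    """Step 1 (sketch): keep predicted labels above a hypothetical confidence cutoff.

    The resulting list of labels is one "document" in the image corpus; it is sorted
    only to make the output reproducible.
    """
    return sorted(label for label, score in label_scores.items() if score >= threshold)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Step 3: cosine similarity between two topic distributions of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical classifier output for one image.
labels = {"sports car": 0.62, "convertible": 0.21, "racer": 0.12, "beach wagon": 0.05}
print(image_document(labels))

# Hypothetical topic distributions that an LDA model trained on both corpora might infer (step 2).
theta_image = [0.7, 0.2, 0.1]
theta_text = [0.6, 0.3, 0.1]
print(round(cosine_similarity(theta_image, theta_text), 3))
```

Since topic distributions are non-negative, the similarity falls between 0 (no shared topics) and 1 (identical topic mix up to scale).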
Examples: Content similarity
• Topics
• Word clusters
20
21
Empirical Model
• Linear fixed effects model
• DVs (likes/reblogs): log-transformed due to their
skewed distributions
• Capture blog (firm) heterogeneity
• Capture time effects (day of week, month)
• Other models
• Identical results with random effects
• Consistent results with negative binomial model
22
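A minimal sketch of the fixed-effects idea for a single regressor, using the within transformation: demeaning each variable by blog absorbs blog-level heterogeneity, and the OLS slope on the demeaned data is the fixed-effects estimate. It assumes log(1 + y) to keep zero counts defined, and it omits the day-of-week and month effects (which would enter as additional dummy regressors) as well as standard errors. All names and numbers are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def demean_by_group(values: Sequence[float], groups: Sequence[str]) -> List[float]:
    """Within transformation: subtract each group's (blog's) mean from its observations."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def fixed_effects_slope(y: Sequence[float], x: Sequence[float], groups: Sequence[str]) -> float:
    """OLS slope of y on x after absorbing group fixed effects (single regressor)."""
    y_w = demean_by_group(y, groups)
    x_w = demean_by_group(x, groups)
    sxx = sum(a * a for a in x_w)
    if sxx == 0:
        raise ValueError("x has no within-group variation")
    return sum(a * b for a, b in zip(x_w, y_w)) / sxx


# Hypothetical posts: like counts, a complexity score, and the posting blog.
likes = [120, 300, 80, 15, 40, 9]
complexity = [2.1, 3.4, 1.8, 2.5, 3.9, 1.2]
blogs = ["blog_a", "blog_a", "blog_a", "blog_b", "blog_b", "blog_b"]

log_likes = [math.log1p(n) for n in likes]  # log(1 + y) keeps zero counts defined
print(round(fixed_effects_slope(log_likes, complexity, blogs), 3))
```

The slope is identified only from variation within each blog, so a blog that is simply more popular overall does not drive the estimate.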
23
Summary and implications
1. Large-scale analysis of visual content in social
media
2. New measure of visual semantic complexity via deep learning
• Able to relate visual and textual content
Visual content analysis can be used to optimize
content design for social media marketing
24
Thank you!
Contact Info: Gene Moo Lee
gene.lee@uta.edu

Similar to Content Complexity, Similarity, and Consistency in Social Media: A Deep Learning Approach

Image Search: Then and Now
iliananpappi_mscthesis
Data Science meets Digital Marketing
Personalization on the Web with Semantic Patterns (in LOD)
my model genuines.
Advances in Image Search and Retrieval
A Meteoroid on Steroids: Ranking Media Items Stemming from Multiple Social Ne...
Deep Representation: Building a Semantic Image Search Engine
Towards Understanding Crisis Events On Online Social Networks Through Pictures
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
UXSG2014 Lightning Talks - UX and Semantic web making web more human (Nurgul ...
Forever Young: A Tribute to the Grandmaster through a recount of Personal Jou...
Machine Learning for Developers - Danilo Poccia - Codemotion Rome 2017
Leveraging social media for training object detectors
Searching Images: Recent research at Southampton
Content Based Image Retrieval
Similarity-based retrieval of multimedia content
Image Tagging With Social Assistance
Zinoviev - The Pain of Complexity presentation

More from Gene Moo Lee

Developing A Big Data Analytics Framework for Industry Intelligence
Big Data Analytics: Challenges and Opportunities
Analyzing the spillover roles of user-generated reviews on purchases: Evidenc...
Towards Advanced Business Analytics using Text Mining and Deep Learning
Towards a better measure of business proximity: Topic modeling for industry i...
Designing Cybersecurity Policies with Field Experiments
Introduction to NP Completeness
Strategic Network Formation in a Location-Based Social Network
Matching Mobile Applications for Cross Promotion
Improving Sketch Reconstruction Accuracy
Improving the Interaction between Overlay Routing and Traffic Engineering
Modeling Human Mobility using Location Based Social Networks
Mobile Video Delivery via Human Movement
Towards modeling M&A in high tech industries

Editor's Notes

  • #24:
    • Industry subsample analysis
    • Long- and short-term customer engagement
    • Categorize posts/blogs into ‘utilitarian’ vs ‘hedonic’
    • Examine non-linear effects