Enhancing Object Detectors Using the Collective Intelligence of Social Media
Manish Kumar
Summary of the presentation
•  Recently, we have been witnessing the rapid growth of Social Media that emerged as the result of
users’ willingness to communicate, socialize, collaborate and share content.
•  The outcome of this massive activity was the generation of a tremendous volume of user-contributed
data that have been made available on the Web, usually along with an indication of their meaning
(i.e., tags).
•  This has motivated the research objective of investigating whether the Collective Intelligence that
emerges from the users’ contributions inside a Web 2.0 application can be used to remove the
need for dedicated human supervision during the process of learning.
•  In this presentation, I deal with a very demanding learning problem in computer vision that consists
of detecting and localizing an object within the image content.
Background - 1
•  The recent advances of Web technologies have effectively turned ordinary
people into active members of the Web who generate, share, contribute and
exchange various types of information.
•  Based on this huge repository of content, various services have evolved,
ranging from the field of eCommerce, to emergency response and consumer
collective applications such as realtravel.com.
•  The intelligence provided by single users organized in communities takes a
radically new shape in the context of Web 2.0: that of Collective Intelligence.
Background - 2
•  The MIT Center for Collective Intelligence frames the research question as
“How can people and computers be connected so that-collectively-they act
more intelligently than any individuals, groups, or computers have ever done
before?”.
•  In this presentation, I try to investigate whether the Collective Intelligence
derived from user-contributed content can be used to guide a learning
process that will teach the machine how to recognize objects from visual
content, the way a human does.
Learning and Web 2.0 Multimedia (1)
•  If we wish to construct a visual system that is able to scale to an arbitrarily large
number of concepts, effortless learning is crucial. To solve this issue, we have to
address these questions:
1.  Can a computer program learn how to recognize semantic concepts from images?
2.  What is the process of learning?
3.  What is the mechanism that allows humans, who initially require many examples to learn (as babies do),
to learn from just a few examples once they have learned how to learn?
4.  Most importantly, what is the role of the teacher in this process, and what is the minimum amount of
supervision that is absolutely necessary for facilitating efficient learning?
Learning and Web 2.0 Multimedia (2)
1.  Annotation-based learning model
a.  use labels provided by human annotators
b.  the amount of human effort that is required for annotation increases linearly
c.  Social Tagging Systems (STS) is the main driving factor
2.  Search-based model
a.  use models automatically obtained from the Web
b.  classification performance is lower for search-based methods
Social Tagging System - STS (1)
•  An STS is a web-based application, where users, either as individuals or more
commonly as members of a community (i.e., social networks), assign labels (i.e.,
arbitrary textual descriptions) to digital resources. Their motivation for tagging is
information organization and sharing.
•  Social tagging systems tend to form rich knowledge repositories that enable the
extraction of patterns reflecting the way content semantics is perceived by the web
users.
•  The tag proportions each resource receives crystallize after about 100 annotations;
this behavior is attributed to the users’ common background and their tendency to
imitate other users’ tagging habits.
Social Tagging System - STS (2)
•  Limitations of STS:
1.  Users are prone to make mistakes and they often suggest invalid metadata (tag spamming).
2.  The lack of (hierarchical) structure of information results in tag ambiguity (a tag may have many
senses), tag synonymy (two different tags may have the same meaning) and granularity variation
(users do not use the same description level, when they refer to a concept).
•  The correlations between the tag and visual information spaces that are
established when users suggest tags for the uploaded visual content are
mostly treated as complementary sources of information that both contribute to
the semantic description of the resources.
Techniques used for Multimedia Analysis (1)
•  The very first attempts at image retrieval were based on keyword search applied
either to the associated annotations (assuming that annotations existed) or to the
images’ file names. But keywords are rarely as descriptive as the multimedia content
itself.
•  To overcome these limitations, the use of the image visual characteristics has been
proposed. In this case, the visual content is utilized by extracting a set of visual
features from each image or image region. By comparing the visual features, an
algorithm can decide whether two images/regions represent the same semantic
concept. Then, image retrieval is performed by comparing the visual features of an
example image/region that is associated with a semantic concept by the user to the
visual features of all images in a given collection.
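A minimal sketch of this retrieval-by-visual-similarity scheme, assuming toy 3-bin color histograms as the extracted features and invented file names (none of these specifics come from the presentation):

```python
import math

def feature_distance(f1, f2):
    """Euclidean distance between two visual feature vectors
    (here: toy 3-bin color histograms)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def retrieve(query, collection, top_k=2):
    """Rank every image in the collection by visual similarity to the
    example image/region the user associated with a concept."""
    ranked = sorted(collection,
                    key=lambda name: feature_distance(query, collection[name]))
    return ranked[:top_k]

# Hypothetical collection: image name -> feature vector.
collection = {
    "beach.jpg":  [0.20, 0.30, 0.50],
    "forest.jpg": [0.10, 0.80, 0.10],
    "ocean.jpg":  [0.12, 0.30, 0.58],
}
# Query features extracted from an example region labeled "sea".
print(retrieve([0.15, 0.30, 0.55], collection))  # ['ocean.jpg', 'beach.jpg']
```

Any distance (or learned similarity) over the feature space can replace the Euclidean one; the ranking step is what turns a single labeled example into retrieval over the whole collection.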
Techniques used for Multimedia Analysis (2)
Pattern classification currently lies at the core of most image analysis techniques, as a means of attaching meaning to visual
patterns. A typical pattern classification problem can be considered to include a series of sub-problems, the most important of which are:
•  a) determining the optimal feature space,
•  b) removing noisy data that can be misleading,
•  c) avoiding over-fitting on the training data,
•  d) using the most appropriate distribution for the model,
•  e) making good use of any prior knowledge that may help in making the correct choices,
•  f) performing meaningful segmentation when the related task requires it,
•  g) exploiting the analysis context, etc.
All of the above are crucial for initiating a learning process that aims at using the available training samples to estimate the parameters of a model representing a semantic concept.
Techniques used for Multimedia Analysis (3)
•  Many problems derive from the fact that it is very difficult to describe visual content effectively in
a form that can be handled by machines.
•  In general, feature extraction is a domain-dependent problem and it is unlikely that a good
feature extractor for a specific domain will work as well in another domain.
•  Additionally, many problems derive from the fact that images tend to include more than one
object in their content, which decreases the descriptiveness of the feature space and raises
the need for segmentation.
•  The segmentation of images into regions, and the use of a separate set of features for each
region, was introduced to address the aforementioned issue. Segmentation techniques seek to
detect groups of pixels sharing similar visual characteristics and in this way identify meaningful
objects (similar to the ones identified by the human visual system).
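The pixel-grouping idea can be illustrated with naive region growing on a toy grayscale grid (the image values and the tolerance are made up for the example; real segmentation algorithms are far more sophisticated):

```python
def segment(image, tol=10):
    """Group adjacent pixels whose intensities differ by at most `tol`
    into regions (naive region growing, 4-connectivity)."""
    h, w = len(image), len(image[0])
    labels = [[None] * w for _ in range(h)]
    regions = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] is not None:
                continue
            regions += 1                      # start a new region here
            stack = [(sy, sx)]
            labels[sy][sx] = regions
            while stack:                      # flood-fill similar neighbors
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny][nx] is None
                            and abs(image[ny][nx] - image[y][x]) <= tol):
                        labels[ny][nx] = regions
                        stack.append((ny, nx))
    return labels, regions

# Toy 4x4 grayscale "image": a bright sky area above a dark sea area.
image = [
    [200, 201, 199, 202],
    [198, 200, 201, 200],
    [ 30,  32,  31,  29],
    [ 31,  30,  33,  32],
]
labels, n = segment(image)
print(n)  # 2 regions: sky and sea
```

Each region can then be described by its own feature set, restoring the descriptiveness that a single global feature vector loses on multi-object images.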
Learning mechanism for Multimedia Analysis (2)
Unsupervised Learning
•  Unsupervised learning is a class of problems in which one seeks to determine how the data are organized. It is distinguished from supervised learning in that the learner is given only unlabeled examples. e.g. Clustering algorithms
Strongly-Supervised Learning
•  In strongly-supervised learning there is prior knowledge about the labels of the training samples, and there is a one-to-one relation between a sample and its label.
•  The aim of strongly-supervised learning is to generate a global model that maps input objects to the desired outputs and generalizes from the presented data to unseen situations in a “reasonable” way.
Semi-supervised Learning
•  Semi-supervised learning algorithms try to exploit unlabeled data, which are usually of low cost and can be obtained in high quantities, in conjunction with some supervision information. In this case, only a
small portion of the data is labeled and the algorithm aims at propagating the labels to the unlabeled data.
Weakly-Supervised Learning
•  By weakly-supervised we refer to the process of learning using weakly labeled data (i.e., samples labeled as containing the semantic concept of interest, but without indication of which segments/parts of
the sample are observations of that concept). In this case, the basic idea is to introduce a set of latent variables that encode hidden states of the world, where each state induces a joint distribution on the
space of semantic labels and image visual features. New images are annotated by maximizing the joint density of the semantic labels, given the visual features of the new image.
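The semi-supervised idea of propagating labels can be sketched with classical self-training on 1-D "features" and a nearest-class-mean classifier (the points, labels and confidence threshold are invented for illustration; real systems use proper classifiers and confidence scores):

```python
def self_train(labeled, unlabeled, rounds=3, conf=1.2):
    """Toy self-training: fit class means on the labeled points, then at
    each step adopt the unlabeled points lying within `conf` of a mean
    and refit -- propagating labels to the unlabeled data."""
    labeled = dict(labeled)          # point -> label
    pool = list(unlabeled)
    for _ in range(rounds):
        means = {lab: sum(p for p, l in labeled.items() if l == lab)
                      / sum(1 for l in labeled.values() if l == lab)
                 for lab in set(labeled.values())}
        adopted = []
        for p in pool:
            lab, dist = min(((l, abs(p - m)) for l, m in means.items()),
                            key=lambda t: t[1])
            if dist <= conf:         # only confident predictions are adopted
                labeled[p] = lab
                adopted.append(p)
        pool = [p for p in pool if p not in adopted]
    return labeled

# Two labeled seeds and four unlabeled points; labels spread outward
# from the seeds over successive rounds.
result = self_train({0.0: "sea", 10.0: "sky"}, [0.8, 1.5, 9.2, 8.6])
print(result)
```

Note how 1.5 is too far from the initial "sea" mean to be adopted in round one, but becomes adoptable after 0.8 has shifted the mean: this stepwise expansion is the essence of label propagation.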
Annotation Cost for Learning
•  Object detection schemes always employ some form of supervision as it is practically impossible to
detect and recognize an object without using any semantic information during training.
•  Semantic labels may be provided at different levels of granularity (global or region level) and
preciseness (one-to-one or many-to-many relation between objects and labels), imposing different
requirements on the effort required to generate them.
•  There is a clear distinction between the strong and accurate annotations that are usually generated
manually and constitute a laborious and time-consuming task, and the weak and noisy annotations
that are usually generated by web users for their personal interest and can be obtained in large
quantities from the Web or collaborative tagging environments like Flickr.
•  The goal is to highlight the tradeoff between the annotation cost for preparing the necessary training
samples and the quality of the resulting models.
Pros & Cons for the different types of annotation

Annotation Type                                  | Automated Annotation | Scaling Capability | Training Efficiency | Learning Mechanism
Region-level (manual)                            | Poor                 | Poor               | Excellent           | strongly-supervised
Global-level (manual)                            | Fair                 | Fair               | Good                | weakly-supervised
Global-level (automatically via Search Engines)  | Excellent            | Excellent          | Poor                | weakly-supervised
Global-level (automatically via Social Networks) | Excellent            | Excellent          | Fair                | weakly-supervised
Social Media for Training Object Detectors
Machine learning algorithms fall into two main categories in terms of the annotation granularity:
•  The algorithms that are designed to learn from strongly annotated samples (i.e., samples in which the exact location of
an object within an image is known). The goal in this case is to learn a mapping from visual features fi to semantic
labels ci, given a training set made of pairs (fi, ci). Such samples are very expensive to obtain.
•  The algorithms that learn from weakly annotated samples (i.e., samples in which it is known that an object is depicted in
the image, but its location is unknown). The goal in this case is to estimate the joint probability distribution between the
visual features fi and the semantic labels ci, given a training set made of pairs between sets {(f1, ..., fn), (c1, ..., cm)}.
Weakly annotated samples can be found in large quantities especially from sources related to social networks.
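The difference between the two kinds of training pairs can be made concrete (the feature vectors and tags below are fabricated for illustration): with strong annotation each (fi, ci) pair is given, while with weak annotation any of the m labels could explain any of the n regions.

```python
# Strongly annotated: one-to-one pairs (fi, ci) -- expensive to obtain.
strong_samples = [
    ([0.1, 0.3, 0.6], "sea"),
    ([0.6, 0.3, 0.1], "sand"),
]

# Weakly annotated: a bag of region features plus an unordered tag set
# {(f1,...,fn), (c1,...,cm)}; which region depicts which tag is unknown.
weak_sample = (
    [[0.1, 0.3, 0.6], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]],  # f1..fn
    {"sea", "sand", "beach"},                              # c1..cm
)

def candidate_assignments(features, tags):
    """Under weak supervision every region could depict any tag, so the
    space of region-to-label assignments grows as m ** n."""
    return len(tags) ** len(features)

print(candidate_assignments(*weak_sample))  # 3 tags, 3 regions -> 27
```

This combinatorial ambiguity is exactly what makes weakly annotated samples cheap to collect but hard to learn from.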
This work aims at combining the advantages of both strongly supervised (learn model parameters
more efficiently) and weakly supervised (learn from samples obtained at low cost) methods, by
allowing the strongly supervised methods to learn object detection models from training samples
that are found in collaborative tagging environments.
Problem Formulation
Drawing from a large pool of weakly annotated images, the goal is to benefit from
the knowledge that can be extracted from social tagging systems, in order to
automatically transform some of the weakly annotated images into strongly
annotated ones.
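A heavily simplified sketch of such a transformation, under assumptions of my own (the slides do not specify the framework): pool the regions of images weakly tagged with the target concept, group them by appearance, and promote the dominant group to region-level labels. The quantisation step below stands in for a proper region-clustering algorithm, and all data are invented.

```python
from collections import defaultdict

def weak_to_strong(images, concept, bin_size=0.2):
    """Pool regions from all images weakly tagged with `concept`, group
    them by quantised appearance, and promote the largest group to strong
    (region-level) annotations -- assuming the concept is the visual
    pattern the tagged images share."""
    groups = defaultdict(list)
    for tags, regions in images:
        if concept not in tags:
            continue
        for f in regions:
            key = tuple(int(v / bin_size) for v in f)  # crude "clustering"
            groups[key].append(f)
    dominant = max(groups.values(), key=len)
    return [(f, concept) for f in dominant]

# Three weakly annotated images tagged "sea"; each contains one sea-like
# region (similar histograms) plus one unrelated region.
images = [
    ({"sea", "beach"},   [[0.10, 0.30, 0.50], [0.90, 0.10, 0.10]]),
    ({"sea", "holiday"}, [[0.15, 0.35, 0.45], [0.50, 0.50, 0.50]]),
    ({"sea"},            [[0.05, 0.25, 0.55], [0.70, 0.70, 0.10]]),
]
strong = weak_to_strong(images, "sea")
print(len(strong))  # 3 region-level "sea" annotations recovered
```

The recovered (region, label) pairs can then feed any strongly supervised detector, which is the combination of advantages the previous slide describes.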
Framework Description
To-Do
References
•  Flickr
•  Google Images
•  New Directions in Web Data Management, Vakali & Jain
