Semi Supervised Learning
• Qiang Yang
– Adapted from…
• Thanks
– Zhi-Hua Zhou
– http://cs.nju.edu.cn/people/zhouzh/
– zhouzh@nju.edu.cn
– LAMDA Group, National Laboratory for Novel Software Technology, Nanjing University, China
Supervised learning is a typical machine learning setting,
where labeled examples are used as training examples
decision trees, neural networks,
support vector machines, etc.
(Diagram: training data → learning algorithm → trained model)
Name Rank Years Tenured
Mike Assistant Prof 3 no
Mary Assistant Prof 7 yes
Bill Professor 2 yes
Jim Associate Prof 7 yes
Dave Assistant Prof 6 no
Anne Associate Prof 3 no
The "Tenured" column is the label used during training.
Unseen example (label unknown): (Jeff, Professor, 7, ?) → predicted label: yes
Supervised learning
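For concreteness, here is a minimal sketch of this supervised setting on the tenure table above (not part of the original slides); the ordinal encoding of Rank is an illustrative assumption:

```python
# Fit a decision tree on the labeled rows and predict the unseen example (Jeff).
from sklearn.tree import DecisionTreeClassifier

rank = {"Assistant Prof": 0, "Associate Prof": 1, "Professor": 2}
X = [[rank["Assistant Prof"], 3],   # Mike
     [rank["Assistant Prof"], 7],   # Mary
     [rank["Professor"], 2],        # Bill
     [rank["Associate Prof"], 7],   # Jim
     [rank["Assistant Prof"], 6],   # Dave
     [rank["Associate Prof"], 3]]   # Anne
y = ["no", "yes", "yes", "yes", "no", "no"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)   # trained model
print(model.predict([[rank["Professor"], 7]]))             # Jeff -> expect ['yes']
```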
Labeled vs. Unlabeled
In many practical applications, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain because labeling the unlabeled examples requires human effort
(Example: a few pages labeled class = “war” vs. an (almost) infinite number of unlabeled web pages on the Internet)
Three main paradigms for Semi-supervised Learning:
• Transductive learning:
Unlabeled examples are exactly the test examples
• Active learning:
  • Assume that a user can continue to label data
  • The learner actively selects some unlabeled examples to query from an oracle (assumes the learner has some control over the input space)
• Multi-view Learning:
  • Unlabeled examples may be different from the test examples
  • Regularization (minimize error and maximize smoothness)
  • Multi-view Learning and Co-training
SSL: Why can unlabeled data be helpful?
Suppose the data is well-modeled by a mixture density:

$$f(x \mid \theta) = \sum_{l=1}^{L} \alpha_l \, f(x \mid \theta_l), \qquad \sum_{l=1}^{L} \alpha_l = 1, \quad \theta = \{\theta_l\}$$

The class labels are viewed as random quantities and are assumed to be chosen conditioned on the selected mixture component m_i ∈ {1, 2, …, L} and possibly on the feature value, i.e. according to the probabilities P[c_i | x_i, m_i].

Thus, the optimal classification rule for this model is the MAP rule:

$$S(x_i) = \arg\max_k \sum_{j} P[c_i = k \mid m_i = j, x_i] \; P[m_i = j \mid x_i]$$

where

$$P[m_i = j \mid x_i] = \frac{\alpha_j \, f(x_i \mid \theta_j)}{\sum_{l=1}^{L} \alpha_l \, f(x_i \mid \theta_l)}$$

Unlabeled examples can be used to help estimate this term. [D.J. Miller & H.S. Uyar, NIPS'96]
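A minimal sketch of this idea, assuming Gaussian mixture components and scikit-learn (the synthetic data and component count are hypothetical, not from the paper): the unlabeled examples sharpen the estimate of P[m = j | x], while the few labeled examples estimate P[c = k | m = j].

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(-2, 1, (10, 2)), rng.normal(2, 1, (10, 2))])
y_lab = np.array([0] * 10 + [1] * 10)
X_unlab = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])

L = 2  # number of mixture components
gmm = GaussianMixture(n_components=L, random_state=0)
gmm.fit(np.vstack([X_lab, X_unlab]))           # unlabeled data help estimate P[m=j | x]

resp_lab = gmm.predict_proba(X_lab)            # P[m=j | x_i] for the labeled points
p_c_given_m = np.zeros((L, 2))                 # estimate P[c=k | m=j] from soft counts
for k in (0, 1):
    p_c_given_m[:, k] = resp_lab[y_lab == k].sum(axis=0)
p_c_given_m /= p_c_given_m.sum(axis=1, keepdims=True)

def classify(x):
    # MAP rule: argmax_k sum_j P[c=k | m=j] * P[m=j | x]
    post_m = gmm.predict_proba(x.reshape(1, -1))[0]
    return int(np.argmax(post_m @ p_c_given_m))

print(classify(np.array([1.8, 2.1])))          # expected: class 1
```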
Transductive SVM
Transductive SVM: Taking into account a particular test
set and trying to minimize misclassifications of just those
particular examples
Figure reprinted from [T. Joachims, ICML'99]. Concretely, unlabeled examples are used to help identify the maximum-margin hyperplane.
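A rough sketch in this spirit, assuming scikit-learn (this is not Joachims' actual TSVM optimization): pseudo-label the particular test examples and retrain, gradually increasing the weight placed on them. The weighting schedule and SVM settings are illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

def simple_tsvm(X_lab, y_lab, X_test, rounds=5):
    clf = SVC(kernel="linear", C=1.0).fit(X_lab, y_lab)
    for t in range(1, rounds + 1):
        y_test = clf.predict(X_test)                 # current guesses for the test set
        w_test = 0.1 * t                             # gradually trust the pseudo-labels more
        X_all = np.vstack([X_lab, X_test])
        y_all = np.concatenate([y_lab, y_test])
        weights = np.concatenate([np.ones(len(y_lab)),
                                  np.full(len(y_test), w_test)])
        clf = SVC(kernel="linear", C=1.0)
        clf.fit(X_all, y_all, sample_weight=weights)  # refit with weighted pseudo-labels
    return clf
```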
Active learning: Getting more from queries
The labels of the training examples are obtained by
querying the oracle. Thus, for the same number of queries,
more helpful information can be obtained by actively
selecting some unlabeled examples to query
Key: To select the unlabeled examples on which the
labeling will convey the most helpful information
for the learner
 Uncertainty sampling
Train a single learner and then query the unlabeled
instances on which the learner is the least confident
[Lewis & Gale, SIGIR’94]
 Committee-based sampling
Generate a committee of multiple learners and select the
unlabeled examples on which the committee members
disagree the most [Abe & Mamitsuka, ICML’98; Seung et al.,
COLT’92]
Active Learning: Representative approaches
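A minimal sketch of the uncertainty-sampling approach above, assuming a binary task, a logistic-regression learner, and a hypothetical `oracle` callback that returns the true label of a queried pool index:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_sampling(X_lab, y_lab, X_pool, oracle, n_queries=10):
    X_lab, y_lab = list(X_lab), list(y_lab)
    pool = list(range(len(X_pool)))                  # indices of unlabeled candidates
    for _ in range(n_queries):
        clf = LogisticRegression().fit(np.array(X_lab), np.array(y_lab))
        proba = clf.predict_proba(X_pool[pool])
        margin = np.abs(proba[:, 1] - proba[:, 0])   # small margin = least confident
        pick = pool[int(np.argmin(margin))]
        X_lab.append(X_pool[pick])
        y_lab.append(oracle(pick))                   # ask the oracle for the label
        pool.remove(pick)
    return clf
```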
To retrieve images from a (usually large) image database according to user interest;
very useful in digital libraries, digital photo albums, etc.
Active Learning Application: Image retrieval
Where are my photos
taken at Guilin?
Text-based Retrieval Engine (the user queries the image database through a text interface):
− Every image is associated with a text annotation
− User poses a keyword
− The system retrieves images by matching the keyword
with annotations
Active Learning: Text-based image retrieval
Example: for the query “tiger”, keyword matching may also return images annotated “tiger lily” or “white tiger”.
In some applications there are two sufficient and redundant views, i.e. two attribute sets, each of which is sufficient for learning and conditionally independent of the other given the class label
e.g. two views for web page classification: 1) the text appearing on the
page itself, and 2) the anchor text attached to hyperlinks pointing to this
page, from other pages
Co-training
(Diagram: learner1 is trained on the X1 view and learner2 on the X2 view of the labeled training examples; each learner then labels some unlabeled examples, and the newly labeled examples are added to the other learner's training set.) [A. Blum & T. Mitchell, COLT'98]
Co-training (cont'd)
 Theoretical analysis [Blum & Mitchell, COLT’98; Dasgupta,
NIPS’01; Balcan et al., NIPS’04; etc.]
 Experimental studies [Nigam & Ghani, CIKM’00]
 New algorithms
• Co-training without two views [Goldman & Zhou, ICML’00;
Zhou & Li, TKDE’05]
• Semi-supervised regression [Zhou & Li, IJCAI’05]
 Applications
• Statistical parsing [Sarkar, NAACL'01; Steedman et al., EACL'03; R. Hwa et al., ICML'03 workshop]
• Noun phrase identification [Pierce & Cardie, EMNLP'01]
• Image retrieval [Zhou et al., ECML'04; Zhou et al., TOIS'06]
Multi-view Learning and Co-training
• Multi-view learning describes the setting of
learning from data where observations are
represented by multiple independent sets of
features.
An example of two views:
• Features can be split into two sets:
– The instance space: X = X1 × X2
– Each instance: x = (x1, x2)
Inductive vs. Transductive
• Transductive: produces labels only for the available unlabeled data.
  – The output of the method is not a classifier.
• Inductive: produces labels for the unlabeled data and also produces a classifier.
An Example of two views
• Web-page classification: e.g.,
find homepages of faculty members.
– Page text: words occurring on that page:
e.g., “research interest”, “teaching”
– Hyperlink text: words occurring in hyperlinks
that point to that page:
e.g., “my advisor”
Another Example: Classifying Jobs for FlipDog
• X1: job title
• X2: job description
Two Views
• C1: the set of target functions over X1.
• C2: the set of target functions over X2.
• C: the set of target functions over X = X1 × X2.
• Instead of learning f from C, multi-view learning aims to learn a pair of functions (f1, f2) from C1 × C2, such that f(x) = f1(x1) = f2(x2).
Co-training
• Proposed by (Blum and Mitchell 1998)
Combines multi-view learning & semi-supervised learning.
• Related work:
– (Yarowsky 1995)
– (Nigam and Ghani, 2000)
– (Goldman and Zhou, 2000)
– (Abney, 2002)
– (Sarkar, 2002)
– …
• Used in document classification, parsing, etc.
The Yarowsky Algorithm (Yarowsky 1995)
Iteration 0: train a classifier on the labeled data by supervised learning; choose the instances it labels with high confidence.
Iteration 1: add those self-labeled instances to the pool of current labeled training data and retrain.
Iteration 2, 3, …: repeat, growing the labeled pool each round.
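A bare-bones self-training sketch in the spirit of this procedure (the confidence threshold and base classifier are illustrative choices, not Yarowsky's original word-sense-disambiguation setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, conf_threshold=0.95, max_iter=10):
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    clf = LogisticRegression().fit(X_lab, y_lab)
    for _ in range(max_iter):
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= conf_threshold   # high-confidence self-labels
        if not confident.any():
            break
        new_labels = clf.classes_[proba[confident].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unlab[confident]])    # grow the labeled pool
        y_lab = np.concatenate([y_lab, new_labels])
        X_unlab = X_unlab[~confident]                     # shrink the unlabeled pool
        clf = LogisticRegression().fit(X_lab, y_lab)      # retrain on the larger pool
    return clf
```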
Co-training Assumption 1: compatibility
• The instance distribution D is compatible with the target function f = (f1, f2) if, for any x = (x1, x2) with non-zero probability, f(x) = f1(x1) = f2(x2).
 Each set of features is sufficient for classification.
• Definition: compatibility of f with D:
$$p = 1 - \Pr_{D}\big[(x_1, x_2) : f_1(x_1) \neq f_2(x_2)\big]$$
Co-training Assumption 2: conditional independence
• Definition: A pair of views (x1, x2) satisfies view independence when:
$$P(X_1 = x_1 \mid X_2 = x_2, Y = y) = P(X_1 = x_1 \mid Y = y)$$
$$P(X_2 = x_2 \mid X_1 = x_1, Y = y) = P(X_2 = x_2 \mid Y = y)$$
• A classification problem instance satisfies view independence when all pairs (x1, x2) satisfy view independence.
Co-training Algorithm
Co-Training
• Instances contain two sufficient sets of features
– i.e. an instance is x=(x1,x2)
– Each set of features is called a View
• Two views are independent given the label: P(x1 | x2, y) = P(x1 | y) and P(x2 | x1, y) = P(x2 | y)
• Two views are consistent: f1(x1) = f2(x2)
(Blum and Mitchell 1998)
Co-Training
Iteration t: C1, a classifier trained on view 1, and C2, a classifier trained on view 2, are each allowed to label some unlabeled instances; the self-labeled instances are added to the pool of training data.
Iteration t+1: both classifiers are retrained on the enlarged pool, and the process repeats.
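A compact sketch of this loop (the per-round counts and the naive Bayes base learner are illustrative assumptions, not the exact Blum & Mitchell setup): each classifier labels its most confident unlabeled examples, and the self-labeled examples are added to the shared pool.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, X1_u, X2_u, rounds=10, per_round=5):
    X1, X2, y = X1.copy(), X2.copy(), y.copy()
    unlab = np.arange(len(X1_u))                   # indices of still-unlabeled examples
    c1, c2 = GaussianNB(), GaussianNB()
    for _ in range(rounds):
        c1.fit(X1, y)                              # classifier on view 1
        c2.fit(X2, y)                              # classifier on view 2
        if len(unlab) == 0:
            break
        picked = set()
        for clf, Xu in ((c1, X1_u), (c2, X2_u)):
            proba = clf.predict_proba(Xu[unlab])
            conf_order = np.argsort(-proba.max(axis=1))
            for i in conf_order[:per_round]:       # most confident examples for this view
                idx = unlab[i]
                if idx in picked:
                    continue
                picked.add(idx)
                label = clf.classes_[proba[i].argmax()]
                X1 = np.vstack([X1, X1_u[idx:idx + 1]])
                X2 = np.vstack([X2, X2_u[idx:idx + 1]])
                y = np.concatenate([y, [label]])   # add the self-labeled example
        unlab = np.array([i for i in unlab if i not in picked])
    return c1, c2
```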
Agreement Maximization
• A side effect of the Co-Training: Agreement between
two views.
• Is it possible to pose agreement as the explicit goal?
– Yes. The resulting algorithm: Agreement Boost
(Leskes 2005)
What if the Co-training Assumption Is Not Perfectly Satisfied?
• Idea: Want classifiers that produce a maximally
consistent labeling of the data
• If learning is an optimization problem, what
function should we optimize?
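One plausible objective (a sketch of the co-regularization idea, not a formula from these slides) combines each view's error on the labeled set L with a disagreement penalty on the unlabeled set U:

$$\min_{f_1, f_2} \; \sum_{(x, y) \in L} \Big[ \ell\big(f_1(x_1), y\big) + \ell\big(f_2(x_2), y\big) \Big] \;+\; \lambda \sum_{x \in U} d\big(f_1(x_1), f_2(x_2)\big)$$

Here ℓ is a supervised loss, d measures disagreement between the two views' predictions, and λ trades off accuracy on the labeled data against agreement on the unlabeled data.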
Other Related Works
• Multi-view clustering (Bickel & Scheffer 2004)
Modified the co-training algorithm by replacing the class
variable (class label) with a mixture coefficient to obtain
a multi-view clustering algorithm.
• Manifold co-regularization (Sindhwani et al., 2005)
Extended manifold regularization to multi-view learning.
• Active multi-view learning (Muslea 2002)
Combines active learning and multi-view learning.
• More related work can be found in the Workshop on Multi-view Learning at ICML 2005:
http://www-ai.cs.uni-dortmund.de/MULTIVIEW2005/index.html
Reference
• A. Blum and T. Mitchell, 1998. “Combining Labeled and Unlabeled Data with
Co-Training,” In Proceedings of COLT 1998.
• D. Yarowsky. Unsupervised word sense disambiguation rivaling supervised
methods. In Proceedings of ACL 1995.
• Nigam, K., & Ghani, R, 2000. Analyzing the effectiveness and applicability of
co-training. In Proceedings of CIKM 2000.
• Steven Abney, 2002. Bootstrapping. In Proceedings of ACL, 2002.
• Ulf Brefeld and Tobias Scheffer. Co-EM support vector learning. In Proceedings of ICML, 2004.
• Steffen Bickel and Tobias Scheffer. Multi-view clustering. In Proceedings of ICDM, 2004.
• Sindhwani, V.; Niyogi, P.; and Belkin, M. 2005. A Co-Regularization
Approach to Semi-supervised Learning with Multiple Views. In Workshop on
Learning with Multiple Views at ICML 2005.
• Ion Muslea. Active learning with multiple views. PhD thesis, University of
Southern California, 2002.
  • 29. Reference • A. Blum and T. Mitchell, 1998. “Combining Labeled and Unlabeled Data with Co-Training,” In Proceedings of COLT 1998. • D. Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of ACL 1995. • Nigam, K., & Ghani, R, 2000. Analyzing the effectiveness and applicability of co-training. In Proceedings of CIKM 2000. • Steven Abney, 2002. Bootstrapping. In Proceedings of ACL, 2002. • Ulf Brefeld and Tobias Scheer. Co-EM support vector learning. In Proceedings ICML, 2004. • Steen Bickel and Tobias Scheer. Multi-view clustering. In Proceedings of ICDM, 2004. • Sindhwani, V.; Niyogi, P.; and Belkin, M. 2005. A Co-Regularization Approach to Semi-supervised Learning with Multiple Views. In Workshop on Learning with Multiple Views at ICML 2005. • Ion Muslea. Active learning with multiple views. PhD thesis, University of Southern California, 2002.