Machine Learning
Introduction & Nonparametric Classifiers
Eric Xing
Lecture 1, August 12, 2010
Reading:
Machine Learning
 Where does it come from:
 http://www.cs.cmu.edu/~epxing/Class/10701/
 http://www.cs.cmu.edu/~epxing/Class/10708/
Logistics
 Text book
 Chris Bishop, Pattern Recognition and Machine Learning (required)
 Tom Mitchell, Machine Learning
 David MacKay, Information Theory, Inference, and Learning Algorithms
 Daphne Koller and Nir Friedman, Probabilistic Graphical Models
 Class resource
 http://bcmi.sjtu.edu.cn/ds/
 Hosts:
 Lu Baoliang, Shanghai JiaoTong University
 Xue Xiangyang, Fudan University
 Instructors:

Local Hosts and Co-instructors
Logistics
 Class mailing list
 dragonstar_machinelearning@googlegroups.com
 Homework
 Exam
 Project
What is Learning?
Learning is about seeking a predictive and/or executable understanding of
natural/artificial subjects, phenomena, or activities from …
 Apoptosis + Medicine
 Grammatical rules
 Manufacturing procedures
 Natural laws
 …
→ Inference
Machine Learning
Fetching a stapler from inside an office --- the Stanford STAIR robot
What is Machine Learning?
Machine Learning seeks to develop theories and computer systems for
 representing;
 classifying, clustering and recognizing;
 reasoning under uncertainty;
 predicting;
 and reacting to
 …
complex, real-world data, based on the system's own experience with data,
and (hopefully) under a unified model or mathematical framework, that
 can be formally characterized and analyzed
 can take into account human prior knowledge
 can generalize and adapt across data and domains
 can operate automatically and autonomously
 and can be interpreted and perceived by humans.
Where Machine Learning is being used or can be useful?
 Speech recognition
 Information retrieval
 Computer vision
 Games
 Robotic control
 Planning
 Evolution
 Pedigree
Natural language processing and speech recognition
 Now most pocket Speech Recognizers or Translators are running on some sort of learning device --- the more you play/use them, the smarter they become!
Object Recognition
 Behind a security camera, most likely there is a computer that is learning and/or checking!
Robotic Control I
 The best helicopter pilot is now a computer!
 it runs a program that learns how to fly and make acrobatic maneuvers by itself!
 no taped instructions, joysticks, or things like …
A. Ng 2005
Text Mining
 We want:
 Reading, digesting, and categorizing a vast text database is too much for humans!
Bioinformatics
[Two slides showing a wall of raw genomic sequence; the string "whereisthegene" is embedded in the text.]
Where is the gene?
Paradigms of Machine Learning
 Supervised Learning
 Given $D = \{X_i, Y_i\}$, learn $f(\cdot)$: $Y_i = f(X_i)$, s.t. $D_{new} = \{X_j\} \rightarrow \{Y_j\}$
 Unsupervised Learning
 Given $D = \{X_i\}$, learn $f(\cdot)$: $Y_i = f(X_i)$, s.t. $D_{new} = \{X_j\} \rightarrow \{Y_j\}$
 Reinforcement Learning
 Given $D = \{\text{env}, \text{actions}, \text{rewards}, \text{simulator/trace/real game}\}$, learn $\text{policy}: e \rightarrow a$ and $\text{utility}: a, e \rightarrow r$, s.t. $\{\text{new env}, \text{real game}\} \rightarrow \{a_1, a_2, a_3, \ldots\}$
 Active Learning
 Given $D \sim G(\cdot)$, learn $\text{policy}: (G'(\cdot), D) \rightarrow \{Y_j\}$ for all $G'$, s.t. $\text{new} \sim G'(\cdot)$ and $f(\cdot)$
(A minimal sketch of these interfaces as code follows.)
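As an illustrative sketch (not from the slides), the four paradigms differ mainly in what the learner is given and what it must return; the stand-in types and function names below are hypothetical:

```python
from typing import Callable, List, Tuple

X, Y, Action = float, int, str   # illustrative stand-in types, not from the slides

def supervised(D: List[Tuple[X, Y]]) -> Callable[[X], Y]: ...       # labeled pairs -> predictor f
def unsupervised(D: List[X]) -> Callable[[X], Y]: ...               # unlabeled data -> labeling f
def reinforcement(env: object, actions: List[Action],
                  rewards: Callable[[Action], float]) -> Callable[[object], Action]: ...  # -> policy
def active(G: Callable[[], X],
           query_label: Callable[[X], Y]) -> Callable[[X], Y]: ...  # learner chooses its own queries
```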
Elements of Learning
 Here are some important elements to consider before you start:
 Task:
 Embedding? Classification? Clustering? Topic extraction? …
 Data and other info:
 Input and output (e.g., continuous, binary, counts, …)
 Supervised or unsupervised, or a blend of everything?
 Prior knowledge? Bias?
 Models and paradigms:
 BN? MRF? Regression? SVM?
 Bayesian/Frequentist? Parametric/Nonparametric?
 Objective/Loss function:
 MLE? MCLE? Max margin?
 Log loss, hinge loss, square loss? …
 Tractability and exactness trade-off:
 Exact inference? MCMC? Variational? Gradient? Greedy search?
 Online? Batch? Distributed?
 Evaluation:
 Visualization? Human interpretability? Perplexity? Predictive accuracy?
 It is better to consider one element at a time!
Theories of Learning
For the learned $F(\cdot\,;\theta)$:
 Consistency (value, pattern, …)
 Bias versus variance
 Sample complexity
 Learning rate
 Convergence
 Error bound
 Confidence
 Stability
 …
Classification
 Representing data:
 Hypothesis (classifier)
Decision-making as dividing a high-dimensional space
 Classification-specific distribution: $P(X \mid Y)$
 $p(X \mid Y=1) = p(X;\, \mu_1, \Sigma_1)$
 $p(X \mid Y=2) = p(X;\, \mu_2, \Sigma_2)$
 Class prior (i.e., "weight"): $P(Y)$
The Bayes Rule
 What we have just done leads to the following general expression:

$$P(Y \mid X) = \frac{P(X \mid Y)\, p(Y)}{P(X)}$$

This is Bayes Rule. (A small numeric example follows.)
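A minimal numeric sketch of the rule; the two Gaussian class-conditionals and the equal priors below are invented for illustration:

```python
from scipy.stats import norm  # p(X|Y=i) modeled as 1-D Gaussians

prior = {1: 0.5, 2: 0.5}                      # P(Y) -- assumed equal priors
lik = {1: norm(0.0, 1.0), 2: norm(3.0, 1.0)}  # p(X|Y=i) -- illustrative parameters

x = 1.2
evidence = sum(lik[i].pdf(x) * prior[i] for i in prior)              # P(X)
posterior = {i: lik[i].pdf(x) * prior[i] / evidence for i in prior}  # Bayes rule
print(posterior)  # class 1 has the larger posterior at x = 1.2
```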
The Bayes Decision Rule for Minimum Error
 The a posteriori probability of a sample:

$$q_i(X) \;=\; P(Y=i \mid X) \;=\; \frac{p(X \mid Y=i)\,P(Y=i)}{p(X)} \;=\; \frac{p(X \mid Y=i)\,\pi_i}{\sum_i p(X \mid Y=i)\,\pi_i}$$

 Bayes Test:
 Likelihood Ratio: $\ell(X)$
 Discriminant function: $h(X)$
Example of Decision Rules
 When each class is a normal …
 We can write the decision boundary analytically in some cases … homework!! (One such case is sketched below.)
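For reference, a sketch of one such case (equal covariances; a standard result, not worked out on the slide): with $p(X \mid Y=i) = \mathcal{N}(X; \mu_i, \Sigma)$ and priors $\pi_i$, the log posterior ratio is linear in $X$, so the decision boundary is a hyperplane:

```latex
\log\frac{P(Y=1 \mid X)}{P(Y=2 \mid X)}
  = (\mu_1-\mu_2)^{\top}\Sigma^{-1}X
  \;-\; \tfrac{1}{2}(\mu_1+\mu_2)^{\top}\Sigma^{-1}(\mu_1-\mu_2)
  \;+\; \log\frac{\pi_1}{\pi_2} \;=\; 0
```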
Bayes Error
 We must calculate the probability of error
 the probability that a sample is assigned to the wrong class
 Given a datum X, what is the risk?
 The Bayes error (the expected risk):
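The slide's formula was an image; for reference, the standard two-class form of the Bayes error is:

```latex
P(\mathrm{error}) \;=\; \int \min\!\big[\,P(Y=1)\,p(X \mid Y=1),\;\; P(Y=2)\,p(X \mid Y=2)\,\big]\, dX
```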
More on Bayes Error
 The Bayes error is the lower bound of the probability of classification error
 The Bayes classifier is the theoretically best classifier, i.e., the one that minimizes the probability of classification error
 Computing the Bayes error is in general a very complex problem. Why?
 Density estimation:
 Integrating the density function:
Learning Classifier
 The decision rule:
 Learning strategies
 Generative Learning
 Discriminative Learning
 Instance-based Learning (store all past experience in memory)
 A special case of nonparametric classifier
Supervised Learning
 K-Nearest-Neighbor Classifier:
where h(X) is represented by all the data, and by an algorithm
Recall: Vector Space Representation
 Each document is a vector, one component for each term (= word).

        Doc 1  Doc 2  Doc 3  ...
Word 1    3      0      0    ...
Word 2    0      8      1    ...
Word 3   12      1     10    ...
...       0      1      3    ...
...       0      0      0    ...

 Normalize to unit length. (A code sketch follows below.)
 High-dimensional vector space:
 Terms are axes, 10,000+ dimensions, or even 100,000+
 Docs are vectors in this space
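A minimal sketch of this representation; the counts mirror the toy table above:

```python
import numpy as np

# term-count vectors for Doc 1..3 over a 3-word toy vocabulary (rows = docs)
docs = np.array([[3., 0., 12.],
                 [0., 8., 1.],
                 [0., 1., 10.]])
# normalize each document vector to unit length
unit_docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
print(np.linalg.norm(unit_docs, axis=1))  # -> [1. 1. 1.]
```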
Test Document = ?
Sports  Science  Arts

1-Nearest Neighbor (kNN) classifier
Sports  Science  Arts

2-Nearest Neighbor (kNN) classifier
Sports  Science  Arts

3-Nearest Neighbor (kNN) classifier
Sports  Science  Arts

K-Nearest Neighbor (kNN) classifier
Voting kNN
Sports  Science  Arts

Classes in a Vector Space
Sports  Science  Arts
kNN Is Close to Optimal
 Cover and Hart 1967
 Asymptotically, the error rate of 1-nearest-neighbor classification is less than twice the Bayes rate [the error rate of a classifier knowing the model that generated the data]
 In particular, the asymptotic error rate is 0 if the Bayes rate is 0. (The precise bound is given below.)
 Where does kNN come from?
 Nonparametric density estimation
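For reference, the precise Cover & Hart bound (a standard result, stated here rather than on the slide): with $c$ classes and Bayes risk $R^*$, the asymptotic 1-NN risk $R$ satisfies

```latex
R^* \;\le\; R \;\le\; R^*\left(2 \;-\; \frac{c}{c-1}\,R^*\right)
```

so $R \le 2R^*$, and $R = 0$ whenever $R^* = 0$.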
Nearest-Neighbor Learning Algorithm
 Learning is just storing the representations of the training examples in D.
 Testing instance x:
 Compute similarity between x and all examples in D.
 Assign x the category of the most similar example in D.
 Does not explicitly compute a generalization or category prototypes.
 Also called:
 Case-based learning
 Memory-based learning
 Lazy learning
(A minimal code sketch follows.)
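A minimal runnable sketch of this algorithm (voting kNN with a Euclidean metric; the toy points and class names are invented for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest stored examples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance from x to every stored example
    nearest = np.argsort(dists)[:k]               # indices of the k closest training points
    votes = Counter(y_train[i] for i in nearest)  # vote over their labels
    return votes.most_common(1)[0][0]

# "learning" is just storing D; all the work happens at query time
X_train = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y_train = np.array(["Arts", "Arts", "Arts", "Sports", "Sports", "Sports"])
print(knn_predict(X_train, y_train, np.array([4.5, 5.2]), k=3))  # -> "Sports"
```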
kNN is an instance of Instance-Based Learning
 What makes an Instance-Based Learner?
 A distance metric
 How many nearby neighbors to look at?
 A weighting function (optional)
 How to relate to the local points?
Euclidean Distance Metric

$$D(x, x') = \sqrt{\sum_i \sigma_i^2\,(x_i - x'_i)^2}$$

 Or equivalently,

$$D(x, x') = \sqrt{(x - x')^{\top}\,\Sigma\,(x - x')}$$

 Other metrics:
 L1 norm: $|x - x'|$
 L∞ norm: $\max |x - x'|$ (elementwise …)
 Mahalanobis: where $\Sigma$ is full, and symmetric (a code sketch follows)
 Correlation
 Angle
 Hamming distance, Manhattan distance
 …
Case Study: kNN for Web Classification
 Dataset
 20 News Groups (20 classes)
 Download: http://people.csail.mit.edu/jrennie/20Newsgroups/
 61,118 words, 18,774 documents
 Class labels descriptions
Results: Binary Classes
[Plots of accuracy vs. k for binary tasks among alt.atheism, comp.graphics, rec.autos, rec.motorcycles, comp.windows.x, and rec.sport.baseball]
Results: Multiple Classes
Randomly select 5 out of 20 classes, repeat 10 runs and average
[Plots of accuracy vs. k for the 5-class subsets and for all 20 classes]
Is kNN ideal?

Is kNN ideal? … more later
Effect of Parameters
 Sample size
 The more the better
 Need an efficient search algorithm for NN
 Dimensionality
 Curse of dimensionality
 Density
 How smooth?
 Metric
 The relative scalings in the distance metric affect region shapes.
 Weight
 Spurious or less relevant points need to be downweighted
 K
Summary
 Machine Learning is Cool and Useful!!
 Paradigms of Machine Learning
 Design elements of learning
 Theories on learning
 Fundamental theory of classification
 Bayes optimal classifier
 Instance-based learning: kNN – a nonparametric classifier
 A nonparametric method does not rely on any assumption concerning the structure of the underlying density function.
 Very little “learning” is involved in these methods
 Good news:
 Simple and powerful methods; flexible and easy to apply to many problems.
 The kNN classifier asymptotically approaches the Bayes classifier, which is theoretically the best classifier, minimizing the probability of classification error.
 Bad news:
 High memory requirements
 Very dependent on the scale factor for a specific problem.
Learning Based Approaches for Visual Recognition
L. Fei‐Fei
Computer Science Dept.
Stanford University
As legend goes…

CVPR: 1985 ‐ 2010
What is vision?
Real world → pixel world: “forming” pictures
Pixel world → real world: “understanding” pictures

What is vision?
“understanding” pictures
• edges
• intensity
• texture
• …
Low‐Level Vision

What is vision?
“understanding” pictures
• groupings of similar pixels
• geometry
• …
Low‐Level Vision → Mid‐Level Vision

What is vision?
“understanding” pictures
“This is a story of love and friendship among three young hamsters. On a sunny day in the garden…”
Low‐Level Vision → Mid‐Level Vision → High‐Level Vision
Humans are extremely good at high‐level semantic understanding
[Rapid image presentation over time]
Fei‐Fei et al. JoV 2007
PT = 27ms: “This was a picture with some dark splotches in it. Yeah. . . that's about it.” (Subject: KM)

PT = 40ms: “I think I saw two people on a field.” (Subject: RW)

PT = 67ms: “Outdoor scene. There were some kind of animals, maybe dogs or horses, in the middle of the picture. It looked like they were running in the middle of a grassy field.” (Subject: IV)

PT = 107ms: “two people, whose profile was toward me. looked like they were on a field of some sort and engaged in some sort of sport (their attire suggested soccer, but it looked like there was too much contact for that).” (Subject: AI)

PT = 500ms: “Some kind of game or fight. Two groups of two men? The foreground pair looked like one was getting a fist in the face. Outdoors seemed like because I have an impression of grass and maybe lines on the grass? That would be why I think perhaps a game, rough game though, more like rugby than football because the pairs weren't in pads and helmets, though I did get the impression of similar clothing. maybe some trees? in the background.” (Subject: SM)

Fei‐Fei et al. JoV 2007
Visual recognition
[Diagram: recognition tasks arranged by level of image representation and by processing time in the human visual system (90 msec → 150 msec → 1 sec), across Object / Scene / Event‐activity. Low‐level: texture, shape, features and descriptors, segmentation, tracking. Mid‐level: parts and attributes, basic-level classification, action classification, 3D geometry, geometric layout. High‐level: recognition, scene understanding, activity understanding, roles and functions, causality, functionality (human-object interaction), situation, goals and intentions, social roles.]

Visual recognition: 1990 ‐ early 2000
[Same diagram; highlights the tasks emphasized in that period.]

Visual recognition: early 2000 ‐ now
[Same diagram; highlights the tasks emphasized in that period.]
Why machine learning?
• 80,000,000,000+ images
• 5,000,000,000+ images
• 120,000,000 videos (upload: 13 hours/min)
• 20Gb of images
• My mom’s hard‐drive: 220+Gb of images

Why machine learning?
• Today: lots of data, complex tasks
Internet images, personal photo albums
Movies, news, sports
Surveillance and security
Medical and scientific images
Machine learning in computer vision
• Aug 12, Lecture 1: Nearest Neighbor
– Large‐scale image classification
– Scene completion
• Aug 12, Lecture 3: Neural Network
– Convolutional Nets for object recognition
– Unsupervised feature learning via Deep Belief Net
• Aug 13, Lecture 7: Dimensionality reduction, Manifold learning
– Eigen‐ and Fisher‐ faces
– Applications to object representation
• Aug 15, Lecture 13: Conditional Random Field
– Image segmentation, object recognition & image annotation
• Aug 15 & 16, Lecture 14 + 17: Topic models
– Object recognition
– Scene classification, image annotation, large‐scale image clustering
– Total scene understanding
Nearest Neighbor approaches: two case studies
L. Fei‐Fei
Computer Science Dept.
Stanford University

Machine learning in computer vision
• Aug 12, Lecture 1: Nearest Neighbor
– Large‐scale image classification
– Scene completion
http://www.image-net.org
Large‐scale image classification and retrieval is an unaddressed problem in vision
[Plot: # of clean images per category (log_10) vs. # of visual concept categories (log_10), comparing MSRC, Caltech101/256, PASCAL(1), LabelMe(3), Tiny Images(2), and humans]
1. Excluding the Caltech101 datasets from PASCAL
2. Images in this dataset are not human annotated. The # of clean images per category is a rough estimation
3. Only categories with more than 100 images are considered
kNN for image classification: basic set‐up
[Labeled training images: Antelope, Kangaroo, Jellyfish, German Shepherd, Trombone; plus a query image “?”]

kNN for image classification: basic set‐up
[5‐NN classification: count the class votes among the query’s 5 nearest neighbors (bar chart over Kangaroo, Antelope, Jellyfish, German Shepherd, Trombone) and assign the winner: Kangaroo]
10K classes, 4.5M queries, 4.5M training images
[Background image courtesy: Antonio Torralba]
kNN on 10K classes
• 10K classes
• 4.5M queries
• 4.5M training
• Features
– BOW
– GIST
Deng, Berg, Li & Fei‐Fei, ECCV 2010
How fast is kNN?
• Brute force linear scan
– E.g. scanning 4.5M images!
• Can we be faster?
– Yes, if feature dimensionality is low (e.g. <= 16)
– K‐D tree. (A usage sketch follows.)
http://graphics.stanford.edu/courses/cs368‐00‐spring/TA/manuals/CGAL/ref‐manual2/SearchStructures/kdtree.gif
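A usage sketch with SciPy's k-d tree (the sizes are illustrative; exact NN search, effective here because the dimensionality is low):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.standard_normal((1_000_000, 8))   # 1M points in a low-dimensional feature space
tree = cKDTree(X)                         # build the tree once
dists, idx = tree.query(rng.standard_normal(8), k=5)  # exact 5-NN query, no linear scan
print(idx, dists)
```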
Curse of dimensionality
• For high dimensionality, in both theory and practice there is little improvement over brute‐force linear scan.
• E.g. a KD‐tree on 128‐dimensional SIFT features is not much faster than linear scan.
Locality sensitive hashing
• Approximate kNN
– Good enough in practice
– Can get around the curse of dimensionality
• Locality sensitive hashing
– Near feature points → (likely) same hash values
[Hash table]
Example: Random projection
• h(x) = sgn(x ∙ r), r is a random unit vector
• h(x) gives 1 bit. Repeat and concatenate.
• Prob[h(x) = h(y)] = 1 − θ(x, y) / π
[Diagram: a random hyperplane with normal r; when x and y fall on opposite sides, h(x) = 0, h(y) = 1; when they fall on the same side, h(x) = 0, h(y) = 0. Each point's concatenated bits index a hash-table bucket, e.g. 000 … 101.]
(A code sketch follows.)
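A minimal sketch of random-projection LSH as described above (the sizes and the candidate-ranking step are illustrative assumptions, not the exact system of Deng et al.):

```python
import numpy as np

rng = np.random.default_rng(0)

def signature(X, R):
    """One sign bit per random hyperplane: h(x) = sgn(x . r), repeated and concatenated."""
    return (X @ R > 0).astype(np.uint8)

d, n_bits = 128, 16
R = rng.standard_normal((d, n_bits))      # n_bits random hyperplanes
X = rng.standard_normal((50_000, d))      # database of feature vectors

buckets = {}                              # hash table: signature bytes -> point ids
for i, sig in enumerate(signature(X, R)):
    buckets.setdefault(sig.tobytes(), []).append(i)

q = X[0] + 0.05 * rng.standard_normal(d)  # a query near a stored point
cand = buckets.get(signature(q[None, :], R)[0].tobytes(), [])
# rank only the candidate bucket (a tiny fraction of the database) by exact distance
best = min(cand, key=lambda i: np.linalg.norm(X[i] - q), default=None)
print(len(cand), best)                    # best is very likely point 0
```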
Locality sensitive hashing
[A query “?” hashes to a bucket in the hash table; the retrieved NNs are the points stored in that bucket]
Locality sensitive hashing
• 1000X speed‐up with 50% recall of top 10‐NN
• 1.2M images + 1000 dimensions
[Plot: recall of L1Prod at top-10 exact NN retrieved vs. percentage of points scanned (scan cost), for “L1Prod LSH + L1Prod ranking” and “RandHP LSH + L1Prod ranking”]
10K classes, 4.5M queries, 4.5M training images
Machine learning in computer vision
• Aug 12, Lecture 1: Nearest Neighbor
– Large‐scale image classification
– Scene completion
(slides courtesy: Alyosha Efros (CMU))
[Hays and Efros. Scene Completion Using Millions of Photographs. SIGGRAPH 2007 and CACM October 2008.]
Diffusion Result

Efros and Leung result

Scene Matching for Image Completion

Scene Completion Result
Scene Matching

… 200 total

Context Matching
“The Internet is the world’s largest library.”

Nearest neighbors from a collection of 20 thousand images

Nearest neighbors from a collection of 2 million images

[Scene-completion result slides]
Hays and Efros, SIGGRAPH 2007

More Related Content

PDF
Mobile Learning Design - not just for ILIAS
PPT
P1151439345
PDF
Pixel Matching from Stereo Images (Callan seminar)
PDF
01_introduction.pdfbnmelllleitrthnjjjkkk
PPTX
introduction to machine learning education.pptx
PDF
Lecture 1: What is Machine Learning?
PPTX
Introduction to Machine Learning and AI.pptx
PPTX
introduction to machine learning and ai.pptx
Mobile Learning Design - not just for ILIAS
P1151439345
Pixel Matching from Stereo Images (Callan seminar)
01_introduction.pdfbnmelllleitrthnjjjkkk
introduction to machine learning education.pptx
Lecture 1: What is Machine Learning?
Introduction to Machine Learning and AI.pptx
introduction to machine learning and ai.pptx

Similar to Lecture1 xing fei-fei (20)

PPTX
ppt on introduction to Machine learning tools
PDF
01_introduction to machine learning algorithms and basics .pdf
PDF
ML All Chapter PDF.pdf
PDF
01_introduction_ML.pdf
PPTX
INTRO TO ML.pptx
PPTX
Machine Learning GDSC DCE Darbhanga.pptx
PDF
Seminar(Pattern Recognition)
PPTX
Machine learning
PDF
MachineLearningTomMitchell.pdf
PPT
Demystifying AI AND ml and its applications
PPT
chapter1-introduction1.ppt
PPTX
Machine learning
PDF
L1_Introduction - part 1.pdf
PPTX
Machine learning
PDF
Unit 1_Introduction to ML_Types_Applications.pdf
PPT
2.17Mb ppt
PPTX
L 8 introduction to machine learning final kirti.pptx
PPTX
machine Learning subject of third year information technology unit 1.pptx
PDF
IRJET - Review on Machine Learning
PPTX
Machine learning
ppt on introduction to Machine learning tools
01_introduction to machine learning algorithms and basics .pdf
ML All Chapter PDF.pdf
01_introduction_ML.pdf
INTRO TO ML.pptx
Machine Learning GDSC DCE Darbhanga.pptx
Seminar(Pattern Recognition)
Machine learning
MachineLearningTomMitchell.pdf
Demystifying AI AND ml and its applications
chapter1-introduction1.ppt
Machine learning
L1_Introduction - part 1.pdf
Machine learning
Unit 1_Introduction to ML_Types_Applications.pdf
2.17Mb ppt
L 8 introduction to machine learning final kirti.pptx
machine Learning subject of third year information technology unit 1.pptx
IRJET - Review on Machine Learning
Machine learning

More from Tianlu Wang (20)

PDF
L7 er2
PDF
L8 design1
PDF
L9 design2
PDF
14 pro resolution
PDF
13 propositional calculus
PDF
12 adversal search
PDF
11 alternative search
PDF
10 2 sum
PDF
22 planning
PDF
21 situation calculus
PDF
20 bayes learning
PDF
19 uncertain evidence
PDF
18 common knowledge
PDF
17 2 expert systems
PDF
17 1 knowledge-based system
PDF
16 2 predicate resolution
PDF
16 1 predicate resolution
PDF
15 predicate
PDF
09 heuristic search
PDF
08 uninformed search
L7 er2
L8 design1
L9 design2
14 pro resolution
13 propositional calculus
12 adversal search
11 alternative search
10 2 sum
22 planning
21 situation calculus
20 bayes learning
19 uncertain evidence
18 common knowledge
17 2 expert systems
17 1 knowledge-based system
16 2 predicate resolution
16 1 predicate resolution
15 predicate
09 heuristic search
08 uninformed search

Recently uploaded (20)

PDF
higher edu open stores 12.5.24 (1).pdf forreal
PDF
Renesas R-Car_Cockpit_overview210214-Gen4.pdf
PDF
Honda Dealership SNS Evaluation pdf/ppts
PPT
ACCOMPLISHMENT REPOERTS AND FILE OF GRADE 12 2021.ppt
PPT
Mettal aloys and it's application and theri composition
PPT
Your score increases as you pick a category, fill out a long description and ...
PDF
3-REasdfghjkl;[poiunvnvncncn-Process.pdf
PDF
Volvo EC290C NL EC290CNL Excavator Service Repair Manual Instant Download.pdf
PPTX
laws of thermodynamics with diagrams details
PDF
Physics class 12thstep down transformer project.pdf
PDF
EC290C NL EC290CNL Volvo excavator specs.pdf
PPTX
capstoneoooooooooooooooooooooooooooooooooo
PPTX
Paediatric History & Clinical Examination.pptx
PDF
Todays Technician Automotive Heating & Air Conditioning Classroom Manual and ...
PDF
Caterpillar CAT 311B EXCAVATOR (8GR00001-UP) Operation and Maintenance Manual...
PPTX
IMMUNITY TYPES PPT.pptx very good , sufficient
PDF
Caterpillar Cat 315C Excavator (Prefix ANF) Service Repair Manual Instant Dow...
DOCX
lp of food hygiene.docxvvvvvvvvvvvvvvvvvvvvvvv
PDF
MANDIBLE (1).pdffawffffffffffffffffffffffffffffffffffffffffff
PDF
Volvo EC290C NL EC290CNL engine Manual.pdf
higher edu open stores 12.5.24 (1).pdf forreal
Renesas R-Car_Cockpit_overview210214-Gen4.pdf
Honda Dealership SNS Evaluation pdf/ppts
ACCOMPLISHMENT REPOERTS AND FILE OF GRADE 12 2021.ppt
Mettal aloys and it's application and theri composition
Your score increases as you pick a category, fill out a long description and ...
3-REasdfghjkl;[poiunvnvncncn-Process.pdf
Volvo EC290C NL EC290CNL Excavator Service Repair Manual Instant Download.pdf
laws of thermodynamics with diagrams details
Physics class 12thstep down transformer project.pdf
EC290C NL EC290CNL Volvo excavator specs.pdf
capstoneoooooooooooooooooooooooooooooooooo
Paediatric History & Clinical Examination.pptx
Todays Technician Automotive Heating & Air Conditioning Classroom Manual and ...
Caterpillar CAT 311B EXCAVATOR (8GR00001-UP) Operation and Maintenance Manual...
IMMUNITY TYPES PPT.pptx very good , sufficient
Caterpillar Cat 315C Excavator (Prefix ANF) Service Repair Manual Instant Dow...
lp of food hygiene.docxvvvvvvvvvvvvvvvvvvvvvvv
MANDIBLE (1).pdffawffffffffffffffffffffffffffffffffffffffffff
Volvo EC290C NL EC290CNL engine Manual.pdf

Lecture1 xing fei-fei

  • 1. Machine LearningMachine Learninggg Introduction &Introduction & Nonparametric ClassifiersNonparametric ClassifiersNonparametric ClassifiersNonparametric Classifiers Eric XingEric Xing Lecture 1, August 12, 2010 © Eric Xing @ CMU, 2006-2010 Reading:
  • 2. Machine Learning  Where does it come from: Machine Learning  http://www.cs.cmu.edu/~epxing/Class/10701/  http://www.cs.cmu.edu/~epxing/Class/10708/ © Eric Xing @ CMU, 2006-2010
  • 3. LogisticsLogistics  Text book  Chris Bishop, Pattern Recognition and Machine Learning (required)  Tom Mitchell, Machine Learning  David Mackay, Information Theory, Inference, and Learning Algorithms  Daphnie Koller and Nir Friedman, Probabilistic Graphical Models  Class resource  http://bcmi sjtu edu cn/ds/ http://bcmi.sjtu.edu.cn/ds/  Host:  Lu Baoliang, Shanghai JiaoTong University  Xue Xiangyang, Fudan University  Instructors: © Eric Xing @ CMU, 2006-2010 Instructors:
  • 4. Local Hosts and co instructorsco-instructors © Eric Xing @ CMU, 2006-2010
  • 5. LogisticsLogistics  Class mailing listg  dragonstar_machinelearning@googlegroups.com  Home work Home work  Exam  Project © Eric Xing @ CMU, 2006-2010
  • 6. What is LearningWhat is Learning Learning is about seeking a predictive and/or executable understanding of natural/artificial subjects phenomena or activities from Apoptosis + Medicine natural/artificial subjects, phenomena, or activities from … Grammatical rules Manufacturing procedures Inferenceg p Natural laws … Inference © Eric Xing @ CMU, 2006-2010
  • 7. Machine LearningMachine Learning © Eric Xing @ CMU, 2006-2010
  • 8. Fetching a stapler from inside an office the Stanford STAIR robotoffice --- the Stanford STAIR robot © Eric Xing @ CMU, 2006-2010
  • 9. What is Machine Learning?What is Machine Learning? Machine Learning seeks to develop theories and computer systems for  representing;  classifying, clustering and recognizing; i d t i t reasoning under uncertainty;  predicting;  and reacting to  … complex, real world data, based on the system's own experience with data, and (hopefully) under a unified model or mathematical framework, that  can be formally characterized and analyzed  can take into account human prior knowledge  can generalize and adapt across data and domains © Eric Xing @ CMU, 2006-2010  can operate automatically and autonomously  and can be interpreted and perceived by human.
  • 10. Where Machine Learning is being used or can be useful?used or can be useful? Speech recognitionSpeech recognition Information retrievalInformation retrieval Computer visionComputer vision GamesGames Robotic controlRobotic control GamesGames © Eric Xing @ CMU, 2006-2010 PlanningPlanning EvolutionEvolution PedigreePedigree
  • 11. Natural language processing and speech recognitionspeech recognition  Now most pocket Speech Recognizers or Translators are running on some sort of learning device --- the more you play/use them, the smarter they become! © Eric Xing @ CMU, 2006-2010
  • 12. Object RecognitionObject Recognition  Behind a security camera, most likely there is a computer that is learning and/or checking! © Eric Xing @ CMU, 2006-2010
  • 13. Robotic Control IRobotic Control I  The best helicopter pilot is now a computer!p p p  it runs a program that learns how to fly and make acrobatic maneuvers by itself!  no taped instructions, joysticks, or things like … © Eric Xing @ CMU, 2006-2010 A. Ng 2005
  • 14. Text MiningText Mining  We want:  Reading, digesting, and categorizing a vast text database is too much fordatabase is too much for human! © Eric Xing @ CMU, 2006-2010
  • 15. Bioinformatics g g g g ggg g ggg g g g gg g g g g g g gg g g gg g gg g gg g cacatcgctgcgtttcggcagctaattgccttttagaaattattttcccatttcgagaaactcgtgtgggatgccggatgcggctttcaatcacttctggcccgggatcggattgggtcacattgtctgcgggctctattgtctcgatccgc ggcgcagttcgcgtgcttagcggtcagaaaggcagagattcggttcggattgatgcgctggcagcagggcacaaagatctaatgactggcaaatcgctacaaataaattaaagtccggcggctaattaatgagcggactgaagccactttgg attaaccaaaaaacagcagataaacaaaaacggcaaagaaaattgccacagagttgtcacgctttgttgcacaaacatttgtgcagaaaagtgaaaagcttttagccattattaagtttttcctcagctcgctggcagcacttgcgaatgta ctgatgttcctcataaatgaaaattaatgtttgctctacgctccaccgaactcgcttgtttgggggattggctggctaatcgcggctagatcccaggcggtataaccttttcgcttcatcagttgtgaaaccagatggctggtgttttggca cagcggactcccctcgaacgctctcgaaatcaagtggctttccagccggcccgctgggccgctcgcccactggaccggtattcccaggccaggccacactgtaccgcaccgcataatcctcgccagactcggcgctgataaggcccaatgtc actccgcaggcgtctatttatgccaaggaccgttcttcttcagctttcggctcgagtatttgttgtgccatgttggttacgatgccaatcgcggtacagttatgcaaatgagcagcgaataccgctcactgacaatgaacggcgtcttgtca tattcatgctgacattcatattcattcctttggttttttgtcttcgacggactgaaaagtgcggagagaaacccaaaaacagaagcgcgcaaagcgccgttaatatgcgaactcagcgaactcattgaagttatcacaacaccatatccata catatccatatcaatatcaatatcgctattattaacgatcatgctctgctgatcaagtattcagcgctgcgctagattcgacagattgaatcgagctcaatagactcaacagactccactcgacagatgcgcaatgccaaggacaattgccg Bioinformaticsg g g g g g g g g g g g g g g g g g g g g g g gg g g tggagtaaacgaggcgtatgcgcaacctgcacctggcggacgcggcgtatgcgcaatgtgcaattcgcttaccttctcgttgcgggtcaggaactcccagatgggaatggccgatgacgagctgatctgaatgtggaaggcgcccagcaggc aagattactttcgccgcagtcgtcatggtgtcgttgctgcttttatgttgcgtactccgcactacacggagagttcaggggattcgtgctccgtgatctgtgatccgtgttccgtgggtcaattgcacggttcggttgtgtaaccttcgtgt tctttttttttagggcccaataaaagcgcttttgtggcggcttgatagattatcacttggtttcggtggctagccaagtggctttcttctgtccgacgcacttaattgaattaaccaaacaacgagcgtggccaattcgtattatcgctgtt tacgtgtgtctcagcttgaaacgcaaaagcttgtttcacacatcggtttctcggcaagatgggggagtcagtcggtctagggagaggggcgcccaccagtcgatcacgaaaacggcgaattccaagcgaaacggaaacggagcgagcactat agtactatgtcgaacaaccgatcgcggcgatgtcagtgagtcgtcttcggacagcgctggcgctccacacgtatttaagctctgagatcggctttgggagagcgcagagagcgccatcgcacggcagagcgaaagcggcagtgagcgaaagc gagcggcagcgggtgggggatcgggagccccccgaaaaaaacagaggcgcacgtcgatgccatcggggaattggaacctcaatgtgtgggaatgtttaaatattctgtgttaggtagtgtagtttcatagactatagattctcatacagatt gagtccttcgagccgattatacacgacagcaaaatatttcagtcgcgcttgggcaaaaggcttaagcacgactcccagtccccccttacatttgtcttcctaagcccctggagccactatcaaacttgttctacgcttgcactgaaaataga accaaagtaaacaatcaaaaagaccaaaaacaataacaaccagcaccgagtcgaacatcagtgaggcattgcaaaaatttcaaagtcaagtttgcgtcgtcatcgcgtctgagtccgatcaagccgggcttgtaattgaagttgttgatgag ttactggattgtggcgaattctggtcagcatacttaacagcagcccgctaattaagcaaaataaacatatcaaattccagaatgcgacggcgccatcatcctgtttgggaattcaattcgcgggcagatcgtttaattcaattaaaaggtag aaaagggagcagaagaatgcgatcgctggaatttcctaacatcacggaccccataaatttgataagcccgagctcgctgcgttgagtcagccaccccacatccccaaatccccgccaaaagaagacagctgggttgttgactcgccagattg attgcagtggagtggacctggtcaaagaagcaccgttaatgtgctgattccattcgattccatccgggaatgcgataaagaaaggctctgatccaagcaactgcaatccggatttcgattttctctttccatttggttttgtatttacgtac aagcattctaatgaagacttggagaagacttacgttatattcagaccatcgtgcgatagaggatgagtcatttccatatggccgaaatttattatgtttactatcgtttttagaggtgttttttggacttaccaaaagaggcatttgttttc ttcaactgaaaagatatttaaattttttcttggaccattttcaaggttccggatatatttgaaacacactagctagcagtgttggtaagttacatgtatttctataatgtcatattcctttgtccgtattcaaatcgaatactccacatctc ttgtacttgaggaattggcgatcgtagcgatttcccccgccgtaaagttcctgatcctcgttgtttttgtacatcataaagtccggattctgctcgtcgccgaagatgggaacgaagctgccaaagctgagagtctgcttgaggtgctggtc 
gtcccagctggataaccttgctgtacagatcggcatctgcctggagggcacgatcgaaatccttccagtggacgaacttcacctgctcgctgggaatagcgttgttgtcaagcagctcaaggagcgtattcgagttgacgggctgcaccacg ctgctccttcgctggggattcccctgcgggtaagcgccgcttgcttggactcgtttccaaatcccatagccacgccagcagaggagtaacagagctcwhereisthegenetgattaaaaatatcctttaagaaagcccatgggtataactt actgcgtcctatgcgaggaatggtctttaggttctttatggcaaagttctcgcctcgcttgcccagccgcggtacgttcttggtgatctttaggaagaatcctggactactgtcgtctgcctggcttatggccacaagacccaccaagagcg aggactgttatgattctcatgctgatgcgactgaagcttcacctgactcctgctccacaattggtggcctttatatagcgagatccacccgcatcttgcgtggaatagaaatgcgggtgactccaggaattagcattatcgatcggaaagtg ataaaactgaactaacctgacctaaatgcctggccataattaagtgcatacatacacattacattacttacatttgtataagaactaaattttatagtacataccacttgcgtatgtaaatgcttgtcttttctcttatatacgttttataa cccagcatattttacgtaaaaacaaaacggtaatgcgaacataacttatttattggggcccggaccgcaaaccggccaaacgcgtttgcacccataaaaacataagggcaacaaaaaaattgttaagctgttgtttatttttgcaatcgaaa cgctcaaatagctgcgatcactcgggagcagggtaaagtcgcctcgaaacaggaagctgaagcatcttctataaatacactcaaagcgatcattccgaggcgagtctggttagaaatttacatggactgcaaaaaggtatagccccacaaac cacatcgctgcgtttcggcagctaattgccttttagaaattattttcccatttcgagaaactcgtgtgggatgccggatgcggctttcaatcacttctggcccgggatcggattgggtcacattgtctgcgggctctattgtctcgatccgc ggcgcagttcgcgtgcttagcggtcagaaaggcagagattcggttcggattgatgcgctggcagcagggcacaaagatctaatgactggcaaatcgctacaaataaattaaagtccggcggctaattaatgagcggactgaagccactttgg attaaccaaaaaacagcagataaacaaaaacggcaaagaaaattgccacagagttgtcacgctttgttgcacaaacatttgtgcagaaaagtgaaaagcttttagccattattaagtttttcctcagctcgctggcagcacttgcgaatgta ctgatgttcctcataaatgaaaattaatgtttgctctacgctccaccgaactcgcttgtttgggggattggctggctaatcgcggctagatcccaggcggtataaccttttcgcttcatcagttgtgaaaccagatggctggtgttttggca cagcggactcccctcgaacgctctcgaaatcaagtggctttccagccggcccgctgggccgctcgcccactggaccggtattcccaggccaggccacactgtaccgcaccgcataatcctcgccagactcggcgctgataaggcccaatgtc actccgcaggcgtctatttatgccaaggaccgttcttcttcagctttcggctcgagtatttgttgtgccatgttggttacgatgccaatcgcggtacagttatgcaaatgagcagcgaataccgctcactgacaatgaacggcgtcttgtca tattcatgctgacattcatattcattcctttggttttttgtcttcgacggactgaaaagtgcggagagaaacccaaaaacagaagcgcgcaaagcgccgttaatatgcgaactcagcgaactcattgaagttatcacaacaccatatccata catatccatatcaatatcaatatcgctattattaacgatcatgctctgctgatcaagtattcagcgctgcgctagattcgacagattgaatcgagctcaatagactcaacagactccactcgacagatgcgcaatgccaaggacaattgccg tggagtaaacgaggcgtatgcgcaacctgcacctggcggacgcggcgtatgcgcaatgtgcaattcgcttaccttctcgttgcgggtcaggaactcccagatgggaatggccgatgacgagctgatctgaatgtggaaggcgcccagcaggc aagattactttcgccgcagtcgtcatggtgtcgttgctgcttttatgttgcgtactccgcactacacggagagttcaggggattcgtgctccgtgatctgtgatccgtgttccgtgggtcaattgcacggttcggttgtgtaaccttcgtgt tctttttttttagggcccaataaaagcgcttttgtggcggcttgatagattatcacttggtttcggtggctagccaagtggctttcttctgtccgacgcacttaattgaattaaccaaacaacgagcgtggccaattcgtattatcgctgtt Wh i h ?Wh i h ?tacgtgtgtctcagcttgaaacgcaaaagcttgtttcacacatcggtttctcggcaagatgggggagtcagtcggtctagggagaggggcgcccaccagtcgatcacgaaaacggcgaattccaagcgaaacggaaacggagcgagcactat agtactatgtcgaacaaccgatcgcggcgatgtcagtgagtcgtcttcggacagcgctggcgctccacacgtatttaagctctgagatcggctttgggagagcgcagagagcgccatcgcacggcagagcgaaagcggcagtgagcgaaagc gagcggcagcgggtgggggatcgggagccccccgaaaaaaacagaggcgcacgtcgatgccatcggggaattggaacctcaatgtgtgggaatgtttaaatattctgtgttaggtagtgtagtttcatagactatagattctcatacagatt gagtccttcgagccgattatacacgacagcaaaatatttcagtcgcgcttgggcaaaaggcttaagcacgactcccagtccccccttacatttgtcttcctaagcccctggagccactatcaaacttgttctacgcttgcactgaaaataga accaaagtaaacaatcaaaaagaccaaaaacaataacaaccagcaccgagtcgaacatcagtgaggcattgcaaaaatttcaaagtcaagtttgcgtcgtcatcgcgtctgagtccgatcaagccgggcttgtaattgaagttgttgatgag 
ttactggattgtggcgaattctggtcagcatacttaacagcagcccgctaattaagcaaaataaacatatcaaattccagaatgcgacggcgccatcatcctgtttgggaattcaattcgcgggcagatcgtttaattcaattaaaaggtag aaaagggagcagaagaatgcgatcgctggaatttcctaacatcacggaccccataaatttgataagcccgagctcgctgcgttgagtcagccaccccacatccccaaatccccgccaaaagaagacagctgggttgttgactcgccagattg attgcagtggagtggacctggtcaaagaagcaccgttaatgtgctgattccattcgattccatccgggaatgcgataaagaaaggctctgatccaagcaactgcaatccggatttcgattttctctttccatttggttttgtatttacgtac Where is the gene?Where is the gene? © Eric Xing @ CMU, 2006-2010 aagcattctaatgaagacttggagaagacttacgttatattcagaccatcgtgcgatagaggatgagtcatttccatatggccgaaatttattatgtttactatcgtttttagaggtgttttttggacttaccaaaagaggcatttgttttc ttcaactgaaaagatatttaaattttttcttggaccattttcaaggttccggatatatttgaaacacactagctagcagtgttggtaagttacatgtatttctataatgtcatattcctttgtccgtattcaaatcgaatactccacatctc ttgtacttgaggaattggcgatcgtagcgatttcccccgccgtaaagttcctgatcctcgttgtttttgtacatcataaagtccggattctgctcgtcgccgaagatgggaacgaagctgccaaagctgagagtctgcttgaggtgctggtc gtcccagctggataaccttgctgtacagatcggcatctgcctggagggcacgatcgaaatccttccagtggacgaacttcacctgctcgctgggaatagcgttgttgtcaagcagctcaaggagcgtattcgagttgacgggctgcaccacg ctgctccttcgctggggattcccctgcgggtaagcgccgcttgcttggactcgtttccaaatcccatagccacgccagcagaggagtaacagagctctgaaaacagttcatggtttaaaaatatcctttaagaaagcccatgggtataactt actgcgtcctatgcgaggaatggtctttaggttctttatggcaaagttctcgcctcgcttgcccagccgcggtacgttcttggtgatctttaggaagaatcctggactactgtcgtctgcctggcttatggccacaagacccaccaagagcg aggactgttatgattctcatgctgatgcgactgaagcttcacctgactcctgctccacaattggtggcctttatatagcgagatccacccgcatcttgcgtggaatagaaatgcgggtgactccaggaattagcattatcgatcggaaagtg ataaaactgaactaacctgacctaaatgcctggccataattaagtgcatacatacacattacattacttacatttgtataagaactaaattttatagtacataccacttgcgtatgtaaatgcttgtcttttctcttatatacgttttataa
  • 16. Paradigms of Machine LearningParadigms of Machine Learning  Supervised Learningp g  Given , learn , s.t.  Unsupervised Learning  iiD YX ,  ii XY f:)f(     jjD YXnew   Unsupervised Learning  Given , learn , s.t. R i f t L i  iD X  ii XY f:)f(     jjD YXnew   Reinforcement Learning  Given  gametrace/realsimulator/rewards,,actions,envD are :policy  learn , s.t.  Active Learning rea are   ,:utility ,:policy   321 aaa ,,gamerealnew,env  © Eric Xing @ CMU, 2006-2010  Active Learning  Given , learn , s.t.)(G~ D  jD Ypolicy,),(G'all )f(and)(G'~new D
  • 17. Elements of LearningElements of Learning  Here are some important elements to consider before you start:  Task: Task:  Embedding? Classification? Clustering? Topic extraction? …  Data and other info:  Input and output (e.g., continuous, binary, counts, …) S i d i d f bl d f thi ? Supervised or unsupervised, of a blend of everything?  Prior knowledge? Bias?  Models and paradigms:  BN? MRF? Regression? SVM?  Bayesian/Frequents ? Parametric/Nonparametric?  Objective/Loss function:  MLE? MCLE? Max margin?  Log loss, hinge loss, square loss? … Log loss, hinge loss, square loss? …  Tractability and exactness trade off:  Exact inference? MCMC? Variational? Gradient? Greedy search?  Online? Batch? Distributed? E l ti © Eric Xing @ CMU, 2006-2010  Evaluation:  Visualization? Human interpretability? Perperlexity? Predictive accuracy?  It is better to consider one element at a time!
  • 18. Theories of LearningTheories of Learning For the learned F(; )  Consistency (value, pattern, …) Bi i Bias versus variance  Sample complexity  Learning rateg  Convergence  Error bound  Confidence  Stability  © Eric Xing @ CMU, 2006-2010  …
  • 19. ClassificationClassification  Representing data:p g  Hypothesis (classifier) © Eric Xing @ CMU, 2006-2010
  • 20. Decision-making as dividing a high dimensional spacehigh-dimensional space  Classification-specific Dist.: P(X|Y)p ( | ) );( )|( 1     Xp YXp ),;( 111  Xp ),;( )|( 222 2     Xp YXp  Class prior (i.e., "weight"): P(Y) © Eric Xing @ CMU, 2006-2010 p ( , g ) ( )
  • 21. The Bayes RuleThe Bayes Rule  What we have just did leads to the following generalj g g expression: )()|( YpYXP )( )()|( )|( XP YpYXP XYP  This is Bayes Rule © Eric Xing @ CMU, 2006-2010
  • 22. The Bayes Decision Rule for Minimum ErrorMinimum Error  The a posteriori probability of a samplep p y p )( )|( )|( )( )()|( )|( Xq iYXp iYXp Xp iYPiYXp XiYP i i ii ii           Bayes Test:  Likelihood Ratio: )(X  Discriminant function: )(X © Eric Xing @ CMU, 2006-2010 )(Xh
  • 23. Example of Decision RulesExample of Decision Rules  When each class is a normal …  We can write the decision boundary analytically in some © Eric Xing @ CMU, 2006-2010  We can write the decision boundary analytically in some cases … homework!!
  • 24. Bayes ErrorBayes Error  We must calculate the probability of errorp y  the probability that a sample is assigned to the wrong class  Given a datum X, what is the risk?  The Bayes error (the expected risk): The Bayes error (the expected risk): © Eric Xing @ CMU, 2006-2010
  • 25. More on Bayes ErrorMore on Bayes Error  Bayes error is the lower bound of probability of classification error  Bayes classifier is the theoretically best classifier that minimize probability of classification error  Computing Bayes error is in general a very complex problem Why? Computing Bayes error is in general a very complex problem. Why?  Density estimation:  Integrating density function: © Eric Xing @ CMU, 2006-2010  Integrating density function:
  • 26. Learning ClassifierLearning Classifier  The decision rule:  Learning strategies  Generative Learning  Discriminative Learning Discriminative Learning  Instance-based Learning (Store all past experience in memory)  A special case of nonparametric classifier © Eric Xing @ CMU, 2006-2010
  • 27. Supervised LearningSupervised Learning  K-Nearest-Neighbor Classifier: where the h(X) is represented by all the data, and by an algorithm © Eric Xing @ CMU, 2006-2010
  • 28. Recall: Vector Space Representation. Each document is a vector, one component for each term (= word):

             Doc 1   Doc 2   Doc 3   ...
    Word 1     3       0       0     ...
    Word 2     0       8       1     ...
    Word 3    12       1      10     ...
    ...        0       1       3     ...
    ...        0       0       0     ...

    Normalize each vector to unit length. High-dimensional vector space: terms are axes (10,000+ dimensions, or even 100,000+); docs are vectors in this space.
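A minimal sketch of the unit-length normalization on the toy term-count table above:

```python
import numpy as np

# Term counts from the table: rows = Doc 1..3, columns = Word 1..3.
docs = np.array([[3.0, 0.0, 12.0],
                 [0.0, 8.0, 1.0],
                 [0.0, 1.0, 10.0]])

# Normalize each document vector to unit (L2) length.
unit_docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
print(np.linalg.norm(unit_docs, axis=1))  # [1. 1. 1.]
```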
  • 29. Test Document = ? [figure: a test document among the classes Sports, Science, Arts in the vector space]
  • 33. K-Nearest Neighbor (kNN) classifier: voting kNN. [figure: Sports, Science, Arts]
  • 34. Classes in a Vector Space. [figure: Sports, Science, Arts regions]
  • 35. kNN Is Close to Optimal. Cover and Hart 1967: asymptotically, the error rate of 1-nearest-neighbor classification is less than twice the Bayes rate [the error rate of a classifier knowing the model that generated the data]. In particular, the asymptotic error rate is 0 if the Bayes rate is 0. Where does kNN come from? Nonparametric density estimation.
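For reference, the Cover–Hart asymptotic bound is usually written as follows, for M classes with Bayes error \( P^{*} \):

```latex
P^{*} \;\le\; P_{1\text{-NN}} \;\le\; P^{*}\!\left(2 - \frac{M}{M-1}\,P^{*}\right) \;\le\; 2P^{*}
```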
  • 36. Nearest-Neighbor Learning Algorithm. Learning is just storing the representations of the training examples in D. Testing instance x: compute the similarity between x and all examples in D; assign x the category of the most similar example in D (see the sketch below). Does not explicitly compute a generalization or category prototypes. Also called: case-based learning; memory-based learning; lazy learning.
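A minimal voting-kNN sketch in Python, illustrating the store-then-compare procedure above (function names and data are my own, not the course's):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Voting kNN: label x by the majority class among its k nearest
    training examples under Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every stored example
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy usage: two 2-D clusters labeled 0 and 1.
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.5, 0.5])))   # 0
print(knn_predict(X, y, np.array([5.5, 5.5])))   # 1
```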
  • 37. kNN is an instance of Instance-Based Learning. What makes an instance-based learner? A distance metric; how many nearby neighbors to look at; a weighting function (optional); how to relate to the local points.
  • 38. Euclidean Distance Metric. \( D(x, x') = \sqrt{\sum_i (x_i - x'_i)^2} \), or equivalently \( D(x, x') = \sqrt{(x - x')^\top (x - x')} \). Other metrics: L1 norm \( \sum_i |x_i - x'_i| \); L∞ norm \( \max_i |x_i - x'_i| \) (elementwise …); Mahalanobis \( \sqrt{(x - x')^\top \Sigma^{-1} (x - x')} \), where Σ is full and symmetric; correlation; angle; Hamming distance, Manhattan distance; …
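A short sketch computing these metrics with NumPy (the vectors and the covariance Σ are made-up values for illustration):

```python
import numpy as np

x  = np.array([1.0, 2.0, 3.0])
xp = np.array([2.0, 0.0, 3.5])
d  = x - xp

euclidean = np.sqrt(d @ d)              # L2: sqrt(sum_i (x_i - x'_i)^2)
l1        = np.abs(d).sum()             # L1 (Manhattan)
linf      = np.abs(d).max()             # L-infinity
Sigma     = np.array([[2.0, 0.3, 0.0],  # a full, symmetric covariance
                      [0.3, 1.0, 0.0],  # (illustrative values)
                      [0.0, 0.0, 0.5]])
mahalanobis = np.sqrt(d @ np.linalg.solve(Sigma, d))
print(euclidean, l1, linf, mahalanobis)
```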
  • 39. Case Study: kNN for Web Classification. Dataset: 20 News Groups (20 classes); download: http://people.csail.mit.edu/jrennie/20Newsgroups/ ; 61,118 words, 18,774 documents; class label descriptions.
  • 40. Results: Binary Classes. [plots: accuracy vs. k for alt.atheism vs. comp.graphics; rec.autos vs. rec.motorcycles; comp.windows.x vs. rec.sport.baseball]
  • 41. Results: Multiple Classes. Randomly select 5 out of 20 classes, repeat 10 runs and average; also all 20 classes. [plot: accuracy vs. k]
  • 42. Is kNN ideal?
  • 43. Is kNN ideal? … more later
  • 44. Effect of Parameters.
    – Sample size: the more the better; need an efficient search algorithm for NN.
    – Dimensionality: curse of dimensionality.
    – Density: how smooth?
    – Metric: the relative scalings in the distance metric affect region shapes.
    – Weight: spurious or less relevant points need to be downweighted (see the sketch below).
    – K.
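One common choice of weighting function is to let each neighbor vote with weight inversely proportional to its distance; a sketch (my own illustration, not the course's code):

```python
import numpy as np
from collections import Counter

def weighted_knn_predict(X_train, y_train, x, k=5, eps=1e-12):
    """Distance-weighted kNN: each of the k neighbors votes with weight
    1/d, so closer (more relevant) points count more and far, possibly
    spurious neighbors are downweighted."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter()
    for i in nearest:
        votes[y_train[i]] += 1.0 / (dists[i] + eps)
    return votes.most_common(1)[0][0]
```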
  • 45. Summary. Machine learning is cool and useful!! Paradigms of machine learning; design elements of learning; theories on learning. Fundamental theory of classification: the Bayes optimal classifier. Instance-based learning: kNN, a nonparametric classifier. A nonparametric method does not rely on any assumption concerning the structure of the underlying density function; very little "learning" is involved in these methods. Good news: simple and powerful methods, flexible and easy to apply to many problems; the kNN classifier asymptotically approaches the Bayes classifier, which is theoretically the best classifier that minimizes the probability of classification error. Bad news: high memory requirements; very dependent on the scale factor for a specific problem.
  • 46. Learning Based Approaches for Visual Recognition. L. Fei-Fei, Computer Science Dept., Stanford University
  • 47. As legend goes…
  • 48. CVPR: 1985–2010
  • 49. What is vision? Real world → "forming" pictures → pixel world → "understanding" pictures.
  • 50. What is vision? "Understanding" pictures. Low-Level Vision: edges, intensity, texture, …
  • 51. What is vision? "Understanding" pictures. Mid-Level Vision (on top of Low-Level Vision): groupings of similar pixels, geometry, …
  • 52. What is vision? "Understanding" pictures. High-Level Vision (on top of Mid- and Low-Level Vision): "This is a story of love and friendship among three young hamsters. On a sunny day in the garden…"
  • 55. [Subjects' scene descriptions after brief presentation times (PT); Fei-Fei et al. JoV 2007]
    – PT = 27ms: "This was a picture with some dark splotches in it. Yeah. . . that's about it." (Subject: KM)
    – PT = 40ms: "I think I saw two people on a field." (Subject: RW)
    – PT = 67ms: "Outdoor scene. There were some kind of animals, maybe dogs or horses, in the middle of the picture. It looked like they were running in the middle of a grassy field." (Subject: IV)
    – PT = 107ms: "Two people, whose profile was toward me. Looked like they were on a field of some sort and engaged in some sort of sport (their attire suggested soccer, but it looked like there was too much contact for that)." (Subject: AI)
    – PT = 500ms: "Some kind of game or fight. Two groups of two men? The foreground pair looked like one was getting a fist in the face. Outdoors seemed like because I have an impression of grass and maybe lines on the grass? That would be why I think perhaps a game, rough game though, more like rugby than football because the pairs weren't in pads and helmets, though I did get the impression of similar clothing. Maybe some trees? in the background." (Subject: SM)
  • 56. Visual recognition. [diagram: levels of recognition tasks vs. processing time in the human visual system (~90 msec, ~150 msec, ~1 sec), for Object / Scene / Event-activity. Low-level: texture, shape, features and descriptors, segmentation, tracking. Mid-level: parts and attributes, basic-level classification, action classification, 3D geometry, geometric layout. High-level: scene understanding, activity understanding, social roles, situation, goals and intentions, functionality (human-object interaction), roles and functions, causality.]
  • 57. Visual recognition, 1990–early 2000. [same diagram, highlighting the sub-problems studied in that period]
  • 58. Visual recognition, early 2000–now. [same diagram, highlighting the sub-problems studied in that period]
  • 59. Why machine learning?
  • 60. Why machine learning? 80,000,000,000+ images; 5,000,000,000+ images; 120,000,000 videos (upload: 13 hours/min); 20Gb of images; my mom's hard drive: 220+Gb of images.
  • 62. Machine learning in computer vision:
    – Aug 12, Lecture 1: Nearest Neighbor. Large-scale image classification; scene completion.
    – Aug 12, Lecture 3: Neural Network. Convolutional nets for object recognition; unsupervised feature learning via Deep Belief Net.
    – Aug 13, Lecture 7: Dimensionality reduction, manifold learning. Eigen- and Fisher-faces; applications to object representation.
    – Aug 15, Lecture 13: Conditional Random Field. Image segmentation, object recognition & image annotation.
    – Aug 15 & 16, Lectures 14 + 17: Topic models. Object recognition; scene classification, image annotation, large-scale image clustering; total scene understanding.
  • 64. Machine learning in computer vision. Aug 12, Lecture 1: Nearest Neighbor. Large-scale image classification; scene completion.
  • 65. Machine learning in computer vision. Aug 12, Lecture 1: Nearest Neighbor. Large-scale image classification; scene completion.
  • 68. Large-scale image classification and retrieval is an unaddressed problem in vision.
  • 69. [plot: # of clean images per category (log_10) vs. # of visual concept categories (log_10) for MSRC, Caltech101/256, PASCAL(1), LabelMe(3), Tiny Images(2), and humans] Notes: 1. Excluding the Caltech101 datasets from PASCAL. 2. Images in this dataset are not human annotated; the # of clean images per category is a rough estimation. 3. Only categories of more than 100 images are considered.
  • 72. Classification: 5-NN voting example; the query image is labeled Kangaroo. [bar chart: vote counts among the 5 nearest neighbors over the classes Antelope, Jellyfish, German Shepherd, Trombone, Kangaroo]
  • 74. kNN on 10K classes. 10K classes; 4.5M queries; 4.5M training images. Features: BOW, GIST. Deng, Berg, Li & Fei-Fei, ECCV 2010.
  • 75. How fast is kNN? Brute-force linear scan: e.g., scanning 4.5M images! Can we be faster? Yes, if feature dimensionality is low (e.g. <= 16): the K-D tree. [figure: http://graphics.stanford.edu/courses/cs368‐00‐spring/TA/manuals/CGAL/ref‐manual2/SearchStructures/kdtree.gif]
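A minimal sketch of exact NN search with a k-d tree, here via SciPy's cKDTree (my own example on random data, not the lecture's pipeline):

```python
import numpy as np
from scipy.spatial import cKDTree

# Exact nearest-neighbor queries with a k-d tree; effective in low
# dimensions, which is exactly the regime the slide describes.
rng = np.random.default_rng(0)
data = rng.random((100_000, 8))     # 100k points in 8-D
tree = cKDTree(data)

query = rng.random(8)
dist, idx = tree.query(query, k=5)  # 5 nearest neighbors
print(idx, dist)
```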
  • 76. Curse of dimensionality. For high dimensionality, in both theory and practice, there is little improvement over brute-force linear scan. E.g., a KD-tree on 128-dimensional SIFT is not much faster than linear scan.
  • 77. Locality sensitive hashing. Approximate kNN: good enough in practice; can get around the curse of dimensionality. Locality sensitive hashing: nearby feature points (likely) get the same hash values. [figure: hash table]
  • 78. Example: Random projection. \( h(x) = \mathrm{sgn}(x \cdot r) \), where r is a random unit vector; h(x) gives 1 bit; repeat and concatenate. \( \Pr[h(x) = h(y)] = 1 - \theta(x, y)/\pi \). [figure: a random hyperplane with normal r separating x and y, so h(x) = 0, h(y) = 1; hash-table buckets 000 and 101]
  • 79. Example: Random projection. \( h(x) = \mathrm{sgn}(x \cdot r) \), where r is a random unit vector; h(x) gives 1 bit; repeat and concatenate. \( \Pr[h(x) = h(y)] = 1 - \theta(x, y)/\pi \). [figure: a random hyperplane with x and y on the same side, so h(x) = 0, h(y) = 0; hash-table buckets 000 and 101]
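A sketch of signed random projections following the formula on the slides; the bit count and data are arbitrary choices for illustration:

```python
import numpy as np

# Random-projection LSH: each bit is h(x) = sgn(x . r) for a random
# unit vector r; concatenating n_bits bits gives the hash key.
rng = np.random.default_rng(0)
dim, n_bits = 128, 16
R = rng.normal(size=(n_bits, dim))
R /= np.linalg.norm(R, axis=1, keepdims=True)  # random unit vectors

def lsh_key(x):
    """Concatenate n_bits sign bits into a hash key for x."""
    return tuple((R @ x > 0).astype(int))

x = rng.normal(size=dim)
y = x + 0.05 * rng.normal(size=dim)  # a nearby point
print(lsh_key(x) == lsh_key(y))      # likely True: close points collide
```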
  • 81. Locality sensitive hashing. 1000X speed-up with 50% recall of the top 10-NN; 1.2M images + 1000 dimensions. [plots: recall of L1Prod top-10 exact NN retrieved, and percentage of exact NN retrieved, vs. scan cost (percentage of points scanned), comparing L1Prod LSH + L1Prod ranking against RandHP LSH + L1Prod ranking]
  • 83. Machine learning in computer vision. Aug 12, Lecture 1: Nearest Neighbor. Large-scale image classification; scene completion. (Slides courtesy: Alyosha Efros (CMU))
  • 86. Diffusion Result
  • 87. Efros and Leung result
  • 89. Scene Matching for Image Completion
  • 90. Scene Completion Result
  • 92. Scene Matching
  • 93. … 200 total
  • 94. Context Matching
  • 99. “The Internet is the world’s largest library.”
  • 103–108. [result images] Hays and Efros, SIGGRAPH 2007