Biomedical Informatics 260
Machine Learning for Images: Introduction
Lecture 11
Spring 2014
Review: Feature Extraction
Today: Machine learning to classify, predict
1. We have a rich set of semantic and
computational features
2. We want to harness these features to
build tools for medical decision support
Machine Learning
• What is machine learning?
• How do I prepare my data?
• Types of Learning Algorithms
• How do I evaluate performance?
http://www.vbmis.com/learn/some-link-here
More details on methods
(formulas, written out explanations, etc.)
What is machine learning?
Real world
phenomenon
Model
Data 1.2 ‘green’ 234.5 10
1.4 ‘red’ 160.8 11
.05 ‘red’ 150.3 10
Where do I start?
• Identify a question
– Can I predict candy type from image features?
• Ideal data?
– High resolution images for all candy in the world
• Available data
– 100 pictures of peanut and regular M&M’s
• Explore and visualize
• Create model
• Evaluate results
Good ideas:
Look at your data
Start with simple methods first
Always evaluate
How do I prepare my data?
Data
Learning
Algorithm
Classifier
Features
Evaluation
GOAL:
Prepare data for feature extraction
How do I prepare my data?
Filtering
Quality Analysis
Registration
Segmentation
Filtering
Do we need to smooth edges?
Smooth pixels for noise reduction?
Filter out a threshold of interest?
http://www.vbmis.com/learn/category/medical-imaging/image-
processing/filtering/
Quality Analysis
Are any images still not good enough?
Is the data biased in any way?
Do we have representation of all classes?
Registration
Functional data aligned with structural?
Alignment to a standard space?
Alignment to a group template?
http://www.vbmis.com/learn/category/medical-imaging/image-
processing/registration-image-processing/
Segmentation
What is background vs.
region of interest?
http://www.vbmis.com/learn/category/medical-imaging/image-
processing/segmentation/
How do I prepare my data?
Features
Data
Learning
Algorithm
Classifier
Features
Evaluation
Feature Space Representations
Most algorithms work with general vectors
        X1     X2     Y
Image 1 2.9    0.2    “green peanut”
Image 2 0.1    1.9    “red regular”
Image 3 7.2    2.1    “red peanut”
Feature Normalization
Here is our vector
v = <x, y>
Calculate the norm
|v| = √(x² + y²)
For example,
v = <4, -3>
|v| = √(4² + (-3)²) = √25 = 5
Then divide the vector by the norm to
make the unit vector
v / |v|
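The normalization above in a few lines of Python (Python rather than Matlab, purely to keep the sketch self-contained):

```python
import math

def unit_vector(v):
    """Divide a vector by its Euclidean norm to get a unit vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

v = (4, -3)
norm = math.sqrt(4 ** 2 + (-3) ** 2)   # sqrt(16 + 9) = 5
print(norm)                # 5.0
print(unit_vector(v))      # [0.8, -0.6]
```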
How do I build a classifier?
Choosing a learning algorithm
Data
Learning
Algorithm
Classifier
Features
Evaluation
Three Types of Learning Algorithms
1. Supervised Classification
2. Regression
3. Unsupervised Clustering
1. Supervised Classification
• K-nearest neighbor (KNN)
• Naïve Bayes
• Support vector machines (SVM)
1. Supervised Classification
K-Nearest Neighbor
1. For each unlabeled point, compute the distance to all labeled points (order N)
2. Sort the distances so we have a sorted list of the neighbors, including labels
3. Determine the closest K neighbors (take the top K off the list)
4. Combine the neighbors’ labels to make a decision **
** MUST have a way to resolve K neighbors that don’t agree!
http://www.vbmis.com/learn/k-nearest-neighbor-clustering-knn/
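The four steps above can be sketched as a short Python function (a plain-Python illustration, not the deck’s Matlab code; ties are resolved here by favoring the closest neighbor’s label, one of several reasonable choices):

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest labeled points.
    `train` is a list of (point, label) pairs."""
    # 1. compute the distance to every labeled point
    dists = [(math.dist(p, query), label) for p, label in train]
    # 2. sort the neighbors by distance
    dists.sort(key=lambda t: t[0])
    # 3. take the top k off the list
    top_k = dists[:k]
    # 4. combine the labels; resolve disagreement by favoring the closest winner
    votes = Counter(label for _, label in top_k)
    best = max(votes.values())
    for _, label in top_k:          # first (closest) label among tied winners
        if votes[label] == best:
            return label

train = [((0, 0), "regular"), ((1, 0), "regular"),
         ((5, 5), "peanut"), ((6, 5), "peanut")]
print(knn_predict(train, (0.5, 0.2), k=3))  # "regular"
```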
1. Supervised Classification
KNN with K=1
1. Supervised Classification
KNN with K=15
1. Supervised Classification
KNN with K=100
1. Supervised Classification
KNN with K=400
1. Supervised Classification
K-Nearest Neighbor
1. Supervised Classification
Naïve Bayes
Estimate the probability of belonging to each class, and assign to the most
probable class
1. Assume features are conditionally independent: observing one feature says nothing about the others
2. We typically sum the logs of probabilities instead of multiplying the probabilities
P(H | D) = P(D | H) P(H) / P(D)
http://www.vbmis.com/learn/naive-bayes/
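A minimal Gaussian Naïve Bayes sketch, assuming one Gaussian per feature per class (the Gaussian likelihood is an illustrative choice; the slides don’t fix a distribution). Note the log-probability sum from point 2 above:

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate per-class priors and per-feature Gaussian (mean, variance)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(X)
        stats = []
        for feature in zip(*rows):
            mu = sum(feature) / len(feature)
            var = sum((v - mu) ** 2 for v in feature) / len(feature) + 1e-9
            stats.append((mu, var))
        model[label] = (prior, stats)
    return model

def predict(model, x):
    """Assign x to the class with the highest summed log-probability."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        # log P(H) + sum_i log P(x_i | H), assuming feature independence
        score = math.log(prior)
        for v, (mu, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# toy features: e.g. size and some color score for each candy image
X = [[10.0, 1.2], [11.0, 1.4], [10.5, 1.3], [16.0, 2.1], [15.0, 2.0], [15.5, 2.2]]
y = ["regular", "regular", "regular", "peanut", "peanut", "peanut"]
model = fit_gaussian_nb(X, y)
print(predict(model, [10.8, 1.25]))  # "regular"
```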
1. Supervised Classification
Naïve Bayes
P(type = peanut | features) = P(features | type = peanut) P(type = peanut) / P(features)
P(type = regular | features) = P(features | type = regular) P(type = regular) / P(features)
1. Supervised Classification
Naïve Bayes
P(type = peanut | features) = P(features | type = peanut) P(type = peanut) / P(features)
Likelihood: P(features | type = peanut)
If the hypothesis were true, what would the features look like?
1. Supervised Classification
Naïve Bayes
P(type = peanut | features) = P(features | type = peanut) P(type = peanut) / P(features)
Evidence: P(features) = P(features | type = peanut) P(type = peanut) + P(features | type = regular) P(type = regular)
The overall probability of observing the features in the data, regardless of candy type
1. Supervised Classification
Naïve Bayes
P(type = peanut | features) = P(features | type = peanut) P(type = peanut) / P(features)
Prior: P(type = peanut)
The probability of the hypothesis before looking at the data
1. Supervised Classification
Naïve Bayes
Posterior: P(type = peanut | features) = P(features | type = peanut) P(type = peanut) / P(features)
Compare against P(type = regular | features) and assign the more probable class
1. Supervised Classification
Support Vector Machines
1. A way to transform data points into a higher dimensional space (kernel mapping)
2. Look for the boundary line that best distinguishes the classes
3. The points that define the boundary line are called the support vectors
http://www.vbmis.com/learn/support-vector-machines-svms/
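A minimal linear SVM sketch (no kernel mapping, and hinge-loss subgradient descent in place of a proper QP solver, so this is a Pegasos-style approximation of the idea rather than a full SVM implementation):

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit w, b by subgradient descent on hinge loss + L2 penalty.
    Labels must be +1 / -1; w·x + b = 0 is the decision boundary, and
    training points with margin near 1 play the role of support vectors."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:   # inside the margin: push the boundary away
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:            # correctly classified with room: only shrink w
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

X = [[-2.0, -2.0], [-1.5, -2.5], [2.0, 2.0], [2.5, 1.5]]
y = [-1, -1, 1, 1]
w, b = train_linear_svm(X, y)
print([svm_predict(w, b, xi) for xi in X])  # matches y on this toy data
```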
2. Regression
1. Supervised learning method
2. Assume that the relationship between the data points X and Y is linear
3. We can solve with the least squares approach (minimize the sum of squared error)
Linear Regression
y = β0 + β1 x1 + β2 x2 + . . . + βp xp
http://www.vbmis.com/learn/linear-regression/
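For a single predictor, the least squares solution has a closed form; a sketch:

```python
def least_squares_fit(xs, ys):
    """Closed-form simple linear regression: minimize the sum of squared
    errors. Returns (b0, b1) for the line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); intercept passes through the means
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
         / sum((x - mean_x) ** 2 for x in xs)
    b0 = mean_y - b1 * mean_x
    return b0, b1

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # exactly y = 1 + 2x
b0, b1 = least_squares_fit(xs, ys)
print(b0, b1)  # 1.0 2.0
```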
3. Unsupervised Clustering
• K-means clustering
• Hierarchical clustering
Unsupervised Clustering
K-Means
1. Generate K random points (cluster centers) in the space of the objects to be clustered
2. Compute the distance of each data point (object) to all the cluster centers
3. Assign each object to the closest cluster center (CC)
4. Compute a new position for each cluster center as the average of its assigned
objects (lots of ways to do the average)
Loop to step 2, until bored or the cluster centers don’t change significantly
(less than or equal to some epsilon that we set)
http://www.vbmis.com/learn/k-means-clustering/
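The loop above (Lloyd’s algorithm) can be sketched as follows; the initial centers are passed in explicitly to keep the example deterministic, where the slide’s step 1 would choose them at random:

```python
import math

def kmeans(points, centers, max_iter=100, eps=1e-6):
    """Assign each point to its closest center, move each center to the
    mean of its assigned points, and stop when the centers shift < eps."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        # steps 2-3: distance to every center, assign to the closest
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[i].append(p)
        # step 4: recompute each center as the mean of its members
        shift = 0.0
        for i, members in enumerate(clusters):
            if not members:
                continue
            new = [sum(dim) / len(members) for dim in zip(*members)]
            shift = max(shift, math.dist(new, centers[i]))
            centers[i] = new
        if shift <= eps:    # "until the centers don't change significantly"
            break
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centers, clusters = kmeans(points, centers=[(0, 0), (5, 5)])
print(sorted(len(c) for c in clusters))  # [3, 3]
```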
Unsupervised Clustering
Hierarchical
1. Compute a matrix of all pairwise distances between objects (order N², not N)
2. Find the two closest nodes
3. Merge them by “averaging” their positions (multiple strategies for averaging, usually a weighted
average)
4. Compute the distance of the new merged node to all others
This leaves N-1 nodes
5. Repeat until all nodes are merged (there is one node)
6. Draw cluster boundaries as you see fit
6. Draw cluster boundaries as you see fit
http://www.vbmis.com/learn/hierarchical-clustering/
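A sketch of the merge loop, using a size-weighted centroid as the “averaging” strategy (one of the several strategies the slide mentions):

```python
import math

def hierarchical_cluster(points):
    """Agglomerative clustering: repeatedly merge the two closest nodes
    until one remains. Each node tracks its centroid (size-weighted on
    merges) and the nested merge structure of original point indices."""
    # each node: (centroid, size, tree)
    nodes = [((float(x), float(y)), 1, i) for i, (x, y) in enumerate(points)]
    while len(nodes) > 1:
        # find the two closest nodes among all pairs (the N^2 distances)
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = math.dist(nodes[i][0], nodes[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (ci, ni, ti), (cj, nj, tj) = nodes[i], nodes[j]
        # merge by a size-weighted average of the two centroids
        merged_c = tuple((a * ni + b * nj) / (ni + nj) for a, b in zip(ci, cj))
        merged = (merged_c, ni + nj, (ti, tj))
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][2]   # the merge tree

tree = hierarchical_cluster([(0, 0), (0, 1), (10, 10), (10, 11)])
print(tree)  # ((0, 1), (2, 3))
```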
Data
Learning
Algorithm
Classifier
Features
Evaluation
How do I build a classifier?
Ok, now build it!
What data do I use to build?
Training vs. Testing Data
Data
Learning
Algorithm
?
?
? Classifier
Orange
M&M
Yellow M&M
Blue Peanut
M&M
Features
1.2 ‘green’ 234.5 10
1.4 ‘red’ 160.8 11
.05 ‘red’ 150.3 10
TRAINING
building model
The algorithm “learns” the optimal
parameters for the model
Data
Learning
Algorithm
?
?
? Classifier
Orange
M&M
Yellow M&M
Blue Peanut
M&M
Features
1.2 ‘green’ 234.5 10
1.4 ‘red’ 160.8 11
.05 ‘red’ 150.3 10
TESTING
evaluating model
We give the classifier new data to
predict class labels
How do I use the data
to obtain reliable estimates?
How do I obtain reliable estimates?
• Holdout: if you have enough data
• Cross Validation: If you don’t
• Ideal: We have new data to test on
• Reality: We don’t
How do I obtain reliable estimates?
• Holdout
– 2/3 for training
– 1/3 for testing
How do I obtain reliable estimates?
• Cross Validation
– Partition into N sets
– Train on N-1
– Test on “held out” set
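A sketch of building the N train/test partitions (indices only; interleaved assignment of indices to folds is an arbitrary choice here):

```python
def kfold_indices(n, k):
    """Partition indices 0..n-1 into k folds; each fold is held out once
    for testing while the remaining folds train."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        splits.append((train, test))
    return splits

for train, test in kfold_indices(6, 3):
    print(train, test)
```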
How do I choose the right model
complexity?
Overfitting and Underfitting
Bishop, Pattern Recognition and Machine Learning
What Model Complexity to Choose?
[Figure: data, the model that generated it, and fitted models of different orders]
What Model Complexity to Choose?
James, et al. An Introduction to Statistical Learning with Applications in R
How to evaluate performance?
Data
Learning
Algorithm
Classifier
Features
Evaluation
How do I evaluate performance?
of a classification model
• Focus on predictive ability of the model
Confusion Matrix
                     Predicted Class
                     Yes    No
Actual Class   Yes   TP     FN
               No    FP     TN
How do I evaluate performance?
of a classification model
Accuracy = (TP + TN) / (TP + TN + FP + FN)
                     Predicted Class
                     Yes    No
Actual Class   Yes   TP     FN
               No    FP     TN
How do I evaluate performance?
of a classification model
True Positive Rate = TP / (TP + FN)
(Sensitivity)
True Negative Rate = TN / (TN + FP)
(Specificity)
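These definitions translate directly to code; a sketch that derives all three metrics from binary label lists:

```python
def confusion_metrics(actual, predicted):
    """Count TP/FP/TN/FN for binary labels and derive accuracy,
    sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

actual    = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 0]
print(confusion_metrics(actual, predicted))
# accuracy 0.75, sensitivity 0.75, specificity 0.75
```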
How do I evaluate performance?
of a clustering model
• Ideal Clustering: finding "optimal" (best that we can do) grouping of
objects such that the within group distance is low (minimized) and
the between group distance is high (maximized).
http://www.vbmis.com/learn/cluster-validation/
• Internal Measures
– stability validation
– connectivity
– compactness
– separation
– the Dunn Index
– silhouette width
How do I evaluate performance?
of a clustering model
• External Measures
– Biological Homogeneity Index
– Biological Stability Index
How do I evaluate performance?
of a clustering model
Evaluation:
Comparing to Competitor Methods
ROC Curve Analysis
Sensitivity = TPR = TP / (TP + FN)
Emphasize when we care a lot about missing something – FN are expensive
Specificity = TNR = TN / (TN + FP)
Emphasize when we don’t mind missing something that might be true – FP are expensive
What if our classifier sucks?
Feature Selection:
Find some subset of features (predictor
variables) that are most informative about a
class label
Feature Selection
Methods for dimensionality reduction
• Criterion
– mean squared error (regression)
– misclassification rate (classification)
Feature Selection
Methods for dimensionality reduction
1. Best Subset Selection
2. Sequential Forward Search
3. Sequential Backward Search
4. Shrinkage Methods
5. Dimensionality Reduction
Best Subset Selection
• Only feasible when the number of features (p) is small
P1 P2 P3
Sequential Forward Search
• Sequentially add features to an empty candidate set
until the addition of further features doesn’t
decrease our criterion
P1 P2 P3
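A sketch of the greedy loop, with a toy (made-up) criterion standing in for a real cross-validated error:

```python
def forward_select(features, criterion, tol=0.0):
    """Greedy sequential forward search: start from an empty candidate set
    and repeatedly add the feature that most lowers `criterion` (e.g. a
    cross-validated error); stop when no addition improves it by more
    than `tol`."""
    selected = []
    best_err = criterion(selected)
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        scored = [(criterion(selected + [f]), f) for f in candidates]
        err, f = min(scored)
        if best_err - err <= tol:   # no further decrease in the criterion
            break
        selected.append(f)
        best_err = err
    return selected

# toy criterion: pretend P1 and P3 are informative and P2 is noise
def toy_error(subset):
    err = 1.0
    if "P1" in subset: err -= 0.5
    if "P3" in subset: err -= 0.3
    if "P2" in subset: err += 0.05   # adding the noise feature hurts slightly
    return err

print(forward_select(["P1", "P2", "P3"], toy_error))  # ['P1', 'P3']
```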
Sequential Backward Search
• Features are sequentially removed from a full
candidate set until the removal of further features
increases the criterion
P1 P2 P3
Summary
Machine learning is using methods
from statistics and computer science
to predict an outcome
What is
machine
learning?
How do I
prepare my
data?
We should visualize, normalize, and
clean our data before turning it into a
vector to train a learning algorithm
We should choose a method based on
our data, perform intelligent feature
selection, and take advantage of
Matlab’s built-in functions
How do I
build a
classifier?
Generally, we should evaluate with a
separate test data set, and look at ROC
curve metrics, and internal and
external validation for clustering
How do I
assess
performance?
What does it mean for me?
• First identify your question
• Come up with “the ideal” data
• Find “actual” data
• Explore it, visualize it!
• Experiment with different classifiers
• Evaluate and write up your results
Courses to Take
• Statistics:
– 116, 200, 202
• Computer Science:
– CS229
Next Time:
Advanced Machine Learning for Imaging
Thank you!