Scalable machine learning

BerkeleyX: CS190.1x Scalable Machine Learning

Outline
1. What is machine learning?
2. Linear regression
3. How to scale linear regression
4. Gradient descent
5. How to scale gradient descent

A Definition
Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.
Examples:
• Face recognition
• Link prediction
• Document classification

Raw data
Raw data comes from many sources:
• Apache log
• Email
• Image

Raw data
Feature Extraction
Initial observations can be in arbitrary format
We extract features to represent observations
We can incorporate domain knowledge 
We typically want numeric features
Success of the entire pipeline often depends on choosing good descriptions of observations!
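As a minimal sketch of feature extraction, the example below turns a short email into a numeric feature vector. The vocabulary, the sample text, and the `extract_features` helper are assumptions chosen for illustration; the hand-picked vocabulary stands in for domain knowledge.

```python
# Hypothetical vocabulary, chosen by hand to stand in for domain knowledge.
VOCABULARY = ["free", "meeting", "offer", "report"]

def extract_features(text):
    """Return the count of each vocabulary word, followed by the total word count."""
    words = text.lower().split()
    counts = [words.count(term) for term in VOCABULARY]
    return counts + [len(words)]

# A raw observation (arbitrary text) becomes a numeric feature vector.
features = extract_features("Free offer inside claim your free gift")
print(features)
```

Each position in the vector has a fixed meaning, which is what a downstream model needs.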

Raw data
Feature Extraction
Supervised Learning
Train a supervised model using labeled data,
e.g., a classification or regression model
weather   vehicle count
sunny     1000
rainy     99
cloudy    5

Raw data
Feature Extraction
Supervised Learning
Evaluation
Confusion matrix
                     Has cancer   No cancer
Predict cancer       TP           FP
Predict no cancer    FN           TN
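The four cells of a confusion matrix can be counted directly from the true labels and the predictions. This is a small sketch; the label encoding (1 = has cancer, 0 = no cancer), the sample data, and the `confusion_counts` helper are assumptions for illustration.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for binary labels (1 = has cancer, 0 = no cancer)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1  # predicted cancer, has cancer
        elif p == 1 and a == 0:
            fp += 1  # predicted cancer, no cancer
        elif p == 0 and a == 1:
            fn += 1  # predicted no cancer, has cancer
        else:
            tn += 1  # predicted no cancer, no cancer
    return tp, fp, fn, tn

# Made-up labels and predictions for six patients.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(actual, predicted)
print(tp, fp, fn, tn)
```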

Raw data
Feature Extraction
Learning Model
Evaluation
Predict

[Figure: the full dataset is split into a training set, a validation set, and a testing set; stages shown: model builder, classifier, tuning model, final classifier, predict]
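One way to produce the three sets is to shuffle the indices of the full dataset and cut them into disjoint pieces. The sketch below assumes NumPy; the 60/20/20 proportions and the `split_dataset` helper are illustrative choices, not values taken from the slides.

```python
import numpy as np

def split_dataset(n_points, train_frac=0.6, val_frac=0.2, seed=0):
    """Return disjoint index arrays for the training, validation, and testing sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_points)      # shuffle before splitting
    n_train = int(round(train_frac * n_points))
    n_val = int(round(val_frac * n_points))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]         # whatever remains
    return train, val, test

train_idx, val_idx, test_idx = split_dataset(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

The training indices would feed the model builder, the validation indices would be used to tune the model, and the testing indices would be held back to evaluate the final classifier.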

Example: Predicting shoe size from height, gender, and weight
For each observation we have a feature vector, x, and label, y
We assume a linear mapping between features and label: $\hat{y} = w^\top x$

1D Example
Goal: find the line of best fit
x coordinate: features
y coordinate: labels

Linear Least Squares Regression
Assume we have n training points, where $x^{(i)}$ denotes the ith point
Recall two earlier points:
• Linear assumption: $\hat{y} = w^\top x$
• Squared loss: $(y - \hat{y})^2$

Linear Least Squares Regression
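Combining the linear assumption with the squared loss gives the least squares objective. A standard way to write it is sketched below, assuming X denotes the n × d matrix whose rows are the training points and y the vector of n labels (this notation is an assumption), and assuming $X^\top X$ is invertible so that the closed-form solution exists.

```latex
\min_{w} \sum_{i=1}^{n} \left( w^\top x^{(i)} - y^{(i)} \right)^2
  = \min_{w} \lVert Xw - y \rVert_2^2,
\qquad
w = (X^\top X)^{-1} X^\top y
```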

Computing Closed Form Solution
Computational bottlenecks:
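• Matrix multiply of $X^\top X$: $O(nd^2)$ operations
• Matrix inverse: $O(d^3)$ operations
X is an n × d matrix and $X^\top$ is d × n, so $X^\top X$ is d × d and costs $O(nd^2)$; the product in the other order, $X X^\top$, would be n × n and cost $O(dn^2)$.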
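A small NumPy sketch of the closed-form computation, with the two bottlenecks marked in comments. Note that it calls `np.linalg.solve` on the d × d system rather than forming the inverse explicitly; this is a common substitute with the same $O(d^3)$ cost. The synthetic data and the `closed_form_least_squares` helper are assumptions for illustration.

```python
import numpy as np

def closed_form_least_squares(X, y):
    """Solve (X^T X) w = X^T y for w."""
    gram = X.T @ X                     # d x d matrix: O(n d^2) operations
    rhs = X.T @ y                      # length-d vector: O(n d) operations
    return np.linalg.solve(gram, rhs)  # d x d linear solve: O(d^3) operations

# Synthetic, noise-free data so the recovered weights can be checked.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

w = closed_form_least_squares(X, y)
print(w)
```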

Matrix Multiplication via Inner Products
Matrix Multiplication via Outer Products
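The two views of the product $X^\top X$ can be compared in a few lines. Entry (j, k) is the inner product of columns j and k of X; equivalently, the whole matrix is the sum over training points of the outer product of each row with itself. The sketch below assumes NumPy, and the helper names are hypothetical.

```python
import numpy as np

def gram_via_inner_products(X):
    """Entry (j, k) is the inner product of column j and column k of X."""
    n, d = X.shape
    result = np.zeros((d, d))
    for j in range(d):
        for k in range(d):
            result[j, k] = X[:, j] @ X[:, k]
    return result

def gram_via_outer_products(X):
    """Sum one d x d outer product per training point (row of X)."""
    n, d = X.shape
    result = np.zeros((d, d))
    for row in X:
        result += np.outer(row, row)
    return result

X = np.arange(12, dtype=float).reshape(4, 3)
print(np.allclose(gram_via_inner_products(X), gram_via_outer_products(X)))
```

In the outer-product form each row contributes its own d × d term, so the terms can be computed independently and then added.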

Gradient Descent
Start at a random point
Repeat
1. Determine a descent direction
2. Choose a step size
3. Update
Until stopping criterion is satisfied
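The loop above can be sketched for least squares as follows. This is a minimal illustration assuming NumPy, a fixed step size, the mean squared loss as the objective, and a small gradient norm as the stopping criterion; none of these specific choices come from the slides.

```python
import numpy as np

def gradient_descent(X, y, step_size=0.1, max_iters=1000, tol=1e-10, seed=0):
    """Minimize the mean squared loss of a linear model by gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(size=d)                      # start at a random point
    for _ in range(max_iters):                  # repeat
        gradient = 2.0 * X.T @ (X @ w - y) / n  # gradient of the mean squared loss
        if np.linalg.norm(gradient) < tol:      # stopping criterion
            break
        direction = -gradient                   # 1. descent direction
        w = w + step_size * direction           # 2. fixed step size, 3. update
    return w

# Synthetic, noise-free data so the result can be checked.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

w = gradient_descent(X, y)
print(w)
```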

Choosing Descent Direction (1D)

Choosing Step Size
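In one dimension the descent direction is simply the opposite of the sign of the slope, and the step size decides how far to move. The toy example below runs gradient descent on f(w) = w² with three fixed step sizes; the function, the step sizes, and the `minimize_quadratic` helper are assumptions picked to make the effect visible.

```python
def minimize_quadratic(step_size, start=1.0, iters=20):
    """Run gradient descent on f(w) = w**2 and return the final w."""
    w = start
    for _ in range(iters):
        slope = 2.0 * w            # derivative of w**2
        w = w - step_size * slope  # move against the slope
    return w

too_small = minimize_quadratic(0.01)   # moves toward 0, but slowly
reasonable = minimize_quadratic(0.4)   # gets very close to the minimum at 0
too_large = minimize_quadratic(1.1)    # overshoots further each step and diverges
print(too_small, reasonable, too_large)
```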

Parallel Gradient Descent for Least Squares
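The least squares gradient is a sum over training points, so each partition of the data can compute its own partial sum and only d-dimensional vectors need to be combined. The sketch below simulates this on one machine with NumPy: a plain Python loop stands in for the workers, and the partition count, step size, and helper names are assumptions for illustration.

```python
import numpy as np

def partition_gradient(X_part, y_part, w):
    """Sum of squared-loss gradients over the points in one partition (the "map" step)."""
    return 2.0 * X_part.T @ (X_part @ w - y_part)

def parallel_gradient_step(partitions, w, step_size, n):
    """One gradient descent update built from per-partition partial gradients."""
    # Each partial gradient could be computed on a different worker.
    partial = [partition_gradient(Xp, yp, w) for Xp, yp in partitions]
    total = np.sum(partial, axis=0)  # "reduce" step: add d-dimensional vectors
    return w - step_size * total / n

# Synthetic, noise-free data split into 4 partitions.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
true_w = np.array([1.0, -2.0, 3.0])
y = X @ true_w
partitions = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

w = np.zeros(3)
for _ in range(500):
    w = parallel_gradient_step(partitions, w, step_size=0.1, n=len(y))
print(w)
```

Every iteration needs the current w sent to the workers and the partial gradients sent back, which is the communication cost noted in the summary.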

Gradient Descent Summary
Positive:
• Easily parallelized
• Cheap at each iteration
Negative:
• Slow convergence
• Requires communication across nodes!
