Scalable Machine Learning
BerkeleyX: CS190.1x Scalable Machine Learning
Outline
1. What is machine learning?
2. Linear regression
3. How to scale linear regression
4. Gradient descent
5. How to scale gradient descent
Challenge: Scalability

What is Machine Learning?
A Definition
Machine learning explores the study and construction of algorithms
that can learn from and make predictions on data
Face recognition
Link prediction
Document classification
Raw data
Raw data comes from many sources:
Apache log
email
image
Raw data
Feature Extraction
Initial observations can be in arbitrary format
We extract features to represent observations
We can incorporate domain knowledge

We typically want numeric features
Success of the entire pipeline often depends on choosing
good descriptions of observations!
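As a minimal sketch of what "numeric features" can mean in practice, the snippet below one-hot encodes a categorical field and appends a numeric one. The records and the field names `weather` and `temp_c` are illustrative assumptions, not data from the slides.

```python
# Hypothetical raw observations; the field names are illustrative assumptions.
raw = [
    {"weather": "sunny", "temp_c": 31.0},
    {"weather": "rainy", "temp_c": 18.5},
    {"weather": "cloudy", "temp_c": 22.0},
]

def one_hot_features(records, field):
    """Map a categorical field to one-hot numeric columns (sorted category order)."""
    categories = sorted({r[field] for r in records})
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for r in records:
        vec = [0.0] * len(categories)
        vec[index[r[field]]] = 1.0
        vectors.append(vec)
    return categories, vectors

categories, weather_vecs = one_hot_features(raw, "weather")
# Append the already-numeric field to each one-hot vector.
features = [vec + [r["temp_c"]] for vec, r in zip(weather_vecs, raw)]
```

Each observation becomes a fixed-length numeric vector, which is the form a learning algorithm expects.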
Raw data
Feature Extraction
Supervised Learning
Train a supervised model using labeled data,
e.g., a classification or regression model
weather: sunny, rainy, cloudy
vehicle count: 1000, 99, 5
Raw data
Feature Extraction
Supervised Learning
Evaluation
Confusion matrix
                     Has cancer    No cancer
Predict cancer       TP            FP
Predict no cancer    FN            TN
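The four cells of the confusion matrix can be counted directly from paired labels and predictions. The sketch below uses made-up labels, and it also derives precision and recall, which are standard metrics that the slide itself does not name.

```python
def confusion_counts(actual, predicted, positive=True):
    """Return (TP, FP, FN, TN) for paired actual and predicted labels."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif a == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn

# Made-up example: True means "has cancer" / "predict cancer".
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]

tp, fp, fn, tn = confusion_counts(actual, predicted)
precision = tp / (tp + fp)   # share of positive predictions that were right
recall = tp / (tp + fn)      # share of actual positives that were found
```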
Raw data
Feature Extraction
Learning Model
Evaluation
Predict
[Figure: full dataset → training set, validation set, testing set; model builder → classifier; tuning model; final classifier → predict]
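The split of the full dataset into training, validation, and testing sets might be sketched as follows. The 80/10/10 fractions, the fixed seed, and the function name `split_dataset` are assumptions chosen for illustration.

```python
import random

def split_dataset(data, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle the data, then cut it into train / validation / test lists."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test

train, val, test = split_dataset(range(100))
```

The training set fits the model, the validation set is used for tuning, and the testing set is held back until the final evaluation.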
Linear regression
Example: Predicting shoe size from height, gender, and weight
For each observation we have a feature vector, x, and label, y
We assume a linear mapping between features and label: $y \approx w^\top x$
1D Example
Goal: find the line of best fit
x coordinate: features
y coordinate: labels
Linear Least Squares Regression
Assume we have $n$ training points, where $x^{(i)}$ denotes the $i$th point
Recall two earlier points:
• Linear assumption: $\hat{y} = w^\top x$
• Squared loss: $(y - \hat{y})^2$
Linear Least Squares Regression
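Combining the linear assumption with the squared loss gives the least squares objective and its closed-form solution. This is the standard result, written here assuming $X^\top X$ is invertible, with $X$ the $n \times d$ matrix whose rows are the $x^{(i)}$ and $y$ the vector of the $n$ labels.

```latex
\min_{w} \sum_{i=1}^{n} \left( w^\top x^{(i)} - y^{(i)} \right)^2
  = \min_{w} \lVert Xw - y \rVert_2^2
\qquad\Longrightarrow\qquad
w = (X^\top X)^{-1} X^\top y
```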
Computing Closed Form Solution
Computational bottlenecks:
• Matrix multiply of $X^\top X$: $O(nd^2)$ operations
• Matrix inverse: $O(d^3)$ operations
$X$: $n \times d$; $X^\top$: $d \times n$
$O(nd^2)$ or $O(dn^2)$?
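To make the two bottlenecks concrete, here is a dependency-free sketch of the closed-form fit. It solves the normal equations by Gaussian elimination instead of forming an explicit inverse (the cost is still cubic in d). The helper names and the tiny dataset are illustrative assumptions, and a real implementation would use a linear algebra library.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product; each output entry is one inner product."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    d = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for k in range(d):
        pivot = max(range(k, d), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, d):
            factor = M[r][k] / M[k][k]
            for c in range(k, d + 1):
                M[r][c] -= factor * M[k][c]
    w = [0.0] * d
    for k in reversed(range(d)):                   # back substitution
        s = sum(M[k][c] * w[c] for c in range(k + 1, d))
        w[k] = (M[k][d] - s) / M[k][k]
    return w

def least_squares(X, y):
    Xt = transpose(X)
    XtX = matmul(Xt, X)                                        # O(n d^2)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]   # O(n d)
    return solve(XtX, Xty)                                     # O(d^3)

# Tiny example: first feature is a constant 1, labels follow y = 1 + 2 * x.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [3.0, 5.0, 7.0, 9.0]
w = least_squares(X, y)
```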
Matrix Multiplication via Inner Products
Matrix Multiplication via Outer Products
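The two views of computing $X^\top X$ give the same matrix, as the sketch below checks on a small example. In the outer-product form each row of $X$ contributes its own $d \times d$ term, so the rows can be processed independently and the terms summed afterwards. The function names are illustrative.

```python
def xtx_inner(X):
    """Entry (j, k) is the inner product of column j and column k of X."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(cj, ck)) for ck in cols] for cj in cols]

def xtx_outer(X):
    """Sum, over the rows x of X, of the d x d outer product x x^T."""
    d = len(X[0])
    total = [[0.0] * d for _ in range(d)]
    for x in X:                      # each row is handled independently
        for j in range(d):
            for k in range(d):
                total[j][k] += x[j] * x[k]
    return total

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
inner = xtx_inner(X)
outer = xtx_outer(X)
```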
Scalable machine learning
What if both n and d are big?
Gradient Descent
Start at a random point
Repeat:
1. Determine a descent direction
2. Choose a step size
3. Update
Until stopping criterion is satisfied
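The loop above can be sketched for a 1D function as follows. The example function $f(w) = (w - 3)^2$, the constant step size, the gradient-magnitude stopping rule, and the fixed starting point (used instead of a random one so the run is repeatable) are all assumptions made for illustration.

```python
def gradient_descent(grad, w0, step_size=0.1, tol=1e-8, max_iters=10_000):
    """Minimize a 1D function given its derivative `grad`."""
    w = w0
    for _ in range(max_iters):
        g = grad(w)               # 1. the descent direction is -g
        if abs(g) < tol:          # stopping criterion: gradient is nearly zero
            break
        w = w - step_size * g     # 2. and 3. take a step of the chosen size
    return w

# f(w) = (w - 3)^2 has derivative 2 * (w - 3) and its minimum at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3.0), w0=-5.0)
```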
Choosing Descent Direction (1D)
1. Determine a descent direction
Choosing Step Size
2. Choose a step size
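In 1D the two choices combine into a single update rule: a negative slope moves $w$ to the right and a positive slope moves it to the left. The decaying schedule for $\alpha_i$ shown here is one common choice and is an assumption, not something stated on the slide; $\alpha$ is a constant, $n$ the number of training points, and $i$ the iteration number.

```latex
w_{i+1} = w_i - \alpha_i \, \frac{df}{dw}(w_i),
\qquad
\alpha_i = \frac{\alpha}{n \sqrt{i}}
```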
Parallel Gradient Descent for Least Squares
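A single-machine sketch of the idea: each partition stands in for one worker, which computes a partial gradient over its own points (map), and the partial gradients are then summed (reduce) before the update. The partition layout, step size, and iteration count are illustrative assumptions, and the constant factor 2 in the least squares gradient is folded into the step size.

```python
from functools import reduce

def partial_gradient(w, partition):
    """Map step: sum of (w.x - y) * x over the points held by one worker."""
    g = [0.0] * len(w)
    for x, y in partition:
        err = sum(wj * xj for wj, xj in zip(w, x)) - y
        for j in range(len(w)):
            g[j] += err * x[j]
    return g

def add_vectors(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def parallel_gd(partitions, d, step_size=0.01, iters=5000):
    w = [0.0] * d
    for _ in range(iters):
        partials = [partial_gradient(w, p) for p in partitions]  # map
        grad = reduce(add_vectors, partials)                     # reduce
        w = [wj - step_size * gj for wj, gj in zip(w, grad)]     # update
    return w

# Two "workers", each holding two (features, label) pairs with y = 1 + 2 * x.
partitions = [
    [([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)],
    [([1.0, 3.0], 7.0), ([1.0, 4.0], 9.0)],
]
w = parallel_gd(partitions, d=2)
```

Only the d-dimensional partial gradients cross between workers in each iteration, which is the communication cost the summary refers to.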
Gradient Descent Summary
Positive:
• Easily parallelized
• Cheap at each iteration
Negative:
• Slow convergence
• Requires communication across nodes!
