Sebastian Raschka, Ph.D.
MSU Data Science workshop
East Lansing, Michigan State University • Feb 21, 2018
Machine Learning with Python
Today’s focus: machine learning with scikit-learn.
And if we have time, a quick overview of deep learning with TensorFlow and PyTorch ...
2
Contact:
o E-mail: mail@sebastianraschka.com
o Website: http://sebastianraschka.com
o Twitter: @rasbt
o GitHub: rasbt
Tutorial Material on GitHub:
https://github.com/rasbt/msu-datascience-ml-tutorial-2018
3
Machine learning is used & useful (almost) anywhere
4
5
3 Types of Learning
o Supervised
o Unsupervised
o Reinforcement
6
Working with Labeled Data: Supervised Learning
o Regression: predict a continuous output y from an input x.
o Classification: predict a discrete class label from inputs x1, x2, ...
7
Working with Unlabeled Data: Unsupervised Learning
o Clustering: group similar, unlabeled examples.
o Compression: represent the data with fewer dimensions.
8
Topics
1. Introduction to Machine Learning
2. Linear Regression
3. Introduction to Classification
4. Feature Preprocessing & scikit-learn Pipelines
5. Dimensionality Reduction: Feature Selection & Extraction
6. Model Evaluation & Hyperparameter Tuning
9
Simple Linear Regression
A fitted line ŷ = w0 + w1x relates the explanatory variable x to the response variable y:
o w0 is the intercept
o w1 is the slope, w1 = Δy / Δx
o |ŷ − y| is the vertical offset between a data point (xi, yi) and the fitted line
10
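For concreteness, the slope and intercept have a closed-form least-squares solution; a small sketch (the data points are made up for illustration):

import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8])

# least-squares estimates: w1 = cov(x, y) / var(x), w0 = mean(y) - w1 * mean(x)
w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
w0 = y.mean() - w1 * x.mean()
y_hat = w0 + w1 * x  # fitted line, minimizing the squared vertical offsets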
11
Data Representation
Rows: training examples (observations, records, instances, samples).
Columns: features (explanatory variables, independent variables, covariates, predictors, variables, inputs, attributes).

X is the feature matrix, one row per training example and one column per feature x0, x1, …, xm:

X = [ x0,0  x0,1  …  x0,m
      x1,0  x1,1  …  x1,m
      x2,0  x2,1  …  x2,m
      x3,0  x3,1  …  x3,m
      ...
      xn,0  xn,1  …  xn,m ]

y = [ y0, y1, y2, y3, …, yn ] holds the targets (target variable, response variable, dependent variable, labels, ground truth).
“Basic” Supervised Learning Workflow
1. Split the data and labels into training data/labels and test data/labels.
2. The learning algorithm, given hyperparameter values, fits a model to the training data and training labels.
3. The model predicts labels for the test data; comparing these predictions with the test labels yields a performance estimate.
4. The learning algorithm, with the same hyperparameter values, fits the final model to the complete dataset (data and labels).
12
Jupyter Notebook
13
Topics
1. Introduction to Machine Learning
2. Linear Regression
3. Introduction to Classification
4. Feature Preprocessing & scikit-learn Pipelines
5. Dimensionality Reduction: Feature Selection & Extraction
6. Model Evaluation & Hyperparameter Tuning
14
Scikit-learn API
class SupervisedEstimator(...):

    def __init__(self, hyperparam, ...):
        ...

    def fit(self, X, y):
        ...
        return self

    def predict(self, X):
        ...
        return y_pred

    def score(self, X, y):
        ...
        return score

    ...

15
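A minimal usage sketch of this estimator API (the choice of LinearRegression and the toy data are illustrative assumptions, not from the slides):

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.], [2.], [3.], [4.]])  # feature matrix, shape (n_samples, n_features)
y = np.array([1.1, 1.9, 3.2, 3.9])      # target vector

model = LinearRegression()  # hyperparameters go into __init__
model.fit(X, y)             # fit returns self, so calls can be chained
y_pred = model.predict(X)   # array of predictions
r2 = model.score(X, y)      # R^2 for regressors, accuracy for classifiers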
Iris Dataset
Three species: Iris-Setosa, Iris-Versicolor, Iris-Virginica.
16
Iris Dataset
features (columns): sepal length [cm], sepal width [cm], petal length [cm], petal width [cm]
samples (rows): 150 flowers

        sepal length  sepal width  petal length  petal width
        [cm]          [cm]         [cm]          [cm]
  1     5.1           3.5          1.4           0.2
  2     4.9           3.0          1.4           0.2
  ...
  50    6.4           3.5          4.5           1.2
  ...
  150   5.9           3.0          5.0           1.8

X = the 150 × 4 feature matrix above
y = [setosa, setosa, ..., versicolor, ..., virginica] (the species labels)
17
Note about Non-Stratified Splits
A purely random split of the 150 iris examples (50 per class) does not preserve the class proportions, e.g.:
§ training set → 38 x Setosa, 28 x Versicolor, 34 x Virginica
§ test set → 12 x Setosa, 22 x Versicolor, 16 x Virginica
A stratified split keeps the class proportions equal in both sets.
18
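A sketch of a stratified split on the iris data (the split proportion and random seed are arbitrary choices here):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y preserves the 50/50/50 class proportions in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)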
Linear Regression Recap
Linear regression viewed as a single computational unit: the input values x1, x2, ..., xm and a bias unit (constant 1) are weighted by the weight coefficients w1, w2, ..., wm and w0. A net input function computes z = w0 + w1x1 + ... + wmxm, and an activation function a(z) yields the predicted output y.
19
Linear Regression Recap
Same picture as on the previous slide; here the activation function is the identity function, so the predicted output is simply y = z.
20
Logistic Regression, a Generalized Linear Model (a Classifier)
The structure is the same: input values, bias unit, weight coefficients, and net input z. The activation function is now the logistic sigmoid, whose output a is the predicted probability; a unit step function then converts that probability into the predicted class label.
21
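A short sketch, reusing the iris split from earlier, of how scikit-learn's LogisticRegression exposes both the predicted probabilities and the thresholded class labels:

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(X_train, y_train)
probas = lr.predict_proba(X_test[:3])  # predicted class-membership probabilities
labels = lr.predict(X_test[:3])        # thresholded predicted class labels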
A “Lazy Learner:” K-Nearest Neighbors Classifier
To classify a query point ? in the (x1, x2) feature space, inspect its k nearest neighbors; in the figure, k = 5 and the neighborhood contains 3 examples of one class and 1 example each of two other classes, so the majority class is predicted.
22
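A corresponding scikit-learn sketch (k = 5 as in the figure; the iris split from earlier is assumed):

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)  # 'lazy': fit essentially just stores the training data
print(knn.score(X_test, y_test))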
Jupyter Notebook
23
http://scikit-learn.org/stable/supervised_learning.html
There are many, many more classification
and regression algorithms ...
24
Topics
1. Introduction to Machine Learning
2. Linear Regression
3. Introduction to Classification
4. Feature Preprocessing & scikit-learn Pipelines
5. Dimensionality Reduction: Feature Selection & Extraction
6. Model Evaluation & Hyperparameter Tuning
25
Categorical Variables

color   size   price    class label
red     M      $10.49   0
blue    XL     $15.00   1
green   L      $12.99   1
26
Encoding Categorical Variables (Ordinal vs Nominal)

color   size   price    class label
red     M      $10.49   0
blue    XL     $15.00   1
green   L      $12.99   1

Ordinal feature (size): map onto an ordered scale, M → 0, L → 1, XL → 2, giving the size column [0, 2, 1].
Nominal feature (color): one-hot encode,

red   blue   green
1     0      0
0     1      0
0     0      1
27
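One way to produce both encodings with pandas; a sketch assuming the toy DataFrame above:

import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green'],
                   'size': ['M', 'XL', 'L'],
                   'price': [10.49, 15.00, 12.99],
                   'classlabel': [0, 1, 1]})

# ordinal feature: map sizes onto an ordered integer scale
df['size'] = df['size'].map({'M': 0, 'L': 1, 'XL': 2})

# nominal feature: one-hot encode the color column
df = pd.get_dummies(df, columns=['color'])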
Feature Normalization

feature   min-max   z-score
1.0       0.0       -1.46385
2.0       0.2       -0.87831
3.0       0.4       -0.29277
4.0       0.6       0.29277
5.0       0.8       0.87831
6.0       1.0       1.46385

Min-max scaling: x' = (x − xmin) / (xmax − xmin). Z-score standardization: z = (x − μ) / σ.
28
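A sketch reproducing the table's columns with scikit-learn's scalers:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X_feat = np.array([[1.], [2.], [3.], [4.], [5.], [6.]])
X_minmax = MinMaxScaler().fit_transform(X_feat)    # (x - min) / (max - min)
X_zscore = StandardScaler().fit_transform(X_feat)  # (x - mean) / std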
Scikit-learn API
class UnsupervisedEstimator(...):

    def __init__(self, ...):
        ...

    def fit(self, X):
        ...
        return self

    def transform(self, X):
        ...
        return X_transf

    def predict(self, X):
        ...
        return pred
29
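For example, KMeans follows this unsupervised API, including predict (the 3-cluster setting is an illustrative assumption matching the iris data from earlier):

from sklearn.cluster import KMeans

km = KMeans(n_clusters=3, random_state=123)
km.fit(X)                    # unsupervised: fit takes no y
cluster_ids = km.predict(X)  # assign each example to its nearest cluster center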
Scikit-learn Pipelines
A pipeline chains preprocessing steps with a final estimator: Scaling → Dimensionality Reduction → Learning Algorithm.
o Training data (and class labels): each preprocessing step is fit & transform, and the learning algorithm is fit, producing the model.
o Test data: each preprocessing step only applies transform, and the model applies predict to produce the class labels.
30
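A sketch of such a pipeline (the particular steps are an illustrative choice):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

pipe = make_pipeline(StandardScaler(),      # scaling
                     PCA(n_components=2),   # dimensionality reduction
                     LogisticRegression())  # learning algorithm

pipe.fit(X_train, y_train)         # transformers are fit on training data only
print(pipe.score(X_test, y_test))  # test data is only transformed, never fit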
Jupyter Notebook
31
Topics
1. Introduction to Machine Learning
2. Linear Regression
3. Introduction to Classification
4. Feature Preprocessing & scikit-learn Pipelines
5. Dimensionality Reduction: Feature Selection & Extraction
6. Model Evaluation & Hyperparameter Tuning
32
Dimensionality Reduction – why?
Figure: a scatterplot matrix of four features, all measured in cm; pairwise plots become hard to inspect as the number of features grows.
33
Dimensionality Reduction – why?
o predictive performance
o storage & speed
o visualization & interpretability
34
Recursive Feature Elimination
available features: [ f1 f2 f3 f4 ]
Fit the model to get the weights [ w1 w2 w3 w4 ], remove the feature with the lowest weight, and repeat:
[ w1 w2 w3 w4 ] → [ w1 w2 w4 ] → [ w1 w4 ] → [ w4 ]
35
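A corresponding scikit-learn sketch (the estimator and the number of features to keep are illustrative assumptions):

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# repeatedly fit, drop the feature with the smallest weight, refit
rfe = RFE(LogisticRegression(), n_features_to_select=2)
rfe.fit(X_train, y_train)
print(rfe.support_)  # boolean mask of the selected features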
Sequential Feature Selection
available features: [ f1 f2 f3 f4 ]
Fit one model per candidate feature, pick the best, and repeat with the remaining features:
[ f1 ] [ f2 ] [ f3 ] [ f4 ] → pick f1
[ f1 f2 ] [ f1 f3 ] [ f1 f4 ] → pick f1 f3
[ f1 f3 f2 ] [ f1 f3 f4 ] → ...
36
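Forward selection as sketched above is available in the presenter's mlxtend library; a sketch (the parameter values are an arbitrary choice):

from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.neighbors import KNeighborsClassifier

sfs = SFS(KNeighborsClassifier(n_neighbors=5),
          k_features=3,       # stop once 3 features are selected
          forward=True,       # forward selection, as in the diagram
          scoring='accuracy',
          cv=5)
sfs.fit(X_train, y_train)
print(sfs.k_feature_idx_)  # indices of the selected features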
Principal Component Analysis
Figure: in the (x1, x2) feature space, PCA finds the orthogonal directions of maximal variance, PC1 and PC2.
37
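A scikit-learn sketch projecting onto the first two principal components:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)  # keep the 2 directions of maximal variance
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)  # reuse the projection learned on training data
print(pca.explained_variance_ratio_)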
Jupyter Notebook
38
Topics
1. Introduction to Machine Learning
2. Linear Regression
3. Introduction to Classification
4. Feature Preprocessing & scikit-learn Pipelines
5. Dimensionality Reduction: Feature Selection & Extraction
6. Model Evaluation & Hyperparameter Tuning
39
“Basic” Supervised Learning Workflow (recap)
1. Split the data and labels into training data/labels and test data/labels.
2. The learning algorithm, given hyperparameter values, fits a model to the training data and labels.
3. The model predicts labels for the test data; comparing these predictions with the test labels yields a performance estimate.
4. The learning algorithm fits the final model to the complete dataset (data and labels).
40
Holdout Method and Hyperparameter Tuning 1-3
1. Split the data and labels three ways: training data/labels, validation data/labels, and test data/labels.
2. For each candidate set of hyperparameter values, the learning algorithm fits a model to the training data; each model's predictions on the validation data are scored against the validation labels.
3. Select the best hyperparameter values (and with them the best model).
41
Holdout Method and Hyperparameter Tuning 4-6
4. The selected model's predictions on the test data are compared with the test labels to estimate generalization performance.
5. The learning algorithm refits a model with the best hyperparameter values on the combined training and validation data (and labels).
6. The final model is fit with the best hyperparameter values on the complete dataset (data and labels).
42
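A sketch of steps 1-6 in code (the split proportions, candidate values, and KNN estimator are illustrative assumptions; X and y are the iris arrays from earlier):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. three-way split into training / validation / test
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=123, stratify=y_tmp)

# 2.-3. fit one model per hyperparameter value, keep the best on validation data
best_score, best_k = -1.0, None
for k in [1, 3, 5, 7]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    score = model.score(X_valid, y_valid)
    if score > best_score:
        best_score, best_k = score, k

# 4.-5. estimate performance on test data after refitting on training + validation data
model = KNeighborsClassifier(n_neighbors=best_k).fit(
    np.vstack((X_train, X_valid)), np.hstack((y_train, y_valid)))
print(model.score(X_test, y_test))

# 6. final model: refit with best_k on all data (X, y)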
K-fold Cross-Validation
The training data is split into K folds (here K = 5, iterations 1st through 5th). In each iteration, one fold serves as the validation fold and the remaining folds as training folds: the learning algorithm, given hyperparameter values, fits a model to the training-fold data and labels, and the model's predictions on the validation-fold data are scored against the validation-fold labels. The K performance estimates are then averaged:

Performance = (1/K) Σ_{i=1}^{K} Performance_i
43
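A sketch of this averaging with scikit-learn (K = 5; the estimator is an illustrative choice):

import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.linear_model import LogisticRegression

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)
scores = cross_val_score(LogisticRegression(), X_train, y_train, cv=cv)
print(np.mean(scores))  # average performance across the K folds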
K-fold Cross-Validation Workflow 1-3
1. Split the data and labels into training data/labels and test data/labels.
2. For each candidate set of hyperparameter values, fit and evaluate models via K-fold cross-validation on the training data, then select the best hyperparameter values.
3. The learning algorithm refits a model with the best hyperparameter values on the whole training set.
44
K-fold Cross-Validation Workflow 4-5
4. The model's predictions on the test data are compared with the test labels to estimate generalization performance.
5. The final model is fit with the best hyperparameter values on the complete dataset (data and labels).
45
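GridSearchCV bundles steps 1-3 and, by default, refits the best model on the whole training set; a sketch (the grid values are an arbitrary choice):

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {'n_neighbors': [1, 3, 5, 7, 9]}  # candidate hyperparameter values
gs = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
gs.fit(X_train, y_train)         # K-fold CV over the grid, then refit on all training data
print(gs.best_params_)
print(gs.score(X_test, y_test))  # step 4: performance estimate on the test set
# step 5: for deployment, refit with gs.best_params_ on the complete dataset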
More info about model evaluation (one of the most important topics in ML):
https://sebastianraschka.com/blog/index.html
• Model evaluation, model selection, and algorithm selection in machine learning Part I - The basics
• Model evaluation, model selection, and algorithm selection in machine learning Part II - Bootstrapping and uncertainties
• Model evaluation, model selection, and algorithm selection in machine learning Part III - Cross-validation and hyperparameter tuning
46
Jupyter Notebook
47
BONUS SLIDES
48
https://www.tensorflow.org
49
TensorFlow:
Large-Scale Machine Learning on Heterogeneous Distributed Systems
(Preliminary White Paper, November 9, 2015)
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng
Google Research

From the abstract: “TensorFlow [1] is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas ...”

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf
50
https://sebastianraschka.com/pdf/books/dlb/appendix_g_tensorflow.pdf
“... at performing highly parallelized numerical computations. In addition, TensorFlow also supports distributed systems as well as mobile computing platforms, including Android and Apple’s iOS.
But what is a tensor? In simplifying terms, we can think of tensors as multidimensional arrays of numbers, as a generalization of scalars, vectors, and matrices.
1. Scalar: R
2. Vector: Rn
3. Matrix: Rn × Rm
4. 3-Tensor: Rn × Rm × Rp
5. …
When we describe tensors, we refer to their “dimensions” as the rank (or order) of a tensor, which is not to be confused with the dimensions of a matrix. For instance, an m × n matrix, where m is the number of rows and n is the number of columns, would be a special case of a rank-2 tensor. A visual explanation of tensors and their ranks is given in the figure below.”
Tensors?
o rank 0 tensor: scalar, dimensions [ ]
o rank 1 tensor: vector, dimensions [5], indexed like [2]
o rank 2 tensor: matrix, dimensions [5, 3], indexed like [0,0]
o rank 3 tensor: dimensions [4, 4, 2], indexed like [0,2,1]
51
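The same ranks expressed with NumPy arrays (a small sketch; ndim corresponds to the rank):

import numpy as np

scalar = np.array(5.)          # rank 0, shape ()
vector = np.zeros(5)           # rank 1, shape (5,)
matrix = np.zeros((5, 3))      # rank 2, shape (5, 3)
tensor3 = np.zeros((4, 4, 2))  # rank 3, shape (4, 4, 2)
print(tensor3.ndim, tensor3.shape)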
GPUs
52
Vectorization
X = np.random.random((num_train_examples, num_features))
W = np.random.random((num_features, num_hidden))
53
Vectorization
Figure: the matrix product of X and W, computed in a single vectorized operation.
54
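For instance, the product of the two random matrices above reduces to one vectorized call instead of nested Python loops (the sizes are illustrative assumptions):

import numpy as np

num_train_examples, num_features, num_hidden = 100, 10, 5
X = np.random.random((num_train_examples, num_features))
W = np.random.random((num_features, num_hidden))

Z = X.dot(W)  # one vectorized matrix multiplication; shape (100, 5)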
Computation Graphs
a(x, w, b) = relu(w*x + b)
decomposed into graph nodes: u = wx (the * node), v = u + b (the + node), and a = relu(v).
55
Computation Graphs

import tensorflow as tf

g = tf.Graph()
with g.as_default() as g:
    x = tf.placeholder(dtype=tf.float32, shape=None, name='x')
    w = tf.Variable(initial_value=2, dtype=tf.float32, name='w')
    b = tf.Variable(initial_value=1, dtype=tf.float32, name='b')
    u = x * w
    v = u + b
    a = tf.nn.relu(v)
    print(x, w, b, u, v, a)

Output:
Tensor("x:0", dtype=float32) <tf.Variable 'w:0' shape=() dtype=float32_ref> <tf.Variable 'b:0' shape=() dtype=float32_ref> Tensor("mul:0", dtype=float32) Tensor("add:0", dtype=float32) Tensor("Relu:0", dtype=float32)
56
Computation Graphs
Graph with constants filled in: u = wx, v = u + b, a = relu(v), where w = 2 and b = 1.

# init_op is not defined on the slide; presumably:
init_op = tf.global_variables_initializer()

with tf.Session(graph=g) as sess:
    sess.run(init_op)
    b_res = sess.run('b:0')
    print(b_res)

Output:
1.0
57
Forward pass with x = 3, w = 2, b = 1: u = wx = 6, v = u + b = 7, a = relu(v) = 7.

Backward pass (chain rule):
∂a/∂w = (∂a/∂v)(∂v/∂u)(∂u/∂w) = 1 · 1 · 3 = 3
∂a/∂b = (∂a/∂v)(∂v/∂b) = 1 · 1 = 1
since ∂a/∂v = 1 (relu in the positive regime), ∂v/∂u = ∂v/∂b = 1, and ∂u/∂w = x = 3.
https://github.com/rasbt/pydata-annarbor2017-dl-tutorial 58
g = tf.Graph()
with g.as_default() as g:
    x = tf.placeholder(dtype=tf.float32, shape=None, name='x')
    w = tf.Variable(initial_value=2, dtype=tf.float32, name='w')
    b = tf.Variable(initial_value=1, dtype=tf.float32, name='b')
    u = x * w
    v = u + b
    a = tf.nn.relu(v)
    d_a_w = tf.gradients(a, w)  # da/dw
    d_a_b = tf.gradients(a, b)  # da/db (named d_b_w on the original slide)

with tf.Session(graph=g) as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run([d_a_w, d_a_b], feed_dict={'x:0': 3})

Output:
[3.0] [1.0]

59
http://pytorch.org
60
import torch
import torch.nn.functional as F
from torch.autograd import Variable
from torch.autograd import grad

x = Variable(torch.Tensor([3]))
w = Variable(torch.Tensor([2]), requires_grad=True)
b = Variable(torch.Tensor([1]), requires_grad=True)

u = x * w
v = u + b
a = F.relu(v)

partial_derivatives = grad(a, (w, b))
for name, g in zip("wb", partial_derivatives):  # loop variable renamed to avoid shadowing grad
    print('d_a_%s:' % name, g)

Output:
d_a_w: Variable containing:
 3
[torch.FloatTensor of size 1]
d_a_b: Variable containing:
 1
[torch.FloatTensor of size 1]
61
https://github.com/rasbt/python-machine-learning-book-2nd-edition/blob/master/code/ch12/images/12_02.png
Multilayer Perceptron
62
g = tf.Graph()
with g.as_default():

    # Input data
    tf_x = tf.placeholder(tf.float32, [None, n_input], name='features')
    tf_y = tf.placeholder(tf.float32, [None, n_classes], name='targets')

    # Model parameters
    # (the slide used n_hidden_2 for the output weights; with a single hidden
    # layer, the input size must match layer_1's width, n_hidden_1)
    weights = {
        'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_1], stddev=0.1)),
        'out': tf.Variable(tf.truncated_normal([n_hidden_1, n_classes], stddev=0.1))
    }
    biases = {
        'b1': tf.Variable(tf.zeros([n_hidden_1])),
        'out': tf.Variable(tf.zeros([n_classes]))
    }

    # Multilayer perceptron
    layer_1 = tf.add(tf.matmul(tf_x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    out_layer = tf.matmul(layer_1, weights['out']) + biases['out']

    # Loss and optimizer
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=tf_y)
    cost = tf.reduce_mean(loss, name='cost')
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    train = optimizer.minimize(cost, name='train')

    # Prediction
    correct_prediction = tf.equal(tf.argmax(tf_y, 1), tf.argmax(out_layer, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name='accuracy')

with tf.Session(graph=g) as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = mnist.train.num_examples // batch_size
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            _, c = sess.run(['train', 'cost:0'],
                            feed_dict={'features:0': batch_x,
                                       'targets:0': batch_y})
class MultilayerPerceptron(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super(MultilayerPerceptron, self).__init__()

        ### 1st hidden layer
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)

        ### Output layer
        # (the slide used num_hidden_2; it must match linear_1's output width)
        self.linear_out = torch.nn.Linear(num_hidden_1, num_classes)

    def forward(self, x):
        out = self.linear_1(x)
        out = F.relu(out)
        logits = self.linear_out(out)
        probas = F.softmax(logits, dim=1)
        return logits, probas

model = MultilayerPerceptron(num_features=num_features,
                             num_classes=num_classes)

if torch.cuda.is_available():
    model.cuda()

for epoch in range(num_epochs):
    for batch_idx, (features, targets) in enumerate(train_loader):

        features = Variable(features.view(-1, 28*28))
        targets = Variable(targets)
        if torch.cuda.is_available():
            features, targets = features.cuda(), targets.cuda()

        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = cost_fn(logits, targets)
        optimizer.zero_grad()
        cost.backward()

        ### UPDATE MODEL PARAMETERS
        optimizer.step()
63
Further Resources
Figure: recommended book covers, captioned math-heavy; math-free scikit-learn intro; and a mix of code & math (~60% scikit-learn).
64
Contact:
o E-mail: mail@sebastianraschka.com
o Website: http://sebastianraschka.com
o Twitter: @rasbt
o GitHub: rasbt
Tutorial Material on GitHub:
https://github.com/rasbt/msu-datascience-ml-tutorial-2018
Thanks for attending!
65