Practical ML

Antonio Pitasi
Samuele Sabella
2019 - Polo Fibonacci

Before we get started, we need some theory

Machine learning
● Practice: "We define machine learning as a set of methods that can
automatically detect patterns in data, and then use the uncovered patterns to
predict future data, or to perform other kinds of decision making under
uncertainty"
- Machine Learning: A Probabilistic Perspective, Kevin P. Murphy
● Theory: "How does learning performance vary with the number of training
examples presented? Which learning algorithms are most appropriate for
various types of learning tasks?"
- Machine Learning, Tom Mitchell

ML is not only artificial neural networks
● Lots of mathematical models
○ Hidden Markov models
○ Support Vector Machines
○ Decision trees
○ Boltzmann machines, deep belief networks, deep Boltzmann machines
● There are many neural network models...
○ Shallow networks, deep neural networks
○ CNN (YOLO, AlexNet, GoogLeNet)
○ Echo state networks, deep echo state networks
○ RNN, LSTM, GRU

Machine learning categories
● Supervised: the goal is to learn a mapping from input to output given a
dataset of labeled pairs called the training set (e.g. the Iris Data Set [2])
● Unsupervised: we have only a set of data points and the
goal is to find interesting patterns in the data
Example: young American males who buy diapers also have a predisposition
to buy beer (original story [3])

How does it work?
● Dataset of examples to learn from
● A model to learn from that data (e.g. neural net)
○ With some parameters to tune
○ With some hyperparameters to choose (neurons, layers, ...)
● Target function (loss) to minimize
[Figure: the model outputs class scores (cat: 0.9, lobster: 0.1; [0.3, 0.2, 0.1]) and the loss answers "You are wrong!"]
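To make the idea of a loss concrete, here is a minimal sketch using cross-entropy, one common choice for classification. The slides do not say which loss is used, so cross-entropy, the function name `cross_entropy` and the class order [cat, lobster] are assumptions made for illustration.

```python
import math

def cross_entropy(predicted, target):
    """Cross-entropy between a predicted distribution and a one-hot target."""
    eps = 1e-12  # small constant to avoid log(0)
    return -sum(t * math.log(p + eps) for p, t in zip(predicted, target))

# Assumed class order for this illustration: [cat, lobster]
prediction = [0.9, 0.1]  # the model says "cat" with probability 0.9

loss_if_cat = cross_entropy(prediction, [1.0, 0.0])      # the picture is a cat
loss_if_lobster = cross_entropy(prediction, [0.0, 1.0])  # the picture is a lobster

print(f"loss when the answer is right: {loss_if_cat:.3f}")
print(f"loss when the answer is wrong: {loss_if_lobster:.3f}")
```

The loss is small when the confident prediction is correct and large when it is wrong, which is the signal the training procedure tries to minimize.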

What is usually done
● Validation phase: compare different models and configurations
○ Which model to choose
○ Model hyper-parameters
● Test phase: loss, accuracy, recall, precision…
● We skip all this for the sake of simplicity
Note: train, validate and test on different data
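The note about using different data for each phase can be sketched as a simple three-way split. The helper name `split_dataset`, the 60/20/20 proportions and the fixed seed are assumptions for illustration; the slides do not prescribe them.

```python
import random

def split_dataset(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the examples and cut them into disjoint train/validation/test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# 150 example indices (the Iris Data Set happens to have 150 rows)
train, val, test = split_dataset(range(150))
print(len(train), len(val), len(test))
```

The model is fitted on `train`, models and hyper-parameters are compared on `val`, and `test` is touched only once at the end.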

Models: feed-forward neural networks
● Stack neurons in layers
● Each neuron applies a non-linear function to the weighted sum of its inputs
● Example: virginica if argmax(net(input))==0 else setosa
Note: A non-linear activation function is what allows an ANN to be a universal approximator [4]
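A minimal sketch of what "stacking neurons in layers" means in code, written in plain Python rather than Keras. The weights below are hand-picked placeholders (not trained values), and the layer sizes, the `tanh` activation and the helper names `dense`, `argmax` and `net` are assumptions for illustration.

```python
import math

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one row of W per neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical, untrained weights: 4 inputs -> 3 hidden neurons -> 2 outputs
hidden_w = [[0.5, -0.2, 0.1, 0.0], [-0.3, 0.8, -0.5, 0.2], [0.1, 0.1, 0.1, 0.1]]
hidden_b = [0.0, 0.1, -0.1]
out_w = [[1.0, -1.0, 0.5], [-1.0, 1.0, 0.5]]
out_b = [0.0, 0.0]

def net(x):
    hidden = dense(x, hidden_w, hidden_b, math.tanh)  # non-linear hidden layer
    return dense(hidden, out_w, out_b, lambda z: z)   # linear output scores

sample = [5.1, 3.5, 1.4, 0.2]  # four measurements of one flower
label = "virginica" if argmax(net(sample)) == 0 else "setosa"
print(label)
```

With untrained weights the printed label is arbitrary; training is what makes the argmax meaningful.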

Models: feed-forward neural networks
● Stack neurons in layers
● Each neuron applies a non-linear function to the weighted sum of its inputs
● The optimizer will tune the weights to minimize the loss
function over batches of examples (btw we will use Adam [5])
● Example: virginica if argmax(net(input))==0 else setosa
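To show what "the optimizer tunes the weights over batches" looks like, here is a small sketch of the Adam update rule from [5] applied to a single weight. The toy problem (fitting y = 2x with a squared loss), the learning rate, the batch size and the helper name `adam_step` are assumptions for illustration; in practice Keras performs this step for every weight of the network.

```python
import math
import random

def adam_step(w, grad, state, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single weight, following the rule in [5]."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad         # running mean of gradients
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = state["m"] / (1 - beta1 ** state["t"])               # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

# Toy dataset (an assumption for illustration): y = 2x, so the ideal weight is 2.0
rng = random.Random(0)
data = [(x, 2.0 * x) for x in [rng.uniform(-1, 1) for _ in range(200)]]

w = 0.0
state = {"t": 0, "m": 0.0, "v": 0.0}
for epoch in range(200):
    rng.shuffle(data)
    for start in range(0, len(data), 20):  # mini-batches of 20 examples
        batch = data[start:start + 20]
        # gradient of the mean squared error (w*x - y)^2 with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w = adam_step(w, grad, state)

print(w)
```

After training, `w` should end up close to 2.0, the value that minimizes the loss on this toy data.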

A lot more stuff to know but for us...
UNDERSTANDING
MACHINE LEARNING
import keras

- Interactive
- Collaborative
- Python, R, Julia, Scala, ...

Features:
● Easy to build a neural network
● Easy to build a neural network wrong
Keep an eye on:
● Accuracy
● Fitting
● Performance

Problem 1
Points classification

Our model
● For non-linearity: rectified linear unit (ReLU)
● We use a softmax function in the output layer to represent a
probability distribution
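Both functions named above are short enough to write by hand. This is a sketch of their usual definitions in plain Python; the example scores are made up for illustration.

```python
import math

def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to zero."""
    return max(0.0, z)

def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # made-up output scores for three classes
print(probs, sum(probs))
```

The softmax outputs are all positive and sum to one, so the largest score becomes the most probable class.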

Let’s code!
https://colab.research.google.com
https://ml.anto.pt

Me after
training a neural
network

Back to theory - Convolving Lenna
● Given a function f (the image), its convolution g with a kernel w is given by a very complex
formula with a very simple meaning: "adding each element of the image to its
local neighbors, weighted by the kernel" (Wikipedia)
Note: check the manual implementation at http://bit.ly/ml-dummies-lenna
References: [11]
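The "simple meaning" quoted above can be sketched in a few lines of plain Python. This is not the implementation behind the link; it is an independent sketch that assumes no padding and a stride of one, and the helper name `convolve2d` and the tiny 4x4 "image" are made up for illustration.

```python
def convolve2d(image, kernel):
    """2D convolution without padding: each output pixel is the sum of a
    neighborhood of the image weighted by the (flipped) kernel."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]  # convolution flips the kernel
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * flipped[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]

box_blur = [[1 / 9] * 3 for _ in range(3)]  # every neighbor gets the same weight
blurred = convolve2d(image, box_blur)
print(blurred)
```

Each value in `blurred` is the average of a 3x3 neighborhood, so the 4x4 input shrinks to a 2x2 output.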

Convolution arithmetic
● Zero-padding: deal with border pixels by adding zeros (preserves the size)
● Pooling: helps the network become approximately invariant to small
transformations of the input (translations, rotations...) [7]
Figures: no padding, no strides; padding=same, no strides; max pooling, 2x2 strides
GIF credits - https://github.com/vdumoulin/conv_arithmetic
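The three cases in the figures can be checked with a little arithmetic. The sketch below uses the standard output-size relation floor((i + 2p - k) / s) + 1 for input size i, kernel size k, padding p and stride s; the helper names `conv_output_size`, `zero_pad` and `max_pool` are made up for illustration.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output length along one axis: floor((i + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def zero_pad(image, pad):
    """Surround the image with `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    border = [[0] * width for _ in range(pad)]
    middle = [[0] * pad + list(row) + [0] * pad for row in image]
    return border + middle + [[0] * width for _ in range(pad)]

def max_pool(image, size=2, stride=2):
    """Keep only the largest value in each size x size window."""
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    return [[max(image[i * stride + di][j * stride + dj]
                 for di in range(size) for dj in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

print(conv_output_size(5, 3))             # no padding, no strides: the output shrinks
print(conv_output_size(5, 3, padding=1))  # one ring of zeros: the size is preserved
print(max_pool([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]))
```

With a 3x3 kernel, a padding of one pixel on each side is what "padding=same" amounts to when the stride is one.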

Dealing with multiple input channels
References: [12]
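A sketch of the usual way a filter handles several input channels (for example R, G and B): it holds one kernel per channel, slides each kernel over its own channel, and sums the results into a single output map. As most deep learning libraries do, the sketch skips the kernel flip (cross-correlation). The helper names and the tiny 2x2 channels are made up for illustration.

```python
def correlate2d(channel, kernel):
    """Slide the kernel over one channel (no padding, stride 1, no flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(channel) - kh + 1
    out_w = len(channel[0]) - kw + 1
    return [[sum(channel[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def conv_multichannel(channels, kernels, bias=0.0):
    """One filter: one kernel per input channel, results summed into one map."""
    maps = [correlate2d(c, k) for c, k in zip(channels, kernels)]
    return [[sum(m[i][j] for m in maps) + bias
             for j in range(len(maps[0][0]))]
            for i in range(len(maps[0]))]

# Made-up 2x2 "image" with three channels and one 2x2 kernel per channel
red = [[1, 2], [3, 4]]
green = [[0, 1], [1, 0]]
blue = [[2, 2], [2, 2]]
diagonal = [[1, 0], [0, 1]]

feature_map = conv_multichannel([red, green, blue], [diagonal, diagonal, diagonal])
print(feature_map)
```

A convolutional layer with several filters repeats this once per filter, producing one output channel each.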

GoogLeNet on ImageNet - Feature visualization
● Stack multiple filters and learn kernels dynamically (hierarchy of features)
Figures: feature visualization of the 1st conv. layer, layer 3a, layer 4d
References: [8,9,10]

References
[1] Pattern Recognition in a Bucket
https://link.springer.com/chapter/10.1007/978-3-540-39432-7_63
[2] Iris dataset: https://archive.ics.uci.edu/ml/datasets/iris
[3] Beer and diapers: http://www.dssresources.com/newsletters/66.php
[4] Multilayer feedforward networks are universal approximators:
http://cognitivemedium.com/magic_paper/assets/Hornik.pdf
[5] Adam: A Method for Stochastic Optimization: https://arxiv.org/abs/1412.6980
[6] MNIST dataset: http://yann.lecun.com/exdb/mnist/

[7] Bengio, Yoshua, Ian Goodfellow, and Aaron Courville. Deep learning. Vol. 1.
MIT press, 2017: http://www.deeplearningbook.org/
[8] Feature-visualization: https://distill.pub/2017/feature-visualization/
[9] Going deeper with convolutions: https://arxiv.org/pdf/1409.4842.pdf
[10] ImageNet: A large-scale hierarchical image database:
http://www.image-net.org/papers/imagenet_cvpr09.pdf
[11] Culture, Communication, and an Information Age Madonna:
http://www.lenna.org/pcs_mirror/may_june01.pdf

[12] Intuitively Understanding Convolutions for Deep Learning:
https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1

Antonio Pitasi
Software Engineer, Nextworks
https://anto.pt
Samuele Sabella
https://github.com/samuelesabella
