MachineLearning_Road to deep learning.pdf

Lecture 1: Machine Learning
Andreas Wichert
Department of Computer Science and Engineering
Técnico Lisboa

Teaching staff – Alameda/Tagus
• Andreas (Andrzej) Wichert
• andreas.wichert@tecnico.ulisboa.pt
• tel: 214233231
• room: N2 5-7 (Taguspark)
• http://web.tecnico.ulisboa.pt/andreas.wichert/

Main Literature
• Christopher M. Bishop, Pattern Recognition and Machine
Learning (Information Science and Statistics), Springer 2006
• https://www.microsoft.com/en-us/research/people/cmbishop/#!prml-book
• Simon O. Haykin, Neural Networks and Learning
Machines, (3rd Edition), Pearson 2008
• Deep Learning, I. Goodfellow, Y. Bengio, A. Courville
MIT Press 2016
• https://www.deeplearningbook.org

Main Literature
• Machine Learning - A Journey to Deep Learning, A.
Wichert, Luis Sa-Couto, World Scientific, 2021
• Intelligent Big Multimedia Databases, A. Wichert,
World Scientific, 2015
• Preprocessing and feature extraction (e.g., DFT, wavelets) will
not be covered in the lecture.

Additional Literature
• Machine Learning: A Probabilistic Perspective, K. Murphy, MIT
Press 2012
• Introduction To The Theory Of Neural Computation (Santa Fe
Institute Series Book 1), John A. Hertz, Anders S. Krogh, Richard
G. Palmer, Addison-Wesley Pub. Co, Redwood City, CA; 1
edition (January 1, 1991)
• I find this book to be one of the best written mathematical guides for
Neural Networks. See Perceptron, Backpropagation…

Literature Software
• Hands-On Machine Learning with Scikit-Learn and TensorFlow:
Concepts, Tools, and Techniques to Build Intelligent Systems 1st
Edition, Aurélien Géron , O'Reilly Media; 1 edition (April 9, 2017)
• https://github.com/amitanalyste/aurelienGeron
• https://scikit-learn.org/stable/index.html
• http://www.numpy.org

I) Outline:
Introduction: What is Machine Learning?
1. Introduction
2. Decision Trees
Mathematical Tools:
3. Probability theory & Information (Naive Bayes)
4. Linear Algebra & Optimization (Simple NN)
Road to deep learning: Error Minimization (Loss), Regularization, Optimization by Gradient descent
5. Linear Regression & Bayesian Linear Regression
6. Perceptron & Logistic Regression
7. Multilayer Perceptrons

II) Outline
Why do neural networks work:
8. Learning theory, Bias-Variance
9. K-Means, EM-Clustering
10. Kernel Methods & RBF
11. Support Vector Machines
How to use the models:
12. Model Selection

III) Outline
Deep Learning solves the problem of high dimensionality which is
related to the training database size!
13. Deep Learning
14. Convolutional Neural Networks
15. Recurrent Neural Networks
Dimension Reduction:
16. PCA, ICA
17. Autoencoders

IV) Outline
Alternative Road to Machine Learning (Classical Approach):
18. Feature Extraction (FFT, SFT, Edge Detection)
19. k Nearest Neighbour & Locally Weighted Regression
20. Ensemble Methods
Probabilistic and Stochastic Approach:
21. Bayesian Networks
22. Stochastic Methods

What is Machine Learning?
• Parallels between “animal” learning and machine learning
• Many techniques derive from efforts of psychologists and biologists to
make more sense of “animal” learning through computational models

Machine Learning
• Statistical Machine Learning
• Linear Regression
• Clustering, Self Organizing Maps (SOM)
• Artificial Neural Networks, Kernel Machines
• Bayesian Network
• We will not cover….
• Inductive Learning (ID3)
• Knowledge Learning
• Analogical Learning
• SOAR: Model of Cognition and Learning

An Example of Symbolic Learning
(Patrick Winston-1975)

An Example (Patrick Winston-1975)

Statistical Machine Learning
• Changes in the system that perform tasks associated with AI
• Recognition
• Prediction
• Planning
• Diagnosis

Learning input-output functions
• Supervised
• With a teacher
• Unsupervised
• Without a teacher
• Reinforcement Learning
• Actions within & responses from the environment
• Absence of a designated teacher to give positive and negative examples

• We might add other features that are not correlated with the
ones we already have. A precaution should be taken not to
reduce the performance by adding such “noisy features”
• Ideally, the best decision boundary should be the one which
provides an optimal performance such as in the following figure:

• However, our satisfaction is premature because the
central aim of designing a classifier is to correctly
classify novel input
Issue of generalization!

• 1040 Neurons
• 10⁴ to 10⁵ connections per neuron

Perceptron (1957)
• Linear threshold unit (LTU)
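[Figure: linear threshold unit. The inputs x1, x2, ..., xn with weights w1, w2, ..., wn and the constant input x0=1 with weight w0 feed a summation unit Σ that produces the output o]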
McCulloch-Pitts model of a neuron (1943)
The “bias”, a constant term that does
not depend on any input value

Linearly separable patterns
X0=1, bias...

(a) The two classes 1 (indicated by a big point) and −1 (indicated by a small point) are separated
by the line −1 + x1 + x2 = 0.
(b) The hyperplane −1+x1+x2 =y defines the line for y=0.
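
The separating line from the caption above can be tried out directly. The following is a minimal sketch, assuming the weights w0=−1, w1=1, w2=1 read off the line −1 + x1 + x2 = 0; the function name `ltu_output`, the sample points, and the choice to assign points exactly on the line to class 1 are illustrative assumptions, not part of the lecture.

```python
def ltu_output(x1, x2, w0=-1.0, w1=1.0, w2=1.0):
    """Linear threshold unit for the line -1 + x1 + x2 = 0.

    Returns 1 on or above the line and -1 below it
    (the tie-breaking at y = 0 is an assumption).
    """
    y = w0 + w1 * x1 + w2 * x2  # value of the hyperplane y = -1 + x1 + x2
    return 1 if y >= 0 else -1


# Hypothetical sample points: two above the line, two below it.
points = [(1.0, 1.0), (2.0, 0.5), (0.0, 0.0), (0.25, 0.25)]
labels = [ltu_output(x1, x2) for x1, x2 in points]
print(labels)  # [1, 1, -1, -1]
```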

• The goal of a perceptron is to correctly classify the set of patterns
D={x1,x2,...,xm} into one of the classes C1 and C2
• The output for class C1 is o=1 and for C2 is o=-1
• For n=2 →

Perceptron learning rule
• Consider linearly separable problems
• How to find appropriate weights
• Initialize the weight vector w to small random values
• Check whether the output o matches the desired class, i.e., has the
desired value d
• η is called the learning rate
• 0 < η ≤ 1
Δw = η⋅(d − o)⋅x
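
The update Δw = η⋅(d − o)⋅x can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data set (logical AND with targets ±1), the helper names `predict` and `train_perceptron`, the tie-breaking at zero, the initialization range, and the stopping criterion are not taken from the lecture.

```python
import random


def predict(w, x):
    """Output o of the linear threshold unit; x[0] is the constant input x0 = 1."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= 0 else -1  # assumption: a net input of 0 maps to class 1


def train_perceptron(data, eta=0.5, max_epochs=1000, seed=0):
    """Apply w <- w + eta * (d - o) * x until every pattern is classified correctly."""
    rng = random.Random(seed)
    n = len(data[0][0])
    w = [rng.uniform(-0.05, 0.05) for _ in range(n)]  # small random initial weights
    for _ in range(max_epochs):
        errors = 0
        for x, d in data:
            o = predict(w, x)
            if o != d:  # (d - o) is the error signal; it is 0 for correct outputs
                errors += 1
                w = [wi + eta * (d - o) * xi for wi, xi in zip(w, x)]
        if errors == 0:  # a full pass without mistakes: stop
            break
    return w


# Hypothetical linearly separable toy problem: logical AND, patterns (x0, x1, x2).
data = [((1, 0, 0), -1), ((1, 0, 1), -1), ((1, 1, 0), -1), ((1, 1, 1), 1)]
w = train_perceptron(data)
outputs = [predict(w, x) for x, _ in data]
print(w, outputs)
```

Because the toy problem is linearly separable, the loop stops after a finite number of updates; for non-separable data it would simply run until `max_epochs`.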

• In supervised learning the network has its output compared with
known correct answers
• Supervised learning
• Learning with a teacher
• (d-o) plays the role of the error signal

Frank Rosenblatt
• 1928-1971

• Rosenblatt's bitter rival and professional nemesis was Marvin Minsky of MIT
• Minsky despised Rosenblatt, hated the concept of the perceptron, and wrote
several polemics against him
• For years Minsky crusaded against Rosenblatt on a very nasty and personal level,
including contacting every group who funded Rosenblatt's research to denounce
him as a charlatan, hoping to ruin Rosenblatt professionally and to cut off all
funding for his research in neural nets

XOR problem and Perceptron
• By Minsky and Papert in the mid-1960s
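
The XOR problem can be illustrated with a small brute-force search: a single linear threshold unit can represent AND, but no weight vector reproduces XOR. The sketch below is an illustration only; the weight grid, the ±1 target coding, and the helper names `ltu` and `count_solutions` are assumptions. An empty search result on a grid is not a proof by itself, but here it agrees with the algebra: the XOR constraints w0 < 0, w0 + w1 ≥ 0, w0 + w2 ≥ 0 and w0 + w1 + w2 < 0 contradict each other.

```python
from itertools import product


def ltu(w, x1, x2):
    """Linear threshold unit with weights w = (w0, w1, w2) and bias input x0 = 1."""
    return 1 if w[0] + w[1] * x1 + w[2] * x2 >= 0 else -1


def count_solutions(targets, grid):
    """Count the weight vectors on the grid that classify all four patterns correctly."""
    count = 0
    for w in product(grid, repeat=3):
        if all(ltu(w, x1, x2) == d for (x1, x2), d in targets.items()):
            count += 1
    return count


grid = [i * 0.5 for i in range(-6, 7)]  # candidate weights -3.0, -2.5, ..., 3.0
AND = {(0, 0): -1, (0, 1): -1, (1, 0): -1, (1, 1): 1}
XOR = {(0, 0): -1, (0, 1): 1, (1, 0): 1, (1, 1): -1}

and_solutions = count_solutions(AND, grid)
xor_solutions = count_solutions(XOR, grid)
print(and_solutions, xor_solutions)  # some solutions for AND, none for XOR
```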

k Means Clustering (Unsupervised Learning)
• The standard algorithm was first proposed by Stuart Lloyd in 1957

Back-propagation (1980)
• Back-propagation is a learning algorithm for multi-layer neural
networks
• It was invented independently several times
• Bryson and Ho [1969]
• Werbos [1974]
• Parker [1985]
• Rumelhart et al. [1986]
Parallel Distributed Processing - Vol. 1
Foundations
David E. Rumelhart, James L. McClelland and the PDP Research
Group
What makes people smarter than computers? These volumes by
a pioneering neurocomputing.....

Everyone was doing Back-propagation….

Kunihiko Fukushima
Kunihiko Fukushima received a B.Eng. degree in electronics in 1958 and
a PhD degree in electrical engineering in 1966 from Kyoto University,
Japan. He was a professor at Osaka University from 1989 to 1999, at the
University of Electro-Communications from 1999 to 2001, at Tokyo
University of Technology from 2001 to 2006; and a visiting professor at
Kansai University from 2006 to 2010. Prior to his Professorship, he was a
Senior Research Scientist at the NHK Science and Technology Research
Laboratories. He is now a Senior Research Scientist at Fuzzy Logic
Systems Institute (part-time position), and usually works at his home in
Tokyo.

Over-fitting
Root-Mean-Square (RMS) Error:

Data Set Size:
9th Order Polynomial
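
The effect of the data set size on a 9th-order polynomial can be reproduced with a short experiment. The sketch below rests on several assumptions that are not in the slides: the targets are sin(2πx) with Gaussian noise of standard deviation 0.3, the inputs are equally spaced on [0, 1], the RMS error is taken as the square root of the mean squared residual, and the helper names are hypothetical.

```python
import numpy as np


def design_matrix(x, order):
    """Columns 1, x, x^2, ..., x^order."""
    return np.vander(x, order + 1, increasing=True)


def fit_polynomial(x, t, order):
    """Least-squares polynomial coefficients, lowest order first."""
    w, *_ = np.linalg.lstsq(design_matrix(x, order), t, rcond=None)
    return w


def rms_error(w, x, t):
    """Root-mean-square error of the polynomial w on the pairs (x, t)."""
    residual = design_matrix(x, len(w) - 1) @ w - t
    return float(np.sqrt(np.mean(residual ** 2)))


rng = np.random.default_rng(0)
x_test = np.linspace(0.0, 1.0, 200)
t_test = np.sin(2 * np.pi * x_test)  # noise-free reference curve (assumption)

results = {}
for n in (10, 100):
    x = np.linspace(0.0, 1.0, n)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=n)
    w = fit_polynomial(x, t, order=9)
    results[n] = (rms_error(w, x, t), rms_error(w, x_test, t_test))
    print(f"N={n}: train RMS={results[n][0]:.3g}, test RMS={results[n][1]:.3g}")
```

With 10 points the polynomial has as many coefficients as there are data points and passes through all of them, so the training error says nothing about how well novel inputs are fitted; with 100 points the same model is far better constrained.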

Regularization
Penalize large coefficient values
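
One common way to penalize large coefficients is to add a quadratic penalty λ/2⋅‖w‖² to the sum-of-squares error. The sketch below shows this for a 9th-order polynomial; the data (sin(2πx) plus noise), the value of λ, and the function name `fit_ridge` are assumptions chosen for illustration, not the lecture's settings.

```python
import numpy as np


def fit_ridge(x, t, order, lam):
    """Minimise 1/2 * sum (y - t)^2 + lam/2 * ||w||^2.

    Solved as an ordinary least-squares problem on an augmented system:
    the extra rows sqrt(lam) * I with zero targets contribute lam * ||w||^2.
    """
    phi = np.vander(x, order + 1, increasing=True)
    a = np.vstack([phi, np.sqrt(lam) * np.eye(order + 1)])
    b = np.concatenate([t, np.zeros(order + 1)])
    w, *_ = np.linalg.lstsq(a, b, rcond=None)
    return w


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=10)  # hypothetical noisy data

w_plain = fit_ridge(x, t, order=9, lam=0.0)   # no penalty
w_ridge = fit_ridge(x, t, order=9, lam=1e-3)  # penalised coefficients

print("largest |w| without penalty:", np.abs(w_plain).max())
print("largest |w| with penalty:   ", np.abs(w_ridge).max())
```

The penalty trades a slightly worse fit to the training points for much smaller coefficients, which is what keeps the fitted curve from oscillating between the data points.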

Problem of Local Minima
• The immediate solution to this is to build networks with more hidden
layers with regularization
• “Deep Learning”…
• Déjà vu?

Artificial intelligence pioneer (Geoffrey Hinton)
says we need to start over
• Back-propagation still has a core role in AI's future.
• Entirely new methods will probably have to be invented
• "I don't think it's how the brain works," he said. "We clearly don't
need all the labeled data."

What is an “A”?
• What makes something similar to something else (specifically what
makes, for example, an uppercase letter 'A' recognisable as such)
• Metamagical Themas, Douglas Hofstadter, Basic Books, 1985

• What is the essence of dog-ness or house-ness?
• What is the essence of 'A'-ness?
• What is the essence of a given person's face, that it
will not be confused with other people's faces?
• How to convey these things to computers, which seem to
be best at dealing with hard-edged categories--categories
having crystal-clear, perfectly sharp boundaries?

• What Next?
• Example of what is machine learning: Decision Trees

Literature
• Simon O. Haykin, Neural Networks and Learning
Machines, (3rd Edition), Pearson 2008
• Chapter 1
• Christopher M. Bishop, Pattern Recognition and Machine
Learning (Information Science and Statistics), Springer
2006
• Section 1.1

Literature
• Machine Learning - A Journey to Deep Learning, A.
Wichert, Luis Sa-Couto, World Scientific, 2021
• Chapter 1
