The fundamentals of Machine Learning

Hichem Felouat - hichemfel@gmail.com - Algeria

What Is Artificial Intelligence?
Artificial intelligence (AI) is an area of computer science
that emphasizes the creation of intelligent machines that work
and react like humans.
• AI is an interdisciplinary science with multiple approaches.
• AI has become an essential part of the technology industry.

Subdomains of Artificial Intelligence

What Is Machine Learning?
• Machine Learning is the science (and
art) of programming computers so
they can learn from data.
• Machine Learning is the field of
study that gives computers the ability
to learn without being explicitly
programmed. —Arthur Samuel, 1959

What Does Learning Mean?
A computer program is said to
learn from experience E with
respect to some task T and some
performance measure P, if its
performance on T, as measured by
P, improves with experience E. —
Tom Mitchell, 1997

Timeline of Machine Learning

Why Use Machine Learning?
The traditional approach. If the problem is not trivial, your program will
likely become a long list of complex rules that is hard to maintain.

Machine Learning approach. The program is much shorter, easier to
maintain, and most likely more accurate.

Machine Learning can help humans learn.

AI Index 2019 Annual Report.

Applications of Machine Learning
Machine learning is currently the preferred approach in the following
domains:
1) Speech analysis: e.g., speech recognition, synthesis.
2) Computer vision: e.g., object recognition/detection.
3) Robotics: e.g., position/map estimation.
4) Bioinformatics: e.g., sequence alignment, genetic analysis.
5) E-commerce: e.g., automatic trading, fraud detection.
6) Financial analysis: e.g., portfolio allocation, credit scoring.
7) Medicine: e.g., diagnosis, therapy design.
8) Web: e.g., content management, social networks, etc.

Applications of Machine Learning
To summarize, Machine Learning is great for:
• Problems for which existing solutions require a lot of hand-tuning or
long lists of rules: one Machine Learning algorithm can often simplify
code and perform better.
• Complex problems for which there is no good solution at all using a
traditional approach: the best Machine Learning techniques can find a
solution.

How to get started with ML
1) Mathematics: statistics, probability, and
linear algebra (NumPy, SciPy).
2) Programming: data structures, OOP, and
parallel programming (Python).
3) Databases: SQL and NoSQL.
4) ML algorithms: regression, classification,
and clustering.
5) ML tools: scikit-learn, TensorFlow, and
Keras.


Machine Learning Vocabulary 1
1) Examples: Items or instances of data used for learning or evaluation. In our
spam problem, these examples correspond to the collection of email
messages we will use for learning and testing.
2) Training sample: Examples used to train a learning algorithm. In our spam
problem, the training sample consists of a set of email examples along with
their associated labels.
3) Labels: Values or categories assigned to examples. In classification
problems, examples are assigned specific categories, for instance, the spam
and non-spam categories in our binary classification problem. In regression,
items are assigned real-valued labels.

4) Features: The set of attributes, often represented as a vector, associated with an example. In
the case of email messages, some relevant features may include the length of the message,
the name of the sender, various characteristics of the header, the presence of certain
keywords in the body of the message, and so on.
5) Test sample: Examples used to evaluate the performance of a learning algorithm. The test
sample is separate from the training and validation data and is not made available in the
learning stage. In the spam problem, the test sample consists of a collection of email
examples for which the learning algorithm must predict labels based on features. These
predictions are then compared with the labels of the test sample to measure the performance
of the algorithm.
6) Loss function: A function that measures the difference, or loss, between a
predicted label and a true label.
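
The vocabulary above can be made concrete with a small sketch of the spam problem. The emails, the two features, and the hand-written `predict` rule are all hypothetical stand-ins chosen for illustration; a real system would learn the rule from the training sample.

```python
# Hypothetical toy data: each example is (email text, label); 1 = spam, 0 = non-spam.
examples = [
    ("Win money now", 1),
    ("Meeting at noon", 0),
    ("Cheap money offer", 1),
    ("Lunch tomorrow?", 0),
]


def extract_features(text):
    """Map an email to a feature vector: [message length, mentions 'money']."""
    return [len(text), int("money" in text.lower())]


def zero_one_loss(predicted_label, true_label):
    """Loss function: 0 when the prediction is right, 1 when it is wrong."""
    return int(predicted_label != true_label)


training_sample = examples[:2]  # examples (with labels) used for learning
test_sample = examples[2:]      # examples held out for evaluation


def predict(features):
    """Hand-written stand-in for a learned model: flag emails mentioning money."""
    return features[1]


# Compare predictions with the true labels of the test sample.
test_losses = [
    zero_one_loss(predict(extract_features(text)), label)
    for text, label in test_sample
]
print(test_losses)
```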

Types of Machine Learning Systems
There are so many different types of Machine Learning systems that it is
useful to classify them in broad categories based on:
• Whether or not they are trained with human supervision (supervised,
unsupervised, semi-supervised, and reinforcement learning).
• Whether or not they can learn incrementally on the fly (online versus
batch learning).
• Whether they work by simply comparing new data points to known data
points, or instead detect patterns in the training data and build a
predictive model, much like scientists do (instance-based versus model-
based learning).

Supervised learning:
In supervised learning, the training data you feed to the
algorithm includes the desired solutions, called labels.
• When the label y is real-valued, we talk about regression.
• When the label y is discrete, we talk about classification.

A labeled training set for supervised learning.

Here are some of the most important supervised
learning algorithms:
• k-Nearest Neighbors
• Linear Regression
• Logistic Regression
• Support Vector Machines (SVMs)
• Decision Trees and Random Forests
• Neural networks

Unsupervised Learning:
In unsupervised learning, as you might guess, the training data is
unlabeled. The system tries to learn without a teacher.
No labels are given to the learning algorithm, leaving it on its own to
explore or find structure in the data.

An unlabeled training set for unsupervised learning.

Here are some of the most important unsupervised
learning algorithms:
• Clustering
• Visualization and dimensionality reduction
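
As an illustration of clustering, here is a minimal k-means sketch on one-dimensional data. K-means is one common clustering algorithm, chosen here as an assumption since the slide does not name a specific one; the points and starting centroids are made up.

```python
def k_means_1d(points, initial_centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its cluster."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(point - centroids[i]))
            clusters[nearest].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabeled data: no desired outputs are given, only the values themselves.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(points, initial_centroids=[0.0, 5.0])
print(centroids, clusters)
```

The algorithm finds the two groups on its own; nobody told it which point belongs where.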

Semi-Supervised Learning:
Some algorithms can deal with partially labeled training data,
usually a lot of unlabeled data and a little bit of labeled data. This
is called semi-supervised learning.
Most semi-supervised learning algorithms are combinations of
unsupervised and supervised algorithms.

Reinforcement Learning:
• The learning system is called an agent in this context.
• It can observe the environment, select and perform actions, and get
rewards in return (or penalties in the form of negative rewards).
• It must then learn by itself what the best strategy is, called a policy, to get
the most reward over time.
• A policy defines what action the agent should choose when it is in a given
situation.
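
The agent, reward, and policy ideas can be sketched with tabular Q-learning, one well-known reinforcement learning algorithm (an assumption here, since the slide does not name one). The environment is a hypothetical four-cell corridor in which the agent starts at cell 0 and is rewarded for reaching cell 3.

```python
import random

N_STATES = 4        # cells 0..3; cell 3 is the goal and ends the episode
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Environment: return (next_state, reward) for the chosen action."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)  # explore by acting at random
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q


def greedy_policy(q):
    """Policy: for each situation (state), the action with the highest value."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)])
            for s in range(N_STATES - 1)}


q_table = q_learning()
policy = greedy_policy(q_table)
print(policy)
```

The learned policy maps every non-goal cell to "move right", which is the strategy that collects the reward soonest.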


Batch learning:
In batch learning, the system is incapable of learning
incrementally: it must be trained using all the available
data. This will generally take a lot of time and computing
resources, so it is typically done offline. First, the system is
trained, and then it is launched into production and runs
without learning anymore; it just applies what it has learned.
This is called offline learning.

Online learning:
In online learning, you train the system incrementally by
feeding it data instances sequentially, either individually
or in small groups called mini-batches. Each learning step is
fast and cheap, so the system can learn about new data
on the fly, as it arrives.
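
A minimal sketch of online learning, assuming a single-weight linear model updated by one gradient step per incoming instance. The class name `OnlineLinearModel`, the learning rate, and the data stream are illustrative choices, not part of the original slides.

```python
class OnlineLinearModel:
    """Single-weight model y = w * x, updated one example at a time."""

    def __init__(self, learning_rate=0.05):
        self.weight = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x

    def learn_one(self, x, y):
        """One fast, cheap learning step on a single incoming instance."""
        error = y - self.predict(x)
        self.weight += self.learning_rate * error * x


# Hypothetical data stream following y = 3x, arriving one instance at a time.
stream = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0)] * 10

model = OnlineLinearModel()
for x, y in stream:
    model.learn_one(x, y)  # the instance can be discarded after this step

print(model.weight)
```

The model never sees the whole dataset at once, yet its weight moves steadily toward 3.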


Instance-Based VS Model-Based Learning
One more way to categorize Machine Learning systems is by how
they generalize. Most Machine Learning tasks are about making
predictions. This means that given a number of training examples,
the system needs to be able to generalize to examples it has never
seen before.
Having a good performance measure on the training data is good,
but insufficient; the true goal is to perform well on new instances.
There are two main approaches to generalization: instance-based
learning and model-based learning.

Instance-based learning:
The system learns the examples by heart, then generalizes to new
cases using a similarity measure.
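
A k-nearest-neighbors classifier is a typical instance-based learner: it stores the training examples and uses Euclidean distance as the similarity measure. The sketch below assumes made-up two-dimensional points with labels "A" and "B".

```python
import math
from collections import Counter


def knn_predict(training_set, query, k=3):
    """Label a new point by majority vote among its k closest stored examples."""
    by_distance = sorted(training_set,
                         key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# "Learning by heart": the training set itself is the whole model.
training_set = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.0), "B"),
]

print(knn_predict(training_set, (1.2, 1.1)))
print(knn_predict(training_set, (6.4, 6.6)))
```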

Model-based learning:
The system builds a model of the examples, then uses that model to make
predictions.
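
For comparison, a model-based learner summarizes the examples with a few parameters. The sketch below fits a straight line by ordinary least squares (one possible model, assumed here for illustration) and then predicts using only the slope and intercept.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


# Made-up training data lying on the line y = 2x + 1.
model = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(model, predict(model, 10.0))
```

Once the two parameters are learned, the training examples are no longer needed to make predictions.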

Loss Function
The loss function computes the error for a single training
example, while the cost function is the average of the losses
over the entire training set.
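
The distinction can be shown in a few lines, assuming the squared error as the loss (other losses follow the same pattern). The prediction and target values are made up.

```python
def squared_error_loss(predicted, target):
    """Loss: error for a single example."""
    return (predicted - target) ** 2


def cost(predictions, targets):
    """Cost: average of the per-example losses over the whole set."""
    losses = [squared_error_loss(p, t) for p, t in zip(predictions, targets)]
    return sum(losses) / len(losses)


predictions = [2.0, 4.0]
targets = [1.0, 6.0]
print([squared_error_loss(p, t) for p, t in zip(predictions, targets)])
print(cost(predictions, targets))
```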

• Hyperparameters: configuration variables that are external to the model
and whose values cannot be estimated from the data. That is to say, they cannot
be learned directly from the data in standard model training. They are almost
always specified by the machine learning engineer prior to training.
• Regression: this is the problem of predicting a real value for each item.
Examples of regression include prediction of stock values or that of variations
of economic variables.
• Classification: this is the problem of assigning a category to each item.
• Clustering: this is the problem of partitioning a set of items into
homogeneous subsets.

In Summary
1) You studied the data.
2) You selected a model.
3) You trained it on the training data.
4) Finally, you applied the model to make predictions
on new cases.

Main Challenges of Machine Learning
In short, since your main task is to select
a learning algorithm and train it on some
data, the two things that can go wrong are
“bad data” and “bad algorithm”.

1- Database
1) Insufficient Quantity of Training Data:
It takes a lot of data for most Machine
Learning algorithms to work properly. Even for very simple
problems you typically need thousands of examples, and
for complex problems such as image or speech
recognition you may need millions of examples (unless
you can reuse parts of an existing model).

2) Non-representative Training Data:
In order to generalize well, it is crucial that your training data be representative of
the new cases you want to generalize to. This is true whether you use instance-
based learning or model-based learning.

3) Poor-Quality Data:
If your training data is full of errors, outliers, and noise (e.g., due to poor quality
measurements), it will make it harder for the system to detect the underlying patterns, so your
system is less likely to perform well. It is often well worth the effort to spend time cleaning up
your training data. The truth is, most data scientists spend a significant part of their time
doing just that. For example:
1) If some instances are clearly outliers, it may help to simply discard them or try to fix
the errors manually.
2) If some instances are missing a few features (e.g., 5% of your customers did not
specify their age), you must decide whether you want to ignore this attribute altogether,
ignore these instances, fill in the missing values (e.g., with the median age), or train
one model with the feature and one model without it, and so on.
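
One of the options above, filling in missing values with the median, might look like the following sketch. Missing ages are represented as `None`, which is an assumption about how the data is stored.

```python
from statistics import median


def fill_missing_with_median(values):
    """Replace None entries with the median of the known values."""
    known = [v for v in values if v is not None]
    fill_value = median(known)
    return [fill_value if v is None else v for v in values]


# Hypothetical customer ages; two customers did not specify their age.
ages = [25, None, 40, 31, None, 58]
print(fill_missing_with_median(ages))
```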

4) Irrelevant Features:
Your system will only be capable of learning if the training data contains enough
relevant features and not too many irrelevant ones. A critical part of the success
of a Machine Learning project is coming up with a good set of features to train on.
This process, called feature engineering, involves:
1) Feature selection: selecting the most useful features to train on among
existing features.
2) Feature extraction: combining existing features to produce a more useful
one (dimensionality reduction algorithms can help).
3) Creating new features by gathering new data.

2- Algorithm
1) Overfitting the Training Data:
Overfitting happens when a model learns the detail and noise in the training
data to the extent that it negatively impacts the performance of the model on
new data. This means that the noise or random fluctuations in the training data
are picked up and learned as concepts by the model. The problem is that these
concepts do not apply to new data and negatively impact the model's ability to
generalize.
The model performs well on the training data, but it does not
generalize well.
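
A small sketch of this effect, under made-up data: the true rule is "label 1 when x >= 5", but two training labels are flipped to simulate noise. A memorizing model (1-nearest-neighbor) reproduces the noise perfectly, while a simpler threshold model learned from the same data ignores it.

```python
def accuracy(predict, dataset):
    return sum(predict(x) == y for x, y in dataset) / len(dataset)


# Training data with two noisy labels (at x = 2 and x = 7).
training_set = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
                (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]
# New, noise-free data following the true rule.
new_data = [(0.4, 0), (1.6, 0), (2.2, 0), (3.3, 0),
            (5.4, 1), (6.6, 1), (7.1, 1), (8.8, 1)]


def memorizer(x):
    """Overfit model: copy the label of the closest training point."""
    _, label = min(training_set, key=lambda item: abs(item[0] - x))
    return label


# Simple model: pick the threshold with the best training accuracy.
best_threshold = max(
    range(11),
    key=lambda t: accuracy(lambda x: int(x >= t), training_set),
)


def simple_model(x):
    return int(x >= best_threshold)


print("memorizer:", accuracy(memorizer, training_set), accuracy(memorizer, new_data))
print("simple:   ", accuracy(simple_model, training_set), accuracy(simple_model, new_data))
```

The memorizer scores perfectly on the training data and poorly on new data, which is the signature of overfitting.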

2) Underfitting the Training Data:
Underfitting is the opposite of overfitting: it occurs
when your model is too simple to learn the
underlying structure of the data.


How to Avoid Underfitting and Overfitting
Underfitting:
• Use a more complex model
• Add more features
• Train longer
Overfitting:
• Use validation (e.g., a held-out validation set)
• Perform regularization
• Get more data
• Remove or add some features

Common Classification Model Evaluation Metrics: Confusion Matrix
The confusion matrix is used to describe the performance of a
classification model on a set of test data for which the true values are known.

Common Classification Model Evaluation Metrics: Main Metrics
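
The slide's formulas did not survive extraction, so the sketch below assumes the usual set of metrics derived from a binary confusion matrix: accuracy, precision, recall, and F1 score. The label vectors are made up.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(t == positive and p == positive for t, p in pairs),
        "FP": sum(t != positive and p == positive for t, p in pairs),
        "FN": sum(t == positive and p != positive for t, p in pairs),
        "TN": sum(t != positive and p != positive for t, p in pairs),
    }


def main_metrics(cm):
    tp, fp, fn, tn = cm["TP"], cm["FP"], cm["FN"], cm["TN"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(main_metrics(cm))
```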

Common Regression Model Evaluation Metrics
• Mean Absolute Error (MAE)
• Mean Square Error (MSE)
• Mean Absolute Percentage Error (MAPE)
• Mean Percentage Error (MPE)
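
The four regression metrics can be written out as follows. The definitions are the standard ones; the sign convention for the percentage errors (true value minus prediction, divided by the true value) is an assumption, since the original formulas were images. All true values must be non-zero for the percentage metrics.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_percentage_error(y_true, y_pred):
    # Signed errors can cancel out, so this metric reveals bias, not accuracy.
    return 100 * sum((t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
print(mean_absolute_error(y_true, y_pred))
print(mean_squared_error(y_true, y_pred))
print(mean_absolute_percentage_error(y_true, y_pred))
print(mean_percentage_error(y_true, y_pred))
```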

Testing and Validating
It is common to use 80% of the data for training and hold out 20% for
testing.
If the training error is low (i.e., your model makes few mistakes on the training
set) but the generalization error is high, it means that your model is overfitting the
training data.
A common solution to this problem is to have a second holdout set called the
validation set. You train multiple models with various hyperparameters using the
training set, you select the model and hyperparameters that perform best on the
validation set, and when you’re happy with your model you run a single final test
against the test set to get an estimate of the generalization error.
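
A sketch of such a split, assuming a 60/20/20 division into training, validation, and test sets. The function name `split_dataset` and the fractions are illustrative; the examples are shuffled first so each set is a random sample.

```python
import random


def split_dataset(examples, test_fraction=0.2, validation_fraction=0.2, seed=0):
    """Shuffle, then carve out a test set and a validation set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_validation = round(len(shuffled) * validation_fraction)
    test_set = shuffled[:n_test]
    validation_set = shuffled[n_test:n_test + n_validation]
    training_set = shuffled[n_test + n_validation:]
    return training_set, validation_set, test_set


training_set, validation_set, test_set = split_dataset(range(100))
print(len(training_set), len(validation_set), len(test_set))
```

The test set should be touched only once, at the very end, so that its error remains an honest estimate of the generalization error.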

Testing and Validating: Cross-Validation
Cross-Validation (CV): the training set is split into
complementary subsets, and each model is trained against
a different combination of these subsets and validated
against the remaining parts. Once the model type and
hyperparameters have been selected, a final model is
trained using these hyperparameters on the full training set,
and the generalization error is measured on the test set.
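
A minimal k-fold cross-validation sketch, assuming contiguous folds without shuffling and a mean squared error score. The `fit` and `predict` callables are hypothetical placeholders for any model; the example uses a trivial model that always predicts the training mean.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each fold."""
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size


def cross_validate(xs, ys, fit, predict, n_folds=3):
    """Average validation MSE over all folds."""
    scores = []
    for train_idx, val_idx in k_fold_splits(len(xs), n_folds):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in val_idx]
        scores.append(sum(errors) / len(errors))
    return sum(scores) / len(scores)


def fit_mean(xs, ys):
    return sum(ys) / len(ys)      # the "model" is just the training mean


def predict_mean(model, x):
    return model


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = cross_validate(xs, ys, fit_mean, predict_mean, n_folds=3)
print(score)
```

Every example is used for validation exactly once, so no data is wasted on a separate validation set.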


Boosting
Boosting refers to any ensemble method that can combine
several weak learners into a strong learner. The general
idea of most boosting methods is to train predictors
sequentially, each trying to correct its predecessor. There
are many boosting methods available, but by far the most
popular are AdaBoost (Adaptive Boosting) and Gradient
Boosting.

Boosting
AdaBoost sequential training with instance weight updates
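
The sequential training with instance weight updates can be sketched as a small AdaBoost implementation for labels in {-1, +1}. The weak learners here are one-dimensional decision stumps, an assumption made to keep the example short; the data is made up so that no single stump can classify it perfectly.

```python
import math


def stump_predict(stump, x):
    threshold, polarity = stump
    return polarity if x < threshold else -polarity


def best_stump(xs, ys, weights):
    """Weak learner: the stump with the lowest weighted error."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0]
    thresholds += [(a + b) / 2 for a, b in zip(values, values[1:])]
    thresholds.append(values[-1] + 1.0)
    best, best_error = None, float("inf")
    for threshold in thresholds:
        for polarity in (+1, -1):
            stump = (threshold, polarity)
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(stump, x) != y)
            if error < best_error:
                best, best_error = stump, error
    return best, best_error


def adaboost_fit(xs, ys, n_rounds=3):
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(n_rounds):
        stump, error = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, stump))
        # Misclassified instances gain weight, so the next stump focuses on them.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost_fit(xs, ys, n_rounds=3)
print([adaboost_predict(ensemble, x) for x in xs])
```

A single stump gets at most four of these six points right, while the weighted combination of three stumps classifies all of them correctly.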

Voting Classifiers
A voting classifier is a meta-classifier that combines similar or
conceptually different machine learning classifiers and classifies via majority
or plurality voting. (For simplicity, we will refer to both majority and plurality voting
as majority voting.)
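
A minimal sketch of majority voting. The three classifiers are hypothetical hand-written rules standing in for trained models; any callables that return a label would work.

```python
from collections import Counter


def majority_vote(classifiers, example):
    """Return the label predicted by the largest number of classifiers."""
    votes = Counter(classifier(example) for classifier in classifiers)
    return votes.most_common(1)[0][0]


# Three conceptually different (and individually imperfect) spam detectors.
def keyword_classifier(text):
    return "spam" if "money" in text.lower() else "non-spam"


def punctuation_classifier(text):
    return "spam" if text.count("!") >= 2 else "non-spam"


def capitals_classifier(text):
    return "spam" if text.isupper() else "non-spam"


classifiers = [keyword_classifier, punctuation_classifier, capitals_classifier]
print(majority_vote(classifiers, "FREE MONEY!!!"))
print(majority_vote(classifiers, "Send money to grandma"))
```

The second message fools the keyword rule, but the other two classifiers outvote it.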

Dimensionality Reduction
Many Machine Learning problems involve thousands or even millions of features
for each training instance. Not only does this make training extremely slow, but it
can also make it much harder to find a good solution. This problem is often
referred to as the curse of dimensionality.
Principal Component Analysis
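
As a sketch of the idea behind Principal Component Analysis, the code below reduces two-dimensional points to one dimension by projecting them onto the direction of largest variance. It uses the closed-form principal axis of a 2x2 covariance matrix, which only works in two dimensions; general implementations rely on an eigendecomposition or SVD instead. The data points are made up.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    var_x = sum(x * x for x, _ in centered) / (n - 1)
    var_y = sum(y * y for _, y in centered) / (n - 1)
    cov_xy = sum(x * y for x, y in centered) / (n - 1)
    # Angle of the principal axis of the covariance matrix.
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    component = (math.cos(angle), math.sin(angle))
    projected = [x * component[0] + y * component[1] for x, y in centered]
    return component, projected


points = [(2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
component, projected = pca_2d_to_1d(points)
print(component)
print(projected)
```

Each point is now described by a single number instead of two, with most of the variation preserved.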

Hyperparameter Tuning
Hyperparameter tuning works by running multiple trials in a
single training job. Each trial is a complete execution of your training
application with values for your chosen hyperparameters, set within
limits you specify. The AI Platform training service keeps track of the
results of each trial and makes adjustments for subsequent trials.
When the job is finished, you can get a summary of all the trials along
with the most effective configuration of values according to the
criteria you specify.
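
The trial loop can be sketched with a plain grid search. Note that this is a simpler substitute: it tries every combination and does not adjust later trials based on earlier results, as the service described above does. The `train_and_evaluate` function is a hypothetical training application that fits a single weight and returns a validation error.

```python
from itertools import product


def train_and_evaluate(learning_rate, n_passes):
    """Hypothetical training application: fit y = w * x, return validation MSE."""
    training_data = [(1.0, 3.0), (2.0, 6.0)]
    validation_data = [(4.0, 12.0)]
    weight = 0.0
    for _ in range(n_passes):
        for x, y in training_data:
            weight += learning_rate * (y - weight * x) * x
    return sum((weight * x - y) ** 2 for x, y in validation_data) / len(validation_data)


def run_trials(objective, search_space):
    """Run one trial per hyperparameter combination and track the results."""
    names = list(search_space)
    trials = []
    for values in product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        trials.append({"params": params, "score": objective(**params)})
    best = min(trials, key=lambda trial: trial["score"])
    return trials, best


# Limits for each hyperparameter, specified up front.
search_space = {"learning_rate": [0.01, 0.1, 0.2], "n_passes": [1, 5]}
trials, best = run_trials(train_and_evaluate, search_space)
print(len(trials), best["params"])
```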

Steps to Build a Machine Learning System
1. Data collection.
2. Improving data quality (data preprocessing).
3. Feature engineering (feature extraction and
selection, dimensionality reduction).
4. Splitting data into training and evaluation sets.
5. Algorithm selection.
6. Training.
7. Evaluation + Hyperparameter tuning.
8. Testing.
9. Deployment.

Deep Learning
Deep Learning is a subfield of machine learning
concerned with algorithms inspired by the structure and
function of the brain, called artificial neural networks.

Deep Learning VS Machine Learning

Feature extraction
Feature engineering is, however, a tedious process for several
reasons: it takes a lot of time and requires expert knowledge.
For learning-based applications, a lot of time is spent adjusting the
features.
Extracted features often lack a structural representation reflecting
the abstraction levels of the problem at hand.

Representation learning
Deep Learning aims at automatically learning
representations from large sets of labeled data:
• The machine is fed with raw data.
• Representations are discovered automatically.

Deep learning models
Several DL models have been proposed:
• Autoencoders (AEs).
• Deep belief networks (DBNs).
• Convolutional neural networks (CNNs).
• Recurrent neural networks (RNNs).
• Generative adversarial networks (GANs), etc.

Convolutional neural networks (CNNs)

Thank you for your
attention
