Introduction to
Machine Learning
Indu Dokare
What is machine learning?
● A branch of artificial intelligence, concerned with the
design and development of algorithms that allow
computers to evolve behaviors based on empirical data.
● As intelligence requires knowledge, it is necessary for the
computers to acquire knowledge
History of AI and ML
• Around 1960:
  – First wave of AI
  – Inference: Given knowledge, “I” (AI) can make decisions like a “MAN”.
• Around 1990:
  – Second wave of AI
  – Learning: Given data, “I” can learn like a “MAN”.
• Around 2020:
  – Third wave of AI
  – Cyber-space: Given the internet, “I” can collect data and learn in a way different from “MAN”.
• Around 2050:
  – Fourth wave of AI
  – Super AI: Nothing needed from MAN. “I” can do everything myself.
What is machine learning?
● Machine Learning takes advantage of the ability of computer systems to learn from correlations hidden in the data; this ability can be further utilized by programming or developing intelligent and efficient Machine Learning algorithms.
● Machine learning algorithms use computational methods to “learn” information directly from data without relying on a predetermined equation as a model.
● As the number of samples available for learning increases, machine learning algorithms adaptively improve their performance.
● Machine Learning: the study of algorithms that improve their performance at some task with experience.
● Optimize a performance criterion using example data or past
experience.
● Role of Statistics: Inference from a sample
● Role of Computer science: Efficient algorithms to
❖ Solve the optimization problem
❖ Representing and evaluating the model for inference
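To make “optimize a performance criterion using example data” concrete, here is a minimal hypothetical sketch in Python: a one-parameter linear model is fitted by minimizing mean squared error with gradient descent. The data, learning rate, and number of iterations are all invented for illustration.

```python
import numpy as np

# Toy example data: y is roughly 3*x plus noise (made up for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + rng.normal(0, 1.0, size=50)

w = 0.0      # single model parameter
lr = 0.01    # learning rate (chosen arbitrarily)

for _ in range(200):
    y_pred = w * x
    # Performance criterion: mean squared error on the example data.
    grad = 2 * np.mean((y_pred - y) * x)   # d(MSE)/dw
    w -= lr * grad                         # gradient-descent step

print(f"learned w is about {w:.2f} (true slope was 3.0)")
```

The loop is the “efficient algorithm to solve the optimization problem”; the model `y = w*x` is the representation being evaluated.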
What is machine learning?
Fig.: Machine learning vs. “classic” programming.
Source: Sebastian Raschka, Intro to Machine Learning
Arthur Samuel (1959)
● Machine learning: "Field of study that gives computers the ability to learn without
being explicitly programmed"
○ Samuel wrote a checkers-playing program
■ Had the program play 10000 games against itself
■ Work out which board positions were good and bad depending on wins/losses
Tom Mitchell (1997)
● Well posed learning problem: "A computer program is said to learn from experience
E with respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with experience E."
○ The checkers example,
■ E = 10,000 games played against itself
■ T = playing checkers
■ P = whether you win or not
Definition of learning
Definition: A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at tasks T, as
measured by P, improves with experience E.
Examples
i) Handwriting recognition learning problem
• Task T: Recognising and classifying handwritten words within images
• Performance P: Percent of words correctly classified
• Training experience E: A dataset of handwritten words with given classifications
Source: Machine Learning, Tom Mitchell.
Inductive Learning: Learning from experience
ii) A robot driving learning problem
• Task T: Driving on highways using vision sensors
• Performance measure P: Average distance traveled before an error
• Training experience E: A sequence of images and steering commands recorded while observing a
human driver
iii) A chess learning problem
• Task T: Playing chess
• Performance measure P: Percent of games won against opponents
• Training experience E: Playing practice games against itself
Source: Machine Learning, Tom Mitchell.
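To make the performance measure P concrete (e.g., “percent of words correctly classified” in the handwriting example), here is a minimal sketch that computes P from predicted and true labels. The label lists are invented purely for illustration.

```python
# Hypothetical true labels and model predictions for five handwritten words.
true_labels = ["cat", "dog", "cat", "bird", "dog"]
predicted   = ["cat", "dog", "dog", "bird", "dog"]

correct = sum(t == p for t, p in zip(true_labels, predicted))
P = 100.0 * correct / len(true_labels)   # performance measure P
print(f"P = {P:.1f}% of words correctly classified")
```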
Types of Machine Learning
Supervised Learning: Teach the computer how to do something, then let it use its newfound knowledge to do it.
In supervised learning, the computer is provided with example inputs that are labeled with their desired outputs.
The purpose of this method is for the algorithm to be able to “learn” by comparing its actual output with the “taught” outputs to find errors, and modify the model accordingly.
Supervised learning therefore uses patterns to predict label values on additional unlabeled data, as in the sketch below.
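A minimal supervised-learning sketch using scikit-learn (an assumption about tooling); the tiny labeled dataset is invented only to show the fit-then-predict pattern described above.

```python
from sklearn.tree import DecisionTreeClassifier

# Labeled examples: inputs paired with their desired outputs.
X_train = [[1, 1], [2, 1], [8, 9], [9, 8]]   # made-up feature vectors
y_train = ["A", "A", "B", "B"]               # "taught" labels

model = DecisionTreeClassifier()
model.fit(X_train, y_train)                  # learn by comparing outputs with the taught labels

# Predict labels for new, unlabeled data.
print(model.predict([[1, 2], [9, 9]]))       # likely ['A' 'B']
```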
Unsupervised Learning: Let the computer learn how to do something, and use this to determine structure and patterns in data.
In unsupervised learning, data is unlabeled, so the learning algorithm is left to find commonalities among its input data, as in the sketch below.
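A minimal unsupervised-learning sketch (again assuming scikit-learn is available): k-means receives only unlabeled points and finds groupings on its own. The data and the choice of two clusters are illustrative assumptions.

```python
from sklearn.cluster import KMeans

# Unlabeled data: no desired outputs are given.
X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
     [8.0, 8.2], [8.1, 7.9], [7.9, 8.0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)   # the algorithm discovers the two groups itself
print(labels)                    # cluster ids, not human-given class labels
```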
Reinforcement learning
● Reinforcement learning (RL) is important for “strategy learning”. It is
useful for robotics, for playing games, etc.
● The well-known AlphaGo actually combined RL with deep learning, and was the first program that defeated human expert Go players.
● In RL, a learner is called an agent. The point is to take a correct
“action” for each environment “situation”.
● If there is a teacher who can tell the correct actions for all
situations, we can use supervised learning.
● In RL, we suppose that the teacher only “rewards” or
“punishes” the agent under some (not all) situations.
Machine Learning: Produced by Qiangfu Zhao
● RL can find a “map” (a Q-table) that defines the relation between the situation set
and the action set, so that the agent can get the largest reward by following this
map.
● To play a game successfully, the computer can generate many different situations,
and find a map between situation set and action set in such a way to win the game
(with a high probability).
● Thus, even if there is no human opponent, a machine can improve its skill by
playing with itself, using RL.
● Of course, if the machine has the opportunity to play many games with human experts, it can find the best strategy more “efficiently” without generating many “impossible” situations, or produce computer players whose play is more acceptable to humans.
● Examples: playing backgammon or chess, scheduling jobs, and controlling robot
limbs
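A minimal sketch of the “map” (Q-table) idea: tabular Q-learning on a tiny made-up corridor environment in which the agent is rewarded only on reaching the goal state. The states, rewards, and hyperparameters are all invented for illustration.

```python
import random

n_states, n_actions = 5, 2          # corridor of 5 cells; actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount factor, exploration rate

def step(s, a):
    """Move one cell; reward is given only at the goal (rightmost cell)."""
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r, s2 == n_states - 1

for _ in range(500):                # episodes of experience ("playing with itself")
    s, done = 0, False
    while not done:
        if random.random() < eps:
            a = random.randrange(n_actions)                      # explore
        else:
            a = max(range(n_actions), key=lambda act: Q[s][act]) # exploit current map
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward reward + discounted best future value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned "map": best action for each situation (state).
print([max(range(n_actions), key=lambda act: Q[s][act]) for s in range(n_states)])
```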
Semi-Supervised Learning
In semi-supervised learning, some input data is labeled and some is unlabeled.
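A minimal sketch of the semi-supervised setting using scikit-learn's LabelPropagation (an assumption about tooling); by that library's convention, unlabeled points are marked with -1. The data is invented.

```python
from sklearn.semi_supervised import LabelPropagation

X = [[1.0], [1.2], [0.9], [7.8], [8.0], [8.2]]
y = [0,      -1,    -1,    1,     -1,    -1]   # -1 marks the unlabeled examples

model = LabelPropagation()
model.fit(X, y)               # uses both labeled and unlabeled points
print(model.transduction_)    # labels inferred for every example, including the unlabeled ones
```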
Classification
In machine learning, classification is the problem of identifying to which of a set of
categories a new observation belongs, on the basis of a training set of data containing
observations (or instances) whose category membership is known.
Example 1
Consider the following data:
- a training set of data,
- two attributes, “Score1” and “Score2”,
- a class label called “Result”,
- the class label has two possible values, “Pass” and “Fail”,
- so the data can be divided into two categories or classes.
If we have some new data, say “Score1 = 25”
and “Score2 = 36”, what value should be
assigned to “Result” corresponding to the new
data; in other words, to which of the two
categories or classes the new observation
should be assigned?
To answer this question, using the given data
alone we need to find the rule, or the formula,
or the method that has been used in assigning
the values to the class label “Result”.
The problem of finding this rule or formula or
the method is the classification problem.
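A hypothetical sketch of Example 1 with scikit-learn: a few (Score1, Score2, Result) rows are invented in the spirit of the table (the actual table values are not reproduced here), the rule is learned from them, and the new observation Score1 = 25, Score2 = 36 is then classified.

```python
from sklearn.neighbors import KNeighborsClassifier

# Invented training rows in the spirit of the table: [Score1, Score2] -> Result.
X_train = [[29, 43], [22, 29], [10, 47], [31, 55], [17, 18], [33, 54], [32, 50], [15, 20]]
y_train = ["Fail",   "Fail",   "Fail",   "Pass",   "Fail",   "Pass",   "Pass",   "Fail"]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)            # learn the rule hidden in the labeled data

print(clf.predict([[25, 36]]))       # predicted "Result" for the new observation
```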
Example 2 of Classification
Can we classify breast cancer as malignant or benign based on tumor size?
A tumor can be benign (not dangerous to health) or malignant (has the potential to be dangerous).
This is an example of a classification problem.
● Classify data into one of two discrete classes - either malignant or not
● In classification problems, we can have a discrete number of possible values for the output
  e.g., maybe we have four values:
  ■ 0 - benign
  ■ 1 - type 1
  ■ 2 - type 2
  ■ 3 - type 3
● Here we used only one attribute (size)
Example 3 of Classification
● In other problems we may have multiple attributes for the same problem definition of breast cancer
● We may also, for example, know age and tumor size
● Based on that data, you can try and define separate classes by
  a. Drawing a straight line between the two groups
  b. Using a more complex function to define the two groups (which we'll discuss later)
  c. Then, when you have an individual with a specific tumor size and who is a specific age, you can hopefully use that information to place them into one of your classes
● You might have many features to consider
  a. Clump thickness
  b. Uniformity of cell size
  c. Uniformity of cell shape
Classification: A few real-life examples
Optical character recognition: the problem of recognizing character codes from their images; multiple classes.
Face recognition: the input is an image and the classes are people to be recognized; multiple classes.
Speech recognition: the input is acoustic and the classes are words that can be uttered.
Medical diagnosis: the inputs are the relevant information about the patient and the classes are the illnesses.
Classification Algorithms
a) Logistic regression
b) Naive Bayes algorithm
c) k-NN algorithm
d) Decision tree algorithm
e) Support vector machine algorithm
f) Random forest algorithm
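For orientation only, the sketch below shows how each listed algorithm maps onto a scikit-learn estimator (assuming scikit-learn is installed); all of them share the same fit/predict interface. The tiny dataset is made up.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

classifiers = {
    "Logistic regression": LogisticRegression(),
    "Naive Bayes": GaussianNB(),
    "k-NN": KNeighborsClassifier(n_neighbors=3),
    "Decision tree": DecisionTreeClassifier(),
    "Support vector machine": SVC(),
    "Random forest": RandomForestClassifier(),
}

X, y = [[0, 0], [1, 1], [2, 2], [3, 3]], [0, 0, 1, 1]   # tiny made-up dataset
for name, clf in classifiers.items():
    clf.fit(X, y)                                       # identical interface for every algorithm
    print(name, "->", clf.predict([[2.5, 2.5]]))
```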
Regression
In machine learning, a regression problem is the problem of predicting the value of
a numeric variable based on observed values of the variable. The value of the
output variable may be a number, such as an integer or a floating point value.
These are often quantities, such as amounts and sizes. The input variables may
be discrete or real-valued.
Regression: Example 1
Consider the data on car prices given in the table.
Suppose we are required to estimate the price
of a car aged 25 years with distance 53240 KM
and weight 1200 pounds.
This is an example of a regression problem
because we have to predict the value of the
numeric variable “Price”.
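A hypothetical sketch of this regression problem with scikit-learn: since the car-price table is not reproduced here, a few rows are invented with the same features (age, distance, weight), a linear model is fitted, and the price of the car described above is estimated.

```python
from sklearn.linear_model import LinearRegression

# Invented rows standing in for the car-price table: [age (years), distance (km), weight (pounds)].
X_train = [[5, 20000, 1400], [10, 60000, 1300], [15, 90000, 1250], [20, 120000, 1150]]
y_train = [30000, 22000, 16000, 9000]          # made-up prices

reg = LinearRegression().fit(X_train, y_train)

# Estimate the price of a car aged 25 years, distance 53240 km, weight 1200 pounds.
print(reg.predict([[25, 53240, 1200]]))
```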
Example 2 of Regression
○ How do we predict housing prices?
  ■ Collect data regarding housing prices and how they relate to size in feet
● Example problem: "Given this data, a friend has a house 750 square feet - how much can they be expected to get?"
● What approaches can we use to solve this?
  ○ Straight line through data
    ■ Maybe $150 000
  ○ Second order polynomial
    ■ Maybe $200 000
  ○ One thing we discuss later - how to choose a straight or curved line?
  ○ Each of these approaches represents a way of doing supervised learning (see the sketch below)
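To make the “straight line vs. second-order polynomial” choice concrete, here is a sketch using numpy's polyfit on invented (size, price) pairs; the numbers are illustrative, not real housing data.

```python
import numpy as np

# Invented (size in square feet, price in dollars) pairs.
sizes  = np.array([500, 700, 900, 1100, 1300, 1500])
prices = np.array([100_000, 140_000, 170_000, 210_000, 230_000, 260_000])

line  = np.poly1d(np.polyfit(sizes, prices, deg=1))   # straight line through the data
curve = np.poly1d(np.polyfit(sizes, prices, deg=2))   # second-order polynomial

# Prediction for the friend's 750-square-foot house under each model.
print(f"linear fit:    ${line(750):,.0f}")
print(f"quadratic fit: ${curve(750):,.0f}")
```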
Example 2 of Regression continued….
● What does this mean?
  ○ We gave the algorithm a data set where a "right answer" was provided
  ○ So we know actual prices for houses
    ■ The idea is we can learn what makes the price a certain value from the training data
    ■ The algorithm should then produce more right answers for new data where we don't know the price already, i.e., predict the price
● We also call this a regression problem
  ○ Predict continuous valued output (price)
  ○ No discrete values
Basic components of learning process
Definition
A computer program which learns from experience is called a machine learning
program or simply a learning program. Such a program is sometimes also referred to
as a learner.
1. Data storage: Facilities for storing and retrieving huge amounts of data are an
important component of the learning process. Humans and computers alike utilize
data storage as a foundation for advanced reasoning.
2. Abstraction : Abstraction is the process of extracting knowledge about stored
data. This involves creating general concepts about the data as a whole. The creation
of knowledge involves application of known models and creation of new models.
The process of fitting a model to a dataset is known as training. When the model has
been trained, the data is transformed into an abstract form that summarizes the
original information.
3. Generalization : The term generalization describes the process of turning the
knowledge about stored data into a form that can be utilized for future action.
These actions are to be carried out on tasks that are similar, but not identical, to
those that have been seen before. In generalization, the goal is to discover those
properties of the data that will be most relevant to future tasks.
4. Evaluation : It is the process of giving feedback to the user to measure the
utility of the learned knowledge. This feedback is then utilised to effect
improvements in the whole learning process.
Understanding data
Unit of observation
By a unit of observation we mean the smallest entity with measured properties of
interest for a study.
Examples
• A person, an object or a thing
• A time point
• A geographic region
• A measurement
Examples and features
Datasets that store the units of observation and their properties can be imagined as
collections of data consisting of the following:
• Examples
An “example” is an instance of the unit of observation for which properties have been
recorded.
An “example” is also referred to as an “instance”, or “case” or “record.” (It may be noted
that the word “example” has been used here in a technical sense.)
• Features
A “feature” is a recorded property or a characteristic of examples. It is also referred to as an
“attribute” or a “variable.”
Examples for “examples” and “features”
1. Cancer detection
Consider the problem of developing an algorithm for detecting cancer. In this study we note the
following.
(a) The units of observation are the patients.
(b) The examples are members of a sample of cancer patients.
(c) The following attributes of the patients may be chosen as the features:
• gender
• age
• blood pressure
• the findings of the pathology report after a biopsy
2. Spam e-mail
Let it be required to build a learning algorithm to identify spam e-mail.
(a) The unit of observation could be an e-mail message.
(b) The examples would be specific messages.
(c) The features might consist of the words used in the messages.
Examples and features: Representation
Examples and features are generally collected in a “matrix format”. The figure shows such a data set: each example is a feature vector, i.e., a point in an N-dimensional feature space.
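A small sketch (assuming pandas is installed) of this “matrix format”: each row is one example and each column one feature, so every example is a feature vector in an N-dimensional feature space. The values are invented, echoing the car-data features mentioned below.

```python
import pandas as pd

# Each row = one example; each column = one feature (here N = 4 features).
data = pd.DataFrame({
    "model":        ["A", "B", "A"],               # categorical feature
    "year":         [2015, 2018, 2020],            # numeric feature
    "mileage":      [60000, 30000, 12000],         # numeric feature
    "transmission": ["manual", "auto", "auto"],    # categorical feature
})

print(data.shape)           # (3 examples, 4 features)
print(data.iloc[0].values)  # feature vector of the first example
```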
Different forms of data
1. Numeric data
If a feature represents a characteristic measured in numbers, it is called a numeric feature.
2. Categorical or nominal
A categorical feature is an attribute that can take on one of a limited, and usually fixed,
number of possible values on the basis of some qualitative property. A categorical feature
is also called a nominal feature.
3. Ordinal data
This denotes a nominal variable with categories falling in an ordered list. Examples include
clothing sizes such as small, medium, and large, or a measurement of customer satisfaction
on a scale from “not at all happy” to “very happy.”
Examples
In the data given in the previous table, the features “year”, “price” and “mileage” are
numeric and the features “model”, “color” and “transmission” are categorical.
Terminology
Labeled and unlabeled data
Data in the machine learning context can either be labeled or unlabeled. Unlabeled
data is usually the raw form of the data. It consists of samples of natural or
human-created artifacts. This category of data is easily available in abundance,
for example video streams, audio, photos, and tweets, among others. The unlabeled data
becomes labeled data the moment a meaning is attached.
Fig.: Triangles and bigger circles represent labeled data; small circles represent unlabeled data.
Tasks
A task is a problem that the Machine learning algorithm is built to solve. It is
important that we measure the performance on a task.
The term "performance" in this context is nothing but the extent or confidence with
which the problem is solved.
Different algorithms when run on different datasets produce a different model. It is
important that the models thus generated are not compared, and instead, the
consistency of the results with different datasets and different models is measured.
Algorithms
After getting a clear understanding of the Machine learning problem at hand, the
focus is on what data and algorithms are relevant or applicable.
There are several algorithms available. These algorithms are either grouped by
the learning subfields (such as supervised, unsupervised, reinforcement,
semi-supervised, or deep) or the problem categories (such as Classification,
Regression, Clustering or Optimization).
These algorithms are applied iteratively on different datasets, and output models
that evolve with new data are captured.
Input representation
The general classification problem is concerned with assigning a class label to an
unknown instance from instances of known assignments of labels.
In a real-world problem, a given situation or an object will have a large number of features
which may contribute to the assignment of the labels.
But in practice, not all these features may be equally relevant or important.
Only those which are significant need be considered as inputs for assigning the class
labels.
These features are referred to as the “input features” for the problem. They are also said
to constitute an “input representation” for the problem.
More detailed illustration of the supervised learning process
In supervised learning, we are given a labeled training dataset from which a
machine learning algorithm can learn a model that can predict labels of unlabeled
data points.
We define h as the hypothesis, a function that we use to approximate some unknown function
f(x) = y, (1)
where x is a vector of input features associated with a training example or
dataset instance (for example, the pixel values of an image) and y is the outcome
we want to predict (e.g., what class of object we see in an image).
In other words, h(x) is a function that predicts y.
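A minimal sketch of the h ≈ f idea: an “unknown” target function f is invented, labeled training pairs (x, y = f(x)) are generated, and a learned model plays the role of the hypothesis h(x) that predicts y. The target function and model choice are assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def f(x):                        # the "unknown" target function (invented here)
    return int(x[0] + x[1] > 1.0)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))          # input feature vectors
y = np.array([f(x) for x in X])               # labels produced by f

model = DecisionTreeClassifier().fit(X, y)    # learn h from the labeled training data
h = model.predict                             # h(x) approximates f(x) = y

print(h([[0.9, 0.8], [0.1, 0.2]]))            # likely [1 0]
```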
More detailed illustration of the supervised learning process
Fig.: The supervised learning process.
Fig.: More detailed illustration of the supervised learning process.
Definition
1. Hypothesis
In a binary classification problem, a hypothesis is a statement or a proposition
purporting to explain a given set of facts or observations.
2. Hypothesis space
The hypothesis space for a binary classification problem is the set of hypotheses for
the problem that might possibly be returned by the learning algorithm.
The same applies to multiclass classification.
Hypothesis: Examples
1. Consider the set of observations of a variable x with the associated class labels given in the table.
Hypothesis (Source: NPTEL)
Hypothesis Space (Source: NPTEL)
Formulation of machine learning
Issues in Machine Learning
● What algorithms exist for learning general target functions from specific training
examples? In what settings will particular algorithms converge to the desired
function, given sufficient training data? Which algorithms perform best for which
types of problems and representations?
● How much training data is sufficient? What general bounds can be found to relate
the confidence in learned hypotheses to the amount of training experience and the
character of the learner's hypothesis space?
● When and how can prior knowledge held by the learner guide the process of
generalizing from examples? Can prior knowledge be helpful even when it is only
approximately correct?
● What is the best strategy for choosing a useful next training experience,
and how does the choice of this strategy alter the complexity of the
learning problem?
● What is the best way to reduce the learning task to one or more
function approximation problems? Put another way, what specific
functions should the system attempt to learn? Can this process itself be
automated?
● How can the learner automatically alter its representation to improve its
ability to represent and learn the target function?
Steps to solve Machine Learning Problems
1. Data Gathering
2. Data Preprocessing
3. Feature Engineering
4. Algorithm Selection and Training
5. Making Predictions
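A compressed sketch tying these steps together with scikit-learn (an assumption about tooling): a synthetic dataset stands in for data gathering, a scaler for preprocessing/feature engineering, a classifier for algorithm selection and training, and a final predict call for making predictions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# 1. Data gathering (synthetic data stands in for a real collection step).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2-3. Data preprocessing / feature engineering, and 4. algorithm selection and training.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# 5. Making predictions on unseen data.
print("test accuracy:", model.score(X_test, y_test))
print("predictions:", model.predict(X_test[:5]))
```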
Applications of machine learning
● Email spam detection
● Face detection and matching (e.g.,
iPhone X)
● Web search (e.g., DuckDuckGo, Bing,
Google)
● Sports predictions
● Post office (e.g., sorting letters by zip
codes)
● ATMs (e.g., reading checks)
● Credit card fraud
● Stock predictions
● Smart assistants (Apple Siri, Amazon
Alexa, . . . )
● Product recommendations (e.g.,
Netflix, Amazon)
● Self-driving cars (e.g., Uber, Tesla)
● Language translation (Google
translate)
● Sentiment analysis
● Medical diagnoses