Machine Learning
Dr. P. Kuppusamy
Prof / CSE
Machine Learning
Machine learning is an application of artificial intelligence (AI)
that provides systems the ability to automatically learn, think
and improve from experience without being explicitly
programmed.
Difference Between Traditional Programming and Machine Learning
Build a ML Model
ML Software Development
automatically learn and improve from experience
Features (Variables/Attributes) in ML
• A feature is an individual measurable attribute or characteristic of a
phenomenon being observed.
• Choosing informative, discriminating and independent features is
crucial for effective algorithms in pattern
recognition, classification and regression.
• Features are usually numeric, but structural features such
as strings and graphs are used in syntactic pattern recognition.
• E.g. Table – Length, Breadth, Height, Weight, Color, Location,
no_of_draws, no_of_doors, Price
Features (Variables/Attributes) in ML
• Vector – a collection/array of numbers of the same data type
• Feature Vector is an n-dimensional vector of
numerical features that represent some object.
• E.g. Lengths of 3 tables in feet:
[𝐿[1], 𝐿[2], 𝐿[3]]ᵀ = [5, 7, 3]ᵀ
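As an illustration (the feature names and values below are hypothetical except for the table lengths from the slide), a feature vector maps naturally to a NumPy array:

```python
import numpy as np

# Hypothetical feature vector for one table:
# [length_ft, breadth_ft, height_ft, weight_kg, no_of_doors]
table = np.array([5.0, 2.5, 3.0, 40.0, 2.0])

# Lengths of 3 tables in feet, as in the slide
L = np.array([5.0, 7.0, 3.0])

print(table.shape)  # (5,)
print(L)
```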
Feature extraction - definition
• Given a set of features 𝐹 = {𝑥1, … , 𝑥𝑁},
the Feature Extraction (“Construction”) problem is to map 𝐹 to
some feature set 𝐹′ that maximizes the learner’s ability to
classify patterns
Feature Extraction
• Find a projection matrix w from n-dimensional to m-dimensional
vectors that keeps the error low:
𝒛 = 𝑤ᵀ𝑿
w – parameters (projection matrix)
X – set of features
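As a sketch of the projection z = wᵀx, with a randomly chosen w for illustration (in practice w is learned, e.g. via PCA, so that the projection keeps the error low):

```python
import numpy as np

# Project an n-dimensional feature vector down to m dimensions: z = w^T x.
# The projection matrix w here is random purely for illustration.
rng = np.random.default_rng(0)
n, m = 5, 2
w = rng.standard_normal((n, m))   # n x m projection matrix

x = np.arange(5.0)                # one n-dimensional feature vector
z = w.T @ x                       # m-dimensional extracted features
print(z.shape)  # (2,)
```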
Types of Learning
• Supervised (inductive) learning
• Training data includes desired outputs
• Unsupervised learning
• Training data does not include desired outputs
• Semi-supervised learning
• Training data includes a few desired outputs
• Reinforcement learning
• Rewards from sequence of actions
Supervised (Inductive) Learning
• Training data includes desired outputs
• Given examples of a function (X, F(X))
• Predict function F(X) for new examples X
• Discrete data - F(X): Classification
• Continuous data - F(X): Regression
• F(X) = Probability(X): Probability estimation
Supervised learning:
Learning a model from labeled data.
Supervised learning
Algorithms: Regression, Support Vector Machines, neural
networks, decision trees, K-nearest neighbors, naive Bayes, etc.
Unsupervised learning
Algorithms: K-means, gaussian mixtures, hierarchical clustering,
spectral clustering, etc.
Learning a model from unlabeled data.
Semi-supervised learning:
Learning a model from unlabeled and labeled data.
Linear Regression
• Linear Regression analysis is a statistical tool.
• It is a predictive modeling method to investigate the mathematical
relationship between a dependent variable (outcome – y) and an
independent variable (predictor – x).
• The predictor explains how the dependent variable (y-axis) changes
in response to changes in the explanatory variable (x-axis).
Linear Regression
• It is a quantitative analysis tool.
• It uses current information about a phenomenon to predict its future behavior.
• It fits a line through a set of data points that follows the overall shape
of the data as closely as possible.
• When the data form a set of pairs of numbers, they are interpreted as the observed
values of an independent (or predictor) variable X and a dependent (or
response) variable Y.
Data model in Linear Regression
• Data is modelled using a straight line with a continuous variable
• Relationship between variables is a linear function
y = β0 + β1x + ε
where y is the dependent (response) variable, x is the independent
(explanatory) variable, β0 is the population y-intercept, β1 is the
population slope, and ε is the random error.
Data model in Linear Regression
Data is modelled using a straight line: β0 is the y-intercept, and
β1 = (change in y) / (change in x) is the slope.
Types of Relationships
Scatter plots of Y versus X illustrating strong relationships and weak relationships.
Types of Relationships (continued)
Scatter plots of Y versus X illustrating no relationship.
Plot for x and actual y values
 Plot the graph using x and y values
Random Error Identification
 Random Error εi = Estimated Value (ŷi) – Actual Value (yi)
Minimize the Random Error
 Reduce the distance between estimated and actual value
 Find the best fit of the line using least square method
Least Squares Method to Minimize the Error
• ‘Best fit’ means the differences between the actual y values and the
predicted y values are a minimum
• But positive differences offset negative ones, so square them:
Σi=1..n (yi − ŷi)² = Σi=1..n ε̂i²
• Least Squares minimizes the Sum of the Squared Errors (SSE)
Least Squares Graphically
For four data points with residuals ε̂1, ε̂2, ε̂3, ε̂4 about the fitted line
ŷi = β̂0 + β̂1xi
(for example, y2 = β̂0 + β̂1x2 + ε̂2), Least Squares minimizes
Σi=1..n ε̂i² = ε̂1² + ε̂2² + ε̂3² + ε̂4²
Case Study
 Let us consider x and y values and mark them in a scatter plot
 Find the mean of x and the mean of y
 Find the coefficients m and c in the straight line y = mx + c
 Find x − x̄ and y − ȳ
 Find m
Plot the x, y values in the graph
x = {1, 2, 3, 4, 5} y = {3, 4, 2, 4, 5}
Plot the regression line using estimated y values
x = {1, 2, 3, 4, 5} y = {2.8, 3.2, 3.6, 4, 4.4}
Estimated y values
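The estimated y values above can be reproduced by computing the least-squares coefficients m and c directly from the case-study data; a minimal sketch in NumPy:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3, 4, 2, 4, 5], dtype=float)

# Least-squares slope and intercept for y = m*x + c
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()

y_hat = m * x + c
print(m, c)      # 0.4 2.4
print(y_hat)     # [2.8 3.2 3.6 4.  4.4]
```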
Find the Error ε
Minimizing the Mean Square Error minimizes the
error in the linear regression.
The regression line with the least error is
the ‘best fit’ line.
Σi=1..n (yi − ŷi)² = Σi=1..n ε̂i²
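Using the actual and estimated values from the case study, the squared-error quantities can be computed directly; a small sketch:

```python
import numpy as np

y     = np.array([3.0, 4.0, 2.0, 4.0, 5.0])   # actual values
y_hat = np.array([2.8, 3.2, 3.6, 4.0, 4.4])   # estimated values from y = 0.4x + 2.4

errors = y - y_hat
sse = np.sum(errors ** 2)   # sum of squared errors
mse = sse / len(y)          # mean square error
print(sse, mse)
```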
How would you draw a line through the points in real time?
 Initial values (iteration 0) for slope m = 0 and y-intercept b = 0
How would you draw a line through the points?
iteration 1: slope m = 0.04, y-intercept b = 0
iteration 20: slope m = 0.59, y-intercept b = 0.01
Determine which line ‘fits best’ in 100 iterations
iteration 47: slope m = 1.03, y-intercept b = 0.02
iteration 99: slope m = 1.36, y-intercept b = 0.03
3 major Uses of Regression
•Determining the strength of predictors
•Forecasting an effect
•Trend forecasting
Where is Linear Regression used?
• Evaluating trends and sales estimates
• Analyzing the impact of price changes
• Insurance domain
Squared Error Cost Function
• Cost Function: J(θ) = (1 / 2m) Σi=1..m (Y(i) − y′(i))²
Y(i) – ground truth (actual output or label)
y′(i) – prediction output
m – number of data points (samples)
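A minimal implementation of this cost function, using the case-study values for illustration:

```python
import numpy as np

def cost(Y, y_pred):
    """Squared error cost J(theta) = 1/(2m) * sum((Y - y_pred)^2)."""
    m = len(Y)
    return np.sum((Y - y_pred) ** 2) / (2 * m)

Y      = np.array([3.0, 4.0, 2.0, 4.0, 5.0])   # ground truths
y_pred = np.array([2.8, 3.2, 3.6, 4.0, 4.4])   # predictions
print(cost(Y, y_pred))
```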
Gradient Descent
• The objective of training a machine learning model is to minimize the loss or error
between ground truths and predictions by changing the trainable parameters.
• The gradient is the extension of the derivative to multi-dimensional space; its
negative tells the direction along which the loss or error decreases fastest.
• The gradient is defined as the maximum rate of change.
θj = θj − α ∂J(θ)/∂θj
• θj – training parameter, α – learning rate, J(θ) – error/cost function
Gradient Descent
• Gradient Descent, for all j:
θj = θj − α (1/m) Σi=1..m (y′(i) − Y(i)) xj(i)
j = 0: θ0 = θ0 − α (1/m) Σi=1..m (y′(i) − Y(i)) x0(i)
j = 1: θ1 = θ1 − α (1/m) Σi=1..m (y′(i) − Y(i)) x1(i)
………..
j = n: θn = θn − α (1/m) Σi=1..m (y′(i) − Y(i)) xn(i)
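The update rule above can be sketched as a plain-NumPy loop on the earlier case-study data (x = {1, 2, 3, 4, 5}, y = {3, 4, 2, 4, 5}); the learning rate and iteration count here are illustrative:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([3, 4, 2, 4, 5], dtype=float)

theta0, theta1 = 0.0, 0.0   # intercept and slope, initialised at 0
alpha = 0.05                # learning rate
m = len(x)

for _ in range(5000):
    y_pred = theta0 + theta1 * x
    # Simultaneous update of all parameters (x0 = 1 for the intercept term)
    grad0 = np.sum(y_pred - Y) / m
    grad1 = np.sum((y_pred - Y) * x) / m
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(theta0, theta1)  # approaches the least-squares solution c = 2.4, m = 0.4
```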
References
• Tom Markiewicz & Josh Zheng, Getting Started with Artificial
Intelligence, O’Reilly Media, 2017
• Stuart J. Russell and Peter Norvig, Artificial Intelligence: A Modern
Approach
• Richard Szeliski, Computer Vision: Algorithms and Applications,
Springer, 2010
• Chandra S.S. & H.S. Anand, Artificial Intelligence and Machine
Learning, PHI Publications
• Rajiv Chopra, Machine Learning, Khanna Publishing House