Module 2
Regression
• In machine learning, a regression problem is the problem of
predicting the value of a numeric variable from observed values
of the input variables.
• The value of the output variable is a number, such as an integer
or a floating-point value.
• These outputs are often quantities, such as amounts and sizes. The
input variables may be discrete or real-valued.
• Different regression models
• There are various types of regression techniques available to make
predictions.
• These techniques differ mainly in three aspects: the number
and type of independent variables, the type of the dependent variable,
and the shape of the regression line.
• Simple linear regression: There is only one continuous independent
variable x, and the assumed relation between the independent variable
and the dependent variable y is y = a + bx.
• Multivariate linear regression: There is more than one independent
variable, say x1, . . . , xn, and the assumed relation between the
independent variables and the dependent variable is y = a0 + a1x1 + ⋯
+ anxn.
• Polynomial regression: There is only one continuous independent
variable x and the assumed model is y = a0 + a1x + ⋯ + anx^n.
• Logistic regression: The dependent variable is binary, that is, a
variable which takes only the values 0 and 1. The assumed model
involves certain probability distributions; typically the probability
that y = 1 is modeled as a logistic (sigmoid) function of the inputs.
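As a rough sketch of these four models in code (not part of the original slides; the data and coefficient values below are invented for illustration), each can be fit in a few lines with scikit-learn:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))

# Simple linear regression: y = a + b*x
y_lin = 2.0 + 3.0 * x[:, 0] + rng.normal(0, 1, 100)
simple = LinearRegression().fit(x, y_lin)

# Multivariate linear regression: y = a0 + a1*x1 + a2*x2
X = rng.uniform(0, 10, size=(100, 2))
y_multi = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 1, 100)
multi = LinearRegression().fit(X, y_multi)

# Polynomial regression: y = a0 + a1*x + a2*x^2 (still linear in the a's)
x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
poly = LinearRegression().fit(x_poly, 1.0 + 0.5 * x[:, 0] ** 2 + rng.normal(0, 1, 100))

# Logistic regression: y is binary (0 or 1)
y_bin = (x[:, 0] + rng.normal(0, 1, 100) > 5).astype(int)
logit = LogisticRegression().fit(x, y_bin)

print(simple.intercept_, simple.coef_)   # recovered estimates of a and b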
Errors in Machine Learning
•Irreducible errors are errors that will always be present in a machine learning model because of
unknown variables; their values cannot be reduced.
•Reducible errors are errors whose values can be further reduced to improve a model. They arise
because our model’s output function does not match the desired output function, and they can be
optimized away.
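A standard way to make this split precise (the decomposition is not stated on the slide, but it is the usual formalization) is to write the expected squared error of a prediction ŷ at a point as
E[(y − ŷ)^2] = Bias(ŷ)^2 + Var(ŷ) + σ^2,
where σ^2 is the variance of the irreducible noise. The first two terms make up the reducible error, and the next sections examine them one at a time.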
What is Bias?
• To make predictions, our model analyzes our data and finds patterns in it.
Using these patterns, we can make generalizations about certain instances in
our data. After training, the model has learned these patterns and applies
them to the test set to make predictions.
• Bias is the difference between the average prediction of our model and the
correct values. Equivalently, bias reflects the simplifying assumptions that
our model makes about the data in order to be able to predict new data.
• When the bias is high, the assumptions made by our model are too simple, and
the model can’t capture the important features of our data.
• This means that our model hasn’t captured the patterns in the training data,
and hence it cannot perform well on the testing data either.
• If this is the case, our model cannot perform on new data and cannot be sent
into production.
• This situation, where the model cannot find patterns in our training set and
hence fails on both seen and unseen data, is called Underfitting.
• The figure below shows an example of underfitting. As we can see, the model
has found no patterns in our data: the line of best fit is a straight line
that does not pass through any of the data points. The model has failed to
train properly on the given data and cannot predict new data either.
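A minimal sketch of underfitting in code (an assumed example, not from the slides): fitting a straight line to data generated from a clearly non-linear curve gives a large error on the training data and on new data alike.

import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, 50)
x_test = rng.uniform(-3, 3, 50)
y_train = x_train ** 2 + rng.normal(0, 0.3, 50)   # true pattern is quadratic
y_test = x_test ** 2 + rng.normal(0, 0.3, 50)

line = np.polyfit(x_train, y_train, deg=1)        # degree-1 model: y = a + b*x

def mse(x, y):
    return np.mean((np.polyval(line, x) - y) ** 2)

print(mse(x_train, y_train), mse(x_test, y_test))  # both errors are large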
Variance
• Variance is the very opposite of bias. During training, we allow our model to
‘see’ the data a certain number of times so that it can find patterns in it.
• If the model does not work on the data for long enough, it will not find the
patterns, and bias occurs. On the other hand, if our model is allowed to view
the data too many times, it will learn very well, but only for that data.
• It will capture most patterns in the data, but it will also learn from the
unnecessary parts of the data, that is, from the noise.
• We can define variance as the model’s sensitivity to fluctuations in the
data. A model that learns from noise will come to consider trivial features
as important.
• In the figure above, we can see that our model has learned the training data
extremely well, and that data has taught it to identify cats.
• But when given new data, such as a picture of a fox, our model predicts it as
a cat, because that is what it has learned. This happens when the variance is
high: our model captures all the features of the data given to it, including
the noise, tunes itself to that data, and predicts it very well; but when
given new data, it cannot predict well, because it has become too specific to
the training data.
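The mirror image of the earlier underfitting sketch (again an invented example): a degree-15 polynomial fit to 16 noisy points memorizes the training data, including its noise, and then fails on fresh points drawn from the same curve.

import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(-1, 1, 16)
x_test = rng.uniform(-1, 1, 100)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.2, 16)
y_test = np.sin(3 * x_test) + rng.normal(0, 0.2, 100)

wiggly = np.polyfit(x_train, y_train, deg=15)   # enough freedom to fit the noise
train_mse = np.mean((np.polyval(wiggly, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(wiggly, x_test) - y_test) ** 2)
print(train_mse)   # near zero: the training data is predicted "very well"
print(test_mse)    # large: the model is too specific to the training data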
Bias-Variance Tradeoff
• For any model, we have to find the perfect balance between Bias and
Variance.
• This ensures that we capture the essential patterns in our model while
ignoring the noise present in it. This is called the Bias-Variance Tradeoff.
It helps optimize the error in our model and keeps it as low as possible.
• An optimized model will be sensitive to the patterns in our data, but at the
same time will be able to generalize to new data. For this, both the bias and
the variance should be low, so as to prevent overfitting and underfitting.
• We can see that when bias is high, the error on both the training and testing
sets is also high.
• If we have high variance, the model performs well on the training set (there
the error is low) but gives a high error on the testing set.
• We can see that there is a region in the middle where the error on both the
training and testing sets is low, and the bias and variance are in balance.
The best fit is when the predictions are concentrated at the center, i.e., at the bull’s eye. We can see
that as we get farther and farther away from the center, the error in our model increases. The best
model is one where bias and variance are both low.
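The tradeoff can be seen numerically in a sketch like the following (an assumed experiment, not from the slides): sweeping the polynomial degree, the training error falls steadily, while the test error traces a U-shape whose bottom is the balance point between bias and variance.

import numpy as np

rng = np.random.default_rng(2)
x_train = rng.uniform(-1, 1, 40)
x_test = rng.uniform(-1, 1, 200)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.2, 40)
y_test = np.sin(3 * x_test) + rng.normal(0, 0.2, 200)

for deg in (1, 3, 5, 9, 15):
    p = np.polyfit(x_train, y_train, deg)
    train = np.mean((np.polyval(p, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(p, x_test) - y_test) ** 2)
    print(f"degree={deg:2d}  train MSE={train:.3f}  test MSE={test:.3f}")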
Theorem of total probability
• Let B1, B2, …, BN be mutually exclusive events whose union equals the
sample space S. We refer to these sets as a partition of S.
• An event A can be represented as:
A = (A ∩ B1) ∪ (A ∩ B2) ∪ … ∪ (A ∩ BN)
• Since B1, B2, …, BN are mutually exclusive, then
P(A) = P(A ∩ B1) + P(A ∩ B2) + … + P(A ∩ BN)
• And therefore
P(A) = P(A|B1)*P(B1) + P(A|B2)*P(B2) + … + P(A|BN)*P(BN)
= Σi P(A | Bi) * P(Bi)
This is known as exhaustive conditionalization, or marginalization.
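A tiny numeric sketch of the theorem in Python (the numbers are invented), with a three-set partition:

# P(A) by conditioning on a partition B1, B2, B3 of the sample space
p_B = [0.5, 0.3, 0.2]            # P(Bi); a partition, so these sum to 1
p_A_given_B = [0.1, 0.4, 0.8]    # P(A | Bi)
p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))
print(p_A)                       # 0.05 + 0.12 + 0.16 = 0.33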
Bayes theorem
• P(A ∩ B) = P(B) * P(A | B) = P(A) * P(B | A)
• Rearranging gives:
P(B | A) = P(A | B) * P(B) / P(A)
where P(B | A) is the posterior probability, P(A | B) is the conditional
probability (the likelihood), P(B) is the prior of B, and P(A) is the prior
of A (the normalizing constant).
This is known as Bayes’ Theorem or Bayes’ Rule, and it is (one of) the most
useful relations in probability and statistics.
Bayes’ Theorem is definitely the fundamental relation in Statistical Pattern
Recognition.
Bayes theorem (cont’d)
• Let B1, B2, …, BN be a partition of the sample space S.
Suppose that event A occurs; what is the probability of
event Bj?
• P(Bj | A) = P(A | Bj) * P(Bj) / P(A)
= P(A | Bj) * P(Bj) / Σj P(A | Bj) * P(Bj)
Here the Bj are different models / hypotheses: P(Bj | A) is the posterior
probability, P(A | Bj) the likelihood, P(Bj) the prior of Bj, and P(A) the
normalizing constant (given by the theorem of total probability).
Given the observation A, should you choose the model that maximizes P(Bj | A)
or P(A | Bj)? That depends on how much you know about the Bj!
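The following sketch makes that question concrete (all numbers are invented): with the same likelihoods P(A | Bj), maximizing the posterior (the MAP choice) and maximizing the likelihood (the ML choice) can select different hypotheses once the priors P(Bj) are taken into account.

likelihood = {"B1": 0.9, "B2": 0.2}   # P(A | Bj)
prior = {"B1": 0.05, "B2": 0.95}      # P(Bj): B1 is assumed rare a priori

unnorm = {b: likelihood[b] * prior[b] for b in prior}
evidence = sum(unnorm.values())       # P(A), by the theorem of total probability
posterior = {b: p / evidence for b, p in unnorm.items()}

print(max(likelihood, key=likelihood.get))   # ML picks B1 (largest likelihood)
print(max(posterior, key=posterior.get))     # MAP picks B2 (prior dominates)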
Another example
• We’ve talked about the casino’s dice: 99% are fair, 1% are loaded (a loaded
die shows a six 50% of the time)
• We said that if we randomly pick a die and roll it, we have a 17% chance of
getting a six
• If we get three sixes in a row, what’s the chance that the die is loaded?
• How about five sixes in a row?
• P(loaded | 666)
= P(666 | loaded) * P(loaded) / P(666)
= 0.5^3 * 0.01 / (0.5^3 * 0.01 + (1/6)^3 * 0.99)
≈ 0.21
• P(loaded | 66666)
= P(66666 | loaded) * P(loaded) / P(66666)
= 0.5^5 * 0.01 / (0.5^5 * 0.01 + (1/6)^5 * 0.99)
≈ 0.71
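The same computation written out in Python (a direct transcription of the arithmetic above):

prior_loaded, prior_fair = 0.01, 0.99

def posterior_loaded(n_sixes):
    # P(loaded | n sixes) = P(data | loaded) * P(loaded) / P(data)
    like_loaded = 0.5 ** n_sixes       # loaded die: P(six) = 0.5
    like_fair = (1 / 6) ** n_sixes     # fair die:   P(six) = 1/6
    evidence = like_loaded * prior_loaded + like_fair * prior_fair
    return like_loaded * prior_loaded / evidence

print(posterior_loaded(3))   # ~0.21
print(posterior_loaded(5))   # ~0.71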
Classification : Bayes’ decision theory