2. Acknowledgement
● These slides are mainly inspired by the online course offered by Prof. Andrew Ng (Stanford University) on Coursera.
● The slides and videos are available online at:
Coursera: https://www.coursera.org/learn/machine-learning
YouTube: https://www.youtube.com/watch?v=qeHZOdmJvFU&list=PLZ9qNFMHZ-A4rycgrgOYma6zxF4BZGGPW
3. Regression? Which curve better represents the data pattern?
[Three plots of House Price vs. House Size, fitted with:
θ0 + θ1x    |    θ0 + θ1x + θ2x²    |    θ0 + θ1x + θ2x² + θ3x³]
● x = house size
● Which curve better predicts the price of a house?
5. Examples: Regression
● House price prediction
[Plot: House Price (10–30) vs. House Size (5–20)]
7. Examples: Regression
● GPA prediction
[Plot: GPA (1–4) vs. FSc Marks (600–900)]
● Regression: predict a continuous-valued output.
● Supervised learning: the right answer is given for each example.
8. Examples: Regression
● Current prediction
● Reinventing Ohm's Law: V = I×R
[Plot: Current (10–30) vs. Voltage Applied (5–20)]
● Regression: predict a continuous-valued output.
● Supervised learning: the right answer is given for each example.
10. Examples: Regression
● Predicting the score of a rain-affected match, e.g., Duckworth-Lewis
[Plot: Runs Scored in the Remaining Overs (10–30) vs. Overs Remaining (5–20)]
● Regression: predict a continuous-valued output.
● Supervised learning: the right answer is given for each example.
11. Regression? Why not just fit a curve?
[Three plots of House Price vs. House Size, fitted with:
θ0 + θ1x    |    θ0 + θ1x + θ2x²    |    θ0 + θ1x + θ2x² + θ3x³]
● x = house size
● Which curve better predicts the price of a house?
12. Linear Regression: How to choose the line?
[Plot: House Price vs. House Size, with three candidate lines, each of the form θ0 + θ1x]
● How to automatically choose the best line from the infinitely many possible lines?
13. Regression Notation
Training set of housing prices (m rows):

Size in feet² (x) | House Price in 1000$ (y)
2104              | 460
1416              | 232
1534              | 315
...               | ...

● m = number of training examples
● x = input variable / feature
● y = target or output variable
● (x, y) = one training example
● (x(i), y(i)) = the i-th training example
● x(1) = 2104, x(2) = 1416, y(1) = 460
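The notation above can be sketched in code. A minimal example using the three rows from the slide's training set (Python lists are 0-indexed, so x(1) on the slide is `x[0]` here):

```python
# Training set from the slide: size in square feet (x), price in $1000s (y).
training_set = [(2104, 460), (1416, 232), (1534, 315)]

m = len(training_set)                 # m = number of training examples
x = [ex[0] for ex in training_set]    # input variable / feature
y = [ex[1] for ex in training_set]    # target or output variable

print(m)      # 3
print(x[0])   # x(1) = 2104
print(y[0])   # y(1) = 460
```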
15. Regression Notation
[Same training set of housing prices as on the previous slide]
● Hypothesis: hθ(x) = θ0 + θ1x
● θi's: parameters
● How to choose the θi's automatically?
[Plot: House Price vs. House Size with three candidate lines of the form θ0 + θ1x]
16. How to choose θi's automatically?
● Hypothesis: hθ(x) = θ0 + θ1x
● θi's: parameters
● Let's choose θ0 and θ1 so that hθ(x) is close to y for our training examples (x, y)
[Plot: House Price vs. House Size with three candidate lines of the form θ0 + θ1x]
● Cost function (squared error):
  J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (hθ(x(i)) − y(i))²
            = (1/2m) Σ_{i=1}^{m} (θ0 + θ1x(i) − y(i))²
● Goal: minimize J(θ0, θ1) over θ0, θ1
● Minimize the squared error cost function
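The hypothesis and cost function above translate directly into code. A minimal sketch, using a toy training set where y = x exactly (the same data the later slides use):

```python
def h(theta0, theta1, x):
    """Hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def J(theta0, theta1, xs, ys):
    """Squared-error cost J = (1/2m) * sum over i of (h(x_i) - y_i)^2."""
    m = len(xs)
    return sum((h(theta0, theta1, xi) - yi) ** 2
               for xi, yi in zip(xs, ys)) / (2 * m)

# Toy training set where y = x exactly.
xs, ys = [1, 2, 3], [1, 2, 3]
print(J(0, 1, xs, ys))   # 0.0: the line y = x fits this data perfectly
```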
17. How to choose θi's automatically?
● Hypothesis: hθ(x) = θ0 + θ1x;  θ0, θ1: parameters
● Let's set θ0 = 0
● Simplified hypothesis: hθ(x) = θ1x
● Cost function: J(θ1) = (1/2m) Σ_{i=1}^{m} (θ1x(i) − y(i))²
● Goal: minimize J(θ1) over θ1
[Left plot: training data on the line hθ(x) = x, with the candidate line θ1 = 0.5; right plot: J(θ1) vs. θ1]
● J(0.5) = ((1 − 0.5)² + (2 − 1)² + (3 − 1.5)²)/(2×3) ≈ 0.58
18. How to choose θi's automatically?
● Same simplified hypothesis hθ(x) = θ1x and cost function J(θ1) as on the previous slide
[Left plot: training data on the line hθ(x) = x, with the candidate line θ1 = 1.5; right plot: J(θ1) vs. θ1]
● J(1.5) = ((1 − 1.5)² + (2 − 3)² + (3 − 4.5)²)/(2×3) ≈ 0.58
19. How to choose θi's automatically?
● Same simplified hypothesis hθ(x) = θ1x and cost function J(θ1) as on the previous slides
[Left plot: training data on the line hθ(x) = x, with the candidate line θ1 = 1; right plot: J(θ1) vs. θ1]
● J(1) = ((1 − 1)² + (2 − 2)² + (3 − 3)²)/(2×3) = 0
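The three cost evaluations above can be checked numerically. A short sketch, using the training set {(1,1), (2,2), (3,3)} from the plots:

```python
def J(theta1, xs, ys):
    """Cost for the simplified hypothesis h(x) = theta1 * x."""
    m = len(xs)
    return sum((theta1 * xi - yi) ** 2 for xi, yi in zip(xs, ys)) / (2 * m)

xs, ys = [1, 2, 3], [1, 2, 3]     # training set used on the slides

print(round(J(0.5, xs, ys), 2))   # 0.58
print(round(J(1.5, xs, ys), 2))   # 0.58
print(J(1.0, xs, ys))             # 0.0, the minimum
```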
20. How to choose θi's automatically?
Outline
● Have some function J(θ1)
● Goal: minimize J(θ1) over θ1
● Start with some θ1, e.g., θ1 = 0.5
● Keep changing θ1 to reduce J(θ1) until we reach the minimum
[Left plot: training data with hθ(x) = θ1x; right plot: J(θ1) vs. θ1]
● What counts as the minimum? J(θ1) = 0, or a change smaller than some eps
21. [Plot: J(θ1) vs. θ1, with colored tangent lines at several points]
● Which of the following is true?
● Blue slope (gradient) is negative
● Red slope (gradient) is positive
● Magenta slope is less negative than blue slope
● Yellow slope is close to zero
22. [Plot: J(θ1) vs. θ1, with colored tangent lines at several points]
● Which of the following is true?
● Blue slope (gradient) is negative
● Red slope (gradient) is positive
● Magenta slope is less negative than blue slope
● Yellow slope is close to zero
● If the slope is negative you want to increase θ1
● If the slope is positive you want to decrease θ1
23. Gradient Descent Algorithm
[Plot: J(θ1) vs. θ1, with colored tangent lines at several points]
● If the slope is negative you want to increase θ1
● If the slope is positive you want to decrease θ1
● Update rule: θ1 := θ1 − α · (d/dθ1) J(θ1), with α = 1
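The update rule can be sketched in code. For J(θ1) = (1/2m) Σ (θ1x(i) − y(i))², the derivative is (d/dθ1) J(θ1) = (1/m) Σ (θ1x(i) − y(i))·x(i). The example below uses a smaller learning rate than the slide's α = 1, since on this toy data α = 1 would overshoot:

```python
def dJ_dtheta1(theta1, xs, ys):
    """Derivative of J(theta1) = (1/2m) * sum (theta1*x_i - y_i)^2."""
    m = len(xs)
    return sum((theta1 * xi - yi) * xi for xi, yi in zip(xs, ys)) / m

xs, ys = [1, 2, 3], [1, 2, 3]   # training set from the earlier slides

theta1 = 0.5    # starting point, as in the outline slide
alpha = 0.1     # smaller than the slide's alpha = 1, for stable steps here
for _ in range(100):
    theta1 = theta1 - alpha * dJ_dtheta1(theta1, xs, ys)

print(round(theta1, 4))   # converges to 1.0, the minimizer found earlier
```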
31. What does the cost function J(θ0, θ1) look like?
● Does it matter where we start from?
● Is the solution unique?
[Surface plot of J(θ0, θ1) with a marked start point]
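Both questions can be probed numerically: for linear regression the squared-error cost is a convex bowl, so gradient descent lands on the same minimum from any start. A minimal sketch on the toy data from the earlier slides, running full gradient descent on (θ0, θ1) from two very different starting points:

```python
def step(theta0, theta1, xs, ys, alpha):
    """One gradient-descent step on J(theta0, theta1)."""
    m = len(xs)
    errs = [theta0 + theta1 * xi - yi for xi, yi in zip(xs, ys)]
    g0 = sum(errs) / m                               # dJ/dtheta0
    g1 = sum(e * xi for e, xi in zip(errs, xs)) / m  # dJ/dtheta1
    return theta0 - alpha * g0, theta1 - alpha * g1

xs, ys = [1, 2, 3], [1, 2, 3]

results = []
for t0, t1 in [(10.0, -5.0), (-8.0, 7.0)]:   # two different start points
    for _ in range(5000):
        t0, t1 = step(t0, t1, xs, ys, alpha=0.1)
    results.append((round(t0, 3), round(t1, 3)))

# Both runs converge to theta0 = 0, theta1 = 1: the bowl has one minimum.
print(results)
```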
52. ● Multivariate linear regression: multiple features (x1, x2, …, xn)
● Previously it was univariate linear regression: a single feature x.
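With multiple features, the hypothesis becomes hθ(x) = θ0 + θ1x1 + … + θnxn. A minimal sketch; the feature names and θ values below are made-up for illustration, not from the slides:

```python
def h(theta, x):
    """Multivariate hypothesis h_theta(x) = theta0 + theta1*x1 + ... + thetan*xn.

    theta has n+1 entries; x has n features (the intercept feature x0 = 1
    is implicit in theta[0])."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))

# Hypothetical house features: [size in feet^2, bedrooms, age in years].
theta = [80.0, 0.1, 25.0, -2.0]   # made-up parameters for illustration
print(h(theta, [2104, 3, 10]))    # one predicted price in $1000s
```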
58. ● How to craft new features?
● Hand-crafted features
● Is it possible to auto-create new features? Yes.
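One common way to hand-craft features is to build polynomial features from a raw input, which turns the linear model θ0 + θ1x into the quadratic and cubic curves seen on the earlier slides. A minimal sketch (the helper name is my own, not from the slides):

```python
def poly_features(x, degree):
    """Hand-craft polynomial features [x, x^2, ..., x^degree] from one raw feature."""
    return [x ** d for d in range(1, degree + 1)]

# Cubic features let a linear model in theta represent
# theta0 + theta1*x + theta2*x^2 + theta3*x^3.
print(poly_features(2, 3))   # [2, 4, 8]
```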