Gentlest Intro to Tensorflow (Part 3)
Khor Soon Hin, @neth_6, re:Culture
In collaboration with Sam & Edmund
Overview
● Multi-feature Linear Regression
● Logistic Regression
○ Multi-class prediction
○ Cross-entropy
○ Softmax
● Tensorflow Cheatsheet #1
Review: Predict from Single Feature with Linear Regression
Quick Review: Predict from Single Feature (House Size)
Quick Review: Use Linear Regression
Quick Review: Predict using Linear Regression
Linear Regression: Predict from Two (or More) Features
Two Features: House Size, Rooms
Source: teraplot.com
House Price, $
Rooms, unit
House Size, sqm
Same Issue: Predict for Values without Datapoint
Source: teraplot.com
House Price, $
Rooms, unit
House Size, sqm
Same Solution: Find Best-Fit
Source: teraplot.com
House Price, $
House Size, sqm
Rooms, unit
Prediction!
Review: Tensorflow Code
Tensorflow Code
# Model linear regression y = Wx + b
x = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
product = tf.matmul(x,W)
y = product + b
y_ = tf.placeholder(tf.float32, [None, 1])
# Cost function 1/n * sum((y_-y)**2)
cost = tf.reduce_mean(tf.square(y_-y))
# Training using Gradient Descent to minimize cost
train_step = tf.train.GradientDescentOptimizer(0.0000001).minimize(cost)
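Read as a deck, the graph above is everything needed to train. As a sanity check, here is a minimal NumPy sketch of the same model and gradient-descent loop — the data and learning rate are made up (tiny toy numbers, so the rate differs from the slide's 0.0000001):

```python
import numpy as np

# Toy made-up data: prices are exactly 2x the size, so the optimum is W=2, b=0
x = np.array([[1.0], [2.0], [3.0]])    # shape [m, 1], like the x placeholder
y_ = np.array([[2.0], [4.0], [6.0]])   # shape [m, 1], like the y_ placeholder

W = np.zeros((1, 1))                   # same zero-init as tf.zeros([1,1])
b = np.zeros(1)
lr = 0.05                              # learning rate chosen for this toy data

for _ in range(2000):
    y = x @ W + b                      # model: y = Wx + b
    err = y - y_
    cost = np.mean(err ** 2)           # cost = 1/m * sum((y_-y)**2)
    # Gradient-descent step (hand-derived gradients of the mean-squared cost)
    W -= lr * (2 / len(x)) * (x.T @ err)
    b -= lr * (2 / len(x)) * err.sum()
```

This is what `GradientDescentOptimizer(...).minimize(cost)` does for you: TensorFlow derives the gradients from the graph instead of requiring them by hand.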
Multi-feature: Change in Model & Cost Function
Model
1 Feature
y = W.x + b
y: House price prediction
x: House size
Goal: Find scalars W,b
2 Features
y = W.x + W2.x2 + b
y: House price prediction
x: House size
x2: Rooms
Goal: Find scalars W, W2, b
Tensorflow Graph
1 Feature
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
2 Features
y = tf.matmul(x, W) + tf.matmul(x2, W2) + b
W = tf.Variable(tf.zeros([1,1]))
W2 = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
x2 = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
Tensorflow Graph: Train
1 Feature
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
2 Features
y = tf.matmul(x, W) + tf.matmul(x2, W2) + b
W = tf.Variable(tf.zeros([1,1]))
W2 = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
x2 = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
Train: feed = { x: … , y_: … }
Train: feed = { x: … , x2: … , y_: … }
Tensorflow Graph: Scalability Issue
1 Feature
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
Train: feed = { x: … , y_: … }
2 Features
y = tf.matmul(x, W) + tf.matmul(x2, W2) + b
W = tf.Variable(tf.zeros([1,1]))
W2 = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
x2 = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
Train: feed = { x: … , x2: … , y_: … }
3 Features
y = tf.matmul(x, W) + tf.matmul(x2, W2) + tf.matmul(x3, W3) + b
W = tf.Variable(tf.zeros([1,1]))
W2 = tf.Variable(tf.zeros([1,1]))
W3 = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
x2 = tf.placeholder(tf.float32, [None, 1])
x3 = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
Train: feed = { x: … , x2: ... , x3: … , y_: … }
Model gets messy!
Data Representation
House #1 Size_1 Rooms_1 Price_1
House #2 Size_2 Rooms_2 Price_2
… … … ...
House #m Size_m Rooms_m Price_m
Feature values Actual outcome
House #1 Size_1 Rooms_1 Price_1
House #2 Size_2 Rooms_2 Price_2
… … … ...
House #m Size_m Rooms_m Price_m
Lots of Data Manipulation
2 Features
y = tf.matmul(x, W) + tf.matmul(x2, W2) + b
W = tf.Variable(tf.zeros([1,1]))
W2 = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
x2 = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
Train: feed = { x: … , x2: … , y_: … }
House #1 Size_1 Rooms_1 Price_1
House #2 Size_2 Rooms_2 Price_2
… … … ...
House #m Size_m Rooms_m Price_m
Lots of Data Manipulation 4
2 Features
y = tf.matmul(x, W) + tf.matmul(x2, W2) + b
W = tf.Variable(tf.zeros([1,1]))
W2 = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
x2 = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
Train: feed = { x: … , x2: … , y_: … }
Data manipulation
gets messy!
House #1 Size_1 Rooms_1 Price_1
House #2 Size_2 Rooms_2 Price_2
… … … ...
House #m Size_m Rooms_m Price_m
Lots of Data Manipulation 5
2 Features
y = tf.matmul(x, W) + tf.matmul(x2, W2) + b
W = tf.Variable(tf.zeros([1,1]))
W2 = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
x2 = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
Train: feed = { x: … , x2: … , y_: … }
y = W.x + W2.x2 + b
Find better way
Matrix: Cleaning Up Representations
Matrix Representation
House #0 Size_0 Rooms_0 Price_0
House #1 Size_1 Rooms_1 Price_1
… … … ...
House #m Size_m Rooms_m Price_m
Matrix Representation
House #0 Size_0 Rooms_0 Price_0
House #1 Size_1 Rooms_1 Price_1
… … … ...
House #m Size_m Rooms_m Price_m
House #0
House #1
…
House #m
Size_0 Rooms_0
Size_1 Rooms_1
… …
Size_m Rooms_m
Price_0
Price_1
...
Price_m
... ...
Align all data by row so NO need for label
Let’s Focus Here
Labels · Features · Actual
Matrix: Cleaning Up Models
House #1 Size_1 Rooms_1 Price_1
House #2 Size_2 Rooms_2 Price_2
… … … ...
House #m Size_m Rooms_m Price_m
Better Model Equation
2 Features
y = tf.matmul(x, W) + tf.matmul(x2, W2) + b
W = tf.Variable(tf.zeros([1,1]))
W2 = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
x2 = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
Train: feed = { x: … , x2: … , y_: … }
y = W.x + W2.x2 + b
Find better way
House #1 Size_1 Rooms_1 Price_1
House #2 Size_2 Rooms_2 Price_2
… … … ...
House #m Size_m Rooms_m Price_m
Better Model Equation
2 Features
y = tf.matmul(x, W) + tf.matmul(x2, W2) + b
W = tf.Variable(tf.zeros([1,1]))
W2 = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
x2 = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
Train: feed = { x: [size_i, rooms_i], … , x2: … , y_: … }
y = W.x + W2.x2 + b
Find better way
2 Features
y = tf.matmul(x, W) + tf.matmul(x2, W2) + b
W = tf.Variable(tf.zeros([1,1]))
W2 = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 2])
x2 = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
House #1 Size_1 Rooms_1 Price_1
House #2 Size_2 Rooms_2 Price_2
… … … ...
House #m Size_m Rooms_m Price_m
Better Model Equation
Train: feed = { x: [size_i, rooms_i], … , x2: … , y_: … }
y = W.x + W2.x2 + b
Find better way
2 Features
y = tf.matmul(x, W) + tf.matmul(x2, W2) + b
W = tf.Variable(tf.zeros([2,1]))
W2 = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 2])
x2 = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
House #1 Size_1 Rooms_1 Price_1
House #2 Size_2 Rooms_2 Price_2
… … … ...
House #m Size_m Rooms_m Price_m
Better Model Equation
Train: feed = { x: [size_i, rooms_i], … , x2: … , y_: … }
y = W.x + W2.x2 + b
Find better way
Tensorflow Graph (Messy)
1 Feature
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
2 Features
y = tf.matmul(x, W) + tf.matmul(x2, W2) + b
W = tf.Variable(tf.zeros([1,1]))
W2 = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
x2 = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
Tensorflow Graph (Clean)
1 Feature
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
2 Features
y = tf.matmul(x, W) + tf.matmul(x2, W2) + b
W = tf.Variable(tf.zeros([2,1]))
W2 = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 2])
x2 = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
Tensorflow Graph (Clean and Formatted)
1 Feature
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
2 Features
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([2,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 1])
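One way to see why the clean form works: the shapes line up so a single matmul covers all features at once. A small NumPy sketch of the shape bookkeeping (the batch size and values are made up):

```python
import numpy as np

m = 4                      # a made-up batch of 4 houses
X = np.ones((m, 2))        # each row is [size_i, rooms_i] -> shape [m, 2]
W = np.zeros((2, 1))       # one coefficient per feature   -> shape [2, 1]
b = np.zeros(1)            # broadcast across the whole batch

y = X @ W + b              # [m, 2] x [2, 1] -> [m, 1]: one prediction per house
```

Adding a feature now only widens `x` and `W`; the model line itself never changes.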
Tensorflow Graph (Illustration)
1 Feature
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
2 Features
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([2,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 1])
[Illustration: 1 feature — x is a scalar, W holds 1 coefficient, y and b are scalars; 2 features — x holds 2 feature values, W holds 2 coefficients]
Logistic Regression
Linear vs. Logistic Regression
[Plot: houses plotted by Size and Rooms]
Linear Regression: predict → price (scalar)
Logistic Regression: predict → number (discrete classes 0, 1, 2, …, 9)
Image
Image Features
Image Features 1
23 53 … 33
53 20 … 88
… … … ....
62 2 … 193
Grayscale value of each pixel
Logistic Regression: Change in Models
Model
Linear Regression
y = W.x + b
y: House price (scalar) prediction
x: [House size, Rooms]
Goal: Find scalars W,b
Logistic Regression
y = W.x + b
y: Discrete class [0,1,...9] prediction
x: [2-Dim pixel grayscale colors]
Goal: Find scalars W, b
Data Representation Comparison
House #1 Size_1 Rooms_1 Price_1
House #2 Size_2 Rooms_2 Price_2
… … … ...
House #m Size_m Rooms_m Price_m
Feature values Actual outcome
23 53 … 33
53 20 … 88
… … … ....
62 2 … 193
Image #1
Image #2
250 10 … 33
103 5 … 88
… … … ...
5 114 … 193
Feature values Actual outcome
Image #m ...
5
2
3
(x = feature values, y_ = actual outcome, in both tables)
Data Representation Comparison
House #1 Size_1 Rooms_1 Price_1
House #2 Size_2 Rooms_2 Price_2
… … … ...
House #m Size_m Rooms_m Price_m
Feature values Actual outcome
23 53 … 33
53 20 … 88
… … … ....
62 2 … 193
Image #1
Image #2
250 10 … 33
103 5 … 88
… … … ...
5 114 … 193
Feature values Actual outcome
Image #m ...
5
2
3
2 features (1-Dim) vs. X × Y pixels (2-Dim features)
(x = feature values, y_ = actual outcome, in both tables)
Data Representation Comparison
House #1 Size_1 Rooms_1 Price_1
House #2 Size_2 Rooms_2 Price_2
… … … ...
House #m Size_m Rooms_m Price_m
Feature values Actual outcome
Image #1
Image #2 250 10 … 33 103 5 … 88 … 5 114 … 193
Feature values Actual outcome
Image #m ...
5
2
3
2 features (1-Dim) vs. X·Y 1-Dim features
(x = feature values, y_ = actual outcome, in both tables)
23 53 … 33 53 20 … 88 … 62 2 … 193
Generalize into a multi-feature problem
Model
Linear Regression
y = W.x + b
y: House price (scalar) prediction
x: [House size, Rooms]
Goal: Find scalars W,b
Logistic Regression
y = W.x + b
y: Discrete class [0,1,...9] prediction
x: [2-Dim pixel grayscale colors]
x: [Pixel 1, Pixel 2, …, Pixel X.Y]
Goal: Find scalars W, b
This needs change as well!
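Flattening the 2-D pixel grid into the 1-D x vector above can be sketched in NumPy (28 × 28 is an assumed image size, as in MNIST; the pixel values are stand-ins):

```python
import numpy as np

img = np.arange(28 * 28).reshape(28, 28)   # stand-in 2-D grayscale image
x_row = img.reshape(1, -1)                 # [Pixel 1, Pixel 2, ..., Pixel X.Y] as one feature row
```

Each image becomes one row of X·Y feature values, exactly like one row of [size_i, rooms_i] in the house example.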
Why Can’t ‘y’ Be Left as a Scalar of 0-9?
HINT: Beside the model, when doing ML we need this function!
Logistic Regression: Change in Cost Function
Linear Regression: Cost
Scalar
Linear Regression: Cost, NOT
Number
6
7
5
4
3
2
0
1
9
8
Pixel grayscale values
Discrete
The magnitude of the difference (y_ - y) does NOT matter
Wrongly predicting 4 as 3 is as bad as wrongly predicting 9 as 3
Logistic Regression: Prediction
[Diagram: an image goes into the Logistic Regression model, which outputs a probability for each class 0-9; this prediction is compared against the actual distribution]
Cross-Entropy: Cost Function 1
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
Cross-Entropy: Cost Function 2
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
Prediction
Probability
0 1 2 3 4 5 6 7 8 9
Probability
0 1 2 3 4 5 6 7 8 9
Actual
Cross-Entropy: Cost Function 3
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
Prediction
-log (Probability)
0 1 2 3 4 5 6 7 8 9
Probability
0 1 2 3 4 5 6 7 8 9
Actual
Almost inverse because Probability < 1
Graph: -log(probability)
[Plot: -log(probability) against probability, for probability between 0 and 1]
Probability is always between 0 and 1.
Thus -log(probability) shrinks toward 0 as the probability approaches 1, and grows without bound as it approaches 0.
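A quick numeric check of that shape (the probability values are chosen arbitrarily):

```python
import math

probs = [0.01, 0.5, 0.99]                # arbitrary probabilities
costs = [-math.log(p) for p in probs]    # -log is huge near 0, tiny near 1
```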
Cross-Entropy: Cost Function
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
Prediction
0 1 2 3 4 5 6 7 8 9
Probability
0 1 2 3 4 5 6 7 8 9
Actual
-log (Probability)
Cross-Entropy: Cost Function
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
Prediction
0 1 2 3 4 5 6 7 8 9
Probability
0 1 2 3 4 5 6 7 8 9
Actual
-log (Probability)
● Only the term where y’i = 1 matters
● The smaller -log(yi), the lower the cost
● The higher yi (the predicted probability for the actual class), the lower the cost
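These bullets can be checked numerically. A small NumPy sketch with made-up predictions for an actual digit of 3:

```python
import numpy as np

def cross_entropy(y_pred, y_act):
    # cross_entropy = -sum(y_ * log(y)); only the true class's term survives
    return -np.sum(y_act * np.log(y_pred))

y_actual = np.zeros(10)
y_actual[3] = 1.0                    # one-hot: the actual digit is 3

confident = np.full(10, 0.1 / 9)     # made-up prediction: 0.9 mass on class 3
confident[3] = 0.9
unsure = np.full(10, 0.1)            # made-up prediction: uniform guess
```

Being confident and right (`confident`) costs -log(0.9); hedging (`unsure`) costs -log(0.1), which is much higher.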
Logistic Regression: Prediction
[Diagram: Logistic Regression — an image in, a probability for each class 0-9 out; Linear Regression — an image in, a single scalar out]
Tensorflow Graph (Review)
1 Feature
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
2 Features
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([2,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 1])
Tensorflow Graph (Review 2)
1 Feature
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
2 Features
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([2,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 1])
[Illustration: 1 feature — 1 coefficient; 2 features — 2 coefficients; y, b scalar]
Apply multi-feature linear regression
Logistic Regression: Prediction
[Diagram: Logistic Regression — an image in, a probability for each class 0-9 out; Linear Regression — an image in, a single scalar out]
y = tf.matmul(x, W) + b
[Illustration: n features, 1 output — x holds n feature values, W holds n coefficients, y and b are scalars]
y = tf.matmul(x, W) + b
[Illustration: n features, k classes — W holds n × k coefficients, y and b each hold k class values]
Tensorflow Graph: Basic, Multi-feature, Multi-class
1 Feature
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([1,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
2 Features
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([2,1]))
b = tf.Variable(tf.zeros([1]))
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 1])
[Illustration: 1 feature — 1 coefficient; 2 features — 2 coefficients; y, b scalar]
2 Features, 10 Classes
y = tf.matmul(x, W) + b
W = tf.Variable(tf.zeros([2,10]))
b = tf.Variable(tf.zeros([10]))
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 10])
[Illustration: 2 features, k classes — W holds 2 × k coefficients, y and b each hold k class values]
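The multi-class shapes can be verified the same way (the batch size and values are made up):

```python
import numpy as np

m = 5                       # a made-up batch of 5 examples
X = np.ones((m, 2))         # 2 features per example, as on the slide
W = np.zeros((2, 10))       # 2 features x 10 classes
b = np.zeros(10)            # one bias per class

y = X @ W + b               # [m, 2] x [2, 10] -> [m, 10]: 10 class scores per example
```

Going multi-class only widens `W`, `b`, and `y_` from 1 column to k columns; the model line is still one matmul.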
Logistic Regression: Prediction
[Diagram: an image goes into the Logistic Regression model, which outputs a probability for each class 0-9, compared against the actual distribution]
Great, but… the sum of all the prediction ‘probabilities’ is NOT 1
[Diagram: the model’s raw output (a ‘Value’ for each class 0-9) is exponentiated, and each exp(Value) is divided by the SUM of all the exp(Value)s]
sum of all prediction ‘probabilities’ is 1
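The exp-then-normalize step above is exactly softmax. A minimal NumPy version (subtracting the max is a standard numerical-stability trick, not part of the slide):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())   # subtract the max for numerical stability
    return e / e.sum()        # divide each exp(Value) by the SUM of all of them

scores = np.array([2.0, 1.0, 0.1])   # made-up raw 'Value' outputs
probs = softmax(scores)
```

The outputs keep the ordering of the raw scores but now form a valid probability distribution.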
Before softmax
y = tf.matmul(x, W) + b
With softmax
y = tf.nn.softmax(tf.matmul(x, W) + b)
Summary
● Cheat sheet: Single feature, Multi-feature, Multi-class
● Logistic regression:
○ Multi-class prediction: Ensure that prediction is one of a discrete set of values
○ Cross-entropy: Measure difference between multi-class prediction and actual
○ Softmax: Ensure the multi-class prediction probability is a valid distribution (sum = 1)
Congrats!
You can now understand Google’s Tensorflow Beginner’s Tutorial
(https://www.tensorflow.org/versions/r0.7/tutorials/mnist/beginners/index.html)
References
● Perform ML with TF using multi-feature linear regression (the wrong way)
○ https://github.com/nethsix/gentle_tensorflow/blob/master/code/linear_regression_multi_feature_using_mini_batch_with_tensorboard.py
● Perform ML with TF using multi-feature linear regression
○ https://github.com/nethsix/gentle_tensorflow/blob/master/code/linear_regression_multi_feature_using_mini_batch_without_matrix_with_tensorboard.py
● Tensorflow official tutorial for character recognition
○ https://www.tensorflow.org/versions/r0.7/tutorials/mnist/beginners/index.html
● Colah’s excellent explanation of cross-entropy
○ http://colah.github.io/posts/2015-09-Visual-Information/

More Related Content

PDF
Gentlest Introduction to Tensorflow
PDF
Gentlest Introduction to Tensorflow - Part 2
PPTX
TensorFlow in Practice
PPTX
Explanation on Tensorflow example -Deep mnist for expert
PDF
Google TensorFlow Tutorial
PDF
TensorFlow Tutorial
PPTX
TensorFlow
PPTX
Introduction to Tensorflow
Gentlest Introduction to Tensorflow
Gentlest Introduction to Tensorflow - Part 2
TensorFlow in Practice
Explanation on Tensorflow example -Deep mnist for expert
Google TensorFlow Tutorial
TensorFlow Tutorial
TensorFlow
Introduction to Tensorflow

What's hot (18)

PDF
Tensor board
PDF
About RNN
PDF
About RNN
PDF
Python book
PDF
Introduction to TensorFlow, by Machine Learning at Berkeley
PDF
Pythonbrasil - 2018 - Acelerando Soluções com GPU
PPTX
Intro to Python (High School) Unit #3
PPTX
30 分鐘學會實作 Python Feature Selection
PDF
Baby Steps to Machine Learning at DevFest Lagos 2019
PDF
Additive model and boosting tree
PDF
What is TensorFlow and why do we use it
PDF
Sergey Shelpuk & Olha Romaniuk - “Deep learning, Tensorflow, and Fashion: how...
PDF
Pybelsberg — Constraint-based Programming in Python
PDF
Kristhyan kurtlazartezubia evidencia1-metodosnumericos
PDF
Neural networks with python
PPTX
深層学習後半
PPTX
Font classification with 5 deep learning models using tensor flow
PDF
[신경망기초] 합성곱신경망
Tensor board
About RNN
About RNN
Python book
Introduction to TensorFlow, by Machine Learning at Berkeley
Pythonbrasil - 2018 - Acelerando Soluções com GPU
Intro to Python (High School) Unit #3
30 分鐘學會實作 Python Feature Selection
Baby Steps to Machine Learning at DevFest Lagos 2019
Additive model and boosting tree
What is TensorFlow and why do we use it
Sergey Shelpuk & Olha Romaniuk - “Deep learning, Tensorflow, and Fashion: how...
Pybelsberg — Constraint-based Programming in Python
Kristhyan kurtlazartezubia evidencia1-metodosnumericos
Neural networks with python
深層学習後半
Font classification with 5 deep learning models using tensor flow
[신경망기초] 합성곱신경망
Ad

Similar to Gentlest Introduction to Tensorflow - Part 3 (20)

PDF
Tensor flow description of ML Lab. document
PDF
The TensorFlow dance craze
PDF
Lucio Floretta - TensorFlow and Deep Learning without a PhD - Codemotion Mila...
PPTX
Introduction to TensorFlow
PPTX
Wrangle 2016: (Lightning Talk) FizzBuzz in TensorFlow
PPTX
Machine Learning with Tensorflow
PDF
Developing a ML model using TF Estimator
PPTX
Introduction to Neural Networks and Deep Learning from Scratch
DOCX
When running the code below I am getting some errors (see image)- The.docx
PPTX
TensorFlow for IITians
PDF
maXbox starter65 machinelearning3
PDF
Introducton to Convolutional Nerural Network with TensorFlow
PDF
Gradient Descent Code Implementation.pdf
PDF
Assignment 5.3.pdf
PDF
딥러닝 교육 자료 #2
PPTX
Machine Learning - Introduction to Tensorflow
DOCX
Please help fill in the missing code below in order for it run correct.docx
PDF
Introduction to TensorFlow 2.0
PDF
TensorFlow 2: New Era of Developing Deep Learning Models
PPTX
wk5ppt1_Titanic
Tensor flow description of ML Lab. document
The TensorFlow dance craze
Lucio Floretta - TensorFlow and Deep Learning without a PhD - Codemotion Mila...
Introduction to TensorFlow
Wrangle 2016: (Lightning Talk) FizzBuzz in TensorFlow
Machine Learning with Tensorflow
Developing a ML model using TF Estimator
Introduction to Neural Networks and Deep Learning from Scratch
When running the code below I am getting some errors (see image)- The.docx
TensorFlow for IITians
maXbox starter65 machinelearning3
Introducton to Convolutional Nerural Network with TensorFlow
Gradient Descent Code Implementation.pdf
Assignment 5.3.pdf
딥러닝 교육 자료 #2
Machine Learning - Introduction to Tensorflow
Please help fill in the missing code below in order for it run correct.docx
Introduction to TensorFlow 2.0
TensorFlow 2: New Era of Developing Deep Learning Models
wk5ppt1_Titanic
Ad

More from Khor SoonHin (6)

PDF
The Many Flavors of OAuth - Understand Everything About OAuth2
PDF
Rails + Webpack
PDF
RMSLE cost function
PDF
Tokyo React.js #3: Missing Pages: ReactJS/Flux/GraphQL/RelayJS
PDF
Tokyo React.js #3 Meetup (ja): Missing Pages: ReactJS/GraphQL/RelayJS
PDF
From Back to Front: Rails To React Family
The Many Flavors of OAuth - Understand Everything About OAuth2
Rails + Webpack
RMSLE cost function
Tokyo React.js #3: Missing Pages: ReactJS/Flux/GraphQL/RelayJS
Tokyo React.js #3 Meetup (ja): Missing Pages: ReactJS/GraphQL/RelayJS
From Back to Front: Rails To React Family

Recently uploaded (20)

PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PPTX
sap open course for s4hana steps from ECC to s4
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
KodekX | Application Modernization Development
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
Encapsulation theory and applications.pdf
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PPTX
MYSQL Presentation for SQL database connectivity
PPTX
Spectroscopy.pptx food analysis technology
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
cuic standard and advanced reporting.pdf
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
NewMind AI Weekly Chronicles - August'25 Week I
Unlocking AI with Model Context Protocol (MCP)
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
sap open course for s4hana steps from ECC to s4
The AUB Centre for AI in Media Proposal.docx
KodekX | Application Modernization Development
Building Integrated photovoltaic BIPV_UPV.pdf
Encapsulation theory and applications.pdf
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Spectral efficient network and resource selection model in 5G networks
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
MYSQL Presentation for SQL database connectivity
Spectroscopy.pptx food analysis technology
Diabetes mellitus diagnosis method based random forest with bat algorithm
cuic standard and advanced reporting.pdf
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Per capita expenditure prediction using model stacking based on satellite ima...
NewMind AI Weekly Chronicles - August'25 Week I

Gentlest Introduction to Tensorflow - Part 3

  • 1. Gentlest Intro to Tensorflow (Part 3) Khor Soon Hin, @neth_6, re:Culture In collaboration with Sam & Edmund
  • 2. Overview ● Multi-feature Linear Regression ● Logistic Regression ○ Multi-class prediction ○ Cross-entropy ○ Softmax ● Tensorflow Cheatsheet #1
  • 3. Review: Predict from Single Feature with Linear Regression
  • 4. Quick Review: Predict from Single Feature (House Size)
  • 5. Quick Review: Use Linear Regression
  • 6. Quick Review: Predict using Linear Regression
  • 7. Linear Regression: Predict from Two (or More) Features
  • 8. Two Features: House Size, Rooms Source: teraplot.com House Price, $ Rooms, unit House Size, sqm
  • 9. Same Issue: Predict for Values without Datapoint Source: teraplot.com House Price, $ Rooms, unit House Size, sqm
  • 10. Same Solution: Find Best-Fit Source: teraplot.com House Price, $ House Size, sqm Rooms, unit Prediction!
  • 12. Tensorflow Code # Model linear regression y = Wx + b x = tf.placeholder(tf.float32, [None, 1]) W = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) product = tf.matmul(x,W) y = product + b y_ = tf.placeholder(tf.float32, [None, 1]) # Cost function 1/n * sum((y_-y)**2) cost = tf.reduce_mean(tf.square(y_-y)) # Training using Gradient Descent to minimize cost train_step = tf.train.GradientDescentOptimizer(0.0000001).minimize(cost)
  • 13. Multi-feature: Change in Model & Cost Function
  • 14. Model 1 Feature y = W.x + b y: House price prediction x: House size Goal: Find scalars W,b
  • 15. Model 1 Feature y = W.x + b y: House price prediction x: House size Goal: Find scalars W,b 2 Features y = W.x + W2.x2 + b y: House price prediction x: House size x2: Rooms Goal: Find scalars W, W2, b
  • 16. Tensorflow Graph 1 Feature y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1])
  • 17. Tensorflow Graph 1 Feature y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([1,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1])
  • 18. Tensorflow Graph: Train 1 Feature y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([1,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) Train: feed = { x: … , y_: … } Train: feed = { x: … , x2: … , y_: … }
  • 19. Tensorflow Graph: Scalability Issue 1 Feature y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([1,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) Train: feed = { x: … , y_: … } Train: feed = { x: … , x2: … , y_: … } 3 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + tf.matmul(x3, W3) + b W = tf.Variable(tf.zeros([1,1])) W2 = tf.Variable(tf.zeros([1,1])) W3 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) x2 = tf.placeholder(tf.float32, [None, 1]) x3 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) Train: feed = { x: … , x2: … , x3: … , y_: … } Model gets messy!
  • 20. Data Representation House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Feature values Actual outcome
  • 21. House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Lots of Data Manipulation 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([1,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) Train: feed = { x: … , x2: … , y_: … }
  • 22. House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Lots of Data Manipulation 2 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([1,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) Train: feed = { x: … , x2: … , y_: … }
  • 23. House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Lots of Data Manipulation 3 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([1,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) Train: feed = { x: … , x2: … , y_: … }
  • 24. House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Lots of Data Manipulation 4 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([1,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) Train: feed = { x: … , x2: … , y_: … } Data manipulation gets messy!
  • 25. House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Lots of Data Manipulation 4 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([1,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) Train: feed = { x: … , x2: … , y_: … } y = W.x + W2.x2 + b Find better way
  • 26. Matrix: Cleaning Up Representations
  • 27. Matrix Representation House #0 Size_0 Rooms_0 Price_0 House #1 Size_1 Rooms_1 Price_1 … … … ... House #m Size_m Rooms_m Price_m
  • 28. Matrix Representation House #0 Size_0 Rooms_0 Price_0 House #1 Size_1 Rooms_1 Price_1 … … … ... House #m Size_m Rooms_m Price_m House #0 House #1 … House #m Size_0 Rooms_0 Size_1 Rooms_1 … … Size_m Rooms_m Price_0 Price_1 ... Price_m (column groups: Labels | Features | Actual)
  • 29. Matrix Representation House #0 Size_0 Rooms_0 Price_0 House #1 Size_1 Rooms_1 Price_1 … … … ... House #m Size_m Rooms_m Price_m House #0 House #1 … House #m Size_0 Rooms_0 Size_1 Rooms_1 … … Size_m Rooms_m Price_0 Price_1 ... Price_m Align all data by row so NO need for label (column groups: Labels | Features | Actual)
  • 30. Matrix Representation House #0 Size_0 Rooms_0 Price_0 House #1 Size_1 Rooms_1 Price_1 … … … ... House #m Size_m Rooms_m Price_m House #0 House #1 … House #m Size_0 Rooms_0 Size_1 Rooms_1 … … Size_m Rooms_m Price_0 Price_1 ... Price_m Align all data by row so NO need for label Let’s Focus Here (column groups: Labels | Features | Actual)
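The matrix layout above is easy to build in code: one row per house, one column per feature. A small NumPy sketch with made-up house values (the tuples are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical rows of (size, rooms, price) for m = 3 houses
houses = [
    (50.0, 1, 150.0),
    (70.0, 2, 200.0),
    (100.0, 3, 280.0),
]

# Features: one ROW per house, one COLUMN per feature -> shape [m, 2]
x = np.array([[size, rooms] for size, rooms, _ in houses])
# Actual outcomes: one row per house -> shape [m, 1]
y_ = np.array([[price] for _, _, price in houses])

print(x.shape)   # (3, 2)
print(y_.shape)  # (3, 1)
```

Because features and outcomes are aligned by row, no per-house label column is needed.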
  • 32. House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Better Model Equation 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([1,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) Train: feed = { x: … , x2: … , y_: … } y = W.x + W2.x2 + b Find better way
  • 33. House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Better Model Equation 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([1,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) Train: feed = { x: … , x2: … , y_: … } y = W.x + W2.x2 + b Find better way
  • 34. House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Better Model Equation 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([1,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) Train: feed = { x: [size_i, rooms_i], … , x2: … , y_: … } y = W.x + W2.x2 + b Find better way
  • 35. 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([1,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 2]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Better Model Equation Train: feed = { x: [size_i, rooms_i], … , x2: … , y_: … } y = W.x + W2.x2 + b Find better way
  • 36. 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([2,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 2]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Better Model Equation Train: feed = { x: [size_i, rooms_i], … , x2: … , y_: … } y = W.x + W2.x2 + b Find better way
  • 37. 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([2,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 2]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Better Model Equation Train: feed = { x: [size_i, rooms_i], … , x2: … , y_: … } y = W.x + W2.x2 + b Find better way
  • 38. 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([2,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 2]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Better Model Equation Train: feed = { x: [size_i, rooms_i], … , x2: … , y_: … } y = W.x + W2.x2 + b Find better way
  • 39. Tensorflow Graph (Messy) 1 Feature y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([1,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1])
  • 40. Tensorflow Graph (Clean) 1 Feature y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) 2 Features y = tf.matmul(x, W) + tf.matmul(x2, W2) + b W = tf.Variable(tf.zeros([2,1])) W2 = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 2]) x2 = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1])
  • 41. Tensorflow Graph (Clean and Formatted) 1 Feature y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) 2 Features y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([2,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 2]) y_ = tf.placeholder(tf.float32, [None, 1])
  • 42. Tensorflow Graph (Illustration) 1 Feature y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) 2 Features y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([2,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 2]) y_ = tf.placeholder(tf.float32, [None, 1]) (diagram: 1 feature × 1 coeff → scalar; 2 features × 2 coeffs → scalar)
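The cleaned-up graph works because of matrix shapes: an [m, 2] feature batch times a [2, 1] weight column yields one prediction per house in a single matmul. A NumPy shape check (random data; the batch size is arbitrary, standing in for the placeholder's None):

```python
import numpy as np

m = 4                      # any batch size ("None" in the placeholder)
x = np.random.rand(m, 2)   # [m, 2]: 2 features per house
W = np.zeros((2, 1))       # [2, 1]: one coefficient per feature
b = np.zeros(1)            # scalar bias, broadcast over the batch

y = x @ W + b              # [m, 2] @ [2, 1] -> [m, 1]
print(y.shape)             # (4, 1)
```

Adding a feature now only widens x and W; the model equation never changes.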
  • 44. Linear vs. Logistic Regression Size 1 42 3 5 Rooms ML price (scalar) Linear Regression predict
  • 45. Linear vs. Logistic Regression Size 1 42 3 5 Rooms ML price (scalar) ML 0 1 9 2 . . . number (discrete classes) Linear Regression Logistic Regression predict predict Image
  • 47. Image Features 1 23 53 … 33 53 20 … 88 … … … .... 62 2 … 193 Grayscale value of each pixel
  • 48. Image Features 2 23 53 … 33 53 20 … 88 … … … .... 62 2 … 193 Grayscale value of each pixel
  • 49. Image Features 3 23 53 … 33 53 20 … 88 … … … .... 62 2 … 193 Grayscale value of each pixel
  • 51. Model Linear Regression y = W.x + b y: House price (scalar) prediction x: [House size, Rooms] Goal: Find scalars W,b Logistic Regression y = W.x + b y: Discrete class [0,1,...9] prediction x: [2-Dim pixel grayscale values] Goal: Find scalars W, b
  • 52. Data Representation Comparison House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Feature values Actual outcome 23 53 … 33 53 20 … 88 … … … .... 62 2 … 193 Image #1 Image #2 250 10 … 33 103 5 … 88 … … … ... 5 114 … 193 Feature values Actual outcome Image #m ... 5 2 3 (columns labeled x and y_)
  • 53. Data Representation Comparison House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Feature values Actual outcome 23 53 … 33 53 20 … 88 … … … .... 62 2 … 193 Image #1 Image #2 250 10 … 33 103 5 … 88 … … … ... 5 114 … 193 Feature values Actual outcome Image #m ... 5 2 3 (left: 2 features, 1-Dim; right: X × Y 2-Dim features; columns labeled x and y_)
  • 54. Data Representation Comparison House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Feature values Actual outcome 23 53 … 33 53 20 … 88 … … … .... 62 2 … 193 Image #1 Image #2 250 10 … 33 103 5 … 88 … … … ... 5 114 … 193 Feature values Actual outcome Image #m ... 5 2 3 (left: 2 features, 1-Dim; right: X × Y 2-Dim features; columns labeled x and y_)
  • 55. Data Representation Comparison House #1 Size_1 Rooms_1 Price_1 House #2 Size_2 Rooms_2 Price_2 … … … ... House #m Size_m Rooms_m Price_m Feature values Actual outcome Image #1 Image #2 250 10 … 33 103 5 … 88 … 5 114 … 193 Feature values Actual outcome Image #m ... 5 2 3 (left: 2 features, 1-Dim; right: X·Y 1-Dim features; columns labeled x and y_) 23 53 … 33 53 20 … 88 … 62 2 … 193 Generalize into a multi-feature problem
  • 56. Model Linear Regression y = W.x + b y: House price (scalar) prediction x: [House size, Rooms] Goal: Find scalars W,b Logistic Regression y = W.x + b y: Discrete class [0,1,...9] prediction x: [2-Dim pixel grayscale values] x: [Pixel 1, Pixel 2, …, Pixel X.Y] Goal: Find scalars W, b
  • 57. Model Linear Regression y = W.x + b y: House price (scalar) prediction x: [House size, Rooms] Goal: Find scalars W,b Logistic Regression y = W.x + b y: Discrete class [0,1,...9] prediction x: [2-Dim pixel grayscale values] x: [Pixel 1, Pixel 2, …, Pixel X.Y] Goal: Find scalars W, b This needs change as well!
  • 58. Why Can’t ‘y’ Be Left as a Scalar of 0-9? HINT: Besides the model, when doing ML we also need a cost function!
  • 59. Logistic Regression: Change in Cost Function
  • 61. Linear Regression Cost Does NOT Fit Discrete Classes (diagram: digits 0-9 predicted from pixel grayscale values) The classes are discrete: the magnitude of the difference (y_ - y) does NOT matter, so wrongly predicting 4 as 3 is as bad as predicting 9 as 3
  • 63. Logistic Regression: Prediction ML Logistic Regression Image Probability 0 1 2 3 4 5 6 7 8 9
  • 64. Logistic Regression: Prediction ML Logistic Regression Image Probability 0 1 2 3 4 5 6 7 8 9 Probability 0 1 2 3 4 5 6 7 8 9 Actual Compare
  • 65. Cross-Entropy: Cost Function 1 cross_entropy = -tf.reduce_sum(y_*tf.log(y))
  • 66. Cross-Entropy: Cost Function 2 cross_entropy = -tf.reduce_sum(y_*tf.log(y)) Prediction Probability 0 1 2 3 4 5 6 7 8 9 Probability 0 1 2 3 4 5 6 7 8 9 Actual
  • 67. Cross-Entropy: Cost Function 3 cross_entropy = -tf.reduce_sum(y_*tf.log(y)) Prediction -log (Probability) 0 1 2 3 4 5 6 7 8 9 Probability 0 1 2 3 4 5 6 7 8 9 Actual The bars roughly invert because each Probability < 1
  • 68. Graph: -log(probability) Probability is always between 0 and 1, so -log(probability) shrinks toward 0 as probability approaches 1 and grows without bound as probability approaches 0
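The shape of that curve is easy to verify numerically; the probe values below are arbitrary:

```python
import numpy as np

# -log(p) falls toward 0 as p -> 1 and grows without bound as p -> 0
probs = np.array([0.01, 0.1, 0.5, 0.9, 0.99])
losses = -np.log(probs)
for p, l in zip(probs, losses):
    print(f"p={p:.2f}  -log(p)={l:.3f}")
```

A near-certain correct prediction (p close to 1) is barely penalized; a near-zero probability on the true class is penalized heavily.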
  • 69. Cross-Entropy: Cost Function cross_entropy = -tf.reduce_sum(y_*tf.log(y)) Prediction 0 1 2 3 4 5 6 7 8 9 Probability 0 1 2 3 4 5 6 7 8 9 Actual -log (Probability)
  • 70. Cross-Entropy: Cost Function cross_entropy = -tf.reduce_sum(y_*tf.log(y)) Prediction 0 1 2 3 4 5 6 7 8 9 Probability 0 1 2 3 4 5 6 7 8 9 Actual -log (Probability)
  • 71. Cross-Entropy: Cost Function cross_entropy = -tf.reduce_sum(y_*tf.log(y)) Prediction 0 1 2 3 4 5 6 7 8 9 Probability 0 1 2 3 4 5 6 7 8 9 Actual -log (Probability)
  • 72. Cross-Entropy: Cost Function cross_entropy = -tf.reduce_sum(y_*tf.log(y)) Prediction 0 1 2 3 4 5 6 7 8 9 Probability 0 1 2 3 4 5 6 7 8 9 Actual -log (Probability) ● Only the term where y’i = 1 contributes to the sum ● The smaller -log(yi), the lower the cost ● Thus the higher the yi for the actual class, the lower the cost
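Those bullets can be checked directly. A sketch with a made-up one-hot label and two made-up prediction distributions (the probability values are illustrative only):

```python
import numpy as np

y_ = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])  # one-hot: actual digit is 3

# Confident, correct prediction: most probability on class 3
y_good = np.array([0.01] * 10)
y_good[3] = 0.91
# Unsure prediction: probability spread evenly over all 10 classes
y_bad = np.full(10, 0.1)

def cross_entropy(y_, y):
    # only the term where y_ == 1 survives the sum
    return -np.sum(y_ * np.log(y))

print("confident & correct:", cross_entropy(y_, y_good))  # low cost
print("spread out:", cross_entropy(y_, y_bad))            # higher cost
```

Because the label is one-hot, the cost reduces to -log of the probability assigned to the true class.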
  • 73. Logistic Regression: Prediction ML Logistic Regression Image Probability 0 1 2 3 4 5 6 7 8 9 ML Linear Regression Image 1
  • 74. Tensorflow Graph (Review) 1 Feature y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) 2 Features y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([2,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 2]) y_ = tf.placeholder(tf.float32, [None, 1])
  • 75. Tensorflow Graph (Review 2) 1 Feature y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) 2 Features y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([2,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 2]) y_ = tf.placeholder(tf.float32, [None, 1]) (diagram: 1 feature × 1 coeff → scalar; 2 features × 2 coeffs → scalar)
  • 76. Tensorflow Graph (Review 2) 1 Feature y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) 2 Features y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([2,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 2]) y_ = tf.placeholder(tf.float32, [None, 1]) (diagram: 1 feature × 1 coeff → scalar; 2 features × 2 coeffs → scalar) Apply multi-feature linear regression
  • 77. Logistic Regression: Prediction ML Logistic Regression Image Probability 0 1 2 3 4 5 6 7 8 9 ML Linear Regression Image 1 y = tf.matmul(x, W) + b (diagram: x has n features, W has n coeffs, y is a scalar)
  • 78. Logistic Regression: Prediction ML Logistic Regression Image Probability 0 1 2 3 4 5 6 7 8 9 ML Linear Regression Image 1 y = tf.matmul(x, W) + b (diagram: x has n features, W has n coeffs, y is a scalar, but the prediction must become k classes)
  • 79. Logistic Regression: Prediction ML Logistic Regression Image Probability 0 1 2 3 4 5 6 7 8 9 ML Linear Regression Image 1 y = tf.matmul(x, W) + b (diagram: x has n features, W widens to n coeffs × k classes, y becomes k class values)
  • 80. Tensorflow Graph: Basic, Multi-feature, Multi-class 1 Feature y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([1,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 1]) y_ = tf.placeholder(tf.float32, [None, 1]) 2 Features y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([2,1])) b = tf.Variable(tf.zeros([1])) x = tf.placeholder(tf.float32, [None, 2]) y_ = tf.placeholder(tf.float32, [None, 1]) 2 Features, 10 Classes y = tf.matmul(x, W) + b W = tf.Variable(tf.zeros([2,10])) b = tf.Variable(tf.zeros([10])) x = tf.placeholder(tf.float32, [None, 2]) y_ = tf.placeholder(tf.float32, [None, 10]) (diagram: 1 feature × 1 coeff → scalar; 2 features × 2 coeffs → scalar; 2 features × 2×10 coeffs → k=10 classes)
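The multi-class column of the cheat sheet again comes down to shapes: W gains one column per class, so each row of x produces 10 class scores in one matmul. A NumPy shape check (random data, arbitrary batch size):

```python
import numpy as np

m = 5                       # batch size ("None" in the placeholder)
x = np.random.rand(m, 2)    # [m, 2]: 2 features per example
W = np.zeros((2, 10))       # [2, 10]: one column of coefficients per class
b = np.zeros(10)            # [10]: one bias per class

y = x @ W + b               # [m, 2] @ [2, 10] -> [m, 10]: one score per class
print(y.shape)              # (5, 10)
```

The same pattern extends to n features and k classes: W is [n, k], b is [k].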
  • 81. Logistic Regression: Prediction ML Logistic Regression Image Probability 0 1 2 3 4 5 6 7 8 9 Probability 0 1 2 3 4 5 6 7 8 9 Actual Compare Great, but… the sum of all prediction ‘probabilities’ is NOT 1
  • 83. ML Logistic Regression Image Value 0 1 2 3 4 5 6 7 8 9 exp(Value) 0 1 2 3 4 5 6 7 8 9
  • 84. ML Logistic Regression Image Value 0 1 2 3 4 5 6 7 8 9 exp(Value) 0 1 2 3 4 5 6 7 8 9 SUM
  • 85. ML Logistic Regression Image Value 0 1 2 3 4 5 6 7 8 9 exp(Value) 0 1 2 3 4 5 6 7 8 9 exp(Value) 0 1 2 3 4 5 6 7 8 9 SUM SUM
  • 86. ML Logistic Regression Image Value 0 1 2 3 4 5 6 7 8 9 exp(Value) 0 1 2 3 4 5 6 7 8 9 exp(Value) 0 1 2 3 4 5 6 7 8 9 SUM SUM sum of all prediction ‘probability’ is 1
  • 87. Before softmax y = tf.matmul(x, W) + b
  • 88. With softmax y = tf.nn.softmax(tf.matmul(x, W) + b) y = tf.matmul(x, W) + b
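tf.nn.softmax implements exactly the exp-then-normalize steps from the preceding slides. A NumPy sketch (the scores are made-up; subtracting the max before exponentiating is a standard numerical-stability trick, not something the slides cover):

```python
import numpy as np

def softmax(v):
    # exp each value, then divide by the sum of the exps
    e = np.exp(v - v.max())   # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1, -1.0])  # raw y = Wx + b values
p = softmax(scores)
print(p)
print(p.sum())  # sums to 1 (up to float rounding)
```

The largest raw score keeps the largest probability, and the outputs now form a valid distribution to compare against the one-hot y_.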
  • 89. Summary ● Cheat sheet: Single feature, Multi-feature, Multi-class ● Logistic regression: ○ Multi-class prediction: Ensure that prediction is one of a discrete set of values ○ Cross-entropy: Measure difference between multi-class prediction and actual ○ Softmax: Ensure the multi-class prediction probability is a valid distribution (sum = 1)
  • 90. Congrats! You can now understand Google’s Tensorflow Beginner’s Tutorial (https://www.tensorflow.org/versions/r0.7/tutorials/mnist/beginners/index.html)
  • 91. References ● Perform ML with TF using multi-feature linear regression (the wrong way) ○ https://github.com/nethsix/gentle_tensorflow/blob/master/code/linear_regression_multi_feature_using_mini_batch_with_tensorboard.py ● Perform ML with TF using multi-feature linear regression ○ https://github.com/nethsix/gentle_tensorflow/blob/master/code/linear_regression_multi_feature_using_mini_batch_without_matrix_with_tensorboard.py ● Tensorflow official tutorial for character recognition ○ https://www.tensorflow.org/versions/r0.7/tutorials/mnist/beginners/index.html ● Colah’s excellent explanation of cross-entropy ○ http://colah.github.io/posts/2015-09-Visual-Information/