TF GRAPH TO TF EAGER
Guy Hadash
IBM Research AI
WHY MOVE TO EAGER?
Eager Execution changes the core idea of TensorFlow.
Instead of describing the execution graph in Python, compiling it and then running it,
the framework is now imperative. This means it creates the graph on the fly and runs
operations immediately.
This brings some significant improvements:
• An intuitive interface
• Fast development iterations
• Easier debugging
• Natural control flow
WHY MOVE TO EAGER?
The main ability we gain now is:
This simply gives us the value of the tensor. As we said, this allows much easier
debugging, and we can also control the model flow based on tensor values.
There is no need for a session anymore, and we can stop worrying about graph
dependencies and the like.
tensor.numpy()
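As a minimal sketch (TF 1.x), enabling eager execution and inspecting a tensor's value looks like this; the tensor values are illustrative:

import tensorflow as tf

tf.enable_eager_execution()  # call once, at program startup

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)
print(y.numpy())  # a plain NumPy array - no session, no feed_dict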
SESSION PROGRAM
• Data pipeline
• Classifier
• covering the essentials of building and training a custom model
• Autoencoder
• building a custom layer
• Text classification
• controlling model flow with Python, and working with sequence data
All the code is available in Colab: https://goo.gl/q3rHNT
BEST PRACTICES
Moving from graph mode to eager mode also makes it much more natural to work in
a more object-oriented way.
We will inherit from tf.keras.Model and tf.keras.layers.Layer.
We will use tf.data for an easy and fast data pipeline.
DATA PIPELINE
tf.data is the current best practice for handling the data pipeline.
It gives us an easy and fast input pipeline. In my experience, the most commonly
used initializers are:
from_tensor_slices – yields one sample at a time from the given tensor. Best for
simple training cases.
from_tensors – returns the full dataset as a single element. Helpful for testing.
from_generator – a more flexible option, useful in more complicated use cases (see
the sketches below).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
TIP: All dataset initializers work naturally with tuples and dictionaries (also nested)
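For reference, minimal sketches of the other two initializers mentioned above (the generator and the MNIST dtypes/shapes here are assumptions):

test_ds = tf.data.Dataset.from_tensors((x_test, y_test))  # the whole test set as a single element

def sample_generator():
    for x, y in zip(x_train, y_train):
        yield x, y

gen_ds = tf.data.Dataset.from_generator(sample_generator,
                                        output_types=(tf.uint8, tf.uint8),
                                        output_shapes=([28, 28], []))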
DATA PIPELINE
Now that we have a Dataset object, we can quickly build the pipeline:
This pipeline lets TF use the CPU in parallel with training and keep the next batch
ready on the GPU for the next optimization step.
train_ds = train_ds.map(_normalize, num_parallel_calls=4)
train_ds = train_ds.apply(tf.contrib.data.shuffle_and_repeat(buffer_size, num_epochs))
# train_ds = train_ds.shuffle(buffer_size).repeat(num_epochs)
train_ds = train_ds.batch(batch_size).apply(tf.contrib.data.prefetch_to_device("/gpu:0"))
TIP: Even when running in Eager mode the pipeline runs as a graph
TIP: Buffer size should be big enough for effective shuffling
TIP: prefetch_to_device must be the last operation in the pipeline
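The _normalize function used in the map call above is not shown on the slide; a minimal sketch for MNIST (dtypes and scaling are assumptions) might be:

def _normalize(image, label):
    image = tf.cast(image, tf.float32) / 255.0  # scale pixel values to [0, 1]
    return image, tf.cast(label, tf.int32)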
THE BASIC BUILDING BLOCKS
tf.keras.layers.Layer
Layer – a group of variables tied together.
tf.keras.Network
Network – a group of layers tied together.
tf.keras.Model
Model – a network with all the training utilities.
Each of them is callable.
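For example, a single layer can be applied directly to a tensor in eager mode (shapes are illustrative):

layer = tf.keras.layers.Dense(10)
out = layer(tf.zeros([32, 784]))  # variables are created on the first call, shaped by the input
print(layer.variables)            # [kernel, bias]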
BUILDING MODEL
When building a model in eager execution, we derive from tf.keras.Model.
This gives us a few important properties which we will use:
• model.variables - automatically returns all model’s variables.
• It does it by taking all variables from layers (inherit from tf.layers.Layer), and models
(inherit from tf.keras.Network)
• model.save_weights – allow us to save (and load) the model weights easily
• There is also an option to save the model itself and not only the weights. However, it
doesn’t work well when building custom models.
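A minimal usage sketch with the classifier defined on the next slides (the checkpoint path is illustrative):

model = SimpleClassifier()
results = model(inputs)                        # one forward pass on any input batch builds the variables
model.save_weights("/tmp/simple_classifier")   # save the weights
model.load_weights("/tmp/simple_classifier")   # restore them into a built model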
MNIST MODEL
We first initialize the layers we will use,
but we do not describe the model flow yet
(different from graph mode).
Notice that the real variable sizes are
still unknown, so the variables cannot be
initialized yet.
Here we override the call function; it
will be called each time our model is
invoked.
class SimpleClassifier(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.fc1 = tf.keras.layers.Dense(100, activation=tf.nn.relu)
        self.fc2 = tf.keras.layers.Dense(50, activation=tf.nn.relu)
        self.fc3 = tf.keras.layers.Dense(FLAGS.classes_amount)
        self.optimizer = tf.train.AdamOptimizer()

    def call(self, inputs, training=None, **kwargs):
        x = tf.reshape(inputs, [inputs.shape[0], -1])
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
TIP: if you want to define a list of layers, use tf.contrib.checkpoint.List so that all APIs work.
RUNNING THE MODEL
When we want to run the model, we simply call it like a function.
This runs the call function we wrote on the previous slide, with some extra logic.
model = SimpleClassifier()
results = model(inputs)
OPTIMIZATION PROCESS
In eager mode, when we optimize, we need to tell the framework which variables to
compute gradients with respect to.
This is what the with tf.GradientTape() as tape context is for: all intermediate results
needed to compute the gradients of the variables are recorded. We can also use the
watch command to tell the framework to track any arbitrary tensor.
We can later use tape.gradient(loss, variables) to get the gradients of the loss with
respect to each of the variables. This automatically resets the tape and frees the
memory.
TIP: If you need to call tape.gradient more than once, use tf.GradientTape(persistent=True) - and use del later
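A minimal sketch of both options, watch and persistent=True (the tensors are illustrative):

x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)  # track an arbitrary tensor, not just model variables
    y = x * x
    z = y * y
print(tape.gradient(y, x).numpy())  # 6.0
print(tape.gradient(z, x).numpy())  # 108.0 - a second call works because the tape is persistent
del tape                            # free the tape's resources when done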
MNIST MODEL
We define the loss function for the
model – nothing changes here. There is
no special reason for it to be in the
model class.
Here is the optimization process. As
described, we run the forward pass inside
the tape context, and then we calculate the
gradients and apply them.
Notice that we call the model instance
as a function; it uses the call function we
overrode before.
@staticmethod
def loss(logits, labels):
    return tf.losses.sparse_softmax_cross_entropy(labels, logits)

def optimize(self, inputs, labels):
    with tf.GradientTape() as tape:
        logits = self(inputs)
        batch_loss = self.loss(logits, labels)
    gradients = tape.gradient(batch_loss, self.variables)
    self.optimizer.apply_gradients(zip(gradients, self.variables))
    return batch_loss
RUNNING TOGETHER
Now we just need to iterate over the data and optimize for each batch. This is done
very naturally in eager mode:
with tf.device('/gpu:0'):
    model = SimpleClassifier()
    for step, (batch_x, batch_y) in enumerate(train_ds):
        loss = model.optimize(batch_x, batch_y)
        if step % FLAGS.print_freq == 0:
            print("Step {}: loss: {}".format(step, loss))
        if step % FLAGS.validate_freq == 0:
            accuracy = model.accuracy(x_test, y_test)
            print("Step {}: test accuracy: {}".format(step, accuracy))
TIP: since TF 1.8 there is automatic device placement; however, as of TF 1.9 it is still better,
performance-wise, to state the device explicitly
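The accuracy helper called in the loop above is not shown on the slide; a minimal sketch (the casts for the raw MNIST arrays are assumptions) might be:

def accuracy(self, inputs, labels):
    logits = self(tf.cast(inputs, tf.float32))                      # forward pass on the full test set
    predictions = tf.argmax(logits, axis=1, output_type=tf.int32)
    correct = tf.equal(predictions, tf.cast(labels, tf.int32))
    return tf.reduce_mean(tf.cast(correct, tf.float32))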
BUILDING CUSTOM LAYERS
We will now build an MNIST autoencoder, where the decoder is the transpose of the
encoder layers.
In graph mode, we would have the variable w at hand and could simply reuse it. That
approach is also possible in eager mode, but since we are trying to work in a more OOP
way, we will build a custom layer instead.
BUILDING CUSTOM LAYERS
To create a custom layer, we need to implement:
• __init__ – the constructor, preparing everything we can without knowing the input shape
• build – called the first time the layer runs; here we know the input shape, so we can
create all of the layer's weights.
• call – the layer logic, called each time the layer runs on an input.
BUILDING CUSTOM LAYERS
In the __init__ function, we store all the
information we need and create whatever
variables we can.
The call function is where we define the
layer logic. Notice that the if statement
controls the flow and is evaluated for
each batch separately.
class InvDense(tf.keras.layers.Layer):
    def __init__(self, dense_layer, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.dense_layer = dense_layer
        self.activation_func = activation

    def build(self, input_shape):
        out_hiddens = self.dense_layer.kernel.get_shape()[-2]
        self.bias = self.add_variable("b", [out_hiddens],
                                      initializer=tf.zeros_initializer)
        super().build(input_shape)

    def call(self, inputs, **kwargs):
        x = tf.matmul(inputs, tf.transpose(self.dense_layer.kernel))
        x += self.bias
        if self.activation_func:
            x = self.activation_func(x)
        return x
BUILDING CUSTOM LAYERS
Now we initialize and use the custom
layers just like any other
keras.layers.* layer.
class AutoEncoder(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.fc1 = tf.keras.layers.Dense(100, activation=tf.nn.relu)
        self.fc2 = tf.keras.layers.Dense(50, activation=tf.nn.relu)
        self.fc2_t = InvDense(self.fc2, activation=tf.nn.relu)
        self.fc1_t = InvDense(self.fc1)
        self.optimizer = tf.train.AdamOptimizer()

    def call(self, inputs, training=None, **kwargs):
        x = tf.reshape(inputs, [inputs.shape[0], -1])
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc2_t(x)
        x = self.fc1_t(x)
        x = tf.reshape(x, inputs.shape)
        return x
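The autoencoder's optimization step is not shown on the slide; a minimal sketch, assuming a mean-squared-error reconstruction loss on raw MNIST images, might be:

def optimize(self, inputs):
    inputs = tf.cast(inputs, tf.float32) / 255.0  # assumed normalization
    with tf.GradientTape() as tape:
        reconstruction = self(inputs)
        batch_loss = tf.losses.mean_squared_error(inputs, reconstruction)
    gradients = tape.gradient(batch_loss, self.variables)
    self.optimizer.apply_gradients(zip(gradients, self.variables))
    return batch_loss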
TEXT CLASSIFICATION
Now we move on to IMDb sentiment classification. We start by building a suitable data
pipeline:
Notice that we use from_generator, since each example has a different length
def _add_length(x, y):
    x = x[:FLAGS.max_len]
    x_dict = {"seq": x, "seq_len": tf.size(x)}
    return x_dict, y

ds = tf.data.Dataset.from_generator(lambda: zip(x_train, y_train), output_types=(tf.int32, tf.int32),
                                    output_shapes=([None], []))
ds = ds.map(_add_length, num_parallel_calls=4)
ds = ds.apply(tf.contrib.data.shuffle_and_repeat(len(x_train), FLAGS.num_epochs))
ds = ds.padded_batch(FLAGS.batch_size, padded_shapes=({"seq": [None], "seq_len": []}, []))
ds = ds.apply(tf.contrib.data.prefetch_to_device("/gpu:0"))
TIP: from_generator accepts any callable object that returns an object supporting __iter__
CONTROLLING FLOW
One of the significant advantages that eager brings us is the ability to control our flow
using Python and tensor values. No more tf.while_loop and tf.cond!
TEXT CLASSIFICATION
Notice that we now go
through the words using a
Python for loop.
This method would not work
in graph mode, since the
shape along axis=1 differs
between batches.
def call(self, inputs, training=None, **kwargs):
    seqs = self.word_emb(inputs["seq"])
    state = [tf.zeros([seqs.shape[0], self.rnn_cell.state_size])]
    seqs = tf.unstack(seqs, num=int(seqs.shape[1]), axis=1)
    hiddens = []
    for word in seqs:
        h, state = self.rnn_cell(word, state)
        hiddens.append(h)
    hiddens = tf.stack(hiddens, axis=1)
    last_hiddens = self.get_last_relevant(hiddens, inputs["seq_len"])
    x = self.fc1(last_hiddens)
    x = self.fc2(x)
    return x
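The get_last_relevant helper used above is not shown on the slide; a minimal sketch that gathers, for each sequence, the hidden state at its true length might be:

@staticmethod
def get_last_relevant(hiddens, seq_lens):
    # hiddens: [batch, time, hidden], seq_lens: [batch]
    batch_range = tf.range(tf.shape(hiddens)[0])
    indices = tf.stack([batch_range, seq_lens - 1], axis=1)  # (batch index, last valid time step)
    return tf.gather_nd(hiddens, indices)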
ADDING REGULARIZATION
To add the common regularizations, we can use tf.contrib.layers.*_regularizer
just as we did in graph-mode TensorFlow.
Now, to get the regularization loss, instead of using
tf.GraphKeys.REGULARIZATION_LOSSES we use the keras.Model losses property:
l2_reg = tf.contrib.layers.l2_regularizer(FLAGS.reg_factor)
self.fc1 = tf.keras.layers.Dense(100, kernel_regularizer=l2_reg, bias_regularizer=l2_reg)
loss += tf.reduce_sum(self.losses)


Editor's Notes

  • #8: If possible, set buffer_size to train set size