Stefan Seegerer, hi@stefanseegerer.de · Matthias Zürl, matthias.zuerl@fau.de · CC-BY-SA · Last updated: 10/2021
PyTorch CHEAT SHEET
General
PyTorch is an open-source machine learning framework. It uses torch.Tensor – multi-dimensional
matrices – for computation. A core feature of neural networks in PyTorch is the autograd package,
which provides automatic derivative calculations for all operations on tensors.
There are several ways to define a neural network in PyTorch, e.g. with nn.Sequential (a), as a class (b), or using a combination of both.
Imports
import torch: Root package
import torch.nn as nn: Neural networks
import torch.nn.functional as F: Collection of layers, activations & more
from torchvision import datasets, models, transforms: Popular image datasets, architectures & transforms
Tensors
torch.Tensor(L): Create tensor from list L
torch.randn(*size): Create random tensor
tnsr.view(a, b, ...): Reshape tensor to size (a, b, ...)
requires_grad=True: Tracks computation history for derivative calculations
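A minimal autograd sketch for a tensor created with requires_grad=True; the function y = sum(x²) is just an example:

import torch

x = torch.randn(3, requires_grad=True)   # track all operations on x
y = (x ** 2).sum()                        # example computation
y.backward()                              # autograd computes dy/dx
print(x.grad)                             # gradient: 2 * x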
Define model

(a) With nn.Sequential:

model = nn.Sequential(
    nn.Conv2d( , , ),
    nn.ReLU(),
    nn.MaxPool2d( ),
    nn.Flatten(),
    nn.Linear( , ),
)

(b) As a class:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv = nn.Conv2d( , , )
        self.pool = nn.MaxPool2d( )
        self.fc = nn.Linear( , )

    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))
        x = x.view(-1, )
        x = self.fc(x)
        return x

model = Net()
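A filled-in sketch of variant (b), assuming 28x28 grayscale inputs and 10 output classes; all concrete sizes are example values, not part of the template above:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv = nn.Conv2d(1, 6, 5)         # example: 1 input channel, 6 output channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2)            # 2x2 max pooling
        self.fc = nn.Linear(6 * 12 * 12, 10)   # 28x28 input -> 24x24 after conv -> 12x12 after pooling

    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))
        x = x.view(-1, 6 * 12 * 12)
        x = self.fc(x)
        return x

model = Net()
out = model(torch.randn(1, 1, 28, 28))         # one 28x28 grayscale image -> 10 class scores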
Save/Load model
torch.save(model, 'PATH'): Save model
model = torch.load('PATH'): Load model
It is common practice to save only the model parameters, not the whole model, using model.state_dict().
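A minimal sketch of the state_dict variant; Net is the model class defined above and the file name model_params.pt is an arbitrary example:

torch.save(model.state_dict(), 'model_params.pt')     # save only the parameters (recommended)

model = Net()                                          # rebuild the same architecture
model.load_state_dict(torch.load('model_params.pt'))  # load the saved parameters into it
model.eval()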
GPU Training
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
If a GPU with CUDA support is available, computations are sent to
the GPU with ID 0 using model.to(device) or
inputs, labels = data[0].to(device), data[1].to(device).
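A minimal sketch of this pattern; Net is the model class from above and train_loader is assumed to yield (inputs, labels) batches:

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)                   # move all model parameters to the GPU (if available)

for data in train_loader:
    inputs, labels = data[0].to(device), data[1].to(device)
    outputs = model(inputs)                # computation now runs on the selected device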
Activation functions
Common activation functions include ReLU, Sigmoid and Tanh, but there are other activation functions as well.
nn.ReLU() or F.relu(): Output between 0 and ∞, most frequently used activation function
nn.Sigmoid() or F.sigmoid(): Output between 0 and 1, often used for predicting probabilities
nn.Tanh() or F.tanh(): Output between -1 and 1, often used for classification with two classes
Evaluate model
model.eval(): Activates evaluation mode, some layers behave differently
torch.no_grad(): Prevents tracking history, reduces memory usage, speeds up calculations
The evaluation examines whether the model provides satisfactory results on previously withheld data. Depending on the objective, different metrics are used, such as accuracy, precision, recall, F1, or BLEU.
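A minimal evaluation-loop sketch for a classification model; model, test_loader and device are assumed to exist as set up above:

model.eval()                              # e.g. Dropout/BatchNorm switch to eval behaviour
correct, total = 0, 0
with torch.no_grad():                     # no gradient tracking needed for evaluation
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        predicted = outputs.argmax(dim=1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
print(f'Accuracy: {correct / total:.2%}')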
Train model

LOSS FUNCTIONS
PyTorch already offers a bunch of different loss functions, e.g.:
nn.L1Loss: Mean absolute error
nn.MSELoss: Mean squared error (L2Loss)
nn.CrossEntropyLoss: Cross entropy, e.g. for single-label classification or unbalanced training set
nn.BCELoss: Binary cross entropy, e.g. for multi-label classification or autoencoders

OPTIMIZATION (torch.optim)
Optimization algorithms are used to update weights and dynamically adapt the learning rate with gradient descent, e.g.:
optim.SGD: Stochastic gradient descent
optim.Adam: Adaptive moment estimation
optim.Adagrad: Adaptive gradient
optim.RMSprop: Root mean square propagation
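A minimal training-loop sketch; model, train_loader and device are assumed to exist as above, and the loss, optimizer and epoch count are example choices:

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

model.train()
for epoch in range(10):                       # example: 10 epochs
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()                 # reset gradients from the previous step
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()                       # autograd computes all gradients
        optimizer.step()                      # update the weights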
Load data
A dataset is represented by a class that inherits from Dataset (it resembles a list of tuples of the form (features, label)). DataLoader allows loading a dataset without caring about its structure. Usually the dataset is split into training data (e.g. 80%) and test data (e.g. 20%).
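A minimal sketch using torchvision's MNIST as an example dataset and random_split for the 80/20 split; batch size and paths are example values:

from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

dataset = datasets.MNIST('data', download=True, transform=transforms.ToTensor())
train_size = int(0.8 * len(dataset))                      # e.g. 80% training data
train_set, test_set = random_split(dataset, [train_size, len(dataset) - train_size])

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=64)

features, label = train_set[0]    # a dataset behaves like a list of (features, label) tuples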
Layers
nn.Linear(m, n): Fully connected layer (or dense layer) from m to n neurons
nn.BatchNormXd(n): Normalizes an X-dimensional input batch with n features; X ∈ {1, 2, 3}
nn.RNN/LSTM/GRU: Recurrent networks connect neurons of one layer with neurons of the same or a previous layer
nn.Dropout(p=0.5): Randomly sets input elements to zero during training to prevent overfitting
nn.Flatten(): Flattens a contiguous range of dimensions into a tensor
nn.ConvXd(m, n, s): X-dimensional convolutional layer from m to n channels with kernel size s; X ∈ {1, 2, 3}
nn.MaxPoolXd(s): X-dimensional pooling layer with kernel size s; X ∈ {1, 2, 3}
nn.Embedding(m, n): Lookup table to map a dictionary of size m to embedding vectors of size n
torch.nn offers a bunch of other building blocks.
A list of state-of-the-art architectures can be found at https://paperswithcode.com/sota.
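A small sketch of two of these building blocks; all sizes are arbitrary example values:

emb = nn.Embedding(1000, 32)            # dictionary of 1000 tokens -> 32-dim vectors
tokens = torch.tensor([[1, 5, 9]])      # batch of one sequence with 3 token ids
print(emb(tokens).shape)                # torch.Size([1, 3, 32])

drop = nn.Dropout(p=0.5)                # zeroes ~50% of the inputs during training
print(drop(torch.ones(4)))              # e.g. tensor([2., 0., 2., 0.]), survivors scaled by 1/(1-p)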
Typical workflow: 1. Load data → 2. Define model → 3. Train model → 4. Evaluate model
nn.ReLU() creates an nn.Module, for example to be used in Sequential models. F.relu() is just a call of the ReLU function, e.g. to be used in the forward method.
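A tiny sketch of the difference; the input tensor is an arbitrary example:

relu_module = nn.ReLU()              # an nn.Module, e.g. for use inside nn.Sequential
x = torch.tensor([-1.0, 2.0])
print(relu_module(x))                # tensor([0., 2.])
print(F.relu(x))                     # tensor([0., 2.]), plain function call, e.g. inside forward()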