From the course: Hands-On Introduction to Transformers for Computer Vision
Building a simple ViT - PyTorch Tutorial
Now, we're going to go for gold and build it all from scratch. We've been using all these helper libraries and, I promise, we're not quite done yet, but other than Torch, we're going to be building a transformer ourselves. Let's walk through the hyperparameters. EPOCHS: we're only going to train for 5 epochs. If you don't know what an epoch is, it's basically how many times you train your model on the dataset; typically, one epoch means one full pass through your training dataset. LR is our learning rate, which we talked about in some of our training strategies. IMAGE_SIZE is only 28, because these are really small pictures. And since they're really small pictures, we're going to use 4 by 4 patches for our transformer here. From those two values we can calculate the number of patches. We also have our embedding dimension, the number of classes, the number of heads, the number of layers, and our MLP dimension. So the first thing we need to do is load in the MNIST dataset. We are going to do this with torch this time. Granted, MNIST is built into torch now. It's very…
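A minimal sketch of the hyperparameters described above might look like this. The epoch count, image size, and patch size come straight from the walkthrough; the specific values for the learning rate, embedding dimension, head count, layer count, and MLP dimension are placeholder assumptions, since the exact numbers aren't stated here:

```python
import torch

# Training hyperparameters. EPOCHS, IMAGE_SIZE, and PATCH_SIZE match the
# walkthrough; the remaining values are illustrative placeholders, not
# necessarily the course's exact settings.
EPOCHS = 5            # number of full passes over the training set
LR = 3e-4             # learning rate (placeholder value)
IMAGE_SIZE = 28       # MNIST images are 28x28 pixels
PATCH_SIZE = 4        # split each image into non-overlapping 4x4 patches
NUM_PATCHES = (IMAGE_SIZE // PATCH_SIZE) ** 2  # (28 / 4)^2 = 49 patches
EMBED_DIM = 64        # embedding dimension (assumed)
NUM_CLASSES = 10      # MNIST digits 0-9
NUM_HEADS = 4         # attention heads (assumed)
NUM_LAYERS = 4        # transformer encoder layers (assumed)
MLP_DIM = 128         # hidden size of the MLP block (assumed)

# Demonstrate the patch split on a dummy batch: unfold height and width
# into non-overlapping 4x4 windows, then flatten each patch into a vector.
x = torch.randn(1, 1, IMAGE_SIZE, IMAGE_SIZE)        # (B, C, H, W)
patches = x.unfold(2, PATCH_SIZE, PATCH_SIZE)        # (B, C, 7, W, 4)
patches = patches.unfold(3, PATCH_SIZE, PATCH_SIZE)  # (B, C, 7, 7, 4, 4)
patches = patches.contiguous().view(1, NUM_PATCHES, PATCH_SIZE * PATCH_SIZE)
print(NUM_PATCHES, patches.shape)  # 49 patches of 16 pixels each
```

For the dataset itself, since MNIST is built into torchvision, loading it would look something like `torchvision.datasets.MNIST(root="./data", train=True, download=True, transform=transforms.ToTensor())`, wrapped in a `DataLoader` for batching.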