From the course: Hands-On Introduction to Transformers for Computer Vision

Building a simple ViT

Now, we're going to go for gold and build it all from scratch. We've been using all these helper libraries, and I promise we're not quite done with them yet, but other than Torch, we're going to be building the transformer ourselves. Let's walk through the hyperparameters. EPOCHS: we're only going to train for 5 epochs. If you don't know what an epoch is, it's how many times you train your model on the dataset; typically, one epoch means one full pass through your training dataset. LR is our learning rate, which we talked about in some of our training strategies. IMAGE_SIZE is only 28, so these are really small pictures, and since they're really small, we're going to use 4-by-4 patches for our transformer. From those two numbers, we can calculate the number of patches. We also have our embedding dimension, the number of classes, the number of heads, the number of layers, and our MLP dimension. The first thing we need to do is load in the MNIST dataset. We're going to do this with Torch this time; granted, MNIST is built into torchvision. It's very…
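The hyperparameters described above can be sketched as a small config block. This is a minimal sketch, not the course's verbatim settings: EPOCHS, IMAGE_SIZE, and the 4-by-4 patch size come from the transcript, while the values for LR, EMBED_DIM, NUM_HEADS, NUM_LAYERS, and MLP_DIM are assumed placeholders.

```python
# ViT training hyperparameters (sketch; values marked "assumed" are
# placeholders, not the course's exact settings).
EPOCHS = 5           # full passes through the training dataset
LR = 3e-4            # learning rate (assumed value)
IMAGE_SIZE = 28      # MNIST images are 28x28 pixels
PATCH_SIZE = 4       # split each image into 4x4 patches

# Number of patches per image: (28 / 4) ** 2 = 7 * 7 = 49
NUM_PATCHES = (IMAGE_SIZE // PATCH_SIZE) ** 2

EMBED_DIM = 64       # patch embedding dimension (assumed)
NUM_CLASSES = 10     # MNIST digits 0-9
NUM_HEADS = 4        # attention heads (assumed)
NUM_LAYERS = 4       # transformer encoder layers (assumed)
MLP_DIM = 128        # hidden width of the MLP block (assumed)

print(NUM_PATCHES)
```

With a 28-pixel image and 4-pixel patches, each image becomes a 7-by-7 grid, so the transformer sees a sequence of 49 patch tokens. The dataset itself can then be loaded with `torchvision.datasets.MNIST`, which downloads it on first use.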
