From the course: Hands-On Introduction to Transformers for Computer Vision

Building a simple ViT

Now, we're going to go for gold and build it all from scratch. We've been using all these helper libraries, and I promise we're not quite done with them yet, but other than Torch, we're going to be building the transformer ourselves. Let's walk through the hyperparameters. EPOCHS: we're only going to train for 5 epochs. If you don't know what an epoch is, it's how many times you train your model on the dataset; typically, one epoch means one full pass through your training dataset. LR is our learning rate, which we talked about in some of our training strategies. IMAGE_SIZE is only 28, so these are really small pictures, and since they're really small, we're going to use 4-by-4 patches for our transformer. From those two numbers, we can calculate the number of patches. We also have our embedding dimension, the number of classes, the number of heads, the number of layers, and our MLP dimension. The first thing we need to do is load in the MNIST dataset. We're going to do this with Torch this time; granted, MNIST is built into torchvision. It's very…
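The hyperparameters described above can be sketched as a small config block. This is a minimal sketch, not the course's verbatim settings: EPOCHS, IMAGE_SIZE, and the 4-by-4 patch size come from the transcript, while the values for LR, EMBED_DIM, NUM_HEADS, NUM_LAYERS, and MLP_DIM are assumed placeholders.

```python
# ViT training hyperparameters (sketch; values marked "assumed" are
# placeholders, not the course's exact settings).
EPOCHS = 5           # full passes through the training dataset
LR = 3e-4            # learning rate (assumed value)
IMAGE_SIZE = 28      # MNIST images are 28x28 pixels
PATCH_SIZE = 4       # split each image into 4x4 patches

# Number of patches per image: (28 / 4) ** 2 = 7 * 7 = 49
NUM_PATCHES = (IMAGE_SIZE // PATCH_SIZE) ** 2

EMBED_DIM = 64       # patch embedding dimension (assumed)
NUM_CLASSES = 10     # MNIST digits 0-9
NUM_HEADS = 4        # attention heads (assumed)
NUM_LAYERS = 4       # transformer encoder layers (assumed)
MLP_DIM = 128        # hidden width of the MLP block (assumed)

print(NUM_PATCHES)
```

With a 28-pixel image and 4-pixel patches, each image becomes a 7-by-7 grid, so the transformer sees a sequence of 49 patch tokens. The dataset itself can then be loaded with `torchvision.datasets.MNIST`, which downloads it on first use.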
