From the course: Introduction to Transformer Models for NLP

Introduction

- Welcome to Introduction to Transformer Models for NLP: using BERT, GPT, and more to solve modern NLP tasks. I'm Sinan Ozdemir. I'm a tech entrepreneur focusing on applications of natural language processing and artificial intelligence, and I've been working in the field of deep learning and NLP for the last decade. I previously lectured at Johns Hopkins University on the topics of mathematics, computer science, and machine learning, and I've written five books focusing on data science, deep learning, and feature engineering.

In these lessons, you'll learn how transformers have revolutionized natural language processing in the last few years, and how to apply multiple transformer-based architectures to perform a variety of modern NLP tasks. The first lesson provides an overview of the history of modern NLP and language modeling, including the powerful mechanisms that make the transformer model so versatile. The next lesson takes a deep dive into the mathematical formulas that bring the transformer to life and power large-scale, effective, and efficient text-processing systems. After the introduction to transformers, we'll take a look at what makes large pre-trained NLP models usable by the masses: transfer learning.

With all of that history, math, and theory in place, the next lesson focuses on natural language understanding using BERT. We'll see how BERT is pre-trained on huge corpora to understand language as a whole, and how we can transfer that learning to a fine-tuned BERT using our own custom datasets. We'll then present multiple use cases of fine-tuning models using a pre-trained BERT as a starting point.

With an understanding of how BERT understands text, we'll turn our focus to how natural language generation architectures like GPT change the way machines write free text. We'll then see how we can fine-tune GPT to learn new syntaxes, translations, and styles. The next lesson kicks things up a notch by introducing two complex use cases of BERT and GPT, showcasing what these models can really do.

The lesson after that focuses on the power of the end-to-end transformer, with a complete encoder and decoder stack, using the T5 model. We'll see how pre-training T5 leads to excellent off-the-shelf results, and how easy it can be to fine-tune even a large and complicated model. After T5, we'll take a brief tangent into how the transformer architecture entered the field of computer vision with the vision transformer, and how we can combine transformers to create our own custom image-captioning system from scratch.

We'll then learn how to share all of our hard work with the community by looking at the basics of MLOps and strategies for deploying transformer models to the cloud. Finally, we'll venture into the world of massive language models like GPT-3 and ChatGPT to see how we can harness the power of state-of-the-art closed-source language models for our own benefit.
