From the course: Large Language Models: Text Classification for NLP using BERT
Multi-head attention and feedforward network
- [Instructor] Earlier, we looked at how self-attention can help us provide context for a word in the sentence, "The monkey ate that banana because it was too hungry." But what if we could have multiple instances of this self-attention mechanism, so that each can perform a different task? One could learn links between nouns and adjectives. Another could connect pronouns to their subjects. That's the idea behind multi-head attention. And what's particularly impressive is that we don't need to build these relations into the model; they are learned entirely from the data. BERT has 12 such heads, and each multi-head attention block gets three inputs: the query, the key, and the value. These are put through linear, or dense, layers before the multi-head attention function. The query, key, and value are passed through separate fully connected linear layers for each attention head. The model can jointly attend to…
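To make the mechanism concrete, here is a minimal sketch of multi-head self-attention in PyTorch. The hidden size of 768 and the 12 heads match BERT-base, but the class name, method layout, and dimensions are illustrative assumptions, not BERT's actual implementation.

```python
# A minimal sketch of multi-head self-attention (assumed PyTorch),
# using BERT-base-style dimensions: hidden size 768, 12 heads.
import math
import torch
import torch.nn as nn


class MultiHeadSelfAttention(nn.Module):
    def __init__(self, hidden_size=768, num_heads=12):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads

        # Separate linear (dense) layers for the query, key, and value.
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        # x: (batch, seq_len, hidden_size)
        batch, seq_len, _ = x.shape

        def split_heads(t):
            # Give each head its own slice of the projected vectors.
            return t.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

        q = split_heads(self.query(x))
        k = split_heads(self.key(x))
        v = split_heads(self.value(x))

        # Scaled dot-product attention, computed independently per head,
        # so each head can learn a different kind of relation.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        weights = scores.softmax(dim=-1)
        context = weights @ v  # (batch, num_heads, seq_len, head_dim)

        # Concatenate the heads and project back to the hidden size.
        context = context.transpose(1, 2).reshape(batch, seq_len, -1)
        return self.out(context)


# Example: one sentence of 10 tokens at BERT-base hidden size.
attention = MultiHeadSelfAttention()
tokens = torch.randn(1, 10, 768)
print(attention(tokens).shape)  # torch.Size([1, 10, 768])
```

Because each head works on its own slice of the query, key, and value projections, nothing forces the heads to specialize in the same way; the different relations they capture emerge from training on the data, as described above.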