From the course: Large Language Models: Text Classification for NLP using BERT
Multi-head attention and feedforward network
- [Instructor] Earlier, we looked at how self-attention can help us provide context for a word in the sentence, "The monkey ate that banana because it was too hungry." But what if we could have multiple instances of this self-attention mechanism, so that each can perform a different task? One could learn links between nouns and adjectives. Another could connect pronouns to their subjects. That's the idea behind multi-head attention. And what's particularly impressive is that we don't need to build these relations into the model; they are learned entirely from the data. BERT has 12 such heads, and each multi-head attention block gets three inputs: the query, the key, and the value. These are put through linear, or dense, layers before the multi-head attention function. The query, key, and value are passed through separate fully connected linear layers for each attention head. The model can jointly attend to…
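To make the mechanism concrete, here is a minimal sketch of multi-head self-attention in PyTorch. The hidden size of 768 and the 12 heads match BERT-base, but the class name, method layout, and dimensions are illustrative assumptions, not BERT's actual implementation.

```python
# A minimal sketch of multi-head self-attention (assumed PyTorch),
# using BERT-base-style dimensions: hidden size 768, 12 heads.
import math
import torch
import torch.nn as nn


class MultiHeadSelfAttention(nn.Module):
    def __init__(self, hidden_size=768, num_heads=12):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads

        # Separate linear (dense) layers for the query, key, and value.
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        # x: (batch, seq_len, hidden_size)
        batch, seq_len, _ = x.shape

        def split_heads(t):
            # Give each head its own slice of the projected vectors.
            return t.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

        q = split_heads(self.query(x))
        k = split_heads(self.key(x))
        v = split_heads(self.value(x))

        # Scaled dot-product attention, computed independently per head,
        # so each head can learn a different kind of relation.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        weights = scores.softmax(dim=-1)
        context = weights @ v  # (batch, num_heads, seq_len, head_dim)

        # Concatenate the heads and project back to the hidden size.
        context = context.transpose(1, 2).reshape(batch, seq_len, -1)
        return self.out(context)


# Example: one sentence of 10 tokens at BERT-base hidden size.
attention = MultiHeadSelfAttention()
tokens = torch.randn(1, 10, 768)
print(attention(tokens).shape)  # torch.Size([1, 10, 768])
```

Because each head works on its own slice of the query, key, and value projections, nothing forces the heads to specialize in the same way; the different relations they capture emerge from training on the data, as described above.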