Exploring the Transformer architecture behind GPT and Llama

Shimi Carmeli

Sr. Engineering Executive | Leadership | Cyber Security | AI | Cloud Platforms | Data | Analytics & Observability | DevOps | Digital Transformation | Industrial IoT | On-Prem | Building High-Growth Tech Teams

We use text-generative models like GPT, Llama, and Gemini in almost all of our day-to-day activities. But do we ever stop to think about what happens under the hood and how they work? These models are based on the Transformer architecture, which was first introduced in 2017. Breaking this architecture down can be a bit hard, but there is a great visualization (credit to: https://lnkd.in/dnyivbv5) that helps in understanding the different phases and calculations along the way, and I thought I'd share it with you: https://lnkd.in/dcggCpz9 I hope it helps make some order :) If not, feel free to reach out to discuss.
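For a concrete sense of the calculation at the core of the Transformer, here is a minimal NumPy sketch of scaled dot-product attention, the operation the 2017 paper is built around. This code is illustrative only and not from the post or the linked visualization; the function name and toy shapes are my own.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    # Similarity scores between each query and every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors
    return weights @ V

# Hypothetical toy example: 3 tokens, embedding dimension 4 (self-attention)
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4)

In a real model this runs in parallel across multiple heads, with learned projections producing Q, K, and V from the token embeddings; the visualization linked above walks through those surrounding steps.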

[diagram]
Kobi Lemberg

Senior Big Data Software Engineer & Consultant @ BigData Boutique

3w

Great visualization!

