We use text-generative models like GPT, Llama and Gemini in almost all of our day-to-day activities. But have we stopped to think about what happens under the hood and how they work? These models are based on the Transformer architecture, which was first introduced in 2017. Breaking down this architecture can be a bit hard to explain, but there is a great visualization (credit to: https://lnkd.in/dnyivbv5) that helps in understanding the different phases and calculations along the way, and I thought I'd share it with you: https://lnkd.in/dcggCpz9 I hope it helps make some order :) If not, feel free to reach out to discuss.
Exploring the Transformer architecture behind GPT and Llama
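For anyone who wants a concrete feel for one of the calculations the visualization walks through, below is a minimal sketch of scaled dot-product self-attention in NumPy. It is an illustrative simplification (single head, no masking, toy dimensions), not the full architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over a token sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project tokens to queries/keys/values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # pairwise token-to-token similarity
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V                         # weighted mix of value vectors

# Toy example: 4 tokens, model dimension 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)     # (4, 8)
```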
More Relevant Posts
-
Lightbulb moment: LLMs are pure functions 💡 Unless you are involved in building a foundation model (FM), the single most important thing when building AI applications is how you 'engineer' everything around your chosen FMs, and how you evaluate and improve the performance of that 'engineered' architecture (evals).
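One way to make the 'pure function' framing concrete: if you pin down everything the output depends on, the model call behaves like a deterministic mapping you can build evals around. A minimal sketch below, with a hypothetical call_foundation_model wrapper standing in for whatever provider SDK you actually use; the dataclass and the toy eval loop are assumptions for illustration, not any particular vendor's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelCall:
    """Everything the output depends on; hold these fixed and the call acts like a pure function."""
    model: str
    system_prompt: str
    user_prompt: str
    temperature: float = 0.0
    seed: int = 42

def call_foundation_model(call: ModelCall) -> str:
    """Hypothetical wrapper around whichever provider SDK you use.
    With temperature 0 and a fixed seed, the mapping ModelCall -> text is (near-)deterministic,
    which is what makes evals repeatable. A canned response stands in for the real call here."""
    return f"[{call.model}] response to: {call.user_prompt}"

def evaluate(cases: list[tuple[ModelCall, str]]) -> float:
    """Toy eval harness: fraction of cases whose output contains the expected substring."""
    hits = sum(expected in call_foundation_model(c) for c, expected in cases)
    return hits / len(cases)

case = ModelCall("my-fm", "You are terse.", "Summarise Q3 revenue.")
print(evaluate([(case, "Q3 revenue")]))   # same inputs always give the same score
```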
-
Another amazing free architecture event from Iasa Communities: AI and Architecture with Jesper Lowgren, Sept 9th. Register here: https://lnkd.in/eJiFcRag
What is different in the agentic world is simple to name and profound to design:
1. A new lexicon takes centre stage, with terms like autonomy, emergence, intelligence, cognition, ontology and semantics.
2. We move from frameworks to models to simulations; architects shift from static blueprints to living representations and simulation-first validation that proves behaviour before release.
3. Systems move from determinism to autonomy to emergence; as autonomous agents interact, novel patterns appear, and risk grows non-linearly with scale.
The agentic challenge is huge, and success depends on the right foundation across business, application, data and technology architectures. This presentation outlines why architects are in a pole position to drive agentic architectures and transformation.
-
✨ What makes time series analysis unique compared to other machine learning approaches is the central role of time representation in shaping experiment design. In our latest work, we explore two variations of the Transformer architecture:
🔹 One using a fixed time representation proposed in the literature
🔹 One where the time representation is learned directly from data
👉 Read the full article here: https://lnkd.in/dVhnUREE
You can read all our publications on the MANOLO website: https://lnkd.in/dWybAK7w
#MachineLearning #TimeSeries #Transformers #AIResearch #RenewableEnergy #HumanInTheLoop #MANOLOProject
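The article's exact models aren't reproduced here, but the general contrast between the two variants can be sketched roughly as a fixed sinusoidal encoding, in the spirit of the original Transformer paper, versus a trainable embedding learned from the data. This is an illustrative PyTorch sketch under those assumptions; the actual MANOLO architectures may differ.

```python
import math
import torch
import torch.nn as nn

class FixedTimeEncoding(nn.Module):
    """Fixed sinusoidal time/position encoding taken from the literature; no learned parameters."""
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        return x + self.pe[: x.size(1)]

class LearnedTimeEncoding(nn.Module):
    """Time representation learned directly from the data as a trainable embedding table."""
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        self.emb = nn.Embedding(max_len, d_model)

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.emb(positions)
```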
-
Egnyte puts AEC AI Agents to work - extracting details from specification files and delivering guidance for building code compliance https://lnkd.in/eSm-Kmz3 #AI #Architecture #AEC #buildingcodes #specifications
-
There has been considerable discussion around "practical AI" as the appetite for incorporating AI into workflows continues to rise. Last week's announcement from Egnyte is a prime example of what practical AI looks like. Our new AEC AI Agents are specifically designed to solve challenges facing architecture, engineering, and construction firms on a daily basis by delivering AI guidance on lengthy, complex files. Thanks to Greg Corke for the great writeup!
Egnyte puts AEC AI Agents to work - extracting details from specification files and delivering guidance for building code compliance https://lnkd.in/eSm-Kmz3 #AI #Architecture #AEC #buildingcodes #specifications
-
Building for production isn't the final step; it's the first. Your AI's architecture needs to be ready for prime time.
-
Rajeev Bhargava’s latest article contrasts two paradigms of agency. The traditional Outcome → Calculation → Execution model has powered human civilization with predictability and control. But in today’s complex environments, a new model, Intent → Causality → Control, is emerging, where systems adapt in real time and intelligence is embedded within. This shift raises deep questions about responsibility, governance, and trust in autonomous systems. It also points toward hybrid architectures that blend the stability of established frameworks with the adaptability of emergent reasoning. Read Full Article: https://lnkd.in/gsfy_vzn
-
#Highlycitedpaper 📖 U-Net-Based CNN Architecture for Road Crack Segmentation ✍ By Alessandro Di Benedetto, Margherita Fiani and Lucas Gujski 🏘️ From University of Salerno 👉 https://lnkd.in/gvnrexP8 ✨ The aim of this study is to optimize the crack segmentation process by implementing a modified U-Net-based algorithm. The Crack500 dataset proposed by Yang et al. in 2019 was used, and the results were compared with those of the most accurate and performant algorithm currently reported in the literature, the U-Net of Lau et al. The results are promising and accurate, and the shape and width of the segmented cracks are very close to reality. #infrastructures #road #roadcrack #convolutionalneuralnetworks
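For readers unfamiliar with the architecture the paper builds on, here is a minimal U-Net-style encoder-decoder in PyTorch that outputs a binary (crack vs. background) mask. It is a toy sketch of the general pattern, not the authors' modified model, and the channel sizes are made up for illustration.

```python
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    """Two 3x3 conv + ReLU layers, the basic building unit of U-Net."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """One encoder level, a bottleneck, and one decoder level with a skip connection."""
    def __init__(self, in_ch=3, out_ch=1):
        super().__init__()
        self.enc = double_conv(in_ch, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(64, 128)
        self.up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec = double_conv(128, 64)          # 128 = upsampled 64 + skip-connection 64
        self.head = nn.Conv2d(64, out_ch, 1)     # 1-channel crack probability map

    def forward(self, x):
        e = self.enc(x)
        b = self.bottleneck(self.pool(e))
        d = self.dec(torch.cat([self.up(b), e], dim=1))
        return torch.sigmoid(self.head(d))

mask = TinyUNet()(torch.randn(1, 3, 256, 256))   # -> (1, 1, 256, 256)
```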
-
Andy Neill does a great job breaking down Double Retrieval-Augmented Generation, a powerful way to optimize how LLMs interact with enterprise data. What I love here is the bigger takeaway: AI isn’t plug-and-play. The architecture patterns we choose today will define whether AI becomes a productivity booster… or a bottleneck. It has me thinking: I’d be curious to hear how this resonates with others and how it might apply in your environments.
Let's talk about new AI Architecture Patterns! #BytesOnBikes Justin St-Maurice, PhD Martin Bufi Info-Tech Research Group
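I haven't seen the exact pattern from the talk, so the sketch below is only one common reading of a "double" retrieval pass: a broad first retrieval, a query rewrite informed by what came back, then a tighter second retrieval before generation. The embed, vector_store, and generate callables are hypothetical placeholders for whatever stack you run.

```python
def double_retrieval_answer(question: str, vector_store, embed, generate,
                            k_wide: int = 50, k_narrow: int = 5) -> str:
    """Two-pass retrieval sketch; every callable here is a hypothetical stand-in."""
    # Pass 1: broad, recall-oriented retrieval over the raw question
    first_hits = vector_store.search(embed(question), top_k=k_wide)

    # Use the first-pass snippets to rewrite the query in the documents' own vocabulary
    refined_query = generate(
        "Rewrite this question so it is specific to our documents.\n"
        f"Question: {question}\n"
        f"Context: {[d.text for d in first_hits[:k_narrow]]}"
    )

    # Pass 2: precision-oriented retrieval with the refined query
    second_hits = vector_store.search(embed(refined_query), top_k=k_narrow)

    # Final grounded generation over the narrowed context
    context = "\n\n".join(d.text for d in second_hits)
    return generate(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```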
-
On Brittle Architectures: The Cost of Overfitting in Foresight Models
In computational foresight, there’s a seductive trap: building models that achieve near-perfect accuracy on historical data. These are “overfitted” systems, architectures that have memorized the past so flawlessly they can no longer generalize to the future. Their elegance is a mask for their fragility. Consider their defining traits:
Feature Fixation: The model develops an extreme sensitivity to a static set of input features it learned during training. It dismisses novel or “noisy” data points (e.g., emergent technologies, new working modalities) not because they lack value, but because they don’t conform to its rigid input structure. Innovation becomes an anomaly.
High Sensitivity to Input Variance: These systems demand sanitized, predictable data streams. They exhibit low tolerance for the ambiguity and “noise” inherent in complex, real-world environments. The need for constant, direct observation isn’t a strategy for better performance; it’s a workaround for a model too brittle to handle asynchronous or distributed inputs.
The Static Loss Function: The model optimizes for a single, archaic “loss function” (e.g., minimizing short-term computational cost) while ignoring critical, dynamic metrics (e.g., model degradation over time or the cost of discarding valuable out-of-distribution data). The system’s prime directive is to minimize its predefined error, even if that error metric is now irrelevant.
The ultimate paradox of an overfitted architecture is that it appears highly optimized. It’s fast, precise, and predictable within its narrow operational domain. But this local optimum is a dead end. Such systems are not learning; they are merely confirming their own biases. Their future is not adaptation, but a graceful, then sudden, slide into irrelevance.
#Foresight #MachineLearning #SystemsThinking #Overfitting #DataScience #Innovation #Complexity
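The machine learning version of this failure mode is easy to reproduce: a model flexible enough to memorize its training window looks excellent in-sample and falls apart out-of-sample. A minimal NumPy illustration with polynomial regression (toy data, illustrative only):

```python
import numpy as np

def signal(x):
    return np.sin(2 * np.pi * x)          # the "true" process we are trying to forecast

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 15)           # the historical window the model gets to see
x_test = np.linspace(0, 1, 200)           # the "future" it must generalize to
y_train = signal(x_train) + rng.normal(scale=0.2, size=x_train.shape)
y_test = signal(x_test)

for degree in (3, 14):                    # modest capacity vs. memorizing capacity
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")

# The degree-14 fit "memorizes the past flawlessly" (near-zero train error)
# yet generalizes worse than the simpler model: the brittleness described above.
```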
-
Senior Big Data Software Engineer & Consultant @ BigData Boutique
3w: Great visualization!