Transformers and Large Language Models: Intro to the foundational architecture of Generative AI

More than a billion users later, LLMs have been adopted faster than any technology in history, powered by the shift from searching for information and being handed links to receiving targeted answers generated in milliseconds. With this shift came fast-changing expectations across enterprise software, telco operations and daily productivity.

What is happening, and what is driving this leap forward?

It’s been one year since Partha Seetala, president of Rakuten Cloud, launched his AI training series, A Comprehensive and Intuitive Introduction to Deep Learning (CIDL).

The first season opened with “An Intuitive Introduction to Neural Networks,” delivering a clear message: in today’s landscape, not understanding AI can seriously hold back your career or your product. What makes Partha’s sessions stand out is how well they balance depth and accessibility, crafted to make complex AI concepts understandable, whether you're an engineer or an executive. They have become required viewing for teams in telco, tech and beyond. 

Recently, we discussed key takeaways from season two, which focused on how neural networks process sequence data like text and time-series information. This spanned techniques like embeddings, RNNs, LSTMs, Seq2Seq and attention. (Check out our interview with Partha on the role of these approaches from last week’s Zero-Touch Live.)

Understanding AI model behavior and how to influence it is critical. Equally important is understanding the role architecture plays. 

In season three, viewers learn the details behind how Large Language Models (LLMs) work, including how machines compress large volumes of human knowledge into a transformer neural network and present it back in highly targeted ways when queried. Season three focuses not just on the what and how of LLMs and transformers, but also the why: in particular, why the components are structured the way they are.

Episode one is now available and kicks off with the architecture that redefined AI: the transformer, which powers today’s LLMs and enterprise AI systems.

Why transformers matter 

Transformers represent a leap in design, introducing parallelism, context awareness and general-purpose learning. They are the foundation of modern LLMs, taking generative AI from theory to practical deployment and giving us household names like ChatGPT, Gemini and Claude.

Two breakthroughs have been especially important (a short code sketch of both follows the list):

  • Positional encoding enabled models to process entire sequences in parallel, rather than word by word as RNNs do.

  • Self-attention allowed models to dynamically weigh context and meaning for each word/token (i.e., not just memorize, but actually understand).
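
To make these two ideas concrete, here is a minimal NumPy sketch of sinusoidal positional encoding and scaled dot-product self-attention, following the formulation in the original transformer paper. This is our illustration, not code from the course:

    import numpy as np

    def positional_encoding(seq_len, d_model):
        # Sinusoidal encoding: each position gets a unique pattern,
        # so word order survives even when all tokens are processed
        # in parallel.
        pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
        i = np.arange(d_model)[None, :]            # (1, d_model)
        angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
        enc = np.zeros((seq_len, d_model))
        enc[:, 0::2] = np.sin(angles[:, 0::2])     # even dims: sine
        enc[:, 1::2] = np.cos(angles[:, 1::2])     # odd dims: cosine
        return enc

    def self_attention(X, Wq, Wk, Wv):
        # Scaled dot-product attention: every token scores every
        # other token, then takes a weighted mix of their values.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V

    # Toy example: 4 tokens, model width 8
    rng = np.random.default_rng(0)
    d = 8
    X = rng.normal(size=(4, d)) + positional_encoding(4, d)
    out = self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
    print(out.shape)  # (4, 8): one context-aware vector per token

Note how the attention weights are computed fresh for every input: context is weighed at run time rather than memorized, which is exactly the "understand, not just memorize" point above.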

In telecom especially, AI models won’t be delivered as boxed solutions from vendors. (If you are offered one, be incredibly wary!) Rather, these models are becoming embedded in infrastructure, workflows and especially data. That means engineers must understand how transformers fundamentally work. 

It goes back to Partha’s recurring mantra that AI cannot be viewed as a black box or magic.   

What to expect in season three 

Season three dives into three transformer types (each shown hands-on in the snippet after this list): 

  • Encoder-only. Used for classification, extractive QA, etc. (e.g., BERT, ELECTRA).

  • Decoder-only. Used for generative tasks, including LLMs (e.g., GPT).

  • Encoder-decoder. Used for translation and generation tasks (e.g., T5, MarianMT, BART).
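
For readers who want to try each family before watching, the snippet below uses the open-source Hugging Face transformers library. This is our illustration, not course material, and the model names are simply common public checkpoints:

    from transformers import pipeline  # pip install transformers

    # Encoder-only (BERT): understanding tasks such as fill-in-the-blank
    fill = pipeline("fill-mask", model="bert-base-uncased")
    print(fill("The network outage was caused by a [MASK] failure.")[0]["token_str"])

    # Decoder-only (GPT-2): open-ended text generation
    gen = pipeline("text-generation", model="gpt2")
    print(gen("Telecom networks of the future will", max_new_tokens=20)[0]["generated_text"])

    # Encoder-decoder (T5): sequence-to-sequence tasks such as translation
    trans = pipeline("translation_en_to_fr", model="t5-small")
    print(trans("The transformer changed everything.")[0]["translation_text"])

Each call downloads a small public checkpoint on first use; the point is simply that the three families expose very different task shapes.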

As in previous seasons, the focus is on intuitive understanding, not just formulas. With this in mind, Partha breaks down each architectural component of the transformer, including embedding, positional encoding, self-attention, feed-forward layers, normalization and stacking (a minimal code sketch wiring them together follows the list): 

  • Embedding. Words are turned into dense vectors so the model can “see” them as numbers. 

  • Positional encoding. Extra numbers are added to tell the model where each word sits in the sentence. 

  • Self-attention. Every word looks at every other word to decide which ones matter most. 

  • Feed-forward layers. Simple neural nets give each word a quick, non-linear polish between attention rounds. 

  • Normalization. Outputs are scaled and shifted so training stays stable and fast. 

  • Stacking. Blocks are piled atop one another to build deeper, more powerful understanding.
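
To show how those pieces snap together, here is a compact PyTorch sketch of one encoder block plus stacking. This is our simplified illustration, not code from the course, and it uses learned positional embeddings rather than the sinusoidal variant for brevity:

    import torch
    import torch.nn as nn

    class TransformerBlock(nn.Module):
        """One encoder block: self-attention + feed-forward,
        each wrapped in a residual connection + layer norm."""
        def __init__(self, d_model=64, n_heads=4, d_ff=256):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ff = nn.Sequential(
                nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)

        def forward(self, x):
            attn_out, _ = self.attn(x, x, x)  # every token attends to every token
            x = self.norm1(x + attn_out)      # residual + normalization
            x = self.norm2(x + self.ff(x))    # per-token non-linear "polish"
            return x

    class TinyEncoder(nn.Module):
        def __init__(self, vocab=1000, d_model=64, n_layers=2, max_len=128):
            super().__init__()
            self.embed = nn.Embedding(vocab, d_model)  # words -> dense vectors
            self.pos = nn.Embedding(max_len, d_model)  # learned position vectors
            self.blocks = nn.ModuleList(               # stacking
                TransformerBlock(d_model) for _ in range(n_layers))

        def forward(self, token_ids):
            positions = torch.arange(token_ids.size(1))
            x = self.embed(token_ids) + self.pos(positions)
            for block in self.blocks:
                x = block(x)
            return x

    tokens = torch.randint(0, 1000, (1, 10))  # batch of 1 sentence, 10 tokens
    print(TinyEncoder()(tokens).shape)        # torch.Size([1, 10, 64])

The residual additions (x + ...) are what let dozens of these blocks stack without the signal degrading; production LLMs repeat this same pattern at far greater width and depth.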

Throughout the season, approaches for training and fine-tuning will be covered, as well as the role of emergent behavior, agent architectures, retrieval-augmented generation (RAG) and reasoning models, ultimately expanding the focus beyond today’s LLMs. 

This course isn’t just for teams building models from scratch. It’s equally valuable for evaluating, tuning and integrating foundation models into real systems. This is especially true in telecom, where alignment with operational data, constraints and intent is essential. 

Check out season three today 

In telecom and enterprise tech, deploying AI isn’t just about what models can do but about understanding how they work. Season three teaches the architectural fluency to build, adapt and apply transformer models in ways that align with real-world constraints and goals. Episode one is available now, with more episodes on the way. 

Have a question for Partha Seetala or want to see specific topics covered in an upcoming course? Mention him in the comments to start a conversation. And remember to subscribe to the Zero-Touch newsletter to have insights like these sent to your inbox every week.
