Understanding the Magic of GenAI: the “Next Word Predictor”
In the past few years, we’ve witnessed the rapid evolution of AI — or more aptly, the evolution of “THE NEXT WORD PREDICTOR.” In this blog, I’ll share my learnings from the GenAI Python cohort by Chai Code. By the end of this article, I promise you won’t feel like it’s “not your cup of tea.”
This article dives into the heart of the most buzzing word of our time — GenAI. In a remarkably short period, Generative AI has started to reshape the global economy and influence investment patterns across the world. It’s not just a tech trend anymore; it’s a shift in how we work, think, and build the future.
But to truly understand what GenAI is all about, let’s first break it down — just like we do in Hindi with “Sandhi-Viched” — dissecting the term to uncover the meaning and magic behind it.
GEN means Generative, and AI means Artificial Intelligence. Now, there's a question: what kind of intelligence are we talking about? And what exactly does Artificial Intelligence mean? Most people don't really know, so let's break it down piece by piece right here.
The “generative” aspect comes from its ability to generate new things based on patterns it has learned from vast amounts of existing data. These models often utilize sophisticated deep learning techniques, such as Large Language Models (LLMs) for text generation or Diffusion Models and Generative Adversarial Networks (GANs) for image generation.
AI, at its core, is a technique for finding the most probable next move. To understand this better, let's dive into a familiar and iconic scenario: the beginning of GTA: Vice City.
The game opens with Tommy Vercetti arriving in Vice City, only to be left stranded after a drug deal goes wrong. He’s alone, standing on a street with no money or weapons — just a scooter nearby and a vague direction to go to the hotel. At this point, the game’s internal logic kicks in.
What most players don’t realize is that the world around Tommy is reacting to his potential decisions. The developers have built the environment with an AI-like system that predicts what a player is most likely to do next. For instance:
Note: Behind the scenes, the game isn't just rendering a fixed world. It's making smart decisions based on probabilities:
"If the player goes left, spawn this vehicle."
"If the player walks forward, load the next street and nearby NPCs."
"If the player enters combat, trigger police response within X seconds."
In a way, the entire game is running like a lightweight AI system — constantly observing, predicting, and updating the world around Tommy based on the most likely actions. Just like AI models do in real life: they don’t “know” the future, but they calculate the most probable one and act accordingly.
So whether you’re building an AI model or playing Vice City, it’s all about understanding context, predicting outcomes, and making the smartest next move.
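To make that "smartest next move" idea concrete, here is a tiny made-up sketch in Python. The actions and probabilities are invented purely for illustration, not taken from any real game engine.

```python
# A made-up sketch: pick the most probable "next move", just like the game
# (and an AI model) does. The actions and their probabilities are invented.
next_move_probabilities = {
    "player goes left": 0.50,      # spawn a vehicle on the left street
    "player walks forward": 0.35,  # load the next street and nearby NPCs
    "player enters combat": 0.15,  # trigger a police response
}

# The "smartest next move" is simply the one with the highest probability.
most_likely_move = max(next_move_probabilities, key=next_move_probabilities.get)
print(most_likely_move)  # -> "player goes left"
```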
“So now we understand the true essence of AI — it’s fundamentally about predicting the next best move.”
Now, let’s take a deeper dive into how Generative AI actually works. In this article, we’ll explore some of the key concepts that power these intelligent systems — terms like tokenization, vector embeddings, positional encoding, inference, and training.
These might sound technical at first, but don’t worry — we’ll break them down in a simple, intuitive way so you can truly understand what’s happening behind the scenes when AI generates text, answers questions, or even mimics human conversation.
UNDERSTAND THE BUILDING BLOCKS OF GEN AI
Before we dig into the building blocks of GenAI, I'd like you to know a few terms that will help you follow the whole context of this blog. Two of these terms are sequence and tokenization; let's take a deep dive and understand the difference between them.
Characteristics of a Sequence: A sequence is an ordered collection of items, where the order itself carries meaning. In the world of LLMs like GPT or Gemini, the "items" in a sequence are called tokens.
1. Input Sequence (Your Prompt) — When you type a prompt into an LLM, you are providing it with an input sequence of tokens.
For example: "I Love Playing Open World Games."
2. Tokenization — First, the LLM's tokenizer breaks this sentence down into its individual components. For simplicity, let's assume each word and punctuation mark is a token: ["I", "Love", "Playing", "Open", "World", "Games", "."] These are like the smallest meaningful units of text the model understands.
Note: Every LLM has its own unique way of tokenizing the input sequence; a minimal sketch of one real tokenizer follows below.
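Here is a minimal sketch of tokenization in Python, assuming you have OpenAI's open-source tiktoken library installed (`pip install tiktoken`). Other models use different tokenizers, so the exact token ids and pieces will differ.

```python
# A minimal sketch of tokenization using the tiktoken library (an assumption:
# this encoding is used by several OpenAI models; other LLMs tokenize differently).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "I Love Playing Open World Games."
token_ids = enc.encode(text)                   # text -> list of integer token ids
tokens = [enc.decode([t]) for t in token_ids]  # decode each id back to its text piece

print(token_ids)  # a list of integers, roughly one per word or sub-word piece
print(tokens)     # the text pieces the model actually "sees"
```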
3. Vector Embedding — Once the text is tokenized into smaller pieces (tokens), the AI still can’t understand words as humans do. It needs to represent each token in a form it can process mathematically. That’s where vector embeddings come in.
A vector embedding is a way to convert each token into a list of numbers, called a vector, that captures the meaning and relationships of that token in a mathematical space. Imagine that every word or concept in a language has a unique "meaning fingerprint." This fingerprint isn't made of ink; it's made of numbers.
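To get a feel for these "meaning fingerprints," here is a tiny hand-made sketch. The 3-dimensional vectors below are invented for illustration (real embeddings have hundreds or thousands of dimensions); the point is just that tokens with related meanings end up with similar numbers.

```python
# A hand-made sketch of vector embeddings: the vectors below are invented;
# real models learn embeddings with hundreds or thousands of dimensions.
import numpy as np

embeddings = {
    "game":   np.array([0.9, 0.8, 0.1]),
    "play":   np.array([0.8, 0.9, 0.2]),
    "banana": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    # Close to 1.0 means "points in the same direction" (related meaning).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["game"], embeddings["play"]))    # high: related meanings
print(cosine_similarity(embeddings["game"], embeddings["banana"]))  # lower: unrelated meanings
```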
Now let's take an example to make the essence of vector embeddings crystal clear. Suppose I give the input in a non-sequential manner, for example: "fantastic gta 5 play i game"
If a human reads that, they immediately try to reorder it into something like: “I play fantastic GTA 5 game.” or “I play a fantastic GTA 5 game.”
The LLM Doesn't "Re-correct" Before Processing:
The most important thing to understand is that the LLM doesn’t first “fix” or “reorder” your scrambled sentence into a grammatically correct one before it starts processing. It takes your input exactly as it is, in its original order.
Tokenization and Embedding (As Always): The model tokenizes and embeds your scrambled sentence exactly as you typed it, attaching a position stamp to each token:
(embedding_fantastic + pos_1)
(embedding_gta + pos_2)
(embedding_5 + pos_3)
(embedding_play + pos_4)
(embedding_i + pos_5)
(embedding_game + pos_6)
4. Positional Encoding — When an AI model processes language, it looks at sequences of tokens. But here's the catch: once tokens are converted into vectors (embeddings), the model loses the information about their original order, because vectors themselves don't carry any notion of position.
This is where positional encoding comes in.
Positional encoding is a way to add information about the position of each token within the sequence, so the model knows where each word is in the sentence.
Think of it like adding a GPS coordinate to each word; a minimal sketch follows below.
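Here is a minimal sketch of the classic sinusoidal positional encoding from the original Transformer paper, in Python with NumPy. The sequence length and embedding size are tiny, made-up numbers just for illustration.

```python
# A minimal sketch of sinusoidal positional encoding (the scheme from the
# original Transformer paper). Sizes are tiny, made-up values for illustration.
import numpy as np

def positional_encoding(seq_len, dim):
    positions = np.arange(seq_len)[:, None]            # 0, 1, 2, ... one per token
    div_terms = 10000 ** (np.arange(0, dim, 2) / dim)   # different "wavelengths"
    pe = np.zeros((seq_len, dim))
    pe[:, 0::2] = np.sin(positions / div_terms)         # even dimensions get sine
    pe[:, 1::2] = np.cos(positions / div_terms)         # odd dimensions get cosine
    return pe

# 6 tokens ("fantastic gta 5 play i game"), 8-dimensional embeddings (made-up size)
token_embeddings = np.random.rand(6, 8)
stamped = token_embeddings + positional_encoding(6, 8)  # embedding + "GPS coordinate"
print(stamped.shape)  # (6, 8): one position-stamped fingerprint per token
```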
Okay, let’s connect the dots from positional encoding to inference and training with backpropagation, keeping it simple.
Recap: Positional Encoding’s Role
To recap: after you break your text into tokens and give each token a numerical "meaning fingerprint" (vector embedding), you then add a "position stamp" (positional encoding) to each of those fingerprints.
So, for your input "The quick brown fox", the LLM now has something like this:
(embedding_The + pos_1)
(embedding_quick + pos_2)
(embedding_brown + pos_3)
(embedding_fox + pos_4)
This entire collection of stamped fingerprints for your input sequence is what the Transformer (specifically the Decoder in GPT’s case) then processes. It’s essentially a “matrix” or a “table” of these numbers.
Two Modes of Operation for the LLM:
An LLM operates in two main phases:
Training: The LLM Learns to Predict the Next Word (with Backpropagation)
Imagine the LLM as a student trying to learn a language. It has a massive textbook (the “pre-trained data” — all the internet text).
The Goal: The student (LLM) wants to learn to predict the next word in any given sentence.
The Process (Simplified):
Showing Examples: The "teacher" (training algorithm) shows the LLM a sentence from the textbook, for example "The quick brown fox jumps", but hides the very next word ("jumps").
LLM Makes a Guess: The LLM processes “The quick brown fox” (including all the token embeddings and positional encodings). Based on its current, untrained knowledge (its randomly initialized internal “weights” or “brain connections”), it makes a guess about the next word.
Comparing Guess vs. Correct Answer (Calculating “Error”): The teacher compares the LLM’s guess (“apple”) with the correct answer (“jumps”).
“Backpropagation” (Figuring Out What Went Wrong): This is the core of learning.
Note: How well this back-propagation and training process is carried out is a big part of what sets one LLM apart from another.
Based on the back-propagation’s “advice,” the LLM’s internal “brain connections” (the vast number of numerical weights) are slightly adjusted. The goal is to make the LLM’s next guess a tiny bit closer to the correct answer.
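To make the training loop concrete, here is a deliberately tiny sketch in Python using PyTorch. This is not a real LLM: the vocabulary, the single training sentence, and the model sizes are all made-up assumptions, but the four steps (guess, compare, backpropagate, adjust) are exactly the ones described above.

```python
# A minimal sketch (not a real LLM) of next-word training with backpropagation.
# The tiny vocabulary, toy sentence, and model sizes are made-up for illustration.
import torch
import torch.nn as nn

vocab = ["<pad>", "the", "quick", "brown", "fox", "jumps"]
stoi = {w: i for i, w in enumerate(vocab)}

# Toy training example: given "the quick brown fox", the correct next word is "jumps".
context = torch.tensor([[stoi["the"], stoi["quick"], stoi["brown"], stoi["fox"]]])
target = torch.tensor([stoi["jumps"]])

class TinyNextWordModel(nn.Module):
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)  # "meaning fingerprints"
        self.head = nn.Linear(dim, vocab_size)      # a score for every word in the vocab

    def forward(self, tokens):
        x = self.embed(tokens).mean(dim=1)          # crude "summary" of the context
        return self.head(x)                         # logits over the vocabulary

model = TinyNextWordModel(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(20):
    logits = model(context)         # 1. the model makes a guess
    loss = loss_fn(logits, target)  # 2. compare guess vs. correct answer (the "error")
    optimizer.zero_grad()
    loss.backward()                 # 3. backpropagation: figure out what went wrong
    optimizer.step()                # 4. nudge the weights a tiny bit closer

print(vocab[model(context).argmax(dim=1).item()])  # after training: should be "jumps"
```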
Inference: The LLM Uses Its Learned Knowledge
Once the training (learning) is complete, the LLM is “frozen.” It’s no longer learning or adjusting its internal connections. Now, it’s ready to put its knowledge to use. This is when you interact with it (e.g., asking ChatGPT a question).
The Goal: Predict the most likely next word based on the prompt you give it.
The Process (Simplified):
Your Input: You give the LLM a prompt (e.g., “Tell me a short story about a brave knight and a dragon.”).
Tokenization, Embedding, Positional Encoding: The LLM processes your input, converting it into that sequence of “stamped fingerprints.”
Forward Pass Through the Transformer: This is where the learned knowledge comes into play. The LLM’s internal “brain connections” (the weights that were adjusted during training) are now fixed.
Prediction of the Next Word’s Probability: The final layer of the LLM outputs a probability distribution over its entire vocabulary for the next word.
Word Selection: The LLM picks a word based on these probabilities (often the most probable, but sometimes with a bit of randomness for creativity).
Loop and Generate: The selected word is added to the sequence, and the entire new, longer sequence is fed back into the LLM for the next prediction. This repeats, word by word, until the response is complete.
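Here is a minimal sketch of that loop in Python. The tiny vocabulary and the `next_word_probs` function are made-up stand-ins for a real trained model's forward pass; the point is only to show the predict, pick, append, repeat cycle.

```python
# A minimal sketch of the inference loop: predict, pick a word, append, repeat.
# The vocabulary and the fake probability function are assumptions standing in
# for a real trained LLM.
import numpy as np

vocab = ["a", "brave", "knight", "fought", "the", "dragon", "<end>"]
rng = np.random.default_rng(0)

def next_word_probs(tokens):
    # A real LLM would run its full forward pass here; we fake a probability
    # distribution over the vocabulary just to show the shape of the loop.
    logits = rng.normal(size=len(vocab))
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()  # softmax -> probabilities over the whole vocab

tokens = ["a", "brave", "knight"]  # your prompt, already tokenized
for _ in range(10):
    probs = next_word_probs(tokens)                            # probability of every possible next token
    next_token = vocab[int(rng.choice(len(vocab), p=probs))]   # sample (a bit of "creativity")
    if next_token == "<end>":
        break
    tokens.append(next_token)                                  # feed the longer sequence back in

print(" ".join(tokens))
```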
The Key Difference: during training, the model's internal weights are constantly being adjusted; during inference, those weights are frozen and are only used to predict the next token.
After learning all this, I’m curious — what really happens when we ask the model something like, “Write a for loop in Java that runs 100 times”? Does it go straight to the inference phase, since the model is already pre-trained and has that knowledge built in?
So the answer is yes. It goes directly into the inference phase.
Think of the LLM's pre-training as it having "read" and "understood" an enormous library of code, including countless examples of Java for loops. During that training, it learned the patterns, syntax, and common structures of those loops: the initializer, the condition, the increment, and the body that gets repeated.
So, when you give it the prompt "write a for loop in Java that runs 100 times", it doesn't need to learn anything new.
It’s like asking a programmer who already knows Java to write a for loop. They don’t need to go back to school and “learn” how to write a for loop again (that’s the training phase). They simply apply their existing knowledge to generate the code (that’s the inference phase).
The “calculation” isn’t a mathematical computation to derive the loop from first principles, but rather the model’s ability to statistically “recall” and generate the most probable and correct sequence of tokens that represents a Java for loop, based on the patterns it absorbed during its massive pre-training.
💡 How GPUs Power GenAI — and Reshape the Global Economy
Generative AI (GenAI) might feel magical, but behind the scenes, it runs on something very real: graphics processing units (GPUs). Originally designed for gaming, GPUs have become the backbone of modern AI — especially during two key phases: training and inference.
Both processes need powerful hardware, but training is where GPUs shine. A single large language model (LLM) can take thousands of GPUs working for weeks to complete training. Inference is faster but still GPU-dependent — especially for real-time applications like chatbots, coding assistants, or image generation tools.
💰 The Economic Ripple: Why GPUs Are Gold
The GenAI boom has sparked a global race for computing power. Every company building or using AI needs access to high-performance GPUs. Enter NVIDIA, the leader in GPU technology.
In 2023–2024, as GenAI adoption exploded, NVIDIA’s market share soared, its stock surged, and it briefly joined the $2 trillion club. Why? Because nearly every AI company — from OpenAI to startups — was buying its chips, especially the H100s, like tech gold.
Thanks Hitesh Choudhary and Piyush Garg for the beautiful session.