Understanding the Magic of GenAI: the “Next Word Predictor”

In the past few years, we’ve witnessed the rapid evolution of AI — or more aptly, the evolution of “THE NEXT WORD PREDICTOR.” In this blog, I’ll share my learnings from the GenAI Python cohort by Chai Code. By the end of this article, I promise you won’t feel like it’s “not your cup of tea.”

This article dives into the heart of the most buzzing word of our time — GenAI. In a remarkably short period, Generative AI has started to reshape the global economy and influence investment patterns across the world. It’s not just a tech trend anymore; it’s a shift in how we work, think, and build the future.

But to truly understand what GenAI is all about, let’s first break it down — just like we do in Hindi with “Sandhi-Viched” — dissecting the term to uncover the meaning and magic behind it.

GEN means Generative, and AI means Artificial Intelligence. Now, there's a question: what kind of intelligence are we talking about, and what exactly does Artificial Intelligence mean? Most people don't really know, so let's break it down piece by piece right here.

The “generative” aspect comes from its ability to generate new things based on patterns it has learned from vast amounts of existing data. These models often utilize sophisticated deep learning techniques, such as Large Language Models (LLMs) for text generation or Diffusion Models and Generative Adversarial Networks (GANs) for image generation.

AI, at its core, is a technique for finding the most probable next move. To understand this better, let's dive into a familiar and iconic scenario: the beginning of GTA: Vice City.

The game opens with Tommy Vercetti arriving in Vice City, only to be left stranded after a drug deal goes wrong. He’s alone, standing on a street with no money or weapons — just a scooter nearby and a vague direction to go to the hotel. At this point, the game’s internal logic kicks in.

What most players don’t realize is that the world around Tommy is reacting to his potential decisions. The developers have built the environment with an AI-like system that predicts what a player is most likely to do next. For instance:

  • The nearest vehicle is a scooter — low risk, high probability the player will use it.
  • The hotel is clearly marked on the map — guiding the player intuitively.
  • As soon as Tommy moves, the world around him begins to render — pedestrians, cars, sounds — all based on what’s likely to happen next.

Note: Behind the scenes, the game isn't just rendering a fixed world. It's making smart decisions based on probabilities:

  • “If the player goes left, spawn this vehicle.”
  • “If the player walks forward, load the next street and nearby NPCs.”
  • “If the player enters combat, trigger police response within X seconds.”

In a way, the entire game is running like a lightweight AI system — constantly observing, predicting, and updating the world around Tommy based on the most likely actions. Just like AI models do in real life: they don’t “know” the future, but they calculate the most probable one and act accordingly.
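To make this concrete, here's a toy Python sketch of the same idea: given a handful of possible next actions and their estimated probabilities (the action names and numbers below are invented for illustration), the "predictor" simply picks the most likely one.

# A toy "next move predictor": the actions and probabilities are made up.
next_move_probs = {
    "take_scooter": 0.70,    # nearest vehicle, low risk
    "walk_to_hotel": 0.25,   # the map marker nudges the player this way
    "start_a_fight": 0.05,   # unlikely with no money or weapons
}

def predict_next_move(probs):
    """Return the action with the highest probability."""
    return max(probs, key=probs.get)

print(predict_next_move(next_move_probs))  # -> take_scooter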

So whether you’re building an AI model or playing Vice City, it’s all about understanding context, predicting outcomes, and making the smartest next move.

“So now we understand the true essence of AI — it’s fundamentally about predicting the next best move.”

Now, let’s take a deeper dive into how Generative AI actually works. In this article, we’ll explore some of the key concepts that power these intelligent systems — terms like tokenization, vector embeddings, positional encoding, inference, and training.

These might sound technical at first, but don’t worry — we’ll break them down in a simple, intuitive way so you can truly understand what’s happening behind the scenes when AI generates text, answers questions, or even mimics human conversation.

UNDERSTANDING THE BUILDING BLOCKS OF GENAI

Before we get into the building blocks of GenAI, let's cover a couple of terms that will help you follow the rest of this blog: sequence and tokenization. Let's dive into each and understand the difference between the two.

  1. Sequence — In the world of NLP (Natural Language Processing), a sequence is an ordered list of tokens. Think of a sequence as the exact order in which words (or pieces of words) appear — like beads on a string or the steps in a recipe.

Characteristics of a Sequence:

  1. Order Matters: This is the defining feature. A,B,C is a different sequence from C,B,A.
  2. Made of “Items”: These items can be anything: numbers, letters, words, sounds, images, or, in the context of LLMs, tokens.
  3. Variable Length: A sequence can be short (e.g., three items) or very long (e.g., thousands of items).

In the world of LLMs like GPT or Gemini, the “items” in a sequence are called tokens.

Input Sequence (Your Prompt): When you type a prompt into an LLM, you are providing it with an input sequence of tokens.

For ex — “I Love Playing Open World Games.”

2. Tokenization — First, the LLM's tokenizer breaks this sentence down into its individual components. For simplicity, let's assume each word and punctuation mark is a token: ["I", "Love", "Playing", "Open", "World", "Games", "."] These are the smallest meaningful units of text the model understands.

Note: Every LLM has its own unique way of tokenizing the input sequence.
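For instance, here is a minimal sketch using the tiktoken library, the tokenizer behind OpenAI's GPT models. The printed output is only illustrative; different models and encodings will split the same text differently.

# pip install tiktoken
import tiktoken

# "cl100k_base" is the encoding used by the GPT-3.5 / GPT-4 family.
enc = tiktoken.get_encoding("cl100k_base")

text = "I Love Playing Open World Games."
token_ids = enc.encode(text)                        # text -> integer token IDs
tokens = [enc.decode([tid]) for tid in token_ids]   # each ID back to its text piece

print(token_ids)   # a list of integers; exact values depend on the encoding
print(tokens)      # something like ['I', ' Love', ' Playing', ' Open', ' World', ' Games', '.']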

3. Vector Embedding — Once the text is tokenized into smaller pieces (tokens), the AI still can’t understand words as humans do. It needs to represent each token in a form it can process mathematically. That’s where vector embeddings come in.

A vector embedding is a way to convert each token into a list of numbers — called a vector — that captures the meaning and relationships of that token in a mathematical space. Imagine that every word or concept in a language has a unique “meaning fingerprint.” This fingerprint isn't made of ink; it's made of numbers.
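As a rough sketch of the idea, here are some toy embeddings in Python. The three-dimensional vectors below are invented for illustration; real models learn these values during training, and the vectors typically have hundreds or thousands of dimensions.

import numpy as np

# Toy 3-dimensional "meaning fingerprints" (real ones are learned and much larger).
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "game":  np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    """Similarity of direction: close to 1.0 means similar meaning."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high, ~0.99
print(cosine_similarity(embeddings["king"], embeddings["game"]))   # much lower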

Now let's take an example to make this crystal clear and understand the essence of vector embeddings. Suppose I give the model input in a non-sequential manner, for example: “fantastic gta 5 play i game”

If a human reads that, they immediately try to reorder it into something like: “I play fantastic GTA 5 game.” or “I play a fantastic GTA 5 game.”

It Doesn't “Reorder” or “Correct” Before Processing:

The most important thing to understand is that the LLM doesn’t first “fix” or “reorder” your scrambled sentence into a grammatically correct one before it starts processing. It takes your input exactly as it is, in its original order.

Tokenization and Embedding (As Always):

  • First, your input “fantastic gta 5 play i game” is broken down into tokens: ["fantastic", "gta", "5", "play", "i", "game"].
  • Each of these tokens is then converted into its initial vector embedding, and positional encoding is added to each one, indicating its position in your provided (scrambled) sequence.

(embedding_fantastic + pos_1) 
(embedding_gta + pos_2) 
(embedding_5 + pos_3) 
(embedding_play + pos_4) 
(embedding_i + pos_5) 
(embedding_game + pos_6)        

4. Positional Encoding — When an AI model processes language, it looks at sequences of tokens. But here's the catch: once tokens are converted into vectors (embeddings), the model loses the information about their original order — because the vectors themselves don't carry any notion of position.

This is where positional encoding comes in.

Positional encoding is a way to add information about the position of each token within the sequence, so the model knows where each word is in the sentence.

Think of it like adding a GPS coordinate to each word.
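As one concrete flavor, here is a minimal NumPy sketch of the sinusoidal positional encoding from the original Transformer paper. Many modern LLMs use learned or rotary position embeddings instead, so treat this as an illustration of the idea rather than what any specific model does.

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) matrix of position 'stamps'."""
    positions = np.arange(seq_len)[:, np.newaxis]    # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]         # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])      # even dimensions -> sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])      # odd dimensions  -> cosine
    return encoding

# Each token's final input = its embedding + the stamp for its position.
pos = sinusoidal_positional_encoding(seq_len=6, d_model=8)
print(pos.shape)   # (6, 8): one 8-number stamp per position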

Okay, let’s connect the dots from positional encoding to inference and training with backpropagation, keeping it simple.

Recap: Positional Encoding’s Role

To recap: after you break your text into tokens and give each token a numerical “meaning fingerprint” (vector embedding), you then add a “position stamp” (positional encoding) to each of those fingerprints.

So, for your input “The quick brown fox”, the LLM now has something like this:

  • [fingerprint_The + stamp_1]
  • [fingerprint_quick + stamp_2]
  • [fingerprint_brown + stamp_3]
  • [fingerprint_fox + stamp_4]

This entire collection of stamped fingerprints for your input sequence is what the Transformer (specifically the Decoder in GPT’s case) then processes. It’s essentially a “matrix” or a “table” of these numbers.

Two Modes of Operation for the LLM:

An LLM operates in two main phases:

  1. Training (Learning Phase): This is where the LLM becomes smart. It’s like a student in school.
  2. Inference (Using Phase): This is where the LLM applies what it learned. It’s like a student taking a test or using their knowledge in the real world.

Training: The LLM Learns to Predict the Next Word (with Backpropagation)

Imagine the LLM as a student trying to learn a language. It has a massive textbook (the “pre-trained data” — all the internet text).

The Goal: The student (LLM) wants to learn to predict the next word in any given sentence.

The Process (Simplified):

Showing Examples: The “teacher” (training algorithm) shows the LLM a sentence from the textbook, but hides the very next word.

  • Input Example: “The quick brown fox [???]”
  • Correct Answer (from textbook): “jumps”

LLM Makes a Guess: The LLM processes “The quick brown fox” (including all the token embeddings and positional encodings). Based on its current, untrained knowledge (its randomly initialized internal “weights” or “brain connections”), it makes a guess about the next word.

  • LLM’s Guess: “apple” (It’s just starting, so its guess might be wildly wrong!)

Comparing Guess vs. Correct Answer (Calculating “Error”): The teacher compares the LLM’s guess (“apple”) with the correct answer (“jumps”).

  • Difference (Error): “apple” is very different from “jumps.” This difference is the “error” or “loss.”

“Backpropagation” (Figuring Out What Went Wrong): This is the core of learning.

  • Think of it like a coach analyzing a basketball shot: If the shot misses, the coach doesn’t just say “missed!” The coach analyzes how the shot went wrong: “Your elbow was out,” “You didn’t follow through,” “Your stance was off.”
  • Backpropagation is this analysis: It's a clever algorithm that traces the “error” backward through all the layers of the LLM's neural network. It figures out which internal “brain connections” (weights) were most responsible for the wrong guess.
  • It tells each connection: “You contributed to this error by this much, and you need to adjust yourself by this much to make a better guess next time.”

Note: This technique of backpropagation is what makes the difference between LLM models.

Based on the back-propagation’s “advice,” the LLM’s internal “brain connections” (the vast number of numerical weights) are slightly adjusted. The goal is to make the LLM’s next guess a tiny bit closer to the correct answer.

  • Analogy: The basketball player adjusts their elbow and follow-through based on the coach’s feedback.
  • This whole process (guess, compare, backpropagate, adjust) is repeated billions of times with countless sentences from the pre-training data. Over time, the LLM’s internal connections get finely tuned, and it becomes incredibly good at predicting the next word correctly. This is how it learns grammar, facts, common sense, and context.
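To make the loop tangible, here is a heavily simplified, hypothetical next-token training step in PyTorch. The tiny model and the fake token IDs are invented for illustration; real LLMs stack Transformer layers and train on billions of examples, but the guess, compare, backpropagate, adjust cycle is the same.

import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB_SIZE, EMBED_DIM = 100, 32

# A deliberately tiny "language model": embeddings -> linear layer over the vocabulary.
model = nn.Sequential(
    nn.Embedding(VOCAB_SIZE, EMBED_DIM),
    nn.Flatten(),                        # concatenate the context token embeddings
    nn.Linear(4 * EMBED_DIM, VOCAB_SIZE),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Pretend token IDs for "The quick brown fox" and the correct next token "jumps".
context = torch.tensor([[1, 2, 3, 4]])   # shape (batch=1, seq_len=4)
target = torch.tensor([5])               # the "correct answer"

for step in range(100):
    logits = model(context)              # forward pass: the model's "guess"
    loss = loss_fn(logits, target)       # compare guess vs. correct answer (the "error")
    optimizer.zero_grad()
    loss.backward()                      # backpropagation: trace the error backward
    optimizer.step()                     # nudge the weights to do better next time

print(loss.item())                       # the error shrinks as the cycle repeats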

Inference: The LLM Uses Its Learned Knowledge

Once the training (learning) is complete, the LLM is “frozen.” It’s no longer learning or adjusting its internal connections. Now, it’s ready to put its knowledge to use. This is when you interact with it (e.g., asking ChatGPT a question).

The Goal: Predict the most likely next word based on the prompt you give it.

The Process (Simplified):

Your Input: You give the LLM a prompt (e.g., “Tell me a short story about a brave knight and a dragon.”).

  • Input Sequence: ["Tell", "me", "a", "short", "story", ...]

Tokenization, Embedding, Positional Encoding: The LLM processes your input, converting it into that sequence of “stamped fingerprints.”

Forward Pass Through the Transformer: This is where the learned knowledge comes into play. The LLM’s internal “brain connections” (the weights that were adjusted during training) are now fixed.

  • They efficiently analyze the relationships between all the tokens in your input sequence, creating that rich, contextualized understanding.
  • This processing moves forward through the layers of the network. There’s no “error” calculation or adjustment happening here, because it’s not learning.

Prediction of the Next Word’s Probability: The final layer of the LLM outputs a probability distribution over its entire vocabulary for the next word.

  • “The most probable next word is ‘Once’ (80% chance), then ‘A’ (10% chance), then ‘In’ (5% chance), etc.”

Word Selection: The LLM picks a word based on these probabilities (often the most probable, but sometimes with a bit of randomness for creativity).

Loop and Generate: The selected word is added to the sequence, and the entire new, longer sequence is fed back into the LLM for the next prediction. This repeats, word by word, until the response is complete.
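Putting these inference steps together, here is a minimal sketch using the Hugging Face transformers library, with the small GPT-2 model standing in for a large LLM. It shows the shape of the loop: forward pass, pick the most probable token, append it, repeat.

# pip install transformers torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()  # inference mode: the weights are frozen, no learning happens

prompt = "Tell me a short story about a brave knight and a dragon."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():                                 # no backward pass during inference
    for _ in range(30):                               # generate up to 30 new tokens
        logits = model(input_ids).logits              # forward pass only
        next_token_probs = torch.softmax(logits[0, -1], dim=-1)
        next_token = torch.argmax(next_token_probs)   # greedy: pick the most probable
        input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0]))

Swapping torch.argmax for sampling from the probabilities (for example with torch.multinomial on temperature-scaled logits) is what adds the “bit of randomness for creativity” mentioned above.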

The Key Difference:

  • Training: Involves both a forward pass AND a backward pass (backpropagation) to adjust the model’s internal parameters based on errors. It’s about learning.
  • Inference: Involves only a forward pass through the model’s fixed, learned parameters to make predictions. It’s about applying what has been learned.

After learning all this, I’m curious — what really happens when we ask the model something like, “Write a for loop in Java that runs 100 times”? Does it go straight to the inference phase, since the model is already pre-trained and has that knowledge built in?

So the answer is yes. It goes directly into the inference phase.

Think of the LLM’s pre-training as it having “read” and “understood” an enormous library of code, including countless examples of Java for loops. It has seen:

  • for (int i = 0; i < 10; i++) { ... }
  • for (int j = 1; j <= 50; j++) { ... }
  • And millions of variations, including loops up to 100, 1000, etc.

During its training, it learned the patterns, syntax, and common structures of Java for loops. It learned that:

  • for is followed by parentheses.
  • Inside, there’s usually an initialization, a condition, and an increment/decrement.
  • Curly braces {} define the loop body.

So, when you give it the prompt “write a for loop in Java up to 100 times”:

  1. Your Prompt is the Input Sequence: The LLM receives and processes your prompt as an input sequence of tokens.
  2. It Activates Learned Patterns: Its internal “brain connections” (the weights adjusted during training) recognize the patterns associated with “Java,” “for loop,” and “100 times.”
  3. It Predicts the Next Token: Based on these activated patterns and its vast learned knowledge, it starts predicting the most probable next token to generate the correct Java code. It’s not trying to “figure out” what a for loop is; it already “knows” (in a statistical sense) how to generate one.

It’s like asking a programmer who already knows Java to write a for loop. They don’t need to go back to school and “learn” how to write a for loop again (that’s the training phase). They simply apply their existing knowledge to generate the code (that’s the inference phase).

The “calculation” isn’t a mathematical computation to derive the loop from first principles, but rather the model’s ability to statistically “recall” and generate the most probable and correct sequence of tokens that represents a Java for loop, based on the patterns it absorbed during its massive pre-training.

💡 How GPUs Power GenAI — and Reshape the Global Economy

Generative AI (GenAI) might feel magical, but behind the scenes, it runs on something very real: graphics processing units (GPUs). Originally designed for gaming, GPUs have become the backbone of modern AI — especially during two key phases: training and inference.

Both processes need powerful hardware, but training is where GPUs shine. A single large language model (LLM) can take thousands of GPUs working for weeks to complete training. Inference is faster but still GPU-dependent — especially for real-time applications like chatbots, coding assistants, or image generation tools.
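As a small PyTorch illustration of why this hardware matters, the snippet below checks for a GPU and runs the kind of large matrix multiplication that dominates both training and inference; on a GPU, thousands of cores compute it in parallel.

import torch

# Use a GPU if one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on: {device}")

# Matrix multiplication is the core workload of both training and inference.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b
print(c.shape)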

💰 The Economic Ripple: Why GPUs Are Gold

The GenAI boom has sparked a global race for computing power. Every company building or using AI needs access to high-performance GPUs. Enter NVIDIA, the leader in GPU technology.

In 2023–2024, as GenAI adoption exploded, NVIDIA’s market share soared, its stock surged, and it briefly joined the $2 trillion club. Why? Because nearly every AI company — from OpenAI to startups — was buying its chips, especially the H100s, like tech gold.

Thanks to Hitesh Choudhary and Piyush Garg for the beautiful session.
