You ask an advanced model to count the B’s in Blueberry. It hesitates, it bluffs, or it’s just wrong. How can something that drafts legalese and writes Python fail a primary-school puzzle?
Because large language models (LLMs) don’t “run rules.” They predict text.
They’re astonishing at shaping language; they’re not built to count letters—unless they’ve learned a reliable pattern for doing so in context. That simple difference unlocks most of the mystery (and the hype) around LLMs.
The short version
- Traditional programs: You write explicit step-by-step rules. Same input → same output.
- LLMs: We train a neural network on massive text to estimate “What token likely comes next?” Outputs are probabilistic, guided by statistics, not by a hardcoded algorithm.
This shift—sometimes called “Software 2.0”—moves power from rules to data + optimisation. When we scale data, compute, and model size (hello, Transformers), we get capabilities that look like reasoning and understanding, but are really pattern mastery at an industrial scale.
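To make the contrast concrete, here is a toy sketch in Python. The probability numbers are invented purely for illustration (they come from no real model): the rule-based function gives the same answer every time, while the "LLM-style" function samples from a distribution and can occasionally be wrong.

```python
import random

# Traditional program: an explicit rule. Same input, same output, every time.
def count_b(word: str) -> int:
    return word.lower().count("b")

# LLM-style answer (toy): sample the next token from a learned distribution.
# These probabilities are made up purely for illustration.
next_token_probs = {"2": 0.6, "3": 0.3, "1": 0.1}

def predict_answer() -> str:
    tokens, weights = zip(*next_token_probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print(count_b("Blueberry"))  # always 2
print(predict_answer())      # usually "2", sometimes not
```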
How an LLM actually works (in plain English)
- Tokenisation: Text gets chopped into tiny pieces called tokens (often sub-words).
- Embeddings: Each token becomes a vector—a point in a high-dimensional space where “nearby” often means “related.”
- Attention: The model learns what to focus on in the context (“pay more attention to this clause, ignore that aside”).
- Training objective: Minimise the surprise of the next token across billions of examples (maximum likelihood).
- Decoding: At run time, we sample the next token. Controls like temperature, top-k, and top-p decide how adventurous or conservative the model is. (A toy sampler is sketched just below.)
Think of it as a statistical autocomplete on steroids, running through a map of vector spaces where meaning is approximated by geometry.
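Here is that toy sampler: it shows what temperature, top-k, and top-p actually do to a next-token distribution before we pick a token. The logits are invented for illustration; a real model scores tens of thousands of candidate tokens, but the mechanics are the same.

```python
import math
import random

# Invented logits for a few candidate next tokens (not from a real model).
logits = {" blue": 2.1, " straw": 1.4, " rasp": 0.9, " goose": 0.2, " cran": -0.5}

def sample_next(logits, temperature=1.0, top_k=None, top_p=None):
    # Temperature rescales logits: below 1.0 sharpens the distribution, above 1.0 flattens it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}

    # Softmax turns scores into probabilities.
    max_l = max(scaled.values())
    exps = {tok: math.exp(v - max_l) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: v / total for tok, v in exps.items()}

    # top-k: keep only the k most likely tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]

    # top-p (nucleus): keep the smallest set of tokens whose probability mass reaches p.
    if top_p is not None:
        kept, running = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            running += p
            if running >= top_p:
                break
        ranked = kept

    # Renormalise over the survivors and sample one token.
    toks, weights = zip(*ranked)
    return random.choices(toks, weights=weights, k=1)[0]

print(sample_next(logits, temperature=0.3))             # conservative: almost always " blue"
print(sample_next(logits, temperature=1.5, top_p=0.9))  # adventurous: far more variety
```

Lower the temperature and the top choice dominates; raise it (or widen top-p) and the tail tokens get real probability mass.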
[Figure: Tokenisation, how LLMs chop text into subword pieces]
So…why the “Blueberry” face-plant?
Letter-counting is a symbolic problem with an exact algorithm; LLMs are statistical engines. They can simulate counting with patterns they’ve seen (“count characters one by one”), but they don’t execute a guaranteed algorithm internally. Unless the prompt or toolchain forces a step-by-step, verifiable method (e.g., having the model write and run code), errors happen, especially with quirky casing, repeated letters, or tokenisation edge cases.
This is the same reason they sometimes fumble arithmetic, dates, or long chains of logic: the core objective is plausible continuation, not correctness.
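A quick illustration of both halves of this, assuming the tiktoken library is installed (the exact token split varies by tokenizer and model): the model sees subword chunks rather than letters, while one line of ordinary code counts the letters with a guarantee.

```python
import tiktoken  # assumes the tiktoken library is installed; splits vary by tokenizer

enc = tiktoken.get_encoding("cl100k_base")
word = "Blueberry"

# The model never sees individual letters; it sees subword tokens like these.
token_ids = enc.encode(word)
print([enc.decode([t]) for t in token_ids])  # e.g. ['Blue', 'berry'] (split may differ)

# The symbolic fix: a one-line, guaranteed-correct algorithm.
print(word.lower().count("b"))  # 2
```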
Six myths I hear every week (and the reality)
- “LLMs know facts and think like humans.”
- They don’t know; they’ve learned patterns of how facts are written. Fluent ≠ true.
- “They can count perfectly.”
- Not reliably. They weren’t designed as symbolic calculators. They can imitate counting; that’s different from doing it with guarantees.
- “Their goal is to be correct.”
- The base goal is to predict the next token. We align them (instruction-tuning, RLHF) to be helpful and honest, but the underlying objective doesn’t change.
- “They’re bad at math because they’re not smart.”
- They’re just the wrong tool. Give them a calculator or a Python runtime and watch accuracy jump. Tools add symbolic rigour to a statistical brain.
- “They understand the world like we do.”
- No senses, no experience, no consciousness—just text associations. That’s powerful, but different from human understanding.
- “They keep learning after deployment.”
- Not by default. Weights are frozen until retrained/fine-tuned. Some products add memory or retrieval, but that’s system design, not magical self-learning.
Hallucinations, bias, and other “gotchas” (the candid section)
- Hallucinations: Confident, detailed wrongness. When the context is thin or the question nudges beyond training data, the model still must output something, so it “fills in” with statistically plausible but ungrounded text.
- Mitigate with: retrieval (ground answers in sources), asking for citations, letting the model say “I don’t know,” or routing to tools. A minimal retrieval sketch follows this list.
- Bias: Models inherit patterns—including undesirable ones—from data.
- Mitigate with: careful data curation, post-training filters, evaluation on fairness/toxicity benchmarks, and human oversight for high-stakes uses.
- Temperature & randomness:
- Higher temperature → more creative, more error-prone. Lower temperature → steadier, sometimes bland. Tune it to the job.
- Knowledge cutoffs:
- A model’s internal knowledge is frozen at its last training date. For “What happened today?”, you need search or an external data connector.
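Here is a minimal sketch of the retrieval idea mentioned above. The documents and the keyword-overlap scoring are stand-ins for illustration; production systems use embedding similarity and a vector index, but the grounding principle is identical: fetch relevant sources first, then constrain the model to them.

```python
# Toy retrieval-augmented prompt: ground the answer in your own sources.
documents = {
    "pricing.md": "The Pro plan costs $49 per seat per month, billed annually.",
    "security.md": "All customer data is encrypted at rest with AES-256.",
    "roadmap.md": "Dark mode is planned for the Q3 release.",
}

def retrieve(question: str, docs: dict, k: int = 1) -> list[str]:
    # Naive relevance score: how many question words appear in each document.
    q_words = set(question.lower().split())
    scored = sorted(
        docs.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

question = "How much does the Pro plan cost?"
context = "\n".join(retrieve(question, documents))

prompt = (
    "Answer using ONLY the sources below. If the answer is not there, say 'I don't know.'\n"
    f"Sources:\n{context}\n\nQuestion: {question}"
)
print(prompt)
```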
How to get reliably better answers
- Be specific. Give constraints, formats, and success criteria (“Return a table with columns X/Y/Z”). Several of these tips come together in the sketch after this list.
- Make it think in steps. Ask for reasoning or an outline before the final answer.
- Ask for sources. Or better, attach them and say, “Only cite from these.”
- Use tools. Let the model call search, code, calculators, or company systems.
- Control randomness. Lower the temperature for accuracy; raise it for brainstorming.
- Validate. For facts, require citations. For numbers, have it compute (or compute externally). For long tasks, break them into verifiable chunks.
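Here is a sketch that combines several of these tips in one request, assuming the openai Python SDK with an API key in the environment; the model name is a placeholder and the prompt wording is just one way to phrase the constraints.

```python
from openai import OpenAI  # assumes the openai Python SDK and an API key in the environment

client = OpenAI()

# Be specific (format + success criteria), ask for grounded citations, keep temperature low.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name; use whatever your account offers
    temperature=0.2,      # low temperature: accuracy over creativity
    messages=[
        {
            "role": "system",
            "content": (
                "Answer only from the provided sources. Cite the source name for every claim. "
                "If the sources do not cover the question, say 'I don't know.'"
            ),
        },
        {
            "role": "user",
            "content": (
                "Sources:\n[pasted excerpts here]\n\n"
                "Question: Summarise our refund policy as a table with columns "
                "Scenario / Eligible? / Time limit."
            ),
        },
    ],
)
print(response.choices[0].message.content)
```

The same request at a higher temperature is a better fit for brainstorming, where variety matters more than precision.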
Where LLMs do shine (and where to pair them with tools)
- Language-shaped work: drafting, summarising, translating, rewriting, brainstorming.
- Knowledge navigation: Q&A over docs—with retrieval for grounding.
- Coding assistance: generate scaffolds, tests, and refactors; execute to verify.
- Customer & market insight: cluster themes from interviews, tickets, or reviews; then a human validates takeaways.
- Ops copilots: turn plain-English intent into workflows—provided a system or a person-in-the-loop checks each step.
When the task demands truth, math, or compliance, couple the model with retrieval, rules, or execution environments. That’s the winning pattern: LLM + Tools + Guardrails.
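As a final sketch of that pattern, here is a hypothetical guardrail: the model is asked to return JSON, and the surrounding code, not the model, checks the structure and recomputes the number. The schema and the reply are invented for illustration.

```python
import json

# Guardrail sketch: never trust free-form model output for numbers or structure.
# 'model_reply' stands in for whatever the LLM returned; the schema is hypothetical.
model_reply = '{"line_items": [120.50, 89.99, 15.00], "claimed_total": 225.49}'

def validate(reply: str) -> dict:
    data = json.loads(reply)  # guardrail 1: must be well-formed JSON

    # Guardrail 2: schema check for required fields.
    missing = [k for k in ("line_items", "claimed_total") if k not in data]
    if missing:
        raise ValueError(f"Missing fields: {missing}")

    # Guardrail 3: recompute the number symbolically instead of trusting the model.
    recomputed = round(sum(data["line_items"]), 2)
    if recomputed != data["claimed_total"]:
        raise ValueError(
            f"Total mismatch: model said {data['claimed_total']}, math says {recomputed}"
        )

    return data

print(validate(model_reply))
```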
A quick mental model to take with you
LLM = probabilistic text engine that becomes dependable when you add structure (clear prompts), add grounding (your data), and add execution (tools & checks).
If you remember that, you won’t be surprised when it flubs the B’s in Blueberry—and you’ll know exactly how to make it perform when it really matters.