Context Engineering: The Real Challenge Behind Building AI Agents
Remember when we thought building AI applications was just about writing clever prompts? Those days feel quaint now. As enterprise AI deployments scale and agents tackle increasingly complex tasks, a new discipline has emerged from the trenches: context engineering. It's not just about what you tell an AI anymore—it's about orchestrating an entire symphony of information flowing through limited computational windows.
The RAM of AI Systems
Andrej Karpathy, former Tesla AI director and OpenAI researcher, frames it perfectly: LLMs are like a new kind of operating system where the model is the CPU and its context window is the RAM. Just as your computer's performance depends on what's loaded into memory, an AI agent's effectiveness hinges on what information fills its context window at any given moment.
This isn't merely academic. Shopify CEO Tobi Lütke calls context engineering a "core skill," while teams at Anthropic report that agents routinely engage in conversations spanning hundreds of turns. Managing this information flow has become, as Cognition puts it, "effectively the #1 job of engineers building AI agents."
Beyond the ChatGPT Wrapper
The term "ChatGPT wrapper" has become tired and, frankly, wrong. Modern AI applications orchestrate complex workflows that would make traditional software architects dizzy. Consider what happens when you ask an AI agent to deploy your backend:
First, it needs the task description and deployment instructions. Then it pulls in repository information through retrieval-augmented generation (RAG). It examines past deployment logs, checks current system state, reviews error histories, and maintains conversation context, all while staying within token limits and avoiding performance degradation.
Too little context and the agent lacks critical information. Too much and you face ballooning costs, increased latency, and, paradoxically, worse performance. Drew Breunig identified several failure modes: context poisoning (when hallucinations contaminate the workspace and get referenced again and again), context distraction (when an overlong context overwhelms what the model learned in training), context confusion (when superfluous information sways the response), and context clash (when different pieces of information contradict each other).
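To make the trade-off concrete, here is a minimal sketch of greedy context assembly under a token budget. Everything in it is illustrative: the source labels, the priority scheme, and the rough four-characters-per-token estimate are assumptions, not any particular product's implementation.

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    label: str     # e.g. "task", "repo_docs", "deploy_logs"
    text: str
    priority: int  # lower = more important

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 characters per token

def assemble_context(items: list[ContextItem], budget: int) -> str:
    """Greedily pack the highest-priority items until the budget is spent."""
    selected, used = [], 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost > budget:
            continue  # skip rather than truncate mid-item
        selected.append(f"## {item.label}\n{item.text}")
        used += cost
    return "\n\n".join(selected)

prompt = assemble_context(
    [
        ContextItem("task", "Deploy the backend to staging.", priority=0),
        ContextItem("repo_docs", "The service builds from Dockerfile.api ...", priority=1),
        ContextItem("deploy_logs", "2024-05-01: rollout failed at step 3 ...", priority=2),
    ],
    budget=8_000,
)
print(prompt)
```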
The Four Pillars of Context Engineering
Examining production systems from Anthropic, HuggingFace, and a range of code assistants reveals four key patterns:
1. Write Context: External Memory Systems
Just as humans use notebooks, modern agents employ "scratchpads": external storage that persists information outside the context window. Anthropic's multi-agent researcher system explicitly saves its plan to memory, knowing that exceeding 200,000 tokens will trigger truncation.
These aren't just temporary notes. Products like ChatGPT, Cursor, and Windsurf have implemented long-term memory systems that synthesize and store insights across sessions. The challenge? Ensuring these memories enhance rather than hijack future interactions. Simon Willison shared a telling example where ChatGPT unexpectedly injected his stored location into an image generation request—a reminder that memory retrieval can surprise even experienced users.
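A minimal sketch of the write pattern, assuming a plain JSON file as the external store (production systems use databases, vector stores, or dedicated memory services):

```python
# A sketch of the "write context" pattern: the agent persists notes to an
# external scratchpad instead of holding them in the context window.
import json
from pathlib import Path

SCRATCHPAD = Path("scratchpad.json")  # assumed storage location

def save_note(key: str, value: str) -> None:
    """Persist a note outside the model's context window."""
    notes = json.loads(SCRATCHPAD.read_text()) if SCRATCHPAD.exists() else {}
    notes[key] = value
    SCRATCHPAD.write_text(json.dumps(notes, indent=2))

def load_note(key: str) -> str | None:
    """Pull a note back into context only when it's needed."""
    if not SCRATCHPAD.exists():
        return None
    return json.loads(SCRATCHPAD.read_text()).get(key)

# Write the plan early, before long tool outputs crowd it out of the window.
save_note("plan", "1. Run tests  2. Build image  3. Deploy to staging")
print(load_note("plan"))
```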
2. Select Context: Strategic Information Retrieval
Having external memory means nothing without intelligent retrieval. Code agents like Claude Code use dedicated files (CLAUDE.md) for persistent instructions, while Cursor and Windsurf employ rules files. But scaling beyond simple file systems requires sophistication.
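A minimal sketch of that rules-file pattern; the file name follows Claude Code's CLAUDE.md convention, but the loading logic is an assumption, not the tool's actual implementation:

```python
from pathlib import Path

def build_system_prompt(base: str, rules_file: str = "CLAUDE.md") -> str:
    """Prepend project-specific persistent instructions to the system prompt."""
    path = Path(rules_file)
    rules = path.read_text() if path.exists() else ""
    return f"{rules}\n\n{base}".strip()

# Every agent run starts from the same durable, project-level instructions.
print(build_system_prompt("You are a careful coding agent."))
```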
Modern systems use embedding-based search and knowledge graphs to select relevant memories. Some cutting-edge implementations apply RAG to tool descriptions themselves, improving tool selection accuracy by 3x. Windsurf's approach is particularly enlightening: they combine AST parsing, semantic chunking, grep/file search, knowledge graphs, and re-ranking algorithms—because simple embedding search breaks down as codebases grow.
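As a toy illustration of selection, the sketch below ranks stored memories against the current task. Word-count cosine similarity stands in for a real embedding model, and the memories are invented:

```python
from collections import Counter
import math

def similarity(a: str, b: str) -> float:
    """Cosine similarity over word counts: a toy proxy for embeddings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0

def select_memories(query: str, memories: list[str], k: int = 2) -> list[str]:
    """Return only the k memories most relevant to the current task."""
    return sorted(memories, key=lambda m: similarity(query, m), reverse=True)[:k]

memories = [
    "Staging deploys require the VPN to be active.",
    "User prefers TypeScript over JavaScript.",
    "Last deploy failed because the Docker image was missing a build arg.",
]
print(select_memories("deploy the backend to staging", memories))
```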
3. Compress Context: Information Distillation
When Claude Code hits 95% context capacity, it automatically summarizes the entire user-agent interaction history. This "auto-compact" feature represents just one compression strategy among many. Recursive summarization, hierarchical compression, and strategic trimming all serve to retain essential information while shedding tokens.
Cognition takes this further with fine-tuned models specifically for summarization at agent boundaries. The challenge is preserving critical details while achieving meaningful compression—a balance that often requires domain-specific solutions.
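In schematic form, threshold-triggered compaction looks something like the sketch below. The 95% trigger mirrors the behavior described above, but `summarize` is a stub where a real system would call a model (possibly one fine-tuned for the job, as Cognition does), and the token count is a rough heuristic:

```python
MAX_TOKENS = 200_000   # assumed window size
COMPACT_AT = 0.95      # compact when the window is 95% full

def count_tokens(messages: list[str]) -> int:
    return sum(len(m) // 4 for m in messages)  # ~4 chars per token, a rough guess

def summarize(messages: list[str]) -> str:
    # Stub: a real system would prompt a model to distill the history.
    return f"[summary of {len(messages)} earlier messages]"

def maybe_compact(history: list[str]) -> list[str]:
    """Replace older turns with a summary once the window is nearly full."""
    if len(history) <= 5 or count_tokens(history) < MAX_TOKENS * COMPACT_AT:
        return history
    head, tail = history[:-5], history[-5:]  # keep the 5 most recent turns verbatim
    return [summarize(head)] + tail
```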
4. Isolate Context: Divide and Conquer
Perhaps the most architecturally interesting pattern is context isolation through multi-agent systems. OpenAI's Swarm library explicitly promotes "separation of concerns," where specialized agents handle subtasks with their own context windows and tool sets.
Anthropic's research shows this isn't just organizational elegance—multi-agent systems with isolated contexts outperformed monolithic agents on complex tasks. Each subagent can dedicate its entire context window to a focused problem, operating in parallel without context interference.
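A minimal sketch of that isolation, with a stubbed `call_llm` standing in for a real model API. The key point is that each subagent builds its message list from scratch, so its window holds only its own subtask:

```python
def call_llm(messages: list[dict]) -> str:
    return f"<answer based on {len(messages)} messages>"  # stub for a model call

def run_subagent(task: str, system_prompt: str) -> str:
    """Each subagent starts with a fresh, task-specific context."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task},
    ]
    return call_llm(messages)

def orchestrate(goal: str) -> str:
    # Only the subagents' final answers re-enter the lead agent's context,
    # not their intermediate tool calls or retrieved documents.
    research = run_subagent(f"Research: {goal}", "You are a research agent.")
    draft = run_subagent(f"Implement: {goal}", "You are a coding agent.")
    return call_llm([{"role": "user", "content": research + "\n" + draft}])

print(orchestrate("add retry logic to the deploy script"))
```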
HuggingFace's CodeAgent takes a different isolation approach: instead of returning JSON from tool calls, it generates and executes code in sandboxed environments. This keeps token-heavy objects (images, audio files, large datasets) out of the LLM's context while maintaining programmatic access.
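In miniature, the trick looks like the sketch below: the tool keeps its heavy payload in a sandbox-side object store (hypothetical here) and hands the model only a short handle. Nothing in it is HuggingFace's actual API:

```python
# Sandbox-side store: large objects live here, never in the model's context.
object_store: dict[str, bytes] = {}

def fetch_dataset(url: str) -> str:
    """Download a large artifact but return only a lightweight handle."""
    data = b"\x00" * 50_000_000  # pretend this is a 50 MB download
    handle = f"obj_{len(object_store)}"
    object_store[handle] = data
    # The LLM sees this one-line string instead of 50 MB of bytes.
    return f"{handle} (50.0 MB, fetched from {url})"

print(fetch_dataset("https://example.com/logs.tar.gz"))
```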
The Hidden Infrastructure Layer
What makes context engineering particularly challenging is its invisible complexity. Users see a helpful AI assistant; engineers see a delicate orchestration of information flows, state management, and computational constraints. As Karpathy notes, context engineering is just one piece of an emerging "thick layer of non-trivial software" that includes:
Control flow decomposition
Dynamic capability routing
Generation-verification loops
Parallelism and prefetching
Security and guardrails
Evaluation frameworks
This infrastructure operates like a modern operating system's memory manager—constantly shuffling, prioritizing, and optimizing what information receives the AI's attention.
Where do we go from here?
Context engineering represents a fundamental shift in how we think about AI applications. We're moving from prompt crafting to system architecture, from single queries to persistent workflows, from stateless functions to stateful agents with rich memory systems.
The challenges are real: token economics force hard choices, retrieval systems can surface unwanted information, and compression risks losing critical details. Yet the potential is transformative. As models gain larger context windows and better reasoning capabilities, context engineering will determine whether we build AI assistants or genuine AI colleagues.
The next time someone dismisses an AI application as "just a ChatGPT wrapper," remember the hidden complexity beneath. Like the memory management in your operating system, context engineering does its job best when you never have to think about it—but getting to that point requires some of the most sophisticated engineering in modern software development.
We're not just writing prompts anymore. We're building the cognitive infrastructure for a new class of digital workers. And that changes everything.