From Knowledge to Action

GPT-5 launched yesterday. 94.6% on AIME 2025. 74.9% on SWE-bench.

As models approach the ceiling of these benchmarks, the benchmarks die: they stop separating one model from another.

What makes GPT-5 and the next generation of models revolutionary isn’t their knowledge. It’s knowing how to act. For GPT-5 this happens at two levels. First, routing: deciding which model should handle a request. Second, and more importantly, tool-calling.

We’ve been living in an era where LLMs mastered knowledge retrieval & reassembly. Consumer search & coding, the initial killer applications, are fundamentally knowledge retrieval challenges. Both organize existing information in new ways.

We have climbed those hills, and as a result competition is more intense than ever. Models from Anthropic, OpenAI, and Google are converging on similar capabilities. Chinese models and open source alternatives continue to push ever closer to state-of-the-art. Everyone can retrieve information. Everyone can generate text.

The new axis of competition? Tool-calling.

Tool-calling transforms LLMs from advisors to actors. It compensates for two critical model weaknesses that pure language models can’t overcome.

First, workflow orchestration. Models excel at single-shot responses but struggle with multi-step, stateful processes. Tools enable them to manage long workflows: tracking progress, handling errors, and maintaining context across dozens of operations.
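As a rough illustration of what "tracking progress, handling errors, maintaining context" can look like in code, here is a minimal sketch of a stateful workflow runner. Everything in it (`WorkflowState`, `run_workflow`, the two example steps) is a hypothetical name invented for this example, not part of any particular framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class WorkflowState:
    """Hypothetical container for what a long-running workflow must remember."""
    context: Dict[str, object] = field(default_factory=dict)  # data shared across steps
    completed: List[str] = field(default_factory=list)        # progress tracking
    errors: List[str] = field(default_factory=list)           # error log

Step = Tuple[str, Callable[[Dict[str, object]], None]]

def run_workflow(steps: List[Step], state: WorkflowState, max_retries: int = 2) -> WorkflowState:
    """Run steps in order, retrying each failed step up to max_retries times."""
    for name, step in steps:
        for attempt in range(1, max_retries + 2):
            try:
                step(state.context)
                state.completed.append(name)
                break
            except Exception as exc:
                state.errors.append(f"{name} attempt {attempt}: {exc}")
        else:
            # Retries exhausted: stop so later steps never run on bad state.
            return state
    return state

# Example steps (invented): the second one fails once, then succeeds.
attempts = {"n": 0}

def fetch(ctx: Dict[str, object]) -> None:
    ctx["items"] = ["a", "b"]

def flaky_count(ctx: Dict[str, object]) -> None:
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("temporary failure")
    ctx["count"] = len(ctx["items"])

state = run_workflow([("fetch", fetch), ("count", flaky_count)], WorkflowState())
```

The point of the sketch is that the state lives outside the model: progress, errors, and shared context persist between steps instead of depending on what the model remembers.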

Second, system integration. LLMs live in a text-only world. Tools let them interface predictably with external systems like databases, APIs, and enterprise software, turning natural language into executable actions.
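One common way to turn model output into an executable action is to have the model emit a structured call (a tool name plus JSON arguments) that ordinary code validates and dispatches. The sketch below assumes that shape; the registry, the `lookup_account` tool, and the CRM data are all made up for illustration.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to plain Python functions.
TOOLS: Dict[str, Callable[..., Any]] = {}

def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so a model-issued call can reach it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def lookup_account(name: str) -> dict:
    # Stand-in for a real CRM query; this data is invented for the example.
    fake_crm = {"Acme": {"owner": "dana", "stage": "prospect"}}
    return fake_crm.get(name, {})

def dispatch(call_json: str) -> Any:
    """Execute a call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

result = dispatch('{"name": "lookup_account", "arguments": {"name": "Acme"}}')
```

Because the dispatcher rejects unknown tool names, a malformed or invented call fails loudly instead of silently doing the wrong thing.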

In the last month I’ve built 58 different AI tools.

Email processors. CRM integrators. Notion updaters. Research assistants. Each tool extends the model’s capabilities into a new domain.

The most important capability for AI is selecting the right tool quickly and correctly. A single misrouted step can kill the entire workflow.

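When I say “read this email from Y Combinator & find all the startups that are not in the CRM,” modern LLMs execute a complex sequence: roughly, read the email, extract the startup names, look each one up in the CRM, and report the ones that are missing.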

One command in English replaces an entire workflow. And this is just a simple one.
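To make the request above concrete, here is a sketch of the kind of sequence it implies, written as ordinary functions. The tools are stubs with invented data; in a real setup the model would choose and call real email and CRM tools, and would likely do the name extraction itself.

```python
from typing import List, Set

def read_email(sender: str) -> str:
    # Stub: a real tool would fetch the message from a mailbox.
    return "Demo day batch: Alpha Labs, Beta Robotics, Gamma Health"

def extract_startups(body: str) -> List[str]:
    # Stub: simple parsing stands in for model-driven extraction.
    _, _, names = body.partition(":")
    return [n.strip() for n in names.split(",") if n.strip()]

def crm_names() -> Set[str]:
    # Stub: a real tool would query the CRM. Names are stored lowercase.
    return {"beta robotics"}

def startups_missing_from_crm(sender: str) -> List[str]:
    """Chain the three tools: read, extract, then compare against the CRM."""
    body = read_email(sender)
    known = crm_names()
    return [s for s in extract_startups(body) if s.lower() not in known]

missing = startups_missing_from_crm("Y Combinator")
```

Each function here corresponds to one tool call, which is why a wrong choice at any step breaks the result.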

Even better, a model that is properly set up with the right tools can verify its own work, confirming that tasks were completed and completed on time. This self-verification loop creates reliability in workflows that is hard to achieve otherwise.
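A self-verification loop can be as simple as "act, then check with an independent read, and retry if the check fails." The sketch below assumes that pattern; `run_with_verification` and the flaky CRM write are hypothetical, invented only to show the loop.

```python
from typing import Callable

def run_with_verification(
    action: Callable[[], None],
    check: Callable[[], bool],
    max_attempts: int = 3,
) -> int:
    """Run action, confirm it with check, and return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        action()
        if check():
            return attempt
    raise RuntimeError("action could not be verified")

# Invented example: a CRM write that is silently lost the first time.
crm = set()
calls = {"n": 0}

def flaky_add() -> None:
    calls["n"] += 1
    if calls["n"] > 1:
        crm.add("alpha labs")

attempts_used = run_with_verification(flaky_add, lambda: "alpha labs" in crm)
```

The check reads the system's actual state instead of trusting the action's own report, which is what makes the verification meaningful.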

Multiply this across hundreds of employees. Thousands of workflows. The productivity gains compound exponentially.

The winners in the future AI world will be the ones who are most sophisticated at orchestrating tools and routing queries to the right ones. Every time. Once those workflows are predictable, we will all become agent managers.

Always enjoy your insights. What are your thoughts on the Model Context Protocol (MCP) in this regard, and on MCP-enabled tools that can execute agent-driven orchestration? Are any vendors leading there?


💯 Agree, Tomasz. I also think that getting into tool-specific workflows provides guardrails and puts you back in the domain of deterministic systems, which guards against hallucinations. Do you think MCP will be the dominant method by which tools are invoked?

Matt Slotnick

Insight to Action with Poggio POVs. Scale standardized POVs tailored to your business and your customers that position you for above the line conversations about business outcomes.

2d

indeed

Shirin Barkam

Reimagining Startup Growth at the Speed of AI | Strategy Meets Velocity

2d

When benchmarks reach their ceiling, the true differentiator shifts from hitting the score to redefining what the score means. The next frontier may not be about measuring AI's performance but about its capacity to reshape the metrics themselves.

Filip Filipov

COO at OAG | ex-Skyscanner exec | operations, product, strategy

2d

The speed limit doesn’t seem to be technological advancement, but adoption by the enterprise workforce. Fantastic piece. Thank you so much for making it superbly clear.
