NewMind AI Journal #111
Beyond Brute Force: How Reflective AI Is Outsmarting Reinforcement Learning
By Lakshya A Agrawal et al.
📌 Training large language models (LLMs) for complex tasks typically relies on reinforcement learning, which is inefficient and rollout-heavy.
📌 Researchers from UC Berkeley, Stanford, and others propose GEPA (Genetic-Pareto), a new prompt optimizer that learns through natural language reflection.
📌 GEPA analyzes reasoning steps, tool use, and errors to understand failures and improve prompts more efficiently than traditional RL.
How It Works
GEPA combines three core principles into a powerful optimization loop. First, it uses an evolutionary algorithm to iteratively "mutate" and improve prompts. Second, and most crucially, it employs natural language reflection: after a task attempt, GEPA examines the serialized trace of the AI system's actions, and an LLM then reflects on this textual feedback to diagnose issues and propose specific, high-level improvements to the prompt. Third, it uses Pareto-based selection. Rather than chasing a single "best" prompt, which risks trapping the search in a local optimum, GEPA maintains a diverse set of top-performing prompts for different scenarios. This encourages strategic diversity and helps the system generalize better across a wide range of problems.
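To make the loop concrete, here is a minimal, self-contained Python sketch of the execute, reflect, mutate, and select cycle described above. It paraphrases the paper's high-level procedure; the helper functions, prompt wording, and random placeholder metric are our own illustrative stand-ins, not the authors' implementation.

```python
# A minimal sketch of GEPA's reflect-and-evolve loop. Helpers and prompts
# are hypothetical stand-ins; score() uses a random placeholder metric.
import random

def run_and_collect_trace(prompt, task, llm):
    """Run the system on a task and serialize its reasoning, tool calls,
    and errors as a textual trace."""
    return llm(f"{prompt}\n\nTask: {task}\nShow your reasoning and tool calls.")

def score(prompt, task, llm):
    """Scalar task metric; a real system would run its evaluator here."""
    return random.random()  # placeholder for the sketch

def gepa_optimize(seed_prompt, tasks, llm, n_iterations=50):
    # Pareto frontier: the best candidate *per task*, rather than one global
    # "best prompt" -- this is what preserves strategic diversity.
    frontier = {t: (seed_prompt, score(seed_prompt, t, llm)) for t in tasks}
    for _ in range(n_iterations):
        task = random.choice(tasks)
        parent, _ = frontier[task]
        # 1. Roll out the current prompt and capture the full textual trace.
        trace = run_and_collect_trace(parent, task, llm)
        # 2. Reflective mutation: an LLM reads the trace, diagnoses failures,
        #    and proposes a concrete revision of the prompt.
        child = llm(f"Here is a prompt:\n{parent}\n\nHere is an execution "
                    f"trace with outcomes:\n{trace}\n\nDiagnose what went "
                    "wrong and rewrite the prompt to fix it.")
        # 3. Pareto-based selection: keep the child wherever it beats any
        #    task's incumbent, not just the sampled task's.
        for t in tasks:
            s = score(child, t, llm)
            if s > frontier[t][1]:
                frontier[t] = (child, s)
    return frontier

# Usage with a dummy callable standing in for a real LLM client:
best = gepa_optimize("You are a helpful agent.", ["qa", "code"], lambda p: p[:200])
```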
Key Findings & Results
Across four diverse tasks and two different LLMs (Qwen3 8B and GPT-4.1 Mini), GEPA demonstrates superior performance and remarkable sample efficiency. It outperforms the leading reinforcement learning method, GRPO, by an average of 10% (and up to 19%) while using up to 35 times fewer rollouts. Furthermore, GEPA surpasses the previous state-of-the-art prompt optimizer, MIPROv2, by over 10%, more than doubling its performance gains over the baseline. The prompts generated by GEPA are not only more effective but also significantly more compact, improving inference efficiency.
Why It Matters
GEPA offers a practical and scalable path to optimizing complex AI agents and systems, especially in environments where data is scarce or computation is expensive. By shifting from scalar rewards to rich, language-based feedback, this research paves the way for more "human-like" learning processes that are both effective and interpretable. It suggests that the future of AI optimization may lie in unifying prompt-based learning with traditional weight-space adaptation, opening up new possibilities for creating more adaptive, efficient, and intelligent systems. The paper also shows promising early results for using GEPA as an inference-time strategy for difficult tasks like code optimization.
Our Mind
This paper is a significant step forward in making the optimization of powerful AI systems less about brute force and more about intelligent reflection. GEPA's ability to learn from its mistakes in a nuanced, language-driven way feels closer to how humans solve problems. It’s a compelling demonstration that for language models, the richest learning signals may come from language itself, not just numerical scores. This work provides both a powerful new tool and a fresh perspective on how we can build more sophisticated AI.
Source: July 25, 2025 "GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning" by Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, et al., from UC Berkeley, Stanford University, BespokeLabs.ai, Notre Dame, Databricks, and MIT
GLM-4.5: Unifying Reasoning, Coding, and Agentic Capabilities in Frontier LLMs
By Z.ai Research
📌 Large Language Models have excelled in specific domains, but a truly general AI remains elusive.
📌 The GLM-4.5 series tackles this by introducing models designed to unify reasoning, coding, and agentic capabilities into a single, comprehensive framework.
📌 This research addresses the critical need for LLMs that can handle increasingly complex, multi-faceted tasks, pushing the frontier towards more versatile and human-like cognitive abilities.
How It Works
GLM-4.5 and GLM-4.5-Air leverage a Mixture-of-Experts (MoE) architecture, optimizing for efficiency and deeper reasoning by increasing model depth rather than width. They feature hybrid thinking and non-thinking modes for nuanced responses. Training involves extensive pre-training followed by domain-specific fine-tuning. A key innovation is slime, an open-source RL infrastructure enabling highly efficient and scalable reinforcement learning, crucial for enhancing agentic capabilities like deep search and general tool use.
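Since the MoE design is central here, the generic sketch below illustrates the core routing idea: a learned router sends each token to a small top-k subset of expert networks, so only a fraction of the model's total parameters is active per token. The sizes (8 experts, top-2, toy weights) are made up for illustration and are not GLM-4.5's actual configuration.

```python
# Generic top-k Mixture-of-Experts routing sketch (illustrative only;
# all sizes and weights are hypothetical, not GLM-4.5's).
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 64, 8, 2  # hidden size, expert count, experts per token

# Each "expert" is a small feed-forward block; here, one weight matrix each.
experts = [rng.standard_normal((D, D)) * 0.02 for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS)) * 0.02  # token -> expert logits

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router                            # (tokens, experts)
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]  # indices of chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # Softmax over only the selected experts' logits.
        sel = logits[t, top[t]]
        weights = np.exp(sel - sel.max())
        weights /= weights.sum()
        for w, e in zip(weights, top[t]):
            out[t] += w * np.tanh(x[t] @ experts[e])  # weighted expert mix
    return out

tokens = rng.standard_normal((4, D))  # 4 tokens of width D
print(moe_layer(tokens).shape)        # (4, 64): same shape, sparse compute
```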
Key Findings & Results
GLM-4.5 ranks 3rd and GLM-4.5-Air 6th across 12 diverse benchmarks. In agentic tasks, GLM-4.5 matches Claude 4 Sonnet and outperforms Claude 4 Opus on BrowseComp, boasting a 90.6% tool-calling success rate. For coding, it shows competitive performance against top models, achieving significant win rates over Kimi K2 and Qwen3-Coder in agentic coding. The models also demonstrate strong reasoning abilities across various academic benchmarks, showcasing a superior performance-scale trade-off.
Why It Matters
This research is a significant step towards general-purpose LLMs, offering a unified solution for complex challenges in reasoning, coding, and agentic applications. Its ability to generate sophisticated artifacts, facilitate full-stack development, and create presentations streamlines workflows for developers and businesses. The open-sourcing of slime further democratizes advanced RL research, fostering community contributions and accelerating future innovations in large-scale model training and deployment.
Our Mind
The GLM-4.5 series represents a compelling stride in LLM development, particularly in its ambitious goal of unifying diverse capabilities. The emphasis on agentic tasks and the introduction of slime for scalable RL are noteworthy. While competitive, the acknowledged room for optimization against top-tier models highlights the continuous journey of AI advancement. This work underscores the growing importance of multi-modal, generalist models.
Source: July 28, 2025 "GLM-4.5: Reasoning, Coding, and Agentic Abilities" by Z.ai Research
Unlocking Efficiency: The Power of Subagents in Claude Code
By Anthropic
📌 Claude Code introduces custom subagents: specialized AI assistants for enhanced problem-solving.
📌 These agents operate with distinct configurations, including tailored system prompts, specific tools, and separate context windows.
📌 This innovation enables intelligent task delegation, moving beyond a general AI model to a more focused, efficient approach for diverse development challenges, bringing specialized AI expertise directly to the task.
How They Work
Subagents are pre-configured AI personalities, each with a specific purpose. Each is defined in a Markdown file that specifies its name, description, and an optional list of tools, as sketched below. Management is intuitive through the /agents command, which facilitates creation, editing, and tool control. Claude Code can delegate tasks to a subagent automatically based on its description and expertise, or the subagent can be invoked explicitly, ensuring each task is handled by the most appropriate assistant.
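To make the file format concrete, here is a sketch of a subagent definition following the Markdown-plus-YAML-frontmatter layout Anthropic describes (a file placed under .claude/agents/); the specific name, tool selection, and prompt wording are our own illustrative choices:

```markdown
---
name: code-reviewer
description: Expert code-review specialist. Use proactively after code changes.
tools: Read, Grep, Glob, Bash
---

You are a senior code reviewer. When invoked, inspect the most recent
changes, flag bugs, security issues, and style problems, and propose
concrete fixes.
```

Once defined, a subagent like this can be delegated to automatically based on its description, or requested directly (e.g., "Use the code-reviewer subagent to check my latest changes").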
Key Benefits & Capabilities
Subagents significantly improve task efficiency and context management. By isolating task-specific information, they prevent main conversation clutter, enabling longer, more productive sessions. Users gain granular control over AI behavior and tool access, customizing assistants for precise needs. This modularity supports versatile applications, from "Code Reviewer" to "Data Scientist," streamlining complex workflows and optimizing development processes.
Why They Matter
Subagents represent a crucial evolution in AI-assisted development, fostering a more modular and intelligent system. This paradigm allows for automating and optimizing specialized tasks, significantly boosting developer productivity. By enabling dedicated AI expertise for specific functions, Claude Code empowers users to build robust, efficient automated workflows. The overall impact is a more adaptable and powerful AI development environment.
Our Mind
The implementation of subagents in Claude Code brilliantly applies the "divide and conquer" principle to AI. It underscores the growing need for specialized AI components to tackle increasing complexity. This modularity is key to building scalable and effective AI solutions. Claude Code's subagents offer a compelling vision for the future of AI assistance: highly focused, efficient, and deeply integrated tools that truly elevate the development experience.
Source: July 27, 2025 "Subagents" by Anthropic