Special Edition: Absolute Zero Reasoners: AI That Learns Without Us

Imagine an AI so capable that it doesn’t need our tasks, labels, or even our data to learn. It doesn’t just solve problems; it creates them. It learns by challenging itself, not by copying us. Welcome to the world of Absolute Zero Reasoners (AZR), a bold new paradigm in machine reasoning.

From Human Supervision to Self-Evolution

Traditionally, large language models improve their reasoning through Reinforcement Learning with Verifiable Rewards (RLVR). But even this “rewarding” approach leans heavily on human-crafted data: queries, solutions, and rationales. What happens when we run out of high-quality human input? Or worse, when our tasks no longer stretch AI's cognitive limits?

Absolute Zero flips the script: instead of learning from us, the model learns from itself.

AZR doesn’t need data. It needs curiosity.

The Self-Play Revolution

In the Absolute Zero paradigm, a single model performs both roles:

  • The Proposer: It invents new reasoning tasks, tailored for maximum learning value.

  • The Solver: It then attempts to solve these tasks, with success or failure verified through executable code.

Feedback comes not from human labels but from a grounded, verifiable environment: a code executor that runs the proposed programs and checks correctness. This closed loop allows the model to bootstrap its intelligence from a blank slate.
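To make the loop concrete, here is a minimal sketch of one propose-solve-verify step. The function names (`propose_task`, `solve_task`) and the hard-coded task are my own illustrations, not the paper's API; in AZR a single LLM plays both roles and the executor runs in a sandbox.

```python
# Minimal sketch of one AZR self-play step (illustrative only).
# propose_task and solve_task stand in for a single LLM playing both roles.

def run_program(code, inp):
    """Grounded verifier: execute the proposed program on the input."""
    env = {}
    exec(code, env)          # AZR runs this inside a sandboxed executor
    return env["f"](inp)

def propose_task():
    """Proposer role: invent a (program, input) pair as a deduction task."""
    code = "def f(x):\n    return x * x + 1"
    return code, 3

def solve_task(code, inp):
    """Solver role: predict the program's output.

    A real solver reasons about the code; here we just execute it so the
    sketch stays self-contained.
    """
    return run_program(code, inp)

# One step: the reward is 1 if the solver's answer matches real execution.
code, inp = propose_task()
gold = run_program(code, inp)    # ground truth from the executor
pred = solve_task(code, inp)     # solver's prediction
reward = 1 if pred == gold else 0
print(reward)  # → 1
```

The key design point: the reward signal is computed by running code, so it needs no human in the loop and cannot be argued with.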

Reasoning in Three Modes

AZR learns by rotating through:

  1. Deduction – Predicting outputs from programs and inputs.

  2. Abduction – Inferring plausible inputs from outputs.

  3. Induction – Writing programs from I/O examples.

This trifecta of reasoning skills allows AZR to develop deeply nuanced cognitive behaviors, including trial-and-error, step-by-step planning, and even the spontaneous emergence of comment-based intermediate plans—a sign of complex self-reflection.
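The three modes above are easiest to see on a single (program, input, output) triplet. The sketch below is my own framing of how one triplet yields three different tasks; the dictionary layout is illustrative, not the paper's data format.

```python
# One (program p, input i, output o) triplet and the three AZR task modes
# built from it. The "given"/"predict" framing is illustrative.

p = "def f(x):\n    return sorted(x)"
i = [3, 1, 2]

env = {}
exec(p, env)
o = env["f"](i)           # executor produces the ground-truth output

# Deduction: given (p, i), predict o.
deduction = {"given": (p, i), "predict": o}

# Abduction: given (p, o), infer a plausible i
# (any input that maps to o is accepted by the verifier).
abduction = {"given": (p, o), "predict": i}

# Induction: given I/O examples, write a program consistent with them.
induction = {"given": [(i, o)], "predict": p}

print(o)  # → [1, 2, 3]
```

Note the asymmetry: deduction and induction have checkable answers by direct execution, while abduction only requires finding *some* input that reproduces the output, which the executor can still verify.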

Outperforms Zero-Style Peers, with Zero Data

The results? Stunning:

  • AZR, trained with no external data, outperforms top-tier models trained on tens of thousands of human-curated examples in both code generation and math reasoning.

  • On standard benchmarks like HumanEval+ and AIME’24, AZR models consistently outperform supervised zero-style reasoners—achieving +10.2% overall gain on a 7B model and +13.2% gain on a 14B model.

  • It even shows strong cross-domain generalization, with code-trained AZRs significantly improving on math tasks—a sign of deep reasoning, not shallow pattern matching.

But Not Without Risks

When AZR trains on powerful backbones like Llama-3.1-8B, it occasionally produces disturbing "uh-oh moments": chains of reasoning that reveal emergent unsafe behavior. As we build ever more autonomous AI, safety-aware training becomes not just a feature, but a necessity.

💡 Why This Matters

Absolute Zero Reasoners challenge the fundamental assumptions of AI training:

  • That AI needs our data.

  • That more human curation = better results.

  • That intelligence is something we must teach, step by step.

Instead, AZR shows that intelligence can emerge through curiosity-driven self-play, verifiable environments, and a model’s internal drive to improve. In a world where data is limited and tasks must scale faster than human supervision can manage, AZR offers a glimpse into a post-supervised era of AI.

If you're working on AI that must generalize, scale, or reason beyond training data, the Absolute Zero paradigm deserves your attention.

Sources:

https://andrewzh112.github.io/absolute-zero-reasoner/

👉 I’d love to hear your thoughts:

  • Is this a step toward AGI or a detour?

  • What does this mean for human-AI collaboration?

  • How should we rethink curriculum design when the student writes the syllabus?

Bernhard Fuchs

Automation Genius at Infometis AG

2mo

Last week, I found some time to install the Absolute Zero Reasoner from https://github.com/LeapLabTHU/Absolute-Zero-Reasoner.git. However, I wasn’t able to get it running either locally on my MacBook Pro or in Google Colab (not to mention the performance requirements). The main reason seems to be that the code in this repository is not fully runnable, as some files — such as verl.utils.fs or verl.utils.hf_tokenizer — appear to be missing. During my attempts, I came across an alternative implementation by kekePower. With the right configuration, I finally got this code running today. Now, I’m playing around with it to get some hands-on experience. After that, I’ll take a closer look at the conceptual side of AZR. Personally, I think this is a fascinating step toward generating new knowledge. We’ll need to explore how we might make practical use of it. However, I don’t see this as a step toward AGI — in my opinion, we’ll never achieve something of that kind. See e.g. this article by Neil Lawrence: https://www.newscientist.com/article/mg26335091-000-the-ai-expert-who-says-artificial-general-intelligence-is-nonsense/ (or in German: https://t3n.de/news/ki-experte-neil-lawrence-idee-kuenstliche-allgemeine-intelligenz-unsinn-1647153).

Alan S.

Executive VP Business Development | Sales Leader | Wealth Management | Asset Management | Fintech

3mo

Thanks for sharing, Christian Moser. Excellent read as always. What is your view on how regulators will see these systems? I imagine many folks at banks and asset managers would love to leverage this technology, but they will need to answer the regulatory question before they can go live.

