Special Edition: Absolute Zero Reasoners: AI That Learns Without Us

Imagine an AI so capable that it doesn’t need our tasks, labels, or even our data to learn. It doesn’t just solve problems; it creates them. It learns by challenging itself, not by copying us. Welcome to the world of Absolute Zero Reasoners (AZR), a bold new paradigm in machine reasoning.

From Human Supervision to Self-Evolution

Traditionally, large language models improve their reasoning through Reinforcement Learning with Verifiable Rewards (RLVR). But even this “rewarding” approach leans heavily on human-crafted data: queries, solutions, and rationales. What happens when we run out of high-quality human input? Or worse, when our tasks no longer stretch AI's cognitive limits?

Absolute Zero flips the script: instead of learning from us, the model learns from itself.

AZR doesn’t need data. It needs curiosity.

The Self-Play Revolution

In the Absolute Zero paradigm, a single model performs both roles:

  • The Proposer: It invents new reasoning tasks, tailored for maximum learning value.

  • The Solver: It then attempts to solve these tasks, with success or failure verified through executable code.

Feedback comes not from human labels but from a grounded, verifiable environment: a code executor that runs the proposed programs and checks correctness. This closed loop allows the model to bootstrap its intelligence from a blank slate.
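To make the loop concrete, here is a minimal sketch of one propose-solve-verify step. The function names (`propose_task`, `solve_task`) and the hard-coded task are my own illustrations, not the paper's API; in AZR a single LLM plays both roles and the executor runs in a sandbox.

```python
# Minimal sketch of one AZR self-play step (illustrative only).
# propose_task and solve_task stand in for a single LLM playing both roles.

def run_program(code, inp):
    """Grounded verifier: execute the proposed program on the input."""
    env = {}
    exec(code, env)          # AZR runs this inside a sandboxed executor
    return env["f"](inp)

def propose_task():
    """Proposer role: invent a (program, input) pair as a deduction task."""
    code = "def f(x):\n    return x * x + 1"
    return code, 3

def solve_task(code, inp):
    """Solver role: predict the program's output.

    A real solver reasons about the code; here we just execute it so the
    sketch stays self-contained.
    """
    return run_program(code, inp)

# One step: the reward is 1 if the solver's answer matches real execution.
code, inp = propose_task()
gold = run_program(code, inp)    # ground truth from the executor
pred = solve_task(code, inp)     # solver's prediction
reward = 1 if pred == gold else 0
print(reward)  # → 1
```

The key design point: the reward signal is computed by running code, so it needs no human in the loop and cannot be argued with.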

Reasoning in Three Modes

AZR learns by rotating through:

  1. Deduction – Predicting outputs from programs and inputs.

  2. Abduction – Inferring plausible inputs from outputs.

  3. Induction – Writing programs from I/O examples.

This trifecta of reasoning skills allows AZR to develop deeply nuanced cognitive behaviors, including trial-and-error, step-by-step planning, and even the spontaneous emergence of comment-based intermediate plans—a sign of complex self-reflection.
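The three modes above are easiest to see on a single (program, input, output) triplet. The sketch below is my own framing of how one triplet yields three different tasks; the dictionary layout is illustrative, not the paper's data format.

```python
# One (program p, input i, output o) triplet and the three AZR task modes
# built from it. The "given"/"predict" framing is illustrative.

p = "def f(x):\n    return sorted(x)"
i = [3, 1, 2]

env = {}
exec(p, env)
o = env["f"](i)           # executor produces the ground-truth output

# Deduction: given (p, i), predict o.
deduction = {"given": (p, i), "predict": o}

# Abduction: given (p, o), infer a plausible i
# (any input that maps to o is accepted by the verifier).
abduction = {"given": (p, o), "predict": i}

# Induction: given I/O examples, write a program consistent with them.
induction = {"given": [(i, o)], "predict": p}

print(o)  # → [1, 2, 3]
```

Note the asymmetry: deduction and induction have checkable answers by direct execution, while abduction only requires finding *some* input that reproduces the output, which the executor can still verify.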

Outperforms Zero-Style Peers, with Zero Data

The results? Stunning:

  • AZR, trained with no external data, outperforms top-tier models trained on tens of thousands of human-curated examples in both code generation and math reasoning.

  • On standard benchmarks like HumanEval+ and AIME’24, AZR models consistently outperform supervised zero-style reasoners—achieving +10.2% overall gain on a 7B model and +13.2% gain on a 14B model.

  • It even shows strong cross-domain generalization, with code-trained AZRs significantly improving on math tasks—a sign of deep reasoning, not shallow pattern matching.

But Not Without Risks

When AZR trains on powerful backbones like Llama-3.1-8B, it occasionally produces disturbing "uh-oh moments": chains of reasoning that reveal emergent unsafe behavior. As we build ever more autonomous AI, safety-aware training becomes not just a feature, but a necessity.

💡 Why This Matters

Absolute Zero Reasoners challenge the fundamental assumptions of AI training:

  • That AI needs our data.

  • That more human curation = better results.

  • That intelligence is something we must teach, step by step.

Instead, AZR shows that intelligence can emerge through curiosity-driven self-play, verifiable environments, and a model’s internal drive to improve. In a world where data is limited and tasks must scale faster than human supervision can manage, AZR offers a glimpse into a post-supervised era of AI.

If you're working on AI that must generalize, scale, or reason beyond training data, the Absolute Zero paradigm deserves your attention.

Sources:

https://andrewzh112.github.io/absolute-zero-reasoner/

👉 I’d love to hear your thoughts:

  • Is this a step toward AGI or a detour?

  • What does this mean for human-AI collaboration?

  • How should we rethink curriculum design when the student writes the syllabus?

Bernhard Fuchs

Automation Genius at Infometis AG

2mo

Last week, I found some time to install the Absolute Zero Reasoner from https://github.com/LeapLabTHU/Absolute-Zero-Reasoner.git. However, I wasn’t able to get it running either locally on my MacBook Pro or in Google Colab (not to mention the performance requirements). The main reason seems to be that the code in this repository is not fully runnable, as some files — such as verl.utils.fs or verl.utils.hf_tokenizer — appear to be missing. During my attempts, I came across an alternative implementation by kekePower. With the right configuration, I finally got this code running today. Now, I’m playing around with it to get some hands-on experience. After that, I’ll take a closer look at the conceptual side of AZR. Personally, I think this is a fascinating step toward generating new knowledge. We’ll need to explore how we might make practical use of it. However, I don’t see this as a step toward AGI — in my opinion, we’ll never achieve something of that kind. See e.g. this article by Neil Lawrence: https://www.newscientist.com/article/mg26335091-000-the-ai-expert-who-says-artificial-general-intelligence-is-nonsense/ (or in German: https://t3n.de/news/ki-experte-neil-lawrence-idee-kuenstliche-allgemeine-intelligenz-unsinn-1647153).

Alan S.

Executive VP Business Development | Sales Leader | Wealth Management | Asset Management | Fintech

3mo

Thanks for sharing, Christian Moser. Excellent read as always. What is your view on how regulators will see these systems? I imagine many folks at banks and asset managers would love to leverage this technology, but they will need to answer the regulatory question before they can go live.

