DeepSeek, OpenAI and the Jevons Paradox

This is the 12th article of Beyond Entropy, a space where the chaos of the future, the speed of emerging technologies and the explosion of opportunities are slowed down, allowing us to turn (qu)bits into our dreams.


The outline of today's post will be:

  • 📚 DeepSeek, Qwen, and OpenAI-o3: Reasoning Capabilities and Energy Efficiency;
  • 📉 NVIDIA stocks and the Jevons Paradox;
  • 🇪🇺 Some tech news from Europe;
  • 🚀 Job & Research opportunities, talks, and events in AI.

Let’s start!

DeepSeek, Qwen, and OpenAI-o3: Reasoning Capabilities

In the past 15 days there has been an incredible acceleration in the release of new AI models. This has disrupted not only the tech community, but the entire market and even some geopolitical balances. It clearly shows that the race to dominate AI is becoming strategic and high on the political agenda of major countries. Terms such as Open Source, LLMs, and Reinforcement Learning are now on almost everyone's lips.

In particular, three AI models were released (DeepSeek-R1, Qwen 2.5 Max, and OpenAI o3-mini) that compete on one of the most strategic and important fronts in the race for model supremacy: Reasoning Capabilities. These capabilities involve mathematical skills, logical problem solving, coding, and PhD-level knowledge of certain topics, especially analytical and scientific ones.

Reasoning capabilities are measured on benchmarks such as AIME 2024, GPQA Diamond, FrontierMath, SWE-bench, and others. Some preliminary results (check this blog post) show that:

  • OpenAI o3-mini outperforms DeepSeek-R1 on AIME 2024, a competition-level mathematics benchmark (but requires high reasoning effort to do so);
  • OpenAI o3-mini edges out DeepSeek-R1 by 0.1 points on SWE-bench, which measures coding and software engineering skills (again requiring high reasoning effort);
  • On the other hand, the DeepSeek-R1 model performs better than OpenAI o3-mini on other mathematics tasks, demonstrating strong numerical reasoning and problem-solving skills;
  • OpenAI o3-mini scores slightly better than DeepSeek-R1 on other benchmarks such as Codeforces and NYT Connections Puzzles.

What about Energy Efficiency?

We have all heard that DeepSeek has been able to compete with the best OpenAI and Anthropic models at a significantly lower cost (although it is not yet entirely clear by how many orders of magnitude). This includes both training and inference costs.

How is this possible? In recent years, AI research has focused on developing new techniques that make models ever more efficient. Among these, the most popular are Pruning, Quantization, Distillation, LoRA, and Expert Routing. In particular, the DeepSeek team has exploited:

  • Multi-head Latent Attention (MLA): a more efficient version of the self-attention mechanism that reduces the key-value (KV) cache size via latent projections;
  • Mixture-of-Experts (MoE) with Auxiliary-Loss-Free Load Balancing: out of the over 600B total parameters, only about 37B are activated per token by routing each token to a small set of experts (see the sketch after this list);
  • Multi-Token Prediction: speeds up inference by predicting multiple tokens at once.
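
To make the Mixture-of-Experts idea more concrete, here is a minimal, self-contained sketch of top-k expert routing in PyTorch. It is a toy illustration of the general technique, not DeepSeek's actual implementation: the tiny expert networks, the dimensions, and the plain softmax router are all made up for the example, and DeepSeek's auxiliary-loss-free load balancing is not shown.

```python
# Toy Mixture-of-Experts layer: each token is routed to only top_k experts,
# so the compute per token depends on the active experts, not the total count.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)   # router: scores each expert per token
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )

    def forward(self, x):                            # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)     # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)  # top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                # tokens routed to expert e at rank k
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

tokens = torch.randn(5, 64)
print(TinyMoE()(tokens).shape)                       # torch.Size([5, 64])
```

The key point is that each token only touches top_k experts, so the per-token compute scales with the number of active parameters rather than with the full parameter count of the model.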

The DeepSeek team also claims to take advantage of Knowledge Distillation, which consists of transferring knowledge from a large pre-trained model (the teacher) to a smaller, more efficient model (the student) by training the latter to mimic the former. However, it seems that DeepSeek is not exactly leveraging distillation but Supervised Fine-Tuning (SFT), as noted by Andriy Burkov in this interesting discussion.
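
For intuition, here is a minimal sketch of the classic knowledge-distillation objective (a generic, textbook formulation, not DeepSeek's training code): the student is pushed to match the teacher's softened output distribution. In SFT-style training, by contrast, the student is simply trained with cross-entropy on text generated by the teacher, without access to the teacher's full output distribution.

```python
# Classic distillation loss: match the teacher's softened distribution with a KL term.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with the temperature, then match them.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable to a hard-label loss.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

student_logits = torch.randn(4, 32000)   # (batch, vocab) -- toy values
teacher_logits = torch.randn(4, 32000)
print(distillation_loss(student_logits, teacher_logits))
```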

How does energy efficiency translate into final costs for using these models?

  • DeepSeek’s R1 reasoning model costs $0.14 per million cached input tokens and $2.19 per million output tokens via its API;
  • OpenAI o3-mini is priced at $0.55 per million cached input tokens and $4.40 per million output tokens;

Therefore, to date DeepSeek-R1 is roughly 4 times cheaper than OpenAI o3-mini on cached input tokens (and about 2 times cheaper on output tokens), while o3-mini is itself 63% cheaper than OpenAI o1-mini (Pricing: DeepSeek, OpenAI).
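
As a quick sanity check on those numbers, here is a small cost comparison; the monthly token volumes are purely hypothetical and only serve to illustrate the ratio.

```python
# Back-of-the-envelope cost comparison using the API prices quoted above.
PRICES = {  # $ per million tokens: (cached input, output)
    "DeepSeek-R1": (0.14, 2.19),
    "OpenAI o3-mini": (0.55, 4.40),
}

input_millions, output_millions = 10, 2   # hypothetical monthly usage
for model, (p_in, p_out) in PRICES.items():
    cost = input_millions * p_in + output_millions * p_out
    print(f"{model}: ${cost:.2f} per month")
# DeepSeek-R1: $5.78 per month
# OpenAI o3-mini: $14.30 per month  -> roughly 2.5x cheaper for this particular mix
```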

The Jevons Paradox and Market Fall

The race toward efficiency of LLMs (keeping their performance constant) naively suggests a steadily decreasing need for computational power. For this reason, the release of the efficient DeepSeek-R1 triggered a very heavy market reaction against the shares of US big tech companies, in particular NVIDIA.

Only the most tech-savvy and careful observers did not sell NVIDIA stock, knowing full well that, counterintuitively, the computational power required by AI can only increase. This is the famous Jevons Paradox, which states:

Any increase in resource efficiency generates an increase in long-term resource consumption, rather than a decrease.

In the context of LLMs, this means that, by becoming less and less expensive (both to train and to run at inference), they can be adopted by ever more companies, startups and users. Each of these entities will necessarily have to use computational resources. An extreme example: if today only one player can train an LLM with 10 thousand GPUs, tomorrow 10 million players may each train an LLM with a single GPU. If you do the math, the total number of GPUs can only increase (the toy arithmetic is written out below).
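
Here is that toy arithmetic written out (all figures come from the extreme example above and are purely illustrative):

```python
# Jevons-paradox intuition: per-player GPU needs drop,
# but the number of players grows much faster, so total demand rises.
scenarios = [
    ("Today: one player, expensive training", 1, 10_000),        # (label, players, GPUs each)
    ("Tomorrow: cheap training, mass adoption", 10_000_000, 1),
]
for label, players, gpus_each in scenarios:
    print(f"{label}: {players * gpus_each:,} GPUs in total")
# Today: one player, expensive training: 10,000 GPUs in total
# Tomorrow: cheap training, mass adoption: 10,000,000 GPUs in total
```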

Digital & Tech news from Europe

In this section I want to highlight some tech news from Europe (to stay updated, consider following the official EU Digital & Tech page):

  • The EU Commission presented the EU Competitiveness Compass, the initiatives programme that sets the future technological path for Europe. Among other initiatives, it is worth mentioning the AI Gigafactories and the Apply AI Strategy, whose goals are to drive the development and industrial adoption of AI in key sectors, such as advanced materials, quantum, biotech, robotics and space technologies. Furthermore, a dedicated EU Start-up and Scale-up Strategy will also be implemented to allow new companies to emerge.
  • The prestigious Strategic Technologies for Europe Platform (STEP) Seal was awarded by the EU Commission to the multilingual AI project OpenEuroLLM. This project is working on the first family of open-source Large Language Models covering all official EU languages. Bringing together EU startups, research labs and supercomputing hosting entities, the project intends to train these AI models on European supercomputers.
  • The European Quantum Communication Infrastructure (EuroQCI) initiative, whose goal is to develop quantum-safe communication networks, has partnered with the European Space Agency (ESA), which will provide technical support for the implementation of EuroQCI.

Opportunities, talks, and events

I share some opportunities from my network that you might find interesting:

🚀 Job opportunities:

🔬 Research opportunities:

📚 Other opportunities:

  • Summer School for PhDs on Physics-informed Neural Networks at KTH Royal Institute of Technology, Stockholm, 15-29 June 2025 (apply here!).

Thanks for reading and thanks for sharing if you found this content useful! Until the next post!
