My 2025 AI Predictions
I started writing this a couple of days ago on an oceanfront balcony of a rental condo in Maui - a much-needed family respite after an eventful 2024. It was a beautiful sunny day, with humpback whales breaching in the distance and turtles gliding over coral in the surf. It was a magical sight, but my mind wandered onto another "kind of magic" - one driven by the seemingly unstoppable progress of AI.
A year ago I shared some thoughts about what might develop over the course of 2024. Let's first examine how things turned out.
Domain-specific models and cost-efficiency
While many domain-specific models have been built or are in the process of being built (a couple of examples: Telco and Accounting), it's hard to argue that they have taken the industry by storm. Certainly, domain customization is very popular, but RAG-like techniques may still be the preferred approach over extensive pretraining and fine-tuning. The primary reasons, of course, include the effort required to acquire and prepare data, as well as the training costs versus the incremental value. And yet, advanced model customization techniques, like model merging and its optimization, various forms of PEFT and quantization, or the novel Spectrum technique and Reinforcement Fine-tuning, are paving the way for more cost efficiency and, one can argue, have become the new norm. Such techniques, paired with advanced cluster management, e.g. as offered by SageMaker HyperPod, are enabling recent successes for, say, non-English LLMs.
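To make the PEFT idea concrete, here is a minimal LoRA sketch using the Hugging Face peft library. Treat it as a hedged illustration: the base model name and hyperparameters are assumptions for demonstration, not a recommendation.

```python
# Minimal LoRA (a popular PEFT method) sketch with Hugging Face peft.
# Base model and hyperparameters are illustrative assumptions only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Train small low-rank adapter matrices instead of all base weights.
config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of base weights
# ...then fine-tune with your usual Trainer and dataset; only the
# adapters are updated, which is where the cost savings come from.
```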
Speaking of various cost-efficiency techniques, both smaller model sizes and specialized AI hardware naturally continue to proliferate. The pace of investment into new AI accelerators shows no signs of deceleration, pun intended. NVIDIA has captured mindshare with the Blackwell architecture, and the cloud providers keep pace with Trainium/Inferentia, Trillium and Maia.
Work on dataset curation continued in 2024 as expected, with many approaches being developed, evident both in paper publications like this or this (just to give a couple of random examples) and on a very practical plane - the GIGO principle still very much applies!
As for optimization tricks like caching, semantic caching feels so 2023. The new kid on the block is caching of prompt prefixes, called prompt caching, which leads to dramatic improvements in cost and latency when, say, asking questions about the same large document. Here is how to get started if you are using Amazon Bedrock.
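For a flavor of what this looks like in practice, below is a sketch against the Bedrock Converse API. It is illustrative only: the model ID is an assumption, and caching support and minimum-prefix limits vary by model, so check the documentation.

```python
# Sketch: prompt caching via the Amazon Bedrock Converse API.
# Model ID and file are placeholders; caching support varies by model.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")
large_document = open("annual_report.txt").read()  # big, reused prefix

def ask(question: str) -> str:
    response = client.converse(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",  # assumed model
        messages=[{
            "role": "user",
            "content": [
                {"text": large_document},
                {"cachePoint": {"type": "default"}},  # cache everything above
                {"text": question},
            ],
        }],
    )
    return response["output"]["message"]["content"][0]["text"]

# The first call pays to write the cache; subsequent calls reuse the
# cached document prefix at a fraction of the cost and latency.
print(ask("Summarize the revenue trends in this report."))
```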
While I can call the above predictions a moderate success, it's hard to be very proud of them given that they were in the Duh! category.
Keeping models up-to-date and novel applications
Taking a look at the Huh? section, it's clear that no real industry offering up-to-date datasets to solve the recency problem has emerged. Dataset curation companies exist, but the "out-of-date" problem is being solved primarily by RAG being able to access the latest news via the internet. And frankly, the pace of model version updates is fast enough to make the overall problem smaller. That said, few business applications would really allow access to internet search, so the problem remains.
The second prediction in that section, though, feels spot on - we have seen a number of examples of Gen AI models being applied to non-traditional content. Again, to be clear, I am not talking about text, images, video or audio - these modalities already come with lots of potential training data. Instead, I was thinking about other types of data, and so it was great to see the development of Chronos models for time-series forecasting. And as the year closes, we see tremendous progress towards SOTA weather forecasting with GenCast, based on a special kind of diffusion model! These predictions are something to be proud of, especially since I explicitly called out forecasting.
Finally, while I can't say we've seen the kind of black swan event I was worried about, I was only giving it 30% odds, so let's not call it a miss.
Deep Fakes
Before we move on to 2025 predictions, I would like to spend a moment on deep fakes from the Responsible AI section. I guess the good news is that they have not yet played a major role in affecting electoral processes either in the US or elsewhere, at least based on what I've personally seen. Sure, there were numerous clips made, but the vast majority were designed to make fun of the other side in a not especially subtle way - or were at least unlikely to be successful at artful manipulation when presented to a voter of even moderate judiciousness. If I missed something noteworthy, please point me to it, but I can certainly state that nothing of the sort reverberated in the news in a big way.
The quality of synthetic video and audio nonetheless continues to grow, to the point of being indistinguishable from reality to an untrained eye, so I think it's only a matter of time before a sophisticated enough ruse is used for political ends. The most common deep fakes today seem to instead target gullible "investors", with famous businessmen convincingly endorsing new profitable ventures. Additionally, we seem to be bombarded by AI-generated videos on social media, such as this "polar bear cub rescue", likely designed to elicit an emotional response and consequently higher engagement and ad revenue. Regardless of the impact, the intent seems to be enrichment.
Ok, enough of a retrospective. On to 2025!
2025 Predictions
In keeping with tradition, I am sticking to the same sections.
Duh!
Return on Investment: The LLM training methods remain fairly extensive, requiring lots of training data and post-training alignment. Some even argue that we've exhausted all the training data available to continue substantially improving models this way. So the cost of training (or retraining with new optimizations) is still very high, reaching tens of millions of dollars by various estimates. And API inference remains costly not only for the end user, but also for the model deployer, hovering around 1 cent per thousand output tokens. This reminds me of a fun stat that Amazon displayed for many books - the words you are getting per dollar of book cost. As all the major model providers compete for the highest quality at the lowest cost, the quality seems to be going up steadily, but the prices seem to be dropping even faster - all driven by the desire for a market share grab. We may be witnessing a tactic that Uber was accused of successfully employing - keep prices low to take business from the incumbents, and once the fait is accompli, ratchet up prices to get to profitability. Are we seeing the same with Gen AI? Seems likely, except there are a few solid competitors in this space.
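As a back-of-the-envelope illustration of that words-per-dollar framing (both the price and the tokens-per-word ratio below are rough assumptions):

```python
# Back-of-the-envelope "words per dollar" for LLM output.
# Both numbers below are rough assumptions for illustration only.
price_per_1k_output_tokens = 0.01  # ~1 cent, as quoted above
tokens_per_word = 1.33             # common rule of thumb for English text

tokens_per_dollar = 1000 / price_per_1k_output_tokens    # 100,000 tokens
words_per_dollar = tokens_per_dollar / tokens_per_word   # ~75,000 words

print(f"~{words_per_dollar:,.0f} words per dollar")  # roughly a novel's worth
```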
Accelerators, energy and scale: To win in this game, one needs the talent both to make better models and to come up with more cost-effective training and inference technology. I don't expect this equation to change in 2025. To achieve more cost-effective training/inference without a dramatic change to the model architecture, one could lower hardware costs via volume discounts (and other economies of scale), obtain cheaper energy sources and build more advanced accelerators. Nuclear power seems like the only viable long-term energy source that provides the necessary scale while allowing companies to keep their current climate commitments. Data center expansion is expected to continue well into the next year, per Gartner.
Project Rainier is a recent example of massive AI infrastructure being assembled while taking advantage of AI accelerator innovations.
Agents: Much has been said about 2025 being the year of agents and agentic architectures; these clearly belong in the Duh! category, and I will not pile on. Already we see agents playing a key role in improving the quality of RAG systems with Agentic RAG. Deeper specialization is likewise a natural evolution of agentic systems, where achieving a complex multi-step outcome becomes a matter of employing and orchestrating multiple specialist agents. And this is quickly moving from just a concept a year ago to actual practical implementations.
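To illustrate the orchestration pattern, here is a deliberately simplified sketch in plain Python. The specialist roles and the call_llm stub are hypothetical stand-ins for whatever framework or model provider you actually use.

```python
# Toy sketch of an orchestrator delegating to specialist agents.
# call_llm() and the roles below are hypothetical placeholders; a real
# system would use an agent framework and actual model calls instead.
def call_llm(system_prompt: str, user_input: str) -> str:
    # Stub: replace with a real model call.
    return f"[{system_prompt[:24]}...] response to: {user_input[:48]}..."

SPECIALISTS = {
    "research": "You find and summarize relevant facts.",
    "analysis": "You reason over provided facts and draw conclusions.",
    "writing": "You turn conclusions into a polished final answer.",
}

def orchestrate(task: str) -> str:
    # The orchestrator first produces a plan: ordered specialist steps,
    # one per line, in the form "role: instruction".
    plan = call_llm(
        "Break the task into steps, one per line, each prefixed by one of: "
        + ", ".join(SPECIALISTS),
        task,
    )
    context = task
    for step in plan.splitlines():
        role, _, instruction = step.partition(":")
        if role.strip() in SPECIALISTS:
            # Each specialist sees its instruction plus accumulated context.
            context = call_llm(
                SPECIALISTS[role.strip()],
                f"{instruction.strip()}\n\nContext so far:\n{context}",
            )
    return context
```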
When it comes to examples where such deeply specialized agents are useful, take a look at the active space of recommender systems, starting with automatically constructing common-sense knowledge graphs all the way to infusing agents with such graphs.
This is where this train of thought naturally leads us to the Huh? section.
Huh?
Explicit reasoning: For any agentic system to be successful, its planning capabilities need to be improved with self-validation of reasoning steps, especially when it comes to out-of-distribution situations. It has long been recognized that simple prediction of the next token allows LLMs to acquire reasoning capabilities as well: the "rules of thinking", the rules of logic, the ability to plan - the latter being a form of deductive reasoning. The challenge is that these reasoning capabilities are captured in a non-interpretable and non-debuggable myriad of weights. When simple example "plans" are present in the training datasets, it's not hard for LLMs to generate accurate and functioning plans. But when solutions need to be found (and spelled out in plans) for less common or entirely novel problems, even specially trained agentic "planning" LLMs are likely to fail.
The work on this front consists of two primary directions: 1/ training and tuning models specifically designed to be good at reasoning and planning and 2/ incorporating explicit logic-based formal reasoning. In combination, these approaches show tremendous progress, with LLMs capable of relatively complex problem solving, showing intermediate steps, the veracity of each of which can be easily tested. This is achieved through specialized fine-tuning, multi-step actor-critic techniques and other forms of grounding. Recently, though, we've started to see explicit validation of LLM output based on extracted predicates with variables, using predicate calculus. A good example of a foray into this space is Automated Reasoning checks. This looks very promising, and I expect this and similar efforts to develop and advance substantially in 2025. Imagine applying this technology not just for post-completion validation, but actually as part of a more formal reasoning system that augments the "intuitive" but informal reasoning that LLMs offer today.
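To give a feel for the predicate-validation idea, here is a toy sketch. Real systems like Automated Reasoning checks are far more sophisticated; the facts, the rule and the extracted claim below are all invented for illustration.

```python
# Toy illustration: validate an LLM "conclusion" against explicit rules
# instead of trusting the model. Facts, rule and claim are invented.
facts = {
    ("employee", "alice"),
    ("tenure_years", "alice", 6),
}

def eligible_for_sabbatical(person: str) -> bool:
    # Explicit, inspectable rule: employee(X) AND tenure_years(X) >= 5.
    tenure = next(
        (f[2] for f in facts if f[0] == "tenure_years" and f[1] == person),
        0,
    )
    return ("employee", person) in facts and tenure >= 5

# Predicate hypothetically extracted from the model's free-text answer.
llm_claim = ("eligible_for_sabbatical", "alice")

# Test the extracted predicate directly against the rule base.
if eligible_for_sabbatical(llm_claim[1]):
    print("claim verified by explicit rules")
else:
    print("claim rejected: the model's reasoning step does not hold")
```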
Tools marketplace: Agents work by breaking down a problem into simple executable steps, like running a database query or updating the calendar and sending a notification. Ultimately, each step represents a well-defined, non-ambiguous operation with an expectation of a clear outcome. Each such operation provides a useful abstraction that is designed to hide some "undifferentiated heavy lifting". The ability to book a table at a restaurant for a party of a particular size with a few typical constraints (culinary preferences, dietary restrictions, ambiance, location convenience, cost, etc.) is an example of such a useful operation; creating an order for marketing swag with another set of constraints (sizes, colors, branding, to be delivered on time to a specific location, etc.) is another. In the LLM world these are often called "tools". Agents need access to a catalog of such tools, which become the new operational "primitives", along with their reliability and scaling characteristics (can I book a table for a party of 30? can I order swag for 10,000 people?) and, of course, their cost. I can envision a veritable marketplace of such services with clearly defined interfaces. The individual services or tools don't even need to incorporate AI themselves; that's not the point. The point is that this is no different from Gen AI coding - creating a plan of execution composed of tools with well-defined interfaces.
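As a sketch of what a marketplace-ready tool interface might look like, here is a hypothetical booking tool defined in the JSON-Schema style that common tool-use APIs expect. The tool name and every field are invented for illustration.

```python
# Hypothetical "book_restaurant_table" tool interface, in the JSON-Schema
# style used by common tool-use APIs. All names and fields are invented.
book_table_tool = {
    "name": "book_restaurant_table",
    "description": "Book a table under the given constraints; return a "
                   "confirmation or a clear failure reason.",
    "input_schema": {
        "type": "object",
        "properties": {
            "party_size": {"type": "integer", "minimum": 1, "maximum": 30},
            "cuisine": {"type": "string"},
            "dietary_restrictions": {
                "type": "array", "items": {"type": "string"},
            },
            "max_price_per_person": {"type": "number"},
            "location": {"type": "string"},
            "datetime": {"type": "string", "format": "date-time"},
        },
        "required": ["party_size", "location", "datetime"],
    },
}
# A marketplace listing would pair this interface with reliability,
# scaling limits (note the maximum party size) and pricing metadata.
```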
I may be dating myself here, but this reminds me of past attempts at similar interoperability, like the Common Object Request Broker Architecture (CORBA). Those attempts were mired in lengthy interface standardization efforts, while this may be exactly what LLMs are good at - translating natural language to domain-specific, tool-specific interfaces. While I am skeptical that we will make great progress towards this vision in 2025, I am expecting more progress to be made, as we are already seeing some attempts at such abstraction layers, with the LlamaIndex Tool abstraction and Anthropic's Model Context Protocol to name a few.
Model Architecture: The transformer architecture has been at the heart of Gen AI, and yet it has a number of well-known problems: high computational cost, sample inefficiency, poor self-attention scalability and lack of explainability, to name a few. Many different tweaks have been proposed, ranging from HyperMixer to the more recent Kolmogorov-Arnold Networks (KANs) and even their incorporation into transformer architectures.
Similarly, altogether novel architectures have been proposed in the past, notably Mamba and a slew of variants such as Cobra and SiMBA, which attempt to deal with some of the transformer problems using selective state spaces, eliminating some inference bottlenecks related to the attention mechanism.
Another notable attempt is one based on Liquid Time-constant Networks, which culminated in an entire, apparently well-funded startup emerging - Liquid AI. While such approaches clearly need to prove themselves in the market with better-performing and widely applicable models, I feel like the critical mass has been attained for a substantial improvement over the current transformer-based architectures to emerge in 2025!
RAI
Content Credentials: We've talked about deep fakes above, with financial gain being the primary incentive, so it's worth talking about outright crime. Over centuries, humans have learned to trust their senses and recognition abilities, specifically the recognition of faces and voices. And yet, audio-visual deep fakes have recently become a tool of criminals who exploit our trust in these innate recognition systems. Quality voice and video cloning has become more and more accessible over the years, to the point that you don't need to be well versed in Machine Learning at all to achieve good results. This is the kind of "democratization" that we need to collectively be concerned about. The trust in the foundations of our society and institutions is at stake.
While watermarking of synthetic content is a basic mechanism for detecting fakes, content credentials embedded into the content metadata are a more reliable mechanism. If you receive a voice recording that was actually recorded by your friend on their phone, the phone software should automatically attach a credential to that effect, and software on your own phone should be able to verify that credential for you - in case the message is asking for, say, an urgent transfer of money.
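Here is a toy sketch of the core idea: sign a hash of the recording on the capture device, verify it on the receiving device. Real standards like C2PA embed much richer, certificate-backed manifests; the key handling below is simplified for illustration.

```python
# Toy sketch of the signing/verification idea behind content credentials.
# Real systems (e.g., C2PA) use certificate chains and full manifests.
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
)

# On your friend's phone: a device key signs the recording at capture time.
device_key = Ed25519PrivateKey.generate()
recording = b"...voice message bytes..."
digest = hashlib.sha256(recording).digest()
credential = device_key.sign(digest)  # attached to the file's metadata

# On your phone: verify the credential against the device's public key
# (in reality obtained from a trusted registry, not from the sender).
public_key = device_key.public_key()
try:
    public_key.verify(credential, hashlib.sha256(recording).digest())
    print("credential valid: recording is untampered and from this device")
except InvalidSignature:
    print("credential invalid: treat the urgent money request with suspicion")
```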
Regulations: Finally, the EU AI Act is upon us - the first steps to comply need to be taken in 2025 already. While this is a somewhat complicated subject that would be impossible for me to cover in any meaningful detail, it's high time you started planning for it.
Needless to say, the above is a highly subjective account. I am sure I have forgotten about some other important aspects. Whether you agree or disagree with any of the above, your comments would be highly appreciated.
Let's check back in a year. Happy 2025!