Why Fully Autonomous Software Engineering with AI Remains a Distant Dream
I am often surprised by how many people in technology remain unaware of why fully autonomous software engineering is impractical with Artificial Intelligence agents based on current methods (Neural Networks, including Reinforcement Learning). Multi-agent frameworks (Devin, etc.) based solely on language modelling, both now and for the foreseeable future (until a breakthrough occurs in Neurosymbolic AI), become caught in infinite loops due to four major limitations of neural networks, along with an additional limitation specific to language models. I will outline these limitations and explain how they affect our ability to fully automate software engineering.
The primary reason multi-agent frameworks become stuck in loops when writing software is the Out-of-Distribution (OOD) problem. This occurs when the language model encounters input data, such as a novel bug, for which no information exists within its training distribution. Neural networks typically suffer a significant decline in performance on out-of-distribution inputs because they are unable to extrapolate effectively. While Chain-of-Thought (CoT) prompting can occasionally help address such bugs, the solutions generated are often irrelevant, causing the system to enter infinite loops. This remains true even when additional context is provided through prompts that include error messages from the Language Server Protocol (LSP), linting tools, compilers, or interpreter exception handling. Using Retrieval-Augmented Generation (RAG) for out-of-distribution data would still require examples of every possible out-of-distribution scenario, an unachievable task akin to modelling infinity.
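As a rough illustration (not drawn from any particular framework), consider the minimal Python sketch below of an agent repair loop. The `propose_patch`, `run_tests`, and `repair_loop` names are hypothetical stand-ins for the LLM call and the test suite, not a real API; on an out-of-distribution bug the proposals never converge, and the loop terminates only because of an explicit budget.

```python
# Minimal sketch of an agent repair loop stalling on an out-of-distribution bug.
# propose_patch and run_tests are hypothetical stand-ins, not any real framework's API.
import random

def propose_patch(source: str, error: str) -> str:
    # Stand-in for an LLM call: on an OOD bug the model samples plausible-looking
    # but irrelevant edits drawn from its training distribution.
    return random.choice(["add null check", "retry the request", "rename variable"])

def run_tests(source: str, patch: str) -> tuple[bool, str]:
    # Stand-in for the compiler/test suite: the OOD bug is never actually fixed.
    return False, "TypeError: unsupported operand type(s)"

def repair_loop(source: str, error: str, max_attempts: int = 5) -> str | None:
    """Without the attempt budget and the repetition check, an unresolvable
    (out-of-distribution) bug keeps the agent cycling forever."""
    seen: set[str] = set()
    for _ in range(max_attempts):
        patch = propose_patch(source, error)
        if patch in seen:          # the model has started repeating itself: no progress
            break
        seen.add(patch)
        ok, error = run_tests(source, patch)
        if ok:
            return patch
    return None                    # give up and escalate to a human reviewer

print(repair_loop("def handler(req): ...", "TypeError: unsupported operand type(s)"))
# Prints None: the loop stops only because of the explicit budget,
# not because the model ever found the fix.
```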
This issue is closely related to the problem of inference on under-represented data within the training distribution, known as the Long Tail Problem. When inference is performed on under-represented cases, performance suffers because sparse training data limits the model's ability to interpolate effectively, leading to a higher error rate on edge cases. Attempting to model (through pre-training, fine-tuning, and RLHF) every potential edge case is infeasible, as it would again require modelling infinity. Slightly off topic, but adversarial attacks exploit both out-of-distribution inputs and under-represented observations from the long tail of the training distribution.
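A toy experiment makes the long-tail effect concrete. The sketch below (an illustrative assumption using scikit-learn's synthetic data, not a measurement of any language model) trains a simple classifier on deliberately imbalanced classes of roughly comparable difficulty; on a typical run the rare "tail" class is the one the model gets wrong most often, purely because its training data is sparse.

```python
# Toy illustration of the long-tail problem: rare classes suffer higher error
# rates even when they are not intrinsically harder to learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic data where class 2 is the "long tail" (about 1% of samples).
X, y = make_classification(
    n_samples=20_000, n_classes=3, n_clusters_per_class=1, n_informative=4,
    weights=[0.94, 0.05, 0.01], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

model = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)
per_class_recall = recall_score(y_te, model.predict(X_te), average=None)

for label, recall in enumerate(per_class_recall):
    print(f"class {label}: recall = {recall:.2f}")
# On a typical run the head class is near-perfect while the ~1% tail class
# lags noticeably, mirroring how edge cases trip up code-generation models.
```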
The third issue is Autoregressive Error Propagation. Machine learning involves an optimisation process in which the difference between model predictions and labels is minimised across a dataset. Even at a global minimum, errors persist because of noise within the dataset. Each inference carries a small probability of error, and these errors propagate in autoregressive language models: as the length of the generated output increases, so does the likelihood of producing an incorrect output. In a multi-agent setup, error cascades occur when the output from one agent is passed to another, compounding the error rate further.
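Some back-of-the-envelope arithmetic shows how quickly this compounds. The per-step error rate below is a hypothetical, independently applied figure chosen purely for illustration (real per-token errors are neither known precisely nor independent), but the shape of the decay is the point.

```python
# Back-of-the-envelope compounding for autoregressive error propagation,
# assuming a hypothetical, independent per-step error rate.
def p_fully_correct(per_step_error: float, n_steps: int) -> float:
    """Probability that all n_steps generations are correct: (1 - p) ** n."""
    return (1.0 - per_step_error) ** n_steps

p = 0.001  # illustrative 0.1% chance of a consequential error per generated step
for n in (100, 1_000, 10_000):
    print(f"{n:>6} steps: P(all correct) = {p_fully_correct(p, n):.3f}")
#    100 steps: P(all correct) = 0.905
#   1000 steps: P(all correct) = 0.368
#  10000 steps: P(all correct) = 0.000

# Chaining agents compounds this further: three agents each producing a
# 1,000-step artefact behave roughly like one 3,000-step generation.
print(f"3 chained agents: P(all correct) = {p_fully_correct(p, 3 * 1_000):.3f}")  # ~0.050
```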
The fourth issue is the Symbol Grounding Problem. In language models, tokens (or sequences of tokens) represent the symbols of our languages, but these symbols lack inherent meaning. A symbol serves as input that the mind interprets into a stored abstract representation; that abstract representation does not exist within the symbol itself. These representations are learned through cultural and experiential data (including multi-sensory fusion across our senses) and through collaboration with other agents, none of which can be replicated by models trained solely on natural language data.
Because machine learning models only identify shallow statistical relationships between sequences of tokens, they lack the ability to reason meaningfully: meaning is not encoded within the symbols themselves. Without the capability to learn true abstract representations, they cannot perform the extrapolation, logical deduction, or abductive reasoning required for high-level software engineering; language models only imitate reasoning through fuzzy retrieval from their training distribution.
The final issue, hallucination, is specific to language models and already widely recognised, so I will spend little time on it. Hallucinations, where the model generates incorrect or fabricated information, further increase the error rate, exacerbating the challenges discussed above.
The first four issues have been well documented in the literature for decades, yet many remain unaware of them. I will finish by noting that software engineers' roles will not disappear. Code generation will become ubiquitous, with software engineers reviewing the generated output for errors and alignment with requirements, and designing the architecture of the solution, tasks at which language models perform poorly because they lack the ability to understand abstract concepts.
Until semiconductor devices realise their full potential, human-driven devices will remain better than the device by itself.
Software Lead @ Archireef | Data Solutions | MLOps | Tech Author
Agree with you that, at least for now, a fully autonomous software engineer with AI is impossible unless we achieve a breakthrough beyond the current implementation of artificial neural networks. Great, deep analysis of the reasons behind that as well, but IMO if you included real-world examples of software development problems encountered when using LLMs for code generation, it would be stronger proof of the issues you mentioned. I have been in the software development industry for 10 years and have written advanced software solutions with different technology stacks. To be honest, I haven't seen anything more impressive than what we are witnessing recently in software development using AI co-pilots with LLMs such as ChatGPT, Claude, Llama 3, etc. Despite all the issues you mentioned, this is still a great evolution in the software development domain with AI code generation tools. I would say a good engineer will be genuinely 10x more productive with these tools. I think they will even automate many routine tasks in software development, and we will dispense with some junior roles in the software engineering industry in the near future.
Head of Artificial Intelligence @ AI-XON | Artificial Intelligence Expert | Author
I received feedback suggesting that this article could have been framed more effectively from a software engineering perspective, and I agree. I focused too heavily on Artificial Intelligence and the limitations of Machine Learning and Language Models, which was not an ideal approach. Instead, I should have provided practical examples of multi-agent systems encountering bugs during code generation, issues that cannot be resolved with the available context due to limitations such as the long tail problem, resulting in the agent becoming caught in an infinite loop. I will take this into account and hope to write a better article next time.
Creator of 💎 ContextGem | Founder & CEO Shcherbak AI
Great breakdown.
Driving Collaborative Change @ the University of Capilano
That phenomenal breakdown made my brain bleed. Beautifully articulated. I've experimented enough interactively with LLMs to see the quirks and problems you've mentioned. I have an opposite interest in this study, and that is the frightful consequence of the human not realising, or not being able to spot, these problems in the data being provided back by LLMs. It's clear that one who surrenders to AI should also be consciously mature enough to discern. Anyhow... I just love what you've expressed here.