"Fixing" Gen Code
[Image: Nevelfjell]

In this edition I'd like to look at the issue of working with AI output in the context of software development. As noted in the previous edition, this text is neither AI enhanced nor AI generated. It is handmade, full of good human flaws and errors. This is intentional, not an accident.

Some definitions: vibe coding refers to the use of AI tools for code generation. Those tools are based on LLMs (large language models) and promise rapid creation of wonderful code. Deterministic means that identical input yields identical output. Nondeterministic means that identical input can yield differing output. LLMs are nondeterministic. Traditional software development is 100% deterministic.
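The distinction can be illustrated in a few lines of Python; the random choice here merely stands in for an LLM's token sampling, it is not how any real model works:

```python
import random

def deterministic(x):
    # Same input, same output, every single time.
    return x * x

def nondeterministic(x):
    # Same input, possibly different output: the "model" samples.
    return x * x + random.choice([-1, 0, 1])

assert deterministic(4) == deterministic(4)  # always holds
# nondeterministic(4) may return 15, 16 or 17 on any given call
```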

[Image: Nevelfjell]

The Promise

The latest generation of "code generators" is based on large language models, combined with some creative context management (even using the new MCP technology to attach databases or other data sources) and integrated automated cycles that fix flaws along the way (so-called self-healing).

Look at tools like the Cursor code editor or Claude Code: they let you talk to the engine and have it generate frontend code, backend code, or whatever kind of code you need.

So even if you have no clue about software development, have never produced a single line of code, and have zero talent, you can create an amazing application. At least, that's what companies such as OpenAI, Microsoft, and Anthropic want you to believe. They tell you "30% of all our code is AI generated", or "by 2030, 50% of all code will be AI generated", or similar promises. And people believe it.

And if you try it out, even as a seasoned software developer you'll discover that this new style of software development, called vibe coding, has real advantages. So you get various subscriptions and try it out yourself.

[Image: Nevelfjell]

The Issue

LLM-based software generators do not generate code on the basis of deterministic rules, but in a purely nondeterministic fashion. That means they don't know why the code was generated; it is merely, roughly, "ok".

If you use it for a "hello world" program, it will be perfectly ok. If you use it for a simple mobile app in various software dialects, it is still mostly ok. Interestingly, once you get serious and push it to the edge, the whole promise crashes completely. In production-ready code, various edge cases have to be identified and systematically mitigated. Those live in a grey zone that LLMs completely overlook, simply because they are not part of the most common scenarios.

As a sample use case, I used Cursor and Claude Code to build a series of Python scripts that automatically create non-fiction books of some kind, using the OpenAI API or Claude to perform the actual content generation for the book outline, the scene concepts, and the scene text. The goal: a full random book that reads plausibly, generated in a fully automatic fashion. Something that would take more than a weekend without AI, but according to the promise of Claude Code it should be a piece of cake.
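The pipeline described above can be sketched in a few lines. This is a hedged illustration, not my actual scripts; `generate` is a hypothetical placeholder for a real LLM API call (OpenAI, Claude, ...):

```python
def build_book(topic, generate, n_chapters=3):
    """Sketch of an outline -> scene concept -> scene text pipeline.

    `generate` is a hypothetical stand-in for an LLM API call:
    it takes a prompt string and returns generated text.
    """
    outline = generate(f"Outline a non-fiction book about {topic} "
                       f"with {n_chapters} chapters.")
    chapters = []
    for i in range(1, n_chapters + 1):
        # Two-stage generation: concept first, then the actual text.
        concept = generate(f"Describe the concept of chapter {i} of: {outline}")
        text = generate(f"Write chapter {i} based on this concept: {concept}")
        chapters.append(text)
    return {"outline": outline, "chapters": chapters}
```

Because `generate` is injected as a parameter, the pipeline can be tested with a stub function, without any API calls.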

However, it's not.

Claude Code is not able to anticipate the output flaws of AI generators themselves in order to generate code that can handle that type of output. At least not yet. Therefore, either it has to run those APIs itself, or I have to feed the error output into Claude Code manually. Which I did, for several days.
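The kind of defensive glue code that has to be written by hand (or fed back into the generator error by error) looks roughly like this: validate every LLM response against the structure you expect and retry on failure. A minimal sketch, where `call_llm` is a hypothetical placeholder for the real API call:

```python
import json

def generate_json(call_llm, prompt, required_keys, max_retries=3):
    """Call a nondeterministic LLM and retry until the output parses as JSON
    and contains the required keys. `call_llm` is a hypothetical stand-in
    for a real API call."""
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: exactly the kind of flaw seen in practice
        if all(key in data for key in required_keys):
            return data  # structurally valid response
    raise ValueError(f"no valid response after {max_retries} attempts")
```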

So the underlying issue of all AI/LLM-based code generators is that they work entirely nondeterministically. If you don't investigate the code yourself by hand, you are stuck hoping that the engine can fix its own generated code, because the code base has already grown huge, and since it isn't yours, you feel less motivated to read through it.

[Image: Nevelfjell]

The Fix

I'm not sure I have really good fixes, but here are some thoughts.

One is that we'll have to keep the complexity of the generated code to a minimum. Best is to encapsulate the code into separate modules, each with a ton of test cases that can be run automatically by e.g. Claude Code. Then you can be a bit more confident that a module is actually correct.
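As an example of what such an encapsulated module plus automatically runnable tests could look like (the module and its names are illustrative, not from my code base):

```python
# slugify.py: one small, encapsulated module with a narrow, testable contract
import re

def slugify(title: str) -> str:
    """Turn a chapter title into a filesystem-safe slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# test_slugify.py: tests an agent can run unattended, e.g. via `pytest -q`
def test_basic():
    assert slugify("Hello, World!") == "hello-world"

def test_only_punctuation():
    assert slugify("!!!") == ""
```

The point is not the slug function itself but the shape: a tiny module with a clear contract, and tests a tool can execute in a loop until they pass.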

Another is to stay conservative with what you ask for. Don't ask for super advanced features if you don't need them; they will only bloat the code.

Also, be aware of dead code. The LLM tools produce a ton of code and are extremely eager to add more, so you tend to accumulate dead code over time, because much of it is never used. Keep a full list of features and test cases so you can be sure your code does what it's supposed to do, and clean up the unused code regularly.
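Dedicated tools exist for this (e.g. vulture for Python), but even a crude check helps. A toy sketch using the standard-library `ast` module to flag top-level functions that are never referenced anywhere in a file:

```python
import ast

def find_unreferenced_functions(source: str) -> set:
    """Return names of top-level functions never referenced in `source`.

    A crude dead-code heuristic: it misses dynamic calls (getattr, exports),
    so treat hits as candidates for review, not certainties.
    """
    tree = ast.parse(source)
    defined = {node.name for node in tree.body
               if isinstance(node, ast.FunctionDef)}
    used = {node.id for node in ast.walk(tree)
            if isinstance(node, ast.Name)}
    return defined - used
```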

Make sure your code is actually tested against misuse. As I said, undocumented code might still be in there, possibly with unwanted functionality like a back door, so better get some independent pen testing as well to make sure your code is solid.

If you have awesome hand-written code that you're proud of, don't mix it with gen code. At least then you know what you know.

Business-critical logic you may want to separate out and code by hand. Or, if it is generated, at least debug it by hand until you're sure it does what you think it does.

Also note that if you use LLM models, what they generate is not stable; it changes frequently, sometimes randomly, sometimes with a version change.
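You can reduce (but not eliminate) this drift by pinning a dated model snapshot and fixing the sampling parameters. A sketch of the request parameters, assuming an OpenAI-style chat API; the model name is illustrative:

```python
# Pinning a dated snapshot and fixing sampling parameters narrows the
# nondeterminism window between runs; it does not remove it entirely.
request = {
    "model": "gpt-4o-2024-08-06",  # dated snapshot, not a moving alias like "gpt-4o"
    "temperature": 0,              # near-greedy decoding: less random variation
    "seed": 42,                    # best-effort reproducibility, where supported
    "messages": [
        {"role": "user", "content": "Generate a chapter outline."},
    ],
}
```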

Finally, you'll need to embrace uncertainty and nondeterministic behaviour.

[Image: Nevelfjell]

Closing Thoughts

Today (4 July 2025), I am undecided whether LLM code generation is suitable for production code or not. Too many things beyond the gen magic are still required to fix such issues.

So it's good for rapid prototyping. For everything else, be very, very careful. Watch your step. Mind the gap. And beware of the foolish promises of the vendors. In the end, the code has to work and be error free, whatever that means in reality (since you can't check 100% of everything, you always focus on a specific window of it, unless you're doing pure mathematics).

Another aspect not covered here is how to work in teams with generated code. Maybe something to explore in a future edition.

What do you think? Do you agree? Or are you a non-paid cheerleader of the AI industry already? Am I wrong? Tell me :)

BTW: I really enjoy producing Python code that generates CLI interfaces for a dialogue with the user. It's a joyful alternative for users who don't like configuring an application with scripts, when you don't want to build a super complicated UI. If you're interested in the above code base, let me know.
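For the curious, the pattern is simple. A minimal sketch of such a dialogue-style configuration prompt (names are illustrative); injecting `input_fn` keeps it testable without a terminal:

```python
def ask_config(prompts, input_fn=input):
    """Interactively collect configuration values, falling back to defaults.

    `prompts` maps a config key to a (question, default) pair.
    `input_fn` defaults to the built-in input(); pass a stub for testing.
    """
    config = {}
    for key, (question, default) in prompts.items():
        answer = input_fn(f"{question} [{default}]: ").strip()
        config[key] = answer or default  # empty answer accepts the default
    return config
```

Usage: `ask_config({"title": ("Book title", "Untitled")})` asks one question on the terminal and returns a dict, so the rest of the program never touches raw user input.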

Torger Kjeldstad

VP Technology | IT Director at Altibox & Ice Mobil | Experienced Leader & Entrepreneur | MSc in Tech & Product Development | Passionate About Innovation, Strategy & Execution


Thanks for the insights, Marco! During my vacation, my feed has been flooded with pro–Gen AI content. The challenge is that much of it is AI-generated itself, and often directly or indirectly sponsored by vendors. As humans working with software, we need to be thoughtful and cautious about how we apply AI in production.

kundan verma

Enterprise Architect


Thanks for this thoughtful analysis, Marco. I also liked your suggestions in "The Fix" section. I found another article, from Steve Jones, with a different approach that you could try on your non-fiction book case; let us know your experience with it. The article is https://blog.metamirror.io/claude-code-v-gemini-cli-e144feafbcf2
