Prompting is Not Enough! Software Generation with LLMs: Lessons Learnt
My initial goal was to translate a cherished game book into an engaging web-app text adventure, a task that proved far more intricate than I had imagined. The software is now operational; it functions. It is what it is: some elements are glitchy, but the important thing is that it can be improved. This endeavor taught me a valuable lesson: the allure of instantly generated code often blinds us to the importance of careful planning when wielding large language models (LLMs). It's not just about crafting the perfect prompt; it is very much about meticulously planning the software creation process.
a) Never rely on the LLM to make important architectural decisions. Instead, blueprint and plan how the work should be conducted well before you start implementing.
Diving directly into LLM-based generation without a solid plan is like building a house without blueprints: the result is likely to be unstable and inefficient. Constructing software with LLMs demands a different skill set than traditional coding, requiring careful consideration of datasets, embeddings, and model parameters to achieve consistent, accurate results. The architecture of any LLM-powered application should start from a well-defined problem statement and a deliberate choice of the right LLM.
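One way to force that discipline on yourself is to write the blueprint down as data before the first prompt. A minimal sketch of what that could look like (all names and fields here are hypothetical, not taken from my actual project):

```python
# Hypothetical sketch: pin down the architectural decisions as a
# machine-readable blueprint before writing any prompts.
from dataclasses import dataclass, field

@dataclass
class ProjectBlueprint:
    problem_statement: str          # what exactly are we building?
    target_model: str               # which LLM the plan assumes
    deliverables: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

blueprint = ProjectBlueprint(
    problem_statement="Turn a game book into a web-app text adventure",
    target_model="<your chosen LLM>",
    deliverables=["scene content files", "game engine", "integrity checks"],
    constraints=["every scene must keep its original choices and links"],
)
print(blueprint)
```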
b) Ideally, go so far as to craft models of the elements of your software solution and processes interactively with the LLM(s) before you go ahead. This will not only give you clarity of thought and improve your prompts, but also greatly support the LLM.
Interacting iteratively with the model is a critical practice, and designing and refining prompts is not a task to skip. Craft explicit model representations of the elements of your software solution and its processes: the exercise clarifies your own thinking, improves your prompts, and gives the LLM a stable frame to work within. The full power of an LLM is only unlocked through careful prompt engineering.
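For example, a shared model of a software element can be embedded directly into every prompt. A hypothetical sketch (the scene fields are illustrative, not the ones from my project):

```python
# Hypothetical sketch: a shared description of a "scene" that both the
# prompts and the application code agree on.
SCENE_MODEL = {
    "id": "string, unique scene identifier",
    "text": "string, narrative shown to the player",
    "choices": "list of {label, target_id} objects",
}

def build_prompt(source_passage: str) -> str:
    # Embedding the agreed model in every prompt keeps outputs consistent.
    return (
        "Convert the passage below into a JSON scene object with exactly "
        f"these fields: {SCENE_MODEL}.\n\nPassage:\n{source_passage}"
    )
```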
c) Break the work down into chunks/packages the LLM can actually manage. Even if the token windows are massive, assume the LLM will struggle to maintain relationships across many data points and large volumes of data. In my case, the LLM struggled severely to keep a coherent format for the JSON payload files and to keep track of work items, and it skipped vast amounts of important data objects for no obvious reason.
When dealing with significant volumes of data, break your work into smaller pieces the LLM can process reliably. Token windows may be large, but that does not mean an LLM can maintain coherent data relationships across many points and heavy loads. I have experienced first-hand how models struggle to maintain formats and keep track of even the most important data objects. Effective chunking is absolutely key here: it transforms large texts into manageable units of information.
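A minimal chunking sketch (the batch size and validation rules are assumptions; tune them to your data):

```python
# Minimal sketch: feed the model small batches instead of the whole book,
# and validate each batch's output before moving on.
import json

def chunks(items: list, size: int = 10):
    """Yield fixed-size slices so each request stays well inside
    what the model handles reliably."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def parse_batch(raw_output: str) -> list[dict]:
    """Fail fast on malformed JSON instead of letting a bad batch through."""
    data = json.loads(raw_output)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of scene objects")
    return data
```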
d) It is a good idea to plan for scripts that check the integrity of the output. In my case, a Python script went through all content files to check, and double-check, that all required data points were actually present. You might think that is something the LLM could do itself, and some can, but honestly not reliably!
Do not expect the LLM to reliably self-check its output; plan for scripts that scrutinize the integrity of the generated content. Even if some models can check themselves to an extent, they are, to put it mildly, not reliable on that front. Consider a script that rigorously inspects every content file to verify the presence of all essential data points.
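A much-simplified sketch of such a check (the directory layout and field names are illustrative, not those of my actual project):

```python
# Simplified integrity check: walk all JSON content files and report
# anything missing or malformed.
import json
from pathlib import Path

REQUIRED_FIELDS = {"id", "text", "choices"}  # illustrative field names

def check_content_files(content_dir: str) -> list[str]:
    """Return a list of problems found across all JSON content files."""
    problems = []
    for path in sorted(Path(content_dir).glob("*.json")):
        try:
            scene = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc})")
            continue
        if not isinstance(scene, dict):
            problems.append(f"{path.name}: expected a JSON object")
            continue
        missing = REQUIRED_FIELDS - scene.keys()
        if missing:
            problems.append(f"{path.name}: missing fields {sorted(missing)}")
    return problems

if __name__ == "__main__":
    for problem in check_content_files("content"):
        print(problem)
```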
e) Speaking of LLMs: in my experience, using different LLMs in tandem will get you past issues in the software design that one LLM alone might not get beyond. The reality is this: an LLM will probably get you to 80% of the working code but then struggle badly with the remainder, and this is where other LLMs sense-checking and debugging can do real wonders!
Using multiple LLMs in tandem can greatly enhance the quality of the output. In my experience, one LLM can achieve a good 80% of the project goals, but the final stretch can be incredibly difficult. This is where other models can sense-check, debug, and often elevate the result.
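The orchestration can be as simple as a draft/review loop. A hypothetical sketch, with `draft_model` and `review_model` standing in for whatever client calls you actually use:

```python
# Hypothetical sketch: one model drafts, a second model reviews.
from typing import Callable

def cross_check(task: str,
                draft_model: Callable[[str], str],
                review_model: Callable[[str], str]) -> str:
    """Ask one model for a draft, then ask a second model to vet it."""
    draft = draft_model(task)
    return review_model(
        "Review the following solution for bugs and format errors, "
        "then return a corrected version:\n\n" + draft
    )
```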
f) I used a mix of proprietary and open LLMs, which has proven to work very well.
Mixing proprietary and open-source models is a sound tactic for working around the limitations any single model may have. The world of open-source LLMs is advancing fast, and a growing number of options are now available. These models offer transparency, accessibility, and a way to keep control of one's own data.
Okay, I hope you find this useful. Please share your experiences with LLMs generating code and even complete software solutions! I am curious.
Personally, I think we are only at the very beginning of something, and I do believe we'll see strong results at some point, even though it might take a lot longer than we currently think. (A little like self-driving cars, which should be roaming around everywhere by now but are not, in this far future of 2025 we live in. And yes, that will happen eventually.)
Find more on the use of LLMs at https://platformeconomies.com
BUILDING. DIGITAL. ARCHITECTURE.