Prompting is Not Enough! Software Generation with LLMs: Lessons Learnt
My initial goal was to translate a cherished game book into an engaging web-app text adventure, a task that proved far more intricate than I had imagined. The software is now operational; it functions. It is what it is: some elements are glitchy, but the important thing is that it can be improved. This endeavor taught me a valuable lesson: the allure of instantly generated code often blinds us to the importance of careful planning when wielding large language models (LLMs). It's not just about crafting the perfect prompt; it is very much about meticulously planning the software creation process.
a) Never rely on the LLM to make important architectural decisions. Instead, blueprint and plan how the work should be conducted well before you start implementing.
Diving directly into LLM-based generation without a solid plan is like building a house without blueprints: the result is likely to be unstable and inefficient. Constructing software with LLMs demands a different skill set than traditional coding, requiring careful consideration of datasets, embeddings, and model parameters to achieve consistent, accurate results. The architecture of any LLM-powered application should start from a well-defined problem statement and a deliberate choice of the right LLM.
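One way to force that discipline on yourself is to write the blueprint down as data before the first prompt. A minimal sketch of what that could look like (all names and fields here are hypothetical, not taken from my actual project):

```python
# Hypothetical sketch: pin down the architectural decisions as a
# machine-readable blueprint before writing any prompts.
from dataclasses import dataclass, field

@dataclass
class ProjectBlueprint:
    problem_statement: str          # what exactly are we building?
    target_model: str               # which LLM the plan assumes
    deliverables: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

blueprint = ProjectBlueprint(
    problem_statement="Turn a game book into a web-app text adventure",
    target_model="<your chosen LLM>",
    deliverables=["scene content files", "game engine", "integrity checks"],
    constraints=["every scene must keep its original choices and links"],
)
print(blueprint)
```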
b) Ideally, go so far as to craft models of the elements of your software solution and processes interactively with the LLM(s) before you go ahead. This will not only give you clarity of thought and improve your prompts, but also greatly support the LLM.
Interacting iteratively with the model is a critical practice, and designing and refining prompts is not a task to skip. Craft explicit model representations of the elements of your software solution and its processes: the exercise clarifies your own thinking, improves your prompts, and gives the LLM a stable frame to work within. The full power of an LLM is only unlocked through careful prompt engineering.
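For example, a shared model of a software element can be embedded directly into every prompt. A hypothetical sketch (the scene fields are illustrative, not the ones from my project):

```python
# Hypothetical sketch: a shared description of a "scene" that both the
# prompts and the application code agree on.
SCENE_MODEL = {
    "id": "string, unique scene identifier",
    "text": "string, narrative shown to the player",
    "choices": "list of {label, target_id} objects",
}

def build_prompt(source_passage: str) -> str:
    # Embedding the agreed model in every prompt keeps outputs consistent.
    return (
        "Convert the passage below into a JSON scene object with exactly "
        f"these fields: {SCENE_MODEL}.\n\nPassage:\n{source_passage}"
    )
```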
c) Break the work down into chunks/packages the LLM can actually manage. Even if the token windows are massive, assume the LLM will struggle to maintain relationships across many data points and large volumes of data. In my case, the LLM struggled severely to keep a coherent format for the JSON payload files and to keep track of work items, and it skipped vast amounts of important data objects for no obvious reason.
When dealing with significant volumes of data, break your work into smaller pieces the LLM can process reliably. Token windows may be large, but that does not mean an LLM can maintain coherent data relationships across many points and heavy loads. I have experienced first-hand how models struggle to maintain formats and keep track of even the most important data objects. Effective chunking is absolutely key here: it transforms large texts into manageable units of information.
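A minimal chunking sketch (the batch size and validation rules are assumptions; tune them to your data):

```python
# Minimal sketch: feed the model small batches instead of the whole book,
# and validate each batch's output before moving on.
import json

def chunks(items: list, size: int = 10):
    """Yield fixed-size slices so each request stays well inside
    what the model handles reliably."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def parse_batch(raw_output: str) -> list[dict]:
    """Fail fast on malformed JSON instead of letting a bad batch through."""
    data = json.loads(raw_output)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of scene objects")
    return data
```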
d) It is a good idea to plan for scripts that check the integrity of the output. In my case, a Python script went through all content files to check, and double-check, that all required data points were actually present. You might think that is something the LLM could do itself, and some can, but honestly not reliably!
Do not expect the LLM to reliably self-check its output; plan for scripts that scrutinize the integrity of the generated content. Even if some models can check themselves to an extent, they are, to put it mildly, not reliable on that front. Consider a script that rigorously inspects every content file to verify the presence of all essential data points.
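A much-simplified sketch of such a check (the directory layout and field names are illustrative, not those of my actual project):

```python
# Simplified integrity check: walk all JSON content files and report
# anything missing or malformed.
import json
from pathlib import Path

REQUIRED_FIELDS = {"id", "text", "choices"}  # illustrative field names

def check_content_files(content_dir: str) -> list[str]:
    """Return a list of problems found across all JSON content files."""
    problems = []
    for path in sorted(Path(content_dir).glob("*.json")):
        try:
            scene = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc})")
            continue
        if not isinstance(scene, dict):
            problems.append(f"{path.name}: expected a JSON object")
            continue
        missing = REQUIRED_FIELDS - scene.keys()
        if missing:
            problems.append(f"{path.name}: missing fields {sorted(missing)}")
    return problems

if __name__ == "__main__":
    for problem in check_content_files("content"):
        print(problem)
```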
e) Speaking of LLMs: in my experience, using different LLMs in tandem will get you past issues in the software design that one LLM alone might not get beyond. The reality is this: an LLM will probably get you to 80% of the working code but then struggle badly with the remainder, and this is where other LLMs sense-checking and debugging can do real wonders!
Using multiple LLMs in tandem can greatly enhance the quality of the output. In my experience, one LLM can achieve a good 80% of the project goals, but the final stretch can be incredibly difficult. This is where other models can sense-check, debug, and often elevate the result.
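The orchestration can be as simple as a draft/review loop. A hypothetical sketch, with `draft_model` and `review_model` standing in for whatever client calls you actually use:

```python
# Hypothetical sketch: one model drafts, a second model reviews.
from typing import Callable

def cross_check(task: str,
                draft_model: Callable[[str], str],
                review_model: Callable[[str], str]) -> str:
    """Ask one model for a draft, then ask a second model to vet it."""
    draft = draft_model(task)
    return review_model(
        "Review the following solution for bugs and format errors, "
        "then return a corrected version:\n\n" + draft
    )
```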
f) I used a mix of proprietary and open LLMs, which has proven to work very well.
Mixing proprietary and open-source models is a sound tactic for working around the limitations any single model may have. The world of open-source LLMs is advancing fast, and a growing number of options are now available. These models offer transparency, accessibility, and a way to keep control of one's own data.
Okay, I hope you find this useful. Please share your experiences with LLMs generating code and even complete software solutions! I am curious.
Personally, I think we are only at the very beginning of something, and I do believe we'll see strong results at some point, even though it might take a lot longer than we currently think. (A little like self-driving cars, which should be roaming around everywhere by now but are not, in this far future of 2025 we live in. And yes, that will happen eventually.)
Find more on the use of LLMs at https://platformeconomies.com
BUILDING. DIGITAL. ARCHITECTURE.