Hidden Technical Debt in AI

That little black box in the middle is machine learning code.

I remember reading Google’s 2015 paper Hidden Technical Debt in Machine Learning Systems & thinking how little of a machine learning application was actual machine learning.

The vast majority was infrastructure, data management, & operational complexity.

With the dawn of AI, it seemed large language models would subsume these boxes. The promise was simplicity: drop in an LLM & watch it handle everything from customer service to code generation. No more complex pipelines or brittle integrations.

But in building internal applications, we’ve observed a similar dynamic with AI.

Agents need lots of context, just like a human: how is the CRM structured, what do we enter into each field? But all that input is expensive to feed the hungry, hungry AI model.

Reducing cost means writing deterministic software to replace the reasoning of AI.

For example, automating email management means writing tools to create Asana tasks & update the CRM.

As the number of tools increases beyond ten or fifteen, tool calling no longer works reliably. Time to spin up a classical machine learning model to select tools.
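Here’s a minimal sketch of that selection step. The tool names & descriptions are hypothetical, & the bag-of-words cosine similarity is a stand-in for whatever classifier or embedding model you’d actually train:

```python
from collections import Counter
import math

# Hypothetical tool registry: names & short descriptions (illustrative only).
TOOLS = {
    "create_asana_task": "create a new task in Asana with title and assignee",
    "update_crm_contact": "update a contact record field in the CRM",
    "send_email_reply": "draft and send a reply to an email thread",
    "search_calendar": "find free slots on the calendar for a meeting",
}

def _bow(text):
    """Bag-of-words vector: a Counter of lowercase tokens."""
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_tools(request, k=2):
    """Return the k tool names whose descriptions best match the request."""
    q = _bow(request)
    ranked = sorted(TOOLS, key=lambda n: _cosine(q, _bow(TOOLS[n])), reverse=True)
    return ranked[:k]
```

In practice, only the selected tools’ schemas go into the prompt, shrinking fifteen tool definitions down to two or three.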

Then there’s observability to watch the system, evaluation to check whether it’s performing, & routing to the right model. In addition, there’s a whole category of software around making sure the AI does what it’s supposed to.

Guardrails prevent inappropriate responses. Rate limiting stops costs from spiraling out of control when a system goes haywire.
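The rate-limiting piece can be as small as a token bucket sitting in front of the model client. A sketch, with made-up numbers:

```python
import time

class TokenBucket:
    """Token-bucket limiter: refuse calls once the budget is spent.

    `capacity` is the burst size; `refill_rate` is tokens added per
    second. The defaults are illustrative, not a recommendation.
    """
    def __init__(self, capacity=10, refill_rate=1.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1.0):
        # Refill based on elapsed time, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Call `allow()` before each model request & fail fast (or queue) when it returns False, so a runaway loop burns a bounded budget instead of an unbounded bill.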

Information retrieval (RAG, retrieval-augmented generation) is essential for any production system. In my email app, I use a LanceDB vector database to find all emails from a particular sender & match their tone.
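LanceDB’s actual API works differently (you connect to a database & call `search` on a table), so here is a dependency-free sketch of the underlying idea: filter by sender metadata, then rank by embedding similarity. The emails & tiny 3-dimensional vectors are invented for illustration; real embeddings come from an embedding model.

```python
import math

# Toy corpus: (sender, text, embedding). In production the embeddings
# would come from an embedding model & live in a vector database.
EMAILS = [
    ("ana@example.com", "Thanks so much! Talk soon.",       [0.9, 0.1, 0.0]),
    ("ana@example.com", "Appreciate the quick turnaround.", [0.8, 0.2, 0.1]),
    ("bob@example.com", "Per my last email, see attached.", [0.1, 0.9, 0.2]),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def similar_from_sender(sender, query_vec, k=2):
    """Metadata filter first (sender), then rank by vector similarity."""
    hits = [(text, cosine(vec, query_vec))
            for s, text, vec in EMAILS if s == sender]
    hits.sort(key=lambda h: h[1], reverse=True)
    return [text for text, _ in hits[:k]]
```

The retrieved emails then go into the prompt as style examples, so the draft reply matches the sender’s tone.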

There are other techniques for knowledge management around graph RAG & specialized vector databases.

More recently, memory has become much more important. The command line interfaces for AI tools save conversation history as markdown files.

When I publish charts, I want the Theory Ventures caption at the bottom right, a particular font, colors, & styles. Those are now all saved within .gemini or .claude files in a series of cascading directories.

The original simplicity of large language models has been subsumed by enterprise-grade production complexity.

This isn’t identical to the previous generation of machine learning systems, but it follows a clear parallel. What appeared to be a simple “AI magic box” turns out to be an iceberg, with most of the engineering work hidden beneath the surface.

Rhett Sampson

Founder and CTO at GT Systems

3w

100% Tomasz Tunguz we collapse this debt into #SPAN_AI the #semanticfabric for the #agenteconomy. See link below. Love to have a chat. https://www.linkedin.com/feed/update/urn:li:activity:7350522766414012416/


Tomasz, deeply agree. Your post perfectly illustrates why we've built a developer-focused platform-as-a-service for AI agents. My bet is: the surface of white boxes in your picture will continue to grow and multiply in complexity. We provide that surface as opinionated managed services - with a dead simple DX. To add some spice to the mix, we've also made it trivial to create embarrassingly parallel fleets of collaborating agents - so they can achieve goals orders of magnitude faster.

Paresh Yadav

AI/Agentic AI/AIOPs/MLOPs/AI Agents/GCP- Architect/Engineer

3w

Oh, and we haven't shown the backstage processes/work needed, like version control, CI/CD, code promotion from Dev to QA to Prod, maintaining multiple versions of the code base for different clients (if applicable), maintaining shared code dependencies if any between this product and other products, etc.

Mary Mendoza

Salesforce Certificated Administrator & Platform App Builder Certified | 4x Trailhead Ranger | 3 Trailhead Super Badges | Boston #SalesforceSaturday Co-Lead

3w

And this doesn't even begin to address how clean or not clean the data is that the LLM or AI is using. Throw in a little bias, and you have a really spicy mix. Now, while that may be good for food, any chef can tell you that not everyone appreciates spicy food, and my instincts are that even fewer will appreciate the spicy results of data that isn't clean and is full of bias.

Chris Parsons

Levelling up tech teams with AI that works | CTO | Agent builder | Cherrypick co-founder

3w

Definitely seeing the same thing in my agent builds, and in my training cohorts. People are figuring out that determinism is actually really valuable to structure the system and LLMs add the magic at set points. Still a lot of work to be done to figure out the best interfaces.
