Integration Reveals All: How Building File Analysis Exposed Hidden Architecture
June 27
When we started PM-011 — adding file upload to our AI PM assistant — it seemed straightforward (and by “we” I mean me, Claude Opus, Claude Sonnet, and Cursor Agent). Users should be able to upload a CSV or PDF and ask questions about it.
“Just add a file picker and wire it to the LLM,” I thought.
One week later, we’d built a file analysis system with type detection, domain-specific analyzers, and comprehensive error handling. More importantly, we’d discovered something experienced programmers and technical architects no doubt already knew: architectural inconsistencies love to hide between systems, and integration is where they throw their surprise party.
Domain models are sacred (no, really!)
The first critical lesson came when we were implementing file analyzers. We had a domain model for AnalysisResult that looked like this:
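(Simplified here; the key_findings and metadata fields are the ones that matter for this story, and the other field names are illustrative stand-ins.)

```python
# The AnalysisResult contract, simplified; fields beyond key_findings and
# metadata are illustrative stand-ins.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AnalysisResult:
    """The contract every analyzer must honor, regardless of file type."""
    file_id: str                 # which uploaded file was analyzed
    analysis_type: str           # e.g. "document", "spreadsheet"
    summary: str                 # human-readable overview
    key_findings: list[str] = field(default_factory=list)   # structured insights
    confidence: float = 0.0      # the analyzer's self-reported confidence
    metadata: dict[str, Any] = field(default_factory=dict)  # everything else
```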
When we started testing the DocumentAnalyzer, we discovered it was storing extracted information in metadata['key_points'] rather than the key_findings field. The temptation was obvious: modify the domain model to have a key_points field to make the test pass.
This would have been architectural poison.
Instead, we established a non-negotiable principle handed down from our chief architect when I asked how to build Piper Morgan the right way: Domain models are the contract. Implementation serves the domain, not vice versa. If a test expects a different structure than the domain model provides, fix the test or the implementation — never the domain model.
This isn’t revolutionary thinking — it’s just applying Eric Evans’ DDD (domain-driven design) principles to AI systems. But here’s what’s interesting: with AI outputs being inherently unpredictable, domain consistency becomes even more critical. You’re essentially building a normalizing layer between chaos and your application.
This discipline paid dividends throughout the project. When we integrated multiple analyzers, they all conformed to the same contract. When we added error handling, we used the metadata field consistently. When we built the orchestration layer, it could handle all analysis types uniformly.
Key Insight: In AI systems, domain consistency is even more critical because you’re dealing with unpredictable outputs that need to be normalized into reliable structures.
Error handling as a design principle (not an afterthought)
Traditional software fails predictably — null references, type mismatches, network timeouts. AI systems fail creatively. Files might be corrupted in novel ways, LLMs might return unexpected formats, or analysis might succeed partially.
When working on a natural language chatbot at a mental health startup, we had to worry even more about how error conditions might lead to irregular and distressing communications!
What I learned building Piper Morgan is to design error handling as a first-class architectural concern:
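In practice, every analyzer call goes through a wrapper that always returns a domain object instead of raising to the caller. Here’s a minimal sketch, reusing the AnalysisResult shape above (the specific metadata keys are illustrative):

```python
# Failures still honor the domain contract: no exceptions escape, errors ride
# along in metadata. The specific keys here are illustrative.
async def safe_analyze(file_id: str, analyzer) -> AnalysisResult:
    try:
        return await analyzer.analyze(file_id)  # happy path
    except Exception as exc:
        return AnalysisResult(
            file_id=file_id,
            analysis_type="unknown",
            summary="Analysis failed",
            confidence=0.0,
            metadata={
                "error": type(exc).__name__,
                "error_detail": str(exc),
                "suggested_next_step": "re-upload the file or try another format",
            },
        )
```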
This pattern — always return a result object with error information in metadata — enabled sophisticated error recovery. The orchestration layer could provide partial results, explain what went wrong, and suggest next steps.
Test-driven development with AI assistance
Building with AI assistance, we discovered that TDD requires extra discipline. The AI can generate impressive code that passes tests, but it can also generate tests that accommodate broken implementations.
I’ve been getting implementation guidance from Claude and then using Cursor Agent to edit files and run tests, sparing me the typos and indentation slips that usually add hours to my debugging. Unfortunately, CA (as I’ve come to think of it) is always a bit too eager, and the moment a test fails, it offers to edit the test so it will pass.
This is anathema, and it even claims to be aware that we’re doing TDD! I’ve learned to ask Claude to write prompts for CA that box it in: report back, don’t start hacking away at bugs, and don’t edit tests willy-nilly.
We established strict TDD discipline:
Write tests that reflect the domain contract first
Verify tests fail for the right reasons
Implement minimal code to pass tests
Refactor while maintaining test integrity
The critical insight: When tests fail, don’t change the test unless the domain contract is wrong.
During DocumentAnalyzer implementation, we had two failing tests that expected the analyzer to throw exceptions. But our established pattern was to return results with error metadata. The failing tests weren’t indicating missing implementation — they were documenting an obsolete contract that needed updating.
AI assistance makes it easy to “fix” tests by changing expectations rather than implementation. Resist this temptation.
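For reference, here’s roughly what a contract-first test looks like: it pins the domain contract rather than the analyzer’s internals. (A sketch; the fixtures are illustrative, and it assumes pytest-asyncio plus the AnalysisResult sketch above.)

```python
# A contract-first test: it asserts the shape of the result, not the analyzer's
# internals. Fixture names (document_analyzer, corrupt_pdf_id) are illustrative.
import pytest


@pytest.mark.asyncio
async def test_corrupt_pdf_returns_error_metadata(document_analyzer, corrupt_pdf_id):
    result = await document_analyzer.analyze(corrupt_pdf_id)

    # The contract: analysis always returns a result object...
    assert isinstance(result, AnalysisResult)
    # ...with findings in key_findings, never a side channel like metadata["key_points"]...
    assert result.key_findings == []
    # ...and failure details in metadata, so the orchestrator can explain and recover.
    assert "error" in result.metadata
```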
Architecture emerges from integration (surprise!)
OK, this is where it gets interesting. The most revealing architectural discoveries came not from planning but from trying to make components work together.
When implementing file resolution (matching “analyze the uploaded spreadsheet” to specific files), we discovered we had two separate orchestration systems:
WorkflowExecutor: Legacy prototype code from initial GitHub integration
OrchestrationEngine: Canonical task-based architecture per our design docs
This duplication was invisible until integration forced us to choose. Classic. Not unlike the way we frequently stumble on parallel government systems doing almost the exact same thing.
But here’s the twist: What looked like technical debt was actually intentional architectural separation:
AsyncPG database: For operational entities (files, workflows) requiring performance
SQLAlchemy ORM: For domain entities (projects, features) needing rich relationships
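Roughly how that split looks in code (entity and table names here are illustrative):

```python
# A sketch of the two persistence paths; entity and table names are illustrative.
import asyncpg
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


# Domain entities: rich relationships, so the ORM earns its keep.
class Project(Base):
    __tablename__ = "projects"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    features = relationship("Feature", back_populates="project")


class Feature(Base):
    __tablename__ = "features"
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    project_id = Column(Integer, ForeignKey("projects.id"))
    project = relationship("Project", back_populates="features")


# Operational entities: hot-path writes go straight through asyncpg, no ORM overhead.
async def record_upload(dsn: str, file_id: str, filename: str) -> None:
    conn = await asyncpg.connect(dsn)
    try:
        await conn.execute(
            "INSERT INTO uploaded_files (id, filename) VALUES ($1, $2)",
            file_id,
            filename,
        )
    finally:
        await conn.close()
```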
Sometimes “technical debt” is actually undocumented architectural decisions. Who knew? (Everyone who’s maintained legacy systems, that’s who.)
Building feedback loops from day one
The most important architectural decision wasn’t about code — it was about learning. Every analysis result includes metadata that enables improvement:
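(The keys and values below are illustrative rather than an exact schema.)

```python
# Illustrative metadata attached to one analysis result; keys and values are
# representative, not the exact production schema.
analysis_metadata = {
    "analyzer": "DocumentAnalyzer",
    "analyzer_version": "0.3.1",          # which code produced this analysis
    "model": "claude-sonnet",             # which LLM was behind it
    "prompt_template": "doc_summary_v2",  # which prompt variant was used
    "processing_time_ms": 1840,
    "confidence": 0.82,
    "truncated_input": False,             # did we have to cut the file down to fit?
    "user_correction": None,              # filled in later if the user edits the result
}
```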
This metadata serves multiple purposes:
Quality Assessment: Track which analyses are most/least reliable
Performance Monitoring: Identify bottlenecks and optimize accordingly
Learning Data: Capture user corrections for future model improvements
Debugging Context: Understand failures with complete context
Critical Insight: AI systems aren’t just software that works — they’re software that learns. Building learning mechanisms after the fact is much harder than designing them in from the beginning.
Really, this is just good instrumentation. The difference is that with AI systems, the instrumentation becomes training data.
Vertical slices reveal truth
Our biggest breakthrough came from testing complete user journeys rather than individual components. We called this “vertical slice development” — implementing the thinnest possible end-to-end feature and then expanding. (The vertical slice concept seems to have arisen in the early agile/XP days, but I still haven’t tracked down where the metaphor of falling through a series of holes in swiss cheese came from, though “swiss cheeseholes” quickly became a shorthand in our chats.)
The vertical slice for file analysis was: upload CSV → detect type → analyze → return results. This simple journey revealed:
File resolution needed confidence scoring for ambiguity (“analyze the spreadsheet” when multiple CSVs exist; sketched below)
Type detection couldn’t just check file extensions
We had duplicate orchestration systems (oops)
Some “technical debt” was intentional separation (double oops)
None of these requirements were visible when building components in isolation.
Building horizontally (all file types, then all analyzers, then all integrations) would have missed these insights until much later in development.
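To make that first point concrete, here’s the shape of confidence-scored file resolution: score every uploaded file against the user’s phrase, act only when one candidate clearly wins, and ask for clarification otherwise. (A sketch; the heuristics, weights, and tie-breaking rule are illustrative.)

```python
# A sketch of confidence-scored file resolution; the heuristics, weights,
# and tie-breaking rule are illustrative, not the production logic.
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class UploadedFile:
    file_id: str
    filename: str
    file_type: str  # "csv", "pdf", ...


# Words in a user's request that hint at a file type (illustrative).
TYPE_HINTS = {"spreadsheet": "csv", "csv": "csv", "pdf": "pdf", "document": "pdf"}


def resolve_file(phrase: str, files: list[UploadedFile]) -> Tuple[Optional[UploadedFile], float]:
    """Return (best_match, confidence); a low or tied score means 'ask the user'."""
    if not files:
        return None, 0.0
    words = phrase.lower().split()
    scored = []
    for f in files:
        score = 0.0
        if any(TYPE_HINTS.get(w) == f.file_type for w in words):
            score += 0.5  # the request names the file type ("spreadsheet" -> csv)
        if any(w in f.filename.lower() for w in words):
            score += 0.5  # the request mentions part of the filename
        scored.append((score, f))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    best_score, best_file = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if best_score == 0.0 or best_score == runner_up:
        return None, best_score  # ambiguous: e.g., two CSVs that both look plausible
    return best_file, best_score
```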
Domain-first AI architecture (AI is “just another service”)
The overarching lesson: successful AI systems are domain-driven systems that happen to use AI, not AI systems that happen to solve domain problems (just applying what we learned from the microservices hype cycle to the AI hype cycle).
Our file analysis system works because:
Domain models define clear contracts that AI outputs must conform to
Business logic lives in services, not in prompt engineering
Error handling follows established patterns, making the system predictable
Integration points are explicit, making the system composable
The AI provides intelligence within structure
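Concretely, “intelligence within structure” means the model’s raw output gets normalized into the domain contract before anything downstream sees it. A sketch, reusing the AnalysisResult shape from earlier (the expected JSON keys and the fallback behavior are illustrative):

```python
# Normalize raw LLM output into the domain contract; malformed output degrades
# to an error result instead of crashing the pipeline. Keys are illustrative.
import json


def normalize_llm_output(raw: str, file_id: str) -> AnalysisResult:
    try:
        payload = json.loads(raw)
        return AnalysisResult(
            file_id=file_id,
            analysis_type=str(payload.get("analysis_type", "document")),
            summary=str(payload["summary"]),
            key_findings=[str(item) for item in payload.get("key_findings", [])],
            confidence=float(payload.get("confidence", 0.5)),
            metadata={"raw_model_output": raw[:2000]},  # trimmed copy for debugging
        )
    except (KeyError, TypeError, ValueError) as exc:  # JSONDecodeError is a ValueError
        return AnalysisResult(
            file_id=file_id,
            analysis_type="unknown",
            summary="Model returned output we couldn't normalize",
            confidence=0.0,
            metadata={"error": type(exc).__name__, "raw_model_output": raw[:2000]},
        )
```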
The AI is basically a really clever but somewhat chaotic service in our architecture. No more magical than a database or message queue — just different failure modes.
What this means for your AI projects
Design domain models first — The AI will thank you later (well, not literally)
Treat AI like any external service — Validate outputs, handle errors, monitor performance
Build feedback loops early — Not because it’s “AI best practice” but because it’s good engineering
Use vertical slices — Integration reveals truth faster than isolated components
Maintain architectural discipline even when AI assistance makes shortcuts tempting: resist.
The path forward
A week ago, “analyze this file” was a nice-to-have feature. Today, it’s a sophisticated system that can intelligently process CSV data, extract insights from PDFs, understand document structure, handle errors gracefully, and learn from user feedback.
More importantly, we’ve established patterns for building AI capabilities that integrate cleanly with traditional software architecture. The next features — document ingestion, GitHub analysis, workflow automation — can build on this foundation rather than starting from scratch.
The ultimate insight: Building AI products isn’t about having the smartest models. It’s about creating systems that make AI capabilities reliable, predictable, and composable. That requires the same architectural discipline we’ve always needed — just with weirder bugs.
It remains to be seen how some of these trusted patterns will hold up as the models continue to evolve, but I find it striking that, right now, the practices most needed to build reliable LLM-powered systems are more or less today’s best practices for robust, well-architected software, occasionally stretched to accommodate this strange new world we’re in.
Next week on Building Piper Morgan: “Battle-Testing GitHub Integration: When Recovery Becomes Learning,” because every successful day of development must apparently be followed by one of discovering and cleaning up more messes. This weekend we’ll share two flashback blog entries from times I tried to demo Piper while building it: “Always Keep Something Showable” and “The Demo That Broke.”
This journey from simple file upload to intelligent analysis system demonstrates how domain-driven design principles enable sophisticated AI capabilities while maintaining architectural integrity. The key is treating AI as a powerful capability within a well-designed system, not as a replacement for good system design. Are you finding ways to use AI as a modular function in an otherwise well-behaved system? Let me know.