When Your Tests Tell You What Your Code Should Do
June 27
We were down to just two failing tests out of 64, and my first instinct was what it always is when tests fail after escaping a crisis: these tests must be wrong.
I mean, it made sense. We’d just come through the 48-hour rollercoaster — that epic journey to complete disaster and back, capped by a “LIFE SAVER !!!” redemption. The system was working. File uploads were processing. Analysis was running end-to-end. Surely a couple of stubborn test failures were just… artifacts from the chaos.
Turns out, the tests weren’t wrong. They were trying to teach me about my own architecture.
Coming back from chaos
After the 48-hour rollercoaster, we had made this beautiful recovery: 64 tests, clean architecture, everything working. But when I ran the full suite, two tests were still failing in the DocumentAnalyzer.
Classic developer reaction: “Well, those tests are probably just outdated expectations from before we fixed everything.”
The failing tests were complaining that DocumentAnalyzer was throwing FileAnalysisError exceptions instead of returning AnalysisResult objects with error metadata. My first thought? “These tests just haven’t been updated to match our new exception-handling approach.”
So I started to “fix” the tests. (Always a bad instinct.)
The moment everything clicked
But then I did something I don’t always remember to do: I looked at the other analyzers to see what pattern they were actually following (OK, I asked Cursor to look and tell me).
CSVAnalyzer: Returns AnalysisResult with error info in metadata.
TextAnalyzer: Returns AnalysisResult with error info in metadata.
DocumentAnalyzer: Throws FileAnalysisError exceptions.
Hmm.
One of these things was not like the others.
That’s when it hit me: the tests weren’t wrong about what they expected. DocumentAnalyzer was doing it wrong.
Tests as architectural documentation
Here’s what I learned from those two “failing” tests: they weren’t testing what the code currently did. They were documenting what the code should do according to our domain model.
The AnalysisResult domain object was designed with a clear contract: analyzers always return results, with errors going into the metadata field. Never throw exceptions. Always return something useful.
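For concreteness, here’s a minimal sketch of what that contract might look like in Python. The field and property names here are my shorthand for illustration, not the actual Piper Morgan classes:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AnalysisResult:
    """Domain object every analyzer returns, on success or failure."""

    content: dict[str, Any] = field(default_factory=dict)   # whatever analysis completed
    metadata: dict[str, Any] = field(default_factory=dict)  # errors, warnings, etc. live here

    @property
    def has_errors(self) -> bool:
        # The caller checks metadata instead of catching exceptions.
        return "error" in self.metadata
```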
CSVAnalyzer and TextAnalyzer were following this contract perfectly. But somewhere along the way — probably during one of those frantic “let’s just get this working” moments — someone (okay, it was probably me) had allowed DocumentAnalyzer to throw exceptions instead.
The tests were trying to tell me: “Hey, you have a domain contract here. DocumentAnalyzer is violating it.”
When code drifts from intention
This is one of those subtle ways systems drift from their original intentions. It’s not dramatic architectural decay — it’s just inconsistency. One component doing things slightly differently from its siblings.
But that inconsistency compounds. Today it’s “DocumentAnalyzer throws exceptions while everyone else returns error metadata.” Tomorrow it’s “well, some analyzers throw exceptions, so I guess PDFAnalyzer can too.” Before you know it, your error handling is a mess and nobody remembers what the original pattern was supposed to be.
Mind you, this kind of drift happens to everyone. Especially when you’re moving fast, fixing urgent issues, or trying to get something working after a crisis. The question is whether you catch it before it spreads.
The beauty of domain contracts
What struck me about this whole episode was how clear the domain contract actually was, once I paid attention to it.
AnalysisResult was designed to always be returned. Success or failure, you get an AnalysisResult object. If something went wrong, the error information goes in the metadata field, but you still get a proper result object with whatever analysis could be completed.
This isn’t just a nice-to-have pattern. It’s what allows the rest of the system to handle analysis results consistently, regardless of which analyzer produced them or whether everything went perfectly.
When DocumentAnalyzer started throwing exceptions instead, it broke that contract. The calling code had to start handling two different error patterns: sometimes you get an AnalysisResult with error metadata, sometimes you get an exception. That’s cognitive overhead for every developer who touches the code.
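Here’s roughly what that overhead looks like at a call site, building on the AnalysisResult sketch above and assuming FileAnalysisError is an ordinary Exception subclass:

```python
class FileAnalysisError(Exception):
    """Stand-in for the real exception type in the codebase."""


def run_analysis(analyzer, file_path: str) -> AnalysisResult:
    # With a consistent contract, this would be the whole function:
    #     return analyzer.analyze(file_path)
    #
    # Because one analyzer raises instead of returning, every call site
    # grows a second error path that exists only for that outlier.
    try:
        return analyzer.analyze(file_path)
    except FileAnalysisError as exc:
        return AnalysisResult(metadata={"error": str(exc)})
```

Every caller ends up carrying a branch that only exists because one analyzer broke the pattern.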
Following the test guidance
So instead of changing the tests to match the exception-throwing behavior, I changed DocumentAnalyzer to honor the domain contract.
The fix was straightforward: wrap the analysis logic in try/except and, when things go wrong, return an AnalysisResult with the error information in metadata instead of raising the exception.
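As a sketch (again using the AnalysisResult stand-in from above, with a hypothetical _extract_text helper in place of the real parsing logic), the change looks something like this:

```python
class DocumentAnalyzer:
    def analyze(self, file_path: str) -> AnalysisResult:
        try:
            text = self._extract_text(file_path)  # hypothetical internal helper
            return AnalysisResult(content={"text": text})
        except Exception as exc:
            # Honor the domain contract: never raise, always return a result,
            # with the failure recorded in metadata.
            return AnalysisResult(
                metadata={"error": str(exc), "analyzer": "DocumentAnalyzer"}
            )

    def _extract_text(self, file_path: str) -> str:
        # Stand-in for the real document parsing logic.
        with open(file_path, encoding="utf-8") as fh:
            return fh.read()
```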
The moment I made that change: 64/64 tests passing.
Tests as conversation partners
What this experience taught me is that tests can be conversation partners in architectural decisions, not just validators of current behavior.
When tests fail, especially after a period of rapid change, the question isn’t just “what do I need to change to make this pass?” It’s “what is this test trying to tell me about my system’s intentions?”
Those two “failing” tests weren’t obstacles to overcome. They were documentation of a better way to structure error handling across the analysis layer. They were architectural guidance disguised as test failures.
The larger pattern
This connects to something larger about how tests function in a mature codebase. Good tests don’t just verify that code works — they document what “working” means according to your system’s design principles.
When you write tests, you’re not just checking current behavior. You’re encoding architectural intentions. You’re creating a conversation between your current understanding and your future self’s implementation decisions.
The pattern recognition — seeing that DocumentAnalyzer was the outlier — came from slowing down enough to actually look at what the tests were expecting versus what the code was doing.
When to trust your tests
So when should you trust your tests over your code? Here are some signals I’ve learned to watch for:
Trust tests when: They’re checking domain contracts, not just implementation details. When multiple similar components follow one pattern and one outlier follows another. When tests are failing after rapid changes or crisis periods.
Question tests when: They’re testing implementation details that could reasonably change. When they’re checking specific error messages instead of error handling patterns. When they were written for a different phase of the project’s evolution.
The key is learning to distinguish between tests that encode architectural wisdom and tests that encode temporary implementation choices.
The 64/64 moment
There’s something satisfying about seeing 64/64 tests pass after making an architectural decision based on test guidance. It’s not just “yay, green checkmarks.” It’s validation that your system has a coherent design philosophy, and that your tests are documenting it accurately.
That moment when DocumentAnalyzer fell into line with the established pattern, and suddenly all the error handling across the analysis layer was consistent — that’s what good architecture feels like. Not perfect code, but coherent code.
Listening to your codebase
The broader lesson here is about listening to your codebase. Tests are one way it talks to you. Patterns across similar components are another. Inconsistencies that make you pause and think “wait, why does this one work differently?” — those are conversations waiting to happen.
Your codebase is constantly trying to tell you about its own design principles. Sometimes through test failures. Sometimes through code that feels awkward to write. Sometimes through inconsistencies that make onboarding new team members harder than it should be.
The trick is slowing down enough to listen.
Next on Building Piper Morgan: “Following Your Own Patterns,” on that magical state where your architecture makes new features feel inevitable rather than difficult.
What’s your experience with tests as architectural guidance? Have you had moments when “failing” tests led you to better design decisions? I’d love to hear about times when your codebase taught you something you didn’t expect to learn.