The Day We Stopped Fighting the System

July 3

The day started with what seemed like a simple mission: complete UI testing for Piper Morgan. You know how these things go by now, I’m sure? Six hours later, I’m having philosophical conversations with my Lead Developer about architectural integrity and the nature of document summarization.

Let me walk you through this particular journey into the depths of technical debt, POC contamination, and the humbling realization that sometimes your AI knows better than you do.

The ghost of POC past

We kicked off with what looked like a straightforward import error:

ImportError: cannot import name 'WorkflowDefinition' from 'services.domain.models'

“This should be quick,” I thought. Famous last words, right?

What we uncovered was a classic case of POC haunting: code from a month-old proof of concept had survived to contaminate the MVP implementation. The POC used input_data and output_data fields, while the production code had moved to context and result. Tests were passing because they were mocked to death, but the actual persistence layer was failing silently.

My gut check to the Lead Developer: “Are we still properly following domain-driven design?” Often, this question helped nudge us back on track, but this time the answer was yes. It’s just that we’d allowed POC code corrupt our clean architecture. Time for an exorcism.

The repository that wasn’t

After successfully removing the POC contamination, we hit our next puzzle. File queries were returning “file not found” despite the file clearly existing in the database. This led to my favorite exchange of the session:

Lead Developer: [Writes elaborate grep commands to investigate repository patterns]

Me: “Why were you guessing at all? Just ask me for a tree or for files if you need to inspect directly?”

Touché. Sometimes the PM needs to remind the architect-bot that they have a human with direct file access right there. (This is one of those things they don’t teach you in distributed team management courses.)

What we discovered was architecturally fascinating: Piper Morgan was using a two-tier data access pattern:

Tier 1: Domain entities use SQLAlchemy with session management
Tier 2: File operations use raw asyncpg with connection pooling

This wasn’t a bug — it was an intentional performance optimization. Files don’t need ORM overhead. But I hadn’t reviewed the architecture docs recently enough to remember this design “decision.”

The case of the stubborn LLM

Here’s where things got genuinely interesting. We wanted document summarization to be a simple QUERY operation. The LLM had other ideas:

Intent classified as - Category: IntentCategory.SYNTHESIS, Action: generate_summary

We updated prompts. We added examples. We restructured the classification logic. The LLM remained unmoved — summarization was SYNTHESIS, and that was final. I started wonderign if we had someone mistrained something but was assured we weren’t even doing any of our own training yet.

My response: “OK back… what’s next? I can do a few more steps then need to run out to pick up dinner.”

This perfectly captured our incremental approach — life happens, architecture endures. But also: I was starting to feel that familiar itch of fighting the system instead of listening to it.

The architectural reckoning

As we tried to patch our way to a working summarization feature, I had to drop the wisdom bomb on myself:

“I am concerned we may be losing our architectural perspective if we keep patching bugs as we find them.”

I sometimes feel like the stern papa asking my bots if they really brushed their teeth and washed their hands before bed but the question needed to be asked.

We were fighting the system instead of working with it. The LLM classified summarization as SYNTHESIS because it is synthesis — creating new content from existing material. Our attempts to force it into a QUERY pattern were architectural hubris.

You know that feeling when you’re trying to force a USB cable in upside down? That’s what we were doing to our intent classification system.

The plot twist

The real discovery? SYNTHESIS was completely unimplemented. It had no routing, no handlers, just a generic “I’ll help you create that” response. We’d been trying to shoehorn functionality into the wrong category when the right category was sitting there, abandoned and waiting.

It’s like finding out you’ve been trying to unlock your front door with your car key when the right key was in your other pocket the whole time.

Some things I learned (or how systematic thinking saved us from ourselves)

1. Consult architecture docs first

My failure to review the architecture docs cost us investigation time. The two-tier data pattern was documented — I just didn’t look. This is the PM equivalent of RTFM, and I earned that lesson the hard way.

2. Work WITH the system

When the intent classifier consistently makes a choice, maybe it knows something we don’t. Document summarization IS synthesis, not a query. Sometimes the AI is trying to tell you something about the nature of the work itself.

3. TDD works

Our FileQueryService succeeded because we followed TDD strictly: Red → Green → Refactor. No shortcuts. The discipline pays off every time, even when (especially when) you’re tempted to skip it.

4. Know when to stop

The moment we started our third “quick fix,” we should have stopped. Multiple patches indicate architectural work is needed, not more patches.

5. Session logs are critical

I noticed the Lead Developer wasn’t maintaining a session log as I generally ask each bot to do. Creating one, even retrospectively, immediately helped track our journey and decisions. Documentation isn’t bureaucracy — it’s institutional memory.

The code we shipped

Despite the challenges, we accomplished significant work:

✅ Removed POC contamination completely
✅ Implemented FileQueryService with proper TDD
✅ Fixed file query routing (discovered file_id vs resolved_file_id mismatch)
✅ Passed UI Tests 1, 2.1, and 2.2
✅ Documented architectural patterns for future reference

The code we didn’t ship

More importantly, we chose NOT to ship a hacky summarization implementation. Instead, we documented exactly what still needs to be built:

A proper document operations layer
TaskType.GENERATE_SUMMARY with an appropriate handler
SYNTHESIS category implementation
LLM-based summarization service

Sometimes the best code is the code you choose not to write. Know when to fold ’em and all that, right?

The human side

This session reminded me why being the primate in the loop working with AI agents is so valuable. My interventions weren’t disruptions — they were course corrections:

“Did the instructions also provide a template for how to structure the instructions for Cursor agent?”
“Let’s also do an architectural gut check”
“Let’s stop here… We don’t need to rush to meet an arbitrary cutoff time”

Each intervention prevented technical debt and maintained architectural integrity. It’s like having a very patient pair programming partner who never gets tired of answering the question, “are we sure this is the right approach?”

The denouement

We ended the session not because we ran out of time, but because we chose to stop. The next session would begin with architectural design, not bug fixes.

This is what systematic thinking looks like in practice: recognizing when you’re fighting the system instead of working with it, and having the discipline to pause and reassess.

Final thought: If your AI assistant argues with you about categorization, maybe listen. It might be trying to tell you something about the nature of the work itself.

Session Stats:

Duration: 6 hours (with dinner break)
Lines of code: ~200 added, ~150 removed
Architectural insights: Priceless
Times the system saved us from ourselves: At least 4

Next on Building Piper Morgan: We’ll implement SYNTHESIS properly and finally get that document summary in “The Day We Taught Piper to Summarize (Almost).” But first, architecture.

Have you ever found yourself fighting your own system’s wisdom? When has stepping back and reassessing turned a frustrating debugging session into an architectural insight? I’d love to hear your stories of learning to work with, rather than against, the systems you’re building.

The Day We Stopped Fighting the System

Christian Crumlish

Kind Director of Product, 18F alum, Product Management for UX People author, Piper Morgan (AI product assistant) maker, Design in Product curator, Layers of Meta bandleader

The ghost of POC past

The repository that wasn’t

The case of the stubborn LLM

The architectural reckoning

The plot twist

Some things I learned (or how systematic thinking saved us from ourselves)

1. Consult architecture docs first

2. Work WITH the system

3. TDD works

4. Know when to stop

5. Session logs are critical

The code we shipped

The code we didn’t ship

The human side

The denouement

Building Piper Morgan

627 followers

More articles by this author

Explore topics

The ghost of POC past

The repository that wasn’t

The case of the stubborn LLM

The architectural reckoning

The plot twist

Some things I learned (or how systematic thinking saved us from ourselves)

1. Consult architecture docs first

2. Work WITH the system

3. TDD works

4. Know when to stop

5. Session logs are critical

The code we shipped

The code we didn’t ship

The human side

The denouement

Building Piper Morgan

627 followers

The Bug That Made Us Smarter

Aug 22, 2025

Weekly Ship #005: From Infrastructure to Impact

Aug 22, 2025

When the Bugs Lead You Home

Aug 21, 2025

Two-Fisted Coding: Wrangling Robot Programmers When You’re Just a PM

Aug 20, 2025

The Coordination Tax: When Copy-Paste Becomes Your Biggest Bottleneck

Aug 19, 2025

The Debugging Cascade: A 90-Minute Journey Through Integration Hell

Aug 18, 2025

Teaching an AI to Sound Like Me (Without Losing My Mind)

Aug 17, 2025

Session Logs: A Surprisingly Useful Practice for AI Development

Aug 16, 2025

The Zeno's Paradox of Debugging: A Weekend with Piper Morgan

Aug 15, 2025

Weekly Ship #004: Building the Foundation for Federated Spatial Intelligence

Aug 15, 2025

Explore topics