Building AI That Actually Thinks About Product Work
"Let's take it from the top"


Still June 2, 2025

I’ve explained why we burned down a working prototype to build something more ambitious. Now let’s talk about what it means to build an AI that understands product management work — and what “understanding” really means when you’re eight hours into a development session and still debugging environment variables.

The LLM integration reality

June 2 started with grand ambitions and ended with a working system, but the path between was… instructive.

First challenge: getting the AI to work at all. The old POC had mock responses everywhere. We needed real LLM integration that could handle different types of PM thinking with different AI models.

The implementation reality involved discovering that environment variables don’t load automatically and getting familiar with error messages like “No ANTHROPIC_API_KEY found” despite having a perfectly good .env file.

Pro tip: call load_dotenv() before any imports or client constructors that read environment variables.
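A minimal sketch of the fix, assuming python-dotenv and the Anthropic SDK; your file layout and key handling may differ:

```python
# Load .env before touching anything that expects the key to be set.
from dotenv import load_dotenv

load_dotenv()  # copies KEY=value pairs from .env into os.environ

# Now imports and client constructors that read environment variables work.
from anthropic import Anthropic

client = Anthropic()  # picks up ANTHROPIC_API_KEY from os.environ
```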

Task-based model selection (when it works)

The breakthrough concept was treating different cognitive tasks differently. Intent classification needs speed and consistency — temperature 0.3, fast model. Strategic reasoning needs creativity and depth — higher temperature, more powerful model.

We built this as explicit configuration.
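Something like the sketch below, where the model names, task labels, and the TaskConfig helper are placeholders rather than the actual settings:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskConfig:
    model: str          # which LLM to call for this kind of thinking
    temperature: float  # lower = consistent, higher = exploratory

# Hypothetical task-to-model mapping; the real config differs in detail.
TASK_CONFIGS: dict[str, TaskConfig] = {
    "intent_classification": TaskConfig(model="claude-3-5-haiku-latest", temperature=0.3),
    "strategic_reasoning": TaskConfig(model="claude-3-5-sonnet-latest", temperature=0.7),
}

def config_for(task: str) -> TaskConfig:
    """Look up the model settings for a given cognitive task."""
    return TASK_CONFIGS[task]
```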

When it worked, the intent classifier was hitting 0.95 confidence scores on test cases. That’s the kind of accuracy that makes you think “okay, maybe this AI thing has potential.”

The orchestration insight

Here’s where it gets interesting. Product management isn’t really about individual tasks — it’s about workflows where each step informs the next.

We built this as an explicit design principle: context flows forward through multi-step processes. When analyzing a feature request, the insights from understanding user needs should inform requirement extraction, which should influence technical constraint analysis.

By the end of June 2, we had a working orchestration engine that could execute workflows like:

  1. Analyze request → 2. Extract requirements → 3. Create work item

Each step used AI analysis and passed rich context to the next step. Not just text — structured data about stakeholders, assumptions, risks, and success criteria.
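Here is a hedged sketch of that context-forwarding loop; the step functions are stubs standing in for the real LLM-backed services, and all of the names are illustrative:

```python
from typing import Any, Callable

# Each step reads the accumulated context and returns structured additions to it.
StepFn = Callable[[dict[str, Any]], dict[str, Any]]

def run_workflow(request: str, steps: list[StepFn]) -> dict[str, Any]:
    """Run steps in order; each one sees everything learned so far."""
    context: dict[str, Any] = {"request": request}
    for step in steps:
        context.update(step(context))  # later steps build on earlier findings
    return context

def analyze_request(ctx: dict[str, Any]) -> dict[str, Any]:
    # Stands in for an LLM call in the real system
    return {"stakeholders": ["support team"], "assumptions": ["mobile-first"]}

def extract_requirements(ctx: dict[str, Any]) -> dict[str, Any]:
    # Requirements can reference what the analysis step discovered
    return {"requirements": [f"Address the needs of {s}" for s in ctx["stakeholders"]]}

def create_work_item(ctx: dict[str, Any]) -> dict[str, Any]:
    return {"work_item": {"title": ctx["request"][:80], "requirements": ctx["requirements"]}}

result = run_workflow(
    "Users want offline mode in the mobile app",
    [analyze_request, extract_requirements, create_work_item],
)
```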

What “understanding PM work” really means

Let me be clear about what we built versus what we’re working toward.

What actually worked as of June 2:

  • Natural language intent classification for PM requests

  • Multi-step workflow orchestration with context preservation

  • AI-powered analysis at each workflow step

  • Database persistence of workflow results

What we’re still figuring out:

  • Whether the AI’s analysis is actually insightful or just well-formatted

  • How to capture feedback to improve recommendations over time

  • Pattern recognition across different projects and teams

The system can execute a workflow that looks intelligent. Whether it’s actually intelligent… that’s harder to measure.

Domain modeling: the unglamorous foundation

The real work wasn’t the AI — it was modeling PM concepts properly. We spent hours defining what a Feature actually is versus a WorkItem versus a Product. Boring? Yes. Essential? Absolutely.

When you define these relationships clearly, the AI can reason about them:

  • Features belong to Products

  • WorkItems implement Features

  • Stakeholders care about Products

  • Decisions affect multiple Features

This isn’t just data modeling — it’s teaching the system the vocabulary of product management.
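For illustration, here is roughly what that vocabulary looks like as code; the field names are assumptions, and only the relationships come from the list above:

```python
from dataclasses import dataclass, field

@dataclass
class Product:
    name: str

@dataclass
class Feature:
    title: str
    product: Product  # Features belong to Products

@dataclass
class WorkItem:
    description: str
    feature: Feature  # WorkItems implement Features

@dataclass
class Stakeholder:
    name: str
    products: list[Product] = field(default_factory=list)  # Stakeholders care about Products

@dataclass
class Decision:
    summary: str
    affected_features: list[Feature] = field(default_factory=list)  # Decisions affect multiple Features
```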

The circular dependency dance

Around hour 6 of the June 2 session, we hit our first major architectural challenge: circular dependencies. The database layer needed workflow types, the orchestration needed database repositories, and Python was not having it.

This is actually a classic sign that a system is growing from prototype to platform. The solution revealed an important design principle: shared vocabulary, independent implementation.

We extracted shared enumerations (IntentCategory, WorkflowType) into a common module. Every service speaks the same language about PM concepts, but implements its own concerns independently.

Concretely, that common module became a shared_types.py file in the services directory, which broke the circular imports.
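A sketch of what such a module can contain; only the IntentCategory and WorkflowType names come from the actual system, and the enum members are guesses for illustration:

```python
# services/shared_types.py (sketch)
# Pure vocabulary: this module imports nothing from other services,
# so it can never be part of an import cycle.
from enum import Enum

class IntentCategory(str, Enum):
    FEATURE_REQUEST = "feature_request"
    BUG_REPORT = "bug_report"
    STRATEGY_QUESTION = "strategy_question"

class WorkflowType(str, Enum):
    ANALYZE_REQUEST = "analyze_request"
    CREATE_WORK_ITEM = "create_work_item"
```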

When AI meets reality

The most humbling moment came when testing the end-to-end workflow. Everything looked perfect in theory. The AI classified intents correctly. The orchestration engine routed them properly. The database persisted results.

But the actual AI analysis? Generic and obvious. “This feature request requires stakeholder alignment and technical investigation.” Well, yes. That’s true of most feature requests.

The system was working mechanically but not intellectually. We’d built the plumbing for intelligence without the intelligence itself.

Integration philosophy: PM concepts first

One insight from the June 2 session: every external system must be a plugin. We caught ourselves designing GitHub-centric workflows (a habit inherited from the prototype) and had to course-correct.

The system thinks in PM concepts first: Features, Stakeholders, Decisions. Whether those map to GitHub issues, Jira tickets, or Notion pages is an implementation detail.

This matters because tools change constantly. The AI’s understanding of product management concepts should be stable even when your team switches from GitHub to Linear.
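A rough sketch of that boundary using a structural interface; the WorkTracker protocol and its method are illustrative, not the project's actual plugin API:

```python
from typing import Protocol

class WorkTracker(Protocol):
    """What the core needs from any work-tracking tool. Illustrative interface."""
    def create_work_item(self, title: str, description: str) -> str:
        """Create a tracker-native item and return its external id."""
        ...

class GitHubTracker:
    def create_work_item(self, title: str, description: str) -> str:
        # Would call the GitHub Issues API here; stubbed for the sketch
        return "github-issue-123"

class LinearTracker:
    def create_work_item(self, title: str, description: str) -> str:
        # Would call the Linear API here; stubbed for the sketch
        return "LIN-456"

def persist_work_item(tracker: WorkTracker, title: str, description: str) -> str:
    # Core code speaks only in PM terms; it never names a specific tool
    return tracker.create_work_item(title, description)
```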

The testing reality check

By the end of June 2, we had working code, but working code that hadn’t been stress-tested. The intent classifier achieved high confidence on our test cases, but test cases written by the same person who built the system aren’t exactly unbiased.

The orchestration engine executed our demo workflow successfully, but we’d only tested the happy path. What happens when the AI returns malformed JSON? When the database connection fails mid-workflow? When someone asks for something the system has never seen before?

It “worked,” but it was super brittle.

What we learned about AI development speed

Eight hours on June 2 produced more working code than weeks of iterating on the POC. Not because we got faster at coding, but because we stopped fighting the architecture.

When the foundation matches your goals, everything builds naturally. When it doesn’t, every feature is a hack.

The new system let us add capabilities instead of patching problems.

Current status: promising foundation

As of June 2, we had:

  • Real AI integration (no more mocks)

  • Working multi-step workflows

  • Database persistence

  • Plugin architecture foundation

  • Domain models that make sense

What we didn’t have:

  • Genuinely insightful AI analysis

  • Learning from user feedback

  • Cross-project pattern recognition

  • The strategic thinking capabilities that justified burning down the POC

The foundation is solid. The building is just getting started.


Next in Building Piper Morgan: How we gave the AI memory, and why that turned out to be more complicated than expected

Sometimes you have to establish the pathways before you send the signals through the wires. Have you ever had to frame out a complex system and then get it working for simple cases?

