Building AI That Actually Thinks About Product Work
"Let's take it from the top"


Still June 2, 2025

I’ve explained why we burned down a working prototype to build something more ambitious. Now let’s talk about what it means to build an AI that understands product management work — and what “understanding” really means when you’re eight hours into a development session and still debugging environment variables.

The LLM integration reality

June 2 started with grand ambitions and ended with a working system, but the path between was… instructive.

First challenge: getting the AI to work at all. The old POC had mock responses everywhere. We needed real LLM integration that could handle different types of PM thinking with different AI models.

The implementation reality involved discovering that environment variables don’t load automatically and getting familiar with error messages like “No ANTHROPIC_API_KEY found” despite having a perfectly good .env file.

Pro tip: call load_dotenv() before any imports or client constructors that read environment variables.
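A minimal sketch of the fix, assuming python-dotenv and the Anthropic SDK; your file layout and key handling may differ:

```python
# Load .env before touching anything that expects the key to be set.
from dotenv import load_dotenv

load_dotenv()  # copies KEY=value pairs from .env into os.environ

# Now imports and client constructors that read environment variables work.
from anthropic import Anthropic

client = Anthropic()  # picks up ANTHROPIC_API_KEY from os.environ
```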

Task-based model selection (when it works)

The breakthrough concept was treating different cognitive tasks differently. Intent classification needs speed and consistency — temperature 0.3, fast model. Strategic reasoning needs creativity and depth — higher temperature, more powerful model.

We built this as explicit configuration.
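Something like the sketch below, where the model names, task labels, and the TaskConfig helper are placeholders rather than the actual settings:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskConfig:
    model: str          # which LLM to call for this kind of thinking
    temperature: float  # lower = consistent, higher = exploratory

# Hypothetical task-to-model mapping; the real config differs in detail.
TASK_CONFIGS: dict[str, TaskConfig] = {
    "intent_classification": TaskConfig(model="claude-3-5-haiku-latest", temperature=0.3),
    "strategic_reasoning": TaskConfig(model="claude-3-5-sonnet-latest", temperature=0.7),
}

def config_for(task: str) -> TaskConfig:
    """Look up the model settings for a given cognitive task."""
    return TASK_CONFIGS[task]
```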

When it worked, the intent classifier was hitting 0.95 confidence scores on test cases. That’s the kind of accuracy that makes you think “okay, maybe this AI thing has potential.”

The orchestration insight

Here’s where it gets interesting. Product management isn’t really about individual tasks — it’s about workflows where each step informs the next.

We built this as an explicit design principle: context flows forward through multi-step processes. When analyzing a feature request, the insights from understanding user needs should inform requirement extraction, which should influence technical constraint analysis.

By the end of June 2, we had a working orchestration engine that could execute workflows like:

  1. Analyze request → 2. Extract requirements → 3. Create work item

Each step used AI analysis and passed rich context to the next step. Not just text — structured data about stakeholders, assumptions, risks, and success criteria.
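Here is a hedged sketch of that context-forwarding loop; the step functions are stubs standing in for the real LLM-backed services, and all of the names are illustrative:

```python
from typing import Any, Callable

# Each step reads the accumulated context and returns structured additions to it.
StepFn = Callable[[dict[str, Any]], dict[str, Any]]

def run_workflow(request: str, steps: list[StepFn]) -> dict[str, Any]:
    """Run steps in order; each one sees everything learned so far."""
    context: dict[str, Any] = {"request": request}
    for step in steps:
        context.update(step(context))  # later steps build on earlier findings
    return context

def analyze_request(ctx: dict[str, Any]) -> dict[str, Any]:
    # Stands in for an LLM call in the real system
    return {"stakeholders": ["support team"], "assumptions": ["mobile-first"]}

def extract_requirements(ctx: dict[str, Any]) -> dict[str, Any]:
    # Requirements can reference what the analysis step discovered
    return {"requirements": [f"Address the needs of {s}" for s in ctx["stakeholders"]]}

def create_work_item(ctx: dict[str, Any]) -> dict[str, Any]:
    return {"work_item": {"title": ctx["request"][:80], "requirements": ctx["requirements"]}}

result = run_workflow(
    "Users want offline mode in the mobile app",
    [analyze_request, extract_requirements, create_work_item],
)
```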

What “understanding PM work” really means

Let me be clear about what we built versus what we’re working toward.

What actually worked as of June 2:

  • Natural language intent classification for PM requests

  • Multi-step workflow orchestration with context preservation

  • AI-powered analysis at each workflow step

  • Database persistence of workflow results

What we’re still figuring out:

  • Whether the AI’s analysis is actually insightful or just well-formatted

  • How to capture feedback to improve recommendations over time

  • Pattern recognition across different projects and teams

The system can execute a workflow that looks intelligent. Whether it’s actually intelligent… that’s harder to measure.

Domain modeling: the unglamorous foundation

The real work wasn’t the AI — it was modeling PM concepts properly. We spent hours defining what a Feature actually is versus a WorkItem versus a Product. Boring? Yes. Essential? Absolutely.

When you define these relationships clearly, the AI can reason about them:

  • Features belong to Products

  • WorkItems implement Features

  • Stakeholders care about Products

  • Decisions affect multiple Features

This isn’t just data modeling — it’s teaching the system the vocabulary of product management.
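For illustration, here is roughly what that vocabulary looks like as code; the field names are assumptions, and only the relationships come from the list above:

```python
from dataclasses import dataclass, field

@dataclass
class Product:
    name: str

@dataclass
class Feature:
    title: str
    product: Product  # Features belong to Products

@dataclass
class WorkItem:
    description: str
    feature: Feature  # WorkItems implement Features

@dataclass
class Stakeholder:
    name: str
    products: list[Product] = field(default_factory=list)  # Stakeholders care about Products

@dataclass
class Decision:
    summary: str
    affected_features: list[Feature] = field(default_factory=list)  # Decisions affect multiple Features
```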

The circular dependency dance

Around hour 6 of the June 2 session, we hit our first major architectural challenge: circular dependencies. The database layer needed workflow types, the orchestration needed database repositories, and Python was not having it.

This is actually a classic sign that a system is growing from prototype to platform. The solution revealed an important design principle: shared vocabulary, independent implementation.

We extracted shared enumerations (IntentCategory, WorkflowType) into a common module. Every service speaks the same language about PM concepts, but implements its own concerns independently.

Concretely, that common module became a shared_types.py file in the services directory, which broke the circular imports.
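A sketch of what such a module can contain; only the IntentCategory and WorkflowType names come from the actual system, and the enum members are guesses for illustration:

```python
# services/shared_types.py (sketch)
# Pure vocabulary: this module imports nothing from other services,
# so it can never be part of an import cycle.
from enum import Enum

class IntentCategory(str, Enum):
    FEATURE_REQUEST = "feature_request"
    BUG_REPORT = "bug_report"
    STRATEGY_QUESTION = "strategy_question"

class WorkflowType(str, Enum):
    ANALYZE_REQUEST = "analyze_request"
    CREATE_WORK_ITEM = "create_work_item"
```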

When AI meets reality

The most humbling moment came when testing the end-to-end workflow. Everything looked perfect in theory. The AI classified intents correctly. The orchestration engine routed them properly. The database persisted results.

But the actual AI analysis? Generic and obvious. “This feature request requires stakeholder alignment and technical investigation.” Well, yes. That’s true of most feature requests.

The system was working mechanically but not intellectually. We’d built the plumbing for intelligence without the intelligence itself.

Integration philosophy: PM concepts first

One insight from the June 2 session: every external system must be a plugin. We caught ourselves designing GitHub-centric workflows (a habit inherited from the prototype) and had to course-correct.

The system thinks in PM concepts first: Features, Stakeholders, Decisions. Whether those map to GitHub issues, Jira tickets, or Notion pages is an implementation detail.

This matters because tools change constantly. The AI’s understanding of product management concepts should be stable even when your team switches from GitHub to Linear.
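A rough sketch of that boundary using a structural interface; the WorkTracker protocol and its method are illustrative, not the project's actual plugin API:

```python
from typing import Protocol

class WorkTracker(Protocol):
    """What the core needs from any work-tracking tool. Illustrative interface."""
    def create_work_item(self, title: str, description: str) -> str:
        """Create a tracker-native item and return its external id."""
        ...

class GitHubTracker:
    def create_work_item(self, title: str, description: str) -> str:
        # Would call the GitHub Issues API here; stubbed for the sketch
        return "github-issue-123"

class LinearTracker:
    def create_work_item(self, title: str, description: str) -> str:
        # Would call the Linear API here; stubbed for the sketch
        return "LIN-456"

def persist_work_item(tracker: WorkTracker, title: str, description: str) -> str:
    # Core code speaks only in PM terms; it never names a specific tool
    return tracker.create_work_item(title, description)
```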

The testing reality check

By the end of June 2, we had working code, but working code that hadn’t been stress-tested. The intent classifier achieved high confidence on our test cases, but test cases written by the same person who built the system aren’t exactly unbiased.

The orchestration engine executed our demo workflow successfully, but we’d only tested the happy path. What happens when the AI returns malformed JSON? When the database connection fails mid-workflow? When someone asks for something the system has never seen before?

It “worked,” but it was super brittle.

What we learned about AI development speed

Eight hours on June 2 produced more working code than weeks of iterating on the POC. Not because we got faster at coding, but because we stopped fighting the architecture.

When the foundation matches your goals, everything builds naturally. When it doesn’t, every feature is a hack.

The new system let us add capabilities instead of patching problems.

Current status: promising foundation

As of June 2, we had:

  • Real AI integration (no more mocks)

  • Working multi-step workflows

  • Database persistence

  • Plugin architecture foundation

  • Domain models that make sense

What we didn’t have:

  • Genuinely insightful AI analysis

  • Learning from user feedback

  • Cross-project pattern recognition

  • The strategic thinking capabilities that justified burning down the POC

The foundation is solid. The building is just getting started.


Next in Building Piper Morgan: How we gave the AI memory, and why that turned out to be more complicated than expected

Sometimes you have to establish the pathways before you send the signals through the wires. Have you ever had to frame out a complex system and then get it working for simple cases?

