The Gameboard for AI in Healthcare

Healthcare was built for calculators. GPT-5 sounds like a colleague. Traditional clinical software is deterministic by design: same input, same output, with logic you can trace and certify. That is how regulators classify and oversee clinical systems, and how payers adjudicate claims. By contrast, the GPT-5 health moment that drew attention was a live conversation in which the assistant walked a patient and caregiver through their options. It asked its own follow-ups, explained tradeoffs in plain language, and tailored the discussion to what they already knew. Ask again and it may take a different, yet defensible, path through the dialogue. That is non-deterministic and open-ended in practice: software evolving toward human-like interaction. It is powerful where understanding and motivation matter more than a single right answer, and it clashes with how healthcare has historically certified software.

This tension explains how the industry is managing the AI transition. Balaji Srinivasan’s line captures the current state: “AI doesn’t do it end to end. It does it middle to middle. The new bottlenecks are prompting and verifying.” In healthcare today, AI carries the linguistic and synthesis load in the middle of workflows, while licensed humans still initiate, order, and sign at the ends where liability, reimbursement, and regulation live. Ayo Omojola makes the same point about enterprise agents: in the real world, organizations deploy systems that research, summarize, and hand off, not ones that own the outcome.

My mental model for AI in healthcare right now is a two-by-two. One axis runs from deterministic to non-deterministic. Deterministic systems return the same result for the same input and behave like code or a calculator. Non-deterministic systems, especially large language models, generate high-quality language and synthesis with some spread in their outputs. The other axis runs from middle to end-to-end. Middle means assistive: a human remains in the loop. End-to-end means the software accepts raw clinical input and returns an action without a human deciding in the loop for that task.
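To make the grid concrete, here is a minimal sketch of the four quadrants as a lookup table. It is purely illustrative; the labels and examples come from the framing above, not from any real system.

    from enum import Enum

    class Behavior(Enum):
        DETERMINISTIC = "same input, same output"
        NON_DETERMINISTIC = "high-quality language with some output spread"

    class Scope(Enum):
        MIDDLE = "assistive; a human decides"
        END_TO_END = "raw clinical input in, action out"

    # The four quadrants described in this piece.
    QUADRANTS = {
        (Behavior.DETERMINISTIC, Scope.MIDDLE): "CDS alerts, dose calculators, coding edits",
        (Behavior.DETERMINISTIC, Scope.END_TO_END): "autonomous retinal screening, closed-loop insulin",
        (Behavior.NON_DETERMINISTIC, Scope.MIDDLE): "drafting, summarizing, triage dialogue",
        (Behavior.NON_DETERMINISTIC, Scope.END_TO_END): "the AI doctor (Quadrant 4)",
    }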

Deterministic, middle. Think human-in-the-loop precision. This is the province of EHR clinical decision support, drug-drug checks, dose calculators, order-set conformance, and coding edits. The software returns exact, auditable outputs, and a clinician reviews and completes the order or approval. As LLMs become more facile with tool use, agents can help clinicians invoke these deterministic tools more easily inside the EHR. In clinical research, LLMs can help extract information from unstructured data, but a human still makes the deterministic yes-or-no call on whether a patient is eligible for a trial. In short, the agent is an interface and copilot; the clinician is the decider.
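A minimal sketch of that division of labor, with an invented interaction table and hypothetical function names: the check itself is deterministic and auditable, and nothing commits without the clinician's sign-off.

    # Deterministic core: same medication list in, same warnings out.
    KNOWN_INTERACTIONS = {
        ("ibuprofen", "warfarin"): "increased bleeding risk",
    }

    def check_interactions(med_list: list[str]) -> list[str]:
        warnings = []
        for i, a in enumerate(med_list):
            for b in med_list[i + 1:]:
                pair = tuple(sorted((a.lower(), b.lower())))
                if pair in KNOWN_INTERACTIONS:
                    warnings.append(f"{pair[0]} + {pair[1]}: {KNOWN_INTERACTIONS[pair]}")
        return warnings

    def place_order(order, med_list, clinician_approves) -> bool:
        # The agent surfaces warnings; only a human sign-off commits the order.
        return clinician_approves(order, check_interactions(med_list))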

Deterministic, end to end. Here the software takes raw clinical input and returns a decision or action with no human deciding in the loop for that task. Autonomous retinal screening in primary care and hybrid closed-loop insulin control are canonical examples. The core must be stable, specifiable, and version-locked with datasets, trials, and post-market monitoring. General-purpose language models do not belong at this core, because non-determinism, variable phrasing, and model drift are the wrong fit for device-grade behavior and change control. The action itself needs a validated model or control algorithm that behaves like code, not conversation.
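One way to picture the change-control discipline this implies is a wrapper that refuses to serve anything except the exact artifact that was validated. This is a toy sketch with a placeholder hash, not a real device pattern from any vendor.

    import hashlib

    # Hash pinned when the model version was validated (placeholder value).
    VALIDATED_SHA256 = "0" * 64

    def load_locked_model(path: str) -> str:
        """Refuse to run any artifact other than the cleared, version-locked one."""
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest != VALIDATED_SHA256:
            raise RuntimeError("model artifact does not match the validated version")
        return path  # hand off to a deterministic inference runtime, no sampling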

Non-deterministic, middle. This is the hot zone right now. Many bottlenecks in care are linguistic, not mathematical. Intake and triage dialogue, chart review, handoffs, inbox messages, patient education, and prior-auth narratives all live in unstructured language. Language models compress that language. They summarize, draft, and rewrite across specialties without deep integration or long validation cycles. Risk stays bounded because a human signs off. Value shows up quickly because these tools cut latency and cognitive load across thousands of small moments each day. The same economics hold in other verticals. Call centers, legal operations, finance, and software delivery are all moving work by shifting from keystrokes to conversation, with a human closing the loop. This is the “middle to middle” Balaji describes, where human verification becomes the new bottleneck in AI processes.

Non-deterministic, end to end. This is the AI doctor. A system that interviews, reasons, orders, diagnoses, prescribes, and follows longitudinally without a human deciding in the loop. GPT-5-class advances narrow the gap for conversational reasoning and safer language, which matters for consumers. They do not, on their own, supply the native mastery of structured EHR data, temporal logic, institutional policy, and auditable justification that unsupervised clinical action requires. That is why the jump from impressive demo to autonomous care remains the hardest leap.

What it takes to reach Quadrant 4

Quadrant 4 is the payoff. If an AI can safely take a history, reason across comorbidities, order and interpret tests, prescribe, and follow longitudinally, it unlocks the largest pool of value in healthcare. Access expands because expertise becomes available at all hours. Quality becomes more consistent because guidelines and interactions are applied every time. Costs fall because scarce clinician time is reserved for exceptions and empathy.

It is also why that corner is stubborn. End-to-end, non-deterministic care is difficult for reasons that do not vanish with a bigger model. Clinical data are partial and path dependent. Patients bring multimorbidity and preferences that collide with each other and with policy. Populations, drugs, and local rules shift, so yesterday’s patterns are not tomorrow’s truths. Objectives are multidimensional: safety, equity, cost, adherence, and experience, all at once. Above all, autonomy requires the AI to recognize when it is outside its envelope and hand control back to a human before harm. That is different from answering a question well. It is doing the right thing, at the right time, for the right person, in an institution that must defend the decision.

Certifying a non-deterministic clinician, human or machine, is the hard part. We do not license doctors on a single accuracy score. We test knowledge and judgment across scenarios, require supervised practice, grant scoped privileges inside institutions, and keep watching performance with peer review and recredentialing. The right question is whether AI should be evaluated the same way. Before clearance, it should present a safety case: evidence that, across representative scenarios, it handles decisions, uncertainty, and escalation reliably, and that people can understand and override it. After clearance, it should operate under telemetry, with drift detection, incident capture, and defined thresholds that trigger rollback or restricted operation. Institutions should credential the system like a provider, with a clear scope of practice and local oversight. Above all, decisions must be auditable. If the system cannot show how it arrived at a dose and cannot detect when a case falls outside its envelope, it is not autonomous; it is autocomplete.
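What “telemetry with defined thresholds” might look like in miniature: track reviewer agreement over a rolling window and drop to a restricted mode when it crosses a pre-specified floor. The window size and threshold below are invented for illustration.

    from collections import deque

    class SafetyMonitor:
        """Rolling post-clearance check; trips a restricted mode on drift."""

        def __init__(self, window: int = 500, min_agreement: float = 0.92):
            self.outcomes = deque(maxlen=window)  # 1 = reviewer agreed, 0 = overridden
            self.min_agreement = min_agreement

        def record(self, reviewer_agreed: bool) -> str:
            self.outcomes.append(1 if reviewer_agreed else 0)
            if len(self.outcomes) == self.outcomes.maxlen:
                agreement = sum(self.outcomes) / len(self.outcomes)
                if agreement < self.min_agreement:
                    return "restricted"  # predefined trigger: rollback or human-only mode
            return "autonomous"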

I believe regulators are signaling this approach. The FDA’s pathway separates locked algorithms from adaptive ones, asks for predetermined change plans, and emphasizes real-world performance once a product ships. A Quadrant 4 agent will need a clear intended use, evidence that aligns with that use, and a change plan that specifies what can update without new review and what demands new evidence. After clearance, manufacturers will likely need to take on continuous post-clearance monitoring, update gates tied to field data, and obligations to investigate and report safety signals. Think of it as moving from a one-time exam to an ongoing check-ride.
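As a thought experiment, a predetermined change plan could be expressed as data that the release process enforces. The categories and gate values here are invented, not drawn from any actual FDA submission.

    # Hypothetical rendering of a predetermined change control plan.
    CHANGE_PLAN = {
        "allowed_under_plan": [
            "retrain on new data from the cleared data pipeline",
            "recalibrate the decision threshold within a pre-specified range",
        ],
        "requires_new_review": [
            "new intended use or patient population",
            "new input modality or model architecture",
        ],
        "field_data_gates": {  # an update ships only if both hold
            "min_sensitivity_on_holdout": 0.95,
            "max_drift_score": 0.10,
        },
    }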

On the technology front, Quadrant 4 demands a layered architecture. Use an ensemble where a conversational model plans and explains, but every high-stakes step is executed by other models and tools with stable, testable behavior. Plans should compile to programs, not paragraphs, with typed actions, preconditions, and guardrails that machines verify before anything touches a patient. If data are missing, the plan pauses. If constraints are violated, the plan stops. Language is the interface; code is the adjudicator.
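A minimal sketch of “plans compile to programs”: the conversational model proposes a plan, but each step is a typed action whose preconditions are machine-checked before anything executes. Every name here is hypothetical.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Action:
        name: str
        preconditions: list[Callable[[dict], bool]]  # checks over patient state
        execute: Callable[[dict], None]              # a validated tool, not free text

    def run_plan(plan: list[Action], state: dict) -> None:
        for step in plan:
            if not all(check(state) for check in step.preconditions):
                # Missing data or a violated constraint: pause, never improvise.
                raise RuntimeError(f"plan halted before '{step.name}': precondition failed")
            step.execute(state)  # reached only after every guardrail passes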

This only works on a stronger scaffold of knowledge. Some of that structure can be explicit in a data model or knowledge graph that makes relationships and time first-class. Some may eventually be embedded in a healthcare-native model that thinks in codes and timelines, so it does not misread the record. Neither is a silver bullet. Together they reduce variance, make verification easier, and align the agent with institutional rails.
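One concrete reading of “relationships and time first-class”: every edge in the record carries a validity interval, so “on warfarin” is a claim about a span of time, not a timeless fact. The schema is illustrative.

    from dataclasses import dataclass
    from datetime import date

    @dataclass(frozen=True)
    class Edge:
        subject: str      # e.g., a patient identifier
        relation: str     # e.g., "prescribed"
        obj: str          # e.g., a medication code
        start: date
        end: date | None  # None means still active

    def active_on(edges: list[Edge], relation: str, when: date) -> list[Edge]:
        """Edges of a given relation that were in effect on a given day."""
        return [e for e in edges
                if e.relation == relation
                and e.start <= when and (e.end is None or when <= e.end)]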

From copilots to autonomy is a climb, not a leap. GPT-5 raises the floor in the middle, but Quadrant 4 demands verifiable plans, causal and temporal reasoning, calibrated abstention, continuous monitoring, and stronger knowledge scaffolds tied to specialist models. Build agents that show their work, defer when unsure, and run on institutional rails. If you are working on Quadrant 4, I would love to compare notes!

Vasu Avadhanula

CPO & Technology Leader | Driving AI-Powered Solutions | Digital Health Innovator

Love the framing, Myoung Cha. In addition to Quadrant 4 activity, the real breakthrough will come from bridging quadrants: integrating middle and end-to-end layers, and combining deterministic precision, non-deterministic flexibility, and agentic execution across broader patient care pathways. In breast imaging, this layered model is already in play. Vertical AI (deterministic) provides FDA-cleared lesion detection and risk scoring. LLMs handle unstructured inputs, such as free-text clinical indications, prior imaging reports, and physician notes, to generate summaries and patient letters, all of which are reviewed by the radiologist. Once the radiologist signs off, AI agents take over: auto-scheduling follow-ups, ensuring referrals happen, and tracking downstream care. Imaging centers adopting this approach are already seeing faster workflows, fewer missed follow-ups, and better patient communication. There are also early examples of Quadrant 4 activity. Suspicious scans can now be flagged, prioritized high on the worklist, and paired with draft reports before a radiologist even opens the case. At the same time, AI can handle more than 70% of normal screenings, generating reports and sending patient result letters automatically.

Nate Gagne

Healthcare Leader | Venture Capital Investor | Business Strategy & Commercial Growth Expert | Value-based Care & Behavioral Health Advocate

This is a great way to frame the AI journey in healthcare. I’ve seen firsthand how starting with AI as a collaborative tool, not a replacement, builds trust and delivers real impact. Autonomy is the goal, but the path requires thoughtful integration and respect for clinical expertise.

Bradley Swenson

Healthcare SaaS Revenue Leader |CRO | GTM & Enterprise Growth Strategist | Ex-SVP at Solv & Cue Health | Speaker

This is great, Myoung. Automating routine yet complex tasks with AI not only boosts access and consistency but also lets clinicians focus where human touch truly matters, and where we need it most!

Raj Iyer

Director, EYP | Corporate Development | M&A | Product Strategy & Operations | AI Enthusiast | GTM & Strategic Partnerships | Specialized in Healthcare | Experience with B2B2C, Tools | Former Dentist | Fitness Aficionado

Excellent synthesis Myoung Cha. The practical playbook I’m seeing: start in the middle, measure latency/throughput wins, require audit trails and abstain-when-unsure, then add change plans tied to field data. The big open question is governance. Who signs off on a model’s scope of practice: the regulator, the payer, or the local medical staff?

Tim Suther

Senior Executive | Advisor | Board Member | Driving Value Through Data, Analytics, and Technology | Transformation and Change Leader | Product and Business Line Innovator | Strategist | Collaborator | Partner

Excellent & digestible framework Myoung Cha. All compelling but this particularly resonates: "AI carries the linguistic and synthesis load in the middle of workflows, while licensed humans still initiate, order, and sign at the ends where liability, reimbursement, and regulation live"
