Enterprise AI Agents: The 3 Core Components of Reliability and the 'Ambient' Future

In the world of artificial intelligence, especially with the rise of generative models, a dichotomy exists: on one side, there are dazzling demos and viral "one-click, do-it-all" agents on X. On the other, there is the sobering silence of those trying to integrate this technology into the stark reality of the corporate world. For many tech leaders, engineers, and product managers, this dilemma is familiar: the potential is immense, but the path from prototype to production is fraught with invisible walls, unpredictable errors, and the justified concerns of stakeholders. How can we transform AI agents from mere toys into reliable, trustworthy business partners?

Recently, I came across a presentation that gets to the very heart of this question. Harrison Chase, a leading figure building the infrastructure of this field with tools like LangChain and LangGraph, delivered a talk on YouTube titled "3 ingredients for building reliable enterprise agents." He distilled this seemingly complex problem into a surprisingly simple yet powerful mental model. The framework Chase presented is not just a "how-to" guide; it also serves as a compass for understanding the current state of this technology, its challenges, and most importantly, where it is headed in the near future.


The Mathematics of Success: The Expected Value Formula for Enterprise Agents

Chase begins by laying a theoretical foundation. He states that three fundamental factors influence the adoption probability of an agent in an enterprise environment and breaks it down into a simple equation. While this equation might seem like a collection of "open secrets" at first glance, it is the cornerstone upon which the entire strategy is built.

Chase's formula can be summarized as follows:

Expected Value = (Probability of Success x Value Delivered on Success) - (Probability of Failure x Cost Incurred on Failure)

To deploy an agent to production, the result of this equation must be significantly greater than the cost of running the agent (infrastructure, maintenance, LLM calls, etc.). This formula provides a roadmap for the rest of Chase's presentation. If we want an agent project to succeed, there are three fundamental levers we can pull (a quick numeric sketch follows the list below):

  1. Increase its Value.

  2. Increase its Probability of Success.

  3. Decrease its Cost if Wrong.
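
To make the formula concrete, here is a back-of-the-envelope sketch in Python. The numbers are entirely hypothetical (an 80% success rate, $500 of analyst time saved on success, $50 of cleanup cost on failure); the point is only to show how the three levers interact.

```python
def expected_value(p_success: float, value_on_success: float,
                   cost_on_failure: float) -> float:
    # Chase's formula: EV = p * V - (1 - p) * C
    return p_success * value_on_success - (1 - p_success) * cost_on_failure

# Hypothetical numbers: 80% success rate, $500 saved on success,
# $50 of cleanup cost on failure.
ev = expected_value(0.80, 500.0, 50.0)
print(ev)  # 0.8 * 500 - 0.2 * 50 = 390.0
```

Raising the value delivered, nudging the success rate up, or capping the failure cost each moves the same bottom line, which is why Chase treats them as three independent levers: the agent is worth shipping only if that result comfortably exceeds its running cost.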

Now, let's examine these three components one by one, using the examples Chase provided along with my own analysis.

Component 1: Increasing Value - "It's Not Just What It Does, But How Deeply It Does It"

The most obvious way to increase the value provided by agents is to focus on domains that are inherently high-value. Chase gives examples like legal tech (such as Harvey) and financial research/summarization. In these fields, the cost of human labor is already very high, so the value brought by automation is automatically amplified. This is the first and simplest step of the strategy.

However, the point Chase draws attention to, and which I find more revolutionary, is changing the nature of the task rather than the vertical the agent operates in. This signals a quiet revolution in our interaction paradigm with AI.

“We're starting to see a trend towards things like deep research, which go and run for an extended period of time. We're seeing the same with code... Seven different examples of these ambient agents that run in the background for like hours at a time. And I think this speaks to ways that people are trying to get their agents to provide more value. They're getting them to do more work.”

This quote is key to understanding the shift. What Chase is pointing out here is the need to move beyond the "chatbot" paradigm. RAG (Retrieval-Augmented Generation) systems that provide a quick answer in five seconds or instant code-completion tools are certainly valuable, but they function as a "co-pilot." The real increase in value begins when the agent transforms from a co-pilot into an autonomous expert.

This transformation means giving the agent more complex and longer-running tasks: preparing a "deep research" report that takes hours instead of a few minutes; developing an entire feature instead of fixing a single line of code. The more "work" the agent does, the more value it creates. The true power of this approach comes from the profound strategic shift behind its simplicity: we move from seeing AI as an instant Q&A mechanism to adopting a managerial stance, delegating time- and effort-intensive projects to it.

Component 2: Increasing the Probability of Success - The Quest for Determinism and Trust

It's easy for an agent to work wonders in the prototype stage. The hard part is sustaining this performance consistently and reliably in a production environment. According to Chase, the way to increase the probability of success is not, contrary to popular belief, by writing better prompts, but by making the agent's nature more deterministic.

“Especially in the enterprise, we see that there are workflow-like things where you need more controllability, more predictability than you get by just prompting... If you put that in a deterministic kind of like workflow or code, then it will always do that.”

This highlights a fundamental tension faced by everyone developing AI agents: the conflict between the probabilistic (stochastic) and sometimes unpredictable nature of LLMs and the deterministic, reliable structure required by enterprise software. This is where the core philosophy of Chase and LangGraph lies: we don't have to choose between the two. Instead of the "Workflows vs. Agents" debate, we should embrace the "Workflows and Agents" approach.

What does this mean? Instead of entrusting an agent's entire journey to a single, massive prompt, we break it down into smaller, controllable steps. Some steps might require the LLM's creativity (e.g., generating a text draft), but other steps must be rigidly coded (e.g., "ALWAYS run step B after step A is complete"). Frameworks like LangGraph are designed precisely to create this hybrid structure, allowing you to define where the agent can operate freely and where it must follow the rails of a workflow. This dramatically increases the probability of success by enhancing predictability.
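
To make the "workflows and agents" idea tangible, here is a minimal LangGraph sketch. The node names and the stubbed draft function are my own illustration (a real system would make an LLM call in the generate step); the hard-coded edge is the deterministic rail Chase describes. The API reflects recent LangGraph versions and may differ in older releases.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class DraftState(TypedDict):
    topic: str
    draft: str
    checked: bool


def generate_draft(state: DraftState) -> dict:
    # The "creative" step: in a real agent this would be an LLM call;
    # a stub keeps the sketch self-contained and runnable.
    return {"draft": f"Draft report on {state['topic']}"}


def run_checks(state: DraftState) -> dict:
    # The rigid step: plain code, no LLM, fully deterministic.
    return {"checked": bool(state["draft"])}


builder = StateGraph(DraftState)
builder.add_node("generate", generate_draft)
builder.add_node("check", run_checks)
builder.add_edge(START, "generate")
builder.add_edge("generate", "check")  # hard rail: check ALWAYS follows generate
builder.add_edge("check", END)
workflow = builder.compile()

print(workflow.invoke({"topic": "Q3 revenue", "draft": "", "checked": False}))
```

The LLM is free inside its node, but the graph's edges guarantee the order of operations, which is exactly the blend of creativity and determinism described above.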

The Psychology of Trust: Observability is More Than a Debugging Tool

Chase touches on a second dimension of increasing success probability: reducing perceived risk. This is less a technical problem and more a communication and trust issue.

“…a really important thing that we see to do inside the enterprise... is to work to kind of like reduce the way that people see the error bars of how this agent performs... This is where observability and evals actually plays a slightly different role than we would maybe think or we would maybe intend.”

Observability tools like LangSmith were born as debugging tools for developers. However, as Chase emphasizes, one of their greatest benefits is transparently showing non-technical stakeholders (managers, review boards, etc.) what the agent is doing. In a manager's eyes, when an agent is a "black box" with unclear operations, the perception of risk skyrockets. But when you present them with an interface that shows every step, every LLM call, and every tool usage, that black box becomes illuminated.

The story Chase tells of a user who got their project approved by a review panel in record time by presenting LangSmith is a perfect example of this. Here, observability transforms from an engineering tool into a trust-building and persuasion tool. This is a vital lesson in the corporate world, where selling the technology is as important as building it.
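
As a sketch of how lightweight this can be: with the LangSmith SDK, decorating a function is enough to record every call as a browsable trace. The function below is a hypothetical stub, and the environment variable name follows the LangSmith documentation (newer SDKs also accept LANGSMITH_TRACING); check your SDK version.

```python
import os

from langsmith import traceable

# Assumed env-var configuration per the LangSmith docs; the API key
# (LANGCHAIN_API_KEY) must be set outside the code.
os.environ["LANGCHAIN_TRACING_V2"] = "true"


@traceable(name="summarize_filing")
def summarize_filing(text: str) -> str:
    # Stub standing in for an LLM summarization call. Every invocation is
    # recorded as a trace that non-technical stakeholders can step through.
    return text[:200]


summarize_filing("Acme Corp reported quarterly revenue of ...")
```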

Component 3: Decreasing the Cost of Failure - UI/UX as the Strongest Safety Net

The final piece of the equation is perhaps the most overlooked but psychologically the most potent: What happens if something goes wrong? How can we minimize the cost of an error?

Chase argues that the answer often lies not in complex security protocols but in cleverly designed user experiences (UI/UX). Two key strategies stand out: Reversibility and Human in the Loop.

“I think it completely changes the cost calculations in people's minds about what the cost of the agent doing something bad is, because now it's reversible and you have a human who's going to prevent it from even going in the first place if it's bad.”

Have you ever wondered why code-generating agents are so popular and successful? Chase's analysis comes into play here. Code, by its nature, is a reversible medium. Thanks to version control systems like Git, every change made by the agent can be saved as a commit and reverted at any time. Clever designs, like Replit saving every file change as a new commit, reduce the cost of failure to nearly zero.
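
A minimal sketch of this pattern: wrap every file change the agent makes in its own commit, so any mistake is one revert away. The helper name, path, and message below are hypothetical.

```python
import subprocess


def commit_agent_change(path: str, message: str) -> None:
    # Snapshot the file the agent just touched as its own commit,
    # so the change is always one `git revert` away.
    subprocess.run(["git", "add", path], check=True)
    subprocess.run(["git", "commit", "-m", f"agent: {message}"], check=True)


# After the agent edits a file (hypothetical path/message):
# commit_agent_change("src/report.py", "rewrite summary section")
# Undo the agent's last change if it turns out to be wrong:
# subprocess.run(["git", "revert", "--no-edit", "HEAD"], check=True)
```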

The second, more powerful strategy is the "Human in the Loop." Having the agent open a "Pull Request" (PR) instead of directly pushing its code to the main branch (main/master) completely changes the game. The agent is no longer an autonomous entity making irreversible changes. It is a powerful assistant that speeds up work but always leaves the final approval to a human.
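
Sketching the same idea with the GitHub CLI: the agent pushes its commits to a branch and opens a PR, so nothing lands on main without human sign-off. The branch name and title are placeholders, and the `gh` CLI is assumed to be installed and authenticated for the repository.

```python
import subprocess


def open_agent_pr(branch: str, title: str) -> None:
    # Assumes the agent's commits already exist locally.
    subprocess.run(["git", "checkout", "-b", branch], check=True)
    subprocess.run(["git", "push", "-u", "origin", branch], check=True)
    subprocess.run(
        ["gh", "pr", "create",
         "--title", title,
         "--body", "Automated change from the agent - please review.",
         "--base", "main"],
        check=True,
    )


# open_agent_pr("agent/fix-summary", "Agent: rewrite summary section")
```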

This idea, I believe, can yield surprisingly good results not only in software but in many fields, from project management to legal document preparation. As Chase also noted in the Q&A section of his talk, the "first draft" concept is a perfect mental model for this approach. An agent working for hours to create the first draft of a report, a contract, or marketing copy creates immense value. But since the human has the final say, approving or revising the text, the cost of error drops dramatically. This strikes a perfect balance between the agent's autonomy and human control.


Synthesis and Future Vision: From Chat Agents to 'Ambient' Agents

So, what happens when we design an agent that combines these three components (high value, high probability of success, low cost of failure)? According to Chase, the next step is to scale this positive expected value. This is where the concept of "Ambient Agents" comes onto the stage.

Chase sees the evolution of agents in three stages:

  1. Chat/Sync Agents: The most common model today. They are triggered by a human, are expected to respond quickly, and usually perform one-off tasks.

  2. Sync-to-Async Agents: Triggered by humans but run for a long time in the background (e.g., "Deep Research" or code-generating agents).

  3. Ambient/Async Agents: This is the vision of the future. These agents are not triggered by a human but by an event. For example, they run automatically when a new email arrives, a calendar event is updated, or a new record is created in a database.

Why is this transition so important? Because it completely changes the scale. A human can manage at most a few chat windows at a time, whereas hundreds of "ambient" agents can be constantly working in the background, reacting to events.

However, Chase offers a critical warning here, and this point needs to be heavily emphasized:

“Ambient does not mean fully autonomous. And this is really, really important. When people hear autonomous, they think the cost of this thing doing something bad is really high... And so ambient does not mean fully autonomous.”

This distinction is vital for the adoption of the vision. "Ambient" means the agent's initial trigger is autonomous, but it does not mean all of its actions will be unsupervised. At this point, Chase proposes new UI/UX paradigms like an "Agent Inbox." When the agent running in the background reaches an action that requires human approval (e.g., sending an email, making a purchase), it sends this request to the user's inbox. The user can then approve, reject, or edit these requests.
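
LangGraph exposes a primitive that maps directly onto this pattern: a node can call interrupt() to pause the run and surface a payload (the "inbox item"), and a later Command(resume=...) continues it with the human's decision. The sketch below is a minimal, hypothetical email-approval flow; it assumes a recent LangGraph version with checkpointing enabled.

```python
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END
from langgraph.types import Command, interrupt


class EmailState(TypedDict):
    draft: str
    approved: bool


def draft_email(state: EmailState) -> dict:
    return {"draft": "Hi team, the Q3 numbers are ready..."}  # stub for an LLM call


def request_approval(state: EmailState) -> dict:
    # Pauses the run and surfaces this payload as an "inbox item".
    decision = interrupt({"action": "send_email", "draft": state["draft"]})
    return {"approved": decision == "approve"}


builder = StateGraph(EmailState)
builder.add_node("draft", draft_email)
builder.add_node("approve", request_approval)
builder.add_edge(START, "draft")
builder.add_edge("draft", "approve")
builder.add_edge("approve", END)
graph = builder.compile(checkpointer=MemorySaver())  # checkpointing enables pausing

config = {"configurable": {"thread_id": "email-42"}}
graph.invoke({"draft": "", "approved": False}, config)  # runs, then pauses
graph.invoke(Command(resume="approve"), config)         # the human's decision
```

An "Agent Inbox" UI is then just a view over these paused runs: it lists pending interrupt payloads and feeds each approve/reject/edit decision back as the resume value.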

This vision incorporates the human-in-the-loop in the most efficient way. The agent autonomously performs thousands of simple, repetitive tasks while seeking human approval only at critical decision points. This is the ultimate goal, combining the scale of automation with strategic human control.


Conclusion: The Big Picture and a Question for the Future

Harrison Chase's presentation offers a roadmap for the future of AI agents that is as pragmatic as it is exciting. The most fundamental lesson to be drawn from his analysis is that building successful enterprise agents is not a matter of magical prompt-engineering skills, but of systematic thinking, intelligently designed interactions, and keeping humans in the loop.

Chase's formula—Value, Probability of Success, and Cost of Failure—provides a powerful framework that reminds us to optimize all parts of this complex equation simultaneously. We can increase value by having agents perform deeper, more meaningful work. We can raise the probability of success by combining the flexibility of LLMs with deterministic workflows. And we can reduce the cost of failure to almost zero with clever UI/UX patterns like reversibility and human approval.

The combination of these principles moves us beyond chatbots to the future of "ambient" agents—event-triggered, tirelessly working in the background, but consulting us at critical moments. This future does not promise an out-of-control autonomy like in sci-fi movies, but a trust-based collaboration that scales human capabilities like never before.

Resource

3 ingredients for building reliable enterprise agents - Harrison Chase, LangChain/LangGraph
