Rethinking Maintenance and Support: How AI Agents Will Run Your Core
Disclaimer
The views presented in this document are entirely my own. They reflect my personal analysis, experience, and aspirations for the future of technology-driven enterprises. This paper is also a way for me to put evolving thoughts to paper on a rapidly emerging topic. As such, some perspectives shared here may prove to be incomplete or even incorrect over time. They are not intended to represent the positions or opinions of any current or former employer or partner.
— Jaco van Staden
Executive Manifesto – Maintenance Without Maintenance
For decades, IT maintenance has been seen as a sunk cost—critical, complex, and increasingly invisible to the business. Support functions are often measured by response time and resolution rate, not by their strategic impact. And yet, the resources consumed by “keeping the lights on” continue to exceed those invested in innovation.
But I believe we’re now entering a decisive shift.
AI agents are no longer theoretical or confined to dashboards; they are beginning to operate inside core systems. In a growing number of enterprises, these agents are moving beyond passive observability towards partial ownership of resolution, triage, and remediation. They interpret telemetry, reason over logs, trigger actions, and learn from system responses, all in live environments. This doesn’t eliminate the role of support and maintenance staff; it repositions them as designers, supervisors, and orchestrators of intelligent systems.
Gartner’s 2024 Hype Cycle for AIOps places AI-driven remediation and platform-native observability agents at the early slope of adoption. Meanwhile, companies like Dynatrace, Microsoft (Automanage), and ServiceNow have begun embedding proactive agents directly into operations. These are early signals—but they are real.
This shift forces us to rethink not just how we maintain systems, but what maintenance should actually mean in an AI-first enterprise. We no longer need to treat support as a disconnected function that reacts after failure. Instead, we can start treating it as a self-stewarding layer—where the very flows that generate service are also capable of sustaining it.
This is where Intelligent Flow Engineering (IFE) comes into play—a concept I introduced in “The AI-First Operating Model” as a foundational design principle for embedding intelligence directly into enterprise flows. In that context, IFE enabled AI-driven decision-making across processes and value chains. In this piece, we extend that construct deeper into support and maintenance—where IFE becomes the enabler of agentic remediation, telemetry-native flows, and continuous learning in the run layer itself.
We should also reconsider how we view Run the Business (RTB) functions themselves. Too often, they’re treated as operational burdens—ripe for outsourcing or cost-cutting. But RTB holds a unique advantage: it is repeatable, measurable, and saturated with real-time signal.
These characteristics make it the ideal environment to introduce, train, and validate new intelligent capabilities—before they move upstream into customer-facing or change-oriented functions. In this sense, AI-driven maintenance becomes more than operational hygiene—it becomes a strategic platform for innovation, system learning, and long-overdue tech debt reduction. RTB, reimagined, is where intelligence earns its right to scale.
In this piece, I want to explore how that evolution is taking shape—not in some distant future, but across the service towers, partner ecosystems, and operational environments we work in today. And I want to unpack what it means for roles, providers, and the intelligence fabric that increasingly binds the enterprise together.
1. The Hidden Cost of Traditional Maintenance
Support and maintenance have not been neglected. Over the past two decades, these functions have undergone wave after wave of transformation—process harmonisation, automation, ITIL standardisation, tooling consolidation, offshoring, nearshoring, managed services, and targeted AI investments. The goal has been consistent: control cost, reduce noise, and maintain service quality at scale.
Many organisations have delivered substantial improvements. Application Production Support (APS) teams have reduced incident volumes through automation and observability. ITSM platforms have been streamlined, with increased adoption of self-heal scripts, predictive alerts, and workflow integrations. Ambitions around “Zero-Touch Operations” have driven platforms to eliminate repetitive, high-volume tickets altogether.
These are not trivial accomplishments—they are critical foundations for what comes next.
Yet despite these advances, support remains structurally treated as a cost centre. The prevailing measure of success is how efficiently issues are resolved—not whether the system as a whole is becoming more intelligent, resilient, or adaptive. Most of the work has focused on containment, rather than contribution.
Support flows, though rich in system signal and behavioural insight, are rarely connected to upstream engineering, architecture, or design. Incidents are resolved in isolation, lessons are trapped in resolution notes, and friction is often localised rather than abstracted into enterprise patterns.
The result is a missed opportunity. The very function that sees the most failures, captures the most real-world signal, and operates closest to live system behaviour is excluded from transformation. It is optimised—but disconnected.
As we move toward agentic operations, this disconnect becomes a constraint. Autonomous remediation, self-improving logic, and intelligent co-stewardship require more than automation—they require flows engineered for learning, traceability, and action.
This is the moment to reposition support—not as back-office stability, but as the proving ground for enterprise intelligence. A place where systems become self-sustaining, employees become orchestrators, and maintenance becomes a source of continuous improvement.
2. Enter the AI Agent – Not a Tool, but a Teammate
The shift from rule-based automation to dynamic intelligence is no longer hypothetical—it’s happening across the ecosystem. But it’s not just about “AI agents.” What we’re seeing is the emergence of an intelligence layer across the enterprise run stack: a convergence of data agents, policy-aware automation, embedded telemetry, platform-native orchestration, and context-driven remediation.
This evolution goes beyond scripting the past. It introduces systems that observe, reason, and act—not based on pre-defined playbooks, but based on real-time signals and accumulated understanding of operational patterns.
In this model, intelligence is distributed:
This isn’t a monolith—it’s a composite architecture of intelligent components, each playing a role in redefining how enterprise systems maintain themselves.
But to function effectively, these systems need more than integration—they need intelligent flow design.
Without visibility into state, history, dependencies, and business context, even the most advanced agent becomes brittle. This is where Intelligent Flow Engineering (IFE) becomes critical. IFE ensures that every support flow—whether it's application recovery, user provisioning, or config rollback—is structured to:
Today, some organisations are already putting this into practice:
The opportunity is no longer theoretical. The technology exists. The question is whether enterprise environments are designed to accommodate it.
And that’s the missing link: most support flows weren’t built to be observed, reasoned over, or adapted in real time. They were built to be executed and tracked. This is where the shift to agentic operations requires more than tool deployment—it requires a structural rethink of how runbooks, policies, and response paths are engineered.
The benefit? Once these intelligent constructs are in place, support becomes something else entirely:
And critically, all of this still includes the human. Employees don’t disappear—they shift into flow designers, escalation architects, and agent supervisors. The work becomes higher-order, more strategic, and more directly connected to service resilience.
This isn’t about replacing teams with tools. It’s about building systems where intelligence runs alongside people—at scale, at speed, and without requiring manual triggers.
This is why the support and maintenance layer matters. Not just because it can be optimised—but because it offers the ideal environment to validate enterprise-grade intelligence.
With the right telemetry, engineered flows, and human oversight in place, this becomes more than automation. It becomes the proving ground for cognitive capability—where enterprise intelligence is not only tested, but refined before scaling across the broader business.
3. From Tiers to Threads – Rethinking the Structure of Support
Most support models today are still shaped by inherited constructs: L1, L2, L3. Each tier represents an escalation in complexity, specialisation, and time-to-resolution. But in practice, this model often leads to fragmentation of insight, delays in diagnosis, and loss of institutional memory between handoffs. Each ticket becomes an isolated artefact, disconnected from the system’s broader state and evolution.
In the era of agentic operations, this structure no longer fits.
Intelligent systems don’t operate in tiers—they operate in threads: persistent, context-rich sequences of system behaviour, action, and resolution. These threads are not routed by human queues—they are initiated and maintained by agents, flows, and observability triggers that track state across boundaries.
In a thread-based support model:
This model requires intentional re-architecture. Flows must be designed to:
In leading environments, we are already seeing this emerge:
Embedded Case Example: Intelligent Threads in Action
In one global consumer goods company, what began as a standard L2 escalation for intermittent SAP order posting failures evolved into a fully observable support thread. The incident was first flagged by a telemetry pattern detected by Dynatrace, integrated into their ServiceNow ITSM stack. Instead of routing the issue to a human queue, a domain-specific agent triggered an analysis thread that ran across SAP IDoc logs, middleware transaction timings, and infrastructure CPU/memory spikes.
Using ServiceNow Predictive AIOps, the system correlated these into a single service-impact narrative. A remediation suggestion was generated—a queue buffer config change in the SAP PI layer—reviewed via a Microsoft Teams approval flow, and implemented automatically through an Ansible Tower job, governed by policy-aware automation.
Total time to resolution: under 8 minutes. Traditional route? 3–4 hours minimum with 2–3 handoffs.
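The approval-and-execute step in this example can be sketched in a few lines. This is a minimal illustration only, not the client's implementation: the policy table, the `Remediation` fields, and the `approve` and `execute` callbacks are all hypothetical stand-ins for whatever the approval flow and automation job actually expose.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Remediation:
    target: str   # hypothetical identifier, e.g. "sap-pi-queue-buffer"
    action: str   # kind of change being proposed
    risk: str     # "low", "medium" or "high"


# Hypothetical policy table: which actions may run, and which need a human.
POLICY: Dict[str, Dict[str, bool]] = {
    "config_change": {"allowed": True, "needs_approval": True},
    "service_restart": {"allowed": True, "needs_approval": False},
    "schema_change": {"allowed": False, "needs_approval": True},
}


def run_remediation(
    rem: Remediation,
    approve: Callable[[Remediation], bool],
    execute: Callable[[Remediation], None],
) -> str:
    """Apply the policy, ask for approval if required, then execute."""
    rule = POLICY.get(rem.action)
    if rule is None or not rule["allowed"]:
        return "blocked"            # unknown or forbidden action never runs
    if rule["needs_approval"] or rem.risk == "high":
        if not approve(rem):        # e.g. a chat-based approval step
            return "rejected"
    execute(rem)                    # e.g. hand-off to an automation job
    return "executed"
```

The point of the design is that the agent never decides on its own whether it is allowed to act: the policy and the approver sit between the suggestion and the change.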
Behind the scenes, this capability had been incrementally implemented over ~6 weeks:
Critically, the entire thread—signals, actions, approvals, and outcomes—was captured as a persistent system object. This meant that when a similar issue reappeared a month later, it was resolved automatically. The APS team used the logged flow to inform a system redesign—removing the original failure vector entirely.
As a final step, the system automatically generated a structured insight card and flagged it to the CTB backlog via Azure DevOps. The insight—complete with signal pattern, agent action trace, and remediation impact—was logged as a design issue for future change.
This created a closed-loop from incident to improvement, allowing the transformation team to evaluate and redesign the SAP integration logic in the next sprint. What began as a system fix became a CTB-level intervention—closing the loop between support flow and design backlog, and reducing future tech debt with minimal manual effort.
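To make the idea of a thread as a "persistent system object" more concrete, here is a rough sketch of what such an object could look like. The class names, the `signature` key, and the in-memory store are assumptions made for illustration; the source does not describe how the thread was actually stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ThreadEvent:
    kind: str     # "signal", "action", "approval" or "outcome"
    detail: str


@dataclass
class SupportThread:
    signature: str                      # hypothetical key for the failure pattern
    events: List[ThreadEvent] = field(default_factory=list)
    resolved: bool = False

    def record(self, kind: str, detail: str) -> None:
        self.events.append(ThreadEvent(kind, detail))

    def actions(self) -> List[str]:
        return [e.detail for e in self.events if e.kind == "action"]


class ThreadStore:
    """In-memory stand-in for wherever threads are persisted."""

    def __init__(self) -> None:
        self._threads: Dict[str, SupportThread] = {}

    def save(self, thread: SupportThread) -> None:
        self._threads[thread.signature] = thread

    def known_fix(self, signature: str) -> Optional[List[str]]:
        """Return the actions of a resolved thread with the same signature."""
        thread = self._threads.get(signature)
        if thread is not None and thread.resolved:
            return thread.actions()
        return None


store = ThreadStore()
thread = SupportThread("sap-order-posting-intermittent")
thread.record("signal", "IDoc posting failures correlated with CPU spikes")
thread.record("action", "queue buffer config change in the PI layer")
thread.record("approval", "approved in chat")
thread.record("outcome", "postings recovered")
thread.resolved = True
store.save(thread)
```

When the same signature shows up again, `store.known_fix(...)` returns the recorded actions, which is the mechanism that lets a repeat incident be handled without a new investigation.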
This is the structural underpinning of Intelligent Flow Engineering (IFE). IFE is not just about flow instrumentation—it’s about flow design as a first-class discipline. Support becomes a co-designed experience, where system resilience is engineered at the level of flow logic, not reaction time.
And for service providers, this represents a fundamental shift. Rather than staffing based on ticket volumes and escalation tiers, providers must deliver value through:
The result is a support model that is dynamic, learning-driven, and aligned with how modern systems operate—not how support has historically been organised.
4. If AI Agents Are the Actors, Then Flow Design Is the Script
AI agents are powerful. But they’re not autonomous gods. They act based on the signals they see—and the flows they’re allowed to follow.
This makes flow design the real unlock.
In most enterprises, support flows are static. They follow ITIL-prescribed paths, defined once, rarely adapted, and invisible to the people actually working within them. Even as automation has increased, the logic behind it has remained locked in tickets, scripts, or rigid platform workflows.
If we want AI agents to become scalable contributors—not fragile bots—we need to make support flows observable, adaptive, and co-designed.
This is where Intelligent Flow Engineering (IFE) becomes more than architecture. It becomes a craft. IFE means designing flows that:
Embedded Case Example: Elevating the Endpoint Support Loop
At a large European insurance firm, the Endpoint Support team had long struggled with recurring device performance issues—slow logins, profile corruption, and failed patching. Each case followed the same dance: incident raised, script run, reboot requested, root cause unclear.
But in mid-2024, the team piloted a new model. Using Microsoft Intune, ServiceNow Flow Designer, and Windows Autopilot diagnostics, they created a diagnose-and-remediate flow embedded directly in the Virtual Agent.
Here’s how it worked:
Here’s what changed:
They didn’t just fix issues faster—they transformed how issues were recognised, tracked, and elevated for structural change.
This is what happens when flow design becomes the script, and support teams become the writers.
Flow Evolution Timeline – From Execution to Cognition
This isn’t just about better automation. It’s about building a runtime system of intelligence—where agents can act with context, people can guide outcomes, and flows become both the instruction and the insight.
When we shift from “ticket follows process” to “flow guides intelligence,” support becomes more than reaction. It becomes a learning surface, a design platform, and the place where enterprise cognition begins to take form.
5. Redesigning the Maintenance Mindset: From Reactive Fixes to Intelligent Continuity
For decades, maintenance has been synonymous with reactivity: something breaks, someone fixes it. Even with the rise of preventive maintenance, the core construct remained linear—detect, diagnose, resolve. The promise of zero-touch or lights-out operations has lingered on PowerPoint slides, but rarely moved beyond a tightly scoped automation loop.
What’s changing now is not just the toolset, but the mindset. Maintenance is no longer just about “keeping the lights on.” It’s becoming a live optimisation layer, an intelligent loop where fixes, forecasts, and functional upgrades blend into one adaptive system. This evolution builds directly on our previous framing of the enterprise support landscape as a test bed for intelligence, not just a cost centre.
The Architecture Behind It
This shift is powered by three interlocking components:
Maintenance, under this model, is no longer an afterthought—it’s a design artefact and an observability surface. The intelligence we embed here doesn’t just stabilise operations. It teaches the enterprise how to improve.
Embedded Example: Proactive Capacity Tuning in Cloud Infrastructure
At a European food manufacturing client, infrastructure maintenance had long been governed by static thresholds and reactive escalations. Peak season events would trigger war rooms, not scale plans.
But in 2024, the InfraOps team rewired their model.
Using a combination of Azure Monitor, Log Analytics, and an OpenAI-powered agent running in a secure container, they introduced a proactive flow for capacity tuning:
What changed:
This wasn’t just predictive maintenance. It was adaptive optimisation, driven by an interplay of data, flows, and learning agents.
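The difference between a static threshold and a proactive check can be shown with a small sketch. The source does not say how the agent forecast demand, so the least-squares trend line below is purely a stand-in, and the function names and sample values are invented for illustration.

```python
from typing import Sequence


def linear_forecast(samples: Sequence[float], steps_ahead: int) -> float:
    """Extrapolate a least-squares trend line fitted to evenly spaced samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


def static_alert(samples: Sequence[float], limit: float) -> bool:
    """Reactive check: fires only once the limit has already been reached."""
    return samples[-1] >= limit


def should_scale(samples: Sequence[float], limit: float, steps_ahead: int) -> bool:
    """Proactive check: fires when the trend is projected to reach the limit."""
    return linear_forecast(samples, steps_ahead) >= limit


cpu_percent = [50.0, 55.0, 60.0, 65.0, 70.0]   # invented utilisation samples
```

With these samples the static check at an 80% limit stays silent, while the trend-based check three intervals ahead already asks for capacity. A real flow would use a richer forecast and route the decision through the same approval and policy controls as any other change.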
Why This Matters Now
As enterprises increase their reliance on interconnected platforms and distributed environments, downtime costs are no longer just financial—they're reputational, regulatory, and operational. But the real opportunity lies in rethinking support and maintenance not as overhead—but as the runtime memory of the business.
We’ve spent years underinvesting in maintenance functions, outsourcing them to control costs and chase SLAs. But these very teams see every exception, workaround, patch, and regression.
They are the frontline of insight—and with the right architecture, they can become the first responders to complexity and the pilots of enterprise cognition.
By shifting from reactive to intelligent continuity, we unlock more than efficiency. We activate the feedback system between design, operation, and learning—and finally make maintenance a strategic layer of the intelligent enterprise.
6. From Playbook to Practice: Operationalising the New Support Model
The shift to intelligent maintenance isn’t just a technology play—it’s an operational shift. To realise its full value, organisations must restructure how support is planned, executed, and continuously improved. This means evolving not just the systems, but the roles, processes, and incentives around them.
Where Traditional Support Models Fall Short
Legacy support models are often constrained by:
Even when automation is added, it’s typically task-based and isolated, not system-aware or context-sensitive.
What Operational Excellence Looks Like in This New Model
The intelligent support model introduces a set of new capabilities and expectations:
Support teams are no longer process executors. They become flow designers, data interpreters, and platform collaborators—contributing directly to both operational stability and platform evolution.
The Architecture of Practice
To embed this in daily operations, a few foundations are essential:
In short, practice becomes a living system—versioned, inspectable, and improvable over time.
Human Impact: The Talent Shift
This model only works if we evolve our view of talent. Maintenance engineers and support analysts are no longer "non-core" workers.
They are:
This requires a deliberate shift in training, empowerment, and recognition. Support must be embedded within agile delivery models, treated with the same investment in tooling, skills, and retrospectives as any product team.
7. Building the Observability Core: Signals, Feedback, and Data Contracts
If flow is the script and agents are the actors, then observability is the stage—the place where context, continuity, and control converge.
The move toward intelligent support cannot succeed without a foundational investment in a robust observability fabric. This layer transforms support from a reactive sequence of break-fix actions into a closed-loop system of signal, sense, and respond.
Observability is Not Just Monitoring
Traditional monitoring checks if things are working. Observability asks: Why isn’t it working? In the intelligent enterprise, it also asks: What can we learn from it?
To do this well, organisations must treat observability as a design concern, not an afterthought.
The key shift is from instrumentation of systems to instrumentation of flows:
Data Contracts and Feedback Loops
What enables this shift is the emergence of data contracts—agreements between producers (apps, services, platforms) and consumers (agents, analysts, flows) on the shape, quality, and semantics of data.
Data contracts:
This is where our prior concept of Data Agents re-emerges: they don’t just query telemetry—they negotiate it. They align signals across services, validate context windows, and identify what feedback needs to return into the CTB backlog.
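A data contract in this sense can be as simple as a declared set of fields that a consumer checks before trusting a record. The sketch below is one possible shape under my own assumptions: the field names, the `FieldRule` structure, and the rules themselves are hypothetical and not taken from any specific platform.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FieldRule:
    types: Tuple[type, ...]             # shape: accepted Python types
    required: bool = True
    min_value: Optional[float] = None   # quality: lowest acceptable value
    unit: str = ""                      # semantics: what the number means


# Hypothetical contract for a latency signal emitted by an API service.
LATENCY_CONTRACT: Dict[str, FieldRule] = {
    "service": FieldRule((str,)),
    "latency_ms": FieldRule((int, float), min_value=0, unit="milliseconds"),
    "trace_id": FieldRule((str,)),
    "region": FieldRule((str,), required=False),
}


def violations(record: Dict[str, Any], contract: Dict[str, FieldRule]) -> List[str]:
    """Return a list of contract violations; an empty list means the record conforms."""
    problems: List[str] = []
    for name, rule in contract.items():
        if name not in record:
            if rule.required:
                problems.append(f"missing field: {name}")
            continue
        value = record[name]
        if not isinstance(value, rule.types):
            problems.append(f"wrong type: {name}")
        elif rule.min_value is not None and value < rule.min_value:
            problems.append(f"below minimum: {name}")
    return problems
```

An agent that receives a record with violations can refuse to act on it and report the gap back to the producer, rather than reasoning over a signal it cannot trust.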
Embedded Example: Signal Drift in API Performance
At a global consumer goods company, several APIs powering partner portals began showing increased latency. Traditional APM flagged the issue but offered little insight beyond stack traces.
An observability agent detected signal drift—a subtle change in call sequencing patterns. Using a telemetry data contract, it validated which downstream systems were contributing to response delays.
Instead of escalating a vague ticket, the system:
This didn’t just fix the incident—it strengthened the platform.
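One simple way to think about drift in call sequencing is to compare how often each caller-to-callee transition appears in a baseline window versus the current one. The sketch below uses total variation distance between the two distributions; this is my own illustrative choice, not the method used in the example, and the service names are invented.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Trace = Sequence[str]   # ordered list of services called in one request


def transition_shares(traces: List[Trace]) -> Dict[Tuple[str, str], float]:
    """Share of each consecutive (caller, callee) pair across all traces."""
    counts: Counter = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def drift_score(baseline: List[Trace], current: List[Trace]) -> float:
    """Total variation distance: 0.0 means identical patterns, 1.0 fully disjoint."""
    b = transition_shares(baseline)
    c = transition_shares(current)
    pairs = set(b) | set(c)
    return 0.5 * sum(abs(b.get(p, 0.0) - c.get(p, 0.0)) for p in pairs)


baseline = [["gateway", "auth", "orders"]] * 4
current = [["gateway", "auth", "orders"]] * 2 + [["gateway", "orders", "auth"]] * 2
```

Here half of the current requests call the services in a different order, and the score moves from 0.0 to 0.5. The transitions that contribute most to the score point to the downstream systems worth examining first.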
Why This Layer Matters
Without a mature observability core, agentic systems become brittle, blind, and biased. Worse, they risk hallucinating actions without context, introducing new tech debt while attempting to fix the old.
Done right, observability becomes:
In this model, support becomes insight at scale—with every intervention teaching the system, informing the backlog, and raising the bar for how change is absorbed and intelligence is embedded.
8. A New Compact: Repositioning Support as an Engine of Intelligence
It’s time we stop seeing support and maintenance as the leftovers of transformation. In an AI-first enterprise, these functions become core to how the business learns, adapts, and grows.
We’ve built systems that can recover, but now we must build systems that can teach. Every flow, every signal, every agent intervention becomes part of a living body of operational intelligence. This is where Intelligent Flow Engineering (IFE) no longer sits in the realm of design—it becomes the day-to-day mechanism of running, refining, and reshaping the enterprise.
Support Is the System
This article has argued that support and maintenance should no longer be viewed as cost centres or outsourced necessities. When reframed through the lens of agentic automation, observability, and IFE, they become:
Rather than react to problems, the intelligent support function prevents, predicts, and prioritises—acting as a strategic filter between what is and what could be.
What This Demands
To unlock this model, we must:
Support becomes the heartbeat of change, and talent in this domain shifts from “operating cost” to “change catalyst”.
The New Compact
This is the compact the AI-first enterprise must now make:
We will no longer wait for issues to teach us. We will use every intervention as a trigger for intelligence. We will treat support as a platform for elevation, not a dumping ground for cost.
In doing so, we shift from treating RTB as something to control, to something that controls the quality of the enterprise itself.