The Next Step for Automation: Systems That Validate Their Own Operational Integrity
The Revolution Begins With Built-In Assurance
To structurally control cyber-physical risks, we must redesign how we think about automation functions. These systems no longer operate in neutral or trusted environments. They now function in malicious, contested spaces — where misuse, manipulation, or deception is not a possibility, but a given.
Control and safety platforms must therefore be engineered with intrinsic robustness and resilience — not just to maintain uptime, but to prevent them from being turned against the communities they serve.
This demands more than hardening. It requires the ability to continuously monitor whether operational integrity is still intact — and to respond when it is not.
1. Thesis: Without Operational Integrity, There Is No Real Observability, Controllability, or Operability
Operational integrity is the systemic coherence between control actions, process state, and intent, both as designed and as operated. If this coherence breaks, all three COO pillars (observability, controllability, operability) collapse, often while the system appears to function normally.
What Is Operational Integrity?
Operational integrity is not a feature — it is a condition of systemic coherence. It reflects whether the automation system, the physical process, and the operational intent are truly aligned.
This condition spans four integrated dimensions:
1. Automation and Process Configuration
Control logic and safety layers
Physical design limits and equipment ratings
Inherent process constraints and dynamics (e.g. surge behavior, kinetics)
2. Interacting Functions and Infrastructure
Interdependencies between control, safety, optimization, and diagnostics
Communication latency, determinism, and data reliability
Planning system integration and business logic
3. Human and Organizational Capability
HMI visualization and alarm design
Operator cognitive load and intervention timing
4. Feedback Integrity and Change Control
Sensor calibration, time sync, and spoof detection
Feedback loop closure and effectiveness
Integrity of critical safeguards (SIS, relief, fallback modes, manual intervention)
Secure change management for logic, thresholds, and configuration
🟢 Operational integrity fails when any of these break alignment — even if the system appears to function. It is the foundation on which observability, controllability, and operability must rest.
2. Observability Is Not Data — It's Functional Process Awareness
In industrial operations, observability is often equated with the presence of data. If values are updating, trend lines are stable, and no alarms appear abnormal, operators and engineers frequently assume that the process is fully observable.
This assumption is widespread — and dangerous.
Observability is not just an HMI function. Advanced Process Control (APC), process analyzers, quality monitoring systems, and custody transfer meters all rely on continuous data streams to interpret process state. But all of these functions are built on the assumption that the data reflects the physical truth.
If operational integrity is compromised, these functions continue to operate — but their conclusions become misleading.
The presence of values does not prove the validity of the process state they represent.
Example: The Illusion Caused by Sensor Re-Characterization
Consider a heat exchanger where temperature must remain below 160°C.
On screen:
The operator sees a stable temperature of 120°C.
Trend lines are flat.
No abnormal alarms are active.
In reality:
The temperature transmitter has been re-characterized.
Its internal scaling or offset has been altered.
The actual process temperature is 170°C, approaching unsafe limits.
What Is Re-Characterization?
Re-characterization refers to the unauthorized or undetected modification of how a sensor or actuator translates real-world physical inputs or control outputs into digital process values. This can occur within the field device or at the signal conditioning layer between the field and the controller.
This includes changes such as:
Zero/span values or scaling factors
Square root extraction for flow transmitters (often silently enabled/disabled)
Linearization curves, compensation tables, or lookup profiles
Internal range configuration or device-specific transfer functions
Field calibration offsets injected via handhelds or remote access tools
Signal conditioning blocks embedded in analog input modules or digital protocol drivers
Unlike spoofing or device failure, re-characterization does not interrupt signal availability. The signal continues to update — just no longer truthfully.
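As a minimal sketch of how little it takes, assume a hypothetical 4-20 mA temperature transmitter ranged 0-200 °C and a control system that still scales the signal against that original range; the numbers mirror the heat-exchanger example above and are purely illustrative:

```python
# Minimal sketch: how re-characterizing a transmitter's span distorts the
# reported value while the signal itself stays "healthy".
# Hypothetical 4-20 mA transmitter, originally ranged 0-200 degC.

def transmitter_output_mA(actual_degC: float, lrv: float, urv: float) -> float:
    """Field device: maps the physical temperature to a 4-20 mA signal
    using its configured lower/upper range values (LRV/URV)."""
    return 4.0 + 16.0 * (actual_degC - lrv) / (urv - lrv)

def dcs_reading_degC(current_mA: float, lrv: float, urv: float) -> float:
    """Control system: scales 4-20 mA back to engineering units using the
    range it *believes* the transmitter is configured for."""
    return lrv + (current_mA - 4.0) / 16.0 * (urv - lrv)

actual_temp = 170.0            # real process temperature (above the 160 degC limit)
dcs_lrv, dcs_urv = 0.0, 200.0  # range the DCS and HMI were engineered against

# Honest device: transmitter and DCS agree on the range.
honest = dcs_reading_degC(transmitter_output_mA(actual_temp, 0.0, 200.0), dcs_lrv, dcs_urv)

# Re-characterized device: URV silently changed from 200 to 283.3 degC
# (e.g. via a handheld or remote configuration tool). The DCS is unaware.
tampered = dcs_reading_degC(transmitter_output_mA(actual_temp, 0.0, 283.3), dcs_lrv, dcs_urv)

print(f"Honest reading:   {honest:.1f} degC")    # ~170.0 -> above the limit, would alarm
print(f"Tampered reading: {tampered:.1f} degC")  # ~120.0 -> looks comfortably safe
```

The loop, the historian, and the alarm system all see a perfectly healthy 120 °C signal; only the physical range mapping has changed.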
Why It's Dangerous:
No alarms are triggered
Control logic sees no fault
Operator displays update as expected
Safety systems remain armed but unaware
The process can drift into unsafe or unstable territory while all system layers — from control to optimization — remain blind to the deviation.
🟢 Re-characterization is not noise — it is a structurally permitted, silently weaponized distortion of truth.
3. Controllability Requires Operational Integrity
Effective controllability assumes:
Commands cause expected actuator behavior,
The process reacts predictably,
Feedback reflects real results.
When these assumptions break — due to silent overrides, blocked actuator motion, or spoofed measurements — controllability degrades into illusion. Control actions still occur, but no longer produce functional influence.
Silent Loss Scenarios in Smart Devices:
Fixed Output Mode: controller issues setpoints, but actuator holds position.
Write Lock: command changes are ignored, no alarms triggered.
Travel Limit Clamp: the valve moves, but only within a sabotaged range that still looks safe.
Simulated Feedback: loop appears stable — but no actual correction happens.
🟢 Without operational integrity, controllability vanishes — even as control logic keeps running.
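A minimal sketch of the kind of command-effect check this implies, assuming independent access to the controller output, the positioner's reported travel, and a downstream flow measurement (all signal names, gains, and tolerances below are illustrative, not taken from any specific platform):

```python
# Minimal sketch: flagging silent loss of controllability by comparing the
# commanded output, the reported actuator position, and an independently
# measured process effect. Thresholds and signal names are assumptions.

from dataclasses import dataclass

@dataclass
class LoopSample:
    controller_output_pct: float   # what the controller asked for
    reported_position_pct: float   # what the positioner claims it did
    flow_m3h: float                # independently measured process effect

def check_command_effect(prev: LoopSample, curr: LoopSample,
                         pos_tol_pct: float = 2.0,
                         flow_gain_m3h_per_pct: float = 1.5,
                         flow_tol_m3h: float = 5.0) -> list[str]:
    """Return integrity findings for one control loop between two samples."""
    findings = []
    d_out = curr.controller_output_pct - prev.controller_output_pct
    d_pos = curr.reported_position_pct - prev.reported_position_pct
    d_flow = curr.flow_m3h - prev.flow_m3h

    # Fixed output / write lock: command moved, reported position did not.
    if abs(d_out) > pos_tol_pct and abs(d_pos) < pos_tol_pct / 2:
        findings.append("actuator did not follow command (fixed output or write lock?)")

    # Simulated feedback: position claims to track, but the process does not
    # respond roughly as the (assumed) steady-state gain predicts.
    expected_d_flow = d_pos * flow_gain_m3h_per_pct
    if abs(d_pos) > pos_tol_pct and abs(d_flow - expected_d_flow) > flow_tol_m3h:
        findings.append("reported position inconsistent with measured process response")

    return findings
```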
4. Operability: The Disguised Casualty of Integrity Loss
Operability refers to the system’s sustained ability to perform its intended production function — within safe, stable, and efficient bounds. It is not simply whether the plant is running, but whether it is running in a state that is valid, trusted, and aligned with design and operational goals.
When operational integrity is compromised, operability becomes deceptive.
⚠️ The system may appear fully functional:
Controllers are executing,
Sensors are streaming,
Product is flowing.
Yet the underlying coordination between inputs, actions, and outcomes is no longer trustworthy.
Examples of Silent Operability Failure:
Quality assurance systems confirm products “in spec,” while receiving falsified measurements or mis-calibrated data streams.
Advanced Process Control (APC) continues to optimize, but steers the process toward unstable or hazardous operating points due to manipulated feedback.
Safety Instrumented Systems (SIS) remain armed, but no longer detect critical precursors because of spoofed or suppressed sensor data.
Production reports and KPIs reflect nominal performance, while energy intensity, emissions, or off-spec drift rise unnoticed.
Operators monitor live trends, unaware that setpoints have shifted or constraints have been silently lifted.
🟢 These are not faults in functionality — they are fractures in trust. The process keeps moving, but not under conditions you would knowingly approve.
True operability requires validated integrity — not just motion. A system that is available, controllable, and observable still fails if the underlying integrity that connects those layers has been silently lost.
When operational integrity is lost, operability continues — but no longer on your terms.
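One way to make that loss visible is to reconcile reported KPIs against an independently metered quantity. The sketch below assumes a hypothetical plant with an accepted specific-energy band; the figures are illustrative only:

```python
# Minimal sketch: reconciling reported production KPIs against independently
# metered energy, so that silent drift shows up as a residual.
# The specific-energy band is an illustrative assumption, not a real plant value.

def energy_intensity_residual(product_tonnes: float,
                              energy_MWh: float,
                              expected_MWh_per_tonne: tuple[float, float] = (0.42, 0.48)) -> str:
    """Compare the implied specific energy consumption with its accepted band."""
    if product_tonnes <= 0:
        return "no production reported; cannot reconcile"
    intensity = energy_MWh / product_tonnes
    lo, hi = expected_MWh_per_tonne
    if intensity < lo:
        return f"intensity {intensity:.2f} MWh/t below band: production figures may be inflated"
    if intensity > hi:
        return f"intensity {intensity:.2f} MWh/t above band: hidden off-spec, recycle, or losses"
    return f"intensity {intensity:.2f} MWh/t within expected band"

# Example: reports say 1,000 t produced, but metered energy implies otherwise.
print(energy_intensity_residual(product_tonnes=1000.0, energy_MWh=540.0))
```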
5. Integrity Layers: From Signals to System Consequence
Industrial systems rely on multiple types of integrity — each with a different scope. But these layers are often treated as independent assurances, rather than as building blocks toward a higher condition: operational integrity.
Let’s unpack each:
🔹 Data Integrity
✅ Ensures:
Signals are transmitted without corruption
Time stamps are synchronized
Protocols verify source and destination
❌ Misses:
Whether values reflect real-world process behavior
Whether sensors are correctly calibrated or have been tampered with
Any manipulation before signal transmission
🔹 System Integrity
✅ Ensures:
Devices boot with verified firmware
Software environments remain unchanged
No unauthorized binaries or memory alterations
❌ Misses:
Whether logic or data still produce intended physical outcomes
Any functional degradation due to valid (but risky) configurations
Logic errors that pass hash checks but violate design intent
🔹 Control Integrity
✅ Ensures:
Control logic runs as written
Execution follows timing and prioritization rules
Parameters are within authorized ranges
❌ Misses:
Whether logic matches current process needs
Whether manipulated inputs distort closed-loop behavior
Override modes, stalled actuators, or faked feedback
🔹 Safety Integrity
✅ Ensures:
SIL-rated functions activate under design-defined conditions
Fault tolerance and diagnostics meet reliability targets
Safety response time is verified
❌ Misses:
Spoofed or suppressed inputs that delay or disable triggering
Dependencies on compromised control logic
Situations where inputs are protocol-valid but physically false
🔵 Operational Integrity (Superset Condition)
✅ Ensures:
Data integrity is present
Systems are uncompromised
Control logic runs correctly
Safety functions are available
AND — real-world process behavior matches design and operational intent
Operational integrity is the only layer that binds signal, system, logic, and physical outcome into a single testable truth.
❌ Without it, all lower integrity types can function — while the system drifts toward failure.
🟢 Think of operational integrity not as another layer — but as the only one that closes the loop between system execution and physical reality.
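The distinction can be made concrete. In the sketch below, each lower layer passes its own check while a simple physics-based residual (a hypothetical orifice-flow relation) exposes the loss of operational integrity; all functions, constants, and tags are illustrative assumptions:

```python
# Minimal sketch: each lower integrity layer can pass its own check while an
# operational-integrity check, which ties signals back to physics, still fails.

import math

def data_integrity_ok(crc_ok: bool, timestamp_skew_s: float) -> bool:
    return crc_ok and abs(timestamp_skew_s) < 1.0          # signal arrived intact

def system_integrity_ok(firmware_hash: str, approved_hashes: set[str]) -> bool:
    return firmware_hash in approved_hashes                # device runs approved code

def control_integrity_ok(setpoint: float, authorized_range: tuple[float, float]) -> bool:
    lo, hi = authorized_range
    return lo <= setpoint <= hi                            # logic stays in bounds

def operational_integrity_ok(flow_m3h: float, dP_mbar: float,
                             k: float = 12.0, tol_m3h: float = 5.0) -> bool:
    """Does the reported flow agree with what the differential pressure implies?"""
    implied_flow = k * math.sqrt(max(dP_mbar, 0.0))
    return abs(flow_m3h - implied_flow) < tol_m3h

# All lower layers pass...
print(data_integrity_ok(True, 0.2),
      system_integrity_ok("a1b2", {"a1b2"}),
      control_integrity_ok(80.0, (0.0, 100.0)))             # True True True
# ...while the physics-based check exposes a re-characterized flow signal.
print(operational_integrity_ok(flow_m3h=80.0, dP_mbar=16.0))  # 12*sqrt(16)=48 -> False
```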
6. Availability Is a Cost Metric — Not a Safety Assurance
In most automation strategies, availability is treated as the top performance objective. This reflects a cost-centric view: uptime means production, and downtime means lost revenue.
But this focus is dangerously narrow.
A system that is always available but operating without integrity is not productive — it is exposed.
🔄 Availability Optimizes Cost — Not Trust
When availability is prioritized above all, systems are engineered to:
Stay online through faults
Suppress alarms that “create noise”
Fail into manual or degraded modes without halting operation
But none of this guarantees that the process is:
Physically stable
Within engineered limits
Or functionally safe
🟢 You remain connected — but not necessarily in control.
⚠️ The Real Cost of Losing Integrity
Loss of availability causes visible and quantifiable loss:
Production stops
Delivery is delayed
Revenue takes a hit
But loss of operational integrity causes invisible and escalating damage:
🛠️ Equipment fatigue or destruction
🌫️ Environmental harm from undetected drift
👷 Safety degradation from corrupted decision paths
📉 Reputational collapse if the incident becomes public
The cost of being unavailable is high. But the cost of running falsely available is often catastrophic.
✅ Availability Keeps You Running.
Operational Integrity Keeps You Safe.
Until automation systems are designed to verify and enforce operational integrity, availability remains a misleading — and sometimes dangerous — proxy for trust.
7. Conclusion: Control Systems Don’t Fail at the Bottom — They Fail in Misalignment
“If operational integrity fails, you’re not observing. You’re not controlling. You’re not operating. You’re just connected — to risk.”
Operational integrity is not a sub-discipline. It is the system-wide condition that validates every other function — cyber, safety, optimization, and control.
Until we treat it as such, all COO attributes remain fragile — even when everything appears healthy.
8. Monitoring and Managing Operational Integrity: The Role of a Real OT SOC
A true OT SOC must monitor more than packets and user logins. It must verify that the system’s behavior remains grounded in physical truth and design logic.
What to monitor:
Sensor-to-process coherence (not just I/O faults),
Actuator effectiveness (not just command transmission),
Tampering of thresholds, constraints, calibration,
Behavioral divergence across control, safety, and optimization layers.
🟢 The mission is not to monitor the system. It’s to monitor whether the system is still meaningfully in control.
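As one possible building block, the sketch below compares the value of the same physical measurement as seen by the control, safety, and optimization layers and flags persistent divergence; tag names and tolerances are illustrative assumptions:

```python
# Minimal sketch of a cross-layer coherence check for an OT SOC: the control,
# safety, and optimization layers often see the same physical quantity through
# different signal paths. Persistent divergence between them is an integrity
# signal, not a data-quality nuisance.

from statistics import median

def cross_layer_divergence(readings: dict[str, float], tol: float) -> dict[str, float]:
    """Return per-layer deviation from the cross-layer median, for any exceeding tol."""
    m = median(readings.values())
    deviations = {layer: value - m for layer, value in readings.items()}
    return {layer: d for layer, d in deviations.items() if abs(d) > tol}

# Example: DCS and APC agree, but the SIS input has been suppressed or frozen.
suspects = cross_layer_divergence(
    {"DCS:TI-101": 168.5, "SIS:TI-101A": 121.0, "APC:TI-101.PV": 167.9},
    tol=5.0,
)
print(suspects or "layers coherent")   # {'SIS:TI-101A': -46.9}
```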
Final Recommendation: Vendors Must Embed Independent Operational Integrity Validation
Automation vendors must stop treating security and integrity as bolt-on concerns. They must accept the operational reality: control and safety systems now run in malicious environments — and without built-in checks, their digital functions can be misused to endanger the very communities they were meant to serve and protect.
Too many platforms continue to expand control and optimization features while ignoring the need for intrinsic assurance that those functions behave safely, truthfully, and within design intent.
A system that can be silently misused is not engineered — it’s exposed.
Control and safety platforms must no longer be shipped blind.
Vendors must:
🔧 Introduce native operational integrity validation, not as diagnostics, but as a design principle.
📊 Base checks on compound, independently constructed models, not the same data and thresholds used by control logic.
🧩 Architect validation as structurally separate from the functions it oversees, making it resilient to coordinated manipulation.
These checks must detect:
Command–effect mismatches,
Behavior inconsistent with process physics or safety design,
Silent logic changes, parameter tampering, or field spoofing.
This is not redundancy. This is proof of operational trust — built into the system itself.
🟢 You don’t need two systems. You need one that knows when it’s wrong — and refuses to fake being right.
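To make the idea tangible, here is a minimal sketch of what such an independently constructed check could look like: a validator that maintains its own mass-balance estimate of a tank level from metered flows and compares it with the reported level, on a path the control logic cannot write to. All geometry, tags, and thresholds are illustrative assumptions:

```python
# Minimal sketch of an independently constructed validation model, structurally
# separate from the control logic: it re-estimates a tank level from metered
# in/out flows (mass balance) and compares it with the reported level.

class IndependentLevelValidator:
    def __init__(self, area_m2: float, level_m: float, alarm_m: float = 0.15):
        self.area = area_m2          # tank cross-section used by the model only
        self.estimate = level_m      # the model's own state, never written by the DCS
        self.alarm = alarm_m

    def step(self, inflow_m3h: float, outflow_m3h: float,
             reported_level_m: float, dt_h: float) -> bool:
        """Advance the mass-balance estimate and flag command-independent drift."""
        self.estimate += (inflow_m3h - outflow_m3h) * dt_h / self.area
        residual = reported_level_m - self.estimate
        return abs(residual) > self.alarm   # True = operational-integrity alert

validator = IndependentLevelValidator(area_m2=12.0, level_m=2.0)
# The reported level stays flat while the metered flows say the tank must be filling.
for _ in range(6):
    alert = validator.step(inflow_m3h=30.0, outflow_m3h=10.0,
                           reported_level_m=2.0, dt_h=0.1)
print("integrity alert:", alert, "| model estimate:", round(validator.estimate, 2))
```

Because the validator keeps its own state and its own inputs, a coordinated manipulation of the control path alone cannot silence it.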