#5 - The Digital Twin's Hidden Engine: Your Data Refinery
Converting raw data into useful intelligence

Article Summary

The Digital Twin is the ultimate goal of digital transformation in Oil & Gas, but it remains a fantasy without a robust data foundation. In this article, I argue that raw OT data is like crude oil: abundant and valuable, but of little use until it's refined. A Digital Twin is only as smart as the data it's fed, and its success hinges on a rigorous and unglamorous "data refinery process." I break down the three critical stages required to turn this "digital crude" into the high-grade fuel that powers a truly intelligent twin: a) extracting raw data from legacy and modern edge systems, b) building a secure digital pipeline for transportation, and, most crucially, c) refining the data through contextualization to create actionable intelligence.

This is the foundational work that turns the fantasy into a reality.

In my previous article “You Dont Have a Digital Twin Strategy. You Have a Digital Twin Fantasy!”, I contrasted the alluring vision of the "push-button" oilfield with a critical reality check: too many organizations are chasing the dazzling Digital Twin, attempting to build a penthouse on a "swampy foundation." This swamp is the messy legacy of siloed IT and OT operations, a cultural chasm between teams, and a chaotic mess of uncontextualized data.

The shiny Digital Twin is merely the outcome. The real, foundational work is the "data refinery process". Just like crude oil, raw OT data is abundant and immensely valuable, but it's a sludgy, unusable mess in its raw state. A Digital Twin is only as smart as the data it's fed. Without refining this digital crude, the twin remains a hollow shell, a beautiful visualization disconnected from reality and therefore from any actionable intelligence.

Building this "digital plumbing" takes three essential stages: extraction at the edge, secure transportation and, most critically, contextualization.

Step 1: Extracting the Digital Crude at the Edge

The data journey begins at the edge - the physical assets themselves. The edge, whether a PLC, a DCS or other IIoT device, acts as the sensory "nervous system," providing the raw data that fuels the entire ecosystem. However, extracting this data from legacy O&G infrastructure is far from simple. The reality on the ground often involves 20+ year-old DCS and PLC systems, which, while the lifeblood of the industry, were designed for reliability in isolation, not for modern data integration.

These legacy systems present a significant challenge due to insecure protocols and a lack of modern security controls. This is where the edge plays its crucial role.

Edge gateways, positioned close to the assets, perform initial data filtering, aggregation, and critical protocol conversion. This is essential for keeping latency low enough for real-time control, reducing the bandwidth cost of transmitting raw data, and enabling autonomous operation for remote, intermittently connected assets.
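To make the filtering and aggregation idea concrete, here is a minimal Python sketch of what a gateway might do before sending anything upstream. It assumes samples arrive as (timestamp in seconds, value) pairs; the function names, the deadband of 0.5 and the 3-second window are illustrative choices of mine, not settings from any particular gateway product.

```python
from statistics import mean

def deadband_filter(samples, deadband):
    """Keep a sample only if it moved more than `deadband` from the last kept value."""
    kept = []
    last = None
    for ts, value in samples:
        if last is None or abs(value - last) > deadband:
            kept.append((ts, value))
            last = value
    return kept

def aggregate(samples, window_s):
    """Average samples into fixed windows of `window_s` seconds, keyed by window start."""
    buckets = {}
    for ts, value in samples:
        start = ts - (ts % window_s)
        buckets.setdefault(start, []).append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}

# Hypothetical raw readings: (timestamp in seconds, value)
raw = [(0, 50.0), (1, 50.1), (2, 50.05), (3, 52.0), (4, 52.1), (5, 49.0)]

filtered = deadband_filter(raw, deadband=0.5)  # report-by-exception
averaged = aggregate(raw, window_s=3)          # one value per 3-second window

print(f"{len(raw)} raw samples -> {len(filtered)} after deadband filtering")
print(averaged)
```

Either technique reduces what has to cross a constrained link. Which one fits depends on whether downstream consumers need every significant change or only a periodic summary.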

Step 1: Extracting data from PLCs, DCSs, SCADAs etc (ai generated)

Step 2: Building the Secure Digital Pipeline

Once extracted, the raw data needs a secure and efficient pathway to move from the OT domain to the IT domain for integration and analysis. This is the crux of the "digital plumbing," and it must be built on a foundation of modern standards and a robust security architecture.

A non-negotiable step is the shift from insecure legacy protocols to standards designed for an interconnected world. Two of the most popular are:

  • OPC UA (Open Platform Communications Unified Architecture): The modern standard for secure, interoperable data exchange, with cryptography built into its specifications.

  • MQTT (Message Queuing Telemetry Transport): A lightweight and efficient protocol that has become the de facto standard for IIoT, perfect for connecting remote or constrained devices.
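As a rough illustration of why MQTT suits constrained links, the sketch below builds a hierarchical topic and a compact JSON payload using only the Python standard library. The topic layout (site/area/asset/tag) and the short payload keys are my own assumptions for illustration; they are not a standard such as Sparkplug B, and actually publishing the message would need an MQTT client library, which is left out here.

```python
import json

def build_message(site, area, asset_id, tag, value, unit, ts):
    """Return (topic, payload_bytes) for one reading; the layout is a hypothetical convention."""
    levels = [site, area, asset_id, tag]
    for level in levels:
        # "/" separates topic levels; "+" and "#" are wildcards, not valid in a publish topic
        if not level or any(ch in level for ch in "/+#"):
            raise ValueError(f"invalid topic level: {level!r}")
    topic = "/".join(levels)
    # Compact separators keep the payload small for low-bandwidth links
    payload = json.dumps({"v": value, "u": unit, "ts": ts}, separators=(",", ":"))
    return topic, payload.encode("utf-8")

topic, payload = build_message(
    "site-a", "train-3", "PMP-1138", "vibration", 72.4, "Hz", 1700000000
)
print(topic)
print(payload)
```

A consistent topic hierarchy also pays off later: it carries part of the context (site, area, asset) that the refinery stage would otherwise have to reconstruct.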

Beyond protocols, the architecture is paramount, and defense-in-depth is the framework that matters. (Check out my prior article “Digital transformation bridging the IT-OT divide within O&G industry”; see also the recommendations of ISA/IEC 62443 for cybersecurity.) This involves:

  • Robust Network Segmentation: Breaking the network into smaller, isolated zones separated by firewalls. A flat network is an indefensible network.

  • The Industrial Demilitarized Zone (IDMZ): A non-negotiable secure buffer that prevents any direct communication between IT and OT, ensuring all data flows are controlled and monitored.

  • Unidirectional Gateways (Data Diodes): For the most critical systems, these hardware devices enforce a one-way data flow, making it physically impossible for an attack to move from IT into a critical OT system. (Data diodes are neither universally popular nor adopted. I am neutral about data diodes at this time; please leave comments if you have strong opinions for or against them.)
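The three controls above boil down to a default-deny rule: a flow between zones is allowed only if it is explicitly listed. Here is a deliberately simplified Python sketch of that idea. The zone names and the allow-list are assumptions for illustration; real firewall policies also constrain ports, protocols and specific hosts.

```python
# Hypothetical zones: "ot" (control network), "idmz" (buffer), "it" (enterprise)
ALLOWED_CONDUITS = {
    ("ot", "idmz"),   # OT publishes data outward into the IDMZ
    ("idmz", "it"),   # IT reads replicated data from the IDMZ
    ("it", "idmz"),   # IT may reach IDMZ services, but never OT directly
}

def is_flow_allowed(src_zone, dst_zone):
    """Default-deny: traffic within a zone is fine, anything else must be listed."""
    return src_zone == dst_zone or (src_zone, dst_zone) in ALLOWED_CONDUITS

for flow in [("ot", "idmz"), ("idmz", "it"), ("it", "ot"), ("idmz", "ot")]:
    verdict = "allow" if is_flow_allowed(*flow) else "deny"
    print(f"{flow[0]} -> {flow[1]}: {verdict}")
```

Leaving ("idmz", "ot") off the list mimics the one-way behaviour of a data diode in policy terms only. A real diode enforces that direction in hardware rather than in a rule table.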

Step 2: Building a secure pipeline (reusing the image from a previous article because it fits here so well) (ai generated)

This robust digital pipeline ensures that the raw data, once captured, travels securely and reliably to its next destination: the refinery.

Step 3: The Data Refinery – Where Value is Created

This is the most crucial step in the entire process. Raw OT data is a mess, especially when gathered from legacy edge devices. Contextualization is the act of refining this digital crude into high-grade fuel, transforming raw numbers into meaningful information. Without this, the Digital Twin fails.

This refinement involves integrating diverse data types to add layers of meaning that a raw sensor reading alone cannot provide:

  • Enrichment with IT Data: This is where IT/OT integration becomes tangible. Raw operational data is combined with enterprise-level IT data from systems like Enterprise Asset Management (EAM), Product Lifecycle Management (PLM), and Enterprise Resource Planning (ERP).

  • Semantic Integration: Beyond connecting data, it's about ensuring different systems "speak the same language." This involves bridging the "semantic gap" between how different systems or data models define different aspects of an asset, creating a common, machine-readable language.

  • Georeferencing: Especially vital for O&G's sprawling assets, this is the process of assigning a real-world coordinate system to a digital model, ensuring an asset model of a compressor station is placed in its precise location on a GIS map of a pipeline.
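A minimal Python sketch of these three layers might look like the following. Everything in it is hypothetical: the tag name, the lookup tables standing in for a tag dictionary, an EAM system and a GIS layer, and the placeholder coordinates. In practice each lookup would be a query against the real system of record.

```python
# Semantic layer: map a cryptic control-system tag to a shared vocabulary
TAG_MAP = {
    "41-VT-1138.PV": {
        "asset_id": "PMP-1138",
        "measurement": "vibration_frequency",
        "unit": "Hz",
    },
}

# IT enrichment: stand-in for an EAM lookup
ASSET_REGISTRY = {
    "PMP-1138": {"name": "Pump B", "site": "LNG Train 3", "last_maintained": "July 10"},
}

# Georeferencing: stand-in for a GIS lookup (placeholder coordinates, not a real location)
ASSET_LOCATIONS = {
    "PMP-1138": {"lat": 0.0, "lon": 0.0},
}

def contextualize(tag, value):
    """Turn a raw (tag, value) pair into an enriched record, or None if the tag is unmapped."""
    semantic = TAG_MAP.get(tag)
    if semantic is None:
        return None  # unmapped tags stay "crude" until someone models them
    asset_id = semantic["asset_id"]
    return {
        "value": value,
        **semantic,
        **ASSET_REGISTRY.get(asset_id, {}),
        "location": ASSET_LOCATIONS.get(asset_id),
    }

record = contextualize("41-VT-1138.PV", 72.4)
print(record)
```

The code itself is trivial. The hard part is building and maintaining the mapping tables, which is exactly the unglamorous refinery work this article is about.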

Step 3: Using IT to augment OT data (ai generated)

This transformation is what turns a simple reading into actionable intelligence. For example:

A raw reading of "72.4 Hz" becomes...

"Pump B at LNG Train 3 (Asset ID: PMP-1138), last maintained on July 10th, is vibrating at 72.4 Hz, which is 15% above its baseline for current operating conditions, indicating a high probability of bearing failure within the next 7-10 days."

This is the difference between data and intelligence.
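The "15% above its baseline" part of that statement is simple arithmetic once a baseline exists. The Python sketch below shows it, assuming a baseline of 62.96 Hz (back-calculated from the example so the numbers line up, not a real pump specification) and an illustrative 10% alert threshold. The remaining-life estimate in the example would need a trained prognostic model, which is not attempted here.

```python
def percent_above_baseline(value, baseline):
    """Deviation of `value` from `baseline`, as a percentage of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (value - baseline) / baseline * 100.0

def assess(value, baseline, alert_pct=10.0):
    """Flag a reading whose deviation meets or exceeds `alert_pct` (hypothetical threshold)."""
    deviation = percent_above_baseline(value, baseline)
    return {"deviation_pct": round(deviation, 1), "alert": deviation >= alert_pct}

# Assumed baseline for the current operating conditions
result = assess(72.4, baseline=62.96)
print(result)
```

Note that the baseline is itself a product of contextualization: it depends on knowing which asset the reading belongs to and what operating conditions apply.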

The Peril of Low-Grade Fuel

Without this rigorous refining process, you are feeding your expensive AI models and Digital Twins low-grade fuel, and you will get poor performance in return. Your predictions will be flawed, your optimizations will be based on an inaccurate reality, and your Digital Twin will remain a hollow shell - an impressive visualization that cannot deliver measurable business value.

Digital Twin is the fruit of Data Refining done well (ai generated)

Conclusion: The Real Work is the Foundation

The Digital Twin is not a strategy; it is the proof that you've successfully built the bridge. It is the ultimate application that consumes the data made available by IIoT, cloud, and AI, providing the "why" for investing in this underlying infrastructure.

For the oil and gas industry, the foundational work of the data refinery process is what creates the robust basis for a long-lasting Digital Twin. The true competitive advantage isn't found in the most visually impressive Digital Twin, but in having the most resilient, secure, and unified data flow between your physical and digital worlds.

#DigitalTwin, #OilAndGas, #IIoT, #ITOTIntegration, #IndustrialAI

(written and illustrated with the help of genAI.)

References and Further Reading

  1. You Dont Have a Digital Twin Strategy. You Have a Digital Twin Fantasy!

  2. Digital transformation bridging the IT-OT divide within O&G industry

  3. A Practical Approach to Adopting the IEC 62443 Standards

  4. Grieves, M., et al. (n.d.). Digital Twins: The authoritative guide to models, systems, and applications.
