Data-Driven Enterprise: from Data Mess to Data Mesh and Data Products
The challenge of becoming a Data-Driven (or AI-Driven) Enterprise
In today’s digital-first world, the ability to capture, process, and act on data is a critical differentiator for enterprises. However, traditional approaches to data integration often hinder organizations from becoming truly data-driven. The cost and complexity of rebuilding legacy systems—or attempting to create a unified Data Fabric, where all current and future data is stored, processed, and made available—are prohibitive for most businesses.
Leading consulting firms propose an alternative: a federated and distributed architectural paradigm. By applying abstraction and service orientation to data, organizations can leverage Data Meshes and Data Products, along with modern open table formats such as Apache Iceberg and Delta Lake.
This article explores the challenges enterprises face in their transformation journey and highlights the role of Data Streaming Platforms in enabling this shift efficiently and cost-effectively.
The Data-Driven Enterprise of 2025
In its January 2022 report, The Data-Driven Enterprise of 2025, McKinsey & Company predicted that by 2025, most employees would use data to optimize nearly every aspect of their work. "Smart workflows and seamless interactions between humans and machines" would be as standard as a corporate balance sheet.
Key traits of the next-generation data-driven enterprises they identified included:
While the transition to data-driven enterprises is certainly in motion and has become a higher priority than ever for many business leaders—especially with the democratization of artificial intelligence and the vision of the AI-Driven Enterprise—several challenges remain. These include:
A New Paradigm for Data Architecture
In its 2023 report, New Data Architectures Can Help Manage Data Costs and Complexity, Boston Consulting Group (BCG) identified key technical challenges businesses must address to become truly data-driven.
They observed that few areas of infrastructure have evolved as rapidly as data management, analytics, and AI—a trend we see today with the emergence of Data Streaming Platforms as a new category of data infrastructure, alongside the rapidly evolving technology stack for Generative AI.
BCG highlights:
“As companies grow, different business units and teams build independent, often siloed data stacks to solve their specific needs, creating a brittle spider web of integration pipelines, data warehouses and lakes, and ML workflows. As companies move up the maturity curve—from data-driven to AI-driven organizations—the architectural complexity and fragmentation inevitably rise”.
They note that an enterprise architecture’s scalability and efficacy depend on two critical functions:
BCG advocates for a federated and distributed data architecture, emphasizing abstraction and service orientation to manage costs and complexity. By adopting Data Meshes and Data Products, along with standards such as Apache Iceberg, enterprises can unlock the value of their data without costly system overhauls—allowing analytics to be applied irrespective of data storage format and location.
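To make the “analytics irrespective of data storage format and location” point concrete, the following is a minimal sketch using PyIceberg to query an Iceberg table through a REST catalog. The catalog name, URI, credentials, table name, and filter are purely illustrative assumptions, not a prescribed setup.

```python
# Minimal sketch: querying an Iceberg table without caring where or how it is stored.
# Catalog name, URI, token, table identifier, and filter are illustrative assumptions.
from pyiceberg.catalog import load_catalog

# Connect to a (hypothetical) REST catalog exposed by the data platform.
catalog = load_catalog(
    "analytics",
    **{
        "type": "rest",
        "uri": "https://catalog.example.com",
        "token": "<token>",
    },
)

# Load a table by namespace and name, then scan a filtered slice into Arrow.
orders = catalog.load_table("sales.orders")
arrow_table = orders.scan(row_filter="order_date >= '2024-01-01'").to_arrow()

print(arrow_table.num_rows)
```

The analytics code above depends only on the table format and the catalog, not on which engine wrote the data or which cloud (or on-premises system) holds the underlying files.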
Other consulting firms and analysts share this view, emphasizing the need for modular, domain-oriented, and scalable data architectures that support real-time processing, advanced analytics, and AI-driven decision-making.
Deloitte’s Five Steps for Building a Data Mesh
In “Are You Ready to ‘Mesh’ Your Data?” (The Wall Street Journal’s CIO Journal, July 2023), Deloitte outlined five fundamental steps for enterprises adopting a Data Mesh, which they describe as a federated data architecture that enables enterprise-wide access to data by:
Deloitte’s five steps to Data Mesh adoption are:
Deloitte’s 2024 white paper, Treating Data as a Product in the Era of GenAI, further explores this evolution from:
The paper explains how organizations can treat data as a product and embark on the "Data Mesh / Data Product journey" to maximize data value.
From Data Mess to Data Mesh: Building a Central Nervous System
As enterprises generate and consume more data, complexity and fragmentation increase, leading to the "brittle spider web" of integrations, data pipelines, and ML workflows described by BCG. This drives up costs, complicates data sharing, and makes it harder to unlock new use cases—such as real-time Generative AI applications.
Traditional batch-based ETLs and black-box integrations create disconnects between:
This disconnect limits real-time insights and prevents automation, making it difficult to leverage transactional data for Generative AI or use analytics to drive real-time operational processes.
Confluent’s Solution: A Data Mesh Overlay
For the past decade, Confluent has helped thousands of organizations worldwide transform the tangled web of complex, rigid, point-to-point connections—created by traditional data management tools—into scalable streaming data solutions.
By shifting to a decoupled Data Producer / Consumer architecture, organizations have been able to establish a continuous flow of real-time, integrated data.
In this new overlay architecture, tightly coupled point-to-point integrations give way to a decoupled, multi-subscriber Data Producer / Data Consumer model in which integrated data flows continuously and is always up to date. The curation and quality of the data are the responsibility of the Producer (the domain expert), who presents the data “as a product” in a fully governed, trustworthy, and ready-to-consume manner for one or more Consumers to use.
Data Products are made securely available for consumption through appropriate governance and tooling, including schemas and access controls. Different Consumers can then use those data sets independently and in their own time for their respective use cases, continuously processing, analyzing, and acting on the data within milliseconds of its creation.
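As an illustration of that decoupling, here is a minimal sketch using the confluent-kafka Python client. The topic name, bootstrap address, payload fields, and consumer group are assumptions chosen for the example, not a recommended configuration.

```python
# Minimal sketch of the decoupled Producer / Consumer pattern over a Kafka topic.
# Topic name, bootstrap servers, payload, and group id are illustrative assumptions.
import json
from confluent_kafka import Producer, Consumer

TOPIC = "orders.v1"  # the "data product" stream, owned by the producing domain

# Producer side: the domain team publishes a curated event and moves on.
producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce(
    TOPIC,
    key="order-1001",
    value=json.dumps({"order_id": "1001", "amount": 42.50}),
)
producer.flush()

# Consumer side: each use case reads the same stream independently, at its own
# pace, in its own consumer group, without coordinating with the producer.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "fraud-scoring",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])
msg = consumer.poll(5.0)
if msg is not None and msg.error() is None:
    print(json.loads(msg.value()))
consumer.close()
```

Adding another consumer for a new use case is simply a new consumer group on the same topic; the producer and existing consumers are untouched.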
How Confluent’s Data Streaming Platform (DSP) Enables Data Mesh
Within a Data Streaming Platform (DSP), this is achieved through four core capabilities:
Connect is used by both the Producer and the Consumer: it is the ability to ingest (on-ramp) and inject (off-ramp) data to and from any system, anywhere. It is typically triggered by some form of event in the source system that results in new data or a change to existing data. The event and its associated data are then written to one or more Streams, where they can be Processed (joined, enriched, curated) and made available for Consumers to retrieve for use within another application, microservice, or data system.
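For example, a source system can be on-ramped by registering a connector with a Kafka Connect worker’s REST API. The sketch below is a hedged illustration only: the Connect worker URL, the JDBC source connector class, and every configuration value are assumptions that a real deployment would take from the connector’s own documentation.

```python
# Minimal sketch of registering a source connector via the Kafka Connect REST API.
# URL, connector class, and all config values are assumptions for illustration.
import requests

connect_url = "http://localhost:8083/connectors"

connector = {
    "name": "orders-db-source",
    "config": {
        # Hypothetical JDBC source that captures new rows from an "orders" table
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db.example.com:5432/sales",
        "connection.user": "connect",
        "connection.password": "<secret>",
        "table.whitelist": "orders",
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "topic.prefix": "sales.",  # rows land on the "sales.orders" topic
    },
}

resp = requests.post(connect_url, json=connector, timeout=10)
resp.raise_for_status()
print(resp.json()["name"], "registered")
```

Once registered, the connector continuously turns changes in the source system into events on a stream, with no custom pipeline code on either side.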
Governance ensures trustworthy, high-quality data and enforces the secure reuse and discovery of Data Products. These data assets are automatically catalogued and their lineage captured, so that they can be discovered and understood by those permitted to use them.
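A minimal sketch of what that governance looks like in practice, assuming Confluent Schema Registry and the confluent-kafka Python client: the registry URL, topic name, and Avro schema below are illustrative assumptions.

```python
# Minimal sketch of schema-governed production with Confluent Schema Registry.
# Registry URL, topic/subject names, and the Avro schema are illustrative assumptions.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_str = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serializer = AvroSerializer(registry, schema_str)

producer = Producer({"bootstrap.servers": "localhost:9092"})

# Records that do not conform to the registered schema fail serialization here,
# before they ever reach the topic, so Consumers can trust what they read.
value = serializer(
    {"order_id": "1001", "amount": 42.50},
    SerializationContext("orders.v1", MessageField.VALUE),
)
producer.produce("orders.v1", value=value)
producer.flush()
```

The registered schema then doubles as the Data Product’s published contract: consumers discover it from the registry rather than from tribal knowledge.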
The event-driven messaging capability implements Apache Kafka using a cloud-native architecture designed for high performance, scalability, and low total cost of ownership, with the capacity to bridge seamlessly between edge hardware, legacy systems, and multi-cloud platforms.
The Data Streaming Platform provides the technical enabler that organizations need to escape the “Data Mess” and build their “Data Mesh”. Adoption of the Data Streaming Platform enables organizations to rapidly build an overlay to their existing systems and leverage investments already made in Apache Kafka, thus driving down costs and unlocking valuable real-time data scenarios.
The extended capabilities for governance, processing, and connectivity enable teams to establish the operating model for a full enterprise “Central Nervous System”, creating the business enabler their organizations require to deliver the Data-Driven and AI-Driven Enterprise.
Conclusion: The Path to a Real-Time Data-Driven Enterprise
The journey from Data Mess to Data Mesh is complex but essential for organizations aiming to become real-time, data-driven enterprises. It starts with a platform and a platform team applied to a priority domain, with a proof of concept that rapidly demonstrates value, and is then enriched over time by bringing on additional domains and use cases while continually measuring business outcomes and value.
By leveraging Apache Kafka in a cloud-native, scalable architecture, Confluent provides enabling capabilities that enterprises can use to:
By adopting Data Mesh architectures and leveraging Data Streaming Platforms, businesses can:
Ultimately, a well-implemented Central Nervous System will unlock the full value of enterprise data, driving smarter decisions, innovation, and competitive advantage.