Data Mesh: How to manage continuous change and empower value streams
Data helps businesses make more informed decisions. It is everywhere these days, and the amount of available data is growing exponentially, pushing businesses of all shapes and sizes to make better decisions in shorter time frames. To really tap into the potential data holds, an increasing number of businesses are on a quest to become data driven: they funnel all their available data into systems from which they can draw actionable insights, in the hope of making more informed decisions and improving their value streams.

Over the past few decades, several technologies emerged to handle data effectively, starting with traditional relational database systems. Over time, however, data grew in size, complexity and diversity, and traditional relational databases could no longer handle what we now describe as “big data”. The data warehouse was born, promising to handle more complexity with higher throughput and to consolidate data from a wider range of input sources. Although promising at first, the drawbacks of data warehouses became increasingly evident, leading to more recent innovations such as the data lake, which promised to alleviate some of the pain points of traditional data warehouse systems. Then the cycle continued: the drawbacks of data lakes became evident as well, and the industry tried to combine the best of both worlds into a new solution called the “data lake-house”. And still, the vicious circle continued. It also turned out that centralising all business data ultimately creates systemic bottlenecks and slows down decision-making. So far, none of these technologies have been able to continuously deliver value at scale; instead, they slowed down and complicated many of the business functions they originally promised to improve. According to Zhamak Dehghani, a highly respected consultant and author at Thoughtworks, we need to rethink the way we handle data through a systemic paradigm shift and leave the old ways of managing data behind.
Her solution? A promising innovation called Data Mesh. In this article, we are going to analyse what the data mesh is and whether it outperforms the technologies that came before it.
But first, let’s define what we mean by “data” …
Let's start by decomposing data into two major types: Operational data and analytical data.
As the figure above shows, operational data is any data that keeps the daily operations of a business running. It looks similar across many different businesses, so examples aren’t hard to come by. Analytical data, however, can be very specific to a business and depends on many different factors. If a business wants to add a recommender system based on customers’ purchase histories, operational data needs to be aggregated, cleaned and prepared to train the machine learning model that will provide users with smart recommendations. As this example shows, much analytical data is based on and derived from operational data. Or maybe the BI (business intelligence) team wants to analyse the factors that influence purchase behaviour: changes in UI/UX design, product features, marketing data and product placement all need to be analysed against the purchase history over time to figure out which factors matter most. To do so, several different data models (both operational and analytical) need to be combined. This brings us to another distinction, this time across layers of augmentation: source-aligned data, aggregate data, and fit-for-purpose data.
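To make the distinction concrete, here is a minimal sketch (with invented column names and values) of how operational order records might be aggregated into analytical, per-customer purchase features that a recommender or a BI analysis could consume:

```python
import pandas as pd

# Hypothetical operational data: one row per order in the shop system.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "product_id":  ["A", "B", "A", "C", "C", "B"],
    "amount":      [19.99, 5.49, 19.99, 7.99, 7.99, 5.49],
    "ordered_at":  pd.to_datetime([
        "2023-01-03", "2023-01-20", "2023-02-11",
        "2023-02-14", "2023-03-01", "2023-03-05",
    ]),
})

# Derived analytical data: per-customer purchase features that a
# recommender or BI analysis could consume downstream.
purchase_features = (
    orders.groupby("customer_id")
          .agg(order_count=("product_id", "count"),
               total_spent=("amount", "sum"),
               last_order=("ordered_at", "max"))
          .reset_index()
)
print(purchase_features)
```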
The orange boxes are called domains. The black labels within the boxes are called data products. And this leads us directly to the data mesh. The picture above is a top-down view of a fully implemented data mesh, where every data product exposed on the mesh gives downstream domain teams data they can use as input to create an aggregate model or to prepare data for a specific business function. Note that the mesh itself is NOT a systemic architectural solution; the mesh emerges by establishing fully functional data products alongside API standards that allow them to slot together. Although cloud platforms are technically not needed to create a data mesh, they make the implementation a lot easier: without a cloud solution such as AWS or Azure, we would have to build object stores, virtual machines and the like from the ground up. Instead of establishing unmaintainable cross-domain ETL pipelines, we expose data as data products, treating data as a first-class citizen within our organisation.
Let’s dive into the three-layer architecture shown above:
Source-aligned domains often have a close relationship to operational data. In this case, the bottom layer consists of domains related to the customer experience, with each domain exposing its own data product on the mesh. Each data product exposes its output, via an API for instance, which then serves as input for composing aggregate and/or fit-for-purpose data models.
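As a rough illustration, an output port for a hypothetical “orders” domain could look like the sketch below, using FastAPI purely as an example serving layer; in practice the output port might just as well be a table, a file export or an event stream:

```python
from fastapi import FastAPI

# Hypothetical output port of a source-aligned "orders" data product.
# Run with: uvicorn orders_product:app  (assuming this file is orders_product.py)
app = FastAPI(title="orders data product")

# In a real source-aligned domain this would query the domain's own store;
# a couple of hard-coded records keep the sketch self-contained.
ORDERS = [
    {"order_id": 1, "customer_id": 1, "amount": 19.99},
    {"order_id": 2, "customer_id": 2, "amount": 5.49},
]

@app.get("/v1/orders")
def read_orders():
    """Versioned, read-only view that other domains can consume as input."""
    return {"schema_version": "1.0", "records": ORDERS}
```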
Aggregate domains, although on the second layer, are considered downstream domains, because they directly take the output of source-aligned data products (in this case the upstream domains) to build a higher-order data product, such as a multi-view customer data model, or a recommender system trained on that customer data model. Data products on this layer are called aggregates because they combine data from different upstream sources into a more complex view from which more insights can be drawn. Note that any domain on any layer is considered a downstream domain if it takes input from another data product.
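A minimal sketch of this aggregation step, with two invented upstream outputs standing in for real source-aligned data products, could look like this:

```python
import pandas as pd

# Hypothetical outputs consumed from two upstream source-aligned data products.
orders = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "total_spent": [25.48, 19.99, 21.47],
})
web_sessions = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "sessions_last_30d": [12, 3, 7],
})

# The aggregate data product: a multi-view customer model that downstream
# fit-for-purpose domains (e.g. a recommender) can build on.
customer_view = orders.merge(web_sessions, on="customer_id", how="outer")
print(customer_view)
```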
Fit-for-purpose domains and their data products map to the different business functions that exist within an organisation. The data taken from the value streams ends up serving these business functions and enables true data-driven decision-making across the company. By combining data models from different data products to serve specific business needs, a deep knowledge base can be composed for each business function, decreasing the time needed for critical decisions and increasing the confidence behind them.
A socio-technical framework
Designing a well-formed data mesh, however, does not only require the necessary technical know-how; every implementation also needs to abide by four fundamental principles. According to its thought leader and author, Zhamak Dehghani, data mesh is not a systemic solution; it is a socio-technical framework. It is meant to enable a certain type of technological ecosystem at a company, one that optimises many different processes: decreasing the time needed to put machine learning models into production, making software release pipelines more efficient, and implementing user trends in software solutions more rapidly, by elevating the organisation to a truly data-driven company. This means it may require a cultural shift for a well-formed data mesh to take root.
Let’s dive deeper into the four data mesh principles:
1. Principle of domain ownership
This principle states that every domain team needs to take ownership of and accountability for the data it produces. Analytical data needs to be composed according to the scope of the domain and must not overstep the domain’s bounded context and its specific responsibility. This moves the data away from central data warehouses or data lakes to the teams that work closely with it, which is one way data mesh alleviates the aforementioned issues of centralised data models. In such a distributed, decentralised system, domain teams can operate autonomously: instead of creating complicated and error-prone ETL pipelines to pass data between teams, each output data model is exposed on the mesh through agreed standards, so that other domain teams can immediately repurpose the data. Cross-domain ETL pipelines are eliminated with this model, but ETL pipelines are still allowed WITHIN a domain; teams can have as many of them as they want, as long as none cross domain boundaries. By giving teams autonomy over the data they need, bottlenecks are prevented from the start and decision-making is accelerated.
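As an illustration, the sketch below shows a hypothetical domain-internal transform whose only boundary-crossing artefact is a published, versioned output plus a small contract file; all names, paths and fields are made up:

```python
import json
from pathlib import Path
import pandas as pd

# Domain-internal data: in reality this would come from the domain's own
# operational store; a small inline frame keeps the sketch runnable.
raw = pd.DataFrame({
    "customer_id": [1, None, 2],
    "amount": [19.99, 4.99, 5.49],
})

# Any number of internal transforms (ETL) may happen here...
cleaned = raw.dropna(subset=["customer_id"])

# ...but the only thing that crosses the domain boundary is the published
# output: a versioned dataset plus the contract other teams program against.
out_dir = Path("mesh/orders/v1")
out_dir.mkdir(parents=True, exist_ok=True)
cleaned.to_csv(out_dir / "orders.csv", index=False)
(out_dir / "contract.json").write_text(json.dumps({
    "owner": "orders-domain-team",
    "schema_version": "1.0",
    "columns": {c: str(t) for c, t in cleaned.dtypes.items()},
}, indent=2))
```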
2. Principle of data as a product
This principle applies product thinking to analytical data. There are data consumers beyond the domain team, and it is the domain team’s responsibility to deliver high-quality data that satisfies the needs of those consumers (the other domain teams). This works by treating the domain data like any other public API, and it dovetails with the principle of domain ownership. Data products have an input port that feeds data in, for instance from an internal data pipeline, and an output port that exposes the new data model on the mesh, via an API for instance. But that much isn’t new! What really distinguishes a data product are its third and fourth ports: the control port and the discovery & observability port.
The control port, which is connected to an intermediary layer called the data product experience plane, is responsible for managing the data product on the mesh. This includes creating, deploying and updating the data product, as well as removing it from the mesh.
The discovery & observability port on top is connected to a top-layer called the mesh experience plane, from where the mesh and the data products participating in the mesh can be observed and explored. The bottom-tier layer is called the infrastructure utility plane.
3. Principle of self-service data platform
This principle applies platform thinking to data infrastructure by dedicating a specific team that provides domain-agnostic data tools, services and systems to help domain teams manage their data products on the mesh with ease (see figure above). This includes the seamless integration, deployment and updating of their data products. The platform team achieves the self-service data platform by consolidating infrastructure, CI/CD, data product management requirements, tooling and so on behind layers of abstraction, so that domain teams can focus on what matters: their designated value streams.
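What this self-service experience could feel like for a domain team is sketched below: a declarative product specification handed to a hypothetical platform function that stands in for the infrastructure, CI/CD and catalogue work the platform team abstracts away. Every name and field here is invented for illustration:

```python
# Hypothetical declarative spec a domain team might hand to the platform.
product_spec = {
    "name": "customer-orders",
    "domain": "orders",
    "version": "1.0",
    "output": {"format": "parquet", "schedule": "daily"},
    "access": ["analytics", "recommendations"],
}

def provision(spec: dict) -> list[str]:
    """Stand-in for the platform's self-service API: turns a spec into the
    concrete infrastructure steps the domain team never has to touch."""
    steps = [
        f"create storage for {spec['name']} v{spec['version']}",
        f"set up CI/CD pipeline in domain '{spec['domain']}'",
        f"register output port ({spec['output']['format']}, {spec['output']['schedule']})",
        *(f"grant read access to '{team}'" for team in spec["access"]),
        "publish entry to the mesh catalogue",
    ]
    return steps

for step in provision(product_spec):
    print(step)
```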
Abstracting infrastructure away from the teams that deliver value is also what we do in machine learning by implementing MLOps (machine learning operations). If you are interested in this topic, check out my other blog post here.
4. Principle of federated computational governance
This principle aims to achieve interoperability and standardisation across the organisation through a set of policies that enable effective and efficient collaboration between the data products and domain teams participating in the mesh. Policies are divided into global and local policies. Only a small set of policies are global (after all, domain teams own their data and know it best!); these exist to keep the mesh as a whole running smoothly. Most policies are devised locally within the domain teams, and they are important to guarantee high-quality data products that can be seamlessly consumed by other domain teams. Overall, the aim is a data ecosystem that adheres to organisational and industry-wide regulations while still allowing domain teams to work autonomously and carve their own path. The data product owners of the domains serving data products form the governance guild, which collaborates to guarantee a well-functioning and efficient data mesh.
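The “computational” part means that global policies are ideally enforced by code rather than by documents. The following sketch, with invented policy names and metadata fields, hints at what such automated checks could look like:

```python
# Hypothetical global policies, expressed as automated checks rather than
# documents, applied to every data product's metadata before it joins the mesh.
GLOBAL_POLICIES = {
    "has_owner":        lambda meta: bool(meta.get("owner")),
    "schema_versioned": lambda meta: "schema_version" in meta,
    "pii_masked":       lambda meta: not set(meta.get("columns", [])) & {"email", "ssn"},
}

def check_compliance(meta: dict) -> dict[str, bool]:
    """Run every global policy against one data product's metadata."""
    return {name: check(meta) for name, check in GLOBAL_POLICIES.items()}

orders_meta = {
    "name": "orders",
    "owner": "orders-domain-team",
    "schema_version": "1.0",
    "columns": ["customer_id", "amount", "ordered_at"],
}
print(check_compliance(orders_meta))
# {'has_owner': True, 'schema_versioned': True, 'pii_masked': True}
```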
Our analysis
Data mesh is an emerging technology, and there are still a lot of unknowns. An organisation must consider many factors before deciding whether to go down the data mesh path. Maybe there are better solutions out there for a specific organisational persona, although several hints suggest that data mesh is agnostic enough to be implemented in all sorts of tech organisations: not only enterprise-level organisations, but also smaller businesses that plan to scale efficiently and aim to build robust value streams from the ground up. Data mesh has been implemented by its creator, Zhamak Dehghani, across some enterprise projects, but there is no openly accessible data yet that describes any implementation details. Any organisation that wants to jump on board as an innovator or early adopter should therefore experiment with data mesh concepts in projects that can bear this kind of risk, such as research projects.
We are amidst a Cambrian explosion of toolchains and developer-centric services centred around older concepts such as data warehouses, data lakes and data lake-houses, each hyper-specialised for all sorts of different tasks.
What we need now more than ever is simplicity, and data mesh is a great candidate to reintroduce this much-needed attribute into our developer lives. Maybe at some point data mesh will alleviate the growing problem of cognitive overload across development teams and revolutionise the way value is delivered, with scalability and consumer satisfaction in mind.
Feel free to share your thoughts or start a discussion in the comments section, and don’t forget to support this article. Stay tuned for our upcoming blog posts!