Data Governance for Everyone
Data governance is among the most practically needed, and yet most contentiously debated, topics of our time. While there is wide recognition that good data is essential for the age of AI, the practice of shaping and maintaining good data for its use cases (aka data governance) remains widely debated. Every facet of the practice is contested: what it should be called, who should be accountable for it, who should be responsible for execution, what value it delivers, and more. Data leaders and practitioners, while understanding the benefits of data governance, continue to face organizational resistance in scaling the practice, often impacting wider data and AI transformation investments. I have lived through this journey in multiple contexts and know several peer data practitioners who have as well.
A top reflection from my experiences navigating data governance journeys, both internally and externally with our customers, is that using domain terminology familiar to data governance practitioners to explain the need for the practice is unlikely to create the shared clarity and energy for purpose needed when bridging with diverse, cross-functional stakeholders and partners. Experimenting with alternative names for the practice to overcome initial resistance to the term 'governance' may catch momentary attention, but is unlikely to stick with resonance. For data governance to resonate with a wider community of cross-functional leaders and practitioners, it must be described in everyday language that is accessible to all. Explaining the essence of data governance without calling it "data governance", and in common language, is a quality challenge for data governance practitioners. There is no single way to do this. I am sharing here a narrative that has worked well in conversations and deeper engagements that I have navigated on this topic.
The visual below captures the framing for a narrative that I have found effective in navigating the questions "What is data governance?", "Why do I need data governance?", "What is the best ownership and operating model for data governance?", "How do I communicate the value of data governance to peers and stakeholders who are not familiar with (or are resistant to) the practice?", and "Is data governance still pertinent in the current and future age of AI?".
The following sections unpack the visual framing. A goal in doing so is to avoid the term "data governance" and any vocabulary that is only familiar and relatable to data governance practitioners.
Anchoring on creating value from data
An emphasis on creating value from data is a unifying anchor.
Conversations on data can be effectively navigated by anchoring on use cases that create value from data. Most cross-functional leaders generally align on the notion that their organization's data holds plenty of untapped potential to create new value. There is also general alignment that impactful value from AI is anchored in the data used to train AI. These alignments are strong anchors for creating shared clarity and energy around the practice of preparing data for impactful use cases, by speaking to it in widely appreciable contexts.
Emphasizing incremental value creation
Durable value is created from incremental learning.
A pitfall to avoid when defining and scoping opportunities to create value from data is (un)intentionally boiling the ocean. "Let's get our entire data estate in shape prior to navigating our use cases" is a common misstep. It is a pit I fell into as well when navigating my first data governance journey a decade ago. Most organizations have data estates that have evolved organically over several years. Trying to shape such a data estate in one go, while virtuous, will spike costs and delay the realization of applied value outcomes, causing angst and frustration for business stakeholders. An iterative approach steered by incremental use cases will pave the path to durable impact, progressing value outcome realization in tandem with maturing data practices. The learnings from each iteration enable timely decisions on persevering or pivoting to maximize value and minimize costs.
Preparing data for use cases
Data is seldom known and ready for its use case(s).
The success and impact of a use case to create value from data depend on the understanding and readiness of the data for that use case.
"Do you have a good understanding of the data needs for your use case, whether the data exists, and where the data can be found?", "Does your understanding include related data that could benefit your use case?", "Are the datasets for your use case connected (or) connectable for their inter-related contexts?", "Is the data trustworthy for your use case?", "Do you know whether you can use this data for your use case?".
Each of these considerations is fundamental to realizing a use case. Yet they often have no clear answers at the outset, when the starting point is an organically evolved data estate without an intentional data practice to shape it.
Each use case is an opportunity to progress data readiness
Tackling the shaping of an organically evolved data estate all at once is a futile undertaking. By focusing and iterating on use cases to create value from data, organizations and teams can incrementally address and mature their data readiness while delivering tangible value, with the velocity of value creation accelerating as data maturity and readiness progress.
Data preparation for use cases is a due diligence practice
Data preparation is shaping data readiness for a use case. Preparing data for durable value from a use case requires due diligence. Deploying use cases without the practice of data preparation will get you a few hours (perhaps a few days) of fame in the current age of AI, before reality strikes in the form of a hard-learned incident and a grounding appreciation for the practice.
The practice of data preparation, when explained in a relatable manner, is simpler to understand and appreciate. Preparing data for use cases entails the following actions:
Collecting data for a use case
Identifying the data needs for a use case, checking whether use-case-ready data is available, learning where the data exists, understanding the permitted use cases for the data, and gaining compliant access to it. When the data exists but is not use case ready, or exists only partially or not at all and needs to be acquired or generated, further due diligence data preparation actions are needed. These actions are well worth reviewing for each use case even when use-case-ready data seems to exist; each use case is an opportunity to bridge gaps and grow data maturity.
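To make this concrete, here is a minimal sketch of a per-use-case readiness checklist, assuming a simple in-memory model; the DataNeed fields, dataset names, and use case are hypothetical illustrations, not a prescribed implementation.

```python
# A minimal sketch of a per-use-case data readiness checklist.
# All names here are hypothetical examples.
from dataclasses import dataclass, field

@dataclass
class DataNeed:
    """One dataset a use case depends on, and what we know about it so far."""
    name: str
    exists: bool = False           # do we know this data exists?
    location: str = ""             # where it lives, if known
    access_granted: bool = False   # do we have compliant access?
    permitted_uses: list = field(default_factory=list)

def readiness_gaps(needs):
    """List the due-diligence gaps to close before the use case can proceed."""
    gaps = []
    for need in needs:
        if not need.exists:
            gaps.append(f"{need.name}: acquire or generate the data")
        elif not need.location:
            gaps.append(f"{need.name}: locate where the data exists")
        elif not need.access_granted:
            gaps.append(f"{need.name}: obtain compliant access")
    return gaps

# Example: a churn-prediction use case with one ready and one missing dataset.
needs = [
    DataNeed("customer_profiles", exists=True, location="crm.accounts", access_granted=True),
    DataNeed("support_tickets"),
]
print(readiness_gaps(needs))  # ['support_tickets: acquire or generate the data']
```

Even a lightweight checklist like this makes the gaps visible and reviewable per use case, rather than discovered mid-delivery.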
Classifying data
Well-classified data is easier to discover, understand, and use with compliance. Data classification is commonly done by using tags to annotate datasets with their structural, semantic, and compliance contexts. Semantic and compliance classifications are essential to scaling responsible data discovery and use across an organization. Semantic classifiers can organize data by domains, functions, business vocabulary, permitted use cases, and user personas. Compliance classifiers can organize data by compliance domains and regulations to scale compliant discovery and use.
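As an illustration, tag-based classification can be as simple as a registry mapping datasets to structural, semantic, and compliance tags. This is a minimal sketch under that assumption; the dataset names and tag vocabulary are hypothetical.

```python
# A minimal in-memory registry of dataset classifications, by facet.
# Dataset names and tags are hypothetical examples.
classifications = {
    "sales.orders": {
        "structural": ["table", "daily-partitioned"],
        "semantic": ["domain:sales", "vocabulary:order", "persona:analyst"],
        "compliance": ["pii:none"],
    },
    "crm.customers": {
        "structural": ["table"],
        "semantic": ["domain:customer", "vocabulary:account"],
        "compliance": ["pii:contact-details", "regulation:gdpr"],
    },
}

def find_datasets(tag, facet="semantic"):
    """Discover datasets carrying a given tag within a classification facet."""
    return [name for name, tags in classifications.items()
            if tag in tags.get(facet, [])]

print(find_datasets("regulation:gdpr", facet="compliance"))  # ['crm.customers']
```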
Cleaning data
Fit-for-purpose data quality is a make-or-break factor for a use case. Understanding the data quality contracts in effect, and augmenting or creating them where needed, is essential practice in preparing data for a use case. Clean data is a journey in and of itself. Navigating that journey use case by use case is a practical, progressive pathway to relate data quality management investments to applied value outcomes.
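A data quality contract becomes tangible when expressed as executable checks. The sketch below assumes rows arrive as Python dictionaries; the fields and contract clauses are hypothetical.

```python
# A minimal sketch of a data quality contract as executable checks.
# The fields and thresholds are hypothetical examples.
def check_contract(rows):
    """Evaluate a few contract clauses a use case might depend on."""
    emails = [r.get("email") for r in rows]
    return {
        "email_present": all(e for e in emails),          # completeness
        "email_unique": len(set(emails)) == len(emails),  # uniqueness
        "amount_non_negative": all(r.get("amount", 0) >= 0 for r in rows),  # validity
    }

rows = [
    {"email": "a@example.com", "amount": 10.0},
    {"email": "b@example.com", "amount": -5.0},  # violates the validity clause
]
print(check_contract(rows))
# {'email_present': True, 'email_unique': True, 'amount_non_negative': False}
```

Contracts like this can be run on every refresh, turning "is the data trustworthy for my use case?" from a hunch into a verifiable answer.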
Connecting data
Data requirements for use cases are seldom for stand-alone datasets. New value and actionable insights are created by connecting related data. Yet related data in organically evolved data estates is seldom as easy to connect as it should be. Variances in the values of key attributes across diverse and distributed systems complicate managing data for business entities, managing reference data used in multiple contexts, and connecting business entities with transactional and engagement data. Applying deterministic rules and probabilistic methods (including the advances in AI) to connect related data, while incrementally addressing opportunities to standardize data creation in the source systems, are practices best navigated by use case, accruing progressively to greater reuse and scale.
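For illustration, here is a minimal sketch of connecting records across two hypothetical systems: a deterministic match on a shared key first, with a fuzzy name comparison (Python's standard-library difflib) as the probabilistic fallback. The record fields and threshold are assumptions for the example.

```python
# A minimal sketch of deterministic + probabilistic record linkage.
# The systems, fields, and threshold are hypothetical examples.
from difflib import SequenceMatcher

def link(crm, billing, threshold=0.85):
    """Link records: exact match on a shared key, else fuzzy name similarity."""
    matches = []
    for c in crm:
        for b in billing:
            if c.get("tax_id") and c.get("tax_id") == b.get("tax_id"):
                matches.append((c["name"], b["name"], "deterministic"))
            else:
                score = SequenceMatcher(None, c["name"].lower(), b["name"].lower()).ratio()
                if score >= threshold:
                    matches.append((c["name"], b["name"], f"probabilistic ({score:.2f})"))
    return matches

crm = [{"name": "Contoso Ltd", "tax_id": "T-1"}, {"name": "Fabrikam Inc.", "tax_id": None}]
billing = [{"name": "Contoso Limited", "tax_id": "T-1"}, {"name": "Fabrikam Inc", "tax_id": None}]
print(link(crm, billing))
# [('Contoso Ltd', 'Contoso Limited', 'deterministic'),
#  ('Fabrikam Inc.', 'Fabrikam Inc', 'probabilistic (0.96)')]
```

In practice, probabilistic matches near the threshold would be queued for human review rather than auto-linked, which keeps the practitioner in the loop.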
Complying
Compliance is the currency of trust. Compliance with user preferences, regulations, and codes of ethical practice is table stakes for creating and scaling durable value from data. Compliance requirements vary by data and use case. The growing spectrum of regulations in the current and future age of AI is rightly instating the checks and balances needed for responsible use of data and AI. Compliance requirements span trust domains such as privacy, digital safety, cybersecurity, responsible AI practices, fair trade, competition law, lawful access to data by government agencies, and more. Understanding, addressing, and maintaining compliance with the requirements pertinent to the use cases served by data is essential practice. Policy definition, policy management, and operational controls are elements of the practice to instate as data is prepared for its use cases.
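As a sketch of what an operational control can look like, here is a minimal tag-based policy check run before data is served to a use case. The tags, policies, and use case names are hypothetical, and real policy engines carry far richer semantics.

```python
# A minimal sketch of a tag-based policy check before serving data.
# Tags, policies, and use cases are hypothetical examples.
POLICIES = {
    # compliance tag -> set of permitted use cases
    "pii:contact-details": {"marketing-campaign", "customer-support"},
    "regulation:hipaa": {"clinical-analytics"},
}

def is_use_permitted(dataset_tags, use_case):
    """Every compliance tag on the dataset must permit the requested use case.
    Tags with no policy on file default to permitting the use."""
    return all(use_case in POLICIES.get(tag, {use_case}) for tag in dataset_tags)

print(is_use_permitted(["pii:contact-details"], "marketing-campaign"))  # True
print(is_use_permitted(["pii:contact-details"], "model-training"))      # False
```

Note how this control only works because the classification step already tagged the data; the preparation actions compound.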
Cataloging
Curating and publishing information about prepared, use-ready data to a data catalog, in a form relatable to organization-wide functions and users, scales data discovery, understanding, and value creation. A common misstep in data cataloging is standing up a technical data inventory and calling it a data catalog. A technical data inventory is no more a data catalog than a book publisher's warehouse is a well-curated book shop or library. The art of curation, describing data in a manner relatable to its intended use cases and users, is the foundation of cataloging done well. Each of the data preparation actions in the sections above contributes to shaping a data catalog that effectively serves use cases and users. Data that has navigated such preparation is akin to a product-grade offering in consumer vocabulary; "Data Product" and "Data as a Product" are terms gaining attention in this context. Here is a related read on data products that you may find useful. Cataloging data products prepared for use cases will, as a practice, incrementally evolve a widely relatable data catalog that fosters greater data sharing, reuse, and value creation.
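To illustrate what a curated entry can carry beyond a technical inventory row, here is a minimal sketch of a data product catalog entry; the fields and example values are hypothetical.

```python
# A minimal sketch of a curated data product catalog entry.
# Fields and example values are hypothetical.
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    name: str
    description: str        # written for intended users, not just engineers
    owner: str               # the team accountable for the data product
    intended_use_cases: list
    semantic_tags: list
    compliance_tags: list
    quality_contract: str    # pointer to the contract the product honors

entry = CatalogEntry(
    name="customer-360",
    description="A connected view of customer profiles, orders, and support history.",
    owner="customer-data-team",
    intended_use_cases=["churn-prediction", "support-triage"],
    semantic_tags=["domain:customer"],
    compliance_tags=["pii:contact-details", "regulation:gdpr"],
    quality_contract="contracts/customer-360.yml",
)
print(entry.name, "->", entry.intended_use_cases)
```

The description, ownership, and intended use cases are what turn an inventory row into something a cross-functional user can actually find and trust.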
Explaining the practice as above, and illustrating it with relatable use cases (which I will share in a follow-up to this writing), can help achieve cross-functional alignment on the essence and value of the practice. Evolving that alignment into applied, scaled practice requires a higher-order cultural impetus.
Culture: the nucleus of a data practice
The practice of data is a Shared Accountability for Everyone (SAFE)
Culture is the nucleus of a data practice. The notion of data being a shared accountability for everyone (SAFE) is well understood and practiced in most organizations when it comes to the applied use cases of data. The same is not commonly the case for the practice of preparing and maintaining a data estate to scale use cases, the practice that gets labeled "data governance" and anchored in a central data office, or is non-existent and navigated organically as an afterthought. Applying the SAFE culture to this foundational data practice is an internalization, and a shift in operating model, for cross-functional leaders to champion and for data leaders to guide. The business contexts and cross-functional domain expertise required by the facets of this practice exist across an organization; they are not situated in any one team or leader. Every team in an organization applies data for its use cases. Most teams, if not all, are accountable for the systems that generate the data for their functions. Teams with domain expertise in their data are best positioned to prepare and maintain that data to serve use cases from within and beyond their teams. A SAFE culture fosters such shared ownership and accountability.
Can AI not just do all this (viz. data governance) for all of us?
:-)
A smile is the best way to frame a reply to this question, a common and recurring thought experiment that can unpack vibrant conversations. Peeyush Tiwari has penned a "spot on" framing post on this topic here. While there is legitimate opportunity to apply AI in scaling the practice, the practice requires context to ground on and scale with AI. The velocity of applied innovation will continue to spiral in the age of AI, with business contexts evolving in fast turn cycles from applied learning. Human practitioners of data complementing AI, seeding and fine-tuning evolving contexts to scale the preparation and maintenance of fast-growing data estates for diverse use cases, is a 1+1 > 3 operating model to foster.
In a writing to follow, I will share illustrations of putting AI to the test on common actions to prepare and maintain a small but realistic data estate for its use cases. The illustrations will help make clearer the significance of the human practitioner in the loop. Hopefully a nice cliffhanger to wrap this (already lengthy) writing :-)
In closing
As practitioners of data governance, we each have the opportunity to create shared clarity and foster shared energy amongst our cross-functional stakeholders and partners for the benefits of scaling the practice as a team sport. To do so, we can start speaking to and explaining the practice in language that is relatable to a wide and diverse audience. We also have the opportunity to lead in fostering a SAFE (Shared Accountability for Everyone) culture that activates organization-wide, cross-functional engagement in scaling the practice.
Hope you found this writing useful. All thoughts and feedback are welcome as comments and/or DMs to brainstorm further. It would also be great to hear from practitioners in the community on approaches that have worked for them in making "data governance for everyone" resonate widely.