The map is NOT the territory.
I was bemoaning that ‘Nobody gets it!’ while discussing the feedback on my last article with my business partner. Why has there been such resistance to what I believe is a simpler, more pragmatic approach to data warehousing?
My business partner, who happens to be a reservist in the Australian Defence Force (ADF), gets it. When we first started working together, he told me that, as a data engineer, he hated data modellers. (Is that a general sentiment? Please let me know.) But over time, he began to appreciate the minimal modelling paradigm of Hook and is now a firm supporter; that’s why we’re business partners. During this particular conversation, he used a phrase which resonated with me. He said that in the ADF, they often say, “The map is NOT the territory”.
So I googled it, and discovered that “the map is not the territory” was a remark made by the Polish-American scientist/philosopher Alfred Korzybski. The meaning is that a map is only an abstract representation of reality. We need to unpack that a bit more.
Maps are abstractions that describe the territory using a set of predefined concepts, such as roads, contours, buildings, train lines, rivers, and lakes. If we understand the fundamental concepts of a map, then we know how to read it and figure out how to navigate the territory.
But the map is not the territory; it doesn’t tell us what is happening at any given time. Are there road closures or floods? What is the weather like? Are the schools open? You cannot fully picture the landscape from a map alone; you have to go out and see it for yourself.
But are maps useful?
It’s a rhetorical question; the answer is obvious. Of course maps are useful, which is why we have them. They enable us to focus on elements that help us figure out how to get from A to B. Drop a stranger into unknown territory and ask them to find their way home; they would be lost. Give them a map, and they would have a fighting chance.
And so it is with a Hook data warehouse. Hook provides the map to help you navigate the data assets you have. We don’t need to model/shape the data into a logical structure, because that isn’t the structure of the data. Hook’s approach to integration allows you to navigate these assets regardless.
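To make the idea a little more concrete, here is a minimal sketch of navigating raw data without reshaping it. It is an illustration under stated assumptions, not the Hook specification: the table contents, the `HK_CUSTOMER` column name, the `CUST` key prefix, and the helpers `add_hook` and `join_on_hook` are all hypothetical. Each raw table keeps its source shape and simply gains a key column aligned to a business concept, which is then used to move between sources.

```python
# Hypothetical raw source tables, left exactly as the sources deliver them.
crm_customers = [
    {"cust_no": "1001", "name": "Acme Pty Ltd"},
    {"cust_no": "1002", "name": "Globex"},
]
billing_invoices = [
    {"invoice_id": "INV-1", "customer_ref": "1001", "amount": 250.0},
    {"invoice_id": "INV-2", "customer_ref": "1001", "amount": 100.0},
    {"invoice_id": "INV-3", "customer_ref": "1002", "amount": 75.5},
]


def add_hook(rows, hook_name, prefix, key_column):
    """Return copies of the rows with one extra key column (the 'hook').

    The original columns are untouched; the hook is the source key
    qualified by an assumed prefix so keys from different sources line up.
    """
    return [{**row, hook_name: f"{prefix}|{row[key_column]}"} for row in rows]


def join_on_hook(left, right, hook_name):
    """Inner-join two lists of rows on a shared hook column."""
    index = {}
    for row in right:
        index.setdefault(row[hook_name], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[hook_name], [])]


# Align both raw tables to the same business concept (customer).
frame_customers = add_hook(crm_customers, "HK_CUSTOMER", "CUST", "cust_no")
frame_invoices = add_hook(billing_invoices, "HK_CUSTOMER", "CUST", "customer_ref")

# Navigate across sources using the hook, with no remodelling of either table.
joined = join_on_hook(frame_invoices, frame_customers, "HK_CUSTOMER")

totals = {}
for row in joined:
    totals[row["name"]] = totals.get(row["name"], 0.0) + row["amount"]

print(totals)
```

In this sketch neither source table is restructured; the only addition is the aligned key column, which plays the role of the map's legend.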
Is this map enough? In most cases, yes; armed with the raw source data organised around business concepts using frames and hooks, I would be confident that we can meet 80% of analytical use cases. Unfortunately, I can’t back that number up, because you can’t measure what you don’t need to build!
I want to turn this argument around on those telling me that it is imperative to reshape the data to match some predefined view of what the business thinks it should be. Let’s apply that thinking to our map. If we build our map according to what we want to see in the landscape, we’d have to create or alter the landscape accordingly. That sounds like a lot of effort and time. It sounds expensive.
If you absolutely do want that kind of modelling, it doesn’t belong in the data warehouse. It should either be “shifted left” into the design of the operational source systems themselves, or “shifted right” into consumable assets for specific use cases. Dare I say it, we need to “shift out” from the data warehouse.
There is no substitute for shifting the problem left, but, as is often the case, especially in the data warehouse world, we have no control or influence over the design of operational source systems. We cannot ensure they are designed properly, and we must eat whatever is served to us. However, by following the Hook approach, we can at least minimise the modelling we need to perform on the right.
Principal Data Analyst at Wellington City Council
2mo: From my initial reading it's more like a bridge (Kimball) to simplify joins. Vault allowed the parallel loading of hash keys that you could automate. The data world is changing so fast, with MCP being deployed by all major vendors, that we are now in the mad rush to add semantic layers which describe relations so we don't need additional layers.
Data Analytics Architect at Self-employed
3mo: Read Hook on Substack and liked it. How do we continue on from the data acquisition to the blueprinting of end user data products? My main goal is to design a warehouse/lake house that can be queried relatively easily compared to stitching queries across competing and overlapping sources of data. In the past we could extract the various sources into a common enterprise relational database (roughly 1 main table per thing & then all of its complicated relationships to other things). Alternatively we could extract the various sources into conformed enterprise relational dimensional models. The classic Kimball or Inmon approaches, with the ultimate gift being a Kimball mart created from an Inmon warehouse! Vault is easier to load competing overlapping sources into a relational structure but the consumer now has the challenge of stitching a query across various sources. Previously ETL had to do this, by conforming into a single set of Inmon or Kimball tables. The data lake house makes it easier to land data into a centrally administered cloud repository, allowing for quick ingestion and exploratory analysis, but customers also need data products that conform to a data model blueprint (i.e., no dups, PKs, FKs, etc.).
Data Leader | Enterprise Data Architect | Snowflake & Cloud Data Platform | Strategic Data Modeling & Integration | Remote-First Leadership
3mo: The map metaphor is perhaps incorrect. A map is often used to describe / illustrate what is already there, to various levels of accuracy. It can be radically incorrect from what the actual geography is, based on many reasons. This is not what a data model does. A data model is architected, based on analysis of how the business currently understands its data and then provides a mechanism to change how the business needs to understand and manage its data. This is done by logical and physical data models but conceptual can also play into this. To reflect back to the map metaphor, a map never changes the underlying geology / topography. It reflects a point in time, from a singular perspective, and will not drive change into anything, except perhaps another version of the map.
Information, Data, Knowledge Expert
3mo: It’s the difference between a prescriptive model and a descriptive model. A map is NOT a zoning plan.
Analytics Consultant
3mo: That’s a good metaphor! I’ll make sure to remember it when talking about Hook with others ☺️