7 Best Practices in Data Architecture

Imagine for a moment that you are a data architect working for the biggest supermarket in your country. Or the biggest hospital, or bank, or airport. You are responsible for all data in the company, from collection, storage, integration and transformation to distribution and consumption. You are responsible for building and managing all the systems, tools and infrastructure required for doing all that, including the data models, policies, rules and standards. What are you going to do to achieve all that?

Well, here are some of the best practices that you will need to follow:

  1. Data platform

  2. Data model

  3. Aligned to business

  4. Available

  5. Secure

  6. Governance

  7. Application landscape

Before we dive into those 7 best practices, let's take a moment to define what data architecture is.

Data architecture can be a few different things:

  • It can be the data models

  • It can be the systems, tools and infrastructure

  • It can be the framework of data standards, policies and rules

So data architecture is not just about the data models. It is everything from the collection, storage, integration and transformation of data to its distribution and consumption. It includes the systems, the tools, the infrastructure, the data standards, policies and rules. All of that is data architecture. And it is your responsibility as a data architect to create and manage all of those things.

1. Data Platform

To achieve all that, the first thing that you need to do is to build a data platform. The data platform collects and integrates data from different sources. The data platform transforms and stores the data. The data platform distributes the data to different applications and reporting tools where the data is used. The data platform does almost all the things that you need to achieve in data architecture. It includes the systems, the tooling, and the infrastructure. It includes the data standards, policies and rules. It includes almost all the things in data architecture. But the data platform is not the entire data architecture. There are many applications outside the data platform.

The data platform needs to integrate data. It needs to transform data. And it needs to distribute data. Those are the 3 main functions of a data platform.
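Those 3 functions can be sketched in a few lines of Python. This is only a toy illustration, not a real platform: the source names (tills, webshop), the field names and the consumer names are all hypothetical, made up for a supermarket example.

```python
from collections import defaultdict

# Hypothetical extracts from two source systems (names and fields are illustrative).
SALES_FROM_TILLS = [
    {"store": "S01", "sku": "A100", "qty": 2, "price": 1.50},
    {"store": "S01", "sku": "B200", "qty": 1, "price": 4.00},
]
SALES_FROM_WEBSHOP = [
    {"store": "WEB", "sku": "A100", "qty": 3, "price": 1.50},
]


def integrate(sources):
    """Integrate: merge records from named sources, keeping track of the origin."""
    return [{**row, "source": name} for name, rows in sources.items() for row in rows]


def transform(rows):
    """Transform: derive revenue per SKU across all sources."""
    revenue = defaultdict(float)
    for row in rows:
        revenue[row["sku"]] += row["qty"] * row["price"]
    return dict(revenue)


def distribute(revenue, consumers):
    """Distribute: give each consumer only the SKUs it subscribed to."""
    return {
        name: {sku: revenue[sku] for sku in skus if sku in revenue}
        for name, skus in consumers.items()
    }


integrated = integrate({"tills": SALES_FROM_TILLS, "webshop": SALES_FROM_WEBSHOP})
revenue = transform(integrated)
deliveries = distribute(revenue, {"purchasing": ["A100"], "finance": ["A100", "B200"]})
```

In a real platform each function would be a pipeline or a service, but the shape stays the same: many sources in, one consistent store in the middle, many consumers out.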

The data platform needs to follow several principles. It needs to be scalable. It needs to be secure. It needs to be automated. And it needs to be flexible. And the best way to achieve all that is by using cloud infrastructure (PaaS and SaaS). Because you don't need to install anything. And it is scalable, secure and flexible. You can create a new data store in an instant. Or compute. Or a file store. And you pay only for what you use, one month after you use it. There is no capital outlay up-front.

The data platform can be a data warehouse. It can be a data lake. It can be a data lakehouse. It can be a data mart. It can be a data mesh. But you must have a data platform. And it must be in the cloud. Otherwise you'll spend ages installing servers and configuring them. And you'll spend a fortune on that. And spend a lot of money to make it secure. Using a cloud data platform you don't install anything. The difference in timeline could be months, years even. No, you don't need any convincing. As a data architect you know it has to be a cloud platform.

2. Data Models

The second most important thing is the data models. A data model is a visual representation of data elements and their relationships. It helps organise the data. It helps structure the data to support the business process. Data models help the communication between the business and IT by representing the business requirements. Data models facilitate the collaboration between the business and IT on how the data will be stored, accessed and shared. Data models help data integrity. Data models help standardise the data elements.

Data models document the business. Data models represent the reality. The people, the places, the things, the business processes. And the business events. Data models play an important role in the design and development of IT applications. If the data model is wrong, then the applications won't work properly.

The business elements are documented in the data models. Create a data dictionary to define them. And a data catalog. Define the data lineage. Create the data standards. Again you don't need any convincing. As a data architect you know you must have good data models.

3. Aligned to business

The data architecture must be aligned to the business strategy. You must understand the business objectives. You must understand the key goals and priorities that drive the company forward. Whether you are a supermarket, hotel, hospital, airport or a bank, you have business objectives. Things that the business wants to achieve in the next 5 years. So you design a flexible and scalable data architecture which supports those business objectives and priorities.

Too often, IT and data become the barrier to achieving business goals. The business wants to offer certain things to the customers but the IT applications are not able to support it. A customer loyalty programme for example. Or client reporting. Or a new regulation coming next year. Or things that differentiate you from your competitors. Let's say that you are an asset manager, specialising in credit derivatives. Or in money markets. Or in ESG investing. But you don't have the IT and data to support that specialisation. Then it becomes empty words. Or you are an insurance company specialising in life cover. Or marine. Or pets. But your IT and data don't support that specialisation. Then it becomes empty words.

You told the market and your customers that you are the best in marine insurance. But your marine data is far below your competitors'. You said you are specialising in money markets, but your money market data is far below your competitors'. You have to invest in marine data and systems. You have to invest in money market data. Data is the new oil. Your business relies on data. Without the data you can't compete in the market. Again, you don't need any convincing in this. As a data architect, you know you must have the data to support the business goals and priorities.

4. Available

You know what is worse than not having the data required to support the business goal? You have the data but it's not available. That is worse. The business needs certain data. You have that data in the building. But it is not available to the department that needs it. Perhaps it's not in the right format. Or the right shape. Or the right mapping. Or the right level. Or the right timing.

Asset management example: the client reporting team needs the breakdown of the client investment portfolio by asset class, for the last 12 months, compared to the benchmark. Supermarket example: your purchasing team needs the supplier performance data for the last 6 months, in terms of timeliness, accuracy and product coverage. You have the data. It's in application X and Y. Database A and B, and so on. But that data is not available to the team that needs it.

Guess what most companies do? A data analyst in IT queries that data and sends it to that department. That, my friend, is the beginning of something bad. Slowly but surely, more and more data requests keep coming and you end up paying a few data analysts to satisfy that demand. It becomes like spaghetti, with you in the middle of it. No data governance, no data platform. Just constant fire fighting. Because there are due dates. There are deadlines. Customers are waiting. Suppliers are waiting. IT becomes a data factory. But everything is fully manual. If this happens in the biggest supermarket in your country, it would be a big mess. Or the biggest hospital. Or hotel. Or bank. Or airport. If your competitors know that your data is in such a mess, imagine what happens to your share price in the stock market.

So you need to make the data available to the departments or teams that need it. In the right format. The right shape. The right mapping. The right values. The right level. The right time period. At the right data quality. And it's not simple. Yet that is the name of the game. You are in the data business. And you are the data architect. In order to make the data available to the teams that need it, you need to have the right data platform. See point 1 above. You need to have the right data models. See point 2 above. You need to align your data architecture with the business goals. See point 3 above. And only then can you make the data available to the right team, IN THE RIGHT WAY.

Of course you can take the manual approach as described above. The data analyst manually grabs the data, formats it, puts it in Excel and emails it to the departments that need it. Of course you can. But that, my friend, is the beginning of a disaster. The end of your career. You've got to do it the right way. Make the data available, but do it the right way. Use a data platform. Use data sharing. Use reporting tools. Use data architecture. Use data models. You are the Data Architect. Again you don't need any convincing in this. You know that you need to do it the right way.

5. Secure

It only takes one incident. The attackers hacked into your systems and stole your customer data. 250,000 of your customers had their name, address and credit card number stolen. You are the biggest supermarket in the country. Or the biggest hospital, hotel, bank or airport. And you had your customer data stolen. It's all in the news. TV, radio, newspapers, online media, social media. You were the biggest in your industry. Not any more. Customers left you and moved to your competitors. It only takes one data incident and you're finished. Your company is finished. You are the data architect, for God's sake. Of course you're fired. And worse, no one wants to hire you. You become infamous. The person who did not secure the company data.

Internally you know it's not your fault. You are not the Security Architect. You are not InfoSec. You are not the Data Protection Officer. You are the Data Architect. That, my friend, is the name of the game. As the data architect for this company, you are responsible, fully responsible, for the security of the data. It is your duty to make the data secure. All the data.

So you have to classify the data. You've got to know which data is sensitive, and which is not. Which is PII. Your data platform collects all kinds of data from many different sources. It has lots of data, and therefore is the primary target of an attack. Who wants to hack into a branch if all the data from all branches is at headquarters? Who wants to hack into one application, if all the data from all applications is stored in the data platform? You have to classify your data in the data platform. And you have to super protect the data platform. Make sure that only those who need to get in can get in.
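A data classification can start as something as simple as a tag on every column. Here is a minimal sketch in Python. The column names and the three sensitivity levels are assumptions for illustration, and so is the rule that an unclassified column is treated as the most sensitive level until someone classifies it.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    PII = 3


# Hypothetical column catalogue; in practice this would live in a data catalog.
CLASSIFICATION = {
    "customer.name": Sensitivity.PII,
    "customer.address": Sensitivity.PII,
    "customer.card_number": Sensitivity.PII,
    "product.sku": Sensitivity.PUBLIC,
    "store.weekly_sales": Sensitivity.INTERNAL,
}


def classify(column):
    """Return the sensitivity of a column.

    Assumption: anything not yet classified defaults to the most
    restrictive level, so new columns are protected by default.
    """
    return CLASSIFICATION.get(column, Sensitivity.PII)


def columns_at_or_above(level):
    """List the classified columns that need at least this level of protection."""
    return sorted(c for c, s in CLASSIFICATION.items() if s.value >= level.value)
```

For example, `columns_at_or_above(Sensitivity.INTERNAL)` lists every column that should never leave the company, which is the list you hand to whoever configures the access controls.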

And that means that you need to classify your users. You need to give them access only to the data they need. Only for the period they need. Only the people who need it. And how can you manage all that? That's where RBAC comes in. Role Based Access Control. You group the users into roles. Like Business Analysts. Or Client Reporting team. Or purchasing. And you give the role access to the data. You need to give them the right privilege. If they need read access, don't give them write access. This is obvious. But if you have to manage 500 users by hand, sooner or later you will get an incident. And like I said in the beginning. It only takes one data incident and you are finished. So you have to have a good system to manage data access. Use Active Directory. Use RBAC. Use secure views. Use row-level security. Column-level security.
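To make the idea concrete, here is a minimal sketch of RBAC with row-level and column-level security in plain Python. It is not how any particular database implements it. The role, the users and the supplier data are hypothetical, and a real platform would enforce this in the database or the identity provider, not in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    privileges: frozenset  # e.g. {"read"} or {"read", "write"}
    columns: frozenset     # column-level security: columns the role may see
    regions: frozenset     # row-level security: rows the role may see


# Hypothetical role and user assignments.
ROLES = {
    "purchasing": Role(
        name="purchasing",
        privileges=frozenset({"read"}),
        columns=frozenset({"supplier", "on_time_pct"}),
        regions=frozenset({"north"}),
    ),
}
USER_ROLES = {"bob": "purchasing"}

SUPPLIER_ROWS = [
    {"region": "north", "supplier": "Acme", "on_time_pct": 97, "bank_account": "X1"},
    {"region": "south", "supplier": "Bolt", "on_time_pct": 88, "bank_account": "X2"},
]


def can(user, privilege):
    """A user holds a privilege only through a role; unknown users hold none."""
    role = ROLES.get(USER_ROLES.get(user))
    return role is not None and privilege in role.privileges


def read(user, rows):
    """Return only the rows and columns that the user's role allows."""
    if not can(user, "read"):
        raise PermissionError(f"{user} has no read access")
    role = ROLES[USER_ROLES[user]]
    return [
        {col: val for col, val in row.items() if col in role.columns}
        for row in rows
        if row["region"] in role.regions
    ]
```

Here `read("bob", SUPPLIER_ROWS)` returns only the north region and only the supplier name and on-time percentage. The bank account column never leaves the platform, and `can("bob", "write")` is false because the role was only given read access.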

And use data encryption. You have to encrypt your database. Your data files. Your data lake. Your data marts. You need to encrypt the data when it is being transported. Especially over the internet. You need to use HTTPS. You need to use good authentication. Use Single Sign On, not user name and password. Use Multi Factor Authentication. I know you know all this, which is fine if you only have 3 apps. But to do it on a hundred applications, you need a good system to manage all that. And good procedures. Remember it only takes one data incident.

Again you don't need any convincing in this. You know you have to secure the data at all costs. Work with InfoSec. Work with the Data Protection Officer. Work with your security architect. Work with the data governance manager. You can't do this alone. It will take all of you working together to secure the data. To control the data access. And to manage the process. And that, my friend, brings us to the next point, which is governance.

6. Governance

Data governance is not your responsibility. You are the data architect. You DO NOT do data management. Or data governance. Or data quality. Your job is to design the data model. The data platform. Data integrity. Data ingestion. Reporting. Analytics. And so on. There are a lot of things you are responsible for. And data governance IS NOT one of them.

Is that right? Think again. You can have the best data platform in the world. Everything is fully automated. Everything is in the cloud. You have good data models. The best data ingestion. Good data transformation. All the data that each department needs is available to them. You have good control on data access. You have everything that we discussed above. But the data has poor quality. There is no validation when the data is being entered into the web applications. Into your websites. No governance, no policies, no cleansing, no validation. Poor data quality. Do you think you can support the company goal?

This is just like data security. You're not directly responsible for it. But you are responsible for it. If you don't manage the governance, you can't deliver. So the best practice in data architecture is to facilitate good data governance. Provide the data governance team with what they need. They need a data quality tool. They need data quality rules built. And that is fully within your realm. They need an MDM to manage master data. And again it's fully within your realm as a data architect. They need to establish data governance policies and rules. And again you, as a data architect, can and must help them. The key to your success is to get data governance working well. Validation, DQ remediation, data cleansing, DQ rules, policies. Work with data governance.
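What does "data quality rules built" look like? Here is a minimal sketch in Python, assuming a customer record with email, age and country fields. The field names, the three rules and the thresholds are hypothetical; the real rules come from the data governance team, and the email pattern is deliberately loose.

```python
import re

# Deliberately simple pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Hypothetical data quality rules: each rule name maps to a check on one record.
RULES = {
    "email_format": lambda r: EMAIL_PATTERN.fullmatch(str(r.get("email") or "")) is not None,
    "age_in_range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
    "country_present": lambda r: bool(r.get("country")),
}


def validate(record):
    """Return the names of the rules that the record fails (empty list = clean)."""
    return [name for name, rule in RULES.items() if not rule(record)]


good = {"email": "a@b.com", "age": 30, "country": "UK"}
bad = {"email": "not-an-email", "age": 150, "country": ""}
```

`validate(good)` returns an empty list, while `validate(bad)` names all three failed rules. The same rules can run at the point of entry in the web application and again inside the data platform, so that bad data is caught before it reaches the reports.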

7. Application landscape

As a data architect, you have to have the map of your land. It is called the application landscape. It is a diagram of all your business applications on one page. As a data architect, you need to understand this application landscape diagram. Which applications feed which applications. Which applications use what data from which applications. You have to know who is responsible for each application, how the application works, what the inputs are, what the outputs are, and how the data is processed. Every, single, application.
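The diagram is essentially a directed graph: applications are the nodes and data feeds are the arrows. Here is a minimal sketch in Python of how to ask "who is affected if this application breaks?" and "where does this data come from?". The application names are hypothetical, made up for a supermarket.

```python
from collections import deque

# Hypothetical application landscape: each application feeds the ones listed.
FEEDS = {
    "tills": ["sales_db"],
    "webshop": ["sales_db", "crm"],
    "sales_db": ["data_platform"],
    "crm": ["data_platform"],
    "data_platform": ["client_reporting", "purchasing_dashboard"],
}


def downstream(app):
    """All applications that directly or indirectly receive data from `app`."""
    seen, queue = set(), deque([app])
    while queue:
        for target in FEEDS.get(queue.popleft(), []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def upstream(app):
    """All applications whose data directly or indirectly reaches `app`."""
    return {source for source in FEEDS if app in downstream(source)}
```

`downstream("webshop")` tells you every application touched by a webshop outage, and `upstream("data_platform")` tells you every source you depend on. It is the same information as the one-page diagram, just in a form you can query.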

Enterprise architecture consists of 5 layers:

  • Business architecture

  • Application architecture

  • Data architecture (aka information architecture)

  • Technology architecture

  • Security architecture

You can pretend that application architecture is not your domain. Not your responsibility. But the truth is, as a data architect it is your business to know what data is flowing around in your company. And the best way to understand that is this application landscape diagram. It is the best map to see what data is flowing where.

You are not the Enterprise Architect. You are not responsible for the business architecture, the technology architecture or the security architecture. But it is in your interest to understand them. You can't work in silos. Just doing data architecture. You need to understand the other 4 layers, in order to do your job well.

And that, my friend, is data architecture.

And I wish you all the best. Hopefully you can do your job well as a data architect for your company. Particularly in 2025. May the new year bring good opportunities to you. Especially in data architecture.

Hopefully all the above helps. I wrote what's in my mind without thinking about it too much, so I must have made a lot of mistakes. And forgotten a lot of things. So please correct me if I'm wrong. I would appreciate all your comments.

List of my articles: https://lnkd.in/eRTNN6GP

Sudhir Sriram

Data Migration | Transformation | Banking, Asset & Wealth Management | Data Office | Data Management | Data Governance | Data Quality | Regulatory Compliance | SQL | Alteryx | Snowflake

7mo

Classification, Glossary and Curation of Data Assets are critical underlying capabilities to ensure these best practices actually work in tandem. One of the biggest challenges is where the Business talks big on Data but walks very little in terms of investing key resources and time to develop these initial building blocks. This needs an organisation-wide push and can't just be a Data architect/Data team responsibility.

Very sound insight Vincent

Esraa Eissa

Senior BI Developer @ Link Development

7mo

Thanks for sharing. Can you recommend books on this topic?

Christopher King

Healthcare | Data Product Management | Enterprise Applications

7mo

This is GOLD. Excellent summary.

Jörgen Larsson

Change Management in the Era of AI and Information Management @ Jaxbird AB | Leadership, Operations Development

7mo

Or maybe it is like this: Data architecture = data structure Data infrastructure = technical architecture Or maybe it is something else…for some. We should be careful about how we speak the language (what terms we are using), or at least define them for how they are used within this current context.
