AI Engineering and Data Engineering

Many companies have a data engineering team. Some even have a data architecture team. But they have no AI engineering team, and no AI architecture team.

Why is this important? Because AI engineering is very different from data engineering. You need to do model serving, i.e. the containers, the container images, the model registry, and the inference. Before that you need to do model training, i.e. selecting features, selecting algorithms, preparing training data, providing training compute, validating the model, and calculating and storing model performance and accuracy.
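To make the training steps concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the synthetic data, the one-feature threshold "model", the ModelRecord structure and the in-memory registry are all hypothetical. A real team would use a proper ML library and a real model registry.

```python
import json
import random
from dataclasses import dataclass, asdict


@dataclass
class ModelRecord:
    """One entry in a toy model registry (hypothetical structure)."""
    name: str
    version: int
    threshold: float
    accuracy: float


def make_data(n, seed=0):
    """Synthetic data: the label is 1 when the single feature exceeds 0.5."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return [(x, int(x > 0.5)) for x in xs]


def accuracy(threshold, rows):
    """Share of rows where 'x > threshold' matches the label."""
    hits = sum(int(x > threshold) == y for x, y in rows)
    return hits / len(rows)


def train(rows):
    """'Training' here is just picking the best threshold from a small grid."""
    candidates = [i / 20 for i in range(1, 20)]
    return max(candidates, key=lambda t: accuracy(t, rows))


# Prepare training data and hold some rows back for validation.
data = make_data(200)
train_rows, valid_rows = data[:150], data[150:]

# Train, validate, then store the model and its measured accuracy.
threshold = train(train_rows)
record = ModelRecord("toy-threshold", 1, threshold, accuracy(threshold, valid_rows))

registry = {}
registry[(record.name, record.version)] = asdict(record)
print(json.dumps(registry[("toy-threshold", 1)]))
```

Even in this toy form you can see the moving parts: training data, a validation split, a performance number, and somewhere to store the model version with that number. None of that is part of a typical data pipeline.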

And of course before you can do all that, you need to develop the models. And that includes collecting the requirements, collecting data, analysing data, creating different features and storing them, trying out different algorithms, different hyperparameters, etc.

And that is only for machine learning, or what has started to be known as the old AI. Gen AI requires a completely different process. And a different architecture too. You need to think about MCP (Model Context Protocol), about multiple agents. And multimodal too, from images to voice to documents. The infrastructure is different: MCP server, MCP client, knowledge graph database, vector database, an OpenAI account with an API key, a Python environment with LangChain, Streamlit for building apps.
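As an illustration of one item on that list, the core idea of a vector database can be sketched in a few lines of plain Python: store texts alongside embedding vectors, then return the texts whose vectors are closest to a query vector. The TinyVectorStore class and the three-number "embeddings" below are made up for the example. A real system would get its embeddings from a model and use a real vector database with proper indexing.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store: a list of (text, vector) pairs."""

    def __init__(self):
        self._items = []

    def add(self, text, vector):
        self._items.append((text, vector))

    def search(self, query_vector, k=1):
        """Return the k texts whose vectors are most similar to the query."""
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = TinyVectorStore()
# Hand-written vectors stand in for real embeddings (assumption).
store.add("invoice policy", [0.9, 0.1, 0.0])
store.add("holiday policy", [0.1, 0.9, 0.0])
store.add("security policy", [0.0, 0.1, 0.9])

top = store.search([0.8, 0.2, 0.0], k=1)
print(top)  # ['invoice policy']
```

The lookup is by similarity, not by key or by SQL predicate, which is one reason the Gen AI stack looks so different from a data warehouse.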

And, after all that, you also need to consider security, such as authentication and encryption, plus logging and monitoring (what we now call observability). And you also need to monitor model drift and performance for all the models in production.
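One common way to put a number on drift is the Population Stability Index (PSI), which compares the distribution of a feature or score in production against the distribution seen at training time. The sketch below is a simplified, assumed implementation with equal-width bins and made-up scores. The 0.25 alert level is only a widely quoted rule of thumb, not a standard.

```python
import math


def psi(expected, actual, bins=5):
    """Population Stability Index between two samples, using equal-width bins
    whose edges are taken from the expected (training-time) sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp so out-of-range values fall into the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up scores: one production sample matches training, the other is shifted upwards.
training_scores = [i / 100 for i in range(100)]
same_scores = list(training_scores)
shifted_scores = [min(s + 0.3, 0.99) for s in training_scores]

ALERT_LEVEL = 0.25  # rule-of-thumb threshold (assumption)
for name, sample in [("same", same_scores), ("shifted", shifted_scores)]:
    value = psi(training_scores, sample)
    print(name, round(value, 3), "ALERT" if value > ALERT_LEVEL else "ok")
```

In production a check like this would run on a schedule for every model, with the results going to the same monitoring stack as the logs and metrics.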

It is a different engineering practice altogether. Your data engineering practice might be about data lakes, Spark, data warehousing, analytics and BI tools, etc. Perhaps like this: https://aws.amazon.com/solutions/guidance/building-an-advertising-data-lake-for-publishers-on-aws/

Whereas your AI engineering is probably something like this:

https://aws.amazon.com/blogs/machine-learning/governing-the-ml-lifecycle-at-scale-part-4-scaling-mlops-with-security-and-governance-controls/

Or like this: link.

Of course Azure, AWS and Google are very different environments, whether it's data engineering or AI engineering. But the point is, data engineering is quite a different practice to AI engineering.

Can both be done by one department? I think they can, and they should. Instead of "Data Engineering" it should be "Data and AI Engineering". Why? Because the skills required are similar. You need data skills, CI/CD pipelines (DevOps), development environments, a product backlog, etc. Believe me, if you are looking for a home for your AI engineering practice, the most suitable home is the Data Engineering department.

And it is a good mix too. Data scientists and data engineers work together closely. They need each other. As we all know, no AI can live without data. They spark each other, inspire each other, help each other. After all, they are both engineers.

I think you should combine Data Engineering and AI Engineering. And call your team Data and AI Engineering.

Similarly, Data Architecture and AI Architecture should be combined too. And call it Data and AI Architecture.

What do you think? Do you agree? Or disagree? Let me know.

My LinkedIn articles: https://www.linkedin.com/pulse/list-all-my-articles-vincent-rainardi-eohge/

My blog: https://www.datawarehouse.org.uk/

#DataEngineering #AIEngineering #DataArchitecture #AIArchitecture #DataWarehouse #Data

Roshana Rose Roy

Data & AI Architect | Data Engineering | AWS AI Certified | Ex-Trianz, Quantiphi, UST, Wipro | Designing AWS/Azure Cloud-Native Data Architectures, Building Intelligent Data Lakehouses ETL Modernization & GenAI Solutions

3d

I agree 💯

Salman Sayyad

Data & Analytics Engineer (Deputy Manager | AWS| GCP Transforming Raw Data into Actionable Business Insights.

3w

Agreed! Combining Data Engineering and AI Engineering/Architecture makes total sense. It creates a more holistic view of the data lifecycle for AI, boosts efficiency by reducing redundancy, and improves collaboration. This integration ensures better data quality, governance, and smoother model deployment. It's crucial for modern, data-driven organizations.

Arfan R.

Data & AI Engineer | Quantexa Data Engineer

3w

100%, data engineering flows mimic AI development best practices very closely!

Rachmat Yudi Subagiyo

Young Expert Goods Services Procurement Manager

3w

Thanks for sharing, Vincent

Rémy Fannader

Author of 'Enterprise Architecture Fundamentals', Founder & Owner of Caminao

3w

Intelligence is the ability to make differences ...
