Tom Baeyens’ Post

Co-founder and CTO at Soda

Petition to stop using vague terms in data engineering.

I see many teams label roles in pipelines as “owners,” “stakeholders,” or “users.” But these words rarely explain who actually does what. Who fixes a failing pipeline? Who gets alerts when data is delayed or corrupted? Who approves schema changes? Who maintains transformations or joins? If your policies or documentation can’t answer these questions clearly, they won’t work in practice.

That’s why I advocate using precise terms like data producers and data consumers. These describe actual behavior, not abstract roles.

A data producer is any system, team, or individual responsible for creating, generating, or modifying data. This includes manual data entry, ETL pipelines, API ingestion, or applications writing to databases.

A data consumer is any person, process, or tool that uses data for downstream purposes. This includes analysts building dashboards, ML models using features, finance teams preparing reports, or business systems making decisions based on data.

Clear language leads to clear responsibility, faster troubleshooting, and more reliable pipelines.

Which vague data engineering term should we retire next?
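One way to make the producer and consumer terms concrete is to record them next to each dataset, so the "who fixes it, who gets alerted" questions have a lookup instead of a guess. The following is a minimal sketch, not a prescribed implementation: the `Dataset` class, the names `orders_daily`, `checkout-etl`, `finance-reporting`, and `churn-model`, and the rule that the producer fixes failures while producer and consumers are all alerted are illustrative assumptions, not part of the post.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical record tying a dataset to its producer and consumers."""
    name: str
    producer: str  # system, team, or individual that creates or modifies the data
    consumers: list[str] = field(default_factory=list)  # downstream people, processes, or tools

    def on_failure(self) -> dict[str, list[str]]:
        # Assumed policy: the producer fixes the problem,
        # and the producer plus every consumer is alerted.
        return {
            "fixes": [self.producer],
            "alerted": [self.producer, *self.consumers],
        }


# All names below are made up for illustration.
orders = Dataset(
    name="orders_daily",
    producer="checkout-etl",
    consumers=["finance-reporting", "churn-model"],
)

print(orders.on_failure())
```

A different team might route alerts differently; the point of the sketch is only that each question maps to a named producer or consumer rather than to an unspecified "owner."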
