Effective DataOps means better Data Engineering and accelerated Data Science
Last week I was on a business trip to Australia and Singapore, running several hour-long, intensive data analytics workshops, mostly with the CDOs' organisations of leading multinational banks, construction and engineering companies, analysts, and some interesting analytics startups. During the conversations, almost all of the CDOs said they had already compiled a list of analytics use cases they want to develop and implement over the next 12 months. Some have taken the route of establishing an integrated analytics platform; a few have chosen to build data lakes first and then solve specific business use cases. But when I asked them, at a high level, how they would categorise those use cases, the response was very interesting: a majority of the use cases fell into the bucket of "data engineering". Amazing!
While we all talk about Data Science, Machine Learning, Predictive Analytics and creating possibilities with data analytics, every organisation, irrespective of its maturity, has a long list of things to be done to solve today's business problems – and most of these problems relate to data engineering.
Data Experience and Data Accessibility
DataOps has to become everyone's business in any organisation that wants to become data-driven. DataOps is not only about technical skills – it is also about democratising data. Knowledge is powerful, and so is the underlying data that produces that knowledge. Every time I talk to business users and CDOs, I hear the same biggest hurdle: within the organisation there is a very visible culture of data hoarding, meaning the people who have the data hold onto it. Good DataOps requires the ability to negotiate organisational boundaries and forge relationships so that data is shared across functions and silos.
Collaborative Data Engineering
When I talk to data scientists, I hear a tinge of frustration: they spend 50% to 80% of their time on data engineering activities. Why? Data engineering consists of several sets of tasks – data acquisition, data quality, metadata management, data enrichment, building APIs for data exchange, and data privacy and security.
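To make one of these tasks concrete, here is a minimal sketch of a routine data-quality check of the kind that eats into a data scientist's week – splitting records into valid rows and rows that fail basic completeness rules. The field names and sample data are hypothetical illustrations, not from any specific platform.

```python
def check_quality(records, required_fields):
    """Split records into valid rows and rows failing basic completeness checks."""
    valid, rejected = [], []
    for row in records:
        # A field is "missing" if absent, None, or an empty string.
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        if missing:
            rejected.append({"row": row, "missing": missing})
        else:
            valid.append(row)
    return valid, rejected

# Hypothetical customer records: one complete, one with a blank name.
customers = [
    {"id": 1, "name": "Acme", "country": "AU"},
    {"id": 2, "name": "", "country": "SG"},
]
valid, rejected = check_quality(customers, ["id", "name", "country"])
```

In practice checks like this are wrapped in scheduled jobs and reporting, but the core logic – validate, separate, record why – is the repetitive work being described.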
Today, much data engineering is done to solve the problem of now (barring the repeatable, scheduled data jobs that feed a data warehouse), so teams often end up reinventing the wheel when they get to the next problem. Data has multiple uses, both now and later, so reusability, a library of business rules, a library of code snippets that address micro-level tasks, and the like are extremely important for efficiency and productivity. This is one of the real values of establishing DataOps for highly collaborative data engineering.
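One way to picture the "library of business rules" idea is a simple registry: each rule is written once, given a name, and reused across pipelines instead of being re-implemented per project. This is a sketch under assumed conventions – the rule names and record fields are illustrative.

```python
RULES = {}

def rule(name):
    """Decorator that registers a reusable business rule by name."""
    def register(fn):
        RULES[name] = fn
        return fn
    return register

@rule("normalise_country")
def normalise_country(row):
    # Standardise country codes so downstream joins behave consistently.
    row["country"] = row.get("country", "").strip().upper()
    return row

@rule("non_negative_amount")
def non_negative_amount(row):
    # Reject rows with negative amounts; returning None drops the row.
    return row if row.get("amount", 0) >= 0 else None

def apply_rules(rows, rule_names):
    """Run the named rules in order, dropping any row a rule rejects."""
    for name in rule_names:
        rows = [r for r in (RULES[name](dict(row)) for row in rows)
                if r is not None]
    return rows

orders = [{"country": " au ", "amount": 10},
          {"country": "SG", "amount": -5}]
clean = apply_rules(orders, ["normalise_country", "non_negative_amount"])
```

The design choice here is that pipelines reference rules by name, so a fix to one rule propagates everywhere it is used – which is exactly the reuse and productivity gain being argued for.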
Skills scarcity and advent of citizen data scientists
One of the CDOs admitted they are in the precarious state of drinking from a fire hose: the demand for data and insight from business users is funnelling down at such a pace and volume that they are simply struggling to keep up. The problem goes beyond recruiting data scientists; it is also about managing and organising the data to accelerate the data-to-insights journey. Analytics consumers are demanding self-service analytics solutions, hence the move to develop more and more citizen data scientists – that is, business people who are curious, adventurous and determined enough to research, prototype and experiment with analytics.
Accelerated Data Science
In the traditional BI world, correctness is key; in data science, we are talking about experimentation, so the aim is to be reasonably accurate and consistent. That is hard, especially when your objective is to explore and find possibilities with data and analytics – you can't know what an experiment would have shown unless you actually run it.
This means your data preparation, predictive modelling and visualisations have to be easy to update, with refinements that are easy to deploy, while remaining robust and auditable. An effective DataOps practice becomes vital for regular retraining, testing, validation and deployment of analytics.
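The retrain–validate–deploy loop described above can be sketched as a simple gate: a model is only promoted if it clears a validation threshold, and every run leaves an audit record. The function names, threshold and model interface here are hypothetical placeholders, not a specific library's API.

```python
def retrain_and_deploy(train_fn, validate_fn, deploy_fn, min_score=0.8):
    """Retrain a model, deploy only if validation passes, keep an audit record."""
    model = train_fn()
    score = validate_fn(model)
    audit = {"score": score, "deployed": score >= min_score}
    if audit["deployed"]:
        deploy_fn(model)  # promotion only happens behind the quality gate
    return audit

# Toy stand-ins to show the control flow; real pipelines would train on
# fresh data and validate against a held-out set.
audit = retrain_and_deploy(
    train_fn=lambda: "model-v2",
    validate_fn=lambda m: 0.91,
    deploy_fn=lambda m: None,
)
```

Because every run returns an audit record rather than silently overwriting production, the loop stays both repeatable and auditable – the two properties the text asks of an effective DataOps practice.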
Summary
DataOps is a new discipline, but done right it can deliver value by orders of magnitude. DataOps also means cross-skilling – which is the crying need of the hour. While in Australia I had a very interesting conversation with a startup that has found clever ways to monetise data. The interesting part was not that they built a platform, acquired open data and monetised it; it was how they have taken DataOps to the next level with tools, methodologies, automation and cross-skilling of their team – everybody takes ownership of whichever dataset they are working on and continuously enriches the data at every step.