Effective DataOps means better Data Engineering and accelerated Data Science
Last week I was on a business trip to Australia and Singapore, running several hour-long, intensive data analytics workshops, mostly with the CDOs' organisations of leading multinational banks, construction and engineering companies, analysts, and some interesting analytics startups. During the conversations, almost all of the CDOs said they had already compiled a list of analytics use cases they want to develop and implement over the next 12 months. Some have taken the route of establishing an integrated analytics platform; a few have chosen to build data lakes first and then solve specific business use cases. But when I asked them, at a high level, how they would categorise those use cases, the response was very interesting: a majority of the use cases fell into the bucket of "data engineering". Amazing!
While we all talk about Data Science, Machine Learning, Predictive Analytics and creating possibilities with data analytics, every organisation, irrespective of its maturity, has a long list of things to be done to solve today's business problems – and most of these problems relate to data engineering.
Data Experience and Data Accessibility
DataOps has to become everyone's business in any organisation that wants to become data-driven. DataOps is not only about technical skills – it is also about democratising data. Knowledge is powerful, and so is the underlying data that produces that knowledge. Every time I talk to business users and CDOs, I hear the same biggest hurdle: within the organisation there is a very visible culture of data hoarding, meaning the people who have the data hold onto it. Good DataOps requires the ability to negotiate organisational boundaries and forge relationships so that data is shared across functions and silos.
Collaborative Data Engineering
When I talk to data scientists, I hear a tinge of frustration: they spend 50% to 80% of their time on data engineering activities. Why? Data engineering consists of several sets of tasks – data acquisition, data quality, metadata management, data enrichment, building APIs for data exchange, and data privacy and security.
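To make one of these tasks concrete, here is a minimal sketch of a routine data-quality check of the kind that eats into a data scientist's week – splitting records into valid rows and rows that fail basic completeness rules. The field names and sample data are hypothetical illustrations, not from any specific platform.

```python
def check_quality(records, required_fields):
    """Split records into valid rows and rows failing basic completeness checks."""
    valid, rejected = [], []
    for row in records:
        # A field is "missing" if absent, None, or an empty string.
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        if missing:
            rejected.append({"row": row, "missing": missing})
        else:
            valid.append(row)
    return valid, rejected

# Hypothetical customer records: one complete, one with a blank name.
customers = [
    {"id": 1, "name": "Acme", "country": "AU"},
    {"id": 2, "name": "", "country": "SG"},
]
valid, rejected = check_quality(customers, ["id", "name", "country"])
```

In practice checks like this are wrapped in scheduled jobs and reporting, but the core logic – validate, separate, record why – is the repetitive work being described.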
Today, much data engineering is done to solve the problem of now (barring the repeatable, scheduled data jobs that feed a data warehouse), so teams often end up reinventing the wheel when they get to the next problem. Data has multiple uses, both now and later, so reusability, a library of business rules, a library of code snippets that address micro-level tasks, and the like are extremely important for efficiency and productivity. This is one of the real values of establishing DataOps for highly collaborative data engineering.
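One way to picture the "library of business rules" idea is a simple registry: each rule is written once, given a name, and reused across pipelines instead of being re-implemented per project. This is a sketch under assumed conventions – the rule names and record fields are illustrative.

```python
RULES = {}

def rule(name):
    """Decorator that registers a reusable business rule by name."""
    def register(fn):
        RULES[name] = fn
        return fn
    return register

@rule("normalise_country")
def normalise_country(row):
    # Standardise country codes so downstream joins behave consistently.
    row["country"] = row.get("country", "").strip().upper()
    return row

@rule("non_negative_amount")
def non_negative_amount(row):
    # Reject rows with negative amounts; returning None drops the row.
    return row if row.get("amount", 0) >= 0 else None

def apply_rules(rows, rule_names):
    """Run the named rules in order, dropping any row a rule rejects."""
    for name in rule_names:
        rows = [r for r in (RULES[name](dict(row)) for row in rows)
                if r is not None]
    return rows

orders = [{"country": " au ", "amount": 10},
          {"country": "SG", "amount": -5}]
clean = apply_rules(orders, ["normalise_country", "non_negative_amount"])
```

The design choice here is that pipelines reference rules by name, so a fix to one rule propagates everywhere it is used – which is exactly the reuse and productivity gain being argued for.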
Skills scarcity and advent of citizen data scientists
One of the CDOs admitted they are in the precarious state of drinking from a fire hose: the demand for data and insight from business users is funnelling down at such a pace and volume that they are simply struggling to keep up. The problem goes beyond recruiting data scientists; it is also about managing and organising the data to accelerate the data-to-insights journey. Analytics consumers are demanding self-service analytics solutions, hence the move to develop more and more citizen data scientists – that is, business people who are curious, adventurous and determined enough to research, prototype and experiment with analytics.
Accelerated Data Science
In the traditional BI world, correctness is key; in data science, we are talking about experimentation, so the aim is to be reasonably accurate and consistent. That is hard, especially when your objective is to explore and find possibilities with data and analytics – you can't know what an experiment would have shown unless you actually run it.
This means your data preparation, predictive modelling and visualisations have to be easy to update, with refinements that are easy to deploy, while remaining robust and auditable. An effective DataOps practice becomes vital for regular retraining, testing, validation and deployment of analytics.
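The retrain–validate–deploy loop described above can be sketched as a simple gate: a model is only promoted if it clears a validation threshold, and every run leaves an audit record. The function names, threshold and model interface here are hypothetical placeholders, not a specific library's API.

```python
def retrain_and_deploy(train_fn, validate_fn, deploy_fn, min_score=0.8):
    """Retrain a model, deploy only if validation passes, keep an audit record."""
    model = train_fn()
    score = validate_fn(model)
    audit = {"score": score, "deployed": score >= min_score}
    if audit["deployed"]:
        deploy_fn(model)  # promotion only happens behind the quality gate
    return audit

# Toy stand-ins to show the control flow; real pipelines would train on
# fresh data and validate against a held-out set.
audit = retrain_and_deploy(
    train_fn=lambda: "model-v2",
    validate_fn=lambda m: 0.91,
    deploy_fn=lambda m: None,
)
```

Because every run returns an audit record rather than silently overwriting production, the loop stays both repeatable and auditable – the two properties the text asks of an effective DataOps practice.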
Summary
DataOps is a new discipline, but done right it can deliver value by orders of magnitude. DataOps also means cross-skilling – which is the crying need of the hour. While in Australia I had a very interesting conversation with a startup that has found clever ways to monetise data. The interesting part was not that they built a platform, acquired open data and monetised it; it was how they have taken DataOps to the next level with tools, methodologies, automation and cross-skilling of their team – everybody takes ownership of whichever dataset they are working on and continuously enriches the data at every step.