Major Technology Trends Unravelled during Big Data London 2023
That's a wrap for Big Data London! One of the nicest things about these events is that you can easily check what the industry is up to and what the future we are building together looks like.
With literally thousands of experts involved, we can strive to reach reasonably precise conclusions.
I would also like to mention that these trends have been quite visible throughout the last few years, but a great event filled with dozens of meaningful discussions seems like a good point to summarise them.
Disclaimer: this is my personal opinion and it does not necessarily reflect the point of view of my current employer.
So, for this new iteration in the evolution of our data systems, I see three major trends.
1. Data as a Commodity
2. Interactive Applications
3. Real Time Systems
Let me give you a glimpse of what I mean by each of those.
Data as a Commodity
Business requires data; that much is obvious to everyone. Even in the pre-information era, some businesses and disciplines, banks or architecture bureaus, for instance, were almost 100% data-driven.
Now enterprises, regardless of their size, require data even more. They need it faster, more precise and accurate, and easier to consume. Clearly, this is not a state achievable with our current complex and clumsy systems and processes.
Data as a Commodity means that we want to enable our businesses with a trusted, easy-to-use, yet secure and compliant self-service data platform. It also means that the new generation of data-serving engines will become highly integrated with data, infrastructure and security tools.
That is already happening, with several end-to-end data platforms presented at the conference and Microsoft Fabric leading the pack.
These platforms are complex to build for common use scenarios, so I would expect them to become fully operational and ready to use over the next few years.
During the event I talked to some prominent consulting companies specialising in preparing data for this future "commodity" state, whether through a Data Mesh or some less sophisticated technical implementation. So we are heading there.
But ... there is always a but, no? How will we know that the information fed into these shiny new platforms is correct and trustworthy? How should we get rid of redundancies and ensure a high level of data quality? How can we map the awkward legacy naming spread across layer upon layer of our systems to a single meaning recognisable by the business?
This is where a second set of tools kicks in, starting with the new discipline of Data Observability and going all the way back to traditional Data Engineering. These tools are becoming more intelligent, able to use AI capabilities for pattern recognition, finding correlations, generating and automating pipelines, and so on. However, there is still a bulk of manual work required to make it all function. Some of the most intensive tasks, such as business mapping and redundancy tracking, are meant to be handled by highly skilled professionals.
So, Data as a Commodity will be enabled by several sets of tools and technologies:
1. Convergent end-to-end Data Platforms.
2. Intelligent Governance and Observability.
3. Intelligent Data Engineering and Processing.
With my Microsoft hat on, I would say that Microsoft first-party products and our partnerships with Databricks and Informatica cover most of the ground here.
Interactive Applications
With the rise of ChatGPT, we see that more and more people (literally hundreds of millions) are getting used to an interactive mode of communicating with machines.
During the AI panel I ran an experiment. I asked the audience to raise their hands if they, their friends or their relatives were not using Generative AI. Only a few dozen hands out of hundreds of participants went up.
People are getting more and more used to fast and (relatively) precise answers, with additional transparency and the ability to follow up immediately. This is exactly what is missing from people's interactions with our applications right now. We have to accept that information coming through traditional channels of communicating with the business is vague, often inaccurate and sometimes requires significant patience.
Thus, we see the rise of Generative AI not only as a means of automation but also as a new way of communicating with our customers, one that may become the de facto standard in the near future.
Interestingly enough, just applying Generative AI as a communication method is not enough. In any real business case we need to hook up both data and application APIs so users can access both through a single interactive interface. While a few months back this looked like a simple task, in fact it is not, and it requires significant investment in application development.
First of all, it is all but impossible to interact with applications that do not expose APIs, whether synchronous or asynchronous (but close to real-time). Secondly, such workloads have to be orchestrated, monitored, properly linked with data, and enabled with content and result filtering as well as proper error handling. None of this is trivial if you want to build it yourself.
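To make this concrete, here is a minimal, hypothetical sketch of the kind of orchestration such an interactive application needs around the model: linking the conversation to a business API, retrying and handling errors, and filtering content on the way in and out. The functions call_llm() and get_order_status() are stand-ins for whatever model endpoint and application API you actually use; they are assumptions, not real SDK calls.

```python
# Hypothetical orchestration loop around a Generative AI model.
# call_llm() and get_order_status() are placeholders, not real APIs.

import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("interactive-app")

BLOCKED_TERMS = {"password", "credit card"}  # naive content filter for the sketch


def call_llm(prompt: str) -> str:
    """Placeholder for a call to your model endpoint."""
    raise NotImplementedError


def get_order_status(order_id: str) -> dict:
    """Placeholder for a synchronous business API exposed by the application."""
    raise NotImplementedError


def answer(user_question: str, order_id: str) -> str:
    # 1. Filter the incoming request before it reaches the model.
    if any(term in user_question.lower() for term in BLOCKED_TERMS):
        return "Sorry, I cannot help with that request."

    # 2. Link the conversation with application data, with retries and error handling.
    order = None
    for attempt in range(3):
        try:
            order = get_order_status(order_id)
            break
        except Exception as exc:
            log.warning("API call failed (attempt %d): %s", attempt + 1, exc)
            time.sleep(2 ** attempt)
    if order is None:
        return "Our order system is temporarily unavailable, please try again later."

    # 3. Ground the model on the retrieved data and log the exchange for monitoring.
    prompt = f"Answer the customer using only this data: {order}\nQuestion: {user_question}"
    log.info("Calling model for order %s", order_id)
    reply = call_llm(prompt)

    # 4. Filter the output as well before it reaches the user.
    if any(term in reply.lower() for term in BLOCKED_TERMS):
        return "Sorry, I cannot share that information."
    return reply
```

Even in this toy form, most of the code is orchestration, filtering and error handling rather than the model call itself, which is exactly why the investment is bigger than it first appears.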
Hence the rise of a new generation of intelligent application platforms, such as Microsoft Co-Pilot and many third-party tools. These will allow enterprises to create their own interactive applications at lower cost and to connect them to Data as a Commodity platforms so the data is immediately available to users.
Real Time Systems
This rise of interactive applications (not only those backed by Generative AI), such as e-commerce platforms, adds further requirements on the velocity of data processing, analysis and synthesis.
We want our customers to get recommendations instantly; we want to know if they are happy with the product or if they are doing something wrong with the platform, in real time. The same applies even to the most conservative areas, such as regulatory reporting.
It is worth mentioning that real-time processing is very different from the batch processing we are used to. In this context it does not really matter whether we talk about minutes, seconds or milliseconds. The main difference between batch and real time is the amount and impact of potential changes in incoming data. Unlike a batch system (where changes are locked in and applied via a reconciliation process of some sort), in a real-time system we need to process data as it arrives, with a significant probability that the same data will change in the next instant. This requires quite a different approach to handling writes and updates, as we still have to be sure that the data is accurate and correct.
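A minimal sketch of that write pattern, in plain Python with no streaming framework, might look as follows. Instead of appending every incoming record (as a batch load would), it upserts by key and keeps only the latest version according to event time, so a record that changes an instant later simply replaces its predecessor. The record fields are illustrative assumptions.

```python
# Sketch of upsert-by-key, latest-event-time-wins handling of mutable real-time data.

from dataclasses import dataclass


@dataclass
class OrderEvent:
    order_id: str
    status: str
    event_time: float  # seconds since epoch, as stamped by the source system


# Current state of each order; in a real system this would be a table with
# merge/upsert semantics rather than an in-memory dict.
state: dict[str, OrderEvent] = {}


def apply(event: OrderEvent) -> None:
    """Upsert the event, ignoring late updates older than what we already hold."""
    current = state.get(event.order_id)
    if current is None or event.event_time >= current.event_time:
        state[event.order_id] = event


# The same order changes twice within moments; a blind batch append would now
# hold three conflicting rows, while the upsert keeps one current record.
apply(OrderEvent("o-1", "created", 1.0))
apply(OrderEvent("o-1", "paid", 2.0))
apply(OrderEvent("o-1", "created", 1.5))  # late, stale update - ignored

print(state["o-1"].status)  # -> "paid"
```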
Real-time processing is not new to the market and has been applied at various scales over the last decade. However, real time is now turning into the main source of data, which means that most of our processes, applications and tools have to adapt to real-time processing patterns. Hence the number of companies providing tools to link the old-fashioned batch world with the emerging real-time paradigm: CDC tools, streaming and real-time analytics platforms, real-time processing tools, as well as systems ensuring high consistency, were all present, and some new (at least for me) names appeared.
At Microsoft we are investing in real-time data capture, processing and analytics with cutting-edge technologies such as Azure Data Explorer, various forms of Spark Structured Streaming and Azure Stream Analytics, but also in streaming ingestion and messaging with Azure Event Hubs, Azure Service Bus and Azure IoT Hub.
Elephant in the room(s)
There is one more trend that is still in its early days but which, I believe, is becoming more and more visible. Let us discuss it for a moment.
People hate doing boring, repetitive work. And, let's face it, few individuals really enjoy creating data copy pipelines manually or documenting fields in a database. It is not only boring but also requires a tremendous amount of time and effort. In addition, it is risky, as people make mistakes all the time.
Logically, we may think of Generative AI as the main transformative technology in this field as well, expanding automation beyond templates and deterministic script generators.
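As one hypothetical illustration of the kind of repetitive task this could take over, consider drafting documentation for undocumented, legacy-named columns. The sketch below only assembles the request; draft_descriptions() is a placeholder for whichever model your platform exposes, and the table and column names are invented for the example. A data steward would still review the output.

```python
# Hypothetical sketch: asking a Generative AI model to draft column documentation.

COLUMNS = ["cust_id", "ord_dt", "amt_gbp", "src_sys_cd"]  # typical legacy names


def build_prompt(table: str, columns: list[str]) -> str:
    cols = ", ".join(columns)
    return (
        f"You are documenting the table '{table}'. "
        f"For each of these columns, propose a one-sentence business description "
        f"for review by a data steward: {cols}"
    )


def draft_descriptions(prompt: str) -> str:
    """Placeholder for a Generative AI call; a human still reviews the output."""
    raise NotImplementedError


if __name__ == "__main__":
    print(build_prompt("sales.orders", COLUMNS))
```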
During the event I saw patches of this work in different products and services, but most of these cases were quite narrow.
I believe that in the upcoming years we will see more and more complex automation, with specialised models coming into play and solving some of the repetitive tasks in Data Engineering and Data Governance. I also think this would be a tipping point for the modern data stack. Let us make it happen!
Thank you so much for reading, and I hope this was helpful.