Major Technology Trends Unravelled during Big Data London 2023
That's a wrap for Big Data London! One of the nicest things about these events is that you can easily check what the industry is up to and what the future we are building together looks like.
With literally thousands of experts involved, we can strive to reach reasonably precise conclusions.
I would also like to mention that these trends have been quite visible throughout the last few years, but a great event filled with dozens of meaningful discussions seems like a good point to summarise them.
Disclaimer: this is my personal opinion and it does not necessarily reflect the point of view of my current employer.
So, for this new iteration in the evolution of our data systems, I see three major trends.
1. Data as a Commodity
2. Interactive Applications
3. Real Time Systems
Let me give you a glimpse of what I mean by each of those.
Data as a Commodity
Business requires data; that much is obvious to everyone. Even in the pre-information era, some businesses and disciplines, banks or architecture bureaus, for instance, were almost 100% data-driven.
Now enterprises, regardless of their size, require data even more. They need it faster, more precise and accurate, and easier to consume. Clearly, this is not a state achievable with our current complex and clumsy systems and processes.
Data as a Commodity means that we want to enable our businesses with a trusted, easy-to-use, yet secure and compliant self-service data platform. It also means that the new generation of data-serving engines will become highly integrated with data, infrastructure and security tools.
That is already happening, with several end-to-end data platforms presented at the conference and Microsoft Fabric leading the pack.
These platforms are complex to build for common use scenarios, so I would expect them to become fully operational and ready to use over the next few years.
During the event I talked to some prominent consulting companies specialising in preparing data for this future "commodity" state, whether through a Data Mesh or some less sophisticated technical implementation. So we are heading there.
But ... there is always a but, no? How will we know that the information fed into these shiny new platforms is correct and trustworthy? How should we get rid of redundancies and ensure a high level of data quality? How can we map the awkward legacy naming spread across layer upon layer of our systems to a single meaning recognisable by the business?
This is where a second set of tools kicks in, starting with the new discipline of Data Observability and going all the way back to traditional Data Engineering. These tools are becoming more intelligent, able to use AI capabilities for pattern recognition, finding correlations, generating and automating pipelines, and so on. However, there is still a bulk of manual work required to make it all function. Some of the most intensive tasks, such as business mapping and redundancy tracking, are meant to be handled by highly skilled professionals.
So, Data as a Commodity will be enabled by several sets of tools and technologies:
1. Convergent end-to-end Data Platforms.
2. Intelligent Governance and Observability.
3. Intelligent Data Engineering and Processing.
With my Microsoft hat on, I would say that Microsoft first-party products and our partnerships with Databricks and Informatica cover most of the ground here.
Interactive Applications
With the rise of ChatGPT, we see that more and more people (literally hundreds of millions) are getting used to an interactive mode of communicating with machines.
During the AI panel I ran an experiment. I asked the audience to raise their hands if they, their friends or their relatives were not using Generative AI. Only a few dozen hands out of hundreds of participants went up.
People are getting more and more used to fast and (relatively) precise answers, with additional transparency and the ability to follow up immediately. This is exactly what is missing from people's interactions with our applications right now. We have to accept that information coming through traditional channels of communicating with the business is vague, often inaccurate and sometimes requires significant patience.
Thus, we see the rise of Generative AI not only as a means of automation but also as a new way of communicating with our customers, one that may become the de facto standard in the near future.
Interestingly enough, just applying Generative AI as a communication method is not enough. In any real business case we need to hook up both data and application APIs so users can access both through a single interactive interface. While a few months back this looked like a simple task, in fact it is not, and it requires significant investment in application development.
First of all, it is all but impossible to interact with applications that do not expose APIs, whether synchronous or asynchronous (but close to real-time). Secondly, such workloads have to be orchestrated, monitored, properly linked with data, and enabled with content and result filtering as well as proper error handling. None of this is trivial if you want to build it yourself.
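To make this concrete, here is a minimal, hypothetical sketch of the kind of orchestration such an interactive application needs around the model: linking the conversation to a business API, retrying and handling errors, and filtering content on the way in and out. The functions call_llm() and get_order_status() are stand-ins for whatever model endpoint and application API you actually use; they are assumptions, not real SDK calls.

```python
# Hypothetical orchestration loop around a Generative AI model.
# call_llm() and get_order_status() are placeholders, not real APIs.

import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("interactive-app")

BLOCKED_TERMS = {"password", "credit card"}  # naive content filter for the sketch


def call_llm(prompt: str) -> str:
    """Placeholder for a call to your model endpoint."""
    raise NotImplementedError


def get_order_status(order_id: str) -> dict:
    """Placeholder for a synchronous business API exposed by the application."""
    raise NotImplementedError


def answer(user_question: str, order_id: str) -> str:
    # 1. Filter the incoming request before it reaches the model.
    if any(term in user_question.lower() for term in BLOCKED_TERMS):
        return "Sorry, I cannot help with that request."

    # 2. Link the conversation with application data, with retries and error handling.
    order = None
    for attempt in range(3):
        try:
            order = get_order_status(order_id)
            break
        except Exception as exc:
            log.warning("API call failed (attempt %d): %s", attempt + 1, exc)
            time.sleep(2 ** attempt)
    if order is None:
        return "Our order system is temporarily unavailable, please try again later."

    # 3. Ground the model on the retrieved data and log the exchange for monitoring.
    prompt = f"Answer the customer using only this data: {order}\nQuestion: {user_question}"
    log.info("Calling model for order %s", order_id)
    reply = call_llm(prompt)

    # 4. Filter the output as well before it reaches the user.
    if any(term in reply.lower() for term in BLOCKED_TERMS):
        return "Sorry, I cannot share that information."
    return reply
```

Even in this toy form, most of the code is orchestration, filtering and error handling rather than the model call itself, which is exactly why the investment is bigger than it first appears.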
Hence the rise of a new generation of intelligent application platforms, such as Microsoft Co-Pilot and many third-party tools. These will allow enterprises to create their own interactive applications at lower cost and to connect them to Data as a Commodity platforms so the data is immediately available to users.
Real Time Systems
This rise of interactive applications (not only those backed by Generative AI), such as e-commerce platforms, adds further requirements on the velocity of data processing, analysis and synthesis.
We want our customers to get recommendations instantly; we want to know if they are happy with the product or if they are doing something wrong with the platform, in real time. The same applies even to the most conservative areas, such as regulatory reporting.
It is worth mentioning that real-time processing is very different from the batch processing we are used to. In this context it does not really matter whether we talk about minutes, seconds or milliseconds. The main difference between batch and real time is the amount and impact of potential changes in incoming data. Unlike a batch system (where changes are locked in and applied via a reconciliation process of some sort), in a real-time system we need to process data as it arrives, with a significant probability that the same data will change in the next instant. This requires quite a different approach to handling writes and updates, as we still have to be sure that the data is accurate and correct.
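A minimal sketch of that write pattern, in plain Python with no streaming framework, might look as follows. Instead of appending every incoming record (as a batch load would), it upserts by key and keeps only the latest version according to event time, so a record that changes an instant later simply replaces its predecessor. The record fields are illustrative assumptions.

```python
# Sketch of upsert-by-key, latest-event-time-wins handling of mutable real-time data.

from dataclasses import dataclass


@dataclass
class OrderEvent:
    order_id: str
    status: str
    event_time: float  # seconds since epoch, as stamped by the source system


# Current state of each order; in a real system this would be a table with
# merge/upsert semantics rather than an in-memory dict.
state: dict[str, OrderEvent] = {}


def apply(event: OrderEvent) -> None:
    """Upsert the event, ignoring late updates older than what we already hold."""
    current = state.get(event.order_id)
    if current is None or event.event_time >= current.event_time:
        state[event.order_id] = event


# The same order changes twice within moments; a blind batch append would now
# hold three conflicting rows, while the upsert keeps one current record.
apply(OrderEvent("o-1", "created", 1.0))
apply(OrderEvent("o-1", "paid", 2.0))
apply(OrderEvent("o-1", "created", 1.5))  # late, stale update - ignored

print(state["o-1"].status)  # -> "paid"
```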
Real-time processing is not new to the market and has been applied at various scales over the last decade. However, real time is now turning into the main source of data, which means that most of our processes, applications and tools have to adapt to real-time processing patterns. Hence the number of companies providing tools to link the old-fashioned batch world with the emerging real-time paradigm: CDC tools, streaming and real-time analytics platforms, real-time processing tools, as well as systems ensuring high consistency, were all present, and some new (at least for me) names appeared.
At Microsoft we are investing in real-time data capture, processing and analytics with cutting-edge technologies such as Azure Data Explorer, various forms of Spark Structured Streaming and Azure Stream Analytics, but also in streaming ingestion and messaging with Azure Event Hubs, Azure Service Bus and Azure IoT Hub.
Elephant in the room(s)
There is one more trend that is still in its early days but which, I believe, is becoming more and more visible. Let us discuss it for a moment.
People hate doing boring, repetitive work. And, let's face it, few individuals really enjoy creating data copy pipelines manually or documenting fields in a database. It is not only boring but also requires a tremendous amount of time and effort. In addition, it is risky, as people make mistakes all the time.
Logically, we may think of Generative AI as the main transformative technology in this field as well, expanding automation beyond templates and deterministic script generators.
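As one hypothetical illustration of the kind of repetitive task this could take over, consider drafting documentation for undocumented, legacy-named columns. The sketch below only assembles the request; draft_descriptions() is a placeholder for whichever model your platform exposes, and the table and column names are invented for the example. A data steward would still review the output.

```python
# Hypothetical sketch: asking a Generative AI model to draft column documentation.

COLUMNS = ["cust_id", "ord_dt", "amt_gbp", "src_sys_cd"]  # typical legacy names


def build_prompt(table: str, columns: list[str]) -> str:
    cols = ", ".join(columns)
    return (
        f"You are documenting the table '{table}'. "
        f"For each of these columns, propose a one-sentence business description "
        f"for review by a data steward: {cols}"
    )


def draft_descriptions(prompt: str) -> str:
    """Placeholder for a Generative AI call; a human still reviews the output."""
    raise NotImplementedError


if __name__ == "__main__":
    print(build_prompt("sales.orders", COLUMNS))
```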
During the event I saw patches of this work in different products and services, but most of these cases were quite narrow.
I believe that in the upcoming years we will see more and more complex automation, with specialised models coming into play and solving some of the repetitive tasks in Data Engineering and Data Governance. I also think this would be a tipping point for the modern data stack. Let us make it happen!
Thank you so much for reading, and I hope this was helpful.