From the course: Agentic AI: Building Data-First AI Agents

Dealing with data puddles

- We have a bad tendency to build data puddles, even dynamically. We have this database that we use for all our data. But then when I need the data, I export it into an Excel spreadsheet and use that, and then that Excel spreadsheet gets shared with other people. And then all of a sudden, you have a little puddle over here, and then you have a little puddle over there. So how do you design a process for the company in such a way that that doesn't happen? That we don't split up our data because it's easier?
- I've seen so many companies starting their AI journey, and sometimes they end up taking years doing this because they cannot agree on the basic aspects of data stewardship and ownership: who's controlling what, and where. And that is a dangerous thing, right? There are certain types of industries and companies, and it's no fault of theirs, that work on a project basis. Every initiative is a project. So within that project, you have certain people. They want to store the data here and not there. They don't want to unify or share it with someone else. So these are cultural problems that we face, and business problems too, because they don't really need to share with other projects. But what if they did? What if they had their existing legacy systems and databases all connected together and unified, or still separated but available to each other to learn from, to improve the agentic AI system, right? So we do need the Excels of the world. We do need people to do their individual work that way, because that's what they're comfortable with. It's low cost. It's right there on their laptop. They want to continue to use Office tools, and they will continue to use them and put some data in there. But then you need a way to siphon all of that data into one lake. That's what we call it: OneLake. OneLake is the OneDrive for all of your data, everywhere, on every surface.
And if you keep it all unified and together, with the right security levels and access layers in place, then you have a playground in which you can set up workspaces. You can have folders, you can do everything that you need to do within your own scope and focus to build out your app. Now, I'll take a great example here. I'm working with a company right now that is more in the retail space. Their sales team wants to move much faster than their other teams, right? Operations, IT, everybody else. So they'll build AI on their data. But what about that data and its connectivity to the other data assets, the CRM data, the finance data, all the other pools or puddles of data, if you want to call them that, right? And why the term puddles? Because each project that they're doing for their end customer could be handled separately and kept separately in another silo. And so we had to think about bringing the data together in such a way that the teams maintain the independence to be good at their business, at what they do every day, but are also allowed to move at the pace that they want to within the different organizations and functions. We have ended up building a new platform, a modern data platform called Fabric. And the mission behind Fabric was: do not move your data, keep it where it is, but make sure there are shortcuts and mirrored locations so that the data is available for data scientists. It is available for AI engineers, and it's available for those super users or business users, the data citizens of the world who want to make the right decisions at the right time. So if you are able to achieve that framework, you take your existing stores. Let's say you have 100,000 SQL servers, and each one of them has valuable data. If you're able to connect it all together and, without moving much data, use it for AI applications, that becomes your modern data platform, right?
The one that helps AI have a holistic, good-quality view. You can do all your transformation in there. You can do your governance on top of it. You can make sure the quality of the data is there, and then you can take subsets of it and share them with the right people so that they can use them. Imagine your Excel data, which you were just talking about earlier, transformed into Delta Parquet and made available, accessible to all the other users who want to build AI applications on it. This is something that was never done before. This is amazing stuff, right? Because now, truly, you can take the power of your day-to-day business user, take their data, take your IT data, ops data, sales and finance data, and build truly intelligent and smart systems, AI agents, on top of it, which have the full 360 view of the customer and the full view of your business, and you're able to make a much better, more complete, more comprehensive system.
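
As a rough illustration of the "do not move your data, keep it where it is" idea from this conversation, here is a minimal Python sketch. It is not Fabric's or OneLake's API. It uses SQLite's `ATTACH DATABASE` from the standard library as a stand-in for shortcuts, and the store names (`sales.db`, `finance.db`), table names, and sample rows are all made up for the example.

```python
import os
import sqlite3
import tempfile


def build_store(path, ddl, insert_sql, rows):
    """Create one independent 'puddle': a small database file owned by one team."""
    conn = sqlite3.connect(path)
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    conn.close()


def customer_view(sales_path, finance_path):
    """Join two separately owned stores in place, without copying rows elsewhere."""
    hub = sqlite3.connect(":memory:")
    # ATTACH only registers each file under a schema name (a loose analogy
    # for a shortcut); the rows stay in the original files.
    hub.execute("ATTACH DATABASE ? AS sales", (sales_path,))
    hub.execute("ATTACH DATABASE ? AS finance", (finance_path,))
    rows = hub.execute(
        """
        SELECT o.customer, SUM(o.amount) AS ordered, p.paid
        FROM sales.orders AS o
        JOIN finance.payments AS p ON p.customer = o.customer
        GROUP BY o.customer, p.paid
        ORDER BY o.customer
        """
    ).fetchall()
    hub.close()
    return rows


def demo():
    """Build two hypothetical team stores, then query across both of them."""
    with tempfile.TemporaryDirectory() as tmp:
        sales_path = os.path.join(tmp, "sales.db")
        finance_path = os.path.join(tmp, "finance.db")
        build_store(
            sales_path,
            "CREATE TABLE orders (customer TEXT, amount INTEGER)",
            "INSERT INTO orders VALUES (?, ?)",
            [("acme", 100), ("acme", 50), ("globex", 75)],
        )
        build_store(
            finance_path,
            "CREATE TABLE payments (customer TEXT, paid INTEGER)",
            "INSERT INTO payments VALUES (?, ?)",
            [("acme", 120), ("globex", 75)],
        )
        return customer_view(sales_path, finance_path)


if __name__ == "__main__":
    print(demo())
```

Each team keeps writing to its own file, and the combined view exists only at query time. A real platform would add the security, governance, and quality layers the speakers describe, which this sketch leaves out.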
