Unstructured data often contains the most distinct and proprietary content that remains untapped within enterprises. When operationalized through Agentic AI workflows and Large Language Models (LLMs), this data becomes actionable, driving tangible business value. Yet the challenge lies in managing the inherent complexity of unstructured data at scale.

Piethein Strengholt highlights how the Medallion Architecture, traditionally focused on structured data, can be repurposed into a unified, layered framework designed to handle unstructured data effectively. He discusses how:

✅ Bronze, Silver, and Gold layers can be extended to ingest, validate, and contextualize unstructured data.
✅ LLMs and RAG patterns can transform raw documents into reliable, AI-ready inputs.
✅ Governance and new roles (context engineers, value engineers) will be essential to translate unstructured data into business value.

Click the link below to learn more: https://lnkd.in/gwxcjgKR

#UnstructuredData #MedallionArchitecture #DataQuality #AIready #Datatrust #DataArchitecture
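The layered flow described in the post can be sketched in a few lines. This is a minimal illustration of Bronze/Silver/Gold stages applied to documents; the function and field names are my own invention, not taken from Strengholt's article:

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    source: str
    raw_text: str
    meta: dict = field(default_factory=dict)

def bronze_ingest(files):
    # Bronze: land raw documents as-is, tagging only provenance.
    return [Doc(source=path, raw_text=text, meta={"layer": "bronze"})
            for path, text in files]

def silver_validate(docs):
    # Silver: drop empty/corrupt documents and normalize whitespace.
    out = []
    for d in docs:
        text = " ".join(d.raw_text.split())
        if text:
            out.append(Doc(d.source, text, {**d.meta, "layer": "silver"}))
    return out

def gold_contextualize(docs, chunk_size=200):
    # Gold: chunk validated text into LLM/RAG-ready passages with context metadata.
    chunks = []
    for d in docs:
        words = d.raw_text.split()
        for i in range(0, len(words), chunk_size):
            chunks.append({"source": d.source,
                           "chunk": " ".join(words[i:i + chunk_size]),
                           "layer": "gold"})
    return chunks
```

Each stage only adds trust and context; nothing is lost from the raw landing zone, which mirrors how the medallion pattern works for structured tables.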
I came across this article from Piethein Strengholt a few weeks ago, and have since brought it up in multiple conversations. It's such a simple yet powerful way to visualize the deep architectural underpinnings required for Agentic AI adoption. Discussions on this topic often get lost in abstractions, but this framing makes it accessible while still capturing the complexity behind it. Highly recommend giving it a read if you're exploring how to design data and system foundations for the Agentic AI era. #dataarchitecture #openarchitecture #lakehouse #dataobservability #dataquality #agenticworkflow
Data Engineering for AI-Native Architectures: Designing Scalable, Cost-Optimized Data Pipelines to Power GenAI, Agentic AI, and Real-Time Insights
https://ift.tt/WK3t5VB

Editor's Note: The following is an article written for and published in DZone's 2025 Trend Report, Data Engineering: Scaling Intelligence With the Modern Data Stack.

The data engineering landscape has undergone a fundamental transformation: a complete reimagining of how data flows through organizations. Traditional business intelligence (BI) pipelines were built for looking backward, answering questions like "How did we perform last quarter?" Today's AI-native architectures demand systems that can feed real-time insights to recommendation engines, provide context to large language models, and maintain the massive vector stores that power retrieval-augmented generation (RAG). https://ift.tt/Fdq0kRx Abhishek Gupta
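The vector stores that power RAG can be illustrated with a toy retriever. This sketch uses bag-of-words counts as a stand-in embedding and brute-force cosine similarity; real pipelines use learned embeddings and approximate-nearest-neighbor indexes, so treat this only as a shape of the interface:

```python
import math
from collections import Counter

def embed(text):
    # Stand-in "embedding": bag-of-words counts instead of a learned vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    def __init__(self):
        self.items = []

    def add(self, doc_id, text):
        self.items.append((doc_id, embed(text), text))

    def top_k(self, query, k=3):
        # Brute-force scan; ANN indexes replace this at scale.
        q = embed(query)
        scored = [(cosine(q, vec), doc_id, text)
                  for doc_id, vec, text in self.items]
        return sorted(scored, reverse=True)[:k]
```

A RAG pipeline would then feed the top-k chunks into the model's context window alongside the user question.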
Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First

"Large Language Model (LLM) agents, acting on their users' behalf to manipulate and analyze data, are likely to become the dominant workload for data systems in the future… We argue that data systems need to adapt to more natively support agentic workloads. We take advantage of the characteristics of agentic speculation that we identify (scale, heterogeneity, redundancy, and steerability) to outline a number of new research opportunities for a new agent-first data systems architecture, ranging from new query interfaces, to new query processing techniques, to new agentic memory stores…"

https://lnkd.in/dwfME4yq
🚨 Data architecture has a problem.

Data Lakes → became dump yards.
Warehouses → rigid and expensive.
Data Mesh → great in theory, but demands cultural change few orgs are ready for.
Data Fabric → automates a lot, but still metadata-heavy.

👉 The result? Costly, fragmented, hard-to-govern ecosystems.

🔥 The next evolution: AI-Native Data Architecture
1️⃣ Hybrid Lakehouse → unified foundation.
2️⃣ AI-Driven Fabric → auto-discovery, lineage, quality, policy enforcement.
3️⃣ Cognitive Layer → AI agents that build data products, heal pipelines, and optimize continuously.
4️⃣ Conversational Access → natural language to SQL, semantic search, governance through chat.

💡 The shift: from manual wrangling to AI-assisted, self-optimizing platforms. Not about replacing humans; it's about freeing them to create value instead of firefighting.

🚀 The journey: Data Lake → Lakehouse → AI Fabric → Cognitive Platform.

Do you see AI as the missing piece to fix data architecture, or just another layer of complexity?

#DataArchitecture #Lakehouse #DataFabric #DataMesh #DataGovernance #AI #AIDriven #CognitiveAI #FutureOfData #DigitalTransformation
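As a rough sketch of what "Conversational Access" means at the interface level, here is a toy natural-language-to-SQL router. A real cognitive layer would use an LLM grounded in a semantic layer; the patterns, table, and column names here (sales, region, amount) are invented purely for illustration:

```python
import re

# Toy NL-to-SQL router: each pattern maps a question shape to a SQL template.
# Production systems replace this with an LLM constrained by a semantic layer.
PATTERNS = [
    (re.compile(r"total (\w+) by (\w+)", re.I),
     lambda m: f"SELECT {m.group(2)}, SUM({m.group(1)}) "
               f"FROM sales GROUP BY {m.group(2)};"),
    (re.compile(r"count of (\w+)", re.I),
     lambda m: f"SELECT COUNT(*) FROM {m.group(1)};"),
]

def nl_to_sql(question):
    # Return the first matching template's SQL, or None if nothing matches.
    for pattern, template in PATTERNS:
        m = pattern.search(question)
        if m:
            return template(m)
    return None
```

The interesting design question is the same one the post raises: the router is only as trustworthy as the governed vocabulary (tables, metrics, policies) it is allowed to emit.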
Harness the power of semantic entity resolution to transform your knowledge graph projects. Russell Jurney's new article details how language models can automate complex data processes, driving efficiency and innovation in data management.
It was an honor and a fantastic opportunity to host my friend Francesco Puppini for an online session on his Unified Star Schema (USS) approach. His perspective was a powerful reminder that fundamentals are crucial to developing advanced data processing capabilities. Here are my top takeaways:

Data Modeling is King 👑: Great data modeling remains the absolute bedrock of successful self-service BI. Without a solid, intuitive structure, even the most powerful tools will fall short.

Curbing Data Proliferation: The USS approach he invented with Bill Inmon presents a brilliant strategy for reducing data proliferation. By creating a unified, non-redundant layer, we can finally tame the chaos of countless data marts and tables.

Unlocking Agentic AI 🤖: USS is a key enabler for Agentic AI because it drastically reduces the dependency on complex, multi-join SQL to aggregate data. A simpler semantic layer means AI agents can more easily understand and query data, leading to faster, more reliable insights.

Harmony with Kimball: The Unified Star Schema isn't here to replace everything we know. It can coexist perfectly with traditional Kimball dimensional modeling, allowing for a flexible, hybrid approach to data architecture.

In the age of AI, is data modeling still relevant? After an insightful presentation, the answer is a resounding YES, now more than ever! 🧠

A huge thank you to Francesco for sharing his invaluable knowledge! The path to a truly data-driven, AI-enabled organization is paved with smart, simple, and scalable data models.

#DataModeling #UnifiedStarSchema #SelfServiceBI #BI #DataAnalytics #AI #AgenticAI #DataStrategy #DataArchitecture
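To make the "fewer multi-join queries" point concrete, here is a minimal sketch of the bridge idea as I read it (not Puppini and Inmon's reference design): fact rows are unioned into one bridge table tagged with a stage column, so downstream queries and AI agents hit a single table. Table and column names are invented for the example:

```python
import sqlite3

# In-memory demo: two fact tables collapsed into one USS-style bridge.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales   (customer_id INTEGER, amount REAL);
CREATE TABLE refunds (customer_id INTEGER, amount REAL);
INSERT INTO sales   VALUES (1, 100.0), (1, 50.0), (2, 80.0);
INSERT INTO refunds VALUES (1, 20.0);

-- The bridge: one row per fact row, tagged with its stage of origin.
CREATE TABLE bridge AS
  SELECT 'sales'   AS stage, customer_id, amount FROM sales
  UNION ALL
  SELECT 'refunds' AS stage, customer_id, amount FROM refunds;
""")

# Net revenue per customer from the bridge alone: no multi-fact joins needed.
rows = con.execute("""
  SELECT customer_id,
         SUM(CASE WHEN stage = 'sales' THEN amount ELSE -amount END) AS net
  FROM bridge
  GROUP BY customer_id
  ORDER BY customer_id
""").fetchall()
```

An AI agent querying this layer only needs to understand one table and one `stage` vocabulary instead of the join paths between every fact, which is exactly the simplification the post credits USS with.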
The Unified Star Schema belongs to everyone. Period. #unifiedstarschema #theunifiedstarschema #selfservice #LLMs #AI #datamesh #rethinkdata
Passive metadata is so passé. The future? Active metadata — the real-time, event-driven brain behind trustworthy AI and seamless data governance. Imagine your data stack not as a dusty archive, but a living, breathing organism that senses, adapts, and orchestrates every byte at the speed of business. With 65% of companies now running generative AI and regulations tightening every quarter, ignoring active metadata isn’t just risky—it’s reckless. For CEOs and data leaders: Stop documenting history and start architecting the future. In the AI era, metadata is your new secret weapon. https://lnkd.in/gUQx6xav
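A toy sketch of what "active" rather than "passive" metadata could look like: a catalog that reacts to pipeline write events as they happen instead of being documented after the fact. The event shape and the example policy are assumptions for illustration, not any specific product's API:

```python
import time

class ActiveCatalog:
    """Event-driven catalog sketch: lineage and freshness are updated on every
    write event, and subscribers can enforce policy in real time."""

    def __init__(self):
        self.entries = {}
        self.subscribers = []

    def subscribe(self, fn):
        # Register a policy/alerting callback invoked on every event.
        self.subscribers.append(fn)

    def emit(self, event):
        # Update lineage and freshness in place, then fan out to subscribers.
        entry = self.entries.setdefault(event["dataset"], {"lineage": set()})
        entry["last_written"] = event["ts"]
        entry["lineage"].update(event.get("inputs", []))
        for fn in self.subscribers:
            fn(event, entry)

alerts = []
catalog = ActiveCatalog()
# Example policy: flag any dataset written without recorded input lineage.
catalog.subscribe(lambda ev, entry: alerts.append(ev["dataset"])
                  if not entry["lineage"] else None)
catalog.emit({"dataset": "gold.orders", "ts": time.time(),
              "inputs": ["silver.orders"]})
catalog.emit({"dataset": "adhoc.export", "ts": time.time()})
```

The governance angle is the interesting part: because the check runs at event time, an ungoverned dataset is flagged the moment it appears, not at the next quarterly audit.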
🎯 Extract Business Context from LLM-Based Semantic Metadata Analysis

If you're still crafting SQL to understand field meanings, you're not alone. Many data engineers continue to spend excessive time:
→ Scanning schemas
→ Manually defining semantic models
→ Coding quality checks field by field

That was static metadata. With agentic AI, things transform:
➡️ Schemas are identified automatically
➡️ Fields are categorized with business context
➡️ Initial rules (nulls, ranges, integrity) are applied immediately
➡️ Coverage updates dynamically in your business notebook

It's more than a map. It's an intelligent, evolving context layer.

❇️ And here's why it matters: 42% of enterprises extract data from over eight sources for AI workflows. Such complexity disrupts static metadata models. To construct reliable AI, you need metadata that acts: semantic context that evolves over time.

#AgenticAI #DataManagement #DataQuality #DataObservability #AIReadyData #semanticmetadata
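The "initial rules applied immediately" step can be approximated even without an LLM: profile a sample and emit starter null-rate and range rules. This is a simplified stand-in for the agentic profiling the post describes; the field names and record shape are invented:

```python
def infer_rules(records):
    """Infer starter data-quality rules from sample records (list of dicts):
    an observed null rate per field and, for numeric fields, a min/max range."""
    fields = {k for r in records for k in r}
    rules = {}
    for f in fields:
        values = [r.get(f) for r in records]
        present = [v for v in values if v is not None]
        rule = {"max_null_rate": 1 - len(present) / len(values)}
        if present and all(isinstance(v, (int, float)) for v in present):
            rule["range"] = (min(present), max(present))
        rules[f] = rule
    return rules

def check(record, rules):
    # Flag numeric values that fall outside the observed range.
    violations = []
    for f, rule in rules.items():
        v = record.get(f)
        if v is not None and "range" in rule:
            lo, hi = rule["range"]
            if not (lo <= v <= hi):
                violations.append(f)
    return violations
```

An agentic layer would go further, using business context to turn the observed range into a sensible policy rather than a literal min/max, but the profile-then-enforce loop is the same.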
Don't be fooled by the name; the Boring Semantic Layer by Julien Hurault and Xorq Labs is one of the most exciting projects to watch! I've been toying with the idea of implementing a semantic layer on top of Kedro's Data Catalog.

When I led data engineering teams at QuantumBlack, AI by McKinsey, one of the first activities we performed at any client was connecting and exploring their data. However, ad-hoc analyses didn't directly leverage the data models we'd painstakingly built. The Boring Semantic Layer provides built-in querying and charting capabilities on top of those data models and even exposes an MCP server for AI-driven answers!

Everything is still very much a work in progress (I'm building on top of a bleeding-edge branch of Boring Semantic Layer without asking 🤫). One of the things I still don't know is whether people actually want to access (or even augment) their semantic layer from within their data pipelines or if it's mostly for BI/reporting. To what extent does a semantic layer replace feature engineering?