The Data Prep Dilemma: Are We Overthinking It or Finally Solving It?
Disney Coronado Springs Resort, location of Qlik Connect 2025


By Keith Townsend

I just spent a week at Qlik Connect, deep in the world of data analytics. It’s not my usual beat. I’m an infrastructure guy. I think in cloud nodes, AI clusters, and system throughput. But I walked away with a sharper lens on one of the biggest debates in AI right now:

How much data prep is too much?

This is a conversation I’ve been having for months with Luke Norris. His take? Most enterprises over-engineer their data prep pipelines. They spend millions making data “AI-ready” and never get to value. I’ve seen that happen too. AI programs that die in the staging layer.

But after watching how Qlik is positioning its platform—not just as a BI tool, but as a connective tissue between legacy data sources and modern AI infrastructure—I think we’re approaching a more balanced answer.

🔍 Why This Matters Now

Let’s be real: enterprises still run critical operations on systems like SAP and even mainframes. If you’re building GenAI or agentic workflows and your data can’t leave those systems, you’re stuck.

Qlik showed how it can bridge these environments—pulling real-time data from traditional systems, layering governance and quality checks, and feeding it directly into AI services like Amazon Bedrock and SageMaker. That’s not just dashboarding. That’s infrastructure enablement.
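To make that pipeline pattern concrete, here’s a minimal sketch of the middle layer: a governance gate that filters records pulled from a legacy source before anything reaches a model. The field names and the quality rule are illustrative assumptions on my part, not Qlik’s actual API, and the Bedrock call is shown only as a commented stub.

```python
import json

# Hypothetical record shape pulled from a legacy system (e.g., SAP);
# field names are illustrative, not from any real schema.
def passes_quality_checks(record: dict) -> bool:
    """Minimal governance gate: required fields present and non-empty."""
    required = ("customer_id", "order_total", "region")
    return all(record.get(field) not in (None, "") for field in required)

def build_bedrock_payload(records: list) -> str:
    """Serialize only the records that clear the gate into a model prompt body."""
    clean = [r for r in records if passes_quality_checks(r)]
    return json.dumps({"inputText": f"Summarize these orders: {clean}"})

# Actually sending the payload to Amazon Bedrock would look roughly like this
# (requires AWS credentials and model access; shown as a stub only):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   client.invoke_model(modelId="amazon.titan-text-express-v1",
#                       body=build_bedrock_payload(records))
```

The point of the sketch is the ordering: quality and governance checks sit between the legacy source and the model call, rather than being a years-long project that blocks the model call entirely.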

⚖️ A Middle Ground: Friction with Flow

So here’s where I’ve landed:

  • Luke’s right—if your AI roadmap is still stuck in a two-year data prep phase, you’ve already missed the point.

  • But blindly skipping prep and hoping a model can clean up your data mess mid-prompt? That’s a fast track to hallucinations, missed compliance risks, and failed deployments.

  • What Qlik showed at Tech Field Day was the start of something better: streamlined prep, automated quality, and in-context AI assist—without pretending data readiness is a solved problem.

💡 The AI Infra Perspective

This was my takeaway as an infrastructure strategist: Qlik isn’t just in the analytics lane. It’s becoming part of the AI infrastructure stack. Its ability to prep and move data from SAP and mainframes to Bedrock, while maintaining explainability and governance, is something we don’t talk about enough in the infra world.

If we want real AI outcomes in the enterprise, we have to stop seeing data prep as overhead and start designing it as part of the AI system architecture.

Check out TFDx Sessions https://www.linkedin.com/posts/tech-field-day_tech-field-day-experience-at-qlik-connect-activity-7330969678263050240-sFRR?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAHnXmYB8BBsbVV7v7Lqx449pzZKjkwOC7U

Let’s keep the conversation going—especially if you're on Team “Just Ship the Model” or Team “Prep ‘til You Drop.” There’s nuance here. And maybe, just maybe, Qlik is helping us find it.

Tony Baer

Principal at dbInsight LLC

2mo

As an infrastructure guy, you were brave to wade into my world. Maybe it helped that I wasn't there 😉 But here's one thing I've been thinking about, having spent my time at SAP this week: there might be a role for AI in making these pipelines more sane. Of course, that could open up another can of worms, which is, "Just because you can, should you?" If, by some miracle, AI coding assistants and digital assistants could scan your data to understand its schema (or lack thereof) and quality issues, might we find ourselves back in that rut of prepping data that isn't vital to the problem we're trying to solve?

Luke Norris

Wearer of white shoes / Builder of companies that make an impact

2mo

Appreciate this piece, Keith. In another 6 months we will see eye to eye ;)

The shift we’re seeing in the enterprise is from ETL-first to inference-first. Instead of spending years cleaning and centralizing data before unlocking value, teams are now asking how quickly they can drive outcomes by running AI directly against the messy, distributed reality of their data estate.

That said, I think the idea that prompting alone replaces structure is risky. It’s like giving a smart intern access to the file room with no guidance. You’ll get fast answers, but you won’t get reliable ones, or any institutional memory.

At KamiwazaAI we’ve found that you don’t need to fix the data first, but you do need to frame it. Inference-first doesn’t mean governance-last. It means context through lightweight scaffolding like graph retrieval, entity extraction, and smart routing. That’s when GenAI becomes an operator, not just a demo.


Data is the piece... uniformity is the key.


Put simply: We are dramatically UNDER-thinking it. We have been for almost 20 years, Keith Townsend.

