The Data Prep Dilemma: Are We Overthinking It or Finally Solving It?
Disney Coronado Springs Resort, location of Qlik Connect 2025


By Keith Townsend

I just spent a week at Qlik Connect, deep in the world of data analytics. It’s not my usual beat. I’m an infrastructure guy. I think in cloud nodes, AI clusters, and system throughput. But I walked away with a sharper lens on one of the biggest debates in AI right now:

How much data prep is too much?

This is a conversation I’ve been having for months with Luke Norris. His take? Most enterprises over-engineer their data prep pipelines. They spend millions making data “AI-ready” and never get to value. I’ve seen that happen too. AI programs that die in the staging layer.

But after watching how Qlik is positioning its platform—not just as a BI tool, but as a connective tissue between legacy data sources and modern AI infrastructure—I think we’re approaching a more balanced answer.

🔍 Why This Matters Now

Let’s be real: enterprises still run critical operations on systems like SAP and even mainframes. If you’re building GenAI or agentic workflows and your data can’t leave those systems, you’re stuck.

Qlik showed how it can bridge these environments—pulling real-time data from traditional systems, layering governance and quality checks, and feeding it directly into AI services like Amazon Bedrock and SageMaker. That’s not just dashboarding. That’s infrastructure enablement.
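To make that pipeline pattern concrete, here’s a minimal sketch of the middle layer: a governance gate that filters records pulled from a legacy source before anything reaches a model. The field names and the quality rule are illustrative assumptions on my part, not Qlik’s actual API, and the Bedrock call is shown only as a commented stub.

```python
import json

# Hypothetical record shape pulled from a legacy system (e.g., SAP);
# field names are illustrative, not from any real schema.
def passes_quality_checks(record: dict) -> bool:
    """Minimal governance gate: required fields present and non-empty."""
    required = ("customer_id", "order_total", "region")
    return all(record.get(field) not in (None, "") for field in required)

def build_bedrock_payload(records: list) -> str:
    """Serialize only the records that clear the gate into a model prompt body."""
    clean = [r for r in records if passes_quality_checks(r)]
    return json.dumps({"inputText": f"Summarize these orders: {clean}"})

# Actually sending the payload to Amazon Bedrock would look roughly like this
# (requires AWS credentials and model access; shown as a stub only):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   client.invoke_model(modelId="amazon.titan-text-express-v1",
#                       body=build_bedrock_payload(records))
```

The point of the sketch is the ordering: quality and governance checks sit between the legacy source and the model call, rather than being a years-long project that blocks the model call entirely.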

⚖️ A Middle Ground: Friction with Flow

So here’s where I’ve landed:

  • Luke’s right—if your AI roadmap is still stuck in a two-year data prep phase, you’ve already missed the point.

  • But blindly skipping prep and hoping a model can clean up your data mess mid-prompt? That’s a fast track to hallucinations, missed compliance risks, and failed deployments.

  • What Qlik showed at Tech Field Day was the start of something better: streamlined prep, automated quality, and in-context AI assist—without pretending data readiness is a solved problem.

💡 The AI Infra Perspective

This was my takeaway as an infrastructure strategist: Qlik isn’t just in the analytics lane. It’s becoming part of the AI infrastructure stack. Its ability to prep and move data from SAP and mainframes to Bedrock, while maintaining explainability and governance, is something we don’t talk about enough in the infra world.

If we want real AI outcomes in the enterprise, we have to stop seeing data prep as overhead and start designing it as part of the AI system architecture.

Check out TFDx Sessions https://www.linkedin.com/posts/tech-field-day_tech-field-day-experience-at-qlik-connect-activity-7330969678263050240-sFRR?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAHnXmYB8BBsbVV7v7Lqx449pzZKjkwOC7U

Let’s keep the conversation going—especially if you're on Team “Just Ship the Model” or Team “Prep ‘til You Drop.” There’s nuance here. And maybe, just maybe, Qlik is helping us find it.

Tony Baer

Principal at dbInsight LLC

2mo

As an infrastructure guy, you were brave to wade into my world. Maybe it helped that I wasn't there 😉 But here's one thing I've been thinking about, having spent my time at SAP this week: there might be a role for AI in making these pipelines more sane. Of course, that could open up another can of worms, which is, "Just because you can, should you?" If, by some miracle, AI coding assistants and digital assistants could scan your data to understand its schema (or lack thereof) and quality issues, might we find ourselves back in that rut of prepping data that isn't vital to the problem we're trying to solve?

Luke Norris

Wearer of white shoes / Builder of companies that make an impact

2mo

Appreciate this piece, Keith. In another 6 months we will see eye to eye ;)

The shift we’re seeing in the enterprise is from ETL-first to inference-first. Instead of spending years cleaning and centralizing data before unlocking value, teams are now asking how quickly they can drive outcomes by running AI directly against the messy, distributed reality of their data estate.

That said, I think the idea that prompting alone replaces structure is risky. It’s like giving a smart intern access to the file room with no guidance. You’ll get fast answers, but you won’t get reliable ones, or any institutional memory.

At KamiwazaAI we’ve found that you don’t need to fix the data first, but you do need to frame it. Inference-first doesn’t mean governance-last. It means context through lightweight scaffolding like graph retrieval, entity extraction, and smart routing. That’s when GenAI becomes an operator, not just a demo.


Data is the piece... uniformity is the key.


Put simply: We are dramatically UNDER-thinking it. We have been for almost 20 years, Keith Townsend.

