AI and the Illusion of Creativity: Why Your GenAI Assistant Is Smarter Than Average, But Rarely Brilliant

I once compared a team’s whiteboarded brainstorm with the output of an LLM given the exact same problem.

The prompt was about optimizing deployment strategies for a hybrid cloud architecture. The model’s answers were good: technically correct, neatly formatted, and immediately applicable.

The team’s answers? Messier. Slower. More tangents. They argued. Someone got stuck on an idea that clearly wasn’t going to work. Then, someone else—building on that discarded thread—came up with something no one had seen coming.

That idea reframed the whole feature.

It didn’t come from the model. It came from the unfiltered, collaborative mess of human reasoning.

Since then, I’ve run this kind of side-by-side comparison multiple times in workshops and research studies. The results are almost always the same:

LLMs outperform the average participant. But they rarely outperform the best moments of a good team.

This paradox, the gap between competence and breakthroughs, is exactly what a new study (Has the Creativity of Large Language Models Peaked? An Analysis of Inter- and Intra-LLM Variability) by my friend and colleague Paul Hanel, together with Jennifer Haase and Sebastian Pokutta, just put under the microscope.

And if you’re a software engineer—or leading teams that rely on GenAI—you should pay attention.


The Gap Between Original and Disruptive

This study tested 14 state-of-the-art language models—from GPT-4o to Claude 3.7—on standardized creativity assessments like the Alternative Uses Task (think: "What else could a toothbrush be used for?") and the Divergent Association Task (generate 10 semantically distant words).
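To make the second task concrete: the Divergent Association Task is typically scored as the average pairwise semantic distance between the words you submit, so ten genuinely unrelated words beat ten near-synonyms. A minimal sketch of that scoring idea, using toy hand-made vectors in place of the real word embeddings (the published task uses pretrained embeddings such as GloVe; the vectors below are illustrative only):

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def dat_score(vectors):
    """Average pairwise cosine distance, scaled to 0-100
    in the spirit of the published DAT scoring."""
    pairs = [(i, j) for i in range(len(vectors))
             for j in range(i + 1, len(vectors))]
    total = sum(cosine_distance(vectors[i], vectors[j]) for i, j in pairs)
    return 100 * total / len(pairs)

# Toy 3-d vectors standing in for real word embeddings.
close = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]   # near-synonyms
far = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]     # unrelated words

print(dat_score(far) > dat_score(close))  # True: distant words score higher
```

The point of the metric is exactly the one the study leans on: it rewards reaching into distant corners of semantic space, which is where LLM answers tend to thin out.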

What they found is subtle but important:

Most LLMs performed above the average human baseline, but only 0.28% of their responses reached the top 10% of human creativity benchmarks.

That’s a striking paradox.

On one hand, AI is more creative than most humans most of the time. It can outperform the median user in fluency, variation, and even in novelty—especially when prompted well. It’s a solid brainstorming partner. A productivity amplifier.

But on the other hand:

It almost never hits brilliance. It rarely produces the kind of outlier ideas that shift mental models, redefine product categories, or spark genuine invention.

That puts software professionals in a tricky position.


So What Are We Really Using GenAI For?

If you’ve worked with LLMs long enough—whether it’s in co-writing design specs, generating code comments, naming features, or ideating use cases—you’ve probably seen the pattern:

  • You get a lot of answers.
  • They sound polished.
  • Some are surprisingly good.

But it’s not like working with that wild-card colleague who suddenly says something you would never have thought of but can’t unsee once you hear it.

Generative AI excels at combinatorial creativity—remixing known elements—but struggles with conceptual creativity, the kind that reframes the question itself.

And let’s be honest: most of the time, we don’t need disruptive ideas. What we actually need is speed, coverage, and scaffolding. We’re not trying to reinvent software engineering. We’re trying to unblock the next iteration.


The Danger Is Mistaking Fluency for Insight

One of the paper’s more subtle findings is that creativity varies not just across models, but within the same model.

Even the same model, given the same prompt, can generate responses ranging from below-average to exceptional. The same LLM is a moving target.

This is crucial if you’re treating LLMs like teammates. Imagine a junior dev who’s fast, available 24/7, and happy to contribute—but who might suggest a bad approach 3 out of 10 times. You’d appreciate their support—but you wouldn’t let them design your system architecture solo.

What this means in practice: AI doesn’t eliminate the need for good judgment. It multiplies the need. You now have more output to vet, not less work to think through.
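One way to make that variability visible on your own tasks is to sample the same prompt many times and look at the spread of the scores, not just the mean. A hedged sketch of that idea, with `generate_idea` and `creativity_score` as stand-in stubs for a real model call and a real scoring rubric (both hypothetical; swap in your actual client and rater):

```python
import random
import statistics

random.seed(42)  # reproducible demo

def generate_idea(prompt: str) -> str:
    """Stub for an LLM call; a real version would hit a model API."""
    return f"idea-{random.randint(0, 9)} for {prompt!r}"

def creativity_score(idea: str) -> float:
    """Stub scorer; a real version would use human raters or a rubric.
    The gaussian simulates the below-average-to-exceptional spread."""
    return random.gauss(mu=60, sigma=15)

prompt = "alternative uses for a toothbrush"
scores = [creativity_score(generate_idea(prompt)) for _ in range(20)]

print(f"mean {statistics.mean(scores):.1f}, "
      f"spread {min(scores):.1f}-{max(scores):.1f}")
```

The spread, not the mean, is what tells you how much vetting each batch of output actually needs.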


Prompt Engineering Helps—But Only So Much

The study also looked at the impact of prompt framing. Some models (like Claude and Grok) performed significantly better when they were explicitly told, “This is a creativity test.” Others (like DeepSeek) got worse.

This matters if you’re in a team that’s investing in “prompt libraries” or thinking seriously about GenAI integration.

It suggests that:

Prompting affects outcomes, but model behavior is idiosyncratic. Some respond to creativity cues. Others don’t.

This echoes what many dev teams are discovering the hard way: you can’t fix model limitations with prompt finesse alone. If the underlying reasoning, abstraction, or randomness calibration isn’t there, no clever phrasing will save you.

So the real question isn’t just how you prompt—but when you should switch models—or whether to use AI at all for the task in front of you.
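Testing whether a framing cue helps a given model is cheap to operationalize: run the same task with and without the cue and compare scored outputs per model rather than assuming the cue transfers. A sketch of such a harness, with `call_model` and `score` as hypothetical stand-ins for your actual client and scorer:

```python
import random
import statistics

random.seed(7)  # reproducible demo

def call_model(prompt: str) -> str:
    """Hypothetical model call; swap in your actual API client."""
    return f"response to {prompt!r} #{random.randint(0, 999)}"

def score(response: str) -> float:
    """Hypothetical creativity scorer (rubric, raters, or a
    distance-based metric); stubbed here with random values."""
    return random.uniform(40, 90)

task = "List unusual uses for a toothbrush."
framings = {
    "plain": task,
    "cued": "This is a creativity test. " + task,  # explicit creativity cue
}

results = {
    name: statistics.mean(score(call_model(p)) for _ in range(10))
    for name, p in framings.items()
}
for name, mean_score in results.items():
    print(f"{name}: {mean_score:.1f}")
# Whether 'cued' beats 'plain' is model-specific; measure it per model.
```

Run the same harness against each model in your rotation before committing a framing to a shared prompt library.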


Where GenAI Actually Shines

I don’t want this to sound cynical. I use GenAI tools every day. And when applied thoughtfully, they do something extremely valuable: they reduce the cost of trying ideas.

In traditional workflows, generating multiple solutions takes time and coordination. With GenAI, the marginal cost of a new idea is effectively zero. That means you can scale divergent thinking—not necessarily in quality, but in quantity.

That’s a big deal.

Imagine:

  • Product teams running 5x more concept variations before choosing a path.
  • UX designers prototyping more edge cases than time would otherwise allow.
  • Researchers bouncing between metaphors or model names without burning cycles.

LLMs may not spark radical innovation, but they enable faster exploration of the idea space. That alone is a form of creative leverage.

But to use that leverage wisely, you have to understand the tool’s limits. It can surface ideas, but it cannot evaluate them. It can remix, but not redefine. It can sound insightful, but may never say anything new.


The Future Is in the Middle

Where does this leave us as software professionals?

If we overestimate AI’s creativity, we risk flattening our own. We delegate too soon. We stop pushing for better. We accept the first fluent answer as good enough.

But if we underestimate its value, we miss out on real advantages: idea velocity, low-friction drafting, and cross-pollination across domains.

The middle path is the hardest to hold—but it’s also the most realistic:

Treat AI like a smart but erratic intern. Use it to explore, not decide. And always, always finish the thought yourself.

Final thought: This isn’t about creativity scores — it’s about cognitive trust

The message is clear: Generative AI isn’t just a tool for faster brainstorming — it’s changing how we think, collaborate, and problem-solve in software teams.

We’re no longer ideating in isolation. We’re partnering with models that remix the past, surface safe bets, and speak fluently — but don’t truly know when they’re being original. And that shift demands more than prompt tuning. It demands discernment.

Too many teams treat AI-assisted creativity as a convenience upgrade — not a reconfiguration of the cognitive work software engineering demands. That’s where we need sharper awareness and better guardrails.

So, what’s your next move?

If you’re a software engineer: Don’t just accept the first fluent answer. Use AI to widen your search space — but apply your own edge for abstraction, weirdness, and judgment. Save time on scaffolding, but claim ownership of the insight. Ask: “Is this clever, or just clean?” Let the LLM start the conversation — not end it.

If you’re a tech lead, architect, or engineering manager: Make creative AI use part of the development process — not a backchannel. Define when GenAI is for idea generation, when it needs human post-processing, and when it should be excluded altogether. Create lightweight rituals for vetting AI output, and talk openly about where it helps — and where it misleads. Great technical leaders won’t just delegate ideation — they’ll curate it.

If you’re a head of engineering, director, or innovation lead: Treat AI not as a productivity hack, but as a cognitive shift. Invest in your team’s creative literacy — not just tooling. Build a culture that values experimentation, reflective thinking, and the ability to reject ideas that are plausible but unremarkable. Because your next breakthrough won't come from velocity — it’ll come from the question no LLM thought to ask.

The future of software ideation won’t be shaped by speed. It’ll be shaped by the teams that know when to trust the model — and when to trust each other.

That’s where I can help.

I work with teams and organizations to integrate GenAI into software workflows thoughtfully — using research-backed strategies to strengthen team cognition, ideation practices, and critical oversight.

👉 Learn more about building cognitively aware engineering teams: danielrusso.org/evidence-based-organizational-change

How is your team using GenAI in design, planning, or problem-solving?

💬 Let’s exchange ideas in the comments.

#SoftwareEngineering #GenAI #HumanFactors #AIandTeams #CreativeThinking #EngineeringCulture #PromptEngineering #FutureOfSoftware #CognitiveWork
