Real Pitfalls of AI Agents and Why They Need Guardrails

Netguru | B Corp™

We speed up AI adoption and ramp up engineering and design teams to help businesses lead.

Published Jun 26, 2025

Today’s insights are brought to you by Patryk Szczygło, R&D Lead at Netguru.

Last week, Krystian Bergmann shared with you the story behind our very own, sales-oriented AI agent, Omega. Today, I’d like to talk some real-life examples of what happens when AI agents roam (too) free and what you can do to avoid risks.

AI agents promise speed—but without guardrails, they move faster than your safety net.

We’d been early adopters of internal AI agents, using them to automate research, draft meeting briefs, and summarize documentation. But we began to see the edges:

hallucinations that sounded plausible—but weren’t,
over-permissioned agents accessing or leaking internal drafts,
emergent behaviors, like recursive loops or unexpected tool usage,
assumed safeguards that didn’t exist when systems scaled.

And we’re not alone. Others have run into similar issues.

I’ll share with you what we’ve learned so far.

AI hallucinations: Confident lies in business contexts

Hallucinations aren’t just technical glitches. They show up as confident, polished outputs—emails that sound professional, summaries that seem plausible, answers that feel right. But they’re wrong.

In business settings, these hallucinations can slip through unnoticed. They can be embedded in status reports, customer emails, or automated updates—delivered with enough authority to be taken at face value.

In courtrooms, hallucinations are costing real money: By mid-2025, more than 150 documented legal cases involved generative AI hallucinations—mostly fake citations, invented case law, and fabricated quotes from judges.

When Google hallucinates: In late 2024, Google’s AI Overview confidently described a sequel to Disney’s Encanto—with fake plot points, quotes, and a past-dated release. The feature cited a fan-fiction wiki as a source and fooled even tech-savvy users.

This wasn’t a fluke. It reflected broader flaws in how AI systems evaluate sources, verify content, and protect users from misleading information.

Why hallucinations are worse with AI agents: In chatbots, hallucinations usually stay contained. But agents take actions—they write emails, create tickets, update tools. That autonomy is what makes hallucinations more dangerous.

Imagine an agent generating Jira tickets with inaccurate requirements or sending follow-ups to clients based on fictional deadlines. Each of these could lead to real decisions, costly delays, or reputational harm.

GitHub MCP exploit

In a widely discussed case, researchers at Invariant Labs uncovered a critical vulnerability in the GitHub MCP integration—a similar backend used by agent systems like Claude Desktop.

Here’s how the attack unfolded:

A user had two repositories: one public (open for anyone to submit issues) and one private (containing sensitive data).
An attacker posted a malicious GitHub Issue to the public repo, embedding a prompt injection.
The user asked their agent a seemingly safe question:
“Check open issues in my public repo.”
The agent fetched the issue list, encountered the injected prompt, and was manipulated.
It then autonomously pulled private data from the user’s private repo and published it via a public pull request—now accessible to anyone.

What’s striking is that nothing was “hacked” in the traditional sense. The GitHub MCP server, tools, and APIs functioned as designed. The vulnerability wasn’t in the infrastructure, but in how the agent interpreted and acted on the injected content.

Invariant Labs calls this a toxic agent flow—a scenario where seemingly safe actions chain together in unexpected ways, leading to real-world harm.

Trusted tools CAN be tricked

This wasn’t a failure of the GitHub API or a breakdown in Claude’s core model. It was a design flaw—an issue with how agents interpret and chain actions across tools and inputs without strict contextual boundaries.

Any agent that reads from untrusted sources—like public GitHub issues—and acts on that content without validation is vulnerable. Without guardrails, it can:

perform unintended actions,
leak private or regulated data,
create irreversible pull requests or changes.

Even the most advanced models—like Claude 4 Opus—aren’t immune.

Claude scores well on safety benchmarks: it blocks 89% of prompt injection attacks and shows just a 1.17% jailbreak success rate with extended thinking. Still, those defenses have limits.

This isn’t a Claude issue—it’s a pattern across all LLMs. Jailbreaks, injections, and chained exploits are evolving fast. Alignment helps, but it isn’t enough. You need layered defenses that live outside the model too.

Below tools can help you build those defenses—by making agent behavior easier to observe, test, and control:

Langfuse adds observability to your AI agents. It logs each step—inputs, outputs, tool calls, and decision traces—so you can understand how an agent reached a certain outcome.

Promptfoo is built for red-teaming and pre-deployment testing. It simulates adversarial inputs, measures how your system responds, and benchmarks prompt safety over time. With OWASP Top 10 for LLMs built in, it surfaces common vulnerabilities.

What this taught us

Permissions aren’t just about access tokens. They’re about context—what the agent is allowed to do, in which environment, and under what conditions.

We’ve learned to treat permission management as a layered system:

scoping access by task, not user,
restricting agents to one repository per session,
blocking cross-context actions unless explicitly approved,
auditing all tool usage through monitoring proxies like MCP-scan.

Without these controls, permission creep becomes inevitable.

And with autonomous agents, what starts as a minor oversight can escalate into a major breach—fast.

Stay tuned—next week, I’ll share some useful types of guiderails you can implement plus an agent readiness checklist.

Interested to learn more? Reach out to me!

Best,

Patryk

Real Pitfalls of AI Agents and Why They Need Guardrails

Netguru | B Corp™

We speed up AI adoption and ramp up engineering and design teams to help businesses lead.

AI hallucinations: Confident lies in business contexts

GitHub MCP exploit

Trusted tools CAN be tricked

What this taught us

Digital Acceleration Editorial

15,360 followers

More articles by this author

Others also viewed

Does your site get traffic from AI yet? Here's how to find out...

How to Stay Compliant with the EU AI Act’s Transparency Rules

Beyond Automation: AI as a Strategic Asset in African Legislative Workflows

Without a Local Policy Engine, Your AI Agent Is a Rogue Intern

An Introduction to AI Agents (my notes): Part 1 of my series of Blogs on Agents

EU AI Act: First Impressions, Key Points, and a Suggestion On Next Steps for US Businesses

Explain it to me like I'm 5 - Responsible AI

The Countdown Begins: EU’s Groundbreaking Rules for General-Purpose AI Take Effect from 2 August 2025

Claude 3.7 Just Got Internet Access – And That Changes Everything!

Revoking AI Executive Orders - Some Thoughts

Explore topics

AI hallucinations: Confident lies in business contexts

GitHub MCP exploit

Trusted tools CAN be tricked

What this taught us

Digital Acceleration Editorial

15,360 followers

Make Sure You Measure AI Agent Success Right

Aug 14, 2025

How to Turn Your AI PoC into a Scalable Product

Aug 7, 2025

How an Ecommerce Platform Got a Flexible Mobile App Fast

Jul 31, 2025

Why Smart Forecasting Tools Are the Future of Finance

Jul 24, 2025

How To Build a Cross-Platform App That Converts

Jul 17, 2025

What We Learned Building AI Agents That Don’t Break

Jul 10, 2025

How to Stay Fast and Safe with AI Agents

Jul 3, 2025

How To Build AI Agent That Supercharges Sales 🚀

Jun 19, 2025

How to Choose the Right AI Architecture: Multi-Agent vs. Solo Agent?

Jun 12, 2025

How to Boost Networking With an App Built in 13 weeks

Jun 5, 2025

Others also viewed

Does your site get traffic from AI yet? Here's how to find out...

How to Stay Compliant with the EU AI Act’s Transparency Rules

Beyond Automation: AI as a Strategic Asset in African Legislative Workflows

Without a Local Policy Engine, Your AI Agent Is a Rogue Intern

An Introduction to AI Agents (my notes): Part 1 of my series of Blogs on Agents

EU AI Act: First Impressions, Key Points, and a Suggestion On Next Steps for US Businesses

Explain it to me like I'm 5 - Responsible AI

The Countdown Begins: EU’s Groundbreaking Rules for General-Purpose AI Take Effect from 2 August 2025

Claude 3.7 Just Got Internet Access – And That Changes Everything!

Revoking AI Executive Orders - Some Thoughts

Explore topics