Jailbreaking by the Box

The sun peeked through a patchwork of fluffy white clouds, casting shimmering spots of light onto the gentle waves of Brighton Beach. A cool breeze carried the salty tang of the ocean, mingling with the distant laughter of children building sandcastles. The iconic Brighton Beach Boxes stood proudly along the shoreline, their vibrant colours a vivid contrast against the golden sands.

Dr Pete, Ethan, and Brody made their way down the wooden boardwalk, their footsteps echoing softly as they approached their bright blue beach box with its distinctive grey door. As they drew nearer, they noticed a uniformed security guard examining the lock on their box.

"Good morning," Dr Pete called out, a hint of curiosity in his voice. "Is everything alright?"

The security guard turned, offering a polite nod. "Morning, folks. We've had a few break-ins along the beach boxes this week. Just checking to make sure everything's secure."

Ethan raised an eyebrow. "Break-ins? Can't imagine there's much to steal in ours. Unless someone's after some deck chairs."

Brody chuckled. "Maybe they're after Dad's secret stash of ancient tech gadgets."

The guard smiled wryly. "You'd be surprised what people think they might find. Better safe than sorry."

Dr Pete shrugged. "Well, anyone breaking into ours is in for a disappointment."

"Fair enough," the guard said, tipping his hat. "Just be vigilant. Thieves don't always know what's inside until they've caused the damage."

As the guard moved on to the next box, the family unlocked their own and stepped inside. The familiar scent of sunscreen and sea air greeted them. Dr Pete began arranging the deck chairs outside, glancing up occasionally at passersby who couldn't help but peek into the open box.

"He's right, you know," Dr Pete mused, watching a couple stroll by, craning their necks to look inside. "Everyone who walks by does a bit of rubbernecking. It's human nature to be curious about what's hidden away."

Ethan leaned against the doorframe, spinning a basketball on his finger. "Speaking of break-ins, remember when jailbreaking iPhones was all the rage? People wanted to unlock new features Apple didn't offer."

Brody nodded enthusiastically. "Yeah, that was a big deal back in the day. Gave users the freedom to install custom apps and tweak the interface."

Dr Pete's eyes sparkled with the thrill of a new topic. "Ah, jailbreaking. It's fascinating how the concept has evolved. Originally, it was about liberating devices from manufacturer restrictions. But over time, it's become a much bigger deal, especially in the realm of AI."

Ethan caught the basketball and began bouncing it thoughtfully. "I guess in tech and in life, people don't like feeling trapped."

"Exactly," Dr Pete agreed. "With AI, we're seeing a new kind of jailbreaking. Large Language Models, or LLMs, are like the new generation of processors - CPUs that execute human instructions. But unlike traditional processors, they interpret and generate language. People are finding ways to 'jailbreak' these models to make them do things they weren't originally programmed to do."

Brody sat down on a deck chair, intrigued. "At work, we're using AI copilots everywhere. They're supposed to assist with coding, writing, even decision-making. But there have been cases where users manipulated them into bypassing their safety protocols."

Dr Pete nodded. "Precisely. It's akin to exploiting a loophole in a contract - a badly written one at that. Lawyers and judges have been 'executing' human instructions for ages, sometimes with unintended consequences due to ambiguous language."

Ethan bounced the basketball again. "So, you're saying that just like a contract can be interpreted in different ways, AI models can be manipulated based on how instructions are given?"

"Spot on," Dr Pete replied. "The models process input based on patterns they've learned. If someone crafts their input cleverly enough, they can get the AI to produce outputs that were supposed to be restricted."

Brody leaned forward, his brow furrowed. "There was a case recently where an AI assistant was tricked into revealing confidential information when users rephrased their questions in a certain way. It's a real concern for data security."

Dr Pete nodded thoughtfully. "Yes, that's a classic example of a prompt injection attack."

"Prompt injection?" Brody asked, curiosity piqued. "How does that work exactly?"

"Well," Dr Pete began, "AI language models are designed to follow instructions given in prompts. But if someone cleverly crafts their input, they can manipulate the AI into bypassing its built-in restrictions."

"Like exploiting a loophole in how the AI interprets the prompt?" Brody suggested.

"Precisely," Dr Pete said. "For instance, an attacker might tell the AI to ignore its previous instructions and reveal information it's supposed to keep confidential. It's similar to how a lawyer might find a loophole in a contract's wording."

Brody shook his head. "That's unsettling. So even with safeguards, the AI can be tricked?"

"In some cases, yes," Dr Pete replied. "Another technique is indirect prompt injection. This is when malicious instructions are embedded within content that the AI processes - like a document or an email. The AI reads the hidden commands and unwittingly executes them."

"Kind of like hiding a secret message inside a normal - looking file?" Brody asked.

"Exactly," Dr Pete affirmed. "Then there's data exfiltration through AI responses. Attackers might craft questions that coax the AI into revealing sensitive data, perhaps by asking it to summarize confidential documents or by exploiting how it handles context."

Brody frowned, leaning back and rubbing his temples. "So, the AI becomes a tool for leaking information?"

"That's one way to put it," Dr Pete said. "There's also context manipulation. By altering the context in which the AI operates - like changing file names or metadata or even pretending to role - play - attackers can influence its responses."

"Like giving misleading labels to files so the AI thinks it's okay to share them?" Brody suggested.

"Yes, manipulating the AI's environment," Dr Pete continued. "Another serious concern is overriding system prompts. The AI relies on underlying instructions to operate safely. If someone can access or change these system prompts, they can alter how the AI behaves."

"Are these prompts accessible?" Brody asked.

"They shouldn't be," Dr Pete replied. "But sophisticated attackers find ways. They might use techniques like obfuscation, where they disguise their malicious prompts to slip past the AI's content filters."

Brody raised an eyebrow. "How do they do that?"

"By using character encoding, spacing, or even hidden characters," Dr Pete explained. "These tricks can prevent the AI's safety systems from recognizing and blocking prohibited content."

"It seems like a constant cat - and - mouse game," Brody observed.

"It is," Dr Pete agreed. "Attackers also leverage external data sources. If the AI references information from the web, they might manipulate online content to influence the AI's responses."

"So by altering web pages or data the AI might access, they can indirectly control it?" Brody asked.

"Precisely," Dr Pete said. "And let's not forget about exploiting plugins and API integrations. If the AI assistant has extensions or is connected to other apps, vulnerabilities in those can be exploited to gain unauthorized access or control."

Brody sighed. "With so many attack vectors, how do we protect against them?"

"It's challenging," Dr Pete admitted. "Developers are working on more robust security measures, like better prompt filtering and context management. But as the AI becomes more advanced, so do the attackers."

Brody looked pensively at the horizon. "So, data and program instructions are becoming interchangeable in these AI systems?"

Dr Pete nodded. "Exactly. This blurs the traditional line between data and executable code. In the past, we were cautious about Remote Code Execution - RCE - where attackers could run malicious code on someone else's machine. But with AI assistants, we're facing a new kind of threat."

He leaned forward, his eyes reflecting both concern and intrigue. "I call it Remote Copilot Execution, or RCpE."

"RCpE?" Brody repeated. "That's a new one."

"Yes," Dr Pete affirmed. "It's when an attacker manipulates an AI copilot to execute unintended actions by embedding malicious instructions into data inputs. Because AI models treat all input as potential instructions, the distinction between code and data gets muddled. This makes it possible for attackers to inject harmful commands that the AI might execute."

Brody's eyes widened. "So, it's like RCE but for AI copilots. The AI becomes the medium for executing the attacker's intentions."

"Precisely," Dr Pete said. "In traditional RCE, malicious code is executed directly on a system. In RCpE, the attacker doesn't need to inject code into the system's software stack. Instead, they manipulate the AI assistant's language processing capabilities to carry out actions on their behalf."

Ethan grinned. "You know, this reminds me of basketball."

Brody laughed. "Of course it does."

"Bear with me," Ethan said, tossing the ball to his brother. "In games, I used to get double - teamed all the time, two or more players trying to trap me, limit my movement, force a turnover."

Dr Pete crossed his arms, a smile playing on his lips. "Go on."

"So, when you're trapped," Ethan continued, "you have to find creative ways to 'jailbreak' out of the situation. Maybe a quick pivot, a feint, or a bounce pass between defenders' legs. One classic trick is the backcourt trap. After you cross the half-court line, defenders trap you, and if you step back over the line, it's a violation. You're stuck unless you can think on your feet."

"That's a brilliant analogy," Dr Pete said. "Just like in basketball, where you find ways to navigate out of a trap, people find ways to navigate around AI restrictions."

Brody spun the basketball in his hands. "And in the paint, it's even tighter. Defenders swarm, and you have to use footwork and fakes to get a shot off. It's all about outsmarting the opposition."

"Exactly," Ethan agreed. "It's about understanding the rules deeply enough to find the loopholes or execute unexpected moves."

Dr Pete looked thoughtfully at his sons. "And that's what makes this issue so complex. In AI, we set up rules and guidelines to prevent misuse, but there will always be those who understand the system well enough to find a way around them."

Ethan nodded. "In sports, we study our opponents to anticipate their moves. Maybe we need to do the same with potential attackers - understand their strategies to better defend against them."

"Exactly, Ethan," Dr Pete replied. "Anticipating misuse is as important as designing the system itself."

A sudden commotion drew their attention. A couple of kids were chasing after a seagull that had swooped down and snatched a sandwich right out of their picnic basket. The seagull dodged left and right, effortlessly evading the children's grasp before taking off into the sky with its prize.

Ethan laughed, his eyes following the bird. "Looks like that seagull just pulled off the ultimate jailbreak."

Brody grinned. "No kidding. It outsmarted them without breaking a sweat."

Dr Pete chuckled. "Nature has its own way of illustrating our discussions. The seagull saw an opportunity, assessed the risks, and executed a flawless escape."

Ethan watched as the bird soared away, a smile playing on his lips. "Maybe we could learn a thing or two from that seagull."

"Perhaps," Dr Pete said. "But it also shows the importance of adaptability. Whether it's in AI, law, sports, or nature, those who can think creatively and adapt quickly often come out ahead."

Brody stood up, stretching his arms. "So, what's the solution? How do we build systems that are secure but also flexible enough to be useful?"

Dr Pete sighed softly. "That's the million-dollar question. It requires a multidisciplinary approach - technology, ethics, law, psychology. We need to design AI with robust safeguards but also educate users about responsible use. I've been saying for years that just like in basketball, AI is a team sport."

Ethan picked up the basketball again, spinning it thoughtfully. "Just like training players not just in skills but in sportsmanship and strategy."

"Exactly," Dr Pete replied. "And perhaps accepting that no system is entirely foolproof. It's about risk management, continuous improvement, and staying one step ahead."

The sun was beginning to dip toward the horizon, casting long shadows across the beach. The sky transformed into a canvas of pinks and oranges, the water reflecting the brilliant hues.

"Well," Brody said, his stomach rumbling softly, "all this talk has made me hungry. Anyone up for our world - famous Brighton fish and chips?"

Ethan's stomach growled audibly. "Absolutely. But let's keep an eye on the seagulls this time."

Dr Pete laughed, his eyes crinkling at the corners. "Agreed. And maybe we can brainstorm more about these concepts over dinner. I have a feeling there's much more to uncover."

As they packed up their belongings, Dr Pete glanced back at their beach box, a fond expression on his face. "You know, even if there's nothing valuable inside, it's ours. It's part of our family memories."

Ethan placed a hand on his father's shoulder. "And just like we protect this place, we'll find ways to safeguard the things that matter in our world - whether it's from thieves, jailbreakers, or overzealous seagulls."

Brody locked the door securely, giving it a firm tug to ensure it was fastened.

They walked up the beach together, the sound of the waves fading gently behind them. The day's events had given them much to think about, but also reminded them of the simple joys of family, conversation, and the endless wonders of the world around them.

As they disappeared into the evening bustle of Brighton Beach, a solitary seagull perched atop their beach box, eyeing the horizon with a keen gaze - ever watchful, ever adapting, just like them.
