Responsible AI vs. Exploitable AI: Why an Interdisciplinary Approach is Critical

Betania Allo

International Cybersecurity & AI Policy │ Harvard-alum Tech Lawyer │ UN, OAS │ Bridging Tech & Law Across Countries | Keynote Speaker — A Latina voice for Responsible AI and tech ethics | 🇦🇷🇪🇸

Published Mar 3, 2025

Advanced AI models are revolutionizing industries with unprecedented capabilities but they also introduce new cybersecurity risks that grow as these models become more powerful. Recent incidents show that large language models (LLMs) and other AI systems can be exploited in novel ways, fueling AI-powered cyberattacks, deepfake fraud, and “jailbreak” exploits that bypass safety controls. Today I will briefly examine why today’s cutting-edge AI (from OpenAI’s ChatGPT to emerging systems like Qwen, Grok 3 and DeepSeek AI) are both transformative and vulnerable, questions whether current cybersecurity measures can keep pace, and argues that securing AI requires an interdisciplinary approach: blending technical safeguards with AI ethics, regulatory governance, and international policy coordination. Professionals who bridge these domains are in high demand across sectors, guiding organizations toward responsible AI development under evolving laws like the EU AI Act and domestic regulatory frameworks. The stakes are high: without robust, multi-faceted security strategies, AI’s rapid advance could spiral into a safety crisis. But with the right collaboration and oversight, we can harness powerful AI responsibly and avert disaster. Let's explore these issues with an authoritative look at threats, defenses, policy frameworks, and the path forward for trusted AI.

It is no news that AI models have grown astonishingly powerful in generating human-like text, images, and decisions. Ironically, as their capabilities surge, so do their vulnerabilities. Modern AI systems are complex software models and can harbor many of the same weaknesses as traditional software. Researchers warn that generative AI models can contain “a host of weaknesses or vulnerabilities” that malicious actors may exploit. One of the most prominent flaws in current AI is the susceptibility to prompt injection attacks. In such attacks, threat actors craft inputs that manipulate the model’s behavior in unintended ways, effectively hijacking the model.

Jailbreak exploits are a striking example. A jailbreak is a specially crafted prompt that forces an AI chatbot to ignore its safety guardrails and produce disallowed output. Ever since ChatGPT’s release, the bad guys have been discovering clever ways to “trick” LLMs into generating harmful content (hate speech, disinformation, even bomb-making instructions) despite the built-in filters. AI providers constantly patch these systems, but new jailbreaks keep emerging. Early jailbreaks were simple (like the notorious “DAN” Do Anything Now prompt) which just asked the model to ignore its rules. As defenses improved, attackers countered with more sophisticated jailbreaks, sometimes using AI-generated prompts or obfuscated characters to slip past filters. All major LLMs remain somewhat vulnerable to jailbreaks, and completely eliminating them “is nearly impossible. In other words, advanced AI models carry inherent security holes that determined adversaries can exploit, much as they exploit classic software vulnerabilities.

New AI systems sometimes struggle with safety guardrails. In testing, DeepSeek’s flagship chatbot failed to block 100% of known malicious prompts, indicating just how easily attackers can bypass weak safeguards. As AI models become more complex, “jailbreak” exploits highlight the challenge of securing AI behavior.

This growing attack surface isn’t just theoretical. Real-world tests underscore the problem. In a 2025 analysis, researchers threw 50 well-known malicious prompts at DeepSeek AI’s latest chatbot - and every single one got through its content filters. They achieved a 100% success rate in bypassing DeepSeek’s safety measures. By comparison, OpenAI’s and other established models have more mature safeguards, so DeepSeek’s lapse shows what can happen when AI development leaps ahead faster than security hardening. The result was shocking: any attempt to make the model produce toxic or dangerous output succeeded. Such failures illustrate a trade-off in rushing out powerful AI: without commensurate investment in safety, these models remain wide open to abuse. Even top-tier models that are rigorously trained can be induced into malicious behavior with clever input. In short, the more we rely on AI, the more crucial it becomes to secure it against exploits.

Notably, vulnerabilities extend beyond prompt tricks. Adversaries have shown they can attack AI models through methods like data poisoning (feeding corrupt training data to bias or weaken the model), model evasion (finding inputs that consistently fool the model’s detection, akin to adversarial examples), and even model extraction (stealing a model’s knowledge via repeated queries). The EU’s AI Act explicitly highlights risks such as “data poisoning,” “model evasion,” and adversarial attacks on AI systems. These technical exploits could lead to AI systems making dangerous errors or leaking sensitive info. For example, an attacker might reverse-engineer private data out of an AI (a privacy attack), or subtly corrupt an AI’s outputs to spread misinformation (output integrity attack). As AI gets integrated into everything from customer service bots to autonomous vehicles, these vulnerabilities become more than theoretical: they could be gateways for cyberattacks on critical infrastructure or means to scale fraud.

The uncomfortable truth is that no AI system of sufficient complexity will be 100% secure or foolproof. Just as traditional software has bugs, AI models will have “holes” in their logic or training that clever exploiters find. And because AI often operates in probabilistic ways, it can be harder to predict and patch all failure modes. The DeepSeek case is a red flag: AI developers cannot treat security as an afterthought. If cutting-edge models can be jailbroken at will, attackers will eagerly leverage that to cause harm – from generating illicit material to manipulating AI-driven processes in finance, healthcare, or government.

While AI systems themselves are targets, AI is also a potent weapon in the hands of attackers. We are now seeing cybercriminals and fraudsters using advanced AI to scale up attacks and create new ones that were previously infeasible. In 2024, it became evident that the most prevalent AI-enabled threat was not rogue “superintelligence” or self-replicating AI malware – it was old-fashioned fraud, supercharged by AI. Two of the most impactful forms are AI-powered phishing and deepfake scams.

AI-generated phishing: Traditional phishing emails are often clumsy and rife with grammatical errors, tipping off savvy users. Now, generative AI can produce highly convincing phishing lures at scale. Attackers feed models with details scraped from LinkedIn or company websites to output personalized, fluent emails that mimic a colleague or business partner’s style. Security researchers report that in 2024, 75% of phishing kits for sale on the dark web advertised some AI capability to craft more persuasive messages. These kits even tout deepfake integration – 82% of phishing kits claimed to offer deepfake features, blending synthetic media into phishing attempts. The result: phishing scams that look and sound alarmingly authentic, easily evading the “gut check” that might save a target. It’s no surprise that AI-fueled phishing has surged. With AI, criminals can automate the creation of endless phishing variations, defeating traditional spam filters and training. The sheer volume and quality of AI-generated bait mean that even well-trained employees might be fooled occasionally... and the attackers only need one success.
Deepfake fraud: Perhaps the most chilling new threat is deepfakes: AI-generated synthetic audio or video that impersonates real people. No longer sci-fi, deepfakes are being weaponized for high-stakes social engineering. In early 2024, a Hong Kong financial manager approved a $25.6 million transfer after a video conference with what seemed to be the company CFO and other executives. In reality, those “colleagues” were AI-generated deepfakes orchestrated by scammers. The fraudsters had convincingly cloned the faces and voices of the firm’s leaders to gain the employee’s trust and it worked. This jaw-dropping case shows how far deepfake realism has come, and how AI can facilitate large-scale heists without hacking a single computers. Furthermore, biometric verification, such as face-recognition for banking apps, is now threatened by “face swap” attacks. One security firm noted a 704% increase in deepfake-based face swaps in 2023, often used to bypass facial recognition loginssc. Such tactics have led analysts to predict that by 2026, 30% of companies may lose confidence in facial biometric authentication due to deepfake threats. Even voice deepfakes are common enough that a phishing attempt targeted a tech firm’s executive with an AI-cloned voice – luckily thwarted when the employee grew suspicious. These examples underscore that AI is enabling entirely new fraud vectors that bypass technical security controls by exploiting human trust and the authenticity of media.

Beyond social engineering, AI can generate malicious code and even help find software vulnerabilities. Long feared, this scenario has now been observed in the wild. In late 2024, HP security researchers uncovered the first documented case of AI-generated malware used in real attacks. The malware, targeting French users, was written in VBScript and JavaScript and bore telltale signs of having been coded by an LLM (such as unusual comments explaining code and variable naming styles). This confirmed that criminals are leveraging generative AI to produce working malware, lowering the barrier for less-skilled attackers OpenAI itself has acknowledged that its ChatGPT technology has been misused by cyber threat groups to write malware and assist in hacking. In fact, OpenAI took action in 2023 by banning several accounts tied to cybercriminal rings and sharing the indicators with law enforcement. This underscores how seriously AI providers are taking the threat: generative AI can enable “script kiddies” (novice hackers) to generate sophisticated exploits or polymorphic malware at the click of a button. The flood of malware variants AI can produce could overwhelm traditional detection. A security experiment showed an AI could generate thousands of polymorphic malware samples that evade antivirus tools. Meanwhile, AI’s ability to analyze code and data can be turned toward finding unknown vulnerabilities (zero-days), meaning attackers might use AI to discover new ways to breach.

Can Cybersecurity Keep Up with AI-Driven Threats?

The rapid proliferation of AI threats has sparked concern that cybersecurity measures may fall behind. After all, if attackers are using AI to innovate faster than defenders can respond, we could face a widening gap in the security landscape. Many cyber leaders indeed worry that AI threats are outpacing defenses. In one survey of CISOs, 91% believed that the adoption of AI is poised to outstrip the ability of security teams to keep up. As AI adds “new wrinkles on old security challenges” like phishing and introduces entirely new threat vectors, defenders are scrambling to adapt.

However, there is another side to this equation: AI is also a powerful tool for defenders. The cybersecurity community is actively leveraging AI and machine learning to enhance threat detection, automate responses, and analyze vast amounts of data for anomalies. For example, AI systems can detect subtle patterns in network traffic or user behavior that might signal a breach – things humans or legacy tools might miss. This has led to an arms race dynamic: AI vs. AI, where attackers deploy AI and defenders counter with. We see this in practice as AI-powered security products emerge to filter AI-generated phishing, verify media authenticity, and hunt for AI-written malware. Governments are also investing in AI for cyber defense; for instance, the U.S. Executive Order on AI in 2023 calls for developing advanced AI cybersecurity programs to find and fix vulnerabilities in critical software using AI.

So, if we adapt quickly and use AI to our advantage, cybersecurity can keep up with this. It’s true that AI expands the attack surface and increases attack speed, but defenders aren’t standing still. AI can dramatically speed up incident response and threat intelligence. The key is a mindset shift to treat AI as neither magic bullet nor doomsday device, but as a tool: one that both hackers and security teams wield. If we ensure AI is in the right hands (defenders, ethical researchers, robustly governed companies), we tilt the balance in favor of security.

There are encouraging precedents. Past technological leaps (from the internet to cloud computing) similarly gave attackers new opportunities, yet the cybersecurity field adapted with new defenses. By doubling down on innovation and sharing knowledge about AI threats, the security community can catch up. Transparency is vital: companies must be open about incidents so others can learn and prepare. Threat intel sharing, industry benchmarks, and red-team exercises specifically targeting AI systems are all essential so that defenses evolve in tandem with threats.

Yet, a purely technical arms race is not enough. Keeping up also means anticipating the societal implications of AI threats, something traditional cybersecurity alone may not cover. This is where an interdisciplinary approach becomes critical. It’s not just about better firewalls or AI detectors; it’s about understanding human factors (e.g. deepfake detection training for employees), ethical guidelines for AI deployment, and policies that incentivize security.

Given the multifaceted risks posed by advanced AI, it’s clear that no single field has all the answers. AI security isn’t purely a tech problem – it touches on ethical judgment, legal norms, and global policy. Addressing these challenges effectively calls for an interdisciplinary approach that brings together AI researchers, cybersecurity engineers, ethicists, legal experts, and policymakers. Solving AI’s value alignment and safety issues “requires interdisciplinary collaboration between AI researchers, ethicists, policymakers, and stakeholders." In practice, this means building AI security teams that include not only data scientists and security analysts, but also AI ethicists and governance specialists who consider misuse, bias, and compliance, as well as advisors versed in international law and policy who can navigate the regulatory landscape.

Such cross-disciplinary professionals are increasingly in demand. Organizations across sectors (tech, finance, healthcare, government) are realizing they need talent who can bridge the gap between cutting-edge AI technology and the ethical/regulatory considerations surrounding it. The job market is reflecting this need: new roles like AI Ethics Officer, AI Security Specialist, and AI Policy Advisor are emerging rapidly. Companies want experts who not only develop and implement AI, but can also ensure its security and set ethical guidelines. According to industry analyses, there is growing demand for specialists who understand AI’s technical underpinnings and can navigate issues of bias, transparency, and governance. These professionals act as translators between the technical teams and the C-suite or regulators, a critical function when AI projects carry both innovation potential and reputational risk.

Even global organizations recognize this need. UNESCO and other international bodies advocate a “holistic and interdisciplinary approach” to responsible AI, explicitly to tackle challenges like implicit bias and safety. In other words, AI ethics is not a luxury, it’s a necessity for sustainable AI deployment. AI systems, if left purely to engineering teams, might optimize for performance but miss ethical failures or security blind spots that someone with a different perspective would catch. By embedding ethicists and policy experts into AI development cycles, organizations can ensure that considerations like fairness, accountability, and abuse prevention are baked into the product – not patched on later (or ignored until a crisis hits).

Article content — Adapted by WEF and Oxford from Linkov, I., & Trump, B.D. (2019). The science and practice of resilience: Risk, systems and decisions, Chapter 6. Springer.

For professionals with expertise in AI ethics and policy, this is a moment of opportunity and responsibility. Interdisciplinary experts are in high demand to help steer AI projects in the right direction. Businesses are actively seeking guidance on questions like: How do we deploy generative AI without violating privacy laws? What governance framework should we adopt to manage AI risks enterprise-wide? How can we train our staff to recognize deepfake scams? These are not purely technical questions; they require an understanding of technology, law, and human behavior. The individuals and teams who can synthesize these domains are proving invaluable. They provide strategic consulting that can save organizations from costly missteps, ensuring innovation proceeds within ethical and secure bounds. In essence, they help companies answer: Just because we can do it with AI, should we? And if so, how do we do it safely and responsibly?

To meet this challenge, some forward-looking organizations are forming AI governance committees that include members from compliance, legal, IT security, and business units, alongside AI developers. Others bring in external advisors (academics or specialized consultants) to audit AI systems and stress-test their security and ethics. This interdisciplinary oversight isn’t about slowing down progress. It’s about future-proofing progress. With regulators and the public increasingly scrutinizing AI, having that breadth of expertise ensures companies aren’t blindsided by issues that could have been mitigated early on. It’s a proactive stance that savvy leadership is taking: involve the ethicists and policy folks at the table from the start. Not only does this reduce risk, it can enhance innovation. By building public trust and unlocking use cases that regulators will approve because the due diligence was done.

Responsible AI Development

The challenges outlined paint a daunting picture. Yet, the tone here is ultimately optimistic: we can meet these challenges by embracing a responsible AI development ethos backed by solid action. That means security is not an afterthought in AI innovation, but a core requirement; ethical considerations are not “nice to have,” but non-negotiable; and policy engagement is not someone else’s problem, but a shared responsibility.

Achieving responsible AI at scale will require enthusiasm and commitment from all stakeholders: developers, business leaders, regulators, and researchers. We should be excited about the opportunity to create AI that genuinely benefits humanity without causing collateral damage. The same creativity that drives AI breakthroughs can be applied to solving its risks. For instance, if generative models can produce malware, they can also be trained to detect malware; if they can generate disinformation, they can also help flag it. Companies like OpenAI, Google, and others are already pouring resources into AI alignment and safety research, and collaborating with academia on this front. Multistakeholder initiatives, such as partnership on AI, bring together diverse experts to hammer out guidelines for things like AI transparency and incident response.

There is also a growing movement for red-teaming and auditing AI systems. Before deploying an AI model, organizations should subject it to rigorous testing – attacking it, probing its biases, and seeing how it fails – then fixing those issues. This practice, common in cybersecurity, is now being adopted in AI (indeed, the U.S. EO formalizes red-team testing for critical models). I strongly advocate that every organization using advanced AI have an “AI audit” process in place. Don’t wait for regulation to force your hand; it’s just good business. As the saying goes, trust is earned: if users and clients know you’ve tested your AI for security and ethical compliance, they’ll be more likely to use it confidently.

Responsible AI development also calls for continuous learning and adaptation. The field of AI security is new and evolving; what works today might not suffice tomorrow. This is why connecting with external experts and communities is vital. Engage with conferences, standards bodies, and research publications on AI security. Encourage your technical teams to stay up-to-date on the latest adversarial attack methods and defenses. Likewise, keep abreast of policy developments: a new law or international agreement can change the compliance landscape quickly. The companies that thrive will be those that are nimble and well-informed, ready to pivot their practices as better ideas and tools emerge.

Finally, fostering a culture of ethical responsibility around AI will do wonders. When engineers, managers, and executives all appreciate the importance of AI safety, they will naturally incorporate those values into their work. This might mean an engineer feeling empowered to raise a concern that a model could be biased or vulnerable, or a product manager choosing to delay a launch until an extra security review is done. Leadership should champion this culture: it’s the kind of top-down support that ensures efforts like interdisciplinary collaboration aren’t seen as red tape, but as essential parts of the mission. ♡

betania@betaniaallo.com

Cyber News Global

5mo

H.E Dr Al-Kuwaiti sums it up https://youtube.com/shorts/f-Q3v3yCPVc?feature=share

3 Reactions

Godwin Josh

Co-Founder of Altrosyn and DIrector at CDTECH | Inventor | Manufacturer

This is getting real! I'm seeing more and more about AI-powered adversarial machine learning in threat intelligence reports. What are your thoughts on how defenders can leverage explainable AI to better understand and counter these evolving attacks?

2 Reactions

See more comments

Responsible AI vs. Exploitable AI: Why an Interdisciplinary Approach is Critical

Betania Allo

International Cybersecurity & AI Policy │ Harvard-alum Tech Lawyer │ UN, OAS │ Bridging Tech & Law Across Countries | Keynote Speaker — A Latina voice for Responsible AI and tech ethics | 🇦🇷🇪🇸

Can Cybersecurity Keep Up with AI-Driven Threats?

Responsible AI Development

More articles by this author

Others also viewed

What exactly is Responsible AI anyway?

What exactly is Responsible AI anyway?

A Thrilling Tale of AI Blackmail

Social Engineer in Silicon: Deconstructing AI's #1 Threat

Have you heard about the Risk of AI Manipulation?

The Cybersecurity Wild West of Large Language Models: Risks, Intrigue, and Chaos

Is your AI assistant as reliable as you think it is? Behind the scenes of a revolution to be secured

AI Gone Rogue? Machine Builds Hidden Network, Sparking Security Concerns

Jailbreaking Generative AI

“AI Security” Just Got Interesting

Explore topics

Can Cybersecurity Keep Up with AI-Driven Threats?

Responsible AI Development

Regulating Emotional AI: What North Carolina’s SB 624 Gets Right

Jun 17, 2025

Your AI BFF Might Be Gaslighting You: The Next Frontier of AI Regulation

Jun 16, 2025

Companies, Take Note: Your Team Is Sharing Sensitive Data With AI Without Realizing It

Apr 7, 2025

Trust, Transparency, and Tech: The Triple T of Future Governance

Feb 17, 2025

The Future of Cybersecurity is Governance—Here’s Why

Feb 4, 2025

DeepSeek AI: A Chinese New Year Fortune Cookie... Cracked Open With Code

Jan 29, 2025

UN’s New Pact: A Bold Blueprint for the Future of Global Governance

Sep 24, 2024

AI in the Middle East: What Gulf Countries Are Getting Right About Regulations

Aug 30, 2024

10 Countries Leading the AI Revolution: Who’s Setting the Rules?

Aug 21, 2024

What the UN’s New Convention on Cybercrime Means for You, your Business and your Country

Aug 9, 2024

Others also viewed

What exactly is Responsible AI anyway?

What exactly is Responsible AI anyway?

A Thrilling Tale of AI Blackmail

Social Engineer in Silicon: Deconstructing AI's #1 Threat

Have you heard about the Risk of AI Manipulation?

The Cybersecurity Wild West of Large Language Models: Risks, Intrigue, and Chaos

Is your AI assistant as reliable as you think it is? Behind the scenes of a revolution to be secured

AI Gone Rogue? Machine Builds Hidden Network, Sparking Security Concerns

Jailbreaking Generative AI

“AI Security” Just Got Interesting

Explore topics

Regulating Emotional AI: What North Carolina’s SB 624 Gets Right