GPT-5 Jailbreak and Zero-Click AI Agent Attacks on Cloud and IoT Systems
Summary
Hide â˛
Show âŧ
Researchers have discovered a jailbreak technique to bypass GPT-5's ethical guardrails, leveraging Echo Chamber and narrative-driven steering to produce harmful content. Additionally, zero-click AI agent attacks have been identified, targeting cloud and IoT systems, exploiting vulnerabilities in AI connectors and integrations. The Echo Chamber technique manipulates GPT-5 by seeding a poisonous conversational context and guiding the model with subtle storytelling, avoiding explicit intent signaling. This method has been combined with Crescendo to bypass defenses in other AI models. Zero-click attacks, such as AgentFlayer, exploit AI agents connected to cloud services and code repositories, leading to data exfiltration and unauthorized access. These attacks highlight the risks of integrating AI models with external systems, increasing the potential attack surface and introducing new security vulnerabilities. Researchers successfully jailbroke GPT-5 within 24 hours of its release using the Echo Chamber and Storytelling technique, demonstrating the ongoing vulnerability of LLMs to multi-turn attacks.
Timeline
-
11.08.2025 19:46 đ° 1 articles
GPT-5 Jailbreak Demonstrated Using Echo Chamber and Storytelling
Researchers from NeuralTrust successfully jailbroke GPT-5 within 24 hours of its release using the Echo Chamber and Storytelling technique. The attack required only three turns and did not use unsafe language in the initial prompts. The technique leverages narrative continuity and urgency to bypass safety mechanisms, demonstrating the ongoing vulnerability of LLMs to multi-turn attacks. The Echo Chamber technique can also be applied to previous versions of OpenAI's GPT, Google's Gemini, and Grok-4. NeuralTrust has offered to share their findings with OpenAI to help address these vulnerabilities.
Show sources
- Echo Chamber, Prompts Used to Jailbreak GPT-5 in 24 Hours â www.darkreading.com â 11.08.2025 19:46
-
09.08.2025 18:06 đ° 1 articles
GPT-5 Jailbreak and Zero-Click AI Agent Attacks Disclosed
Researchers have uncovered a jailbreak technique for GPT-5, leveraging Echo Chamber and narrative-driven steering to produce harmful content. Additionally, zero-click AI agent attacks have been identified, targeting cloud and IoT systems. These attacks exploit vulnerabilities in AI connectors and integrations, leading to data exfiltration and unauthorized access.
Show sources
- Researchers Uncover GPT-5 Jailbreak and Zero-Click AI Agent Attacks Exposing Cloud and IoT Systems â thehackernews.com â 09.08.2025 18:06
Information Snippets
-
Echo Chamber is a jailbreak technique that uses indirect references, semantic steering, and multi-step inference to deceive LLMs into generating responses to prohibited topics.
First reported: 09.08.2025 18:06đ° 2 sources, 2 articlesShow sources
- Researchers Uncover GPT-5 Jailbreak and Zero-Click AI Agent Attacks Exposing Cloud and IoT Systems â thehackernews.com â 09.08.2025 18:06
- Echo Chamber, Prompts Used to Jailbreak GPT-5 in 24 Hours â www.darkreading.com â 11.08.2025 19:46
-
Echo Chamber has been combined with Crescendo, a multi-turn jailbreaking technique, to bypass defenses in xAI's Grok 4 and GPT-5.
First reported: 09.08.2025 18:06đ° 2 sources, 2 articlesShow sources
- Researchers Uncover GPT-5 Jailbreak and Zero-Click AI Agent Attacks Exposing Cloud and IoT Systems â thehackernews.com â 09.08.2025 18:06
- Echo Chamber, Prompts Used to Jailbreak GPT-5 in 24 Hours â www.darkreading.com â 11.08.2025 19:46
-
GPT-5 can be manipulated to produce harmful procedural content by framing it within a story, using keywords and iterative steering.
First reported: 09.08.2025 18:06đ° 2 sources, 2 articlesShow sources
- Researchers Uncover GPT-5 Jailbreak and Zero-Click AI Agent Attacks Exposing Cloud and IoT Systems â thehackernews.com â 09.08.2025 18:06
- Echo Chamber, Prompts Used to Jailbreak GPT-5 in 24 Hours â www.darkreading.com â 11.08.2025 19:46
-
AgentFlayer is a zero-click attack that exploits AI connectors, such as those for Google Drive, to exfiltrate sensitive data from cloud storage services.
First reported: 09.08.2025 18:06đ° 1 source, 1 articleShow sources
- Researchers Uncover GPT-5 Jailbreak and Zero-Click AI Agent Attacks Exposing Cloud and IoT Systems â thehackernews.com â 09.08.2025 18:06
-
Zero-click attacks targeting AI agents can exploit integrations with Jira and Microsoft Copilot Studio, leading to data exfiltration and unauthorized access.
First reported: 09.08.2025 18:06đ° 1 source, 1 articleShow sources
- Researchers Uncover GPT-5 Jailbreak and Zero-Click AI Agent Attacks Exposing Cloud and IoT Systems â thehackernews.com â 09.08.2025 18:06
-
AI agents' excessive autonomy and ability to act independently can be leveraged in zero-click attacks to manipulate them and leak data.
First reported: 09.08.2025 18:06đ° 1 source, 1 articleShow sources
- Researchers Uncover GPT-5 Jailbreak and Zero-Click AI Agent Attacks Exposing Cloud and IoT Systems â thehackernews.com â 09.08.2025 18:06
Similar Happenings
HexStrike AI Exploits Citrix Vulnerabilities Disclosed in August 2025
Threat actors have begun using HexStrike AI to exploit Citrix vulnerabilities disclosed in August 2025. HexStrike AI, an AI-driven security platform, was designed to automate reconnaissance and vulnerability discovery for authorized red teaming operations, but it has been repurposed for malicious activities. The exploitation attempts target three Citrix vulnerabilities, with some threat actors offering access to vulnerable NetScaler instances for sale on darknet forums. The use of HexStrike AI by threat actors significantly reduces the time between vulnerability disclosure and exploitation, increasing the risk of widespread attacks. The tool's automation capabilities allow for continuous exploitation attempts, enhancing the likelihood of successful breaches. Security experts emphasize the urgency of patching and hardening affected systems to mitigate the risks posed by this AI-driven threat. HexStrike AI's client features a retry logic and recovery handling to mitigate the effects of failures in any individual step on its complex operations. HexStrike AI has been open-source and available on GitHub for the last month, where it has already garnered 1,800 stars and over 400 forks. Hackers started discussing HexStrike AI on hacking forums within hours of the Citrix vulnerabilities disclosure. HexStrike AI has been used to automate the exploitation chain, including scanning for vulnerable instances, crafting exploits, delivering payloads, and maintaining persistence. Check Point recommends defenders focus on early warning through threat intelligence, AI-driven defenses, and adaptive detection.
AI-Powered Cyberattacks Targeting Critical Sectors Disrupted
Anthropic disrupted an AI-powered operation in July 2025 that used its Claude AI chatbot to conduct large-scale theft and extortion across 17 organizations in healthcare, emergency services, government, and religious sectors. The actor used Claude Code on Kali Linux to automate various phases of the attack cycle, including reconnaissance, credential harvesting, and network penetration. The operation, codenamed GTG-2002, employed AI to make tactical and strategic decisions, exfiltrating sensitive data and demanding ransoms ranging from $75,000 to $500,000 in Bitcoin. The actor used AI to craft bespoke versions of the Chisel tunneling utility to evade detection and disguise malicious executables as legitimate Microsoft tools. The operation highlights the increasing use of AI in cyberattacks, making defense and enforcement more challenging. Anthropic developed new detection methods to prevent future abuse of its AI models.
AI-Driven Ransomware Strain PromptLock Discovered
A new ransomware strain named PromptLock has been identified by ESET researchers. This strain leverages AI to generate malicious scripts in real-time, making it more difficult to detect and defend against. PromptLock is currently in development and has not been observed in active attacks. It can exfiltrate files, encrypt data, and is being upgraded to destroy files. The ransomware uses the gpt-oss:20b model from OpenAI via the Ollama API and is written in Go, targeting Windows, Linux, and macOS systems. The Bitcoin address associated with PromptLock appears to belong to Satoshi Nakamoto. PromptLock uses Lua scripts generated from hard-coded prompts to enumerate the local filesystem, inspect target files, exfiltrate selected data, and perform encryption. The ransomware can generate custom ransom notes based on the type of infected machine and uses the SPECK 128-bit encryption algorithm to lock files. PromptLock is assessed to be a proof-of-concept (PoC) rather than a fully operational malware deployed in the wild.
AI systems vulnerable to data-theft via hidden prompts in downscaled images
Researchers at Trail of Bits have demonstrated a new attack method that exploits image downscaling in AI systems to steal user data. The attack injects hidden prompts in full-resolution images that become visible when the images are resampled to lower quality. These prompts are interpreted by AI models as user instructions, potentially leading to data leakage or unauthorized actions. The vulnerability affects multiple AI systems, including Google Gemini CLI, Vertex AI Studio, Google Assistant on Android, and Genspark. The attack works by embedding instructions in images that are only revealed when the images are downscaled using specific resampling algorithms. The AI model then interprets these hidden instructions as part of the user's input, executing them without the user's knowledge. The researchers have developed an open-source tool, Anamorpher, to create images for testing this vulnerability. To mitigate the risk, Trail of Bits recommends implementing dimension restrictions on image uploads, providing users with previews of downscaled images, and requiring explicit user confirmation for sensitive tool calls.
UNC5518 deploys CORNFLAKE.V3 backdoor via ClickFix and fake CAPTCHA pages
UNC5518, an access-as-a-service threat actor, deploys the CORNFLAKE.V3 backdoor using the ClickFix social engineering tactic and fake CAPTCHA pages. This backdoor is used by at least two other groups, UNC5774 and UNC4108, to initiate multi-stage infections and drop additional payloads. The attack begins with users being tricked into running a malicious PowerShell script via a fake CAPTCHA page. The script executes a dropper payload that ultimately launches CORNFLAKE.V3, which supports various payload types and collects system information. The backdoor has been observed in both JavaScript and PHP versions and uses Cloudflare tunnels to avoid detection. A new ClickFix variant manipulates AI-generated text summaries to deliver malicious commands, turning AI tools into active participants in social engineering attacks.