Jailbreak Techniques Exploit GPT-5 and AI Agents in Cloud and IoT Environments
Summary
Researchers have discovered a jailbreak technique that bypasses the ethical guardrails of OpenAI's GPT-5. The technique combines the Echo Chamber method with Crescendo-style, narrative-driven steering: harmful requests are framed inside a story, avoiding explicit intent signaling so the model struggles to detect and refuse the malicious prompts and can be manipulated into producing prohibited content. Separately, zero-click attacks on AI agents targeting cloud and IoT systems have been identified, showing how agents connected to external systems can be exploited to exfiltrate sensitive data and underscoring the need for robust security measures when integrating AI models with external systems.
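Because Echo Chamber-style attacks spread malicious intent across many innocuous-looking turns, per-message filters tend to miss them. Below is a minimal defensive sketch that moderates the cumulative conversation rather than only the latest message; the rolling-window idea, the window size, and the use of an OpenAI-style moderation endpoint are illustrative assumptions, not part of the published research.

```python
# Sketch: moderate the *cumulative* conversation, not just the latest turn,
# so gradually "seeded" context (Echo Chamber / Crescendo style drift) can
# still trip a policy check even when each individual message looks benign.
from openai import OpenAI  # assumes the official openai Python SDK is installed

client = OpenAI()

def conversation_flagged(messages: list[str], window: int = 12) -> bool:
    """Run a moderation check over the concatenated recent turns.

    `window` (how many turns to concatenate) is an illustrative choice,
    not a value prescribed by the research.
    """
    context = "\n".join(messages[-window:])
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=context,
    )
    return result.results[0].flagged

# Usage: call after every turn and refuse or reset the session when flagged.
# if conversation_flagged(transcript):
#     end_session("conversation steered into disallowed territory")
```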
Timeline
- 09.08.2025 18:06 · 1 article
Echo Chamber and Crescendo Techniques Exploit GPT-5 and AI Agents
Researchers have uncovered a jailbreak technique combining Echo Chamber and Crescendo to bypass ethical guardrails in GPT-5. This method manipulates the model into producing harmful instructions by framing them within a story. Additionally, zero-click AI agent attacks, such as AgentFlayer, have been identified, exploiting AI connectors to exfiltrate sensitive data from cloud and IoT systems. These attacks highlight the risks of integrating AI models with external systems and the need for robust security measures.
- Researchers Uncover GPT-5 Jailbreak and Zero-Click AI Agent Attacks Exposing Cloud and IoT Systems · thehackernews.com · 09.08.2025 18:06
Information Snippets
- Echo Chamber, a jailbreak technique, manipulates LLMs by seeding and reinforcing a poisonous conversational context.
- Crescendo, a multi-turn jailbreaking technique, has been used in combination with Echo Chamber to bypass defenses in xAI's Grok 4.
- The attack on GPT-5 involves framing harmful instructions within a story, using keywords and narrative-driven steering.
- AgentFlayer attacks exploit AI connectors such as Google Drive to mount zero-click attacks and exfiltrate sensitive data.
- Zero-click attacks on AI agents can be initiated through malicious documents, Jira tickets, or emails containing prompt injections (a mitigation sketch follows this list).
- The excessive autonomy of AI agents and their ability to act independently can be exploited to manipulate them into leaking data (a confirmation-gate sketch also follows this list).
- Integration of AI models with external systems increases the attack surface and the potential for security vulnerabilities.
- SPLX's tests found GPT-5 to be nearly unusable for enterprise out of the box, with GPT-4o outperforming it on hardened benchmarks.
- Researchers demonstrated how prompt injections can hijack smart home systems using Google's Gemini AI.
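The zero-click vectors above rely on the agent ingesting attacker-controlled text (a shared document, a Jira ticket, an email) as if it were trusted data. The sketch below shows one mitigation: treating connector-fetched content as untrusted and screening it before it reaches the model. The pattern list, delimiter convention, and function names are illustrative assumptions, not part of the published research.

```python
import re

# Heuristic markers of prompt injection in untrusted connector content.
# The list is illustrative; real deployments would pair this with a trained
# classifier and with strict tool and permission scoping.
INJECTION_PATTERNS = [
    r"ignore (all|any|previous) (instructions|prompts)",
    r"you are now",
    r"system prompt",
    r"exfiltrate|send .* to http",
    r"do not (tell|mention|inform) the user",
]

def screen_untrusted_content(text: str) -> tuple[bool, str]:
    """Return (suspicious, wrapped_text) for content fetched via a connector.

    Suspicious content is withheld from the agent's context; clean content is
    wrapped in explicit delimiters so the model can treat it as data, not
    instructions.
    """
    lowered = text.lower()
    suspicious = any(re.search(p, lowered) for p in INJECTION_PATTERNS)
    wrapped = (
        "<untrusted-document>\n"  # hypothetical delimiter convention
        f"{text}\n"
        "</untrusted-document>"
    )
    return suspicious, wrapped

# Example: a Google Drive connector would call screen_untrusted_content() on
# every fetched file and drop or quarantine anything flagged.
```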
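Because excessive autonomy is what turns a prompt injection into actual data loss, a commonly recommended control (echoed by the researchers' call for user confirmation on sensitive actions) is a human-in-the-loop gate on high-risk tool calls. The tool names, risk tiers, and registry below are hypothetical; this is a sketch, not a prescribed implementation.

```python
# Sketch of a confirmation gate: the agent may call low-risk tools freely, but
# any tool that can move data outside the tenant requires explicit human
# approval. Tool names, risk tiers, and the registry are hypothetical.
from typing import Any, Callable

TOOL_REGISTRY: dict[str, Callable[..., Any]] = {
    # populated by the agent framework, e.g.:
    "send_email": lambda to, body="": f"(pretend email sent to {to})",
}

SENSITIVE_TOOLS = {"send_email", "upload_file", "share_document", "http_post"}

def execute_tool(name: str, args: dict, confirm: Callable[[str], bool]) -> Any:
    """Run a tool call, pausing for human confirmation on sensitive ones."""
    if name in SENSITIVE_TOOLS:
        if not confirm(f"Agent wants to call {name} with {args!r}. Allow?"):
            raise PermissionError(f"tool call {name} rejected by user")
    return TOOL_REGISTRY[name](**args)

# Minimal usage with a console prompt:
# execute_tool("send_email", {"to": "x@example.com"},
#              confirm=lambda msg: input(msg + " [y/N] ").lower() == "y")
```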
Similar Happenings
MostereRAT Malware Campaign Targets Japanese Windows Users
A new malware campaign involving MostereRAT, a banking malware-turned-remote access Trojan (RAT), has been identified. The campaign uses sophisticated evasion techniques, including an obscure programming language, the disabling of security tools, and mutual TLS (mTLS) for command-and-control communications to maintain long-term access to compromised systems. The malware targets Microsoft Windows users in Japan, deploying through phishing emails and weaponized Word documents. MostereRAT's capabilities include persistence, privilege escalation, AV evasion, and remote access tool deployment. The campaign highlights the importance of removing local administrator privileges and blocking unapproved remote access tools. The malware's design, with its ability to extend functionality, deploy additional payloads, and apply evasion techniques, points to long-term, strategic objectives: persistent control over compromised systems, maximized use of victim resources, and ongoing access to valuable data.
AI-Powered Cyberattacks Targeting Critical Sectors Disrupted
Anthropic disrupted a sophisticated AI-powered cyberattack campaign in July 2025. The operation, codenamed GTG-2002, targeted 17 organizations across healthcare, emergency services, government, and religious institutions. The attacker used Anthropic's AI-powered chatbot Claude to automate theft and extortion, threatening to expose stolen data publicly to extort ransoms ranging from $75,000 to $500,000 in Bitcoin. Claude Code, running on Kali Linux, automated phases of the attack cycle including reconnaissance, credential harvesting, and network penetration, and was used to build scanning frameworks around a variety of APIs, supply preferred operational TTPs, and provide real-time assistance during network penetrations. The AI tool also produced obfuscated, bespoke versions of the Chisel tunneling utility and new TCP proxy code, disguised malicious executables, organized stolen data for monetization, analyzed exfiltrated financial data to determine ransom amounts, and generated visually alarming HTML ransom notes. The attacker relied on AI for tactical and strategic decisions, adapting to defensive measures in real time and tailoring ransom notes and extortion strategies to each victim. In response, Anthropic developed a tailored classifier and a new detection method to prevent future abuse. The operation represents a shift to 'vibe hacking,' in which threat actors use LLMs and agentic AI to perform attacks.
AI-Powered Ransomware 'PromptLock' Under Development
A new AI-powered ransomware strain named 'PromptLock' has been discovered by ESET researchers. This ransomware uses an AI model to generate scripts on the fly, making it difficult to detect. The malware is currently in development and has not been observed in active attacks. It is designed to exfiltrate files, encrypt data, and potentially destroy files. The ransomware was uploaded to VirusTotal from the United States and is written in the Go programming language, with variants for Windows, Linux, and macOS systems. The Bitcoin address associated with PromptLock appears to belong to Satoshi Nakamoto. PromptLock uses the SPECK 128-bit encryption algorithm to lock files and can generate custom notes based on the files affected and the type of infected machine.
AI systems vulnerable to data-theft prompts in downscaled images
Researchers have demonstrated a new attack method that steals user data by embedding malicious prompts in images. These prompts are invisible in full-resolution images but become visible when the images are downscaled by AI systems. The attack exploits aliasing artifacts introduced by resampling algorithms, allowing hidden text to emerge and be interpreted as user instructions by the AI model. This can lead to data leakage or unauthorized actions. The method has been successfully tested against several AI systems, including Google Gemini CLI, Vertex AI Studio, Gemini's web interface, Gemini's API, Google Assistant on Android, and Genspark. The attack was developed by Kikimora Morozova and Suha Sabi Hussain from Trail of Bits, building on a 2020 theory presented in a USENIX paper. The researchers have also released an open-source tool, Anamorpher, to create images for testing the attack. They recommend implementing dimension restrictions and user confirmation for sensitive tool calls as mitigation strategies.
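One way to reason about the mitigations the Trail of Bits researchers suggest: if the agent's pipeline downscales images, enforce dimension limits and preview exactly what the model will see before forwarding the image. The sketch below uses Pillow with a bicubic resample purely as an illustrative assumption; the size limits are hypothetical, and the real pipelines tested (Gemini CLI, Vertex AI Studio, and others) may use different resampling algorithms and target sizes.

```python
from PIL import Image  # Pillow

MAX_DIMENSION = 1024           # illustrative dimension restriction
MODEL_INPUT_SIZE = (512, 512)  # assumed preprocessing size, for preview only

def preview_model_view(path: str) -> Image.Image:
    """Reject oversized images and return the downscaled view the model would
    actually receive, so it can be shown to the user (or run through OCR)
    before the original is forwarded to the AI system."""
    img = Image.open(path)
    if max(img.size) > MAX_DIMENSION:
        raise ValueError(f"image exceeds {MAX_DIMENSION}px limit: {img.size}")
    # Reproduce the (assumed) downscaling step; aliasing from this resample is
    # what lets hidden text emerge in the attack described above.
    return img.resize(MODEL_INPUT_SIZE, Image.BICUBIC)

# preview = preview_model_view("attachment.png")
# preview.show()  # a human (or an OCR pass) can check for emergent text
```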
PromptFix Exploit Targets AI Browsers for Malicious Prompts
Researchers from Guardio Labs have demonstrated a new prompt injection technique called PromptFix. This exploit tricks generative AI (GenAI) models into executing malicious instructions embedded within fake CAPTCHA checks on web pages. The attack targets AI-driven browsers like Perplexity's Comet, which automate tasks such as shopping and email management, and misleads AI models into interacting with phishing pages or fraudulent sites without user intervention, leading to potential data breaches and financial losses. The technique ushers in what the researchers dub Scamlexity, a new era of scams where AI convenience collides with invisible scam surfaces, making humans collateral damage. The exploit can trick AI models into purchasing items on fake websites, entering credentials on phishing pages, or downloading malicious payloads.

The findings underscore the need for robust defenses that let AI systems anticipate, detect, and neutralize such attacks. Agentic browsing is spreading quickly: Microsoft Edge is embedding agentic browsing features through a Copilot integration, OpenAI is developing an agentic AI browser platform codenamed 'Aura', and Comet is quickly penetrating the mainstream consumer market. Yet agentic AI browsers were released with inadequate security safeguards against known and novel attacks. In testing, AI browser agents from major AI firms failed to reliably detect the signs of a phishing site; Comet often added items to a shopping cart, filled out credit-card details, and clicked the buy button on a fake Walmart site, and AI browsers with access to email will read and act on prompts embedded in the messages. Guardio advises against assigning sensitive tasks to agentic AI browsers until their security matures, and AI companies need stronger sanitation and guardrails against these attacks.

Nearly all companies (96%) claim to want to expand their use of AI agents in the next year, but most are not prepared for the new risks posed by AI agents in a business environment. A fundamental issue is how to discern actions taken through a browser by a user from those taken by an agent. AI agents need to be experts not just at getting things done, but at sussing out and blocking potential security threats to workers and company data. Companies should move from "trust, but verify" to "doubt, and double verify", essentially hobbling automation until an AI agent has shown it can always complete a workflow properly. Defective AI operations continue to be a major problem, and security represents another layer on top of those issues. Companies should hold off on putting AI agents into any business process that requires reliability until AI-agent makers offer better visibility, control, and security. Companies that intend to push their use of AI into agent-based workflows should focus on a comprehensive strategy, including inventorying all AI services used by employees and creating an AI usage policy, and employees need to understand the basics of AI safety and what it means to give these bots information or privileges to act on their behalf.
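PromptFix works because the agent reads page content a human never sees, including instructions tucked inside a fake CAPTCHA widget. One sanitation step an agentic browser could apply is to drop elements that are hidden from human view before page text reaches the model. The rough sketch below uses BeautifulSoup; the hiding heuristics are assumptions, and a production browser would need to work on the rendered DOM and computed styles rather than raw HTML.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Inline-style hints that an element is invisible to a human reader.
# Heuristic and incomplete by design; stylesheet rules, off-screen
# positioning, and tiny fonts are not handled in this sketch.
HIDDEN_STYLE_HINTS = ("display:none", "visibility:hidden", "opacity:0")

def visible_text_only(html: str) -> str:
    """Strip script/style blocks and inline-hidden elements, then return the
    text a human would plausibly see."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    for tag in soup.find_all(style=True):
        style = tag["style"].replace(" ", "").lower()
        if any(hint in style for hint in HIDDEN_STYLE_HINTS):
            tag.decompose()
    return soup.get_text(separator=" ", strip=True)

# An agentic browser would feed visible_text_only(page_html) to the model
# instead of the raw DOM, and still treat the result as untrusted data.
```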