ChatGPT downgrade attack via prompt manipulation

First reported

21.08.2025 23:35

Last updated

1 unique sources, 1 articles

Summary

Hide ▲

A new technique called PROMISQROUTE allows attackers to downgrade ChatGPT to less secure models by manipulating prompts. This technique exploits ChatGPT's routing mechanism, which directs prompts to different models based on complexity and task type. The downgraded models are more susceptible to jailbreak attacks, posing a security risk. The vulnerability arises because ChatGPT uses a routing layer to direct prompts to appropriate models, including older, less secure versions. Attackers can influence this routing by adding specific phrases or keywords to their prompts, tricking the system into using less secure models. The impact of this attack includes the potential for malicious actors to bypass security measures and exploit vulnerabilities in older models. OpenAI has acknowledged the issue but has not provided a detailed solution.

Timeline

21.08.2025 23:35 1 articles · 1mo ago

PROMISQROUTE downgrade attack technique disclosed
Researchers from Adversa disclosed a new technique called PROMISQROUTE, which allows attackers to downgrade ChatGPT to less secure models by manipulating prompts. This technique exploits ChatGPT's routing mechanism, which directs prompts to different models based on complexity and task type. Attackers can influence this routing by adding specific phrases or keywords to their prompts, tricking the system into using less secure models. The impact of this attack includes the potential for malicious actors to bypass security measures and exploit vulnerabilities in older models. OpenAI has acknowledged the issue but has not provided a detailed solution.
Show sources

Easy ChatGPT Downgrade Attack Undermines GPT-5 Security — www.darkreading.com — 21.08.2025 23:35
Open in new tab

Information Snippets

The PROMISQROUTE technique allows attackers to downgrade ChatGPT to less secure models by manipulating prompts.
First reported: 21.08.2025 23:35

1 source, 1 article
Show sources
- Easy ChatGPT Downgrade Attack Undermines GPT-5 Security — www.darkreading.com — 21.08.2025 23:35
ChatGPT's routing layer directs prompts to different models based on complexity and task type.
First reported: 21.08.2025 23:35

1 source, 1 article
Show sources
- Easy ChatGPT Downgrade Attack Undermines GPT-5 Security — www.darkreading.com — 21.08.2025 23:35
Older, less secure models are more susceptible to jailbreak attacks.
First reported: 21.08.2025 23:35

1 source, 1 article
Show sources
- Easy ChatGPT Downgrade Attack Undermines GPT-5 Security — www.darkreading.com — 21.08.2025 23:35
Attackers can influence the routing mechanism by adding specific phrases or keywords to their prompts.
First reported: 21.08.2025 23:35

1 source, 1 article
Show sources
- Easy ChatGPT Downgrade Attack Undermines GPT-5 Security — www.darkreading.com — 21.08.2025 23:35
The impact of this attack includes the potential for malicious actors to bypass security measures and exploit vulnerabilities in older models.
First reported: 21.08.2025 23:35

1 source, 1 article
Show sources
- Easy ChatGPT Downgrade Attack Undermines GPT-5 Security — www.darkreading.com — 21.08.2025 23:35
OpenAI has acknowledged the issue but has not provided a detailed solution.
First reported: 21.08.2025 23:35

1 source, 1 article
Show sources
- Easy ChatGPT Downgrade Attack Undermines GPT-5 Security — www.darkreading.com — 21.08.2025 23:35

Similar Happenings

ShadowLeak: Undetectable Email Theft via AI Agents

A new attack vector, dubbed ShadowLeak, allows hackers to invisibly steal emails from users who integrate AI agents like ChatGPT with their email inboxes. The attack exploits the lack of visibility into AI processing on cloud infrastructure, making it undetectable to the user. The vulnerability was discovered by Radware and reported to OpenAI, which addressed it in August 2025. The attack involves embedding malicious code in emails, which the AI agent processes and acts upon without user awareness. The attack leverages an indirect prompt injection hidden in email HTML, using techniques like tiny fonts, white-on-white text, and layout tricks to remain undetected by the user. The attack can be extended to any connector that ChatGPT supports, including Box, Dropbox, GitHub, Google Drive, HubSpot, Microsoft Outlook, Notion, or SharePoint. The ShadowLeak attack targets users who connect AI agents to their email inboxes, such as those using ChatGPT with Gmail. The attack is non-detectable and leaves no trace on the user's network. The exploit involves embedding malicious code in emails, which the AI agent processes and acts upon, exfiltrating sensitive data to an attacker-controlled server. OpenAI acknowledged and fixed the issue in August 2025, but the exact details of the fix remain unclear. The exfiltration in ShadowLeak occurs directly within OpenAI's cloud environment, bypassing traditional security controls.

Open in new tab

SAP S/4HANA Command Injection Vulnerability CVE-2025-42957 Exploited in the Wild

A critical command injection vulnerability in SAP S/4HANA, tracked as CVE-2025-42957, is actively exploited in the wild. The flaw allows attackers with low-privileged user access to execute arbitrary ABAP code, potentially leading to full system compromise. The vulnerability affects both on-premise and Private Cloud editions of SAP S/4HANA. The flaw was patched in SAP's August 2025 updates, but exploitation has been observed. SecurityBridge Threat Research Labs, BleepingComputer, and Pathlock have reported active exploitation. Organizations are advised to apply patches, monitor logs for suspicious RFC calls or new admin users, implement SAP's Unified Connectivity framework (UCON) to restrict RFC usage, and take additional security measures to mitigate the risk.

Open in new tab

Emergence of AI-Powered Ransomware Strain PromptLock

A new AI-powered ransomware strain, named PromptLock, has been identified by ESET researchers. The ransomware leverages an AI model to generate Lua scripts on the fly, complicating detection and defense. PromptLock is not yet active in the wild but is nearly ready for deployment. It can exfiltrate files and encrypt data, with plans to add file destruction capabilities. The ransomware was uploaded to VirusTotal from the United States and is written in Go, targeting both Windows, Linux, and macOS systems. The Bitcoin address used for ransom payments is linked to Satoshi Nakamoto. The development of AI-driven ransomware presents new challenges for cybersecurity defenders. The ransomware strain was discovered by Anton Cherepanov and Peter Strycek, who shared their findings on social media 18 hours after detecting samples on VirusTotal. The use of AI in ransomware introduces variability in indicators of compromise (IoCs), making detection more difficult. PromptLock uses the SPECK 128-bit encryption algorithm to lock files and can generate custom notes based on the files affected and the type of infected machine. The attacker can establish a proxy or tunnel from the compromised network to a server running the Ollama API with the gpt-oss-20b model.

Open in new tab

Summary

Timeline

PROMISQROUTE downgrade attack technique disclosed

Information Snippets

Similar Happenings

ShadowLeak: Undetectable Email Theft via AI Agents

SAP S/4HANA Command Injection Vulnerability CVE-2025-42957 Exploited in the Wild

Emergence of AI-Powered Ransomware Strain PromptLock