Lies-in-the-Loop Attack Exploits AI Coding Agents
Summary
Hide ▲
Show ▼
A new attack vector called 'lies-in-the-loop' (LITL) exploits AI coding agents to deceive users into granting permissions for dangerous actions. The attack manipulates AI agents into presenting seemingly safe contexts, leveraging human trust and fallibility. This technique was demonstrated on Anthropic's Claude Code and Microsoft Copilot Chat, showing potential for software supply chain attacks. The LITL attack exploits the intersection of agentic tooling and human fallibility, targeting AI agents that rely on human-in-the-loop (HITL) interactions for safety and security approvals. The attack can be applied to any AI agent that uses HITL mechanisms. The researchers from Checkmarx Zero demonstrated the attack by convincing Claude Code to run arbitrary commands, including a command injection that could lead to a software supply chain attack. The attack highlights the risks of prompt injection and the need for vigilance in reviewing AI-generated prompts. The research also shows that attackers can manipulate HITL dialogs to appear harmless, even though approving them triggers arbitrary code execution. The attack can originate from indirect prompt injections that poison the agent's context long before the dialog is shown. The researchers recommend a defense-in-depth approach to mitigate the risks.
Timeline
-
17.12.2025 18:00 1 articles · 23h ago
LITL Attack Demonstrated on Microsoft Copilot Chat
The research demonstrates the LITL attack on Microsoft Copilot Chat in VS Code, showing how improper Markdown sanitization allows injected elements to render in ways that could mislead users after approval. The disclosure timeline shows that Microsoft acknowledged a report in October 2025 and later marked it as completed without a fix, stating the behavior did not meet its criteria for a security vulnerability.
Show sources
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
15.09.2025 12:11 2 articles · 3mo ago
Lies-in-the-Loop Attack Demonstrated on Anthropic's Claude Code
Researchers from Checkmarx Zero demonstrated the 'lies-in-the-loop' (LITL) attack on Anthropic's Claude Code, an AI code assistant. The attack exploits the trust between humans and AI agents, using prompt injection to deceive users into granting dangerous permissions. The researchers successfully ran arbitrary commands and submitted malicious npm packages to GitHub repositories, highlighting the potential for software supply chain attacks. The attack underscores the need for vigilance in reviewing AI-generated prompts and careful management of AI agent adoption. The research also shows that attackers can manipulate HITL dialogs to appear harmless, even though approving them triggers arbitrary code execution. The attack can originate from indirect prompt injections that poison the agent's context long before the dialog is shown.
Show sources
- 'Lies-in-the-Loop' Attack Defeats AI Coding Agents — www.darkreading.com — 15.09.2025 12:11
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
Information Snippets
-
The 'lies-in-the-loop' (LITL) attack targets AI coding agents to deceive users into granting dangerous permissions.
First reported: 15.09.2025 12:112 sources, 2 articlesShow sources
- 'Lies-in-the-Loop' Attack Defeats AI Coding Agents — www.darkreading.com — 15.09.2025 12:11
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
The attack exploits the trust between humans and AI agents, manipulating them to present fake, safe contexts.
First reported: 15.09.2025 12:112 sources, 2 articlesShow sources
- 'Lies-in-the-Loop' Attack Defeats AI Coding Agents — www.darkreading.com — 15.09.2025 12:11
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
The LITL attack was demonstrated on Anthropic's Claude Code, an AI code assistant known for its safety considerations.
First reported: 15.09.2025 12:112 sources, 2 articlesShow sources
- 'Lies-in-the-Loop' Attack Defeats AI Coding Agents — www.darkreading.com — 15.09.2025 12:11
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
The attack involves prompt injection, where malicious commands are hidden within long responses to deceive users.
First reported: 15.09.2025 12:112 sources, 2 articlesShow sources
- 'Lies-in-the-Loop' Attack Defeats AI Coding Agents — www.darkreading.com — 15.09.2025 12:11
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
The researchers successfully demonstrated the attack by running arbitrary commands and submitting malicious npm packages to GitHub repositories.
First reported: 15.09.2025 12:112 sources, 2 articlesShow sources
- 'Lies-in-the-Loop' Attack Defeats AI Coding Agents — www.darkreading.com — 15.09.2025 12:11
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
The attack highlights the risks of prompt injection and the need for careful review of AI-generated prompts.
First reported: 15.09.2025 12:112 sources, 2 articlesShow sources
- 'Lies-in-the-Loop' Attack Defeats AI Coding Agents — www.darkreading.com — 15.09.2025 12:11
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
Organizations are increasingly adopting AI agents, with 79% using AI-assisted coding agents in some workflows.
First reported: 15.09.2025 12:111 source, 1 articleShow sources
- 'Lies-in-the-Loop' Attack Defeats AI Coding Agents — www.darkreading.com — 15.09.2025 12:11
-
The security of AI agents remains a concern, especially in software development workflows.
First reported: 15.09.2025 12:112 sources, 2 articlesShow sources
- 'Lies-in-the-Loop' Attack Defeats AI Coding Agents — www.darkreading.com — 15.09.2025 12:11
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
The researchers recommend practicing suspicion with AI agents and external content, as well as careful management of AI agent adoption.
First reported: 15.09.2025 12:112 sources, 2 articlesShow sources
- 'Lies-in-the-Loop' Attack Defeats AI Coding Agents — www.darkreading.com — 15.09.2025 12:11
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
The LITL attack can manipulate HITL dialogs to appear harmless, even though approving them triggers arbitrary code execution.
First reported: 17.12.2025 18:001 source, 1 articleShow sources
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
Attackers can prepend benign-looking text, tamper with metadata, and exploit Markdown rendering flaws in user interfaces.
First reported: 17.12.2025 18:001 source, 1 articleShow sources
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
The attack can originate from indirect prompt injections that poison the agent's context long before the dialog is shown.
First reported: 17.12.2025 18:001 source, 1 articleShow sources
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
The research demonstrates the attack on both Claude Code and Microsoft Copilot Chat in VS Code.
First reported: 17.12.2025 18:001 source, 1 articleShow sources
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
Anthropic acknowledged the reports in August 2025 but classified them as informational.
First reported: 17.12.2025 18:001 source, 1 articleShow sources
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
Microsoft acknowledged a report in October 2025 and later marked it as completed without a fix, stating the behavior did not meet its criteria for a security vulnerability.
First reported: 17.12.2025 18:001 source, 1 articleShow sources
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
-
The researchers recommend a defense-in-depth approach, including improving user awareness, strengthening visual clarity of approval dialogs, validating and sanitizing inputs, using safe OS APIs, and applying guardrails and reasonable length limits to dialogs.
First reported: 17.12.2025 18:001 source, 1 articleShow sources
- New “Lies-in-the-Loop” Attack Undermines AI Safety Dialogs — www.infosecurity-magazine.com — 17.12.2025 18:00
Similar Happenings
Google Enhances Chrome Agentic AI Security Against Indirect Prompt Injection Attacks
Google is introducing new security measures to protect Chrome's agentic AI capabilities from indirect prompt injection attacks. These protections include a new AI model called the User Alignment Critic, expanded site isolation policies, additional user confirmation steps for sensitive actions, and a prompt injection detection classifier. The User Alignment Critic independently evaluates the agent's actions, ensuring they align with the user's goals. Google is also enforcing Agent Origin Sets to limit the agent's access to relevant data origins and has developed automated red-teaming systems to test defenses. The company has announced bounty payments for security researchers to further enhance the system's robustness.
Emerging Security Risks of Agentic AI Browsers
A new generation of AI browsers, known as agentic browsers, is transitioning from passive tools to autonomous agents capable of executing tasks on behalf of users. This shift introduces significant security risks, including increased attack surfaces and vulnerabilities to prompt injection attacks. Security teams must adapt their strategies to mitigate these risks as the adoption of AI browsers grows.
Adaptive Multi-Turn Attacks Bypass Defenses in Open-Weight LLMs
Open-weight large language models (LLMs) remain vulnerable to adaptive multi-turn adversarial attacks, despite robust single-turn defenses. These persistent, multi-step conversations can achieve over 90% success rates against most tested defenses. Researchers from Cisco AI Defense identified 15 critical sub-threat categories, including malicious code generation, data exfiltration, and ethical boundary violations. The study highlights the need for enhanced security measures to protect against iterative manipulation. The findings emphasize the importance of implementing strict system prompts, deploying runtime guardrails, and conducting regular AI red-teaming assessments to mitigate risks.
AI Sidebar Spoofing Vulnerability in Atlas and Comet Browsers
Researchers from NeuralTrust have discovered a vulnerability in the OpenAI Atlas browser that allows for jailbreaking through the omnibox. This vulnerability can trick users into following malicious instructions, leading to potential data breaches and unauthorized actions. The attack works by disguising a prompt instruction as a URL, which is then treated as a trusted user intent. This can override user intent, trigger cross-domain actions, and bypass safety layers. The vulnerability affects the latest versions of the Atlas browser. Researchers demonstrated two realistic attack scenarios: a copy-link trap to phish credentials and destructive instructions to delete files. The attack requires only 'host' and 'storage' permissions, which are common for productivity tools. Users are advised to be cautious when using these browsers for sensitive activities and to restrict their use to non-sensitive tasks until further security measures are implemented. Earlier, researchers from SquareX discovered a similar vulnerability in OpenAI's Atlas and Perplexity's Comet browsers that allows for AI Sidebar Spoofing. This attack can trick users into following malicious instructions, leading to potential data breaches and unauthorized actions. The vulnerability affects the latest versions of both browsers and requires only 'host' and 'storage' permissions. Users are advised to be cautious and restrict the use of these browsers to non-sensitive activities.
ForcedLeak Vulnerability in Salesforce Agentforce Exploited via AI Prompt Injection
A critical vulnerability in Salesforce Agentforce, named ForcedLeak, allowed attackers to exfiltrate sensitive CRM data through indirect prompt injection. The flaw affected organizations using Salesforce Agentforce with Web-to-Lead functionality enabled. The vulnerability was discovered and reported by Noma Security on July 28, 2025. Salesforce has since patched the issue and implemented additional security measures, including regaining control of an expired domain and preventing AI agent output from being sent to untrusted domains. The exploit involved manipulating the Description field in Web-to-Lead forms to execute malicious instructions, leading to data leakage. Salesforce has enforced a Trusted URL allowlist to mitigate the risk of similar attacks in the future. The ForcedLeak vulnerability is a critical vulnerability chain with a CVSS score of 9.4, described as a cross-site scripting (XSS) play for the AI era. The exploit involves embedding a malicious prompt in a Web-to-Lead form, which the AI agent processes, leading to data leakage. The attack could potentially lead to the exfiltration of internal communications, business strategy insights, and detailed customer information. Salesforce is addressing the root cause of the vulnerability by implementing more robust layers of defense for their models and agents.