CyberHappenings logo

Track cybersecurity events as they unfold. Sourced timelines. Filter, sort, and browse. Fast, privacy‑respecting. No invasive ads, no tracking.

Lies-in-the-Loop Attack Exploits AI Coding Agents

First reported
Last updated
2 unique sources, 2 articles

Summary

Hide ▲

A new attack vector called 'lies-in-the-loop' (LITL) exploits AI coding agents to deceive users into granting permissions for dangerous actions. The attack manipulates AI agents into presenting seemingly safe contexts, leveraging human trust and fallibility. This technique was demonstrated on Anthropic's Claude Code and Microsoft Copilot Chat, showing potential for software supply chain attacks. The LITL attack exploits the intersection of agentic tooling and human fallibility, targeting AI agents that rely on human-in-the-loop (HITL) interactions for safety and security approvals. The attack can be applied to any AI agent that uses HITL mechanisms. The researchers from Checkmarx Zero demonstrated the attack by convincing Claude Code to run arbitrary commands, including a command injection that could lead to a software supply chain attack. The attack highlights the risks of prompt injection and the need for vigilance in reviewing AI-generated prompts. The research also shows that attackers can manipulate HITL dialogs to appear harmless, even though approving them triggers arbitrary code execution. The attack can originate from indirect prompt injections that poison the agent's context long before the dialog is shown. The researchers recommend a defense-in-depth approach to mitigate the risks.

Timeline

  1. 17.12.2025 18:00 1 articles · 23h ago

    LITL Attack Demonstrated on Microsoft Copilot Chat

    The research demonstrates the LITL attack on Microsoft Copilot Chat in VS Code, showing how improper Markdown sanitization allows injected elements to render in ways that could mislead users after approval. The disclosure timeline shows that Microsoft acknowledged a report in October 2025 and later marked it as completed without a fix, stating the behavior did not meet its criteria for a security vulnerability.

    Show sources
  2. 15.09.2025 12:11 2 articles · 3mo ago

    Lies-in-the-Loop Attack Demonstrated on Anthropic's Claude Code

    Researchers from Checkmarx Zero demonstrated the 'lies-in-the-loop' (LITL) attack on Anthropic's Claude Code, an AI code assistant. The attack exploits the trust between humans and AI agents, using prompt injection to deceive users into granting dangerous permissions. The researchers successfully ran arbitrary commands and submitted malicious npm packages to GitHub repositories, highlighting the potential for software supply chain attacks. The attack underscores the need for vigilance in reviewing AI-generated prompts and careful management of AI agent adoption. The research also shows that attackers can manipulate HITL dialogs to appear harmless, even though approving them triggers arbitrary code execution. The attack can originate from indirect prompt injections that poison the agent's context long before the dialog is shown.

    Show sources

Information Snippets

Similar Happenings

Google Enhances Chrome Agentic AI Security Against Indirect Prompt Injection Attacks

Google is introducing new security measures to protect Chrome's agentic AI capabilities from indirect prompt injection attacks. These protections include a new AI model called the User Alignment Critic, expanded site isolation policies, additional user confirmation steps for sensitive actions, and a prompt injection detection classifier. The User Alignment Critic independently evaluates the agent's actions, ensuring they align with the user's goals. Google is also enforcing Agent Origin Sets to limit the agent's access to relevant data origins and has developed automated red-teaming systems to test defenses. The company has announced bounty payments for security researchers to further enhance the system's robustness.

Emerging Security Risks of Agentic AI Browsers

A new generation of AI browsers, known as agentic browsers, is transitioning from passive tools to autonomous agents capable of executing tasks on behalf of users. This shift introduces significant security risks, including increased attack surfaces and vulnerabilities to prompt injection attacks. Security teams must adapt their strategies to mitigate these risks as the adoption of AI browsers grows.

Adaptive Multi-Turn Attacks Bypass Defenses in Open-Weight LLMs

Open-weight large language models (LLMs) remain vulnerable to adaptive multi-turn adversarial attacks, despite robust single-turn defenses. These persistent, multi-step conversations can achieve over 90% success rates against most tested defenses. Researchers from Cisco AI Defense identified 15 critical sub-threat categories, including malicious code generation, data exfiltration, and ethical boundary violations. The study highlights the need for enhanced security measures to protect against iterative manipulation. The findings emphasize the importance of implementing strict system prompts, deploying runtime guardrails, and conducting regular AI red-teaming assessments to mitigate risks.

AI Sidebar Spoofing Vulnerability in Atlas and Comet Browsers

Researchers from NeuralTrust have discovered a vulnerability in the OpenAI Atlas browser that allows for jailbreaking through the omnibox. This vulnerability can trick users into following malicious instructions, leading to potential data breaches and unauthorized actions. The attack works by disguising a prompt instruction as a URL, which is then treated as a trusted user intent. This can override user intent, trigger cross-domain actions, and bypass safety layers. The vulnerability affects the latest versions of the Atlas browser. Researchers demonstrated two realistic attack scenarios: a copy-link trap to phish credentials and destructive instructions to delete files. The attack requires only 'host' and 'storage' permissions, which are common for productivity tools. Users are advised to be cautious when using these browsers for sensitive activities and to restrict their use to non-sensitive tasks until further security measures are implemented. Earlier, researchers from SquareX discovered a similar vulnerability in OpenAI's Atlas and Perplexity's Comet browsers that allows for AI Sidebar Spoofing. This attack can trick users into following malicious instructions, leading to potential data breaches and unauthorized actions. The vulnerability affects the latest versions of both browsers and requires only 'host' and 'storage' permissions. Users are advised to be cautious and restrict the use of these browsers to non-sensitive activities.

ForcedLeak Vulnerability in Salesforce Agentforce Exploited via AI Prompt Injection

A critical vulnerability in Salesforce Agentforce, named ForcedLeak, allowed attackers to exfiltrate sensitive CRM data through indirect prompt injection. The flaw affected organizations using Salesforce Agentforce with Web-to-Lead functionality enabled. The vulnerability was discovered and reported by Noma Security on July 28, 2025. Salesforce has since patched the issue and implemented additional security measures, including regaining control of an expired domain and preventing AI agent output from being sent to untrusted domains. The exploit involved manipulating the Description field in Web-to-Lead forms to execute malicious instructions, leading to data leakage. Salesforce has enforced a Trusted URL allowlist to mitigate the risk of similar attacks in the future. The ForcedLeak vulnerability is a critical vulnerability chain with a CVSS score of 9.4, described as a cross-site scripting (XSS) play for the AI era. The exploit involves embedding a malicious prompt in a Web-to-Lead form, which the AI agent processes, leading to data leakage. The attack could potentially lead to the exfiltration of internal communications, business strategy insights, and detailed customer information. Salesforce is addressing the root cause of the vulnerability by implementing more robust layers of defense for their models and agents.