
Track cybersecurity events as they unfold. Sourced timelines. Filter, sort, and browse. Fast, privacy‑respecting. No invasive ads, no tracking.

Microsoft Develops Scanner for Detecting Backdoors in Open-Weight LLMs

1 unique source, 1 article

Summary

Microsoft has developed a lightweight scanner designed to detect backdoors in open-weight large language models (LLMs). The scanner relies on three signals to flag backdoors while maintaining a low false positive rate. It targets model poisoning, in which threat actors embed hidden behaviors in a model's weights during training so that the model performs unintended actions when it encounters a specific trigger. The scanner works by analyzing memorized content and attention patterns in LLMs, without requiring additional training or prior knowledge of the backdoor behavior. It is part of Microsoft's broader effort to address AI-specific security concerns, including prompt injection and data poisoning, under its Secure Development Lifecycle (SDL).

Timeline

  1. 04.02.2026 19:52 · 1 article

    Microsoft Develops Scanner to Detect Backdoors in Open-Weight LLMs

    Microsoft has developed a lightweight scanner that can detect backdoors in open-weight large language models (LLMs). The scanner relies on three signals to flag backdoors while maintaining a low false positive rate. It targets model poisoning, in which threat actors embed hidden behaviors in a model's weights during training so that the model performs unintended actions when it encounters a specific trigger. The scanner is part of Microsoft's broader effort to address AI-specific security concerns, including prompt injection and data poisoning, under its SDL.

Information Snippets