Microsoft Develops Scanner for Detecting Backdoors in Open-Weight LLMs
Summary
Hide ▲
Show ▼
Microsoft has developed a lightweight scanner designed to detect backdoors in open-weight large language models (LLMs). The scanner identifies three key signals to flag backdoors while maintaining a low false positive rate. The tool can detect model poisoning, where threat actors embed hidden behaviors into the model's weights during training, causing unintended actions upon trigger detection. The scanner works by analyzing memorized content and attention patterns in LLMs without requiring additional training or prior knowledge of backdoor behavior. The scanner is part of Microsoft's broader initiative to address AI-specific security concerns, including prompt injections and data poisoning, as part of its Secure Development Lifecycle (SDL).
Timeline
-
04.02.2026 19:52 1 articles · 4h ago
Microsoft Develops Scanner to Detect Backdoors in Open-Weight LLMs
Microsoft has developed a lightweight scanner that can detect backdoors in open-weight large language models (LLMs). The scanner identifies three key signals to flag backdoors while maintaining a low false positive rate. The tool can detect model poisoning, where threat actors embed hidden behaviors into the model's weights during training, causing unintended actions upon trigger detection. The scanner is part of Microsoft's broader initiative to address AI-specific security concerns, including prompt injections and data poisoning, as part of its SDL.
Show sources
- Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models — thehackernews.com — 04.02.2026 19:52
Information Snippets
-
Microsoft's scanner leverages three observable signals to detect backdoors in LLMs: distinctive attention patterns, memorization of poisoning data, and activation by fuzzy triggers.
First reported: 04.02.2026 19:521 source, 1 articleShow sources
- Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models — thehackernews.com — 04.02.2026 19:52
-
The scanner extracts memorized content from the model, analyzes it to isolate salient substrings, and scores suspicious substrings to return a ranked list of trigger candidates.
First reported: 04.02.2026 19:521 source, 1 articleShow sources
- Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models — thehackernews.com — 04.02.2026 19:52
-
The scanner does not work on proprietary models, requires access to model files, and is most effective on trigger-based backdoors that generate deterministic outputs.
First reported: 04.02.2026 19:521 source, 1 articleShow sources
- Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models — thehackernews.com — 04.02.2026 19:52
-
Microsoft is expanding its SDL to address AI-specific security concerns, including prompt injections and data poisoning.
First reported: 04.02.2026 19:521 source, 1 articleShow sources
- Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models — thehackernews.com — 04.02.2026 19:52