Find notable cyber news and cases, enriched with sources, timelines, and signals.

B3 open-source benchmark for backbone LLM security

Security Tool/Service
First reported
Last updated
Happening score
H score 10
1 unique sources, 1 articles

Summary

Hide ▲

AISI, Check Point, and Lakera released b3, an open source benchmark that helps developers measure how well backbone LLMs resist prompt manipulation and other attacks in AI agents. The tool uses threat snapshots and a large adversarial dataset to expose weak points in model calls that process prompts, files, and web inputs. It gives model providers a reproducible way to compare security across systems and improve resilience against system prompt exfiltration, phishing link insertion, and unauthorized tool calls.

Related Happenings

OpenAI launches Daybreak cybersecurity initiative for AI-powered vulnerability detection and patch validation

Security Tool/Service
First: 12.05.2026 09:55 Last: 12.05.2026 09:55 Sources 1

About this happening: OpenAI's **Daybreak** launch adds an **AI-powered cybersecurity service** for **vulnerability detection** and **patch validation**, helping organizations fix flaws before attacker...

Anthropic launches Claude Opus 4.6 with code review and vulnerability-finding capabilities

Security Tool/Service
First: 06.02.2026 07:49 Last: 06.02.2026 07:49 Sources 1

About this happening: **Anthropic** launched **Claude Opus 4.6** with stronger **code review** and **debugging** support, and the model has already been used to uncover **more than 500** previously unk...

Villager AI red-teaming framework appears on PyPI with abuse concerns

Security Tool/Service
First: 15.09.2025 10:12 Last: 15.09.2025 10:12 Sources 1

About this happening: **Villager** surfaced on **PyPI** as an **AI-powered penetration testing** framework, and its public availability now matters because it could be repurposed for malicious use. The...

Auto Exploit LLM-assisted exploit generation research

Technical Analysis
First: 29.08.2025 16:01 Last: 29.08.2025 16:01 Sources 1

About this happening: Researchers built **Auto Exploit**, an AI-driven system that generated proof-of-concept exploits for **14 open source vulnerabilities** in as little as **15 minutes**, compressing...

Timeline

  1. 29.10.2025 12:45 2 articles · 7mo ago

    AISI, Check Point, and Lakera release b3 open source benchmark

    Initial Disclosure

    The UK AI Security Institute (AISI), Check Point, and Lakera released b3, an open source benchmark for evaluating the security of backbone LLMs used in AI agents. The framework uses threat snapshots and a dataset of 19,433 Gandalf adversarial attacks to measure resilience against system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service, and unauthorized tool calls.

    Show sources
  2. 29.10.2025 12:45 1 articles · 7mo ago

    b3 benchmark isolates vulnerable backbone LLM calls in AI agents

    Technical Analysis Update

    b3 focuses on individual backbone LLM calls inside agent workflows rather than end-to-end agent behavior, especially the moments when prompts, files, or web inputs trigger malicious output. Lakera said the benchmark makes LLM security measurable, reproducible, and comparable across models and application categories, and the reported results said step-by-step reasoning models tend to be more secure while open-weight models are closing the gap with closed systems faster than expected.

    Show sources