B3 open-source benchmark for backbone LLM security

Security Tool/Service

First reported

29.10.2025 12:45

Last updated

29.10.2025 12:45

Happening score

H score 11

1 unique sources, 1 articles

Summary

Hide ▲

AISI, Check Point, and Lakera released b3, an open source benchmark that helps developers measure how well backbone LLMs resist prompt manipulation and other attacks in AI agents. The tool uses threat snapshots and a large adversarial dataset to expose weak points in model calls that process prompts, files, and web inputs. It gives model providers a reproducible way to compare security across systems and improve resilience against system prompt exfiltration, phishing link insertion, and unauthorized tool calls.

Related Happenings

ExploitBench benchmark shows frontier AI models can stage Chrome exploit chains against vulnerable V8 builds

Technical Analysis

H score16 First: 04.06.2026 16:00 Last: 04.06.2026 16:00 Sources 1

About this happening: Bugcrowd’s **ExploitBench** now shows frontier AI models can progress through staged **Google Chrome** exploit chains, raising the risk of faster **AI-assisted exploit development...

Open Happening

OpenAI launches Daybreak cybersecurity initiative for AI-powered vulnerability detection and patch validation

Security Tool/Service

H score25 First: 12.05.2026 09:55 Last: 12.05.2026 09:55 Sources 1

About this happening: OpenAI's **Daybreak** launch adds an **AI-powered cybersecurity service** for **vulnerability detection** and **patch validation**, helping organizations fix flaws before attacker...

Open Happening

Anthropic launches Claude Opus 4.6 with code review and vulnerability-finding capabilities

Security Tool/Service

H score14 First: 06.02.2026 07:49 Last: 06.02.2026 07:49 Sources 1

About this happening: **Anthropic** launched **Claude Opus 4.6** with stronger **code review** and **debugging** support, and the model has already been used to uncover **more than 500** previously unk...

Open Happening

Villager AI red-teaming framework appears on PyPI with abuse concerns

Security Tool/Service

H score11 First: 15.09.2025 10:12 Last: 15.09.2025 10:12 Sources 1

About this happening: **Villager** surfaced on **PyPI** as an **AI-powered penetration testing** framework, and its public availability now matters because it could be repurposed for malicious use. The...

Open Happening

Auto Exploit LLM-assisted exploit generation research

Technical Analysis

H score24 First: 29.08.2025 16:01 Last: 29.08.2025 16:01 Sources 1

About this happening: Researchers built **Auto Exploit**, an AI-driven system that generated proof-of-concept exploits for **14 open source vulnerabilities** in as little as **15 minutes**, compressing...

Open Happening

Timeline

29.10.2025 12:45 2 articles · 8mo ago

AISI, Check Point, and Lakera release b3 open source benchmark

Initial Disclosure
The UK AI Security Institute (AISI), Check Point, and Lakera released b3, an open source benchmark for evaluating the security of backbone LLMs used in AI agents. The framework uses threat snapshots and a dataset of 19,433 Gandalf adversarial attacks to measure resilience against system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service, and unauthorized tool calls.
Show sources

Open Source “b3” Benchmark to Boost LLM Security for Agents — www.infosecurity-magazine.com — 29.10.2025 12:45

Open Source “b3” Benchmark to Boost LLM Security for Agents — www.infosecurity-magazine.com — 29.10.2025 12:45
Open in new tab
29.10.2025 12:45 1 articles · 8mo ago

b3 benchmark isolates vulnerable backbone LLM calls in AI agents

Technical Analysis Update
b3 focuses on individual backbone LLM calls inside agent workflows rather than end-to-end agent behavior, especially the moments when prompts, files, or web inputs trigger malicious output. Lakera said the benchmark makes LLM security measurable, reproducible, and comparable across models and application categories, and the reported results said step-by-step reasoning models tend to be more secure while open-weight models are closing the gap with closed systems faster than expected.
Show sources

Open Source “b3” Benchmark to Boost LLM Security for Agents — www.infosecurity-magazine.com — 29.10.2025 12:45
Open in new tab

Summary

Related Happenings

ExploitBench benchmark shows frontier AI models can stage Chrome exploit chains against vulnerable V8 builds

OpenAI launches Daybreak cybersecurity initiative for AI-powered vulnerability detection and patch validation

Anthropic launches Claude Opus 4.6 with code review and vulnerability-finding capabilities

Villager AI red-teaming framework appears on PyPI with abuse concerns

Auto Exploit LLM-assisted exploit generation research

Timeline

AISI, Check Point, and Lakera release b3 open source benchmark

b3 benchmark isolates vulnerable backbone LLM calls in AI agents