K2 Think Partial Prompt Leaking security flaw

Vulnerability

First reported

11.09.2025 15:00

Last updated

11.09.2025 15:00

Happening score

H score 0

1 unique source, 1 article

Summary


A weakness in K2 Think called Partial Prompt Leaking enabled a successful jailbreak, showing that the model's guardrails could be bypassed to elicit restricted instructions. The flaw mattered because the model exposed enough of its reasoning for an attacker to map the controls and iterate past them. The restricted output reportedly included guidance for creating malware.

Timeline

11.09.2025 15:00 1 article · 10mo ago

K2 Think jailbreak publicized via Partial Prompt Leaking

Initial Disclosure
Adversa AI's Alex Polyakov disclosed a jailbreak method against K2 Think that exploited Partial Prompt Leaking: the model exposed its plaintext reasoning and refusal logic, revealing which rules blocked malicious prompts. After a few prompt iterations, he bypassed the layered safeguards and elicited restricted instructions, including guidance for creating malware.
Sources

'K2 Think' AI Model Jailbroken Mere Hours After Release — www.darkreading.com — 11.09.2025 15:00

