K2 Think Partial Prompt Leaking security flaw
Vulnerability
First reported
Last updated
Happening score (H score): 20
Summary
A weakness in the K2 Think model, called Partial Prompt Leaking, enabled a successful jailbreak, showing that the model's guardrails could be bypassed to obtain restricted instructions. The flaw mattered because the model exposed enough of its reasoning and refusal logic for an attacker to map the controls and iterate past them. The same weakness was reported to let the attacker elicit malware-related guidance.
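To make the mechanism concrete, here is a minimal toy sketch of why a refusal that explains itself leaks more than a uniform one. It is not K2 Think's guardrail code: the rule names, keywords, and helper functions (`leaky_refusal`, `uniform_refusal`, `distinct_refusals`) are hypothetical placeholders invented for illustration, and real safeguards are far more involved than keyword matching.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Rule:
    """A hypothetical guardrail rule; names and keywords are placeholders."""
    name: str
    keyword: str


# Assumed toy rule set, not the model's real policy.
RULES = (
    Rule("rule_a", "alpha"),
    Rule("rule_b", "beta"),
)

GENERIC_REFUSAL = "I can't help with that request."


def first_match(prompt: str) -> Optional[Rule]:
    """Return the first rule whose keyword appears in the prompt, else None."""
    lowered = prompt.lower()
    for rule in RULES:
        if rule.keyword in lowered:
            return rule
    return None


def leaky_refusal(prompt: str) -> Optional[str]:
    """Refusal that names the rule that fired (the leaking pattern)."""
    rule = first_match(prompt)
    if rule is None:
        return None
    return f"Refused: prompt matched {rule.name} (keyword '{rule.keyword}')."


def uniform_refusal(prompt: str) -> Optional[str]:
    """Refusal that is identical no matter which rule fired."""
    return GENERIC_REFUSAL if first_match(prompt) is not None else None


def distinct_refusals(
    refuse: Callable[[str], Optional[str]], prompts: Iterable[str]
) -> int:
    """Count distinct refusal messages; more than one means rule details leak."""
    return len({msg for msg in map(refuse, prompts) if msg is not None})
```

Under these assumptions, probing the leaky variant with several blocked prompts returns a different message per rule, which is the kind of signal that lets someone map the controls, while the uniform variant returns the same message every time.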
Timeline
11.09.2025 15:00 · 2 articles · 8mo ago
K2 Think jailbreak publicized via Partial Prompt Leaking
Initial Disclosure
Adversa AI's Alex Polyakov disclosed a jailbreak method against K2 Think that exploited Partial Prompt Leaking, where the model exposed plaintext reasoning and refusal logic that revealed which rules blocked malicious prompts. After a few prompt iterations, he bypassed layered safeguards and elicited restricted instructions, including guidance for creating malware.
Sources
- 'K2 Think' AI Model Jailbroken Mere Hours After Release — www.darkreading.com — 11.09.2025 15:00
- 'K2 Think' AI Model Jailbroken Mere Hours After Release — www.darkreading.com — 11.09.2025 15:00