K2 Think AI Model Jailbroken Using Partial Prompt Leaking
Summary
K2 Think, a new AI reasoning model developed by the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and G42, was released on September 9, 2025, and jailbroken shortly afterward through a vulnerability called Partial Prompt Leaking. The model exposes its reasoning process to make its outputs auditable, but those detailed reasoning logs also reveal which rules and safeguards shaped each decision. Alex Polyakov of Adversa AI exploited exactly that: by reading the leaked reasoning behind each refusal and iteratively refining his prompts in response, he ultimately bypassed the model's security measures. The exploit demonstrates the risks of overly transparent AI models and the need for security measures that balance auditability against what an attacker can learn from the reasoning trace.
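To make the attack loop concrete, here is a minimal sketch of the iterative refinement described above: submit a prompt, read the reasoning that leaks alongside the refusal, adjust the prompt against whatever rule that reasoning names, and retry. Everything in it is hypothetical; `query_model` and `refine_prompt` are toy stand-ins, not K2 Think's actual API or Adversa AI's actual prompts.

```python
# Toy simulation of a Partial Prompt Leaking attack loop.
# The "model" refuses prompts containing a blocked keyword and, crucially,
# leaks in its reasoning which rule caused the refusal.

def query_model(prompt: str) -> dict:
    """Hypothetical endpoint returning answer plus visible reasoning log."""
    if "blocked-topic" in prompt:
        return {
            "refused": True,
            "reasoning": "Safety rule triggered: the phrase 'blocked-topic' is disallowed.",
            "answer": None,
        }
    return {"refused": False, "reasoning": "No safety rule triggered.", "answer": "(model output)"}


def refine_prompt(prompt: str, reasoning: str) -> str:
    """Toy refinement: rework whatever the leaked reasoning says was flagged."""
    if "blocked-topic" in reasoning:
        return prompt.replace("blocked-topic", "b-l-o-c-k-e-d topic")
    return prompt


def partial_prompt_leak_attack(initial_prompt: str, max_rounds: int = 5):
    prompt = initial_prompt
    for round_no in range(1, max_rounds + 1):
        response = query_model(prompt)
        if not response["refused"]:
            return round_no, response["answer"]  # safeguard bypassed
        # The leaked reasoning explains *why* the request was refused,
        # which is exactly what the next prompt is tuned against.
        prompt = refine_prompt(prompt, response["reasoning"])
    return None, None


if __name__ == "__main__":
    rounds, answer = partial_prompt_leak_attack("Please explain blocked-topic in detail.")
    print(f"Safeguard bypassed after {rounds} round(s): {answer}")
```

In the real attack the refinement step is a human reading each leaked trace and rewriting the prompt by hand, but the loop structure is the same.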
Timeline
- 11.09.2025 15:00 · 1 article
K2 Think AI Model Jailbroken Using Partial Prompt Leaking
K2 Think, released on September 9, 2025, was jailbroken using Partial Prompt Leaking. The exploit, demonstrated by Alex Polyakov of Adversa AI, highlights the risks of overly transparent AI models. The model's detailed reasoning logs, intended for auditing, were used to bypass its safeguards. The jailbreak involved iteratively refining prompts based on the model's responses, ultimately allowing the model to perform restricted actions.
- 'K2 Think' AI Model Jailbroken Mere Hours After Release — www.darkreading.com — 11.09.2025 15:00
Information Snippets
- K2 Think was released on September 9, 2025, by the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and G42.
First reported: 11.09.2025 15:00 · 1 source, 1 article
- 'K2 Think' AI Model Jailbroken Mere Hours After Release — www.darkreading.com — 11.09.2025 15:00
- K2 Think is a 32-billion-parameter model designed for complex and transparent reasoning.
First reported: 11.09.2025 15:00 · 1 source, 1 article
- 'K2 Think' AI Model Jailbroken Mere Hours After Release — www.darkreading.com — 11.09.2025 15:00
- The model's transparency allows users to see the logic behind its outputs, making it susceptible to Partial Prompt Leaking.
First reported: 11.09.2025 15:00 · 1 source, 1 article
- 'K2 Think' AI Model Jailbroken Mere Hours After Release — www.darkreading.com — 11.09.2025 15:00
- Alex Polyakov of Adversa AI demonstrated the jailbreak using Partial Prompt Leaking, exploiting the model's reasoning transparency.
First reported: 11.09.2025 15:00 · 1 source, 1 article
- 'K2 Think' AI Model Jailbroken Mere Hours After Release — www.darkreading.com — 11.09.2025 15:00
- The jailbreak involved iteratively refining prompts based on the model's responses to bypass its safeguards.
First reported: 11.09.2025 15:00 · 1 source, 1 article
- 'K2 Think' AI Model Jailbroken Mere Hours After Release — www.darkreading.com — 11.09.2025 15:00
- Partial Prompt Leaking exploits the model's detailed reasoning logs, revealing its decision-making process.
First reported: 11.09.2025 15:00 · 1 source, 1 article
- 'K2 Think' AI Model Jailbroken Mere Hours After Release — www.darkreading.com — 11.09.2025 15:00
- The exploit highlights the need for balanced security measures in transparent AI models; a minimal mitigation sketch follows below.
First reported: 11.09.2025 15:00 · 1 source, 1 article
- 'K2 Think' AI Model Jailbroken Mere Hours After Release — www.darkreading.com — 11.09.2025 15:00
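The final snippet points at security measures that keep transparency for auditors without handing attackers a rulebook. The sketch below is purely illustrative and is not described in the Dark Reading article: the full reasoning trace is retained for internal audit logs, while lines that name concrete safeguards are redacted from the copy returned to users.

```python
import re

def split_reasoning_for_release(reasoning: str) -> tuple[str, str]:
    """Return (audit_copy, user_copy).

    The full trace is kept for internal auditing; lines that name concrete
    safety rules or safeguards are redacted from the user-facing copy so the
    trace cannot be mined for jailbreak hints. Keyword-based redaction is a
    deliberately simple stand-in for a real policy filter.
    """
    sensitive = re.compile(r"(safety rule|safeguard|disallowed|policy)", re.IGNORECASE)
    user_lines = [
        "[redacted safeguard detail]" if sensitive.search(line) else line
        for line in reasoning.splitlines()
    ]
    return reasoning, "\n".join(user_lines)


if __name__ == "__main__":
    trace = (
        "Step 1: parse the request.\n"
        "Step 2: Safety rule #3 triggered: topic is disallowed.\n"
        "Step 3: refuse and explain at a high level."
    )
    audit_copy, user_copy = split_reasoning_for_release(trace)
    print(user_copy)  # the rule that fired is no longer visible to the caller
```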