K2 Think AI Model Jailbroken Using Partial Prompt Leaking

Summary

K2 Think, a reasoning model developed by the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and G42, was jailbroken within days of its September 9, 2025 release. The attack, demonstrated by Adversa AI's Alex Polyakov, used a technique called Partial Prompt Leaking, which turns the model's signature feature against it. K2 Think exposes detailed reasoning logs so that its decisions can be audited, but those same logs reveal which internal rules and safeguards fire on a given request. By reading the reasoning behind each refusal and iteratively refining his prompts to sidestep the rules the logs disclosed, Polyakov ultimately bypassed the model's security measures. The exploit illustrates the risk of overly transparent AI models and the need to balance auditability against security.
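To make the mechanics concrete, here is a minimal Python sketch of such an iterative loop. It is a hypothetical illustration, not Adversa AI's tooling: query_model, extract_rule_hints, and rephrase are invented stand-ins for this example, and the refusal markers and keyword filters are assumptions rather than anything documented about K2 Think.

```python
# Hypothetical sketch of an iterative "Partial Prompt Leaking" loop.
# query_model() is a placeholder for whatever interface returns both the
# final answer and the model's exposed reasoning trace; it is not a real
# K2 Think API.

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm not able", "against my guidelines")


def query_model(prompt: str) -> tuple[str, str]:
    """Placeholder: return (answer, reasoning_trace) from the target model."""
    raise NotImplementedError("wire this up to an actual model endpoint")


def extract_rule_hints(reasoning: str) -> list[str]:
    """Keep trace lines that appear to cite a safeguard. The keywords are
    illustrative; a real trace might name the triggered policy directly."""
    keywords = ("policy", "guideline", "safeguard", "not allowed")
    return [ln for ln in reasoning.splitlines()
            if any(k in ln.lower() for k in keywords)]


def rephrase(prompt: str, hints: list[str]) -> str:
    """Naive stand-in: in the reported attack, the human attacker rewrote
    the request so it no longer matched the rule the trace had leaked."""
    return f"{prompt}\n\nReframe the request so it avoids: {'; '.join(hints)}"


def partial_prompt_leaking(goal: str, max_rounds: int = 10) -> str | None:
    """Refine `goal` round by round, using whatever each refusal leaks."""
    prompt = goal
    for _ in range(max_rounds):
        answer, reasoning = query_model(prompt)
        if not any(m in answer.lower() for m in REFUSAL_MARKERS):
            return answer  # safeguards no longer triggered
        # Each refusal leaks part of the rule set through the visible
        # reasoning trace; fold that leak back into the next prompt.
        prompt = rephrase(prompt, extract_rule_hints(reasoning))
    return None
```

The key design point is the feedback edge: a model that hid its reasoning would force the attacker to guess blindly, whereas a fully transparent trace tells the attacker after every round exactly which rule to route around.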

Timeline

  1. 11.09.2025 15:00 · 1 article

    K2 Think AI Model Jailbroken Using Partial Prompt Leaking

    K2 Think, released on September 9, 2025, was jailbroken using Partial Prompt Leaking. The exploit, demonstrated by Alex Polyakov of Adversa AI, highlights the risks of overly transparent AI models. The model's detailed reasoning logs, intended for auditing, were used to bypass its safeguards: by iteratively refining prompts based on the reasoning behind each refusal, the attacker ultimately coaxed the model into performing restricted actions.

Information Snippets

  • K2 Think was released on September 9, 2025, by the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and G42.

  • K2 Think is a 32-billion-parameter model designed for complex and transparent reasoning.

  • The model's transparency allows users to see the logic behind its outputs, making it susceptible to Partial Prompt Leaking.

  • Alex Polyakov of Adversa AI demonstrated the jailbreak using Partial Prompt Leaking, exploiting the model's reasoning transparency.

  • The jailbreak involved iteratively refining prompts based on the model's responses to bypass its safeguards.

  • Partial Prompt Leaking exploits the model's detailed reasoning logs, revealing its decision-making process.

  • The exploit highlights the need for security measures that balance transparency against abuse in reasoning models; one possible mitigation direction is sketched below.

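As an illustration of what "balanced" transparency could look like, the sketch below redacts safeguard-citing lines from a reasoning trace before it is shown to the user, preserving most of the audit trail while hiding which rule fired. This is a hypothetical mitigation direction, not a measure MBZUAI or G42 have announced, and the keyword pattern is an illustrative assumption.

```python
import re

# Hedged sketch of one possible mitigation: strip lines of the reasoning
# trace that reference safety machinery before display. The pattern below
# is an illustrative assumption, not K2 Think's actual vocabulary.
SENSITIVE = re.compile(r"\b(policy|guideline|safeguard|refus\w*)\b", re.IGNORECASE)


def redact_trace(reasoning: str) -> str:
    """Replace safeguard-citing lines with an opaque marker."""
    return "\n".join(
        "[safety check redacted]" if SENSITIVE.search(line) else line
        for line in reasoning.splitlines()
    )


print(redact_trace("Step 1: parse the request.\nStep 2: this violates policy X."))
# Step 1: parse the request.
# [safety check redacted]
```

The trade-off is explicit: auditors lose visibility into exactly which safeguard fired, but attackers lose the per-refusal feedback that made the iterative jailbreak practical.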