CyberHappenings logo
☰

Jailbreak vulnerability in K2 Think AI model disclosed

First reported
Last updated
📰 1 unique sources, 1 articles

Summary

Hide ▲

A researcher has publicly disclosed a jailbreak vulnerability in the K2 Think AI model, released by the UAE's Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and G42. The vulnerability, called Partial Prompt Leaking, allows attackers to manipulate the model's reasoning process to bypass its safeguards. K2 Think, released on September 9, 2025, is designed to be highly transparent, revealing its reasoning methods in plaintext. This transparency, intended to make the model auditable, also exposes a new type of vulnerability that can be exploited to jailbreak the model. The jailbreak was demonstrated by Adversa AI's Alex Polyakov, who showed that the model's transparency makes it easier to map and exploit than typical models. The vulnerability allows attackers to craft manipulative prompts that can bypass the model's security rules, potentially leading to unauthorized actions.

Timeline

  1. 11.09.2025 15:00 📰 1 articles

    Jailbreak vulnerability in K2 Think AI model disclosed

    A researcher has publicly disclosed a jailbreak vulnerability in the K2 Think AI model, released by the UAE's Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and G42. The vulnerability, called Partial Prompt Leaking, allows attackers to manipulate the model's reasoning process to bypass its safeguards. The jailbreak was demonstrated by Adversa AI's Alex Polyakov, who showed that the model's transparency makes it easier to map and exploit than typical models. The vulnerability allows attackers to craft manipulative prompts that can bypass the model's security rules, potentially leading to unauthorized actions.

    Show sources

Information Snippets

  • K2 Think was released on September 9, 2025, by MBZUAI and G42.

    First reported: 11.09.2025 15:00
    📰 1 source, 1 article
    Show sources
  • K2 Think is designed to be highly transparent, revealing its reasoning methods in plaintext.

    First reported: 11.09.2025 15:00
    📰 1 source, 1 article
    Show sources
  • The jailbreak vulnerability, called Partial Prompt Leaking, exploits the model's transparency to bypass its safeguards.

    First reported: 11.09.2025 15:00
    📰 1 source, 1 article
    Show sources
  • Alex Polyakov demonstrated the jailbreak, showing that the model's transparency makes it easier to exploit.

    First reported: 11.09.2025 15:00
    📰 1 source, 1 article
    Show sources
  • The vulnerability allows attackers to craft manipulative prompts that can bypass the model's security rules.

    First reported: 11.09.2025 15:00
    📰 1 source, 1 article
    Show sources
  • K2 Think is built on 32 billion parameters, claiming reasoning, math, and coding performance comparable to larger models.

    First reported: 11.09.2025 15:00
    📰 1 source, 1 article
    Show sources
  • G42 is backed by Abu Dhabi's sovereign wealth and Microsoft, and run by UAE's national security chief.

    First reported: 11.09.2025 15:00
    📰 1 source, 1 article
    Show sources