

K2 Think AI Model Jailbroken via Partial Prompt Leaking

πŸ“° 1 unique sources, 1 articles

Summary


K2 Think, an advanced reasoning AI model released by the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and G42, was jailbroken within hours of its public release. The model exposes its reasoning process in plaintext as a transparency feature, and that visibility was turned against it: a technique known as Partial Prompt Leaking lets attackers read the safeguard rationale surfaced in the reasoning trace and use it to steer the model into performing restricted actions. Developed in the UAE, K2 Think is designed to handle complex, multi-step problems with a parameter efficiency that rivals much larger models. The jailbreak was demonstrated by Adversa AI's Alex Polyakov, who highlighted the model's susceptibility to prompt engineering attacks; the transparency intended to make the model auditable proved exploitable by malicious actors.
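The article does not publish Polyakov's actual prompts, but the mechanics of a partial-prompt-leaking attack can be sketched as a feedback loop: each refusal exposes a fragment of the model's reasoning, and the attacker folds that fragment back into the next attempt. Below is a minimal Python sketch under stated assumptions; `query_model` is a hypothetical stand-in for the K2 Think API (no real endpoint is assumed), and the "Refusing because" marker is an invented example of leaked rationale.

```python
# Minimal sketch of a partial-prompt-leaking loop. Hypothetical throughout:
# query_model() stands in for a real model API, and the "Refusing because"
# marker is an invented example of a rule fragment leaked in a reasoning trace.

def query_model(prompt: str) -> dict:
    """Hypothetical API stub: returns the model's answer plus its visible reasoning."""
    return {
        "answer": "[refused]",
        "reasoning": "Step 1: parse request.\nRefusing because the request matches a restricted topic.",
    }

def leaked_rule_fragments(reasoning: str) -> list[str]:
    """Collect lines where the model explains *why* it refused -- the leak."""
    return [line for line in reasoning.splitlines() if line.startswith("Refusing because")]

def probe(initial_prompt: str, max_rounds: int = 5) -> str | None:
    """Iteratively rephrase the prompt using whatever rationale the model exposes."""
    prompt = initial_prompt
    for _ in range(max_rounds):
        response = query_model(prompt)
        if response["answer"] != "[refused]":
            return response["answer"]  # safeguard no longer triggered
        fragments = leaked_rule_fragments(response["reasoning"])
        if not fragments:
            return None  # nothing leaked, nothing to iterate on
        # Fold the leaked rationale back into the next attempt.
        prompt = f"{initial_prompt}\n(Reworded to avoid: {fragments[0]})"
    return None
```

The point of the sketch is the feedback channel itself: a model that explains its refusals in plaintext is also explaining how to rephrase the next attempt.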

Timeline

  1. 11.09.2025 15:00 📰 1 article

    K2 Think AI Model Jailbroken via Partial Prompt Leaking

    K2 Think, an advanced reasoning AI model from MBZUAI and G42, was jailbroken within hours of its public release. The model's transparency features, which expose its reasoning process to users, were exploited to bypass its safeguards via a technique known as Partial Prompt Leaking, which lets attackers steer the model into performing restricted actions. Alex Polyakov of Adversa AI demonstrated the jailbreak, showing that the transparency intended to make K2 Think auditable is itself a significant security risk.


Information Snippets

  • K2 Think was released on September 9, 2025, by MBZUAI and G42, a UAE-based company with significant geopolitical influence.

    First reported: 11.09.2025 15:00
    πŸ“° 1 source, 1 article
  • The model is designed to be highly transparent, allowing users to see its reasoning process in plaintext.

    First reported: 11.09.2025 15:00
    πŸ“° 1 source, 1 article
  • Partial Prompt Leaking was used to jailbreak K2 Think by exploiting its transparency features.

    First reported: 11.09.2025 15:00
    πŸ“° 1 source, 1 article
  • The jailbreak was demonstrated by Alex Polyakov of Adversa AI, who showed how the model's reasoning process could be manipulated.

    First reported: 11.09.2025 15:00
    πŸ“° 1 source, 1 article
  • K2 Think's transparency was intended to make it auditable but created a new type of vulnerability.

    First reported: 11.09.2025 15:00
    πŸ“° 1 source, 1 article
  • The model's reasoning process can be manipulated to bypass its safeguards, potentially allowing it to perform restricted actions; an illustrative mitigation sketch follows this list.

    First reported: 11.09.2025 15:00
    πŸ“° 1 source, 1 article
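None of the snippets above describe how MBZUAI or G42 responded, so the following is purely an illustrative mitigation, not a documented fix: redact the safety rationale from a reasoning trace before it is shown to the user, while keeping the full trace available for internal audit. The regex patterns below are invented placeholders, not markers used by any real model.

```python
import re

# Illustrative mitigation only -- not a documented MBZUAI/G42 change. The idea:
# keep the reasoning trace auditable internally, but withhold the lines that
# reveal which safeguard fired before the trace reaches the user.

SAFETY_RATIONALE_PATTERNS = [
    re.compile(r"refusing because", re.IGNORECASE),     # invented marker
    re.compile(r"policy (rule|check)", re.IGNORECASE),  # invented marker
]

def redact_reasoning(trace: str) -> str:
    """Replace safety-rationale lines with a placeholder; pass the rest through."""
    out = []
    for line in trace.splitlines():
        if any(p.search(line) for p in SAFETY_RATIONALE_PATTERNS):
            out.append("[safety rationale withheld]")
        else:
            out.append(line)
    return "\n".join(out)

if __name__ == "__main__":
    sample = (
        "Step 1: parse request.\n"
        "Refusing because the request matches a restricted topic.\n"
        "Step 2: return refusal."
    )
    print(redact_reasoning(sample))
```

The trade-off is exactly the one this event highlights: every line withheld for safety is a line lost to auditability.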