AI inference frameworks unsafe ZeroMQ/pickle deserialization (multiple vulnerabilities)
Vulnerability
Summary
Researchers disclosed critical remote code execution flaws in AI inference frameworks caused by unsafe ZeroMQ/pickle deserialization, which gives attackers a path to arbitrary code execution on inference nodes. The affected scope includes Meta Llama, NVIDIA TensorRT-LLM, vLLM, SGLang, Modular Max Server, and Sarathi-Serve. Fixes are available for some of these, but Sarathi-Serve remains unpatched and SGLang has only incomplete fixes. The underlying issue is especially risky because the vulnerable ZMQ TCP sockets are unauthenticated, so anyone who can reach them over the network can deliver a payload.
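Why pickle deserialization amounts to code execution can be shown with a minimal, benign sketch that uses only the Python standard library. The `Payload` class is hypothetical and `os.getpid` stands in for whatever callable an attacker would actually choose. No ZeroMQ socket is involved here: the bytes are produced and loaded in the same process, which is an assumption made to keep the example self-contained.

```python
import os
import pickle


class Payload:
    """Hypothetical object whose pickle form is 'call this function on load'."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and invokes it during loading.
        # os.getpid is a harmless stand-in for an attacker-chosen callable.
        return (os.getpid, ())


# Bytes a malicious peer could place on the wire.
wire_bytes = pickle.dumps(Payload())

# pyzmq's recv_pyobj() applies pickle.loads to the received frame;
# the callable runs here, before the application ever inspects the message.
result = pickle.loads(wire_bytes)

print(type(result).__name__)  # the loaded value is the callable's return value, not a Payload
```

The receiver never gets a `Payload` instance back. It gets the return value of a function that the sender selected, which is why validating the message after `pickle.loads` comes too late.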
Timeline
14.11.2025 17:20
Oligo discloses ShadowMQ RCE flaws in AI inference frameworks
Initial Disclosure
Oligo Security reported critical remote code execution vulnerabilities in AI inference engines and frameworks, including those from Meta, NVIDIA, and Microsoft, as well as vLLM, SGLang, Modular Max Server, and Sarathi-Serve. The root cause was unsafe use of ZeroMQ's recv_pyobj(), which deserializes incoming data with Python's pickle module, on unauthenticated ZMQ TCP sockets. Oligo dubbed this code-reuse pattern ShadowMQ. An attacker who can reach such a socket can send malicious data for deserialization and execute arbitrary code on the inference node. Remediation status varied: Meta's Llama framework was patched in October 2024, NVIDIA TensorRT-LLM was fixed in version 0.18.2, Modular Max Server was fixed, vLLM switched to the V1 engine by default, SGLang had incomplete fixes, and Sarathi-Serve remained unpatched.
Sources
- Researchers Find Serious AI Bugs Exposing Meta, Nvidia, and Microsoft Inference Frameworks — thehackernews.com — 14.11.2025 17:20
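For deployments that still call recv_pyobj(), one general mitigation is to stop unpickling network input and decode a data-only format instead. The sketch below is an illustration under stated assumptions, not the fix any of the listed projects necessarily shipped: the `decode_request` helper and the `ALLOWED_FIELDS` schema are hypothetical, and the function works on raw bytes so that it runs without a ZeroMQ installation.

```python
import json

# Hypothetical message schema: field name -> required Python type.
ALLOWED_FIELDS = {"request_id": str, "prompt": str, "max_tokens": int}


def decode_request(raw: bytes) -> dict:
    """Parse a received frame as JSON and check it against a fixed schema."""
    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("frame is not valid UTF-8 JSON") from exc

    # Require exactly the expected keys; reject anything extra or missing.
    if not isinstance(data, dict) or set(data) != set(ALLOWED_FIELDS):
        raise ValueError("unexpected message shape")

    for name, expected in ALLOWED_FIELDS.items():
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {name!r} has the wrong type")
    return data


request = decode_request(b'{"request_id": "r1", "prompt": "hi", "max_tokens": 8}')
print(request["max_tokens"])
```

With pyzmq, the equivalent change would be to read the frame with `socket.recv()` and pass it to a decoder like this one in place of `socket.recv_pyobj()`. Decoding JSON only ever produces plain data types, so a pickle payload is rejected as malformed input and nothing in it is executed. This addresses the deserialization step only; restricting which hosts can reach the socket and enabling ZeroMQ authentication are separate measures.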