Unsafe ZeroMQ/pickle deserialization in AI inference frameworks (multiple vulnerabilities)

Vulnerability

First reported

14.11.2025 17:20

Last updated

14.11.2025 17:20

Happening score

H score 26

1 unique source, 1 article

Summary


Researchers disclosed critical remote code execution flaws in several AI inference frameworks, all tied to unsafe pickle deserialization of data received over ZeroMQ. Affected projects include Meta Llama, NVIDIA TensorRT-LLM, vLLM, SGLang, Modular Max Server, and Sarathi-Serve. Some fixes are available, but Sarathi-Serve remains unpatched and SGLang has only incomplete fixes. The flaw is especially risky because it can be triggered over unauthenticated ZMQ TCP sockets.

Timeline

14.11.2025 17:20 2 articles

Oligo discloses ShadowMQ RCE flaws in AI inference frameworks

Initial Disclosure
Oligo Security reported critical remote code execution vulnerabilities in AI inference engines from Meta, NVIDIA, and Microsoft, as well as in vLLM, SGLang, Modular Max Server, and Sarathi-Serve. The root cause is the use of ZeroMQ's recv_pyobj(), which deserializes incoming data with Python's pickle module, on unauthenticated ZMQ TCP sockets. This code-reuse pattern, dubbed ShadowMQ, lets an attacker who can reach such a socket send malicious data for deserialization and execute arbitrary code on the inference node. Remediation status varies: Meta patched its Llama framework in October 2024, NVIDIA TensorRT-LLM was fixed in version 0.18.2, Modular Max Server was fixed, vLLM switched to the V1 engine by default, SGLang has only incomplete fixes, and Sarathi-Serve remains unpatched.
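
The root cause can be illustrated with a minimal sketch using only the Python standard library. It assumes that pyzmq's recv_pyobj() amounts to calling pickle.loads() on the received bytes, so no socket is opened here. The Payload class, the NoGlobalsUnpickler class, and the safe_loads() helper are hypothetical names chosen for this illustration and are not taken from any of the affected projects. The payload is deliberately benign: it only calls os.getpid().

```python
import io
import json
import os
import pickle


class Payload:
    """Hypothetical, benign stand-in for an attacker-crafted object."""

    def __reduce__(self):
        # pickle stores "call os.getpid()" instead of the object's state.
        # A real attacker would name a more dangerous callable here.
        return (os.getpid, ())


# Bytes an attacker could send to an exposed, unauthenticated socket.
wire_bytes = pickle.dumps(Payload())

# Assumption: this is effectively what recv_pyobj() does with the bytes.
result = pickle.loads(wire_bytes)

# The receiver never gets a Payload instance. It gets the return value of
# a callable chosen by the sender, which ran during deserialization.
called_on_load = (result == os.getpid())


class NoGlobalsUnpickler(pickle.Unpickler):
    """Hypothetical hardened unpickler that refuses every global lookup."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Unpickle plain containers only; any referenced callable is rejected."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


try:
    safe_loads(wire_bytes)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

# Plain request data round-trips through a data-only format such as JSON.
message = json.loads(json.dumps({"prompt": "hello", "max_tokens": 16}))
```

In this sketch the sender's callable runs as a side effect of pickle.loads(), before the receiver can inspect anything, which is why exposing the pattern on an unauthenticated TCP socket is so risky. Overriding find_class() follows the "restricting globals" approach in the Python pickle documentation. Exchanging a data-only format such as JSON avoids the problem class altogether.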

Researchers Find Serious AI Bugs Exposing Meta, Nvidia, and Microsoft Inference Frameworks — thehackernews.com — 14.11.2025 17:20
