Key Takeaways

  • Between March and May, Zenity honeypots observed three distinct campaigns that hijacked publicly reachable Ollama and LiteLLM endpoints with no exploit beyond knowing the URL and port.
  • Ollama commonly exposes /api/generate and /api/chat on port 11434 and LiteLLM exposes /v1/responses on port 4000; default installs often ship without authentication.
  • Attackers repoint their agents to victim endpoints and run full agent payloads; one victim logged 140,000‑character Strix prompts and unexpected costs when abuse began.
  • If an AI backend is reachable from the public internet without strong authentication, it will be used as someone else’s attack infrastructure.

Modern AI stacks expose inference endpoints like /api/generate, /api/chat, or /v1/responses so apps can call models over HTTP. When self-hosted backends are reachable from the public internet without auth, they effectively become free “LLM-as-a-service” for anyone who finds them. [3]

Between March and May, Zenity honeypots saw three campaigns doing exactly this, abusing exposed Ollama and LiteLLM instances as offensive AI backends with no exploit beyond knowing URL and port. [1][3]

💡 Key takeaway: If your AI backend is on the internet without strong auth, assume it will be used as someone else’s attack engine. [1]


1. How Threat Actors Hijack Exposed AI Endpoints

Self-hosted AI runtimes expose inference APIs so frontends, agents, and tools can call models. In Ollama, that includes /api/generate and /api/chat on port 11434; LiteLLM commonly exposes /v1/responses on port 4000. [3] Many teams spin these up for PoCs, bind to 0.0.0.0, and never add network restrictions or authentication. [3]

Zenity observed three real-world campaigns abusing such honeypots as backend compute. [1][3] Across them, attackers typically:

  • Scan for reachable Ollama / LiteLLM-style endpoints.
  • Send a small “hello” prompt to verify model behavior.
  • Repoint their own agents to the discovered endpoint as the model backend.

⚠️ Key point: No RCE, SSRF, or deserialization bug is needed—the attack surface is the intended API, misconfigured on the internet. [1][3]

Two campaigns used autonomous penetration frameworks (Strix, HexStrike AI), uploading large orchestration prompts and toolsets into the victim’s Ollama or LiteLLM. [1][3] A third used an OpenAI Codex persona tuned to bypass safety refusals and assist with web reverse‑engineering. [1]

Operationally, the adversary simply reconfigures their client:

# Benign
export LLM_BASE_URL=https://api.openai.com/v1

# Hijack victim endpoint
export LLM_BASE_URL=http://victim.example.com:4000/v1

They then send full agent payloads—system prompt, tool schemas, objectives—in the request body, turning the victim’s endpoint into the “brain” of their agent. [1][3] One SaaS company only noticed its exposed Ollama box after cost anomalies and 140k‑character Strix prompts appeared in logs. [1][3]

The workflow below summarizes how attackers convert an exposed AI endpoint into the backend for their agents. [1][3]

flowchart LR
    title Hijacking Exposed AI Endpoints
    A[Scan open ports] --> B[Test model prompt]
    B --> C[Repoint AI client]
    C --> D[Run attacker agents]
    D --> E[Costs & risk to victim]
    style A fill:#3b82f6,stroke:#1d4ed8,stroke-width:2px
    style B fill:#22c55e,stroke:#15803d,stroke-width:2px
    style C fill:#f59e0b,stroke:#b45309,stroke-width:2px
    style D fill:#ef4444,stroke:#b91c1c,stroke-width:2px
    style E fill:#ef4444,stroke:#b91c1c,stroke-width:2px

The root problem is weak defaults: Ollama ships without authentication, and LiteLLM treats it as opt‑in, so many instances are launched with no real access control. [1]

💼 Operational impact: You pay for GPUs and risk; the attacker gets free offensive AI infrastructure. [1][3]


2. Why Exposed AI Endpoints Are a High-Impact Attack Surface

These incidents align with the shift from single LLM calls to agentic apps that plan, call tools, and iterate. [7] When attackers hijack your endpoint, they can run full penetration testing, exploit development, or code‑analysis workflows—not just completions. [5]

Research on agent security shows most failures stem from insecure patterns and misconfigured tools, not a specific framework. [5] Across CrewAI and AutoGen, issues like:

  • Poor scope definition,
  • Unsafe tool / API integrations,
  • Over‑trusted code interpreters

all produced similar compromise scenarios. [5] Any generic, exposed LLM backend can therefore be wired into risky agents.

📊 Data point: A single agent trajectory may involve dozens of tool calls, browser sessions, and code executions—each a potential abuse vector if the attacker controls the objective. [5][7]

AI also compresses exploit timelines: models rapidly analyze codebases, map APIs, and suggest exploits, making compute-rich endpoints highly attractive when someone else pays. [1][8]

Attackers don’t need full environment compromise. By merely repointing their clients to your endpoint, they gain model access, compute, and sometimes network adjacency, while you inherit cost, legal, and reputational exposure if your infra appears in intrusion logs. [1][3]

Reality check: “We’ll lock it down when we productize” is unsafe—attackers already treat test endpoints as infrastructure. [1][3]


3. Defensive Playbook: Securing AI Endpoints and Investigating Abuse

Architect for non‑exposure. Do not place Ollama, LiteLLM, or similar runtimes directly on the public internet. Instead: [1][3]

  • Keep them on private networks or behind VPNs.
  • Front them with authenticated API gateways.
  • Enforce real authentication; reject placeholder / default keys.
  • Continuously scan cloud, on‑prem, and lab environments for open AI ports; shut down or properly wrap anything unintentionally reachable. [1][3]

Strengthen observability:

  • Log full request bodies, not just headers and status codes.
  • Flag large system prompts, embedded tool definitions, and “mission” descriptions as possible external agent traffic. [1][7]
  • Watch for prompts mentioning offensive tooling (e.g., “Strix”, “penetration test”, “do not ask permission”) or long JSON tool schemas. [1][3][7]

Leverage existing telemetry stacks. Microsoft Purview, Defender, and Sentinel, for example, can show who initiated AI interactions, when, and which resources were touched, enabling reconstruction of AI activity chains. [6]

Use a scope–context–signal model for investigations: [6]

  1. Scope: Identities, IPs, and services hitting the suspect endpoint.
  2. Context: Data, tools, and internal systems accessed.
  3. Signal: Anomalies such as usage spikes, unusual prompts, or credential exposure.

Prepare the organization:

  • Run AI‑specific incident tabletop exercises.
  • Track CISA JCDC efforts toward an AI Security Incident Collaboration Playbook for shared response patterns. [9]
  • Align security, engineering, and legal roles before an AI endpoint hijack, when minutes matter. [8][9]

⚠️ Key point: Treat AI endpoints as first‑class production services with threat models, runbooks, and incident drills—not experimental sidecars. [6][9]


Conclusion: Inventory, Lock Down, and Practice the Response

Exposed inference endpoints are low‑friction, high‑reward targets. Attackers need only a URL to conscript your models and compute, as agentic AI, misconfiguration, and accelerated exploit development converge. [1][5][8]

Concretely, you should:

  • Inventory every AI endpoint across environments.
  • Lock down exposure and enforce strong authentication.
  • Extend logging to capture agent prompts, tools, and trajectories.
  • Integrate AI‑focused tabletop exercises into incident response. [1][6][9]

Doing this now is the difference between reading about hijacked AI infrastructure and discovering the infrastructure is yours.

Frequently Asked Questions

How exactly do attackers hijack exposed AI endpoints?
Attackers simply discover a reachable inference endpoint, verify model behavior with a small “hello” prompt, and repoint their agent’s LLM_BASE_URL (or equivalent) to that URL, requiring no RCE or deserialization bug. They then send full agent payloads—system prompts, tool schemas, and objectives—so the victim’s model runs the attacker’s orchestration. Campaigns observed by Zenity used scans for typical ports (11434 for Ollama, 4000 for LiteLLM), uploaded large orchestration prompts (one case produced 140k‑character Strix payloads), and reused the victim’s compute and model as the backend for autonomous frameworks like Strix or HexStrike AI, turning a misconfigured service into free offensive infrastructure.
What telemetry and signs indicate my AI endpoint is being abused?
Definitive signs include sudden usage spikes, large request bodies (especially system prompts exceeding normal lengths), appearance of tool/JSON schemas in request payloads, and prompt text referencing offensive tooling or instructions to bypass safeguards. Also watch for cost anomalies on GPU/machine billing, new external IPs hitting inference ports, and logged requests containing mission statements or multi‑step objectives. Instrument full request body logging, alert on unusually long system prompts or embedded tool definitions, and correlate network/source IPs with unexpected compute consumption to detect hijacking early.
What immediate steps should I take to secure exposed inference endpoints?
Immediately restrict network exposure by placing runtimes on private networks or behind a VPN and front them with authenticated API gateways using strong, non‑default credentials. Perform an inventory scan for open AI ports across cloud and on‑prem environments, shut down any publicly reachable instances, and enable full request body logging and alerts for large or agent‑style prompts. Follow up with incident drills, apply least‑privilege integrations for tool access, and treat AI endpoints as first‑class production services with runbooks, monitoring, and regular audits.

Sources & References (9)

Key Entities

💡
/api/chat
Concept
💡
/v1/responses
Concept
💡
/api/generate
Concept
💡
agentic apps
Concept
💡
OpenAI Codex persona
Concept
💡
exposed inference endpoints
Concept
💡
public internet without auth
Concept
💡
140k-character Strix prompts
Concept
📅
three campaigns
Event
🏢
CISA JCDC
Org
🏢
Zenity honeypots
Org
📦
WikipediaProduit
📦
WikipediaProduit
📦
LiteLLM
Produit

Generated by CoreProse in 4m 27s

9 sources verified & cross-referenced 1,005 words 0 false citations

Share this article

Generated in 4m 27s

What topic do you want to cover?

Get the same quality with verified sources on any subject.