Key Takeaways
- Between March and May, Zenity honeypots observed three distinct campaigns that hijacked publicly reachable Ollama and LiteLLM endpoints with no exploit beyond knowing the URL and port.
- Ollama commonly exposes /api/generate and /api/chat on port 11434 and LiteLLM exposes /v1/responses on port 4000; default installs often ship without authentication.
- Attackers repoint their agents to victim endpoints and run full agent payloads; one victim found 140,000‑character Strix prompts in its logs and saw unexpected costs once the abuse began.
- If an AI backend is reachable from the public internet without strong authentication, it will be used as someone else’s attack infrastructure.
Modern AI stacks expose inference endpoints like /api/generate, /api/chat, or /v1/responses so apps can call models over HTTP. When self-hosted backends are reachable from the public internet without auth, they effectively become free “LLM-as-a-service” for anyone who finds them. [3]
Between March and May, Zenity honeypots saw three campaigns doing exactly this, abusing exposed Ollama and LiteLLM instances as offensive AI backends with no exploit beyond knowing URL and port. [1][3]
💡 Key takeaway: If your AI backend is on the internet without strong auth, assume it will be used as someone else’s attack engine. [1]
1. How Threat Actors Hijack Exposed AI Endpoints
Self-hosted AI runtimes expose inference APIs so frontends, agents, and tools can call models. In Ollama, that includes /api/generate and /api/chat on port 11434; LiteLLM commonly exposes /v1/responses on port 4000. [3] Many teams spin these up for PoCs, bind to 0.0.0.0, and never add network restrictions or authentication. [3]
Zenity observed three real-world campaigns abusing such honeypots as backend compute. [1][3] Across them, attackers typically:
- Scan for reachable Ollama / LiteLLM-style endpoints.
- Send a small “hello” prompt to verify model behavior.
- Repoint their own agents to the discovered endpoint as the model backend.
⚠️ Key point: No RCE, SSRF, or deserialization bug is needed—the attack surface is the intended API, misconfigured on the internet. [1][3]
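Defenders can run the same discovery steps against hosts they own. The following is a minimal sketch, not a vetted scanner: it sends one unauthenticated GET per runtime and reports any endpoint that answers 200. The ports are the defaults named above; the probe paths (`/api/tags` for Ollama, `/v1/models` for a LiteLLM proxy) are assumptions chosen because they are read-only listing routes, and `audit_host` and `http_status` are hypothetical helper names.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Optional, Tuple

# (runtime, default port, probe path). Ports come from the article; the probe
# paths are assumptions (read-only model-listing routes), adjust as needed.
CANDIDATES = [
    ("ollama", 11434, "/api/tags"),
    ("litellm", 4000, "/v1/models"),
]

Fetch = Callable[[str], Optional[int]]


def http_status(url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status of an unauthenticated GET, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # reachable, but the server refused (e.g. 401 or 403)
    except (urllib.error.URLError, OSError):
        return None  # closed port, timeout, or DNS failure


def audit_host(host: str, fetch: Fetch = http_status) -> List[Tuple[str, str]]:
    """List (runtime, url) pairs on `host` that answer 200 without credentials."""
    findings = []
    for runtime, port, path in CANDIDATES:
        url = f"http://{host}:{port}{path}"
        if fetch(url) == 200:
            findings.append((runtime, url))
    return findings
```

Calling `audit_host("10.0.0.5")` from outside the network perimeter approximates what a scanner sees. The `fetch` parameter exists so the logic can be exercised without network access. Only probe hosts you are authorized to test.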
Two campaigns used autonomous penetration frameworks (Strix, HexStrike AI), uploading large orchestration prompts and toolsets into the victim’s Ollama or LiteLLM. [1][3] A third used an OpenAI Codex persona tuned to bypass safety refusals and assist with web reverse‑engineering. [1]
Operationally, the adversary simply reconfigures their client:
```shell
# Benign
export LLM_BASE_URL=https://api.openai.com/v1
# Hijack victim endpoint
export LLM_BASE_URL=http://victim.example.com:4000/v1
```
They then send full agent payloads—system prompt, tool schemas, objectives—in the request body, turning the victim’s endpoint into the “brain” of their agent. [1][3] One SaaS company only noticed its exposed Ollama box after cost anomalies and 140k‑character Strix prompts appeared in logs. [1][3]
The workflow below summarizes how attackers convert an exposed AI endpoint into the backend for their agents. [1][3]
```mermaid
---
title: Hijacking Exposed AI Endpoints
---
flowchart LR
    A[Scan open ports] --> B[Test model prompt]
    B --> C[Repoint AI client]
    C --> D[Run attacker agents]
    D --> E[Costs & risk to victim]
    style A fill:#3b82f6,stroke:#1d4ed8,stroke-width:2px
    style B fill:#22c55e,stroke:#15803d,stroke-width:2px
    style C fill:#f59e0b,stroke:#b45309,stroke-width:2px
    style D fill:#ef4444,stroke:#b91c1c,stroke-width:2px
    style E fill:#ef4444,stroke:#b91c1c,stroke-width:2px
```
The root problem is weak defaults: Ollama ships without authentication, and LiteLLM treats it as opt‑in, so many instances are launched with no real access control. [1]
💼 Operational impact: You pay for GPUs and risk; the attacker gets free offensive AI infrastructure. [1][3]
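Because the exposure usually starts with a wildcard bind such as 0.0.0.0, a configuration audit can catch it before any scanner does. The sketch below assumes the bind address is available as a `host:port` string (for example from an environment variable or a deployment manifest); `is_exposed_bind` is a hypothetical helper, and treating unresolved hostnames as exposed is a deliberately conservative design choice rather than an established rule.

```python
import ipaddress


def is_exposed_bind(bind: str) -> bool:
    """Return True if a 'host:port' bind string listens beyond loopback.

    Accepts IPv4 ("0.0.0.0:11434") and bracketed IPv6 ("[::1]:11434").
    """
    host = bind.rsplit(":", 1)[0].strip("[]")
    if host in ("", "*"):
        return True  # empty or wildcard host means all interfaces
    if host == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A hostname we cannot prove is loopback: flag it for manual review.
        return True
    return not addr.is_loopback


if __name__ == "__main__":
    for candidate in ("0.0.0.0:11434", "127.0.0.1:11434", "[::]:4000"):
        print(candidate, "exposed" if is_exposed_bind(candidate) else "loopback only")
```

A loopback-only bind does not replace authentication, but it removes the instance from internet-wide scans while proper access control is put in place.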
2. Why Exposed AI Endpoints Are a High-Impact Attack Surface
These incidents align with the shift from single LLM calls to agentic apps that plan, call tools, and iterate. [7] When attackers hijack your endpoint, they can run full penetration testing, exploit development, or code‑analysis workflows—not just completions. [5]
Research on agent security shows that most failures stem from insecure design patterns and misconfigured tools, not from a specific framework. [5] Across CrewAI and AutoGen, the same issues recurred:
- Poor scope definition,
- Unsafe tool / API integrations,
- Over‑trusted code interpreters.
Each produced similar compromise scenarios in both frameworks. [5] Any generic, exposed LLM backend can therefore be wired into risky agents.
📊 Data point: A single agent trajectory may involve dozens of tool calls, browser sessions, and code executions—each a potential abuse vector if the attacker controls the objective. [5][7]
AI also compresses exploit timelines: models rapidly analyze codebases, map APIs, and suggest exploits, making compute-rich endpoints highly attractive when someone else pays. [1][8]
Attackers don’t need full environment compromise. By merely repointing their clients to your endpoint, they gain model access, compute, and sometimes network adjacency, while you inherit cost, legal, and reputational exposure if your infra appears in intrusion logs. [1][3]
⚡ Reality check: “We’ll lock it down when we productize” is unsafe—attackers already treat test endpoints as infrastructure. [1][3]
3. Defensive Playbook: Securing AI Endpoints and Investigating Abuse
Architect for non‑exposure. Do not place Ollama, LiteLLM, or similar runtimes directly on the public internet. [1][3] Instead:
- Keep them on private networks or behind VPNs.
- Front them with authenticated API gateways.
- Enforce real authentication; reject placeholder / default keys.
- Continuously scan cloud, on‑prem, and lab environments for open AI ports; shut down or properly wrap anything unintentionally reachable. [1][3]
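The "reject placeholder / default keys" item can be enforced in whatever gateway sits in front of the runtime. The following is a minimal sketch of that check, assuming a bearer-token scheme; the denylist, the minimum key length, and the function names are all hypothetical and are not taken from Ollama, LiteLLM, or any gateway product.

```python
import hmac
from typing import Optional

# Hypothetical denylist of keys that often survive from tutorials and demos.
PLACEHOLDER_KEYS = {"", "sk-1234", "changeme", "test", "none", "empty", "ollama"}
MIN_KEY_LENGTH = 32  # assumption: anything shorter is treated as weak


def key_is_placeholder(key: str) -> bool:
    """True if the configured key is a known placeholder or too short."""
    cleaned = key.strip()
    return cleaned.lower() in PLACEHOLDER_KEYS or len(cleaned) < MIN_KEY_LENGTH


def is_authorized(authorization_header: Optional[str], configured_key: str) -> bool:
    """Check a 'Bearer <token>' header against the configured key.

    Fails closed: a placeholder or weak configured key authorizes nobody.
    """
    if key_is_placeholder(configured_key):
        return False
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking key prefixes through timing.
    return hmac.compare_digest(token.strip().encode(), configured_key.encode())
```

Failing closed on a weak configured key is the point of the design: a deployment that still carries a demo key refuses all traffic instead of accepting the one credential every attacker will try first.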
Strengthen observability:
- Log full request bodies, not just headers and status codes.
- Flag large system prompts, embedded tool definitions, and “mission” descriptions as possible external agent traffic. [1][7]
- Watch for prompts mentioning offensive tooling (e.g., “Strix”, “penetration test”, “do not ask permission”) or long JSON tool schemas. [1][3][7]
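The flags listed above can be approximated with a small heuristic over logged request bodies. This is a sketch under stated assumptions: it expects a parsed JSON body shaped like a chat request (`messages`, optional `tools`) or a generate request (`system`, `prompt`), the size thresholds are invented for illustration, and the phrase list reuses only the examples given in this section. It will produce false positives and should feed review, not automatic blocking.

```python
import json
from typing import Any, Dict, List

SUSPICIOUS_PHRASES = ("strix", "penetration test", "do not ask permission")
MAX_SYSTEM_PROMPT_CHARS = 20_000  # hypothetical threshold
MAX_TOOL_SCHEMA_CHARS = 10_000    # hypothetical threshold


def flag_request(body: Dict[str, Any]) -> List[str]:
    """Return reasons a logged request body looks like external agent traffic."""
    reasons: List[str] = []
    system_text = str(body.get("system") or "")
    texts = [system_text, str(body.get("prompt") or "")]

    for msg in body.get("messages") or []:
        if not isinstance(msg, dict):
            continue
        content = msg.get("content")
        text = content if isinstance(content, str) else json.dumps(content)
        texts.append(text)
        if msg.get("role") == "system":
            system_text += text

    if len(system_text) > MAX_SYSTEM_PROMPT_CHARS:
        reasons.append("large_system_prompt")

    tools = body.get("tools") or []
    if tools and len(json.dumps(tools)) > MAX_TOOL_SCHEMA_CHARS:
        reasons.append("large_tool_schema")

    lowered = " ".join(texts).lower()
    for phrase in SUSPICIOUS_PHRASES:
        if phrase in lowered:
            reasons.append(f"phrase:{phrase}")
    return reasons
```

Running `flag_request` over stored bodies is only possible if full bodies are logged in the first place, which is why that item leads the list.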
Leverage existing telemetry stacks. Microsoft Purview, Defender, and Sentinel, for example, can show who initiated AI interactions, when, and which resources were touched, enabling reconstruction of AI activity chains. [6]
Use a scope–context–signal model for investigations: [6]
- Scope: Identities, IPs, and services hitting the suspect endpoint.
- Context: Data, tools, and internal systems accessed.
- Signal: Anomalies such as usage spikes, unusual prompts, or credential exposure.
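As a rough illustration of the model, the three lenses can be computed from access-log records. The sketch assumes each record is a dict with hypothetical fields `src_ip`, `path`, and `body_chars`; real telemetry schemas will differ, request paths are only a weak stand-in for "context", and both thresholds are invented for the example.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

BURST_REQUESTS = 100       # hypothetical: requests per source in the window
LARGE_BODY_CHARS = 20_000  # hypothetical: body size that suggests agent payloads


def triage(records: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Summarize log records into scope, context, and signal lists."""
    rows = list(records)
    per_ip = Counter(row["src_ip"] for row in rows)

    scope = sorted(per_ip)                        # who hit the endpoint
    context = sorted({row["path"] for row in rows})  # what they touched
    noisy = {ip for ip, count in per_ip.items() if count > BURST_REQUESTS}
    heavy = {row["src_ip"] for row in rows
             if row.get("body_chars", 0) > LARGE_BODY_CHARS}
    signal = sorted(noisy | heavy)                # which sources look anomalous

    return {"scope": scope, "context": context, "signal": signal}
```

The output is a starting point for an investigation: sources in `signal` are the ones whose full request bodies are worth pulling first.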
Prepare the organization:
- Run AI‑specific incident tabletop exercises.
- Track CISA JCDC efforts toward an AI Security Incident Collaboration Playbook for shared response patterns. [9]
- Align security, engineering, and legal roles before an AI endpoint hijack occurs, because minutes matter once a response is under way. [8][9]
⚠️ Key point: Treat AI endpoints as first‑class production services with threat models, runbooks, and incident drills—not experimental sidecars. [6][9]
Conclusion: Inventory, Lock Down, and Practice the Response
Exposed inference endpoints are low‑friction, high‑reward targets: attackers need only a URL to conscript your models and compute. This is where agentic AI, misconfiguration, and accelerated exploit development converge. [1][5][8]
Concretely, you should:
- Inventory every AI endpoint across environments.
- Lock down exposure and enforce strong authentication.
- Extend logging to capture agent prompts, tools, and trajectories.
- Integrate AI‑focused tabletop exercises into incident response. [1][6][9]
Doing this now is the difference between reading about hijacked AI infrastructure and discovering the infrastructure is yours.
Sources & References (9)
- [1] Attackers Hijack Exposed AI Endpoints to Power Offensive Ops
Zenity researchers documented three distinct attack campaigns between March and May in which threat actors hijacked exposed AI inference endpoints (Ollama and LiteLLM) to power offensive operations — ...
- [2] Attackers Hijack Exposed AI Endpoints to Power Offensive Ops
Attackers Hijack Exposed AI Endpoints to Power Offensive Ops Attackers don't need any special authentication to reach a target endpoint — they just need to know where it is. Source: The Cyber Securi...
- [3] Attackers Seize Exposed AI Endpoints to Power Offensive Ops
Threat actors are trying to leverage organization-owned AI agents to power complex threat activity. Between March and May, Zenity researchers observed three distinct campaigns leveraging its honeypot...
- [4] Attackers Hijack Exposed AI Endpoints to Power Offensive Ops
Attackers Hijack Exposed AI Endpoints to Power Offensive Ops — by Alexander Culafi This article discusses how attackers are exploiting exposed AI endpoints to conduct offensive operations. The piece,...
- [5] AI Agents Are Here. So Are the Threats.
Executive Summary Agentic applications are programs that leverage AI agents — software designed to autonomously collect data and take actions toward specific objectives — to drive their functionality...
- [6] AI systems are now part of everyday work. Investigators need a consistent way to reconstruct what happened within them.
AI interactions generate telemetry across Microsoft Purview, Defender, and Sentinel. That telemetry captures who initiated an interaction, when it occurred, and which resources were involved. It provi...
- [7] How to Test and Evaluate Agentic Systems for Reliability
Agentic systems—autonomous, goal-directed stacks that plan, call tools, observe results, and iterate—are rapidly becoming a core component of modern products. Examples include travel-booking assistant...
- [8] AI is changing the economics of both software development and cyberattacks
AI is changing the economics of both software development and cyberattacks. Organizations are shipping code faster than ever, increasingly with the help of AI agents and tools that generate, modify, a...
- [9] Enhancing AI Security Incident Response Through Collaborative Exercises
I had the privilege of participating in an AI Security Incident tabletop exercise led by the Cybersecurity and Infrastructure Security Agency’s (CISA) Joint Cyber Defense Collaborative (JCDC). This ex...