175,000 open ollama servers and nobody is surprised

if you run docker run -p 11434:11434 ollama/ollama on a machine with a public IP, that service is now globally reachable at port 11434. the docker -p flag binds to 0.0.0.0 by default, not 127.0.0.1. this is documented behavior. nobody reads documentation.

the result is 175,000 Ollama servers accessible from the public internet across 130 countries, as documented by a joint SentinelOne and Censys investigation published in early 2026. approximately 30% of them are in China, just over 20% in the United States, with significant concentrations in Germany, France, South Korea, India, Russia, Singapore, Brazil, and the United Kingdom. nearly half of the exposed hosts have tool-calling capabilities enabled. 201 hosts were running uncensored prompt templates that strip safety guardrails. this is the attack surface. it exists because people followed quickstart guides without understanding networking.

the exploitation chain, explained

Ollama binds to 127.0.0.1:11434 by default. that is the secure configuration. when someone runs it inside Docker with a port mapping, Docker publishes the mapped port on 0.0.0.0, making the service reachable from any interface on the host. if the host has a public IP, the service is public. this is not a bug. this is working as documented. Docker's documentation recommends putting the host address in the mapping (127.0.0.1:11434:11434) when a published port should stay local, but the default behavior overrides that intent without any warning to the user.
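
to make the difference between the two mappings concrete, here is a minimal sketch that reads a docker -p value and reports which host address it ends up on. the function names are made up for illustration, and it only handles the two IPv4 forms discussed above (HOST_PORT:CONTAINER_PORT and HOST_IP:HOST_PORT:CONTAINER_PORT), not every form Docker accepts.

```python
import ipaddress


def published_host_ip(publish_spec: str) -> str:
    """Return the host address a `docker run -p` value binds to.

    Sketch only: supports `HOST_PORT:CONTAINER_PORT` and
    `HOST_IP:HOST_PORT:CONTAINER_PORT` with an IPv4 host address.
    """
    parts = publish_spec.split(":")
    if len(parts) == 3:
        # An explicit host address was given, e.g. 127.0.0.1:11434:11434.
        return parts[0]
    if len(parts) == 2:
        # No host address given: Docker publishes on all interfaces.
        return "0.0.0.0"
    raise ValueError(f"unsupported publish spec for this sketch: {publish_spec!r}")


def is_loopback_only(publish_spec: str) -> bool:
    """True when the published port is reachable only from the host itself."""
    return ipaddress.ip_address(published_host_ip(publish_spec)).is_loopback


if __name__ == "__main__":
    for spec in ("11434:11434", "127.0.0.1:11434:11434"):
        print(spec, "->", published_host_ip(spec), "loopback only:", is_loopback_only(spec))
```

the only difference between the exposed and the local-only deployment is the address prefix in the mapping, which is why it is so easy to miss.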

the security implications compound rapidly once a service is exposed. a text-generation endpoint is a nuisance. a tool-calling endpoint is a different risk class entirely. when an Ollama instance has tool-calling enabled, the model can generate structured output that triggers function calls on the host system. the output format is parsed by the calling application, which then executes the corresponding operations. in an authenticated, network-isolated deployment, this is fine. in a publicly exposed Docker container running as root on a residential machine, this means a stranger on the internet can invoke arbitrary functions on your system through your LLM interface.
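
a rough sketch of what "parsed by the calling application" means in practice, and where an allowlist fits. everything here is hypothetical: the read_weather tool, the ALLOWED_TOOLS registry, and the {"name": ..., "arguments": ...} shape, which is a simplified stand-in rather than Ollama's exact response format.

```python
import json
from typing import Any, Callable, Dict


def read_weather(city: str) -> str:
    """Hypothetical harmless tool; a real one would call some weather service."""
    return f"weather for {city}: unknown (stub)"


# Only functions listed here can ever be invoked by model output.
ALLOWED_TOOLS: Dict[str, Callable[..., Any]] = {"read_weather": read_weather}


def dispatch_tool_call(raw_model_output: str) -> Any:
    """Parse a model-generated tool call and run it only if it is allowlisted."""
    call = json.loads(raw_model_output)
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in ALLOWED_TOOLS:
        # The model (or whoever is prompting it) asked for something we never exposed.
        raise PermissionError(f"tool not allowed: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return ALLOWED_TOOLS[name](**args)


if __name__ == "__main__":
    print(dispatch_tool_call('{"name": "read_weather", "arguments": {"city": "Oslo"}}'))
```

the point of the sketch is that the model never executes anything itself. the risk lives in what the dispatcher is willing to run, and on an unauthenticated endpoint anyone can supply the prompt that produces the call.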

the primary threat is compute theft. attackers scan for exposed Ollama instances, validate that they respond coherently, and then resell access to the GPU cycles at discounted rates through a commercial gateway. the workflow is three steps: scan for exposed Ollama, vLLM, and OpenAI-compatible APIs without authentication; validate the endpoints by measuring response quality; monetize access through a unified LLM API gateway. this is a business. it runs at industrial scale.
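
the first step of that workflow is also the easiest thing to run against your own machine. a minimal sketch, assuming the standard Ollama endpoint GET /api/tags (which lists local models) and a hypothetical helper name. point it only at hosts you own, ideally from a network outside your own.

```python
import http.client
import json
import urllib.error
import urllib.request


def ollama_answers_unauthenticated(base_url: str, timeout: float = 3.0) -> bool:
    """True if GET {base_url}/api/tags returns a model list without any credentials."""
    # Ignore any configured proxy so the probe goes straight to the address given.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"{base_url}/api/tags", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Refused, timed out, rejected with an HTTP error, or not JSON:
        # in all of these cases nothing Ollama-like answered openly.
        return False
    return isinstance(body, dict) and isinstance(body.get("models"), list)


if __name__ == "__main__":
    # Replace with the public address of a machine you own.
    print(ollama_answers_unauthenticated("http://127.0.0.1:11434"))
```

if this returns True when run from outside your network against your public address, you are one of the hosts that scanners will find.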

vllm and the jpeg2000 heap overflow

separate from the Ollama exposure, vLLM has a critical RCE vulnerability tracked as CVE-2026-22778, affecting versions 0.8.3 through 0.14.0. vLLM is a high-throughput LLM serving engine with over 3 million monthly PyPI downloads. the vulnerability allows arbitrary command execution through a malicious video URL submitted to the API.

the attack chain has two stages. first, an information leak in PIL (the Python Imaging Library) bypasses ASLR. when an invalid image is submitted to a multimodal vLLM endpoint, PIL generates an error message that exposes a heap memory address. the leaked address is positioned before libc in memory, allowing an attacker to map the memory layout and defeat address randomization. second, vLLM uses OpenCV, which bundles FFmpeg 5.1.x, to decode video input. the FFmpeg JPEG2000 decoder has a heap overflow triggered by a malformed channel definition box. the decoder trusts the cdef box without validating buffer sizes, allowing an attacker to write large Y-channel data into a smaller U-channel buffer, overflowing into adjacent heap memory and overwriting function pointers to redirect execution to libc functions like system().

the fix is vLLM 0.14.1, which updated OpenCV to a patched release and sanitized PIL error messages to prevent heap address leakage. organizations running vLLM with video-capable models in production on exposed infrastructure should update immediately. organizations that cannot patch should disable video model features in production environments.
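
a small sketch for checking an installed version against the affected range quoted above (0.8.3 through 0.14.0, fixed in 0.14.1). the helper names are made up, and the parser deliberately looks only at the leading major.minor.patch digits, so suffixes such as .post1 or rc tags are ignored.

```python
import re
from typing import Tuple

FIRST_AFFECTED = (0, 8, 3)   # first affected release, per the advisory range above
FIRST_FIXED = (0, 14, 1)     # first release containing the fix


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from the start of a version string."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version.strip())
    if not match:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """True when the version falls in the range 0.8.3 <= v < 0.14.1."""
    return FIRST_AFFECTED <= parse_release(version) < FIRST_FIXED


if __name__ == "__main__":
    for candidate in ("0.8.2", "0.8.3", "0.14.0", "0.14.1"):
        print(candidate, "affected:", is_affected(candidate))
```

on a live host the version string could come from importlib.metadata.version("vllm") in the same environment that runs the server.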

the pattern is older than you think

this class of failure is not new. in 2012, MongoDB instances started appearing on the internet without authentication, searchable via Shodan, hosting databases full of customer records. the default configuration bound to 0.0.0.0. the same happened with Redis, with Cassandra, with CouchDB, with every database that prioritized developer ergonomics over secure defaults. the security community called it out repeatedly. deployment guides eventually added authentication setup steps. the databases that shipped with authentication disabled by default stopped doing it, mostly. the lesson seemed learned.

it was not learned. local AI deployment tooling in 2026 ships with the same defaults that MongoDB shipped with in 2012. the difference is that an exposed Ollama instance with tool-calling can execute code on your network. a compromised MongoDB let attackers read your data. an instance exposed to LLMjacking lets attackers run inference on your hardware, use your GPU for cryptocurrency mining, and pivot through your network if the deployment is on enterprise infrastructure. the blast radius is wider and the tooling for exploitation is already packaged and sold as a service.

the IPv6 problem compounds this. IPv4 addresses behind NAT are not globally routable by default. an unconfigured home router does not forward ports to internal machines. IPv6 addresses are frequently globally routable. a machine that would be unreachable behind a NAT on IPv4 is directly addressable on IPv6. the same Docker port mapping that stays confined to the local network behind IPv4 NAT can expose the service to the entire internet over IPv6. users who understand this are in the minority. users who understand that their ISP may have deployed IPv6 and that their machine is now reachable from outside their network are rarer still.
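
python's standard ipaddress module can tell you which of a machine's addresses fall in globally routable ranges. a minimal sketch with a hypothetical helper name; note that it only classifies the address range, it says nothing about whether a firewall sits in front of the machine.

```python
import ipaddress
from typing import Dict, Iterable


def is_globally_routable(address: str) -> bool:
    """True when the address lies in a range that is routable on the public internet."""
    return ipaddress.ip_address(address).is_global


def classify(addresses: Iterable[str]) -> Dict[str, bool]:
    """Map each address to whether it is globally routable."""
    return {addr: is_globally_routable(addr) for addr in addresses}


if __name__ == "__main__":
    sample = [
        "192.168.1.20",          # private IPv4, typical behind home NAT
        "fe80::1",               # IPv6 link-local
        "fd12:3456:789a::1",     # IPv6 unique local address
        "2606:4700:4700::1111",  # a public IPv6 address
    ]
    for addr, routable in classify(sample).items():
        print(addr, "globally routable:", routable)
```

if any address on the interface that serves port 11434 comes back as globally routable, the NAT assumption does not protect that service.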

what enterprises are doing wrong

the enterprise response to local AI has been to deploy it at the edge without applying edge security controls. DGX Spark units, Mac Studio deployments, AMD Strix Halo workstations, and headless GPU servers are appearing on corporate networks with Ollama or vLLM installed by developers who wanted to test a model against their codebase. the deployment has no API authentication, no network segmentation, no monitoring for unauthorized inference requests, and in many cases, no firewall rule restricting access to the local subnet. the machines are on the same network as production systems. if one of those machines has a tool-calling capable model and an exposed API, the attack surface is the corporate network.

treating LLMs as critical infrastructure is the starting point. API endpoints need authentication even when the LLM is behind a corporate firewall. network controls need to assume that any machine on the network is potentially reachable from the internet via IPv6 or WiFi. monitoring needs to distinguish between authorized inference requests and the patterns that characterize LLMjacking, which involves many rapid requests from diverse IPs testing model quality. and the development tooling that makes local AI deployment frictionless, like Ollama and vLLM, needs to be treated with the same suspicion applied to any service that defaults to unauthenticated network access.
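
the monitoring point can be sketched as a sliding-window check over an access log: flag the endpoint when one window contains both a lot of requests and a lot of distinct client addresses. the Request record, the function name, and every threshold below are assumptions for illustration, not tuned values.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable


@dataclass(frozen=True)
class Request:
    timestamp: float  # seconds, e.g. a Unix timestamp from the access log
    client_ip: str


def looks_like_probing(
    requests: Iterable[Request],
    window_seconds: float = 60.0,
    max_requests: int = 100,
    max_distinct_ips: int = 20,
) -> bool:
    """True if any time window exceeds both the request and distinct-IP thresholds."""
    window: Deque[Request] = deque()
    ip_counts: Dict[str, int] = {}
    for req in sorted(requests, key=lambda r: r.timestamp):
        window.append(req)
        ip_counts[req.client_ip] = ip_counts.get(req.client_ip, 0) + 1
        # Drop requests that have fallen out of the time window.
        while req.timestamp - window[0].timestamp > window_seconds:
            old = window.popleft()
            ip_counts[old.client_ip] -= 1
            if ip_counts[old.client_ip] == 0:
                del ip_counts[old.client_ip]
        if len(window) > max_requests and len(ip_counts) > max_distinct_ips:
            return True
    return False


if __name__ == "__main__":
    burst = [Request(i * 0.1, f"203.0.113.{i % 50}") for i in range(200)]
    print("burst from many IPs flagged:", looks_like_probing(burst))
```

a single busy internal client trips neither condition on its own, which is the distinction the paragraph above is asking monitoring to make.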

the open source AI ecosystem has been extraordinary in lowering the barrier to running powerful models locally. the security practices that should accompany that ecosystem have not kept pace. 175,000 exposed servers is the result.

sources: