Running Ollama in Docker Compose with GPU and Persistent Model Storage
This article explains how to set up a reproducible local or single-node Ollama server using Docker Compose, with GPU acceleration and persistent model storage.
Why it matters
Pinning the Ollama version, persisting model storage, and declaring GPU access in a single Compose file makes the server reproducible across machines and easier to upgrade, extend with sidecars, or hand off to a team.
Key Points
- Docker Compose provides benefits over a bare-metal Ollama installation when you have a team setup or want to manage upgrades and sidecars easily.
- The article provides a Compose file that pins the Ollama image version, mounts a volume for persistent model storage, and exposes configuration options.
- GPU acceleration can be added to the Compose setup on hosts with NVIDIA GPUs.
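On hosts with an NVIDIA GPU and the NVIDIA Container Toolkit installed, GPU access is typically granted through a Compose device reservation. The stanza below is a generic Compose sketch of that mechanism, not the article's exact file:

```yaml
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all          # or a specific number of GPUs, e.g. 1
              capabilities: [gpu]
```

Without this reservation the container starts fine but Ollama falls back to CPU inference.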
Details
The article starts by discussing the advantages of using Docker Compose to manage an Ollama server: version pinning, persistent storage, and the ability to add sidecars such as a web UI, reverse proxy, or auth gateway. It then provides a sample Compose file that bakes in these decisions, letting you control the Ollama image tag, bind IP, and various service tuning parameters through environment variables. The Compose file also mounts a volume for persistent model storage, so models do not have to be re-downloaded when the container is recreated. Finally, the article notes that GPU acceleration can be added to the setup on hosts with NVIDIA GPUs.
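A minimal Compose file along the lines described above might look like the following. This is an illustrative sketch, not the article's exact file: the pinned tag, the environment-variable names used for defaults, and the volume name are assumptions (though `OLLAMA_KEEP_ALIVE`, `OLLAMA_NUM_PARALLEL`, port 11434, and the `/root/.ollama` model directory are Ollama's documented defaults):

```yaml
# compose.yaml -- illustrative sketch; image tag and env defaults are assumptions
services:
  ollama:
    image: ollama/ollama:0.5.7            # pin a known tag instead of :latest
    restart: unless-stopped
    ports:
      - "${OLLAMA_BIND_IP:-127.0.0.1}:11434:11434"  # bind IP configurable via env
    environment:
      - OLLAMA_KEEP_ALIVE=${OLLAMA_KEEP_ALIVE:-5m}  # how long models stay loaded
      - OLLAMA_NUM_PARALLEL=${OLLAMA_NUM_PARALLEL:-1}
    volumes:
      - ollama-models:/root/.ollama       # persistent model storage

volumes:
  ollama-models:
```

Bring the service up with `docker compose up -d`, then pull a model inside the container, for example `docker compose exec ollama ollama pull llama3.2` (the model name here is just an example). Because the weights land in the named volume, they survive `docker compose down` followed by `up`; only `down -v` would delete them.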