# Ollama GPU Acceleration (Optional)

Enable this only if your host has an NVIDIA GPU and you need faster local inference.
## Prerequisites

- NVIDIA GPU with an up-to-date driver
- NVIDIA Container Toolkit installed
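Before installing anything, it is worth confirming the driver prerequisite on the host. A minimal pre-flight sketch (the `--query-gpu` fields are standard `nvidia-smi` options; the helper function name is ours):

```shell
#!/bin/sh
# Pre-flight: is the NVIDIA driver installed on the host?
have_nvidia_driver() {
  command -v nvidia-smi >/dev/null 2>&1
}

if have_nvidia_driver; then
  # Print GPU model and driver version, one line per GPU.
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
else
  echo "nvidia-smi not found: install the NVIDIA driver first" >&2
fi
```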
## Install the Toolkit (Ubuntu)

```bash
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```
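Note that `apt-key` is deprecated on recent Ubuntu releases, and newer toolkit versions expect the runtime to be registered with Docker explicitly; check the NVIDIA Container Toolkit documentation for your version. A guarded sketch of that extra step:

```shell
#!/bin/sh
# Register the NVIDIA runtime with Docker. nvidia-ctk ships with the
# Container Toolkit; the guard skips the step where it is not installed.
if command -v nvidia-ctk >/dev/null 2>&1; then
  sudo nvidia-ctk runtime configure --runtime=docker
  sudo systemctl restart docker
else
  echo "nvidia-ctk not installed; skipping runtime registration" >&2
fi
```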
## Validate GPU Access from Docker

```bash
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```

Any current CUDA base image works here; the old `nvidia/cuda:11.0-base` tag has been removed from Docker Hub. If `nvidia-smi` prints your GPU inside the container, the runtime is wired up correctly.
## Enable GPU in the Dev Compose File

Edit `infrastructure/docker/environments/dev/docker-compose-dev.yaml` and add the GPU reservation block under the `ollama` service:
```yaml
ollama:
  image: ollama/ollama:latest
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
```
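Indentation mistakes in this block fail silently: compose starts the service, but on CPU. A quick pre-flight sketch that scans the file for the reservation (a naive line scan, not a full YAML parse; the path is this project's dev compose file):

```python
#!/usr/bin/env python3
"""Sanity-check that a compose file declares the NVIDIA GPU reservation."""
from pathlib import Path


def has_gpu_reservation(compose_text: str) -> bool:
    """True if the compose text requests an nvidia device with gpu capability."""
    lines = [line.strip() for line in compose_text.splitlines()]
    return (
        "- driver: nvidia" in lines
        and any(line.startswith("capabilities:") and "gpu" in line for line in lines)
    )


if __name__ == "__main__":
    path = Path("infrastructure/docker/environments/dev/docker-compose-dev.yaml")
    if path.exists():
        ok = has_gpu_reservation(path.read_text())
        print("GPU reservation found" if ok else "GPU reservation missing")
    else:
        print(f"compose file not found at {path}")
```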
## Restart and Verify

```bash
make dev-down
make dev
docker exec ollama-dev nvidia-smi
docker logs ollama-dev | grep -i cuda
```
If the startup logs mention CUDA and `nvidia-smi` lists your GPU inside the container, GPU acceleration is active; otherwise Ollama falls back to CPU inference.
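The log check above can be wrapped into a small pass/fail script. A sketch (the `ollama-dev` container name comes from the dev compose file; the grep is a heuristic, since the exact log wording varies between Ollama versions):

```shell
#!/bin/sh
# Report whether the Ollama dev container detected CUDA at startup.

check_cuda_logs() {
  # 0 if the given log text mentions CUDA, case-insensitively.
  printf '%s\n' "$1" | grep -qi cuda
}

logs=$(docker logs ollama-dev 2>&1)
if check_cuda_logs "$logs"; then
  echo "GPU acceleration active"
else
  echo "CUDA not detected; Ollama is running on CPU"
fi
```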