# NVIDIA A6000 Server (48GB VRAM)

This guide covers working with LLMs and GPU resources on the A6000 server.

## Server Overview
| Specification | Details |
|---|---|
| GPU | NVIDIA A6000 |
| VRAM | 48 GB |
| Container | ollama-a6000 |
| Port | 5000 |
## GPU Commands

### Check GPU Status

View GPU utilization and running processes:

```bash
nvidia-smi
```

For continuous monitoring:

```bash
watch -n 1 nvidia-smi
```
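If you only need a few figures rather than the full `nvidia-smi` dashboard, its query mode prints them as CSV; a minimal sketch using standard `nvidia-smi` flags:

```bash
# Print GPU utilization and memory once per second, in CSV form
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,memory.total \
  --format=csv -l 1
```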
## Ollama Setup

We use Ollama to run LLM models. The configuration script is located at:

```
/usr/local/bin/ollama-multi-gpu.sh
```
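The script itself isn't reproduced here, but a typical GPU-enabled Ollama container launch looks like the sketch below. This is an assumption, not the script's actual contents: the image, volume name, and port mapping are illustrative, with host port 5000 taken from the table above and 11434 being Ollama's default in-container port.

```bash
# Hypothetical sketch only -- not the actual ollama-multi-gpu.sh.
# --gpus all exposes the A6000 to the container; -p maps host port
# 5000 to Ollama's default port 11434; the volume persists models.
docker run -d \
  --name ollama-a6000 \
  --gpus all \
  -p 5000:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama
```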
## Docker Commands

### View Running Containers

```bash
docker ps
```

### List Available Models

```bash
docker exec ollama-a6000 ollama list
```

### Check GPU Usage by Container

Verify that the Ollama container is utilizing the GPU:

```bash
docker exec ollama-a6000 nvidia-smi
```
## Managing Models

### Pull a New Model

Download models from the Ollama Model Library:

```bash
docker exec ollama-a6000 ollama pull <model_name>
```

Examples:

```bash
# Pull Llama 3.2
docker exec ollama-a6000 ollama pull llama3.2

# Pull Mistral
docker exec ollama-a6000 ollama pull mistral

# Pull CodeLlama
docker exec ollama-a6000 ollama pull codellama
```
### Remove a Model

```bash
docker exec ollama-a6000 ollama rm <model_name>
```
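For example, to reclaim disk space from a model you no longer need (this deletes only the local copy; it can always be pulled again):

```bash
# See what is installed, then remove a specific model
docker exec ollama-a6000 ollama list
docker exec ollama-a6000 ollama rm mistral
```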
## Popular Models

| Model | Description | Sizes |
|---|---|---|
| llama3.2 | Meta's Llama 3.2 | 3B, 11B, 90B |
| mistral | Mistral 7B | 7B |
| codellama | Code-focused Llama | 7B, 13B, 34B |
| phi3 | Microsoft Phi-3 | 3.8B |
| gemma2 | Google Gemma 2 | 9B, 27B |

Browse all models: https://ollama.com/library
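Models with multiple sizes are selected with a tag in Ollama's `model:size` form, for example:

```bash
# Pull a specific size variant rather than the default tag
docker exec ollama-a6000 ollama pull llama3.2:3b
```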
## Accessing from Local Machine

### Step 1: Create SSH Tunnel

```bash
ssh -L 5000:localhost:5000 test_user@<server_ip>
```
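To keep the tunnel running in the background without opening a remote shell, the standard `-f` (go to background) and `-N` (no remote command) flags can be added:

```bash
# Same port forward, but detached from the terminal
ssh -fN -L 5000:localhost:5000 test_user@<server_ip>
```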
### Step 2: Test the Connection

```bash
curl http://localhost:5000/api/tags
```
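The endpoint returns a JSON object whose `models` array lists every installed model; if `jq` is installed on your local machine (an optional convenience, not required by this setup), the names can be extracted directly:

```bash
# Print just the installed model names
curl -s http://localhost:5000/api/tags | jq -r '.models[].name'
```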
### Step 3: Use the API

Generate a response:

```bash
curl http://localhost:5000/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, how are you?"
}'
```
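By default, `/api/generate` streams the answer as a sequence of JSON lines. For scripting, Ollama's `stream` parameter can be set to `false` to get a single JSON object instead:

```bash
# Request one complete JSON response instead of a stream
curl -s http://localhost:5000/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, how are you?",
  "stream": false
}'
```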
## Troubleshooting

### Container Not Running

```bash
# Check all containers, including stopped ones
docker ps -a

# Start the container
docker start ollama-a6000
```
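If the container starts and then immediately exits, its logs usually explain why:

```bash
# Show the most recent log output from the container
docker logs --tail 50 ollama-a6000
```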
### Model Not Found

```bash
# List available models
docker exec ollama-a6000 ollama list

# Pull the model
docker exec ollama-a6000 ollama pull <model_name>
```
### Out of Memory (OOM)

- Check GPU usage with `nvidia-smi` (see the per-process query after this list)
- Use a smaller model variant (e.g., `llama3.2:3b` instead of `llama3.2:90b`)
- Wait for other processes to complete
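To see exactly which processes are holding GPU memory, `nvidia-smi` can query compute processes directly:

```bash
# Show PID, name, and GPU memory for each compute process
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```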
### Connection Refused

- Ensure the SSH tunnel is active
- Verify the correct port mapping with `docker ps`
- Check that the container is running (see the checks below)
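A quick check of both ends (the `ss` command assumes a Linux local machine; on macOS, `lsof -i :5000` does the same job):

```bash
# On the local machine: is anything listening on the forwarded port?
ss -tlnp | grep 5000

# On the server: is the container up, and which ports does it publish?
docker ps --filter name=ollama-a6000
```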