# NVIDIA A6000 Server (48GB VRAM)

This guide covers working with LLMs and GPU resources on the A6000 server.

## Server Overview
| Specification | Details |
|---|---|
| GPU | NVIDIA A6000 |
| VRAM | 48 GB |
| Container | ollama-a6000 |
| Port | 5000 |
## GPU Commands

### Check GPU Status

View GPU utilization and running processes:

```bash
nvidia-smi
```

For continuous monitoring:

```bash
watch -n 1 nvidia-smi
```
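If you only need a few figures rather than the full `nvidia-smi` dashboard, its query mode prints them as CSV; a minimal sketch using standard `nvidia-smi` flags:

```bash
# Print GPU utilization and memory once per second, in CSV form
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,memory.total \
  --format=csv -l 1
```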
## Ollama Setup

We use Ollama to run LLM models. The configuration script is located at:

```
/usr/local/bin/ollama-multi-gpu.sh
```
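The script itself isn't reproduced here, but a typical GPU-enabled Ollama container launch looks like the sketch below. This is an assumption, not the script's actual contents: the image, volume name, and port mapping are illustrative, with host port 5000 taken from the table above and 11434 being Ollama's default in-container port.

```bash
# Hypothetical sketch only -- not the actual ollama-multi-gpu.sh.
# --gpus all exposes the A6000 to the container; -p maps host port
# 5000 to Ollama's default port 11434; the volume persists models.
docker run -d \
  --name ollama-a6000 \
  --gpus all \
  -p 5000:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama
```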
## Docker Commands

### View Running Containers

```bash
docker ps
```

### List Available Models

```bash
docker exec ollama-a6000 ollama list
```

### Check GPU Usage by Container

Verify that the Ollama container is utilizing the GPU:

```bash
docker exec ollama-a6000 nvidia-smi
```
## Managing Models

### Pull a New Model

Download models from the Ollama Model Library:

```bash
docker exec ollama-a6000 ollama pull <model_name>
```

Examples:

```bash
# Pull Llama 3.2
docker exec ollama-a6000 ollama pull llama3.2

# Pull Mistral
docker exec ollama-a6000 ollama pull mistral

# Pull CodeLlama
docker exec ollama-a6000 ollama pull codellama
```
### Remove a Model

```bash
docker exec ollama-a6000 ollama rm <model_name>
```
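For example, to reclaim disk space from a model you no longer need (this deletes only the local copy; it can always be pulled again):

```bash
# See what is installed, then remove a specific model
docker exec ollama-a6000 ollama list
docker exec ollama-a6000 ollama rm mistral
```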
## Popular Models

| Model | Description | Sizes |
|---|---|---|
| llama3.2 | Meta's Llama 3.2 | 3B, 11B, 90B |
| mistral | Mistral 7B | 7B |
| codellama | Code-focused Llama | 7B, 13B, 34B |
| phi3 | Microsoft Phi-3 | 3.8B |
| gemma2 | Google Gemma 2 | 9B, 27B |

Browse all models: https://ollama.com/library
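Models with multiple sizes are selected with a tag in Ollama's `model:size` form, for example:

```bash
# Pull a specific size variant rather than the default tag
docker exec ollama-a6000 ollama pull llama3.2:3b
```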
## Accessing from Local Machine

### Step 1: Create SSH Tunnel

```bash
ssh -L 5000:localhost:5000 test_user@<server_ip>
```
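To keep the tunnel running in the background without opening a remote shell, the standard `-f` (go to background) and `-N` (no remote command) flags can be added:

```bash
# Same port forward, but detached from the terminal
ssh -fN -L 5000:localhost:5000 test_user@<server_ip>
```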
### Step 2: Test the Connection

```bash
curl http://localhost:5000/api/tags
```
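The endpoint returns a JSON object whose `models` array lists every installed model; if `jq` is installed on your local machine (an optional convenience, not required by this setup), the names can be extracted directly:

```bash
# Print just the installed model names
curl -s http://localhost:5000/api/tags | jq -r '.models[].name'
```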
### Step 3: Use the API

Generate a response:

```bash
curl http://localhost:5000/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, how are you?"
}'
```
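By default, `/api/generate` streams the answer as a sequence of JSON lines. For scripting, Ollama's `stream` parameter can be set to `false` to get a single JSON object instead:

```bash
# Request one complete JSON response instead of a stream
curl -s http://localhost:5000/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, how are you?",
  "stream": false
}'
```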
## Troubleshooting

### Container Not Running

```bash
# Check all containers, including stopped ones
docker ps -a

# Start the container
docker start ollama-a6000
```
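If the container starts and then immediately exits, its logs usually explain why:

```bash
# Show the most recent log output from the container
docker logs --tail 50 ollama-a6000
```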
### Model Not Found

```bash
# List available models
docker exec ollama-a6000 ollama list

# Pull the model
docker exec ollama-a6000 ollama pull <model_name>
```
### Out of Memory (OOM)

- Check GPU usage with `nvidia-smi` (see the per-process query after this list)
- Use a smaller model variant (e.g., `llama3.2:3b` instead of `llama3.2:90b`)
- Wait for other processes to complete
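To see exactly which processes are holding GPU memory, `nvidia-smi` can query compute processes directly:

```bash
# Show PID, name, and GPU memory for each compute process
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```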
### Connection Refused

- Ensure the SSH tunnel is active
- Verify the correct port mapping with `docker ps`
- Check that the container is running (see the checks below)
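A quick check of both ends (the `ss` command assumes a Linux local machine; on macOS, `lsof -i :5000` does the same job):

```bash
# On the local machine: is anything listening on the forwarded port?
ss -tlnp | grep 5000

# On the server: is the container up, and which ports does it publish?
docker ps --filter name=ollama-a6000
```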