
NVIDIA A6000 Server (48GB VRAM)

This guide covers working with LLMs and GPU resources on the A6000 server.

Server Overview

Specification   Details
GPU             NVIDIA A6000
VRAM            48 GB
Container       ollama-a6000
Port            5000

GPU Commands

Check GPU Status

View GPU utilization and running processes:

nvidia-smi

For continuous monitoring:

watch -n 1 nvidia-smi
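
For a more compact readout, nvidia-smi can report selected fields as CSV using its standard query flags; for example:

# GPU utilization and memory as CSV, refreshed every second
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,memory.total --format=csv -l 1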

Ollama Setup

We use Ollama to run LLMs. The multi-GPU launch script is located at:

/usr/local/bin/ollama-multi-gpu.sh
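
The exact contents of that script are deployment-specific. As a rough sketch only (the device index, volume name, and image tag below are assumptions, not the actual script), a multi-GPU setup typically starts one Ollama container per GPU, pinned to a device and mapped to its own host port:

#!/bin/bash
# Hypothetical sketch: run an Ollama container pinned to GPU 0,
# mapping host port 5000 to Ollama's default port 11434.
docker run -d \
  --name ollama-a6000 \
  --gpus '"device=0"' \
  -p 5000:11434 \
  -v ollama-a6000:/root/.ollama \
  ollama/ollama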

Docker Commands

View Running Containers

docker ps
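
To narrow the output to this container and its port mapping:

docker ps --filter "name=ollama-a6000" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"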

List Available Models

docker exec ollama-a6000 ollama list
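
To inspect a single model's details (parameters, context length, prompt template), ollama show works the same way; llama3.2 below is just an example name:

docker exec ollama-a6000 ollama show llama3.2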

Check GPU Usage by Container

Verify that the Ollama container has access to the GPU:

docker exec ollama-a6000 nvidia-smi
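
Recent Ollama versions can also report which loaded models are running on the GPU (assuming your build includes the ps subcommand):

docker exec ollama-a6000 ollama ps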

Managing Models

Pull a New Model

Download models from the Ollama Model Library:

docker exec ollama-a6000 ollama pull <model_name>

Examples:

# Pull Llama 3.2
docker exec ollama-a6000 ollama pull llama3.2

# Pull Mistral
docker exec ollama-a6000 ollama pull mistral

# Pull CodeLlama
docker exec ollama-a6000 ollama pull codellama
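
Specific sizes and quantizations are selected with a tag suffix; available tags vary per model, so check the library page first. For example:

# Pull the 3B variant of Llama 3.2
docker exec ollama-a6000 ollama pull llama3.2:3b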

Remove a Model

docker exec ollama-a6000 ollama rm <model_name>

Popular Models

Model       Description          Sizes
llama3.2    Meta's Llama 3.2     3B, 11B, 90B
mistral     Mistral 7B           7B
codellama   Code-focused Llama   7B, 13B, 34B
phi3        Microsoft Phi-3      3.8B
gemma2      Google Gemma 2       9B, 27B

Browse all models: https://ollama.com/library


Accessing from Local Machine

Step 1: Create SSH Tunnel

ssh -L 5000:localhost:5000 test_user@<server_ip>
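
To keep the tunnel open in the background without an interactive shell, the standard -f and -N flags can be added:

# -f backgrounds ssh after auth; -N skips running a remote command
ssh -f -N -L 5000:localhost:5000 test_user@<server_ip>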

Step 2: Test the Connection

curl http://localhost:5000/api/tags
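
The endpoint returns a JSON object with a models array. If jq is installed locally, the installed model names can be pulled out directly:

curl -s http://localhost:5000/api/tags | jq '.models[].name'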

Step 3: Use the API

Generate a response:

curl http://localhost:5000/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, how are you?"
}'
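
By default /api/generate streams the response as newline-delimited JSON chunks. To get one complete JSON object instead, set the documented stream parameter to false:

curl http://localhost:5000/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, how are you?",
  "stream": false
}'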

Troubleshooting

Container Not Running

# Check all containers
docker ps -a

# Start the container
docker start ollama-a6000
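
If the container keeps stopping, its logs usually say why:

# Show the last 50 log lines
docker logs --tail 50 ollama-a6000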

Model Not Found

# List available models
docker exec ollama-a6000 ollama list

# Pull the model
docker exec ollama-a6000 ollama pull <model_name>

Out of Memory (OOM)

  1. Check current GPU usage: nvidia-smi
  2. Use a smaller model variant (e.g., llama3.2:3b instead of llama3.2:70b)
  3. Wait for other processes to complete, or unload idle models as shown below
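
If an idle model is still holding VRAM, it can be unloaded explicitly (assuming an Ollama version with the stop subcommand; llama3.2 is an example name):

# See which models are loaded and where they run
docker exec ollama-a6000 ollama ps

# Unload a model to free its VRAM
docker exec ollama-a6000 ollama stop llama3.2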

Connection Refused

  • Ensure the SSH tunnel is active
  • Verify the correct port mapping with docker ps
  • Check that the container is running (see the checks below)
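
Each of these can be checked from the command line (ss is Linux-specific; lsof covers macOS):

# Is anything listening on the tunneled port locally?
ss -ltn | grep 5000    # Linux; use lsof -i :5000 on macOS

# Is the container up, and which host port is mapped?
docker ps --filter "name=ollama-a6000"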