Optimize GPU costs for machine learning with just-in-time Kubernetes scaling
Learn how to implement just-in-time GPU provisioning on Civo Kubernetes to dynamically scale resources, enforce budget and utilization guardrails, and receive alerts for efficient ML training cost optimization.
Written by
Software Engineer @ GoCardless
Machine learning projects face an escalating challenge due to the increasing costs of GPU infrastructure. Training modern deep learning models requires substantial computational resources, yet many teams find expensive GPU nodes sitting idle between training runs. This inefficiency turns model iteration into an ongoing budget problem rather than a pure research or product problem.
Just-in-time GPU provisioning solves this by creating capacity only when workloads demand it. Instead of maintaining always-on GPU clusters, teams dynamically scale GPU resources, ensuring accelerators are available for training and inference while avoiding idle-hour charges.
This tutorial walks through building a lightweight, on-demand GPU scheduler that also posts notifications about cost and training events. Specifically, the system will:
- Monitor Kubernetes pods that request GPUs.
- Scale GPU node pools using Civo CLI (or REST API in production).
- Enforce simple utilization and budget guardrails.
- Post Slack/Telegram alerts for scaling and budget events (simulated in this tutorial).
Architecture overview
The solution consists of several components working together:
- Civo Kubernetes cluster: Base CPU-only cluster with dynamic GPU node pools
- Watchdog controller: Monitors workload patterns and triggers scaling decisions
- Cost monitor: Tracks spending against budgets using Civo's billing API
- Alert System: Provides real-time notifications about scaling events and cost thresholds
- Policy engine: Enforces business rules around utilization and spending limits
The resulting architecture combines node labels/taints, a small watchdog script, and lightweight alerts to make GPU spending observable and predictable.
Prerequisites
Keep credentials out of source code by using Kubernetes Secrets. Assign minimal RBAC permissions to any in-cluster automation to follow least-privilege principles.
Step 1: Create a GPU cluster on Civo
Instead of spinning up a separate GPU-only cluster, add a GPU node pool to the existing CPU cluster. This approach keeps workloads organized: CPU nodes handle system tasks and light jobs, while GPU nodes are reserved for ML training and inference. It also simplifies networking, namespaces, and monitoring, since everything stays under one Kubernetes control plane.
There are two approaches to doing this:
- Civo Dashboard:
- Go to Civo Dashboard → Kubernetes → Create Cluster
- Choose a standard CPU size (e.g., g4s.kube.small).
- Set initial node count = 1 to avoid accidental charges during testing.
- Go to the cluster → Create New Pool.
- Select a GPU-optimized size (e.g., g4g.kube.small).
- Set initial node count = 1.
- Civo CLI:
- Create or scale CPU and GPU pools programmatically.
- Example CLI commands:
# Step 1: Create a CPU-only cluster
civo kubernetes create CLUSTER_NAME --size=g4s.kube.small --nodes=1

# Step 2: Add a GPU pool (keep the count minimal to limit idle costs)
civo kubernetes node-pool create CLUSTER_NAME --size=g4g.kube.small --count=1

# Step 3: Scale the GPU pool when training starts
civo kubernetes node-pool scale CLUSTER_NAME NODEPOOL_ID -n NUMBER
The advantage of Civo’s GPU nodes
Unlike other providers, Civo pre-installs NVIDIA drivers, CUDA toolkit, and container runtime on GPU nodes. This eliminates the common "bootstrap time" problem where teams wait 10-15 minutes for driver installation after node creation.
After creating the pool, confirm that nodes join the cluster and that the NVIDIA device plugin or vendor operator is running. GPU nodes require drivers and device plugins to expose nvidia.com/gpu resources in Kubernetes. Here’s how to do it:
kubectl describe node <gpu-node-name> | grep nvidia.com/gpu

Step 2: Tag GPU workloads
Labels and taints prevent CPU workloads from being accidentally scheduled on GPU nodes and give the watchdog a reliable way to detect GPU work. Inspect node labels and taints:
kubectl describe node <gpu-node-name>
- Look for labels like node.kubernetes.io/instance-type, nodepool=gpu, or custom tags.
- Note taints like gpu=true:NoSchedule, which prevent generic pods from landing there.

Apply Labels, Taints, and Affinity:
If missing, add labels via:
kubectl label nodes <gpu-node-name> gpu=on-demand
kubectl taint nodes <gpu-node-name> gpu=true:NoSchedule
If nodes cannot be labelled, skip this step and rely on nodeAffinity in the job spec (see the working example below).
Example Job Spec (gpu-test.yaml):
# gpu-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
  labels:
    gpu: on-demand
spec:
  restartPolicy: OnFailure
  containers:
    - name: fake-gpu
      image: busybox
      command: ["sh", "-c", "echo 'Simulating GPU work'; sleep 60"]
      resources:
        limits:
          nvidia.com/gpu: 1
  tolerations:
    - key: "gpu"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: gpu
                operator: In
                values:
                  - on-demand
Pro tip: Consistent labeling (gpu=on-demand) ensures the watchdog can detect GPU jobs reliably across different teams and projects. Consider implementing a labeling policy that includes cost centers and project codes for better financial tracking.
Step 3: Write watchdog script
Watchdog is a lightweight Python service that continuously monitors GPU job activity in your cluster, triggers node-pool scaling when needed, and posts notifications for team visibility. Think of it as the “eyes and hands” of the cost-aware GPU scheduler.
Design Principles:
- Monitor pods, not jobs: Pod-level inspection (status: Pending/Running) is more reliable.
- Secure authentication: Read the Civo API key from a Kubernetes Secret.
- Least-privilege RBAC: ServiceAccount with read-only access to pods/jobs.
- Backoff & cooldown: Avoid thrashing with exponential backoff or cooldown between scale actions.
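The backoff and cooldown principle from the list above can be sketched as a small helper. The class name and parameter values here are illustrative choices, not part of any library:

```python
import time

class Cooldown:
    """Simple cooldown gate with exponential backoff on repeated failures."""

    def __init__(self, base=10, factor=2, max_wait=300):
        self.base = base          # initial cooldown window in seconds
        self.factor = factor      # multiplier applied after each failed action
        self.max_wait = max_wait  # upper bound on the cooldown window
        self.wait = base
        self.last_action = 0.0

    def ready(self):
        # True when enough time has passed since the last scale action
        return (time.time() - self.last_action) >= self.wait

    def record_success(self):
        # Reset the window after a successful scale action
        self.last_action = time.time()
        self.wait = self.base

    def record_failure(self):
        # Back off exponentially after a failed scale action
        self.last_action = time.time()
        self.wait = min(self.wait * self.factor, self.max_wait)
```

The watchdog would call `ready()` before attempting a scale action, then `record_success()` or `record_failure()` based on the outcome, which keeps repeated API errors from hammering the provider.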
Folder structure:
civo-gpu-scheduler/
├── gpu-test.yaml       # GPU test pod manifest
├── watchdog_local.py   # Watchdog script for local simulation
├── watchdog.py         # Production-ready script (with CLI/API calls)
└── README.md
Example watchdog (Python, CLI-based for simplicity):
Import dependencies and configure logging:
import time, logging

logging.basicConfig(level=logging.INFO)
Define state management variables:
scaled_jobs = set()
COOLDOWN = 10
LAST_ACTION = 0
Function to detect pending GPU jobs:
def pending_gpu_jobs():
    return ["gpu-test"]  # Replace with a real pod query
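One way to replace that stub with a real pod query, staying consistent with this tutorial's CLI-based approach, is to shell out to kubectl and parse its JSON output. The label and field selectors match the gpu=on-demand convention from Step 2:

```python
import json
import subprocess

def pending_gpu_jobs(namespace="default"):
    """List Pending pods labeled gpu=on-demand via kubectl.

    Returns an empty list if kubectl is missing or the cluster is
    unreachable, so the watchdog degrades gracefully.
    """
    try:
        out = subprocess.run(
            ["kubectl", "get", "pods", "-n", namespace,
             "-l", "gpu=on-demand",
             "--field-selector=status.phase=Pending",
             "-o", "json"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return []
    return [item["metadata"]["name"]
            for item in json.loads(out).get("items", [])]
```

In production, the official Kubernetes Python client (with in-cluster config and the ServiceAccount from the RBAC section) would be a more robust alternative to shelling out.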
Continuous monitoring with scale-up and scale-down: the main loop checks GPU job status, triggers a scale-up when pending jobs exist and the cooldown has passed, and triggers a scale-down when no jobs remain and the cooldown has passed.
while True:
    pending = pending_gpu_jobs()
    now = time.time()

    # -----------------------------------------------
    # Scale-up logic
    # -----------------------------------------------
    # If GPU jobs are waiting and the cooldown has passed,
    # simulate a scale-up and send alerts.
    if pending and (now - LAST_ACTION) > COOLDOWN:
        for job in pending:
            logging.info(f"Pending GPU pod detected: {job}, simulating scale-up")
            print("Scale-up simulated (Civo API/CLI call would go here)")
            print(f"Slack alert: GPU scale-up triggered for job {job}")
            print(f"Telegram alert: GPU scale-up triggered for job {job}")
            scaled_jobs.add(job)
        LAST_ACTION = now

    # -----------------------------------------------
    # Scale-down logic
    # -----------------------------------------------
    # If no jobs remain and the cooldown has passed,
    # simulate a scale-down and send alerts.
    if not pending and scaled_jobs and (now - LAST_ACTION) > COOLDOWN:
        logging.info("No pending GPU pods: simulating scale-down")
        print("Scale-down simulated (Civo API/CLI call would go here)")
        print("Slack alert: GPU scale-down triggered")
        print("Telegram alert: GPU scale-down triggered")
        scaled_jobs.clear()
        LAST_ACTION = now

    # -----------------------------------------------
    # Sleep interval
    # -----------------------------------------------
    # Avoids busy looping. Runs the check every 5 seconds.
    time.sleep(5)
Alerts are simulated. In production, replace print() with real Slack/Telegram webhooks.
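One way to wire up those real notifications with only the standard library is shown below. The environment variable names are assumptions; in the cluster they would be injected from a Kubernetes Secret:

```python
import json
import os
import urllib.request

def notify(message):
    """Send a message to Slack and/or Telegram, depending on which
    environment variables are set. Does nothing if none are configured,
    so local runs stay side-effect free."""
    slack_url = os.getenv("SLACK_WEBHOOK_URL")
    if slack_url:
        # Slack incoming webhooks accept a JSON body with a "text" field
        body = json.dumps({"text": message}).encode()
        req = urllib.request.Request(
            slack_url, data=body,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=10)

    token = os.getenv("TELEGRAM_BOT_TOKEN")
    chat_id = os.getenv("TELEGRAM_CHAT_ID")
    if token and chat_id:
        # Telegram's Bot API exposes sendMessage per bot token
        body = json.dumps({"chat_id": chat_id, "text": message}).encode()
        req = urllib.request.Request(
            f"https://api.telegram.org/bot{token}/sendMessage",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=10)
```

Each `print()` in the watchdog loop can then be swapped for a `notify()` call; wrapping it in try/except keeps a webhook outage from crashing the scaler.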
RBAC example
The watchdog requires specific Kubernetes permissions to monitor pods and jobs. This RBAC configuration follows least-privilege principles:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gpu-watchdog
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: gpu-watchdog-role
  namespace: kube-system
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gpu-watchdog-binding
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: gpu-watchdog
    namespace: kube-system
roleRef:
  kind: Role
  name: gpu-watchdog-role
  apiGroup: rbac.authorization.k8s.io
Step 4: Set cost guardrails
Effective cost management requires proactive guardrails that prevent budget overruns while maintaining development velocity. Civo's transparent billing model makes implementing these controls straightforward.
Suggested guardrails
Metrics and signals
Prefer telemetry over heuristics. Deploy the NVIDIA DCGM exporter and Prometheus to track GPU utilization (the DCGM_FI_DEV_GPU_UTIL metric). Use PromQL rolling averages to implement thresholds. If metrics are not available, fall back to pod-count heuristics.
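A minimal sketch of that threshold check, querying Prometheus's instant-query HTTP API with only the standard library. The in-cluster Prometheus address is an assumption; adjust it for your monitoring namespace:

```python
import json
import urllib.parse
import urllib.request

PROM_URL = "http://prometheus.monitoring.svc:9090"  # assumed in-cluster address

def parse_prom_value(payload):
    """Extract the single scalar value from a Prometheus instant-query
    response; returns None when the result set is empty."""
    result = payload.get("data", {}).get("result", [])
    return float(result[0]["value"][1]) if result else None

def avg_gpu_utilization(window="10m"):
    """Mean GPU utilization over `window`, via the DCGM exporter's
    DCGM_FI_DEV_GPU_UTIL metric. Returns None when no GPU nodes are
    reporting (e.g., the pool is scaled to zero)."""
    query = f"avg(avg_over_time(DCGM_FI_DEV_GPU_UTIL[{window}]))"
    url = f"{PROM_URL}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_prom_value(json.loads(resp.read()))
```

The watchdog could then treat a sustained reading below some threshold (say 10%) as a scale-down signal, and `None` as "no GPUs running".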
Budget checking via Civo API
Civo exposes billing endpoints to list account charges. Use the charges API to compute cumulative usage and enforce budget caps. Example API resource: GET https://api.civo.com/v2/charges. Parse responses safely and handle missing fields.
Example budget check (safe parsing and error handling recommended):
import requests, os

CIVO_API_KEY = os.getenv("CIVO_API_KEY")
HEADERS = {"Authorization": f"Bearer {CIVO_API_KEY}"}

def get_charges():
    resp = requests.get("https://api.civo.com/v2/charges", headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return resp.json()
Caution: The exact billing object shape can change. Validate response fields before use.
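Given that caveat, a budget guard can be written against an assumed response shape and validated later. The field names (`code`, `num_hours`), the product-code prefix, and the hourly rate below are all illustrative; check a real `/v2/charges` response and Civo's pricing page before relying on them:

```python
GPU_RATE_PER_HOUR = 1.50   # assumed hourly rate for the GPU size in use
MONTHLY_BUDGET = 500.00    # example cap in USD

def estimate_spend(charges, code_prefix="g4g", rate=GPU_RATE_PER_HOUR):
    """Estimate GPU spend from a list of charge records.

    Assumes each record carries a product `code` and billed `num_hours`;
    records for non-GPU sizes are filtered out by prefix.
    """
    hours = sum(
        c.get("num_hours", 0)
        for c in charges
        if str(c.get("code", "")).startswith(code_prefix)
    )
    return hours * rate

def over_budget(charges, budget=MONTHLY_BUDGET):
    """True when estimated GPU spend exceeds the budget cap, in which
    case the watchdog should refuse further scale-ups and alert."""
    return estimate_spend(charges) > budget
```

Using `.get()` with defaults keeps the check resilient to missing fields, in line with the caution above.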
Step 5: Test the workflow
Comprehensive testing ensures the system behaves correctly under various scenarios, from normal operations to edge cases and failure conditions.
Testing steps
- Deploy the watchdog as a Deployment in the cluster.
- Submit a short GPU job labeled gpu=on-demand.
- Verify the job becomes Pending → the watchdog detects it → a simulated scale-up occurs.
- Confirm simulated Slack/Telegram alerts appear with relevant context.
- Observe the job run on a GPU node → completion triggers a scale-down and alert.
- Test error conditions: API failures, network issues, budget limits.
Here’s an example of the output you should see after running the watchdog script.

Monitor the dashboard and billing for expected activity. Pre-pull images or maintain a warm node for faster startup if needed.
Troubleshooting
Common issues and their solutions when deploying the GPU scheduler:
- Pod pending: Check that tolerations/affinity labels match GPU node configuration exactly. Mismatched labels are the most common cause of scheduling failures.
- Cold start issues: Civo's fast provisioning helps, but image pulls can still add 2–5 minutes. Pre-pull common images or maintain standby nodes for latency-sensitive workloads.
- Webhook failures: Check logs and network egress rules. Ensure the cluster can reach external notification services.
- Quota/API errors: Inspect CLI/API output carefully. GPU quotas may limit scaling, especially for new Civo accounts.
Summary
By combining Civo's flexible node-pool management, strategic workload tagging, and telemetry-driven scaling, you can keep GPU usage and costs under control. Best practices such as storing API keys in Secrets, applying minimal RBAC, and enforcing cooldown windows further improve the deployment's efficiency and reliability. Together, these strategies make GPU spending on Civo predictable and policy-driven.
If you are looking to learn more about some of the topics discussed in this tutorial, check out these resources:

Software Engineer @ GoCardless
Mostafa Ibrahim is a software engineer and technical writer specializing in developer-focused content for SaaS and AI platforms. He currently works as a Software Engineer at GoCardless, contributing to production systems and scalable payment infrastructure.
Alongside his engineering work, Mostafa has written more than 200 technical articles reaching over 500,000 readers. His content covers topics including Kubernetes deployments, AI infrastructure, authentication systems, and retrieval-augmented generation (RAG) architectures.
Further Reading
- Lightweight text-to-SQL assistant on Civo GPUs
- Accelerating machine learning: Jupyter Notebook on Civo GPUs