How to build sustainable AI infrastructure on GPU cloud
Written by
Marketing Team at Civo
AI's environmental cost is real, and it's growing. Training a large language model can consume the electricity of hundreds of households for weeks. Inference at production scale runs continuously, with GPU clusters drawing power around the clock. The data centers that house all of this are some of the most concentrated energy consumers in the modern technology stack.
The honest response to this is not the marketing language of carbon-neutral cloud or net-zero AI. Most of those claims rely on offsets that don't reduce the underlying energy use, and the credibility of the broader category has suffered as a result. The honest response is to focus on the levers that actually reduce environmental impact: running infrastructure more efficiently, using GPUs more fully, consolidating workloads onto fewer machines, and reducing the data movement that accompanies AI work.
These levers are not glamorous. They show up in operational metrics, not press releases. But they're the ones that move the numbers, and they're the ones an engineering team can act on.
This is a working guide to the practical aspects of sustainable GPU cloud - what actually reduces AI's environmental footprint, how to measure it, and what infrastructure choices make a meaningful difference.
What "sustainable" actually means for GPU cloud
The phrase is used loosely. For GPU cloud specifically, sustainability breaks down into three concrete factors:
What Civo's infrastructure does on the efficiency side
For Civo's Cloud GPU platform, the infrastructure-level commitments are specific. The UK facilities operate on 100% renewable energy. The annualized PUE is approximately 1.2 - well below the global average and within the range of the most efficient operating data centers in the market.
Backup power comes from generators running on hydrotreated vegetable oil (HVO) rather than diesel, reducing generator emissions by up to 90% whenever backup is in use. This matters more than it sounds: backup generator runtime can be a meaningful portion of total emissions for a data center, particularly during grid instability or planned maintenance.
The newer data center facility currently being built targets a PUE under 1.1, with advanced liquid cooling for next-generation GPU systems. Liquid cooling becomes increasingly important as GPU TDP climbs - B200 systems at around 1,000W per card and the upcoming Vera Rubin NVL72 systems pull substantially more power than the H100s that preceded them. Cooling these efficiently is a real engineering challenge, and liquid cooling addresses it more effectively than air at high densities.
These are infrastructure choices that reduce environmental impact at the level of the data center itself. They don't depend on offsets, and they don't require trusting marketing claims about carbon accounting.
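As a rough illustration of what those PUE figures mean: PUE is the ratio of total facility energy to the energy delivered to IT equipment, so everything above 1.0 is overhead such as cooling and power conversion. The sketch below compares the overhead at the two PUE values mentioned above. The 1,000 kWh IT load and the function names are hypothetical, chosen only to make the arithmetic easy to read; they are not Civo measurements.

```python
def facility_energy_kwh(it_energy_kwh: float, pue: float) -> float:
    """Total facility energy implied by a PUE (PUE = total energy / IT energy)."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_energy_kwh * pue


def overhead_kwh(it_energy_kwh: float, pue: float) -> float:
    """Energy spent on cooling, power conversion, and other non-IT load."""
    return facility_energy_kwh(it_energy_kwh, pue) - it_energy_kwh


# Hypothetical IT load, not a measured figure.
IT_LOAD_KWH = 1000.0

for pue in (1.2, 1.1):
    print(f"PUE {pue}: about {overhead_kwh(IT_LOAD_KWH, pue):.0f} kWh of overhead "
          f"per {IT_LOAD_KWH:.0f} kWh of IT load")
```

Under these assumptions, moving from a PUE of 1.2 to 1.1 roughly halves the non-IT overhead for the same amount of compute.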
The utilization lever, in detail
The infrastructure-level commitments matter, but the bigger lever for most workloads is utilization. A GPU running at low utilization is consuming power without producing useful output. The carbon cost of that idle compute is real, and it's almost entirely avoidable.
The patterns that increase utilization:
Right-size the GPU to the workload
Many teams default to the most powerful GPU available, regardless of whether the workload needs it. An H100 running at 25% utilization is more wasteful, environmentally and economically, than an A100 running at 75% on the same workload. The smaller GPU draws less power, spends less of it idle, and produces more useful output per watt consumed.
Civo's GPU range spans A100, H100, H200, L40S, B200 Blackwell, and the upcoming Rubin GPUs in Vera Rubin NVL72 configurations. The range exists specifically so workloads can be matched to appropriate hardware rather than over-provisioned to the most capable card.
Increase batch size and improve data pipelines
The most common cause of low GPU utilization is data starvation: the GPU is waiting for the next batch of data to arrive. Improving the data loading pipeline keeps the GPU fed, and increasing batch size gives it more work per step; both raise utilization. The environmental benefit tracks the cost benefit: the GPU does more useful work per hour, and the carbon cost per unit of output drops.
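One way to picture the fix for data starvation is prefetching: a background thread loads upcoming batches while the current one is being processed. The sketch below uses only the Python standard library. `slow_batches` and `train_step` are hypothetical stand-ins for a real data source and a real GPU step, and the sleep durations are arbitrary. In practice a framework loader, for example PyTorch's `DataLoader` with worker processes, usually fills this role.

```python
import queue
import threading
import time


def prefetching_loader(batches, buffer_size=4):
    """Yield batches while a background thread loads the next ones.

    `batches` is any iterable that is slow to iterate (disk, network,
    preprocessing). Up to `buffer_size` batches are kept ready so the
    consumer does not wait on loading.
    """
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for batch in batches:
            buffer.put(batch)  # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        batch = buffer.get()
        if batch is done:
            return
        yield batch


def slow_batches(n, load_seconds=0.01):
    """Hypothetical stand-in for a slow data source."""
    for i in range(n):
        time.sleep(load_seconds)  # simulated I/O and preprocessing
        yield [i] * 8


def train_step(batch, compute_seconds=0.01):
    """Hypothetical stand-in for one GPU training step."""
    time.sleep(compute_seconds)
    return sum(batch)


results = [train_step(b) for b in prefetching_loader(slow_batches(20))]
print(len(results), "batches processed")
```

This sketch does not handle exceptions raised inside the producer thread; a production loader would need to propagate them to the consumer.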
Use mixed precision and modern hardware features
Modern GPUs include specialized hardware for lower-precision operations - Tensor Cores for FP16/BF16, FP8 support on H100 and B200. Using these instead of FP32 produces substantial throughput improvements with minimal accuracy loss for most ML workloads. The same work gets done faster, with less total energy consumed.
Share GPUs across compatible workloads
For workloads that don't need a full GPU, Multi-Instance GPU (MIG) on supported cards partitions a single physical GPU into multiple isolated instances. Sharing capacity across workloads keeps utilization high and reduces the number of physical GPUs needed to support the same total workload.
Scale down GPUs that aren't being actively used
Development environments, batch jobs between runs, inference endpoints during off-peak hours - all of these tend to leave GPUs running when they're not doing useful work. Fast provisioning makes it practical to tear them down and recreate them rather than leaving them idle "just in case." Civo's managed Kubernetes GPU clusters deploy in 120 seconds, which makes this pattern operationally realistic.
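A scale-down policy can be as simple as checking whether utilization has stayed below a threshold for a full window of time. The sketch below is one hypothetical version of such a check. It assumes one utilization sample per minute, and the 5% threshold and 30-minute window are illustrative defaults rather than recommendations. How the samples are collected and how the cluster is actually torn down are left out.

```python
from dataclasses import dataclass


@dataclass
class UtilizationSample:
    minute: int          # minutes since monitoring started
    gpu_util_pct: float  # GPU utilization reported for that minute


def should_scale_down(samples, idle_threshold_pct=5.0, idle_window_min=30):
    """Return True if utilization stayed below the threshold for the
    entire most recent window. Assumes one sample per minute."""
    if not samples:
        return False
    latest = max(s.minute for s in samples)
    window = [s for s in samples if s.minute > latest - idle_window_min]
    # Require a full window of data before deciding, so a cluster that
    # has only just started is not torn down immediately.
    if len(window) < idle_window_min:
        return False
    return all(s.gpu_util_pct < idle_threshold_pct for s in window)


# Hypothetical history: an hour of work followed by 30 idle minutes.
history = [UtilizationSample(m, 80.0) for m in range(60)]
history += [UtilizationSample(m, 1.0) for m in range(60, 90)]
print("scale down:", should_scale_down(history))
```

Requiring a full idle window trades a little wasted time for protection against tearing down a GPU during a short pause between jobs.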
Workload consolidation as a sustainability lever
Beyond the utilization of individual GPUs, workload consolidation is the next lever up, operating at the level of the cluster. A team running ten small training jobs on separate GPUs uses more infrastructure than the same team packing those jobs onto a smaller number of well-utilized GPUs.
The pattern shows up in several forms:
- Batching related training jobs together: Multiple model variants trained on the same data, hyperparameter sweeps, and ensemble training
- Combining inference workloads with similar models: Using model serving frameworks that allow multiple models to share GPU memory
- Consolidating development and experimentation environments: Shared notebook environments rather than per-team dedicated clusters
Each consolidation reduces the total infrastructure footprint without reducing the amount of work being done. For an organization running significant AI workloads, the cumulative effect across the portfolio can be substantial.
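To make the consolidation idea concrete, the sketch below packs small jobs onto as few GPUs as a simple heuristic can find, using first-fit decreasing bin packing on memory alone. The job sizes and the 80 GB card are hypothetical. Real co-location also depends on compute contention and on an isolation mechanism such as MIG, which this sketch ignores.

```python
def consolidate(job_memory_gb, gpu_memory_gb):
    """Pack jobs onto GPUs with the first-fit decreasing heuristic.

    Returns a list of GPUs, each a list of the job sizes placed on it.
    The result is a good packing, not a guaranteed optimum.
    """
    gpus = []  # each entry: [remaining_gb, [job sizes placed here]]
    for job in sorted(job_memory_gb, reverse=True):
        if job > gpu_memory_gb:
            raise ValueError(f"job needing {job} GB does not fit on one GPU")
        for gpu in gpus:
            if gpu[0] >= job:
                gpu[0] -= job
                gpu[1].append(job)
                break
        else:
            # No existing GPU has room, so allocate another one.
            gpus.append([gpu_memory_gb - job, [job]])
    return [jobs for _, jobs in gpus]


# Ten hypothetical small jobs (memory in GB) and a hypothetical 80 GB card.
jobs = [10, 12, 8, 20, 16, 6, 14, 10, 18, 6]
placement = consolidate(jobs, gpu_memory_gb=80)
print(len(placement), "GPUs instead of", len(jobs))
```

With these illustrative numbers the ten jobs fit on two GPUs rather than ten, which is the kind of footprint reduction the patterns above aim for.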
The data movement lever
The third lever, often overlooked, is reducing the data that moves between compute and storage, between regions, and between providers. Every byte of data movement consumes energy: in network equipment, in routing infrastructure, in the storage systems on both ends.
The practical patterns:
Keep compute and data physically close
For data-intensive workloads, keeping compute and storage on the same physical infrastructure dramatically reduces data movement. The bytes don't traverse external networks; the data path stays inside the same facility.
For Civo workloads, GPU compute and storage live on the same infrastructure, which keeps data paths short and reduces the energy cost of moving data between layers.
Civo's sovereign cloud regions in the UK and India keep workloads physically located within those jurisdictions. The compliance benefits are well-documented; the environmental side is less discussed but follows directly from the same architecture.
Avoid unnecessary cross-cloud and cross-region traffic
Multi-cloud and multi-region architectures have legitimate reasons for existing, but each cross-boundary data flow costs energy. Designing workflows that minimize unnecessary inter-region or inter-cloud transfer reduces both cost and environmental impact.
The metrics to track
For teams that want to measure sustainability progress concretely, the metrics worth tracking:
- GPU utilization at the workload, cluster, and fleet level
- Useful output per GPU-hour (samples per second for training, requests per second for inference)
- Idle GPU-hours: Capacity that was allocated but didn't do useful work
- Data center PUE of the underlying infrastructure
- Power source mix of the provider's facilities
These are the numbers that actually move when sustainability improvements are made. Tracking them honestly gives the team a basis for prioritizing the changes that matter.
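The first three metrics can be computed from basic allocation records. The sketch below assumes a hypothetical record format with allocated hours, busy hours, and samples processed per GPU. It defines utilization as busy time over allocated time, which is one reasonable definition and not the same as the instantaneous utilization a monitoring tool reports.

```python
from dataclasses import dataclass


@dataclass
class GpuUsageRecord:
    gpu_id: str
    allocated_hours: float  # time the GPU was provisioned
    busy_hours: float       # time it was doing useful work
    samples_processed: int  # useful output during that time


def fleet_metrics(records):
    """Aggregate utilization, idle GPU-hours, and output per GPU-hour."""
    allocated = sum(r.allocated_hours for r in records)
    busy = sum(r.busy_hours for r in records)
    output = sum(r.samples_processed for r in records)
    return {
        "utilization": busy / allocated if allocated else 0.0,
        "idle_gpu_hours": allocated - busy,
        "output_per_gpu_hour": output / allocated if allocated else 0.0,
    }


# Hypothetical one-day records for two GPUs.
records = [
    GpuUsageRecord("gpu-0", allocated_hours=24, busy_hours=18, samples_processed=360_000),
    GpuUsageRecord("gpu-1", allocated_hours=24, busy_hours=6, samples_processed=120_000),
]
metrics = fleet_metrics(records)
print(metrics)
```

Computing the same figures per workload and per cluster, and tracking them over time, shows whether a change actually moved the numbers.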
What's actionable
For organizations that want to reduce AI's environmental footprint, the practical actions break down into infrastructure choices and operational practices.
The infrastructure choices:
- Choose providers whose data centers have low PUE and renewable-powered infrastructure
- Choose providers whose facilities are physically located near the workload's data and users to reduce data movement
- Choose providers whose pricing structure supports operational practices like fast scale-down (no egress fees, transparent compute pricing, granular billing)
The operational practices:
- Right-size GPUs to workloads rather than defaulting to the most powerful card
- Improve utilization through batch size tuning, mixed precision, and pipeline optimization
- Consolidate workloads to keep utilization high across the fleet
- Scale down GPUs that aren't actively in use
- Track utilization and useful output as ongoing metrics, not just point-in-time measurements
The infrastructure choices set the floor. The operational practices determine how much of that floor is actually realized. Both matter, and both are within the team's control.
For workloads on Civo's Cloud GPU platform, the combination of 100% renewable-powered UK facilities, a current PUE around 1.2 with the new facility targeting under 1.1, and HVO-based backup power addresses the infrastructure side honestly without overclaiming. The operational practices that increase utilization and reduce waste are then up to the team, supported by the platform's fast provisioning, transparent pricing without egress fees, and the full NVIDIA GPU range for right-sizing.
