Transcribe your audio library at scale on Civo: Faster-whisper + Kubernetes indexed jobs

Learn how to build a batch audio transcription pipeline on Civo Kubernetes using faster-whisper and Indexed Jobs. Parallel GPU pods share a cached model to process hundreds of files in one run — and cost nothing when idle.


Written by

Mostafa Ibrahim

Software Engineer at GoCardless

Whisper processes audio reliably, but when the backlog grows to hundreds of podcast episodes, meeting recordings, or voicemails, a single container picking them up one at a time turns a manageable workload into something that takes days to clear.

The obvious fixes fall short in different ways. An always-on GPU server speeds things up but keeps billing running even when nothing is being processed. Spinning up more containers helps with throughput, but each new pod pulls a 3 GB model from the internet on startup. With ten pods pulling simultaneously, the network saturates, and each pod sits idle for twenty minutes before it even starts transcribing.

This tutorial builds a system that solves both problems. A Kubernetes Indexed Job fans work across multiple GPU pods in parallel, where each pod picks up exactly one audio file, transcribes it, and writes the result back to object storage. A shared disk caches the faster-whisper model once, so pods start in under two minutes instead of twenty.

The whole setup runs on Civo Kubernetes with L40S GPU nodes, and Civo Object Store acts as the input and output bus. When the job finishes, you scale the GPU pool to zero and are no longer billed for GPU compute.

What you’ll build

A batch transcription pipeline on Civo Kubernetes that takes a folder of audio files and returns structured JSON transcripts, with every file processed in parallel across GPU pods.

The pipeline works in five steps:

  1. Audio files are uploaded to Civo Object Store alongside a manifest listing each one
  2. A Kubernetes Indexed Job spins up one pod per file, each assigned a number that maps directly to one audio file
  3. Every pod reads the Whisper model from a shared disk instead of downloading it fresh, keeping startup under 2 minutes
  4. Each pod transcribes its file on a GPU and writes the JSON transcript back to Object Store
  5. When every pod has finished, the job completes, and you scale the GPU pool down to zero

The core components:

  • Civo Kubernetes cluster with separate CPU and GPU node pools
  • Kubernetes Indexed Job that distributes files across pods and shuts down when done
  • Shared PVC (Persistent Volume Claim) caching the faster-whisper model once for the entire batch
  • faster-whisper worker container handling transcription on each GPU
  • Civo Object Store as both input source and transcript destination

By the end, you have a pipeline that clears an entire audio backlog in one run and costs nothing when idle.

Why Civo for this project?

Civo's per-second GPU billing was built for workloads like this one, short bursts of heavy compute with nothing running in between. The GPU nodes come up when the batch starts, do the work, and get removed when it finishes, so the bill reflects exactly what the pipeline consumed. The CPU node keeps the cluster alive between runs at a fraction of the GPU cost, and node pool separation ensures transcription pods never land on it.

Civo Object Store removes the need for a message queue entirely. Audio files sit in the bucket, pods pull from it, and transcripts go back out with no extra layer or service in between. Because the cluster, GPU nodes, and Object Store all live in one environment, there is nothing to stitch together before you can run your first job.

What you need

Before starting, make sure you have the following in place.

Before diving in, create the following folder structure on your local machine:

civo-batch-transcription/
  manifests/
    model-pvc.yaml
    model-download-job.yaml
    transcribe-job.yaml
  worker/
    worker.py
  data/
    audio/
      episode-001.mp3
      ...

The manifests/ folder holds all Kubernetes YAML files, worker/ contains the transcription script, and data/audio/ is where your sample audio files live before uploading to the Object Store.

Tested with:
- Civo CLI v1.5.2
- Kubernetes v1.34.2
- Helm v3.x
- Python 3.11
- faster-whisper v1.2.1

Everything else gets created as part of the tutorial.

How it fits together 

The system is built as a simple flow from storage to compute and back.

Audio files are uploaded into a bucket in Civo Object Store alongside a manifest file that lists every file to be processed. When the Indexed Job starts, Kubernetes creates multiple GPU pods at the same time, each receiving a unique index number. That index maps to a specific line in the manifest, which tells the pod exactly which audio file to pick up.

Each pod then pulls its assigned file from the Object Store, loads the Whisper model from the shared disk, transcribes the audio on a GPU, and writes the resulting JSON transcript back to the same bucket. Once a pod finishes its file, it exits, and when every index is complete, the job terminates.


Image by Author

Every component has a single responsibility, and nothing runs longer than it needs to.

Cluster sizing 

This tutorial uses two node pools with a clear separation of responsibilities:

Node pool | Count | Size | Purpose
CPU | 1 | g4s.kube.small (2 vCPU / 4 GB) | System pods, model download
GPU | 2 | an.g1.l40s.kube.x1 (L40S 48 GB) | Transcription workers

A quick distinction worth making before moving on: nodes are the actual machines in your cluster, while pods are the workloads that run on them. You can have more pods than nodes. If you submit a job with ten pods but only have two GPU nodes, Kubernetes runs two pods at a time and queues the rest. Each pod waits for a node to free up before it starts, which is exactly the take-turns behavior you will see during the run.

Two GPU nodes are enough to demonstrate parallelism without a large bill. With two nodes running in parallel and five files to process, you will see pods completing and new ones scheduling in real time. For a larger batch, you scale the GPU pool and adjust the job's parallelism setting to match.
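As a rough illustration of the take-turns behavior, the number of scheduling rounds can be estimated from the file count and the parallelism. This is a minimal sketch, not part of the tutorial's code: the function name estimate_waves is hypothetical, and it assumes every pod takes about the same time, which real audio files of different lengths will not.

```python
import math


def estimate_waves(num_files: int, parallelism: int) -> int:
    """Rounds of pods needed when at most `parallelism` pods run at once.

    Assumes (for illustration only) that all pods in a round finish together.
    """
    if num_files < 0:
        raise ValueError("num_files must not be negative")
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return math.ceil(num_files / parallelism)


# The tutorial's setup: five files on two GPU nodes.
print(estimate_waves(5, 2))
```

With five files and two GPU nodes, the last round has a single pod, so one GPU sits unused for that round.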

Creating the Cluster 

Authenticate the Civo CLI with your API key, found under your profile in the Civo dashboard.

civo apikey save my-key YOUR_API_KEY_HERE
civo apikey use my-key
civo regions ls

Create the cluster with a single CPU node pool, then save the kubeconfig:

civo kubernetes create whisper-batch \
--size=g4s.kube.small \
--nodes=1 \
--region=NYC1 \
--wait
civo kubernetes config whisper-batch --save --switch

Add the GPU node pool: 

civo kubernetes node-pool create whisper-batch \
--size=an.g1.l40s.kube.x1 \
--nodes=2 \
--region=NYC1

Verify all three nodes are ready:

kubectl get nodes

Image showing that all three nodes are ready

Install the NVIDIA GPU Operator using the exact flags below. Civo's GPU images ship with the container toolkit pre-installed, so toolkit.enabled is set to false.

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm upgrade --install gpu-operator \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set driver.enabled=true \
--set toolkit.enabled=false \
--set devicePlugin.enabled=true \
--set gfd.enabled=true \
--set operator.defaultRuntime=containerd \
--set validator.cuda.runtimeClassName=nvidia

The operator takes 3 to 5 minutes to initialize. Wait until all pods are running, then confirm GPUs are visible to the scheduler:

kubectl get pods -n gpu-operator

All GPU operator pods running

Then confirm each GPU node is exposing one GPU to the scheduler:

kubectl get nodes -o custom-columns="NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

Each GPU node is showing 1

Object store setup

Create the bucket and credentials:

civo objectstore credential create whisper-creds --region=NYC1
civo objectstore create whisper-data --region=NYC1 --size=500

Retrieve the access key and secret:

civo objectstore show whisper-data --region=NYC1
civo objectstore credential secret --access-key=YOUR_ACCESS_KEY --region=NYC1

Configure s3cmd with the Civo endpoint:

s3cmd --configure

When prompted, enter your access key and secret key, set the endpoint to objectstore.nyc1.civo.com, enable HTTPS, and accept the defaults for everything else. Save the config when done.

Upload your audio files:

s3cmd put data/audio/episode-001.mp3 data/audio/episode-002.mp3 data/audio/episode-003.mp3 \
data/audio/episode-004.mp3 data/audio/episode-005.mp3 s3://whisper-data/audio/

The bucket layout should look like this after uploading:

whisper-data/
  audio/
    episode-001.mp3
    ...
  transcripts/ ← filled by the job
  manifest.txt ← generated next

Generate and upload the manifest:

s3cmd ls s3://whisper-data/audio/ | awk '{print $4}' | sed 's|.*/||' > manifest.txt
s3cmd put manifest.txt s3://whisper-data/

manifest.txt showing one filename per line

Each line in the manifest maps to one pod index. Line 0 goes to pod 0, line 1 to pod 1, and so on.
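If you would rather build the manifest in Python than with awk and sed, the same transformation can be sketched as a pure function over object keys. The function name build_manifest is hypothetical, and the sketch assumes you already have the list of keys (for example from an S3 listing call); it sorts the names so that line order is stable from run to run.

```python
def build_manifest(keys, prefix="audio/"):
    """Turn object keys under `prefix` into manifest text, one filename per line."""
    names = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # ignore objects outside the audio folder
        name = key.rsplit("/", 1)[-1]  # keep only the part after the last slash
        if name:  # skip the bare "audio/" folder placeholder
            names.append(name)
    return "".join(name + "\n" for name in sorted(names))


keys = ["audio/episode-002.mp3", "audio/episode-001.mp3", "audio/", "manifest.txt"]
print(build_manifest(keys), end="")
```

Line order matters because the pod index is a position in this file: regenerating the manifest in a different order would remap indexes to different files.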

Caching the model on a shared disk 

The faster-whisper large-v3 model is roughly 3 GB. Without caching, every pod downloads it on startup. With 10 pods starting simultaneously, that is 30 GB of downloads saturating the network and pushing startup times past 20 minutes.

The fix is to download the model once onto a shared disk and mount it into every pod. Startup drops to under 2 minutes.

Create a PersistentVolumeClaim (a disk that Kubernetes pods can mount). Civo's storage supports ReadWriteOnce, meaning the volume can be mounted read-write by one node at a time.

model-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: whisper-model-cache
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: civo-volume

Apply:

kubectl apply -f manifests/model-pvc.yaml

Now run a one-time job that downloads the model onto the PVC. This pod needs internet access but no GPU, so it can run on the CPU node.

model-download-job.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: download-whisper-model
spec:
  template:
    spec:
      containers:
        - name: downloader
          image: python:3.11-slim
          command:
            - bash
            - "-c"
            - "pip install faster-whisper==1.2.1 && python -c 'from faster_whisper import WhisperModel; print(\"Downloading model...\"); WhisperModel(\"large-v3\", device=\"cpu\", compute_type=\"float32\", download_root=\"/models\"); print(\"Done.\")'"
          volumeMounts:
            - name: model-storage
              mountPath: /models
      volumes:
        - name: model-storage
          persistentVolumeClaim:
            claimName: whisper-model-cache
      restartPolicy: Never
  backoffLimit: 3

Apply the job and follow its logs:

kubectl apply -f manifests/model-download-job.yaml
kubectl logs -f job/download-whisper-model

Wait for "Done" to appear in the logs. The download takes 5 to 10 minutes, depending on network speed. Once it finishes, the model is cached, and every subsequent pod reads it from disk instead of downloading it fresh.
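One way to double-check the cache is to look for the weights file on the mounted volume. The sketch below is an assumption-laden helper, not part of the tutorial: find_model_files is a hypothetical name, and it assumes the download leaves a file called model.bin somewhere beneath the cache root. The exact directory layout under /models is not specified here, so the search is recursive.

```python
from pathlib import Path


def find_model_files(cache_root, filename="model.bin"):
    """Return every path named `filename` beneath `cache_root`, sorted."""
    return sorted(Path(cache_root).rglob(filename))


if __name__ == "__main__":
    # /models is the mount path used by the tutorial's manifests.
    matches = find_model_files("/models")
    if matches:
        for path in matches:
            print(f"found cached weights: {path}")
    else:
        print("no model.bin found under /models")
```

Run it from any pod that mounts the whisper-model-cache PVC at /models. An empty result suggests the download job did not finish.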

The worker script

Each pod runs a single Python script that follows five steps regardless of which file it is assigned: read its index, fetch the manifest, download its audio file, load the model from the shared disk, and write the JSON transcript back to Object Store.

Part 1: Setup and S3 client

The request_checksum_calculation="when_required" flag is required for Civo Object Store; without it, uploads fail with a checksum mismatch error.

import os
import json

import boto3
from botocore.config import Config
from faster_whisper import WhisperModel

# Kubernetes sets this variable for every pod of an Indexed Job.
index = int(os.environ["JOB_COMPLETION_INDEX"])

# S3-compatible client pointed at Civo Object Store.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    config=Config(
        signature_version="s3v4",
        request_checksum_calculation="when_required",
    ),
)
bucket = os.environ["S3_BUCKET"]

Part 2: Manifest lookup

Pod 0 gets line 0, pod 1 gets line 1, and so on.

manifest_obj = s3.get_object(Bucket=bucket, Key="manifest.txt")
lines = manifest_obj["Body"].read().decode().strip().split("\n")
filename = lines[index].strip()
print(f"Pod {index}: assigned file '{filename}'")
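If completions is set higher than the number of manifest lines, lines[index] raises a bare IndexError with no context. A bounds-checked variant could look like the sketch below. The helper name pick_file is hypothetical, and unlike the script above it skips blank lines, so the two only agree when the manifest has none.

```python
def pick_file(manifest_text: str, index: int) -> str:
    """Return the filename for a completion index, with a descriptive error."""
    names = [line.strip() for line in manifest_text.splitlines() if line.strip()]
    if not 0 <= index < len(names):
        raise IndexError(
            f"index {index} has no manifest entry; "
            f"the manifest lists {len(names)} file(s)"
        )
    return names[index]


manifest = "episode-001.mp3\nepisode-002.mp3\n"
print(pick_file(manifest, 1))
```

A clear error in the pod log makes a mismatch between completions and the manifest easier to spot than a stack trace alone.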

Part 3: Audio download

Each pod downloads only its assigned file. 

local_audio = f"/tmp/{filename}"
s3.download_file(bucket, f"audio/{filename}", local_audio)
print(f"Pod {index}: downloaded '{filename}'")

Part 4: Model loading

The model loads from /models, the mounted PVC. No internet download happens here.

model = WhisperModel(
    "large-v3",
    device="cuda",
    compute_type="float16",
    download_root="/models",
)
print(f"Pod {index}: model loaded from cache")

Part 5: Transcription and upload

The script transcribes, builds a JSON object with timestamps and language metadata, and uploads to Object Store. 

segments, info = model.transcribe(local_audio, beam_size=5)

# segments is a generator; transcription runs as the loop consumes it.
results = []
for segment in segments:
    results.append({
        "start": round(segment.start, 2),
        "end": round(segment.end, 2),
        "text": segment.text.strip(),
    })

transcript = {
    "filename": filename,
    "language": info.language,
    "language_probability": round(info.language_probability, 2),
    "segments": results,
}

# Write the transcript locally, then upload it next to the other outputs.
output_key = f"transcripts/{os.path.splitext(filename)[0]}.json"
local_output = f"/tmp/{os.path.splitext(filename)[0]}.json"
with open(local_output, "w") as f:
    json.dump(transcript, f, indent=2, ensure_ascii=False)
s3.upload_file(local_output, bucket, output_key)
print(f"Pod {index}: transcript uploaded to '{output_key}'")

This tutorial mounts the script into pods using a Kubernetes ConfigMap. If you want to build your own Docker image instead, that is a good next step for a production setup.

The Indexed Job manifest

A Kubernetes Indexed Job runs a fixed number of pods, up to parallelism at a time, and each pod receives a unique number as the JOB_COMPLETION_INDEX environment variable. Pod 0 picks up file 0, pod 1 picks up file 1, and so on. Kubernetes tracks completed indexes and retries any that fail without touching the ones that succeeded.

Three settings control the run. completions sets the total number of files, one per pod index. parallelism controls how many pods run at once; set this to match your GPU node count. completionMode: Indexed tells Kubernetes to assign each pod its unique number.
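The first two settings can be derived from the manifest rather than typed by hand. This is a minimal sketch under the tutorial's assumption of one GPU per node. The function name job_settings is hypothetical, and it only computes the values; you still edit them into transcribe-job.yaml yourself.

```python
def job_settings(manifest_text: str, gpu_nodes: int) -> dict:
    """Suggest completions and parallelism for a manifest and a GPU pool size."""
    files = [line for line in manifest_text.splitlines() if line.strip()]
    if not files:
        raise ValueError("the manifest lists no files")
    if gpu_nodes < 1:
        raise ValueError("at least one GPU node is required")
    return {
        "completions": len(files),                  # one index per file
        "parallelism": min(len(files), gpu_nodes),  # never more pods than GPUs
    }


manifest = "".join(f"episode-00{i}.mp3\n" for i in range(1, 6))
print(job_settings(manifest, gpu_nodes=2))
```

For the five-file example on two GPU nodes, this yields the same values as the manifest that follows: completions 5 and parallelism 2.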

Before applying the job, create a Kubernetes Secret (an object that stores sensitive data like API keys) with your Object Store credentials:

kubectl create secret generic objectstore-creds \
--from-literal=AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY \
--from-literal=AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY \
--from-literal=S3_ENDPOINT=https://objectstore.nyc1.civo.com \
--from-literal=S3_BUCKET=whisper-data

Also, store the worker script in a ConfigMap so every pod can mount it:

kubectl create configmap worker-script --from-file=worker.py=worker/worker.py

Here is the full job manifest:

transcribe-job.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: whisper-transcribe
spec:
  completions: 5
  parallelism: 2
  completionMode: Indexed
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        nvidia.com/gpu.present: "true"
      containers:
        - name: worker
          image: nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04
          command:
            - bash
            - -c
            - "apt-get update -q && apt-get install -y -q python3 python3-pip && pip3 install faster-whisper==1.2.1 boto3 nvidia-cublas-cu12 nvidia-cudnn-cu12 && python3 /app/worker.py"
          envFrom:
            - secretRef:
                name: objectstore-creds
          resources:
            requests:
              nvidia.com/gpu: 1
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: model-cache
              mountPath: /models
            - name: worker-script
              mountPath: /app
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: whisper-model-cache
        - name: worker-script
          configMap:
            name: worker-script

The nodeSelector uses nvidia.com/gpu.present: "true" so pods only land on GPU nodes. Each pod requests exactly one GPU, and the scheduler will not place it unless one is available. backoffLimit: 3 handles retries without affecting completed indexes. The model PVC mounts at /models and the worker script mounts from the ConfigMap at /app, giving every pod everything it needs to run independently. 

Running and watching 

Apply the manifest to start the job:

kubectl apply -f manifests/transcribe-job.yaml

With parallelism: 2, the Job controller keeps at most two pods active at a time, one per GPU node, and creates the pod for the next index as each one finishes. Watch them take turns:

kubectl get pods -l job-name=whisper-transcribe -w

You will see two pods move to Running. As each pod finishes, Kubernetes starts the next index on the freed GPU.


Two pods run at a time while the remaining indexes wait

Check the logs of a running pod to confirm the pipeline is working:

kubectl logs whisper-transcribe-0-s9pvz

You should see the four steps printed in order:


Pod 0 transcribes its assigned file and writes the result back

The key line is "model loaded from cache," which confirms the pod read the model from the shared PVC instead of downloading it from the internet.

Once all five indexes are complete, check the job status:

kubectl get job whisper-transcribe

All five files transcribed, job complete

The pods exit as soon as they finish, so no pod sits idle. The GPU nodes themselves keep running after the job completes, so scale them down to stop node-level billing.

Get your GPU pool ID:

civo kubernetes node-pool list whisper-batch --region=NYC1

Copy the ID from the output, then scale down:

civo kubernetes node-pool scale whisper-batch \
--node-pool=YOUR_GPU_POOL_ID \
--nodes=0 \
--region=NYC1

Checking results

List the transcripts in Object Store to confirm all five were written:

s3cmd ls s3://whisper-data/transcripts/

s3cmd ls showing 5 JSON files
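For larger batches, eyeballing the listing stops being practical. The comparison can be sketched as a pure function over the manifest and the transcript keys. The name missing_transcripts is hypothetical, and the sketch assumes the worker's naming rule from Part 5 (the audio extension replaced by .json under transcripts/).

```python
import os


def missing_transcripts(manifest_names, transcript_keys, prefix="transcripts/"):
    """Return (index, filename) pairs whose transcript is not in the bucket."""
    done = {key[len(prefix):] for key in transcript_keys if key.startswith(prefix)}
    return [
        (index, name)
        for index, name in enumerate(manifest_names)
        if os.path.splitext(name)[0] + ".json" not in done
    ]


manifest = ["episode-001.mp3", "episode-002.mp3"]
keys = ["transcripts/episode-001.json"]
print(missing_transcripts(manifest, keys))
```

Returning the index alongside the filename tells you which pod's logs to read, since pod names include the completion index.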

Spot-check one file to see the transcript structure:

s3cmd get s3://whisper-data/transcripts/episode-001.json -

Each JSON file contains the detected language, a confidence score, and a list of timestamped segments with the transcribed text.


JSON output

The JSON above is the full output for one audio file. The language field shows what faster-whisper detected automatically, with language_probability giving the confidence score. The segments array is where the real value is: each entry has a start and end timestamp in seconds alongside the transcribed text. That structure makes the output ready to feed directly into subtitle generators, search indexes, or any downstream NLP pipeline without post-processing.
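As one illustration of a downstream consumer, the segments can be turned into SubRip (SRT) subtitles with a few lines of Python. This is a sketch, not part of the pipeline: the function names are hypothetical, and the input is assumed to be the transcript dictionary produced by the worker script.

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def transcript_to_srt(transcript: dict) -> str:
    """Convert a transcript's segments into SRT cues separated by blank lines."""
    cues = []
    for number, seg in enumerate(transcript["segments"], start=1):
        start = to_srt_timestamp(seg["start"])
        end = to_srt_timestamp(seg["end"])
        cues.append(f"{number}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(cues)


example = {"segments": [{"start": 0.0, "end": 1.5, "text": "Hello"}]}
print(transcript_to_srt(example), end="")
```

Because the worker rounds timestamps to two decimals, cue boundaries are accurate to about ten milliseconds, which is usually enough for subtitles.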

To see inputs and outputs side by side, list the whole bucket:

s3cmd ls --recursive s3://whisper-data/

Five audio files in, five JSON transcripts out

Looking at the bucket as a whole shows the full picture. Five audio files went in, five transcripts came out. 

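Now that the pipeline is verified, try it with your own audio. Drop your files into s3://whisper-data/audio/, regenerate the manifest, and update completions to match your file count. A finished Job does not rerun when you apply it again, so delete the previous run first with kubectl delete job whisper-transcribe, then run kubectl apply -f manifests/transcribe-job.yaml.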

The same pipeline handles 5 files or 500 without any changes to the architecture.

The cold-start win

The shared PVC cuts pod startup from 25 minutes to 90 seconds by eliminating the model download entirely. 


Image by Author

For a batch of 100 files on 10 GPU nodes, pods run in 10 waves, so the no-cache approach spends over four hours of wall-clock time (10 × 25 minutes) on downloads alone. With the cache, each pod is transcribing within two minutes of starting.
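The arithmetic behind that comparison can be written out as a short calculation. This is a back-of-the-envelope sketch using the article's startup figures (25 minutes without the cache, 90 seconds with it). The function name is hypothetical, and it assumes one pod per GPU node at a time.

```python
import math


def startup_overhead_minutes(files: int, gpu_nodes: int, startup_minutes: float) -> float:
    """Wall-clock minutes spent on pod startup across all waves of a batch."""
    waves = math.ceil(files / gpu_nodes)  # each node handles one pod per wave
    return waves * startup_minutes


no_cache = startup_overhead_minutes(100, 10, 25)   # model downloaded by every pod
cached = startup_overhead_minutes(100, 10, 1.5)    # model read from the shared PVC
print(f"no cache: {no_cache} min, cached: {cached} min")
```

Under these assumptions, the cache saves close to four hours of wall-clock time on a 100-file batch, and each of those minutes is billed on all ten GPU nodes.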

The pattern scales linearly. For 10x the files, add 10x the GPU nodes and bump parallelism to match. Civo bills GPU nodes per second, so you pay only for the time the GPU nodes are actually up.

Cleanup and what's next

Delete the jobs and their associated resources:

kubectl delete job whisper-transcribe
kubectl delete job download-whisper-model
kubectl delete secret objectstore-creds
kubectl delete configmap worker-script

Keep the model PVC. It persists across runs, so your next batch starts with a warm cache. When you are done, scale the GPU pool to zero to stop billing: 

Get your GPU pool ID:

civo kubernetes node-pool list whisper-batch --region=NYC1

Copy the ID from the output, then scale down:

civo kubernetes node-pool scale whisper-batch \
--node-pool=YOUR_GPU_POOL_ID \
--nodes=0 \
--region=NYC1

To delete everything:

civo kubernetes delete whisper-batch --region=NYC1
civo objectstore delete whisper-data --region=NYC1

Where to go from here:

  • Set up a CronJob or webhook that fires the Indexed Job automatically when new audio lands in the Object Store
  • Try a smaller model like medium or base for faster transcription, or add speaker diarization with Pyannote
  • Apply the same Indexed Job pattern to other GPU workloads like image generation, document embedding, or video frame extraction

For a production setup, building a custom Docker image instead of the ConfigMap approach is a natural next step. This guide walks through self-hosting a container registry on Civo Kubernetes using Harbor.  

Key takeaways 

Kubernetes Indexed Jobs remove the need for a message queue, a task scheduler, or any custom coordination code. You define how many files you have, how many pods run at once, and Kubernetes handles the rest. Each pod gets a number, grabs its file, does the work, and exits. When the last index completes, the job is done.

The shared PVC is what makes GPU pods viable for batch work at scale. Without it, cold starts dominate your runtime and your bill. A one-time model download onto a shared disk cuts startup from 25 minutes to 90 seconds, and that saving compounds with every pod you add.

Civo's per-second GPU billing aligns directly with how this pipeline behaves. The cluster spins up, processes the entire backlog, and shuts down. You pay for transcription time only, not for idle servers sitting between runs. For a workload that runs in bursts rather than continuously, that billing model makes a meaningful difference.

Finally, the pattern generalizes. The Kubernetes and Object Store logic stays exactly the same whether the worker container is running faster-whisper, an embedding model, a Stable Diffusion pipeline, or any other GPU-heavy batch task. Swap the container, keep the architecture.


Mostafa Ibrahim is a software engineer and technical writer specializing in developer-focused content for SaaS and AI platforms. He currently works as a Software Engineer at GoCardless, contributing to production systems and scalable payment infrastructure.

Alongside his engineering work, Mostafa has written more than 200 technical articles reaching over 500,000 readers. His content covers topics including Kubernetes deployments, AI infrastructure, authentication systems, and retrieval-augmented generation (RAG) architectures.
