Transcribe your audio library at scale on Civo: Faster-whisper + Kubernetes indexed jobs
Learn how to build a batch audio transcription pipeline on Civo Kubernetes using faster-whisper and Indexed Jobs. Parallel GPU pods share a cached model to process hundreds of files in one run, with no GPU cost when idle.
Written by Mostafa Ibrahim
Software Engineer at GoCardless
Whisper processes audio reliably, but when the backlog grows to hundreds of podcast episodes, meeting recordings, or voicemails, a single container picking them up one at a time turns a manageable workload into something that takes days to clear.
The obvious fixes fall short in different ways. An always-on GPU server speeds things up but keeps billing running even when nothing is being processed. Spinning up more containers helps with throughput, but each new pod pulls a 3 GB model from the internet on startup. With ten pods pulling simultaneously, the network saturates, and each pod sits idle for twenty minutes before it even starts transcribing.
This tutorial builds a system that solves both problems. A Kubernetes Indexed Job fans work across multiple GPU pods in parallel, where each pod picks up exactly one audio file, transcribes it, and writes the result back to object storage. A shared disk caches the faster-whisper model once, so pods start in under two minutes instead of twenty.
The whole setup runs on Civo Kubernetes with L40S GPU nodes, and Civo Object Store acts as the input and output bus. When the job finishes, you scale the GPU pool to zero and are no longer billed for GPU compute.
What you’ll build
A batch transcription pipeline on Civo Kubernetes that takes a folder of audio files and returns structured JSON transcripts, with every file processed in parallel across GPU pods.
The pipeline works in five steps:
- Audio files are uploaded to Civo Object Store alongside a manifest listing each one
- A Kubernetes Indexed Job spins up one pod per file, each assigned a number that maps directly to one audio file
- Every pod reads the Whisper model from a shared disk instead of downloading it fresh, keeping startup under 2 minutes
- Each pod transcribes its file on a GPU and writes the JSON transcript back to Object Store
- When all pods finish, the job terminates, and the GPU nodes can be scaled down to zero
The core components:
- Civo Kubernetes cluster with separate CPU and GPU node pools
- Kubernetes Indexed Job that distributes files across pods and shuts down when done
- Shared PVC (Persistent Volume Claim) caching the faster-whisper model once for the entire batch
- faster-whisper worker container handling transcription on each GPU
- Civo Object Store as both input source and transcript destination
By the end, you have a pipeline that clears an entire audio backlog in one run and incurs no GPU cost when idle.
Why Civo for this project?
Civo's per-second GPU billing was built for workloads like this one, short bursts of heavy compute with nothing running in between. The GPU nodes come up when the batch starts, do the work, and get removed when it finishes, so the bill reflects exactly what the pipeline consumed. The CPU node keeps the cluster alive between runs at a fraction of the GPU cost, and node pool separation ensures transcription pods never land on it.
Civo Object Store removes the need for a message queue entirely. Audio files sit in the bucket, pods pull from it, and transcripts go back out with no extra layer or service in between. Because the cluster, GPU nodes, and Object Store all live in one environment, there is nothing to stitch together before you can run your first job.
What you need
Before starting, make sure you have the following in place.
- A Civo account with GPU node access
- Civo CLI, kubectl, Helm, and s3cmd installed locally
- Basic familiarity with Kubernetes concepts, specifically pods, jobs, and storage. If you are new to Civo Kubernetes, this intro is a good starting point.
- A small set of short MP3 or WAV audio files for testing
- No Docker setup is required: the worker runs on a public CUDA base image, with the script mounted from a ConfigMap
Before diving in, create the following folder structure on your local machine:
civo-batch-transcription/
  manifests/
    model-pvc.yaml
    model-download-job.yaml
    transcribe-job.yaml
  worker/
    worker.py
  data/
    audio/
      episode-001.mp3
      ...
The manifests/ folder holds all Kubernetes YAML files, worker/ contains the transcription script, and data/audio/ is where your sample audio files live before uploading to the Object Store.
Tested with:
- Civo CLI v1.5.2
- Kubernetes v1.34.2
- Helm v3.x
- Python 3.11
- faster-whisper v1.2.1
Everything else gets created as part of the tutorial.
How it fits together
The system is built as a simple flow from storage to compute and back.
Audio files are uploaded into a bucket in Civo Object Store alongside a manifest file that lists every file to be processed. When the Indexed Job starts, Kubernetes creates multiple GPU pods at the same time, each receiving a unique index number. That index maps to a specific line in the manifest, which tells the pod exactly which audio file to pick up.
Each pod then pulls its assigned file from the Object Store, loads the Whisper model from the shared disk, transcribes the audio on a GPU, and writes the resulting JSON transcript back to the same bucket. Once a pod finishes its file, it exits, and when every index is complete, the job terminates.
Image by Author
Every component has a single responsibility, and nothing runs longer than it needs to.
Cluster sizing
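This tutorial uses two node pools with a clear separation of responsibilities:
- CPU pool: one g4s.kube.small node that keeps the cluster alive between runs
- GPU pool: two an.g1.l40s.kube.x1 nodes that run the transcription pods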
A quick distinction worth making before moving on: nodes are the actual machines in your cluster, while pods are the workloads that run on them. You can have more pods than nodes. If you submit a job with ten pods but only have two GPU nodes, Kubernetes runs two pods at a time and queues the rest. Each pod waits for a node to free up before it starts, which is exactly the take-turns behavior you will see during the run.
Two GPU nodes are enough to demonstrate parallelism without a large bill. With two nodes running in parallel and five files to process, you will see pods completing and new ones scheduling in real time. For a larger batch, you scale the GPU pool and adjust the job's parallelism setting to match.
Creating the Cluster
Authenticate the Civo CLI with your API key, found under your profile in the Civo dashboard.
civo apikey save my-key YOUR_API_KEY_HERE
civo apikey use my-key
civo regions ls
Create the cluster with a single CPU node pool, then save the kubeconfig:
civo kubernetes create whisper-batch \
  --size=g4s.kube.small \
  --nodes=1 \
  --region=NYC1 \
  --wait
civo kubernetes config whisper-batch --save --switch
Add the GPU node pool:
civo kubernetes node-pool create whisper-batch \
  --size=an.g1.l40s.kube.x1 \
  --nodes=2 \
  --region=NYC1
Verify all three nodes are ready:
kubectl get nodes
Image showing that all three nodes are ready
Install the NVIDIA GPU Operator using the exact flags below. Civo's GPU images ship with the container toolkit pre-installed, so toolkit.enabled is set to false.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm upgrade --install gpu-operator \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --set driver.enabled=true \
  --set toolkit.enabled=false \
  --set devicePlugin.enabled=true \
  --set gfd.enabled=true \
  --set operator.defaultRuntime=containerd \
  --set validator.cuda.runtimeClassName=nvidia
The operator takes 3 to 5 minutes to initialize. Wait until all of its pods are running:
kubectl get pods -n gpu-operator
All GPU operator pods running
Then confirm each GPU node is exposing one GPU to the scheduler:
kubectl get nodes -o custom-columns="NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
Each GPU node is showing 1
Object store setup
Create the bucket and credentials:
civo objectstore credential create whisper-creds --region=NYC1
civo objectstore create whisper-data --region=NYC1 --size=500
Get the access key and secret:
civo objectstore show whisper-data --region=NYC1
civo objectstore credential secret --access-key=YOUR_ACCESS_KEY --region=NYC1
Configure s3cmd with the Civo endpoint:
s3cmd --configure
When prompted, enter your access key and secret key, set the endpoint to objectstore.nyc1.civo.com, enable HTTPS, and accept the defaults for everything else. Save the config when done.
Upload your audio files:
s3cmd put episode-001.mp3 episode-002.mp3 episode-003.mp3 \
  episode-004.mp3 episode-005.mp3 s3://whisper-data/audio/
The bucket layout should look like this after uploading:
whisper-data/
  audio/
    episode-001.mp3
    ...
  transcripts/    ← filled by the job
  manifest.txt    ← generated next
Generate and upload the manifest:
s3cmd ls s3://whisper-data/audio/ | awk '{print $4}' | sed 's|.*/||' > manifest.txt
s3cmd put manifest.txt s3://whisper-data/
manifest.txt showing one filename per line
Each line in the manifest maps to one pod index. Line 0 goes to pod 0, line 1 to pod 1, and so on.
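The index-to-line mapping can be sketched in a few lines of Python. This is an illustrative helper, not part of the worker script: the name file_for_index is hypothetical, and unlike the worker it skips blank lines and fails loudly when the index has no matching entry, which is what happens if completions is larger than the manifest.

```python
def file_for_index(manifest_text: str, index: int) -> str:
    """Return the filename on the manifest line that matches a pod index.

    Hypothetical helper for illustration; blank lines are ignored.
    """
    names = [line.strip() for line in manifest_text.splitlines() if line.strip()]
    if not 0 <= index < len(names):
        raise IndexError(
            f"index {index} is outside the manifest ({len(names)} entries)"
        )
    return names[index]


# Made-up manifest content with three entries.
manifest = "episode-001.mp3\nepisode-002.mp3\nepisode-003.mp3\n"
print(file_for_index(manifest, 0))  # episode-001.mp3
print(file_for_index(manifest, 2))  # episode-003.mp3
```

Running the same check locally against manifest.txt before submitting a job is a cheap way to confirm that the line count matches the number of pods you plan to run.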
Caching the model on a shared disk
The faster-whisper large-v3 model is roughly 3 GB. Without caching, every pod downloads it on startup. With 10 pods starting simultaneously, that is 30 GB of downloads saturating the network and pushing startup times past 20 minutes.
The fix is to download the model once onto a shared disk and mount it into every pod. Startup drops to under 2 minutes.
Create a PersistentVolumeClaim (a disk that Kubernetes pods can mount). Civo's storage supports ReadWriteOnce, meaning the volume can be mounted read-write by one node at a time; pods scheduled on that same node can share it.
model-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: whisper-model-cache
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: civo-volume
Apply:
kubectl apply -f model-pvc.yaml
Now run a one-time job that downloads the model onto the PVC. This pod needs only internet access, not a GPU, so it can run on the CPU node.
model-download-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: download-whisper-model
spec:
  template:
    spec:
      containers:
        - name: downloader
          image: python:3.11-slim
          command:
            - bash
            - "-c"
            - "pip install faster-whisper==1.2.1 && python -c 'from faster_whisper import WhisperModel; print(\"Downloading model...\"); WhisperModel(\"large-v3\", device=\"cpu\", compute_type=\"float32\", download_root=\"/models\"); print(\"Done.\")'"
          volumeMounts:
            - name: model-storage
              mountPath: /models
      volumes:
        - name: model-storage
          persistentVolumeClaim:
            claimName: whisper-model-cache
      restartPolicy: Never
  backoffLimit: 3
Download:
kubectl apply -f model-download-job.yaml
kubectl logs -f job/download-whisper-model
Wait for "Done" to appear in the logs. The download takes 5 to 10 minutes, depending on network speed. Once it finishes, the model is cached, and every subsequent pod reads it from disk instead of downloading it fresh.
The worker script
Each pod runs a single Python script that follows five steps regardless of which file it is assigned: read its index, fetch the manifest, download its audio file, load the model from the shared disk, and write the JSON transcript back to Object Store.
Part 1: Setup and S3 client
The request_checksum_calculation="when_required" flag is required for Civo Object Store; without it, uploads fail with a checksum mismatch error.
import os
import json

import boto3
from botocore.config import Config
from faster_whisper import WhisperModel

index = int(os.environ["JOB_COMPLETION_INDEX"])

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    config=Config(
        signature_version="s3v4",
        request_checksum_calculation="when_required",
    ),
)
bucket = os.environ["S3_BUCKET"]
Part 2: Manifest lookup
Pod 0 gets line 0, pod 1 gets line 1, and so on.
manifest_obj = s3.get_object(Bucket=bucket, Key="manifest.txt")
lines = manifest_obj["Body"].read().decode().strip().split("\n")
filename = lines[index].strip()
print(f"Pod {index}: assigned file '{filename}'")
Part 3: Audio download
Each pod downloads only its assigned file.
local_audio = f"/tmp/{filename}"
s3.download_file(bucket, f"audio/{filename}", local_audio)
print(f"Pod {index}: downloaded '{filename}'")
Part 4: Model loading
The model loads from /models, the mounted PVC. No internet download happens here.
model = WhisperModel(
    "large-v3",
    device="cuda",
    compute_type="float16",
    download_root="/models",
)
print(f"Pod {index}: model loaded from cache")
Part 5: Transcription and upload
The script transcribes, builds a JSON object with timestamps and language metadata, and uploads to Object Store.
segments, info = model.transcribe(local_audio, beam_size=5)

results = []
for segment in segments:
    results.append({
        "start": round(segment.start, 2),
        "end": round(segment.end, 2),
        "text": segment.text.strip(),
    })

transcript = {
    "filename": filename,
    "language": info.language,
    "language_probability": round(info.language_probability, 2),
    "segments": results,
}

output_key = f"transcripts/{os.path.splitext(filename)[0]}.json"
local_output = f"/tmp/{os.path.splitext(filename)[0]}.json"

with open(local_output, "w") as f:
    json.dump(transcript, f, indent=2, ensure_ascii=False)

s3.upload_file(local_output, bucket, output_key)
print(f"Pod {index}: transcript uploaded to '{output_key}'")
This tutorial mounts the script into pods using a Kubernetes ConfigMap. If you want to build your own Docker image instead, that is a good next step for a production setup.
The Indexed Job manifest
A Kubernetes Indexed Job runs a fixed number of pods, up to parallelism at a time, where each pod receives a unique number as the JOB_COMPLETION_INDEX environment variable. Pod 0 picks up file 0, pod 1 picks up file 1, and so on. Kubernetes tracks completed indexes and retries any that fail without touching the ones that succeeded.
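Index tracking only covers a single job. If you resubmit the whole job after a partial run, every index starts again. One possible guard, sketched here as an assumption rather than something the worker script does, is to check whether the transcript already exists before transcribing. The helper names are hypothetical. The s3 argument is expected to behave like the boto3 client's list_objects_v2 call, and FakeS3 is a stand-in used only so the example runs without a bucket.

```python
import os


def transcript_key(filename: str) -> str:
    """Build the output key the same way the worker script does."""
    return f"transcripts/{os.path.splitext(filename)[0]}.json"


def transcript_exists(s3, bucket: str, key: str) -> bool:
    """Return True if an object with exactly this key is already in the bucket."""
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    # "Contents" is absent from the response when nothing matches the prefix.
    return any(obj["Key"] == key for obj in resp.get("Contents", []))


class FakeS3:
    """Stand-in for a boto3 S3 client, for demonstration only."""

    def __init__(self, keys):
        self.keys = sorted(keys)

    def list_objects_v2(self, Bucket, Prefix, MaxKeys):
        matches = [k for k in self.keys if k.startswith(Prefix)][:MaxKeys]
        return {"Contents": [{"Key": k} for k in matches]} if matches else {}


client = FakeS3(["transcripts/episode-001.json"])
print(transcript_exists(client, "whisper-data", transcript_key("episode-001.mp3")))  # True
print(transcript_exists(client, "whisper-data", transcript_key("episode-002.mp3")))  # False
```

In a worker, a True result would be the cue to print a message and exit with status 0 so Kubernetes marks the index as complete.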
Three settings control the run. completions sets the total number of files, one per pod index. parallelism controls how many pods run at once; set this to match your GPU node count. completionMode: Indexed tells Kubernetes to assign each pod its unique number.
Before applying the job, create a Kubernetes Secret (an object that stores sensitive data like API keys) with your Object Store credentials:
kubectl create secret generic objectstore-creds \
  --from-literal=AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY \
  --from-literal=AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY \
  --from-literal=S3_ENDPOINT=https://objectstore.nyc1.civo.com \
  --from-literal=S3_BUCKET=whisper-data
Also, create a ConfigMap from the worker script so the job can mount it into every pod:
kubectl create configmap worker-script --from-file=worker.py=worker/worker.py
Here is the full job manifest:
transcribe-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: whisper-transcribe
spec:
  completions: 5
  parallelism: 2
  completionMode: Indexed
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        nvidia.com/gpu.present: "true"
      containers:
        - name: worker
          image: nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04
          command:
            - bash
            - -c
            - "apt-get update -q && apt-get install -y -q python3 python3-pip && pip3 install faster-whisper==1.2.1 boto3 nvidia-cublas-cu12 nvidia-cudnn-cu12 && python3 /app/worker.py"
          envFrom:
            - secretRef:
                name: objectstore-creds
          resources:
            requests:
              nvidia.com/gpu: 1
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: model-cache
              mountPath: /models
            - name: worker-script
              mountPath: /app
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: whisper-model-cache
        - name: worker-script
          configMap:
            name: worker-script
The nodeSelector uses nvidia.com/gpu.present: "true" so pods only land on GPU nodes. Each pod requests exactly one GPU, and the scheduler will not place it unless one is available. backoffLimit: 3 handles retries without affecting completed indexes. The model PVC mounts at /models and the worker script mounts from the ConfigMap at /app, giving every pod everything it needs to run independently.
Running and watching
Apply the manifest to start the job:
kubectl apply -f transcribe-job.yaml
With parallelism: 2, Kubernetes runs at most two pods at a time, one per GPU node, and starts the pod for the next index as soon as one finishes. Watch them take turns:
kubectl get pods -l job-name=whisper-transcribe -w
You will see two pods move to Running. As each pod finishes, Kubernetes starts the next one on the freed GPU until all five indexes are done.
Two pods run at a time while the rest stay pending
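Check the logs of a running pod to confirm the pipeline is working. The random suffix in the pod name will differ in your cluster, so copy the name from the previous command: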
kubectl logs whisper-transcribe-0-s9pvz
You should see the four steps printed in order:
Pod 0 transcribes its assigned file and writes the result back
The key line is "model loaded from cache," which confirms the pod read the model from the shared PVC instead of downloading it from the internet.
Once all five indexes are complete, check the job status:
kubectl get job whisper-transcribe
All five files transcribed, job complete
The pods exit as soon as they finish, so no pod sits idle. The GPU nodes themselves keep running after the job completes. Scale them down to stop node-level billing:
Get your GPU pool ID:
civo kubernetes node-pool list whisper-batch --region=NYC1
Copy the ID from the output, then scale down:
civo kubernetes node-pool scale whisper-batch \
  --node-pool=YOUR_GPU_POOL_ID \
  --nodes=0 \
  --region=NYC1
Checking results
List the transcripts in Object Store to confirm all five were written:
s3cmd ls s3://whisper-data/transcripts/
s3cmd ls showing 5 JSON files
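For larger batches, eyeballing the listing stops being practical. The comparison can be sketched as a small Python function that takes the manifest entries and the transcript keys and reports which audio files have no transcript yet. The function name is hypothetical, and the inputs are assumed to have been read from manifest.txt and from the listing above.

```python
import os


def missing_transcripts(manifest_names, transcript_keys):
    """Return manifest entries that have no matching .json transcript key."""
    have = {os.path.basename(key) for key in transcript_keys}
    return [
        name
        for name in manifest_names
        if f"{os.path.splitext(name)[0]}.json" not in have
    ]


# Made-up inputs: two files in the manifest, one transcript written so far.
manifest_names = ["episode-001.mp3", "episode-002.mp3"]
transcript_keys = ["s3://whisper-data/transcripts/episode-001.json"]
print(missing_transcripts(manifest_names, transcript_keys))  # ['episode-002.mp3']
```

An empty list means every file in the manifest produced a transcript.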
Spot-check one file to see the transcript structure:
s3cmd get s3://whisper-data/transcripts/episode-001.json -
Each JSON file contains the detected language, a confidence score, and a list of timestamped segments with the transcribed text.
JSON output
The JSON above is the full output for one audio file. The language field shows what faster-whisper detected automatically, with language_probability giving the confidence score. The segments array is where the real value is: each entry has a start and end timestamp in seconds alongside the transcribed text. That structure makes the output ready to feed into subtitle generators, search indexes, or any downstream NLP pipeline with little post-processing.
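As one example of a downstream use, the segments can be turned into SRT subtitles with a short conversion. This is a minimal sketch that assumes the transcript has the fields the worker script writes (segments with start, end, and text). The function names and the sample transcript are made up for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(transcript: dict) -> str:
    """Convert a transcript dict into SRT text, one numbered block per segment."""
    blocks = []
    for number, seg in enumerate(transcript["segments"], start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{number}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(blocks)


# Made-up sample in the same shape as the worker's output.
sample = {
    "filename": "episode-001.mp3",
    "language": "en",
    "language_probability": 0.99,
    "segments": [
        {"start": 0.0, "end": 3.5, "text": "Welcome to the show."},
        {"start": 3.5, "end": 7.25, "text": "Today we talk about batch jobs."},
    ],
}
print(to_srt(sample))
```

The same loop is a reasonable starting point for other targets, such as one search-index document per segment.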
s3cmd ls --recursive s3://whisper-data/
Five audio files in, five JSON transcripts out
Looking at the bucket as a whole shows the full picture. Five audio files went in, five transcripts came out.
Now that the pipeline is verified, try it with your own audio. Drop your files into s3://whisper-data/audio/, regenerate the manifest, update completions to match your file count, and run kubectl apply -f transcribe-job.yaml.
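Updating completions by hand is easy to forget. A small script can rewrite the value from the manifest's line count, sketched here with a regular expression over the manifest text so no YAML library is needed. The function name is hypothetical, and the sketch assumes the file contains exactly one completions: line, as transcribe-job.yaml does.

```python
import re


def set_completions(job_yaml: str, file_count: int) -> str:
    """Return the job YAML with its completions value replaced by file_count."""
    if file_count < 1:
        raise ValueError("file_count must be at least 1")
    pattern = r"(?m)^([ \t]*completions:[ \t]*)\d+[ \t]*$"
    updated, replaced = re.subn(pattern, rf"\g<1>{file_count}", job_yaml)
    if replaced != 1:
        raise ValueError(f"expected one completions line, found {replaced}")
    return updated


# Fragment of the job spec, shortened for the example.
job_yaml = "spec:\n  completions: 5\n  parallelism: 2\n  completionMode: Indexed\n"
print(set_completions(job_yaml, 12))
```

Because a Job's pod template cannot be changed once the Job exists, delete the previous whisper-transcribe job before applying the updated manifest.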
The same pipeline handles 5 files or 500 without any changes to the architecture.
The cold-start win
The shared PVC cuts pod startup from 25 minutes to 90 seconds by eliminating the model download entirely.
Image by Author
For a batch of 100 files on 10 GPU nodes, pods run in 10 waves, so the no-cache approach spends over four hours of wall-clock time (10 waves × 25 minutes) on model downloads alone. With the cache, each pod is transcribing within two minutes of starting.
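The arithmetic behind that comparison can be written out as a short calculation. It is a rough model that assumes one pod per GPU node, pods running in evenly sized waves, and the 25-minute and 90-second startup figures quoted above. The function name is hypothetical.

```python
import math


def startup_overhead_minutes(num_files: int, gpu_nodes: int, startup_minutes: float) -> float:
    """Wall-clock minutes spent on pod startup across all waves of a batch."""
    waves = math.ceil(num_files / gpu_nodes)
    return waves * startup_minutes


no_cache = startup_overhead_minutes(100, 10, 25)
with_cache = startup_overhead_minutes(100, 10, 1.5)
print(no_cache)    # 250 minutes, a little over four hours
print(with_cache)  # 15.0 minutes
```

Real runs will not fall into clean waves, because files differ in length and pods start as GPUs free up, so treat the result as an estimate of startup overhead rather than a prediction of total runtime.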
The pattern scales linearly. For 10x the files, add 10x the GPU nodes and bump parallelism to match. Civo bills GPU nodes per second, so you pay only for the time the GPU nodes are running.
Cleanup and what's next
Delete the job and its associated resources:
kubectl delete job whisper-transcribe
kubectl delete job download-whisper-model
kubectl delete secret objectstore-creds
kubectl delete configmap worker-script
Keep the model PVC. It persists across runs, so your next batch starts with a warm cache. When you are done, scale the GPU pool to zero to stop billing:
Get your GPU pool ID:
civo kubernetes node-pool list whisper-batch --region=NYC1
Copy the ID from the output, then scale down:
civo kubernetes node-pool scale whisper-batch \
  --node-pool=YOUR_GPU_POOL_ID \
  --nodes=0 \
  --region=NYC1
To delete everything:
civo kubernetes delete whisper-batch --region=NYC1
civo objectstore delete whisper-data --region=NYC1
Where to go from here:
- Set up a CronJob or webhook that fires the Indexed Job automatically when new audio lands in the Object Store
- Try a smaller model like medium or base for faster transcription, or add speaker diarization with Pyannote
- Apply the same Indexed Job pattern to other GPU workloads like image generation, document embedding, or video frame extraction
For a production setup, building a custom Docker image instead of the ConfigMap approach is a natural next step. This guide walks through self-hosting a container registry on Civo Kubernetes using Harbor.
Key takeaways
Kubernetes Indexed Jobs remove the need for a message queue, a task scheduler, or any custom coordination code. You define how many files you have, how many pods run at once, and Kubernetes handles the rest. Each pod gets a number, grabs its file, does the work, and exits. When the last index completes, the job is done.
The shared PVC is what makes GPU pods viable for batch work at scale. Without it, cold starts dominate your runtime and your bill. A one-time model download onto a shared disk cuts startup from 25 minutes to 90 seconds, and that saving compounds with every pod you add.
Civo's per-second GPU billing aligns directly with how this pipeline behaves. The cluster spins up, processes the entire backlog, and shuts down. You pay for transcription time only, not for idle servers sitting between runs. For a workload that runs in bursts rather than continuously, that billing model makes a meaningful difference.
Finally, the pattern generalizes. The Kubernetes and Object Store logic stays exactly the same whether the worker container is running faster-whisper, an embedding model, a Stable Diffusion pipeline, or any other GPU-heavy batch task. Swap the container, keep the architecture.

Mostafa Ibrahim is a software engineer and technical writer specializing in developer-focused content for SaaS and AI platforms. He currently works as a Software Engineer at GoCardless, contributing to production systems and scalable payment infrastructure.
Alongside his engineering work, Mostafa has written more than 200 technical articles reaching over 500,000 readers. His content covers topics including Kubernetes deployments, AI infrastructure, authentication systems, and retrieval-augmented generation (RAG) architectures.