Transcribe your audio library at scale on Civo

Whisper processes audio reliably, but when the backlog grows to hundreds of podcast episodes, meeting recordings, or voicemails, a single container picking them up one at a time turns a manageable workload into something that takes days to clear.

The obvious fixes fall short in different ways. An always-on GPU server speeds things up but keeps billing running even when nothing is being processed. Spinning up more containers helps with throughput, but each new pod pulls a 3 GB model from the internet on startup. With ten pods pulling simultaneously, the network saturates, and each pod sits idle for more than twenty minutes before it even starts transcribing.

This tutorial builds a system that solves both problems. A Kubernetes Indexed Job fans work across multiple GPU pods in parallel, where each pod picks up exactly one audio file, transcribes it, and writes the result back to object storage. A shared disk caches the faster-whisper model once, so pods start in under two minutes instead of more than twenty.

The whole setup runs on Civo Kubernetes with L40S GPU nodes, and Civo Object Store acts as the input and output bus. When the job finishes, you scale the GPU node pool to zero, and you are no longer billed for GPU compute.

What you’ll build

A batch transcription pipeline on Civo Kubernetes that takes a folder of audio files and returns structured JSON transcripts, with every file processed in parallel across GPU pods.

The pipeline works in five steps:

Audio files are uploaded to Civo Object Store alongside a manifest listing each one
A Kubernetes Indexed Job spins up one pod per file, each assigned a number that maps directly to one audio file
Every pod reads the Whisper model from a shared disk instead of downloading it fresh, keeping startup under 2 minutes
Each pod transcribes its file on a GPU and writes the JSON transcript back to Object Store
When every pod has finished, the job completes, and the GPU node pool can be scaled to zero

The core components:

Civo Kubernetes cluster with separate CPU and GPU node pools
Kubernetes Indexed Job that distributes files across pods and shuts down when done
Shared PVC (Persistent Volume Claim) caching the faster-whisper model once for the entire batch
faster-whisper worker container handling transcription on each GPU
Civo Object Store as both input source and transcript destination

By the end, you have a pipeline that clears an entire audio backlog in one run and incurs no GPU cost when idle.

Why Civo for this project?

Civo's per-second GPU billing was built for workloads like this one: short bursts of heavy compute with nothing running in between. The GPU nodes come up when the batch starts, do the work, and are removed when it finishes, so the bill reflects what the pipeline consumed. The CPU node keeps the cluster alive between runs at a fraction of the GPU cost, and node pool separation ensures transcription pods never land on it.

Civo Object Store removes the need for a message queue entirely. Audio files sit in the bucket, pods pull from it, and transcripts go back out with no extra layer or service in between. Because the cluster, GPU nodes, and Object Store all live in one environment, there is nothing to stitch together before you can run your first job.

What you need

Before starting, make sure you have the following in place.

A Civo account with GPU node access
Civo CLI, kubectl, Helm, and s3cmd installed locally
Basic familiarity with Kubernetes concepts, specifically pods, jobs, and storage. If you are new to Civo Kubernetes, this intro is a good starting point.
A small set of short MP3 or WAV audio files for testing
No Docker setup is required: the worker runs from a public CUDA base image, with the script mounted from a ConfigMap

Before diving in, create the following folder structure on your local machine:

civo-batch-transcription/
  manifests/
    model-pvc.yaml
    model-download-job.yaml
    transcribe-job.yaml
  worker/
    worker.py
  data/
    audio/
      episode-001.mp3
      ...

The manifests/ folder holds all Kubernetes YAML files, worker/ contains the transcription script, and data/audio/ is where your sample audio files live before uploading to the Object Store.

Tested with:
- Civo CLI v1.5.2
- Kubernetes v1.34.2
- Helm v3.x
- Python 3.11
- faster-whisper v1.2.1

Everything else gets created as part of the tutorial.

How it fits together

The system is built as a simple flow from storage to compute and back.

Audio files are uploaded into a bucket in Civo Object Store alongside a manifest file that lists every file to be processed. When the Indexed Job starts, Kubernetes creates multiple GPU pods at the same time, each receiving a unique index number. That index maps to a specific line in the manifest, which tells the pod exactly which audio file to pick up.

Each pod then pulls its assigned file from the Object Store, loads the Whisper model from the shared disk, transcribes the audio on a GPU, and writes the resulting JSON transcript back to the same bucket. Once a pod finishes its file, it exits, and when every index is complete, the job terminates.

Every component has a single responsibility, and nothing runs longer than it needs to.

Cluster sizing

This tutorial uses two node pools with a clear separation of responsibilities:

Node Pool	Count	Size	Purpose
CPU	1	g4s.kube.small (2 vCPU / 4 GB)	System pods, model download
GPU	2	an.g1.l40s.kube.x1 (L40S 40GB)	Transcription workers

A quick distinction worth making before moving on: nodes are the actual machines in your cluster, while pods are the workloads that run on them. You can have more pods than nodes. If you submit a job with ten pods but only have two GPU nodes, Kubernetes runs two pods at a time and queues the rest. Each pod waits for a node to free up before it starts, which is exactly the take-turns behavior you will see during the run.

Two GPU nodes are enough to demonstrate parallelism without a large bill. With two nodes running in parallel and five files to process, you will see pods completing and new ones scheduling in real time. For a larger batch, you scale the GPU pool and adjust the job's parallelism setting to match.
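
To make the take-turns behavior concrete, here is a minimal Python sketch that groups job indexes into waves. It assumes one pod per GPU node and that all pods in a wave finish together, which is a simplification: in a real run the next index starts as soon as any single GPU frees up. The function name plan_waves is hypothetical and only illustrates the arithmetic.

```python
import math


def plan_waves(total_files: int, gpu_nodes: int) -> list[list[int]]:
    """Group job indexes into waves of at most `gpu_nodes` pods each.

    Simplifying assumption: every pod in a wave finishes at the same time.
    """
    if total_files < 0 or gpu_nodes < 1:
        raise ValueError("need total_files >= 0 and gpu_nodes >= 1")
    waves = math.ceil(total_files / gpu_nodes)
    return [
        list(range(w * gpu_nodes, min((w + 1) * gpu_nodes, total_files)))
        for w in range(waves)
    ]


if __name__ == "__main__":
    # Five files on two GPU nodes, as in this tutorial.
    for number, wave in enumerate(plan_waves(5, 2), start=1):
        print(f"wave {number}: indexes {wave}")
```

With five files and two GPU nodes this gives three waves, the last of which holds a single pod.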

Creating the Cluster

Authenticate the Civo CLI with your API key, found under your profile in the Civo dashboard.

civo apikey save my-key YOUR_API_KEY_HERE
civo apikey use my-key
civo regions ls

Create the cluster with a single CPU node pool, then save the kubeconfig:

civo kubernetes create whisper-batch \
  --size=g4s.kube.small \
  --nodes=1 \
  --region=NYC1 \
  --wait

civo kubernetes config whisper-batch --save --switch

Add the GPU node pool:

civo kubernetes node-pool create whisper-batch \
  --size=an.g1.l40s.kube.x1 \
  --nodes=2 \
  --region=NYC1

Verify all three nodes are ready:

kubectl get nodes

Install the NVIDIA GPU Operator using the exact flags below. Civo's GPU images ship with the container toolkit pre-installed, so toolkit.enabled is set to false.

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

helm upgrade --install gpu-operator \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --set driver.enabled=true \
  --set toolkit.enabled=false \
  --set devicePlugin.enabled=true \
  --set gfd.enabled=true \
  --set operator.defaultRuntime=containerd \
  --set validator.cuda.runtimeClassName=nvidia

The operator takes 3 to 5 minutes to initialize. Wait until all of its pods are running:

kubectl get pods -n gpu-operator

Then confirm each GPU node is exposing one GPU to the scheduler:

kubectl get nodes -o custom-columns="NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

Object store setup

Create the bucket and credentials:

civo objectstore credential create whisper-creds --region=NYC1
civo objectstore create whisper-data --region=NYC1 --size=500

Get the access key and secret:

civo objectstore show whisper-data --region=NYC1
civo objectstore credential secret --access-key=YOUR_ACCESS_KEY --region=NYC1

Configure s3cmd with the Civo endpoint:

s3cmd --configure

When prompted, enter your access key and secret key, set the endpoint to objectstore.nyc1.civo.com, enable HTTPS, and accept the defaults for everything else. Save the config when done.

Upload your audio files:

s3cmd put data/audio/episode-001.mp3 data/audio/episode-002.mp3 data/audio/episode-003.mp3 \
  data/audio/episode-004.mp3 data/audio/episode-005.mp3 s3://whisper-data/audio/

The bucket layout should look like this after uploading:

whisper-data/
  audio/
    episode-001.mp3
    ...
  transcripts/       ← filled by the job
  manifest.txt       ← generated next

Generate and upload the manifest:

s3cmd ls s3://whisper-data/audio/ | awk '{print $4}' | sed 's|.*/||' > manifest.txt
s3cmd put manifest.txt s3://whisper-data/

Each line in the manifest maps to one pod index. Line 0 goes to pod 0, line 1 to pod 1, and so on.
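
If you would rather build the manifest in Python than with awk and sed, the sketch below does the same job. It assumes you already have the object keys as a list of strings, for example parsed from a bucket listing. build_manifest is an illustrative helper, not part of any library, and sorting the names is an added choice that keeps the index-to-file mapping stable between runs.

```python
import posixpath


def build_manifest(keys: list[str], prefix: str = "audio/") -> str:
    """Return manifest text: one bare filename per line, sorted.

    Keys outside `prefix` and directory placeholder keys are skipped.
    """
    names = sorted(
        posixpath.basename(key)
        for key in keys
        if key.startswith(prefix) and not key.endswith("/")
    )
    return "\n".join(names) + ("\n" if names else "")


if __name__ == "__main__":
    keys = ["audio/episode-002.mp3", "audio/episode-001.mp3", "manifest.txt"]
    manifest = build_manifest(keys)
    print(manifest, end="")
    # The line count is the value to use for completions in the job.
    print(f"completions: {len(manifest.splitlines())}")
```

The number of lines in the result is the value that completions must match later in the job manifest.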

Caching the model on a shared disk

The faster-whisper large-v3 model is roughly 3 GB. Without caching, every pod downloads it on startup. With 10 pods starting simultaneously, that is 30 GB of downloads saturating the network and pushing startup times past 20 minutes.

The fix is to download the model once onto a shared disk and mount it into every pod. Startup drops to under 2 minutes.

Create a PersistentVolumeClaim (a request for a disk that Kubernetes pods can mount). Civo's storage supports ReadWriteOnce, meaning the volume can be mounted read-write by a single node at a time.

model-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: whisper-model-cache
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: civo-volume

Apply it from the project root:

kubectl apply -f manifests/model-pvc.yaml

Now run a one-time job that downloads the model onto the PVC. This pod only needs internet access, not a GPU, so it does not request one and can run on the CPU node.

model-download-job.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: download-whisper-model
spec:
  template:
    spec:
      containers:
        - name: downloader
          image: python:3.11-slim
          command:
            - bash
            - "-c"
            - "pip install faster-whisper==1.2.1 && python -c 'from faster_whisper import WhisperModel; print(\"Downloading model...\"); WhisperModel(\"large-v3\", device=\"cpu\", compute_type=\"float32\", download_root=\"/models\"); print(\"Done.\")'"
          volumeMounts:
            - name: model-storage
              mountPath: /models
      volumes:
        - name: model-storage
          persistentVolumeClaim:
            claimName: whisper-model-cache
      restartPolicy: Never
  backoffLimit: 3

Run the job and follow its logs:

kubectl apply -f manifests/model-download-job.yaml
kubectl logs -f job/download-whisper-model

Wait for "Done" to appear in the logs. The download takes 5 to 10 minutes, depending on network speed. Once it finishes, the model is cached, and every subsequent pod reads it from disk instead of downloading it fresh.

The worker script

Each pod runs a single Python script that follows five steps regardless of which file it is assigned: read its index, fetch the manifest, download its audio file, load the model from the shared disk, and write the JSON transcript back to Object Store.

Part 1: Setup and S3 client

The request_checksum_calculation="when_required" flag is required for Civo Object Store; without it, uploads fail with a checksum mismatch error.

import os
import json
import boto3
from botocore.config import Config
from faster_whisper import WhisperModel

# Kubernetes sets JOB_COMPLETION_INDEX on every pod of an Indexed Job.
index = int(os.environ["JOB_COMPLETION_INDEX"])

# S3-compatible client pointed at Civo Object Store.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    config=Config(
        signature_version="s3v4",
        request_checksum_calculation="when_required",
    ),
)

bucket = os.environ["S3_BUCKET"]
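
Because the script reads several environment variables, a missing one surfaces as a bare KeyError partway through startup. The sketch below shows one possible pre-flight check that reports every missing name at once. The helper missing_env is hypothetical; the variable names are the ones used in this tutorial.

```python
REQUIRED_VARS = (
    "JOB_COMPLETION_INDEX",
    "S3_ENDPOINT",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET",
)


def missing_env(environ, required=REQUIRED_VARS) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    # A sample mapping stands in for os.environ in this demonstration.
    sample = {"S3_BUCKET": "whisper-data", "JOB_COMPLETION_INDEX": "0"}
    print("missing:", missing_env(sample))
```

In the worker you could call missing_env(os.environ) before creating the client and exit with a non-zero status if the list is not empty, so the Job records the pod as failed.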

Part 2: Manifest lookup

Pod 0 gets line 0, pod 1 gets line 1, and so on.

manifest_obj = s3.get_object(Bucket=bucket, Key="manifest.txt")
lines = manifest_obj["Body"].read().decode().strip().split("\n")
filename = lines[index].strip()
print(f"Pod {index}: assigned file '{filename}'")
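
If completions is set higher than the number of manifest lines, lines[index] raises an IndexError with little context. The sketch below wraps the lookup in a hypothetical helper, pick_file, that fails with a clearer message. Unlike the script above, it also drops blank lines, which is an assumption about how you want stray empty lines handled.

```python
def pick_file(manifest_text: str, index: int) -> str:
    """Return the filename on the zero-based manifest line `index`."""
    # Drop blank lines so a trailing newline does not create an empty entry.
    names = [line.strip() for line in manifest_text.splitlines() if line.strip()]
    if not 0 <= index < len(names):
        raise IndexError(
            f"index {index} is outside the manifest ({len(names)} entries); "
            "check that completions matches the manifest length"
        )
    return names[index]


if __name__ == "__main__":
    manifest = "episode-001.mp3\nepisode-002.mp3\n"
    print(pick_file(manifest, 1))
```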

Part 3: Audio download

Each pod downloads only its assigned file.

local_audio = f"/tmp/{filename}"
s3.download_file(bucket, f"audio/{filename}", local_audio)
print(f"Pod {index}: downloaded '{filename}'")

Part 4: Model loading

The model loads from /models, the mounted PVC. No internet download happens here.

model = WhisperModel(
    "large-v3",
    device="cuda",
    compute_type="float16",
    download_root="/models",
)
print(f"Pod {index}: model loaded from cache")

Part 5: Transcription and upload

The script transcribes, builds a JSON object with timestamps and language metadata, and uploads to Object Store.

# transcribe() returns a lazy generator; iterating it runs the transcription.
segments, info = model.transcribe(local_audio, beam_size=5)
results = []
for segment in segments:
    results.append({
        "start": round(segment.start, 2),
        "end": round(segment.end, 2),
        "text": segment.text.strip(),
    })

transcript = {
    "filename": filename,
    "language": info.language,
    "language_probability": round(info.language_probability, 2),
    "segments": results,
}

# episode-001.mp3 becomes transcripts/episode-001.json
output_key = f"transcripts/{os.path.splitext(filename)[0]}.json"
local_output = f"/tmp/{os.path.splitext(filename)[0]}.json"

with open(local_output, "w") as f:
    json.dump(transcript, f, indent=2, ensure_ascii=False)

s3.upload_file(local_output, bucket, output_key)
print(f"Pod {index}: transcript uploaded to '{output_key}'")

This tutorial mounts the script into pods using a Kubernetes ConfigMap. If you want to build your own Docker image instead, that is a good next step for a production setup.

The Indexed Job manifest

A Kubernetes Indexed Job runs a fixed number of completions, with up to parallelism pods active at a time, and each pod receives a unique number as the JOB_COMPLETION_INDEX environment variable. Pod 0 picks up file 0, pod 1 picks up file 1, and so on. Kubernetes tracks completed indexes and retries any that fail without touching the ones that succeeded.

Three settings control the run. completions sets the total number of files, one per pod index. parallelism controls how many pods run at once; set this to match your GPU node count. completionMode: Indexed tells Kubernetes to assign each pod its unique number.
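
As a rough illustration of how the three settings relate, the sketch below derives them from a manifest and a GPU node count. The helper job_settings is hypothetical and returns a plain dictionary; it does not talk to the cluster, and it assumes one pod per GPU node.

```python
def job_settings(manifest_text: str, gpu_nodes: int) -> dict:
    """Derive Indexed Job settings from the manifest and GPU node count."""
    files = [line for line in manifest_text.splitlines() if line.strip()]
    if not files:
        raise ValueError("manifest is empty")
    if gpu_nodes < 1:
        raise ValueError("need at least one GPU node")
    return {
        # One completion index per file in the manifest.
        "completions": len(files),
        # Never more pods at once than there are GPUs or files.
        "parallelism": min(len(files), gpu_nodes),
        "completionMode": "Indexed",
    }


if __name__ == "__main__":
    manifest = "\n".join(f"episode-{n:03d}.mp3" for n in range(1, 6))
    print(job_settings(manifest, gpu_nodes=2))
```

For the five-file example on two GPU nodes, this yields the same completions and parallelism values used in the manifest in this section.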

Before applying the job, create a Kubernetes Secret (an object that stores sensitive data like API keys) with your Object Store credentials:

kubectl create secret generic objectstore-creds \
  --from-literal=AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY \
  --from-literal=AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY \
  --from-literal=S3_ENDPOINT=https://objectstore.nyc1.civo.com \
  --from-literal=S3_BUCKET=whisper-data

Also, create a ConfigMap from the worker script so every pod can mount it:

kubectl create configmap worker-script --from-file=worker.py=worker/worker.py

Here is the full job manifest:

transcribe-job.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: whisper-transcribe
spec:
  completions: 5
  parallelism: 2
  completionMode: Indexed
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        nvidia.com/gpu.present: "true"
      containers:
        - name: worker
          image: nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04
          command:
            - bash
            - -c
            - "apt-get update -q && apt-get install -y -q python3 python3-pip && pip3 install faster-whisper==1.2.1 boto3 nvidia-cublas-cu12 nvidia-cudnn-cu12 && python3 /app/worker.py"
          envFrom:
            - secretRef:
                name: objectstore-creds
          resources:
            requests:
              nvidia.com/gpu: 1
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: model-cache
              mountPath: /models
            - name: worker-script
              mountPath: /app
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: whisper-model-cache
        - name: worker-script
          configMap:
            name: worker-script

The nodeSelector uses nvidia.com/gpu.present: "true" so pods only land on GPU nodes. Each pod requests exactly one GPU, and the scheduler will not place it unless one is available. backoffLimit: 3 handles retries without affecting completed indexes. The model PVC mounts at /models and the worker script mounts from the ConfigMap at /app, giving every pod everything it needs to run independently.

Running and watching

Apply the manifest to start the job:

kubectl apply -f manifests/transcribe-job.yaml

Because parallelism is set to 2, the Job controller creates two pods at a time, one per GPU node, and starts the next index as each pod completes. Watch them take turns:

kubectl get pods -l job-name=whisper-transcribe -w

You will see two pods move to Running. As each one finishes, the next pod appears and is scheduled onto the freed GPU.

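Check the logs of a running pod to confirm the pipeline is working. Use a pod name from the output above; the random suffix will differ in your cluster:

kubectl logs whisper-transcribe-0-s9pvz

After the package installation output, you should see the script's four status messages in order: file assigned, file downloaded, model loaded from cache, and transcript uploaded.

The key line is "model loaded from cache," which confirms the pod read the model from the shared PVC instead of downloading it from the internet.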

Once all five indexes are complete, check the job status:

kubectl get job whisper-transcribe

The pods exit as soon as they finish, so no pod sits idle. The GPU nodes themselves keep running after the job completes. Scale them down to stop node-level billing:

Get your GPU pool ID:

civo kubernetes node-pool list whisper-batch --region=NYC1

Copy the ID from the output, then scale down:

civo kubernetes node-pool scale whisper-batch \
  --node-pool=YOUR_GPU_POOL_ID \
  --nodes=0 \
  --region=NYC1

Checking results

List the transcripts in Object Store to confirm all five were written:

s3cmd ls s3://whisper-data/transcripts/
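
Counting files by eye works for five, but not for five hundred. The sketch below compares the manifest against the transcript keys and reports anything missing. It assumes you have both as plain Python lists, for example parsed from the s3cmd output, and it mirrors the transcripts/<name>.json naming used by the worker script. missing_transcripts is an illustrative name.

```python
import os


def missing_transcripts(manifest_names: list[str], transcript_keys: list[str]) -> list[str]:
    """Return audio filenames with no matching transcripts/<stem>.json key."""
    present = set(transcript_keys)
    return [
        name
        for name in manifest_names
        if f"transcripts/{os.path.splitext(name)[0]}.json" not in present
    ]


if __name__ == "__main__":
    manifest = ["episode-001.mp3", "episode-002.mp3", "episode-003.mp3"]
    found = ["transcripts/episode-001.json", "transcripts/episode-003.json"]
    print("missing:", missing_transcripts(manifest, found))
```

An empty result means every file in the manifest has a transcript.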

Spot-check one file to see the transcript structure:

s3cmd get s3://whisper-data/transcripts/episode-001.json -

Each JSON file is the full output for one audio file. The language field shows what faster-whisper detected automatically, and language_probability gives its confidence score. The segments array is where the real value is: each entry has a start and end timestamp in seconds alongside the transcribed text. That structure makes the output easy to feed into subtitle generators, search indexes, or any downstream NLP pipeline with minimal post-processing.
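
As one example of downstream use, the sketch below turns the segments array into SubRip (SRT) subtitle text. The sample transcript is invented for illustration and only mirrors the field names written by the worker script; to_srt and srt_timestamp are hypothetical helpers.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(transcript: dict) -> str:
    """Render a transcript dict (as written by the worker) as SRT text."""
    blocks = []
    for number, seg in enumerate(transcript["segments"], start=1):
        blocks.append(
            f"{number}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = {  # invented data, not real pipeline output
        "filename": "episode-001.mp3",
        "language": "en",
        "language_probability": 0.99,
        "segments": [
            {"start": 0.0, "end": 2.5, "text": "Welcome to the show."},
            {"start": 2.5, "end": 61.04, "text": "Today we talk about GPUs."},
        ],
    }
    print(to_srt(sample))
```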

List the whole bucket to see the full picture:

s3cmd ls --recursive s3://whisper-data/

Five audio files went in, five transcripts came out.

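Now that the pipeline is verified, try it with your own audio. Drop your files into s3://whisper-data/audio/, regenerate the manifest, and update completions to match your file count. Delete the finished job first with kubectl delete job whisper-transcribe so the new run starts from a clean Job object, then run kubectl apply -f manifests/transcribe-job.yaml.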

The same pipeline handles 5 files or 500 without any changes to the architecture.

The cold-start win

The shared PVC cuts pod startup from 25 minutes to 90 seconds by eliminating the model download entirely.

For a batch of 100 files on 10 GPU nodes, the pods run in ten waves. At roughly 25 minutes of download time per wave, the no-cache approach burns over four hours of wall-clock time on downloads alone. With the cache, each pod is transcribing within two minutes of starting.
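
The arithmetic behind that estimate can be sketched as follows, assuming pods run in equal waves (one pod per GPU node) and that each pod pays the full startup cost before transcribing. The 25-minute and 90-second figures are the ones quoted in this section; startup_overhead_minutes is a hypothetical helper.

```python
import math


def startup_overhead_minutes(files: int, gpu_nodes: int, startup_minutes: float) -> float:
    """Wall-clock minutes spent on pod startup across the whole batch."""
    waves = math.ceil(files / gpu_nodes)
    return waves * startup_minutes


if __name__ == "__main__":
    no_cache = startup_overhead_minutes(100, 10, 25)     # 10 waves x 25 min
    with_cache = startup_overhead_minutes(100, 10, 1.5)  # 10 waves x 90 s
    print(f"no cache:   {no_cache / 60:.1f} hours")
    print(f"with cache: {with_cache:.0f} minutes")
```

Under these assumptions, the uncached run spends 250 minutes on startup against 15 minutes for the cached run.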

The pattern scales linearly. For 10x the files, add 10x the GPU nodes and bump parallelism to match. Civo bills GPU nodes per second, so you pay for GPU time only while the pool is scaled up.

Cleanup and what's next

Delete the job and its associated resources:

kubectl delete job whisper-transcribe
kubectl delete job download-whisper-model
kubectl delete secret objectstore-creds

Keep the model PVC. It persists across runs, so your next batch starts with a warm cache. When you are done, scale the GPU pool to zero to stop billing:

Get your GPU pool ID:

civo kubernetes node-pool list whisper-batch --region=NYC1

Copy the ID from the output, then scale down:

civo kubernetes node-pool scale whisper-batch \
  --node-pool=YOUR_GPU_POOL_ID \
  --nodes=0 \
  --region=NYC1

To delete everything:

civo kubernetes delete whisper-batch --region=NYC1
civo objectstore delete whisper-data --region=NYC1

Where to go from here:

Set up a CronJob or webhook that fires the Indexed Job automatically when new audio lands in the Object Store
Try a smaller model like medium or base for faster transcription, or add speaker diarization with Pyannote
Apply the same Indexed Job pattern to other GPU workloads like image generation, document embedding, or video frame extraction

For a production setup, building a custom Docker image instead of the ConfigMap approach is a natural next step. This guide walks through self-hosting a container registry on Civo Kubernetes using Harbor.

Key takeaways

Kubernetes Indexed Jobs remove the need for a message queue, a task scheduler, or any custom coordination code. You define how many files you have, how many pods run at once, and Kubernetes handles the rest. Each pod gets a number, grabs its file, does the work, and exits. When the last index completes, the job is done.

The shared PVC is what makes GPU pods viable for batch work at scale. Without it, cold starts dominate your runtime and your bill. A one-time model download onto a shared disk cuts startup from 25 minutes to 90 seconds, and that saving compounds with every pod you add.

Civo's per-second GPU billing aligns directly with how this pipeline behaves. The cluster spins up, processes the entire backlog, and shuts down. You pay for transcription time only, not for idle servers sitting between runs. For a workload that runs in bursts rather than continuously, that billing model makes a meaningful difference.

Finally, the pattern generalizes. The Kubernetes and Object Store logic stays exactly the same whether the worker container is running faster-whisper, an embedding model, a Stable Diffusion pipeline, or any other GPU-heavy batch task. Swap the container, keep the architecture.
