Automating database backups with Kubernetes CronJobs

Learn to automate database backups with Kubernetes CronJobs and Civo's object storage. This tutorial guides you through creating, backing up, and storing databases effectively.

5 minutes reading time

Written by

Jubril Oyetunji

Technical Writer @ Civo

For many years, system administrators have used cron to automate recurring tasks on Unix systems. By comparison, CronJobs in Kubernetes are the new kid on the block, having reached general availability in April 2021. A CronJob is a resource for scheduling recurring tasks; each CronJob is similar to a line in a crontab (cron table) file on a Unix system.

In this tutorial, we'll look at how to automate database backups using Kubernetes CronJobs. To store the backups, we will leverage Civo's object storage.

Prerequisites

This article assumes some working knowledge of Kubernetes. In addition, you will need the following installed:

  • The Civo CLI, authenticated against your account
  • kubectl, configured to talk to a Kubernetes cluster
  • Docker, for building the backup image
  • psql, the PostgreSQL command-line client

Creating a Database

We’ll begin by creating a database using the Civo CLI:

civo db create backup-labs -m PostgreSQL

This creates a one-node database cluster. The -m flag specifies the database engine; at the time of writing, Civo supports PostgreSQL and MySQL.

From this, you should see the following output:

Creating a Database

Seeding the Database

Before we can create a backup, we'll need a database inside the cluster. In your terminal, run the following command to create one named customers:

psql -U civo -h <YOUR-DATABASE-IP> -W -c 'create database customers;'

Next, let’s create a schema and populate the database with some mock data. In a directory of your choice, create a file named schema.sql. Add the following code to define the Customers table:

Creating a table

CREATE TABLE Customers (
    ID serial,
    Name varchar(50) NOT NULL,
    Phone varchar(15) NOT NULL,
    Address varchar(50),
    Birthday date NOT NULL,
    CustomerEmail varchar(50) NOT NULL,
    PRIMARY KEY (ID)
);

Apply the schema changes

psql -U civo -d customers -h <YOUR-DATABASE-IP> -W -f schema.sql

Adding mock data

Begin creating a new file called data.sql within your editor of choice, and add the following code:

INSERT INTO Customers (Name, Phone, Address, Birthday, CustomerEmail)
SELECT
    md5(random()::text || clock_timestamp()::text)::uuid::varchar(50) AS Name,
    substring(md5(random()::text || clock_timestamp()::text)::uuid::varchar(50), 1, 15) AS Phone,
    md5(random()::text || clock_timestamp()::text)::uuid::varchar(50) AS Address,
    current_date - interval '18 years' - random() * interval '50 years' AS Birthday,
    md5(random()::text || clock_timestamp()::text)::uuid::varchar(50) || '@example.com' AS CustomerEmail
FROM generate_series(1, 100); -- Adjust the number of rows as needed

Insert the mock data

psql -U civo -d customers -h <YOUR-DATABASE-IP> -W -f data.sql

You can verify the mock data was indeed generated by running the following command:

psql -U civo -d customers -h <YOUR-DATABASE-IP> -W -c 'select * from customers;'

Output should be similar to:

Mock data in the Customers table

Creating a backup script

With a database created and seeded, we can shift our attention to the backups. For this demonstration, we will use a Bash script to perform the backup operations. Create a file named backup.sh and add the following code:

#!/bin/bash
set -euo pipefail

# Expected environment variables: DB_HOST, DB_NAME, DB_PASSWORD,
# S3_BUCKET, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
BACKUP_PREFIX=cronjob

# Create a timestamped backup filename
BACKUP_FILENAME="${BACKUP_PREFIX}_$(date +%Y%m%d_%H%M%S).sql"

# Create the database backup
PGPASSWORD="$DB_PASSWORD" pg_dump -U civo -h "$DB_HOST" "$DB_NAME" > "./$BACKUP_FILENAME"

# Configure the AWS CLI
aws configure set aws_access_key_id "$AWS_ACCESS_KEY_ID"
aws configure set aws_secret_access_key "$AWS_SECRET_ACCESS_KEY"
aws configure set default.region LON1

# Upload the backup to the object store
aws --endpoint-url https://objectstore.lon1.civo.com s3 cp "$BACKUP_FILENAME" "s3://$S3_BUCKET"

# Clean up the local copy (optional)
rm "$BACKUP_FILENAME"

Cleaning up files already uploaded is essential to avoid filling up the container’s file system with outdated backups.

The script relies on several environment variables, which the CronJob will supply later:

  • DB_HOST: The hostname or IP address of the PostgreSQL database server
  • DB_NAME: The name of the PostgreSQL database to back up
  • DB_PASSWORD: The password for the database user
  • S3_BUCKET: The name of the bucket to upload backups to
  • AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY: The object store credentials
  • BACKUP_PREFIX: A prefix added to backup filenames (set in the script itself)

It then constructs a backup filename using BACKUP_PREFIX, the current date/time, and a .sql extension. This ensures each backup has a unique name.
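To make the naming scheme concrete, here is a small sketch using a hypothetical fixed timestamp so the result is predictable (the real script substitutes the current time via date):

```shell
BACKUP_PREFIX=cronjob
# Fixed timestamp purely for illustration; the script uses $(date +%Y%m%d_%H%M%S)
TIMESTAMP="20240315_091500"
BACKUP_FILENAME="${BACKUP_PREFIX}_${TIMESTAMP}.sql"
echo "$BACKUP_FILENAME"
```

This prints cronjob_20240315_091500.sql, so two backups taken a second apart never collide.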

The pg_dump command creates a plain-text SQL dump. It connects to the database server using the configured credentials and database name, and writes the output to the backup filename generated earlier.

The object store we will create resides in the LON1 region on Civo, so the endpoint URL is https://objectstore.lon1.civo.com.

The AWS CLI is configured using the access key and secret access key environment variables. This allows uploading the backup file to S3.

Civo's object storage is S3 compatible, which means it can be accessed and managed using the same tools and APIs as Amazon S3. Therefore, we can utilize the AWS CLI, a command-line interface tool for interacting with AWS services, to upload backups to Civo's object storage.

Finally, the backup is uploaded to the specified bucket and then deleted locally. The upload location will be s3://$S3_BUCKET/$BACKUP_FILENAME.
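A backup is only useful if you can restore it. As a sketch, assuming a plain-text dump produced by the script above and substituting your own host and a real backup filename (cronjob_20240315_091500.sql here is hypothetical), a restore would look like this:

```shell
# Download the dump from the object store...
aws --endpoint-url https://objectstore.lon1.civo.com \
  s3 cp s3://k8s-db-backup/cronjob_20240315_091500.sql .

# ...then replay the SQL against a new or existing database:
psql -U civo -d customers -h <YOUR-DATABASE-IP> -W -f cronjob_20240315_091500.sql
```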

Containerizing the backup

Next up, we need to create a container image we can deploy to our Kubernetes cluster. Create a file named Dockerfile and add the following directives:

FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
    curl \
    openssl \
    postgresql-client \
    python3-pip \
    libsasl2-modules \
    libssl-dev \
    postgresql-client-common \
    libpq-dev
RUN pip3 install awscli
RUN mkdir /scripts
COPY backup.sh /scripts
WORKDIR /scripts
RUN chmod +x backup.sh
ENTRYPOINT [ "./backup.sh" ]

Next, we need to build and push the image to a container registry. For this demo, we will use ttl.sh, an ephemeral container registry that doesn't require authentication, which makes it convenient for demos like this one. In production, you'd want to use an internal registry or a service like Docker Hub to store your images.

Build and push the image

export IMAGE_NAME=k8s-db-backup
docker build -t ttl.sh/${IMAGE_NAME}:1h .
docker push ttl.sh/${IMAGE_NAME}:1h
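Optionally, before wiring the image into Kubernetes, you can exercise it locally. This assumes your machine can reach the database and that you substitute your own values for the placeholders:

```shell
docker run --rm \
  -e DB_HOST=<YOUR-DATABASE-IP> \
  -e DB_NAME=customers \
  -e DB_PASSWORD=<YOUR-DB-PASSWORD> \
  -e S3_BUCKET=k8s-db-backup \
  -e AWS_ACCESS_KEY_ID=<YOUR-ACCESS-KEY> \
  -e AWS_SECRET_ACCESS_KEY=<YOUR-SECRET-KEY> \
  ttl.sh/k8s-db-backup:1h
```

If the container exits cleanly and a new object appears in the bucket, the script works end to end.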

Creating an object store

Before we begin scheduling backups, the last resource we need to provision is an object store. This can be any S3-compatible storage; for this demonstration, we will use Civo's object storage. To create an object store using the CLI, run the following commands:

Generate object store credentials

civo objectstore credentials create k8s-backup

Create the object store

civo objectstore create k8s-db-backup --region LON1 --owner-access-key k8s-backup --wait

Finally, you'll need to obtain your object store credentials. To do this using the CLI, run:

civo objectstore credential secret --access-key=[your access key]

Scheduling backups

With all the moving parts in place, we can finally schedule a backup using the CronJob resource. Create a file named backup.yaml and follow along with the code below:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: db-backup
              image: ttl.sh/k8s-db-backup:1h
              env:
                - name: DB_HOST
                  value: "[YOUR DB IP ADDRESS]"
                - name: DB_NAME
                  value: "customers"
                - name: DB_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: db-password
                      key: password
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: civo-credentials
                      key: access-key-id
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: civo-credentials
                      key: secret-access-key
                - name: S3_BUCKET
                  value: "k8s-db-backup"
          restartPolicy: OnFailure

The manifest above defines a CronJob named db-backup. The schedule field uses standard cron syntax: "*/5 * * * *" runs the job every five minutes, which is convenient for testing but far too frequent for most production backup schedules. The job template references the Docker image containing the backup script we created earlier and configures the appropriate environment variables.

Before we can apply this manifest, we need to supply the secrets. To do this, create a file named secrets.yaml and add the following code:

apiVersion: v1
kind: Secret
metadata:
  name: db-password
type: Opaque
data:
  password: <BASE64_ENCODED_DB_PASSWORD>
---
apiVersion: v1
kind: Secret
metadata:
  name: civo-credentials
type: Opaque
data:
  access-key-id: <BASE64_ENCODED_AWS_ACCESS_KEY_ID>
  secret-access-key: <BASE64_ENCODED_AWS_SECRET_ACCESS_KEY>

💡 To encode your credentials, run: echo -n "credential" | base64
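For example, encoding a hypothetical password my-db-password (the -n flag matters: without it, a trailing newline gets baked into the stored value and the database login will fail):

```shell
# Encode a credential for use in a Kubernetes Secret:
encoded=$(echo -n 'my-db-password' | base64)
echo "$encoded"   # bXktZGItcGFzc3dvcmQ=

# Decode it again to verify the round trip:
echo -n "$encoded" | base64 --decode
```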

Apply the manifests:

# create the secret
kubectl apply -f secrets.yaml
# create the cronJob
kubectl apply -f backup.yaml

Within five minutes, the CronJob should kick in, and Kubernetes will spawn a new pod from the image we provided to perform the backup. To verify that pods are being created, run the following:

kubectl get pods
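If you'd rather not wait for the schedule, you can also trigger a one-off run directly from the CronJob's template (the job name manual-backup-1 is arbitrary):

```shell
# Create a Job from the CronJob and run it immediately:
kubectl create job --from=cronjob/db-backup manual-backup-1

# Follow the logs of the pod it spawns:
kubectl logs -l job-name=manual-backup-1 -f
```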

Viewing backups

To view the backup that has been created, head over to your Civo dashboard. Click on the object store tab and select the bucket you created. You should see the following:

Viewing Backups

Summary

Whether you run your database inside or outside Kubernetes, backups will always remain an essential part of your disaster recovery plan. In this tutorial, we covered one of many ways to back up your database using CronJobs.


Jubril Oyetunji is a DevOps engineer and technical writer with a strong focus on cloud-native technologies and open-source tools. His work centers on creating practical tutorials that help developers better understand platforms such as Kubernetes, NGINX, Rust, and Go.

As a contract technical writer, Jubril authored an extensive library of technical guides covering cloud-native infrastructure and modern development workflows. Many of his tutorials achieved strong search rankings, helping developers around the world learn and adopt emerging technologies.
