In November 2015, Kubernetes 1.1 introduced a new feature called the horizontal pod autoscaler (HPA), designed to help users scale their workloads out dynamically based on CPU and memory usage. Fast forward to Kubernetes 1.8, and the vertical pod autoscaler (VPA) arrived as a way to dynamically resize the CPU and memory allocated to existing pods. Both features saw wide adoption within the Kubernetes community because they solved a problem modern applications face: scaling up or out as load increases.
Not all applications are created equal, though, and scaling on CPU and memory usage isn't always the answer. Modern applications are decoupled, so an increase in load on one of your dependencies, such as a queue or a cache, can be the real signal that your workload needs to scale.
In this tutorial, we will take a look at how to use KEDA to scale workloads in an event-driven manner.
What is KEDA?
As part of a joint effort between Microsoft and Red Hat in 2019, Kubernetes Event-driven Autoscaling, or KEDA for short, was born. It was initially geared toward better supporting Azure Functions on OpenShift, but being open source, the community quickly expanded its use cases far beyond the original scope.
KEDA is an open-source project under the Cloud Native Computing Foundation (CNCF) that helps scale workloads based on external events or custom metrics. This is useful for responding to load on external systems, such as a cache that your application depends on.
KEDA vs. HPA vs. VPA
With a sense of what KEDA is, you might be wondering how it compares to or differs from the first-party autoscalers. Here is a quick table that details the differences between the three:
| Feature | HPA (Horizontal Pod Autoscaler) | VPA (Vertical Pod Autoscaler) | KEDA |
|---|---|---|---|
| Scaling Method | Scales out (increases pod replicas) | Scales up (increases CPU/memory per pod) | Scales out (including to zero) based on external events |
| Scaling Triggers | CPU and memory utilization | CPU and memory resource requirements | External metrics and events (queues, databases, HTTP requests, etc.) |
| Kubernetes Version | Available since v1.1 (2015) | Available since v1.8 (2017) | Third-party add-on (2019) |
| Use Case | Traditional workloads with predictable load patterns | Workloads needing resource optimization | Event-driven and microservices architectures |
| Scaling Scope | Application-level scaling | Resource-level optimization | Event-driven scaling across distributed systems |
| Metrics Source | Kubernetes metrics server | Kubernetes metrics server | Metrics server, External systems (Redis, RabbitMQ, Kafka, cloud services) |
How does KEDA work?
KEDA differs from first-party auto scalers (HPA and VPA) by introducing scalers, components that connect to external event sources and retrieve metrics. Scalers support various systems, such as message queues (RabbitMQ, Apache Kafka), databases (Redis, PostgreSQL), and custom HTTP endpoints. Each scaler is designed to understand the specific protocol and metrics format of its target system.
KEDA's scaling engine uses the metrics retrieved by scalers to evaluate whether a workload needs to be scaled. This is configured through KEDA's primary custom resource definition (CRD), the ScaledObject.

When you create a ScaledObject, KEDA automatically generates and manages an HPA resource behind the scenes.
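To make that concrete, here is the general shape of a ScaledObject. This is a minimal sketch for illustration only; my-app is a placeholder, and the trigger type and its metadata depend entirely on which scaler you use (we will build two real ones later in this tutorial):
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaledobject
spec:
  scaleTargetRef:
    name: my-app            # the Deployment KEDA should scale
  minReplicaCount: 1        # floor for the generated HPA
  maxReplicaCount: 10       # ceiling for the generated HPA
  triggers:
  - type: cron              # any supported scaler type
    metadata:
      timezone: UTC
      start: "0 8 * * *"
      end: "0 18 * * *"
      desiredReplicas: "5"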
Prerequisites
This portion of the tutorial assumes some working knowledge of Kubernetes. Additionally, you will need the following installed in order to follow along:
- Helm (used to install KEDA)
- Kubernetes cluster
- kubectl (installed and configured to interact with the cluster)
- A Civo account
- Civo CLI (optional)
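If you don't yet have a cluster, you can create one with the Civo CLI. A quick sketch, where the cluster name is arbitrary and flags may differ slightly between CLI versions:
# Create a cluster and wait for it to become ready
civo kubernetes create keda-demo --wait
# Merge its kubeconfig into your local config so kubectl can reach it
civo kubernetes config keda-demo --save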
Install Metrics Server
The metrics server is a crucial component when using KEDA, as the underlying HPA (or VPA) relies on it for the resource metrics used to scale pods.
If you do not have a metrics server deployed in your cluster already, run the following command to install:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
This will install the most recent version of the metrics server, which is compatible with Kubernetes 1.19+.
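Once the components are up, you can confirm that the metrics API is serving data; it can take a minute or two after installation:
kubectl get deployment metrics-server -n kube-system
kubectl top nodes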
Deploy KEDA
With the metrics server installed, you can deploy KEDA using Helm:
Add the chart:
helm repo add kedacore https://kedacore.github.io/charts
Update your local repository:
helm repo update
Install KEDA:
helm install keda kedacore/keda --namespace keda --create-namespace
Helm will print the release status, confirming that KEDA has been deployed into the keda namespace.

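To confirm the installation, check that the KEDA pods are running:
kubectl get pods -n keda
You should see the operator and metrics API server pods (names such as keda-operator and keda-operator-metrics-apiserver, depending on the chart version) in a Running state.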
Scaling a time-based application
To kick things off, let's discuss time-based scaling. This occurs when you have predictable spikes in traffic, such as a lunch rush for a food delivery app, or events that happen at specific times, like a Black Friday sale or the release of a popular clothing item.
These types of predictable and time-bound events are good candidates for cron-based scaling.
First, let's deploy a simple nginx application:
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cron-scaled-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cron-scaled-app
  template:
    metadata:
      labels:
        app: cron-scaled-app
    spec:
      containers:
      - name: app
        image: nginx:alpine
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
EOF
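Before wiring up the ScaledObject, confirm the deployment is up and its single replica is ready:
kubectl get deployment cron-scaled-app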
Now for the ScaledObject that enables time-based scaling:
kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cron-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: cron-scaled-app
  minReplicaCount: 1
  maxReplicaCount: 6
  triggers:
  # Scale up to 6 replicas every 10 minutes (for testing)
  - type: cron
    metadata:
      timezone: UTC
      start: "*/10 * * * *"   # Every 10 minutes
      end: "2-59/10 * * * *"  # 2 minutes after each start (scale back down)
      desiredReplicas: "6"
EOF
The ScaledObject uses standard cron syntax to schedule scaling events. The key component here is spec.scaleTargetRef, which tells KEDA which deployment to scale, in this case our cron-scaled-app deployment. Every 10 minutes, KEDA scales the deployment up to 6 replicas, then scales it back down to the minimum of 1 replica two minutes later.
To verify that the scaling rule worked, run:
kubectl get pods -w
This lists the pods in the default namespace and watches for changes. Within ten minutes, you should see additional cron-scaled-app pods appear as KEDA scales the deployment up to six replicas, and then terminate again when the scaling window ends.
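You can also inspect the HPA that KEDA generated behind the scenes for this ScaledObject (it is typically named keda-hpa-<scaledobject-name>):
kubectl get hpa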

Scaling cache-dependent applications
Cron-based scaling is great, but not all applications have time-based spikes. A very common pattern is a worker that processes jobs from a queue, often backed by Redis. In this next example, let's take a look at how to scale such a worker based on the length of a Redis-backed job queue.
First, we need to deploy Redis:
kubectl apply -f - <<EOF
---
# Redis Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
---
# Redis Service
apiVersion: v1
kind: Service
metadata:
  name: redis-service
  namespace: default
spec:
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379
  type: ClusterIP
EOF
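Before deploying the worker, it's worth confirming Redis is reachable; once the pod is Running, this should return PONG:
kubectl exec deploy/redis -- redis-cli ping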
Next, deploy a worker that will simulate processing jobs from the Redis queue:
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-scaled-worker
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-scaled-worker
  template:
    metadata:
      labels:
        app: redis-scaled-worker
    spec:
      containers:
      - name: worker
        image: python:3.9-slim
        command: ["/bin/sh"]
        args:
        - -c
        - |
          pip install "redis[hiredis]"
          python -c "
          import redis, time, random
          r = redis.Redis(host='redis-service', port=6379, decode_responses=True)
          while True:
              try:
                  job = r.blpop('job-queue', timeout=5)
                  if job:
                      print(f'Processing job: {job[1]}')
                      time.sleep(random.randint(1, 5))  # Simulate work
                  else:
                      print('No jobs, waiting...')
              except Exception as e:
                  print(f'Error: {e}')
                  time.sleep(1)
          "
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
EOF
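You can tail the worker's logs to watch it poll the (currently empty) queue. Note that the container installs the Redis client on startup, so give it a moment before expecting output:
kubectl logs deploy/redis-scaled-worker -f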
Now, we will add a producer to add jobs to the queue:
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-job-producer
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-job-producer
  template:
    metadata:
      labels:
        app: redis-job-producer
    spec:
      containers:
      - name: producer
        image: python:3.9-slim
        command: ["/bin/sh"]
        args:
        - -c
        - |
          pip install "redis[hiredis]"
          python -c "
          import redis, time, random, json, os
          r = redis.Redis(host='redis-service', port=6379, decode_responses=True)
          producer_id = os.environ.get('HOSTNAME', 'producer')
          counter = 0
          print(f'[{producer_id}] Starting job producer...')
          while True:
              try:
                  # Simulate varying load - sometimes burst, sometimes quiet
                  jobs_to_add = random.randint(1, 10)
                  for i in range(jobs_to_add):
                      job_data = {
                          'id': counter,
                          'task': f'process_data_{counter}',
                          'producer': producer_id,
                          'timestamp': time.time()
                      }
                      r.rpush('job-queue', json.dumps(job_data))
                      counter += 1
                  current_queue_length = r.llen('job-queue')
                  print(f'[{producer_id}] Added {jobs_to_add} jobs. Queue length now: {current_queue_length}')
                  # Variable sleep to create realistic load patterns
                  sleep_time = random.randint(3, 12)
                  time.sleep(sleep_time)
              except Exception as e:
                  print(f'[{producer_id}] Error: {e}')
                  time.sleep(5)  # Wait before retrying
          "
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 100m
            memory: 128Mi
        env:
        - name: REDIS_HOST
          value: "redis-service"
        - name: REDIS_PORT
          value: "6379"
EOF
Finally, a ScaledObject that monitors the Redis queue and scales the worker:
kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: redis-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: redis-scaled-worker   # This is the deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
  - type: redis
    metadata:
      address: redis-service.default.svc.cluster.local:6379
      listName: job-queue
      listLength: "5"
EOF
This ScaledObject uses the Redis scaler to monitor the job-queue list. The listLength of 5 is the target number of pending items per worker replica: as the queue grows beyond that average, KEDA (through the HPA it manages) adds more redis-scaled-worker pods, up to the maxReplicaCount of 5, and scales back down as the queue drains. The spec.scaleTargetRef points to our worker deployment.
After a few minutes, run:
kubectl get pods
Your output should be similar to:
redis-scaled-worker-5fbc5475b8-f2hnv 1/1 Running 0 7m44s
redis-job-producer-7d5cdfb97b-z5ztb 1/1 Running 0 3m23s
redis-scaled-worker-5fbc5475b8-tmxlv 1/1 Running 0 103s
redis-scaled-worker-5fbc5475b8-nwgrl 1/1 Running 0 88s
redis-scaled-worker-5fbc5475b8-jnvv4 1/1 Running 0 73s
redis-scaled-worker-5fbc5475b8-lpsct 1/1 Running 0 73s
redis-scaled-worker-5fbc5475b8-bxg2b 1/1 Running 0 28s
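If you want to see the queue depth that KEDA is reacting to, you can query Redis directly:
kubectl exec deploy/redis -- redis-cli llen job-queue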
Pro tip: Debugging
If you are trying to figure out why a ScaledObject or cron trigger isn't working, a good place to start is to describe the ScaledObject and check events using kubectl:
# Describe the ScaledObject to see its current status
kubectl describe scaledobject redis-scaledobject
# Check events for any scaling-related issues
kubectl get events --sort-by='.lastTimestamp' | grep -i scale
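If the ScaledObject looks healthy but nothing is scaling, the KEDA operator logs often reveal scaler connection errors. Assuming the Helm release installed earlier in this tutorial:
# Tail the KEDA operator logs
kubectl logs -n keda deploy/keda-operator --tail=50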
Closing thoughts
KEDA is a fantastic project that allows you to scale workloads on more dynamic criteria, such as the state of an application dependency or the time of day.
While this tutorial covered the cron and Redis scalers, KEDA supports many more, such as PostgreSQL and Prometheus. If you are looking to scale your cluster nodes on Civo, check out this section of the docs, and happy scaling!