In November 2015, Kubernetes 1.1 introduced a new feature called the Horizontal Pod Autoscaler (HPA), designed to help users scale out their workloads dynamically based on CPU and memory usage.

Fast forward to Kubernetes 1.8 in 2017, and the Vertical Pod Autoscaler (VPA) arrived as a way to dynamically resize the CPU and memory allocated to existing pods. Both features saw mass adoption within the Kubernetes community because they solved a problem modern applications face: scaling up or out as load increases.

Not all applications are created equal, though, and scaling on CPU and memory usage isn't always the answer. Modern applications are decoupled, so an increase in load on one of your dependencies, such as a queue or cache, can be a better signal that it's time to scale.

In this tutorial, we will take a look at how to use KEDA to scale workloads in an event-driven manner.

What is KEDA?

As part of a joint effort between Microsoft and Red Hat in 2019, Kubernetes Event-Driven Autoscaling, or KEDA for short, was born. It was initially geared toward better supporting Azure Functions on OpenShift, but being open source, the community quickly expanded its use cases far beyond the original scope.

KEDA is an open-source project under the Cloud Native Computing Foundation (CNCF) that helps scale workloads based on external events or custom metrics. This is useful for responding to load on external systems, such as a cache that your application depends on.

KEDA vs. HPA vs. VPA

With an understanding of what KEDA is, you might be wondering how it compares to Kubernetes' first-party autoscalers. Here is a quick table that details the differences between the three:

| Feature | HPA (Horizontal Pod Autoscaler) | VPA (Vertical Pod Autoscaler) | KEDA |
|---|---|---|---|
| Scaling method | Scales out (increases pod replicas) | Scales up (increases CPU/memory per pod) | Scales out (adjusts replica count, down to zero) based on external events |
| Scaling triggers | CPU and memory utilization | CPU and memory resource requirements | External metrics and events (queues, databases, HTTP requests, etc.) |
| Kubernetes version | Available since v1.1 (2015) | Available as an add-on since v1.8 (2017) | Third-party add-on (2019) |
| Use case | Traditional workloads with predictable load patterns | Workloads needing resource optimization | Event-driven and microservices architectures |
| Metrics source | Kubernetes metrics server | Kubernetes metrics server | External systems (Redis, RabbitMQ, Kafka, cloud services) plus the metrics server |
A quick tip to remember the differences between HPA, VPA, and KEDA: HPA scales your application out based on CPU and memory, VPA scales your application up using the same metrics, and KEDA scales your application out based on external event sources.

How does KEDA work?

KEDA differs from the first-party autoscalers (HPA and VPA) by introducing scalers, components that connect to external event sources and retrieve metrics. Scalers support various systems, such as message queues (RabbitMQ, Apache Kafka), databases (Redis, PostgreSQL), and custom HTTP endpoints. Each scaler is designed to understand the specific protocol and metrics format of its target system.

KEDA's scaling engine uses the metrics retrieved by scalers to evaluate whether a workload needs to be scaled. This is configured through KEDA's primary custom resource definition (CRD), the ScaledObject.

[Diagram: How KEDA works]

When you create a ScaledObject, KEDA automatically generates and manages an HPA resource behind the scenes.
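
As a quick preview, a minimal ScaledObject looks something like the sketch below. The my-app deployment name and the Prometheus trigger values are placeholders for illustration only; we will build real ScaledObjects later in this tutorial:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
spec:
  scaleTargetRef:
    name: my-app            # the Deployment KEDA should scale (hypothetical)
  minReplicaCount: 0        # KEDA can scale all the way down to zero
  maxReplicaCount: 10
  triggers:                 # one or more scalers; metadata fields vary per scaler
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      query: sum(rate(http_requests_total[2m]))
      threshold: "100"

Once a ScaledObject is applied, kubectl get hpa will show the autoscaler KEDA generated, which it names keda-hpa-<scaledobject-name>.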

Prerequisites

This portion of the tutorial assumes some working knowledge of Kubernetes. Additionally, you will need the following installed in order to follow along:

- A Kubernetes cluster you can experiment on (a local cluster such as minikube or kind works fine)
- kubectl, configured to talk to that cluster
- Helm v3

Install Metrics Server

The metrics server is a crucial component when using KEDA, as it provides the resource metrics that the underlying HPA (or VPA) relies on to scale pods.

If you do not have a metrics server deployed in your cluster already, run the following command to install it:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

This will install the most recent version of the metrics server, which is compatible with Kubernetes version 1.19+.
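
Before moving on, you can confirm the metrics server is up; by default it installs into the kube-system namespace:

kubectl get deployment metrics-server -n kube-system

The deployment should report 1/1 ready after a minute or so.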

Deploy KEDA

With the metrics server installed, you can deploy KEDA using Helm:

Add the chart:

helm repo add kedacore https://kedacore.github.io/charts

Update your local repository:

helm repo update

Install KEDA:

helm install keda kedacore/keda --namespace keda --create-namespace

Output is similar to:

[Screenshot: Helm output from deploying KEDA]
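
To confirm KEDA's controllers came up, list the pods in the keda namespace:

kubectl get pods -n keda

You should see the KEDA operator, metrics API server, and admission webhook pods in a Running state.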

Scaling a time-based application

To kick things off, let's discuss time-based scaling. This occurs when you have predictable spikes in traffic, such as a lunch rush for a food delivery app, or events that happen at specific times, like a Black Friday sale or the release of a popular clothing item.

These types of predictable and time-bound events are good candidates for cron-based scaling.

First, let's deploy a simple nginx application:

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cron-scaled-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cron-scaled-app
  template:
    metadata:
      labels:
        app: cron-scaled-app
    spec:
      containers:
      - name: app
        image: nginx:alpine
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
EOF

Now for the ScaledObject that enables time-based scaling:

kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cron-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: cron-scaled-app
  minReplicaCount: 1
  maxReplicaCount: 6
  triggers:
  # Scale up to 6 replicas for five minutes out of every ten (for testing)
  - type: cron
    metadata:
      timezone: UTC
      start: "*/10 * * * *"    # At minutes 0, 10, 20, ...
      end: "5-59/10 * * * *"   # Five minutes after each start (scale back down)
      desiredReplicas: "6"
EOF

The ScaledObject uses standard cron syntax for scheduling scaling events. The key field here is spec.scaleTargetRef, which tells KEDA which deployment to scale, in this case our cron-scaled-app deployment. Every 10 minutes, KEDA will scale the deployment up to 6 replicas, then scale it back down to the minimum of 1 replica five minutes later.
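
You can also check the trigger state directly on the ScaledObject:

kubectl get scaledobject cron-scaledobject

The ACTIVE column should read True while the cron window is open and flip back to False once it closes.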

To verify that the scaling rule worked, run:

kubectl get pods -w

This will list the pods in the default namespace and watch for new pods. After ten minutes, your output should be similar to the following:

[Screenshot: Scaling a time-based application]

Scaling cache-dependent applications

Cron-based scaling is great, but not all applications have time-based spikes. More often than not, developers rely on infrastructure like Redis, whether as a caching layer or a work queue. In this next example, let's take a look at how to scale a worker based on the length of a Redis queue.

First, we need to deploy Redis:

kubectl apply -f - <<EOF
---
# Redis Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
---
# Redis Service
apiVersion: v1
kind: Service
metadata:
  name: redis-service
  namespace: default
spec:
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379
  type: ClusterIP
EOF
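
Before moving on, you can sanity-check that Redis is reachable from inside the cluster by spinning up a throwaway redis-cli pod (this assumes nothing beyond the service created above):

kubectl run redis-cli --rm -it --image=redis:7-alpine --restart=Never -- redis-cli -h redis-service ping

A PONG reply confirms the service is up.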

Next, deploy a worker that will simulate processing jobs from the Redis queue:

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-scaled-worker
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-scaled-worker
  template:
    metadata:
      labels:
        app: redis-scaled-worker
    spec:
      containers:
      - name: worker
        image: python:3.9-slim
        command: ["/bin/sh"]
        args:
        - -c
        - |
          pip install "redis[hiredis]"
          python -c "
          import redis, time, random
          r = redis.Redis(host='redis-service', port=6379, decode_responses=True)
          while True:
              try:
                  job = r.blpop('job-queue', timeout=5)
                  if job:
                      print(f'Processing job: {job[1]}')
                      time.sleep(random.randint(1, 5))  # Simulate work
                  else:
                      print('No jobs, waiting...')
              except Exception as e:
                  print(f'Error: {e}')
                  time.sleep(1)
          "
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
EOF

Now, we will add a producer to add jobs to the queue:

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-job-producer
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-job-producer
  template:
    metadata:
      labels:
        app: redis-job-producer
    spec:
      containers:
      - name: producer
        image: python:3.9-slim
        command: ["/bin/sh"]
        args:
        - -c
        - |
          pip install "redis[hiredis]"
          python -c "
          import redis, time, random, json, os
          # Connection details come from the env vars defined on the container below
          r = redis.Redis(host=os.environ.get('REDIS_HOST', 'redis-service'), port=int(os.environ.get('REDIS_PORT', '6379')), decode_responses=True)
          producer_id = os.environ.get('HOSTNAME', 'producer')
          counter = 0
          print(f'[{producer_id}] Starting job producer...')
          while True:
              try:
                  # Simulate varying load - sometimes burst, sometimes quiet
                  jobs_to_add = random.randint(1, 10)
                  for i in range(jobs_to_add):
                      job_data = {
                          'id': counter,
                          'task': f'process_data_{counter}',
                          'producer': producer_id,
                          'timestamp': time.time()
                      }
                      r.rpush('job-queue', json.dumps(job_data))
                      counter += 1
                  current_queue_length = r.llen('job-queue')
                  print(f'[{producer_id}] Added {jobs_to_add} jobs. Queue length now: {current_queue_length}')
                  # Variable sleep to create realistic load patterns
                  sleep_time = random.randint(3, 12)
                  time.sleep(sleep_time)
              except Exception as e:
                  print(f'[{producer_id}] Error: {e}')
                  time.sleep(5)  # Wait before retrying
          "
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 100m
            memory: 128Mi
        env:
        - name: REDIS_HOST
          value: "redis-service"
        - name: REDIS_PORT
          value: "6379"
EOF

Finally, a ScaledObject that monitors the Redis queue and scales the worker:

kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: redis-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: redis-scaled-worker  # This is the deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
  - type: redis
    metadata:
      address: redis-service.default.svc.cluster.local:6379
      listName: job-queue
      listLength: "5"
EOF

This ScaledObject uses the Redis scaler to monitor the job-queue list. The listLength of 5 is a target per replica: once there are more than roughly five pending items per worker, KEDA scales up the redis-scaled-worker deployment, to a maximum of five replicas, then scales back down as the queue drains. As before, spec.scaleTargetRef points at the deployment to scale.
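
If you want to watch the queue length yourself while KEDA reacts to it, you can poll it with the same throwaway redis-cli approach used earlier:

kubectl run redis-cli --rm -it --image=redis:7-alpine --restart=Never -- redis-cli -h redis-service llen job-queue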

After a few minutes, run:

kubectl get pods

Your output should be similar to:

redis-scaled-worker-5fbc5475b8-f2hnv   1/1     Running     0          7m44s
redis-job-producer-7d5cdfb97b-z5ztb    1/1     Running     0          3m23s
redis-scaled-worker-5fbc5475b8-tmxlv   1/1     Running     0          103s
redis-scaled-worker-5fbc5475b8-nwgrl   1/1     Running     0          88s
redis-scaled-worker-5fbc5475b8-jnvv4   1/1     Running     0          73s
redis-scaled-worker-5fbc5475b8-lpsct   1/1     Running     0          73s
redis-scaled-worker-5fbc5475b8-bxg2b   1/1     Running     0          28s
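
You can also inspect the HPA that KEDA generated on your behalf. KEDA prefixes generated HPAs with keda-hpa-, so for the ScaledObject above, the following should list it:

kubectl get hpa keda-hpa-redis-scaledobject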

Pro tip: Debugging

If you are trying to figure out why a ScaledObject or cron trigger isn't working, a good place to start is describing the ScaledObject and checking events using kubectl:

# Describe the ScaledObject to see its current status
kubectl describe scaledobject redis-scaledobject

# Check events for any scaling-related issues
kubectl get events --sort-by='.lastTimestamp' | grep -i scale

Please note that Kubernetes only retains events for a limited time (one hour by default), so this command may come back empty if a significant amount of time has passed.
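
The KEDA operator logs are another good place to look for scaler errors, such as failed connections or invalid trigger metadata. Assuming the default labels from the Helm chart, you can tail them with:

kubectl logs -n keda -l app=keda-operator --tail=50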

Closing thoughts

KEDA is a fantastic project that allows you to scale workloads based on more dynamic criteria, such as application dependencies or time of day.

While this tutorial covered the cron and Redis scalers, KEDA supports many more, such as PostgreSQL and Prometheus. If you are looking to scale your cluster nodes on Civo, check out this section of the docs, and happy scaling!