Kafka on Kubernetes: A Strimzi & GitOps Guide

What is Apache Kafka?

Apache Kafka is an open-source distributed publish-subscribe messaging platform. It is designed to handle data streams from multiple sources and deliver them to multiple consumers. Kafka operates by maintaining events as records within a cluster of servers, and these servers can span multiple data centers. Kafka is the go-to software for sharing and processing data amongst distributed applications and teams, where high throughput and scalability are the goals.

Installing Apache Kafka on Kubernetes by hand can be demanding. Every component has to be deployed separately; ZooKeeper alone needs a deployment and a service. By the time all the components are declared, you could have dozens of Kubernetes manifests. On top of that, storing and maintaining the container images for every component is no small task.

Finally, making changes to our Kafka cluster, and tracking those changes across dozens of manifests, is tedious and error-prone.

Kubernetes Operators are application-specific controllers built by extending the API of Kubernetes. They are tasked with packaging, deploying, and managing applications.

GitOps is a set of practices for managing codified infrastructure and application configurations using Git as the source of truth.

Between Kubernetes Operators and GitOps, we can simplify the installation and management of our Kafka cluster.

How to install Apache Kafka on Kubernetes with Strimzi

Strimzi is a Kubernetes operator which acts as a dedicated SRE for running Apache Kafka on Kubernetes. It assumes the responsibility of the entire life cycle of Kafka clusters: creating, managing, and monitoring the clusters, and their associated entities (Topics and Users).

By extending the Kubernetes API with Kafka-related custom resource definitions, Strimzi allows us to declare our Kafka components as custom resources. We can describe properties such as the number of replicas, the listeners and the resource allocation declaratively, and Strimzi will ensure that our cluster matches our desired state.

Describing our Kafka cluster for Strimzi

We describe our Kafka cluster in a YAML file. Save the manifest below as kafka.yaml.

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: apk-cluster
spec:
  kafka:
    version: 3.0.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
      - id: 1
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}

Our Kafka cluster is simple. It consists of just three Kafka broker replicas and three ZooKeeper replicas. Other features, such as additional listener types or MirrorMaker, can also be configured.
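
To get a feel for what this manifest asks of the cluster, the persistent storage it requests can be added up with a little shell arithmetic. This is only a sketch: the variable names are illustrative, and the numbers are the replica counts and volume sizes copied from kafka.yaml above.

```shell
# Values copied from kafka.yaml above; the variable names are illustrative.
kafka_replicas=3
volumes_per_broker=2      # the two JBOD volumes, id 0 and id 1
volume_size_gi=100
zookeeper_replicas=3
zookeeper_size_gi=100

# Each broker gets its own claim per JBOD volume; each ZooKeeper node gets one.
kafka_total_gi=$(( kafka_replicas * volumes_per_broker * volume_size_gi ))
zookeeper_total_gi=$(( zookeeper_replicas * zookeeper_size_gi ))
total_gi=$(( kafka_total_gi + zookeeper_total_gi ))

echo "Kafka: ${kafka_total_gi}Gi, ZooKeeper: ${zookeeper_total_gi}Gi, total: ${total_gi}Gi"
```

With these values the script reports 600Gi for Kafka, 300Gi for ZooKeeper and 900Gi in total, so you may prefer smaller volumes on a test cluster.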

We can also define our topics using YAML, in this case in the file below, topics.yaml. The strimzi.io/cluster label must match the name of the Kafka resource the topic belongs to.

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: apk-cluster
spec:
  partitions: 1
  replicas: 1
  config:
    retention.ms: 7200000
    segment.bytes: 1073741824
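
The two config values are easier to read once converted to friendlier units. A quick sketch using shell arithmetic; the variable names are our own and the numbers are those from topics.yaml:

```shell
# Values copied from the config section of topics.yaml.
retention_ms=7200000
segment_bytes=1073741824

# Milliseconds -> hours, and bytes -> GiB, using integer division.
retention_hours=$(( retention_ms / 1000 / 60 / 60 ))
segment_gib=$(( segment_bytes / 1024 / 1024 / 1024 ))

echo "retention.ms=${retention_ms} keeps records for ${retention_hours} hours"
echo "segment.bytes=${segment_bytes} rolls log segments at ${segment_gib} GiB"
```

In other words, this topic retains messages for two hours and rolls its log segments at 1 GiB.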

Provisioning our Kubernetes cluster

Our Kafka Cluster will be installed on Civo.

Civo’s cloud-native infrastructure services are powered by Kubernetes and use the lightweight Kubernetes distribution K3s for superfast launch times.

Prerequisites

To get started, we will need a Civo account, the Civo command-line tool, kubectl and Helm, all of which are used in the steps below.

After setting up the Civo command line with our API key using the instructions in the CLI's repository, we can create our cluster using the following command:

civo kubernetes create apk-cluster

Our K8s cluster is now created.

You will also need to download the KUBECONFIG file for your cluster. The easiest way to do this is to use the CLI again:

civo kubernetes config apk-cluster --save --merge

This should give you output like the following:

Merged with main kubernetes config: ~/.kube/config

Access your cluster with:
kubectl config use-context apk-cluster
kubectl get node

Next, we will use Helm to install Strimzi in the namespace where we want our Kafka cluster to reside.

First, we add the Strimzi repo using this command in our terminal:

helm repo add strimzi https://strimzi.io/charts/

Then we will install it using this command:

helm install strimzi/strimzi-kafka-operator --generate-name
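
Before moving on, it can help to confirm that the operator is running and that its custom resource definitions are registered. A minimal sketch, wrapped in a function (the name check_strimzi is our own) and assuming kubectl points at the new cluster and Strimzi was installed into the current namespace:

```shell
check_strimzi() {
  # CRDs registered by the Strimzi chart for Kafka clusters and topics.
  kubectl get crd kafkas.kafka.strimzi.io kafkatopics.kafka.strimzi.io
  # The operator pod should appear here once the Helm release is up.
  kubectl get pods
  # Helm releases in the current namespace, including the generated name.
  helm list
}
```

Call check_strimzi once the helm install command has returned.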

Deploying our applications with GitOps

ArgoCD is a declarative GitOps tool that provides a continuous delivery system for deploying applications on Kubernetes. It follows the GitOps pattern of using Git repositories as the source of truth for the desired application state, its environment and its configuration. ArgoCD runs as a Kubernetes controller and represents each application as a custom resource; it continuously monitors running applications, compares them with the desired state specified in a Git repository, and takes the steps required to keep them in sync. ArgoCD supports a variety of configuration management tools, including Helm, Kustomize, Ksonnet and Jsonnet, as well as plain YAML/JSON files.

Prerequisites

Git
GitHub account
GitHub CLI

First, we will create the Git repository we want ArgoCD to monitor. This repo will contain our apps and their configuration files.

We can create a GitHub repository with the web console or the command line.

To create a repo from the command line, use the GitHub CLI tool, gh:

gh repo create

Now that we have our source of truth, aka our git repo, we can install ArgoCD and configure it to start monitoring our repo.

We start by creating a namespace in our cluster:

kubectl create namespace argocd

Then we apply the manifests for ArgoCD to our K8s cluster, into our argocd namespace:

kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

Once all the components of ArgoCD are installed in our cluster, we can use port forwarding to access the API server:

kubectl port-forward svc/argocd-server -n argocd 8080:443

By directing our browser to https://localhost:8080, we can navigate to the ArgoCD web console.

👉🏾 Use "admin" as the username to log in. The initial password is stored base64-encoded (not as plain text) in the "argocd-initial-admin-secret" secret in the argocd namespace. You can retrieve and decode it using kubectl:

kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo
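
Kubernetes stores secret values base64-encoded, which is why the command above pipes the field through base64 -d. The round trip can be tried locally; the value below is a made-up sample, not a real ArgoCD password:

```shell
# Hypothetical sample value, NOT a real ArgoCD password.
encoded=$(printf '%s' 'example-password' | base64)
# Decoding reverses the encoding; no key is involved, so this is not encryption.
decoded=$(printf '%s' "$encoded" | base64 -d)

echo "as stored in the secret: $encoded"
echo "after base64 -d:         $decoded"
```

Since anyone who can read the secret can decode it this way, it is worth changing the initial admin password after the first login.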

Once logged in, we land on the ArgoCD console.

We can interact with ArgoCD through the web console and its command line. To get the ArgoCD CLI tool, install it with brew or your operating system's package manager. On a Mac, using brew:

brew install argocd

Finally, we will configure ArgoCD to watch our desired repository by connecting it using HTTPS. This setting can be found under "Settings" on the left-hand side, and then "Repositories" on the settings page.

👉🏾 GitHub lets you generate access tokens that permit apps to take certain actions, providing a safer alternative to using your password. New tokens are generated from the developer settings of your GitHub profile.

Once you add the URL of the repository you created, our repo is connected and ArgoCD is ready to start monitoring and syncing.
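
The repository can also be connected from the ArgoCD CLI rather than the web console. A sketch, assuming the port-forward from earlier is still running; the function name and the ARGOCD_PASSWORD, GITHUB_USER and GITHUB_TOKEN variables are hypothetical names of our own, and the repository URL is whatever you created above:

```shell
connect_repo() {
  repo_url=$1
  # Log in to the API server exposed by the port-forward; --insecure accepts
  # the self-signed certificate ArgoCD serves by default.
  argocd login localhost:8080 --username admin --password "$ARGOCD_PASSWORD" --insecure
  # Register the repository over HTTPS, using a GitHub token as the password.
  argocd repo add "$repo_url" --username "$GITHUB_USER" --password "$GITHUB_TOKEN"
}
```

After exporting the three variables, call connect_repo with your repository's HTTPS URL; argocd repo list should then show the connection.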

Describing our applications as ArgoCD Apps

An ArgoCD application is a group of Kubernetes resources as defined by a manifest. We can define our application resources, such as the cluster, namespace, repository and directory containing the manifests, sync policy etc., as a deployable unit.

We will define our Kafka cluster as an ArgoCD app in YAML format. Save the YAML below as a file such as apk-app.yaml, editing the repoURL to be your Git repository's URL.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: strimziapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/ehienabs/civo-kafka
    targetRevision: HEAD
    path: apk-cluster/strimzi-apk-cluster
  destination:
    name: in-cluster
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

The above code tells ArgoCD to look in the 'apk-cluster/strimzi-apk-cluster' directory of the 'https://github.com/ehienabs/civo-kafka' repository, and to sync the desired state, as declared in the Kubernetes manifests there, with the live state in the Kubernetes cluster. Be sure to replace the repoURL field with your git repository's details.

We can also create an app to manage our Kafka topics with the following. Save the YAML code below as a file, with a name such as topic-app.yaml:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: topicapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/ehienabs/civo-kafka.git
    targetRevision: HEAD
    path: apk-cluster/topics
  destination:
    name: in-cluster
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Once again, make sure you replace the repoURL field with your git repository's details.

Using the app of apps pattern to simplify deployment

The app of apps pattern helps us define a root application. This application, rather than watch a directory containing our Kubernetes manifests, watches a directory containing all our apps. It is, effectively, an app containing other apps and greatly simplifies deployment when we are dealing with multiple apps.

Our root app manifest looks like this:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: bigapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/ehienabs/civo-kafka.git
    targetRevision: HEAD
    path: argocd-apps
  destination:
    name: in-cluster
    namespace: default

Once again, make sure you replace the repoURL field with your git repository's details. Save the file as bigapp.yaml.
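
The path fields in the three Application manifests imply a particular repository layout. The sketch below creates it; the scaffold_repo function name is our own, and the layout is inferred from those path fields rather than taken from the original repository:

```shell
scaffold_repo() {
  repo_dir=$1
  # Directories named by the "path" fields of strimziapp, topicapp and bigapp.
  mkdir -p "$repo_dir/apk-cluster/strimzi-apk-cluster" \
           "$repo_dir/apk-cluster/topics" \
           "$repo_dir/argocd-apps"
  # Copy each manifest into place if it exists in the current directory.
  for pair in \
    "kafka.yaml:apk-cluster/strimzi-apk-cluster" \
    "topics.yaml:apk-cluster/topics" \
    "apk-app.yaml:argocd-apps" \
    "topic-app.yaml:argocd-apps"
  do
    file=${pair%%:*}   # text before the colon: the manifest file name
    dest=${pair#*:}    # text after the colon: its directory in the repo
    if [ -f "$file" ]; then
      cp "$file" "$repo_dir/$dest/"
    fi
  done
}
```

Run it from the directory holding the saved manifests, passing the path of your cloned repository. bigapp.yaml does not need to sit in any of these directories, since it is applied directly with kubectl.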

Finally, we install our Kafka cluster and its components by creating our root application. Run kubectl apply against our root app manifest.

kubectl apply -f bigapp.yaml

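Then we push our manifests to our git repo. The path fields above expect kafka.yaml in apk-cluster/strimzi-apk-cluster, topics.yaml in apk-cluster/topics, and the two child app manifests in argocd-apps, so place the files there before committing:

git add bigapp.yaml argocd-apps apk-cluster
git commit -m "add manifest files"
git push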

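ArgoCD polls the repository periodically (every three minutes by default), so it should detect the change shortly after the push. Because bigapp declares no automated sync policy, trigger its first sync from the web console or with argocd app sync bigapp. The child apps then sync automatically, bringing the live state of the cluster in line with our desired state.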

It feels a little like magic to install an Apache Kafka cluster with one command, but we can verify that all our components are running in our cluster using:

kubectl get pods
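
Pods can take a few minutes to appear, so a slightly more patient check may be useful. A sketch, with a function name of our own, assuming the Kafka resource is named apk-cluster and lives in the default namespace:

```shell
verify_kafka() {
  # ArgoCD's view: each Application should report Synced and Healthy.
  kubectl -n argocd get applications
  # Block until Strimzi marks the cluster Ready, or give up after 5 minutes.
  kubectl wait kafka/apk-cluster --for=condition=Ready --timeout=300s
  # The topic declared in topics.yaml should be listed as a KafkaTopic.
  kubectl get kafkatopic my-topic
}
```

If the wait times out, kubectl describe kafka apk-cluster usually shows which component is still starting.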

We can also check to make sure our Kafka cluster is sending and receiving messages by first creating a producer using the following command:

kubectl run kafka-producer -ti --image=quay.io/strimzi/kafka:0.19.0-kafka-2.4.1 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list apk-cluster-kafka-bootstrap:9092 --topic my-topic

Then we create a consumer in a separate terminal.
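
The consumer can be started the same way as the producer. A sketch that mirrors the producer command, assuming the same image, bootstrap address and topic; the function wrapper and its name are our own:

```shell
run_consumer() {
  # Same image and bootstrap service as the producer, but running the
  # console consumer script that ships in the Kafka distribution.
  kubectl run kafka-consumer -ti \
    --image=quay.io/strimzi/kafka:0.19.0-kafka-2.4.1 \
    --rm=true --restart=Never \
    -- bin/kafka-console-consumer.sh \
    --bootstrap-server apk-cluster-kafka-bootstrap:9092 \
    --topic my-topic --from-beginning
}
```

Invoke run_consumer in the second terminal. The --from-beginning flag makes the consumer replay messages that were produced before it started.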

We can now send messages from our producer terminal and view them in our consumer terminal. Any further messages we add show up in the consumer terminal as well.

Wrapping up

In this guide, we provisioned a Kubernetes cluster and set up Apache Kafka on it with the help of the Strimzi operator. We also used ArgoCD to set up a GitOps pipeline to manage our applications. Once the Kafka cluster was online, we demonstrated that it works by producing messages and viewing them in the stream.

To clean up, you can delete the cluster from the Civo dashboard. This also removes the ArgoCD deployment, so nothing will be watching for changes in the app specification.

If you followed this guide, let Civo know on Twitter along with how you plan to extend the setup outlined here, or if you have any feedback or corrections!
