In this tutorial, I'll take you through how to build a Kubernetes operator from scratch, and go into some detail on how we use operators at Civo. If you want to follow along, check out this video where Dinesh Majrekar, CTO at Civo, takes you through a live demo:
What is a Kubernetes Operator?
An operator is a piece of software that communicates with the Kubernetes API to reconcile objects to their desired state. For example, an operator could be used to manage an Nginx web server that serves data to customers. Operators work on a model of eventual consistency, which is common throughout Kubernetes.
Situations where you might use an operator include:
- When an application's state can't be managed by built-in resources alone. If an application can run as a StatefulSet and be monitored using health checks, an operator may not be necessary. Operators are useful for managing clustered databases or other applications whose state cannot be captured by health checks.
- When deploying multiple copies of an application, such as several databases in a single Kubernetes cluster, where an operator may be needed to manage the different instances effectively.
What does a Kubernetes object look like?
At its core, a Kubernetes object has three main sections:
- ObjectMeta: metadata about the object, such as labels, annotations, and the creation timestamp.
- Spec: the desired state of the object is defined here.
- Status: the current state of the object is reported here.
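To make the three sections concrete, here is a simplified Go model of an object that uses only the standard library. It uses a hypothetical WebServer resource, similar to the Nginx example above. The type and field names are illustrative assumptions. Real objects use metav1.ObjectMeta from k8s.io/apimachinery and carry many more fields.

```go
package main

import (
	"fmt"
	"time"
)

// ObjectMeta is a simplified, hypothetical stand-in for metav1.ObjectMeta.
type ObjectMeta struct {
	Name              string
	Namespace         string
	Labels            map[string]string
	Annotations       map[string]string
	CreationTimestamp time.Time
}

// WebServerSpec is the desired state, written by the user.
type WebServerSpec struct {
	Replicas int
}

// WebServerStatus is the observed state, written by the operator.
type WebServerStatus struct {
	ReadyReplicas int
}

// WebServer is a hypothetical custom resource combining all three sections.
type WebServer struct {
	ObjectMeta
	Spec   WebServerSpec
	Status WebServerStatus
}

// Ready reports whether the observed state has caught up with the desired state.
func (w WebServer) Ready() bool {
	return w.Status.ReadyReplicas == w.Spec.Replicas
}

func main() {
	ws := WebServer{
		ObjectMeta: ObjectMeta{
			Name:              "nginx",
			Namespace:         "default",
			Labels:            map[string]string{"app": "nginx"},
			CreationTimestamp: time.Now(),
		},
		Spec: WebServerSpec{Replicas: 3},
	}
	fmt.Println(ws.Name, ws.Ready()) // nginx false
}
```

The user owns the spec and the operator owns the status. Comparing the two is how readiness is decided.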
What is eventual consistency?
Eventual consistency is a concept in distributed systems: all nodes in a system will eventually hold the same data, but there may be a delay before they do. In the context of Kubernetes and operators, it means that the operator will eventually bring the objects it manages into the desired state. That delay could be short or long depending on the operator's workload and other factors. Kubernetes and operators use the status field to report the current state of the objects they manage, and to indicate when they have reached the desired state (which is usually defined in the spec). You can use this field to check the readiness of objects such as pods or deployments.
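As a rough sketch, eventual consistency can be modelled as repeated reconcile passes. Each pass moves the observed state one step closer to the desired state and records progress in the status. This toy example assumes a replica count as the managed state. Real operators make API calls rather than mutating a local struct.

```go
package main

import "fmt"

// Spec is the desired state; Status is what has been observed so far.
type Spec struct{ Replicas int }
type Status struct {
	ReadyReplicas int
	Ready         bool
}

type Object struct {
	Spec   Spec
	Status Status
}

// reconcileOnce does at most one unit of work and reports whether the
// status changed.
func reconcileOnce(obj *Object) bool {
	before := obj.Status
	switch {
	case obj.Status.ReadyReplicas < obj.Spec.Replicas:
		obj.Status.ReadyReplicas++ // e.g. start one more pod
	case obj.Status.ReadyReplicas > obj.Spec.Replicas:
		obj.Status.ReadyReplicas-- // e.g. stop one pod
	}
	obj.Status.Ready = obj.Status.ReadyReplicas == obj.Spec.Replicas
	return obj.Status != before
}

// converge keeps reconciling until nothing changes and returns the number
// of passes that made a change.
func converge(obj *Object) int {
	passes := 0
	for reconcileOnce(obj) {
		passes++
	}
	return passes
}

func main() {
	obj := &Object{Spec: Spec{Replicas: 3}}
	fmt.Println(converge(obj), obj.Status.Ready) // 3 true
}
```

Until the loop settles, an observer reading Status.Ready sees false. That is the "delay" described above.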
What is the Operator Lifecycle?
The operator lifecycle involves the following steps:
- The operator starts a local cache and waits for changes to occur in the resources it's reconciling.
- The operator runs a reconcile loop and may create/update/delete objects in the Kubernetes API or an external API.
- If the objects created by the operator are deleted or their state changes, the operator notices the changes (eventually) and reconciles the resource.
- The operator can then make further changes to the objects or other objects in the API.
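The lifecycle above can be sketched as a small event loop. This is a toy, single-goroutine model built on a few assumptions:

- Maps stand in for the API and the operator's cache.
- A slice stands in for the watch.
- Every parent owns one child object with the same name.

It is not how controller-runtime is implemented, but it shows why a deleted child gets recreated.

```go
package main

import "fmt"

// cluster is a hypothetical stand-in for the Kubernetes API plus the
// operator's cache; queue holds pending reconcile requests by parent name.
type cluster struct {
	parents  map[string]bool
	children map[string]bool
	queue    []string
}

// deleteChild simulates someone deleting an operator-created object; the
// watch turns that into a request for the owning parent.
func (c *cluster) deleteChild(name string) {
	delete(c.children, name)
	c.queue = append(c.queue, name)
}

// reconcile creates the child for a parent if it is missing and reports
// whether it did. Creating the child also produces a watch event.
func (c *cluster) reconcile(name string) bool {
	if !c.parents[name] || c.children[name] {
		return false
	}
	c.children[name] = true
	c.queue = append(c.queue, name)
	return true
}

// run drains the queue and returns how many children were created.
func (c *cluster) run() int {
	created := 0
	for len(c.queue) > 0 {
		name := c.queue[0]
		c.queue = c.queue[1:]
		if c.reconcile(name) {
			created++
		}
	}
	return created
}

func main() {
	c := &cluster{parents: map[string]bool{"db": true}, children: map[string]bool{}}
	c.queue = []string{"db"} // initial request when the cache first syncs
	fmt.Println(c.run())     // 1: child created
	c.deleteChild("db")      // someone deletes the child
	fmt.Println(c.run())     // 1: the operator notices and recreates it
}
```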
How do Civo use Operators?
To manage the state of customer instances
At Civo, we manage the state of customer instances by closely monitoring public IP addresses, networking, virtual machines, and clusters. We heavily rely on the power of operators to manage customers' Kubernetes clusters and other Civo services, allowing us to leverage Kubernetes' default capabilities for customer clusters as well.
We look at managing public IP addresses that are attached to clusters, ensuring that they are allocated and handed back to the pool as needed. Our networking stack is more complicated, allowing customers to set up firewalls and custom routing.
Virtual machines are launched and managed by KubeVirt, which runs them inside pods under the hood. All of our operators communicate with one another (via the Kubernetes API) to ensure that the virtual machines are running smoothly.
Additionally, we keep a close eye on Kubernetes clusters, making sure that public IP addresses are available to the virtual machines as they move around our supercluster.
Operations on Customer environments
One of the key things we focus on is rebuilding objects through our website. By clicking the rebuild button on a node, customers send an operation to our Kubernetes operator, which then rebuilds the node, including the virtual machine and its underlying storage. We also handle reboots as an operation, which requires us to track the VM's state more closely throughout the process. This more complicated state model is what led us down the operator route.
In this tutorial, we will use Kubebuilder to create a custom resource and an operator to manage it, and run them against a Kubernetes cluster deployed on Civo. We start with a completely blank folder. We'll create a single custom resource, a demo volume, which will have a name and a size.
To demonstrate the status field, we will copy the name from the spec into the status. After that, we'll add functionality so that the demo volume creates a PVC in our Civo Kubernetes cluster with a size that matches the spec.
Scaffolding using Kubebuilder
After installing Kubebuilder, the first step is to initialize the project by running kubebuilder init in the empty folder.
This is called scaffolding: Kubebuilder creates a whole set of files that form the base of your project, including some of the Go modules we'll need by default. This is also why we start from a blank folder, as Kubebuilder needs an empty folder to initialize a project in.
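You'll also need to download the kubeconfig file for the new cluster and make sure that both of your shells are connected to the right cluster. One way to do this is to point the KUBECONFIG environment variable at the downloaded file.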
Creating a custom resource
Once that has finished, we create the resource and the controller with kubebuilder create api.
This modifies a few files, including adding some more Go modules that we need, and creates the api, config, and controllers folders.
In the api folder, Kubebuilder creates a demo volume type with a sample Foo string field in the spec of the custom resource. The first step is to rename Foo to Name and add a new field called Size, which should be an int. This gives us a custom resource whose spec section contains a name and a size.
We also have a structure for the status field, into which we will copy the name from the spec. Kubebuilder's Makefile then takes our Go structs and converts them into a custom resource definition (CRD) that can be loaded directly into Kubernetes. The make manifests target handles this generation step by running controller-gen, which reads the Go types along with their +kubebuilder marker comments.
In the generated CRD, the spec section now includes a size field of type integer. Every time we change the types and regenerate, Kubebuilder produces the necessary Kubernetes configuration for us. This is very convenient for developers, as it eliminates the need to hand-write boilerplate YAML.
Kubebuilder also scaffolded a controller for us. The controller file may seem long, but the only section that needs attention is the Reconcile method at the bottom. This is where the reconcile logic lives: our code runs here each time an object needs reconciling. The request object passed in tells us what is being reconciled (the object's namespace and name).
To get things up and running, we change the code to log "enter reconcile" and output the request for debugging purposes.
By running make install in the terminal, we can install our custom resource definition into the Kubernetes cluster. This step downloads Kustomize, which is used to generate the manifests. Once the installation is complete, we can list the custom resource definitions in the cluster (for example, with kubectl get crds). The demo volume CRD should appear in the list, recently created.
Kubebuilder also provides a sample demo volume manifest in the config/samples folder, which lets us easily create an object in the cluster.
To run the operator, we use the Makefile again. Running make run starts the controller locally, connects it to the Kubernetes cluster, and starts reconciling our request: in this case, the demo volume sample in the default namespace. This is a convenient way to iterate during development, and the output should show that the reconcile has been successful.
The next step is to add some boilerplate code that copies the name from the spec to the status field. Reporting state through the status is an important part of the contract between the operator and the rest of the system.
First, we load the volume object. We then check whether volume.Spec.Name and volume.Status.Name are equal. If they are not, we set volume.Status.Name to volume.Spec.Name. We then save the change back to the API by updating the status, passing in the context and the volume itself via the client's Status().Update call.
With these changes in place, we re-run the operator. If all goes well, the output should show no errors. Let's take a look at the output and see if we can make sense of it.
Running the Operator against a Civo Kubernetes Cluster
When the operator runs against a Civo Kubernetes cluster, it starts up and begins a reconcile: it fetches the demo volume and logs it. It then updates the demo volume's status. Because that update is itself a change to the object being watched, it triggers another reconcile of the same object. On that second pass the status already matches the spec, so nothing changes and the object is considered done.
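The same sequence can be modelled as a work queue in which each status write queues the object again. This is a toy simplification of controller-runtime's work queue, with hypothetical types. It shows why a single change produces two reconciles.

```go
package main

import "fmt"

type volume struct {
	specName, statusName string
}

type operator struct {
	store      map[string]*volume
	queue      []string
	reconciles int
}

// notify records a watch event for key, as the API server would after a write.
func (o *operator) notify(key string) {
	o.queue = append(o.queue, key)
}

// reconcile copies the spec name into the status if they differ.
func (o *operator) reconcile(key string) {
	o.reconciles++
	v, ok := o.store[key]
	if !ok {
		return
	}
	if v.statusName != v.specName {
		v.statusName = v.specName // status update...
		o.notify(key)             // ...which triggers another reconcile
	}
}

// run processes queued requests until the queue is empty.
func (o *operator) run() {
	for len(o.queue) > 0 {
		key := o.queue[0]
		o.queue = o.queue[1:]
		o.reconcile(key)
	}
}

func main() {
	o := &operator{store: map[string]*volume{"default/demo": {specName: "demo"}}}
	o.notify("default/demo") // initial event when the operator starts
	o.run()
	fmt.Println(o.reconciles) // 2
}
```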
The next step is to use this information to create a PVC of a specific size in the cluster, using a new function called reconcilePVC. It takes the context, the demo volume, and the logger. It can be a bit tricky to get right, so we'll walk through it step by step.
Creating native resources related to the Custom Resource
When creating native resources related to a custom resource, it's important to keep eventual consistency in mind. The reconcile function may run at any time, including on startup and several times for the same object. It must therefore check what already exists before creating anything.
The first step is to retrieve a PVC from the cluster with the same name as the demo volume. If it is found, we exit the function, as everything is already in the correct state. If there is any other error, such as a problem connecting to Kubernetes, the function also exits and returns the error so that the reconcile is run again.
If the PVC is not found, we log this and create a new PVC.
Building native resources in Go can be a bit tricky because of the way these objects are defined. For example, the storage size of a PVC is not a plain integer but a resource quantity, which can be confusing at first. The documentation for the client-go libraries and their API types explains these details.
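To see what a quantity string actually means, here is a toy parser for whole-number binary suffixes. Real code should use resource.MustParse or resource.ParseQuantity from k8s.io/apimachinery/pkg/api/resource, typically assigning the result to the PVC's storage request under corev1.ResourceStorage. This sketch ignores decimal suffixes, exponents, and fractions, and treating Size as GiB is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// binarySuffixes maps power-of-two suffixes to their byte multipliers.
var binarySuffixes = map[string]int64{
	"Ki": 1 << 10,
	"Mi": 1 << 20,
	"Gi": 1 << 30,
	"Ti": 1 << 40,
}

// parseBinaryQuantity converts strings like "5Gi" into a byte count.
func parseBinaryQuantity(s string) (int64, error) {
	for suffix, mult := range binarySuffixes {
		if strings.HasSuffix(s, suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * mult, nil
		}
	}
	return strconv.ParseInt(s, 10, 64) // no suffix: plain bytes
}

// sizeToQuantity formats the demo volume's integer Size as a quantity
// string, assuming (hypothetically) that Size is measured in GiB.
func sizeToQuantity(sizeGi int) string {
	return fmt.Sprintf("%dGi", sizeGi)
}

func main() {
	q := sizeToQuantity(5)
	b, err := parseBinaryQuantity(q)
	if err != nil {
		panic(err)
	}
	fmt.Println(q, b) // 5Gi 5368709120
}
```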
Now that the object is ready, we use the Kubernetes client to actually create it, and then list the PVCs in the namespace to check the result. The operator log is kept small for now. If everything goes as planned, we should see the operator start up and receive a reconcile request for the demo volume. This time, it calls reconcilePVC and we can watch the PVC being created with the correct capacity. The volume is provisioned by the CSI driver in Civo Kubernetes, and once the PVC is bound it is ready for use.
Want to know more about this topic?
If you’re interested in learning more about operators, check out the resources linked below: