Understanding Kubernetes StatefulSets

By Saiyam Pathak
Director of Technical Evangelism


Find out how StatefulSets are different from Deployments in Kubernetes and how they work.


What are StatefulSets?

In this video, we'll be going through StatefulSets. A StatefulSet is a Kubernetes object used for stateful applications such as databases. There are also other cases, much like databases, where you need persistence or need to store the state of the application, and Deployments cannot serve those.

StatefulSet vs Deployment

A StatefulSet shares a lot of characteristics with a Deployment: it is scaled in the same way and it has a pod spec in the same way, but it is different from a Deployment. Before going into the differences, say that you try to create a Deployment of a database. It will create three replicas in random order, each with a random hash in its name. This is unacceptable for applications with persisted data: you can read from any of the database replicas, but you cannot write to just any replica, because that can lead to inconsistency. So let me give you another view from another angle.

Imagine that you have a Deployment named demo with two replicas. When you create it, the pods will be named demo followed by a random string, so you get demo-<random string> and demo-<random string>. These pods are not created in any order, they don't maintain any sticky identity, and they can sit behind any kind of Service. You can create a LoadBalancer, a ClusterIP, a NodePort, or any other kind of Service to serve traffic to these pods.

Now, on the other hand, a StatefulSet will maintain a sticky identity. What does that mean? It means all the pod names will be predictable. For example, if you have named a StatefulSet web, the pods will be named in order web-0, web-1, and so on up to web-(n-1). Here is a simple example of the above. With the kubectl get deploy and kubectl get statefulset commands, you can see that both are ready. If you list the pods with kubectl get pods, you will see that the Deployment pods have a random hash attached to their names, while the StatefulSet pods have an ordinal number attached, which is predictable. That means if I scale a Deployment, another pod will get created, but we don't know the random hash that will be generated for it. Whereas in a StatefulSet, we are sure that the next pod will be web-2.

Another critical difference is that in a Deployment, all the pods start getting created at once, irrespective of errors. In a StatefulSet, unless and until the previous pod is ready, the next one will not even start being created.
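To make the naming and ordering difference concrete, here is a minimal Python sketch. It only models the behaviour described above and does not talk to a cluster. The function names are hypothetical, and the random suffix is a stand-in: the real Deployment suffix scheme is not reproduced here.

```python
import secrets
import string


def deployment_pod_names(name: str, replicas: int) -> list[str]:
    """Illustrative only: each Deployment pod gets an unpredictable suffix."""
    alphabet = string.ascii_lowercase + string.digits
    return [
        f"{name}-{''.join(secrets.choice(alphabet) for _ in range(5))}"
        for _ in range(replicas)
    ]


def statefulset_pod_names(name: str, replicas: int) -> list[str]:
    """StatefulSet pods are named <name>-<ordinal>, from 0 to replicas - 1."""
    return [f"{name}-{ordinal}" for ordinal in range(replicas)]


def ordered_startup(names: list[str], is_ready) -> list[str]:
    """Start pods one at a time; stop at the first pod that is not ready.

    Returns the pods that were started, including the one that got stuck.
    """
    started = []
    for pod in names:
        started.append(pod)
        if not is_ready(pod):
            break  # the next ordinal is never created while this one is not ready
    return started


if __name__ == "__main__":
    print(deployment_pod_names("demo", 2))    # two names with random suffixes
    print(statefulset_pod_names("web", 3))    # ['web-0', 'web-1', 'web-2']
    # Pretend web-1 never becomes ready: web-2 is never started.
    print(ordered_startup(statefulset_pod_names("web", 3), lambda p: p != "web-1"))
```

The last call shows the ordering guarantee: because web-1 is stuck, web-2 does not appear in the list of started pods.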

What is a Headless service and why is it important for StatefulSets?

Another important thing to note here is that we have to use headless services for stateful applications. A headless service is a service whose cluster IP is set to None. But why a headless service? Because a regular service can send traffic to any of the instances behind it, and we cannot afford that with a database application. Say this is the web-0 pod and this is web-1, and each of them is maybe a MySQL database. With a regular service in front, one write operation goes to this pod and the next write operation goes to that one, leading to data inconsistency. Here we need to maintain a master/slave architecture. The configuration is something we have to define ourselves, but we know the order, so we can predefine that web-0 will be the master and web-1 will be the slave, and data can be replicated between them. As a result, whenever a new pod comes up, it replicates the data and ends up in the same state as the previous pod. So there has to be a replication mechanism, and we have to handle it.

But what Kubernetes can do is this: we remove this regular service and put a headless service in its place. With a headless service, each pod gets its own unique network identity as well, which you can refer to when you are doing the replication, because you know the headless service name. Also, when we create the specification for the StatefulSet, we define the service name.
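As a small sketch of what that network identity looks like: behind a headless service, each StatefulSet pod gets a stable DNS name of the form pod.service.namespace.svc.cluster-domain. The helper names, the service name web-headless, and the default namespace below are assumptions for illustration, and cluster.local is only the usual default cluster domain.

```python
def pod_dns_name(pod: str, service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Stable DNS name of one StatefulSet pod behind a headless service."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


def replication_topology(statefulset: str, service: str, replicas: int) -> dict:
    """Hypothetical helper: treat ordinal 0 as the master, the rest as slaves."""
    hosts = [pod_dns_name(f"{statefulset}-{i}", service) for i in range(replicas)]
    return {"master": hosts[0], "slaves": hosts[1:]}


if __name__ == "__main__":
    print(pod_dns_name("web-0", "web-headless"))
    print(replication_topology("web", "web-headless", 3))
```

Because the names are known ahead of time, the replication configuration can point at web-0 by name before the pod even exists.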

We will define the service name in the StatefulSet, and we'll create the service ourselves. Kubernetes manages the StatefulSet for you and keeps the sticky identity of the pods. However, there are certain aspects the user must take care of, such as creating the headless service. If the data is persistent, each pod will use a PVC, a Persistent Volume Claim. For that, either a volume provisioner has to be set up by the user, or the persistent volumes have to be pre-created by the user. This is because a PVC will be needed whenever a new pod comes up or whenever we scale. Whatever replication method we have set up will replicate the data, and a PVC will be created and then bound to a PV. If you have an external storage mechanism, that can be used too, but it has to be set up by the user.
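The pieces the user has to supply can be sketched as two manifests, built here as Python dictionaries and printed as JSON, which kubectl also accepts. This is a rough sketch, not a working database setup: the names web and web-headless, the mysql:8.0 image, the mount path, and the 1Gi size are all assumptions, and a real MySQL pod would also need credentials and replication configuration, which are left out.

```python
import json


def headless_service(name: str, app_label: str, port: int) -> dict:
    """A Service with clusterIP set to None, i.e. a headless service."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "clusterIP": "None",
            "selector": {"app": app_label},
            "ports": [{"name": "db", "port": port}],
        },
    }


def statefulset(name: str, service_name: str, app_label: str, image: str,
                port: int, replicas: int, storage: str) -> dict:
    """A StatefulSet that references the headless service and templates a PVC."""
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": service_name,  # must match the headless service name
            "replicas": replicas,
            "selector": {"matchLabels": {"app": app_label}},
            "template": {
                "metadata": {"labels": {"app": app_label}},
                "spec": {"containers": [{
                    "name": "db",
                    "image": image,  # assumed image; env and config omitted
                    "ports": [{"name": "db", "containerPort": port}],
                    "volumeMounts": [{"name": "data", "mountPath": "/var/lib/mysql"}],
                }]},
            },
            # One PVC per pod is created from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


def pvc_name(claim_template: str, statefulset_name: str, ordinal: int) -> str:
    """PVCs are named <claim template>-<statefulset>-<ordinal>."""
    return f"{claim_template}-{statefulset_name}-{ordinal}"


if __name__ == "__main__":
    print(json.dumps(headless_service("web-headless", "web", 3306), indent=2))
    print(json.dumps(statefulset("web", "web-headless", "web", "mysql:8.0",
                                 3306, 2, "1Gi"), indent=2))
    print([pvc_name("data", "web", i) for i in range(2)])
```

The claim template is why each pod gets its own PVC: web-0 uses data-web-0 and web-1 uses data-web-1, and each claim still needs a PV from a provisioner or from volumes created in advance.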

When we scale, each new pod also gets a unique identifier and a unique network identity, because it is a StatefulSet. Deletion happens in reverse order: the last created pod is the first one to go on a scale down, and then the others follow. Scaling up happens in order, so you know what the next pod name will be and what network identity goes with it.
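The scaling order can be written down as a short Python sketch. The function name scale_steps is hypothetical, and the sketch only lists the order of operations rather than calling Kubernetes.

```python
def scale_steps(name: str, current: int, target: int) -> list[tuple[str, str]]:
    """List the pod operations, in order, when scaling a StatefulSet.

    Scale up creates the next ordinals in increasing order;
    scale down deletes the highest ordinal first.
    """
    if target >= current:
        return [("create", f"{name}-{i}") for i in range(current, target)]
    return [("delete", f"{name}-{i}") for i in range(current - 1, target - 1, -1)]


if __name__ == "__main__":
    print(scale_steps("web", 2, 4))  # creates web-2, then web-3
    print(scale_steps("web", 4, 1))  # deletes web-3, then web-2, then web-1
```

In both directions web-0 is the first pod to exist and the last one to go, which is why it is a natural choice for the master.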


That's how StatefulSet works, and we'll also see some of the things in a demo. But overall, this is how the StatefulSet works and how it is different from Deployment. Thank you for watching, see you in the next lecture.
