Problem: Keeping track of events in your clusters
Workloads running in a Kubernetes cluster are dynamic by nature. The pods, replicas and deployments in your cluster come and go over time because they are ephemeral. There are a lot of situations where you want to check what happened in your cluster:
- to debug historic incidents
- to debug common tasks like:
- finding info and events related to Kubernetes resources (like pods, replicasets, deployments, etc) that have been deleted, like:
- finding info related to pods/replicasets that are replaced by newer pods/replicasets after a deployment update
- getting details of pods evicted from a lost node
- getting info about lost Kubernetes nodes, which no longer exist
- knowing rollout details of older deployments
- discovering hosts where pods from a previous deployment were running
- retrieving timings of pod replacements and their health checks
- long term behavioral analysis of your workloads running on your Kubernetes cluster
- and so on...
Basically, we may need information about all the events happening in a Kubernetes cluster.
Kubernetes events are the answer to the above problem. They are a great way to analyze what happened in your cluster, since they capture the events and resource state changes taking place in it. But there are a few drawbacks:
- Kubernetes events can generally only be accessed using kubectl.
- The default retention period of Kubernetes events is 1 hour.
- The retention period can be increased using the --event-ttl flag of kube-apiserver, but doing so can cause issues with the cluster's key-value store.
- There is no way to visualize these events.
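To make the first two drawbacks concrete, here is a minimal sketch of how events are typically inspected with kubectl. The namespace, the pod name and the helper name list_pod_events are placeholders chosen for illustration; only events still within the retention period will be returned.

```shell
#!/bin/sh
# Placeholder values for illustration; replace them with your own namespace and pod.
NAMESPACE="${NAMESPACE:-default}"
POD="${POD:-my-pod}"

# Hypothetical helper: list the events recorded for one pod, oldest first.
# $1 = namespace, $2 = pod name
list_pod_events() {
  kubectl get events -n "$1" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$2" \
    --sort-by=.metadata.creationTimestamp
}

if command -v kubectl >/dev/null 2>&1; then
  list_pod_events "$NAMESPACE" "$POD" || echo "could not query the cluster"
else
  echo "kubectl not found; skipping"
fi
```

If the pod was deleted more than an hour ago (with the default retention), this will most likely return nothing, which is exactly the gap the tools below try to fill.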
Kubewatch - Kubewatch is a tool to watch Kubernetes events and push notifications to available channels.
Eventrouter - In Eventrouter, Kubernetes events are captured and routed to a backend sink. A sink can be anything like an Amazon S3 bucket or an Elasticsearch cluster where you can dump all your events. Later, you can create dashboards based on captured events using tools like Kibana or Grafana. It supports multiple sinks.
Event-exporter - A Prometheus exporter that exposes Kubernetes events in Prometheus format so they can be stored in your Prometheus server. You can then create alerts with Alertmanager or build visualisation dashboards in Grafana based on the collected events.
The tools mentioned above are a good way to tackle most of the challenges posed by Kubernetes events. But they are not standalone solutions, and they leave a lot of work to you as an end user: you also need to configure other tools alongside them to store and visualize the events.
Sloop: Your Ultimate and Easy Solution
Sloop is a standalone solution that can store and visualize Kubernetes events with far less effort on the end user's part. Sloop monitors Kubernetes, recording histories of events and resource state changes, and provides visualizations to aid in debugging past events. Sloop:
- Allows you to find and inspect resources that no longer exist in your Kubernetes cluster
- Helps in answering almost all the queries mentioned at the beginning of this blog
- Provides a timeline display that shows rollouts of related resources in updates to Deployments, ReplicaSets and StatefulSets
- Helps in debugging transient and intermittent errors
- Allows you to see changes over time in a Kubernetes application
- Is a self-contained service with no dependencies on distributed storage
Sloop can be installed using Helm or as a standalone Docker container.
All methods require you to have a Kubernetes cluster running and the KUBECONFIG environment variable set up. If you have not yet signed up to Civo, you can sign up to apply for our managed Kubernetes beta to try this out for yourself!
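Before installing, a quick pre-flight check along these lines can confirm the prerequisites. This is only a sketch: kubeconfig_ok is a hypothetical helper, and it assumes KUBECONFIG holds a single file path rather than a colon-separated list.

```shell
#!/bin/sh
# Hypothetical helper: succeed if KUBECONFIG is set and names an existing file.
# Assumption: KUBECONFIG contains one path, not a colon-separated list.
kubeconfig_ok() {
  [ -n "${KUBECONFIG:-}" ] && [ -f "$KUBECONFIG" ]
}

if kubeconfig_ok; then
  echo "Using kubeconfig: $KUBECONFIG"
  if command -v kubectl >/dev/null 2>&1; then
    # cluster-info prints the control plane address when the cluster is reachable.
    kubectl cluster-info || echo "kubectl could not reach the cluster"
  else
    echo "kubectl not found"
  fi
else
  echo "KUBECONFIG is unset or does not point to a file"
fi
```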
$ git clone https://github.com/salesforce/sloop
$ cd sloop/helm
$ kubectl create ns sloop
$ helm install sloop -n sloop ./sloop
Refer to this document to run Sloop as a standalone Docker container.
Then use kubectl's port-forward function to access the dashboard:
kubectl port-forward -n sloop service/sloop 8080:80
Open http://localhost:8080/ in your browser to view the dashboard.
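If the page does not load, a small check from a second terminal can tell you whether the forwarded port is answering at all. The sketch below assumes the port-forward is still running and that curl is installed; http_status and is_reachable are hypothetical helper names, and any 2xx or 3xx status is treated as "reachable" because the exact response of the dashboard root is not specified here.

```shell
#!/bin/sh
# Assumes the kubectl port-forward command is still running in another terminal.
URL="${SLOOP_URL:-http://localhost:8080/}"

# Print the HTTP status code for a URL; curl prints 000 when it cannot connect.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Succeed for any 2xx or 3xx status code.
is_reachable() {
  case "$1" in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}

status=$(http_status "$URL")
if is_reachable "$status"; then
  echo "Dashboard is answering at $URL (HTTP $status)"
else
  echo "Nothing answering at $URL yet (HTTP status: $status)"
fi
```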
As you can see, Sloop provides a timeline of your Kubernetes resources. It also provides different filters to narrow down what is shown.
With Sloop, you can filter out Kubernetes resources based on the time range, the Kubernetes namespace, the kind of resource (like pods, pvc, node, etc), the resource name and also sort events based on different options. Selecting a particular Kubernetes resource in a specified timeline will show you different events occurring at that particular moment on that resource. This helps in capturing all past events that happened on that resource in your cluster.
Sloop also exposes a debug menu where you can see its configuration, internal metrics and different settings. You can also query its internal data store, and there are a lot of things to tweak here.
For more information, check out the Sloop project on GitHub.