Kubernetes Metrics and why we need them

There are more than a few benefits of having insights into how our applications and the resources that power them perform.

We can improve our users’ experience, understand how our services are used, help reduce mean time to resolution (MTTR) when our services run into trouble, and reduce overall downtime.

Metrics are numerical measurements that capture data about the performance of a process over time. Each metric acts as an identifier under which new data points are recorded continuously.

Our applications’ response times, error rates, uptimes, and the like are useful metrics we can use to measure the performance of our apps. Metrics like CPU and memory utilization, read and write operations, etc., help us learn more about our applications’ resource usage.

What is Prometheus?

Prometheus is an open-source monitoring and alerting toolkit that collects and stores metrics as time series. It has a multidimensional data model that uses key/value pairs to identify data, a fast and flexible query language (PromQL), and built-in service discovery, and it does not rely on distributed storage.

Using client libraries, we can leverage Prometheus to monitor our services.

Client libraries allow us to define internal metrics for our applications in the same programming language our app is written in.

These metrics are then exposed via an HTTP endpoint, which Prometheus scrapes according to the scrape configuration we define.
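For illustration, a minimal scrape configuration might look like the fragment below (the job name and target here are placeholders for your own service):

```yaml
scrape_configs:
  - job_name: "my-app"            # label attached to all series scraped by this job
    scrape_interval: 15s          # how often Prometheus pulls the endpoint
    metrics_path: /metrics        # default path exposed by the client libraries
    static_configs:
      - targets: ["my-app:8080"]  # host:port where the app exposes its metrics
```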

The Prometheus community offers official client libraries in several languages we can use for our app instrumentation, including Go, Python, Java, and Ruby. Third-party libraries maintained by various communities cover Bash, C++, Perl, and more.

Prometheus metrics

Using Prometheus client libraries, we can instrument four main types of metrics in our application.

They include:

  • Counter: When we want to know how often an event occurred, we use the counter metric type. It is a cumulative metric whose value can only increase or be reset to zero. Examples of counter metrics include the total number of HTTP requests our service has received, the number of errors it has returned, the number of restarts of a particular pod, and the like.
  • Gauge: When we want to take a snapshot of a metric at a point in time, we use the gauge metric type. Unlike a counter, which accumulates the occurrences of an event over time, a gauge captures a single value at a point in time and can go up or down. Examples of gauge metrics include current temperature, the number of concurrent connections, the number of online users, the number of items in a queue, etc.
  • Histogram: When we want to group observations by their frequency and place them in pre-defined buckets, we use the histogram metric type. For example, we can create buckets that specify the request durations we want to track; the Go client library uses .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, and 10 seconds as the default bucket boundaries if none are specified. When a request is made to our service, the histogram calculates its duration and records it in the appropriate buckets. That way, if we want to know how many requests took less than .005 seconds, we can get that information, and so on for the rest of the buckets. Examples of histogram metrics include request duration, request size, response size, etc.
  • Summary: Summaries are similar to histograms in that they both track distributions. They are best used for measuring latencies when a near-accurate value is desired. Care needs to be taken when using summaries, as their quantiles cannot be meaningfully aggregated across instances and they tend to be expensive in terms of resources.
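To make the histogram bucket mechanics concrete, here is a stdlib-only Python sketch (a simplified model for illustration, not the actual client library) of how an observation lands in cumulative buckets, using the default boundaries mentioned above:

```python
import bisect

# Default bucket boundaries used by the Go client library.
DEFAULT_BUCKETS = [.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10]

class ToyHistogram:
    """A simplified model of a Prometheus histogram (illustration only)."""

    def __init__(self, buckets=DEFAULT_BUCKETS):
        self.bounds = list(buckets)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket
        self.total = 0.0  # running sum of all observed values
        self.count = 0    # total number of observations

    def observe(self, value):
        # Buckets are cumulative: an observation is counted in every
        # bucket whose upper bound is >= the observed value.
        first = bisect.bisect_left(self.bounds, value)
        for i in range(first, len(self.counts)):
            self.counts[i] += 1
        self.total += value
        self.count += 1

    def le(self, bound):
        """How many observations were less than or equal to this bound."""
        return self.counts[self.bounds.index(bound)]

h = ToyHistogram()
for duration in (0.003, 0.2, 7.0):  # three request durations, in seconds
    h.observe(duration)

print(h.le(.005))  # 1 request took <= 5 ms
print(h.le(.25))   # 2 requests took <= 250 ms
print(h.count)     # 3 observations in total
```

Because the buckets are cumulative, Prometheus can later estimate quantiles from them with PromQL's histogram_quantile function.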

Prometheus labels

Data in Prometheus is stored as time series. The combination of a metric name and its labels identifies each time series.

Labels are attributes of metrics that provide dimensionality. They are key-value pairs that enrich metrics by providing a unique identifier for each time series, enabling aggregation and filtering.

Examples of labels include:

  • instance - the name of the instance being monitored.
  • handler - the function or route being executed.
  • status_code - the returned HTTP status code.
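As an illustration of how labels create dimensionality, here is a stdlib-only Python sketch (a toy model, not the real client library, with made-up handler and status values): each unique combination of label values yields its own time series under the same metric name.

```python
class ToyCounter:
    """A simplified labeled counter: one child series per label combination."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self.series = {}  # maps a tuple of label values -> current count

    def labels(self, *values):
        key = tuple(values)
        self.series.setdefault(key, 0)
        return key

    def inc(self, key):
        self.series[key] += 1

http_requests = ToyCounter("http_requests_total", ["handler", "status_code"])

ok = http_requests.labels("/home", "200")
err = http_requests.labels("/home", "500")

http_requests.inc(ok)
http_requests.inc(ok)
http_requests.inc(err)

# Two distinct time series exist under the same metric name:
for values, count in sorted(http_requests.series.items()):
    labels = ",".join(f'{n}="{v}"' for n, v in zip(http_requests.label_names, values))
    print(f"{http_requests.name}{{{labels}}} {count}")
```

In PromQL, these labels are what allow us to filter (http_requests_total{status_code="500"}) or aggregate (sum by (handler)) across series.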

Prometheus exporters

There are services we would like to monitor but cannot instrument directly, such as legacy systems or third-party software whose code we do not have access to. For these types of services, Prometheus offers exporters.

Prometheus exporters help us monitor systems we are unable to instrument ourselves. They fetch non-Prometheus metrics, statistics, and other data, convert them into the Prometheus metric format, and start a server that exposes these metrics at the /metrics endpoint.

The Prometheus community offers and maintains a host of official exporters, including those for:

  • HTTP such as Webdriver exporter, Apache exporter, HAProxy exporter, etc.
  • Messaging systems such as Kafka exporter, RabbitMQ exporter, Beanstalkd exporter, etc.
  • Databases such as MySQL server exporter, Oracle database exporter, Redis exporter, etc.
  • Exporters for APIs and other monitoring systems.

Developers are also encouraged to write their own exporters should none of the available ones meet their requirements.
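As a toy illustration of the exporter pattern (a stdlib-only sketch with a made-up statistic, not a production exporter), the program below reads a value from a non-Prometheus source, renders it in the text exposition format, and serves it at /metrics:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def read_external_stat():
    # Stand-in for fetching a statistic from a system we cannot instrument,
    # e.g. parsing a legacy service's status page. The value is hypothetical.
    return {"queue_depth": 42}

def render_metrics(stats):
    """Convert raw stats into the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(stats.items()):
        lines.append(f"# HELP legacy_{name} Exported from a legacy system.")
        lines.append(f"# TYPE legacy_{name} gauge")
        lines.append(f"legacy_{name} {value}")
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(read_external_stat()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # To serve the exporter: HTTPServer(("", 9100), MetricsHandler).serve_forever()
    print(render_metrics(read_external_stat()), end="")
```

Prometheus would then scrape this exporter like any other target, with no changes to the legacy system itself.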

Monitoring Kubernetes resources with Prometheus

Kubernetes is a portable, extensible, open-source container orchestration platform for managing containerized applications. A Kubernetes cluster can handle thousands of microservices packaged and run as containers, making it ideal for running services at scale.

Using Kubernetes to deploy and manage our containers, we can improve our development cycle.

Prometheus integrates well with Kubernetes. Thanks to service discovery, Prometheus automatically pulls metrics from newly created replicas as we scale up our services. Kubernetes and Prometheus both use labels to select and aggregate objects for queries, meaning the two work well together conceptually.

Installing Prometheus on Kubernetes

We will be using Civo’s managed Kubernetes service for our cluster.

Civo’s cloud-native infrastructure services are powered by Kubernetes and use the lightweight Kubernetes distribution, K3s, for superfast launch times.

Prerequisites for Installing Prometheus

To get started, we will need the following:

  • A Civo account
  • The Civo command line (CLI) installed and configured
  • kubectl installed
  • Helm installed

How to install Prometheus on Kubernetes?

After setting up the Civo command line with our API key using the instructions in the repository, we can create our cluster using the following command:

civo kubernetes create civo-cluster

and our cluster, named ‘civo-cluster’, is created.

Civo cluster dashboard showing a running cluster

Kube-prometheus-stack provides an easy way to install Prometheus on Kubernetes using Helm. It is a Helm chart bundling a collection of manifests that, when applied to the cluster, create an end-to-end monitoring stack.

The kube-prometheus-stack chart creates the following resources when applied to a cluster:

  • The Prometheus Operator
  • Alertmanager
  • Prometheus node exporter
  • Prometheus adapter for Kubernetes Metrics API
  • Kube State Metrics
  • Grafana

To install kube-prometheus-stack, we begin by adding the repository:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

Then we update our repositories:

helm repo update

Finally, we install our chart using this command:

helm install prom-stack prometheus-community/kube-prometheus-stack

Where ‘prom-stack’ is our release name.

Terminal output showing the installation of Prometheus stack onto a cluster

By running the following command:

kubectl get pods

we can see all the components deployed with our chart.

Pods for Prometheus are running

Using port forwarding, we can access our Prometheus service locally with the following command:

kubectl port-forward prometheus-prom-stack-kube-prometheus-prometheus-0 9090:9090

Where ‘prometheus-prom-stack-kube-prometheus-prometheus-0’ is the name of our Prometheus pod, and ‘9090’ is the port number.

Now we can access the user interface at http://localhost:9090/

Prometheus dashboard view with query interface

Using the metrics explorer, we can view the list of available metrics automatically pulled by Prometheus.

There are ready-made metrics about the API server, pods, containers, nodes, deployments, Alertmanager, and even the Prometheus server itself.
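For example, entering a query like the one below into the expression bar returns the overall per-second rate of Kubernetes API server requests over the last five minutes (sum and rate are standard PromQL functions; apiserver_request_total is one of the metrics scraped from the API server):

```promql
sum(rate(apiserver_request_total[5m]))
```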

What is Grafana?

Grafana is an open-source visualization and analytics software. It allows us to pull metrics from various sources, run queries against them, and visualize them, making it easy to gain insight and make decisions about our services.

Using Grafana Dashboards, we can pull metrics from various data sources such as InfluxDB, MySQL, Datadog, etc. Most importantly, Grafana integrates seamlessly with Prometheus and is deployed as one of the components when using the Kube-Prometheus-stack to install our monitoring tools.

We can access our Grafana service locally by port-forwarding using the following command:

kubectl port-forward prom-stack-grafana-6c56cdfbfb-hnlhc 3000:3000

Where ‘prom-stack-grafana-6c56cdfbfb-hnlhc’ is our Grafana pod and ‘3000’ is the port number. Your Grafana pod will have a different name, which you can identify by running kubectl get pods -A and finding the pod whose name starts with prom-stack-grafana.

Now we can access our Grafana user interface at http://localhost:3000/

Grafana login screen

To log in to our service, we use ‘admin’ as the username, and we can get the default password from our Grafana secret using the following command:

kubectl get secret prom-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode

and we are in!

Grafana dashboard main page

And because we deployed our monitoring stack using Kube-Prometheus-Stack charts, our Prometheus server has already been added as a data source.

Grafana configuration showing a Prometheus datasource pre-configured

Wrapping up

By following this guide, we have understood the key features of Prometheus and how they aid Kubernetes monitoring.

We have also set up a complete monitoring stack for our Kubernetes cluster and observed Node metrics, Container Metrics, and Kubernetes State Metrics scraped by Prometheus.

Finally, we have shown how we can use Grafana to visualize metrics data.