Distributed tracing is crucial for tracking requests as they travel through every part of your application. For developers and operators managing complex systems, especially those orchestrated with Kubernetes, it is an indispensable tool.

Configuring distributed tracing allows you to understand and diagnose every functional part of your application through external outputs (traces). This enhances your workflow by improving debugging and troubleshooting capabilities, enabling a faster understanding of component interactions and error pinpointing. Decision-making becomes more informed with clear insights into your application's performance, guiding optimizations and proactive management to prevent issues before they impact end-users, ultimately reducing application downtime.

This tutorial guides you through setting up end-to-end distributed tracing in Kubernetes using Grafana Tempo and Civo Object Store, demonstrated with a Django application instrumented with OpenTelemetry.

An Introduction to Distributed Tracing

What is Distributed Tracing?

Distributed tracing is an essential methodology in modern application development, particularly within Kubernetes-based microservice architectures. It helps to address several critical challenges, such as:

  • Handling Complexity in Microservices: Applications deployed in Kubernetes environments often adopt a microservices architecture. Distributed tracing, integrated with visualization tools, demystifies request flows, aiding in managing application complexity.
  • Optimizing Performance: Distributed tracing visualizes request pathways, aiding in identifying bottlenecks and understanding service call latencies. Developers gain insights to pinpoint delays, decipher causes, and implement targeted performance enhancements.
  • Service Interaction Analysis: Distributed tracing is crucial for deciphering service interactions within a system, allowing precise updates and debugging of underperforming services by understanding how they interact with each other.
  • Facilitating Error Diagnosis and Troubleshooting: Distributed tracing offers detailed insights into internal application mechanisms, streamlining error diagnosis and troubleshooting. Tracing issues to specific services or requests significantly reduces the time and effort needed for effective debugging.

Why Grafana Tempo and OpenTelemetry?

Employing Grafana Tempo and OpenTelemetry in Kubernetes environments isn’t a matter of convenience but a strategic choice grounded in technical superiority and adaptability. Here’s why:

  • High-Volume Trace Data Management: Grafana Tempo efficiently handles large trace data volumes, with an architecture optimized for high throughput and low-latency processing, making it ideal for data-intensive applications.
  • Unified Telemetry Framework: OpenTelemetry is a unified solution for telemetry data, integrating traces, metrics, and logs into a single framework. This simplifies observability infrastructure by eliminating the need for multiple instrumentation tools.
  • Seamless Data Processing Pipeline: Together, Grafana Tempo and OpenTelemetry form a seamless data pipeline. OpenTelemetry, with its optional Collector, not only gathers but also preprocesses telemetry data, efficiently feeding it into Grafana Tempo for tracing and storage.
  • Ecosystem Compatibility and Integration: Grafana Tempo and OpenTelemetry integrate smoothly into existing systems. Tempo complements Grafana, and OpenTelemetry is compatible with a wide range of languages and frameworks, making them versatile choices for diverse application stacks. Both are open source.

Prerequisites

To follow along in this tutorial, you should meet the following requirements:

  • A Civo Kubernetes cluster with the PostgreSQL marketplace application installed
  • A Civo Object Store, which will serve as Tempo's storage backend
  • kubectl installed and configured to access your cluster
  • Helm installed on your local machine
  • Docker installed locally and a Docker Hub account for pushing the application image

Please note that this tutorial uses Ubuntu 22.04 (Jammy Jellyfish) on the amd64 architecture.

Installing and configuring Grafana Tempo

Once you have successfully set up your Kubernetes environment, we will proceed to install and configure Grafana Tempo in our cluster. This way, we'll have a specific endpoint ready for the OpenTelemetry collector to send traces to.

Grafana Tempo is the tracing backend we will be using in this tutorial. It is built for handling large-scale distributed tracing with few external dependencies and supports multiple storage options. For the purpose of this tutorial, we will configure Grafana Tempo to use the Civo Object Store, which is S3-compatible, as its storage backend.

Step 1: Add the Grafana Helm Repository

After creating the Civo Object Store, we must add the Grafana Helm repository to our Helm setup. This repository contains the necessary charts to install Tempo and other Grafana tools.

Execute the commands below to add the Grafana Helm chart repository and then update your local Helm chart repository list to ensure you have the latest chart information:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

You should see the following output:

"grafana" has been added to your repositories

Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "grafana" chart repository
Update Complete. ⎈Happy Helming!⎈

Step 2: Configuring Tempo

With the repository added, we can now configure Grafana Tempo.

On your machine, create a file called tempo.yaml and add the following configuration settings:

# tempo.yaml
distributor:
  receivers:
    otlp:
      protocols:
        grpc:

ingester:
  trace_idle_period: 10s  
  max_block_bytes: 1_000_000  
  max_block_duration: 1m  

compactor:
  compaction:
    compaction_window: 1h             
    max_compaction_objects: 1000000   
    block_retention: 1h
    compacted_block_retention: 10m
    flush_size_bytes: 5242880 

storage:
  trace:
    backend: s3
    s3:
      access_key: your-civo-objectstore-access-key
      secret_key: your-civo-objectstore-secret-key
      endpoint: your-civo-objectstore-endpoint
      bucket: tempo # Replace this with the actual name of your Civo Object Store bucket
      insecure: true

Here’s what the configuration settings above are doing:

  • distributor: Manages the distribution of trace data across Tempo's services. It's essential for handling incoming data efficiently.
  • otlp: Configures the OpenTelemetry Protocol (OTLP) receiver under the distributor, which is crucial for Tempo to receive trace data from instrumented applications.
  • ingester: Processes incoming trace data and compiles it into blocks. Key settings like trace_idle_period and max_block_bytes control how data is aggregated and stored.
  • compactor: Improves storage efficiency by consolidating trace data blocks. Settings such as compaction_window and max_compaction_objects are important for optimizing data storage and retrieval.
  • storage: Defines where and how trace data is stored. The configuration specifies using S3-compatible storage, with key details like bucket and endpoint indicating where the data is stored.

In production environments, handle access credentials securely. When setting up S3-compatible storage backends, avoid hardcoding credentials such as access keys and secret keys in the configuration file. Instead, inject them at runtime through environment variables or store them in Kubernetes Secret objects, as sketched below.
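For illustration, a minimal sketch of such a Secret might look like the following (the Secret name and keys here are hypothetical; how you then wire the values into Tempo depends on the Helm chart version, so consult the Tempo chart documentation for the supported way to reference an existing Secret):

# tempo-credentials-secret.yaml (hypothetical example)
apiVersion: v1
kind: Secret
metadata:
  name: tempo-objectstore-credentials
type: Opaque
stringData:
  # Replace with your actual Civo Object Store credentials
  access_key: your-civo-objectstore-access-key
  secret_key: your-civo-objectstore-secret-key

Apply it with kubectl apply -f tempo-credentials-secret.yaml, and the credentials no longer need to live in plain text inside tempo.yaml.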

Step 3: Installing Grafana Tempo

After configuring Tempo, we can now go ahead to install it using the configuration file we created in the previous step:

helm install grafana-tempo grafana/tempo -f tempo.yaml

Once Tempo is installed, you should see output similar to the following:

NAME: grafana-tempo
LAST DEPLOYED: Sun Nov  5 13:29:53 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

To confirm that Grafana Tempo is up and running as a pod and exposed as a service, use the following kubectl commands:

kubectl get pods
kubectl get service

You should have the following output:

kubectl get pods
NAME              READY   STATUS    RESTARTS   AGE
grafana-tempo-0   1/1     Running   0          5m6s

kubectl get service
NAME            TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                                                                                                   AGE
kubernetes      ClusterIP   10.43.0.1     <none>        443/TCP                                                                                                   18d
grafana-tempo   ClusterIP   10.43.68.96   <none>        3100/TCP,6831/UDP,6832/UDP,14268/TCP,14250/TCP,9411/TCP,55680/TCP,55681/TCP,4317/TCP,4318/TCP,55678/TCP   5m14s

From the output above, the Grafana Tempo service exposes several ports. In this tutorial, we will focus on the following:

3100/TCP: This is the default port for Grafana Tempo's HTTP server. It serves Tempo's query API, which Grafana will use when we add Tempo as a data source later in this tutorial.

4317/TCP: This is the standard OTLP gRPC port. Tempo's OTLP receiver listens here for trace data sent over gRPC following the OpenTelemetry Protocol (OTLP), and it is where our OpenTelemetry Collector will forward traces.

4318/TCP: Similar to port 4317, this is the standard OTLP HTTP port. It accepts trace data sent over HTTP in OTLP format, providing an alternative to gRPC in environments where HTTP is preferred or required for submitting trace data.

Installing and Configuring OpenTelemetry Collector

To deploy the OpenTelemetry collector in our Kubernetes cluster, we will use a pre-made Helm chart provided by OpenTelemetry.

Step 1: Configuring the OpenTelemetry Collector

The OpenTelemetry collector needs to be configured to forward traces to Grafana Tempo. This involves setting up the otlp endpoint section in our configuration file to point to our Grafana Tempo instance.

Create a file called collector.yaml and paste in the following configuration settings:

mode: "deployment"

config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

  processors:
    batch:
      timeout: 10s
      send_batch_size: 1024

  exporters:
    debug:
      verbosity: detailed
    otlp:
      endpoint: grafana-tempo:4317
      tls:
        insecure: true

  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [debug, otlp]

resources:
  limits:
    cpu: 250m
    memory: 512Mi

Here's what the configuration above does:

  • mode: Sets the Collector's mode to "deployment" for scalability and centralized data collection. This is one of several available modes, alongside daemonset and statefulset, chosen depending on the use case.
  • receivers: Configures the OTLP receiver to listen for telemetry data on ports 4317 (gRPC) and 4318 (HTTP), enabling the collection of trace data over different protocols.
  • processors: Includes a batch processor to aggregate traces into batches, optimizing data processing with settings for timeout and batch size.
  • exporters: Defines the exporters used, including a debug exporter for logging and an OTLP exporter that forwards data to Grafana Tempo. The OTLP exporter disables TLS (insecure: true) for simplicity in a development setup.
  • service: Establishes a pipeline for trace data, specifying how data is received, processed, and exported. It uses the configured OTLP receiver, batch processor, and both debug and OTLP exporters.
  • resources: Sets resource limits for the Collector's CPU and memory usage in a Kubernetes environment, ensuring efficient resource utilization and preventing excessive consumption.

Note: For production environments, be sure to focus on efficient resource utilization and secure data transmission. Ensure the Collector is appropriately resourced in terms of CPU and memory, and use TLS encryption for secure communication with Grafana Tempo, as sketched below.
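As a rough sketch, enabling TLS on the Collector's OTLP exporter could look like the snippet below, assuming you have issued a certificate for Tempo and mounted the corresponding CA bundle into the Collector pod (the file path is hypothetical):

  exporters:
    otlp:
      endpoint: grafana-tempo:4317
      tls:
        insecure: false
        ca_file: /etc/otel/certs/ca.crt  # hypothetical path where the CA certificate is mounted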

Step 2: Installing the OpenTelemetry Collector

Once the OpenTelemetry Collector is configured, follow these steps to install it on your Kubernetes cluster.

Add the OpenTelemetry Helm repository using the following commands:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

Install the OpenTelemetry collector using the following command:

helm install opentelemetry-collector open-telemetry/opentelemetry-collector -f collector.yaml

You should see output similar to the following once it is installed:

NAME: opentelemetry-collector
LAST DEPLOYED: Sun Nov  5 13:33:30 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:

Confirm that the OpenTelemetry collector is running as a pod and as a service using the following commands:

kubectl get pods
kubectl get services
kubectl get pods
NAME                                       READY   STATUS    RESTARTS   AGE
...
opentelemetry-collector-65676955c7-pxx57   1/1     Running   0          25s


kubectl get services
NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                                                                   AGE
...
opentelemetry-collector   ClusterIP   10.43.185.170   <none>        6831/UDP,14250/TCP,14268/TCP,4317/TCP,4318/TCP,9411/TCP                                                   32s

Setting Up the Database

Now that we have Grafana Tempo and the OpenTelemetry Collector set up, it’s time to set up our Django application. Before we do that, however, we must first set up the Postgres database.

Step 1: Connecting to the Database

First, we need to connect to the Postgres database to create a role and a database.

Head over to your Kubernetes cluster from your Civo dashboard. Click on the Installed Apps tab, then click on Postgres.

Civo Connecting to the Database

Copy your Admin username (shown above) and run the following commands sequentially to connect to the PostgreSQL database:

kubectl exec -it <postgres-pod-name> -- bash
psql -U <username> -d postgres
Note: Replace <postgres-pod-name> with the actual name of the pod running your PostgreSQL database.

This will open an interactive bash shell within the PostgreSQL pod specified by <postgres-pod-name>. Once inside the pod, the second command, psql -U <username> -d postgres, connects to the PostgreSQL server using the specified username (<username>) and connects to the default "postgres" database (-d postgres).

kubectl exec -it postgresql-5546959f6d-b67fp -- bash
I have no name!@postgresql-5546959f6d-b67fp:/$

Step 2: Creating a User, Role and Database

After connecting to the Postgres database, we need to create a role and then create a database with that role as the owner. We will use these credentials in our Django application to connect to and interact with the Postgres database.

Create a new role using the following command:

CREATE USER django WITH PASSWORD '1234';

Next, create a new database with the role as owner with the following command:

CREATE DATABASE notes OWNER django;

At this point, we should have the following output:

kubectl exec -it postgresql-5546959f6d-b67fp -- bash
I have no name!@postgresql-5546959f6d-b67fp:/$ psql -U 4gwvo6gFD3 -d postgres
psql (11.5 (Debian 11.5-3.pgdg90+1))
Type "help" for help.

postgres=# CREATE USER django WITH PASSWORD '1234';
CREATE ROLE
postgres=# CREATE DATABASE notes OWNER django;
CREATE DATABASE

Now grant all privileges on the database to the new role so it has the necessary permissions to operate:

GRANT ALL PRIVILEGES ON DATABASE notes TO django;

Once you have executed the above commands and set up the database and role, you can exit the PostgreSQL shell by typing:

# Exit the PostgreSQL command-line interface.
\q 
# Exit the current shell session of the PostgreSQL container.
exit
...
postgres=# GRANT ALL PRIVILEGES ON DATABASE notes TO django;
GRANT
postgres=# \q
I have no name!@postgresql-5546959f6d-b67fp:/$ exit
exit

Setting up the Django Project

In this section, we will set up our Django project, which consists of a notes application named notes_app. This Django app is already instrumented with OpenTelemetry, and the entire project is configured to send trace data to the OpenTelemetry collector instance in our Kubernetes cluster over gRPC using OTLP (OpenTelemetry Protocol).

Step 1: Setting up the Django Project for Kubernetes

To begin, fork and clone this GitHub repository, which houses the Django project.

The project captures trace data through OpenTelemetry middleware wrapped around the WSGI application, recording data for incoming HTTP requests.

Additionally, a custom LoggingSpanExporter is used to log the success or failure of span exports in order to provide visibility into the trace export process. The trace data includes information such as the service name, which is set to django-notes-app, so that traces are correlated with the correct service in observability tools.
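The exporter endpoint itself is defined in the repository's instrumentation code. If you adapt the project, it is worth knowing that applications built on the OpenTelemetry SDK can also pick up the collector address and service name from the standard OpenTelemetry environment variables. A hypothetical sketch of supplying them through the Deployment we create later (not necessarily how this repository is wired) would be:

        env:
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://opentelemetry-collector:4317"  # the Collector service inside the cluster
        - name: OTEL_SERVICE_NAME
          value: "django-notes-app"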

Once you have forked and cloned the GitHub repository, open it up with your default code editor, create a .env file at the root of the project, and populate it with the following:

DB_NAME=notes 
DB_USER=django
DB_PASSWORD=1234
DB_HOST=postgresql  # The name of the Postgres service in the cluster
DB_PORT=5432

This will set up environment variables with credentials to access the Postgres database in the Kubernetes cluster.

Next, build the Docker image for the application and push it to your Docker Hub repository using the following commands:

docker build -t <your-dockerhub-username>/django-optl:latest .
docker push <your-dockerhub-username>/django-optl:latest

Step 2: Deploying the Django Project to Kubernetes

Exit the project directory, open your command prompt, create a file called django.yaml, and paste in the following configuration settings:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: django-deployment
  labels:
    app: django-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: django-app
  template:
    metadata:
      labels:
        app: django-app
    spec:
      containers:
      - name: django-app
        image: <your-dockerhub-username>/django-optl:latest
        ports:
        - containerPort: 8000
        env:
        - name: DB_NAME
          value: "notes"
        - name: DB_USER
          value: "django"
        - name: DB_PASSWORD
          value: "1234"
        - name: DB_HOST
          value: "postgresql" # The name of the Postgresql service
        - name: DB_PORT
          value: "5432"

The configuration settings above do the following:

  • Creates a deployment called django-deployment with one replica (pod).
  • Sets up the container within the pod, named django-app, which will run the Docker image <your-dockerhub-username>/django-optl:latest.
  • Exposes port 8000 on the container, which is the port the Django application will use to serve HTTP traffic.
  • Configures environment variables for the container to connect to a PostgreSQL database, including the database name (DB_NAME), user (DB_USER), password (DB_PASSWORD), host (DB_HOST), and port (DB_PORT). The DB_HOST is set to postgresql, which is the service name for our PostgreSQL deployment within our Kubernetes cluster. The password is hardcoded here for simplicity; see the sketch below for a more secure approach.
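As noted above, hardcoding DB_PASSWORD in the manifest is acceptable for this walkthrough but not for production. A hedged sketch of the more secure pattern, assuming you have created a Secret named django-db-credentials with a password key, would be to replace the plain value with a secretKeyRef:

        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: django-db-credentials  # hypothetical Secret, created separately
              key: password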

Apply these configuration settings in the cluster using the command:

kubectl apply -f django.yaml

Expose the deployment over a service using the following command:

kubectl expose deploy django-deployment --port 8000 
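If you prefer to keep everything declarative, the kubectl expose command above is roughly equivalent to applying a Service manifest like this sketch, which targets the app: django-app label from the Deployment:

apiVersion: v1
kind: Service
metadata:
  name: django-deployment
spec:
  type: ClusterIP
  selector:
    app: django-app
  ports:
  - port: 8000
    targetPort: 8000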

Now run the commands to view the pod and service associated with the Django project:

kubectl get pods
kubectl get services

At this point you should have the following output:

kubectl get pods
NAME                                       READY   STATUS    RESTARTS   AGE
...
django-deployment-6c4c7d4bcf-t4ccx         1/1     Running   0          45s

kubectl get services
NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                                                                   AGE
...
django-deployment         ClusterIP   10.43.209.170   <none>        8000/TCP                                                                                                  45s

Step 3: Applying Migrations

Now that our Django project is deployed into our cluster, the next thing to do is apply the database migrations to set up the necessary tables and relationships in the Postgres database. This can be done by executing the Django management commands within the context of the Kubernetes deployment.

First, we need to identify the pod where our Django application is running. Use the following command to get the list of running pods:

kubectl get pods

Look for the pod that has the name django-deployment followed by a unique identifier. Once you have identified the correct pod, execute the following command to create new migrations based on the models present in the Django notes_app application:

kubectl exec <django-deployment-unique-identifier> -- python manage.py makemigrations

You should have the following output:

Migrations for 'notes_app':
  notes_app/migrations/0001_initial.py
    - Create model Note

Now apply the migrations using the following command:

kubectl exec <django-deployment-unique-identifier> -- python manage.py migrate

You should have the following output:

Operations to perform:
  Apply all migrations: admin, auth, contenttypes, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying admin.0003_logentry_add_action_flag_choices... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  ...

Step 4: Viewing the Django Project over the web

At this point, we have successfully deployed our Django project and configured it to interact with our Postgres database. Now we need to access the project over the web so we can interact with it and generate traces, which will be sent to the OpenTelemetry Collector and then on to Grafana Tempo.

Execute the following command to expose the Django application service to your local environment for access via localhost:8000:

kubectl port-forward svc/django-deployment 8000

Now you can access the Django project by entering the following URL into your web browser:

localhost:8000

Go ahead and interact with the application by creating one or more notes. Once you have successfully added a note, you should see the following:

Viewing the Django Project over the web

Installing and setting up Grafana

Up until now, we have successfully deployed our Django project and have been able to interact with it. Now, it's time to visualize these traces with Grafana UI.

To install Grafana on our Kubernetes cluster, we will use Helm. Since we added the Grafana Helm repository earlier, update it and then install the Grafana Helm chart using the following commands:

helm repo update
helm install grafana grafana/grafana

Confirm that Grafana is up and running and is exposed as a service using the following commands:

kubectl get pods
kubectl get services

You should have the following output:

kubectl get pods
NAME                                       READY   STATUS    RESTARTS   AGE
...
grafana-6c9ff96d9d-jm6sc                   1/1     Running   0          71s

kubectl get services
NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                                                                   AGE
...
grafana                   ClusterIP   10.43.80.98     <none>        80/TCP                                                                                                    81s

Now execute the following command to retrieve your Grafana password:

kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

Once retrieved, execute the command below to expose the Grafana service to your local environment for access via localhost:3000:

kubectl port-forward svc/grafana 3000:80

Head over to localhost:3000 and log in to the Grafana UI with the username admin and the password you retrieved earlier. Once logged in successfully, you should see the following:

Installing and setting up Grafana, Civo

Viewing Traces in Grafana

Now that Grafana is up and running, it's time to explore the traces collected from our Django application. Follow these steps to visualize the telemetry data:

Step 1: Add Tempo as a Data Source

Before viewing traces, ensure that Grafana Tempo is configured as a data source:

Step 1: Select the “Data Sources” box from the Grafana dashboard.

Step 2: Search for Tempo and choose Tempo from the list of available data sources:

Search for Tempo and choose Tempo from the list of available data sources

Step 3: Enter the details for your Tempo instance, using grafana-tempo:3100 as the URL:

Enter the details for your Tempo instance

Step 4: Click 'Save & Test' to ensure Grafana can connect to Tempo. You should have this pop-up if the connection is successful:

Grafana connecting to Tempo
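Alternatively, instead of clicking through the UI, the Grafana Helm chart lets you provision data sources from its values file. A minimal sketch, assuming the chart's datasources value follows Grafana's standard provisioning format, might look like this (applied with helm upgrade grafana grafana/grafana -f grafana-values.yaml):

# grafana-values.yaml (sketch)
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Tempo
      type: tempo
      url: http://grafana-tempo:3100
      access: proxy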

Step 2: Explore Traces

To explore traces:

Step 1: Click on the toggle menu on the left panel to open the “Explore” section.

Step 2: Click on the “Explore” option, and select the Tempo data source you just added. You should see the traces:

Explore Traces

Step 3: You can search for traces by Trace ID, or you can use the built-in query features to filter and find traces. Like this:

Search for traces by Trace ID

Step 4: Select a trace to view detailed information, including spans and operations.

Viewing trace spans and operations

Once you have a trace open, you can examine the spans to understand the request flow and latency, and use the metadata provided to identify any issues or bottlenecks. If you have configured Loki or Prometheus as additional data sources in Grafana, you can also view logs and metrics related to your traces.

Troubleshooting

While following the steps of this tutorial, you may encounter various challenges. Below are some common issues along with their potential solutions to help you navigate and resolve these hurdles effectively:

  1. Grafana Tempo not receiving trace data: Check that the OpenTelemetry Collector's configuration points to the correct Tempo endpoint (grafana-tempo:4317 for gRPC or grafana-tempo:4318 for HTTP).
  2. OpenTelemetry Collector not receiving trace data: Verify that your Django application is correctly sending traces to your Collector's endpoint in Kubernetes by examining the application and Collector logs (kubectl logs <pod-name>) for any errors or misconfigurations in the trace export process.
  3. Grafana Tempo query errors: Ensure Grafana is operational and Tempo is correctly configured as a data source with the proper endpoint.
  4. Django not connecting to the Postgres database: Confirm that DB_NAME, DB_USER, DB_PASSWORD, DB_HOST, and DB_PORT are correctly set in your Django Kubernetes deployment and align with your Postgres database settings.

Summary

In this tutorial, you have learned how to configure end-to-end distributed tracing with OpenTelemetry, Grafana Tempo, and Grafana for visualization. Using a pre-instrumented Django application, we configured an OpenTelemetry Collector in our Civo Kubernetes cluster and set up Grafana Tempo to receive traces from it, using the Civo Object Store as the storage backend. We then deployed our Django project to generate traces and viewed them via the Grafana UI.

With the steps outlined in this tutorial, you now possess the capability to monitor and troubleshoot your applications in Kubernetes more effectively by leveraging the power of distributed tracing.

Further resources

If you want to continue learning about this topic, check out some of these resources: