Advanced analysis of Kubernetes distributed tracing
A guide on integrating traces, logs, and metrics for Django apps using Grafana Tempo, OpenTelemetry, Loki, and Prometheus in Kubernetes for full observability.
Written by
Technical writer
⚠️ To follow along with this tutorial, you must have read part 1, which sets up end-to-end distributed tracing using Grafana Tempo and OpenTelemetry in a Kubernetes environment.
End-to-end distributed tracing involves more than traces alone; it also encompasses other important signals, namely metrics and logs. A conversation about tracing is therefore incomplete without addressing both.
While traces provide information about the request flow and performance of individual services in your application, logs and metrics offer additional layers of observability. Logs give detailed, text-based records of events within your application, and metrics provide quantitative data on the performance and health of your system. Together, they offer a more complete picture of your application’s state.
In the first part of this series, we successfully set up end-to-end distributed tracing using Grafana Tempo and OpenTelemetry in a Kubernetes environment. We used a pre-instrumented Django application to send traces to Grafana Tempo through an OpenTelemetry collector. This setup used a Civo Object Store, and the trace data was visualized in Grafana.
Now, in the second part of this series, we will learn how to analyze these traces. Through this tutorial, we will examine spans to understand request flows and latency, how to identify issues or bottlenecks using metadata, and how to integrate Grafana Loki and Prometheus as additional data sources in Grafana for a complete analysis of logs related to the traces and metrics for performance.
Analyzing trace data
At the end of the previous tutorial, we could view traces in Grafana, as shown in the image below. The image shows that the trace GET /create took 5.98 milliseconds. Additionally, the details indicate that the request completed successfully with a 200 HTTP status code, signaling that the operation executed without errors.

In distributed tracing, a trace is a collection of spans, where each span represents a specific operation or segment of work done in the service. Spans within a trace can have parent-child relationships that show the flow and hierarchy of operations.
In this particular trace, we observe a breakdown of individual spans, including the note_create span within the django-notes-app service. This note_create span, which took 2.57 milliseconds, is a child span of the GET /create span.
As a child span, it represents a discrete operation, or a part of the processing that contributes to the overall response of the GET /create request. This hierarchical relationship between spans is crucial for understanding the flow of requests and identifying areas within a service that contribute to the total execution time.

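As a quick sanity check, the durations above let you compute how much of the overall request the child span accounts for:

```python
# Span durations from the trace discussed above (milliseconds).
total_ms = 5.98   # GET /create (parent span)
child_ms = 2.57   # note_create (child span)

share = child_ms / total_ms
print(f"note_create accounts for {share:.0%} of the total request time")
```

Any remaining time belongs to the parent span's own work (middleware, response rendering, and so on), which is exactly the kind of breakdown that helps pinpoint where latency comes from.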
For a more comprehensive analysis of the trace, you have the option to export the trace data. This can be done by clicking on the export icon highlighted in the image below 👇

The exported data provides detailed information about the trace, such as:
- Services involved, such as `django-notes-app`
- Span details, including trace ID, span ID, parent span ID, timestamps, and more
- Specific attributes of each span, like HTTP methods, URLs, status codes, and server names
This level of detail is beneficial for in-depth analysis, allowing you to thoroughly examine each aspect of the trace, from the high-level view of the request to the granular details of individual operations.
The exported trace is downloaded in JSON format; once opened, it looks something like this:
```json
{
  "batches": [
    // Batch for the trace 'GET /create'
    {
      "resource": {
        "attributes": [
          { "key": "service.name", "value": { "stringValue": "django-notes-app" } }
        ],
        "droppedAttributesCount": 0
      },
      "instrumentationLibrarySpans": [
        {
          "spans": [
            {
              "traceId": "a4fcabb761c0bcb79f49462d317cb769",
              "spanId": "d28cb2de926c9ee4",
              "parentSpanId": "0000000000000000" // Root span with no parent
              // ... additional span details ...
            }
          ],
          "instrumentationLibrary": {
            "name": "opentelemetry.instrumentation.wsgi", // Instrumentation library
            "version": "0.41b0"
          }
        }
      ]
    },
    // Batch for the span 'note_create'
    {
      "resource": {
        "attributes": [
          { "key": "service.name", "value": { "stringValue": "django-notes-app" } }
        ],
        "droppedAttributesCount": 0
      },
      "instrumentationLibrarySpans": [
        {
          "spans": [
            {
              "traceId": "a4fcabb761c0bcb79f49462d317cb769",
              "spanId": "29a715d4dba3c442",
              "parentSpanId": "d28cb2de926c9ee4" // Child of the 'GET /create' span
              // ... additional span details ...
            }
          ],
          "instrumentationLibrary": {
            "name": "notes_app.views", // Instrumentation library for the view
            "version": ""
          }
        }
      ]
    }
  ]
}
```
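Because the export is plain JSON, you can also analyze it programmatically. The sketch below rebuilds the parent-child hierarchy from a trimmed version of the export using only the standard library (the `name` values here are illustrative; the real export contains many more fields per span):

```python
import json

# A trimmed, illustrative version of the exported trace JSON.
exported = json.loads("""
{"batches": [
  {"instrumentationLibrarySpans": [{"spans": [
    {"traceId": "a4fcabb761c0bcb79f49462d317cb769",
     "spanId": "d28cb2de926c9ee4",
     "parentSpanId": "0000000000000000",
     "name": "GET /create"}]}]},
  {"instrumentationLibrarySpans": [{"spans": [
    {"traceId": "a4fcabb761c0bcb79f49462d317cb769",
     "spanId": "29a715d4dba3c442",
     "parentSpanId": "d28cb2de926c9ee4",
     "name": "note_create"}]}]}
]}
""")

# Index every span by its ID, then resolve parent IDs to rebuild the hierarchy.
spans = {}
for batch in exported["batches"]:
    for scope in batch["instrumentationLibrarySpans"]:
        for span in scope["spans"]:
            spans[span["spanId"]] = span

for span in spans.values():
    parent = spans.get(span["parentSpanId"])
    relation = f"child of {parent['name']}" if parent else "root span"
    print(f"{span['name']}: {relation}")
```

This kind of script scales to much larger traces, where clicking through spans one by one in the UI becomes impractical.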
Reconfiguring the Django application
So far, we can view requests as they flow through our application (traces), including timing data and interactions between the different components and services in our Django application. We can now integrate logs and metrics into this setup to enhance our observability capabilities. This addition will enable us to:
- Send logs to our OpenTelemetry collector so we can analyze log data alongside trace data.
- Send metrics to our OpenTelemetry collector so we can monitor key performance indicators for a more comprehensive understanding of our application’s behavior.
Step 1: Cloning the Django application
First, we need to configure our Django project to send logs and metrics to our OpenTelemetry collector in our Civo Kubernetes cluster.
Clone the following GitHub repository; the Django project has been configured to generate detailed logs using the OpenTelemetry Logging Instrumentation and a custom format that integrates trace and span IDs.
For metrics, it employs the OpenTelemetry Metrics API to track the number of requests it receives using a counter metric. This counter, named request_count, increments with each incoming request to the Django notes-app application, providing a straightforward yet effective way to monitor traffic load. The count data is then exported through an OpenTelemetry exporter to establish a robust framework for logging and performance monitoring of the Django application.
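Conceptually, the request counter works like the sketch below. This is a stdlib-only stand-in: the real project uses the OpenTelemetry Metrics API (where the increment would be a call like `counter.add(1)` on a `request_count` instrument), and the middleware class name here is hypothetical:

```python
class RequestCounterMiddleware:
    """Counts every incoming request, mirroring the Django middleware shape."""

    def __init__(self, get_response):
        self.get_response = get_response
        self.request_count = 0  # stands in for an OpenTelemetry Counter instrument

    def __call__(self, request):
        self.request_count += 1  # the real instrumentation increments the OTel counter here
        return self.get_response(request)

# Simulate three requests passing through the middleware.
middleware = RequestCounterMiddleware(lambda request: "200 OK")
for _ in range(3):
    middleware("/create")
print(middleware.request_count)  # → 3
```

The key property is that the counter only ever goes up while the process is alive, which is what makes it cheap to record and easy to aggregate later.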
Step 2: Dockerizing and deploying the Django application to DockerHub
Once cloned, create a DockerHub repository, then build the Docker image and push it to the new repository using the following commands:
```shell
docker build -t <your-dockerhub-username>/<repository-name>:latest .
docker push <your-dockerhub-username>/<repository-name>:latest
```
Step 3: Updating the Django application deployment
Now that we have dockerized the Django project and pushed it to DockerHub, let's update our deployment.
To begin, update the previous deployment’s image to point to the new Docker image using the following command:

```shell
kubectl set image deployment/django-deployment django-app=<your-dockerhub-username>/<name-of-your-image>:latest
```

This will update the existing Kubernetes deployment with our new image. You should have the following output once the deployment has been updated:
deployment.apps/django-deployment image updated
Confirm that the Django application is running using the following command:
kubectl get pods
Once it is running, you should have the following output:
```
NAME                                 READY   STATUS    RESTARTS   AGE
django-deployment-6c4c7d4bcf-lwx8v   1/1     Running   0          65s
```
Installing Grafana Loki
Having successfully configured our Django project to generate logs and metrics in addition to traces, our next step is to set up the infrastructure required for visualizing and analyzing this data.
We've already established a pipeline for forwarding traces from our OpenTelemetry collector to Grafana Tempo, which are then visualized in Grafana. Now, we'll extend this capability to include logs and metrics.
To achieve this, we'll first install Loki for log aggregation and Prometheus for metrics collection. These tools will serve as the foundational elements for our observability stack, allowing us to gain deeper insights into our application's performance and behavior.
Step 1: Configuring Loki Stack
When installed via Helm, the Loki Stack chart ships a comprehensive stack that includes not only Loki but also Prometheus and Grafana, providing an integrated solution for log aggregation, metrics collection, and data visualization.
However, for more granular control over these components, we will install them separately. Since we already have Grafana installed, we won't need to install it again.
Begin by creating a file named loki-values.yaml. This file will host our custom configurations for the Loki stack installation.
Use a text editor to create this file and insert the following settings:
```yaml
loki:
  enabled: true
prometheus:
  enabled: false
grafana:
  enabled: false
```
These settings ensure that only Loki is enabled during the installation, while Prometheus and Grafana are not installed as part of this stack. This approach lets us maintain the existing Grafana setup and manage Prometheus separately.
Step 2: Installing Loki Stack
With the custom settings created in the previous step, we can now install the Loki Stack using Helm.
Execute the following command to install the Loki Stack with the custom values file:

```shell
helm install loki grafana/loki-stack -f loki-values.yaml
```
You should have the following output:
```
NAME: loki
LAST DEPLOYED: Thu Nov 30 05:49:24 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
The Loki stack has been deployed to your cluster. Loki can now be added as a data source in Grafana.
See http://docs.grafana.org/features/datasources/loki/ for more detail.
```
After running the Helm command, check your Kubernetes cluster to confirm that Loki is up and running:
```shell
kubectl get pods
kubectl get svc
```
You should have the following output:
```
# kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
...
loki-0                0/1     Running   0          20s
loki-promtail-tqghk   1/1     Running   0          20s
loki-promtail-5nsfv   1/1     Running   0          20s

# kubectl get svc
NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
loki-headless     ClusterIP   None           <none>        3100/TCP   25s
loki-memberlist   ClusterIP   None           <none>        7946/TCP   25s
loki              ClusterIP   10.43.241.73   <none>        3100/TCP   25s
```
Installing Prometheus
With Loki configured and installed in our cluster, next we’ll configure and install Prometheus. To achieve this, we will use the kube-prometheus-stack Helm chart.
Step 1: Configuring Prometheus
Before installing Prometheus, we need to create a job configuration that will allow Prometheus to scrape metrics from specific targets.
Create a file named prometheus-values.yaml and paste in the following configuration:
```yaml
global:
  scrape_interval: '5s'
  scrape_timeout: '10s'
prometheus:
  prometheusSpec:
    additionalScrapeConfigs: |
      - job_name: otel-collector
        static_configs:
          - targets:
              - opentelemetry-collector:8889
grafana:
  enabled: false
```
This configuration does the following:
- Sets the global scrape interval to every `5` seconds and the scrape timeout to `10` seconds. This defines how frequently Prometheus will collect metrics and the maximum time allowed for a scrape request.
- Adds a new scrape job named `otel-collector`. This job is configured to scrape metrics from the `opentelemetry-collector` service at port `8889`. We will configure our OpenTelemetry Collector to expose this port later.
- Sets `grafana.enabled` to `false`, indicating that we are not installing Grafana as part of this Prometheus setup, as it ships with the kube-prometheus-stack chart.
Step 2: Installing Prometheus
After configuring the scrape settings in prometheus-values.yaml, the next step is to install Prometheus in our Kubernetes cluster.
Begin by adding the Prometheus chart repository to your Helm setup. This ensures you have access to the latest Prometheus charts:
helm repo add prometheus-community https://prometheus-community.github.io/helm-chartshelm repo update
Now, install Prometheus with Helm using the custom configurations you've defined above:
helm install prometheus prometheus-community/kube-prometheus-stack -f prometheus-values.yaml
You should have the following output:

```
NAME: prometheus
LAST DEPLOYED: Thu Nov 30 06:42:52 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
...
```
After the installation process completes, you can verify if Prometheus is running correctly using the following commands:
```shell
kubectl get pods
kubectl get svc
```
You should see something similar to this:
```
# kubectl get pods
NAME                                                     READY   STATUS    RESTARTS   AGE
...
prometheus-prometheus-node-exporter-rblhc                0/1     Pending   0          2m10s
prometheus-prometheus-node-exporter-n7z8n                0/1     Pending   0          2m10s
prometheus-kube-prometheus-operator-7d89b9dd4d-h24fx     1/1     Running   0          2m10s
prometheus-kube-state-metrics-69bbfd8c89-xlnlk           1/1     Running   0          2m10s
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          2m7s
prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0          2m6s

# kubectl get svc
NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
...
prometheus-prometheus-node-exporter       ClusterIP   10.43.81.61     <none>        9100/TCP                     6m43s
prometheus-kube-prometheus-operator       ClusterIP   10.43.252.136   <none>        443/TCP                      6m43s
prometheus-kube-prometheus-prometheus     ClusterIP   10.43.64.194    <none>        9090/TCP,8080/TCP            6m43s
prometheus-kube-state-metrics             ClusterIP   10.43.60.6      <none>        8080/TCP                     6m43s
prometheus-kube-prometheus-alertmanager   ClusterIP   10.43.144.21    <none>        9093/TCP,8080/TCP            6m43s
alertmanager-operated                     ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   6m39s
prometheus-operated                       ClusterIP   None            <none>        9090/TCP                     6m38s
```
Step 3: Creating a Service Monitor
The objective is to enable Prometheus to scrape metrics from our OpenTelemetry collector instance, allowing us to view these metrics in Grafana. To achieve this, we need to create a Service Monitor, a Kubernetes resource used by Prometheus to specify how to discover and scrape metrics from a set of services.
Create a file called service-monitor.yaml and paste in the following configuration settings:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: opentelemetry-collector # Ensure this matches the labels of your OpenTelemetry Collector service
  endpoints:
    - port: metrics # The name of the port exposed by your OpenTelemetry Collector service
      interval: 5s
```
This configuration sets up a service monitor called otel-collector. Its release: prometheus label matches the name of our Prometheus Helm release, which is how the Prometheus Operator discovers it.
The service monitor selects the OpenTelemetry Collector service, which we have named opentelemetry-collector, and scrapes its metrics port every 5 seconds. This port is where our application's metrics will be available; we will set it up later.
Now run the following command to create the service monitor:
```shell
kubectl apply -f service-monitor.yaml
kubectl get servicemonitor
```
You should see the following outputs:
```
# kubectl apply -f service-monitor.yaml
servicemonitor.monitoring.coreos.com/otel-collector created

# kubectl get servicemonitor
NAME                                  AGE
prometheus-prometheus-node-exporter   12m
prometheus-kube-prometheus-operator   12m
...
otel-collector                        61s
```
Next, access the Prometheus UI on your local machine. This will allow us to confirm that it has picked up the otel-collector service monitor we just created. On your machine, run:
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090
Head over to your browser and visit localhost:9090:

Click on the Status dropdown, and select Service discovery.
You should see the otel-collector listed as shown below:

Updating the OpenTelemetry Collector
Now that Loki and Prometheus are configured and installed, we need to update our OpenTelemetry Collector configuration to forward logs to Loki and metrics to Prometheus.
Navigate to your OpenTelemetry Collector configuration file and add the necessary exporters for Loki and Prometheus:
```yaml
# collector.yaml
...
exporters:
  debug: {}
  otlp:
    endpoint: grafana-tempo:4317
    tls:
      insecure: true
  loki:
    # Loki exporter configuration
    endpoint: http://loki:3100/loki/api/v1/push
  prometheus:
    # Prometheus exporter configuration
    endpoint: 0.0.0.0:8889
service:
  pipelines:
    ...
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, prometheus]
    ...
```
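Note that the collector configuration above elides the logs pipeline behind `...`. If your collector.yaml from part 1 does not already define one, the logs pipeline that wires up the `loki` exporter would look roughly like this (a sketch; the receiver and processor names assume the `otlp` and `batch` defaults used throughout this series):

```yaml
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, loki]
```

Without a logs pipeline referencing the `loki` exporter, the collector will receive logs but never forward them to Loki.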
Now upgrade the OpenTelemetry collector chart using the following command:
helm upgrade opentelemetry-collector open-telemetry/opentelemetry-collector -f collector.yaml
Next, edit the OpenTelemetry Collector service. This step is necessary to add port 8889 to the list of ports exposed by the service so that Prometheus can access and scrape metrics from it.
kubectl edit service opentelemetry-collector
This will open up the service manifest in a Vim editor. Scroll down to the last option in the ports section of the service specification. Press i to enter insert mode and type in the following:
```yaml
- name: metrics
  port: 8889
  protocol: TCP
  targetPort: 8889
```

Once added, exit the insert mode by pressing the Esc key. Then, type :wq and press Enter to save the changes and exit the OpenTelemetry collector service manifest file.
You should have the following output:
service/opentelemetry-collector edited
Confirm that port 8889 is actually exposed by running kubectl get service. You should see 8889 listed among the exposed ports, like so:
```
opentelemetry-collector   ClusterIP   10.43.73.178   <none>   6831/UDP,14250/TCP,14268/TCP,4317/TCP,4318/TCP,9411/TCP,8889/TCP   108m
```
Head back to your Prometheus server UI and navigate to Status → Targets; you should see that the otel-collector service monitor is active and up as a target:

This confirms that Prometheus has been configured correctly to scrape metrics from our OpenTelemetry collector.
Viewing logs and metrics with Grafana
Up until now, we have successfully set up an infrastructure that sends logs and metrics to Loki and Prometheus. At this point, we are ready to view these components through Grafana.
Step 1: Adding Loki as a data source
To begin viewing logs in Grafana, you first need to add Loki as a data source.
Navigate to the settings icon on the left panel and select Home.

Click on Add your first data source, then search for and choose Loki from the list of available data sources.
In the Loki data source settings, enter the URL of your Loki service: http://loki:3100. This is usually of the form http://<loki-service-name>:3100.
Save and test the data source to ensure Grafana can connect to Loki.
Be sure to interact with your application so logs can be generated. If there are no logs available for Loki to pick up, the connection will not be successful.
Once connected, head over to Explore and select Loki as shown below 👇

Add a label filter with container set to django-app, and click on the Run query button:

You should see the following output:

This confirms that Loki is receiving logs. Based on how the Django application's logging instrumentation is configured, each log line related to the views shows the date and time it was generated, along with its trace ID and span ID.
By clicking on a log line, you can see its labels: the app label of the Django application (django-app), the container (django-app, just as specified in the deployment manifest for the Django application), the job, and the namespace in which the application is running.
Additionally, you will see the name of the node, indicating the specific server in the Kubernetes cluster where the pod is hosted, and the name of the pod, the smallest deployable unit in Kubernetes, which contains the Django application.

From here, you can download the logs in either .txt or .json format to get a complete view of what they contain:

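The trace-aware log format seen above can be reproduced with a small stdlib sketch. The filter below injects hard-coded placeholder IDs; in the real instrumentation these come from the active OpenTelemetry span context, and the exact format string in the repo may differ:

```python
import io
import logging

class TraceContextFilter(logging.Filter):
    """Attach trace context to every log record (placeholder IDs for illustration)."""
    def filter(self, record):
        record.trace_id = "a4fcabb761c0bcb79f49462d317cb769"
        record.span_id = "29a715d4dba3c442"
        return True

# Log to an in-memory stream so the formatted output is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s [trace_id=%(trace_id)s span_id=%(span_id)s] %(message)s"))

logger = logging.getLogger("notes_app.views")
logger.addHandler(handler)
logger.addFilter(TraceContextFilter())
logger.setLevel(logging.INFO)

logger.info("note created")
print(stream.getvalue())
```

Embedding the trace ID in every log line is what lets you jump from a log entry in Loki straight to the matching trace in Tempo.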
Step 2: Adding Prometheus as a data source
Just as we did for Loki, we need to add Prometheus as a data source so we can view metrics generated by the Django application.
Follow the same steps used in the previous step, entering the endpoint http://prometheus-kube-prometheus-prometheus:9090 in the Prometheus data source settings.
Once you have successfully added Prometheus (the Prometheus server) as a data source, head over to Explore and select Prometheus.
Before we begin to view metrics, there are some things you should take note of:
- The Django application was instrumented using a counter metric. A counter is a simple metric type in Prometheus that only increases and resets to zero on restart. In our case, we've used it to count the number of requests the Django application receives. This gives us a straightforward yet powerful insight into the application's traffic.
- Each request to the application increments the counter by one, regardless of the request type (GET, POST, etc.) or the endpoint accessed. This approach provides a high-level overview of the application's usage and can help identify trends in traffic, peak usage times, and potential bottlenecks.
- When viewing this metric in Prometheus or Grafana, you'll see a continuously increasing graph over time, representing the cumulative count of requests.
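The "only increases, resets on restart" behavior is what PromQL functions such as increase() and rate() are built around. The sketch below is a rough, simplified illustration of how per-interval increases are recovered from cumulative samples (real Prometheus also extrapolates over the query window):

```python
def per_interval_increases(samples):
    """Recover per-interval increases from cumulative counter samples.

    samples: list of (timestamp_seconds, cumulative_count) pairs.
    A drop in the cumulative value is treated as a counter reset (restart),
    so the new value itself is the increase since the reset.
    """
    increases = []
    for (_, prev), (ts, curr) in zip(samples, samples[1:]):
        increases.append((ts, curr - prev if curr >= prev else curr))
    return increases

# Cumulative request_count samples, with a pod restart between t=10 and t=15.
samples = [(0, 0), (5, 3), (10, 7), (15, 2)]
print(per_interval_increases(samples))  # → [(5, 3), (10, 4), (15, 2)]
```

This is why a restart of the Django pod does not corrupt the traffic picture: the cumulative graph drops to zero, but rate-style queries still report the true per-interval load.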
Add a label filter with exported_job set to django-notes-app, click on the metric dropdown, and select request_count_total as shown below:

Once you click on Run query you should see the following:

When you run the query, you'll see a graph showing how many requests have been made over time. You can also select individual requests for a detailed view. Each request on the list is color-coded, making it easy to match with its corresponding graph.
Select the first request from the graph section; the graph will focus on that specific request and stop at its total count, as shown below:

From the image above, the first request was selected, and the graph stopped at that request's total count, which is 4.
We have successfully generated metrics in our Django application, routed them to our OpenTelemetry collector, and configured Prometheus to scrape them. Additionally, we can now view these metrics in Grafana.
Troubleshooting
In any complex setup like this, you might encounter issues. Here are some common troubleshooting steps:
- Incorrect configurations are a common source of problems. Double-check your `collector.yaml`, service manifests, and any Helm values files you've used.
- Ensure Prometheus is correctly discovering and scraping targets. Access the Prometheus UI and check under Status → Service discovery or Status → Targets.
- Verify that the data sources in Grafana are correctly set up and can connect to Loki and Prometheus.
- If Prometheus isn't scraping metrics as expected, verify the configuration of your service monitor. Ensure the labels and selectors correctly match your OpenTelemetry Collector service. You can also use `kubectl describe servicemonitor otel-collector` to view detailed information about the service monitor.
Summary
Through this guide, we've taken a deep dive into setting up a comprehensive observability stack for a Django application pre-instrumented with OpenTelemetry running in Kubernetes. By integrating Grafana Tempo for distributed tracing, Loki for logs aggregation, and Prometheus for metrics collection, we have created a robust environment that tracks and visualizes aspects of our application's performance and health.
By completing this tutorial, you're well on your way to mastering Kubernetes-based application monitoring and troubleshooting. Keep experimenting and learning to harness the full potential of these powerful tools.
Further resources
If you want to learn more about this topic, here are some of my favorite resources:
- The OpenTelemetry Docs
- Prometheus Configuration Docs
- Loki-Stack Helm Chart Repository
- Henrik Rexed Navigate Europe 2023 talk on The Sound of Code: Instrument with OpenTelemetry

Technical writer
Mercy Bassey is a Cloud, Systems, and IT Support Specialist and technical writer with a focus on cloud infrastructure, DevOps practices, IT operations, and security. She specialises in translating complex technical concepts into clear, accessible documentation, with experience across tools and technologies including Linux, Kubernetes, Terraform, and scripting. She has contributed to Civo through the Write for Us programme and publishes additional technical content on Medium.