Distributed Tracing (Micrometer + Zipkin)

Overview

Distributed tracing for AdventureTube microservices using Micrometer Tracing + Brave + Zipkin. Zero Java code changes — Spring Boot 3.4.0 auto-instruments everything.

Completed: 2026-02-26
Branch: feature/distributed-tracing
Commits: f2a4812 (M1-M3), 5194301 (M3 fix)
Zipkin UI: zipkin.adventuretube.net / http://192.168.1.199:9411


Problem Statement

Requests flow through multiple services with no correlation:

  • No trace ID — impossible to follow a single request across Gateway → Auth → Member
  • No span timing — can’t identify which service/call is the bottleneck
  • No visualization — debugging production issues requires SSHing into each host and grepping each service’s logs independently
  • Circuit breaker + timeout errors are logged per-service with no way to connect them to the original request

Architecture

                Zipkin Server (:9411) — Pi1
                ┌──────────────────┐
                │   Zipkin UI      │
                │   Trace Storage  │
                └────────▲─────────┘
                         │ HTTP POST /api/v2/spans
      ┌──────────────────┼──────────────────┐
      │                  │                  │
┌─────┴──────┐    ┌──────┴───────┐    ┌─────┴────────┐
│  Gateway   │───▶│ Auth Service │───▶│Member Service│
│  (:8030)   │    │  (:8010)     │    │  (:8070)     │
└────────────┘    └──────────────┘    └──────────────┘
   Span 1             Span 2              Span 3
      └─────────── Same Trace ID ───────────┘

Each service reports spans to Zipkin independently. Brave auto-injects B3 propagation headers (X-B3-TraceId, X-B3-SpanId) so the trace ID flows across all services.


What Was Implemented

M1: Dependencies (Root pom.xml)

Two dependencies added to root pom.xml — all services inherit them automatically:

<!-- Micrometer Tracing Bridge for Brave -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>

<!-- Zipkin Reporter for Brave -->
<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency>

Versions managed by Spring Cloud 2024.0.0 BOM — no explicit version tags needed.

M2: Zipkin Server

Zipkin was already running on Pi1 (192.168.1.199:9411) inside the grafana-pro Docker stack. No new container needed.

Reverse proxy configured: zipkin.adventuretube.net

M3: Tracing Configuration

5 config-server YAML files + 5 local application.yml files updated:

# Config-server YAML (each service)
logging:
  pattern:
    level: "%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]"

management:
  tracing:
    sampling:
      probability: 1.0
  zipkin:
    tracing:
      endpoint: ${ZIPKIN_URL}/api/v2/spans

Environment variable ZIPKIN_URL=http://192.168.1.199:9411 added to:

  • env.mac (local dev)
  • env.pi / env.pi2 (production — via Jenkins Credential Store)

Auto-Instrumented Components (Zero Code Changes)

Component                  Service          How
WebClient (ServiceClient)  Auth Service     Auto-instrumented via ObservationWebClientCustomizer
Spring Cloud Gateway       Gateway          Built-in Micrometer support for routed requests
WebFlux (server)           All services     Incoming requests create spans automatically
R2DBC                      Member Service   ConnectionFactoryDecorator auto-configuration
Resilience4j               Auth Service     Circuit breaker events tagged with trace context
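
As an illustration of the WebClient and Resilience4j rows above: outbound tracing applies because the client is built from the auto-configured WebClient.Builder bean, which Spring Boot decorates via ObservationWebClientCustomizer; a client created with WebClient.create() would bypass it. A minimal sketch, with illustrative class, bean, and URL names rather than the actual ServiceClient code:

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.github.resilience4j.reactor.circuitbreaker.operator.CircuitBreakerOperator;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

// Illustrative sketch, not the actual ServiceClient; names and URLs are assumptions.
@Service
public class MemberClient {

    private final WebClient webClient;
    private final CircuitBreaker circuitBreaker;

    // Injecting the auto-configured WebClient.Builder is what makes the
    // outbound call traced; WebClient.create() would skip the customizer.
    public MemberClient(WebClient.Builder builder, CircuitBreakerRegistry registry) {
        this.webClient = builder.baseUrl("http://member-service:8070").build();
        this.circuitBreaker = registry.circuitBreaker("member-service");
    }

    public Mono<String> findMember(String id) {
        return webClient.get()
                .uri("/members/{id}", id)   // creates a client span; B3 headers injected
                .retrieve()
                .bodyToMono(String.class)
                // Failures recorded here can open the circuit; the breaker events
                // run inside the same trace context, so they appear in the trace
                .transformDeferred(CircuitBreakerOperator.of(circuitBreaker));
    }
}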

Deployment Notes

Important: Env files (env.pi, env.pi2) are sourced from the Jenkins Credential Store, NOT from Git or the Pi host filesystem.

Flow: Jenkinsfile-cloudwithCredentials() copies env files → Dockerfile.pi bakes them into Docker image → DotenvEnvironmentPostProcessor reads at runtime
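
The DotenvEnvironmentPostProcessor itself isn’t reproduced here, but the general shape of such a post-processor looks roughly like this (a sketch; the file path, property-source name, and precedence are assumptions, not the project’s actual code):

import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.env.EnvironmentPostProcessor;
import org.springframework.core.env.ConfigurableEnvironment;
import org.springframework.core.env.PropertiesPropertySource;

// Sketch only; the real DotenvEnvironmentPostProcessor may differ.
// Registered via META-INF/spring.factories under
// org.springframework.boot.env.EnvironmentPostProcessor.
public class DotenvEnvironmentPostProcessor implements EnvironmentPostProcessor {

    @Override
    public void postProcessEnvironment(ConfigurableEnvironment environment,
                                       SpringApplication application) {
        Path envFile = Path.of(".env");   // assumed location baked in by Dockerfile.pi
        if (!Files.exists(envFile)) {
            return;                       // nothing to load if the file isn't present
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(envFile)) {
            props.load(in);               // KEY=VALUE lines, e.g. ZIPKIN_URL=...
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // addLast keeps real OS environment variables winning over the file
        environment.getPropertySources()
                   .addLast(new PropertiesPropertySource("dotenv", props));
    }
}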

To add/change env vars: Jenkins → Manage Jenkins → Credentials → find env.pi or env.pi2 → check “Replace” checkbox → upload new file → rebuild

Infrastructure layout:

  • Pi1 (192.168.1.199): Zipkin, Grafana, Prometheus, MongoDB, Kafka, PostgreSQL
  • Pi2 (192.168.1.105): Eureka, Config Server, Gateway (Docker)
  • Mac: Auth, Member, Web, Geospatial services (Java runtime, not Docker)

How to Use Zipkin

Opening Zipkin UI

URL: zipkin.adventuretube.net (or http://192.168.1.199:9411)

Click the red + button → select serviceName → pick a service → click RUN QUERY


Use Case 1: Debugging a Slow Request

Scenario: iOS app reports “login took 5 seconds”

  1. Open Zipkin → click + → serviceName: auth-service
  2. Add another filter: minDuration: 2s
  3. Click RUN QUERY → find the slow trace
  4. Click on it → see the span tree:
gateway-service: http post             2.509s  ████████████████████████
  gateway-service: http post           2.359s  ██████████████████████
    auth-service: /auth/token/refresh  2.063s  ████████████████████
      auth-service: security filter      70ms  █
      auth-service: secured request    1.953s  ███████████████████
  5. The bottleneck is in auth-service’s secured request span (1.953s)
  6. If that span calls member-service, you’d see a child span showing the DB query time

Use Case 2: Debugging a Failed Request

Scenario: User gets a 500 error

  1. Check the error log for any service — find the traceId:
ERROR [auth-service, 69a01b9fa945ded7, 0c5478226103ee18] Failed to refresh token
  2. Copy the traceId: 69a01b9fa945ded7
  3. Paste into “Search by trace ID” (top right of Zipkin UI)
  4. See exactly which service and span failed — error tags show the exception details

Use Case 3: Correlating Logs Across Services

Scenario: Need to follow one request through all services

The log pattern now includes [service-name, traceId, spanId]:

[gateway-service,  69a01b9fa945ded7, 6d489585326b0020] Routing POST /auth/token/refresh
[auth-service,     69a01b9fa945ded7, 0c5478226103ee18] Processing token refresh
[member-service,   69a01b9fa945ded7, a1b2c3d4e5f60000] Finding member by ID
  1. Find the traceId from any service’s log
  2. Search all service logs: grep "69a01b9fa945ded7" across gateway, auth, member
  3. See the complete request journey with timestamps

Use Case 4: Service Dependency Map

  1. Open Zipkin → click Dependencies (top nav)
  2. See auto-generated dependency graph from actual traffic:
gateway-service → auth-service → member-service
  3. Useful for understanding which services depend on which

Use Case 5: Circuit Breaker Debugging

Scenario: Auth-service returns 503 SERVICE_CIRCUIT_OPEN

  1. Search Zipkin by serviceName: auth-service + tag: error
  2. Find the trace → see the span where circuit breaker tripped
  3. Look at preceding traces to see what failures caused the circuit to open
  4. Check member-service spans for the root cause (timeouts, DB errors)

Quick Reference

I want to…                  Do this in Zipkin
Find slow requests          + → serviceName → + → minDuration → RUN QUERY
Debug a specific error      Paste traceId in “Search by trace ID” (top right)
See service dependencies    Click “Dependencies” tab
Find error traces           + → serviceName → + → tag: error → RUN QUERY
See recent traffic          + → serviceName → RUN QUERY (default: last 15 min)

Trace Context Propagation

Gateway                          Auth Service                     Member Service
   │                                  │                                │
   │  HTTP Request                    │                                │
   │  Headers:                        │  WebClient Request             │
   │  B3: traceId=abc, spanId=001     │  Headers:                      │
   │─────────────────────────────────▶│  B3: traceId=abc, spanId=002   │
   │                                  │───────────────────────────────▶│
   │                                  │                                │
   │                                  │  Response                      │
   │  Response                        │◀───────────────────────────────│
   │◀─────────────────────────────────│                                │

Propagation format: B3 (Zipkin-native). Brave auto-injects X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId, X-B3-Sampled headers.
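
Nothing in the services touches these headers directly, but if the current trace ID is ever needed in code (say, to echo it in an error response so a client report can be matched to a trace), Micrometer’s Tracer bean exposes it. A minimal sketch with an illustrative controller and endpoint path; it assumes the handler runs inside the request’s trace context, which Boot’s WebFlux instrumentation provides:

import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Sketch: a debug endpoint that echoes the active trace ID.
// Controller name and path are illustrative, not existing code.
@RestController
public class TraceDebugController {

    private final Tracer tracer;

    public TraceDebugController(Tracer tracer) {
        this.tracer = tracer;
    }

    @GetMapping("/debug/trace-id")
    public String currentTraceId() {
        Span span = tracer.currentSpan();
        // Null when the request wasn't sampled or no span is active
        return span != null ? span.context().traceId() : "no-active-trace";
    }
}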


Technology Choice: Brave vs OpenTelemetry

Chosen: Brave (Zipkin-native)

Factor                         Brave                  OpenTelemetry
Setup complexity               Simpler (2 deps)       More deps + exporters
Spring Boot 3.4 support        Mature, well-tested    Supported but newer
Zipkin integration             Native                 Requires OTLP exporter
WebFlux auto-instrumentation   Built-in               Built-in

Files Changed Summary

File                        Change
pom.xml (root)              Added micrometer-tracing-bridge-brave + zipkin-reporter-brave
5 config-server YAMLs       Tracing config + log pattern with traceId/spanId
5 local application.yml     Tracing fallback config (${ZIPKIN_URL})
env.mac, env.pi, env.pi2    Added ZIPKIN_URL=http://192.168.1.199:9411

Total Java code changes: 0 files. This is entirely a dependency + configuration change.


Future Considerations

  • Production sampling: Reduce management.tracing.sampling.probability to 0.1 (10%) to limit overhead
  • Persistent storage: Switch Zipkin from STORAGE_TYPE=mem to Elasticsearch for trace retention
  • Kafka tracing: When Kafka integration is active, add spring-kafka observation for trace propagation through Kafka headers
  • Custom spans: Add manual spans for important business logic (e.g., JWT validation, Google OAuth verification); see the sketch after this list
  • Grafana integration: Zipkin traces can be queried from Grafana via Zipkin datasource plugin
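
For the custom-spans item, a minimal sketch using the Micrometer Tracing API (class, method, and span names are illustrative, not existing AdventureTube code):

import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import org.springframework.stereotype.Component;

// Sketch of a manual span around business logic; names are illustrative.
@Component
public class JwtValidator {

    private final Tracer tracer;

    public JwtValidator(Tracer tracer) {
        this.tracer = tracer;
    }

    public boolean validate(String token) {
        Span span = tracer.nextSpan().name("jwt-validation").start();
        // The scope makes the span current, so nested work becomes child spans
        try (Tracer.SpanInScope ignored = tracer.withSpan(span)) {
            boolean valid = verifySignatureAndClaims(token); // assumed helper
            span.tag("jwt.valid", String.valueOf(valid));
            return valid;
        } catch (RuntimeException e) {
            span.error(e);   // shows up as an error tag in Zipkin
            throw e;
        } finally {
            span.end();
        }
    }

    private boolean verifySignatureAndClaims(String token) {
        // Placeholder for the real validation logic
        return token != null && !token.isBlank();
    }
}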
