Overview
Distributed tracing for AdventureTube microservices using Micrometer Tracing + Brave + Zipkin. Zero Java code changes — Spring Boot 3.4.0 auto-instruments everything.
Completed: 2026-02-26
Branch: feature/distributed-tracing
Commits: f2a4812 (M1-M3), 5194301 (M3 fix)
Zipkin UI: zipkin.adventuretube.net / http://192.168.1.199:9411
Problem Statement
Requests flow through multiple services with no correlation:
- No trace ID — impossible to follow a single request across Gateway → Auth → Member
- No span timing — can’t identify which service/call is the bottleneck
- No visualization — debugging production issues requires SSH into each service, grepping logs independently
- Circuit breaker + timeout errors are logged per-service with no way to connect them to the original request
Architecture
```
          Zipkin Server (:9411) — Pi1
               ┌───────────────┐
               │   Zipkin UI   │
               │ Trace Storage │
               └───────▲───────┘
                       │ HTTP POST /api/v2/spans
      ┌────────────────┼──────────────────┐
      │                │                  │
┌─────┴─────┐   ┌──────┴───────┐   ┌──────┴───────┐
│  Gateway  │──▶│ Auth Service │──▶│Member Service│
│  (:8030)  │   │   (:8010)    │   │   (:8070)    │
└───────────┘   └──────────────┘   └──────────────┘
   Span 1           Span 2             Span 3
   └──────────── Same Trace ID ─────────────┘
```
Each service reports spans to Zipkin independently. Brave auto-injects B3 propagation headers (X-B3-TraceId, X-B3-SpanId) so the trace ID flows across all services.
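To verify propagation end to end, a throwaway WebFlux filter can log the incoming B3 headers at each hop. This is a minimal sketch under assumed names (the filter is hypothetical, not part of the codebase) using standard Spring WebFlux APIs:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpHeaders;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import org.springframework.web.server.WebFilter;
import org.springframework.web.server.WebFilterChain;
import reactor.core.publisher.Mono;

// Hypothetical debug-only filter: logs the B3 headers Brave injected
// upstream, so you can confirm the same traceId arrives at every service.
@Component
public class B3HeaderLoggingFilter implements WebFilter {

    private static final Logger log = LoggerFactory.getLogger(B3HeaderLoggingFilter.class);

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        HttpHeaders headers = exchange.getRequest().getHeaders();
        log.debug("B3 in: traceId={} spanId={} parentSpanId={} sampled={}",
                headers.getFirst("X-B3-TraceId"),
                headers.getFirst("X-B3-SpanId"),
                headers.getFirst("X-B3-ParentSpanId"),
                headers.getFirst("X-B3-Sampled"));
        return chain.filter(exchange);
    }
}
```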
What Was Implemented
M1: Dependencies (Root pom.xml)
Two dependencies added to root pom.xml — all services inherit them automatically:
```xml
<!-- Micrometer Tracing Bridge for Brave -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>

<!-- Zipkin Reporter for Brave -->
<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency>
```
Versions managed by Spring Cloud 2024.0.0 BOM — no explicit version tags needed.
M2: Zipkin Server
Zipkin was already running on Pi1 (192.168.1.199:9411) inside the grafana-pro Docker stack. No new container needed.
Reverse proxy configured: zipkin.adventuretube.net
M3: Tracing Configuration
5 config-server YAML files + 5 local application.yml files updated:
```yaml
# Config-server YAML (each service)
logging:
  pattern:
    level: "%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]"

management:
  tracing:
    sampling:
      probability: 1.0
  zipkin:
    tracing:
      endpoint: ${ZIPKIN_URL}/api/v2/spans
```
Environment variable `ZIPKIN_URL=http://192.168.1.199:9411` added to:
- `env.mac` (local dev)
- `env.pi` / `env.pi2` (production — via Jenkins Credential Store)
Auto-Instrumented Components (Zero Code Changes)
| Component | Service | How |
|---|---|---|
| WebClient (ServiceClient) | Auth Service | Auto-instrumented via ObservationWebClientCustomizer |
| Spring Cloud Gateway | Gateway | Built-in Micrometer support for routed requests |
| WebFlux (server) | All services | Incoming requests create spans automatically |
| R2DBC | Member Service | ConnectionFactoryDecorator auto-configuration |
| Resilience4j | Auth Service | Circuit breaker events tagged with trace context |
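The WebClient row has one practical caveat worth showing: auto-instrumentation applies only to clients built from Spring's auto-configured WebClient.Builder, which Boot has already customized with the ObservationRegistry. A minimal sketch of the pattern, with hypothetical class and endpoint names (the real ServiceClient lives in auth-service):

```java
import org.springframework.stereotype.Component;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

// Sketch of the pattern that makes WebClient traceable: inject the
// auto-configured WebClient.Builder instead of calling WebClient.create(),
// so every outgoing request gets a client span plus B3 headers.
@Component
public class MemberServiceClient {

    private final WebClient webClient;

    public MemberServiceClient(WebClient.Builder builder) {
        this.webClient = builder.baseUrl("http://member-service").build(); // assumed service id
    }

    public Mono<String> getMember(String memberId) {
        return webClient.get()
                .uri("/members/{id}", memberId)
                .retrieve()
                .bodyToMono(String.class);
    }
}
```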
Deployment Notes
Important: Env files (env.pi, env.pi2) are sourced from Jenkins Credential Store, NOT from Git or Pi host filesystem.
Flow: Jenkinsfile-cloud → withCredentials() copies env files → Dockerfile.pi bakes them into the Docker image → DotenvEnvironmentPostProcessor reads them at runtime
To add/change env vars: Jenkins → Manage Jenkins → Credentials → find env.pi or env.pi2 → check “Replace” checkbox → upload new file → rebuild
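For reference, the runtime end of that flow follows Spring Boot's standard EnvironmentPostProcessor contract. Below is a minimal sketch of the dotenv pattern under assumed file location and parsing rules; the actual DotenvEnvironmentPostProcessor in the codebase may differ, and such a class must be registered under the org.springframework.boot.env.EnvironmentPostProcessor key in META-INF/spring.factories:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.env.EnvironmentPostProcessor;
import org.springframework.core.env.ConfigurableEnvironment;
import org.springframework.core.env.MapPropertySource;

// Assumed shape of the dotenv pattern: parse KEY=VALUE lines from the env
// file baked into the image and expose them as a property source, so
// placeholders like ${ZIPKIN_URL} resolve at startup.
public class DotenvEnvironmentPostProcessor implements EnvironmentPostProcessor {

    @Override
    public void postProcessEnvironment(ConfigurableEnvironment environment,
                                       SpringApplication application) {
        Path envFile = Path.of(".env"); // assumed path inside the image
        if (!Files.exists(envFile)) {
            return;
        }
        Map<String, Object> props = new HashMap<>();
        try {
            for (String line : Files.readAllLines(envFile)) {
                if (line.isBlank() || line.startsWith("#")) {
                    continue; // skip blanks and comments
                }
                int eq = line.indexOf('=');
                if (eq > 0) {
                    props.put(line.substring(0, eq).trim(),
                              line.substring(eq + 1).trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // addLast keeps real environment variables at higher precedence.
        environment.getPropertySources().addLast(new MapPropertySource("dotenv", props));
    }
}
```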
Infrastructure layout:
- Pi1 (192.168.1.199): Zipkin, Grafana, Prometheus, MongoDB, Kafka, PostgreSQL
- Pi2 (192.168.1.105): Eureka, Config Server, Gateway (Docker)
- Mac: Auth, Member, Web, Geospatial services (Java runtime, not Docker)
How to Use Zipkin
Opening Zipkin UI
URL: zipkin.adventuretube.net (or http://192.168.1.199:9411)
Click the red + button → select serviceName → pick a service → click RUN QUERY
Use Case 1: Debugging a Slow Request
Scenario: iOS app reports “login took 5 seconds”
- Open Zipkin → click `+` → `serviceName: auth-service`
- Add another filter: `minDuration: 2s`
- Click RUN QUERY → find the slow trace
- Click on it → see the span tree:
```
gateway-service: http post           2.509s  ████████████████████████
gateway-service: http post           2.359s  ██████████████████████
auth-service: /auth/token/refresh    2.063s  ████████████████████
auth-service: security filter          70ms  █
auth-service: secured request        1.953s  ███████████████████
```
- The bottleneck is in auth-service's `secured request` span (1.953s)
- If that span calls member-service, you'd see a child span showing the DB query time
Use Case 2: Debugging a Failed Request
Scenario: User gets a 500 error
- Check the error log for any service — find the traceId:
  `ERROR [auth-service, 69a01b9fa945ded7, 0c5478226103ee18] Failed to refresh token`
- Copy the traceId: `69a01b9fa945ded7`
- Paste into "Search by trace ID" (top right of Zipkin UI)
- See exactly which service and span failed — error tags show the exception details
Use Case 3: Correlating Logs Across Services
Scenario: Need to follow one request through all services
The log pattern now includes [service-name, traceId, spanId]:
```
[gateway-service, 69a01b9fa945ded7, 6d489585326b0020] Routing POST /auth/token/refresh
[auth-service,    69a01b9fa945ded7, 0c5478226103ee18] Processing token refresh
[member-service,  69a01b9fa945ded7, a1b2c3d4e5f60000] Finding member by ID
```
- Find the traceId from any service’s log
- Search all service logs: `grep "69a01b9fa945ded7"` across gateway, auth, member
- See the complete request journey with timestamps
Use Case 4: Service Dependency Map
- Open Zipkin → click Dependencies (top nav)
- See auto-generated dependency graph from actual traffic:
gateway-service → auth-service → member-service
- Useful for understanding which services depend on which
Use Case 5: Circuit Breaker Debugging
Scenario: Auth-service returns 503 SERVICE_CIRCUIT_OPEN
- Search Zipkin by `serviceName: auth-service` + tag `error`
- Find the trace → see the span where the circuit breaker tripped
- Look at preceding traces to see what failures caused the circuit to open
- Check member-service spans for the root cause (timeouts, DB errors)
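For context, the sketch below shows the kind of Resilience4j-guarded call such a trace maps to; all names here are hypothetical (the real wiring lives in auth-service). When the breaker is open, the fallback fires and the attempt appears as an error-tagged span:

```java
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.server.ResponseStatusException;
import reactor.core.publisher.Mono;

// Hypothetical sketch of a circuit-breaker-protected downstream call.
@Service
public class GuardedMemberClient {

    private final WebClient webClient;

    public GuardedMemberClient(WebClient.Builder builder) {
        this.webClient = builder.baseUrl("http://member-service").build();
    }

    @CircuitBreaker(name = "memberService", fallbackMethod = "fallback")
    public Mono<String> findMember(String id) {
        return webClient.get()
                .uri("/members/{id}", id)
                .retrieve()
                .bodyToMono(String.class);
    }

    // Invoked when the breaker is open (CallNotPermittedException) or the
    // call fails; surfaces the 503 the client sees.
    private Mono<String> fallback(String id, Throwable cause) {
        return Mono.error(new ResponseStatusException(
                HttpStatus.SERVICE_UNAVAILABLE, "SERVICE_CIRCUIT_OPEN", cause));
    }
}
```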
Quick Reference
| I want to… | Do this in Zipkin |
|---|---|
| Find slow requests | + → serviceName → + → minDuration → RUN QUERY |
| Debug a specific error | Paste traceId in “Search by trace ID” (top right) |
| See service dependencies | Click “Dependencies” tab |
| Find error traces | + → serviceName → + → tag: error → RUN QUERY |
| See recent traffic | + → serviceName → RUN QUERY (default: last 15 min) |
Trace Context Propagation
```
 Gateway                           Auth Service                        Member Service
    │                                    │                                  │
    │ HTTP Request                       │                                  │
    │ Headers:                           │ WebClient Request                │
    │   B3: traceId=abc, spanId=001      │ Headers:                         │
    │───────────────────────────────────▶│   B3: traceId=abc, spanId=002    │
    │                                    │─────────────────────────────────▶│
    │                                    │                                  │
    │                                    │             Response             │
    │             Response               │◀─────────────────────────────────│
    │◀───────────────────────────────────│                                  │
```
Propagation format: B3 (Zipkin-native). Brave auto-injects X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId, X-B3-Sampled headers.
Technology Choice: Brave vs OpenTelemetry
Chosen: Brave (Zipkin-native)
| Factor | Brave | OpenTelemetry |
|---|---|---|
| Setup complexity | Simpler (2 deps) | More deps + exporters |
| Spring Boot 3.4 support | Mature, well-tested | Supported but newer |
| Zipkin integration | Native | Requires OTLP exporter |
| WebFlux auto-instrumentation | Built-in | Built-in |
Files Changed Summary
| File | Change |
|---|---|
| pom.xml (root) | Added micrometer-tracing-bridge-brave + zipkin-reporter-brave |
| 5 config-server YAMLs | Tracing config + log pattern with traceId/spanId |
| 5 local application.yml | Tracing fallback config (${ZIPKIN_URL}) |
| env.mac, env.pi, env.pi2 | Added ZIPKIN_URL=http://192.168.1.199:9411 |
Total Java code changes: 0 files. This is entirely a dependency + configuration change.
Future Considerations
- Production sampling: Reduce `management.tracing.sampling.probability` to `0.1` (10%) to limit overhead
- Persistent storage: Switch Zipkin from `STORAGE_TYPE=mem` to Elasticsearch for trace retention
- Kafka tracing: When Kafka integration is active, add `spring-kafka` observation support for trace propagation through Kafka headers
- Custom spans: Add manual spans for important business logic, e.g. JWT validation or Google OAuth verification (see the sketch below)
- Grafana integration: Zipkin traces can be queried from Grafana via the Zipkin datasource plugin
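For the custom-spans item, a minimal sketch using the Micrometer Tracing Tracer API (class, method, and span names are illustrative, not existing code):

```java
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import org.springframework.stereotype.Component;

// Illustrative manual span around a business-logic step such as JWT
// validation; the span nests under the current server span in Zipkin.
@Component
public class JwtValidator {

    private final Tracer tracer; // auto-configured by the Brave bridge

    public JwtValidator(Tracer tracer) {
        this.tracer = tracer;
    }

    public boolean validate(String token) {
        Span span = tracer.nextSpan().name("jwt-validation").start();
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            span.tag("jwt.length", String.valueOf(token.length()));
            return doValidate(token); // stand-in for the real check
        } catch (RuntimeException e) {
            span.error(e); // shows up as an error tag in Zipkin
            throw e;
        } finally {
            span.end();
        }
    }

    private boolean doValidate(String token) {
        return token != null && !token.isBlank();
    }
}
```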
