1. The Problem: Kafka Tracing Invisible on WebFlux — Both Approaches Failed
When geospatial-service ran on WebFlux, Kafka producer/consumer spans were never visible in Zipkin. Both reactor-kafka and KafkaTemplate were tried — neither worked.
Date: 2026-03-02 | Branch: feature/geospatial-spring-mvc-migration
What was tried (git history)
| Commit | Approach | Result |
|---|---|---|
| 1d36a94 | Switch to reactor-kafka (ReactiveKafkaProducerTemplate) | Kafka spans not visible in Zipkin |
| b84d7c6 | Manually inject B3 headers into reactor-kafka ProducerRecord | Still not working properly |
| f111d3a | Give up on reactor-kafka, switch back to blocking KafkaTemplate (still on WebFlux) | Kafka spans still not visible — KafkaTemplate reads ThreadLocal, WebFlux uses Reactor Context |
| 821df88 | Wrap producer call in Mono.deferContextual() to bridge Reactor Context → ThreadLocal | Still not reliable |
| 28badbf | Migrate geospatial-service from WebFlux to Spring MVC | Kafka spans visible in Zipkin — tracing fixed |
Why neither approach worked on WebFlux
Attempt 1: reactor-kafka — A reactive Kafka library that should work with Reactor Context. But it was discontinued and its tracing instrumentation never properly integrated with Spring Boot’s Micrometer/Zipkin. Manual B3 header injection was a workaround, but still didn’t produce proper spans.
Attempt 2: KafkaTemplate — Spring Kafka’s standard blocking producer. It reads trace context from ThreadLocal. But on WebFlux, execution hops between event-loop threads — ThreadLocal is empty. Mono.deferContextual() was supposed to bridge Reactor Context to ThreadLocal, but the bridging was unreliable.
reactor-kafka → uses Reactor Context → but tracing instrumentation broken
KafkaTemplate → uses ThreadLocal → but WebFlux has no stable ThreadLocal
Both fail on WebFlux. The common root cause: WebFlux.
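The ThreadLocal half of the failure can be reproduced without Spring at all: a value set on the request thread is invisible once execution hops to a different thread, which is exactly what happens when WebFlux schedules work on an event-loop worker. A minimal stdlib sketch (the TRACE_ID field is an illustrative stand-in for the per-thread tracing context Micrometer maintains):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalHop {
    // Stand-in for the tracing context that instrumentation stores per thread.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    public static void main(String[] args) throws Exception {
        TRACE_ID.set("b3-abc123"); // set on the "request" thread

        // Simulate a WebFlux-style hop onto an event-loop worker thread.
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        String seenOnWorker = eventLoop.submit(TRACE_ID::get).get();
        eventLoop.shutdown();

        System.out.println("request thread sees: " + TRACE_ID.get()); // b3-abc123
        System.out.println("worker thread sees: " + seenOnWorker);    // null
    }
}
```

A producer instrumented to read that ThreadLocal on the worker thread finds nothing, so no span is created and nothing reaches Zipkin.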
The fix
Switch geospatial-service from WebFlux to Spring MVC. In Spring MVC, everything uses ThreadLocal — the trace context flows naturally from the HTTP request through KafkaTemplate into the Kafka message headers. No bridging, no workarounds.
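On a servlet stack the same experiment succeeds, because the filter chain, the controller, and the producer call all run on the one Tomcat request thread. A stdlib-only sketch of that same-thread flow (publish is a hypothetical stand-in for a KafkaTemplate send whose interceptor reads trace context from ThreadLocal):

```java
public class SameThreadTrace {
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Stand-in for an instrumented producer send: it reads the trace
    // context from ThreadLocal, as thread-based tracing does.
    static String publish(String payload) {
        return "headers{b3=" + TRACE_ID.get() + "} " + payload;
    }

    public static void main(String[] args) {
        // Spring MVC: filter, controller, and producer share one request thread.
        TRACE_ID.set("b3-abc123");
        try {
            System.out.println(publish("geo-event"));
        } finally {
            TRACE_ID.remove(); // servlet containers reuse request threads
        }
    }
}
```

Because the context is present when the producer runs, the trace ID lands in the outgoing message headers with no bridging code.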
2. The Solution: Blocking Kafka on Tomcat
Migrated geospatial-service from Spring WebFlux to Spring MVC. With Spring MVC, trace context lives in ThreadLocal natively — matching what KafkaTemplate expects.
Files Changed
- pom.xml — webflux → web, mongodb-reactive → mongodb, removed reactor-test
- GeospatialServiceConfig.java — @EnableWebFluxSecurity → @EnableWebSecurity
- AdventureTubeDataRepository.java — ReactiveMongoRepository → MongoRepository
- AdventureTubeDataService.java — Mono/Flux → Optional/List
- AdventureTubeDataController.java — Mono<ResponseEntity> → ResponseEntity
- Consumer.java — reactive .subscribe() chain → blocking try/catch
- All 4 test files migrated (WebFluxTest → WebMvcTest, StepVerifier → JUnit assertions)
3. Performance Comparison (POST /geo/save via Gateway)
| Span | Before (WebFlux) | After (Spring MVC) |
|---|---|---|
| Total Duration | 210ms | 128ms |
| Total Spans | 7 | 9 |
| Services Traced | 2 | 3 |
| gateway: http post | 208ms | 24ms |
| geo: http post /geo/save | 106ms | 6.5ms |
| security filterchain before | 4ms | 717μs |
| authorize | 480μs (exchange) | 180μs (request) |
| secured request | 101ms | 4.6ms |
| Kafka producer span | MISSING | 9.5ms |
| Kafka consumer span | MISSING | 100.7ms |
| security filterchain after | 213μs | 285μs |
Key Results
- Kafka tracing fixed — both producer and consumer spans now visible in Zipkin
- ~16x faster HTTP processing — geo endpoint dropped from 106ms to 6.5ms
- ~9x faster gateway routing — gateway span dropped from 208ms to 24ms; total request duration fell from 210ms to 128ms
- 3 services traced instead of 2 — Kafka now appears as a separate service
- Simpler code — no more Mono/Flux, reactive chains, or context propagation workarounds
- All 23 unit tests pass after migration
4. Why Blocking Kafka Doesn’t Damage the Reactive Pipeline
The write path is async by design — the HTTP response returns immediately after Kafka publish. The actual DB write happens later via Kafka consumer. Blocking geospatial-service doesn’t damage the reactive pipeline because the pipeline’s job is already done before the consumer runs.
The Data Flow: Write Path Is Async
The client does not wait for MongoDB to save. The flow is:
iOS → Auth-service → Geospatial-service → Kafka publish → 202 Accepted + trackingId
(instant return)
Kafka consumer → MongoDB save (happens later)
↓
Status: PENDING → COMPLETED
- Auth-service (Netty/reactive) forwards the request to geospatial-service
- Geospatial-service publishes to Kafka and returns 202 Accepted with a tracking ID immediately
- Auth-service returns the 202 to the client — reactive pipeline is done
- Kafka consumer picks up the message and writes to MongoDB — this is a background worker, not part of the HTTP request chain
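The fire-and-forget shape of the steps above can be sketched with a plain in-memory queue standing in for Kafka (handleSave, the topic queue, and the status map are illustrative stand-ins, not the service's real API):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncWritePath {
    static final BlockingQueue<String> topic = new LinkedBlockingQueue<>(); // stand-in for Kafka
    static final Map<String, String> status = new ConcurrentHashMap<>();    // PENDING -> COMPLETED

    // Stand-in for the controller: publish, then return the trackingId immediately.
    static String handleSave(String payload) {
        String trackingId = UUID.randomUUID().toString();
        status.put(trackingId, "PENDING");
        topic.add(trackingId + ":" + payload);
        return trackingId; // the HTTP 202 goes out here, before any DB write
    }

    public static void main(String[] args) throws Exception {
        String trackingId = handleSave("geo-point");
        System.out.println("status at response time: " + status.get(trackingId));

        // Background consumer: the only place the "MongoDB save" happens.
        Thread consumer = new Thread(() -> {
            try {
                String msg = topic.take();
                String id = msg.split(":")[0];
                // ... MongoDB save would happen here ...
                status.put(id, "COMPLETED");
            } catch (InterruptedException ignored) { }
        });
        consumer.start();
        consumer.join();
        System.out.println("status after consumer: " + status.get(trackingId));
    }
}
```

Blocking inside the consumer thread stalls nothing upstream: by the time it runs, the HTTP exchange is already over.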
SSE Covers the Status Update
The client needs to know when the Kafka consumer finishes. This is handled by SSE (Server-Sent Events), not polling:
POST: iOS → Auth → Geo → Kafka publish → 202 + trackingId (instant)
SSE: iOS ← Auth ← Geo ← Kafka consumer finishes → status: COMPLETED
The client opens an SSE stream with the tracking ID after receiving 202. When the consumer completes the MongoDB write, it pushes a status update through SSE back to the client.
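The push half can be sketched the same way: the client registers interest in its trackingId and the consumer completes it, which is roughly what the SSE stream does over HTTP. The CompletableFuture registry below is an illustration of the mechanism, not the real implementation:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class StatusPush {
    // trackingId -> the "SSE stream" waiting for a terminal status
    static final Map<String, CompletableFuture<String>> listeners = new ConcurrentHashMap<>();

    public static void main(String[] args) throws Exception {
        String trackingId = "track-42";

        // Client side: open the "stream" after receiving 202 + trackingId.
        CompletableFuture<String> stream = new CompletableFuture<>();
        listeners.put(trackingId, stream);

        // Consumer side: after the MongoDB write, push the terminal status.
        new Thread(() -> listeners.get(trackingId).complete("COMPLETED")).start();

        System.out.println("pushed status: " + stream.get(5, TimeUnit.SECONDS));
    }
}
```

The client never polls; it blocks (or subscribes) on its own channel and the consumer pushes exactly one terminal update.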
Why Geospatial-Service Is Blocking
- The HTTP response doesn’t wait for DB write — it returns 202 after Kafka publish, so there’s no long-running reactive chain to maintain
- The Kafka consumer is a background worker — it polls, deserializes, writes to MongoDB. This is inherently sequential blocking work
- SSE handles the async notification — the status transition (PENDING → COMPLETED) is pushed via SSE, not through the HTTP response
- Tracing requires ThreadLocal — KafkaTemplate needs ThreadLocal for Zipkin spans, which Spring MVC provides natively
Why Leaf Services Don’t Need WebClient
| Service | Makes outbound HTTP calls? | Needs WebClient? | Why |
|---|---|---|---|
| Auth-service | Yes → member, geospatial | Yes (Netty, must be reactive) | Event-loop — .block() freezes all requests |
| Web-service | Yes → geospatial | Yes (future SSE/streaming) | Needs reactive stream support for SSE proxy |
| Member-service | No | No | Leaf service — receives calls, writes to PostgreSQL |
| Geospatial-service | No | No | Leaf service — receives calls, writes to MongoDB via Kafka |
5. Summary
| Decision | Reason |
|---|---|
| Geospatial uses blocking Kafka | Fixes Zipkin tracing — KafkaTemplate reads ThreadLocal, not Reactor Context |
| Blocking Kafka doesn’t break the pipeline | Write path is async by design — HTTP returns 202 before DB write. Kafka consumer is a background worker. |
| SSE covers status updates | Client opens SSE stream after 202 — gets pushed PENDING → COMPLETED when consumer finishes |
| Leaf services don’t need WebClient | Member/Geo make no outbound HTTP calls — they only receive and process |
