1. The Problem: Kafka Tracing Invisible on WebFlux — Both Approaches Failed
When geospatial-service ran on WebFlux, Kafka producer/consumer spans were never visible in Zipkin. Both reactor-kafka and KafkaTemplate were tried — neither worked.
Date: 2026-03-02 | Branch: feature/geospatial-spring-mvc-migration
What was tried (git history)
| Commit | Approach | Result |
|---|---|---|
| 1d36a94 | Switch to reactor-kafka (ReactiveKafkaProducerTemplate) | Kafka spans not visible in Zipkin |
| b84d7c6 | Manually inject B3 headers into reactor-kafka ProducerRecord | Still not working properly |
| f111d3a | Give up on reactor-kafka, switch back to blocking KafkaTemplate (still on WebFlux) | Kafka spans still not visible — KafkaTemplate reads ThreadLocal, WebFlux uses Reactor Context |
| 821df88 | Wrap producer call in Mono.deferContextual() to bridge Reactor Context → ThreadLocal | Still not reliable |
| 28badbf | Migrate geospatial-service from WebFlux to Spring MVC | Kafka spans visible in Zipkin — tracing fixed |
Why neither approach worked on WebFlux
Attempt 1: reactor-kafka — A reactive Kafka library that should work with Reactor Context. But it was discontinued and its tracing instrumentation never properly integrated with Spring Boot’s Micrometer/Zipkin. Manual B3 header injection was a workaround, but still didn’t produce proper spans.
Attempt 2: KafkaTemplate — Spring Kafka’s standard blocking producer. It reads trace context from ThreadLocal. But on WebFlux, execution hops between event-loop threads — ThreadLocal is empty. Mono.deferContextual() was supposed to bridge Reactor Context to ThreadLocal, but the bridging was unreliable.
reactor-kafka → uses Reactor Context → but tracing instrumentation broken
KafkaTemplate → uses ThreadLocal → but WebFlux has no stable ThreadLocal
Both fail on WebFlux. The common root cause: WebFlux.
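The ThreadLocal half of the failure can be reproduced without Spring at all: a value set on the request thread is invisible once execution hops to a different thread, which is exactly what happens when WebFlux schedules work on an event-loop worker. A minimal stdlib sketch (the TRACE_ID field is an illustrative stand-in for the per-thread tracing context Micrometer maintains):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalHop {
    // Stand-in for the tracing context that instrumentation stores per thread.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    public static void main(String[] args) throws Exception {
        TRACE_ID.set("b3-abc123"); // set on the "request" thread

        // Simulate a WebFlux-style hop onto an event-loop worker thread.
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        String seenOnWorker = eventLoop.submit(TRACE_ID::get).get();
        eventLoop.shutdown();

        System.out.println("request thread sees: " + TRACE_ID.get()); // b3-abc123
        System.out.println("worker thread sees: " + seenOnWorker);    // null
    }
}
```

A producer instrumented to read that ThreadLocal on the worker thread finds nothing, so no span is created and nothing reaches Zipkin.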
The fix
Switch geospatial-service from WebFlux to Spring MVC. In Spring MVC, everything uses ThreadLocal — the trace context flows naturally from the HTTP request through KafkaTemplate into the Kafka message headers. No bridging, no workarounds.
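On a servlet stack the same experiment succeeds, because the filter chain, the controller, and the producer call all run on the one Tomcat request thread. A stdlib-only sketch of that same-thread flow (publish is a hypothetical stand-in for a KafkaTemplate send whose interceptor reads trace context from ThreadLocal):

```java
public class SameThreadTrace {
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Stand-in for an instrumented producer send: it reads the trace
    // context from ThreadLocal, as thread-based tracing does.
    static String publish(String payload) {
        return "headers{b3=" + TRACE_ID.get() + "} " + payload;
    }

    public static void main(String[] args) {
        // Spring MVC: filter, controller, and producer share one request thread.
        TRACE_ID.set("b3-abc123");
        try {
            System.out.println(publish("geo-event"));
        } finally {
            TRACE_ID.remove(); // servlet containers reuse request threads
        }
    }
}
```

Because the context is present when the producer runs, the trace ID lands in the outgoing message headers with no bridging code.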
2. The Solution: Blocking Kafka on Tomcat
Migrated geospatial-service from Spring WebFlux to Spring MVC. With Spring MVC, trace context lives in ThreadLocal natively — matching what KafkaTemplate expects.
Files Changed
- pom.xml — webflux → web, mongodb-reactive → mongodb, removed reactor-test
- GeospatialServiceConfig.java — @EnableWebFluxSecurity → @EnableWebSecurity
- AdventureTubeDataRepository.java — ReactiveMongoRepository → MongoRepository
- AdventureTubeDataService.java — Mono/Flux → Optional/List
- AdventureTubeDataController.java — Mono<ResponseEntity> → ResponseEntity
- Consumer.java — reactive .subscribe() chain → blocking try/catch
- All 4 test files migrated (WebFluxTest → WebMvcTest, StepVerifier → JUnit assertions)
3. Performance Comparison (POST /geo/save via Gateway)
| Span | Before (WebFlux) | After (Spring MVC) |
|---|---|---|
| Total Duration | 210ms | 128ms |
| Total Spans | 7 | 9 |
| Services Traced | 2 | 3 |
| gateway: http post | 208ms | 24ms |
| geo: http post /geo/save | 106ms | 6.5ms |
| security filterchain before | 4ms | 717μs |
| authorize | 480μs (exchange) | 180μs (request) |
| secured request | 101ms | 4.6ms |
| Kafka producer span | MISSING | 9.5ms |
| Kafka consumer span | MISSING | 100.7ms |
| security filterchain after | 213μs | 285μs |
Key Results
- Kafka tracing fixed — both producer and consumer spans now visible in Zipkin
- ~16x faster HTTP processing — geo endpoint dropped from 106ms to 6.5ms
- ~9x faster gateway routing — gateway span dropped from 208ms to 24ms; total request duration fell from 210ms to 128ms
- 3 services traced instead of 2 — Kafka now appears as a separate service
- Simpler code — no more Mono/Flux, reactive chains, or context propagation workarounds
- All 23 unit tests pass after migration
4. Why Blocking Kafka Doesn’t Damage the Reactive Pipeline
The write path is async by design — the HTTP response returns immediately after Kafka publish. The actual DB write happens later via Kafka consumer. Blocking geospatial-service doesn’t damage the reactive pipeline because the pipeline’s job is already done before the consumer runs.
The Data Flow: Write Path Is Async
The client does not wait for MongoDB to save. The flow is:
iOS → Auth-service → Geospatial-service → Kafka publish → 202 Accepted + trackingId
(instant return)
Kafka consumer → MongoDB save (happens later)
↓
Status: PENDING → COMPLETED
- Auth-service (Netty/reactive) forwards the request to geospatial-service
- Geospatial-service publishes to Kafka and returns 202 Accepted with a tracking ID immediately
- Auth-service returns the 202 to the client — reactive pipeline is done
- Kafka consumer picks up the message and writes to MongoDB — this is a background worker, not part of the HTTP request chain
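The fire-and-forget shape of the steps above can be sketched with a plain in-memory queue standing in for Kafka (handleSave, the topic queue, and the status map are illustrative stand-ins, not the service's real API):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncWritePath {
    static final BlockingQueue<String> topic = new LinkedBlockingQueue<>(); // stand-in for Kafka
    static final Map<String, String> status = new ConcurrentHashMap<>();    // PENDING -> COMPLETED

    // Stand-in for the controller: publish, then return the trackingId immediately.
    static String handleSave(String payload) {
        String trackingId = UUID.randomUUID().toString();
        status.put(trackingId, "PENDING");
        topic.add(trackingId + ":" + payload);
        return trackingId; // the HTTP 202 goes out here, before any DB write
    }

    public static void main(String[] args) throws Exception {
        String trackingId = handleSave("geo-point");
        System.out.println("status at response time: " + status.get(trackingId));

        // Background consumer: the only place the "MongoDB save" happens.
        Thread consumer = new Thread(() -> {
            try {
                String msg = topic.take();
                String id = msg.split(":")[0];
                // ... MongoDB save would happen here ...
                status.put(id, "COMPLETED");
            } catch (InterruptedException ignored) { }
        });
        consumer.start();
        consumer.join();
        System.out.println("status after consumer: " + status.get(trackingId));
    }
}
```

Blocking inside the consumer thread stalls nothing upstream: by the time it runs, the HTTP exchange is already over.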
SSE Covers the Status Update
The client needs to know when the Kafka consumer finishes. This is handled by SSE (Server-Sent Events), not polling:
POST: iOS → Auth → Geo → Kafka publish → 202 + trackingId (instant)
SSE: iOS ← Auth ← Geo ← Kafka consumer finishes → status: COMPLETED
The client opens an SSE stream with the tracking ID after receiving 202. When the consumer completes the MongoDB write, it pushes a status update through SSE back to the client.
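The push half can be sketched the same way: the client registers interest in its trackingId and the consumer completes it, which is roughly what the SSE stream does over HTTP. The CompletableFuture registry below is an illustration of the mechanism, not the real implementation:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class StatusPush {
    // trackingId -> the "SSE stream" waiting for a terminal status
    static final Map<String, CompletableFuture<String>> listeners = new ConcurrentHashMap<>();

    public static void main(String[] args) throws Exception {
        String trackingId = "track-42";

        // Client side: open the "stream" after receiving 202 + trackingId.
        CompletableFuture<String> stream = new CompletableFuture<>();
        listeners.put(trackingId, stream);

        // Consumer side: after the MongoDB write, push the terminal status.
        new Thread(() -> listeners.get(trackingId).complete("COMPLETED")).start();

        System.out.println("pushed status: " + stream.get(5, TimeUnit.SECONDS));
    }
}
```

The client never polls; it blocks (or subscribes) on its own channel and the consumer pushes exactly one terminal update.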
Why Geospatial-Service Is Blocking
- The HTTP response doesn’t wait for DB write — it returns 202 after Kafka publish, so there’s no long-running reactive chain to maintain
- The Kafka consumer is a background worker — it polls, deserializes, writes to MongoDB. This is inherently sequential blocking work
- SSE handles the async notification — the status transition (PENDING → COMPLETED) is pushed via SSE, not through the HTTP response
- Tracing requires ThreadLocal — KafkaTemplate needs ThreadLocal for Zipkin spans, which Spring MVC provides natively
Why Leaf Services Don’t Need WebClient
| Service | Makes outbound HTTP calls? | Needs WebClient? | Why |
|---|---|---|---|
| Auth-service | Yes → member, geospatial | Yes (Netty, must be reactive) | Event-loop — .block() freezes all requests |
| Web-service | Yes → geospatial | Yes (future SSE/streaming) | Needs reactive stream support for SSE proxy |
| Member-service | No | No | Leaf service — receives calls, writes to PostgreSQL |
| Geospatial-service | No | No | Leaf service — receives calls, writes to MongoDB via Kafka |
5. Summary
| Decision | Reason |
|---|---|
| Geospatial uses blocking Kafka | Fixes Zipkin tracing — KafkaTemplate reads ThreadLocal, not Reactor Context |
| Blocking Kafka doesn’t break the pipeline | Write path is async by design — HTTP returns 202 before DB write. Kafka consumer is a background worker. |
| SSE covers status updates | Client opens SSE stream after 202 — gets pushed PENDING → COMPLETED when consumer finishes |
| Leaf services don’t need WebClient | Member/Geo make no outbound HTTP calls — they only receive and process |
