Why Blocking Kafka (Not Reactive) — And Why It Doesn’t Damage the Reactive Pipeline

1. The Problem: Kafka Tracing Invisible on WebFlux — Both Approaches Failed

When geospatial-service ran on WebFlux, Kafka producer/consumer spans were never visible in Zipkin. Both reactor-kafka and KafkaTemplate were tried — neither worked.

Date: 2026-03-02 | Branch: feature/geospatial-spring-mvc-migration

What was tried (git history)

  • 1d36a94 — Switch to reactor-kafka (ReactiveKafkaProducerTemplate) → Kafka spans not visible in Zipkin
  • b84d7c6 — Manually inject B3 headers into the reactor-kafka ProducerRecord → still not working properly
  • f111d3a — Give up on reactor-kafka, switch back to blocking KafkaTemplate (still on WebFlux) → Kafka spans still not visible — KafkaTemplate reads ThreadLocal, WebFlux uses Reactor Context
  • 821df88 — Wrap producer call in Mono.deferContextual() to bridge Reactor Context → ThreadLocal → still not reliable
  • 28badbf — Migrate geospatial-service from WebFlux to Spring MVC → Kafka spans visible in Zipkin — tracing fixed

Why neither approach worked on WebFlux

Attempt 1: reactor-kafka — A reactive Kafka library that should work with Reactor Context. But it was discontinued and its tracing instrumentation never properly integrated with Spring Boot’s Micrometer/Zipkin. Manual B3 header injection was a workaround, but still didn’t produce proper spans.

Attempt 2: KafkaTemplate — Spring Kafka’s standard blocking producer. It reads trace context from ThreadLocal. But on WebFlux, execution hops between event-loop threads — ThreadLocal is empty. Mono.deferContextual() was supposed to bridge Reactor Context to ThreadLocal, but the bridging was unreliable.

reactor-kafka  → uses Reactor Context → but tracing instrumentation broken
KafkaTemplate  → uses ThreadLocal     → but WebFlux has no stable ThreadLocal

Both approaches fail for different immediate reasons, but the common root cause is the same: WebFlux's threading model leaves no stable ThreadLocal for the tracing instrumentation to read.
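The ThreadLocal half of the failure is easy to reproduce outside Spring. A minimal plain-Java sketch (class and field names here are illustrative, not from the codebase): a value stored in a ThreadLocal on one thread is simply invisible on any other thread, which is what happens when WebFlux hops a request across event-loop threads before KafkaTemplate reads the trace context.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalHopDemo {
    // Stand-in for the tracing library's ThreadLocal trace context.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Spring MVC style: the same thread that set the value reads it.
    static String readOnSameThread() {
        return TRACE_ID.get();
    }

    // WebFlux style: the read happens on a different (event-loop) thread.
    static String readOnOtherThread() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(TRACE_ID::get).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        TRACE_ID.set("trace-abc123");
        System.out.println("same thread : " + readOnSameThread());   // trace-abc123
        System.out.println("other thread: " + readOnOtherThread());  // null
    }
}
```

The second read returns null: a plain ThreadLocal is not inherited across threads, so any instrumentation that reads it on a different thread than the one that populated it sees nothing.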

The fix

Switch geospatial-service from WebFlux to Spring MVC. In Spring MVC, everything uses ThreadLocal — the trace context flows naturally from the HTTP request through KafkaTemplate into the Kafka message headers. No bridging, no workarounds.
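A sketch of what the producer path looks like after the migration, under assumed names (KafkaSender, GeoSaveService, and the geo-save topic are hypothetical stand-ins; the real code injects KafkaTemplate directly): on Spring MVC the controller thread carries the trace context, so a plain blocking send needs no bridging code at all.

```java
import java.util.UUID;

// Hypothetical seam standing in for KafkaTemplate::send, so the logic
// is visible without the Spring Kafka dependency.
interface KafkaSender {
    void send(String topic, String key, String value);
}

class GeoSaveService {
    private final KafkaSender sender;

    GeoSaveService(KafkaSender sender) {
        this.sender = sender;
    }

    // Publishes and returns a tracking id immediately; the controller maps
    // this to a 202 Accepted response. No Mono, no deferContextual, no
    // context propagation — the trace ThreadLocal is already on this thread.
    String save(String payload) {
        String trackingId = UUID.randomUUID().toString();
        sender.send("geo-save", trackingId, payload);
        return trackingId;
    }
}
```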


2. The Solution: Blocking Kafka on Tomcat

Migrated geospatial-service from Spring WebFlux to Spring MVC. With Spring MVC, trace context lives in ThreadLocal natively — matching what KafkaTemplate expects.

Files Changed

  • pom.xml — webflux → web, mongodb-reactive → mongodb, removed reactor-test
  • GeospatialServiceConfig.java — @EnableWebFluxSecurity → @EnableWebSecurity
  • AdventureTubeDataRepository.java — ReactiveMongoRepository → MongoRepository
  • AdventureTubeDataService.java — Mono/Flux → Optional/List
  • AdventureTubeDataController.java — Mono<ResponseEntity> → ResponseEntity
  • Consumer.java — reactive .subscribe() chain → blocking try/catch
  • All 4 test files migrated (@WebFluxTest → @WebMvcTest, StepVerifier → JUnit assertions)
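The shape of the Consumer.java change can be sketched without Spring Kafka (GeoRepository and the method names below are hypothetical stand-ins): the reactive subscribe chain becomes a straight-line blocking call wrapped in an ordinary try/catch.

```java
// Hypothetical stand-in for the MongoRepository the consumer writes through.
interface GeoRepository {
    void save(String document);
}

class GeoConsumer {
    private final GeoRepository repo;

    GeoConsumer(GeoRepository repo) {
        this.repo = repo;
    }

    // Before (reactive): repo.save(doc).subscribe(ok -> ..., err -> ...);
    // After (blocking): the listener thread calls save and handles errors inline.
    boolean onMessage(String document) {
        try {
            repo.save(document);      // blocking MongoDB write
            return true;              // success: the offset can be committed
        } catch (RuntimeException e) {
            // log and let the container's error handling / retry policy decide
            return false;
        }
    }
}
```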

3. Performance Comparison (POST /geo/save via Gateway)

Span                          Before (WebFlux)    After (Spring MVC)
Total Duration                210ms               128ms
Total Spans                   7                   9
Services Traced               2                   3
gateway: http post            208ms               24ms
geo: http post /geo/save      106ms               6.5ms
security filterchain before   4ms                 717μs
authorize                     480μs (exchange)    180μs (request)
secured request               101ms               4.6ms
Kafka producer span           MISSING             9.5ms
Kafka consumer span           MISSING             100.7ms
security filterchain after    213μs               285μs

Key Results

  1. Kafka tracing fixed — both producer and consumer spans now visible in Zipkin
  2. ~16x faster HTTP processing — the geo endpoint dropped from 106ms to 6.5ms
  3. ~9x faster gateway routing — the gateway span dropped from 208ms to 24ms; total trace duration fell from 210ms to 128ms
  4. 3 services traced instead of 2 — Kafka now appears as a separate service
  5. Simpler code — no more Mono/Flux, reactive chains, or context propagation workarounds
  6. All 23 unit tests pass after migration

4. Why Blocking Kafka Doesn’t Damage the Reactive Pipeline

The write path is async by design — the HTTP response returns immediately after Kafka publish. The actual DB write happens later via Kafka consumer. Blocking geospatial-service doesn’t damage the reactive pipeline because the pipeline’s job is already done before the consumer runs.

The Data Flow: Write Path Is Async

The client does not wait for MongoDB to save. The flow is:

iOS → Auth-service → Geospatial-service → Kafka publish → 202 Accepted + trackingId
                                                                (instant return)

                                           Kafka consumer → MongoDB save (happens later)
                                                  ↓
                                           Status: PENDING → COMPLETED
  1. Auth-service (Netty/reactive) forwards the request to geospatial-service
  2. Geospatial-service publishes to Kafka and returns 202 Accepted with a tracking ID immediately
  3. Auth-service returns the 202 to the client — reactive pipeline is done
  4. Kafka consumer picks up the message and writes to MongoDB — this is a background worker, not part of the HTTP request chain
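The decoupling in steps 1–4 can be simulated in plain Java (the queue, worker, and status map below are stand-ins for Kafka, the consumer, and the tracking store): the "HTTP" call returns a tracking id as soon as the message is enqueued, and a background step flips the status to COMPLETED later.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncWritePathDemo {
    // Stand-ins: queue = Kafka topic, status = tracking store, consumeOne = consumer.
    static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    static final Map<String, String> status = new ConcurrentHashMap<>();

    // "POST /geo/save": publish and return immediately with a tracking id (202).
    static String save(String payload) {
        String trackingId = UUID.randomUUID().toString();
        status.put(trackingId, "PENDING");
        queue.add(trackingId + "|" + payload);
        return trackingId;               // the client has this before any DB write
    }

    // Background consumer: takes one message, "writes" it, then flips the status.
    static void consumeOne() throws InterruptedException {
        String msg = queue.take();
        String trackingId = msg.substring(0, msg.indexOf('|'));
        // ... the blocking MongoDB save would happen here ...
        status.put(trackingId, "COMPLETED");
    }

    public static void main(String[] args) throws InterruptedException {
        String id = save("{\"lat\":51.5}");
        System.out.println(status.get(id));   // PENDING (response already returned)
        consumeOne();
        System.out.println(status.get(id));   // COMPLETED
    }
}
```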

SSE Covers the Status Update

The client needs to know when the Kafka consumer finishes. This is handled by SSE (Server-Sent Events), not polling:

POST:  iOS → Auth → Geo → Kafka publish → 202 + trackingId  (instant)
SSE:   iOS ← Auth ← Geo ← Kafka consumer finishes → status: COMPLETED

The client opens an SSE stream with the tracking ID after receiving 202. When the consumer completes the MongoDB write, it pushes a status update through SSE back to the client.
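The push mechanics can be sketched without Spring's SseEmitter (the registry below is a hypothetical stand-in for the SSE layer): the client registers a listener under its tracking id after the 202, and the consumer's completion handler pushes the new status through it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical stand-in for the SSE layer: one listener per tracking id.
class StatusPushRegistry {
    private final Map<String, Consumer<String>> listeners = new ConcurrentHashMap<>();

    // Client side: called right after receiving 202 + trackingId.
    void subscribe(String trackingId, Consumer<String> onStatus) {
        listeners.put(trackingId, onStatus);
    }

    // Consumer side: called when the MongoDB write finishes.
    void push(String trackingId, String newStatus) {
        Consumer<String> listener = listeners.remove(trackingId);
        if (listener != null) {
            listener.accept(newStatus);   // over SSE this would be an event frame
        }
    }
}
```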

Why Geospatial-Service Is Blocking

  1. The HTTP response doesn’t wait for DB write — it returns 202 after Kafka publish, so there’s no long-running reactive chain to maintain
  2. The Kafka consumer is a background worker — it polls, deserializes, writes to MongoDB. This is inherently sequential blocking work
  3. SSE handles the async notification — the status transition (PENDING → COMPLETED) is pushed via SSE, not through the HTTP response
  4. Tracing requires ThreadLocal — KafkaTemplate needs ThreadLocal for Zipkin spans, which Spring MVC provides natively

Why Leaf Services Don’t Need WebClient

Service              Outbound HTTP calls?        Needs WebClient?                Why
Auth-service         Yes → member, geospatial    Yes (Netty, must be reactive)   Event loop — .block() would freeze all requests
Web-service          Yes → geospatial            Yes (future SSE/streaming)      Needs reactive stream support for the SSE proxy
Member-service       No                          No                              Leaf service — receives calls, writes to PostgreSQL
Geospatial-service   No                          No                              Leaf service — receives calls, writes to MongoDB via Kafka

5. Summary

Decision                                    Reason
Geospatial uses blocking Kafka              Fixes Zipkin tracing — KafkaTemplate reads ThreadLocal, not Reactor Context
Blocking Kafka doesn't break the pipeline   Write path is async by design — HTTP returns 202 before the DB write; the Kafka consumer is a background worker
SSE covers status updates                   Client opens an SSE stream after the 202 and is pushed PENDING → COMPLETED when the consumer finishes
Leaf services don't need WebClient          Member/Geo make no outbound HTTP calls — they only receive and process
