1. Why Mixed Reactive / Non-Reactive Architecture
The project originally attempted a fully reactive stack across all services, but the member, geospatial, and web services were reverted to a blocking (non-reactive) model. Only auth-service remains reactive.
Reasons for reverting
1. Debugging
Reactive code is difficult to debug with IntelliJ breakpoints. The event-loop model means execution doesn’t follow a linear call stack, making step-through debugging impractical for business logic.
2. Kafka distributed tracing
The reactive Kafka library (reactor-kafka) was discontinued, making Zipkin tracing for Kafka producer/consumer spans infeasible in a reactive context. Despite extensive attempts, reactive context propagation through Kafka could not be resolved; reverting to blocking made Kafka tracing work properly.
3. Future streaming/WebSocket needs
WebClient and reactive infrastructure are kept in the codebase (via ServiceClient in common-api) because future features like streaming or WebSocket services will benefit from reactive support.
4. WebClient .block() on Tomcat is safe with Java 21 virtual threads
The blocking services still use WebClient (not RestTemplate) for inter-service calls, calling .block() to get synchronous results. This is normally an anti-pattern because .block() holds a platform thread. However, with Java 21 virtual threads enabled (spring.threads.virtual.enabled=true), the virtual thread yields its carrier thread on .block() and releases it almost instantly — eliminating the concurrency problem.
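The cheapness of blocking on virtual threads can be checked with plain Java 21, no Spring required. This sketch uses `Thread.sleep` as a stand-in for a `WebClient` `.block()` call; the class and method names are illustrative only:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadBlockDemo {
    // Runs `tasks` concurrent blocking calls (simulated with sleep)
    // on virtual threads and returns the total elapsed milliseconds.
    static long runBlocking(int tasks, long sleepMillis) throws Exception {
        Instant start = Instant.now();
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                pool.submit(() -> {
                    try {
                        // Blocking call: the virtual thread parks and
                        // releases its carrier (OS) thread while waiting.
                        Thread.sleep(sleepMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() waits for all submitted tasks to finish
        return Duration.between(start, Instant.now()).toMillis();
    }

    public static void main(String[] args) throws Exception {
        // 1,000 concurrent 100 ms blocking calls overlap almost completely,
        // finishing far faster than the sequential 100 seconds would take.
        long elapsed = runBlocking(1_000, 100);
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

If the same loop ran on a small platform-thread pool, each blocked thread would pin an OS thread and total time would grow linearly with the task count.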
Service threading summary
| Service | Server | Model | Reason |
|---|---|---|---|
| Auth | Netty | Reactive | Stateless proxy — no DB, just forwards to member/geospatial service |
| Member | Tomcat | Blocking | JPA/JDBC is inherently blocking |
| Geospatial | Tomcat | Blocking | Spring Data MongoDB blocking driver |
| Web | Tomcat | Blocking | Calls geo-service via .block() |
Service routing
Writes (create/update/delete): iOS → Gateway (/auth/geo/**) → Auth Service → Geospatial Service
Reads: iOS → Gateway (/web/geo/**) → Web Service → Geospatial Service
Internal only: /geo/** not exposed through gateway
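The routing split above might look roughly like this in Spring Cloud Gateway configuration. This is a hedged sketch: the route ids and `lb://` service URIs are assumptions, not taken from the project.

```yaml
spring:
  cloud:
    gateway:
      routes:
        - id: auth-geo-writes          # writes go through auth-service
          uri: lb://auth-service       # assumed service id
          predicates:
            - Path=/auth/geo/**
        - id: web-geo-reads            # reads go through web-service
          uri: lb://web-service        # assumed service id
          predicates:
            - Path=/web/geo/**
        # Deliberately no route for /geo/** — geospatial-service stays internal
```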
2. Parameter Type vs Return Type — Why It Matters
On Netty (Auth Service)
Rule: On Netty, parameter type doesn’t matter but return type must be reactive (Mono/Flux).
Parameter: T vs Mono<T> — doesn’t matter
```java
// Both are non-blocking on Netty:
public Mono<ResponseEntity<?>> save(@RequestBody JsonNode body)        // OK
public Mono<ResponseEntity<?>> save(@RequestBody Mono<JsonNode> body)  // OK (but a pointless extra .flatMap())
```
Even with plain T, Spring WebFlux reads the request body non-blocking internally using the event-loop. It resolves the body first, then calls your method. No OS thread is held waiting for bytes — Netty registers interest with the OS and moves on until bytes arrive.
Mono<T> parameter gives theoretical control over when to trigger deserialization (backpressure), but in practice you always subscribe immediately via .flatMap(), so you never use that control.
Return: Mono<T> vs plain T — matters!
```java
// CORRECT — Netty subscribes to the Mono and sends the response when ready
public Mono<ResponseEntity<?>> save(...) {
    return serviceClient.postRawReactive(...)       // non-blocking chain
            .map(result -> ResponseEntity.ok(result));
}

// WRONG — must .block() to get plain T, which freezes the event-loop
public ResponseEntity<?> save(...) {
    String result = serviceClient.postRawReactive(...).block(); // blocks event-loop!
    return ResponseEntity.ok(result);
}
```
Mono<T> tells Netty: “subscribe and send when ready.” Plain T means the result must already be resolved — which requires .block(), freezing the event-loop thread and stopping all other requests on that thread.
On Tomcat + Virtual Threads (Member, Geospatial, Web Services)
Rule: On Tomcat with virtual threads, neither parameter type nor return type matters. Use plain T for both — it’s simpler.
```java
// Simple and correct on Tomcat:
public ResponseEntity<String> save(@RequestBody AdventureTubeData data) {
    // blocking call is fine — the virtual thread yields its carrier thread
    String result = serviceClient.postRawNonReactive(...);
    return ResponseEntity.ok(result);
}
```
Tomcat uses one thread per request. With Java 21 virtual threads (spring.threads.virtual.enabled=true), .block() causes the virtual thread to yield its carrier (OS) thread almost instantly — no concurrency problem. Reactive return types (Mono/Flux) on Tomcat add complexity for zero benefit — Tomcat would just block waiting for the Mono to resolve anyway.
Summary
| | Netty (Auth Service) | Tomcat + Virtual Threads (Other 3) |
|---|---|---|
| Parameter type | Doesn't matter | Doesn't matter |
| Return type | Must be Mono/Flux | Doesn't matter — use plain T |
| I/O inside method | Must be reactive chain (no .block()) | .block() is safe (virtual thread yields) |
| Why | Event-loop — .block() freezes all requests on that thread | Virtual thread — .block() yields carrier thread instantly |
Why WebClient everywhere (not RestTemplate)
All services use WebClient via ServiceClient in common-api, even the blocking Tomcat services. Reasons:
- Unified codebase — one `ServiceClient` class with reactive methods plus non-reactive wrappers
- Future SSE/streaming — `WebClient` supports SSE streams natively; `RestTemplate` does not
- Virtual threads make it safe — `.block()` on `WebClient` with virtual threads has no concurrency penalty
- `RestTemplate` is in maintenance mode — Spring recommends `WebClient` going forward
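The "reactive method plus non-reactive wrapper" shape of `ServiceClient` can be sketched without Spring by pairing an async method with a blocking wrapper. This is a hypothetical analogue: `CompletableFuture` stands in for `Mono`, and the method names mirror but do not reproduce the real class.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical analogue of ServiceClient's paired methods.
public class PairedClient {
    // "Reactive" variant: returns immediately with a future result
    // (the real ServiceClient would return Mono<String> from WebClient).
    static CompletableFuture<String> postRawReactive(String payload) {
        return CompletableFuture.supplyAsync(() -> "saved:" + payload);
    }

    // Non-reactive wrapper: blocks until the async call completes.
    // Safe on a virtual thread, which yields its carrier while waiting.
    static String postRawNonReactive(String payload) {
        return postRawReactive(payload).join();
    }

    public static void main(String[] args) {
        System.out.println(postRawNonReactive("track"));
    }
}
```

Keeping both variants in one class is what lets the Netty service use the reactive method while the Tomcat services call the wrapper.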
Why Auth-Service Still Needs Reactive Coding Style
Java 21 virtual threads do NOT help on Netty. Auth-service must use reactive code regardless of Java version.
This is confusing because virtual threads solve the .block() problem on Tomcat — so why not on Netty?
The key difference is how each server handles threads:
Tomcat creates a thread per request. With virtual threads, these become cheap lightweight threads. When .block() is called, the virtual thread yields its carrier (OS) thread — the carrier is released to handle other work. When the I/O completes, the virtual thread resumes on any available carrier. This is why .block() is safe on Tomcat + virtual threads.
Netty does not create threads per request. It uses a small fixed pool of event-loop threads (typically 2x CPU cores). These are real OS threads, not virtual threads. All requests share these few threads via non-blocking I/O. When you .block() on an event-loop thread, there is no virtual thread mechanism to yield — the event-loop thread itself is frozen. All other requests assigned to that event-loop thread stop until .block() returns.
| | Tomcat | Netty |
|---|---|---|
| Thread model | 1 thread per request (virtual thread) | Fixed event-loop pool (OS threads, shared) |
| What happens on .block() | Virtual thread yields carrier — carrier freed | Event-loop thread freezes — all requests on it stop |
| Virtual threads help? | Yes — makes .block() cheap | No — event-loop is not a virtual thread |
| Coding style required | Blocking is fine | Must be reactive (Mono/Flux chains) |
Example: 4 event-loop threads, one .block() call takes 2 seconds
- 25% of all server capacity is frozen for 2 seconds
- All requests on that event-loop thread queue up and wait
- With enough `.block()` calls, the entire server becomes unresponsive
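The freeze described above can be simulated with a single-threaded executor standing in for one event-loop thread (plain JDK, no Netty; the timings are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simulates one event-loop thread: a single-threaded executor shared
// by many "requests". One blocking call stalls every request behind it.
public class EventLoopFreezeDemo {
    static long timeTenRequests(long blockMillis) throws Exception {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        // Request 1 blocks the loop (like .block() on an event-loop thread)
        eventLoop.submit(() -> {
            try {
                Thread.sleep(blockMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Requests 2..10 are instant, but must queue behind the blocked one
        for (int i = 0; i < 9; i++) {
            eventLoop.submit(() -> { /* near-instant work */ });
        }
        eventLoop.shutdown();
        eventLoop.awaitTermination(10, TimeUnit.SECONDS);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Nine no-op requests still take ~500 ms end to end, because the
        // single loop thread was frozen by one blocking call.
        System.out.println("total ms: " + timeTenRequests(500));
    }
}
```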
Bottom line: Virtual threads and Netty event-loop are two completely different concurrency models. They don’t mix. Auth-service runs on Netty, so it must use reactive chains — no .block(), no exceptions.
3. SSE for Async Kafka Result Notification
Current problem
The /geo/save endpoint sends data to Kafka and returns 202 Accepted, but the client has no way to know when processing completes. It is fire-and-forget with no tracking.
Decision: SSE over polling
Why SSE
- Cleaner iOS code — Subscribe and react. No timer, no retry loop, no polling interval decisions.
- Zero wasted requests — No unnecessary polls hitting the server.
- Instant feedback — Result pushed the moment Kafka consumer finishes.
- Pi is dev only — Production will run on proper infrastructure that handles SSE connections easily.
Why not polling
- Requires timer, retry logic, and state management on iOS.
- Wasteful requests when Kafka processing is near-instant (< 1 second).
- More complex client code for no real benefit.
Implementation approach
- `SseEmitter` on geospatial-service (Tomcat) — holds the connection open until the Kafka consumer finishes
- Auth-service (Netty) proxies the SSE stream via a reactive `WebClient` `Flux`
- Tracking ID returned on `/auth/geo/save`; the client opens an SSE stream on `/auth/geo/status/stream/{trackingId}`
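As a rough illustration of what travels over the wire, here is a minimal SSE round trip using only the JDK's built-in `HttpServer`. This is not the Spring `SseEmitter` implementation; the path and event payload are made up for the demo:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class SseSketch {
    // Starts a throwaway server that emits one SSE frame, fetches it,
    // and returns the raw body the client received.
    public static String runOnce() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/geo/status/stream/demo-tracking-id", exchange -> {
            // SSE wire format: "event:"/"data:" lines, blank line terminator
            byte[] body = "event: status\ndata: COMPLETED\n\n"
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/event-stream");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        try {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest req = HttpRequest.newBuilder(URI.create(
                    "http://localhost:" + server.getAddress().getPort()
                            + "/geo/status/stream/demo-tracking-id")).build();
            return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce()); // prints the raw SSE frame
    }
}
```

In the real flow the connection stays open until the Kafka consumer finishes, and the `COMPLETED` event is pushed only at that point.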
SSE flow
iOS AuthService GeospatialService Kafka MongoDB
│ │ │ │ │
│ POST /auth/geo/save │ │ │ │
│ + JWT │ │ │ │
│ ───────────────────────>│ │ │ │
│ │ Extract email, │ │ │
│ │ inject ownerEmail │ │ │
│ │ POST /geo/save │ │ │
│ │ ────────────────────────>│ │ │
│ │ │ Send to topic │ │
│ │ │ ──────────────────>│ │
│ │ 202 + trackingId │ │ │
│ │ <────────────────────────│ │ │
│ 202 + trackingId │ │ │ │
│ <───────────────────────│ │ │ │
│ │ │ │ │
│ GET /auth/geo/status/ │ │ │ │
│ stream/{trackingId} │ │ │ │
│ + JWT │ │ │ │
│ ───────────────────────>│ Proxy SSE stream │ │ │
│ │ ────────────────────────>│ │ │
│ │ │ SseEmitter held │ │
│ │ │ open │ │
│ │ │ │ │
│ │ │ Consumer receives │ │
│ │ │ <──────────────────│ │
│ │ │ │ Save data │
│ │ │ ──────────────────────────────────>│
│ │ SSE event COMPLETED │ │ │
│ │ <────────────────────────│ │ │
│ SSE event COMPLETED │ │ │ │
│ <───────────────────────│ │ │ │
