1. Why Mixed Reactive / Non-Reactive Architecture
The project originally attempted a fully reactive stack across all services, but the member, geospatial, and web services were reverted to a blocking (non-reactive) model. Only auth-service remains reactive.
Reasons for reverting
1. Debugging
Reactive code is difficult to debug with IntelliJ breakpoints. The event-loop model means execution doesn’t follow a linear call stack, making step-through debugging impractical for business logic.
2. Kafka distributed tracing
The reactive Kafka library (reactor-kafka) was discontinued, making Zipkin tracing for Kafka producer/consumer spans infeasible in a reactive context. Despite extensive attempts, reactive context propagation through Kafka could not be resolved; reverting to blocking made Kafka tracing work properly.
3. Future streaming/WebSocket needs
WebClient and reactive infrastructure are kept in the codebase (via ServiceClient in common-api) because future features like streaming or WebSocket services will benefit from reactive support.
4. WebClient .block() on Tomcat is safe with Java 21 virtual threads
The blocking services still use WebClient (not RestTemplate) for inter-service calls, calling .block() to get synchronous results. This is normally an anti-pattern because .block() holds a platform thread. However, with Java 21 virtual threads enabled (spring.threads.virtual.enabled=true), the virtual thread yields its carrier thread on .block() and releases it almost instantly — eliminating the concurrency problem.
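The cheapness of blocking on virtual threads can be checked with plain Java 21, no Spring required. This sketch uses `Thread.sleep` as a stand-in for a `WebClient` `.block()` call; the class and method names are illustrative only:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadBlockDemo {
    // Runs `tasks` concurrent blocking calls (simulated with sleep)
    // on virtual threads and returns the total elapsed milliseconds.
    static long runBlocking(int tasks, long sleepMillis) throws Exception {
        Instant start = Instant.now();
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                pool.submit(() -> {
                    try {
                        // Blocking call: the virtual thread parks and
                        // releases its carrier (OS) thread while waiting.
                        Thread.sleep(sleepMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() waits for all submitted tasks to finish
        return Duration.between(start, Instant.now()).toMillis();
    }

    public static void main(String[] args) throws Exception {
        // 1,000 concurrent 100 ms blocking calls overlap almost completely,
        // finishing far faster than the sequential 100 seconds would take.
        long elapsed = runBlocking(1_000, 100);
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

If the same loop ran on a small platform-thread pool, each blocked thread would pin an OS thread and total time would grow linearly with the task count.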
Service threading summary
| Service | Server | Model | Reason |
|---|---|---|---|
| Auth | Netty | Reactive | Stateless proxy — no DB, just forwards to member/geospatial service |
| Member | Tomcat | Blocking | JPA/JDBC is inherently blocking |
| Geospatial | Tomcat | Blocking | Spring Data MongoDB blocking driver |
| Web | Tomcat | Blocking | Calls geo-service via .block() |
Service routing
Writes (create/update/delete): iOS → Gateway (/auth/geo/**) → Auth Service → Geospatial Service
Reads: iOS → Gateway (/web/geo/**) → Web Service → Geospatial Service
Internal only: /geo/** not exposed through gateway
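The routing split above might look roughly like this in Spring Cloud Gateway configuration. This is a hedged sketch: the route ids and `lb://` service URIs are assumptions, not taken from the project.

```yaml
spring:
  cloud:
    gateway:
      routes:
        - id: auth-geo-writes          # writes go through auth-service
          uri: lb://auth-service       # assumed service id
          predicates:
            - Path=/auth/geo/**
        - id: web-geo-reads            # reads go through web-service
          uri: lb://web-service        # assumed service id
          predicates:
            - Path=/web/geo/**
        # Deliberately no route for /geo/** — geospatial-service stays internal
```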
2. Parameter Type vs Return Type — Why It Matters
On Netty (Auth Service)
Rule: On Netty, parameter type doesn’t matter but return type must be reactive (Mono/Flux).
Parameter: T vs Mono<T> — doesn’t matter
```java
// Both are non-blocking on Netty:
public Mono<ResponseEntity<?>> save(@RequestBody JsonNode body)        // OK
public Mono<ResponseEntity<?>> save(@RequestBody Mono<JsonNode> body)  // OK (but a pointless extra .flatMap())
```
Even with plain T, Spring WebFlux reads the request body non-blocking internally using the event-loop. It resolves the body first, then calls your method. No OS thread is held waiting for bytes — Netty registers interest with the OS and moves on until bytes arrive.
Mono<T> parameter gives theoretical control over when to trigger deserialization (backpressure), but in practice you always subscribe immediately via .flatMap(), so you never use that control.
Return: Mono<T> vs plain T — matters!
```java
// CORRECT — Netty subscribes to the Mono and sends the response when ready
public Mono<ResponseEntity<?>> save(...) {
    return serviceClient.postRawReactive(...)       // non-blocking chain
            .map(result -> ResponseEntity.ok(result));
}

// WRONG — must .block() to get plain T, which freezes the event-loop
public ResponseEntity<?> save(...) {
    String result = serviceClient.postRawReactive(...).block(); // blocks event-loop!
    return ResponseEntity.ok(result);
}
```
Mono<T> tells Netty: “subscribe and send when ready.” Plain T means the result must already be resolved — which requires .block(), freezing the event-loop thread and stopping all other requests on that thread.
On Tomcat + Virtual Threads (Member, Geospatial, Web Services)
Rule: On Tomcat with virtual threads, neither parameter type nor return type matters. Use plain T for both — it’s simpler.
```java
// Simple and correct on Tomcat:
public ResponseEntity<String> save(@RequestBody AdventureTubeData data) {
    // blocking call is fine — the virtual thread yields its carrier thread
    String result = serviceClient.postRawNonReactive(...);
    return ResponseEntity.ok(result);
}
```
Tomcat uses one thread per request. With Java 21 virtual threads (spring.threads.virtual.enabled=true), .block() causes the virtual thread to yield its carrier (OS) thread almost instantly — no concurrency problem. Reactive return types (Mono/Flux) on Tomcat add complexity for zero benefit — Tomcat would just block waiting for the Mono to resolve anyway.
Summary
| | Netty (Auth Service) | Tomcat + Virtual Threads (Other 3) |
|---|---|---|
| Parameter type | Doesn't matter | Doesn't matter |
| Return type | Must be Mono/Flux | Doesn't matter — use plain T |
| I/O inside method | Must be reactive chain (no .block()) | .block() is safe (virtual thread yields) |
| Why | Event-loop — .block() freezes all requests on that thread | Virtual thread — .block() yields carrier thread instantly |
Why WebClient everywhere (not RestTemplate)
All services use WebClient via ServiceClient in common-api, even the blocking Tomcat services. Reasons:
- Unified codebase — one `ServiceClient` class with reactive methods plus non-reactive wrappers
- Future SSE/streaming — `WebClient` supports SSE streams natively; `RestTemplate` does not
- Virtual threads make it safe — `.block()` on `WebClient` with virtual threads has no concurrency penalty
- `RestTemplate` is in maintenance mode — Spring recommends `WebClient` going forward
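The "reactive method plus non-reactive wrapper" shape of `ServiceClient` can be sketched without Spring by pairing an async method with a blocking wrapper. This is a hypothetical analogue: `CompletableFuture` stands in for `Mono`, and the method names mirror but do not reproduce the real class.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical analogue of ServiceClient's paired methods.
public class PairedClient {
    // "Reactive" variant: returns immediately with a future result
    // (the real ServiceClient would return Mono<String> from WebClient).
    static CompletableFuture<String> postRawReactive(String payload) {
        return CompletableFuture.supplyAsync(() -> "saved:" + payload);
    }

    // Non-reactive wrapper: blocks until the async call completes.
    // Safe on a virtual thread, which yields its carrier while waiting.
    static String postRawNonReactive(String payload) {
        return postRawReactive(payload).join();
    }

    public static void main(String[] args) {
        System.out.println(postRawNonReactive("track"));
    }
}
```

Keeping both variants in one class is what lets the Netty service use the reactive method while the Tomcat services call the wrapper.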
Why Auth-Service Still Needs Reactive Coding Style
Java 21 virtual threads do NOT help on Netty. Auth-service must use reactive code regardless of Java version.
This is confusing because virtual threads solve the .block() problem on Tomcat — so why not on Netty?
The key difference is how each server handles threads:
Tomcat creates a thread per request. With virtual threads, these become cheap lightweight threads. When .block() is called, the virtual thread yields its carrier (OS) thread — the carrier is released to handle other work. When the I/O completes, the virtual thread resumes on any available carrier. This is why .block() is safe on Tomcat + virtual threads.
Netty does not create threads per request. It uses a small fixed pool of event-loop threads (typically 2x CPU cores). These are real OS threads, not virtual threads. All requests share these few threads via non-blocking I/O. When you .block() on an event-loop thread, there is no virtual thread mechanism to yield — the event-loop thread itself is frozen. All other requests assigned to that event-loop thread stop until .block() returns.
| | Tomcat | Netty |
|---|---|---|
| Thread model | 1 thread per request (virtual thread) | Fixed event-loop pool (OS threads, shared) |
| What happens on .block() | Virtual thread yields carrier — carrier freed | Event-loop thread freezes — all requests on it stop |
| Virtual threads help? | Yes — makes .block() cheap | No — event-loop is not a virtual thread |
| Coding style required | Blocking is fine | Must be reactive (Mono/Flux chains) |
Example: 4 event-loop threads, one .block() call takes 2 seconds
- 25% of all server capacity is frozen for 2 seconds
- All requests on that event-loop thread queue up and wait
- With enough `.block()` calls, the entire server becomes unresponsive
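The freeze described above can be simulated with a single-threaded executor standing in for one event-loop thread (plain JDK, no Netty; the timings are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simulates one event-loop thread: a single-threaded executor shared
// by many "requests". One blocking call stalls every request behind it.
public class EventLoopFreezeDemo {
    static long timeTenRequests(long blockMillis) throws Exception {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        // Request 1 blocks the loop (like .block() on an event-loop thread)
        eventLoop.submit(() -> {
            try {
                Thread.sleep(blockMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Requests 2..10 are instant, but must queue behind the blocked one
        for (int i = 0; i < 9; i++) {
            eventLoop.submit(() -> { /* near-instant work */ });
        }
        eventLoop.shutdown();
        eventLoop.awaitTermination(10, TimeUnit.SECONDS);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Nine no-op requests still take ~500 ms end to end, because the
        // single loop thread was frozen by one blocking call.
        System.out.println("total ms: " + timeTenRequests(500));
    }
}
```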
Bottom line: Virtual threads and Netty event-loop are two completely different concurrency models. They don’t mix. Auth-service runs on Netty, so it must use reactive chains — no .block(), no exceptions.
3. SSE for Async Kafka Result Notification
Current problem
The /geo/save endpoint sends data to Kafka and returns 202 Accepted, but the client has no way to know when processing completes. It is fire-and-forget with no tracking.
Decision: SSE over polling
Why SSE
- Cleaner iOS code — Subscribe and react. No timer, no retry loop, no polling interval decisions.
- Zero wasted requests — No unnecessary polls hitting the server.
- Instant feedback — Result pushed the moment Kafka consumer finishes.
- Pi is dev only — Production will run on proper infrastructure that handles SSE connections easily.
Why not polling
- Requires timer, retry logic, and state management on iOS.
- Wasteful requests when Kafka processing is near-instant (< 1 second).
- More complex client code for no real benefit.
Implementation approach
- `SseEmitter` on geospatial-service (Tomcat) — holds the connection open until the Kafka consumer finishes
- Auth-service (Netty) proxies the SSE stream via a reactive `WebClient` `Flux`
- Tracking ID returned on `/auth/geo/save`; the client opens an SSE stream on `/auth/geo/status/stream/{trackingId}`
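As a rough illustration of what travels over the wire, here is a minimal SSE round trip using only the JDK's built-in `HttpServer`. This is not the Spring `SseEmitter` implementation; the path and event payload are made up for the demo:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class SseSketch {
    // Starts a throwaway server that emits one SSE frame, fetches it,
    // and returns the raw body the client received.
    public static String runOnce() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/geo/status/stream/demo-tracking-id", exchange -> {
            // SSE wire format: "event:"/"data:" lines, blank line terminator
            byte[] body = "event: status\ndata: COMPLETED\n\n"
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/event-stream");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        try {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest req = HttpRequest.newBuilder(URI.create(
                    "http://localhost:" + server.getAddress().getPort()
                            + "/geo/status/stream/demo-tracking-id")).build();
            return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce()); // prints the raw SSE frame
    }
}
```

In the real flow the connection stays open until the Kafka consumer finishes, and the `COMPLETED` event is pushed only at that point.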
SSE flow
iOS AuthService GeospatialService Kafka MongoDB
│ │ │ │ │
│ POST /auth/geo/save │ │ │ │
│ + JWT │ │ │ │
│ ───────────────────────>│ │ │ │
│ │ Extract email, │ │ │
│ │ inject ownerEmail │ │ │
│ │ POST /geo/save │ │ │
│ │ ────────────────────────>│ │ │
│ │ │ Send to topic │ │
│ │ │ ──────────────────>│ │
│ │ 202 + trackingId │ │ │
│ │ <────────────────────────│ │ │
│ 202 + trackingId │ │ │ │
│ <───────────────────────│ │ │ │
│ │ │ │ │
│ GET /auth/geo/status/ │ │ │ │
│ stream/{trackingId} │ │ │ │
│ + JWT │ │ │ │
│ ───────────────────────>│ Proxy SSE stream │ │ │
│ │ ────────────────────────>│ │ │
│ │ │ SseEmitter held │ │
│ │ │ open │ │
│ │ │ │ │
│ │ │ Consumer receives │ │
│ │ │ <──────────────────│ │
│ │ │ │ Save data │
│ │ │ ──────────────────────────────────>│
│ │ SSE event COMPLETED │ │ │
│ │ <────────────────────────│ │ │
│ SSE event COMPLETED │ │ │ │
│ <───────────────────────│ │ │ │
