Problem
When auth-service called member-service and got a 4xx error (e.g. USER_NOT_FOUND on second delete), the circuit breaker treated it as a failure and eventually opened the circuit. This meant legitimate business errors (user not found, duplicate, validation) were incorrectly tripping the circuit breaker.
The same problem existed in web-service calling geospatial-service.
Root Cause: Two Separate Layers
Circuit breaker error handling has two independent layers, and both need to be configured:
Layer 1: YAML Configuration (Failure Counting)
resilience4j:
  circuitbreaker:
    instances:
      MEMBER-SERVICE:
        ignore-exceptions:
          - com.adventuretube.common.client.ServiceClient4xxException
This controls whether the circuit breaker counts the error toward the failure threshold. With ignore-exceptions, 4xx errors don’t increment the failure counter.
Layer 2: Java Fallback (Error Propagation)
return circuitBreaker.run(call, throwable -> {
    if (throwable instanceof ServiceClient4xxException) {
        return Mono.error(throwable); // Pass through!
    }
    log.error("Circuit breaker open for {}: {}", serviceName, throwable.getMessage());
    return Mono.error(new ServiceClient5xxException(
            serviceName, "CIRCUIT_OPEN",
            serviceName + " circuit breaker is open", 503));
});
circuitBreaker.run(call, fallback) routes ALL errors (including ignored ones) through the fallback. Without the instanceof check, 4xx errors get swallowed and replaced with CIRCUIT_OPEN, even though they weren't counted as failures.
Key Lesson: YAML ignore-exceptions only controls counting. The Java fallback controls propagation. Both must be updated together.
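The interaction between the two layers can be illustrated with a toy breaker in plain Java. This is a minimal sketch, not Resilience4j or Spring Cloud Circuit Breaker: `ToyBreaker`, its `ignored` predicate, the `Toy4xxException`/`Toy5xxException` classes, and the threshold passed to the constructor are all hypothetical stand-ins. It only shows that an ignored exception is not counted yet still reaches the fallback, so the fallback has to rethrow it.

```java
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical stand-ins for the real exception classes in common-api.
class Toy4xxException extends RuntimeException {
    Toy4xxException(String message) { super(message); }
}

class Toy5xxException extends RuntimeException {
    Toy5xxException(String message) { super(message); }
}

// Toy breaker: NOT Resilience4j, just enough state to show the two layers.
class ToyBreaker {
    private final Predicate<Throwable> ignored; // Layer 1: what is not counted
    private final int failureThreshold;
    private int failures = 0;

    ToyBreaker(Predicate<Throwable> ignored, int failureThreshold) {
        this.ignored = ignored;
        this.failureThreshold = failureThreshold;
    }

    boolean isOpen() { return failures >= failureThreshold; }

    int failures() { return failures; }

    // Mirrors the shape of run(call, fallback): every error reaches the
    // fallback, whether or not it was counted.
    <T> T run(Supplier<T> call, Function<Throwable, T> fallback) {
        if (isOpen()) {
            return fallback.apply(new IllegalStateException("circuit open"));
        }
        try {
            return call.get();
        } catch (RuntimeException e) {
            if (!ignored.test(e)) {
                failures++; // Layer 1: only non-ignored errors count
            }
            return fallback.apply(e); // Layer 2: the fallback sees everything
        }
    }
}

class TwoLayerDemo {
    // Layer 2: rethrow 4xx untouched, translate everything else to CIRCUIT_OPEN.
    static String fallback(Throwable t) {
        if (t instanceof Toy4xxException) {
            throw (Toy4xxException) t;
        }
        throw new Toy5xxException("CIRCUIT_OPEN");
    }

    // Runs a call that always fails with a 4xx and returns the message the caller sees.
    static String callFailingWith4xx(ToyBreaker breaker) {
        try {
            return breaker.run(() -> { throw new Toy4xxException("USER_NOT_FOUND"); },
                               TwoLayerDemo::fallback);
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }
}
```

With `new ToyBreaker(t -> t instanceof Toy4xxException, 2)`, repeated calls to `TwoLayerDemo.callFailingWith4xx` keep surfacing `USER_NOT_FOUND` and the breaker stays closed. Removing the `instanceof` branch from `fallback` would turn every one of those into `CIRCUIT_OPEN` even though the failure counter never moves.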
Solution: 4xx/5xx Exception Split
New Exception Hierarchy (common-api module)
ServiceClientException (abstract)
├── ServiceClient4xxException ← business errors, ignored by circuit breaker
└── ServiceClient5xxException ← infrastructure failures, trips circuit breaker
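A minimal sketch of how this hierarchy might look in Java. The four-argument constructor order (service name, error code, message, status) and `getErrorCode()` follow the calls shown elsewhere in this post; extending `RuntimeException`, the field names, and the other accessors are assumptions, and the classes are package-private here only so they fit in one file.

```java
// Sketch only: extending RuntimeException and every accessor name other
// than getErrorCode() are assumptions, not the actual common-api source.
abstract class ServiceClientException extends RuntimeException {
    private final String serviceName;
    private final String errorCode;
    private final int statusCode;

    protected ServiceClientException(String serviceName, String errorCode,
                                     String message, int statusCode) {
        super(message);
        this.serviceName = serviceName;
        this.errorCode = errorCode;
        this.statusCode = statusCode;
    }

    public String getServiceName() { return serviceName; }

    public String getErrorCode() { return errorCode; }

    public int getStatusCode() { return statusCode; }
}

// Business errors (user not found, duplicate, validation): ignored by the breaker.
class ServiceClient4xxException extends ServiceClientException {
    ServiceClient4xxException(String serviceName, String errorCode,
                              String message, int statusCode) {
        super(serviceName, errorCode, message, statusCode);
    }
}

// Infrastructure failures: counted by the breaker.
class ServiceClient5xxException extends ServiceClientException {
    ServiceClient5xxException(String serviceName, String errorCode,
                              String message, int statusCode) {
        super(serviceName, errorCode, message, statusCode);
    }
}
```

Keeping a shared abstract base lets callers catch `ServiceClientException` once and branch on `getErrorCode()`, while the circuit breaker configuration can still name only the 4xx subclass.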
ServiceClient Changes
All three methods (postServiceResponseReactive, getServiceResponseReactive, getRawReactive) updated:
- 4xx handler → throws ServiceClient4xxException
- 5xx handler → throws ServiceClient5xxException
- Network/timeout errors → throws ServiceClient5xxException
- Circuit breaker fallback → instanceof ServiceClient4xxException check to pass through 4xx
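The routing rules in the list above can be summarised as a small classifier. This is a hypothetical sketch: `FailureKind` and `FailureClassifier` do not exist in the project, and the real handlers throw exceptions rather than return an enum. It only makes explicit which side of the 4xx/5xx split each case lands on.

```java
// Hypothetical classifier; names are illustrative and not part of ServiceClient.
enum FailureKind { NONE, CLIENT_4XX, SERVER_5XX }

final class FailureClassifier {
    private FailureClassifier() {}

    // 4xx responses map to the business-error side, 5xx to the infrastructure side.
    static FailureKind fromStatus(int status) {
        if (status >= 400 && status <= 499) return FailureKind.CLIENT_4XX;
        if (status >= 500 && status <= 599) return FailureKind.SERVER_5XX;
        return FailureKind.NONE;
    }

    // Network and timeout errors carry no HTTP status, so they are grouped with 5xx.
    static FailureKind fromTransportError(Exception e) {
        return FailureKind.SERVER_5XX;
    }

    // Only the 5xx side should count toward opening the circuit.
    static boolean countsAsFailure(FailureKind kind) {
        return kind == FailureKind.SERVER_5XX;
    }
}
```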
ServiceClient Method Naming Convention
Renamed all methods to follow: post/get + Raw/ServiceResponse + Reactive/NonReactive
| Method | Returns | Blocking? |
|---|---|---|
| postServiceResponseReactive() | Mono<ServiceResponse<T>> | No |
| getServiceResponseReactive() | Mono<ServiceResponse<T>> | No |
| getRawReactive() | Mono<T> | No |
| postServiceResponseNonReactive() | ServiceResponse<T> | Yes |
| getServiceResponseNonReactive() | ServiceResponse<T> | Yes |
| getRawNonReactive() | T | Yes |
- ServiceResponse = inter-service calls that return the ServiceResponse<T> wrapper
- Raw = calls to services not using ServiceResponse (e.g. geospatial-service returns raw JSON)
- Reactive = returns Mono<> for WebFlux callers
- NonReactive = .block() wrapper for Spring MVC callers
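The Reactive/NonReactive pairing can be sketched with the JDK's `CompletableFuture` standing in for Reactor's `Mono`, so the example runs without extra dependencies. `RawClientSketch`, `getRawAsync`, and `getRawBlocking` are hypothetical names, and `join()` is only an analogue of `.block()`: it wraps failures in `CompletionException`, which is not how `block()` surfaces them.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the Reactive/NonReactive method pair.
class RawClientSketch {
    // Plays the role of getRawReactive(): returns a handle to a pending result.
    static CompletableFuture<String> getRawAsync(String path) {
        return CompletableFuture.supplyAsync(() -> "{\"path\":\"" + path + "\"}");
    }

    // Plays the role of getRawNonReactive(): waits for the async result.
    static String getRawBlocking(String path) {
        return getRawAsync(path).join();
    }
}
```

The blocking variant adds no logic of its own; it delegates to the asynchronous one and waits, which is the relationship the naming convention describes.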
Web-Service Error Handling (New)
Added full error handling for web-service calling geospatial-service:
New Files
- WebErrorCode enum: DATA_NOT_FOUND (404), DUPLICATE_KEY (409), SERVER_NOT_AVAILABLE (503), SERVICE_CIRCUIT_OPEN (503), INTERNAL_ERROR (500)
- BaseServiceException: abstract base with WebErrorCode + auto-captured origin
- GeoServiceException: concrete exception for geospatial-service errors
- GlobalExceptionHandler: @ControllerAdvice returning structured ServiceResponse
GeoDataService Error Mapping
private GeoServiceException mapServiceClientException(ServiceClientException ex) {
    WebErrorCode errorCode = switch (ex.getErrorCode()) {
        case "DATA_NOT_FOUND", "USER_NOT_FOUND" -> WebErrorCode.DATA_NOT_FOUND;
        case "DUPLICATE_KEY" -> WebErrorCode.DUPLICATE_KEY;
        case "SERVER_NOT_AVAILABLE" -> WebErrorCode.SERVER_NOT_AVAILABLE;
        case "CIRCUIT_OPEN" -> WebErrorCode.SERVICE_CIRCUIT_OPEN;
        default -> WebErrorCode.INTERNAL_ERROR;
    };
    return new GeoServiceException(errorCode);
}
Web-service handles these errors with try/catch around the blocking getRawNonReactive() call, whereas auth-service uses the reactive .onErrorMap() operator.
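A self-contained sketch of the try/catch shape on the blocking side. Every type here is a hypothetical stub (`WebErrorCodeSketch`, `UpstreamExceptionSketch`, `GeoExceptionSketch`, `GeoDataServiceSketch`), and a `Supplier<String>` stands in for the blocking getRawNonReactive() call; the actual GeoDataService may be structured differently.

```java
import java.util.function.Supplier;

// Hypothetical stubs so the sketch compiles on its own.
enum WebErrorCodeSketch {
    DATA_NOT_FOUND, DUPLICATE_KEY, SERVER_NOT_AVAILABLE, SERVICE_CIRCUIT_OPEN, INTERNAL_ERROR
}

// Stands in for ServiceClientException.
class UpstreamExceptionSketch extends RuntimeException {
    private final String errorCode;

    UpstreamExceptionSketch(String errorCode) {
        super(errorCode);
        this.errorCode = errorCode;
    }

    String getErrorCode() { return errorCode; }
}

// Stands in for GeoServiceException.
class GeoExceptionSketch extends RuntimeException {
    private final WebErrorCodeSketch code;

    GeoExceptionSketch(WebErrorCodeSketch code) {
        super(code.name());
        this.code = code;
    }

    WebErrorCodeSketch getCode() { return code; }
}

class GeoDataServiceSketch {
    private final Supplier<String> blockingCall; // stands in for getRawNonReactive()

    GeoDataServiceSketch(Supplier<String> blockingCall) {
        this.blockingCall = blockingCall;
    }

    // Blocking call wrapped in try/catch; upstream errors are translated, not swallowed.
    String fetch() {
        try {
            return blockingCall.get();
        } catch (UpstreamExceptionSketch ex) {
            throw map(ex);
        }
    }

    // Same mapping idea as mapServiceClientException, on the stub types.
    static GeoExceptionSketch map(UpstreamExceptionSketch ex) {
        WebErrorCodeSketch code = switch (ex.getErrorCode()) {
            case "DATA_NOT_FOUND", "USER_NOT_FOUND" -> WebErrorCodeSketch.DATA_NOT_FOUND;
            case "DUPLICATE_KEY" -> WebErrorCodeSketch.DUPLICATE_KEY;
            case "SERVER_NOT_AVAILABLE" -> WebErrorCodeSketch.SERVER_NOT_AVAILABLE;
            case "CIRCUIT_OPEN" -> WebErrorCodeSketch.SERVICE_CIRCUIT_OPEN;
            default -> WebErrorCodeSketch.INTERNAL_ERROR;
        };
        return new GeoExceptionSketch(code);
    }
}
```

Because the call blocks, the exception surfaces on the calling thread and an ordinary catch clause is enough. In a reactive chain the same translation would be registered as an operator on the pipeline instead.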
Auth-Service Fix: Missing UserNotFoundException Handler
Custom com.adventuretube.auth.exceptions.UserNotFoundException had no @ExceptionHandler — it fell to the catch-all Exception handler returning INTERNAL_ERROR 500 instead of the proper USER_NOT_FOUND 404.
@ExceptionHandler(UserNotFoundException.class)
public ResponseEntity<ServiceResponse<?>> handleUserNotFoundException(UserNotFoundException ex) {
    return buildErrorResponse(ex.getErrorCode(), ex);
}
Verification
Delete User (second time — user already deleted)
- Zipkin: 3 services, 13 spans, outcome: CLIENT_ERROR, status: 404
- Response: structured ServiceResponse with errorCode: USER_NOT_FOUND
- Circuit breaker: stays CLOSED, 0 failed calls
- Log: MEMBER-SERVICE 4xx error: USER_NOT_FOUND (not CIRCUIT_OPEN)
Commits
- 8ecb0a2: Rename ServiceClient methods for clarity
- 18be88f: Add web-service circuit breaker error handling and rename exception classes
- 6f25630: Fix circuit breaker fallback swallowing 4xx exceptions
- 7aa8e52: Add missing UserNotFoundException handler
Files Changed
| File | Change |
|---|---|
| common-api/.../ServiceClient.java | Method rename + 4xx/5xx split + fallback instanceof check |
| common-api/.../ServiceClient4xxException.java | New: 4xx business errors |
| common-api/.../ServiceClient5xxException.java | New: 5xx infrastructure failures |
| web-service/.../exceptions/code/WebErrorCode.java | New: error codes enum |
| web-service/.../exceptions/base/BaseServiceException.java | New: abstract base |
| web-service/.../exceptions/GeoServiceException.java | New: geospatial errors |
| web-service/.../exceptions/GlobalExceptionHandler.java | New: exception handler |
| web-service/.../service/GeoDataService.java | Added try/catch + mapServiceClientException |
| auth-service/.../GlobalExceptionHandler.java | Added UserNotFoundException handler |
| config-service/.../auth-service.yml | Added ignore-exceptions for 4xx |
| config-service/.../web-service.yml | Added circuit breaker config for GEOSPATIAL-SERVICE |
