Resilience4j Circuit Breaker in Spring Boot: Stop Cascading Failures Before They Stop You
Author: Regal Singh
Last updated: 2026-04-10
Category: Java / Spring Boot / Resilience / Microservices
Abstract
In distributed systems, one slow or failing downstream service can bring down your entire application. Resilience4j gives you a lightweight, functional circuit breaker that integrates cleanly with Spring Boot. This post covers the circuit breaker pattern — why it matters, how to configure it, and how to observe it under real failure conditions. No Hystrix. No heavy dependencies. Just Resilience4j doing exactly what it is supposed to do.
Problem framing: why you need a circuit breaker
Without a circuit breaker, a failing dependency stalls every thread waiting on it. Thread pools fill up. Response times spike across unrelated services. One bad service becomes everyone's problem.
A circuit breaker sits in front of that dependency and does three things:
Detects when the failure rate crosses a threshold.
Opens the circuit and stops forwarding calls.
Probes with limited traffic after a wait period to see if the dependency recovered.
It is the difference between graceful degradation and a full-system cascade.
The three states of a circuit breaker
CLOSED: traffic flows normally, failures are counted.
OPEN: traffic is blocked, the fallback is returned immediately, and no calls reach the downstream service.
HALF_OPEN: a limited number of probe calls are allowed through to test recovery.
The transition logic is driven by your configuration thresholds.
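The transitions can be sketched as a toy state machine in plain Java. This is a deliberately simplified illustration, not how Resilience4j is implemented: the class name ToyCircuitBreaker, the consecutive-failure trigger (instead of a failure rate), and the rule that a single successful probe closes the circuit are all assumptions made to keep the example short. The clock is injected so the wait period can be simulated without sleeping.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Toy breaker for illustration only; not Resilience4j's implementation. */
class ToyCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failuresToOpen;   // hypothetical: consecutive failures that trip the breaker
    private final long waitMillis;      // how long to stay OPEN before probing
    private final LongSupplier clock;   // injected time source, in milliseconds

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    ToyCircuitBreaker(int failuresToOpen, long waitMillis, LongSupplier clock) {
        this.failuresToOpen = failuresToOpen;
        this.waitMillis = waitMillis;
        this.clock = clock;
    }

    synchronized State state() {
        // OPEN -> HALF_OPEN once the wait period has elapsed
        if (state == State.OPEN && clock.getAsLong() - openedAt >= waitMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get();      // fail fast: the downstream is never called
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;           // a successful probe in HALF_OPEN closes the circuit
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        // a failed probe in HALF_OPEN, or too many failures in CLOSED, opens the circuit
        if (state == State.HALF_OPEN || consecutiveFailures >= failuresToOpen) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    public static void main(String[] args) {
        long[] now = {0};
        ToyCircuitBreaker cb = new ToyCircuitBreaker(3, 10_000, () -> now[0]);
        Supplier<String> failing = () -> { throw new IllegalStateException("down"); };
        for (int i = 0; i < 3; i++) {
            cb.call(failing, () -> "fallback");
        }
        System.out.println(cb.state()); // OPEN
        now[0] = 10_000;                // simulate the wait period passing
        System.out.println(cb.state()); // HALF_OPEN
        cb.call(() -> "ok", () -> "fallback");
        System.out.println(cb.state()); // CLOSED
    }
}
```

Resilience4j itself decides on a failure rate over a sliding window and requires a configurable number of probe calls in HALF_OPEN, which is what the configuration properties in the following sections control.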
Adding Resilience4j to Spring Boot
<!-- pom.xml -->
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
AOP is required. Without it the @CircuitBreaker annotation does nothing.
Actuator exposes the circuit breaker state via /actuator/health and /actuator/circuitbreakers.
Configuring the circuit breaker in application.yml
resilience4j:
  circuitbreaker:
    instances:
      paymentService:
        registerHealthIndicator: true
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 10
        minimumNumberOfCalls: 5
        failureRateThreshold: 50
        waitDurationInOpenState: 10s
        permittedNumberOfCallsInHalfOpenState: 3
        automaticTransitionFromOpenToHalfOpenEnabled: true
        recordExceptions:
          - java.io.IOException
          - java.util.concurrent.TimeoutException
          - feign.FeignException  # keep only if Feign is on the classpath
          # RestTemplate wraps I/O errors and 5xx responses in its own unchecked exceptions
          - org.springframework.web.client.ResourceAccessException
          - org.springframework.web.client.HttpServerErrorException
        ignoreExceptions:
          - com.example.exceptions.BusinessValidationException
What each property does:
slidingWindowSize: 10 — evaluate failure rate over the last 10 calls.
minimumNumberOfCalls: 5 — do not open the circuit until at least 5 calls have been recorded.
failureRateThreshold: 50 — open the circuit if 50% or more of calls fail.
waitDurationInOpenState: 10s — stay open for 10 seconds before probing.
permittedNumberOfCallsInHalfOpenState: 3 — allow 3 probe calls in HALF_OPEN.
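automaticTransitionFromOpenToHalfOpenEnabled: true moves the breaker to HALF_OPEN as soon as the wait elapses, instead of waiting for the next incoming call to trigger the transition.
recordExceptions: only the listed exceptions and their subclasses count as failures. Once this list is set, any exception not on it is recorded as a success, so it must cover what your HTTP client actually throws.
ignoreExceptions: business exceptions should not count as circuit failures. Ignored exceptions count as neither a failure nor a success.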
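To make the interaction between slidingWindowSize, minimumNumberOfCalls, and failureRateThreshold concrete, here is a small plain-Java sketch of a count-based window. It is an approximation written for this post, not Resilience4j's internal code; the class name CountWindowSketch and its method are hypothetical.

```java
/** Simplified count-based sliding window; an illustration, not Resilience4j's internal code. */
class CountWindowSketch {
    private final boolean[] failed;            // ring buffer of the last N outcomes, true = failed call
    private final int minimumNumberOfCalls;
    private final float failureRateThreshold;  // percent
    private int recorded = 0;                  // total calls recorded so far

    CountWindowSketch(int slidingWindowSize, int minimumNumberOfCalls, float failureRateThreshold) {
        this.failed = new boolean[slidingWindowSize];
        this.minimumNumberOfCalls = minimumNumberOfCalls;
        this.failureRateThreshold = failureRateThreshold;
    }

    /** Records one call outcome and reports whether the circuit should open. */
    boolean recordAndCheck(boolean callFailed) {
        failed[recorded % failed.length] = callFailed;   // overwrite the oldest slot
        recorded++;
        int buffered = Math.min(recorded, failed.length);
        if (buffered < minimumNumberOfCalls) {
            return false;                                // too few samples to judge
        }
        int failures = 0;
        for (int i = 0; i < buffered; i++) {
            if (failed[i]) {
                failures++;
            }
        }
        float rate = 100f * failures / buffered;
        return rate >= failureRateThreshold;             // equal to the threshold also trips
    }

    public static void main(String[] args) {
        CountWindowSketch window = new CountWindowSketch(10, 5, 50f);
        boolean[] outcomes = {true, true, false, false, true};  // true = failure
        boolean open = false;
        for (boolean outcome : outcomes) {
            open = window.recordAndCheck(outcome);
        }
        System.out.println(open);  // true: 3 of 5 calls failed, 60% >= 50%
    }
}
```

Note that the first four calls never open the circuit, even though the first two both failed: until minimumNumberOfCalls is reached, no failure rate is evaluated.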
Applying the annotation to a service
// PaymentClient.java
package com.example.payments;

import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class PaymentClient {

    private final RestTemplate restTemplate;

    public PaymentClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @CircuitBreaker(name = "paymentService", fallbackMethod = "paymentFallback")
    public PaymentResponse charge(PaymentRequest request) {
        return restTemplate.postForObject(
            "https://payment-gateway.internal/charge",
            request,
            PaymentResponse.class
        );
    }

    private PaymentResponse paymentFallback(PaymentRequest request, Throwable ex) {
        // Log the cause, return a safe degraded response
        return PaymentResponse.queued(request.getOrderId(), "Payment service unavailable. Will retry.");
    }
}
Key rules for the fallback method:
It must be in the same class.
It must have the same return type.
It must accept the same parameters plus a Throwable as the last argument.
The name in @CircuitBreaker(name = "paymentService") must match the instance key in application.yml.
Combining with a timeout
A circuit breaker does not enforce timeouts on its own.
Use @TimeLimiter alongside it to bound how long a call is allowed to run.
resilience4j:
  timelimiter:
    instances:
      paymentService:
        timeoutDuration: 3s
        cancelRunningFuture: true
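// Added to PaymentClient. Also requires imports for
// io.github.resilience4j.timelimiter.annotation.TimeLimiter and java.util.concurrent.CompletableFuture.
@CircuitBreaker(name = "paymentService", fallbackMethod = "paymentFallbackAsync")
@TimeLimiter(name = "paymentService")
public CompletableFuture<PaymentResponse> chargeAsync(PaymentRequest request) {
    return CompletableFuture.supplyAsync(() ->
        restTemplate.postForObject(
            "https://payment-gateway.internal/charge",
            request,
            PaymentResponse.class
        )
    );
}

// Named differently from paymentFallback: two methods in one class cannot share
// a name and parameter list while differing only in return type.
private CompletableFuture<PaymentResponse> paymentFallbackAsync(PaymentRequest request, Throwable ex) {
    return CompletableFuture.completedFuture(
        PaymentResponse.queued(request.getOrderId(), "Timeout. Request queued.")
    );
}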
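When using @TimeLimiter, the method must return an asynchronous type: a CompletionStage such as CompletableFuture, or a reactive type. A plain synchronous return value is rejected.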
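The underlying idea, bounding an asynchronous call and substituting a fallback when the bound is exceeded, can be tried with nothing but the JDK. The sketch below uses CompletableFuture.orTimeout (Java 9+) rather than @TimeLimiter, so it illustrates the concept only; the class name TimeoutSketch, the sleep standing in for an HTTP call, and the "charged"/"queued" values are assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class TimeoutSketch {
    /** Runs a task that takes taskMillis, but gives up after timeoutMillis and returns a fallback value. */
    static String callWithTimeout(long taskMillis, long timeoutMillis) {
        return CompletableFuture
            .supplyAsync(() -> {
                try {
                    Thread.sleep(taskMillis);   // stands in for a slow HTTP call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return "charged";
            })
            // completes the future exceptionally with a TimeoutException if it is still pending
            .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
            // fallback for the timeout (or any other failure)
            .exceptionally(ex -> "queued")
            .join();
    }

    public static void main(String[] args) {
        System.out.println(callWithTimeout(1_000, 100));  // queued
        System.out.println(callWithTimeout(10, 5_000));   // charged
    }
}
```

One caveat worth knowing: timing out a CompletableFuture does not interrupt the thread that is still running the task, so the slow call keeps occupying a worker until it returns. Client-level connect and read timeouts remain necessary.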
Observing the circuit breaker via Actuator
Add the following to expose circuit breaker health details:
management:
  endpoints:
    web:
      exposure:
        include: health, circuitbreakers, metrics
  endpoint:
    health:
      show-details: always
  health:
    circuitbreakers:
      enabled: true
Check the state at runtime:
curl -s http://localhost:8080/actuator/health | jq '.components.circuitBreakers.details'
Example response when the circuit is open:
{
  "paymentService": {
    "status": "CIRCUIT_OPEN",
    "details": {
      "failureRate": "60.0%",
      "slowCallRate": "0.0%",
      "bufferedCalls": 10,
      "failedCalls": 6,
      "state": "OPEN"
    }
  }
}
Check metrics for deeper analysis:
curl http://localhost:8080/actuator/metrics/resilience4j.circuitbreaker.calls
This gives you counts broken down by kind (successful, failed, not_permitted, ignored).
Writing a test that verifies fallback behavior
// PaymentClientTest.java
package com.example.payments;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.mock.mockito.MockBean;
import org.springframework.web.client.ResourceAccessException;
import org.springframework.web.client.RestTemplate;

import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.Mockito.when;

@SpringBootTest
class PaymentClientTest {

    @Autowired
    private PaymentClient paymentClient;

    @Autowired
    private CircuitBreakerRegistry circuitBreakerRegistry;

    @MockBean
    private RestTemplate restTemplate;

    @BeforeEach
    void resetCircuitBreaker() {
        circuitBreakerRegistry.circuitBreaker("paymentService").reset();
    }

    @Test
    void shouldReturnFallbackWhenCircuitOpens() {
        // RestTemplate reports I/O failures as the unchecked ResourceAccessException, and Mockito
        // rejects checked exceptions (such as IOException) that the stubbed method does not declare.
        // ResourceAccessException must be covered by recordExceptions to count as a failure.
        // anyString() selects the String-URL overload that PaymentClient actually calls.
        when(restTemplate.postForObject(anyString(), any(), eq(PaymentResponse.class)))
            .thenThrow(new ResourceAccessException("Downstream down"));

        PaymentRequest request = new PaymentRequest("order-42", 99.99);

        // Trigger enough failures to open the circuit
        for (int i = 0; i < 6; i++) {
            paymentClient.charge(request);
        }

        CircuitBreaker cb = circuitBreakerRegistry.circuitBreaker("paymentService");
        assertThat(cb.getState()).isEqualTo(CircuitBreaker.State.OPEN);

        PaymentResponse response = paymentClient.charge(request);
        assertThat(response.getStatus()).isEqualTo("queued");
        assertThat(response.getMessage()).contains("unavailable");
    }
}
Reset the circuit breaker in @BeforeEach so tests do not bleed state into each other.
Common mistakes to avoid
AOP not on classpath — spring-boot-starter-aop must be included. Without it the annotation is silently ignored and no circuit breaker is applied.
Calling the annotated method from within the same class — Spring AOP proxies do not intercept self-invocations. Move the call through a separate bean.
Not ignoring business exceptions — if a 400 Bad Request or a validation error counts as a circuit failure, you will open the circuit on bad input, not on infrastructure failures. Always tune ignoreExceptions.
Using COUNT_BASED with too small a window — a slidingWindowSize of 5 with a single slow deploy can open your circuit prematurely. Size the window to match realistic traffic at your lowest load.
No fallback return type match — the fallback method must return the exact same type as the guarded method. A mismatch silently disables the fallback.
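The self-invocation mistake in the list above is easy to reproduce without Spring. The sketch below uses a JDK dynamic proxy as a stand-in for the Spring AOP proxy (an analogy, since Spring may also use CGLIB proxies); the Payments interface, its methods, and the counter are hypothetical names for illustration.

```java
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

class SelfInvocationSketch {
    interface Payments {
        String checkout();
        String charge();
    }

    static class PaymentsImpl implements Payments {
        public String checkout() {
            return charge();    // self-invocation: dispatched on `this`, not on the proxy
        }

        public String charge() {
            return "charged";
        }
    }

    /** Wraps the target in a proxy that counts intercepted charge() calls. */
    static Payments proxied(Payments target, AtomicInteger interceptedCharges) {
        return (Payments) Proxy.newProxyInstance(
            Payments.class.getClassLoader(),
            new Class<?>[] { Payments.class },
            (proxy, method, args) -> {
                if (method.getName().equals("charge")) {
                    interceptedCharges.incrementAndGet();  // where a circuit breaker would do its work
                }
                return method.invoke(target, args);
            });
    }

    public static void main(String[] args) {
        AtomicInteger intercepted = new AtomicInteger();
        Payments payments = proxied(new PaymentsImpl(), intercepted);

        payments.checkout();
        System.out.println(intercepted.get());  // 0: the inner charge() bypassed the proxy

        payments.charge();
        System.out.println(intercepted.get());  // 1: the external call was intercepted
    }
}
```

The same thing happens when a method in PaymentClient calls charge() directly: the call never passes through the proxy, so the breaker never sees it.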
When to use COUNT_BASED vs TIME_BASED sliding windows
COUNT_BASED: evaluates the failure rate over the last N calls, where N is slidingWindowSize. Predictable under steady traffic. Simple to reason about.
TIME_BASED: evaluates the failure rate over the calls made in the last N seconds, where N is slidingWindowSize (e.g. 60 for the last 60 seconds). Better for services with variable request rates.
For internal microservice calls with consistent traffic, COUNT_BASED is easier to calibrate.
For public APIs with bursty traffic, TIME_BASED gives a more accurate picture of recent health.
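For contrast with a count-based window, a time-based window can be sketched in plain Java as a queue of timestamped outcomes that drops anything older than the window. This is a simplification for illustration: Resilience4j aggregates calls into buckets rather than storing each one, and the class name TimeWindowSketch and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified time-based sliding window; an illustration, not Resilience4j's internal code. */
class TimeWindowSketch {
    private static final class Outcome {
        final long atMillis;
        final boolean failed;

        Outcome(long atMillis, boolean failed) {
            this.atMillis = atMillis;
            this.failed = failed;
        }
    }

    private final Deque<Outcome> outcomes = new ArrayDeque<>();
    private final long windowMillis;

    TimeWindowSketch(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    void record(long nowMillis, boolean failed) {
        outcomes.addLast(new Outcome(nowMillis, failed));
    }

    /** Failure rate in percent over the window ending at nowMillis, or -1 if the window is empty. */
    float failureRate(long nowMillis) {
        // evict outcomes that have aged out of the window
        while (!outcomes.isEmpty() && outcomes.peekFirst().atMillis <= nowMillis - windowMillis) {
            outcomes.removeFirst();
        }
        if (outcomes.isEmpty()) {
            return -1f;
        }
        long failures = outcomes.stream().filter(o -> o.failed).count();
        return 100f * failures / outcomes.size();
    }

    public static void main(String[] args) {
        TimeWindowSketch window = new TimeWindowSketch(60_000);
        window.record(0, true);
        window.record(1_000, true);
        window.record(50_000, false);
        window.record(51_000, false);
        System.out.println(window.failureRate(55_000));  // 50.0: all four calls are inside the window
        System.out.println(window.failureRate(70_000));  // 0.0: the two early failures have aged out
    }
}
```

The second reading shows why this suits bursty traffic: old failures expire with time alone, even if no new calls arrive to push them out.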
Summary
Resilience4j circuit breaker integrates with Spring Boot in four steps:
Add the dependency with AOP and Actuator.
Configure the instance thresholds in application.yml.
Annotate the method with @CircuitBreaker and provide a fallback.
Expose Actuator endpoints to observe circuit state in production.
The pattern buys you two things: it stops your application from hammering a failing downstream service, and it gives callers a fast, predictable response instead of a slow timeout. That is the contract a well-behaved service should offer its upstream callers.