Microservices Resilience Patterns
Microservices are small, independent parts of a larger application that run on their own. When one of them fails, the failure can ripple through the whole system. That's where resilience patterns come in: practical techniques for keeping things running smoothly even when something goes wrong. For example, the retry pattern tries again if a service doesn't respond at first, a circuit breaker stops sending requests to a broken service so the system doesn't get overloaded, and timeouts stop the system from waiting too long for a reply. Together these patterns keep the application steady and responsive, giving users a better experience even when some parts fail.
Using Resilience Patterns to Keep Microservices Running Smoothly
Resilience patterns are techniques for keeping microservices working well even when something goes wrong or traffic spikes. In systems where many services talk to each other over a network, one small failure can cascade into a much bigger problem if it isn't handled properly. These patterns prevent that by making sure the whole system doesn't break when a single part fails. In Go, the standard context package gives us time limits (timeouts), the gobreaker library provides a circuit breaker that stops sending requests to an unhealthy service, and the backoff library retries failed requests with a controlled delay. We'll start with the Policy and the policy Provider.
You can check the full implementation in the GitHub repo.
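The snippets below assume imports along these lines (the module paths are my assumption based on the libraries named above; the repo's go.mod is authoritative):

import (
    "context"
    "fmt"
    "time"

    "github.com/cenkalti/backoff/v4" // retry with backoff (v4 assumed; it provides RetryWithData)
    "github.com/sony/gobreaker"      // circuit breaker
)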
Policy and Policy Provider
The Provider and Policy work together to implement a Factory Pattern
with Named Configuration for resilience patterns.
The Provider acts as a centralized configuration factory that:
- Stores named components (timeouts, retries, and circuit breakers)
- Manages their configuration
- Assembles a policy per named target
This pattern enables declarative resilience configuration where you define “what” resilience patterns to apply rather than “how” to implement them, making the system more maintainable and configurable.
// The Policy struct is a central component that implements the Resilience Policy
// Pattern for executing operations with multiple fault tolerance mechanisms.
type Policy struct {
    timeout        time.Duration   // Timeout pattern
    retry          *retry          // Retry pattern
    circuitBreaker *circuitBreaker // Circuit Breaker pattern
}

// Centralized configuration factory
type Provider struct {
    timeouts        map[string]time.Duration   // Named timeout configurations
    retries         map[string]*retry          // Named retry strategies
    circuitBreakers map[string]*circuitBreaker // Named circuit breakers
    targets         map[string]target          // Named target configurations
}
Execution Flow
The NewExecutor function creates an executor that wraps the operation with the resilience patterns from the inside out:
- Timeout (innermost layer, closest to the operation)
- Circuit Breaker (middle layer)
- Retry (outermost layer, which drives the attempts)
func NewExecutor(ctx context.Context, policy *Policy) Executor {
    if policy == nil {
        return func(oper Operation) (any, error) {
            return oper(ctx)
        }
    }
    return func(oper Operation) (any, error) {
        operation := oper
        if policy.timeout > 0 {
            operation = policy.withTimeout(operation)
        }
        if policy.circuitBreaker != nil {
            operation = policy.withCircuitBreaker(operation)
        }
        if policy.retry == nil {
            return operation(ctx)
        }
        return policy.withRetry(ctx, operation)
    }
}
This layering ensures that:
- Timeouts bound each individual attempt, so an operation can't run indefinitely
- The circuit breaker is consulted on every attempt and can fail fast when the downstream service is unhealthy
- Each retry goes back through the circuit breaker, and stops early when the breaker reports a permanent failure
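Here is a short usage sketch of the executor. The Operation and Executor type definitions are inferred from how NewExecutor uses them, and the HTTP call (plus the net/http import it needs) is purely illustrative:

// Assumed type shapes; the repo's actual definitions may differ.
type Operation func(ctx context.Context) (any, error)
type Executor func(oper Operation) (any, error)

// Illustrative caller: wrap an outbound HTTP call with the policy's patterns.
func callUserService(ctx context.Context, policy *Policy) (any, error) {
    exec := NewExecutor(ctx, policy)
    return exec(func(ctx context.Context) (any, error) {
        req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://users/api/v1/users", nil)
        if err != nil {
            return nil, err
        }
        // The request is cancelled automatically when the timeout context fires.
        return http.DefaultClient.Do(req)
    })
}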
Timeout Pattern
The timeout pattern is used to stop a service from waiting too long for a response from another service. If the reply takes too much time, the system cancels the request and moves on. This helps the application fail quickly instead of getting stuck, which keeps things running smoothly and prevents slow services from affecting the whole system.
withTimeout applies the timeout pattern to an operation by using Go’s
context.WithTimeout. It wraps a given operation so that if it takes longer than
a specified time limit (p.timeout), the operation is canceled. It runs the
operation in a separate goroutine and waits for either the result or the timeout
to occur. If the operation finishes in time, the result is returned; if not, the
context times out and returns an error. It also safely handles panics inside the
operation to avoid crashing the program. This approach helps prevent slow or
stuck services from blocking the system, making it more responsive and
resilient.
func (p *Policy) withTimeout(oper Operation) Operation {
    return func(ctx context.Context) (any, error) {
        // Context-Based Timeout Setup
        timeoutCtx, cancel := context.WithTimeout(ctx, p.timeout)
        defer cancel()

        resultCh := make(chan operationResult, 1)
        go func() {
            defer func() {
                if r := recover(); r != nil {
                    select {
                    case resultCh <- operationResult{nil, fmt.Errorf("operation panicked: %v", r)}:
                    default:
                    }
                }
            }()
            value, err := oper(timeoutCtx)
            select {
            case resultCh <- operationResult{value, err}:
            case <-timeoutCtx.Done():
            }
        }()

        select {
        case result := <-resultCh:
            return result.value, result.err
        case <-timeoutCtx.Done():
            return nil, timeoutCtx.Err()
        }
    }
}
The timeout pattern wraps an operation to ensure it doesn’t run longer than a specified duration, preventing resource exhaustion and improving system responsiveness.
Building the mechanism on context.WithTimeout keeps it idiomatic: it integrates with the rest of Go's context-aware ecosystem and provides both automatic timeout enforcement and a cancellation signal the wrapped operation can cooperate with.
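For the cooperative side, the wrapped operation should watch the context it receives. A minimal sketch (fetchProfile and the simulated delay are illustrative only):

// An Operation that cooperates with the timeout context: it stops as soon
// as ctx is cancelled instead of finishing its work.
func fetchProfile(ctx context.Context) (any, error) {
    select {
    case <-ctx.Done():
        return nil, ctx.Err() // the timeout (or a parent cancellation) fired
    case <-time.After(50 * time.Millisecond): // stand-in for real work
        return "profile-data", nil
    }
}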
Retry Pattern
The retry pattern is useful when something fails for a short time, like a network issue or a service being briefly unavailable. Instead of giving up right away, the system waits a little and then tries again. This helps the application recover from quick, temporary problems without causing bigger issues for users.
func (r *retry) backoff(ctx context.Context) backoff.BackOff {
    var b backoff.BackOff = backoff.NewConstantBackOff(r.duration)
    if r.maxRetries >= 0 {
        b = backoff.WithMaxRetries(b, uint64(r.maxRetries))
    }
    return backoff.WithContext(b, ctx)
}

func OperationRetry(operation backoff.OperationWithData[any], b backoff.BackOff) (any, error) {
    return backoff.RetryWithData(func() (any, error) {
        return operation()
    }, b)
}
The OperationRetry function uses the github.com/cenkalti/backoff library to retry a failing operation according to a backoff strategy (e.g., constant backoff). You give it an operation (one that can return a value and an error) and a BackOff policy b. It calls backoff.RetryWithData, which will repeatedly invoke your operation, waiting between attempts as defined by b, until it either succeeds or the backoff policy gives up. The function then returns the last result and error, so callers can see what eventually happened. This makes it easy to wrap transient failures (like brief network glitches) in automatic, controlled retries.
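A usage sketch, assuming the v4 backoff module; doSomething is a hypothetical call that may fail transiently:

// Retry doSomething with a constant 200ms delay, at most 3 retries,
// stopping early if ctx is cancelled.
func fetchWithRetry(ctx context.Context) (any, error) {
    var b backoff.BackOff = backoff.NewConstantBackOff(200 * time.Millisecond)
    b = backoff.WithMaxRetries(b, 3)
    b = backoff.WithContext(b, ctx)

    return OperationRetry(func() (any, error) {
        return doSomething(ctx) // hypothetical call with transient failures
    }, b)
}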
Circuit Breaker
The circuit breaker pattern helps stop a service from constantly trying to reach another service that isn’t working. If something keeps failing, trying again and again wastes resources and slows everything down. Like a real circuit breaker that cuts off electricity to prevent damage, this pattern “opens” the connection after a few failures, blocking more tries for a while. Once the problem is fixed and the service is working again, the connection “closes” and normal communication continues. This keeps the system running more smoothly and avoids making things worse during failures.
This code builds and applies a circuit breaker using the sony/gobreaker library to stop calling a failing service too often. The newCircuitBreaker function reads the configured settings: Interval (how long counts are kept before resetting), Timeout (how long the breaker stays open after tripping), MaxRequests (how many trial requests are allowed in the half-open state), and the number of consecutive failures that trigger a trip. It then creates a breaker with a custom ReadyToTrip function that opens the circuit once the consecutive failure count reaches that threshold.
func newCircuitBreaker(name string, config CircuitBreaker) (*circuitBreaker, error) {
    ...
    cb.breaker = gobreaker.NewCircuitBreaker(gobreaker.Settings{
        Name:        name,
        MaxRequests: maxRequest,
        Interval:    interval,
        Timeout:     timeout,
        ReadyToTrip: tripFn,
    })
    return cb, nil
}
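The tripFn elided above could, for example, be a consecutive-failure check like the following; failureThreshold stands in for the value read from the configuration:

// Illustrative ReadyToTrip: open the breaker once the number of
// consecutive failures reaches the configured threshold.
tripFn := func(counts gobreaker.Counts) bool {
    return counts.ConsecutiveFailures >= uint32(failureThreshold)
}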
The withCircuitBreaker wrapper then runs an operation through that breaker: if the breaker is open or the call fails, the failure is returned, and if retries are also configured and the error is classified as permanent, it is wrapped with backoff.Permanent so the retry logic stops. This setup prevents repeated calls to an unhealthy downstream service while still allowing controlled recovery checks and integrating with the retry policy.
func (p *Policy) withCircuitBreaker(oper Operation) Operation {
    return func(ctx context.Context) (any, error) {
        res, err := p.circuitBreaker.breaker.Execute(func() (any, error) {
            return oper(ctx)
        })
        if p.retry != nil && IsErrorPermanent(err) {
            err = backoff.Permanent(err)
        }
        return res, err
    }
}
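IsErrorPermanent comes from the repo's error classification; as a rough, hypothetical sketch (not the actual implementation, and it needs the errors package), such a check might treat breaker rejections as non-retryable:

// Hypothetical classification: an open or saturated breaker won't recover
// by retrying immediately, so stop the retry loop.
func IsErrorPermanent(err error) bool {
    return errors.Is(err, gobreaker.ErrOpenState) ||
        errors.Is(err, gobreaker.ErrTooManyRequests)
}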
Conclusion
Resilience patterns are strategies used in microservices to keep systems stable and reliable, even when some parts fail. Three common patterns are timeout, retry, and circuit breaker. The timeout pattern sets a limit on how long a service waits for a response, helping the system avoid getting stuck. The retry pattern handles temporary failures by trying the operation again after a short pause. The circuit breaker pattern protects the system by stopping calls to a service that keeps failing and only allowing requests again once it recovers. Together, these patterns help applications stay responsive and fault-tolerant in the face of errors or high demand.