Introducing a Baby Chaos Monkey for Our Microservices

In a microservice system, the most reliable way to know how resilient you are is to break things on purpose and watch what happens. That’s the idea behind chaos engineering. Netflix’s Chaos Monkey is the famous example: it randomly terminates service instances in production so the team finds out early whether the system can tolerate the loss. Randomly killing live instances is overkill for most teams, especially outside production, but the underlying idea is worth borrowing. So we added a small piece of middleware to one of our services: a manually triggered, route-level failure tool that acts like a baby Chaos Monkey. ...
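
To make "manually triggered, route-level failure" concrete, here is a minimal sketch of what such a middleware could look like. It is not the implementation from our service: the language (Python), the WSGI interface, and every name in it (`BabyChaosMonkey`, `break_route`, `heal_route`, `hello_app`, `call`) are assumptions chosen purely for illustration.

```python
from wsgiref.util import setup_testing_defaults


class BabyChaosMonkey:
    """Hypothetical WSGI middleware: fails only routes that were broken by hand."""

    def __init__(self, app):
        self.app = app
        # Routes are added manually; nothing here fails at random.
        self.failing_routes = set()

    def break_route(self, path):
        """Start returning errors for this exact path."""
        self.failing_routes.add(path)

    def heal_route(self, path):
        """Stop injecting failures for this path."""
        self.failing_routes.discard(path)

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO", "") in self.failing_routes:
            body = b"chaos: simulated failure"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ]
            start_response("503 Service Unavailable", headers)
            return [body]
        # Any route not marked as failing passes through untouched.
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in for the real service (assumption for the demo)."""
    body = b"ok"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def call(app, path):
    """Invoke a WSGI app in-process and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    monkey = BabyChaosMonkey(hello_app)
    monkey.break_route("/orders")
    print(call(monkey, "/orders"))
    print(call(monkey, "/users"))
```

The design choice that matters in this sketch is that the set of failing routes starts empty and changes only when someone calls `break_route`, so the blast radius stays at a single route that a person chose deliberately.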

May 30, 2025