- Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production.
- As we build more and more distributed systems on microservices and cloud platforms, we introduce many moving parts and potential points of failure, which makes these systems harder to predict.
- These architectures depend heavily on network calls to external services, and the number of potential hazards grows with the number of connections between services.
- A very common pitfall is to treat distributed computing systems with the same degree of trust we have for local, non-distributed environments.
- The Fallacies of Distributed Computing help us understand the most common false assumptions we tend to make about distributed systems.
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology doesn’t change
- There is only one administrator
- Transport cost is zero
- The network is homogeneous
- Chaos Engineering principles and practices help test how a system responds to turbulent conditions such as infrastructure failures, unresponsive components and out-of-memory errors.
- The idea is to perform controlled experiments in a distributed environment that help you build confidence in the system’s ability to tolerate the inevitable failures.
- It lets you create scenarios that break the system on purpose so you can observe how it behaves. That way, you can fix weaknesses before they cause unexpected failures that hurt the business and your users.
- There are a number of different tools available to support your Chaos Engineering efforts.
- CHAOS MONKEY: Tests IT infrastructure resilience.
- LITMUS: Provides tools to orchestrate chaos on Kubernetes to help SREs find bugs and vulnerabilities in both staging and production.
- CHAOS TOOLKIT: Enables experimentation at different levels: infrastructure, platform and application.
- GREMLIN: A “failure-as-a-service” platform built to make the Internet more reliable. It turns failure into resilience by offering engineers a fully hosted solution to safely experiment on complex systems, in order to identify weaknesses before they impact customers and cause revenue loss.
- TOXIPROXY: Simulates network conditions to support deterministic tampering with connections, with support for randomized chaos and customization. It can help determine whether an application has a single point of failure.
- In this blog we will explore Chaos Monkey for Spring Boot and see how it can be used to launch attacks on a Spring Boot app.
- This tool lets us add latency to our REST endpoints, throw errors or kill services at random, and then observe how our application behaves.
- Chaos Monkey for Spring Boot (CM4SB) consists of two main parts: Watchers and Assaults.
- Watchers: CM4SB scans a Spring Boot app for specific annotations (as per the configured values).
- The following Spring annotations are supported: @Controller, @RestController, @Service, @Repository and @Component.
- Using AOP, CM4SB identifies the public methods to which the configured assaults are applied. You can further customize the Watcher's behavior with the _watchedCustomServices_ property and thereby decide which classes and public methods are assaulted.
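To make the watcher idea concrete, here is a minimal plain-Java sketch of method interception. It is not CM4SB code: Spring AOP creates its proxies for you, whereas this sketch calls the JDK's java.lang.reflect.Proxy directly. GreetingService, SimpleGreetingService and ChaosWatcher are hypothetical names, and the Runnable passed in as the assault stands in for whichever assault is configured.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interface standing in for a Spring bean annotated with @Service.
interface GreetingService {
    String greet(String name);
}

class SimpleGreetingService implements GreetingService {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

// Hypothetical "watcher": wraps every public interface method so that an assault
// hook runs before the real call. It only mimics the idea of AOP interception.
final class ChaosWatcher {
    private ChaosWatcher() {}

    @SuppressWarnings("unchecked")
    static <T> T watch(T target, Class<T> iface, Runnable assault) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Leave java.lang.Object methods (toString, equals, hashCode) unassaulted.
            if (method.getDeclaringClass() != Object.class) {
                assault.run();
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception unchanged
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}

class WatcherDemo {
    public static void main(String[] args) {
        AtomicInteger assaults = new AtomicInteger();
        GreetingService service = ChaosWatcher.watch(
                new SimpleGreetingService(), GreetingService.class, assaults::incrementAndGet);
        System.out.println(service.greet("Chaos"));              // Hello, Chaos
        System.out.println("assaults run: " + assaults.get());   // assaults run: 1
    }
}
```

The caller keeps using the GreetingService interface and never sees the proxy, which is the same property that lets CM4SB attack beans without any change to application code.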
- Assaults: The most important component of CM4SB. The following assaults are supported:
- Latency Assault – Adds latency to requests. The level setting controls how many requests are attacked (for example, level 5 attacks every 5th request).
- Exception Assault – Throws a RuntimeException on attacked requests, according to the configured values.
- AppKiller Assault – Shuts down the application. The caveat is that once the application is down, it must be restarted manually.
- Memory Assault – Attacks the memory of the Java Virtual Machine (JVM).
- Chaos Monkey Assault Scheduler – Schedules the runtime assaults (Memory and AppKiller) using cron expressions.
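The level setting and the exception assault can be illustrated with a small, self-contained sketch. AssaultLevel and ExceptionAssault are hypothetical classes, not part of CM4SB. The sketch assumes a simple counter that selects every Nth call; the library's own selection logic may differ (for example, it may pick calls randomly), so treat this as an illustration of "1 out of N calls" rather than its implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: with level N, every Nth call is selected for an assault.
// A plain counter keeps this sketch predictable; CM4SB's own logic may differ.
final class AssaultLevel {
    private final int level;
    private final AtomicLong calls = new AtomicLong();

    AssaultLevel(int level) {
        if (level < 1) {
            throw new IllegalArgumentException("level must be at least 1");
        }
        this.level = level;
    }

    // Thread-safe: each call increments the shared counter exactly once.
    boolean shouldAttack() {
        return calls.incrementAndGet() % level == 0;
    }
}

// Hypothetical exception assault: throws a RuntimeException on the selected calls.
final class ExceptionAssault {
    private final AssaultLevel level;

    ExceptionAssault(AssaultLevel level) {
        this.level = level;
    }

    void maybeAttack() {
        if (level.shouldAttack()) {
            throw new RuntimeException("Simulated chaos assault");
        }
    }
}

class LevelDemo {
    public static void main(String[] args) {
        ExceptionAssault assault = new ExceptionAssault(new AssaultLevel(5));
        // Calls 5 and 10 print the simulated failure; all others print "ok".
        for (int call = 1; call <= 10; call++) {
            try {
                assault.maybeAttack();
                System.out.println("call " + call + ": ok");
            } catch (RuntimeException e) {
                System.out.println("call " + call + ": " + e.getMessage());
            }
        }
    }
}
```

With level 1 every call is attacked, which is handy for a quick smoke test; higher levels spread the assaults out so the service stays mostly usable during an experiment.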
- To add Chaos Monkey to a Spring Boot application, add the following dependency to your pom.xml:
<dependency>
    <groupId>de.codecentric</groupId>
    <artifactId>chaos-monkey-spring-boot</artifactId>
    <version>2.1.1</version>
</dependency>
- Configure the Chaos Monkey properties in your application properties or YAML file. The following configuration enables Chaos Monkey and launches one of the configured assaults at a level of 5 (1 out of 5 calls) on calls to the REST endpoints.
- Start your application with the chaos-monkey Spring profile active (for example, by passing --spring.profiles.active=chaos-monkey).
- As long as the property “chaos.monkey.enabled” is not set to “true”, nothing happens. Once the service boots up, you can check the console to make sure that Chaos Monkey is ready to create chaos.
There are two ways to make Chaos Monkey launch assaults. The first is to set a property.
- The second is to use the Chaos Monkey actuator endpoints.
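A minimal sketch of driving the endpoints from Java 11's HttpClient follows. It assumes the application runs on http://localhost:8080 and exposes the chaosmonkey actuator endpoint under the default /actuator base path (typically by setting management.endpoint.chaosmonkey.enabled=true and adding chaosmonkey to management.endpoints.web.exposure.include). The JSON field names follow the example in the CM4SB reference documentation, but verify them against the version you use; ChaosMonkeyClient is a hypothetical helper.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for the Chaos Monkey actuator endpoint.
// Assumes the default /actuator base path and an exposed "chaosmonkey" endpoint.
final class ChaosMonkeyClient {
    private final HttpClient http = HttpClient.newHttpClient();
    private final URI base;

    ChaosMonkeyClient(String baseUrl) {
        // A trailing slash makes URI.resolve append paths instead of replacing the last segment.
        this.base = URI.create(baseUrl.endsWith("/") ? baseUrl : baseUrl + "/");
    }

    // POST /actuator/chaosmonkey/enable switches Chaos Monkey on at runtime.
    HttpRequest enableRequest() {
        return HttpRequest.newBuilder(base.resolve("actuator/chaosmonkey/enable"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // POST /actuator/chaosmonkey/assaults updates the assault settings.
    // Field names are assumed from the CM4SB docs example; check your version.
    HttpRequest latencyAssaultRequest(int level, int rangeStartMs, int rangeEndMs) {
        String json = String.format(
                "{\"level\":%d,\"latencyRangeStart\":%d,\"latencyRangeEnd\":%d,"
                        + "\"latencyActive\":true,\"exceptionsActive\":false,"
                        + "\"killApplicationActive\":false}",
                level, rangeStartMs, rangeEndMs);
        return HttpRequest.newBuilder(base.resolve("actuator/chaosmonkey/assaults"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Sends a request and returns the HTTP status code.
    int send(HttpRequest request) throws IOException, InterruptedException {
        return http.send(request, HttpResponse.BodyHandlers.ofString()).statusCode();
    }
}
```

With the application running, client.send(client.enableRequest()) switches Chaos Monkey on, and client.send(client.latencyAssaultRequest(5, 1000, 5000)) asks for a latency assault of 1 to 5 seconds on every 5th call. Building requests separately from sending them keeps them easy to inspect without a live server.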
- Configuration to add random latency to the REST endpoints. In this case, each delay varies between 1 and 5 seconds; the range values are given in milliseconds.
chaos:
  monkey:
    assaults:
      latency-active: true
      latency-range-start: 1000
      latency-range-end: 5000
Configuration to throw random exceptions
chaos:
  monkey:
    assaults:
      exceptions-active: true
Configuration to kill the service
chaos:
  monkey:
    assaults:
      kill-application-active: true
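The meaning of latency-range-start and latency-range-end can be shown with a short sketch that picks a delay between the two bounds. LatencyAssault is a hypothetical class rather than CM4SB's implementation, and the uniform distribution is an assumption made for illustration. The constructor check also shows why the start of the range must not exceed the end.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical latency assault: waits a random time within [rangeStartMs, rangeEndMs].
// Rejects a start greater than the end, the mistake to avoid when setting
// latency-range-start and latency-range-end.
final class LatencyAssault {
    private final long rangeStartMs;
    private final long rangeEndMs;

    LatencyAssault(long rangeStartMs, long rangeEndMs) {
        if (rangeStartMs < 0 || rangeStartMs > rangeEndMs) {
            throw new IllegalArgumentException(
                    "need 0 <= rangeStartMs <= rangeEndMs, got " + rangeStartMs + ".." + rangeEndMs);
        }
        this.rangeStartMs = rangeStartMs;
        this.rangeEndMs = rangeEndMs;
    }

    // Picks a delay uniformly; nextLong's upper bound is exclusive, hence the + 1.
    long nextDelayMs() {
        return ThreadLocalRandom.current().nextLong(rangeStartMs, rangeEndMs + 1);
    }

    // Blocks the calling thread for the chosen delay, like a slow downstream call.
    void attack() throws InterruptedException {
        Thread.sleep(nextDelayMs());
    }
}

class LatencyDemo {
    public static void main(String[] args) throws InterruptedException {
        LatencyAssault assault = new LatencyAssault(1000, 5000);
        long start = System.nanoTime();
        assault.attack();
        long tookMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("request delayed by ~" + tookMs + " ms");
    }
}
```

Because the delay blocks the request thread, a latency assault is a good way to check whether callers have sensible timeouts instead of assuming that latency is zero.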
- You can check the full documentation of the properties on the following link.
- As you run chaos engineering experiments, you can study the assault results and metrics to improve the system's behavior.
- Metrics are available via the Spring Boot Actuator endpoint if you expose them in simple mode.
- Chaos Engineering should be made part of the development and testing lifecycle of (distributed) applications. Chaos Monkey for Spring Boot is a very useful tool for verifying how systems will behave in production.
- You can check this reference guide for more details
Thank you. Please share your comments and feedback.