Chaosblade: an impressive chaos experiment tool from Alibaba

What is Chaosblade?

Chaosblade is an experimental tool that follows the principles of Chaos Engineering. It simulates common failure scenarios to help improve the recoverability and fault tolerance of distributed systems.

Chaosblade builds on nearly a decade of fault testing and drill practice at Alibaba, and combines the best ideas and practices from across the group's businesses.

Currently supported drill scenarios include operating-system resources such as CPU, disk, process, and network; Java applications such as Dubbo, MySQL, Servlet, and custom class methods (delay or throw an exception); and killing containers and Pods. Run blade create -h to view the details.

The description above is copied from Chaosblade's GitHub page.

GitHub home page: github.com/chaosblade-...

Put plainly, Chaosblade is a fault simulation tool. It can simulate a server whose CPU is maxed out, a full disk, a slow network, a Dubbo service with a long response time, a method in the JVM throwing an exception, slow MySQL calls, and so on. This makes it very useful for large companies, because simulating failures in advance helps ensure the system's high availability and stability.

How do you use Chaosblade?

Usage is simple, a two-step process:

  1. Download the archive and unzip it: github.com/chaosblade-...
  2. The extracted files include an executable named blade. This is the client tool Chaosblade provides, and it is what we mainly use to run fault simulations.

For the blade tool's detailed parameters, see the GitHub home page; I won't cover them here. I mainly want to show specific fault simulations and their effects.

Next I will introduce six Chaosblade usage scenarios:

  • Simulating a full server CPU
  • Simulating a full server disk
  • Simulating a timeout when calling a Dubbo service
  • Simulating a JVM method throwing an exception or having its return value modified
  • Simulating a MySQL call timing out or throwing an exception
  • Simulating a slow server network

Scene One: server CPU full

Before the fault drill, check the CPU state with the top -o CPU command:

[Screenshot: top output before the drill]

Fault drill:

$ ./blade create cpu fullload
{"code":200,"success":true,"result":"a0682a98d0d7d900"}

The command returning success shows that the fault drill started successfully. Now check top -o CPU again:

[Screenshot: top output during the drill]

From the result we can see that Chaosblade itself consumes the CPU, which is how the server's CPU ends up full.
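
The idea of a tool occupying the CPU by itself can be illustrated with a short Java sketch. This is not Chaosblade's implementation; the class name CpuBurner and its burn method are made up for illustration, and the sketch simply assumes that one busy-looping thread per core is enough to push utilization toward 100%.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Chaosblade's code: keep N threads busy for a fixed time.
final class CpuBurner {

    // Spins `threads` busy loops for `millis` milliseconds and returns the
    // total number of loop iterations, so callers can see that work was done.
    static long burn(int threads, long millis) {
        AtomicLong iterations = new AtomicLong();
        long deadline = System.nanoTime() + millis * 1_000_000L;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                long local = 0;
                // Busy loop: never sleeps, so the thread keeps one core occupied.
                while (System.nanoTime() < deadline) {
                    local++;
                }
                iterations.addAndGet(local);
            });
            workers[i].setDaemon(true);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop waiting if interrupted
                break;
            }
        }
        return iterations.get();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long count = burn(cores, 2_000); // load every core for two seconds
        System.out.println("busy-loop iterations: " + count);
    }
}
```

While this runs, top should show the java process near the top of the CPU list, much like the blade process in the drill above.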

Scene Two: server disk full

To simulate a full disk, it is enough to write a large amount of data into a folder, so here we create a /bladedisk folder.

Before the fault drill, the size of the /bladedisk folder is:

$ du -sh /bladedisk/
  0B	/bladedisk/

For the fault drill, run the following command:

./blade create disk fill -d --mount-point /bladedisk --size 1024

Under normal circumstances this creates a file named chaos_filldisk.log.dat in the /bladedisk folder. Its size is set by --size 1024; Chaosblade's documentation describes this value as a size in MB, so the file should be about 1 GB.

I say "under normal circumstances" because I am on macOS, where running the command above reports an error. I have filed the specific error as a GitHub issue; interested readers can take a look at it.

A side note: I wrote the issue in Chinese, but chaosblade-bot automatically translated it into English, which is impressive.

You can try it on your own system; once this issue is resolved, I will update the article. For now it is enough to know that Chaosblade can simulate this scenario and how it works in principle.
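
The principle described above, one large file written into the target folder, can be sketched in a few lines of Java. This is an illustration rather than Chaosblade's code: the class name DiskFiller and its fill method are hypothetical, and only the file name chaos_filldisk.log.dat is taken from the text above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, not Chaosblade's code: fill a directory with one big file.
final class DiskFiller {

    // Writes `sizeMb` megabytes of zeros to chaos_filldisk.log.dat under `dir`
    // and returns the path of the file that was created.
    static Path fill(Path dir, int sizeMb) {
        Path target = dir.resolve("chaos_filldisk.log.dat");
        byte[] chunk = new byte[1024 * 1024]; // 1 MB of zeros per write
        try (OutputStream out = Files.newOutputStream(target)) {
            for (int i = 0; i < sizeMb; i++) {
                out.write(chunk);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        // Defaults to the temp directory; pass a folder such as /bladedisk instead.
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        Path file = fill(dir, 8); // a small size keeps the demo harmless
        System.out.println("wrote " + file.toFile().length() + " bytes to " + file);
        file.toFile().delete(); // deleting the file is what ends the "disk full" state
    }
}
```

Running du -sh on the folder before the file is deleted should show the size growing by the amount requested.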

Scene Three: timeout when calling a Dubbo service

The official project provides a Dubbo demo consisting of a service provider jar and a service consumer jar.

After downloading both jar packages, go to the download directory and run the following commands:

# Start dubbo-provider
nohup java -Djava.net.preferIPv4Stack=true -Dproject.name=dubbo-provider -jar dubbo-provider-1.0-SNAPSHOT.jar > provider.nohup.log 2>&1 &
# Wait about 2 seconds, then start dubbo-consumer
nohup java -Dserver.port=8080 -Djava.net.preferIPv4Stack=true -Dproject.name=dubbo-consumer -jar dubbo-consumer-1.0-SNAPSHOT.jar > consumer.nohup.log 2>&1 &

nohup is a Linux command; together with the trailing & it lets the java command keep running in the background after the terminal is closed.

Once both are running, the service can be called at the following URL:

http://localhost:8080/hello?msg=world

Under normal circumstances, the request completes quickly and returns:

{
"date": "Wed Jul 03 16:33:10 CST 2019",
"msg": "Dubbo Service: Hello world"
}

Fault drill:

$ ./blade prepare jvm --process dubbo.consumer
{"code":200,"success":true,"result":"5cdbc31f46a3d621"}
$ ./blade create dubbo delay --time 3000 --service com.alibaba.demo.HelloService --methodname hello --consumer --process dubbo.consumer
{"code":200,"success":true,"result":"3e705e8babe8a86c"}

The command above adds a 3-second delay when the consumer calls the hello method of the com.alibaba.demo.HelloService service. Requesting the URL above now takes noticeably longer than before.

Dubbo fault drills actually support many finer-grained scenarios. Dubbo has two roles, consumer and provider. When a consumer calls a provider and we want to add a delay to that request, we can add it either on the provider side, for the specified service, or on the consumer side, for the specified service call. Looking again at the command above, it controls the consumer side. The command also supports controlling the provider side, as the help output shows:
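
One way to confirm the added delay from the outside is to time the HTTP request before and after the drill. The following is a minimal sketch under that assumption; RequestTimer and timeGet are hypothetical names, and the URL is the demo URL shown earlier.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper for checking a delay drill from the outside.
final class RequestTimer {

    // Sends a GET request, reads the whole response, and returns the elapsed
    // wall-clock time in milliseconds.
    static long timeGet(String url) {
        long start = System.nanoTime();
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("GET");
            try (InputStream in = conn.getInputStream()) {
                byte[] buffer = new byte[4096];
                while (in.read(buffer) != -1) {
                    // drain the body so the measurement covers the full response
                }
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Demo URL from the article; the dubbo-consumer must be running.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/hello?msg=world";
        try {
            System.out.println("elapsed ms: " + timeGet(url));
        } catch (UncheckedIOException e) {
            System.out.println("request failed: " + e.getCause().getMessage());
        }
    }
}
```

With the delay experiment active, the printed time should be roughly 3000 ms higher than without it.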

blade create dubbo delay --help

The help output includes the following:

Flags:
      --appname string          The consumer or provider application name
      --consumer                To tag consumer role experiment.
      --effect-count string     The count of chaos experiment in effect
      --effect-percent string   The percent of chaos experiment in effect
  -h, --help                    help for delay
      --methodname string       The method name
      --offset string           delay offset for the time
      --process string          Application process name
      --provider                To tag provider experiment
      --service string          The service interface
      --time string             delay time (required)
      --timeout string          set timeout for experiment
      --version string          the service version

The --consumer and --provider flags indicate that the command can control either end of the service call. So if we want to inject the delay on the provider side, so that a particular interface times out when called, the drill supports that as well.

As for the underlying principle, that requires a better understanding of Dubbo. Dubbo has dynamic configuration capabilities, so my guess is that Chaosblade takes advantage of them.

Scene Four: a JVM method throws an exception or has its return value modified

Chaosblade can manipulate a method in the JVM directly, making it throw an exception or changing its return value.

First, prepare a MockJvm class:
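
To make the consumer-side case concrete, here is a minimal sketch of adding latency around a call without touching the service itself, using a JDK dynamic proxy. It is an analogy only and is not how Chaosblade injects the delay; DelayInjector and withDelay are hypothetical names, and the HelloService interface below is a stand-in for the demo's interface.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical sketch: add latency on the caller's side of an interface.
final class DelayInjector {

    // Returns a proxy that sleeps `delayMs` before every call to `methodName`
    // and then forwards the call to the real target unchanged.
    @SuppressWarnings("unchecked")
    static <T> T withDelay(Class<T> type, T target, String methodName, long delayMs) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals(methodName)) {
                Thread.sleep(delayMs); // the injected delay
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        HelloService real = msg -> "Dubbo Service: Hello " + msg;
        HelloService slow = withDelay(HelloService.class, real, "hello", 3000);
        long start = System.nanoTime();
        String reply = slow.hello("world");
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.println(reply + " (after " + elapsedMs + " ms)");
    }
}

// Hypothetical stand-in for the demo's service interface.
interface HelloService {
    String hello(String msg);
}
```

The caller only sees a slower hello call; the real implementation is unaware of the delay, which mirrors the consumer-side experiment. A provider-side delay would instead wrap the implementation where it is served.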

package com;
import java.util.concurrent.TimeUnit;
public class MockJvm {
    public String test() {
        return "test...";
    }

    public static void main(String[] args) throws InterruptedException {
        MockJvm testJVM = new MockJvm();

        while (true) {
            try {
                System.out.println(testJVM.test());
            } catch (Exception e) {
                System.out.println(e.getMessage());
            }
            TimeUnit.SECONDS.sleep(3);
        }
    }
}

This class calls the test method every three seconds and prints its return value. If test throws an exception, the exception is caught and its message is printed. By default test returns "test...". Run the class and leave it running; the console prints the following:

test...
test...
test...
test...

Method throws an exception

$ ./blade prepare jvm --process MockJvm
{"code":200,"success":true,"result":"5ff98509d2334906"}
$ ./blade create jvm throwCustomException --process MockJvm --classname com.MockJvm --methodname test --exception java.lang.Exception
{"code":200,"success":true,"result":"f9052478db2f7ffc"}

The command above makes the test method of the com.MockJvm class in the MockJvm process throw java.lang.Exception. Once the command succeeds, the console of the program started above begins printing the exception:

test...
test...
test...
chaosblade-mock-exception
chaosblade-mock-exception

Use the following command to cancel the drill just started:

# f9052478db2f7ffc is the result ID returned by the create command above
./blade destroy f9052478db2f7ffc

After cancelling, the console returns to normal output:

chaosblade-mock-exception
chaosblade-mock-exception
chaosblade-mock-exception
chaosblade-mock-exception
test...
test...

Modifying the method's return value

Use the following command to modify the return value:

$ ./blade create jvm return --process MockJvm --classname com.MockJvm --methodname test --value hahaha...
{"code":200,"success":true,"result":"9ffce12b1fdc2580"}

The console will print out:

test...
test...
test...
hahaha...
hahaha...
hahaha...

You can see that the test method's return value was modified successfully.
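
The create and destroy flow used in this scene (create returns an ID, destroy takes that ID and restores normal behavior) can be modeled with a small in-process registry. This is a conceptual sketch only: FaultRegistry and all of its methods are hypothetical, and unlike Chaosblade the "instrumented" method here has to call intercept explicitly.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of the create/destroy lifecycle; not Chaosblade's code.
final class FaultRegistry {

    // A fault targets one method name and either throws or supplies a value.
    private static final class Fault {
        final String methodName;
        final Supplier<String> action;

        Fault(String methodName, Supplier<String> action) {
            this.methodName = methodName;
            this.action = action;
        }
    }

    private static final Map<String, Fault> ACTIVE = new ConcurrentHashMap<>();

    // Registers a fault that throws, and returns its experiment ID.
    static String createThrow(String methodName, String message) {
        return register(new Fault(methodName, () -> {
            throw new RuntimeException(message);
        }));
    }

    // Registers a fault that replaces the return value, and returns its ID.
    static String createReturn(String methodName, String value) {
        return register(new Fault(methodName, () -> value));
    }

    // Removes the experiment; returns false if the ID is unknown.
    static boolean destroy(String id) {
        return ACTIVE.remove(id) != null;
    }

    // Runs an active fault for `methodName` if there is one, else the original.
    static String intercept(String methodName, Supplier<String> original) {
        for (Fault fault : ACTIVE.values()) {
            if (fault.methodName.equals(methodName)) {
                return fault.action.get();
            }
        }
        return original.get();
    }

    private static String register(Fault fault) {
        String id = UUID.randomUUID().toString();
        ACTIVE.put(id, fault);
        return id;
    }

    public static void main(String[] args) {
        Supplier<String> test = () -> "test...";
        System.out.println(intercept("test", test));
        String id = createReturn("test", "hahaha...");
        System.out.println(intercept("test", test));
        destroy(id);
        System.out.println(intercept("test", test));
    }
}
```

The point of keeping faults keyed by ID is that cancelling one experiment leaves any others untouched, which matches how blade destroy is used above.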

Scene Five: MySQL call times out or throws an exception

Chaosblade currently supports making a MySQL call time out or throw an exception while a statement executes. Note that this is controlled at the JDBC layer; it does not actually affect the MySQL server.

First, write a test class that uses JDBC:

package com;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.TimeUnit;

public class JDBCConnection {
    public static String url_encrypt="jdbc:mysql://127.0.0.1:3306/test?useSSL=false";
    public static String user="root";
    public static String password="Nice89163";

    public static void main(String[] args) throws Exception
    {
        Class.forName("com.mysql.jdbc.Driver");
        Connection conn  = DriverManager.getConnection(url_encrypt,user,password);
        Statement stmt= conn.createStatement();

        while (true) {
            try {
                // Use a monotonic clock; subtracting second-of-minute values
                // gives wrong results when a query spans a minute boundary.
                long before = System.nanoTime();
                // Release the result set once the timing is done
                try (ResultSet rs = stmt.executeQuery("select * from t_test")) {
                    long after = System.nanoTime();
                    // "执行时间" means "execution time", printed in whole seconds
                    System.out.println("执行时间:" + TimeUnit.NANOSECONDS.toSeconds(after - before));
                }
            } catch (Exception e) {
                System.out.println(e.getMessage());
            }
            TimeUnit.SECONDS.sleep(3);
        }

    }
}

The JDBCConnection class runs SQL directly through JDBC and depends on the mysql-connector-java jar. In my tests, fault simulation worked with one version of mysql-connector-java but not with another (both version numbers are garbled in this copy as "[email protected]"); I did not investigate the reason.

The test runs a select query in a loop. If the select throws an exception, it is caught and its message is printed; otherwise the time taken by the select is printed (the "执行时间" in the output means "execution time", in seconds).

First run the class above; the console keeps printing the following:

执行时间:0
执行时间:0
执行时间:0

Making the MySQL call throw an exception

Run the following command to start the fault simulation:

$ ./blade prepare jvm --process JDBCConnection
{"code":200,"success":true,"result":"f278e66ddb1b4e11"}
$ ./blade create mysql throwCustomException --database test --host 127.0.0.1 --port 3306 --process JDBCConnection --sqltype select --table t_test --exception java.lang.Exception
{"code":200,"success":true,"result":"ddd6799da50f9201"}

After the command succeeds, the console prints the exception:

执行时间:0
执行时间:0
执行时间:0
Unexpected exception encountered during query.
Unexpected exception encountered during query.

Use the following command to cancel the drill just started:

./blade destroy ddd6799da50f9201

After cancelling, the console returns to normal output:

Unexpected exception encountered during query.
Unexpected exception encountered during query.
Unexpected exception encountered during query.
执行时间:0
执行时间:0

Adding a delay to the MySQL call

The following command adds a 4-second delay to select statements. Note again that this is controlled at the JDBC layer.

$ ./blade create mysql delay --database test --host 127.0.0.1 --port 3306 --process JDBCConnection --sqltype select --table t_test --time 4000
{"code":200,"success":true,"result":"8e5b35e76098caab"}

After the command completes, the console prints:

执行时间:0
执行时间:0
执行时间:4
执行时间:4
执行时间:4

Scene Six: slow server network

Chaosblade can also control the network. For example, the following command delays traffic through the eth0 network interface by three seconds:

./blade create network delay --interface eth0 --time 3000

However, this scenario is not supported on the Mac, because it relies on the Linux tc (Traffic Control) command. To simulate it you need a Linux system, so I won't demonstrate it here.

Summary

Originally I was going to write a complete guide to using Chaosblade, but it seems the tool is not fully polished yet, so I will stop here for now and go file the issues mentioned above on GitHub.

Still, I believe this article has given you an understanding of what Chaosblade does and what it is for. If you learned something from it, it has served its purpose.

Innovation comes from pain points: every technology appears in order to solve some pain point. Please help share this article. If you want to be among the first to see more content, follow the WeChat official account: 1:25


Origin juejin.im/post/5d1cab7ef265da1ba77cc018