Netty high performance: a source-level look at ByteBuf

Besides the Reactor/NIO thread model mentioned earlier, zero-copy is another important reason for Netty's high performance.

Zero-copy

  • Avoiding an extra copy between user space and the OS: data inside the JVM heap cannot be used by the OS directly, so it has to be copied to off-heap memory first before the OS can use it; direct buffers avoid this copy
  • CompositeByteBuf combines multiple ByteBufs logically and exposes them through a single unified interface, instead of allocating new memory and copying the data into a new ByteBuf (a small sketch follows this list)
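
A minimal sketch of the CompositeByteBuf idea, using Netty's public API (the class and method names are Netty's; the strings are made up for illustration):

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.CompositeByteBuf;
    import io.netty.buffer.Unpooled;
    import io.netty.util.CharsetUtil;

    public class CompositeDemo {
        public static void main(String[] args) {
            ByteBuf header = Unpooled.copiedBuffer("HEADER", CharsetUtil.UTF_8);
            ByteBuf body = Unpooled.copiedBuffer("BODY", CharsetUtil.UTF_8);

            // Combine the two buffers logically; the bytes are not copied into a new buffer.
            CompositeByteBuf composite = Unpooled.compositeBuffer();
            composite.addComponents(true, header, body); // 'true' also advances the writer index

            System.out.println(composite.toString(CharsetUtil.UTF_8)); // HEADERBODY
            composite.release();
        }
    }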

Netty's ByteBuf types

  • Pooled (pooled), Unpooled (non-pooled)

  • Direct (direct buffer, off-heap), Heap (inside the JVM heap)

  • unsafe (uses native calls through sun.misc.Unsafe), safe (usually not spelled out; it is simply the counterpart of the unsafe variant, operating on the JVM heap)

By default, Netty prefers the unsafe implementations.


Pooled / non-pooled (Pooled / Unpooled):

Netty first allocates a contiguous block of memory as a ByteBuf pool. When a ByteBuf is needed it is taken directly from the pool, and it is returned to the pool after use, so a new ByteBuf does not have to be allocated for every use. This matters because allocating off-heap memory is far more expensive than creating an object inside the heap.

Summary: pooling speeds up the process of obtaining a ByteBuf.
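
For reference, a minimal usage sketch of the pooled allocator (real Netty API; the buffer size and contents are arbitrary):

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.PooledByteBufAllocator;

    public class PooledUsageDemo {
        public static void main(String[] args) {
            // Take a direct buffer from the pool ...
            ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(256);
            try {
                buf.writeBytes(new byte[]{1, 2, 3});
            } finally {
                // ... and give it back; for a pooled buffer, release() returns it to the pool
                // instead of freeing the underlying memory.
                buf.release();
            }
        }
    }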


Off-heap / heap (direct / heap):

Heap means the data lives inside the JVM heap; allocation and all operations happen within the JVM.

A direct buffer refers to memory outside the JVM heap, allocated through native methods. This memory can be used by the OS directly, whereas heap memory must be copied once before the OS can use it. With a direct buffer the actual storage is off-heap; the Java object (DirectByteBuf) only holds the reader/writer indexes, the memory address, the offset, and so on, and reads and writes are performed natively against the off-heap data.

Summary: off-heap memory avoids the extra copy and improves efficiency.


unsafe

Unsafe is a class provided by sun.misc. It can operate on memory directly through native methods, which of course also improves efficiency; the off-heap allocation and access described above are done through Unsafe. However, you have to be very familiar with memory operations to use it, otherwise it is very easy to make mistakes, so the official name "unsafe" is well deserved.

Summary: direct memory access improves efficiency but is error-prone.
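
As an illustration, a minimal sketch of operating on native memory through sun.misc.Unsafe (this works on JDK 8; newer JDKs restrict access to this class):

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    public class UnsafeDemo {
        public static void main(String[] args) throws Exception {
            // The constructor is private, so grab the singleton via reflection.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe unsafe = (Unsafe) f.get(null);

            // Allocate 16 bytes of native (off-heap) memory, write/read it, then free it.
            long address = unsafe.allocateMemory(16);
            unsafe.putLong(address, 42L);
            System.out.println(unsafe.getLong(address)); // 42
            unsafe.freeMemory(address); // forgetting this leaks native memory
        }
    }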




Classes and concepts used by the pool

PoolArena, PoolChunk, PoolThreadLocalCache, PoolSubpage, Recycler

  • PoolArena: "Arena" means a stage or venue; as the name suggests, this class provides the environment in which pooling operations take place
  • PoolChunk: a block of memory requested by Netty; it stores its chunkSize, its offset, and the remaining free space (freeSize). Per the official description: to find the smallest region that satisfies a request, a complete binary tree is constructed, much like a heap (the nodes within a chunk form a complete binary tree)
  • PoolThreadLocalCache: a thread-local variable that caches PoolArena -> chunk (-> page -> subpage)
  • PoolSubpage: a page located at the bottom layer of the chunk
  • Recycler: as the "recycle bin" name suggests, this is an abstract class whose main role is to fetch a recycled ByteBuf from a ThreadLocal

Note: both PoolThreadLocalCache and Recycler use ThreadLocal variables to reduce contention between threads and improve efficiency.


Several important property values.

maxOrder, default 11: depth of the complete binary tree (the root is level 0, so there are maxOrder + 1 levels in total)

pageSize, default 8192 (8 KB): the default size of a page, i.e. a leaf node at the bottom of the complete binary tree

pageShifts, default 13: the log2 of pageSize, i.e. 2 ^ pageShifts = pageSize; since pageSize defaults to 8192, this defaults to 13

chunkSize, default 16 MB (pageSize << maxOrder): the size of each chunk; in the chunk diagram below, each layer sums to this size

The minimum allocation unit inside a page is 16 bytes; this number 16 is important and is used in several key calculations later.
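
The relationships between these defaults can be checked with a few lines of arithmetic (the values are the defaults stated above):

    public class PoolDefaults {
        public static void main(String[] args) {
            int pageSize = 8192;                  // default page size
            int maxOrder = 11;                    // default tree depth
            int pageShifts = Integer.numberOfTrailingZeros(pageSize); // 13, since 2^13 = 8192
            int chunkSize = pageSize << maxOrder; // 8192 * 2^11 = 16 MB

            System.out.println(pageShifts); // 13
            System.out.println(chunkSize);  // 16777216
        }
    }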


ByteBuf size classes:

  • size < 512: tiny
  • 512 <= size < 8192: small
  • 8192 <= size <= 16m: normal
  • size > 16m: huge
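
These boundaries can be summarized in a small helper (a hypothetical method for illustration, not Netty's actual code):

    static String sizeClass(int size) {
        if (size < 512) return "tiny";
        if (size < 8192) return "small";
        if (size <= 16 * 1024 * 1024) return "normal";
        return "huge";
    }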

Chunk structure

Each layer sums to 16 MB, and splitting continues down to the bottom layer where each page is 8192 bytes (8 KB), so the bottom layer has 2048 nodes (not all are drawn here). Subpage operations, of course, take place on these pages.




Allocating ByteBuffer off-heap vs on-heap

A simple test

A simple test comparing the time to allocate off-heap (direct) memory with the time to allocate heap memory:
	static void nioAllocTest(){
        int num = 10;
        int cnt = 100;
        int size = 256;
        ByteBuffer buf;

        long start1,end1,start2,end2;
        long sum1,sum2;
        for(int i = 0;i<num;i++){
            sum1=sum2=0;
            int j;
            for(j = 0;j<cnt;j++) {
                start1 = System.nanoTime();
                buf = ByteBuffer.allocateDirect(size);
                end1 = System.nanoTime();
                sum1+=(end1-start1);
//                System.out.println("direct allocation time: " + (end1-start1));

                start2 = System.nanoTime();
                buf = ByteBuffer.allocate(size);
                end2 = System.nanoTime();
//                System.out.println("heap allocation time: " + (end2-start2));
//                System.out.println("-----");
                sum2+=(end2-start2);
            }
            System.out.println(String.format("第 %s 轮申请 %s 次 %s 字节平均耗时 [direct: %s , heap: %s].",i,j,size,sum1/cnt, sum2/cnt));
        }
    }

The output is:

Round 0: 100 allocations of 256 bytes, average time [direct: 4864, heap: 1616].
Round 1: 100 allocations of 256 bytes, average time [direct: 5763, heap: 1641].
Round 2: 100 allocations of 256 bytes, average time [direct: 4771, heap: 1672].
Round 3: 100 allocations of 256 bytes, average time [direct: 4961, heap: 883].
Round 4: 100 allocations of 256 bytes, average time [direct: 3556, heap: 870].
Round 5: 100 allocations of 256 bytes, average time [direct: 5159, heap: 726].
Round 6: 100 allocations of 256 bytes, average time [direct: 3739, heap: 843].
Round 7: 100 allocations of 256 bytes, average time [direct: 3910, heap: 221].
Round 8: 100 allocations of 256 bytes, average time [direct: 2191, heap: 590].
Round 9: 100 allocations of 256 bytes, average time [direct: 1624, heap: 615].

As you can see, allocating direct (off-heap) memory is noticeably more expensive than allocating inside the JVM heap, several times slower here (the sample size is small, so the numbers may not be precise; interested readers can try larger or smaller sizes and may find some "interesting" results).


Pooled vs non-pooled ByteBuf allocation

A simple test

A simple test of the effect of pooling:
	static void nettyPooledTest(){
        try {
            int num = 10;
            int cnt = 100;
            int size = 8192;
            ByteBuf direct1, direct2, heap1, heap2;

            long start1, end1, start2, end2, start3, end3, start4, end4;
            long sum1, sum2, sum3, sum4;
            for (int i = 0; i<num; i++) {
                sum1 = sum2 = sum3 = sum4 = 0;
                int j;
                for (j = 0; j<cnt; j++) {

                    start1 = System.nanoTime();
                    direct1 = PooledByteBufAllocator.DEFAULT.directBuffer(size);
                    end1 = System.nanoTime();
                    sum1 += (end1-start1);

                    start2 = System.nanoTime();
                    direct2 = UnpooledByteBufAllocator.DEFAULT.directBuffer(size);
                    end2 = System.nanoTime();
                    sum2 += (end2-start2);

                    start3 = System.nanoTime();
                    heap1 = PooledByteBufAllocator.DEFAULT.heapBuffer(size);
                    end3 = System.nanoTime();
                    sum3 += (end3-start3);

                    start4 = System.nanoTime();
                    heap2 = UnpooledByteBufAllocator.DEFAULT.heapBuffer(size);
                    end4 = System.nanoTime();
                    sum4 += (end4-start4);

                    direct1.release();
                    direct2.release();
                    heap1.release();
                    heap2.release();
                }
                System.out.println(String.format("Netty 第 %s 轮申请 %s 次 [%s] 字节平均耗时 [direct.pooled: [%s] , direct.unpooled: [%s] , heap.pooled: [%s] , heap.unpooled: [%s]].", i, j, size, sum1/cnt, sum2/cnt, sum3/cnt, sum4/cnt));
            }
        }catch(Exception e){
            e.printStackTrace();
        }finally {

        }
    }

The final output of the results:

Netty round 0: 100 allocations of [8192] bytes, average time [direct.pooled: [1784931], direct.unpooled: [105310], heap.pooled: [202306], heap.unpooled: [23317]].
Netty round 1: 100 allocations of [8192] bytes, average time [direct.pooled: [12849], direct.unpooled: [15457], heap.pooled: [12671], heap.unpooled: [12693]].
Netty round 2: 100 allocations of [8192] bytes, average time [direct.pooled: [13589], direct.unpooled: [14459], heap.pooled: [18783], heap.unpooled: [13803]].
Netty round 3: 100 allocations of [8192] bytes, average time [direct.pooled: [10185], direct.unpooled: [11644], heap.pooled: [9809], heap.unpooled: [12770]].
Netty round 4: 100 allocations of [8192] bytes, average time [direct.pooled: [15980], direct.unpooled: [53980], heap.pooled: [5641], heap.unpooled: [12467]].
Netty round 5: 100 allocations of [8192] bytes, average time [direct.pooled: [4903], direct.unpooled: [34215], heap.pooled: [6659], heap.unpooled: [12311]].
Netty round 6: 100 allocations of [8192] bytes, average time [direct.pooled: [2445], direct.unpooled: [7197], heap.pooled: [2849], heap.unpooled: [11010]].
Netty round 7: 100 allocations of [8192] bytes, average time [direct.pooled: [2578], direct.unpooled: [4750], heap.pooled: [3904], heap.unpooled: [255689]].
Netty round 8: 100 allocations of [8192] bytes, average time [direct.pooled: [1855], direct.unpooled: [3492], heap.pooled: [37822], heap.unpooled: [3983]].
Netty round 9: 100 allocations of [8192] bytes, average time [direct.pooled: [1932], direct.unpooled: [2961], heap.pooled: [1825], heap.unpooled: [6098]].

The main thing to look at here is the direct buffers: frequently allocating off-heap memory hurts server performance, and this is where pooling starts to pay off. With pooling, a sufficiently large block of memory is requested only once up front; afterwards objects are simply taken from the pool and returned to it when done, rather than being allocated individually each time, which saves the cost of allocating off-heap space on every subsequent use.


ByteBuf implementation details

Here we look at what I personally consider the most important part: the default type Netty uses, PooledUnsafeDirectByteBuf. We start from its allocation entry point, PooledByteBufAllocator.DEFAULT.directBuffer().

Stepping into PooledByteBufAllocator.DEFAULT.directBuffer():

  // The first method to analyze
  protected ByteBuf newDirectBuffer(int initialCapacity, int maxCapacity) {
      // Get a thread-local cache pool from the ThreadLocal
      PoolThreadCache cache = (PoolThreadCache)this.threadCache.get();
      // The cache holds both heap and direct arenas; take the direct arena
      PoolArena<ByteBuffer> directArena = cache.directArena;
      Object buf;
      if (directArena != null) {
        buf = directArena.allocate(cache, initialCapacity, maxCapacity); // continue below -- 1
      } else {
        // If there is no direct arena, allocate an off-heap ByteBuf directly, preferring unsafe
        buf = PlatformDependent.hasUnsafe() ? UnsafeByteBufUtil.newUnsafeDirectByteBuf(this, initialCapacity, maxCapacity) : new UnpooledDirectByteBuf(this, initialCapacity, maxCapacity);
      }

      return toLeakAwareBuffer((ByteBuf)buf);
    }

  // 1  directArena.allocate(cache, initialCapacity, maxCapacity);
  PooledByteBuf<T> allocate(PoolThreadCache cache, int reqCapacity, int maxCapacity) {
      // newByteBuf(maxCapacity) has two implementations: directArena and heapArena
      // The pooled version reuses a ByteBuf from the Recycler
      PooledByteBuf<T> buf = newByteBuf(maxCapacity); // -- 2
      allocate(cache, buf, reqCapacity); // -- 7
      return buf;
    }
	
  // 2 newByteBuf(maxCapacity)
  protected PooledByteBuf<ByteBuffer> newByteBuf(int maxCapacity) {
      // Prefer PooledUnsafeDirect
      if (HAS_UNSAFE) {
        // PooledUnsafeDirect
        return PooledUnsafeDirectByteBuf.newInstance(maxCapacity); // -- 3
      } else {
        // PooledDirect
        return PooledDirectByteBuf.newInstance(maxCapacity);
      }
    }

  // 3 PooledUnsafeDirectByteBuf.newInstance
  static PooledUnsafeDirectByteBuf newInstance(int maxCapacity) {
      // Get a ByteBuf from the recycling ThreadLocal
      PooledUnsafeDirectByteBuf buf = RECYCLER.get();	// -- 4
      // Reset the ByteBuf's indexes etc.
      buf.reuse(maxCapacity);	// -- 6
      return buf;
    }

  // 4 Recycler.get()
  public final T get() {
      if (maxCapacityPerThread == 0) {
        return newObject((Handle<T>) NOOP_HANDLE);
      }
      // Each thread has its own stack
      Stack<T> stack = threadLocal.get();
      // Pop a handle
      DefaultHandle<T> handle = stack.pop();
      // If the stack has no handle, create a new one
      if (handle == null) {
        handle = stack.newHandle();
        // newObject is implemented by the caller; each ByteBuf type creates its own kind of ByteBuf
        // handle.value is the ByteBuf; following from above, here it is a PooledUnsafeDirectByteBuf
        handle.value = newObject(handle); // -- 5
      }
      // Return a ByteBuf
      return (T) handle.value;
    }
		
  // 5 Stack.pop(): take a handle from the stack
  DefaultHandle<T> pop() {
      int size = this.size;
      if (size == 0) {
        if (!scavenge()) {
          return null;
        }
        size = this.size;
      }
      size --;
      // Take the handle at the top of the stack
      DefaultHandle ret = elements[size];
      elements[size] = null;
      if (ret.lastRecycledId != ret.recycleId) {
        throw new IllegalStateException("recycled multiple times");
      }
      // Reset the handle's info
      ret.recycleId = 0;
      ret.lastRecycledId = 0;
      this.size = size;
      return ret;
    }

  // 6 Before reusing a ByteBuf, reset its previous indexes etc.
  final void reuse(int maxCapacity) {
      maxCapacity(maxCapacity);
      setRefCnt(1);
      setIndex0(0, 0);
      discardMarks();
    }	

Steps 1 to 6 above: the direct arena is obtained from the PoolThreadLocalCache, and a thread-local ByteBuf is obtained from the RECYCLER stack according to the requested size; that is, a ByteBuf is popped from the stack and its read/write indexes are reset.


With that, step 2 of the traced code is finished; next we move on to step 7.

PooledByteBuf<T> allocate(PoolThreadCache cache, int reqCapacity, int maxCapacity) {
      // newByteBuf(maxCapacity) has two implementations: directArena and heapArena
      // The pooled version reuses a ByteBuf from the Recycler
      PooledByteBuf<T> buf = newByteBuf(maxCapacity); // -- 2
      allocate(cache, buf, reqCapacity); // -- 7
      return buf;
    }

As described above, we obtained a ByteBuf from the RECYCLER's thread-local stack and reset its read/write indexes. What comes next can be considered the key part. Let's keep following the code.

	// allocate(cache, buf, reqCapacity); -- 7
	// This whole section is important and there is a lot of code; normal (> 8192) and huge (> 16m) are not analyzed for now
	private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf, final int reqCapacity) 	{
        // Compute the normalized size to allocate
        final int normCapacity = normalizeCapacity(reqCapacity); // -- 8

        // Is the requested size smaller than one page (default 8192)?
        if (isTinyOrSmall(normCapacity)) { // capacity < pageSize
            int tableIdx;
            PoolSubpage<T>[] table;
            // reqCapacity < 512 
            boolean tiny = isTiny(normCapacity);
            if (tiny) { // < 512 is tiny
                // Allocate tiny-sized space
                if (cache.allocateTiny(this, buf, reqCapacity, normCapacity)) {
                    return;
                }
                // Compute which subpage pool it belongs to; tiny is in units of 16 B
                tableIdx = tinyIdx(normCapacity);
                table = tinySubpagePools;
            } else {
                // 8192 > reqCapacity >= 512 is small
                // small is in units of 1024
                if (cache.allocateSmall(this, buf, reqCapacity, normCapacity)) {
                    return;
                }
                tableIdx = smallIdx(normCapacity);
                table = smallSubpagePools;
            }

            // head points to the head of its slot in the table
            final PoolSubpage<T> head = table[tableIdx];

            /**
             * Synchronize on the head. This is needed as {@link PoolChunk#allocateSubpage(int)} and
             * {@link PoolChunk#free(long)} may modify the doubly linked list as well.
             */
            synchronized (head) {
                final PoolSubpage<T> s = head.next;
                // Check whether a subpage of this size class has already been added
                // If so, operate directly on that subpage, recording the marker bits etc.
                if (s != head) {
                    assert s.doNotDestroy && s.elemSize == normCapacity;
                    // Index within the subpage's bitmap
                    long handle = s.allocate();
                    assert handle >= 0;
                    // Initialize the buf with the subpage's information
                    s.chunk.initBufWithSubpage(buf, handle, reqCapacity);
                    // Count the allocation
                    incTinySmallAllocation(tiny);
                    return;
                }
            }
          
            // First time a ByteBuf of this size class is created: a new subpage is needed
            synchronized (this) {
                allocateNormal(buf, reqCapacity, normCapacity);
            }

            // Increase the counter
            incTinySmallAllocation(tiny);
            return;
        }
  }


First, the size that should actually be allocated for the ByteBuf is computed:


    // 8 The following code is inside normalizeCapacity(reqCapacity)
    // If reqCapacity >= 512, use the same rounding algorithm as HashMap's resize
    // reqCapacity < 512 (tiny): round reqCapacity up to a multiple of 16
	if (!isTiny(reqCapacity)) { 
    // Look familiar? Same as HashMap resizing: find the smallest power of two not less than the original number
    int normalizedCapacity = reqCapacity;
    normalizedCapacity --;
    normalizedCapacity |= normalizedCapacity >>>  1;
    normalizedCapacity |= normalizedCapacity >>>  2;
    normalizedCapacity |= normalizedCapacity >>>  4;
    normalizedCapacity |= normalizedCapacity >>>  8;
    normalizedCapacity |= normalizedCapacity >>> 16;
    normalizedCapacity ++;

    //
    if (normalizedCapacity < 0) {
      normalizedCapacity >>>= 1;
    }
    assert directMemoryCacheAlignment == 0 || (normalizedCapacity & directMemoryCacheAlignmentMask) == 0;

    return normalizedCapacity;
  }

	// reqCapacity < 512
	// Already a multiple of 16, nothing to do
	if ((reqCapacity & 15) == 0) {
    	return reqCapacity;
  	}
	// Not a multiple of 16: round up to the next multiple of 16
	return (reqCapacity & ~15) + 16; 
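A standalone re-implementation of the rounding above, for illustration only (not Netty's exact code, and it ignores the alignment branch):

    static int normalize(int reqCapacity) {
        if (reqCapacity >= 512) {
            int n = reqCapacity - 1;
            n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
            return n + 1;                    // next power of two, e.g. 1000 -> 1024
        }
        if ((reqCapacity & 15) == 0) {
            return reqCapacity;              // already a multiple of 16
        }
        return (reqCapacity & ~15) + 16;     // round up to a multiple of 16, e.g. 100 -> 112
    }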

Since small and tiny are handled very similarly, we take tiny as the example.

// Allocate tiny-sized space
if (cache.allocateTiny(this, buf, reqCapacity, normCapacity)) {
  return;
}
// Compute which subpage pool it belongs to; tiny is in units of 16 bytes
tableIdx = tinyIdx(normCapacity);
table = tinySubpagePools;

// head points to the head of its slot in the table
final PoolSubpage<T> head = table[tableIdx];

Notice tinySubpagePools here. As the name suggests, it is where tiny subpages are stored; tracing the code shows it is initialized in the constructor:

tinySubpagePools = newSubpagePoolArray(numTinySubpagePools);
// Initialize the heads for the 32 tiny subpage size classes; only the heads are recorded here
for (int i = 0; i < tinySubpagePools.length; i ++) {
  tinySubpagePools[i] = newSubpagePoolHead(pageSize);
}
// 512 / 16 = 32
static final int numTinySubpagePools = 512 >>> 4;

numTinySubpagePools is a static field. 512 is the boundary between small and tiny, and 512 >>> 4 = 32. Why an unsigned right shift by 4? Remember the basic allocation unit of a subpage mentioned above: 16 bytes. So this computes, in steps of 16 bytes from 16 up to 512, how many tiny size classes there are; tinySubpagePools is effectively [16, 32, 48, ..., 512]. The tinyIdx(int normCapacity) call above works out which size class a ByteBuf belongs to and returns its index into tinySubpagePools, so the head of the corresponding pool slot can be looked up. The constructor only initializes the heads; an actual allocation does not use the head directly but adds a new subpage and links it to the head in a doubly linked list. Following the order of the code above, the next step goes into PoolSubpage (init or allocate).
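The index math described above boils down to shifting by 4, i.e. dividing by the 16-byte unit (a sketch mirroring the described behavior, not necessarily the exact Netty source):

    static final int NUM_TINY_SUBPAGE_POOLS = 512 >>> 4; // 32 slots: 16, 32, 48, ..., 512 bytes

    static int tinyIdx(int normCapacity) {
        return normCapacity >>> 4; // e.g. 16 -> 1, 32 -> 2, ..., 496 -> 31
    }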


The role of some PoolSubpage fields

  final PoolChunk<T> chunk;
  // Index of the page node where this subpage lives
  private final int memoryMapIdx;
  // Offset of this subpage within the chunk, in units of pageSize (default 8192)
  private final int runOffset;
  // default 8192
  private final int pageSize;
  // Default 8 longs; a long is 64 bits, 8 * 64 = 512, and 512 * 16 (the minimum subpage unit) = 8192 (one default page)
  // In other words, a page is divided into 512 slots of 16 bytes, each marked used/free by one bit; a long has 64 bits, so 512 / 64 = 8 longs are needed as the bitmap
  private final long[] bitmap;
  // The maximum number of elemSize-sized ByteBufs that one page can hold
  // maxNumElems = pageSize / elemSize
  private int maxNumElems;
  // Number of elemSize-sized slots still available in this page
  private int numAvail;
  // Records how much of the bitmap is actually usable, since elemSize is not always 16; see PoolSubpage's init(): bitmapLength = maxNumElems >>> 6;
  private int bitmapLength;
  // So init() only initializes bitmapLength longs:
	/**
	* for (int i = 0; i < bitmapLength; i ++) {
  *             bitmap[i] = 0;
  * }
  */          


To summarize: a page is 8192 bytes. maxNumElems is the maximum number of elements of the given elemSize that fit into the page; from this, bitmapLength, the number of longs actually needed as marker bits, is computed; finally the bitmap is initialized. The bitmap marks which elemSize-sized slots of the page are already in use (the array itself is sized for the worst case of 16-byte elements).
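
Plugging in concrete numbers makes the sizing clearer (a sketch assuming the default pageSize of 8192 and an example elemSize of 32):

    int pageSize = 8192;
    int elemSize = 32;                         // example element size (a multiple of 16)
    int maxNumElems = pageSize / elemSize;     // 256 elements fit in this page
    int bitmapLength = maxNumElems >>> 6;      // 256 / 64 = 4 longs are actually used
    long[] bitmap = new long[pageSize >>> 10]; // 8 longs reserved for the worst case (elemSize = 16)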

PoolSubpage also has a very important method, toHandle(). It packs the node index memoryMapIdx and bitmapIdx together into a single long handle; from this handle one can recover the corresponding node (via memoryMapIdx) and the offset within that node's page (bitmapIdx * elemSize).

  private long toHandle(int bitmapIdx) {
        // Later (int) handle turns this value back into memoryMapIdx, the owning node index
        return 0x4000000000000000L | (long) bitmapIdx << 32 | memoryMapIdx;
  }
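Going the other way, the two indexes can be recovered from the handle like this (a sketch; the masks match those used in the code above and below):

    static void decode(long handle) {
        int memoryMapIdx = (int) handle;                    // low 32 bits: node index in the chunk
        int bitmapIdx = (int) (handle >>> 32) & 0x3FFFFFFF; // high bits minus the flag bit: bitmap index
        System.out.println(memoryMapIdx + " / " + bitmapIdx);
    }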

Now that the meaning of the subpage fields has been covered, let's continue with the code above:

This block runs after the head node for the requested size class has been obtained from the table. s != head checks whether a subpage of the same size has already been added; if so, initBufWithSubpage operates directly on that existing subpage, otherwise allocateNormal(buf, reqCapacity, normCapacity) is called further down to allocate a new subpage.

  synchronized (head) {
    final PoolSubpage<T> s = head.next;
    // Check whether a subpage of this size class has already been added
    // If so, operate directly on that subpage, recording the marker bits etc.
    if (s != head) {
      assert s.doNotDestroy && s.elemSize == normCapacity;
      // 在 subPage 的 bitmap 中的下标 && 节点下标
      long handle = s.allocate();
      assert handle >= 0;
      // Initialize the buf with this subpage's information
      s.chunk.initBufWithSubpage(buf, handle, reqCapacity);
      // Count the allocation
      incTinySmallAllocation(tiny);
      return;
    }
  }

Tracing into initBufWithSubpage, you eventually reach:

buf.init(
        this, handle,
        runOffset(memoryMapIdx) + (bitmapIdx & 0x3FFFFFFF) * subpage.elemSize + offset,
            reqCapacity, subpage.elemSize, arena.parent.threadCache());

runOffset(memoryMapIdx): memoryMapIdx is the node index, and runOffset is that node's offset within the chunk, in units of 8192 bytes per node. (bitmapIdx & 0x3FFFFFFF) * subpage.elemSize: the offset of slot bitmapIdx within the subpage. offset: the chunk's own offset.

The sum of these three parts is the concrete offset of this buffer within the whole pooled memory.
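With hypothetical numbers, the final offset computation looks like this:

    // All values below are made up for illustration.
    int runOffset = 3 * 8192;   // the page node starts at the 4th page of its chunk
    int bitmapIdx = 5;          // the 6th element inside that page
    int elemSize = 16;          // tiny allocation unit
    int chunkOffset = 0;        // offset of the chunk itself (e.g. alignment padding)

    int bufOffset = runOffset + bitmapIdx * elemSize + chunkOffset; // 24576 + 80 + 0 = 24656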




This article reflects my personal understanding; if anything is wrong, I hope you will point it out.


Origin: juejin.im/post/5db8ea506fb9a02061399ab3