(*This article is based on Netty 4.1.22.)

Overview

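When I covered PoolArena allocation, I mentioned that an allocation is first attempted from the thread cache. That thread cache is PoolThreadCache. (From here on, several similarly named classes appear, so be careful not to mix them up (Ｔ▽Ｔ).) Netty reduces contention between threads in two ways: PooledByteBufAllocator holds multiple PoolArenas, and each thread gets its own cache. Both reduce contention, at the cost of extra complexity.
The two lines of code below serve as the starting point for the overall architecture: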

        PooledByteBufAllocator allocator = new PooledByteBufAllocator();
        ByteBuf buffer = allocator.buffer(1000);

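PooledByteBufAllocator is the pooled allocator and the entry point for memory allocation. Internally it contains the memory structures covered earlier: Arena, Chunk, Page, and so on. So how does it relate to PoolThreadCache? The diagram below is my summary of how the structures inside PooledByteBufAllocator fit together.
[Figure: relationships inside PooledByteBufAllocator]
The role of each structure is described below: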

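heapArenas/directArenas

PooledByteBufAllocator actually contains both heapArenas and directArenas, one for heap memory and one for direct (off-heap) memory; the diagram leaves this out.
Each is an array of Arenas, 8 in the diagram (the default count is derived from the number of CPU cores and the available memory). Allocating from an Arena requires taking a lock, so spreading several Arenas across threads reduces contention on any single one.
- So how does a thread get an Arena from the array? Is there a pattern to the assignment?

Each Arena keeps a counter of how many thread caches are bound to it. When a thread obtains an Arena, that Arena's counter is incremented by 1, and the Arena with the smallest counter is always the one chosen for the current thread.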
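
To make the selection rule concrete, here is a minimal standalone sketch. ArenaSketch and LeastUsedArenaDemo are hypothetical names used only for illustration; they model nothing but the counter and are not Netty classes.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for PoolArena: only the thread-cache counter is modeled.
class ArenaSketch {
    final int id;
    final AtomicInteger numThreadCaches = new AtomicInteger();

    ArenaSketch(int id) {
        this.id = id;
    }
}

class LeastUsedArenaDemo {
    static ArenaSketch[] newArenas(int n) {
        ArenaSketch[] arenas = new ArenaSketch[n];
        for (int i = 0; i < n; i++) {
            arenas[i] = new ArenaSketch(i);
        }
        return arenas;
    }

    // Return the arena bound to the fewest thread caches; the first one wins ties.
    static ArenaSketch leastUsed(ArenaSketch[] arenas) {
        if (arenas == null || arenas.length == 0) {
            return null;
        }
        ArenaSketch min = arenas[0];
        for (int i = 1; i < arenas.length; i++) {
            if (arenas[i].numThreadCaches.get() < min.numThreadCaches.get()) {
                min = arenas[i];
            }
        }
        return min;
    }

    // Simulate `threads` threads binding one after another; returns the chosen arena ids.
    static int[] assign(ArenaSketch[] arenas, int threads) {
        int[] chosen = new int[threads];
        for (int t = 0; t < threads; t++) {
            ArenaSketch arena = leastUsed(arenas);
            arena.numThreadCaches.getAndIncrement();
            chosen[t] = arena.id;
        }
        return chosen;
    }

    public static void main(String[] args) {
        // 7 threads spread over 3 arenas
        System.out.println(Arrays.toString(assign(newArenas(3), 7)));
    }
}
```

As long as no thread exits and gives its Arena back, the rule behaves like round-robin: in this sketch 7 threads over 3 arenas are assigned arenas 0, 1, 2, 0, 1, 2, 0.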

PoolThreadLocalCache

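Note that this is PoolThreadLocalCache, not PoolThreadCache; the names are similar. PooledByteBufAllocator holds one PoolThreadLocalCache instance, whose main job is to create PoolThreadCache objects. Each PoolThreadCache it creates is associated with an Arena and is tied to a thread. See the UML diagram of PoolThreadLocalCache (I am not great at UML, so treat it as a rough sketch).

PoolThreadLocalCache has an initialValue method that creates a PoolThreadCache. It extends FastThreadLocal (none of the four classes in the diagram is Java's ThreadLocal). FastThreadLocal first obtains an InternalThreadLocalMap instance through InternalThreadLocalMap.get(). InternalThreadLocalMap extends UnpaddedInternalThreadLocalMap, whose slowThreadLocalMap field is a JDK ThreadLocal, and for an ordinary thread the InternalThreadLocalMap is read from that ThreadLocal. In other words, the InternalThreadLocalMap is bound to a thread. The PoolThreadCache is stored in indexedVariables, an array declared in the parent class of InternalThreadLocalMap, which completes the binding between PoolThreadCache and the thread (a bit convoluted =_=). The relationship is shown in the figure below:

The three are in a one-to-one relationship; the implementation is analyzed later.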

PoolThreadCache

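PoolThreadCache has six arrays of MemoryRegionCache, three for heap and three for direct, and each group is split into tiny/small/normal. These tiny/small/normal classes already appeared in Arena and Chunk allocation, so you can guess that they hold memory blocks of different sizes, and that is indeed the case here.
PoolThreadCache also holds Arenas. PoolThreadCache does not allocate through them itself; it mainly uses some of their fields. The most important one: every time a PoolThreadCache is associated with an Arena, that Arena's thread-cache count field (numThreadCaches) is incremented by 1.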

MemoryRegionCache

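As the diagram shows, it is implemented as a Queue. When PoolThreadCache has picked the MemoryRegionCache for the right class (tiny/small/normal) and allocates from it, it polls one element from the queue. That element refers to a Chunk (plus a handle inside it), so once the poll succeeds the memory can be used directly.
On release, the Chunk and handle are put back into the queue so that the next allocation can take them from there.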
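
The idea can be sketched with a bounded queue. This is only an illustration under my own assumptions: RegionCacheSketch and its Entry are hypothetical, a String stands in for the Chunk, and ArrayBlockingQueue stands in for the queue type Netty actually uses.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

class RegionCacheSketch {
    // Hypothetical entry: a cached region is identified by its chunk and a handle inside it.
    static final class Entry {
        final String chunk;
        final long handle;

        Entry(String chunk, long handle) {
            this.chunk = chunk;
            this.handle = handle;
        }
    }

    private final Queue<Entry> queue;

    RegionCacheSketch(int capacity) {
        // Bounded queue: offers are rejected once the cache is full.
        queue = new ArrayBlockingQueue<>(capacity);
    }

    // Called on release. Returns false when the cache is full,
    // in which case the caller has to hand the region back to the arena.
    boolean add(String chunk, long handle) {
        return queue.offer(new Entry(chunk, handle));
    }

    // Called on allocation. Returns null when nothing is cached,
    // in which case the caller falls back to the arena.
    Entry allocate() {
        return queue.poll();
    }
}
```

The bound matters: without it, a thread that releases much more than it allocates would keep memory away from every other thread indefinitely.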

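Source code analysis

The above was only a brief introduction. There are many layers here, so the structure is easier to understand by reading the source. The analysis below goes through each structure's source first and then walks through the two lines of code from the beginning, from the outside in.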

PoolThreadLocalCache

PoolThreadLocalCache is an inner class of PooledByteBufAllocator. Here is its definition:

    final class PoolThreadLocalCache extends FastThreadLocal<PoolThreadCache> {
        private final boolean useCacheForAllThreads; // whether every thread gets a cache; defaults to true

        PoolThreadLocalCache(boolean useCacheForAllThreads) {
            this.useCacheForAllThreads = useCacheForAllThreads;
        }

        // This is the method behind the "create PoolThreadCache" arrow in the diagram
        @Override
        protected synchronized PoolThreadCache initialValue() {
            // Pick one Arena from the heapArenas array (and one from directArenas)
            final PoolArena<byte[]> heapArena = leastUsedArena(heapArenas);
            final PoolArena<ByteBuffer> directArena = leastUsedArena(directArenas);

            Thread current = Thread.currentThread();
            if (useCacheForAllThreads || current instanceof FastThreadLocalThread) {
                return new PoolThreadCache(
                        heapArena, directArena, tinyCacheSize, smallCacheSize, normalCacheSize,
                        DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);
            }
            // No caching so just use 0 as sizes.
            return new PoolThreadCache(heapArena, directArena, 0, 0, 0, 0, 0);
        }

        @Override
        protected void onRemoval(PoolThreadCache threadCache) {
            threadCache.free();
        }

        private <T> PoolArena<T> leastUsedArena(PoolArena<T>[] arenas) {
            if (arenas == null || arenas.length == 0) {
                return null;
            }
            // Pick the Arena with the smallest numThreadCaches value.
            // numThreadCaches is incremented each time the Arena is assigned to a thread's cache.
            PoolArena<T> minArena = arenas[0];
            for (int i = 1; i < arenas.length; i++) {
                PoolArena<T> arena = arenas[i];
                if (arena.numThreadCaches.get() < minArena.numThreadCaches.get()) {
                    minArena = arena;
                }
            }

            return minArena;
        }
    }

Next, the main methods of its parent class FastThreadLocal:

public class FastThreadLocal<V> {

    private final int index;

    public FastThreadLocal() {
        index = InternalThreadLocalMap.nextVariableIndex(); // a monotonically increasing number
    }

    public final V get() {
        // Get the InternalThreadLocalMap of the current thread
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
        Object v = threadLocalMap.indexedVariable(index); // read the array slot at position index
        if (v != InternalThreadLocalMap.UNSET) { // UNSET until the first call on this thread
            return (V) v;
        }

        V value = initialize(threadLocalMap); // first access: create the value
        registerCleaner(threadLocalMap);
        return value;
    }

    private V initialize(InternalThreadLocalMap threadLocalMap) {
        V v = null;
        try {
            v = initialValue(); // calls PoolThreadLocalCache.initialValue() to get a PoolThreadCache
        } catch (Exception e) {
            PlatformDependent.throwException(e);
        }
        // Store the PoolThreadCache object in the array slot at position index
        threadLocalMap.setIndexedVariable(index, v);
        addToVariablesToRemove(threadLocalMap, this);
        return v;
    }
}

Now let's look at what InternalThreadLocalMap is:

public final class InternalThreadLocalMap extends UnpaddedInternalThreadLocalMap {
    public static final Object UNSET = new Object();
    public static InternalThreadLocalMap get() {
        Thread thread = Thread.currentThread();
        if (thread instanceof FastThreadLocalThread) {
            return fastGet((FastThreadLocalThread) thread);
        } else {
            return slowGet();
        }
    }
    private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
        // Read the InternalThreadLocalMap directly from the FastThreadLocalThread
        InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
        if (threadLocalMap == null) {
            thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
        }
        return threadLocalMap;
    }

    // Get the InternalThreadLocalMap stored in a JDK ThreadLocal
    private static InternalThreadLocalMap slowGet() {
        // slowThreadLocalMap is a ThreadLocal declared in UnpaddedInternalThreadLocalMap;
        // it holds the InternalThreadLocalMap of each thread
        ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = UnpaddedInternalThreadLocalMap.slowThreadLocalMap;
        InternalThreadLocalMap ret = slowThreadLocalMap.get();
        if (ret == null) {
            ret = new InternalThreadLocalMap();
            slowThreadLocalMap.set(ret);
        }
        return ret;
    }

    public static int nextVariableIndex() {
        int index = nextIndex.getAndIncrement();
        if (index < 0) { // overflow
            nextIndex.decrementAndGet();
            throw new IllegalStateException("too many thread-local indexed variables");
        }
        return index;
    }

    public Object indexedVariable(int index) {
        Object[] lookup = indexedVariables;
        return index < lookup.length? lookup[index] : UNSET;
    }

    public boolean setIndexedVariable(int index, Object value) {
        Object[] lookup = indexedVariables;
        if (index < lookup.length) {
            Object oldValue = lookup[index];
            lookup[index] = value;
            return oldValue == UNSET;
        } else {
            expandIndexedVariableTableAndSet(index, value);
            return true;
        }
    }
}

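The code is fairly simple, so a quick read is enough. To recap the structure:
- InternalThreadLocalMap is bound to a ThreadLocal (or held directly by a FastThreadLocalThread), so it is per-thread
- PoolThreadCache is bound to the InternalThreadLocalMap (it is stored in the array inside it), so PoolThreadCache is per-thread as well
- There is only one PoolThreadLocalCache instance per allocator; its only job is to initialize the PoolThreadCache

InternalThreadLocalMap, FastThreadLocal, and the related classes are Netty's own counterpart to the JDK ThreadLocal. They will be covered in detail in a later article; for memory allocation, the points above are all you need.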
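
To show the index-into-a-per-thread-array idea in isolation, here is a small sketch that does not depend on Netty. IndexedThreadLocalSketch is a hypothetical class; it uses a JDK ThreadLocal to hold the per-thread array, which corresponds roughly to the slowGet path above, and it leaves out the FastThreadLocalThread fast path and all cleanup.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class IndexedThreadLocalSketch<V> {
    private static final Object UNSET = new Object();
    // Every instance gets its own slot index, shared by all threads.
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();
    // One Object[] per thread, playing the role of indexedVariables.
    private static final ThreadLocal<Object[]> SLOTS = ThreadLocal.withInitial(() -> {
        Object[] slots = new Object[8];
        Arrays.fill(slots, UNSET);
        return slots;
    });

    private final int index = NEXT_INDEX.getAndIncrement();
    private final Supplier<V> initialValue;

    IndexedThreadLocalSketch(Supplier<V> initialValue) {
        this.initialValue = initialValue;
    }

    @SuppressWarnings("unchecked")
    V get() {
        Object[] slots = SLOTS.get();
        if (index < slots.length && slots[index] != UNSET) {
            return (V) slots[index]; // already initialized on this thread
        }
        V value = initialValue.get();
        if (index >= slots.length) {
            // Grow the per-thread array and mark the new slots as UNSET.
            int oldLength = slots.length;
            slots = Arrays.copyOf(slots, Math.max(index + 1, oldLength * 2));
            Arrays.fill(slots, oldLength, slots.length, UNSET);
            SLOTS.set(slots);
        }
        slots[index] = value;
        return value;
    }

    // Helper for demos: read the value as seen by a different thread.
    static <V> V getOnOtherThread(IndexedThreadLocalSketch<V> local) {
        return CompletableFuture.supplyAsync(local::get).join();
    }
}
```

Calling get() twice on one thread returns the same object, while another thread gets its own. That is the same one-value-per-thread relationship that PoolThreadCache has with its thread.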

MemoryRegionCache

Initialization

        // Initialize the fields; the queue is the core
        // Default size: tiny -> 512, small -> 256, normal -> 64
        MemoryRegionCache(int size, SizeClass sizeClass) {
            this.size = MathUtil.safeFindNextPositivePowerOfTwo(size);
            queue = PlatformDependent.newFixedMpscQueue(this.size);
            this.sizeClass = sizeClass;
        }
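
The constructor rounds the requested size up to a power of two before creating the queue. Below is a minimal sketch of such a rounding, assuming the result is clamped to the range [1, 2^30]. nextPowerOfTwo is a hypothetical helper written for this article, not Netty's method.

```java
class PowerOfTwoSketch {
    // Round value up to the next power of two, clamped to [1, 2^30].
    static int nextPowerOfTwo(int value) {
        if (value <= 0) {
            return 1;
        }
        if (value >= 0x40000000) {
            return 0x40000000;
        }
        // For value - 1, count the leading zero bits; the remaining bit width
        // is the exponent of the smallest power of two that is >= value.
        return 1 << (32 - Integer.numberOfLeadingZeros(value - 1));
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOfTwo(100));
        System.out.println(nextPowerOfTwo(512));
    }
}
```

The default sizes 512, 256, and 64 are already powers of two, so they pass through unchanged; a custom size such as 100 would become 128.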

Allocation

        public final boolean allocate(PooledByteBuf<T> buf, int reqCapacity) {
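            Entry<T> entry = queue.poll(); // take an element from the queue
            if (entry == null) { // empty queue: report that the allocation failed
                return false;
            }
            // Delegates to the chunk's initBuf/initBufWithSubpage method
            initBuf(entry.chunk, entry.handle, buf, reqCapacity);
            entry.recycle(); // drop the chunk reference and reset handle to -1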

            ++ allocations;
            return true;
        }

Adding an element

        public final boolean add(PoolChunk<T> chunk, long handle) {
            Entry<T> entry = newEntry(chunk, handle); // build an Entry from the chunk and handle
            boolean queued = queue.offer(entry); // put it into the queue
            if (!queued) {
                // The queue is full, so the entry was not cached; recycle it
                entry.recycle();
            }
            return queued;
        }
        private static Entry newEntry(PoolChunk<?> chunk, long handle) {
            Entry entry = RECYCLER.get();
            entry.chunk = chunk;
            entry.handle = handle;
            return entry;
        }

Freeing

        public final int free() {
            return free(Integer.MAX_VALUE);
        }

        private int free(int max) {
            int numFreed = 0;
            for (; numFreed < max; numFreed++) { // free at most max elements from the queue
                Entry<T> entry = queue.poll();
                if (entry != null) {
                    freeEntry(entry); // recycle the entry and call the Arena's freeChunk method
                } else {
                    return numFreed;
                }
            }
            return numFreed;
        }

PoolThreadCache

Let's start with the constructor.

Initialization

    PoolThreadCache(PoolArena<byte[]> heapArena, PoolArena<ByteBuffer> directArena,
                    int tinyCacheSize, int smallCacheSize, int normalCacheSize,
                    int maxCachedBufferCapacity, int freeSweepAllocationThreshold) {
        if (maxCachedBufferCapacity < 0) {
            throw new IllegalArgumentException("maxCachedBufferCapacity: "
                    + maxCachedBufferCapacity + " (expected: >= 0)");
        }
        this.freeSweepAllocationThreshold = freeSweepAllocationThreshold;
        this.heapArena = heapArena;
        this.directArena = directArena;
        // ... direct is handled like heap, so the direct-related code is omitted
        if (heapArena != null) {
            // Create the arrays; the elements are SubPageMemoryRegionCache or NormalMemoryRegionCache
            tinySubPageHeapCaches = createSubPageCaches(
                    tinyCacheSize, PoolArena.numTinySubpagePools, SizeClass.Tiny);
            smallSubPageHeapCaches = createSubPageCaches(
                    smallCacheSize, heapArena.numSmallSubpagePools, SizeClass.Small);

            numShiftsNormalHeap = log2(heapArena.pageSize);
            normalHeapCaches = createNormalCaches(
                    normalCacheSize, maxCachedBufferCapacity, heapArena);
            // Each time an Arena is assigned to a PoolThreadCache, numThreadCaches is incremented
            heapArena.numThreadCaches.getAndIncrement();
        } else {
            tinySubPageHeapCaches = null;
            smallSubPageHeapCaches = null;
            normalHeapCaches = null;
            numShiftsNormalHeap = -1;
        }

        // ....
    }

Allocation

    boolean allocateTiny(PoolArena<?> area, PooledByteBuf<?> buf, int reqCapacity, int normCapacity) {
        return allocate(cacheForTiny(area, normCapacity), buf, reqCapacity);
    }

    boolean allocateSmall(PoolArena<?> area, PooledByteBuf<?> buf, int reqCapacity, int normCapacity) {
        return allocate(cacheForSmall(area, normCapacity), buf, reqCapacity);
    }

    boolean allocateNormal(PoolArena<?> area, PooledByteBuf<?> buf, int reqCapacity, int normCapacity) {
        return allocate(cacheForNormal(area, normCapacity), buf, reqCapacity);
    }

    private boolean allocate(MemoryRegionCache<?> cache, PooledByteBuf buf, int reqCapacity) {
        // cache is picked by size: both the array and the index within it depend on the capacity
        if (cache == null) {
            return false;
        }
        // Delegate to MemoryRegionCache.allocate
        boolean allocated = cache.allocate(buf, reqCapacity);
        if (++ allocations >= freeSweepAllocationThreshold) {
            allocations = 0;
            trim();
        }
        return allocated;
    }
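
The allocate method above calls trim() once the number of allocation attempts reaches freeSweepAllocationThreshold, but trim itself is not shown in this article. The sketch below assumes a policy in which a region cache releases up to "size minus the allocations it served since the last trim", so a cache that is rarely hit gives its memory back. Treat this as my illustration of the idea, not a copy of the Netty source; TrimSketch and its members are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class TrimSketch {
    // Simplified region cache that counts the allocations it served since the last trim.
    static final class RegionCache {
        final int size;
        final Queue<Long> queue = new ArrayDeque<>();
        int allocations;

        RegionCache(int size) {
            this.size = size;
        }

        boolean add(long handle) {
            return queue.size() < size && queue.offer(handle);
        }

        boolean allocate() {
            if (queue.poll() == null) {
                return false;
            }
            allocations++;
            return true;
        }

        // Assumed policy: release up to (size - allocations) cached entries.
        int trim() {
            int free = size - allocations;
            allocations = 0;
            int freed = 0;
            while (freed < free && queue.poll() != null) {
                freed++;
            }
            return freed;
        }
    }

    private final RegionCache cache;
    private final int threshold;
    private int attempts;
    int totalFreed;

    TrimSketch(RegionCache cache, int threshold) {
        this.cache = cache;
        this.threshold = threshold;
    }

    // Count every attempt, hit or miss, and trim when the threshold is reached.
    boolean allocate() {
        boolean hit = cache.allocate();
        if (++attempts >= threshold) {
            attempts = 0;
            totalFreed += cache.trim();
        }
        return hit;
    }
}
```

Counting attempts rather than hits means the sweep still runs for a thread whose cache keeps missing, which is exactly the case where stale cached entries should be handed back.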

Adding a chunk to the thread cache

    // Look up the matching MemoryRegionCache and enqueue via its add method
    boolean add(PoolArena<?> area, PoolChunk chunk, long handle, int normCapacity, SizeClass sizeClass) {
        MemoryRegionCache<?> cache = cache(area, normCapacity, sizeClass);
        if (cache == null) {
            return false;
        }
        return cache.add(chunk, handle);
    }

Overall flow

allocator.buffer eventually calls newHeapBuffer (taking heap memory as the example):

    protected ByteBuf newHeapBuffer(int initialCapacity, int maxCapacity) {
        // threadCache is the PoolThreadLocalCache analyzed above, so get() is FastThreadLocal.get().
        // On first access it calls PoolThreadLocalCache.initialValue() to create a PoolThreadCache,
        // which is thereby indirectly bound to the thread
        PoolThreadCache cache = threadCache.get();
        // PoolThreadLocalCache picked an Arena when it created the PoolThreadCache, so this Arena is tied to the thread too
        PoolArena<byte[]> heapArena = cache.heapArena;

        final ByteBuf buf;
        if (heapArena != null) { // allocate through the Arena
            buf = heapArena.allocate(cache, initialCapacity, maxCapacity);
        } else {
            buf = PlatformDependent.hasUnsafe() ?
                    new UnpooledUnsafeHeapByteBuf(this, initialCapacity, maxCapacity) :
                    new UnpooledHeapByteBuf(this, initialCapacity, maxCapacity);
        }

        return toLeakAwareBuffer(buf);
    }

The Arena's allocate method was analyzed in the earlier Arena article, but the thread-cache parts were left out there. Here is a quick pass over them:

    private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf, final int reqCapacity) {
        final int normCapacity = normalizeCapacity(reqCapacity);
        if (isTinyOrSmall(normCapacity)) { // capacity < pageSize
            int tableIdx;
            PoolSubpage<T>[] table;
            boolean tiny = isTiny(normCapacity);
            if (tiny) { // < 512
                if (cache.allocateTiny(this, buf, reqCapacity, normCapacity)) { // ends up in PoolThreadCache.allocate
                    return;
                }
                //....
            } else {
                if (cache.allocateSmall(this, buf, reqCapacity, normCapacity)) { // ends up in PoolThreadCache.allocate
                    return;
                }
                //....
            }
            //....
            return;
        }
        if (normCapacity <= chunkSize) {
            if (cache.allocateNormal(this, buf, reqCapacity, normCapacity)) { // ends up in PoolThreadCache.allocate
                return;
            }
            //....
        } else {
            //....
        }
    }

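Before any other allocation path is taken, the queue is checked for a matching element. If one is found and the allocation succeeds, the rest of the flow is skipped.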
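
The whole fast path can be condensed into a toy model: try the thread cache first, fall back to the arena on a miss, and put the region into the cache on release. CacheFirstAllocatorSketch is hypothetical; a plain counter stands in for the arena and a long stands in for the allocated region.

```java
import java.util.ArrayDeque;

class CacheFirstAllocatorSketch {
    // Stand-in for one MemoryRegionCache owned by the current thread.
    private final ArrayDeque<Long> threadCache = new ArrayDeque<>();
    private final int cacheCapacity;
    // Stand-in for the arena: it simply hands out fresh handles.
    private long nextHandle;
    int arenaAllocations;
    int cacheHits;

    CacheFirstAllocatorSketch(int cacheCapacity) {
        this.cacheCapacity = cacheCapacity;
    }

    long allocate() {
        Long cached = threadCache.poll();
        if (cached != null) {
            cacheHits++; // fast path: no arena lock needed
            return cached;
        }
        arenaAllocations++; // slow path: this is where the arena would be locked
        return nextHandle++;
    }

    // Returns true if the region was cached, false if it would go back to the arena.
    boolean release(long handle) {
        if (threadCache.size() < cacheCapacity) {
            threadCache.offer(handle);
            return true;
        }
        return false;
    }
}
```

In this model an allocate, release, allocate sequence on one thread touches the arena only once: the second allocation is served from the cache and returns the same handle.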
