highlight: arduino-light

FastThreadLocal

FastThreadLocal 的实现与 ThreadLocal 非常类似，Netty 为 FastThreadLocal 量身打造了 FastThreadLocalThread 和 InternalThreadLocalMap 两个重要的类。下面我们看下这两个类是如何实现的。

FastThreadLocalThread 是对 Thread 类的一层包装，每个线程对应一个 InternalThreadLocalMap 实例。只有 FastThreadLocal 和 FastThreadLocalThread 组合使用时，才能发挥 FastThreadLocal 的性能优势。首先看下 FastThreadLocalThread 的源码定义：

java public class FastThreadLocalThread extends Thread { private InternalThreadLocalMap threadLocalMap; // 省略其他代码 }

可以看出 FastThreadLocalThread 主要扩展了 InternalThreadLocalMap 字段，我们可以猜测到 FastThreadLocalThread 主要使用 InternalThreadLocalMap 存储数据，而不再是使用 Thread 中的 ThreadLocalMap。所以想知道 FastThreadLocalThread 高性能的奥秘，必须要了解 InternalThreadLocalMap 的设计原理。

上文中我们讲到了 ThreadLocal 的一个重要缺点，就是 ThreadLocalMap 采用线性探测法解决 Hash 冲突性能较慢，那么 InternalThreadLocalMap 又是如何优化的呢？首先一起看下 InternalThreadLocalMap 的内部构造。

java class UnpaddedInternalThreadLocalMap { static final ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = new ThreadLocal<InternalThreadLocalMap>(); static final AtomicInteger nextIndex = new AtomicInteger(); Object[] indexedVariables; UnpaddedInternalThreadLocalMap(Object[] indexedVariables) { this.indexedVariables = indexedVariables; } // 省略其他代码 } public final class InternalThreadLocalMap extends UnpaddedInternalThreadLocalMap { private static final int DEFAULT_ARRAY_LIST_INITIAL_CAPACITY = 8; private static final int STRING_BUILDER_INITIAL_SIZE; private static final int STRING_BUILDER_MAX_SIZE; public static final Object UNSET = new Object(); private BitSet cleanerFlags; private InternalThreadLocalMap() { super(newIndexedVariableTable()); } private static Object[] newIndexedVariableTable() { Object[] array = new Object[32]; Arrays.fill(array, UNSET); return array; } public static int nextVariableIndex() { int index = nextIndex.getAndIncrement(); if (index < 0) { nextIndex.decrementAndGet(); throw new IllegalStateException("too many thread-local indexed variables"); } return index; } // 省略其他代码 }

从 InternalThreadLocalMap 内部实现来看，与 ThreadLocalMap 一样都是采用数组的存储方式。但是 InternalThreadLocalMap 并没有使用线性探测法来解决 Hash 冲突，而是在 FastThreadLocal 初始化的时候分配一个数组索引 index，index 的值采用原子类 AtomicInteger 保证顺序递增，通过调用 InternalThreadLocalMap.nextVariableIndex() 方法获得。然后在读写数据的时候通过数组下标 index 直接定位到 FastThreadLocal 的位置，时间复杂度为 O(1)。如果数组下标递增到非常大，那么数组也会比较大，所以 FastThreadLocal 是通过空间换时间的思想提升读写性能。下面通过一幅图描述 InternalThreadLocalMap、index 和 FastThreadLocal 之间的关系。

通过上面 FastThreadLocal 的内部结构图，我们对比下与 ThreadLocal 有哪些区别呢？FastThreadLocal 使用 Object 数组替代了 Entry 数组，Object[0] 存储的是一个Set\ > 集合，从数组下标 1 开始都是直接存储的 value 数据，不再采用 ThreadLocal 的键值对形式进行存储。

假设现在我们有一批数据需要添加到数组中，分别为 value1、value2、value3、value4，对应的 FastThreadLocal 在初始化的时候生成的数组索引分别为 1、2、3、4。如下图所示。

至此，我们已经对 FastThreadLocal 有了一个基本的认识，下面我们结合具体的源码分析 FastThreadLocal 的实现原理。

FastThreadLocal 示例

在讲解源码之前，我们回过头看下上文中的 ThreadLocal 示例，如果把示例中 ThreadLocal 替换成 FastThread，应当如何使用呢？ java package io.netty.example.chapter1.echo; import io.netty.util.concurrent.FastThreadLocal; import io.netty.util.concurrent.FastThreadLocalThread; public class FastThreadLocalTest { private static final FastThreadLocal<String> THREAD_NAME_LOCAL = new FastThreadLocal<>(); private static final FastThreadLocal<String> TRADE_THREAD_LOCAL = new FastThreadLocal<>(); public static void main(String[] args) { for (int i = 0; i < 2; i++) { int tradeId = i; String threadName = "thread-" + i; new FastThreadLocalThread(() -> { THREAD_NAME_LOCAL.set(threadName); String String = new String("未支付" + Thread.currentThread().getName()); TRADE_THREAD_LOCAL.set(String); System.out.println("threadName: " + THREAD_NAME_LOCAL.get()); System.out.println("String info：" + TRADE_THREAD_LOCAL.get()); }, threadName).start(); } } } threadName: thread-1 String info：未支付thread-1 threadName: thread-0 String info：未支付thread-0 可以看出，FastThreadLocal 的使用方法几乎和 ThreadLocal 保持一致，只需要把代码中 Thread、ThreadLocal 替换为 FastThreadLocalThread 和 FastThreadLocal 即可，Netty 在易用性方面做得相当棒。下面我们重点对示例中用得到 FastThreadLocal.set()/get() 方法做深入分析。

FastThreadLocal 构造分析

```java public FastThreadLocal() { //下标递增 index = InternalThreadLocalMap.nextVariableIndex(); }

public static int nextVariableIndex() {
        int index = nextIndex.getAndIncrement();
        if (index < 0) {
            nextIndex.decrementAndGet();
            throw new IllegalStateException("too many thread-local indexed variables");
        }
        return index;
    }

FastThreadLocal set源码分析

public final void set(V value) {
    // 1. value 是否为缺省值
    if (value != InternalThreadLocalMap.UNSET) { 
        // 2. 获取当前线程的 InternalThreadLocalMap
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get(); 
        // 3. 将 InternalThreadLocalMap 中数据替换为新的 value
        setKnownNotUnset(threadLocalMap, value); 
    } else {
        remove();
    }
}

//setKnownNotUnset() 如何将数据添加到 InternalThreadLocalMap 的。
private void setKnownNotUnset(InternalThreadLocalMap threadLocalMap, V value) {
    // 1. 找到数组下标 index 位置，设置新的 value
    //返回true 代表第一次放入
    //同一个 index重复放入不再放入
    if (threadLocalMap.setIndexedVariable(index, value)) { 
        // 2. 将 FastThreadLocal 对象保存到待清理的 Set 中
        addToVariablesToRemove(threadLocalMap, this); 
    }
}

public boolean setIndexedVariable(int index, Object value) {
    Object[] lookup = indexedVariables;
    if (index < lookup.length) {
        Object oldValue = lookup[index]; 
        // 直接将数组 index 位置设置为 value，时间复杂度为 O(1)
        lookup[index] = value; 
        return oldValue == UNSET;
    } else {
        // 容量不够，先扩容再设置值
        expandIndexedVariableTableAndSet(index, value); 
        return true;
    }
}

``` indexedVariables 就是 InternalThreadLocalMap 中用于存放数据的数组，如果数组容量大于 FastThreadLocal 的 index 索引，那么直接找到数组下标 index 位置将新 value 设置进去，事件复杂度为 O(1)。在设置新的 value 之前，会将之前 index 位置的元素取出，如果旧的元素还是 UNSET 缺省对象，那么返回成功。

如果数组容量不够了怎么办呢？InternalThreadLocalMap 会自动扩容，然后再设置 value。接下来看看 expandIndexedVariableTableAndSet() 的扩容逻辑： ```java private void expandIndexedVariableTableAndSet(int index, Object value) { Object[] oldArray = indexedVariables; final int oldCapacity = oldArray.length; int newCapacity = index; newCapacity |= newCapacity >>> 1; newCapacity |= newCapacity >>> 2; newCapacity |= newCapacity >>> 4; newCapacity |= newCapacity >>> 8; newCapacity |= newCapacity >>> 16; newCapacity ++; Object[] newArray = Arrays.copyOf(oldArray, newCapacity); Arrays.fill(newArray, oldCapacity, newArray.length, UNSET); newArray[index] = value; indexedVariables = newArray; }

上述代码的位移操作是不是似曾相识？我们去翻阅下 JDK HashMap 中扩容的源码，其中有这么一段代码：

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

``` 可以看出 InternalThreadLocalMap 实现数组扩容几乎和 HashMap 完全是一模一样的，所以多读源码还是可以给我们很多启发的。InternalThreadLocalMap 以 index 为基准进行扩容，将数组扩容后的容量向上取整为 2 的次幂。然后将原数组内容拷贝到新的数组中，空余部分填充缺省对象 UNSET，最终把新数组赋值给 indexedVariables。

为什么 InternalThreadLocalMap 以 index 为基准进行扩容，而不是原数组长度呢？假设现在初始化了 70 个 FastThreadLocal，但是这些 FastThreadLocal 从来没有调用过 set() 方法，此时数组还是默认长度 32。当第 index = 70 的 FastThreadLocal 调用 set() 方法时，如果按原数组容量 32 进行扩容 2 倍后，还是无法填充 index = 70 的数据。所以使用 index 为基准进行扩容可以解决这个问题，但是如果 FastThreadLocal 特别多，数组的长度也是非常大的。

回到 setKnownNotUnset() 的主流程，向 InternalThreadLocalMap 添加完数据之后，接下就是将 FastThreadLocal 对象保存到待清理的 Set 中。我们继续看下 addToVariablesToRemove() 是如何实现的。 private static void addToVariablesToRemove(InternalThreadLocalMap threadLocalMap, FastThreadLocal<?> variable) { // 获取数组下标为 0 的元素 Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex); Set<FastThreadLocal<?>> variablesToRemove; if (v == InternalThreadLocalMap.UNSET || v == null) { // 创建 FastThreadLocal 类型的 Set 集合 variablesToRemove = Collections.newSetFromMap(new IdentityHashMap<FastThreadLocal<?>, Boolean>()); // 将 Set 集合填充到数组下标 0 的位置 threadLocalMap.setIndexedVariable(variablesToRemoveIndex, variablesToRemove); } else { // 如果不是 UNSET，Set 集合已存在，直接强转获得 Set 集合 variablesToRemove = (Set<FastThreadLocal<?>>) v; } //放入的是threadLocal 是为了释放threadLocal variablesToRemove.add(variable); // 将 FastThreadLocal 添加到 Set 集合中 } variablesToRemoveIndex 是采用 static final 修饰的变量，在 FastThreadLocal 初始化时 variablesToRemoveIndex 被赋值为 0。InternalThreadLocalMap 首先会找到数组下标为 0 的元素，如果该元素是缺省对象 UNSET 或者不存在，那么会创建一个 FastThreadLocal 类型的 Set 集合，然后把 Set 集合填充到数组下标 0 的位置。如果数组第一个元素不是缺省对象 UNSET，说明 Set 集合已经被填充，直接强转获得 Set 集合即可。这就解释了 InternalThreadLocalMap 的 value 数据为什么是从下标为 1 的位置开始存储了，因为 0 的位置已经被 Set 集合占用了。

为什么 InternalThreadLocalMap 要在数组下标为 0 的位置存放一个 FastThreadLocal 类型的 Set 集合呢？这时候我们回过头看下 remove() 方法。 java public final void remove() { remove(InternalThreadLocalMap.getIfSet()); } public static InternalThreadLocalMap getIfSet() { Thread thread = Thread.currentThread(); if (thread instanceof FastThreadLocalThread) { return ((FastThreadLocalThread) thread).threadLocalMap(); } return slowThreadLocalMap.get(); } public final void remove(InternalThreadLocalMap threadLocalMap) { if (threadLocalMap == null) { return; } // 删除数组下标 index 位置对应的 value Object v = threadLocalMap.removeIndexedVariable(index); // 从数组下标 0 的位置取出 Set 集合，并删除当前 FastThreadLocal removeFromVariablesToRemove(threadLocalMap, this); if (v != InternalThreadLocalMap.UNSET) { try { onRemoval((V) v); // 空方法，用户可以继承实现 } catch (Exception e) { PlatformDependent.throwException(e); } } } 在执行 remove 操作之前，会调用 InternalThreadLocalMap.getIfSet() 获取当前 InternalThreadLocalMap。有了之前的基础，理解 getIfSet() 方法就非常简单了，如果是 FastThreadLocalThread 类型，直接取 FastThreadLocalThread 中 threadLocalMap 属性。如果是普通线程 Thread，从 ThreadLocal 类型的 slowThreadLocalMap 中获取。

找到 InternalThreadLocalMap 之后，InternalThreadLocalMap 会从数组中定位到下标 index 位置的元素，并将 index 位置的元素覆盖为缺省对象 UNSET。接下来就需要清理当前的 FastThreadLocal 对象，此时 Set 集合就派上了用场，InternalThreadLocalMap 会取出数组下标 0 位置的 Set 集合，然后删除当前 FastThreadLocal。最后 onRemoval() 方法起到什么作用呢？Netty 只是留了一处扩展，并没有实现，用户需要在删除的时候做一些后置操作，可以继承 FastThreadLocal 实现该方法。

至此，FastThreadLocal.set() 的完成过程已经讲完了，接下来我们继续 FastThreadLocal.get() 方法的实现就易如反掌拉。

FastThreadLocal get源码分析

java public final V get() { InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get(); Object v = threadLocalMap.indexedVariable(index); // 从数组中取出 index 位置的元素 if (v != InternalThreadLocalMap.UNSET) { return (V) v; } return initialize(threadLocalMap); // 如果获取到的数组元素是缺省对象，执行初始化操作 } public Object indexedVariable(int index) { Object[] lookup = indexedVariables; return index < lookup.length? lookup[index] : UNSET; } private V initialize(InternalThreadLocalMap threadLocalMap) { V v = null; try { v = initialValue(); } catch (Exception e) { PlatformDependent.throwException(e); } threadLocalMap.setIndexedVariable(index, v); addToVariablesToRemove(threadLocalMap, this); return v; } 首先根据当前线程是否是 FastThreadLocalThread 类型找到 InternalThreadLocalMap，然后取出从数组下标 index 的元素，如果 index 位置的元素不是缺省对象 UNSET，说明该位置已经填充过数据，直接取出返回即可。

如果 index 位置的元素是缺省对象 UNSET，那么需要执行初始化操作。可以看到，initialize() 方法会调用用户重写的 initialValue 方法构造需要存储的对象数据，如下所示。 java private final FastThreadLocal<String> threadLocal = new FastThreadLocal<String>() { @Override protected String initialValue() { return "hello world"; } }; 构造完用户对象数据之后，接下来就会将它填充到数组 index 的位置，然后再把当前 FastThreadLocal 对象保存到待清理的 Set 中。整个过程我们在分析 FastThreadLocal.set() 时都已经介绍过，就不再赘述了。

到此为止，FastThreadLocal 最核心的两个方法 set()/get() 我们已经分析完了。

FastThreadLocalThread

无参构造方法:和普通Thread一样

```java public class FastThreadLocalThread extends Thread { //无参构造方法 //使用无参构造方法跟普通的Thread一样 public FastThreadLocalThread() { //不需要清理FastThreadLocals cleanupFastThreadLocals = false; } ｝

//无参构造 
FastThreadLocalThread fastThreadLocalThread = new FastThreadLocalThread();
//其实调用的是父类Thread的run方法
/***
 @Override
    public void run() {
        if (target != null) {
            target.run();
        }
    }
***/    
fastThreadLocalThread.run();

```

有参构造方法:做了包装

```java public class FastThreadLocalThread extends Thread { //有参构造方法 public FastThreadLocalThread(Runnable target) { //使用FastThreadLocalRunnable做了1个包装 super(FastThreadLocalRunnable.wrap(target)); //需要清理FastThreadLocals cleanupFastThreadLocals = true; } }

final class FastThreadLocalRunnable implements Runnable {
    private final Runnable runnable;
    //包装类
    //判断传入进来的runnable是否是FastThreadLocalRunnable
    //如果是 就直接返回传入进来的runnable
    //如果不是 构造1个FastThreadLocalRunnable返回
    static Runnable wrap(Runnable runnable) {
        return runnable instanceof FastThreadLocalRunnable ? 
                        runnable : new FastThreadLocalRunnable(runnable);
    } 

    //将runnable赋值给成员变量的runnable
    private FastThreadLocalRunnable(Runnable runnable) {
        this.runnable = ObjectUtil.checkNotNull(runnable, "runnable");
    }

    //关键点在于这里做了1个包装
    //业务逻辑runnable的run方法走完会调用
    // FastThreadLocal.removeAll();
    @Override
    public void run() {
        try {
            runnable.run();
        } finally {
            FastThreadLocal.removeAll();
        }
    }
}

  public static void removeAll() {
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.getIfSet();
        if (threadLocalMap == null) {
            return;
        }

        try {
            Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex);
            if (v != null && v != InternalThreadLocalMap.UNSET) {
                @SuppressWarnings("unchecked")
                Set<FastThreadLocal<?>> variablesToRemove = (Set<FastThreadLocal<?>>) v;
                FastThreadLocal<?>[] variablesToRemoveArray =
                        variablesToRemove.toArray(new FastThreadLocal[0]);
                for (FastThreadLocal<?> tlv: variablesToRemoveArray) {
                    tlv.remove(threadLocalMap);
                }
            }
        } finally {
            InternalThreadLocalMap.remove();
        }
    }

```

判断是否会自动清理

java @UnstableApi public static boolean willCleanupFastThreadLocals(Thread thread) { //是FastThreadLocalThread //并且cleanupFastThreadLocals为true return thread instanceof FastThreadLocalThread && ((FastThreadLocalThread) thread).willCleanupFastThreadLocals(); }

FTL一定比 ThreadLocal 快吗？

答案是不一定的，只有使用FastThreadLocalThread 类型的线程才会更快，如果是普通线程反而会更慢。

FTL不会浪费很大的空间

虽然 FastThreadLocal 采用的空间换时间的思路，但是在 FastThreadLocal 设计之初就认为不会存在特别多的 FastThreadLocal 对象，而且在数据中没有使用的元素只是存放了同一个缺省对象的引用，并不会占用太多内存空间。

总结

本节课我们对比介绍了 ThreadLocal 和 FastThreadLocal，简单总结下 FastThreadLocal 的优势。

高效查找。

FastThreadLocal 在定位数据的时候可以直接根据数组下标 index 获取，时间复杂度 O(1)。而 JDK 原生的 ThreadLocal 在数据较多时哈希表很容易发生 Hash 冲突，线性探测法在解决 Hash 冲突时需要不停地向下寻找，效率较低。

此外，FastThreadLocal 相比 ThreadLocal 数据扩容更加简单高效，FastThreadLocal 以 index 为基准向上取整到 2 的次幂作为扩容后容量，然后把原数据拷贝到新数组。而 ThreadLocal 由于采用的哈希表，所以在扩容后需要再做一轮 rehash。

安全性更高。

JDK 原生的 ThreadLocal 使用不当可能造成内存泄漏，只能等待线程销毁。在使用线程池的场景下，ThreadLocal 只能通过主动检测的方式防止内存泄漏，从而造成了一定的开销。

然而 FastThreadLocal 不仅提供了 remove() 主动清除对象的方法，而且在线程池场景中 Netty 还封装了 FastThreadLocalRunnable，FastThreadLocalRunnable 最后会执行 FastThreadLocal.removeAll() 将 Set 集合中所有 FastThreadLocal 对象都清理掉，

FastThreadLocal 体现了 Netty 在高性能方面精益求精的设计精神，FastThreadLocal 仅仅是其中的冰山一角，下节课我们继续探索 Netty 中其他高效的数据结构技巧。

27.Netty源码之FastThreadLocal