In-depth understanding of Netty FastThreadLocal

Author: vivo Internet Server Team-Jiang Zhu


Starting from a puzzling production issue, this article compares the implementation, strengths, and weaknesses of JDK ThreadLocal and Netty FastThreadLocal, then walks through the FastThreadLocal source code step by step.


1. Introduction


I have been studying Netty recently, and the chapter on FastThreadLocal reminded me of a strange production issue.


Problem description: When the export service fetched user information to decide whether HTTPS was supported, the information it got back sometimes belonged to a different user.


Problem analysis: The user information was stored in a ThreadLocal, but remove() was not called promptly after use. Tomcat worker threads come from a thread pool and are reused across requests, so a request could read user information left behind by an earlier request handled on the same thread.


Problem fix: Call remove() immediately after the ThreadLocal is no longer needed and, as a second safeguard, call remove() once more before using it.
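
The failure mode and the fix can be reproduced in a few lines. The sketch below is a minimal illustration, not the original service code: the class name, the user names, and the single-thread executor standing in for Tomcat's worker pool are all assumptions made for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; it only models the thread-reuse problem described above.
final class StaleThreadLocalDemo {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Buggy handler: stores the user but never removes it.
    static void leakyRequest(String user) {
        CURRENT_USER.set(user);
    }

    // Reads whatever the worker thread currently holds.
    static String readUser() {
        return CURRENT_USER.get();
    }

    // Fixed handler: remove() before use as a safeguard, and always remove() in finally.
    static String safeRequest(String user) {
        CURRENT_USER.remove();
        try {
            CURRENT_USER.set(user);
            return CURRENT_USER.get();
        } finally {
            CURRENT_USER.remove();
        }
    }

    // Runs all tasks on one reused worker thread and returns what the follow-up reads observe.
    static String[] run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable leaky = () -> leakyRequest("alice");
            Callable<String> safe = () -> safeRequest("bob");
            Callable<String> read = StaleThreadLocalDemo::readUser;

            pool.submit(leaky).get();
            String afterLeaky = pool.submit(read).get(); // sees the leftover value
            pool.submit(safe).get();
            String afterSafe = pool.submit(read).get();  // nothing left behind
            return new String[] {afterLeaky, afterSafe};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println("read after leaky request: " + seen[0]);
        System.out.println("read after safe request:  " + seen[1]);
    }
}
```

The first read returns the value left by the previous task on the same thread; after the handler that wraps its work in try/finally, the next read finds nothing.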


Next, let's continue to take a deeper look at JDK ThreadLocal and Netty FastThreadLocal.


2. Introduction to JDK ThreadLocal


ThreadLocal is a utility class provided by the JDK for passing data between methods that run on the same thread. A variable stored in it is visible only to the current thread and is isolated from, and unaffected by, other threads.


How is this achieved? As shown in Figure 1, each thread has a ThreadLocalMap instance variable. It is created lazily, the first time the thread accesses a ThreadLocal.


ThreadLocalMap uses linear probing to store ThreadLocal objects and the data they maintain. The logic is as follows:

Suppose there is a new ThreadLocal object, and hashing gives x as the index where it should be stored.


If the slot at index x is already occupied by another ThreadLocal object, the search moves forward with a step size of 1 to index x+1.


If the slot at index x+1 is also occupied, the search continues in the same way to index x+2.


This continues until a free slot is found, say at index x+3. The ThreadLocal object and the data it maintains are then wrapped in an Entry object and stored at x+3.


When ThreadLocalMap holds many entries, hash collisions become likely, and resolving each one requires probing forward slot by slot. In the worst case this takes O(n) time, which is inefficient.



Figure 1
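
The probing sequence described above (x, x+1, x+2, x+3) can be sketched with a tiny open-addressing table. This is a simplified model, not the JDK's ThreadLocalMap: the class name is hypothetical, the home slot is passed in explicitly so that collisions are deterministic, and resizing and stale-entry cleanup are left out.

```java
// Hypothetical, simplified open-addressing table used only to illustrate linear probing.
final class LinearProbingSketch {
    private final Object[] keys;
    private final Object[] values;

    LinearProbingSketch(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    // Stores the pair starting at homeSlot and returns the slot that was actually used.
    int put(Object key, Object value, int homeSlot) {
        int slot = homeSlot % keys.length;
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[slot] == null || keys[slot] == key) {
                keys[slot] = key;
                values[slot] = value;
                return slot;
            }
            slot = (slot + 1) % keys.length; // step size 1, wrapping at the end of the array
        }
        throw new IllegalStateException("table is full");
    }

    public static void main(String[] args) {
        LinearProbingSketch table = new LinearProbingSketch(16);
        int x = 5;
        table.put("a", 1, x); // slot x is free
        table.put("b", 2, x); // x is taken, so x+1 is used
        table.put("c", 3, x); // x and x+1 are taken, so x+2 is used
        System.out.println(table.put("d", 4, x)); // prints 8, i.e. x+3
    }
}
```

Each additional colliding key costs one more probe, which is where the O(n) worst case comes from.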


As can be seen from the code below:

The key of an Entry is a weak reference, and the value is a strong reference. During JVM garbage collection, an object that is reachable only through weak references is reclaimed whether or not memory is tight.


However, once a ThreadLocal is no longer used and has been reclaimed by the GC, the key of its Entry in ThreadLocalMap becomes null while the Entry still holds a strong reference to the value. Unless the stale Entry is cleaned up, the value cannot be released until the thread is destroyed, which causes a memory leak.

static class ThreadLocalMap {
    // Weak reference: ThreadLocal variables that are no longer referenced can be reclaimed when resources are tight
    static class Entry extends WeakReference<ThreadLocal<?>> {
        // Data maintained by the current ThreadLocal object
        Object value;

        Entry(ThreadLocal<?> k, Object v) {
            super(k);
            value = v;
        }
    }
    // Other code omitted
}
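
The weak-key, strong-value layout can be observed in isolation. The sketch below uses a hypothetical Entry class with the same shape and, as an assumption made for determinism, calls clear() on the reference to simulate the key being collected instead of waiting for a real GC cycle.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo; Entry mirrors the shape of ThreadLocalMap.Entry but is not the JDK class.
final class WeakKeyEntryDemo {
    static final class Entry extends WeakReference<Object> {
        Object value; // strong reference, kept alive as long as the Entry is reachable

        Entry(Object key, Object value) {
            super(key);
            this.value = value;
        }
    }

    // Builds an entry and then simulates the collector clearing its key.
    static Entry staleEntry() {
        Entry entry = new Entry(new Object(), new byte[1024]);
        entry.clear(); // stands in for "the ThreadLocal key was garbage collected"
        return entry;
    }

    public static void main(String[] args) {
        Entry entry = staleEntry();
        System.out.println("key is null:         " + (entry.get() == null));
        System.out.println("value still present: " + (entry.value != null));
    }
}
```

After the key is gone the entry can no longer be looked up, yet its value stays strongly reachable through the map, which is the leak described above.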

To sum up, the ThreadLocal provided by the JDK can suffer from low efficiency and memory leaks. Why has it not been optimized accordingly?

1. According to the ThreadLocal class Javadoc, it was introduced in JDK 1.2, when program performance may not have received much attention.


2. In most multithreaded scenarios a thread holds only a few ThreadLocal variables, so hash collisions are unlikely, and an occasional collision has little impact on performance.


3. As for memory leaks, ThreadLocal already takes some protective measures of its own. Users should also make it a habit to call remove() to delete the Entry as soon as a ThreadLocal is no longer needed on a thread, or when an exception occurs.


3. Introduction to Netty FastThreadLocal


FastThreadLocal is Netty's optimized version of the JDK ThreadLocal. As the name suggests, it is meant to be faster than ThreadLocal, to cope with the high concurrency and data throughput that Netty handles.


How is this achieved? As shown in Figure 2, each thread has an InternalThreadLocalMap instance variable.

When a FastThreadLocal instance is created, an AtomicInteger is incremented to give it a unique index. That index is the position in the InternalThreadLocalMap array where the data maintained by this FastThreadLocal is stored.


Reads and writes go straight to that array position using the FastThreadLocal's index, so the time complexity is O(1) and the efficiency is high.


If the index grows very large, the array maintained by each InternalThreadLocalMap grows with it, so FastThreadLocal improves read and write performance by trading space for time.



Figure 2
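
The index-per-instance idea can be modelled without Netty. The following is a rough sketch under stated assumptions: IndexedLocalSketch and its members are hypothetical names, the per-thread array is simply kept in a JDK ThreadLocal, and growth is a plain doubling rather than Netty's capacity calculation.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of index-based thread-local storage; not Netty's classes.
final class IndexedLocalSketch<V> {
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();
    private static final Object UNSET = new Object();

    // One Object[] per thread, pre-filled with the UNSET sentinel.
    private static final ThreadLocal<Object[]> SLOTS = ThreadLocal.withInitial(() -> {
        Object[] slots = new Object[32];
        Arrays.fill(slots, UNSET);
        return slots;
    });

    // Each instance claims the next free array position when it is constructed.
    private final int index = NEXT_INDEX.getAndIncrement();

    int index() {
        return index;
    }

    void set(V value) {
        Object[] slots = SLOTS.get();
        if (index >= slots.length) {
            int oldLength = slots.length;
            slots = Arrays.copyOf(slots, Math.max(index + 1, oldLength * 2));
            Arrays.fill(slots, oldLength, slots.length, UNSET);
            SLOTS.set(slots);
        }
        slots[index] = value; // direct array write, no hashing or probing
    }

    @SuppressWarnings("unchecked")
    V get() {
        Object[] slots = SLOTS.get();
        Object v = index < slots.length ? slots[index] : UNSET;
        return v == UNSET ? null : (V) v; // this sketch reports "unset" as null
    }

    public static void main(String[] args) {
        IndexedLocalSketch<String> name = new IndexedLocalSketch<>();
        IndexedLocalSketch<Integer> count = new IndexedLocalSketch<>();
        name.set("netty");
        count.set(3);
        System.out.println(name.index() + " -> " + name.get());
        System.out.println(count.index() + " -> " + count.get());
    }
}
```

Because every instance consumes an index for the lifetime of the process, each thread's array must be at least as long as the largest index it touches, which is the space cost mentioned above.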


4. Netty FastThreadLocal source code analysis


4.1 Constructor

public class FastThreadLocal<V> {
    // index records where the data maintained by this FastThreadLocal is stored,
    // i.e. its subscript in the InternalThreadLocalMap array; it is fixed in the constructor
    private final int index;

    public FastThreadLocal() {
        index = InternalThreadLocalMap.nextVariableIndex();
    }
    // Other code omitted
}


public final class InternalThreadLocalMap extends UnpaddedInternalThreadLocalMap {
    // Auto-incrementing index used to compute the next storage position in the Object array
    private static final AtomicInteger nextIndex = new AtomicInteger();

    private static final int ARRAY_LIST_CAPACITY_MAX_SIZE = Integer.MAX_VALUE - 8;

    public static int nextVariableIndex() {
        int index = nextIndex.getAndIncrement();
        if (index >= ARRAY_LIST_CAPACITY_MAX_SIZE || index < 0) {
            nextIndex.set(ARRAY_LIST_CAPACITY_MAX_SIZE);
            throw new IllegalStateException("too many thread-local indexed variables");
        }
        return index;
    }
    // Other code omitted
}


The above two pieces of code have already been explained in the introduction to Netty FastThreadLocal, so they will not be repeated here.


4.2 get method


public class FastThreadLocal<V> {
    // index records where the data maintained by this FastThreadLocal is stored
    private final int index;

    public final V get() {
        // Get the InternalThreadLocalMap of the current thread
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
        // Use this FastThreadLocal's index to fetch its bound data from the InternalThreadLocalMap
        Object v = threadLocalMap.indexedVariable(index);
        // If the data is not the default value UNSET, return it directly; otherwise initialize
        if (v != InternalThreadLocalMap.UNSET) {
            return (V) v;
        }

        return initialize(threadLocalMap);
    }
    // Other code omitted
}


public final class InternalThreadLocalMap extends UnpaddedInternalThreadLocalMap {
    private static final int INDEXED_VARIABLE_TABLE_INITIAL_SIZE = 32;

    // Unassigned Object (default value); a slot is reset to UNSET when its thread-bound value is removed
    public static final Object UNSET = new Object();

    // Array that stores the data bound to the current thread
    private Object[] indexedVariables;

    // slowThreadLocalMap is a JDK ThreadLocal that stores the InternalThreadLocalMap
    private static final ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap =
            new ThreadLocal<InternalThreadLocalMap>();

    // Fetch the element at position index from the array of data bound to the current thread
    public Object indexedVariable(int index) {
        Object[] lookup = indexedVariables;
        return index < lookup.length ? lookup[index] : UNSET;
    }

    public static InternalThreadLocalMap get() {
        Thread thread = Thread.currentThread();
        // Check whether the current thread is a FastThreadLocalThread
        if (thread instanceof FastThreadLocalThread) {
            return fastGet((FastThreadLocalThread) thread);
        } else {
            return slowGet();
        }
    }

    private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
        // Read the InternalThreadLocalMap directly from the current thread
        InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
        // If the current thread has no InternalThreadLocalMap yet, create and assign one
        if (threadLocalMap == null) {
            thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
        }
        return threadLocalMap;
    }

    private static InternalThreadLocalMap slowGet() {
        // Use the JDK ThreadLocal to obtain the InternalThreadLocalMap
        InternalThreadLocalMap ret = slowThreadLocalMap.get();
        if (ret == null) {
            ret = new InternalThreadLocalMap();
            slowThreadLocalMap.set(ret);
        }
        return ret;
    }

    private InternalThreadLocalMap() {
        indexedVariables = newIndexedVariableTable();
    }

    // Initialize an Object array of length 32 and set every element to the default value UNSET
    private static Object[] newIndexedVariableTable() {
        Object[] array = new Object[INDEXED_VARIABLE_TABLE_INITIAL_SIZE];
        Arrays.fill(array, UNSET);
        return array;
    }
    // Other code omitted
}


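The get() method in the source code consists of the following three steps:

1. Obtain the current thread's InternalThreadLocalMap through InternalThreadLocalMap.get().
2. Use this FastThreadLocal's index to fetch the data bound to it from the InternalThreadLocalMap.
3. If the data is not the default value UNSET, return it directly; if it is, call initialize to initialize it.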


Let's continue with the implementation logic of the InternalThreadLocalMap.get() method:

1. First check whether the current thread is a FastThreadLocalThread. If it is, fastGet obtains the InternalThreadLocalMap directly; if it is not, slowGet is used as a fallback.
2. The fallback slowGet degrades to a native JDK ThreadLocal that stores the InternalThreadLocalMap.
3. If the InternalThreadLocalMap is null when it is fetched, a new one is created and returned. Its construction initializes an Object array of length 32 and sets every element to the default value UNSET.
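
The fast-path/slow-path dispatch can be tried out without Netty on the classpath. The sketch below is a simplified model under assumptions: LocalMap stands in for InternalThreadLocalMap, CarrierThread for FastThreadLocalThread, and all names are hypothetical.

```java
// Hypothetical stand-ins for Netty's classes; a simplified model of the dispatch only.
final class FastSlowLookupSketch {
    // Plays the role of InternalThreadLocalMap.
    static final class LocalMap {
        final Object[] slots = new Object[32];
    }

    // Plays the role of FastThreadLocalThread: the map is an ordinary field on the thread.
    static final class CarrierThread extends Thread {
        LocalMap map;

        CarrierThread(Runnable task) {
            super(task);
        }
    }

    // Fallback storage for threads that are not CarrierThread instances.
    private static final ThreadLocal<LocalMap> SLOW = new ThreadLocal<>();

    static LocalMap get() {
        Thread current = Thread.currentThread();
        if (current instanceof CarrierThread) {
            // Fast path: a plain field read on the thread object.
            CarrierThread carrier = (CarrierThread) current;
            if (carrier.map == null) {
                carrier.map = new LocalMap();
            }
            return carrier.map;
        }
        // Slow path: go through the JDK ThreadLocal first.
        LocalMap map = SLOW.get();
        if (map == null) {
            map = new LocalMap();
            SLOW.set(map);
        }
        return map;
    }

    // Runs get() on a CarrierThread and reports whether it returned that thread's own field.
    static boolean carrierUsesField() {
        boolean[] same = new boolean[1];
        CarrierThread thread = new CarrierThread(() -> {
            LocalMap map = get();
            same[0] = map == ((CarrierThread) Thread.currentThread()).map;
        });
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return same[0];
    }

    public static void main(String[] args) {
        System.out.println("ordinary thread reuses its map: " + (get() == get()));
        System.out.println("carrier thread reads its own field: " + carrierUsesField());
    }
}
```

On an ordinary thread every lookup pays for a JDK ThreadLocal access before it reaches the array, which is why the fallback is called the slow path.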


4.3 set method

public class FastThreadLocal<V> {
    // variablesToRemoveIndex is assigned 0 when the FastThreadLocal class is initialized
    private static final int variablesToRemoveIndex = InternalThreadLocalMap.nextVariableIndex();

    public final void set(V value) {
        // Check whether value is the unassigned Object (default value)
        if (value != InternalThreadLocalMap.UNSET) {
            // Get the InternalThreadLocalMap of the current thread
            InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
            // Replace the data in the InternalThreadLocalMap with the new value
            // and save this FastThreadLocal in the Set to be cleaned up
            setKnownNotUnset(threadLocalMap, value);
        } else {
            remove();
        }
    }

    private void setKnownNotUnset(InternalThreadLocalMap threadLocalMap, V value) {
        // Replace the data in the InternalThreadLocalMap with the new value
        if (threadLocalMap.setIndexedVariable(index, value)) {
            // and save the current FastThreadLocal in the Set to be cleaned up
            addToVariablesToRemove(threadLocalMap, this);
        }
    }

    private static void addToVariablesToRemove(InternalThreadLocalMap threadLocalMap, FastThreadLocal<?> variable) {
        // Fetch the data at index 0, which stores the Set of FastThreadLocal objects to be cleaned up
        Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex);
        Set<FastThreadLocal<?>> variablesToRemove;
        if (v == InternalThreadLocalMap.UNSET || v == null) {
            // Nothing at index 0 yet, so create the Set of FastThreadLocal objects
            variablesToRemove = Collections.newSetFromMap(new IdentityHashMap<FastThreadLocal<?>, Boolean>());
            // Store the Set at index 0 of the InternalThreadLocalMap
            threadLocalMap.setIndexedVariable(variablesToRemoveIndex, variablesToRemove);
        } else {
            variablesToRemove = (Set<FastThreadLocal<?>>) v;
        }
        // Save the FastThreadLocal in the Set to be cleaned up
        variablesToRemove.add(variable);
    }
    // Other code omitted
}


public final class InternalThreadLocalMap extends UnpaddedInternalThreadLocalMap {
    // Unassigned Object (default value); a slot is reset to UNSET when its thread-bound value is removed
    public static final Object UNSET = new Object();
    // Array that stores the data bound to the current thread
    private Object[] indexedVariables;
    // Threshold below which the array can still be expanded by doubling
    private static final int ARRAY_LIST_CAPACITY_EXPAND_THRESHOLD = 1 << 30;
    private static final int ARRAY_LIST_CAPACITY_MAX_SIZE = Integer.MAX_VALUE - 8;

    // Replace the data in the InternalThreadLocalMap with the new value
    public boolean setIndexedVariable(int index, Object value) {
        Object[] lookup = indexedVariables;
        if (index < lookup.length) {
            Object oldValue = lookup[index];
            // Write value directly at position index; the time complexity is O(1)
            lookup[index] = value;
            return oldValue == UNSET;
        } else {
            // The array is too small: expand it and store the new value
            expandIndexedVariableTableAndSet(index, value);
            return true;
        }
    }

    private void expandIndexedVariableTableAndSet(int index, Object value) {
        Object[] oldArray = indexedVariables;
        final int oldCapacity = oldArray.length;
        int newCapacity;
        // Check whether doubling-style expansion is possible
        if (index < ARRAY_LIST_CAPACITY_EXPAND_THRESHOLD) {
            newCapacity = index;
            // Bit operations that make the capacity calculation fast
            newCapacity |= newCapacity >>>  1;
            newCapacity |= newCapacity >>>  2;
            newCapacity |= newCapacity >>>  4;
            newCapacity |= newCapacity >>>  8;
            newCapacity |= newCapacity >>> 16;
            newCapacity ++;
        } else {
            // Doubling is not possible, so use the maximum capacity
            newCapacity = ARRAY_LIST_CAPACITY_MAX_SIZE;
        }
        // Create a new array of the expanded size and copy the old data into it
        Object[] newArray = Arrays.copyOf(oldArray, newCapacity);
        // Fill the newly added part with the default value UNSET
        Arrays.fill(newArray, oldCapacity, newArray.length, UNSET);
        // Store the new value at position index of the new array
        newArray[index] = value;
        // Replace the array bound to the current thread with the new array
        indexedVariables = newArray;
    }
    // Other code omitted
}


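The set() method in the source code consists of the following three steps:


1. Check whether value is the default value UNSET. If it is not, obtain the current thread's InternalThreadLocalMap through InternalThreadLocalMap.get(); see the get() walkthrough in Section 4.2 for details.
2. Call setKnownNotUnset() in FastThreadLocal to replace the data in the InternalThreadLocalMap with the new value and save the current FastThreadLocal object in the Set to be cleaned up.
3. If value equals the default value UNSET (the else branch), call remove(); see the code analysis in Section 4.4.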


Next, let's look at the implementation logic of the InternalThreadLocalMap.setIndexedVariable method:

1. Check whether index is within the length of indexedVariables, the array that stores the data bound to the current thread. If it is, read the old value at that position, write the new value there, and return whether the old value was UNSET.


2. If index is beyond the end of the array, the array must be expanded; the expansion also stores the new value at position index.


3. The new capacity is derived from index: it is the next power of two greater than index, or the maximum capacity once index reaches the 1 << 30 threshold. The contents of the original array are then copied into the new array, the newly added slots are filled with the default value UNSET, and finally the new array is assigned to indexedVariables.
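
The capacity calculation is easier to follow with concrete numbers. The sketch below isolates the bit-smearing step into a hypothetical helper (the class and method names are assumptions; the shifts are the ones shown in expandIndexedVariableTableAndSet) and only covers the case where index is below the 1 << 30 threshold.

```java
// Hypothetical helper that reproduces only the power-of-two branch of the expansion logic.
final class CapacitySketch {
    // Returns the smallest power of two strictly greater than index (for 0 <= index < 1 << 30).
    static int newCapacity(int index) {
        int capacity = index;
        // Copy the highest set bit into every lower position, e.g. 100000b becomes 111111b.
        capacity |= capacity >>> 1;
        capacity |= capacity >>> 2;
        capacity |= capacity >>> 4;
        capacity |= capacity >>> 8;
        capacity |= capacity >>> 16;
        return capacity + 1;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(32));  // prints 64
        System.out.println(newCapacity(33));  // prints 64
        System.out.println(newCapacity(100)); // prints 128
    }
}
```

For index 32 (binary 100000) the shifts produce 111111, which is 63, and adding one gives 64, so the first expansion of a 32-slot array doubles it.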


Let's continue with the implementation logic of the FastThreadLocal.addToVariablesToRemove method:

1. Get the data at index 0, which stores the Set of FastThreadLocal objects to be cleaned up. If the data is the default value UNSET or null, create the Set and store it at array index 0.


2. If the data is not the default value UNSET, the Set has already been stored there, so cast the data to obtain it.


3. Finally, add the FastThreadLocal object to the Set to be cleaned up.
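
The lazily created cleanup Set in slot 0 can be sketched on a bare Object array. This is an illustrative model, not Netty's code: the class name, the track method, and the CLEANUP_SLOT constant are hypothetical, while the identity-based Set is built the same way as in the source above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical model of "slot 0 holds the Set of variables to clean up".
final class CleanupSetSketch {
    static final Object UNSET = new Object();
    static final int CLEANUP_SLOT = 0;

    // Adds variable to the Set stored in slot 0, creating the Set on first use.
    @SuppressWarnings("unchecked")
    static Set<Object> track(Object[] slots, Object variable) {
        Object v = slots[CLEANUP_SLOT];
        Set<Object> tracked;
        if (v == UNSET || v == null) {
            // Identity-based Set: membership is decided by ==, not by equals().
            tracked = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
            slots[CLEANUP_SLOT] = tracked;
        } else {
            tracked = (Set<Object>) v;
        }
        tracked.add(variable);
        return tracked;
    }

    public static void main(String[] args) {
        Object[] slots = new Object[4];
        Arrays.fill(slots, UNSET);
        Object first = new Object();
        Object second = new Object();
        track(slots, first);
        track(slots, second);
        Set<Object> tracked = track(slots, first); // adding the same instance again is a no-op
        System.out.println(tracked.size()); // prints 2
    }
}
```

Keeping the Set inside the same array means a thread can find every variable it has written to without any extra per-thread field.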


4.4 remove and removeAll methods

public class FastThreadLocal<V> {
    // variablesToRemoveIndex is assigned 0 when the FastThreadLocal class is initialized
    private static final int variablesToRemoveIndex = InternalThreadLocalMap.nextVariableIndex();

    public final void remove() {
        // Get the InternalThreadLocalMap of the current thread and
        // delete the current FastThreadLocal object and the data it maintains
        remove(InternalThreadLocalMap.getIfSet());
    }

    public final void remove(InternalThreadLocalMap threadLocalMap) {
        if (threadLocalMap == null) {
            return;
        }

        // Using this FastThreadLocal's index, set the value at that array position to the default value UNSET
        Object v = threadLocalMap.removeIndexedVariable(index);
        // Remove the current FastThreadLocal from the Set of objects to be cleaned up
        removeFromVariablesToRemove(threadLocalMap, this);

        if (v != InternalThreadLocalMap.UNSET) {
            try {
                // Empty method that subclasses can override
                onRemoval((V) v);
            } catch (Exception e) {
                PlatformDependent.throwException(e);
            }
        }
    }

    public static void removeAll() {
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.getIfSet();
        if (threadLocalMap == null) {
            return;
        }

        try {
            // Fetch the data at index 0, which stores the Set of FastThreadLocal objects to be cleaned up
            Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex);
            if (v != null && v != InternalThreadLocalMap.UNSET) {
                @SuppressWarnings("unchecked")
                Set<FastThreadLocal<?>> variablesToRemove = (Set<FastThreadLocal<?>>) v;
                // Iterate over all FastThreadLocal objects and delete them and the data they maintain
                FastThreadLocal<?>[] variablesToRemoveArray =
                        variablesToRemove.toArray(new FastThreadLocal[0]);
                for (FastThreadLocal<?> tlv: variablesToRemoveArray) {
                    tlv.remove(threadLocalMap);
                }
            }
        } finally {
            // Remove the threadLocalMap / slowThreadLocalMap data held for this thread
            InternalThreadLocalMap.remove();
        }
    }

    private static void removeFromVariablesToRemove(
            InternalThreadLocalMap threadLocalMap, FastThreadLocal<?> variable) {
        // Fetch the data at index 0, which stores the Set of FastThreadLocal objects to be cleaned up
        Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex);

        if (v == InternalThreadLocalMap.UNSET || v == null) {
            return;
        }

        // Remove this FastThreadLocal from the Set of objects to be cleaned up
        @SuppressWarnings("unchecked")
        Set<FastThreadLocal<?>> variablesToRemove = (Set<FastThreadLocal<?>>) v;
        variablesToRemove.remove(variable);
    }

    // Other code omitted
}


public final class InternalThreadLocalMap extends UnpaddedInternalThreadLocalMap {

    // Get the InternalThreadLocalMap of the current thread, if one has been set
    public static InternalThreadLocalMap getIfSet() {
        Thread thread = Thread.currentThread();
        if (thread instanceof FastThreadLocalThread) {
            return ((FastThreadLocalThread) thread).threadLocalMap();
        }
        return slowThreadLocalMap.get();
    }

    // Set the value at array position index to the default value UNSET
    public Object removeIndexedVariable(int index) {
        Object[] lookup = indexedVariables;
        if (index < lookup.length) {
            Object v = lookup[index];
            lookup[index] = UNSET;
            return v;
        } else {
            return UNSET;
        }
    }

    // Remove the threadLocalMap / slowThreadLocalMap data
    public static void remove() {
        Thread thread = Thread.currentThread();
        if (thread instanceof FastThreadLocalThread) {
            ((FastThreadLocalThread) thread).setThreadLocalMap(null);
        } else {
            slowThreadLocalMap.remove();
        }
    }
    // Other code omitted
}


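The remove() method in the source code consists of the following two steps:

1. Obtain the current thread's InternalThreadLocalMap through InternalThreadLocalMap.getIfSet(). This is similar to how get() obtains it in Section 4.2, except that getIfSet() does not create a map when none exists, so it is not described again here.
2. Delete the current FastThreadLocal object and the data it maintains.


The removeAll() method in the source code consists of the following three steps:

1. Obtain the current thread's InternalThreadLocalMap through InternalThreadLocalMap.getIfSet().
2. Fetch the data at index 0 (the Set of FastThreadLocal objects to be cleaned up), then iterate over all the FastThreadLocal objects and delete them and the data they maintain.
3. Finally, remove the InternalThreadLocalMap itself from the thread.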
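
One detail of removeAll() worth illustrating is that it copies the Set into an array before iterating, because each remove() call also deletes the variable from that same Set. The sketch below models this with hypothetical names (RemoveAllSketch, Variable, newSlots, newTrackedSet); it is a simplified illustration, not Netty's implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical model of remove() and removeAll() over an array plus a cleanup Set.
final class RemoveAllSketch {
    static final Object UNSET = new Object();

    static final class Variable {
        final int index;

        Variable(int index) {
            this.index = index;
        }

        void set(Object[] slots, Set<Variable> tracked, Object value) {
            slots[index] = value;
            tracked.add(this);
        }

        // Resets the slot and drops this variable from the cleanup Set.
        void remove(Object[] slots, Set<Variable> tracked) {
            slots[index] = UNSET;
            tracked.remove(this);
        }
    }

    static Object[] newSlots(int length) {
        Object[] slots = new Object[length];
        Arrays.fill(slots, UNSET);
        return slots;
    }

    static Set<Variable> newTrackedSet() {
        return Collections.newSetFromMap(new IdentityHashMap<Variable, Boolean>());
    }

    static void removeAll(Object[] slots, Set<Variable> tracked) {
        // Iterate over a snapshot, since remove() mutates the Set being cleaned.
        Variable[] snapshot = tracked.toArray(new Variable[0]);
        for (Variable variable : snapshot) {
            variable.remove(slots, tracked);
        }
    }

    public static void main(String[] args) {
        Object[] slots = newSlots(4);
        Set<Variable> tracked = newTrackedSet();
        new Variable(1).set(slots, tracked, "x");
        new Variable(2).set(slots, tracked, "y");
        removeAll(slots, tracked);
        System.out.println("tracked variables left: " + tracked.size());
    }
}
```

Iterating the live Set directly while removing from it would normally fail with a ConcurrentModificationException, which the snapshot avoids.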


5. Summary


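So what is the best practice when using ThreadLocal?

Every time you finish using a ThreadLocal instance, call its remove() method in a finally block before the thread finishes running. This clears the data in the Entry and avoids memory leaks caused by improper use.


Is Netty's FastThreadLocal always faster than the native JDK ThreadLocal?

Not necessarily. When the thread is a FastThreadLocalThread, adding and fetching the data maintained by a FastThreadLocal takes O(1) time, whereas ThreadLocal may run into hash collisions, so FastThreadLocal is comparatively more efficient. On an ordinary thread, however, it may be slower, because every access first goes through slowGet and a JDK ThreadLocal lookup.


What are the advantages of using FastThreadLocal?

FastThreadLocal addresses the shortcomings of the native JDK ThreadLocal described earlier in this article. It is more efficient, and when it is used on a FastThreadLocalThread, removeAll is called proactively after the task finishes to clear the data, avoiding potential memory leaks.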





This article was originally shared on the WeChat official account vivo Internet Technology (vivoVMIC).

Origin my.oschina.net/vivotech/blog/10120430