An analysis of open addressing in ThreadLocalMap

What is a hash table

A hash table (also called a hash map) relies on the array's ability to access data randomly by index. A hash table is therefore an extension of the array: it evolved from the array. Without arrays, there would be no hash tables.

For example, suppose we have 100 products whose 4-digit product numbers follow no particular pattern, and we want to look up product information quickly by number. How can we do it? We can store the 100 products in an array and compute a value from each product number as number % 100. A product whose value is 1 goes into the array at index 1, a product whose value is 2 goes into index 2, and so on: a product whose value is K goes into index K. Because the hash function (number % 100) maps each product number to an array index, looking up product number x uses the same conversion to turn the number into an index, and the data is read from that position in the array. This is the essential idea of hashing.

From the example above we can derive the general rule: a hash table builds on the array's support for random access by index in O(1) time. A hash function (here, product number % 100) maps each element's key to an array index, and the data is stored at that position. To look up an element by key, we apply the same hash function to convert the key into an array index and read the data from that position.
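The product-number example can be sketched in Java as follows. This is a hypothetical illustration (the class and method names are not from the article): a 100-slot array indexed by number % 100, with no collision handling yet.

```java
// Hypothetical example: 4-digit product numbers hashed with number % 100.
class ProductTable {
    private final String[] slots = new String[100];

    // Hash function: map a product number to an array index (0..99).
    static int indexFor(int productNumber) {
        return productNumber % 100;
    }

    // Store the product info at the index computed by the hash function.
    // No collision handling: a product with the same index overwrites the old one.
    void put(int productNumber, String info) {
        slots[indexFor(productNumber)] = info;
    }

    // Look up with the same hash function: a single O(1) array access.
    String get(int productNumber) {
        return slots[indexFor(productNumber)];
    }
}
```

For instance, put(4217, "desk lamp") stores the entry at index 17, and get(4217) reads it back with one array access. Note that indexFor(1001) and indexFor(1101) both return 1, so this naive table cannot tell those two products apart.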

Open addressing

When it comes to hash tables, we are usually more familiar with HashMap or LinkedHashMap, but the protagonist today is ThreadLocalMap, an inner class of ThreadLocal. Any analysis of the ThreadLocal source code has to go through it.

Because a hash table is backed by an array, hash collisions are unavoidable no matter how the hash function is designed. In the example above, if two products have the numbers 1001 and 1101, both map to index 1 and would be placed at the same position in the array: a collision.

This follows from the pigeonhole principle, also known as Dirichlet's drawer principle. Stated simply: if n + 1 pigeons are shut into n pigeonholes, then at least one hole contains at least two pigeons. Likewise, when there are more possible keys than array slots, some keys must share a slot.

How does ThreadLocalMap, as a hash table implementation, resolve collisions? It uses open addressing.

Element insertion

The core idea of open addressing is that when a hash collision occurs, we probe for another free slot and insert the element there. With linear probing, when a key hashes to a slot that is already occupied, we scan forward from that slot one position at a time until we find a free slot.

As the figure shows, the hash table has size 10, and 6 elements have already been inserted before element x. x hashes to index 7, but that slot already holds data, so a collision occurs. We therefore scan forward one slot at a time looking for a free slot. Having reached the end of the array without finding one, we wrap around to the beginning and continue until we find the free slot at index 2, where x is inserted.
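A minimal sketch of linear-probing insertion, assuming non-negative int keys, a fixed table of 10 slots and the hypothetical hash function key % 10 (ThreadLocalMap itself uses threadLocalHashCode & (len - 1) and grows its table).

```java
// Sketch: linear-probing insertion into a fixed 10-slot table (hypothetical hash key % 10).
class LinearProbingInsert {
    final Integer[] table = new Integer[10];

    int hash(int key) {
        return key % table.length; // assumes key >= 0
    }

    // Stores key in the first free slot at or after its home slot, wrapping
    // around to index 0. Returns the slot used, or -1 if the table is full.
    int insert(int key) {
        int i = hash(key);
        for (int probes = 0; probes < table.length; probes++) {
            if (table[i] == null) {
                table[i] = key;
                return i;
            }
            if (table[i] == key) {
                return i; // already present
            }
            i = (i + 1) % table.length; // next slot, wrapping after the last index
        }
        return -1;
    }
}
```

Filling slots 7, 8, 9, 0, 1 and 5 with six keys (7, 8, 9, 10, 11, 5) and then inserting 17, which hashes to 7, probes 7, 8, 9, wraps to 0 and 1, and stores 17 at index 2, the same situation as in the figure.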

Finding an element

Looking up an element in the hash table is similar to inserting one. We use the hash function to compute the hash value of the key we are looking for, then compare the element at that array index with the element we want. If they are equal, we have found it; otherwise we keep probing forward one slot at a time. If we reach a free slot without finding it, the element is not in the hash table.
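Lookup can be sketched the same way, again assuming a hypothetical 10-slot table of non-negative int keys with hash key % 10. The key point is that probing stops at the first empty slot.

```java
// Sketch: linear-probing lookup in a fixed 10-slot table (hypothetical hash key % 10).
class LinearProbingFind {
    final Integer[] table = new Integer[10];

    int hash(int key) {
        return key % table.length; // assumes key >= 0
    }

    // Same probing as insertion; assumes the table is not full.
    void insert(int key) {
        int i = hash(key);
        while (table[i] != null && table[i] != key) i = (i + 1) % table.length;
        table[i] = key;
    }

    // Returns the slot holding key, or -1 if it is absent.
    int find(int key) {
        int i = hash(key);
        for (int probes = 0; probes < table.length && table[i] != null; probes++) {
            if (table[i] == key) return i;
            i = (i + 1) % table.length;
        }
        return -1; // reached a free slot (or scanned the whole table): not present
    }
}
```

After inserting 7, 17 and 27 (slots 7, 8 and 9), find(27) returns 9, while find(37) probes 7, 8, 9, reaches the empty slot 0 and returns -1.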

Deleting an element

Besides insertion and lookup, ThreadLocalMap (which is backed by an array) also supports deletion. In a hash table that resolves collisions with linear probing, deletion needs special care: we cannot simply set the slot of the deleted element to null.

Recall the lookup operation described above: when linear probing reaches a free slot, we conclude that the data is not in the hash table. But if that free slot was created by a later deletion, the lookup algorithm breaks: data that does exist will be reported as missing. How can we solve this problem?
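The failure is easy to reproduce. In this hypothetical sketch (10 slots, non-negative int keys, hash key % 10), keys 7 and 17 collide; deleting 7 by simply setting its slot to null leaves 17 unreachable.

```java
// Hypothetical demo: 10 slots, hash = key % 10, linear probing, naive deletion.
class NaiveDeleteDemo {
    final Integer[] table = new Integer[10];

    int hash(int key) { return key % table.length; }

    void insert(int key) { // assumes the table is not full
        int i = hash(key);
        while (table[i] != null && table[i] != key) i = (i + 1) % table.length;
        table[i] = key;
    }

    int find(int key) { // stops at the first empty slot
        int i = hash(key);
        for (int n = 0; n < table.length && table[i] != null; n++) {
            if (table[i] == key) return i;
            i = (i + 1) % table.length;
        }
        return -1;
    }

    // WRONG on purpose: emptying the slot breaks the probe chain of later keys.
    void naiveDelete(int key) {
        int i = find(key);
        if (i >= 0) table[i] = null;
    }
}
```

Before the deletion, find(17) returns 8; after naiveDelete(7) it returns -1, even though 17 is still sitting in slot 8.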

One solution is, after removing an element, to rehash the non-null entries that follow it (up to the next empty slot), so that lookups are not affected. This is the approach ThreadLocalMap takes.

Another solution is to mark a deleted element with a special "deleted" marker instead of emptying its slot. When linear probing encounters a slot marked as deleted, it does not stop but keeps probing.
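A minimal sketch of the deleted-marker (tombstone) approach, assuming a hypothetical 10-slot table of non-negative int keys with hash key % 10. This is not what ThreadLocalMap does; it only illustrates the alternative described above.

```java
import java.util.Arrays;

// Sketch: deletion with tombstones (hypothetical 10-slot table, hash key % 10).
class TombstoneTable {
    private static final int EMPTY = -1;   // slot never used
    private static final int DELETED = -2; // tombstone: slot used, then deleted
    final int[] table = new int[10];

    TombstoneTable() {
        Arrays.fill(table, EMPTY);
    }

    int hash(int key) { return key % table.length; } // assumes key >= 0

    // Lookup skips tombstones and stops only at a truly EMPTY slot.
    int find(int key) {
        int i = hash(key);
        for (int n = 0; n < table.length && table[i] != EMPTY; n++) {
            if (table[i] == key) return i;
            i = (i + 1) % table.length;
        }
        return -1;
    }

    // Insert reuses the first EMPTY or DELETED slot on the probe path.
    // Assumes the key is not already present and the table is not full.
    void insert(int key) {
        int i = hash(key);
        while (table[i] != EMPTY && table[i] != DELETED) i = (i + 1) % table.length;
        table[i] = key;
    }

    // Delete marks the slot instead of emptying it, so probe chains stay intact.
    void delete(int key) {
        int i = find(key);
        if (i >= 0) table[i] = DELETED;
    }
}
```

After inserting 7 and 17 and deleting 7, find(17) still returns 8 because the probe passes over the tombstone at slot 7, and a later insert(27) reuses that slot. The price is that tombstones accumulate and lengthen probes until the table is rebuilt.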

rehash

Here is the rehash procedure, following the figure. When element 8 is deleted, the slot at index 8 is first set to null, and then the non-null elements after it are rehashed until an empty slot is reached. The element after 8 is 9, whose proper position is 9 % 10 = 9, so 9 does not move. The next element is 19; it belongs at index 9, which is already occupied, so we look for the next free slot: index 3 is free, so it goes into tab[3]. The next element, 1 in tab[1], does not move. Element 7 belongs in tab[7]; since that slot is already occupied, it goes into the next free slot, tab[8]. The next element, 19, likewise cannot go into tab[9], which is occupied, so it is placed in the next free slot, tab[0]. Last comes element 4 in tab[4], which does not move. The slot after element 4 is empty, so the rehash ends.
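The "delete, then rehash the following run" approach can be sketched as below. It is modeled on the loop in ThreadLocalMap.expungeStaleEntry() shown later, but simplified under hypothetical assumptions: a 10-slot table, non-negative int keys, hash key % 10 and no stale entries, so the loop only relocates entries that are not in their home slot.

```java
// Sketch: delete a slot, then rehash the run that follows it until an empty slot.
// Hypothetical 10-slot table, hash key % 10; assumes the table never becomes full.
class RehashOnDelete {
    final Integer[] table = new Integer[10];

    int hash(int key) { return key % table.length; }
    int next(int i) { return (i + 1) % table.length; }

    void insert(int key) { // assumes the key is absent and a free slot exists
        int i = hash(key);
        while (table[i] != null) i = next(i);
        table[i] = key;
    }

    int find(int key) {
        int i = hash(key);
        while (table[i] != null) { // stops at the first empty slot
            if (table[i] == key) return i;
            i = next(i);
        }
        return -1;
    }

    void delete(int slot) {
        table[slot] = null; // 1. empty the deleted slot
        // 2. walk the following entries until an empty slot is reached
        for (int i = next(slot); table[i] != null; i = next(i)) {
            int key = table[i];
            int h = hash(key);
            if (h != i) { // entry is not in its home slot
                table[i] = null; // take it out...
                while (table[h] != null) h = next(h);
                table[h] = key; // ...and reinsert it with linear probing
            }
        }
    }
}
```

For example, inserting 7, 17, 27 and 8 fills slots 7, 8, 9 and 0. Deleting slot 7 moves 17 to slot 7, 27 to slot 8 and 8 to slot 9, leaves slot 0 empty, and every remaining key can still be found.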

Load factor

You may have noticed that linear probing has a serious problem. As more data is inserted into the hash table, the probability of collisions rises, free slots become scarcer, and probes get longer. In the extreme case we may have to probe the entire hash table, so the worst-case time complexity is O(n). Likewise, deletion and lookup may have to probe the entire table before finding (or failing to find) the data.

Whatever probing method is used, and however well the hash function is designed, the probability of collisions rises sharply once few free slots remain in the hash table. To keep the hash table operating efficiently, we normally ensure that a certain fraction of its slots stays free. The load factor expresses how full the table is.

The load factor is calculated as: load factor = number of elements in the table / length of the hash table. The larger the load factor, the fewer free slots remain, the more collisions occur, and the worse the hash table performs.
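A small illustration of the formula, with hypothetical names. Using the figure's numbers, 6 elements in a 10-slot table give a load factor of 0.6, and inserting x raises it to 0.7. The 2/3 limit used in the check below is the ratio ThreadLocalMap uses for its threshold.

```java
// Hypothetical helper: load factor = number of elements / table length.
class LoadFactor {
    static double of(int elementCount, int tableLength) {
        return (double) elementCount / tableLength;
    }

    // True when the table is fuller than the given maximum load factor.
    static boolean exceeds(int elementCount, int tableLength, double maxLoad) {
        return of(elementCount, tableLength) > maxLoad;
    }
}
```

With a maximum load factor of 2/3, the 10-slot table is still within bounds at 6 elements (0.6) but exceeds it at 7 elements (0.7), which is the point where a growing table would resize.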

Source code analysis

ThreadLocalMap definition

ThreadLocalMap is the core data structure behind ThreadLocal. It is defined as follows:

static class ThreadLocalMap {
  
  // Entry extends WeakReference, so the ThreadLocal key is held weakly
  static class Entry extends WeakReference<ThreadLocal<?>> {
      Object value;
      Entry(ThreadLocal<?> k, Object v) {
          super(k);
          value = v;
      }
  }

  // Initial capacity; must be a power of two
  private static final int INITIAL_CAPACITY = 16;

  // Entry array that stores the data
  private Entry[] table;

  // Number of entries in the table
  private int size = 0;

  // Size at which to resize; table.length * 2 / 3 by default
  private int threshold;
  // ... (other members omitted)
}

From this definition we can see that the key of an Entry is the ThreadLocal and its value is the stored value. Entry extends WeakReference, so the key (the ThreadLocal instance) is only weakly referenced by its Entry. The threshold is two-thirds of the array length, i.e. the load factor is 2/3.
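To see why INITIAL_CAPACITY must be a power of two, the sketch below mimics how slots are chosen. In the JDK source, each new ThreadLocal gets a threadLocalHashCode equal to the previous one plus HASH_INCREMENT = 0x61c88647, and ThreadLocalMap computes the slot as hash & (len - 1); the class and method names here are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of slot selection; SlotIndexDemo and its methods are hypothetical names.
class SlotIndexDemo {
    static final int HASH_INCREMENT = 0x61c88647; // constant used by java.lang.ThreadLocal

    // The k-th hash code handed out, starting from 0 (int overflow wraps,
    // exactly like repeated AtomicInteger.getAndAdd).
    static int hashCodeFor(int k) {
        return k * HASH_INCREMENT;
    }

    // For a power-of-two len, masking keeps the low bits: hash mod len, never negative.
    static int slotFor(int hash, int len) {
        return hash & (len - 1);
    }

    // How many distinct slots the first n hash codes occupy in a table of length len.
    static int distinctSlots(int n, int len) {
        Set<Integer> slots = new HashSet<>();
        for (int k = 0; k < n; k++) slots.add(slotFor(hashCodeFor(k), len));
        return slots.size();
    }
}
```

Because 0x61c88647 is odd, its first len multiples cover every residue modulo a power-of-two len, so distinctSlots(16, 16) is 16: the first 16 hash codes land in 16 different slots. The mask also yields a valid index for negative hash codes (slotFor(-1, 16) is 15), which Java's % operator would not.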

set() method

private void set(ThreadLocal<?> key, Object value) {

  Entry[] tab = table;
  int len = tab.length;
  int i = key.threadLocalHashCode & (len-1);

  // Linear probing: look for a suitable insertion slot
  for (Entry e = tab[i]; e != null; e = tab[i = nextIndex(i, len)]) {
      ThreadLocal<?> k = e.get();
      // Key already present: overwrite the value
      if (k == key) {
          e.value = value;
          return;
      }
      // Key is null: the previous ThreadLocal has been garbage-collected (stale entry)
      if (k == null) {
          replaceStaleEntry(key, value, i);
          return;
      }
  }

  // Neither the key nor a stale entry was found: create a new entry
  tab[i] = new Entry(key, value);
  int sz = ++size;
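  // Clean some stale slots (entry non-null, but its ThreadLocal key is null).
  // If none were removed and the size has reached the threshold, rehash:
  if (!cleanSomeSlots(i, sz) && sz >= threshold) {
    // remove every stale entry in the table first
    // (in the JDK these two steps live in a separate rehash() method)
    expungeStaleEntries();
    // then double the array if the size is still at least 3/4 of the threshold
    if (size >= threshold - threshold / 4)
      resize();
  }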
}


The main steps of the code above are:

  1. Use linear probing to find a suitable slot. If the key already exists, its value is overwritten directly. If an entry's key is null, the old ThreadLocal has been garbage-collected, and the stale entry is replaced with the new one (replaceStaleEntry).
  2. If neither the key nor a stale entry is found before an empty slot is reached, a new entry is created there.
  3. Entries that are non-null but whose ThreadLocal key has been collected are cleared (cleanSomeSlots), to prevent memory leaks.
  4. If nothing was cleared and size >= threshold, all stale entries are expunged; if size is then still >= threshold - threshold / 4, the array is doubled and every element's position is recomputed and the element moved (rehash). For example, the initial array length is 16, so threshold = 16 * 2 / 3 = 10 (integer division), the resize check then requires size >= 10 - 10 / 4 = 8, and the array grows to 32.
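The threshold arithmetic in step 4 can be checked with a few lines of Java. The helper names are hypothetical; threshold() mirrors the len * 2 / 3 rule in ThreadLocalMap.setThreshold(), and all divisions are integer divisions.

```java
// Hypothetical helpers reproducing the integer arithmetic of the resize rule.
class ThresholdMath {
    // threshold = len * 2 / 3, as set for a table of length len
    static int threshold(int len) {
        return len * 2 / 3;
    }

    // After expunging stale entries, the table doubles when size >= this value.
    static int resizeTrigger(int len) {
        int t = threshold(len);
        return t - t / 4;
    }
}
```

For a 16-slot table this gives a threshold of 10 and a resize check of size >= 8; after doubling to 32 slots, the values become 21 and 16.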

Both replaceStaleEntry() and cleanSomeSlots() ultimately rely on expungeStaleEntry(), and you will find it called from ThreadLocalMap's get, set, and remove paths.

private int expungeStaleEntry(int staleSlot) {
  Entry[] tab = table;
  int len = tab.length;

  // Remove the entry at the stale slot
  tab[staleSlot].value = null;
  tab[staleSlot] = null;
  size--;

  Entry e;
  int i;

  // Rehash the following entries until a null slot is reached
  for (i = nextIndex(staleSlot, len);(e = tab[i]) != null; i = nextIndex(i, len)) {
    ThreadLocal<?> k = e.get();
    // k is null: the ThreadLocal has been garbage-collected
    if (k == null) {
        e.value = null;
        tab[i] = null;
        size--;
    } else {
        int h = k.threadLocalHashCode & (len - 1);
        // Check whether the entry sits in the slot it "really" belongs to
        if (h != i) {
            tab[i] = null;
            // Linear probing for the first free slot from its home slot
            while (tab[h] != null)
                h = nextIndex(h, len);
            tab[h] = e;
        }
    }
  }
  return i;
}

Combined with the rehash walkthrough earlier in the article, the code above is easy to understand. When ThreadLocalMap adds, gets, or removes entries, a rehash is carried out under the following conditions:

  1. A ThreadLocal object has been garbage-collected, so its Entry's key is null while its value is not. Encountering such a stale entry triggers expungeStaleEntry(), which rehashes the entries that follow it.
  2. In set(), the number of entries reaches the threshold, which is two-thirds of ThreadLocalMap's capacity.

get() method

private Entry getEntry(ThreadLocal<?> key) {
  int i = key.threadLocalHashCode & (table.length - 1);
  Entry e = table[i];
  // First check the slot the hash maps to directly
  if (e != null && e.get() == key)
      return e;
  else
      return getEntryAfterMiss(key, i, e);
}

// Use linear probing to find the matching entry
private Entry getEntryAfterMiss(ThreadLocal<?> key, int i, Entry e) {
  Entry[] tab = table;
  int len = tab.length;

  while (e != null) {
      ThreadLocal<?> k = e.get();
      // Found the entry
      if (k == key)
          return e;
      // ThreadLocal is null: expunge the stale entry and rehash the following run
      if (k == null)
          expungeStaleEntry(i);
      else
          i = nextIndex(i, len);
      e = tab[i];
  }
  return null;
}


Linear probing runs through the whole get and set process; once the principle is clear, the code is easy to follow.

remove() method

private void remove(ThreadLocal<?> key) {
   Entry[] tab = table;
   int len = tab.length;
   int i = key.threadLocalHashCode & (len-1);
   for (Entry e = tab[i];
        e != null;
        e = tab[i = nextIndex(i, len)]) {
       if (e.get() == key) {
           e.clear();
           expungeStaleEntry(i);
           return;
       }
   }
}


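remove() first clears the entry's reference to its key (e.clear()), turning it into a stale entry, then calls expungeStaleEntry() to delete it and rehash the entries that follow.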

Using ThreadLocal

public class Counter {

  private static ThreadLocal<Integer> seqCount = new ThreadLocal<Integer>(){
    public Integer initialValue() {
        return 0;
    }
  };

  public int nextInt(){
    seqCount.set(seqCount.get() + 1);

    return seqCount.get();
  }
  public static void main(String[] args){
    Counter seqCount = new Counter();

    CountThread thread1 = new CountThread(seqCount);
    CountThread thread2 = new CountThread(seqCount);
    CountThread thread3 = new CountThread(seqCount);
    CountThread thread4 = new CountThread(seqCount);

    thread1.start();
    thread2.start();
    thread3.start();
    thread4.start();
  }

  private static class CountThread extends Thread{
    private Counter counter;

    CountThread(Counter counter){
        this.counter = counter;
    }

    @Override
    public void run() {
        for(int i = 0 ; i < 3 ; i++){
            System.out.println(Thread.currentThread().getName() + " seqCount :" + counter.nextInt());
        }
    }
  }
}



Sample output (the interleaving of threads varies between runs):

Thread-3 seqCount :1
Thread-0 seqCount :1
Thread-3 seqCount :2
Thread-0 seqCount :2
Thread-0 seqCount :3
Thread-2 seqCount :1
Thread-2 seqCount :2
Thread-1 seqCount :1
Thread-3 seqCount :3
Thread-1 seqCount :2
Thread-1 seqCount :3
Thread-2 seqCount :3

ThreadLocal effectively gives each thread its own copy of the variable, so threads can access it at the same time without affecting each other. This also shows how its use cases differ from those of synchronized: synchronized coordinates access to a shared variable so that one thread's changes become visible to other threads, whereas ThreadLocal isolates data per thread so that it does not affect other threads. Its most suitable scenario is sharing data across different layers of code within the same thread.
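As a variant of the Counter example, the sketch below uses ThreadLocal.withInitial (Java 8+) and joins the worker threads so that each thread's final count can be checked. The class name and the run() helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical variant of Counter: per-thread counters via ThreadLocal.withInitial.
class PerThreadCounter {
    private static final ThreadLocal<Integer> count = ThreadLocal.withInitial(() -> 0);

    // Increments and returns the calling thread's own counter.
    static int nextInt() {
        count.set(count.get() + 1);
        return count.get();
    }

    // Starts `threads` workers that each call nextInt() `calls` times,
    // waits for them, and returns the last value each worker saw.
    static List<Integer> run(int threads, int calls) {
        List<Integer> finals = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                int last = 0;
                for (int i = 0; i < calls; i++) last = nextInt();
                finals.add(last);
                count.remove(); // delete this thread's entry from its ThreadLocalMap
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return finals;
    }
}
```

Running four workers with three calls each yields four final values of 3, regardless of how the threads interleave. Calling remove() at the end of each task deletes the thread's entry from its ThreadLocalMap, which matters in thread pools, where threads (and their maps) are reused.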


Source: juejin.im/post/5d43e415e51d4561db5e39ed