A simple implementation of a Java HashMap, and the main differences between HashMap, HashSet, Hashtable, etc.

Hash tables

Hash tables are a technique that supports insertion, deletion, and lookup in constant expected time, but ordering relationships between elements are generally not supported.

Generally speaking, strings are the most frequently hashed keys. In Java, hashCode is a method on java.lang.Object, available to every object, and is declared as:

public native int hashCode();

java.lang.System also provides a hash method, likewise declared as native:

public static native int identityHashCode(Object x);

If the class of ss does not override hashCode (i.e. it uses Object's default), executing the following snippet prints true:

	// ss is an instance of a class that does not override hashCode
	int i = ss.hashCode();
	int ii = System.identityHashCode(ss);
	System.out.println(i == ii);

The commonly used String hash, from the JDK source:

	// The hashCode method of the String class
	public int hashCode() {
	    int h = hash;
	    if (h == 0 && value.length > 0) {
	        char val[] = value;

	        for (int i = 0; i < value.length; i++) {
	            h = 31 * h + val[i];
	        }
	        hash = h; // cache the result; String is immutable
	    }
	    return h;
	}

The hash method applied to keys in HashMap, which LinkedHashMap inherits (note: this is distinct from hashCode):

	// hash(key) in HashMap (JDK 8): spreads the high bits of hashCode downward
	static final int hash(Object key) {
	    int h;
	    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
	}
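
The XOR with the high 16 bits matters because HashMap derives the bucket index from only the low bits of the hash: the table length n is always a power of two, and the index is computed as (n - 1) & hash. A minimal illustration (the variable names here are mine, not JDK source):

	// How the spread hash is reduced to a bucket index (illustrative)
	int n = 16;                       // table length, always a power of two
	int h = "example".hashCode();
	int spread = h ^ (h >>> 16);      // mix the high bits into the low bits
	int index = (n - 1) & spread;     // only the low 4 bits pick the bucket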

In a hash table, a good hash function makes full use of the allocated memory by spreading keys evenly over the buckets, so collisions are unlikely.

Ways to resolve collisions:

1. Separate chaining

Elements with the same hash value are linked into a list, and the lists are stored in an array; that is, the entire hash table is an array of linked lists.

2. Open addressing

Here, colliding elements are placed into other, unoccupied positions of the same array. There are three main strategies for choosing the alternative position:

2.1. Linear probing: examine the following positions one by one until a free slot is found

2.2. Quadratic probing: if the home position i of an element is occupied, probe positions i + 1², i + 2², i + 3², and so on

2.3. Double hashing: when position i is occupied, apply a second hash function (generally different from the first) to compute the next position to probe

However, each of the above three methods has its own defects or performance problems. A common one is that, with a large amount of data, the table may need to be expanded repeatedly, and each expansion is usually accompanied by rehashing (reHash) all existing elements into new positions, which is no small performance overhead.
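
As a concrete illustration of open addressing, here is a minimal linear-probing insert sketch; the fixed-size parallel arrays and names are assumptions for illustration, and a real implementation would also handle resizing and deletion:

	// Minimal linear-probing insert (sketch; assumes the table never fills up)
	static final int CAPACITY = 16;
	static Object[] keys = new Object[CAPACITY];
	static Object[] vals = new Object[CAPACITY];

	static void put(Object key, Object value) {
	    int i = (key.hashCode() & 0x7fffffff) % CAPACITY;
	    // Probe successive slots until a free one or the same key is found
	    while (keys[i] != null && !keys[i].equals(key)) {
	        i = (i + 1) % CAPACITY;   // linear probing: step to the next slot
	    }
	    keys[i] = key;
	    vals[i] = value;
	}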

In addition, there are improved algorithms such as perfect hashing, cuckoo hashing, and hopscotch hashing.

Perfect hashing: for each non-empty bucket holding N elements, create a secondary hash table of size N² and place that bucket's elements into it with a different hash function, re-chosen until the secondary table has no collisions

Cuckoo hashing: the basic idea is to maintain N hash tables with N hash functions, so that each element has N candidate positions, one per table. Insertion starts with the first table; if the candidate position there is already occupied, either (1) evict the element currently in that position and let it move to one of its own alternative positions, or (2) move on and try the next table. Cuckoo hashing can also be implemented in parallel, improving efficiency through multithreading.
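
A minimal two-table cuckoo insertion sketch (the two hash functions, table size, and kick limit below are illustrative assumptions, not a canonical implementation):

	// Two-table cuckoo hashing insert (sketch)
	static final int SIZE = 16;
	static final int MAX_KICKS = 32;  // give up after this many evictions
	static Object[] t1 = new Object[SIZE], t2 = new Object[SIZE];

	static int h1(Object k) { return (k.hashCode() & 0x7fffffff) % SIZE; }
	static int h2(Object k) { return ((k.hashCode() >>> 16) ^ k.hashCode()) & (SIZE - 1); }

	static boolean insert(Object key) {
	    for (int kicks = 0; kicks < MAX_KICKS; kicks++) {
	        int i = h1(key);
	        if (t1[i] == null) { t1[i] = key; return true; }
	        Object evicted = t1[i]; t1[i] = key; key = evicted;  // kick out
	        int j = h2(key);
	        if (t2[j] == null) { t2[j] = key; return true; }
	        evicted = t2[j]; t2[j] = key; key = evicted;         // kick again
	    }
	    return false;  // too many kicks: a real table would rehash here
	}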

Hopscotch hashing: an improved version of linear probing. Plain linear probing keeps trying the next slot on every conflict, whereas hopscotch hashing limits the number of attempts: when the number of probes for an empty bucket reaches the upper limit, it goes back and swaps elements within a bounded neighborhood, and each displaced element gets the same limited number of attempts. In this way, lookups can still be completed in constant time.

For example:
a    b    c    d    _    _    (buckets 0-5)
Let the probe limit be three. Suppose f has the same hash value as a, so its home position is 0; but 0 is occupied, and two more probes show that position 2 (holding c) is occupied too. Instead of probing further, b is displaced to make room for f, and b in turn gets three probes of its own.
The result is:
a    f    c    d    b    _
This guarantees that the lookup operation completes in constant time.


If separate chaining is implemented with a singly linked list (a simple implementation of my own is given at the end of this article), the performance of all basic operations generally degrades when facing a large amount of data. This problem can be mitigated in three main ways:

Choose an appropriate initial array length based on the expected amount of data when the table is created (see the snippet after this list)

Expand (reHash) as the amount of data gradually grows, re-hashing elements into new positions at the same time (for unpredictable growth in data volume)

Store elements with the same hash value in a doubly linked list (java.util.LinkedList is a doubly linked list) instead of a singly linked one
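
For the first optimization, java.util.HashMap itself exposes the knobs via its constructors; for example:

	// Pre-size the table when the data volume is known in advance:
	// with capacity 2048 and load factor 0.75f, no reHash happens
	// until more than 1536 entries have been inserted.
	Map<String, Integer> presized = new HashMap<>(2048, 0.75f);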

 

The main differences between HashMap, HashSet, LinkedHashMap, Hashtable and ConcurrentHashMap in Java

HashMap:

Implemented with separate chaining. It is not thread-safe: threads can race when a reHash is in progress. Both keys and values may be null.
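
For example, null is a legal key in HashMap; as the hash method shown earlier makes clear, hash(null) is 0, so the null key always lands in bucket 0:

	HashMap<String, Integer> m = new HashMap<>();
	m.put(null, 1);                   // null key is allowed
	m.put("k", null);                 // null value is allowed
	System.out.println(m.get(null));  // prints 1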

HashSet:

First, look at the source:

	//hashset source code
	private transient HashMap<E,Object> map;

    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<>();
    }

So HashSet is actually implemented on top of HashMap; it offers fast access, is not thread-safe, and allows null elements.
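
Its add method simply delegates to the backing map, storing a shared dummy object as every value:

	// Also from the HashSet source
	private static final Object PRESENT = new Object();

	public boolean add(E e) {
	    return map.put(e, PRESENT) == null;
	}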

LinkedHashMap:

LinkedHashMap maintains insertion order with a doubly linked list threaded through the entries; essentially it is a HashMap plus a doubly linked list recording the insertion sequence. It is not thread-safe, allows null, and is ordered. A comparison example:

	//hashmap and linkedhashmap order comparison
		HashMap<String,Integer> s = new HashMap<String,Integer>();
		for(int i=0 ;i<10 ;i++) {
			s.put("aaa"+i, i);
		}
		System.out.println(s);
		LinkedHashMap<String,Integer> ss = new LinkedHashMap<String,Integer>();
		for(int i=0 ;i<10 ;i++) {
			ss.put("aaa"+i, i);
		}
		System.out.println(ss);
		

result: the HashMap prints its entries in an order determined by the hash values, not necessarily the insertion order, while the LinkedHashMap prints {aaa0=0, aaa1=1, ..., aaa9=9} in exactly the order of insertion.

Hashtable:

A thread-safe counterpart of HashMap that locks the entire table for each operation; neither keys nor values may be null.

ConcurrentHashMap:

Thread-safe, implemented with segment/bucket-level locking rather than one whole-table lock. Its consistency guarantees are therefore weaker than Hashtable's, but its performance and scalability are much better. Keys and values cannot be null.
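
In practice the difference shows up in compound operations: ConcurrentHashMap provides atomic methods such as putIfAbsent and merge, so callers need no external locking. A brief example:

	ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
	counts.putIfAbsent("key", 0);          // atomic check-then-insert
	counts.merge("key", 1, Integer::sum);  // atomic increment
	// counts.put(null, 1);                // would throw NullPointerException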

A simple HashMap implementation based on the separate-chaining model:

package com.ryo.structure.hash;

/**
 * <p>A separate-chaining hash table.
 * <p>Elements hashed to the same position form a singly linked list.<br>The default table length is 103,
 * but a larger table can be created by passing an integer greater than 103 to the constructor;
 * passing an integer less than 103 creates a table of size 103.<br>Entry is not implemented.
 * <br>Load-factor-based expansion is not implemented either; choosing a suitable initial length stands in for it.
 * @author shiin
 * @param <K>	key
 * @param <V>	value
 */
@SuppressWarnings("unchecked")
public class SCHashMap<K ,V> implements HashMap<K ,V>{
	private static final int DEFAULT_CAPACITY = 103;
	
	private HashNode<K ,V>[] map;
	private int size;
	private int currentCapacity;
	
	public SCHashMap() {
		this(DEFAULT_CAPACITY);
	}
	
	public SCHashMap(int capacity) {
		if(capacity < DEFAULT_CAPACITY)
			capacity = DEFAULT_CAPACITY;
		map = new HashNode[capacity];
		size = 0;
		currentCapacity = capacity;
	}

	@Override
	public boolean contains(K key) {
		HashNode<K ,V> node = map[hash(key)];
		while(node != null) {
			if(node.key.equals(key)) {
				return true;
			}
			node = node.next;
		}
		return false;
	}

	@Override
	public V get(K key) {
		HashNode<K ,V> node = map[hash(key)];
		while(node != null) {
			if(node.key.equals(key)) {
				return node.value;
			}
			node = node.next;
		}
		return null;
	}

	@Override
	public int put(K key, V value) {
		int index = hash(key);
		if(map[index] == null) {
			map[index] = new HashNode<K ,V>(key ,value);
		}
		else {
			HashNode<K ,V> node = map[index];
			HashNode<K ,V> prev = null;
			while(node != null) {
				if(node.key.equals(key)) {	
					node.value = value;
					return 1;
				}
				prev = node;
				node = node.next;
			}
			node = new HashNode<K ,V>(key ,value);
			prev.next = node;
		}
		size++;
		return 1;
	}

	@Override
	public int remove(K key) {
		HashNode<K ,V> node = map[hash(key)];
		HashNode<K ,V> prev = null;
		while(node != null) {
			if(node.key.equals(key)) {
				if(prev == null)
					map[hash(key)] = node.next;
				else
					prev.next = node.next;
				size--;
				return 0;
			}
			prev = node;
			node = node.next;
		}
		return 0;
	}

	@Override
	public int size() {
		return this.size;
	}

	@Override
	public void clearAll() {
		map = new HashNode[currentCapacity];
		size = 0;
	}

	@Override
	public boolean isEmpty() {
		return size == 0;
	}
	
	/**
	 * Hash method for key
	 * @param key key value
	 * @return hashed storage location
	 */
	private int hash(K key) {
		if(key == null)
			return 0;
		int hash = key.hashCode() % currentCapacity;
		if(hash < 0)
			hash += currentCapacity;
		return hash;
	}
	
	private static class HashNode<K ,V>{
		K key;
		V value;
		HashNode<K ,V> next;
		
		HashNode(K key ,V value ,HashNode<K ,V> next){
			this.key = key;
			this.value = value;
			this.next = next;
		}
		
		HashNode(K key,V value){
			this(key ,value ,null);
		}
	}

}
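
A quick usage sketch of the class above (assuming the author's HashMap interface from the same package is on the classpath):

	SCHashMap<String, Integer> map = new SCHashMap<>();
	map.put("one", 1);
	map.put("two", 2);
	System.out.println(map.get("one"));      // 1
	System.out.println(map.contains("two")); // true
	map.remove("one");
	System.out.println(map.size());          // 1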
