Hash Table (HashTable)

1. Related concepts

  • Hash function: a function that computes the storage location (hash address) of a key;
  • Hash conflict (collision): the hash function maps different keys to the same storage location;

2. Common hash functions

  • Direct addressing method: take a linear function of the key as its hash address, Hash(key) = A*key + B. It is simple and uniform, but the distribution of the keys must be known in advance; it suits a small, contiguous range of keys;
  • Division (remainder) method: if the hash table allows m addresses, choose as the divisor the largest prime p that does not exceed m, and convert each key into a hash address with Hash(key) = key % p (p <= m);
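A minimal Java sketch of both hash functions, assuming non-negative-friendly integer keys; the class name HashFunctions, the constants A = 1 and B = -1, and the table size m = 12 are illustrative choices, not from the original. With m = 12 the division method picks p = 11, and keys 25 and 36 both map to address 3, which is exactly the hash conflict defined in section 1.

```java
// Illustrative sketch of the two hash functions; names and constants are assumptions.
class HashFunctions {

    // Direct addressing: Hash(key) = A * key + B.
    // Only practical when keys fall in a small, contiguous range.
    static int directAddress(int key, int a, int b) {
        return a * key + b;
    }

    // Division (remainder) method: Hash(key) = key % p.
    // Math.floorMod keeps the result in [0, p) even for negative keys.
    static int divisionHash(int key, int p) {
        return Math.floorMod(key, p);
    }

    // Largest prime p <= m (m must be at least 2).
    static int largestPrimeAtMost(int m) {
        for (int p = m; p >= 2; p--) {
            if (isPrime(p)) {
                return p;
            }
        }
        throw new IllegalArgumentException("m must be >= 2");
    }

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int p = largestPrimeAtMost(12);              // 11
        System.out.println(p);
        System.out.println(divisionHash(25, p));     // 3
        System.out.println(divisionHash(36, p));     // 3: collides with key 25
        System.out.println(directAddress(5, 1, -1)); // 4: keys 1..n map to 0..n-1
    }
}
```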

3. Resolving hash collisions (commonly used method)

Open hashing (also called the hash bucket method or separate chaining):

Implemented as an array of linked lists:

  • Apply the hash function to each key to compute its hash address; all keys with the same hash address go into the same "bucket";
  • The elements in a "bucket" are connected as a singly linked list, and the head node of each list is stored in the hash table array;
  • Each "bucket" therefore holds the elements whose keys collide;

If collisions are severe, the "buckets" become long and search efficiency suffers, so the buckets can be optimized:

  • implement each "bucket" as another hash table, or as a search tree;
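A minimal Java sketch of separate chaining for int keys, assuming a fixed capacity (no resizing) and using java.util.LinkedList for each bucket instead of hand-written list nodes; ChainedIntSet is a hypothetical name. Keys are placed with the division method from section 2, so with capacity 11 the keys 25 and 36 end up in the same bucket.

```java
import java.util.LinkedList;

// Hypothetical separate-chaining set of ints: an array of buckets,
// each bucket a linked list of the keys whose hash address is that index.
class ChainedIntSet {
    private final LinkedList<Integer>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedIntSet(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Division method: hash address = key mod capacity.
    private int indexFor(int key) {
        return Math.floorMod(key, buckets.length);
    }

    // Walks the bucket looking for the key; returns false on a duplicate.
    boolean add(int key) {
        LinkedList<Integer> bucket = buckets[indexFor(key)];
        if (bucket.contains(key)) {
            return false;
        }
        bucket.add(key);
        size++;
        return true;
    }

    boolean contains(int key) {
        return buckets[indexFor(key)].contains(key);
    }

    boolean remove(int key) {
        // Integer.valueOf selects remove(Object) rather than remove(int index).
        boolean removed = buckets[indexFor(key)].remove(Integer.valueOf(key));
        if (removed) size--;
        return removed;
    }

    int size() { return size; }

    // Length of the chain that holds (or would hold) the key.
    int bucketLength(int key) { return buckets[indexFor(key)].size(); }

    public static void main(String[] args) {
        ChainedIntSet set = new ChainedIntSet(11);
        set.add(25);
        set.add(36);                              // same bucket as 25 (index 3)
        System.out.println(set.add(25));          // false: duplicate
        System.out.println(set.bucketLength(25)); // 2
        System.out.println(set.contains(36));     // true
    }
}
```

For comparison, Java's own HashMap (which backs HashSet) has applied the search-tree optimization since Java 8: a bucket that grows long is converted into a red-black tree.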

4. Set interface and implementation classes

  • The Set interface extends the Collection interface and stores unordered elements without duplicates;
  • HashSet, LinkedHashSet, and TreeSet are the main implementation classes of the Set interface;
  • LinkedHashSet is a subclass of HashSet that also records insertion order, so traversing the set returns elements in the order they were added;
  • HashSet is the main implementation class of the Set interface; it is not thread-safe and can store a null element;
  • TreeSet is not thread-safe and is backed by a red-black tree, a self-balancing binary search tree. Its elements must be of the same type and comparable with each other, so TreeSet keeps them sorted, and the sort order can be customized;
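To make the ordering differences concrete, this small sketch (the class name SetOrderDemo and the sample strings are illustrative) loads the same list, including a duplicate, into each of the three classes. HashSet's iteration order depends on hash codes and is not guaranteed, so only its size is printed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetOrderDemo {
    public static void main(String[] args) {
        // The same input, with "apple" appearing twice.
        List<String> input = Arrays.asList("pear", "apple", "fig", "apple");

        Set<String> hash = new HashSet<>(input);         // order follows hash codes, not guaranteed
        Set<String> linked = new LinkedHashSet<>(input); // insertion order
        Set<String> tree = new TreeSet<>(input);         // sorted (natural) order

        System.out.println(hash.size()); // 3: the duplicate "apple" was dropped
        System.out.println(linked);      // [pear, apple, fig]
        System.out.println(tree);        // [apple, fig, pear]

        hash.add(null);                          // HashSet accepts a null element
        System.out.println(hash.contains(null)); // true
    }
}
```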

4.0 Common methods

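A short sketch of commonly used Set methods, all declared by java.util.Set (most of them inherited from Collection): add, contains, remove, size, isEmpty, iterator, and clear. The class name and values are illustrative.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class SetMethodsDemo {
    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>();

        System.out.println(set.add(10));      // true: added
        System.out.println(set.add(10));      // false: already present
        set.add(20);
        set.add(30);

        System.out.println(set.contains(20)); // true
        System.out.println(set.remove(20));   // true: removed
        System.out.println(set.size());       // 2
        System.out.println(set.isEmpty());    // false

        // Iterator.remove() is the safe way to delete elements during traversal.
        Iterator<Integer> it = set.iterator();
        while (it.hasNext()) {
            if (it.next() > 15) {
                it.remove();
            }
        }
        System.out.println(set);              // [10]

        set.clear();
        System.out.println(set.isEmpty());    // true
    }
}
```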

4.1 HashSet

  • Unordered: elements are not stored in the index order of the underlying array; each element's position is determined by its hash value;
  • No duplicates: when an element is added, its hashCode() and equals() methods are used to check for duplicates; it is a duplicate if an existing element has the same hash value and equals() returns true;
  • Not thread-safe; it is the main implementation class of the Set interface and can store a null element;
  • Inserting an element x:
    1) Compute the hash value hashVal of x by calling the hashCode() method of x's class;
    2) From hashVal, compute the hash address hashIndex of x in the hash table using some indexing algorithm;
    3) Check whether position hashIndex already holds elements: (1) if it is empty, store x there directly; (2) otherwise, compare x with each element in that bucket in turn. If an existing element has the same hash value as x and equals() returns true, x is a duplicate and the addition fails. If the hash values differ, or they match but equals() returns false, continue along the singly linked list; reaching the end of the list means there is no duplicate, and x is added successfully;
  • Requirements: 1) the class of the added elements must override both hashCode() and equals(); 2) the overridden methods must be consistent with each other: equal objects must have equal hash codes, and unequal objects should have different hash codes as far as possible;
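A sketch of the requirements above, using a hypothetical Person class whose equals() and hashCode() are both based on name and age, so two separately created Person("Tom", 20) objects count as duplicates.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical element class used only for illustration.
class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    // Built from the same fields as equals(), so equal objects get equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }
}

class HashSetDuplicateDemo {
    public static void main(String[] args) {
        Set<Person> people = new HashSet<>();
        System.out.println(people.add(new Person("Tom", 20))); // true
        System.out.println(people.add(new Person("Tom", 20))); // false: duplicate
        System.out.println(people.size());                     // 1
    }
}
```

If Person kept Object's default, identity-based equals() and hashCode(), both add calls would return true, because two distinct objects are never equal under identity comparison.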

4.2 LinkedHashSet

  • As a subclass of HashSet, it additionally links all elements in insertion order, which makes frequent traversal convenient;
  • Can store a null element;
  • Each stored element maintains two extra references, one to its predecessor and one to its successor (a doubly linked list);
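A small sketch of LinkedHashSet's ordering (the class name is illustrative). According to the LinkedHashSet documentation, re-inserting an element that is already present does not change its position; only removing it and adding it again moves it to the end.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class LinkedHashSetDemo {
    public static void main(String[] args) {
        Set<String> set = new LinkedHashSet<>();
        set.add("b");
        set.add("a");
        set.add("c");
        set.add(null);  // a null element is allowed
        set.add("b");   // duplicate: ignored, and "b" keeps its original position

        System.out.println(set); // [b, a, c, null]

        set.remove("a");
        set.add("a");   // removed, then added again: now it is last
        System.out.println(set); // [b, c, null, a]
    }
}
```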

4.3 TreeSet

  • TreeSet is not thread-safe and is backed by a red-black tree, a self-balancing binary search tree. Its elements must be of the same type and comparable with each other, so TreeSet supports natural sorting (the java.lang.Comparable interface) or custom sorting (the java.util.Comparator interface);

  • Under natural sorting, TreeSet compares elements with the compareTo() method of the element's class. A return value of 0 means the two elements are considered the same, so the new element is not added;

  • Under custom sorting, TreeSet compares elements with the compare() method of the Comparator passed to the TreeSet constructor. A return value of 0 means the two elements are considered the same, so the new element is not added;
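A sketch contrasting the two sorting modes with a hypothetical Student class whose natural ordering compares ages. It also shows the consequence of the 0 return value described above: under natural sorting, two different students of the same age count as duplicates, because TreeSet detects duplicates with compareTo()/compare() rather than equals().

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class TreeSetSortingDemo {

    // Hypothetical element class; natural ordering is by age.
    static class Student implements Comparable<Student> {
        final String name;
        final int age;

        Student(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public int compareTo(Student other) {
            return Integer.compare(age, other.age);
        }

        @Override
        public String toString() {
            return name + "(" + age + ")";
        }
    }

    public static void main(String[] args) {
        // Natural sorting: TreeSet calls Student.compareTo().
        Set<Student> byAge = new TreeSet<>();
        byAge.add(new Student("Ann", 22));
        byAge.add(new Student("Bob", 19));
        // compareTo() returns 0 (same age), so Cid is treated as a duplicate.
        System.out.println(byAge.add(new Student("Cid", 22))); // false
        System.out.println(byAge); // [Bob(19), Ann(22)]

        // Custom sorting: TreeSet calls the Comparator's compare() instead.
        Comparator<Student> byNameDesc =
                Comparator.comparing((Student s) -> s.name).reversed();
        Set<Student> byName = new TreeSet<>(byNameDesc);
        byName.add(new Student("Ann", 22));
        byName.add(new Student("Bob", 19));
        byName.add(new Student("Cid", 22));
        System.out.println(byName); // [Cid(22), Bob(19), Ann(22)]
    }
}
```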

4.4 Examples

[Figure: examples. Source: Shang Silicon Valley]
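As an additional example, not taken from the figure, a common use of LinkedHashSet is removing duplicates from a List while keeping the first occurrence of each element in its original order. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class DedupExample {
    // Keeps the first occurrence of each element, preserving the original order.
    static <T> List<T> dedupe(List<T> list) {
        return new ArrayList<>(new LinkedHashSet<>(list));
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(3, 1, 3, 2, 1, 4);
        System.out.println(dedupe(numbers)); // [3, 1, 2, 4]
    }
}
```

Swapping LinkedHashSet for HashSet would still remove the duplicates, but the resulting order would no longer be guaranteed.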


Origin blog.csdn.net/qq_43665602/article/details/130144974