Six common methods for constructing hash functions

Introduction to related concepts

  • Hash:
    A hash function (also called a hash algorithm) transforms an input of any length into a fixed-length output; that output is the hash value. The conversion is a compression mapping: the space of hash values is usually much smaller than the space of inputs, so different inputs may hash to the same output, and the input cannot be uniquely determined from the hash value. Simply put, it is a function that compresses a message of any length into a fixed-length message digest.

  • Hash table: A hash table is a data structure that is accessed directly by key. It maps each key to a position in a table and looks the record up there, which speeds up searching. The mapping function is called a hash function, and the array that stores the records is called a hash table.
  • Hash conflict:
    Since the set of possible inputs is unlimited while the range of hash values is limited, some different inputs must produce the same hash value. This is a hash conflict (also called a hash collision).
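
A hash conflict can be seen with a minimal Python sketch. The function `toy_hash` and the table size 7 are assumptions chosen only for illustration; they are not part of the original article.

```python
def toy_hash(key, table_size=7):
    """Hypothetical hash function: map an integer key to one of table_size slots."""
    return key % table_size

# Group keys by their hash address; any slot holding more than one key is a conflict.
buckets = {}
for key in [3, 10, 17, 5]:
    buckets.setdefault(toy_hash(key), []).append(key)

print(buckets)  # {3: [3, 10, 17], 5: [5]}
```

Here 3, 10 and 17 are different keys that all land on address 3.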

1. Direct addressing method

  • Introduction: Take the key itself, or a linear function of the key, as the hash address: H(key)=key or H(key)=a*key+b (a and b are constants).
  • Example: Given the character array ['A','B','D','A','C','E','F','C'], count the occurrences of each character (the array contains only uppercase letters).
  • Analysis: The ASCII codes of 'A'-'Z' are 65-90, so the hash function can be H(key)=key-'A' (a=1 and b=-'A', that is, b=-65, in the definition). Each H(key) value can then be used as a subscript into an int array of length 26.
    Assume the character array is a and the int array is b. The count is then b[a[i]-'A']++ (i is the subscript into a).
  • Result:
    b[0]=2 (A appears twice), b[1]=1 (B appears once), b[2]=2 (C appears twice)...
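
The counting step can be sketched in Python as follows. The function name `count_uppercase` is hypothetical; the names a and b follow the analysis above.

```python
def count_uppercase(chars):
    """Count 'A'-'Z' occurrences using H(key) = key - 'A' as the array subscript."""
    counts = [0] * 26
    for ch in chars:
        counts[ord(ch) - ord('A')] += 1  # direct addressing: no search needed
    return counts

a = ['A', 'B', 'D', 'A', 'C', 'E', 'F', 'C']
b = count_uppercase(a)
print(b[0], b[1], b[2])  # 2 1 2
```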

2. Digit analysis method

  • Introduction: Analyze how often each value appears at the same digit position (ones, tens, hundreds...) across a set of keys. If the values at a position are concentrated on a few digits, using that position to build the hash address easily causes hash conflicts; if the values are evenly distributed, conflicts are less likely.
  • Example: A company recruited some interns whose birthdays are [19990104, 20000910, 20000315, 20001128, 20001014, 19990413, 19990920, 20000517]; hash these birthdays.
  • Analysis:
    Using all 8 digits as the hash address makes conflicts unlikely but wastes a great deal of space, so use only some of the digits, chosen to save space while keeping conflicts unlikely. In the 8 keys above, the first 4 digits are concentrated on 1999 and 2000, so taking them would easily cause conflicts. The last 4 digits are much more scattered and are unlikely to conflict, so they are used as the hash address.
  • Result:
    H(19990104)=104 (the last 4 digits, 0104, read as a number), H(20000910)=910, H(20000315)=315...
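
A rough Python sketch of this analysis is shown below. The helper names `distinct_digits_per_position` and `digit_analysis_hash` are hypothetical, and counting distinct digits per position is only one simple way to measure how scattered a position is.

```python
def distinct_digits_per_position(keys, width=8):
    """For each digit position (left to right), count how many different digits occur."""
    strs = [str(k).zfill(width) for k in keys]
    return [len({s[i] for s in strs}) for i in range(width)]

def digit_analysis_hash(key):
    """Keep only the last 4 digits of the 8-digit key."""
    return key % 10000

birthdays = [19990104, 20000910, 20000315, 20001128,
             20001014, 19990413, 19990920, 20000517]

spread = distinct_digits_per_position(birthdays)
addresses = [digit_analysis_hash(k) for k in birthdays]
print(spread)
print(addresses)  # [104, 910, 315, 1128, 1014, 413, 920, 517]
```

The first four positions each show only two different digits, while some of the last four positions show six, which matches the observation that the trailing digits are more scattered.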

3. Folding method

  • Introduction: The folding method splits the key, from left to right, into several parts with the same number of digits as the hash table address; only the last part may be shorter. The parts are added together and the carry out of the highest digit is dropped, giving the hash address of the key.
    There are two ways to add the parts:
    (1) Shift folding: align the parts on their lowest digit and add them.
    (2) Folding at the boundaries: fold the key back and forth along the boundaries between parts, so that every second part is reversed, and then add them.
  • Example: key=1234791, and the hash address has 2 digits.
  • Analysis:
    Divide the key into four parts: 12, 34, 79, 1
    (1) Shift folding: 12+34+79+1
    (2) Folding at the boundaries: 12+43+79+1 (the even-numbered addends are the shift-folding addends reversed; the one-digit last part is unchanged by reversal)
  • Result:
    (1) Shift folding: H(1234791)=26 (the sum is 126; drop the carry 1)
    (2) Folding at the boundaries: H(1234791)=35 (the sum is 135; drop the carry 1)
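
Both folding variants can be sketched in Python. The function names are hypothetical, and the sketch assumes that a short last part is reversed as-is, without padding; for key 1234791 the sums 12+34+79+1 and 12+43+79+1 come to 126 and 135 before the carry is dropped.

```python
def split_parts(key, width):
    """Split the decimal digits of key, left to right, into chunks of `width` digits."""
    s = str(key)
    return [s[i:i + width] for i in range(0, len(s), width)]

def shift_fold(key, width=2):
    """Add the parts aligned on their lowest digit, then drop the carry."""
    total = sum(int(p) for p in split_parts(key, width))
    return total % 10 ** width

def boundary_fold(key, width=2):
    """Reverse every second part (2nd, 4th, ...) before adding, then drop the carry."""
    parts = split_parts(key, width)
    total = sum(int(p[::-1]) if i % 2 == 1 else int(p)
                for i, p in enumerate(parts))
    return total % 10 ** width

print(shift_fold(1234791))     # 26
print(boundary_fold(1234791))  # 35
```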

4. Mid-square method

  • Introduction: When it is unclear which digits of the key are evenly distributed, square the key and take the middle digits of the square as the hash address, as many as needed. The middle digits of the square depend on every digit of the key, so different keys produce different hash addresses with higher probability.
  • Example: Key sequence: {3213,3113,3212,4312}.
  • Analysis:
    3213^2=10323369
    3113^2=9690769
    3212^2=10316944
    4312^2=18593344
    Take the middle 4 digits of each square as the hash address (the square of 3113 is padded with a leading 0 to make 8 digits).
  • Result:
    H(3213)=3233, H(3113)=6907, H(3212)=3169, H(4312)=5933
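
A minimal Python sketch of this calculation follows. The function name `mid_square_hash` and its parameters `square_width` and `take` are assumptions introduced for illustration.

```python
def mid_square_hash(key, square_width=8, take=4):
    """Square the key, left-pad the square with zeros, and keep the middle digits."""
    squared = str(key * key).zfill(square_width)
    start = (len(squared) - take) // 2
    return int(squared[start:start + take])

keys = [3213, 3113, 3212, 4312]
addresses = [mid_square_hash(k) for k in keys]
print(addresses)  # [3233, 6907, 3169, 5933]
```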

5. Division remainder method

  • Introduction: Take the remainder of the key divided by a number p that is not greater than the hash table length m as the hash address, that is, H(key)=key mod p (p<=m). The modulus can be applied to the key directly, or after folding or squaring it. Note: the choice of p is very important. Generally p is the largest prime that satisfies p<=m, or m itself. A poor choice easily causes hash conflicts.
  • Example: Key sequence: {16,11,19,23,2,6,10}, and the hash table length is 11.
  • Analysis:
    The hash table length 11 is itself a prime number, so choose p=11 directly.
  • Result:
    H(16)=5, H(11)=0, H(19)=8, H(23)=1, H(2)=2, H(6)=6, H(10)=10
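
The choice of p and the resulting addresses can be sketched in Python. The helper names `is_prime`, `largest_prime_not_exceeding` and `division_hash` are hypothetical, and trial division is used only because the table is small.

```python
def is_prime(n):
    """Trial-division primality check; adequate for small table sizes."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def largest_prime_not_exceeding(m):
    """Return the largest prime p with p <= m."""
    for p in range(m, 1, -1):
        if is_prime(p):
            return p
    raise ValueError("there is no prime <= m")

def division_hash(key, p):
    """H(key) = key mod p."""
    return key % p

m = 11
p = largest_prime_not_exceeding(m)
keys = [16, 11, 19, 23, 2, 6, 10]
addresses = [division_hash(k, p) for k in keys]
print(p, addresses)  # 11 [5, 0, 8, 1, 2, 6, 10]
```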

6. Random number method

  • Introduction: Choose a random function and use the key as its seed; the random value it generates is the hash address, that is, H(key)=random(key), where random is the random function. This method is usually used when keys have different lengths.
  • Example: key=123, and the random function squares each digit of the key and multiplies the squares together.
  • Analysis:
    The choice of random function strongly affects the probability of hash conflicts; the function here is just a simple choice for illustration.
  • Result:
    H(123)=36
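
The example function can be sketched in Python as below. The name `digit_square_product` is hypothetical. The second function, `seeded_hash`, is an extra assumption that is not in the original example: it shows how a key could seed Python's standard `random.Random` generator so that the same key always yields the same address.

```python
import random

def digit_square_product(key):
    """The example's function: square each decimal digit and multiply the squares."""
    result = 1
    for d in str(key):
        result *= int(d) ** 2
    return result

def seeded_hash(key, table_size):
    """Assumed variant: seed a pseudo-random generator with the key and draw an address."""
    return random.Random(key).randrange(table_size)

print(digit_square_product(123))  # 36
print(seeded_hash(123, 11) == seeded_hash(123, 11))  # True
```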

Origin blog.csdn.net/weixin_44027397/article/details/113972606