String hash principle

Where the idea of string hashing comes from (a personal guess):

  Suppose we have a collection of strings and want to find out whether any of them are the same word. Comparing them pairwise, character by character, is obviously close to an O(N^3) algorithm, and that time complexity is too high. So we would like a better algorithm.

 

  Let us turn our attention to decimal numbers, that is, the numbers we use every day. It is easy to see that comparing two numbers takes O(1) time. A number is, in essence, also a string of digits, so the way numbers are compared gives us a hint.

  For example, if a given string consists only of digits, we just need to convert it into a number, and then two such strings can be compared in O(1) time. Of course, this argument is not quite right: when the digit string is very long, even if there is some way to store the number, the cost of comparing two numbers must grow with the length of the string.

  Back to general strings. From here on, assume that strings contain only lowercase letters. A string can then be seen as a number written in base 26 (in fact, any base greater than or equal to 26 gives good theoretical results). Of course, as we said earlier, such a "number" cannot be stored in full. But consider the scale of the problems we usually deal with: the number of strings is probably somewhere between 1e3 and 1e5, so if each string is represented by one number, even an int has more than enough range to give every string its own number. So although we cannot store the full base-26 number for a string, if we make sure that, say, the last eight digits of that number hardly ever repeat (more generally, that the remainders under some modulus hardly ever repeat; the last eight digits are just the special case of taking the number modulo 10^8), then one number can effectively stand for one unique string, and even long strings can be compared in O(1) time.
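  The idea of reading a string as a base-26 number and keeping only its remainder can be sketched roughly as follows. The helper name base26_mod, the mapping 'a' = 0 ... 'z' = 25, and reading the first letter as the most significant digit are all assumptions made for this illustration; the sketch also assumes the modulus is small enough that value * 26 fits in 64 bits.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not from the original article): reads a lowercase
// string as a base-26 number ('a' = 0, ..., 'z' = 25), first letter most
// significant, and keeps only the remainder modulo `mod`.
// Assumes mod is small enough that value * 26 + 25 cannot overflow 64 bits.
std::uint64_t base26_mod(const std::string& s, std::uint64_t mod) {
    std::uint64_t value = 0;
    for (char c : s) {
        // Shift the number left by one base-26 digit, append the new digit,
        // and reduce immediately so the stored value stays below mod.
        value = (value * 26 + static_cast<std::uint64_t>(c - 'a')) % mod;
    }
    return value;
}
```

  For instance, "cab" is 2*26^2 + 0*26 + 1 = 1353 as a base-26 number, so with a modulus of 1000 only 353 is kept.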

  Applying these ideas to string comparison gives the string hash. The number corresponding to a string is called the hash value of the string. Obviously, as mentioned above, two different strings may end up with the same hash value. To make this unlikely, people tend to take the remainder modulo a large prime, widen the range of the hash value to long long, or even use double hashing (that is, compute two hash values for each string with different parameters, and treat two strings as the same only when both hash values match).
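  A minimal sketch of double hashing, under stated assumptions: the names poly_hash and double_hash, the bases 29 and 31, and the moduli 1000000007 and 998244353 (both prime) are choices made for this illustration, not parameters taken from the article. Letters are mapped to 1..26 here.

```cpp
#include <cstdint>
#include <string>
#include <utility>

using HashPair = std::pair<std::uint64_t, std::uint64_t>;

// Polynomial hash of a lowercase string for a given base and modulus.
// Letters are mapped to 1..26 (an assumption of this sketch).
std::uint64_t poly_hash(const std::string& s, std::uint64_t base, std::uint64_t mod) {
    std::uint64_t h = 0;
    for (char c : s) {
        h = (h * base + static_cast<std::uint64_t>(c - 'a' + 1)) % mod;
    }
    return h;
}

// Two hash values computed with different parameters. Two strings are
// treated as equal only if both components match.
HashPair double_hash(const std::string& s) {
    return {poly_hash(s, 29, 1000000007ULL), poly_hash(s, 31, 998244353ULL)};
}
```

  Two strings are then compared with double_hash(a) == double_hash(b). A false match now requires a collision under both parameter sets at once, which is far less likely than a collision under one.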

 

Algorithm Scope:

  Hashing is suitable when the set of strings is known (the strings need not be completely fixed; if they only change within a known range, they can of course still be regarded as known) and a large number of equality comparisons between strings are required. After linear-time preprocessing, whether any two strings are the same can be determined in O(1) time.
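  One common way to get "linear preprocessing, O(1) comparison" is to store the hash of every prefix, so that the hash of any substring follows from two prefix hashes. The sketch below shows this prefix-hash technique; it is not necessarily what the author had in mind. The struct name PrefixHash and the constants kBase and kMod are assumptions of this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

const std::uint64_t kBase = 29;            // assumed base
const std::uint64_t kMod = 1000000007ULL;  // assumed prime modulus

// Hypothetical helper: prefix hashes of one lowercase string.
struct PrefixHash {
    std::vector<std::uint64_t> prefix;  // prefix[i] = hash of the first i characters
    std::vector<std::uint64_t> power;   // power[i]  = kBase^i mod kMod

    // O(n) preprocessing.
    explicit PrefixHash(const std::string& s)
        : prefix(s.size() + 1, 0), power(s.size() + 1, 1) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            std::uint64_t digit = static_cast<std::uint64_t>(s[i] - 'a' + 1);
            prefix[i + 1] = (prefix[i] * kBase + digit) % kMod;
            power[i + 1] = power[i] * kBase % kMod;
        }
    }

    // O(1) hash of the half-open range s[l, r), requires l <= r <= s.size().
    std::uint64_t get(std::size_t l, std::size_t r) const {
        return (prefix[r] + kMod - prefix[l] * power[r - l] % kMod) % kMod;
    }
};
```

  With PrefixHash h("abcabc"), the two occurrences of "abc" can be compared via h.get(0, 3) == h.get(3, 6) without looking at the characters again. Equal hashes only indicate that the substrings are very likely equal, as discussed above.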

 

Algorithm:

  My usual practice: define the hash value as a long long, take it modulo 1e15 + 7, and use 29 as the base for the string (a prime base works better).

 

The code is given below (assuming the string contains only lowercase letters):

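#include <cstring>

typedef long long LL;
const int maxn=10005;
const LL mo=1e15+7;  // modulus; small enough that tmp*29 and tmp*26 fit in long long

// Polynomial hash of a lowercase string: sum of (s[i]-'a'+1) * 29^i, modulo mo.
LL gethash(const char *s){
    int n=strlen(s); LL tmp=1,re=0;  // tmp holds 29^i mod mo
    for(int i=0;i<n;i++){
        // map 'a'..'z' to 1..26; with 'a' as 0, "a" and "aa" would both hash to 0
        re=(re+tmp*(s[i]-'a'+1))%mo;
        tmp=tmp*29%mo;
    }
    return re;
}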
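As a usage sketch for the original problem of finding repeated words, the hash values can be sorted and the distinct ones counted. The names hash_of and count_distinct are hypothetical, and mapping letters to 1..26 is an assumption of this sketch; the modulus 1e15 + 7 and base 29 follow the article.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

typedef long long LL;
const LL kMod = 1000000000000007LL;  // 1e15 + 7, as in the article

// Same scheme as the article's gethash, written for std::string.
// Letters are mapped to 1..26 (an assumption of this sketch).
LL hash_of(const std::string& s) {
    LL power = 1, result = 0;  // power holds 29^i mod kMod
    for (char c : s) {
        result = (result + power * (c - 'a' + 1)) % kMod;
        power = power * 29 % kMod;
    }
    return result;
}

// Counts distinct words by comparing hash values instead of characters.
// Two different words with equal hashes would be counted once.
std::size_t count_distinct(const std::vector<std::string>& words) {
    std::vector<LL> hashes;
    hashes.reserve(words.size());
    for (const std::string& w : words) hashes.push_back(hash_of(w));
    std::sort(hashes.begin(), hashes.end());
    return static_cast<std::size_t>(
        std::unique(hashes.begin(), hashes.end()) - hashes.begin());
}
```

For the words "abc", "abd", "abc", "a", "aa", this counts 4 distinct words, since "abc" appears twice.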

 

Origin www.cnblogs.com/Golden-Elf/p/11999866.html