Database Middleware points stringhash piece Algorithms

Foreword

It is a dark and stormy night high night, put on headphones to listen to a radio. Suddenly feeling very words: 生活就像心电图,一帆风顺就证明你挂了。just as we do the operation and maintenance, and feel very simple things, sometimes capable out 无限可能. Or get on it, this time for us to say stringhash partitioning algorithm.

1.hash partitioning algorithm
2.stringhash分区算法
partitioning algorithm 3.enum
4.numberrange partitioning algorithm
5.patternrange partitioning algorithm
6.date partitioning algorithm
7.jumpstringhash algorithm

Configuring StringHash district method

<tableRule name="rule_hashString">
    <rule>
        <columns>name</columns>
        <algorithm>func_hashString</algorithm>
    </rule>
</tableRule>

<function name="func_hashString" class="StringHash">
    <property name="partitionCount">3,2</property>
    <property name="partitionLength">3,4</property>
    <property name="hashSlice">0:3</property>
</function>

And before the same hash algorithm. TableRule and function need to be configured in rule.xml in.

  • tableRule tag, name is the name corresponding to the rule, the rule label the corresponding columns fragmentation field, which fields in the table must be consistent. algorithm represents the name's slicing function.
  • function name tag, name represents the name of slicing algorithm, algorithm to be above and tableRule Label, respectively. class: Specifies fragment algorithm classes. property specifies the parameters corresponding to fragmentation algorithm. Different algorithms different parameters.

1.partitionCount: Specifies the number of sections of the partition, in particular + C2 + ... + a C1 Cn
2.partitionLength: Specifies the length of each section, divided into a specific interval [0, L1), [L1 , 2L1), ..., [(C1-1) L1, C1L1) , [C1L1, C1L1 + L2), [C1L1 + L2, C1L1 + 2L2), ... where each interval corresponds to a data node.
3.hashSlice: Specifies the hash value calculation is involved in the key string. String index count from zero

Next, we explain in detail the working principle of StringHash. Let's take the above configuration.

1. At startup, two dot arrays do arithmetic, modulo the number obtained.

2. Two array cross product, derived physical partition table.

The hashSlice two-dimensional array, the slice string field is taken.

String interception range is hashSlice [0] to hashSlice [1]. For example, I set here 0,3. 'buddy' will be taken out of the string bud, similar to database substringfunctions.

4. The interception of the string do hash, hash this calculation method, I study a little dble source code. Source code is as follows:

 /**
  * String hash:s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] <br>
  * h = 31*h + s.charAt(i); => h = (h << 5) - h + s.charAt(i); <br>
  *
  * @param start hash for s.substring(start, end)
  * @param end   hash for s.substring(start, end)
  */
 public static long hash(String s, int start, int end) { 
     if (start < 0) {
         start = 0;
     }
     if (end > s.length()) {
         end = s.length();
     }
     long h = 0;
     for (int i = start; i < end; ++i) {
         h = (h << 5) - h + s.charAt(i);
     }
     return h;
 }

In fact, the meaning of this source code are explained above. Algorithm is S [0] 31 is ^ (. 1-n-) + S [. 1] 31 is ^ (n--2) + ... + S [n--. 1]. Next, it is described then h = 31 * h + s.charAt ( i) is equal to h = (h << 5) - h + s.charAt (i). It is not still foggy. You can see the end of the article detailed explanation on this point.

Here we break down what this formula, according to the above formula, we can derive the following arithmetic formula:
i=0 -> h = 31 * 0 + s.charAt(0)
i=1 -> h = 31 * (31 * 0 + s.charAt(0)) + s.charAt(1)
i=2 -> h = 31 * (31 * (31 * 0 + s.charAt(0)) + s.charAt(1)) + s.charAt(2)
i=3 -> h = 31 * (31 * (31 * (31 * 0 + s.charAt(0)) + s.charAt(1)) + s.charAt(2)) + s.charAt(3)
....... this interpolation

We assume that the string is a "buddy", we intercept 0-3 string, we'll work it out. According to the above function to write java code is compiled to run segment.

public class test {
        public static void main(String args[]) {
                String Str = new String("buddy");
                System.out.println(hash(Str,0,3));
        }

public static long hash(String s, int start, int end) {
     if (start < 0) {
         start = 0;
     }
     if (end > s.length()) {
         end = s.length();
     }
     long h = 0;
     for (int i = start; i < end; ++i) {
         h = (h << 5) - h + s.charAt(i);
     }
     return h;
   }
}

[root@mysql5 java]# javac test.java 
[root@mysql5 java]# java test
97905

Taken buddy string by running the program, the result obtained is 97905 0-3. Then the result is how come. First 0,3 interception, the interception of the final three strings bud. From zero count corresponds index is i = 2. The formula for i = 2:
i=2 -> h = 31 * (31 * (31 * 0 + s.charAt(0)) + s.charAt(1)) + s.charAt(2)
We can query table ascii
s.charAt (0), is calculated ASCII value "b" of the letter, as a decimal number 98
s.charAt (. 1), is considered "u" of the letter ASCII value, the decimal number of 117
s.charAt (. 1), is calculated ASCII value "d" of the alphabet, decimal digits 100

The three values ​​into the above equation to obtain 31 * (31 * (31 * 0 + 98) + 117) = 100 + 97,905. And the value of our program calculates exactly the same.

The calculated value of the modulo, and then falls into a specified partition.

97905 mod 17 = 2 The value of the modulo, falls dn1 partition, the partition is located dn1 (0,3) of.

6. Let us build the table to test, is not it falls on the first partition.


As shown, when we insertion name = 'buddy', then again query name = 'buddy' when routed directly to the first partition. And the results of our calculations before the same.

Precautions

  1. The partitioning algorithm and hash partitioning algorithm has the same restrictions (except Note 3)
  2. Partition string type field

postscript

今天介绍的stringhash和hash分区算法大致相同,只不过对于字符串需先计算出hash值。该算法有个经典的数字叫31。这个数字大有来头。《Effective Java》中的一段话说明了为什么要用31,因为31是一个奇质数,如果选择一个偶数的话,乘法溢出信息将丢失。因为乘2等于移位运算。使用质数的优势不太明显,但这是一个传统。31的一个很好的特性是乘法可以用移位和减法来代替以获得更好的性能:31*i==(i<<5)-i。现代的 Java 虚拟机可以自动的完成这个优化。

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.

如果你前面没看懂前面那段java代码,现在应该明白(h << 5) - h的结果其实就等于31*h。
今天到这儿,后续将继续分享其他的算法。谢谢大家支持!

Guess you like

Origin www.cnblogs.com/buddy-yuan/p/12145085.html