String sorting explained with diagrams (LSD, MSD)

This article covers the core ideas of string sorting, using illustrations and code analysis to explain two classic string sorting methods. The complete code is at the end of the article.

 

I. Key-indexed counting

General-purpose sorting works by repeatedly comparing elements, but strings can be sorted without any comparisons. Key-indexed counting breaks the N log N running-time barrier of comparison-based sorting: its running time is linear.

  

Introducing the alphabet:

To avoid comparing characters, we introduce the concept of an alphabet: treat 'a' as 1, 'b' as 2, 'c' as 3, and so on. The 26 letters then need only an array of length 27 (index 0 is unused), and they are ordered by their numeric values (a to z correspond to 1 to 26).

Once the characters "abcdefg..." are converted to integers (using the charAt() function), they have a natural order, so all we need is an array of suitable size to hold the information for each character. Below we create a count[] array sized for an alphabet of 256 characters; it first stores how often each character occurs and then the index at which each character is placed during sorting.
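As a small illustration of the conversion described above, the sketch below reads each character with charAt() and uses its integer value directly as an array index. The class name AlphabetDemo and the helper frequencies are hypothetical names chosen for this example, and it assumes every character has a code below 256 (extended ASCII).

```java
class AlphabetDemo {
    static final int R = 256; // assumed alphabet size: character codes 0..255

    // Returns how often each character code occurs in s.
    // Assumes every character in s has a code below R.
    static int[] frequencies(String s) {
        int[] count = new int[R];
        for (int i = 0; i < s.length(); i++) {
            int r = s.charAt(i); // a char widens to its integer code, e.g. 'a' -> 97
            count[r]++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] count = frequencies("banana");
        // Walking the array in index order visits the characters in sorted order.
        for (int r = 0; r < R; r++) {
            if (count[r] > 0) {
                System.out.println((char) r + " (" + r + "): " + count[r]);
            }
        }
    }
}
```

Because the array is indexed by character code, reading it from index 0 upward yields the characters in order without a single comparison.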

Key-indexed counting has four steps, each described below with an example. (R denotes the number of distinct characters, and r denotes a character's position within that alphabet.)

 

1. Count the frequencies:

for (int i = 0; i < N; i++) { // count frequencies
    count[a[i].charAt(d) + 1]++;
}

We iterate over all the strings; d is the index of the character being examined in each string (in the example below, every key is a single digit).

Whenever a character r appears, we add one to count[r + 1] (the reason for using r + 1 becomes clear in the next step).

 

2. Convert the frequencies to indices:

for (int r = 0; r < R; r++) { // convert frequencies to indices
    count[r + 1] += count[r];
}

Starting from the frequencies computed in step 1, we apply count[r + 1] += count[r].

In other words, each entry of the count array has the entry before it added to it, which produces a running total.

  

Example:

A teacher organizes an activity and divides the students into four groups. The task is to sort the students by group number (here R is 4 and the count array has size R + 2; index 0 is not used).

Figure 1: Counting the frequencies

 

 

Figure 2: Converting the frequencies to starting indices

 

The last two rows of the figure show that r = 1 corresponds to index 0, so group 1 is placed starting at position 0.

r = 2 corresponds to index 1, so group 2 is placed starting at position 1.

Group 3 has index 5, which means positions 1 through 4 all belong to group 2.
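The two figures can be approximated with a short sketch. The group numbers below are made-up data, chosen only so that the starting indices match the ones described above (group 1 at 0, group 2 at 1, group 3 at 5); the class name GroupIndexDemo and the method startingIndices are likewise hypothetical. Because the frequency of group g is stored at count[g + 1] and the groups are numbered from 1, the starting position of group g ends up in count[g].

```java
import java.util.Arrays;

class GroupIndexDemo {
    static final int R = 4; // number of groups, numbered 1..R (group 0 is unused)

    // Steps 1 and 2 of key-indexed counting: returns an array in which
    // entry g is the first position of group g in the sorted order.
    static int[] startingIndices(int[] groups) {
        int[] count = new int[R + 2];
        for (int g : groups) {
            count[g + 1]++; // step 1: the frequency of group g is stored at g + 1
        }
        for (int r = 0; r < R + 1; r++) {
            count[r + 1] += count[r]; // step 2: running total of the frequencies
        }
        return count;
    }

    public static void main(String[] args) {
        int[] groups = {2, 3, 2, 1, 4, 2, 3, 2, 4, 3}; // hypothetical data
        int[] count = startingIndices(groups);
        System.out.println(Arrays.toString(count)); // prints [0, 0, 1, 5, 8, 10]
    }
}
```

Here count[1] = 0, count[2] = 1, count[3] = 5, and count[4] = 8 are the starting positions of groups 1 to 4, and count[5] = 10 is the total number of students.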

 

3. Distribute the data

for (int i = 0; i < N; i++) { // distribute the data
    aux[count[a[i].charAt(d)]++] = a[i];
}

Distribution requires an auxiliary array aux, which temporarily holds the sorted data.

Each string is copied into the auxiliary array at the position given by its index; once all of them have been placed, the auxiliary array is in sorted order.

 

 

4. Copy back

for (int i = 0; i < N; i++) { // copy back
    a[i] = aux[i];
}

Copy the contents of the auxiliary array back into the original array.

That completes key-indexed counting. Next we use it to implement LSD and MSD sort.
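Putting the four steps together, a minimal self-contained sketch might look like the following. It sorts student names by group number rather than by a character; the class name KeyIndexedCounting, the method sortByGroup, and the sample data are hypothetical, and every key is assumed to lie in the range 0 to R - 1.

```java
import java.util.Arrays;

class KeyIndexedCounting {
    // Sorts names by the matching entry of keys, where every key is assumed
    // to be in the range 0..R-1. Both arrays are rearranged in place.
    static void sortByGroup(String[] names, int[] keys, int R) {
        int N = names.length;
        String[] auxNames = new String[N];
        int[] auxKeys = new int[N];
        int[] count = new int[R + 1];

        for (int i = 0; i < N; i++) {   // 1. count frequencies
            count[keys[i] + 1]++;
        }
        for (int r = 0; r < R; r++) {   // 2. convert frequencies to indices
            count[r + 1] += count[r];
        }
        for (int i = 0; i < N; i++) {   // 3. distribute into the auxiliary arrays
            int pos = count[keys[i]]++;
            auxNames[pos] = names[i];
            auxKeys[pos] = keys[i];
        }
        for (int i = 0; i < N; i++) {   // 4. copy back
            names[i] = auxNames[i];
            keys[i] = auxKeys[i];
        }
    }

    public static void main(String[] args) {
        String[] names = {"Anderson", "Brown", "Davis", "Garcia", "Harris"};
        int[] groups = {2, 3, 3, 1, 2};
        sortByGroup(names, groups, 5); // R = 5 covers keys 0..4
        System.out.println(Arrays.toString(names));
        System.out.println(Arrays.toString(groups));
    }
}
```

Students with the same group number keep their original relative order (the sort is stable), which is the property that sorting one character position at a time depends on.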

 

II. LSD (least-significant-digit-first) sort

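The main difference between LSD sort and MSD sort is the direction in which they process the characters; both are built on key-indexed counting. LSD sort works through the strings from right to left (this can cause some problems, which are discussed in the MSD section).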

 

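The figure below shows a complete run of LSD sort:

Applying key-indexed counting to each character position in turn, from right to left, gives LSD sort. This works because key-indexed counting is stable: strings with the same character at the current position keep the order established by the earlier passes.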

 

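for (int d = W - 1; d >= 0; d--) { // examine each position of every string, right to left
    int[] count = new int[R + 1];
    for (int i = 0; i < N; i++) { // count frequencies
        count[a[i].charAt(d) + 1]++;
    }
    for (int r = 0; r < R; r++) { // convert frequencies to indices
        count[r + 1] += count[r];
    }
    for (int i = 0; i < N; i++) { // distribute
        aux[count[a[i].charAt(d)]++] = a[i];
    }
    for (int i = 0; i < N; i++) { // copy back
        a[i] = aux[i];
    }
}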
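To try the loop above, it can be wrapped in a method and called on fixed-width strings. The sketch below is one way to do that; the class name LsdDemo, the length check, and the sample words are assumptions added for illustration, and characters are assumed to have codes below R = 256.

```java
import java.util.Arrays;

class LsdDemo {
    static final int R = 256; // assumed alphabet size (extended ASCII)

    // Sorts strings that all have exactly W characters.
    static void sort(String[] a, int W) {
        for (String s : a) {
            // LSD reads position W-1 of every string, so other lengths are rejected.
            if (s.length() != W) {
                throw new IllegalArgumentException("all strings must have length " + W);
            }
        }
        int N = a.length;
        String[] aux = new String[N];
        for (int d = W - 1; d >= 0; d--) { // rightmost character first
            int[] count = new int[R + 1];
            for (int i = 0; i < N; i++) count[a[i].charAt(d) + 1]++;
            for (int r = 0; r < R; r++) count[r + 1] += count[r];
            for (int i = 0; i < N; i++) aux[count[a[i].charAt(d)]++] = a[i];
            for (int i = 0; i < N; i++) a[i] = aux[i];
        }
    }

    public static void main(String[] args) {
        String[] words = {"dab", "cab", "fad", "bad", "ace", "bed"};
        sort(words, 3);
        System.out.println(Arrays.toString(words));
        // prints [ace, bad, bed, cab, dab, fad]
    }
}
```

The up-front length check is a design choice of this sketch: without it, a shorter string would cause charAt to throw partway through a pass, leaving the array partially rearranged.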

 

 

III. MSD (most-significant-digit-first) sort

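LSD sort has a drawback. Take the strings "ab" and "ba": because their length is 2, two passes are needed. The first pass yields "ba", "ab" and the second yields "ab", "ba". The result of the first pass contributes nothing to the final order, so that time is wasted.

MSD sort, by contrast, needs only one pass here, and the result is "ab", "ba".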
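The wasted pass can be made visible by recording the array after each pass of LSD sort. The following sketch does that for the two strings from the example; the class name LsdTrace and the method passes are hypothetical, and characters are assumed to have codes below 256.

```java
import java.util.Arrays;

class LsdTrace {
    static final int R = 256; // assumed alphabet size (extended ASCII)

    // Runs LSD sort on strings of length W and returns a snapshot of the
    // array after each pass; snapshot 0 is the pass on the last character.
    static String[][] passes(String[] a, int W) {
        int N = a.length;
        String[] aux = new String[N];
        String[][] snapshots = new String[W][];
        for (int d = W - 1; d >= 0; d--) {
            int[] count = new int[R + 1];
            for (int i = 0; i < N; i++) count[a[i].charAt(d) + 1]++;
            for (int r = 0; r < R; r++) count[r + 1] += count[r];
            for (int i = 0; i < N; i++) aux[count[a[i].charAt(d)]++] = a[i];
            for (int i = 0; i < N; i++) a[i] = aux[i];
            snapshots[W - 1 - d] = a.clone();
        }
        return snapshots;
    }

    public static void main(String[] args) {
        String[][] snapshots = passes(new String[]{"ab", "ba"}, 2);
        for (int p = 0; p < snapshots.length; p++) {
            System.out.println("after pass " + (p + 1) + ": " + Arrays.toString(snapshots[p]));
        }
        // after pass 1: [ba, ab]
        // after pass 2: [ab, ba]
    }
}
```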

 

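The difference:

MSD sort introduces the idea of grouping: the leading character is used to split the array into the groups that are sorted next.

 

In the code, we use recursion to keep splitting the groups.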

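public static void sort(String[] a, int lo, int hi, int d) {
    if (lo >= hi) {
        return;
    }
    int[] count = new int[R + 2];
    for (int i = lo; i <= hi; i++) { // count frequencies (end of string counts as -1)
        count[charAt(a[i], d) + 2]++;
    }
    for (int r = 0; r < R + 1; r++) { // convert frequencies to indices
        count[r + 1] += count[r];
    }
    for (int i = lo; i <= hi; i++) { // distribute a[lo..hi] into aux[0..hi-lo]
        aux[count[charAt(a[i], d) + 1]++] = a[i];
    }
    for (int i = lo; i <= hi; i++) { // copy back into a[lo..hi]
        a[i] = aux[i - lo];
    }
    for (int r = 0; r < R; r++) { // recursively sort the group for each character
        sort(a, lo + count[r], lo + count[r + 1] - 1, d + 1);
    }
}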

 

 

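The code above is very compact, but parts of it are subtle. Study the call trace in the example below to make sure you understand the algorithm.

Figure 3: The top-level call sort(a, 0, 9, 0)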
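A call trace can also be generated by running the code. The sketch below is a variant of the sort method that records each call made on a subarray of two or more strings; the class name MsdTrace, the trace list, and the sample strings are hypothetical and are not the data from the figure. It processes only a[lo..hi] in the distribution loop and copies back from aux[i - lo], so recursive calls leave the rest of the array untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MsdTrace {
    static final int R = 256;                      // assumed alphabet size (extended ASCII)
    static String[] aux;
    static List<String> trace = new ArrayList<>(); // hypothetical call log

    // Character d of s as an int, or -1 when s has no character d.
    static int charAt(String s, int d) {
        return d < s.length() ? s.charAt(d) : -1;
    }

    static void sort(String[] a) {
        aux = new String[a.length];
        trace.clear();
        sort(a, 0, a.length - 1, 0);
    }

    static void sort(String[] a, int lo, int hi, int d) {
        if (lo >= hi) return;                      // zero or one string: nothing to do
        trace.add("sort(a, " + lo + ", " + hi + ", " + d + ")");
        int[] count = new int[R + 2];
        for (int i = lo; i <= hi; i++) count[charAt(a[i], d) + 2]++;
        for (int r = 0; r < R + 1; r++) count[r + 1] += count[r];
        for (int i = lo; i <= hi; i++) aux[count[charAt(a[i], d) + 1]++] = a[i];
        for (int i = lo; i <= hi; i++) a[i] = aux[i - lo];
        for (int r = 0; r < R; r++) {
            sort(a, lo + count[r], lo + count[r + 1] - 1, d + 1);
        }
    }

    public static void main(String[] args) {
        String[] a = {"she", "sells", "by", "the", "sea", "shore", "she", "are"};
        sort(a);
        System.out.println(Arrays.toString(a));
        trace.forEach(System.out::println);
    }
}
```

Strings that end before position d fall into the group for -1, which sits at the front of the subarray and is never recursed into; this is why duplicate strings such as the two copies of "she" do not cause endless recursion.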

 

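The next installment covers another string sorting method, three-way string quicksort, which applies to a wider range of inputs than these two methods.

IV. Complete code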

 

public class LSD {
    public static void sort(String[] a, int W) { // W is the length of every string
        int N = a.length;
        int R = 256; // depends on the number of distinct characters
        String[] aux = new String[N];
        for (int d = W - 1; d >= 0; d--) { // examine each position of every string, right to left
            int[] count = new int[R + 1];
            for (int i = 0; i < N; i++) { // count frequencies
                count[a[i].charAt(d) + 1]++;
            }
            for (int r = 0; r < R; r++) { // convert frequencies to indices
                count[r + 1] += count[r];
            }
            for (int i = 0; i < N; i++) { // distribute
                aux[count[a[i].charAt(d)]++] = a[i];
            }
            for (int i = 0; i < N; i++) { // copy back
                a[i] = aux[i];
            }
        }
    }
}

 

public class MSD {
    private static int R = 256;
    private static String[] aux;

    // Returns character d of s, or -1 if s has fewer than d + 1 characters.
    public static int charAt(String s, int d) {
        if (d < s.length()) {
            return s.charAt(d);
        } else {
            return -1;
        }
    }

    public static void sort(String[] a) {
        int N = a.length;
        aux = new String[N];
        sort(a, 0, N - 1, 0);
    }

    public static void sort(String[] a, int lo, int hi, int d) {
        if (lo >= hi) {
            return;
        }
        int[] count = new int[R + 2];
        for (int i = lo; i <= hi; i++) { // count frequencies (end of string counts as -1)
            count[charAt(a[i], d) + 2]++;
        }
        for (int r = 0; r < R + 1; r++) { // convert frequencies to indices
            count[r + 1] += count[r];
        }
        for (int i = lo; i <= hi; i++) { // distribute a[lo..hi] into aux[0..hi-lo]
            aux[count[charAt(a[i], d) + 1]++] = a[i];
        }
        for (int i = lo; i <= hi; i++) { // copy back into a[lo..hi]
            a[i] = aux[i - lo];
        }
        for (int r = 0; r < R; r++) { // recursively sort the group for each character
            sort(a, lo + count[r], lo + count[r + 1] - 1, d + 1);
        }
    }
}

 

Origin www.cnblogs.com/Unicron/p/11531111.html