The beauty of data structures and algorithms (linear sorting)

1. Introduction to Linear Sorting Algorithms

  • Linear sorting algorithms include bucket sort, counting sort, and radix sort.
  • Their time complexity is O(n), provided the data meets each algorithm's requirements.
  • None of the three compares elements with one another; they are non-comparison-based sorting algorithms.
  • Because they place strict requirements on the data to be sorted, the key is to master the scenarios in which each one applies.

2. Bucket sort

1. Algorithm principle:

The core idea is to distribute the data to be sorted into several buckets that are ordered relative to one another, and then sort the data within each bucket separately.
Once every bucket is sorted, reading the buckets out in order yields a fully sorted sequence.
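
The idea above can be illustrated with a minimal in-memory sketch. The class name `BucketSortSketch`, the equal-width bucket ranges, and the use of the library sort inside each bucket are assumptions made for illustration, not something the original article prescribes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: bucket sort over ints using equal-width value ranges.
final class BucketSortSketch {

    // Sorts a in place using bucketCount buckets.
    static void bucketSort(int[] a, int bucketCount) {
        if (a.length <= 1 || bucketCount <= 0) return;

        // Find the value range so it can be split into ordered buckets.
        int min = a[0], max = a[0];
        for (int v : a) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        if (min == max) return; // all values are equal, nothing to sort

        // Width of each bucket's range, rounded up so max lands in the last bucket.
        long range = (long) max - min + 1;
        long width = (range + bucketCount - 1) / bucketCount;

        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }

        // Distribute: a smaller bucket index always means smaller values.
        for (int v : a) {
            int index = (int) (((long) v - min) / width);
            buckets.get(index).add(v);
        }

        // Sort each bucket on its own, then read the buckets out in order.
        int k = 0;
        for (List<Integer> bucket : buckets) {
            Collections.sort(bucket);
            for (int v : bucket) {
                a[k++] = v;
            }
        }
    }
}
```

Calling `BucketSortSketch.bucketSort(data, 5)` on an `int[]` sorts it in place; because the buckets are already ordered among themselves, no merge step is needed after the per-bucket sorts.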

2. Conditions of use

First, the data to be sorted must be easy to divide into m buckets, and the buckets must have a natural order among themselves. That way, once the data within each bucket is sorted, no further sorting across buckets is needed.

Second, the data must be distributed fairly evenly across the buckets. If some buckets end up with a lot of data and others with very little, sorting the large buckets dominates the cost and the overall time complexity is no longer O(n). In the extreme case where all the data falls into a single bucket, bucket sort degenerates into an O(n log n) sorting algorithm.
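
Why an even distribution matters can be seen from a short cost estimate. This is a sketch under two assumptions: the n items split evenly across the m buckets, and each bucket is sorted with an O(k log k) comparison sort such as quicksort.

```latex
\[
\begin{aligned}
k &= \frac{n}{m} && \text{(items per bucket)} \\
T(n) &= m \cdot O(k \log k)
      = O\!\left(m \cdot \frac{n}{m} \log \frac{n}{m}\right)
      = O\!\left(n \log \frac{n}{m}\right)
\end{aligned}
\]
```

When m is close to n, log(n/m) is a small constant and the total cost is close to O(n). When everything lands in one bucket, m = 1 and the estimate becomes O(n log n).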

3. Applicable scenarios

Bucket sort is well suited to external sorting.
External sorting means the data is stored on disk and is too large to load into the limited memory all at once.

4. Application case

1) Requirement description:
There is 10 GB of order data that needs to be sorted by order amount (assume the amount is a positive integer),
but only a few hundred MB of memory is available.

2) Solution:
Scan the file once to find the range of order amounts, for example 1 to 100,000 yuan, and divide that range into 100 buckets.
The first bucket holds orders of 1 to 1,000 yuan, the second holds orders of 1,001 to 2,000 yuan, and so on.
Each bucket corresponds to a file, numbered and named in order of its amount range (00, 01, 02, ..., 99).
Load the 100 small files into memory one at a time and sort each with quicksort.
Once all the files are sorted, read them in ascending order of file number and append their contents to one large output file.

3) Note: If a single file is still too large to load into memory, split it further in the same way and repeat the process.
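
The mapping from an order amount to its bucket file can be sketched as follows. The class name `OrderBucketSketch` and its constants are hypothetical; they simply encode the 1 to 100,000 yuan range and the 100 buckets from the example, and the actual file reading and writing is left out.

```java
import java.util.Locale;

// Hypothetical sketch: choose the bucket file for an order amount.
final class OrderBucketSketch {
    static final int MIN_AMOUNT = 1;        // assumed lower bound found by the first scan
    static final int MAX_AMOUNT = 100_000;  // assumed upper bound found by the first scan
    static final int BUCKET_COUNT = 100;
    // Each bucket covers 1,000 consecutive amounts.
    static final int WIDTH = (MAX_AMOUNT - MIN_AMOUNT + 1) / BUCKET_COUNT;

    // Returns the two-digit file name ("00" to "99") of the bucket for this amount.
    static String bucketFileName(int amount) {
        if (amount < MIN_AMOUNT || amount > MAX_AMOUNT) {
            throw new IllegalArgumentException("amount out of range: " + amount);
        }
        int index = (amount - MIN_AMOUNT) / WIDTH;
        return String.format(Locale.ROOT, "%02d", index);
    }
}
```

Because the file names sort in the same order as the amount ranges, concatenating the sorted files from 00 to 99 gives the fully sorted output.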

3. Counting sort

Algorithm principle

1) Counting sort is a special case of bucket sort.
2) When the range of the n values to be sorted is small, for example non-negative integers with a maximum value of k, the data can be divided into k + 1 buckets, one for each value from 0 to k.
3) All values within a bucket are identical, so no sorting within a bucket is needed, which saves that time.

Case analysis

Suppose there are only 8 candidates with scores from 0 to 5, stored in the array A[8] = [2, 5, 3, 0, 2, 3, 0, 3].
Use an array C of size 6 as the buckets, where each subscript corresponds to a score: 0, 1, 2, 3, 4, 5.
C[k] stores the number of candidates with score k; a single pass over the scores gives C = [2, 0, 2, 3, 0, 1].
Taking the running sum of C gives C = [2, 2, 4, 7, 7, 8], so that C[k] now stores the number of candidates whose score is less than or equal to k.
The array R[8] = [0, 0, 2, 2, 3, 3, 3, 5] holds the scores in sorted order. So how is R obtained?

Scan array A from back to front. The first element scanned is the 3 at the end of A; looking up subscript 3 in array C gives 7. In other words, counting this element itself, there are 7 candidates with scores less than or equal to 3, so this 3 is the seventh element of array R (subscript 6). Once it has been placed in R, only 6 elements less than or equal to 3 remain, so C[3] is decremented by 1 to 6.
Likewise, when the second candidate with a score of 3 is scanned, it is placed in the sixth position of R (subscript 5). When the scan of A is complete, the data in R is ordered by score from smallest to largest. Scanning from back to front keeps equal scores in their original relative order, which makes the sort stable.

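// Counting sort. a is the array and n is its size. Assumes the array holds only non-negative integers.
public void countingSort(int[] a, int n) {
  if (n <= 1) return;

  // Find the range of the data in the array
  int max = a[0];
  for (int i = 1; i < n; ++i) {
    if (max < a[i]) {
      max = a[i];
    }
  }

  // Allocate a counting array c with subscripts [0, max]; Java initializes int arrays to 0
  int[] c = new int[max + 1];

  // Count the occurrences of each value and store them in c
  for (int i = 0; i < n; ++i) {
    c[a[i]]++;
  }

  // Accumulate running sums, so c[k] is the number of elements <= k
  for (int i = 1; i <= max; ++i) {
    c[i] = c[i-1] + c[i];
  }

  // Temporary array r holds the sorted result
  int[] r = new int[n];
  // The key step of counting sort, and the hardest to follow:
  // scan a from back to front and place each element at its final position
  for (int i = n - 1; i >= 0; --i) {
    int index = c[a[i]]-1;
    r[index] = a[i];
    c[a[i]]--;
  }

  // Copy the result back into a
  for (int i = 0; i < n; ++i) {
    a[i] = r[i];
  }
}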

Conditions of Use

1) It can only be used when the data range is small. If the data range k is much larger than the number of items n to be sorted, counting sort is not suitable;
2) Counting sort can only sort non-negative integers. Other types of data must first be converted to non-negative integers without changing their relative order.

For example, if test scores are accurate to one decimal place, multiply all scores by 10 to convert them to integers before sorting.
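
The conversion for one-decimal scores might look like the sketch below. The class name `DecimalCountingSortSketch`, the `maxScore` parameter, and the use of rounding to absorb floating-point error are assumptions for illustration.

```java
// Hypothetical sketch: counting sort for scores with at most one decimal place.
final class DecimalCountingSortSketch {

    // Sorts scores in place; every score is assumed to lie in [0, maxScore].
    static void sortScores(double[] scores, double maxScore) {
        int n = scores.length;
        if (n <= 1) return;

        // Multiply by 10 so that each score becomes a non-negative integer key.
        int maxKey = (int) Math.round(maxScore * 10);
        int[] keys = new int[n];
        int[] count = new int[maxKey + 1];
        for (int i = 0; i < n; i++) {
            keys[i] = (int) Math.round(scores[i] * 10);
            if (keys[i] < 0 || keys[i] > maxKey) {
                throw new IllegalArgumentException("score out of range: " + scores[i]);
            }
            count[keys[i]]++;
        }

        // Running sums: count[k] is the number of scores whose key is <= k.
        for (int k = 1; k <= maxKey; k++) {
            count[k] += count[k - 1];
        }

        // Place each score at its final position, scanning from back to front.
        double[] sorted = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            sorted[--count[keys[i]]] = scores[i];
        }
        System.arraycopy(sorted, 0, scores, 0, n);
    }
}
```

Multiplying by 10 preserves the relative order of the scores, which is the condition stated above; the counting array then needs 10 × maxScore + 1 entries, so the approach only pays off while that range stays small.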

4. Radix sort

Algorithm principle (take sorting 100,000 mobile phone numbers as an example to illustrate)

1) To compare two mobile phone numbers a and b, if a is already larger than b in the leading digits, the remaining digits do not need to be examined.
2) Using a stable sorting algorithm, sort the mobile phone numbers by the last digit first, then re-sort by the second-to-last digit, and so on, finishing with the first digit.
3) After 11 passes, one for each digit of an 11-digit mobile phone number, the numbers are fully sorted.
4) Each pass sorts on a single digit, whose range is small, so it can be done with bucket sort or counting sort.
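
The steps above can be sketched as a least-significant-digit radix sort that runs one stable counting sort per digit. The class name `RadixSortSketch` and the choice to represent the numbers as equal-length strings of ASCII digits are assumptions for illustration.

```java
// Hypothetical sketch: radix sort for equal-length strings of ASCII digits.
final class RadixSortSketch {

    // Sorts numbers in place; every string is assumed to contain exactly
    // `digits` characters, each between '0' and '9'.
    static void radixSort(String[] numbers, int digits) {
        int n = numbers.length;
        for (String s : numbers) {
            if (s.length() != digits) {
                throw new IllegalArgumentException("unexpected length: " + s);
            }
        }
        if (n <= 1) return;

        String[] tmp = new String[n];
        // Start from the last digit and move toward the first.
        for (int pos = digits - 1; pos >= 0; pos--) {
            // Stable counting sort on the digit at position pos.
            int[] count = new int[10];
            for (String s : numbers) {
                count[s.charAt(pos) - '0']++;
            }
            for (int d = 1; d < 10; d++) {
                count[d] += count[d - 1];
            }
            // Back-to-front scan keeps the order produced by earlier passes.
            for (int i = n - 1; i >= 0; i--) {
                int d = numbers[i].charAt(pos) - '0';
                tmp[--count[d]] = numbers[i];
            }
            System.arraycopy(tmp, 0, numbers, 0, n);
        }
    }
}
```

For 11-digit mobile phone numbers the call would be `RadixSortSketch.radixSort(phones, 11)`. Each pass is O(n) over a digit range of 10, so the whole sort costs 11 linear passes. The stability of each pass is what allows later, more significant digits to be sorted without disturbing the order already established by earlier ones.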

Conditions of Use

1) The data must be divisible into independent "digits" that can be compared;
2) The digits must have an order of significance: if a higher-order digit of a is larger than the corresponding digit of b, the remaining lower-order digits do not need to be compared;
3) The range of each digit must not be too large, so that each pass can use a linear sorting algorithm; otherwise the time complexity of radix sort cannot reach O(n).

Origin blog.csdn.net/qq_54729417/article/details/123449254