[Data Structures] A comprehensive guide to sorting (Part 1): direct insertion sort, Shell sort, selection sort, and heap sort

Table of contents

1. The concept of sorting and its application

1.1 The concept of sorting

1.2 Common algorithm sorting

2. Implementation of common sorting algorithms

2.1 Insertion sort

2.1.1 Ideas

2.1.2 Direct insertion sort

2.1.3 Shell sort (diminishing increment sort)

2.2 Selection sort

2.2.1 Basic idea

2.2.2 Direct Selection Sort

2.2.3 Heap sort 


Efforts without persistence are essentially meaningless!


1. The concept of sorting and its application

1.1 The concept of sorting

Sorting: arranging a sequence of records in increasing or decreasing order according to the size of one or more of their keys.
Stability: assume the sequence to be sorted contains multiple records with equal keys. If, after sorting, the relative order of those records is unchanged — that is, whenever r[i] = r[j] and r[i] comes before r[j] in the original sequence, r[i] still comes before r[j] in the sorted sequence — the algorithm is said to be stable; otherwise it is unstable.

 Stable: direct insertion sort, bubble sort, merge sort

Unstable: Shell sort, selection sort, heap sort, quick sort
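The distinction can be seen in a small sketch. The record type and its fields below are my own illustration, not from the article; the sort is the direct insertion sort described later, applied to records and compared by key only:

```c
/* Records with equal keys: a stable sort keeps their original order.
   The Rec type and its fields are illustrative, not from the article. */
typedef struct { int key; char tag; } Rec;

/* Direct insertion sort on records, comparing keys only.
   Stopping at equal keys (strict >) is what preserves stability. */
void InsertSortRec(Rec* a, int n)
{
    for (int i = 0; i < n - 1; i++)
    {
        int end = i;
        Rec tmp = a[end + 1];
        while (end >= 0 && a[end].key > tmp.key) /* strict >: do not move past equal keys */
        {
            a[end + 1] = a[end]; /* shift the larger record one slot right */
            end--;
        }
        a[end + 1] = tmp;
    }
}
```

Sorting {2,'a'}, {1,'x'}, {2,'b'} by key yields {1,'x'}, {2,'a'}, {2,'b'}: the two records with key 2 keep their original 'a'-before-'b' order.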

Internal sorting: a sort in which all data elements reside in memory.
External sorting: there are too many data elements to fit in memory at once, so data is moved between memory and external storage (disk) as the sort proceeds.

1GB = 1024MB = 1024 × 1024KB = 1024 × 1024 × 1024 bytes ≈ 1 billion bytes

(1) Direct insertion sort, Shell sort, direct selection sort, bucket sort, bubble sort, quick sort, and merge sort are all internal sorts; of these, merge sort can also be used as an external sort.

(2) Internal sorting: data is in memory, access is fast, and elements can be accessed randomly by subscript (arrays). External sorting: data is on disk, access is slow and serial (files), and the data volume is large.

Given a file containing 1 billion integers (about 4GB of data) but only 1GB of working memory, how do you sort the data in the file?

Idea: the data cannot all be loaded into memory, so use the fact that two sorted files can be merged into one larger sorted file.

Idea: first split the file into 4 roughly equal parts, read each part into memory and sort it (the in-memory sort here should not be merge sort, since merge sort needs extra space), write each sorted part back to a small disk file, and then merge the sorted files on disk.
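The disk-merge step can be sketched as follows. This is a minimal illustration assuming a binary file of raw ints; the function name and file format are my own assumptions, not from the article:

```c
#include <stdio.h>

/* Merge two files of sorted binary ints into one sorted output file.
   This is the core step of the external merge described above. */
void MergeSortedFiles(FILE* f1, FILE* f2, FILE* out)
{
    int x, y;
    int has1 = (fread(&x, sizeof(int), 1, f1) == 1);
    int has2 = (fread(&y, sizeof(int), 1, f2) == 1);
    while (has1 && has2)
    {
        if (x <= y)
        {
            fwrite(&x, sizeof(int), 1, out);
            has1 = (fread(&x, sizeof(int), 1, f1) == 1);
        }
        else
        {
            fwrite(&y, sizeof(int), 1, out);
            has2 = (fread(&y, sizeof(int), 1, f2) == 1);
        }
    }
    while (has1) /* drain whichever input remains */
    {
        fwrite(&x, sizeof(int), 1, out);
        has1 = (fread(&x, sizeof(int), 1, f1) == 1);
    }
    while (has2)
    {
        fwrite(&y, sizeof(int), 1, out);
        has2 = (fread(&y, sizeof(int), 1, f2) == 1);
    }
}
```

Only one int from each input is held in memory at a time, which is why this step works no matter how large the files are.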

1.2 Common algorithm sorting

Roughly by efficiency, from slowest to fastest: bubble sort < direct insertion sort < heap sort < Shell sort (the further right, the faster in this rough ordering).

2. Implementation of common sorting algorithms

2.1 Insertion sort

2.1.1 Ideas

Insertion sort is a simple sorting method.

The basic idea of insertion sort: given an already ordered interval, insert one more element while keeping the interval ordered. Compare the element to be inserted with the interval's elements from back to front: if it belongs before an interval element, shift that element back one position; once an element it belongs after is found (or the front of the interval is reached), write it into that position. Shifting before writing means no data is overwritten.

2.1.2 Direct insertion sort

But how do we sort an unordered range? Treat the first element as an ordered interval of length one, and insert the second element into it. Since shifting would overwrite the element being inserted, it is first saved in a temporary variable. The first two elements now form an ordered interval; insert the third element into it, and so on.

// Direct insertion sort, ascending
void InsertSort(int* a, int n)
{
	int i = 0;
	for (i = 0; i < n - 1; i++)
	{
		int end = i;          // index of the last element of the sorted prefix
		int tmp = a[end + 1]; // element to insert; saved so shifting cannot overwrite it
		while (end >= 0)
		{
			if (tmp >= a[end]) // >= (not >) keeps equal elements in order, i.e. stable
			{
				// Insertion point found; tmp is written after the loop.
				// Writing a[end + 1] = tmp here instead would miss one case:
				// when tmp is the smallest value, end reaches -1, the loop
				// never re-enters, and a[0] would never receive tmp.
				break;
			}
			else
			{
				a[end + 1] = a[end]; // shift the larger element one slot right
				end--;
			}
		}
		// Handles both the break case and the case where tmp is the smallest
		// value (end == -1), so tmp always lands in a[end + 1].
		a[end + 1] = tmp;
	}
}

The n - 1 bound in the outer loop exists because i is the index of the last element of the sorted prefix, so a[i + 1], the element being inserted, must still be inside the array.

Summary of direct insertion sort:

1. The closer the input is to being sorted, the more efficient direct insertion sort is.
2. Time complexity: O(N^2) in the worst case (reverse order); O(N) in the best case (sorted or nearly sorted input).
3. Space complexity: O(1).
4. Stability: stable.
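The best/worst-case claim can be checked with an instrumented copy of the sort that counts key comparisons. The counter is my addition for illustration, not part of the original code:

```c
/* Direct insertion sort that also counts key comparisons, to illustrate
   the best case (sorted input: N - 1 comparisons) versus the worst case
   (reverse order: N(N - 1)/2 comparisons). */
long InsertSortCount(int* a, int n)
{
    long cmps = 0;
    for (int i = 0; i < n - 1; i++)
    {
        int end = i;
        int tmp = a[end + 1];
        while (end >= 0)
        {
            cmps++;                  /* count each key comparison */
            if (tmp >= a[end])
                break;               /* insertion point found */
            a[end + 1] = a[end];
            end--;
        }
        a[end + 1] = tmp;
    }
    return cmps;
}
```

For 8 sorted elements this performs 7 comparisons; for 8 elements in reverse order it performs 28 = 8 × 7 / 2.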

2.1.3 Shell sort (diminishing increment sort)

(1) Shell sort has two phases. The first is pre-sorting, which brings the array close to sorted order (useful because direct insertion sort runs in O(N) on nearly sorted input). The second is a final direct insertion sort: the last pass, with gap = 1.

 Code:

void ShellSort(int* a, int n)
{
	int gap = n;
	while (gap > 1)
	{
		gap = (gap / 3) + 1; // shrink the gap; the +1 guarantees a final pass with gap == 1
		// One insertion-sort pass over all gap groups, processed interleaved:
		// a[j + gap] is inserted into its group a[j], a[j - gap], a[j - 2*gap], ...
		for (int j = 0; j < n - gap; j++)
		{
			int end = j;
			int tmp = a[j + gap];
			while (end >= 0)
			{
				if (a[end] < tmp)
				{
					break;
				}
				else
				{
					a[end + gap] = a[end]; // shift within the group, gap slots right
					end -= gap;
				}
			}
			a[end + gap] = tmp;
		}
	}
}

The n - gap bound in the outer loop likewise keeps the insertion index in range: j + gap, the index of the element being inserted, must stay inside the array. There are gap groups in total, and all of them are handled interleaved within one pass of the loop.

Pre-sorting: large values move toward the back quickly and small values move toward the front quickly, since each shift jumps gap positions at once, bringing the array close to sorted order.

Grouping: with gap = 3, the 1st, 4th, 7th, ... elements (stepping by gap until the index runs past the end of the array) form one group; the 2nd, 5th, ... elements form another; the 3rd, 6th, ... elements form a third. In general the array is divided into gap groups, and each group is sorted with direct insertion sort, stepping by gap instead of 1.

A smaller gap leaves the array closer to sorted order; a larger gap moves large values to the back and small values to the front faster, but leaves the array less sorted.

Time complexity: roughly O(N · log₃N) with this gap sequence; empirically about O(N^1.3) on average.

For the pre-sorting passes: when the gap is large, each group is short and the inner loop does little work, so a pass is about O(N); when the gap is small, the array is already close to sorted, so a pass is again about O(N).

For the outer loop: the gap shrinks as N/3/3/.../3 until it reaches 1, i.e. 3^x = N, so there are about log₃N passes.
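The gap sequence can be inspected directly. The helper name is my own, and n = 100 below is just an illustrative size:

```c
/* Fill out[] with the successive gaps produced by gap = gap / 3 + 1,
   starting from gap = n, and return how many gaps were generated.
   For n = 100 this yields 34, 12, 5, 2, 1 — the final gap is always 1,
   which is the plain direct-insertion-sort pass. */
int GapSequence(int n, int* out)
{
    int gap = n;
    int count = 0;
    while (gap > 1)
    {
        gap = gap / 3 + 1; /* same update as in ShellSort above */
        out[count++] = gap;
    }
    return count;
}
```

Counting the gaps for growing n shows the number of passes growing like log₃N, matching the estimate above.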

Stability : Unstable

2.2 Selection sort

2.2.1 Basic idea

Each pass, select the smallest (or largest) element from the elements not yet sorted and place it at the start of the unsorted range, until all elements have been placed.

2.2.2 Direct Selection Sort

Traverse the unsorted range to find the positions of the smallest and largest elements and swap them to the two ends of the range (swap rather than assign, or data would be overwritten); then traverse again for the second smallest and second largest, placing them one position in from each end, and so on until all elements are in place.

Code (optimized): the unoptimized version selects only one element per pass; this version selects both the minimum and the maximum.

// Swap helper used by SelectSort
void Swap(int* p1, int* p2)
{
	int t = *p1;
	*p1 = *p2;
	*p2 = t;
}

void SelectSort(int* a, int n)
{
	int left = 0;
	int right = n - 1;
	while (left < right)
	{
		int mini = left;
		int maxi = left;
		for (int i = left + 1; i <= right; i++)
		{
			if (a[mini] > a[i])
			{
				mini = i;
			}
			if (a[maxi] < a[i])
			{
				maxi = i;
			}
		}
		Swap(&a[mini], &a[left]);
		// If the maximum was at position left, the swap above just moved it
		// to position mini; patch maxi so the next swap moves the real maximum.
		if (left == maxi)
		{
			maxi = mini;
		}
		Swap(&a[maxi], &a[right]);
		right--;
		left++;
	}
}

Note (about the first swap): if the maximum happens to sit at index left (say left = 0) and the minimum at index mini (say 5), the first swap moves the maximum to position mini. Since maxi still records the old position left, the second swap would then send the minimum to the end of the range. The line if (left == maxi) maxi = mini; patches maxi to the maximum's new position.

1. Direct selection sort is very easy to understand, but its efficiency is poor; it is rarely used in practice.
2. Time complexity: O(N^2) in both the best and worst cases, because every pass must traverse the remaining range to find the minimum and maximum regardless of the input order.
3. Space complexity: O(1).
4. Stability: unstable.

Direct selection sort generally beats bubble sort, but on sorted or nearly sorted data bubble sort (with an early-exit check) is better, since it can finish in O(N).

2.2.3 Heap sort

For details, see: Heap sort link 

Heapsort is a sorting algorithm built on the heap data structure and is a form of selection sort: it uses the heap to select elements. Note that ascending order requires a max heap, and descending order a min heap.
1. Heap sorting uses the heap to select numbers, which is much more efficient.
2. Time complexity: O(N*logN)
3. Space complexity: O(1)
4. Stability: Unstable
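Since the article links out for its full implementation, here is a minimal sketch of an ascending heap sort using a max heap, consistent with the summary above (the function names are my own):

```c
/* Sift the element at index parent down a max heap of size n stored in a[]. */
static void AdjustDown(int* a, int n, int parent)
{
    int child = 2 * parent + 1;               /* left child */
    while (child < n)
    {
        if (child + 1 < n && a[child + 1] > a[child])
            child++;                          /* pick the larger child */
        if (a[child] > a[parent])
        {
            int t = a[parent]; a[parent] = a[child]; a[child] = t;
            parent = child;
            child = 2 * parent + 1;
        }
        else
            break;                            /* heap property restored */
    }
}

/* Ascending heap sort: build a max heap, then repeatedly swap the maximum
   to the end of the array and sift the new root down over the shrunk heap. */
void HeapSort(int* a, int n)
{
    /* Build the max heap bottom-up, starting from the last parent node. */
    for (int i = (n - 2) / 2; i >= 0; i--)
        AdjustDown(a, n, i);
    for (int end = n - 1; end > 0; end--)
    {
        int t = a[0]; a[0] = a[end]; a[end] = t; /* move current max to the end */
        AdjustDown(a, end, 0);                   /* heap now has end elements */
    }
}
```

Building the heap is O(N) and each of the N sift-downs is O(logN), giving the O(N·logN) total stated above.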

Origin blog.csdn.net/m0_57388581/article/details/131734305