DS Eight Major Sorts: Direct Insertion Sort and Shell Sort

Preface

In previous issues we introduced the basic linear and nonlinear data structures, such as sequence lists, linked lists, stacks and queues, and binary trees~! In this and the next few issues, we will explain the concept, implementation, and performance analysis of each sorting algorithm in detail!

Contents of this issue

The concept of sorting and its application

Common sorting algorithms

Direct insertion sort

Shell sort

1. The concept of sorting and its application

The concept of sorting

Sorting: The operation of arranging a sequence of elements in ascending or descending order according to certain rules!

Stability: Suppose the sequence to be sorted contains multiple equal elements. If the relative order of these elements is unchanged after sorting, the sort is said to be stable; otherwise it is unstable!

Internal sorting: Sorting in which all data elements are placed in memory.

External sorting: There are too many data elements to hold in memory at the same time, so the data has to be moved between memory and external storage as the sorting process requires.

Which sorts are internal and which are external will be introduced later, together with their performance analysis!
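
To make the definition of stability concrete, here is a minimal sketch in C. The Item type, its tag field, and the function name SortItemsByKey are assumptions made up for this illustration; the sort itself is a simple insertion-style sort that compares keys only.

```c
#include <assert.h>

/* Hypothetical record type: key is what we sort by,
   tag only records which element came first in the input. */
typedef struct
{
	int key;
	char tag;
} Item;

/* Insertion-style sort by key only. The strict '>' means an element never
   moves past another element with an equal key, which keeps the sort stable. */
void SortItemsByKey(Item* a, int n)
{
	assert(n >= 0);
	for (int i = 0; i < n - 1; i++)
	{
		int end = i;
		Item tmp = a[end + 1];
		while (end >= 0 && a[end].key > tmp.key)
		{
			a[end + 1] = a[end];
			--end;
		}

		a[end + 1] = tmp;
	}
}
```

Sorting (3,a), (1,b), (3,c), (1,d) this way gives (1,b), (1,d), (3,a), (3,c): elements with equal keys keep their original relative order. Changing > to >= would let equal elements swap places, and the sort would no longer be stable.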

Use of sorting

Sorting is extremely common in daily life. For example, the comprehensive, sales volume, rating, and price orderings that you see every day on shopping platforms such as JD.com, Taobao, and Pinxixi are all sorts of products!

There are also university rankings:

There are also everyone's rankings after an exam. These are all sorting~! Sorting really is everywhere in life!!!

Common sorting algorithms

2. Direct insertion sort and its implementation

The basic idea of insertion sort

Insert the elements to be sorted one by one, according to their size, into a sequence that is already sorted, until all the elements have been inserted; the resulting new sequence is an ordered sequence!

This is just like arranging our cards when we played poker as kids: we compare each new card with the ones already in hand, from back to front, to find the right position to insert it!

Direct insertion sort

When the i-th element is inserted (i >= 1), the sequence before it is already ordered. At this point we only need to compare array[i] with the previous elements one by one: (in ascending order) if the previous element is larger than the current element, the previous element is moved back into the current position; otherwise, the current element is inserted right after the previous element!

The idea is as above. Let's write a single pass first, then transform it into the whole sort:

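	int end = 0;// end starts at index 0
	int tmp = a[end + 1];// tmp is the element right after end
	while (end >= 0)
	{
		if (a[end] > tmp)// the previous element is larger than the current element
		{
			a[end + 1] = a[end];// shift the previous element back one position
		}
		else// the previous element is not larger than the current element
		{
			a[end + 1] = tmp;// insert the current element right after the previous element
			break;// remember to stop, otherwise the sorted range gets messed up again
		}

		--end;
	}

	if (end < 0)// every element was larger than tmp, so end has been decremented to -1
	{
		a[0] = tmp;// insert tmp (the element that was right after end) at index 0
	}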

Overall:

In fact, once the single pass is written, turning it into the whole sort is easy: we just wrap a loop around it! Let end start from i, so that each element a[i + 1] is compared with the elements before it one by one, until the whole array is sorted!

void InsertSort(int* a, int n)
{
	for (int i = 0; i < n - 1; i++)
	{
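		int end = i;// end starts at index i
		int tmp = a[end + 1];// tmp is the element right after end
		while (end >= 0)
		{
			if (a[end] > tmp)// the previous element is larger than the current element
			{
				a[end + 1] = a[end];// shift the previous element back one position
			}
			else// the previous element is not larger than the current element
			{
				a[end + 1] = tmp;// insert the current element right after the previous element
				break;// remember to stop, otherwise the sorted range gets messed up again
			}

			--end;
		}

		if (end < 0)// every element was larger than tmp, so end has been decremented to -1
		{
			a[0] = tmp;// insert tmp (the element that was right after end) at index 0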
		}
	}
}

But pay attention to the condition of the outer for loop: i < n - 1. That is, i can go at most to position n - 2, the second-to-last element! The reason: tmp is the element to be inserted each time, and tmp = a[end + 1] is the element at the position right after end (i). If i reached the last element, n - 1, then tmp = a[end + 1] would read out of bounds!!!

OK, direct insertion sort is written. Let's test it:
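
A check of this kind could look like the following sketch. InsertSort is repeated here in a compact form that behaves the same as the version above, so that the block compiles on its own; IsSortedAscending is a helper assumed for this illustration and is not part of the original code.

```c
#include <assert.h>
#include <stdbool.h>

/* Direct insertion sort (ascending), written compactly. */
void InsertSort(int* a, int n)
{
	for (int i = 0; i < n - 1; i++)
	{
		int end = i;
		int tmp = a[end + 1];
		while (end >= 0 && a[end] > tmp)
		{
			a[end + 1] = a[end];
			--end;
		}

		a[end + 1] = tmp;
	}
}

/* Hypothetical helper: returns true if a[0..n-1] is in non-decreasing order. */
bool IsSortedAscending(const int* a, int n)
{
	assert(n >= 0);
	for (int i = 0; i + 1 < n; i++)
	{
		if (a[i] > a[i + 1])
		{
			return false;
		}
	}

	return true;
}
```

Calling InsertSort(a, n) on a sample array such as { 9, 1, 2, 5, 7, 4, 8, 6, 3, 5 } and then checking IsSortedAscending(a, n) is enough for a quick test. It is also worth trying a one-element array, where the outer loop body never runs.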

Writing it this way works, but it feels a little clumsy~! We handle the insertion twice: once when the previous element is not larger than the current element, and once when end goes out of bounds.

Can we optimize it? The answer is yes! Whether the loop stops because the previous element is not larger than the current one or because end went out of bounds, the final insertion always happens at position end + 1. So inside the while loop we just break as soon as the previous element is not larger than the current one. The single pass may then end through break or through end < 0, but we don't need to care which: after the loop we simply insert at position end + 1~!

void InsertSort(int* a, int n)
{
	for (int i = 0; i < n - 1; i++)
	{
		int end = i;
		int tmp = a[end + 1];
		while (end >= 0)
		{
			if (a[end] > tmp)
			{
				a[end + 1] = a[end];
			}
			else
			{
				break;
			}

			--end;
		}

		a[end + 1] = tmp;
	}
}

Isn't this much simpler to write, and less error-prone~! When I was learning this, I kept forgetting the break in the first version and then spent a long time hunting for the bug (Q^Q).... So I suggest you usually write the second version~!

Complexity analysis

Time complexity: O(N^2) ---> A single pass is O(N) in the worst case, and there are N - 1 passes, one for each element to be inserted.

Space complexity: O(1) ---> The amount of extra space used is constant.

When the sequence to be sorted is already close to ordered, performance is at its best, close to O(N)~!
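
The gap between the best and worst case can be made visible by counting comparisons. The sketch below uses the same algorithm as InsertSort but returns how many times a[end] > tmp is evaluated; the name InsertSortCountCompares is made up for this illustration.

```c
#include <assert.h>

/* Same algorithm as InsertSort, but returns the number of
   a[end] > tmp comparisons that were performed. */
long InsertSortCountCompares(int* a, int n)
{
	assert(n >= 0);
	long compares = 0;
	for (int i = 0; i < n - 1; i++)
	{
		int end = i;
		int tmp = a[end + 1];
		while (end >= 0)
		{
			++compares;
			if (a[end] > tmp)
			{
				a[end + 1] = a[end];
			}
			else
			{
				break;
			}

			--end;
		}

		a[end + 1] = tmp;
	}

	return compares;
}
```

For 8 elements this gives 7 comparisons on already sorted input (N - 1) and 28 comparisons on reversed input (N * (N - 1) / 2), which matches the O(N) best case and the O(N^2) worst case.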

OK, that's it for direct insertion sort. Now let's introduce its optimized version ----- Shell sort!!

3. Shell sort and its implementation

As introduced above, the time complexity of direct insertion sort is O(N^2), so its performance is average~! One day, a big guy named D.L.Shell studied direct insertion sort and thought: since direct insertion sort performs well (O(N)) when the data is close to ordered, can I first process a group of unordered elements so that they become nearly ordered, and then run direct insertion sort? In fact, this idea optimized direct insertion sort to the point where it is almost on par with quick sort~! This is the big guy below:

OK, let's take a look at the big guy's specific idea!

The idea of Shell sort

1. Pre-sort in multiple groups

2. Finally, do a direct insertion sort

What does that mean? Let me explain. The multi-group pre-sort here is:

First select an integer increment gap (initially gap = n / 2) and divide the elements of the array into gap groups: elements that are gap apart belong to the same group. Perform a direct insertion sort on each group, then keep reducing the gap (gap /= 2) and repeat the operation. By the time gap reaches 1, all the groups have been pre-sorted, and compared with the beginning the array is already very close to ordered~! The final pass with gap == 1 is a plain direct insertion sort, after which the sequence is in order!

OK, let’s draw a picture to understand:

This divides the whole array into gap groups; the gap at this point is 5. Let's run one pre-sort pass with this gap:

Continuing like this, the gap will eventually reach gap == 1. When gap == 1, the pass is a direct insertion sort:

From the example above, we can clearly see that by the time gap == 1, the array is already very close to ordered before the final direct insertion sort, and after that final pass it is in order~!
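
The grouping itself can be sketched in a few lines of C. CollectGroup is a hypothetical helper written only for this illustration, and the sample array in the note below is an assumption, not necessarily the one used in the figures.

```c
#include <assert.h>

/* Copies the elements of one group (indices start, start + gap, start + 2 * gap, ...)
   into out and returns how many were copied. out must have room for them. */
int CollectGroup(const int* a, int n, int gap, int start, int* out)
{
	assert(gap > 0 && start >= 0 && start < gap);
	int count = 0;
	for (int i = start; i < n; i += gap)
	{
		out[count++] = a[i];
	}

	return count;
}
```

With a = { 9, 1, 2, 5, 7, 4, 8, 6, 3, 5 } and gap = 5, group 0 is { 9, 4 } and group 3 is { 5, 3 }; with gap = 2, group 1 is { 1, 5, 4, 6, 5 }. Each pre-sort pass runs a direct insertion sort inside every such group.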

Implementation of Shell sort

As before, write a single pass first, then the whole sort~!

Single pass

Single pass for one group

for (int i = 0; i < n - gap; i += gap)
{
	int end = i;
	int tmp = a[end + gap];
	while (end >= 0)
	{
		if (a[end] > tmp)
		{
			a[end + gap] = a[end];
		}
		else
		{
			break;
		}

		end -= gap;
	}

	a[end + gap] = tmp;
}

Note that the condition here is i < n - gap rather than i < n - gap - 1: tmp = a[end + gap], so the largest valid end is n - gap - 1, which i < n - gap already allows.

This is a single pass over one group. It is almost the same as direct insertion sort; when gap == 1, it is exactly direct insertion sort~! There are gap groups at this point, and each of them needs a pass like this, so wrapping another loop around it gives a single pass over all groups~!

Single pass for all groups

for (int j = 0; j < gap; j++)// j is the starting index of each group
{
	for (int i = j; i < n - gap; i += gap)
	{
		int end = i;
		int tmp = a[end + gap];
		while (end >= 0)
		{
			if (a[end] > tmp)
			{
				a[end + gap] = a[end];
			}
			else
			{
				break;
			}

			end -= gap;
		}

		a[end + gap] = tmp;
	}
}

This is one pre-sort pass over all groups. So how many pre-sort passes are there? It depends on n, but however the gap is reduced (gap /= 2 or gap = gap / 3 + 1), the last gap is gap == 1! So overall we should wrap one more loop outside:

void ShellSort(int* a, int n)
{
	int gap = n;
	while (gap > 1)
	{
		gap /= 2;
		for (int j = 0; j < gap; j++)// j is the starting index of each group
		{
			for (int i = j; i < n - gap; i += gap)
			{
				int end = i;
				int tmp = a[end + gap];
				while (end >= 0)
				{
					if (a[end] > tmp)
					{
						a[end + gap] = a[end];
					}
					else
					{
						break;
					}

					end -= gap;
				}

				a[end + gap] = tmp;
			}
		}
	}
}

gap starts at n, and gap /= 2 is executed before the first pass; every later adjustment is also /= 2, so the last pass through the loop always has gap == 1 (n, n / 2, n / 4, n / 8, n / 16, n / 32 ..... 4, 2, 1), which is exactly direct insertion sort~!

OK, let's test it:

No problem! But this code has been optimized by another big guy, as follows:
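
One way to test it is to compare the result with the standard library's qsort on the same data. In the sketch below, ShellSort is repeated in a compact form with the group index named j so the block compiles on its own; CompareInts and ShellSortMatchesQsort are helper names assumed for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Shell sort with gap /= 2, handling the gap groups one after another. */
void ShellSort(int* a, int n)
{
	int gap = n;
	while (gap > 1)
	{
		gap /= 2;
		for (int j = 0; j < gap; j++)
		{
			for (int i = j; i < n - gap; i += gap)
			{
				int end = i;
				int tmp = a[end + gap];
				while (end >= 0 && a[end] > tmp)
				{
					a[end + gap] = a[end];
					end -= gap;
				}

				a[end + gap] = tmp;
			}
		}
	}
}

/* Comparison callback for qsort: ascending order of int. */
static int CompareInts(const void* x, const void* y)
{
	int l = *(const int*)x;
	int r = *(const int*)y;
	return (l > r) - (l < r);
}

/* Returns 1 if ShellSort and qsort agree on a pseudo-random array of n ints, else 0. */
int ShellSortMatchesQsort(int n, unsigned seed)
{
	assert(n > 0);
	int* a = (int*)malloc((size_t)n * sizeof(int));
	int* b = (int*)malloc((size_t)n * sizeof(int));
	if (a == NULL || b == NULL)
	{
		free(a);
		free(b);
		return 0;
	}

	srand(seed);
	for (int i = 0; i < n; i++)
	{
		a[i] = rand() % 1000;
	}

	memcpy(b, a, (size_t)n * sizeof(int));
	ShellSort(a, n);
	qsort(b, (size_t)n, sizeof(int), CompareInts);
	int same = memcmp(a, b, (size_t)n * sizeof(int)) == 0;
	free(a);
	free(b);
	return same;
}
```

Calling ShellSortMatchesQsort with a few different sizes and seeds, including n == 1, covers more cases than a single hand-written array.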

void ShellSort(int* a, int n)
{
	int gap = n;

	while (gap > 1)
	{
		gap = gap / 3 + 1;
		for (int i = 0; i < n - gap; i++)
		{
			int end = i;
			int tmp = a[end + gap];
			while (end >= 0)
			{
				if (a[end] > tmp)
				{
					a[end + gap] = a[end];
				}
				else
				{
					break;
				}

				end -= gap;
			}

			a[end + gap] = tmp;
		}
	}
}

He changed handling the groups one at a time into handling multiple groups side by side~~! Before, we finished one group and then moved on to the next; after his change, a single loop advances all the groups together~! But there is no difference in performance.
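
That the two ways of writing a pass give the same result can be checked directly. The following sketch runs one gap pass in both styles; InsertWithGap, GapPassByGroup, and GapPassInterleaved are names made up for this illustration, and the sample array in the note is an assumption.

```c
#include <assert.h>

/* Inserts a[end + gap] into the already sorted part of its group
   (a[end], a[end - gap], a[end - 2 * gap], ...). */
static void InsertWithGap(int* a, int end, int gap)
{
	int tmp = a[end + gap];
	while (end >= 0 && a[end] > tmp)
	{
		a[end + gap] = a[end];
		end -= gap;
	}

	a[end + gap] = tmp;
}

/* One pass, group by group: finish group j before starting group j + 1. */
void GapPassByGroup(int* a, int n, int gap)
{
	assert(gap > 0);
	for (int j = 0; j < gap; j++)
	{
		for (int i = j; i < n - gap; i += gap)
		{
			InsertWithGap(a, i, gap);
		}
	}
}

/* One pass, groups side by side: i walks the array once and each step
   extends whichever group index i belongs to. */
void GapPassInterleaved(int* a, int n, int gap)
{
	assert(gap > 0);
	for (int i = 0; i < n - gap; i++)
	{
		InsertWithGap(a, i, gap);
	}
}
```

On { 9, 1, 2, 5, 7, 4, 8, 6, 3, 5 } with gap = 5, both functions produce { 4, 1, 2, 3, 5, 9, 8, 6, 5, 7 }. They perform the same insertions, only in a different order, which is why the single-loop version is purely a simplification of the code.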

The gap he uses here is gap = gap / 3 + 1; written this way it is actually a little more efficient than gap /= 2 (something I saw on the Internet before). The + 1 is there because gap / 3 can become 0 in integer division (for example: 2 / 3), in which case the last pass with gap == 1 would never run~! The + 1 solves this problem!
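
The effect of the + 1 is easy to see by listing the gaps each rule produces. In this sketch, CollectGaps and the three NextGap functions are hypothetical helpers written for the illustration; the loop mirrors the while (gap > 1) loop of ShellSort.

```c
#include <assert.h>

int NextGapHalf(int gap) { return gap / 2; }
int NextGapThirdPlusOne(int gap) { return gap / 3 + 1; }
int NextGapThirdOnly(int gap) { return gap / 3; } /* the rule without + 1 */

/* Records the gaps used by "gap = n; while (gap > 1) { gap = next(gap); ... }"
   into out (at most cap of them) and returns how many there were. */
int CollectGaps(int n, int (*next)(int), int* out, int cap)
{
	assert(cap >= 0);
	int count = 0;
	int gap = n;
	while (gap > 1 && count < cap)
	{
		gap = next(gap);
		out[count++] = gap;
	}

	return count;
}
```

For n = 10, halving gives 5, 2, 1 and gap / 3 + 1 gives 4, 2, 1, so both end with the gap == 1 pass. For n = 8, plain gap / 3 gives 2, 0: the gap jumps from 2 straight to 0, and the final direct insertion pass is skipped.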

Complexity analysis

The time complexity of Shell sort is extremely difficult to calculate, because there are many ways to choose the gap values, which makes an exact analysis hard. The values given in different books are not the same. For example, Teacher Yan Weimin's book gives the following:

Teacher Yin Renkun's book gives the following:

Since our gap here is chosen according to the method described in Teacher Yin Renkun's book, which also reports statistics from experiments on a large amount of data, we can take it for now that

The time complexity of Shell sort is roughly O(N^1.25) or O(N^1.3)

The space complexity of Shell sort is O(1)

OK, that's the introduction to insertion sorts in this issue. Good brothers, see you in the next issue on selection sort~!


Origin blog.csdn.net/m0_75256358/article/details/134656550