Data structure - using heap to sort arrays


Today's article is about using the properties of a heap to sort an array. It also leads into the TopK problem, where we write 100 million numbers to a file and then pick out the ten largest. Both can be solved with the heap operations from the previous chapter.

First, suppose we want to sort an array whose elements are in no particular order, like the one below.

int arr[] = { 9,4,3,19,12,13,5,8,9 };

To sort it we rely on the properties of the heap. In a small heap (min-heap) the element at the top is always the smallest, and in a large heap (max-heap) it is always the largest. So the most important step before sorting is to build a heap. There are two ways to do that: build the heap upward or build it downward. We will cover both.

Building the heap upward

Our example sorts in ascending order. For ascending order, should we build a large heap or a small heap? The answer is a large heap. Let's first look at the problems that come up with a small heap and then look at the large heap. Comparing the two shows why ascending order calls for a large heap.

Building a heap upward reuses AdjustUp from the previous article on heaps. If it is unfamiliar, look back at that article. Here is the code directly.

void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

// Move a[child] up while it is smaller than its parent (small heap)
void AdjustUp(int* a, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break;
		}
	}
}

This is the upward adjustment: it compares a child with its parent and swaps them when they are out of order. To build the heap we start from the second element (index 1, the second level of the binary tree) and adjust each element upward in turn, so a simple loop completes the heap-building process.

int arr[] = { 9,4,3,19,12,13,5,8,9 };
for (int i = 1; i < sizeof(arr) / sizeof(arr[0]); i++)
{
	AdjustUp(arr, i);
}

Because AdjustUp swaps whenever the child is smaller than its parent, the heap built by this loop is a small heap.
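One way to convince yourself that the loop really produces a small heap is to check the parent-child relationship for every element. The sketch below is only an illustration: `IsMinHeap` and `BuildMinHeapUp` are names made up here, not functions from the original article, while `Swap` and `AdjustUp` repeat the code above.

```c
#include <stdbool.h>

void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* Same upward adjustment as in the article (small heap). */
void AdjustUp(int* a, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0 && a[child] < a[parent])
	{
		Swap(&a[child], &a[parent]);
		child = parent;
		parent = (child - 1) / 2;
	}
}

/* Hypothetical wrapper: build a small heap by adjusting each element upward. */
void BuildMinHeapUp(int* a, int n)
{
	for (int i = 1; i < n; i++)
	{
		AdjustUp(a, i);
	}
}

/* Hypothetical checker: true when no child is smaller than its parent. */
bool IsMinHeap(const int* a, int n)
{
	for (int child = 1; child < n; child++)
	{
		if (a[child] < a[(child - 1) / 2])
		{
			return false;
		}
	}
	return true;
}
```

Calling `BuildMinHeapUp(arr, n)` on the example array and then `IsMinHeap(arr, n)` should report a valid small heap with 3 at the top.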

After the small heap is built, the next step is to sort, but here we hit a problem. The top of the heap is the smallest element, so the first position is correct, but how do we find the second smallest and third smallest numbers? Once the top is set aside, the remaining elements no longer line up as a heap, because all the parent-child relationships shift. One answer is to rebuild the heap on the remaining elements to find the next smallest, and to keep going that way. That does sort the array, but every rebuild costs O(N) at best, so the whole sort is O(N^2). That is no better than bubble sort, and it is worse (O(N^2 log N)) if each rebuild adjusts upward. It looks like it uses the heap, but it throws away the heap's advantage. If we build a large heap instead, things are different: we don't look for the small elements first, we put the large ones in place at the back first.
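To make the cost of the small-heap approach concrete, here is a minimal sketch of it. The names `AdjustDownMin` and `SlowMinHeapSort` are assumptions made for this illustration. Each pass rebuilds a small heap on the unsorted remainder, which is where the quadratic running time comes from.

```c
void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* Downward adjustment for a small heap (hypothetical helper). */
void AdjustDownMin(int* a, int size, int parent)
{
	int child = 2 * parent + 1;
	while (child < size)
	{
		/* Pick the smaller child; check the bound before reading. */
		if (child + 1 < size && a[child + 1] < a[child])
		{
			child++;
		}
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = 2 * parent + 1;
		}
		else
		{
			break;
		}
	}
}

/* Ascending sort with a small heap: after each rebuild the smallest
   remaining element sits at a[start]. Every rebuild is O(N), so the
   whole sort is O(N^2). */
void SlowMinHeapSort(int* a, int n)
{
	for (int start = 0; start < n - 1; start++)
	{
		int* rest = a + start;
		int size = n - start;
		for (int i = (size - 1 - 1) / 2; i >= 0; i--)
		{
			AdjustDownMin(rest, size, i);
		}
	}
}
```

The result is correctly sorted, but the heap is rebuilt from scratch N - 1 times, which is exactly the work the large-heap approach avoids.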

After building a large heap (for example by changing the comparison in AdjustUp to a[child] > a[parent]), the largest element is at the top. We swap the first element with the last element, which puts the largest element in its final position. We will work out the time complexity of upward and downward adjustment in detail later.

After the swap there is still a problem: the remaining elements must still form a large heap. Only the top element is out of place, and both of its subtrees are still large heaps, so we adjust the top element downward with AdjustDown, treating the heap as one element shorter.

Now the last element is the largest and is in its final position, and the rest is still a large heap, so the top of the heap is the second largest number. Next we swap the first element with the second-to-last element and adjust downward again, so the last two elements are in order. What remains is still a large heap whose top is the third largest number. Repeating this until one element is left sorts the whole array.

The code is as follows. It is short, but it may take a moment to understand.

AdjustDown
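// Move a[parent] down while it is smaller than its larger child (large heap)
void AdjustDown(int* a, int size, int parent)
{
	int child = 2 * parent + 1;
	while (child < size)
	{
		// Check the bound first so a[child + 1] is never read out of range
		if (child + 1 < size && a[child + 1] > a[child])
		{
			child++;
		}
		if (a[child] > a[parent])
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = 2 * parent + 1;
		}
		else
		{
			break;
		}
	}
}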

This is the code for AdjustDown. It was covered in the previous article, so I won't go over it again. Let's look at the sorting code.

int end = n - 1;
while (end > 0)
{
	Swap(&arr[0], &arr[end]);
	AdjustDown(arr, end, 0);
	// end is passed as the element count here; as an index it is one past the last element still in the heap
	end--;
}

The loop stops when end reaches 0, because a single remaining element is already in place. At the start end is the index of the last element. We first swap the top of the heap with the element at end and then adjust downward right away. We do not execute end-- before calling AdjustDown, because the size parameter of AdjustDown is a number of elements: passing end (the index of the element just placed) as the size makes the heap cover indices 0 to end - 1, so the element that is already in order is left out of the adjustment.

Let's take a look at the result: the array comes out in ascending order.

There is one more topic here, which is building the heap. Earlier we built it by adjusting upward, starting from the second level. We can also build it by adjusting downward. Adjusting a node downward requires both of its subtrees to already be heaps; for a large heap, both subtrees must be large heaps. So we start from the last parent node (the parent of the last element) and work backward to the root. The last element has index size - 1, so its parent is at (size - 1 - 1) / 2. This works because the heap looks like a binary tree but is really stored in an array. Let's see how the code implements it.
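The index arithmetic behind (size - 1 - 1) / 2 can be spelled out with a few small helpers. The function names below are illustrative only and are not part of the original code.

```c
/* Index relationships for a heap stored in an array (hypothetical helpers). */
int Parent(int child)
{
	return (child - 1) / 2;
}

int LeftChild(int parent)
{
	return 2 * parent + 1;
}

int RightChild(int parent)
{
	return 2 * parent + 2;
}

/* The last parent is the parent of the last element, index size - 1. */
int LastParent(int size)
{
	return Parent(size - 1);
}
```

For the nine-element example array, `LastParent(9)` is 3, and the children of index 3 are indices 7 and 8, the last two elements. Every index after 3 is a leaf, so there is nothing to adjust below it.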

#include <stdio.h>

// Swap and AdjustDown are the functions defined above
int main()
{
	int arr[] = { 9,4,3,19,12,13,5,8,9 };
	int n = sizeof(arr) / sizeof(arr[0]);
	// Build a large heap downward, starting from the last parent node
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(arr, n, i);
	}
	// Print the heap
	for (int i = 0; i < n; i++)
	{
		printf("%d ", arr[i]);
	}
	printf("\n");
	int end = n - 1;
	while (end > 0)
	{
		Swap(&arr[0], &arr[end]);
		AdjustDown(arr, end, 0);
		// end is passed as the element count here; as an index it is one past the last element still in the heap
		end--;
	}
	// Print the sorted array
	for (int i = 0; i < n; i++)
	{
		printf("%d ", arr[i]);
	}

	return 0;
}

So the sort uses only downward adjustment: the heap is built with AdjustDown, and the sorting loop uses AdjustDown as well. Next we work out the time complexity of building the heap upward and downward. The conclusion first: building the heap downward is the more efficient of the two. The calculation below derives each cost in turn.
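The two loops in main can also be wrapped into a single reusable function. This is a minimal sketch: the name `HeapSort` is chosen here for illustration, and `Swap` and `AdjustDown` are repeated so that the block stands on its own.

```c
void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* Downward adjustment for a large heap. */
void AdjustDown(int* a, int size, int parent)
{
	int child = 2 * parent + 1;
	while (child < size)
	{
		if (child + 1 < size && a[child + 1] > a[child])
		{
			child++;
		}
		if (a[child] > a[parent])
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = 2 * parent + 1;
		}
		else
		{
			break;
		}
	}
}

/* Sort a[0..n-1] in ascending order (hypothetical wrapper). */
void HeapSort(int* a, int n)
{
	/* Step 1: build a large heap downward from the last parent node. */
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(a, n, i);
	}
	/* Step 2: move the current maximum to the back, then shrink the heap. */
	for (int end = n - 1; end > 0; end--)
	{
		Swap(&a[0], &a[end]);
		AdjustDown(a, end, 0);
	}
}
```

Calling `HeapSort(arr, n)` on the example array leaves it as 3 4 5 8 9 9 12 13 19.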
Assume the heap is a full binary tree of height h and write down the number of nodes in each layer: layer k (counting the root as layer 1) has 2^(k-1) nodes, and the last layer has 2^(h-1).

If we build the heap upward, a node in layer k moves up at most k - 1 levels.

Multiplying the number of nodes in each layer by its number of moves and adding the layers together gives 2^1 * 1 + 2^2 * 2 + ... + 2^(h-1) * (h - 1). A sum of this shape is evaluated by shifted subtraction: multiply the sum by 2 and subtract the original. The result is (h - 2) * 2^h + 2.

The height and the number of nodes are related by N = 2^h - 1, so h = log2(N + 1) and we can replace h with N. The total becomes (N + 1) * log2(N + 1) - 2N, which is O(N log N).
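For readers who want the shifted subtraction written out, here is one way to do it. It assumes a full binary tree of height $h$ with $N = 2^h - 1$ nodes, and the symbol $T_{\text{up}}$ for the worst-case number of moves is introduced here for convenience.

```latex
\begin{aligned}
T_{\text{up}}(h) &= \sum_{k=2}^{h} 2^{k-1}(k-1) = \sum_{j=1}^{h-1} j\,2^{j} \\
2\,T_{\text{up}}(h) &= \sum_{j=1}^{h-1} j\,2^{j+1} = \sum_{j=2}^{h} (j-1)\,2^{j} \\
T_{\text{up}}(h) &= 2\,T_{\text{up}}(h) - T_{\text{up}}(h)
  = (h-1)\,2^{h} - \sum_{j=1}^{h-1} 2^{j} \\
 &= (h-1)\,2^{h} - \left(2^{h} - 2\right) = (h-2)\,2^{h} + 2 \\
 &= (N+1)\log_2(N+1) - 2N \qquad \text{using } 2^{h} = N + 1
\end{aligned}
```

The leading term is $N \log_2 N$, so building the heap upward is $O(N \log N)$ in the worst case.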
That is the cost of building upward. The same reasoning applies to building downward, except that we start from the second-to-last layer, because the last layer has no children to adjust. You can try the calculation yourself; the method is the same, and the time complexity comes out as O(N).

We can also reach the conclusion by analysis, without the full calculation. Building upward costs more than building downward, and the reason is where the nodes are. When adjusting upward, the layer with the most work is the last layer. It has the most nodes and is also the farthest from the root, so it is many nodes times many moves. When adjusting downward, we start from the last parent node, and a node in the second-to-last layer needs at most one move. The layers that hold the most nodes move the least, and only the few nodes near the top move far, so it is many nodes times few moves. Since the last layer alone holds about half of all the nodes, skipping it saves a great deal.

So the conclusion is that building the heap downward is the fastest, and from now on we can simply build heaps downward. The sort we have here is, in essence, a selection sort: each round selects the largest remaining element and puts it in place.
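The gap between the two approaches can also be seen numerically. The sketch below adds up the worst-case moves layer by layer for a full binary tree of height h. The function names `WorstCaseUp` and `WorstCaseDown` are made up for this illustration.

```c
/* Worst-case moves when building upward: layer k has 2^(k-1) nodes,
   and each node moves up at most k - 1 levels. */
long long WorstCaseUp(int h)
{
	long long total = 0;
	long long nodes = 1; /* number of nodes in layer k */
	for (int k = 1; k <= h; k++)
	{
		total += nodes * (k - 1);
		nodes *= 2;
	}
	return total;
}

/* Worst-case moves when building downward: each node in layer k
   moves down at most h - k levels. */
long long WorstCaseDown(int h)
{
	long long total = 0;
	long long nodes = 1;
	for (int k = 1; k <= h; k++)
	{
		total += nodes * (h - k);
		nodes *= 2;
	}
	return total;
}
```

For h = 20 (N = 1,048,575 nodes), `WorstCaseUp(20)` is 18,874,370 while `WorstCaseDown(20)` is 1,048,555. The downward count stays just under N, and the upward count grows like N log N.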
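For building downward, a node in layer k moves down at most h - k levels, so the total is 2^0 * (h - 1) + 2^1 * (h - 2) + ... + 2^(h-2) * 1. Shifted subtraction gives 2^h - 1 - h, and with N = 2^h - 1 this is N - log2(N + 1), which is O(N).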
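The downward case can be written out with the same shifted subtraction. As before, this assumes a full binary tree of height $h$ with $N = 2^h - 1$ nodes, and $T_{\text{down}}$ is a symbol introduced here.

```latex
\begin{aligned}
T_{\text{down}}(h) &= \sum_{k=1}^{h-1} 2^{k-1}(h-k) \\
2\,T_{\text{down}}(h) &= \sum_{k=1}^{h-1} 2^{k}(h-k) = \sum_{k=2}^{h} 2^{k-1}(h-k+1) \\
T_{\text{down}}(h) &= 2\,T_{\text{down}}(h) - T_{\text{down}}(h)
  = \sum_{k=2}^{h} 2^{k-1} - (h-1) \\
 &= \left(2^{h} - 2\right) - (h-1) = 2^{h} - 1 - h \\
 &= N - \log_2(N+1) \qquad \text{using } 2^{h} = N + 1
\end{aligned}
```

The total is less than $N$, so building the heap downward is $O(N)$.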
That is the calculation for building a heap downward. Take a look, and if anything is unclear, just send me a private message. Thank you.

There is also the TopK problem, which will be covered in the next article, because it will generate a lot of traffic. See you in the next article.

Origin blog.csdn.net/2301_76895050/article/details/134642236