Sorting algorithms (c): heap sort and bounded heap sort, Java implementation and analysis

1. Heap sort
Radix sort is suited to data of bounded size. Besides radix sort, there is another special-purpose sorting algorithm you may encounter: bounded heap sort. It is useful when you are working with a very large data set and only want the first 10, or the first k, elements, where k is much smaller than n.

For example, suppose you are monitoring a Web service that handles billions of transactions every day. At the end of each day, you have to report the k largest transactions (or the slowest, or the most of some other measure). One option is to store all transactions, sort them at the end of the day, and select the top k. That takes time proportional to n log n, and it would be very slow in practice because billions of transactions probably do not fit in the memory of a single program, so we would have to use an "external" sorting algorithm. With a bounded heap we can do much better.

First, let's look at the heap, a data structure similar to a binary search tree (BST). There are some differences:

In a BST, every node x has the "BST property": all nodes in the left subtree of x are less than x, and all nodes in the right subtree are greater than x.
In a heap, every node x has the "heap property": all nodes in both subtrees of x are greater than x.
A heap is like a balanced BST: when you add or remove elements, it does some extra work to rebalance the tree. As a result, a heap can be implemented efficiently using an array of elements.
The heap discussed here is a min-heap. If instead all nodes in the subtrees are less than x, it is a max-heap.

The smallest element in a heap is always at the root, so we can find it in constant time. Adding and removing elements takes time proportional to the height of the tree, h. And because the heap is always balanced, h is proportional to log n.
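As a rough illustration of how a heap can live in an array, here is a minimal sketch of a min-heap of integers. The class name ArrayMinHeap and the index layout (children of index i at 2i+1 and 2i+2) are assumptions made for this example; it is not the source of java.util.PriorityQueue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration only.
class ArrayMinHeap {
    private final List<Integer> items = new ArrayList<>();

    int size() {
        return items.size();
    }

    // The root is stored at index 0, so reading the minimum is O(1).
    Integer peek() {
        return items.isEmpty() ? null : items.get(0);
    }

    // Append at the end, then swap upward while the parent is larger: O(log n).
    void offer(int x) {
        items.add(x);
        int i = items.size() - 1;
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (items.get(parent) <= items.get(i)) {
                break;
            }
            swap(i, parent);
            i = parent;
        }
    }

    // Move the last element to the root, then swap downward: O(log n).
    Integer poll() {
        if (items.isEmpty()) {
            return null;
        }
        int min = items.get(0);
        int last = items.remove(items.size() - 1);
        if (!items.isEmpty()) {
            items.set(0, last);
            int i = 0;
            while (true) {
                int left = 2 * i + 1;
                int right = left + 1;
                int smallest = i;
                if (left < items.size() && items.get(left) < items.get(smallest)) {
                    smallest = left;
                }
                if (right < items.size() && items.get(right) < items.get(smallest)) {
                    smallest = right;
                }
                if (smallest == i) {
                    break;
                }
                swap(i, smallest);
                i = smallest;
            }
        }
        return min;
    }

    private void swap(int i, int j) {
        Integer tmp = items.get(i);
        items.set(i, items.get(j));
        items.set(j, tmp);
    }

    public static void main(String[] args) {
        ArrayMinHeap heap = new ArrayMinHeap();
        for (int x : new int[] {3, 5, 1, 4, 2}) {
            heap.offer(x);
        }
        System.out.println(heap.peek()); // 1
    }
}
```

Each loop moves one level up or down the tree per iteration, which is where the time proportional to the height comes from.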

Java's PriorityQueue is implemented with a heap. PriorityQueue provides the methods specified in the Queue interface, including offer and poll:

offer: adds an element to the queue and updates the heap so that every node has the "heap property". Takes O(log n) time.
poll: removes the smallest element, which is at the root, from the queue and updates the heap. Takes O(log n) time.
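A small sketch of these two calls, using only the standard java.util.PriorityQueue API. The class and helper names (PriorityQueueDemo, minHeapOf, maxHeapOf) are made up for this example. Passing Comparator.reverseOrder() is one way to get max-heap behavior from the same class.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical demo class for illustration only.
class PriorityQueueDemo {
    // Natural ordering: the smallest element sits at the root (min-heap).
    static PriorityQueue<Integer> minHeapOf(int... values) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : values) {
            heap.offer(v);
        }
        return heap;
    }

    // Reversed ordering: the largest element sits at the root (max-heap).
    static PriorityQueue<Integer> maxHeapOf(int... values) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (int v : values) {
            heap.offer(v);
        }
        return heap;
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> minHeap = minHeapOf(3, 5, 1, 4, 2);
        PriorityQueue<Integer> maxHeap = maxHeapOf(3, 5, 1, 4, 2);
        System.out.println(minHeap.poll()); // 1
        System.out.println(minHeap.peek()); // 2
        System.out.println(maxHeap.poll()); // 5
    }
}
```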
Given a PriorityQueue, you can easily sort a collection of n elements like this:

Use offer to add all elements of the collection to the PriorityQueue.
Use poll to remove elements from the queue and add them to a List.
Because poll returns the smallest element remaining in the queue, the elements are added to the List in ascending order. This way of sorting is called heap sort.

Adding n elements to the queue takes O(n log n) time. So does removing n elements. So the running time of heap sort is O(n log n).

 


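2. Code implementation:
/**
 * @Author Ragty
 * @Description Heap sort
 * @Date 19:15 2019/6/12
 **/
// Method of a generic class with type parameter T; needs java.util.Comparator,
// java.util.List and java.util.PriorityQueue.
public void heapSort(List<T> list, Comparator<T> comparator) {
    // PriorityQueue rejects an initial capacity below 1, so guard the empty list.
    PriorityQueue<T> heap = new PriorityQueue<T>(Math.max(1, list.size()), comparator);
    // Move every element into the heap, then empty the list.
    heap.addAll(list);
    list.clear();
    // poll() returns the smallest remaining element, so the list refills in ascending order.
    while (!heap.isEmpty()) {
        list.add(heap.poll());
    }
}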
Test code:

list = new ArrayList<Integer>(Arrays.asList(3, 5, 1, 4, 2));
sorter.heapSort(list, comparator);
System.out.println(list);

 

3. Bounded heap sort
A bounded heap is a heap that is limited to at most k elements. If you have n elements, you can keep track of the k largest like this:

Initially, the heap is empty. For each element x:

Branch 1: If the heap is not full, add x to the heap.
Branch 2: If the heap is full, compare x to the smallest element in the heap. If x is smaller, it cannot be one of the k largest elements, so you can discard it.
Branch 3: If the heap is full and x is greater than the smallest element in the heap, remove the smallest element from the heap and add x.
With the smallest element at the top of the heap, we can keep track of the k largest elements. Let's analyze the performance of this algorithm. For each element, we perform one of the following:

Branch 1: Adding an element to the heap is O(log k).
Branch 2: Finding the smallest element in the heap is O(1).
Branch 3: Removing the smallest element is O(log k). Adding x is also O(log k).
In the worst case, if the elements appear in ascending order, we always run branch 3. In that case, the total time to process n elements is O(n log k), which is linear in n.
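To make the worst case concrete, here is a small instrumented sketch that counts how often branch 3 runs. The class and method names (BranchCounter, countReplacements) are assumptions for this example, and it uses plain integers with natural ordering.

```java
import java.util.PriorityQueue;

// Hypothetical helper for illustration only.
class BranchCounter {
    // Returns how many elements took branch 3 (one poll plus one offer).
    static int countReplacements(int[] data, int k) {
        if (k <= 0) {
            return 0; // nothing is ever kept
        }
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        int replacements = 0;
        for (int x : data) {
            if (heap.size() < k) {
                heap.offer(x);              // branch 1
            } else if (x > heap.peek()) {
                heap.poll();                // branch 3
                heap.offer(x);
                replacements++;
            }                               // otherwise branch 2: discard x
        }
        return replacements;
    }

    public static void main(String[] args) {
        int n = 1000;
        int k = 10;
        int[] ascending = new int[n];
        int[] descending = new int[n];
        for (int i = 0; i < n; i++) {
            ascending[i] = i + 1;
            descending[i] = n - i;
        }
        System.out.println(countReplacements(ascending, k));  // 990
        System.out.println(countReplacements(descending, k)); // 0
    }
}
```

With ascending input every element after the first k replaces the current minimum, so branch 3 runs n - k times. With descending input it never runs.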

 

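4. Code implementation:
/**
 * @Author Ragty
 * @Description Bounded heap sort
 * @Date 19:49 2019/6/12
 **/
// Method of a generic class with type parameter T; needs java.util.Comparator,
// java.util.LinkedList, java.util.List and java.util.PriorityQueue.
public List<T> topK(int k, List<T> list, Comparator<T> comparator) {
    List<T> res = new LinkedList<T>();
    if (k <= 0) {
        return res; // nothing to keep; also avoids comparing against an empty heap
    }
    // The heap never holds more than k elements, so size it by k rather than n.
    PriorityQueue<T> heap = new PriorityQueue<T>(Math.max(1, Math.min(k, list.size())), comparator);
    for (T element : list) {
        // Branch 1: the heap is not full yet.
        if (heap.size() < k) {
            heap.offer(element);
            continue;
        }
        // Branch 2: element is not larger than the smallest kept element, so discard it.
        // Branch 3: element is larger, so it replaces the smallest kept element.
        int cmp = comparator.compare(element, heap.peek());
        if (cmp > 0) {
            heap.poll();
            heap.offer(element);
        }
    }
    // Drain the heap; the kept elements come out in ascending order.
    while (!heap.isEmpty()) {
        res.add(heap.poll());
    }
    return res;
}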
Test code:

list = new ArrayList<Integer>(Arrays.asList(6, 3, 5, 8, 1, 4, 2, 7));
List<Integer> queue = sorter.topK(4, list, comparator);
System.out.println(queue);

 

5. Space complexity
So far we have talked a lot about run-time analysis, but for many algorithms we also care about space. For example, one drawback of merge sort is that it makes copies of the data. In our implementation, the total amount of space it allocates is O(n log n). With a more careful implementation, the space requirement can be reduced to O(n).

In contrast, insertion sort does not copy the data, because it sorts the elements in place. It uses temporary variables to compare two elements at a time, plus a few other local variables. But its space use does not depend on n.

Our implementation of heap sort creates a new PriorityQueue to store the elements, so the space is O(n). But if you are allowed to sort the list in place, heap sort can be implemented with O(1) space.
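One way an in-place heap sort might look is sketched below on an int array. This is an illustrative assumption, not the implementation used earlier in this article: it builds a max-heap inside the array itself (rather than a min-heap) so that repeatedly moving the root to the end leaves the array in ascending order. The names InPlaceHeapSort and siftDown are made up for the example.

```java
import java.util.Arrays;

// Hypothetical class for illustration only.
class InPlaceHeapSort {
    static void sort(int[] a) {
        int n = a.length;
        // Build a max-heap bottom-up inside the array.
        for (int i = n / 2 - 1; i >= 0; i--) {
            siftDown(a, i, n);
        }
        // Move the current maximum to the end, shrink the heap by one, and repair it.
        for (int end = n - 1; end > 0; end--) {
            int tmp = a[0];
            a[0] = a[end];
            a[end] = tmp;
            siftDown(a, 0, end);
        }
    }

    // Restore the max-heap property for the subtree rooted at i, within a[0..size).
    private static void siftDown(int[] a, int i, int size) {
        while (true) {
            int left = 2 * i + 1;
            int right = left + 1;
            int largest = i;
            if (left < size && a[left] > a[largest]) {
                largest = left;
            }
            if (right < size && a[right] > a[largest]) {
                largest = right;
            }
            if (largest == i) {
                return;
            }
            int tmp = a[i];
            a[i] = a[largest];
            a[largest] = tmp;
            i = largest;
        }
    }

    public static void main(String[] args) {
        int[] data = {3, 5, 1, 4, 2};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 4, 5]
    }
}
```

Apart from a handful of local variables, the sketch allocates nothing, which is what the O(1) space claim refers to.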

One of the benefits of the bounded heap algorithm we just implemented is that it only needs space proportional to k (the number of elements we want to keep), and k is usually much smaller than n.
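The space benefit is easiest to see when the input is consumed as a stream and never stored. The following is a minimal sketch under that assumption; the class name StreamingTopK and the choice of an Iterator as the input type are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical class for illustration only.
class StreamingTopK {
    // Keeps at most k elements in memory, no matter how many the iterator yields.
    static <T> List<T> topK(Iterator<T> stream, int k, Comparator<T> comparator) {
        List<T> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Note: the initial capacity is k, so a very large k allocates a large array up front.
        PriorityQueue<T> heap = new PriorityQueue<>(k, comparator);
        while (stream.hasNext()) {
            T x = stream.next();
            if (heap.size() < k) {
                heap.offer(x);
            } else if (comparator.compare(x, heap.peek()) > 0) {
                heap.poll();
                heap.offer(x);
            }
        }
        // Drain in ascending order.
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(6, 3, 5, 8, 1, 4, 2, 7);
        System.out.println(topK(data.iterator(), 4, Comparator.<Integer>naturalOrder())); // [5, 6, 7, 8]
    }
}
```

In the Web service scenario from the beginning of the article, the iterator would stand in for the day's transactions as they arrive.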

Software developers tend to pay more attention to running time than to space, and for many applications that is appropriate. But for large data sets, space can be just as important or more so. For example:

If a data set does not fit in the memory of one program, the running time usually increases dramatically, or the program may not run at all. If you choose an algorithm that needs less space, and that makes it possible to fit the computation into memory, it may run much faster. Also, programs that use less space may make better use of CPU caches and run faster.
On a server that runs many programs at the same time, if you can reduce the space each program needs, you can run more programs on the same server, which reduces hardware and energy costs.
---------------------


Origin www.cnblogs.com/ly570/p/11106210.html