Big Tech Interview: How to find the K-th largest element in O(n) with the idea of quick sort?

In the previous article, we learned about three sorting algorithms: bubble sort, insertion sort, and selection sort (click the link at the end of the article to view). Their time complexity is O(n²), which is relatively high, so they are suited to sorting small-scale data. Today we study two sorting algorithms with O(nlogn) time complexity: merge sort and quick sort. They are suitable for large-scale data and are used far more often in practice than the three algorithms from the previous article.

Merge sort

The core idea of merge sort is quite simple. To sort an array, we first divide it into two halves from the middle, sort each half separately, and then merge the two sorted halves together, so that the entire array becomes ordered.

Merge sort uses the divide-and-conquer idea. Divide and conquer, as the name suggests, means breaking a big problem into smaller sub-problems; once the small sub-problems are solved, the big problem is solved.

Divide-and-conquer algorithms are generally implemented with recursion. Divide and conquer is a processing idea for solving problems, while recursion is a programming technique; the two do not conflict.

A key point worth repeating: the technique for writing recursive code is to work out the recurrence formula, find the termination condition, and finally translate the recurrence formula into code. So, to write the code for merge sort, we first write its recurrence formula.

Recurrence formula:
merge_sort(p…r) = merge(merge_sort(p…q), merge_sort(q+1…r))
Termination condition:
p >= r (no further decomposition needed)

If you are new to this, the formula may be hard to read, so here is an explanation:

merge_sort(p…r) means sorting the subarray with subscripts from p to r. We transform this sorting problem into two sub-problems, merge_sort(p…q) and merge_sort(q+1…r), where the subscript q is the middle position between p and r, that is, (p+r)/2. When the two subarrays from p to q and from q+1 to r are each sorted, we merge the two ordered subarrays together, so that the data from subscript p to r is sorted as well.

Attentive readers may notice that in the recurrence formula above, the outermost merge function merges the decomposed, already-sorted small arrays into one ordered array. How does it do that?

We allocate a temporary array tmp of the same size as A[p…r], and use two cursors i and j, pointing to the first element of A[p…q] and of A[q+1…r] respectively. We compare the two elements A[i] and A[j]: if A[i] <= A[j], we put A[i] into the temporary array tmp and move i one position forward; otherwise we put A[j] into tmp and move j one position forward.

We continue this comparison until all the data in one of the subarrays has been put into the temporary array, then append the remaining data of the other subarray to the end of tmp. At this point tmp holds the merged result of the two subarrays. Finally, we copy the data in tmp back into the original array A[p…r]. You can refer to the figure below to deepen your understanding.

Now it's time to show the code; you can refer to my implementation here:
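A minimal Python sketch of the recurrence above. For clarity it returns new lists rather than sorting A[p…r] in place via indices (an assumption of this sketch, not a requirement of the algorithm):

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list, as described above."""
    tmp = []
    i = j = 0  # cursors into left and right
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in order (stable)
            tmp.append(left[i])
            i += 1
        else:
            tmp.append(right[j])
            j += 1
    tmp.extend(left[i:])   # append whatever remains of either side
    tmp.extend(right[j:])
    return tmp

def merge_sort(a):
    if len(a) <= 1:        # termination condition: nothing left to decompose
        return a
    q = len(a) // 2        # split at the middle, q = (p + r) / 2
    return merge(merge_sort(a[:q]), merge_sort(a[q:]))
```

For example, `merge_sort([11, 8, 3, 9, 7, 1, 2, 5])` returns `[1, 2, 3, 5, 7, 8, 9, 11]`.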

Quick sort

Now let's look at the quicksort algorithm (Quicksort), which we habitually call "quick sort". Quick sort also uses the divide-and-conquer idea. At first glance it looks a bit like merge sort, but the idea is actually quite different.

The idea of quick sort is this: to sort the data between subscripts p and r in an array, we choose any element in that range as the pivot (partition point).

We traverse the data between p and r, putting elements smaller than the pivot on its left and elements larger than the pivot on its right, with the pivot in the middle. After this step, the data between p and r is divided into three parts: the front part from p to q-1 is less than the pivot, the middle is the pivot itself, and the back part from q+1 to r is greater than the pivot.

Following the divide-and-conquer, recursive approach, we can recursively sort the data between subscripts p and q-1 and the data between subscripts q+1 and r. Once the intervals shrink to size 1, all the data is in order.

Recurrence formula:
quick_sort(p…r) = quick_sort(p…q-1) + quick_sort(q+1…r)
Termination condition:
p >= r

Converting the recurrence formula above into recursive code gives:

// Quick sort; A is the array, n is its size
quick_sort(A, n) {
  quick_sort_c(A, 0, n-1)
}
// Recursive quick sort function; p and r are subscripts
quick_sort_c(A, p, r) {
  if p >= r then return
  
  q = partition(A, p, r) // get the partition point
  quick_sort_c(A, p, q-1)
  quick_sort_c(A, q+1, r)
}

Where merge sort has a merge() function, here we need a partition() function. It picks an element as the pivot (in general, the last element of the interval from p to r works fine), partitions A[p…r] around it, and returns the pivot's index.

For quick sort to sort in place, partitioning must use only O(1) extra space, so we need an in-place partition. The pseudocode is as follows:

partition(A, p, r) {
  pivot := A[r]
  i := p
  for j := p to r-1 do {
    if A[j] < pivot {
      swap A[i] with A[j]
      i := i+1
    }
  }
  swap A[i] with A[r]
  return i
}
The processing here is somewhat similar to selection sort. The cursor i divides A[p…r-1] into two parts: the elements of A[p…i-1] are all smaller than the pivot, so let's call that the "processed interval" for now, and A[i…r-1] is the "unprocessed interval". Each time, we take an element A[j] from the unprocessed interval A[i…r-1] and compare it with the pivot; if it is smaller than the pivot, we add it to the end of the processed interval, which is the position of A[i].

Inserting an element at some position in an array requires moving data, which is time-consuming. There is a processing technique worth learning here: swapping, which completes the insertion in O(1) time. We simply exchange A[i] with A[j], and A[j] lands at subscript i in O(1) time.

The text is not as intuitive as the picture, so I drew a picture to show the whole process of partitioning.

Because partitioning involves swap operations, if the array contains two identical elements, say the two 6s in the sequence 6, 3, 5, 9, 6, 4, their relative order can change after the first partition (with pivot 4, the leading 6 gets swapped toward the end, past the other 6). Therefore, quick sort is not a stable sorting algorithm (a sort is unstable if equal values can end up in a different relative order than in the original array).
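You can watch this happen by transcribing the partition pseudocode into Python and tagging the two equal 6s with labels (the labels 'first' and 'second' are only for illustration, not part of the algorithm):

```python
def partition(a, p, r):
    """Direct transcription of the in-place partition pseudocode above."""
    pivot = a[r]
    i = p
    for j in range(p, r):
        # each element is a (value, label) pair; the 6s are only ever
        # compared with the pivot 4, so comparison stops at the number
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[r] = a[r], a[i]
    return i

a = [(6, 'first'), (3, ''), (5, ''), (9, ''), (6, 'second'), (4, '')]
partition(a, 0, len(a) - 1)
print(a)  # the 6 labeled 'first' now comes AFTER the one labeled 'second'
```

After one partition, the originally-first 6 is swapped past the second one: the relative order of equal elements has changed, demonstrating the instability.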

Here is the code, for your reference:
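A Python sketch that follows the pseudocode above line by line, keeping the names quick_sort_c and partition from the article:

```python
def quick_sort(a):
    """Sort list a in place using quick sort."""
    quick_sort_c(a, 0, len(a) - 1)

def quick_sort_c(a, p, r):
    if p >= r:                     # termination condition
        return
    q = partition(a, p, r)         # get the partition point
    quick_sort_c(a, p, q - 1)
    quick_sort_c(a, q + 1, r)

def partition(a, p, r):
    pivot = a[r]                   # take the last element as the pivot
    i = p                          # a[p..i-1] is the processed (< pivot) interval
    for j in range(p, r):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]  # O(1) swap instead of a costly insertion
            i += 1
    a[i], a[r] = a[r], a[i]        # put the pivot into its final position
    return i
```

For example, after `a = [6, 11, 3, 9, 8]; quick_sort(a)`, the list a becomes `[3, 6, 8, 9, 11]`.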

Contrast thinking

Both quick sort and merge sort use the divide-and-conquer idea, and their recurrence formulas and recursive code are very similar. So what is the difference between them?

Notice that merge sort works from the bottom up: it handles the sub-problems first and then merges. Quick sort is the opposite: it works from the top down, partitioning first and then handling the sub-problems. Although merge sort is stable and its time complexity is O(nlogn), it is not an in-place sorting algorithm, mainly because the merge function cannot be executed in place. Quick sort achieves in-place sorting through a cleverly designed in-place partition function, which solves merge sort's problem of using too much extra memory.

Extension: common interview questions

Find the K-th largest element of an unordered array in O(n) time. For example, for the data 4, 2, 5, 12, 3, the third largest element is 4.

To solve this problem, we naturally turn again to divide and conquer and partitioning.

We choose the last element A[n-1] of the interval A[0…n-1] as the pivot and partition A[0…n-1] in place, dividing the array into three parts: A[0…p-1], A[p], and A[p+1…n-1].

If p+1 = K, then A[p] is the element we are looking for; if K > p+1, the K-th largest element lies in the interval A[p+1…n-1], so we recursively search in A[p+1…n-1] following the same idea. Similarly, if K < p+1, we search in the interval A[0…p-1].

You may ask: Why is the time complexity of the above-mentioned solution idea O(n)?

For the first partition, we partition an array of size n and traverse n elements. For the second, we only need to partition an interval of size n/2 and traverse n/2 elements. And so on: the numbers of elements traversed are n, n/2, n/4, n/8, n/16, … until the interval shrinks to 1.

If we add up the number of elements traversed by each partition, it is: n+n/2+n/4+n/8+...+1. This is the sum of a geometric sequence, and the final sum is equal to 2n-1. Therefore, the time complexity of the above solution is O(n).
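The idea above can be sketched in Python. One detail to pin down: since we want the K-th *largest*, this sketch partitions in descending order (elements greater than the pivot go left), so that index q+1 = K directly identifies the answer; the helper names are mine, not from the article:

```python
def kth_largest(a, k):
    """Return the k-th largest element of list a (1-based k), average O(n)."""
    return _select(a, 0, len(a) - 1, k)

def _select(a, p, r, k):
    q = _partition_desc(a, p, r)
    if q + 1 == k:
        return a[q]            # found: q elements are larger than a[q]
    elif k > q + 1:
        return _select(a, q + 1, r, k)   # answer lies in the right part
    else:
        return _select(a, p, q - 1, k)   # answer lies in the left part

def _partition_desc(a, p, r):
    pivot = a[r]
    i = p
    for j in range(p, r):
        if a[j] > pivot:       # note: > instead of <, for descending order
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[r] = a[r], a[i]
    return i

print(kth_largest([4, 2, 5, 12, 3], 3))  # the article's example → 4
```

Note that with the last element always chosen as the pivot, an already-sorted input degrades this to O(n²); choosing a random pivot keeps the expected cost at O(n).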

summary

Merge sort and quick sort are two slightly more complicated sorting algorithms. Both use the divide-and-conquer idea, are implemented with recursion, and follow a very similar process. The key to understanding merge sort is to understand its recurrence formula and the merge() function; likewise, the key to understanding quick sort is its recurrence formula and the partition() function.

Merge sort's time complexity is a stable O(nlogn) in all cases, but this comes with a fatal shortcoming: merge sort is not an in-place algorithm, and its space complexity is relatively high at O(n). Because of this, it is not as widely used as quick sort.


Origin blog.csdn.net/taurus_7c/article/details/105179184