BFPRT algorithm - Find the Kth smallest element in the array.

1. Brief description and analysis of experimental problems

  1. Experimental question:

Find the Kth smallest element in the array.

  1. Problem example:

In practice we often need the top K items from a very large collection. For example, a search engine may need the 1,000 most-clicked query terms of the day, and text feature selection may need the k highest-ranked words. This is also known as the Top-K problem.

  1. Problems with conventional algorithms:

The conventional approach is to sort the whole array, for example with quicksort, and then read off the K-th element. Quicksort takes O(n log n) time on average but O(n²) in the worst case. Since only the K smallest (or largest) elements are needed, fully ordering the remaining elements is wasted work.
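As a point of reference, the sort-everything baseline might look like the minimal sketch below. The function name `kth_smallest_by_sorting` is made up for illustration, and Python's built-in `sorted` uses Timsort rather than quicksort, so this only stands in for the "sort first, then index" idea.

```python
def kth_smallest_by_sorting(values, k):
    """Return the k-th smallest value (k is 1-based) by sorting everything."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    # The whole list is ordered even though only one position is needed.
    return sorted(values)[k - 1]


data = [9, 1, 8, 2, 7, 3, 6, 4, 5]
print(kth_smallest_by_sorting(data, 3))  # 3
```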

  • Brief description of the experimental process and time complexity analysis

1. BFPRT algorithm

The BFPRT algorithm is also known as the median-of-medians algorithm. Its worst-case time complexity is O(n). It differs from the conventional quicksort-style method only in how the pivot (the comparison number used for partitioning) is chosen in each round. In the conventional method the pivot is chosen at random. BFPRT instead divides the array into groups of five adjacent elements, with any fewer than five left over at the end forming a group of their own, collects the median of each group into a new array, and takes the median of that new array as the pivot. Applying this recursively guarantees that the pivot always falls near the middle of the array, which brings the worst-case time complexity down to O(n).
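The grouping part of the pivot choice can be sketched as follows. This is only an illustration: `group_medians` is a hypothetical helper name, and taking the lower-middle element of a short final group is an assumption, since the text does not say how a group of even size is handled.

```python
def group_medians(values, group_size=5):
    """Split values into consecutive groups and return each group's median."""
    medians = []
    for start in range(0, len(values), group_size):
        group = sorted(values[start:start + group_size])
        # For a short final group this picks the lower-middle element.
        medians.append(group[(len(group) - 1) // 2])
    return medians


data = [12, 3, 5, 7, 4, 19, 26, 1, 8, 15, 2, 9]
print(group_medians(data))  # [5, 15, 2]
```

The median of the returned list is then used as the pivot.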

  1. Brief description of the experimental steps

① Divide the original array into groups of five elements. If fewer than five elements remain at the end, they also count as a group.

② Sort each group internally, take its median, and store it in a new array.

③ Find the median of the new array and use it as the pivot m* for the partition step: put the elements smaller than m* in S1 and the elements larger than m* in S2.

④ Let |m*| denote the rank of m* after partitioning, that is, its position in sorted order. Case 1: if k = |m*|, output m*.

Case 2: if k < |m*|, the answer lies in S1, so recurse on the shorter range BFPRT(a, low, m.position-1, key)

Case 3: if k > |m*|, the answer lies in S2, so recurse on the shorter range BFPRT(a, m.position+1, high, key)

 Figure 1 Schematic diagram of BFPRT algorithm
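Steps ① to ④ can be put together roughly as in the sketch below. It is not the author's implementation: instead of partitioning in place with `low`/`high` indices, it builds the lists S1 and S2 explicitly, and the name `bfprt_select` and the 1-based `k` are assumptions made for the example.

```python
def bfprt_select(values, k):
    """Return the k-th smallest element of values (k is 1-based)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    items = list(values)
    while True:
        if len(items) <= 5:
            return sorted(items)[k - 1]
        # Steps 1 and 2: medians of consecutive groups of five.
        medians = []
        for start in range(0, len(items), 5):
            group = sorted(items[start:start + 5])
            medians.append(group[(len(group) - 1) // 2])
        # Step 3: the median of the medians becomes the pivot m*.
        pivot = bfprt_select(medians, (len(medians) + 1) // 2)
        smaller = [x for x in items if x < pivot]  # S1
        larger = [x for x in items if x > pivot]   # S2
        equal_count = len(items) - len(smaller) - len(larger)
        # Step 4: compare k with the ranks occupied by the pivot.
        if k <= len(smaller):
            items = smaller
        elif k <= len(smaller) + equal_count:
            return pivot
        else:
            k -= len(smaller) + equal_count
            items = larger


data = [12, 3, 5, 7, 4, 19, 26, 1, 8, 15, 2, 9]
print(bfprt_select(data, 4))  # 4
```

Counting the elements equal to the pivot separately keeps the sketch correct when the array contains duplicates.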

  1. Time Complexity Analysis of Algorithms

The worst-case time complexity of the BFPRT algorithm is O(n). Let T(n) be the worst-case running time on n elements. Then the following recurrence holds:

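$$T(n) \le \underbrace{T(n/5)}_{\text{①}} + \underbrace{T(7n/10)}_{\text{②}} + \underbrace{c \cdot n}_{\text{③}} \qquad (1)$$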

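Here ①, the T(n/5) term, comes from recursively finding the median of the n/5 group medians, and ②, the T(7n/10) term, comes from the recursive BFPRT() call on the part that remains after partitioning. The chosen m* is greater than or equal to half of the medians, that is, (1/2)·(n/5) = n/10 of them, and each of those n/10 medians is in turn greater than or equal to 3 elements of its own group of 5. So m* is greater than or equal to at least 3n/10 elements (and, symmetrically, less than or equal to at least 3n/10), which means that in the worst case at most 7n/10 elements are passed to the recursive call. ③, the c·n term, covers the remaining linear work, such as sorting each group of five and partitioning.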
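Taking the recurrence as T(n) ≤ T(n/5) + T(7n/10) + c·n, the O(n) bound can be checked by substitution. The guess T(n) ≤ 10·c·n is an assumption made here for illustration; the constant 10 comes from 1/(1 − 1/5 − 7/10), and floors and ceilings are ignored. If the guess holds for all smaller inputs, then

$$T(n) \le 10c \cdot \frac{n}{5} + 10c \cdot \frac{7n}{10} + c \cdot n = 2cn + 7cn + cn = 10cn$$

so the guess also holds for n. The argument works because the two recursive fractions add up to 1/5 + 7/10 = 9/10, which is less than 1.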

  1. Thoughts on experimental problems and new attempts:

The explanation above always uses groups of 5. To see why other group sizes are not used, I compared several group sizes experimentally.

Even group sizes are not considered because they make taking the median awkward. In the experiment I also grouped the original array by 7 and by 9, and found that for arrays of the same length, grouping by 5 generally ran faster than grouping by 7 or 9.

Analysis of the reasons:

  1. First, the more elements in each group, the less balanced the worst-case split. With groups of 7, for example, the worst-case split is 4:10, that is 2:5, which is more lopsided than the 3:7 split for groups of 5.
  2. This increases the number of recursive calls.
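The split ratios quoted above can be reproduced with a short calculation. The sketch below assumes n is a multiple of the group size and ignores rounding; `worst_case_split` is a hypothetical helper name.

```python
from fractions import Fraction


def worst_case_split(group_size):
    """Return (guaranteed_side, worst_case_remaining) as fractions of n.

    Half of the n / group_size medians lie on one side of the pivot, and
    each of them brings (group_size + 1) / 2 elements of its group with it.
    """
    per_group = Fraction(group_size + 1, 2)
    guaranteed = per_group * Fraction(1, 2 * group_size)
    return guaranteed, 1 - guaranteed


for g in (5, 7, 9):
    guaranteed, remaining = worst_case_split(g)
    print(g, guaranteed, remaining)
# 5 3/10 7/10
# 7 2/7 5/7
# 9 5/18 13/18
```

The second fraction is the share of the array that may still have to be searched after one partition step.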

The figures below show the results of the BFPRT algorithm with groups of 5, 7, and 9 on an array of 200 data points.

[Figures: results for groups of 5, groups of 7, and groups of 9]

The comparison shows that grouping by 5 generally gives the shortest running time.

The figures below show the results of the BFPRT algorithm with groups of 5, 7, and 9 on an array of 500 data points.

[Figures: results for groups of 5, groups of 7, and groups of 9]

Likewise, the comparison shows that grouping by 5 generally gives the shortest running time.
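A comparison of this kind could be set up along the lines of the sketch below. It is not the program used for the figures: the list-based `select`, the random test data, the choice of k = n/2, and the repeat count are all assumptions, and the timings it prints will vary from machine to machine.

```python
import random
import timeit


def select(values, k, group_size=5):
    """k-th smallest (1-based) via median of medians with a given group size."""
    items = list(values)
    while len(items) > group_size:
        medians = []
        for start in range(0, len(items), group_size):
            group = sorted(items[start:start + group_size])
            medians.append(group[(len(group) - 1) // 2])
        pivot = select(medians, (len(medians) + 1) // 2, group_size)
        smaller = [x for x in items if x < pivot]
        larger = [x for x in items if x > pivot]
        equal_count = len(items) - len(smaller) - len(larger)
        if k <= len(smaller):
            items = smaller
        elif k <= len(smaller) + equal_count:
            return pivot
        else:
            k -= len(smaller) + equal_count
            items = larger
    return sorted(items)[k - 1]


def time_group_sizes(n, group_sizes=(5, 7, 9), repeats=50, seed=0):
    """Time select() on one random array of length n for each group size."""
    rng = random.Random(seed)
    data = [rng.randrange(10 * n) for _ in range(n)]
    k = n // 2
    return {g: timeit.timeit(lambda: select(data, k, g), number=repeats)
            for g in group_sizes}


for n in (200, 500):
    print(n, time_group_sizes(n))
```

Each call returns a dictionary mapping group size to total seconds for the repeated runs, so the three group sizes can be compared on the same input.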


Origin blog.csdn.net/qq_52913088/article/details/127010629