Binary search in detail

Thinking

I suspect that for many readers, writing binary search code feels like black magic: it looks simple, yet it is easy to get wrong, whether by dropping an equals sign or by being off by one.

Do not be discouraged, because binary search really is not simple. Here is what Knuth (one of the inventors of the KMP algorithm) has to say:

Although the basic idea of binary search is comparatively straightforward,
the details can be surprisingly tricky...

In other words: the idea is simple, but the devil is in the details.

This article uses a question-and-answer format to explore the three most common binary search scenarios: finding a number, finding the left boundary, and finding the right boundary. The first scenario is the simplest form of the algorithm; the latter two are the focus.

We will also dig into the details, such as whether an inequality should include the equals sign and whether mid should be adjusted by one. Once you understand these differences and the reasons behind them, you can write a correct binary search flexibly and reliably.

Zero, the binary search framework

int binarySearch(int[] nums, int target) {
    int left = 0, right = ...;

    while(...) {
        int mid = (right + left) / 2;
        if (nums[mid] == target) {
            ...
        } else if (nums[mid] < target) {
            left = ...
        } else if (nums[mid] > target) {
            right = ...
        }
    }
    return ...;
}

One technique for analyzing binary search: avoid a bare else, and write every case out explicitly with else if, so that all the details are visible. This article uses else if throughout for clarity; once you understand the code, you can simplify it yourself.

The places marked ... are where the details can go wrong. When you read a piece of binary search code, look at these places first. The examples below show how each of them can vary.

One more note: computing mid needs a trick to prevent overflow, namely mid = left + (right - left) / 2. This article ignores that problem for now.
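To see why the overflow-safe form matters, here is a minimal sketch comparing the two ways of computing mid. The class and method names (MidOverflowDemo, naiveMid, safeMid) are made up for illustration, and the index values are chosen only so that their sum exceeds the int range.

```java
class MidOverflowDemo {
    // Naive midpoint: left + right can exceed Integer.MAX_VALUE and wrap around.
    static int naiveMid(int left, int right) {
        return (left + right) / 2;
    }

    // Overflow-safe midpoint, valid whenever 0 <= left <= right.
    static int safeMid(int left, int right) {
        return left + (right - left) / 2;
    }

    public static void main(String[] args) {
        // Hypothetical index values, picked so that left + right overflows int.
        int left = 1_500_000_000;
        int right = 2_000_000_000;
        System.out.println(naiveMid(left, right)); // a negative number
        System.out.println(safeMid(left, right));  // 1750000000
    }
}
```

With left and right near the upper end of the int range, the naive sum wraps around to a negative number, while the subtraction form stays between left and right.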

First, finding a number (basic binary search)

This is the simplest scenario and probably the most familiar one: search for a number and, if it exists, return its index; otherwise return -1.

int binarySearch(int[] nums, int target) {
    int left = 0;
    int right = nums.length - 1; // note

    while (left <= right) {
        int mid = (right + left) / 2;
        if (nums[mid] == target)
            return mid;
        else if (nums[mid] < target)
            left = mid + 1; // note
        else if (nums[mid] > target)
            right = mid - 1; // note
    }
    return -1;
}

1. Why is the while loop condition <= rather than <?

A: Because right is initialized to nums.length - 1, the index of the last element, rather than nums.length.

Both initializations appear in binary searches, in variants that serve different purposes. The difference: the former corresponds to an interval closed at both ends, [left, right]; the latter corresponds to a left-closed, right-open interval, [left, right), because index nums.length is out of bounds.

The algorithm here uses the former, the closed interval [left, right]. This interval is the range being searched on each iteration, so we may as well call it the "search space".

When should the search stop? When the target is found, of course:

    if(nums[mid] == target)
        return mid;

But if the target is not found, the while loop has to terminate and return -1. When should the loop terminate? When the search space is empty, which means there is nothing left to search and the target is not there.

The termination condition of while (left <= right) is left == right + 1. Written as an interval that is [right + 1, right], or with concrete numbers, [3, 2]. This interval is empty, because no number is both greater than or equal to 3 and less than or equal to 2. So terminating the while loop here is correct, and returning -1 directly is fine.

The termination condition of while (left < right) is left == right. Written as an interval that is [left, right], or with concrete numbers, [2, 2]. This interval is not empty: it still contains the number 2, yet the while loop has stopped. In other words, the interval [2, 2] was skipped and index 2 was never searched, so returning -1 directly at this point is wrong.

Of course, if you insist on using while (left < right), that works too. We already know the cause of the error, so a patch fixes it:

//...
while(left < right) {
    // ...
}
return nums[left] == target ? left : -1;
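
Put together, the patched version might look like the sketch below. The class name PatchedSearchDemo is hypothetical, and the empty-array guard is an addition not shown in the fragment above: without it, nums[left] would be read out of bounds when nums is empty.

```java
class PatchedSearchDemo {
    static int binarySearch(int[] nums, int target) {
        // Assumption added for this sketch: guard the empty array,
        // because the final nums[left] check needs at least one element.
        if (nums.length == 0) return -1;

        int left = 0;
        int right = nums.length - 1;

        while (left < right) {
            int mid = left + (right - left) / 2;
            if (nums[mid] == target) {
                return mid;
            } else if (nums[mid] < target) {
                left = mid + 1;
            } else if (nums[mid] > target) {
                right = mid - 1;
            }
        }
        // The loop stopped before nums[left] was examined, so examine it now.
        return nums[left] == target ? left : -1;
    }

    public static void main(String[] args) {
        System.out.println(binarySearch(new int[]{2}, 2));             // 0
        System.out.println(binarySearch(new int[]{1, 2, 3, 4, 5}, 5)); // 4
        System.out.println(binarySearch(new int[]{1, 3}, 2));          // -1
    }
}
```

The single-element array is the case the patch exists for: the loop body never runs, so only the final check can find the target.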

2. Why left = mid + 1 and right = mid - 1? I have seen code that uses right = mid or left = mid, without the plus or minus one. What is going on, and how do I decide?

A: This is another tricky point of binary search, but once you understand the discussion above, it is easy to decide.

With the "search space" concept clear, recall that this algorithm's search space is closed at both ends, that is, [left, right]. So when we find that index mid is not the target, how do we determine the next search space?

It is [left, mid - 1] or [mid + 1, right], of course. Since mid has already been searched, it should be removed from the search space.

3. Does this algorithm have any flaws?

A: At this point you should know all the details of the algorithm and the reasons behind them. However, the algorithm has limitations.

For example, given the sorted array nums = [1,2,2,2,3] and target = 2, this algorithm returns index 2, which is correct. But if I want the left boundary of target, that is, index 1, or the right boundary, that is, index 3, this algorithm cannot provide it.

Such requirements are common. You might say: find one target, then search linearly to the left or right. That works, but it is not a good approach, because it no longer guarantees binary search's logarithmic complexity.
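
As a rough illustration of that cost, the sketch below finds the left boundary by locating any occurrence with the basic binary search and then walking left. The names LinearScanDemo and leftBoundByScan are hypothetical. It returns correct answers, but on an array of n equal elements the walk alone takes on the order of n steps.

```java
class LinearScanDemo {
    // The basic binary search from this section: returns some index of target, or -1.
    static int binarySearch(int[] nums, int target) {
        int left = 0;
        int right = nums.length - 1;
        while (left <= right) {
            int mid = left + (right - left) / 2;
            if (nums[mid] == target) {
                return mid;
            } else if (nums[mid] < target) {
                left = mid + 1;
            } else if (nums[mid] > target) {
                right = mid - 1;
            }
        }
        return -1;
    }

    // Hypothetical helper: find any occurrence, then walk left one step at a time.
    // Correct, but the walk is linear in the number of duplicates.
    static int leftBoundByScan(int[] nums, int target) {
        int i = binarySearch(nums, target);
        if (i == -1) return -1;
        while (i > 0 && nums[i - 1] == target) {
            i--;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(binarySearch(new int[]{1, 2, 2, 2, 3}, 2));    // 2
        System.out.println(leftBoundByScan(new int[]{1, 2, 2, 2, 3}, 2)); // 1
    }
}
```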

The following sections discuss these two binary search variants.

Second, binary search for the left boundary

Let's look at the code directly; the marked lines are the details to note:

int left_bound(int[] nums, int target) {
    if (nums.length == 0) return -1;
    int left = 0;
    int right = nums.length; // note

    while (left < right) { // note
        int mid = (left + right) / 2;
        if (nums[mid] == target) {
            right = mid;
        } else if (nums[mid] < target) {
            left = mid + 1;
        } else if (nums[mid] > target) {
            right = mid; // note
        }
    }
    return left;
}

1. Why while (left < right) rather than <=?

A: By the same analysis as before: right = nums.length rather than nums.length - 1, so the "search space" on each iteration is [left, right), closed on the left and open on the right.

The termination condition of while (left < right) is left == right. At that point the search space [left, left) is empty, so terminating is correct.

2. Why is there no return -1? What if target does not exist in nums?

A: Let's take this step by step, starting with what this "left boundary" actually means:


For the array nums = [1,2,2,2,3] and target = 2 used earlier, the algorithm returns 1. This 1 can be read as: there is 1 element in nums that is less than 2.

For example, for the sorted array nums = [2,3,5,7] and target = 1, the algorithm returns 0, meaning: 0 elements in nums are less than 1.

As another example, with nums unchanged and target = 8, the algorithm returns 4, meaning: 4 elements in nums are less than 8.
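
The "number of elements less than target" reading can be checked against a plain linear count. The sketch below assumes the left_bound code from this section (renamed leftBound here) and adds a hypothetical helper, countLess, purely for comparison.

```java
class LeftBoundMeaningDemo {
    // The left-boundary search from this section, unchanged apart from the name.
    static int leftBound(int[] nums, int target) {
        if (nums.length == 0) return -1;
        int left = 0;
        int right = nums.length;
        while (left < right) {
            int mid = left + (right - left) / 2;
            if (nums[mid] == target) {
                right = mid;
            } else if (nums[mid] < target) {
                left = mid + 1;
            } else if (nums[mid] > target) {
                right = mid;
            }
        }
        return left;
    }

    // Hypothetical reference: count the elements strictly less than target.
    static int countLess(int[] nums, int target) {
        int count = 0;
        for (int x : nums) {
            if (x < target) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] nums = {2, 3, 5, 7};
        System.out.println(leftBound(nums, 1) + " " + countLess(nums, 1)); // 0 0
        System.out.println(leftBound(nums, 8) + " " + countLess(nums, 8)); // 4 4
        System.out.println(leftBound(new int[]{1, 2, 2, 2, 3}, 2));        // 1
    }
}
```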

In summary, the function's return value (the value of left) lies in the closed interval [0, nums.length], so two extra lines of code are enough to return -1 at the right moments:

while (left < right) {
    // ...
}
// target is larger than every element
if (left == nums.length) return -1;
// same handling as in the previous algorithm
return nums[left] == target ? left : -1;
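
Assembled into one function, the left-boundary search with the two extra lines might look like this sketch. The class name LeftBoundDemo is hypothetical, and mid uses the overflow-safe form mentioned earlier.

```java
class LeftBoundDemo {
    // Returns the leftmost index of target in sorted nums, or -1 if it is absent.
    static int leftBound(int[] nums, int target) {
        if (nums.length == 0) return -1;
        int left = 0;
        int right = nums.length; // search space is [left, right)

        while (left < right) {
            int mid = left + (right - left) / 2;
            if (nums[mid] == target) {
                right = mid; // keep shrinking toward the left
            } else if (nums[mid] < target) {
                left = mid + 1;
            } else if (nums[mid] > target) {
                right = mid;
            }
        }
        // target is larger than every element
        if (left == nums.length) return -1;
        // left is in range, but the element there may not be target
        return nums[left] == target ? left : -1;
    }

    public static void main(String[] args) {
        System.out.println(leftBound(new int[]{1, 2, 2, 2, 3}, 2)); // 1
        System.out.println(leftBound(new int[]{2, 3, 5, 7}, 8));    // -1
        System.out.println(leftBound(new int[]{2, 3, 5, 7}, 4));    // -1
    }
}
```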

3. Why left = mid + 1 and right = mid? How does this differ from the previous algorithm?

A: This is easy to explain. Our "search space" is [left, right), closed on the left and open on the right, so after nums[mid] has been examined, the next search space should exclude mid and be one of the two intervals [left, mid) or [mid + 1, right).

4. Why does this algorithm find the left boundary?

A: The key is how the case nums[mid] == target is handled:

    if (nums[mid] == target)
        right = mid;

As you can see, finding the target does not return immediately. Instead the upper bound right of the "search space" is tightened and the search continues in [left, mid), shrinking steadily to the left until the left boundary is pinned down.

5. Why return left rather than right?

A: They are the same, because the while loop terminates when left == right.

Third, binary search for the right boundary

The code for finding the right boundary is much like the left-boundary code, with only two differences, both marked:

int right_bound(int[] nums, int target) {
    if (nums.length == 0) return -1;
    int left = 0, right = nums.length;

    while (left < right) {
        int mid = (left + right) / 2;
        if (nums[mid] == target) {
            left = mid + 1; // note
        } else if (nums[mid] < target) {
            left = mid + 1;
        } else if (nums[mid] > target) {
            right = mid;
        }
    }
    return left - 1; // note
}


1. Why does this algorithm find the right boundary?

A: Similarly, the key point is here:

if (nums[mid] == target) {
    left = mid + 1;
}

When nums[mid] == target, do not return immediately. Instead raise the lower bound left of the "search space", so that the interval keeps shrinking to the right until the right boundary is pinned down.

2. Why return left - 1 at the end, rather than return left as in the left-boundary function? Also, since this is a search for the right boundary, I feel it should return right.

A: First, the while loop terminates when left == right, so left and right are the same. If you want the code to reflect the right side, returning right - 1 works just as well.

As for why one is subtracted, that is a peculiarity of the right-boundary search. The key is this condition:

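if (nums[mid] == target) {
    left = mid + 1;
    // think of it this way: mid = left - 1
}

Because left must be updated as left = mid + 1, when the while loop ends nums[left] can no longer equal target, while nums[left-1] may be target.

As for why left must be updated as left = mid + 1, the reasoning is the same as in the left-boundary search and is not repeated here.

3. Why is there no return -1? What if target does not exist in nums?

A: As with the left-boundary search, the while loop terminates when left == right, so left ranges over [0, nums.length], and two extra lines of code return -1 correctly: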

while (left < right) {
    // ...
}
if (left == 0) return -1;
return nums[left-1] == target ? (left-1) : -1;
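
Assembled into one function, the right-boundary search with these two lines might look like the following sketch. The class name RightBoundDemo is hypothetical, and mid uses the overflow-safe form mentioned earlier.

```java
class RightBoundDemo {
    // Returns the rightmost index of target in sorted nums, or -1 if it is absent.
    static int rightBound(int[] nums, int target) {
        if (nums.length == 0) return -1;
        int left = 0;
        int right = nums.length; // search space is [left, right)

        while (left < right) {
            int mid = left + (right - left) / 2;
            if (nums[mid] == target) {
                left = mid + 1; // keep shrinking toward the right
            } else if (nums[mid] < target) {
                left = mid + 1;
            } else if (nums[mid] > target) {
                right = mid;
            }
        }
        // left == 0 means no element is less than or equal to target
        if (left == 0) return -1;
        // the candidate is one position before left
        return nums[left - 1] == target ? (left - 1) : -1;
    }

    public static void main(String[] args) {
        System.out.println(rightBound(new int[]{1, 2, 2, 2, 3}, 2)); // 3
        System.out.println(rightBound(new int[]{2, 3, 5, 7}, 1));    // -1
        System.out.println(rightBound(new int[]{2, 3, 5, 7}, 7));    // 3
    }
}
```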

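Fourth, summary

Let's walk through the cause-and-effect logic behind these differences in detail:

First, the most basic binary search:

Because we initialize right = nums.length - 1,
the "search space" is [left, right],
which dictates while (left <= right)
and also left = mid + 1 and right = mid - 1.

Because we only need to find one index of target,
we can return immediately when nums[mid] == target.

Second, the binary search for the left boundary:

Because we initialize right = nums.length,
the "search space" is [left, right),
which dictates while (left < right)
and also left = mid + 1 and right = mid.

Because we need the leftmost index of target,
we do not return immediately when nums[mid] == target,
but tighten the right bound to pin down the left boundary.

Third, the binary search for the right boundary:

Because we initialize right = nums.length,
the "search space" is [left, right),
which dictates while (left < right)
and also left = mid + 1 and right = mid.

Because we need the rightmost index of target,
we do not return immediately when nums[mid] == target,
but tighten the left bound to pin down the right boundary.

Because tightening the left bound requires left = mid + 1,
whatever we return at the end, left or right, must have one subtracted.

If you understand all of the above, congratulations: that is all there is to the details of binary search.

In this article you learned:

    When analyzing binary search code, avoid a bare else; expand everything into else if so it is easier to follow.

    Pay attention to the "search space" and the termination condition of the while loop; if an element can be missed, remember to check it at the end.

    To search for the left or right boundary, only the handling of nums[mid] == target needs to change. When searching for the right boundary, subtract one.

From now on, even when you meet other variants of binary search, these techniques will help you write correct code.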
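
As one possible use of the two boundary searches together, the sketch below counts how many times target occurs in a sorted array. The names OccurrenceCountDemo and countOccurrences are hypothetical; leftBound and rightBound follow the versions above that return -1 when target is absent.

```java
class OccurrenceCountDemo {
    // Leftmost index of target, or -1 if absent.
    static int leftBound(int[] nums, int target) {
        int left = 0, right = nums.length;
        while (left < right) {
            int mid = left + (right - left) / 2;
            if (nums[mid] < target) {
                left = mid + 1;
            } else {
                right = mid; // covers both == target and > target
            }
        }
        if (left == nums.length) return -1;
        return nums[left] == target ? left : -1;
    }

    // Rightmost index of target, or -1 if absent.
    static int rightBound(int[] nums, int target) {
        int left = 0, right = nums.length;
        while (left < right) {
            int mid = left + (right - left) / 2;
            if (nums[mid] <= target) {
                left = mid + 1; // covers both == target and < target
            } else {
                right = mid;
            }
        }
        if (left == 0) return -1;
        return nums[left - 1] == target ? (left - 1) : -1;
    }

    // Hypothetical helper: number of occurrences, using two logarithmic searches.
    static int countOccurrences(int[] nums, int target) {
        int lo = leftBound(nums, target);
        if (lo == -1) return 0;
        return rightBound(nums, target) - lo + 1;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences(new int[]{1, 2, 2, 2, 3}, 2)); // 3
        System.out.println(countOccurrences(new int[]{1, 2, 2, 2, 3}, 4)); // 0
    }
}
```

Here the else if branches have been merged, which is the kind of simplification the article leaves to the reader once the expanded form is understood.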

Origin www.cnblogs.com/mxj961116/p/11945444.html