Common Algorithms - Search Algorithms

    Search: the process of finding, within a collection of data elements, the element whose value matches a given keyword is called search. The method used to carry out the search is called a search algorithm.

    Search algorithms can be roughly divided into seven categories:

  • sequential search
  • binary search
  • interpolation search
  • Fibonacci search
  • block search
  • tree-table search
  • hash table search

    This article introduces the first four search algorithms.

1. Sequential search (linear search)

    Sequential search, also called linear search, is a primitive, exhaustive, brute-force search algorithm. It is easy to understand and simple to implement. However, because it checks the elements one by one with no optimization, its performance is poor when the amount of data is large.

Sequential search idea:

1. It is the basic search algorithm: traverse an array or list in its original order and compare each element with the query.

2. For any sequence and a given keyword, compare the keyword with the elements of the sequence in turn until an element equal to the keyword is found, or until every element has been compared.

Time complexity:

Problem size: the length of the list (n)

O(1): best case, when the element being looked up is the first in the list

O(n): worst case, when every element of the list must be examined
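The best and worst cases above can be made concrete by counting comparisons. The following is only a minimal sketch; the helper name `linear_search_count` and the sample data are assumptions made for illustration and are not part of the original program.

```python
def linear_search_count(li, val):
    """Return (index, comparisons); index is None if val is absent."""
    comparisons = 0
    for ind, v in enumerate(li):
        comparisons += 1              # one comparison per element visited
        if v == val:
            return ind, comparisons
    return None, comparisons

data = [80, 58, 73, 90, 31]
print(linear_search_count(data, 80))  # first element: 1 comparison
print(linear_search_count(data, 31))  # last element: n comparisons
print(linear_search_count(data, 7))   # absent value: n comparisons
```

The number of comparisons grows from 1 (best case) to n (worst case or unsuccessful search), which matches the O(1) and O(n) bounds.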

The complete program (function) of the sequential search algorithm is as follows:

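def linear_search(li, val):
    for ind, v in enumerate(li):   # enumerate the indices and values of the list
        if v == val:
            return ind, li[ind]    # found: return the index and the value
    return None                    # reached only after the whole list has been checked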

2. Binary search (half-interval search)

    Binary search, also called half-interval search, is a more efficient search method. It is an ordered search, meaning that the sequence being searched must already be sorted. Binary search therefore requires that the linear table use a sequential storage structure (so that any position can be accessed directly) and that its elements be arranged in order by key.

Binary search idea:

    Assume the elements in the table are arranged in ascending order. Compare the key of the record in the middle position with the search key. If the two are equal, the search succeeds; otherwise the middle record divides the table into a front sub-table and a rear sub-table. If the middle key is greater than the search key, continue searching in the front sub-table; otherwise continue in the rear sub-table. Repeat this process until a matching record is found (the search succeeds) or the sub-table is empty (the search fails).

    Requirements: a sequential storage structure must be used, and the elements must be ordered by key.

Time complexity: O(log n)
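To see what O(log n) means in practice, the loop iterations can be counted. This is a minimal sketch; `binary_search_steps` is a hypothetical helper introduced only for this illustration, and the test values are arbitrary.

```python
import math

def binary_search_steps(li, val):
    """Return (index or None, loop iterations) for an ascending list."""
    left, right, steps = 0, len(li) - 1, 0
    while left <= right:
        steps += 1
        mid = (left + right) // 2
        if li[mid] == val:
            return mid, steps
        elif li[mid] > val:
            right = mid - 1
        else:
            left = mid + 1
    return None, steps

n = 1_000_000
data = list(range(n))
# Try the two ends of the list and two values that are not present.
worst = max(binary_search_steps(data, v)[1] for v in (0, n - 1, -1, n))
print(worst, math.floor(math.log2(n)) + 1)
```

Because the candidate range is halved on every iteration, the iteration count never exceeds floor(log2(n)) + 1, even for a list of one million elements.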

Method 1: Implement the search algorithm as a function

Example 1: Design a binary search function and use it to find 31 in the list a = [80, 58, 73, 90, 31, 92, 39, 24, 14, 79, 46, 61, 31, 61, 93, 62, 11, 52, 34, 17].

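def binary_search(li, val):        # binary search function
    left = 0
    right = len(li) - 1
    while left <= right:           # the candidate range is not empty
        mid = (left + right) // 2
        if li[mid] == val:         # found: return the index and the value
            return mid, li[mid]
        elif li[mid] > val:        # the target is to the left of mid
            right = mid - 1
        else:                      # the target is to the right of mid
            left = mid + 1
    return None                    # the loop ended without finding val

a=[80,58,73,90,31,92,39,24,14,79,46,61,31,61,93,62,11,52,34,17]  # original data
a.sort()                           # sort (ascending by default)
jieguo = binary_search(a,31)       # call the function (find 31 in list a)
print(jieguo)                      # print the result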

Output:

(4, 31)

Method 2: Search algorithm without a function

Example 2: Write a binary search (without defining a function) to find 11 in the list a = [80, 58, 73, 90, 31, 92, 39, 24, 14, 79, 46, 61, 31, 61, 93, 62, 11, 52, 34, 17].

a = [80, 58,73,90,31,92,39,24,14,79,46,61,31,61,93,62,11,52,34,17]  # original data
a.sort()                               # sort (ascending by default)
left, right = 0, len(a) - 1
val = 11                               # the value to find in list a
binary_search = None                   # stays None if the loop ends without a match
while left <= right:                   # the candidate range is not empty
    mid = (left + right) // 2
    if a[mid] == val:                  # found: store the index and the value
        binary_search = mid, a[mid]
        break                          # leave the loop once the result is stored
    elif a[mid] > val:                 # the target is to the left of mid
        right = mid - 1
    else:                              # the target is to the right of mid
        left = mid + 1
print(binary_search)

Output:

(0, 11)
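For comparison, Python's standard library already provides binary search on sorted lists through the `bisect` module. The sketch below wraps `bisect_left` so that it returns the same kind of result as the examples above; the wrapper name `bisect_search` is an assumption made for this illustration.

```python
from bisect import bisect_left

def bisect_search(li, val):
    """Return (index, value) of the leftmost match in an ascending list, else None."""
    i = bisect_left(li, val)           # leftmost position where val could be inserted
    if i < len(li) and li[i] == val:
        return i, li[i]
    return None

a = sorted([80, 58, 73, 90, 31, 92, 39, 24, 14, 79, 46, 61, 31, 61, 93, 62, 11, 52, 34, 17])
print(bisect_search(a, 31))   # (4, 31)
print(bisect_search(a, 11))   # (0, 11)
print(bisect_search(a, 84))   # None
```

Unlike a hand-written loop, `bisect_left` always lands on the leftmost of several equal elements, which matters for the duplicated values 31 and 61 in this list.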

    A similar algorithm is the bisection method, which is generally used in mathematics to find approximate solutions of equations. For a function y=f(x) that is continuous on the interval [a,b] and satisfies f(a)·f(b)<0, the interval containing the zero of f(x) is repeatedly divided in two so that its two endpoints gradually approach the zero. This way of obtaining an approximate value of the zero is called bisection.

Example 3: Use the bisection method to find an approximate solution (accurate to 0.00001) of the equation x²-2x-1=0.

Solution: Let f(x)=x²-2x-1.

First draw a sketch of the graph of the function (as shown in Figure 1).

Figure 1

    Because f(2)=-1<0 and f(3)=2>0, the equation x²-2x-1=0 has a solution in the interval (2, 3), denoted x1. The exact solutions of the equation are x = 1 ± √2, so x1 = 1 + √2 ≈ 2.41421.

The program that finds this approximate solution of x²-2x-1=0 by bisection is as follows:

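# Solve the equation x²-2x-1=0 by bisection
def f(x):                           # compute f(x) for a given x
    return x**2-2*x-1

left = 2                            # left endpoint: f(left)=-1<0
right = 3                           # right endpoint: f(right)=2>0, so [2,3] contains a solution
e = 0.00001                         # required precision
mid=(left+right)/2
while abs(f(mid))>e:                # |f(mid)| is used to decide whether the precision is met
    if f(left)*f(mid)<0:            # the solution lies in [left,mid]
        right = mid
    else:                           # the solution lies in [mid,right]
        left = mid
    mid = (left + right) / 2
print(mid)                          # approximate solution of the equation
print(f(mid))                       # check the solution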

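Output: the program prints an approximate root close to 1 + √2 ≈ 2.41421, followed by f(mid), whose absolute value does not exceed 0.00001.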
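The program above stops when |f(mid)| is small. A common alternative is to stop when the bracketing interval itself is narrower than the required accuracy, which bounds the error in x directly. The following is a minimal sketch of that variant; the function name `bisect_root` and the parameters `tol` and `max_iter` are assumptions made for this illustration.

```python
import math

def bisect_root(f, left, right, tol=1e-5, max_iter=100):
    """Bisection sketch: stop when half the interval width is below tol."""
    if f(left) * f(right) > 0:
        raise ValueError("f(left) and f(right) must have opposite signs")
    for _ in range(max_iter):
        mid = (left + right) / 2
        if f(mid) == 0 or (right - left) / 2 < tol:
            return mid
        if f(left) * f(mid) < 0:      # the zero lies in [left, mid]
            right = mid
        else:                         # the zero lies in [mid, right]
            left = mid
    return (left + right) / 2

root = bisect_root(lambda x: x**2 - 2*x - 1, 2, 3)
print(root, abs(root - (1 + math.sqrt(2))))   # approximation and its error
```

Since the zero always stays inside the current interval, the returned midpoint differs from the exact root 1 + √2 by less than `tol`.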

3. Interpolation search

    Interpolation search (English name: Interpolation Search) is similar to binary search and is an improvement on it. Instead of always splitting the range in half, it splits at an index predicted from the value being searched for.

    First consider a question: why must binary search split the range at the midpoint rather than at one quarter or some other fraction?

    For example, when looking up "Zhang" in a dictionary, you open the dictionary near the back, where the entries beginning with Z are. You do not start at the middle or page through from the beginning.

    Similarly, if you search for 5 in a sorted list of 100 elements that are uniformly distributed between 1 and 10,000, you will naturally start near the low-index end of the list.

    As this analysis shows, binary search is not adaptive: it chooses the split point without looking at the key. The search point in binary search is calculated as follows:

mid=(left+right)/2, that is,

mid=left+(right-left)/2=left+1/2*(right-left)                             ①

    By analogy, the search point can be improved as follows:

mid=left+(key-a[left])/(a[right]-a[left])*(right-left)                    ②

In the formula, a is the list (or array) and key is the keyword to be found.

    That is, the fixed ratio 1/2 in formula ① is replaced by the adaptive ratio (key-a[left])/(a[right]-a[left]) in formula ②. Because this ratio reflects where the keyword lies between the smallest and largest values of the current range, mid lands closer to the position of key, which reduces the number of comparisons when the keys are evenly distributed.
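A small worked example shows the difference between formulas ① and ②. The list and key below are chosen for illustration, and formula ② is evaluated with integer arithmetic (multiply first, then floor-divide), which is an implementation choice of this sketch rather than something the formulas prescribe.

```python
lst = list(range(1, 21))        # 1, 2, ..., 20: ascending and evenly spaced
key = 5
left, right = 0, len(lst) - 1

# Formula ①: always probe the middle of the range.
mid_binary = left + (right - left) // 2

# Formula ②: probe where key is expected to be, given the end values.
mid_interp = left + (key - lst[left]) * (right - left) // (lst[right] - lst[left])

print(mid_binary, lst[mid_binary])   # 9 10
print(mid_interp, lst[mid_interp])   # 4 5
```

On this evenly spaced list the interpolated position hits the key on the first probe, while the midpoint probe lands on 10 and needs further halving steps.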

    Basic idea: building on binary search, the selection of the search point is made adaptive, which can improve search efficiency. Like binary search, interpolation search is an ordered search.

Time complexity: O(log log n) on average when the keys are uniformly distributed; O(n) in the worst case
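How many probes interpolation search needs depends strongly on how evenly the keys are distributed. The sketch below counts probes on an evenly spaced list and on a strongly skewed one; the helper `interpolation_probes`, the integer form of the position formula, and both test lists are assumptions made for this illustration.

```python
def interpolation_probes(lst, tgt):
    """Count the probes an interpolation search makes on an ascending list."""
    left, right, probes = 0, len(lst) - 1, 0
    while left <= right and lst[left] <= tgt <= lst[right]:
        probes += 1
        if lst[left] == lst[right]:       # equal end values: tgt equals them, stop
            break
        mid = left + (tgt - lst[left]) * (right - left) // (lst[right] - lst[left])
        if lst[mid] == tgt:
            break
        if lst[mid] < tgt:
            left = mid + 1
        else:
            right = mid - 1
    return probes

uniform = [7 * i for i in range(1000)]    # evenly spaced keys
skewed = [2 ** i for i in range(20)]      # keys that grow exponentially
print(interpolation_probes(uniform, 7 * 123))   # 1 probe
print(interpolation_probes(skewed, 2 ** 18))    # several probes
```

On evenly spaced keys the predicted position is exact, whereas on skewed keys the prediction is poor and the search advances only a few positions per probe.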

Example 4: Use interpolation search to find tgt in the list lst.

The complete program for the standard interpolation search is as follows:

def interpolation_search(lst, tgt):     # define the interpolation search function
    left = 0                            # index of the first element of lst
    right = len(lst) - 1                # index of the last element: number of elements - 1
    found = False                       # found=False means not found yet
    while left <= right:
        mid = left + int((tgt-lst[left])/(lst[right]-lst[left])*(right-left))
        if lst[mid] == tgt:             # case 1: the element at mid equals tgt, found, exit the loop
            found = True
            break
        else:
            if tgt < lst[mid]:          # case 2: the element at mid > tgt, move the right bound
                right = mid - 1
            else:                       # case 3: the element at mid < tgt, move the left bound
                left = mid + 1
    return found                        # return the search result

a = [80,58,73,90,31,92,39,24,14,79,46,61,31,61,93,62,11,52,34,17]   # define a
a.sort()                                # sort
tgt1 = 84                               # target element 1
print(interpolation_search(a, tgt1))    # run the interpolation search (find tgt1 in list a)
lst2 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]         # define lst2
tgt2 = 5                                # target element 2
print(interpolation_search(lst2, tgt2)) # run the interpolation search (find tgt2 in list lst2)

Output:

The program hangs (infinite loop).

    Testing shows: (1) searching for an element that is in the list works normally; (2) searching for a value that is not in the list may hang (infinite loop).

    Analysis of the cause shows that the infinite loop occurs when tgt<lst[left]: the interpolated offset then becomes negative, so mid falls to the left of left and the search range grows again instead of shrinking.
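The cycle can be made visible by recording the values of left, right, and mid for a few iterations of the unguarded loop. This is a diagnostic sketch only; the helper `trace_naive_interpolation` and its `max_steps` cut-off are assumptions added so that the demonstration terminates.

```python
def trace_naive_interpolation(lst, tgt, max_steps=6):
    """Record (left, right, mid) for the unguarded loop, stopping after max_steps."""
    left, right, trace = 0, len(lst) - 1, []
    while left <= right and len(trace) < max_steps:
        mid = left + int((tgt - lst[left]) / (lst[right] - lst[left]) * (right - left))
        trace.append((left, right, mid))
        if lst[mid] == tgt:
            break
        if tgt < lst[mid]:
            right = mid - 1
        else:
            left = mid + 1
    return trace

a = sorted([80, 58, 73, 90, 31, 92, 39, 24, 14, 79, 46, 61, 31, 61, 93, 62, 11, 52, 34, 17])
for step in trace_naive_interpolation(a, 84):
    print(step)
```

For tgt = 84 the trace alternates between (17, 19, 13) and (14, 19, 16): once left reaches 17, where lst[17] = 90 is greater than 84, the negative offset sends mid back to 13 and the same two states repeat.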

The complete program of the improved version is as follows:

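def interpolation_search(lst, tgt):     # define the interpolation search function
    left = 0                            # index of the first element of lst
    right = len(lst) - 1                # index of the last element: number of elements - 1
    found = False                       # found=False means not found yet
    # Search only while tgt lies within [lst[left], lst[right]]. Out-of-range values
    # are reported as not found, mid always stays inside the range (no infinite
    # loop, no index error), and an empty list is handled as well.
    while left <= right and lst[left] <= tgt <= lst[right]:
        if lst[left] == lst[right]:     # all remaining values equal tgt; avoids division by zero
            found = True
            break
        mid = left + int((tgt-lst[left])/(lst[right]-lst[left])*(right-left))
        if lst[mid] == tgt:             # case 1: the element at mid equals tgt, found, exit the loop
            found = True
            break
        else:
            if tgt < lst[mid]:          # case 2: the element at mid > tgt, move the right bound
                right = mid - 1
            else:                       # case 3: the element at mid < tgt, move the left bound
                left = mid + 1
    return found                        # return the search result

a = [80,58,73,90,31,92,39,24,14,79,46,61,31,61,93,62,11,52,34,17]   # define a
a.sort()                                # sort
tgt1 = 84                               # target element 1
print(interpolation_search(a, tgt1))    # run the interpolation search (find tgt1 in list a)
lst2 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]         # define lst2
tgt2 = 5                                # target element 2
print(interpolation_search(lst2, tgt2)) # run the interpolation search (find tgt2 in list lst2)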

Output:

False
True

4. Fibonacci search

    Before introducing the Fibonacci search algorithm, let us introduce a well-known concept that is closely related to it: the golden ratio.

    The golden ratio, also known as the golden section, refers to a particular proportional relationship between the parts of a whole: the whole is divided into two parts so that the ratio of the larger part to the smaller part equals the ratio of the whole to the larger part. This ratio is about 1:0.618, or 1.618:1.

    0.618 is widely regarded as the most aesthetically pleasing proportion. It appears not only in artistic fields such as painting, sculpture, music, and architecture, but also plays a role in management and engineering design. Hence it is called the golden section.

    Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ... (starting from the third number, each number is the sum of the previous two). As the sequence grows, the ratio of each number to the next one gets closer and closer to 0.618. Using this property, the golden ratio can be applied to searching.

Figure 2
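The convergence of the ratio of consecutive Fibonacci numbers towards 0.618 can be checked numerically. A minimal sketch follows; the helper name `fib_sequence` and the choice of 15 terms are assumptions made for this illustration.

```python
def fib_sequence(count):
    """First `count` Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    seq = [1, 1]
    while len(seq) < count:
        seq.append(seq[-1] + seq[-2])
    return seq[:count]

seq = fib_sequence(15)
# Ratio of each number to the next one in the sequence.
ratios = [seq[i] / seq[i + 1] for i in range(len(seq) - 1)]
for a, b, r in zip(seq, seq[1:], ratios):
    print(f"{a}/{b} = {r:.6f}")
```

The printed ratios oscillate around the limit (√5 - 1)/2 ≈ 0.618034 and settle on it within a few terms.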

    Basic idea: Fibonacci search is also an improvement on binary search. It uses the golden ratio to select the search points in the sequence. Like binary search, Fibonacci search is an ordered search algorithm.

    Fibonacci search is very similar to binary search; it splits the ordered list according to the Fibonacci sequence. It requires that the number of records in the initial table be one less than some Fibonacci number, that is, n=F(k)-1.

    The search starts by comparing tgt with the record at position F(k-1) (that is, mid=left+F(k-1)-1). There are three possible outcomes:

  1) lst[mid] = tgt: the element at position mid is the one being searched for.

  2) lst[mid] < tgt: left=mid+1, k-=2.

    Explanation: left=mid+1 means the element being searched for is in the range [mid+1, right]. k-=2 reflects that the number of elements in [mid+1, right] is n-F(k-1)=F(k)-1-F(k-1)=F(k)-F(k-1)-1=F(k-2)-1, so Fibonacci search can be applied recursively.

  3) lst[mid] > tgt: right=mid-1, k-=1.

    Explanation: right=mid-1 means the element being searched for is in the range [left, mid-1]. k-=1 reflects that the number of elements in [left, mid-1] is F(k-1)-1, so Fibonacci search can be applied recursively.
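The three cases above can be sketched directly if the list is first padded to a length of exactly F(k)-1 by repeating its last element. The padding strategy, the function name `fib_search_padded`, and the convention of returning -1 for a miss are assumptions made for this illustration.

```python
def fib_search_padded(lst, tgt):
    """Fibonacci search on an ascending list; returns an index or -1."""
    n = len(lst)
    if n == 0:
        return -1
    F = [0, 1]                          # F[k] is the k-th Fibonacci number
    while F[-1] - 1 < n:                # smallest k with F[k] - 1 >= n
        F.append(F[-1] + F[-2])
    k = len(F) - 1
    padded = lst + [lst[-1]] * (F[k] - 1 - n)   # now len(padded) == F[k] - 1
    left, right = 0, len(padded) - 1
    while left <= right:
        mid = left + F[k - 1] - 1
        if padded[mid] == tgt:          # case 1: found
            return min(mid, n - 1)      # a hit in the padding maps to the last real index
        if padded[mid] < tgt:           # case 2: continue in [mid+1, right]
            left = mid + 1
            k -= 2
        else:                           # case 3: continue in [left, mid-1]
            right = mid - 1
            k -= 1
    return -1

a = sorted([80, 58, 73, 90, 31, 92, 39, 24, 14, 79, 46, 61, 31, 61, 93, 62, 11, 52, 34, 17])
print(fib_search_padded(a, 62))   # 13
print(fib_search_padded(a, 84))   # -1
```

Because the padded length is exactly F(k)-1, both sub-ranges again have a length of the form F(j)-1, so mid never needs to be clamped.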

    Complexity analysis: the worst-case time complexity is O(log n), and the expected complexity is also O(log n).

    Fibonacci search uses values from the Fibonacci sequence as the split points and otherwise proceeds like binary search. Its asymptotic cost is the same as that of binary search; its advantage is that the split points can be computed with additions and subtractions only, without division.

Example 5: Use Fibonacci search to find tgt in the list lst.

The complete program is as follows:

fib = lambda n: n if n < 2 else fib(n-1) + fib(n-2) # Fibonacci function: fib(0)=0, fib(1)=1

def fib_search(lst, x):
    if len(lst) == 0:
        return 'the list is empty'
    left = 0
    right = len(lst) - 1
    # find key such that fib(key) is greater than or equal to the length of lst,
    # which is why key-1 is used below
    key = 0
    while fib(key) < len(lst):
        key += 1
    while left <= right:
        # the length of lst is usually not exactly fib(key)-1, so clamp mid to the
        # current range [left, right]; max(..., 1) keeps the offset valid for small key
        mid = min(left + max(fib(key - 1), 1) - 1, right)
        if x < lst[mid]:
            right = mid - 1
            key -= 1
        elif x > lst[mid]:
            left = mid + 1
            key -= 2
        else:
            return mid
    return 'not found'

a = [80,58,73,90,31,92,39,24,14,79,46,61,31,61,93,62,11,52,34,17]   # define a
a.sort()                                # sort (ascending)
tgt1 = 84                               # target element 1
print(fib_search(a, tgt1))
tgt2 = 62                               # target element 2
print(fib_search(a, tgt2))

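Output: the first search (84) reports that the element was not found, and the second search (62) prints the index 13.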

Origin blog.csdn.net/hz_zhangrl/article/details/129600164