Sorting algorithms
1. Swap variables
Swapping variables in Python is much less hassle than in many other languages: tuple unpacking does it in a single statement, with no temporary variable.
var1 = 1
var2 = 2
var1, var2 = var2, var1
print(var1, var2)
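The one-line swap works because the whole right-hand side is evaluated before anything is assigned. As a small sketch of the same idea, the names a, b and c below are illustrative only and rotate three values at once:

```python
a, b, c = 1, 2, 3
# The right-hand side is evaluated first, then unpacked left to right
a, b, c = b, c, a
print(a, b, c)  # 2 3 1
```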
2. Bubble sort
Bubble sort repeatedly compares adjacent elements and swaps them when they are out of order. Because it uses two nested loops, its worst-case time complexity is O(n^2).
# Declare the list
numbers = [25, 21, 22, 24, 23, 27, 26]

# Define the sorting function
def BubbleSort(items):
    # Exchange adjacent elements until the list is in order
    lastElementIndex = len(items) - 1
    for passNo in range(lastElementIndex, 0, -1):
        for idx in range(passNo):
            if items[idx] > items[idx + 1]:
                items[idx], items[idx + 1] = items[idx + 1], items[idx]
    return items

# Run the sort
print(BubbleSort(numbers))
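To make the O(n^2) figure concrete, here is a minimal sketch that counts comparisons. It is a variant, not the version above: the function name bubble_sort_early_exit and the early-exit flag are additions for illustration. The flag stops the sort as soon as a pass makes no swap, so an already sorted list costs a single pass, while a reversed list still costs n(n-1)/2 comparisons.

```python
def bubble_sort_early_exit(items):
    """Sort items in place and return the number of comparisons made."""
    comparisons = 0
    for pass_no in range(len(items) - 1, 0, -1):
        swapped = False
        for idx in range(pass_no):
            comparisons += 1
            if items[idx] > items[idx + 1]:
                items[idx], items[idx + 1] = items[idx + 1], items[idx]
                swapped = True
        # No swap in this pass means the list is already in order
        if not swapped:
            break
    return comparisons

print(bubble_sort_early_exit([1, 2, 3, 4, 5, 6, 7]))  # 6: one pass only
print(bubble_sort_early_exit([7, 6, 5, 4, 3, 2, 1]))  # 21: 7 * 6 / 2
```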
3. Insertion sort
The basic idea of insertion sort is that, in each iteration, the next element is taken from the unsorted part of the data and inserted at its correct position in the already sorted part, which is why it is called insertion sort.
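def InsertionSort(items):
    for i in range(1, len(items)):
        j = i - 1
        current = items[i]
        # Shift larger elements of the sorted part one slot to the right;
        # j >= 0 is checked first so items[j] is never read at index -1
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j = j - 1
        # Insert the current element into the gap
        items[j + 1] = current
    return items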
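To see how the sorted part grows by one element per iteration, the following sketch records a snapshot of the list after each insertion. The helper name insertion_sort_trace is hypothetical and the input is just sample data.

```python
def insertion_sort_trace(items):
    """Sort items in place; return a copy of the list after each insertion."""
    snapshots = []
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
        snapshots.append(list(items))
    return snapshots

for step in insertion_sort_trace([25, 21, 22, 24, 23]):
    print(step)
```

After the first iteration the first two elements are in order, after the second the first three, and so on until the whole list is sorted.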
4. Merge sort
The main feature of this algorithm is that its performance does not depend on whether the input data is already sorted: it runs in O(N log N) time in every case. Like MapReduce and other big data algorithms, merge sort is based on the divide-and-conquer strategy.
def MergeSort(items):
    if len(items) > 1:
        mid = len(items) // 2  # split the list in half
        left = items[:mid]
        right = items[mid:]
        MergeSort(left)  # recurse until each sublist has length 1
        MergeSort(right)
        a = 0
        b = 0
        c = 0
        # Merge the two sorted halves back into items
        while a < len(left) and b < len(right):
            if left[a] < right[b]:
                items[c] = left[a]
                a = a + 1
            else:
                items[c] = right[b]
                b = b + 1
            c = c + 1
        # Copy whatever is left over in either half
        while a < len(left):
            items[c] = left[a]
            a = a + 1
            c = c + 1
        while b < len(right):
            items[c] = right[b]
            b = b + 1
            c = c + 1
    return items
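The divide and conquer steps can also be separated into two small functions, which some readers find easier to follow. This is a sketch of an alternative formulation, not the in-place version above: the names merge and merge_sort are illustrative, and it returns a new list instead of modifying its argument.

```python
def merge(left, right):
    """Combine two sorted lists into one sorted list (the conquer step)."""
    merged = []
    a = b = 0
    while a < len(left) and b < len(right):
        # <= keeps equal elements in their original order (stable merge)
        if left[a] <= right[b]:
            merged.append(left[a])
            a += 1
        else:
            merged.append(right[b])
            b += 1
    merged.extend(left[a:])
    merged.extend(right[b:])
    return merged

def merge_sort(items):
    """Split the list in half, sort each half recursively, then merge."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))

print(merge_sort([25, 21, 22, 24, 23, 27, 26]))
```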
5. Shell sort
Shell sort is better suited to medium-sized datasets than to large ones. Roughly speaking, it performs fairly well on lists of up to about 6000 elements, and it performs better when parts of the data are already in the correct order. In the best case, when the list is already sorted, each pass only verifies the order without moving any element. With the gap-halving sequence used here there are about log N passes over the N elements, so the best-case cost is O(N log N).
def ShellSort(items):
    distance = len(items) // 2
    while distance > 0:
        for i in range(distance, len(items)):
            temp = items[i]
            j = i
            # Insertion-sort the sublist of elements that are distance apart
            while j >= distance and items[j - distance] > temp:
                items[j] = items[j - distance]
                j = j - distance
            items[j] = temp
        # Halve the distance for the next pass
        distance = distance // 2
    return items
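The number of passes is determined entirely by the gap sequence. As a small sketch (the helper gap_sequence is hypothetical and only mirrors the halving rule used in ShellSort), the gaps for a given list length can be listed like this:

```python
def gap_sequence(n):
    """Return the distances ShellSort would use for a list of length n."""
    gaps = []
    distance = n // 2
    while distance > 0:
        gaps.append(distance)
        distance //= 2
    return gaps

print(gap_sequence(7))   # [3, 1]
print(gap_sequence(16))  # [8, 4, 2, 1]
```

The final gap is always 1, so the last pass is an ordinary insertion sort over data that the earlier passes have already brought close to sorted.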
6. Selection sort
The worst-case time complexity of selection sort is O(N^2). Its worst-case performance is close to that of bubble sort, so it should not be used to sort large datasets. Still, selection sort is the better-designed algorithm of the two: it makes the same order of comparisons, but it performs at most N-1 swaps, so it is usually faster than bubble sort in practice.
def SelectionSort(items):
    for fill_slot in range(len(items) - 1, 0, -1):
        # Find the largest element in the unsorted part items[0..fill_slot]
        max_index = 0
        for location in range(1, fill_slot + 1):
            if items[location] > items[max_index]:
                max_index = location
        # Move it to the end of the unsorted part
        items[fill_slot], items[max_index] = items[max_index], items[fill_slot]
    return items
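The difference in the number of swaps can be measured directly. The sketch below is an illustration under stated assumptions: the function names bubble_swaps and selection_swaps are hypothetical, both work on a copy of the input, and the selection variant counts only swaps that actually move an element.

```python
def bubble_swaps(items):
    """Return how many swaps bubble sort performs on a copy of items."""
    items = list(items)
    swaps = 0
    for pass_no in range(len(items) - 1, 0, -1):
        for idx in range(pass_no):
            if items[idx] > items[idx + 1]:
                items[idx], items[idx + 1] = items[idx + 1], items[idx]
                swaps += 1
    return swaps

def selection_swaps(items):
    """Return how many swaps selection sort performs on a copy of items."""
    items = list(items)
    swaps = 0
    for fill_slot in range(len(items) - 1, 0, -1):
        max_index = 0
        for location in range(1, fill_slot + 1):
            if items[location] > items[max_index]:
                max_index = location
        # Skip the swap when the maximum is already in place
        if max_index != fill_slot:
            items[fill_slot], items[max_index] = items[max_index], items[fill_slot]
            swaps += 1
    return swaps

data = [25, 21, 22, 24, 23, 27, 26]
print(bubble_swaps(data), selection_swaps(data))
```

Bubble sort performs one swap per out-of-order pair, whereas selection sort never needs more than one swap per pass.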
Search algorithms
1. Linear search
The simplest strategy for finding data is linear search, which simply goes through each element looking for a target.
def LinearSearch(items, item):
    index = 0
    found = False
    # Match the value with each data element
    while index < len(items) and found is False:
        if items[index] == item:
            found = True
        else:
            index = index + 1
    return found

numbers = [12, 33, 11, 99, 22, 55, 90]
print(LinearSearch(numbers, 12))
print(LinearSearch(numbers, 91))
Linear search is a simple algorithm that performs an exhaustive search; its worst-case time complexity is O(N), because the target may be the last element or may be missing altogether.
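Often the position of the match is more useful than a plain True or False. A minimal sketch of that variant follows; the name linear_search_index and the convention of returning -1 for a missing item are assumptions made for this example.

```python
def linear_search_index(items, target):
    """Return the index of the first match, or -1 if target is absent."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1

numbers = [12, 33, 11, 99, 22, 55, 90]
print(linear_search_index(numbers, 12))  # 0
print(linear_search_index(numbers, 91))  # -1
```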
2. Binary search
Binary search requires the data to be sorted. The algorithm repeatedly halves the current search range, keeping track of its lowest and highest indices, until it finds the value it is looking for or the range becomes empty.
def BinarySearch(items, item):
    first = 0
    last = len(items) - 1
    found = False
    while first <= last and not found:
        midpoint = (first + last) // 2
        if items[midpoint] == item:
            found = True
        elif item < items[midpoint]:
            # The item can only be in the lower half
            last = midpoint - 1
        else:
            # The item can only be in the upper half
            first = midpoint + 1
    return found

numbers = [12, 33, 11, 99, 22, 55, 90]
# Binary search only works on sorted data
sorted_list = BubbleSort(numbers)
print(BinarySearch(sorted_list, 12))
print(BinarySearch(sorted_list, 91))
In each iteration, binary search halves the remaining data. If the data has N items, the loop finishes after at most about log2(N) + 1 iterations, so the running time of the algorithm is O(log N).
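In everyday Python code, the same halving strategy is available through the standard library's bisect module. The sketch below wraps bisect_left in a membership test; the wrapper name binary_search_bisect is an assumption for this example, while bisect_left itself is part of the standard library.

```python
from bisect import bisect_left

def binary_search_bisect(sorted_items, target):
    """Return True if target occurs in the sorted list sorted_items."""
    # bisect_left returns the leftmost position where target could be inserted
    pos = bisect_left(sorted_items, target)
    return pos < len(sorted_items) and sorted_items[pos] == target

data = sorted([12, 33, 11, 99, 22, 55, 90])
print(binary_search_bisect(data, 12))  # True
print(binary_search_bisect(data, 91))  # False
```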
3. Interpolation search
Binary search always probes the middle of the current range. Interpolation search is more sophisticated: it uses the target value itself to estimate the approximate location of the element in a sorted array.
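def IntPolsearch(items, x):
    idx0 = 0
    idxn = len(items) - 1
    found = False
    while idx0 <= idxn and x >= items[idx0] and x <= items[idxn]:
        # All remaining values are equal: avoid dividing by zero
        if items[idxn] == items[idx0]:
            return items[idx0] == x
        # Estimate the position of x by linear interpolation
        mid = idx0 + int((float(idxn - idx0) / (items[idxn] - items[idx0])) * (x - items[idx0]))
        # Compare the value at the estimated position with the search value
        if items[mid] == x:
            found = True
            return found
        if items[mid] < x:
            idx0 = mid + 1
        else:
            # Without this branch the loop never ends when items[mid] > x
            idxn = mid - 1
    return found

numbers = [12, 33, 11, 99, 22, 55, 90]
# Interpolation search only works on sorted data
sorted_list = BubbleSort(numbers)
print(IntPolsearch(sorted_list, 12))
print(IntPolsearch(sorted_list, 91))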
If the data is not evenly distributed, interpolation search performs poorly, and its worst-case time complexity is O(N). If the data is fairly evenly distributed, the expected time complexity is O(log(log N)).
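The effect of the data distribution can be observed by counting probes. The following sketch is illustrative only: the function interpolation_probes and both sample lists are assumptions for this example, and it uses integer arithmetic for the position estimate.

```python
def interpolation_probes(sorted_items, target):
    """Return (found, probes) for an interpolation search on sorted_items."""
    low, high = 0, len(sorted_items) - 1
    probes = 0
    while low <= high and sorted_items[low] <= target <= sorted_items[high]:
        probes += 1
        span = sorted_items[high] - sorted_items[low]
        if span == 0:
            # All remaining values are equal, so they all equal target
            return True, probes
        # Estimate the position of target by linear interpolation
        mid = low + (high - low) * (target - sorted_items[low]) // span
        if sorted_items[mid] == target:
            return True, probes
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return False, probes

uniform = list(range(0, 1000, 10))    # evenly spaced values
skewed = [2 ** k for k in range(20)]  # exponentially spaced values
print(interpolation_probes(uniform, 730))
print(interpolation_probes(skewed, 2 ** 18))
```

On the evenly spaced list the first estimate lands exactly on the target, while the exponentially spaced list needs several probes because the linear estimate keeps falling short.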
4. Depth-first search
# Depth-first search (DFS) algorithm
def dfs(aGraph, root):
    stack = [root]
    parents = {root: root}
    path = []
    while stack:
        print('Stack is: %s' % stack)
        # Take the most recently added vertex (last in, first out)
        vertex = stack.pop(-1)
        print('Working on %s' % vertex)
        for element in aGraph[vertex]:
            if element not in parents:
                parents[element] = vertex
                stack.append(element)
                print('Now, adding %s to the stack' % element)
        path.append(parents[vertex] + '>' + vertex)
    # Drop the first entry, which is the root paired with itself
    return path[1:]

# Build the graph as adjacency lists
g = dict()
g['Amine'] = ['Wassim', 'Nick', 'Mike', 'Elena']
g['Wassim'] = ['Amine', 'Imran']
g['Nick'] = ['Amine']
g['Mike'] = ['Amine', 'Mary']
g['Elena'] = ['Amine']
g['Imran'] = ['Wassim', 'Steven']
g['Mary'] = ['Mike']
g['Steven'] = ['Imran']

# Traverse the graph starting from Amine
print(dfs(g, "Amine"))
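The version above manages its own stack. For comparison, here is a sketch of a recursive formulation in which the call stack plays that role. The name dfs_recursive and the choice to return the visit order are assumptions for this example, and the graph is repeated so the block runs on its own.

```python
def dfs_recursive(graph, vertex, visited=None):
    """Return the vertices reachable from vertex in depth-first order."""
    if visited is None:
        visited = []
    visited.append(vertex)
    for neighbour in graph[vertex]:
        if neighbour not in visited:
            # Go as deep as possible before trying the next neighbour
            dfs_recursive(graph, neighbour, visited)
    return visited

graph = {
    'Amine': ['Wassim', 'Nick', 'Mike', 'Elena'],
    'Wassim': ['Amine', 'Imran'],
    'Nick': ['Amine'],
    'Mike': ['Amine', 'Mary'],
    'Elena': ['Amine'],
    'Imran': ['Wassim', 'Steven'],
    'Mary': ['Mike'],
    'Steven': ['Imran'],
}
print(dfs_recursive(graph, 'Amine'))
```

The recursive sketch explores neighbours in the order they are listed, whereas the stack-based version pops the most recently pushed neighbour first, so the two visit the same vertices in a different order.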