One of the fastest search modules in Python: ThreadSearch (developed by me) (abccdee1)

ThreadSearch uses a neat method to perform fast searches. It uses the ThreadPoolExecutor class from the concurrent.futures module (which ships with Python's standard library and does not need to be downloaded) to run multiple tasks concurrently on a pool of threads. This is the program code of the search module I developed:

ThreadSearch.py

import concurrent.futures

def preload(items, div_num):
  # Split items into at most div_num consecutive chunks.
  # Ceiling division keeps the chunk count at div_num; step is at least 1.
  step = max(1, -(-len(items) // div_num))
  return [items[i:i + step] for i in range(0, len(items), step)]

def search(pres, obj, exactly=False):
  # Search every chunk on its own thread.
  # Returns one list of matches per chunk, in chunk order.
  if not pres:
    return []
  def find(chunk):
    # exactly=True keeps only items equal to obj;
    # the default keeps every item that contains obj.
    if exactly:
      return [x for x in chunk if x == obj]
    return [x for x in chunk if obj in x]
  with concurrent.futures.ThreadPoolExecutor(max_workers=len(pres)) as executor:
    # Collect results locally instead of in a global, so repeated calls do not accumulate.
    return list(executor.map(find, pres))
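
To see the core idea in isolation, here is a minimal sketch of ThreadPoolExecutor.map on its own. The sample chunks and the function name matches_in are made up for illustration and are not part of ThreadSearch.

```python
import concurrent.futures

# Hypothetical sample data: three small chunks of strings.
chunks = [['apple', 'kiwi'], ['banana', 'plum'], ['cherry', 'grape']]

def matches_in(chunk):
    # Return the items in this chunk that contain the letter 'a'.
    return [item for item in chunk if 'a' in item]

with concurrent.futures.ThreadPoolExecutor(max_workers=len(chunks)) as executor:
    # map() submits one call per chunk and yields results in input order.
    results = list(executor.map(matches_in, chunks))

print(results)
```

Each chunk is handled by its own call to matches_in, and the results come back as one list per chunk.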

Here is a piece of test code:

Test.py

import random
from ThreadSearch import preload, search

def load_examples(num):
  # Build num random strings of 26 lowercase letters each.
  alphabet = 'abcdefghijklmnopqrstuvwxyz'
  randomone = []
  for _ in range(num):
    nn = ''
    for _ in range(26):
      nn += alphabet[random.randint(0, 25)]
    randomone.append(nn)
  return randomone

div_num = 10
obj = 'a'
print('loading examples')
a = load_examples(10000)
print('loading examples finished')
pres = preload(a, div_num)
x = search(pres, obj)
print(sum(len(chunk) for chunk in x))  # total number of matches

Parameters of preload:

  1. list (the first argument): the list to search
  2. div_num: how many segments to divide the list into

Parameters of search:

  1. preloaded (the first argument, called pres in the code): the list of segments returned by preload
  2. obj: the character/string to search for
  3. exactly: whether obj must be exactly equal to an item in preloaded, rather than just contained in it; the default value is False

load_examples: I use it to generate a list of num random 26-letter strings to search through.

P.S. You can also print out x, but the output will be very long.
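
To make the parameters concrete, here is a small self-contained sketch of what div_num and exactly mean, written with plain slicing and list comprehensions rather than ThreadSearch itself. The sample words and the names step, segments, contains and equal are made up for illustration, and it assumes that exactly=True means plain equality.

```python
words = ['a', 'ab', 'ba', 'c', 'abc', 'd']
obj = 'a'
div_num = 3

# div_num: cut the list into div_num consecutive segments.
step = len(words) // div_num
segments = [words[i:i + step] for i in range(0, len(words), step)]
print(segments)

# exactly=False (assumed meaning): keep items that contain obj.
contains = [w for w in words if obj in w]
# exactly=True (assumed meaning): keep only items equal to obj.
equal = [w for w in words if w == obj]
print(contains)
print(equal)
```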

Next, let's test the speed of its search:

import random
import time
from ThreadSearch import *
def load_examples(num):
  alphabet = list('abcdefghijklmnopqrstuvwxyz')
  randomone = []
  nn = ''
  for x in range(num):
    for x in range(26):
      nn += alphabet[random.randint(0, 25)]
    randomone.append(nn)
    nn = ''
  return randomone
div_num = 10
obj = 'a'
print('loading examples')
a = load_examples(10000)
print('loading examples finished')
preloaded = preload(a, div_num)
t1 = time.time()
x = search(preloaded, obj)
t2 = time.time()
print(t2-t1)

We use the time difference before and after running the search function to determine the speed.
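
The same before-and-after pattern can be wrapped in a small helper. This is only a sketch: timed is a hypothetical name, not part of ThreadSearch, and it uses time.perf_counter, the standard-library clock meant for measuring short intervals, in place of time.time.

```python
import time

def timed(func, *args):
    # Run func(*args) and return (result, elapsed seconds).
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

# Example: time a call to the built-in sorted().
result, elapsed = timed(sorted, [3, 1, 2])
print(result, elapsed)
```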

Run it:

As you can see, it took only about 0.03 seconds to search for 'a' among the 10,000 items generated by load_examples(10000).

But what if you use the traditional method?
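
For comparison, here is a minimal sketch of such a baseline, assuming "the traditional method" means a plain single-threaded loop over the whole list. The helper make_items is a hypothetical stand-in for load_examples, and no timing result is claimed here; run it on your own machine.

```python
import random
import string
import time

def make_items(count, length=26):
    # Build `count` random lowercase strings of the given length.
    return [''.join(random.choices(string.ascii_lowercase, k=length))
            for _ in range(count)]

items = make_items(10000)
obj = 'a'

t1 = time.perf_counter()
# Single-threaded search: one pass over the whole list.
found = [item for item in items if obj in item]
t2 = time.perf_counter()

print(len(found), 'matches in', t2 - t1, 'seconds')
```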

Try it yourself. My computer's CPU is not good, and even generating the random list is enough to push its usage into the red.

You can also try increasing the length of the list to 100 million to test my program. If it is a bit slow, adjust div_num.
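
One way to choose div_num is to time a few candidate values on your own data. The sketch below is self-contained, so chunked and threaded_search are hypothetical stand-ins for preload and search, and the candidate values are arbitrary. Timings vary by machine, so none are shown.

```python
import concurrent.futures
import random
import string
import time

def chunked(items, div_num):
    # Split items into at most div_num consecutive slices.
    step = max(1, -(-len(items) // div_num))  # ceiling division
    return [items[i:i + step] for i in range(0, len(items), step)]

def threaded_search(chunks, obj):
    # Search every chunk on its own thread; results keep chunk order.
    def find(chunk):
        return [x for x in chunk if obj in x]
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(chunks)) as executor:
        return list(executor.map(find, chunks))

items = [''.join(random.choices(string.ascii_lowercase, k=26))
         for _ in range(10000)]

timings = {}
for div_num in (1, 2, 5, 10, 20):
    chunks = chunked(items, div_num)
    t1 = time.perf_counter()
    threaded_search(chunks, 'a')
    timings[div_num] = time.perf_counter() - t1
    print(div_num, timings[div_num])
```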

Origin blog.csdn.net/walchina2017/article/details/125689587