`urlretrieve` does not accept a timeout argument; the timeout has to be set globally through the socket module:
import socket
socket.setdefaulttimeout(10)
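As a minimal sketch of how this applies (the URL and filename below are placeholders, not from the original post), the global socket timeout covers the sockets `urlretrieve` opens internally:

```python
import socket
import urllib.request

# Every socket created afterwards, including those opened internally
# by urlretrieve, will raise socket.timeout after 10 seconds of silence.
socket.setdefaulttimeout(10)

# Hypothetical usage; without the default timeout above, a stalled
# server would make this call block indefinitely.
# urllib.request.urlretrieve("http://example.com/file.zip", "file.zip")
```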
It also offers no connection pooling, so I switched to downloading files directly with requests:
def download_file(self, url, filename):
    # Stream the response so large files are not held in memory at once
    r = self.session.get(url, stream=True)
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=512):
            if chunk:  # skip keep-alive chunks
                f.write(chunk)
A crawler written with raw threads ran into a `can't start new thread` error: it worked fine on my machine, but this bug surfaced on someone else's.
The reason is that the threads were not destroyed after their work finished; they went into a sleeping state, so the number of live threads eventually exceeded the allowed maximum. It is in fact possible to reuse threads by overriding parts of Thread's initialization.
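One common way to reuse a fixed set of threads instead of creating one per task (a sketch of the general technique, not the original code; `run_with_workers` is an invented name) is to have workers consume tasks from a queue:

```python
import queue
import threading

def run_with_workers(target, args_list, num_workers=4):
    # A fixed pool of worker threads pulls tasks from a queue, so the
    # thread count stays at num_workers no matter how many tasks there are.
    q = queue.Queue()
    for arg in args_list:
        q.put(arg)

    def worker():
        while True:
            try:
                arg = q.get_nowait()
            except queue.Empty:
                return  # no more work: let this thread exit
            try:
                target(arg)
            finally:
                q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```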
Or, more simply, solve it with a thread pool:
from concurrent.futures.thread import ThreadPoolExecutor

def thread_run(target, args_list, max_thread=12):
    with ThreadPoolExecutor(max_thread) as executor:
        for arg in args_list:
            executor.submit(target, arg)
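For example (the `square` worker and its arguments are made up for illustration), each submitted call runs on one of the pool's threads, and leaving the with-block waits for all of them to finish:

```python
import threading
from concurrent.futures.thread import ThreadPoolExecutor

def thread_run(target, args_list, max_thread=12):
    # One task per argument; the with-block blocks until every
    # submitted task has completed.
    with ThreadPoolExecutor(max_thread) as executor:
        for arg in args_list:
            executor.submit(target, arg)

results = []
lock = threading.Lock()

def square(n):  # hypothetical worker function
    with lock:
        results.append(n * n)

thread_run(square, [1, 2, 3, 4], max_thread=2)
```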
Another problem was urllib3's warning: `Connection pool is full, discarding connection`.
It can be configured like this:
from requests.adapters import HTTPAdapter

adapter = HTTPAdapter(pool_connections=1, pool_maxsize=36, max_retries=1)
# requests picks the adapter with the longest matching URL prefix, so the
# default 'http://' and 'https://' adapters must be replaced explicitly.
session.mount('http://', adapter)
session.mount('https://', adapter)
The "pool is full" warning can still appear when multiple threads share the session, though. After I set pool_maxsize slightly larger than the thread count, the warning went away, but there may still be hidden problems in my code.
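Putting the pieces together (a sketch under my own naming; `make_session` is an invented helper, and 36 simply mirrors the pool_maxsize used above), a session can be built once and shared across threads:

```python
import requests
from requests.adapters import HTTPAdapter

def make_session(pool_maxsize=36):
    # Build a session whose per-host pool allows pool_maxsize concurrent
    # connections, so worker threads are less likely to exhaust the pool.
    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=1,
                          pool_maxsize=pool_maxsize,
                          max_retries=1)
    # Mount on both schemes so the custom adapter actually replaces
    # the defaults that requests installs for http:// and https://.
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session

session = make_session(pool_maxsize=36)
```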
It may also be related to the thread pool; I haven't looked at the thread pool's source yet. If so, the number of concurrent requests can be capped with a semaphore:
from threading import Semaphore

class AA():
    sem = Semaphore(12)  # allow at most 12 requests in flight
    ...
    def getHtml(self, url):
        with self.sem:  # releases the semaphore even if the request raises
            return self.session.get(url)
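To show the semaphore actually limiting concurrency (the `Limiter` class and its counters are invented for demonstration, and `time.sleep` stands in for `session.get`), at most three threads can be inside the guarded section at once:

```python
import threading
import time

class Limiter:
    sem = threading.Semaphore(3)  # cap: 3 threads in the guarded section

    def __init__(self):
        self.active = 0   # threads currently inside the section
        self.peak = 0     # highest concurrency observed
        self.lock = threading.Lock()

    def work(self):
        with self.sem:                  # at most 3 threads pass at once
            with self.lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            time.sleep(0.05)            # stand-in for session.get()
            with self.lock:
                self.active -= 1

limiter = Limiter()
threads = [threading.Thread(target=limiter.work) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```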