Python script series - batch-download dependency packages from the Tsinghua open-source mirror

1. Script display

        1. During the pipeline build, apk --update add --no-cache xxx is executed.

        2. It fails with: ERROR: xxx package mentioned in index not found (try 'apk update').

        3. The intranet environment is missing dependency packages. They have to be downloaded from the Tsinghua mirror, but locating each one from the error output and clicking through the downloads one by one is tedious.

        4. The initial plan was to download every dependency from the mirror site, but there are too many packages, and frequent pulls tend to trigger the anti-crawler protection. Instead, the error output is saved to E:\download\check.txt, and the script automatically extracts the package names and downloads them to the corresponding directories.

import urllib.request  # URL requests
import re  # regular expressions
import os  # directory handling
import time  # request throttling

'''
url        base download URL
pattern    regular expression used to extract package names
Directory  local download directory
'''


def BatchDownload(url, pattern, Directory):
    # Pretend to be a browser when visiting the site -> skip the anti-crawler mechanism
    headers = ('User-Agent',
               'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36')
    opener = urllib.request.build_opener()
    opener.addheaders = [headers]
    urllib.request.install_opener(opener)  # so urlretrieve below sends the same headers

    # To download from the mirror index directly, uncomment this line:
    # content = opener.open(url).read().decode('utf8')

    # Query the saved error output instead of pulling the whole mirror index
    with open(r'E:\download\check.txt', 'r') as f:  # open the file
        content = f.read()  # read the file as one string for regex matching
        content = content.replace(': ', '')  # glue 'ERROR' to the package name so the pattern can anchor on both keywords

    # Match the keyword pattern against the content
    raw_hrefs = re.findall(pattern, content)

    # When pulling the full mirror index, deduplicate the matches first
    # (a list is easy to iterate; a set removes duplicates -- choose as needed)
    # hset = list(set(raw_hrefs))

    # Create the local folders that will hold the downloaded packages
    dir_name = r'E:\download\main'
    if not os.path.isdir(dir_name):
        os.makedirs(dir_name)
    dir_name2 = r'E:\download\community'
    if not os.path.isdir(dir_name2):
        os.makedirs(dir_name2)


    # Download each matched package
    for href in raw_hrefs:
        # realhref = href.replace('href="', '')  # only needed when pulling the mirror index directly
        # Strip the ERROR/package markers (no-ops if the pattern already captured just the name)
        realhref = href.replace('ERROR', '').replace('package', '').strip()

        # main repository
        link = url + realhref + '.apk'
        filename = os.path.join(Directory, realhref + '.apk')

        # community repository, used as a fallback
        url2 = 'http://mirrors.tuna.tsinghua.edu.cn/alpine/v3.14/community/aarch64/'
        link2 = url2 + realhref + '.apk'
        Directory2 = r'E:\download\community'
        filename2 = os.path.join(Directory2, realhref + '.apk')

        # The download location of a dependency is not unique: if main does not have it, try community
        try:
            print("Downloading", filename)
            urllib.request.urlretrieve(link, filename)
        except IOError:
            print("main has no such package, trying community:", filename2)
            urllib.request.urlretrieve(link2, filename2)
            print("community/" + realhref + ".apk   OK!")
        else:
            print("main/" + realhref + ".apk   OK!")

        # Sleep between requests, otherwise the site may treat this as an attack (anti-crawler)
        time.sleep(1)


# Download every dependency from the mirror index directly
# BatchDownload('https://mirrors.tuna.tsinghua.edu.cn/alpine/v3.14/community/aarch64/', r'\bhref\S*?.apk\b', r'E:\download')
# Download only the dependencies named in the saved error output
BatchDownload('http://mirrors.tuna.tsinghua.edu.cn/alpine/v3.14/main/aarch64/', r'ERROR(\S+) package', r'E:\download\main')
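The commented-out mirror-index mode can be sanity-checked offline. A minimal sketch, using a hypothetical anchor tag standing in for a line of the mirror's index page (the real page layout may differ):

```python
import re

# Hypothetical anchor tag as it might appear in the mirror's index HTML
html = '<a href="skopeo-1.3.1-r2.apk">skopeo-1.3.1-r2.apk</a>'

# The same pattern the commented-out call uses for the index page
hrefs = re.findall(r'\bhref\S*?.apk\b', html)

# Each match still carries the href=" prefix, which the script strips
pkg = hrefs[0].replace('href="', '')
print(pkg)  # skopeo-1.3.1-r2.apk
```

This is why the `replace('href="', '')` line is kept as a comment in the download loop: it only applies when the matches come from the index HTML rather than from the error output.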

2. Error message display

        Save the error output for the missing dependencies to E:\download\check.txt; the script then matches the package names (e.g. skopeo-1.3.1-r2.apk) with the regular expression and downloads them to the corresponding directory E:\download\xxx\.
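The matching step can be checked against a hypothetical error line in the format shown above (the exact wording of the apk message is assumed):

```python
import re

# Hypothetical line saved to check.txt (format assumed from the apk error above)
content = "ERROR: skopeo-1.3.1-r2 package mentioned in index not found (try 'apk update')"

# Same preprocessing as the script: remove ': ' so the name follows ERROR directly
content = content.replace(': ', '')

# Capture the package name sitting between the ERROR and package keywords
names = re.findall(r'ERROR(\S+) package', content)
print(names)  # ['skopeo-1.3.1-r2']
```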

3. Execution result display


Origin blog.csdn.net/weixin_39855998/article/details/128250268