[Python] [Crawler] Crawling images from a site with no anti-scraping measures

This is about the simplest crawler there is, aimed at a site with no anti-scraping measures at all. If that is not what you came for, feel free to skip this post~

One, the problem

Because of the epidemic, all school courses have moved to online teaching, and course groups of every kind have sprung up. I want to collect the QR codes of all the course groups in one go so I can look them up later. What should I do?

Two, the principle

1. Work out the link format of the web page's images

Open the website the school designated for looking up course QR codes and find the image link returned by the HTTP request for a QR code image. All the image links turn out to share one format: http://xxx.cn/os/pic/ + course number-course serial number + .jpg. So once each "course number-course serial number" is spliced into that pattern, the images can be crawled directly.
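As a small illustration of the splicing step, here is a minimal sketch. The host xxx.cn is the placeholder used in this post, and build_url and the two course IDs are made-up names and values for demonstration only.

```python
# Placeholder prefix from the post; the real host is not given.
BASE_URL = "http://xxx.cn/os/pic/"


def build_url(course_id):
    """Splice one 'course number-course serial number' into the image link."""
    # str() guards against pandas handing back a non-string value;
    # strip() drops stray spaces copied along with the ID.
    return BASE_URL + str(course_id).strip() + ".jpg"


# Hypothetical IDs, only to show the shape of the result.
course_ids = ["101-01", "205-03"]
urls = [build_url(c) for c in course_ids]
print(urls)
```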

2. Use the requests package to download the images

The simplest crawler using the requests package:

import requests
r = requests.get(url)  # url is one image link; r.content then holds the image bytes

After that, all that remains is to save the response content to a file.
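The fetch-then-save step might look roughly like the sketch below. save_image, its fetch parameter, and the 10-second timeout are assumptions made for illustration and are not part of the original script. The fetch parameter exists only so the function can be tried without touching the network.

```python
from pathlib import Path


def save_image(url, path, fetch=None, timeout=10):
    """Download url and write the raw bytes to path; return the byte count."""
    if fetch is None:
        # Imported lazily so the function can be exercised with a stand-in fetch.
        import requests
        fetch = requests.get
    r = fetch(url, timeout=timeout)
    r.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    Path(path).write_bytes(r.content)  # r.content is the response body as bytes
    return len(r.content)
```

Calling save_image(url, "101-01.jpg") with a real image link would then write the image next to the script.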

Three, the solution

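# Read every "course number-course serial number" from the CSV and splice it into an image link
import pandas as pd
# 'ANSI' is a Windows-only alias for the system code page (mbcs) codec
csv_data = pd.read_csv('D:/.../999 project/5 testing_file/num.csv', encoding = 'ANSI')
urls = []
for i in csv_data['num']:
    urls.append("http://xxx.cn/os/pic/" + str(i) + ".jpg")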

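import requests
import os

root = "C://...//Desktop//lesson_QR_code/"

# Create the output directory if it does not exist yet
os.makedirs(root, exist_ok=True)

# Crawl the course QR codes
for num, url in enumerate(urls, start=1):
    name = url.split('/')[-1]
    path = root + name
    try:
        # Download and save the QR code only if the file is not there yet
        if not os.path.exists(path):
            r = requests.get(url, timeout=10)  # in practice, this line is the whole crawler
            r.raise_for_status()  # do not save an HTTP error page as a .jpg
            with open(path, 'wb') as f:
                f.write(r.content)  # write the binary content to the file
            print(num, "file saved")
        else:
            print(num, "file already exists")
    except Exception:
        # On any error, print the file name of the course that failed
        print(name)

print("Crawl finished")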
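The loop above saves whatever bytes the server sends back. The post does not say how the site responds to an unknown course number, so as a cautious extra, the sketch below checks that a payload at least starts like a JPEG. JPEG files begin with the bytes FF D8 FF. looks_like_jpeg is a hypothetical helper name, not something from the original script.

```python
JPEG_MAGIC = b"\xff\xd8\xff"  # the first three bytes of every JPEG file


def looks_like_jpeg(data):
    """Return True if the byte string starts with the JPEG signature."""
    return data[:3] == JPEG_MAGIC


# A JPEG-like header passes; an HTML error page does not.
print(looks_like_jpeg(b"\xff\xd8\xff\xe0" + b"\x00" * 10))  # True
print(looks_like_jpeg(b"<html>Not Found</html>"))           # False
```

In the crawl loop, such a check could be applied to r.content before the file is written.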

Four, reflection

In fact, the course number-course serial number list was copied and pasted by hand from the educational administration system. I have not yet worked out how to log in to that system from code and get past its verification code. Once that problem is solved, the manual copy and paste will no longer be needed (and once it is solved, the same code could be used to grab courses...).

Origin blog.csdn.net/why_not_study/article/details/104592877