[Python in Practice] Collecting University Academic Affairs System Transcripts with Python

foreword

In modern education, the educational administration system has become an important part of school management and teaching. However, for various reasons, many such systems do not let students download their transcripts, which causes a lot of unnecessary trouble. Collecting the transcript data from the educational administration system ourselves is therefore a useful project.

Table of contents

foreword

environment use

module use

module introduction

Code

send request

retrieve data

save data

Summarize


environment use

  • python 3.9
  • pycharm

module use

  • requests

module introduction

  • requests

        requests is a practical Python HTTP client library, often used for writing crawlers and for testing server responses. It is a third-party library dedicated to sending HTTP requests, and it is much simpler to use than urllib.

  • parsel

        parsel is a third-party Python library that combines CSS selectors, XPath, and regular expressions.

parsel is developed by the Scrapy team, who extracted Scrapy's selector component into a standalone library. It makes it easy to parse HTML and XML content and obtain the required data.

Compared with BeautifulSoup, XPath with parsel is generally more efficient and easier to use.

  • re

        The re module is Python's built-in module for matching strings with regular expressions. Regular expressions perform pattern (fuzzy) matching on strings and extract the parts you need. They are common to almost all programming languages.

  • os

        os is short for "operating system". As the name suggests, the os module provides an interface for Python programs to interact with the operating system. Using it makes that interaction easy and also makes the code much more portable.

  • csv

        The csv module reads and writes CSV (comma-separated values) files, a format that can be opened with Excel or a text editor. Data fields are separated by half-width commas (other delimiters can also be used), and Excel shows each field in its own column. A CSV file stores tabular data as plain text and is compatible with various operating systems.
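As a small illustration of the "match and extract" idea behind the re module, the sketch below compares greedy and non-greedy matching on a made-up HTML fragment. The fragment and its contents are assumptions for demonstration only, not data from a real system.

```python
import re

# Hypothetical HTML fragment, invented for this example.
html = '<span data-typeid="1">Usual: 90</span><span data-typeid="2">Final: 85</span>'

# Greedy: the first .* swallows as much as it can, so only the text of the
# last span ends up in the capture group.
greedy = re.findall(r"<span.*>(.*)</span>", html)

# Non-greedy: .*? stops at the earliest possible point, giving one item per span.
lazy = re.findall(r"<span.*?>(.*?)</span>", html)

print(greedy)  # ['Final: 85']
print(lazy)    # ['Usual: 90', 'Final: 85']
```

The non-greedy form is the one that suits extracting several small pieces from one string.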

Module installation problems:

  • To install third-party Python modules:

Press Win + R, enter cmd and click OK, then type the installation command pip install <module name> (for example, pip install requests) and press Enter.

Alternatively, click Terminal in PyCharm and enter the installation command there.

  • Reasons for installation failure:
  • Failure one: pip is not recognized as an internal command

                Solution: set the environment variable (add Python to PATH)

  • Failure two: a lot of red error text appears (read timed out)

                Solution: the network connection timed out, so switch to a mirror source

   

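    Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple
    Alibaba Cloud: https://mirrors.aliyun.com/pypi/simple/
    University of Science and Technology of China: https://pypi.mirrors.ustc.edu.cn/simple/
    Huazhong University of Science and Technology: https://pypi.hustunique.com/
    Shandong University of Technology: https://pypi.sdutlinux.org/
    Douban: https://pypi.douban.com/simple/
    For example: pip3 install -i https://pypi.doubanio.com/simple/ <module name>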
  • Failure three: cmd shows that the module is already installed, or that the installation succeeded, but it still cannot be imported in PyCharm

                Solution: multiple Python versions may be installed (keep only one of Anaconda or Python and uninstall the other), or the Python interpreter in PyCharm is not set correctly.
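For the third failure, one quick check is to ask the interpreter itself where it lives and whether it can see the module. The following is a minimal sketch using only the standard library; the helper name is_installed is made up for this example.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if this interpreter can find the given top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # The interpreter actually running this script; compare it with the one
    # selected in PyCharm's project interpreter settings.
    print("Interpreter:", sys.executable)
    for name in ("requests", "parsel"):
        print(name, "installed" if is_installed(name) else "missing")
```

If the path printed here differs from the Python that pip installed into, the two are different installations, which would explain the import error.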

Code

The previous article introduced how to collect emotional audio with Python. Today, let's learn to collect transcripts from the educational administration system, starting with our own grades. You can also try this on your own school's educational administration system.

send request

First we determine the target URL and the data we need to obtain.

 

We need the data in each row of the transcript, so next we open the browser's developer tools to see where the grades come from. Are they in the source code of the web page? To find out, we send a request and fetch the page source.

Each school's educational administration system is different, but the principle is the same. Packet capture analysis shows that our school returns the grades in data packets, one packet per semester.

 http://jwxt.aqnu.edu.cn/student/for-std/grade/sheet/info/73127?semester={Semester}

The next step is simple: we only need to request the data. We must include cookies, of course, because they carry our login information.

Let's see how the code is written.

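import csv
import re

import requests

# Semester IDs used by this school's system
semesters = ["44", "45", "46", "66", "126"]
for Semester in semesters:
    url = f'http://jwxt.aqnu.edu.cn/student/for-std/grade/sheet/info/73127?semester={Semester}'

    headers = {
        'Cookie': 'cookies',  # placeholder: replace with your own cookie string
    }
    res = requests.get(url, headers=headers)
    print(url, res)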

Here we loop over all the semesters directly. Our school only checks cookies and has no other requirements for request headers. Because cookies contain login information, I won't show mine here. Use your own cookies when collecting from your school's educational administration system.
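The request step can also be wrapped in a small function with a timeout and a status check. This is only a sketch under assumptions: the names BASE_URL, build_headers, and fetch_semesters are invented here, and the session argument is assumed to be a requests.Session (or any object with a compatible get() method). Whether an expired cookie produces an error status or a login page depends on the school's system, so treat the status check as a first line of defence only.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

# Endpoint taken from the article; the semester ID goes in the query string.
BASE_URL = "http://jwxt.aqnu.edu.cn/student/for-std/grade/sheet/info/73127"


def build_headers(cookie: str) -> Dict[str, str]:
    """Only the Cookie header is sent, as in the article's code."""
    return {"Cookie": cookie}


def fetch_semesters(session: Any, cookie: str, semesters: Iterable[str],
                    timeout: float = 10.0) -> Iterator[Tuple[str, dict]]:
    """Yield (semester_id, parsed_json) for each semester ID.

    `session` is assumed to behave like requests.Session.
    """
    headers = build_headers(cookie)
    for semester_id in semesters:
        res = session.get(BASE_URL, params={"semester": semester_id},
                          headers=headers, timeout=timeout)
        res.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
        yield semester_id, res.json()


# Usage (needs the third-party requests package and a valid cookie):
#   import requests
#   for semester_id, data in fetch_semesters(requests.Session(), "your-cookie", ["44", "45"]):
#       print(semester_id, list(data))
```

Passing the session in as a parameter keeps the function easy to try out with a stand-in object before using real login cookies.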

retrieve data

data = res.json()  # parse the JSON body once
id2semesters = data['id2semesters']
semester = id2semesters[Semester]['nameZh']  # semester name
semesterId2studentGrades = data['semesterId2studentGrades'][Semester]  # grade records for this semester

This code fetches the semester name and the student grade records from the JSON response and stores them in the variables semester and semesterId2studentGrades.
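To make the shape of that JSON easier to picture, here is a minimal sketch using a made-up sample payload. Only the key names (id2semesters, nameZh, semesterId2studentGrades, course, credits) come from the code above. The values, the helper name semester_grades, and the idea that a semester without grades may be missing from the mapping are all assumptions.

```python
# Hypothetical sample response; the real payload contains more fields.
sample = {
    "id2semesters": {
        "44": {"nameZh": "2020-2021-1"},
        "45": {"nameZh": "2020-2021-2"},
    },
    "semesterId2studentGrades": {
        "44": [{"course": {"nameZh": "Advanced Mathematics", "credits": 4}}],
    },
}


def semester_grades(data: dict, semester_id: str):
    """Return (semester name, list of grade records) for one semester ID."""
    name = data["id2semesters"][semester_id]["nameZh"]
    # Assumption: a semester with no grades may be absent, so default to [].
    grades = data.get("semesterId2studentGrades", {}).get(semester_id, [])
    return name, grades


name, grades = semester_grades(sample, "44")
print(name, len(grades))
```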

for semesterId2studentGrade in semesterId2studentGrades:
    course = semesterId2studentGrade['course']  # course

    course_nameZh = course['nameZh']  # course name
    credits = course['credits']  # course credits
    try:
        courseProperty = semesterId2studentGrade['courseProperty']
        courseProperty_name = courseProperty['name']  # course type
    except TypeError:
        # courseProperty is None when the course type is missing
        courseProperty_name = "NOLL"

    gp = semesterId2studentGrade['gp']  # grade point

    gaGrade = semesterId2studentGrade['gaGrade']  # grade
    gradeDetails = semesterId2studentGrade['gradeDetail']  # raw grade details (HTML)
    gradeDetail = re.findall('data-typeid=.*?>(.*?)</span>', gradeDetails)

Then we use a for loop to iterate over each element in semesterId2studentGrades and read its course entry to get the course information for that semester.

This is just reading values from JSON, so there is nothing difficult here. Once this is written, we get the content we want. Let's see the result.
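The per-record extraction can also be written as one function that turns a grade record into a flat dictionary. This is a sketch, not the article's code: the function name flatten_grade, the English output keys, the "N/A" placeholder, and the sample record are assumptions, and `or {}` is used here as an alternative to the try/except for a missing course type.

```python
import re

# Same pattern as in the article, compiled once for reuse.
DETAIL_PATTERN = re.compile(r"data-typeid=.*?>(.*?)</span>")


def flatten_grade(record: dict) -> dict:
    """Flatten one grade record into a plain dictionary."""
    course = record["course"]
    # courseProperty may be None; fall back to an empty dict in that case.
    course_property = record.get("courseProperty") or {}
    return {
        "course_name": course["nameZh"],
        "credits": course["credits"],
        "course_type": course_property.get("name", "N/A"),
        "grade": record["gaGrade"],
        "gp": record["gp"],
        # gradeDetail is an HTML string; keep only the text inside each span.
        "details": DETAIL_PATTERN.findall(record.get("gradeDetail") or ""),
    }


# Hypothetical record, invented for this example.
sample_record = {
    "course": {"nameZh": "Advanced Mathematics", "credits": 4},
    "courseProperty": None,
    "gp": 3.7,
    "gaGrade": "87",
    "gradeDetail": '<span data-typeid="1">Usual: 90</span>'
                   '<span data-typeid="2">Final: 85</span>',
}

flat = flatten_grade(sample_record)
print(flat)
```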

save data

Saving data is easy, and we've done this a lot.


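# Append mode; utf-8-sig writes a BOM so Excel displays the Chinese text correctly
f = open('个人成绩单.csv', mode='a', encoding='utf-8-sig', newline='')
# Columns: semester, course name, course credits, course type, grade, grade point, grade details
csv_writer = csv.DictWriter(f, fieldnames=['学期','课程名称', '课程学分', '课程类型', '成绩', '学分绩点',
                                           '成绩明细'])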

We open the CSV file in append mode and create a csv.DictWriter. The fieldnames parameter defines the columns: semester, course name, course credits, course type, grade, grade point, and grade details. Rows are then written with the csv_writer.writerow() method.

The next step is to put the values into a dictionary and save it.

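dit = {
    '学期': semester,  # semester
    '课程名称': course_nameZh,  # course name
    '课程学分': credits,  # course credits
    '课程类型': courseProperty_name,  # course type
    '成绩': gaGrade,  # grade
    '学分绩点': gp,  # grade point
    '成绩明细': gradeDetail,  # grade details (a list of strings)
}

csv_writer.writerow(dit)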

This code uses the csv_writer.writerow() method to write the dit dictionary to the CSV file. The fieldnames parameter specifies the column names to write. In this example, we specified ['学期','课程名称', '课程学分', '课程类型', '成绩', '学分绩点', '成绩明细'].
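Because the file is opened in append mode, the code above never writes a header row, and the file handle is never closed explicitly. One possible variation, sketched below, writes the header only when the file is new or empty and uses a with block so the file is closed automatically. The function name append_rows and the choice to join the grade details with "; " are assumptions made for this example.

```python
import csv
import os

FIELDNAMES = ["学期", "课程名称", "课程学分", "课程类型", "成绩", "学分绩点", "成绩明细"]


def append_rows(path: str, rows: list) -> None:
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, mode="a", encoding="utf-8-sig", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if need_header:
            writer.writeheader()
        for row in rows:
            # Join the list of grade details into one readable cell.
            writer.writerow({**row, "成绩明细": "; ".join(row["成绩明细"])})
```

Calling append_rows('个人成绩单.csv', [dit]) inside the loop would then produce a file with a single header row followed by one line per course.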

Summarize

In short, collecting transcripts from the educational administration system is a worthwhile practice project. It not only gets us our transcripts but also improves our data collection skills. During implementation, we need to pay attention to the accuracy and completeness of the data, and take the necessary measures to keep the project safe and reliable.


Origin blog.csdn.net/BROKEN__Y/article/details/131116527