Implementing a simulated login in JavaScript

Source website: JavaScript implements simulated login – WhiteNight's Site

June 1, 2023

Recently I was working on crawling the educational administration system of Shenzhen University of Technology, and I wrote up these notes once I got it working.

In fact, once the simulated login works, the rest is simple: all that remains is scraping the timetable and converting it to JSON.

Preparation and the puppeteer package

A high-level API to control headless Chrome over the DevTools Protocol

That is the official description, in English. Let's go straight to how to use it.

First import the required npm packages, then prepare the account name and password along with the URL of the login page. The educational administration system of Shenzhen University of Technology serves as the example here.

const puppeteer = require('puppeteer');
const fs = require('fs');

const loginUrl = 'https://auth.sztu.edu.cn/idp/authcenter/ActionAuthChain?entityId=jiaowu';
const targetUrl = 'https://jwxt.sztu.edu.cn/jsxsd/framework/xsMain.htmlx';
const username = 'xxxxxxxxxxx'; 
const password = 'xxxxxxxxxxx'; 

Once that is ready, use puppeteer to launch a browser and open the page. A school's educational administration system is often reachable only from the campus intranet, so it is usually inaccessible from an external network or while connected to a VPN. It is therefore a good idea to wrap the code in try-catch so that exceptions are caught for later debugging. Also, before running any further tests, open the login page in a normal browser first to check that it loads.

const puppeteer = require('puppeteer');
const fs = require('fs');

const loginUrl = 'https://auth.sztu.edu.cn/idp/authcenter/ActionAuthChain?entityId=jiaowu';
const targetUrl = 'https://jwxt.sztu.edu.cn/jsxsd/framework/xsMain.htmlx';
const username = 'xxxxxxxxxxx'; 
const password = 'xxxxxxxxxxx'; 

async function simulateLogin() {
    try {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();

        await page.goto(loginUrl);

    } catch (error) {
        console.error('Login failed:', error);
    }
}

simulateLogin();
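
The reachability check suggested above can be pulled out into a small helper. This is only a sketch: `canOpen` is a hypothetical name, and the 10-second timeout is an arbitrary value to adjust. It relies on `page.goto` rejecting when navigation fails or times out.

```javascript
// Hypothetical helper: resolves to true if the page loads, false otherwise.
// `page` is a Puppeteer Page; timeoutMs is an arbitrary default (assumption).
async function canOpen(page, url, timeoutMs = 10000) {
    try {
        // page.goto rejects on network errors and when the timeout elapses.
        await page.goto(url, { timeout: timeoutMs });
        return true;
    } catch (error) {
        console.error(`Cannot open ${url}:`, error.message);
        return false;
    }
}
```

Inside `simulateLogin`, a call such as `if (!(await canOpen(page, loginUrl))) return;` would stop early when the intranet is unreachable, instead of failing later at the first input box.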

So what else is needed? A simulated login, to put it bluntly, is still just a login.

General login process:

  • Open the login screen
  • Enter the account and password in the specified text box
  • Click the login button

More detailed login process:

  • Open the login page; the browser initializes the page from its HTML
  • Enter the account and password in the specified text boxes; once input is complete, the values are assigned to the page's account and password fields
  • Click the login button to trigger its event handler, which sends the account and password to the server for verification (most sites encrypt or hash the password first, for example with DES or MD5; look for the relevant function in the page's source code)
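
When puppeteer types into the real page, the page's own script takes care of any password encryption. Reproducing it yourself only matters if you send the credentials without the browser. As a rough illustration, assuming a site that uses a plain hex-encoded MD5 digest (an assumption, not something verified for any particular school system), Node's built-in crypto module is enough; `md5Hex` is a hypothetical helper name.

```javascript
const { createHash } = require('crypto');

// Hex-encoded MD5 digest of a string. Whether a given site uses plain MD5,
// salted MD5, DES, or something else must be checked in its page source.
function md5Hex(text) {
    return createHash('md5').update(text, 'utf8').digest('hex');
}

console.log(md5Hex('hello')); // 5d41402abc4b2a76b9719d911017c592
```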

We have now completed the first step. The next step is to assign values to the account and password fields of the current page.

Right-click the account and password text boxes and choose "Inspect". Taking the educational administration system of Shenzhen University of Technology as an example:

<input type="text" class="inputLogin leoPwd" name="j_username" id="j_username" value="工号" title="202x0020xxxx" onmouseover="this.title=this.value" onblur="inputOnblur(this); verCodeDisplay('/idp', 'j_username', '1', 'yzm1Bar', true)" maxlength="64">

The element inspected here is the account input box; both its name and its id are "j_username". With a name or id available, the rest is easy: select the element on the current page and give it a value. The password box works the same way, so here is the code:

const puppeteer = require('puppeteer');
const fs = require('fs');

const loginUrl = 'https://auth.sztu.edu.cn/idp/authcenter/ActionAuthChain?entityId=jiaowu';
const targetUrl = 'https://jwxt.sztu.edu.cn/jsxsd/framework/xsMain.htmlx';
const username = 'xxxxxxxxxxx'; 
const password = 'xxxxxxxxxxx'; 

async function simulateLogin() {
    try {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();

        await page.goto(loginUrl);
 
        await page.type('input[name="j_username"]', username);
        await page.type('input[name="j_password"]', password);

    } catch (error) {
        console.error('Login failed:', error);
    }
}

simulateLogin();
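
One detail in the inspected HTML: the account input starts with a pre-filled value (`value="工号"`). Whether the page clears it on focus is not verified here, so one defensive option is to empty the field before typing. A minimal sketch; `clearAndType` is a hypothetical helper name.

```javascript
// Hypothetical helper: empty an input, then type into it.
// `page` is a Puppeteer Page; `selector` is any CSS selector.
async function clearAndType(page, selector, text) {
    // Runs in the browser context: reset whatever value the field starts with.
    await page.$eval(selector, (el) => { el.value = ''; });
    // page.type focuses the element and sends key events for each character.
    await page.type(selector, text);
}
```

It would replace the plain `page.type` calls, for example `await clearAndType(page, '#j_username', username);`, where `#j_username` uses the id found by inspecting the input.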

Simulated login

Do you need a page jump?

Now there is one final step: simulating the button click.

But do you really need to "click" a button? Strictly speaking, no. The button exists only to trigger an event handler, so instead of clicking it you can trigger that handler directly.

Since we are using puppeteer anyway, though, simulating the click is simpler and saves a lot of trouble. Triggering the handler directly is still worth keeping in mind, because some pages do not wire the event to a <button>; in that case you have no choice but to trigger it yourself.
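
As an illustration of triggering the event without a click, the sketch below submits the form that owns an input. It assumes the login inputs sit inside a `<form>`, which has not been checked for the page discussed here; `submitFormOf` is a hypothetical helper name.

```javascript
// Hypothetical helper: submit the form that contains the given input,
// without clicking any button. Assumes the input sits inside a <form>.
async function submitFormOf(page, inputSelector) {
    await page.$eval(inputSelector, (el) => {
        if (!el.form) {
            throw new Error('Input is not inside a <form>');
        }
        // requestSubmit() fires the submit event (and its handlers);
        // plain submit() would skip them.
        el.form.requestSubmit();
    });
}
```

If the page attaches its logic to something other than a form submit, for example an onclick handler on a `<div>`, the function passed to `page.$eval` would have to call that handler instead.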

For my school's educational administration system, clicking the button jumps straight to the main interface. Everything after that is loaded dynamically, that is, on the same URL. For example:

  • Teaching interface: https://white-night.club
  • Class schedule interface: https://white-night.club

Now suppose you are scraping your own school's class schedule as I am, but your class schedule page has its own URL. For example:

  • Teaching interface: https://white-night.club
  • Course schedule interface: https://white-night.club/courses

So there are two cases. If your situation is the same as mine, the jump is optional, although I personally recommend writing it just in case.

In the second case, I strongly recommend writing the jump. Otherwise you have to write a series of simulated button clicks. Moreover, on some pages the event is not triggered by a button component at all, and then you need to dig through the page's HTML to trigger it "manually", which is a lot of trouble. It is better to jump directly to the class schedule page.

Again taking the educational administration system of Shenzhen University of Technology as the example, the code is as follows:

const puppeteer = require('puppeteer');
const fs = require('fs');

const loginUrl = 'https://auth.sztu.edu.cn/idp/authcenter/ActionAuthChain?entityId=jiaowu';
const targetUrl = 'https://jwxt.sztu.edu.cn/jsxsd/framework/xsMain.htmlx';
const username = 'xxxxxxxxxxx'; 
const password = 'xxxxxxxxxxx'; 

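async function simulateLogin() {
    let browser;
    try {
        browser = await puppeteer.launch();
        const page = await browser.newPage();

        await page.goto(loginUrl);

        await page.type('input[name="j_username"]', username);
        await page.type('input[name="j_password"]', password);

        // Click the login button and wait for the resulting navigation together,
        // so the navigation cannot finish before we start waiting for it.
        // NOTE: '#loginButton' is a placeholder, not the real selector; inspect
        // the login button on the page and substitute its selector here.
        await Promise.all([
            page.waitForNavigation(),
            page.click('#loginButton'),
        ]);

        // Keep this line if you want the explicit jump
        await page.goto(targetUrl);

        // Network speed varies, so set a wait time that lets the page load fully.
        // (page.waitForTimeout was deprecated and later removed from Puppeteer.)
        await new Promise((resolve) => setTimeout(resolve, 2000));
    } catch (error) {
        console.error('Login failed:', error);
    } finally {
        // Close the browser so the Node.js process can exit
        if (browser) await browser.close();
    }
}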

simulateLogin();

The next step is to get the HTML of the class schedule, process it with string splitting and regular expressions, and save the result to a JSON file.
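
As a rough sketch of that processing step, the function below pulls the cell text out of an HTML table with regular expressions and prints it as JSON. The sample markup and the name `tableToRows` are made up for illustration; the real timetable page will have its own structure, and a regex approach only suits simple, non-nested tables.

```javascript
// Extract the cell text of every <tr> in an HTML string.
// Regex-based, so it only suits simple, non-nested tables (assumption).
function tableToRows(html) {
    const rows = [];
    const rowPattern = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
    const cellPattern = /<t[dh]\b[^>]*>([\s\S]*?)<\/t[dh]>/gi;
    for (const rowMatch of html.matchAll(rowPattern)) {
        const cells = [];
        for (const cellMatch of rowMatch[1].matchAll(cellPattern)) {
            // Drop inner tags, collapse whitespace, trim.
            const text = cellMatch[1].replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
            cells.push(text);
        }
        if (cells.length > 0) rows.push(cells);
    }
    return rows;
}

// Made-up sample table, not taken from the real system.
const sample = '<table><tr><th>Day</th><th>Course</th></tr>' +
               '<tr><td>Mon</td><td><b>Math</b> Room 101</td></tr></table>';
console.log(JSON.stringify(tableToRows(sample)));
// [["Day","Course"],["Mon","Math Room 101"]]
```

Writing the result to disk is then a single `fs.writeFileSync('schedule.json', JSON.stringify(rows, null, 2))` call, where `schedule.json` is again just an example file name.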

Writer's note

While discussing this with some friends, someone pointed out that our school's timetable page has a button that saves the timetable locally as an Excel file. That is the most convenient way to obtain the timetable, and if you don't need any further data processing, saving it as Excel is worth a try.
For further data processing, however, this method is a problem. The file is saved to the browser's download path, which cannot be customized. Even if it could be, I would need an extra step in later operations to find the file path, and finding that path automatically is extremely troublesome: 100 users will have 99 different paths, in English and Chinese, custom and default.
Users could, of course, pick the file manually through a file picker. But think this through and you will find many things that would need changing later or that are very troublesome to implement. For example: the school may change its naming rules for the Excel file; if the user imports repeatedly, you must keep the newest import and delete the previously imported file; and if the user has the previously imported file open in the background, it becomes "read-only" and cannot be deleted.
None of this is impossible, but I don't recommend it. As a user, I would want as few manual steps as possible.
So in the end I chose to save the HTML of the schedule page and work on that. Saved this way, the HTML sits directly in the root directory of the JavaScript program, which makes it very convenient to work with.
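
A minimal sketch of that saving step, assuming `page` is the puppeteer page already showing the schedule. The helper name `savePageHtml` and the file name `schedule.html` are made up for illustration, and the default directory is the one the script is started from.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: write the current page's HTML into `dir`.
// 'schedule.html' is a made-up default file name.
async function savePageHtml(page, dir = process.cwd(), fileName = 'schedule.html') {
    const html = await page.content(); // full serialized HTML of the page
    const filePath = path.join(dir, fileName);
    fs.writeFileSync(filePath, html, 'utf8');
    return filePath;
}
```

Because the program chooses the location itself, later steps can read the file back with `fs.readFileSync` without asking the user where it went.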

Tags: JavaScript, crawlers

Origin blog.csdn.net/white_night_SZTU/article/details/130984301