Crawler in practice: batch-filling questionnaires while simulating real people

Foreword

Before the summer vacation, the school assigned a social practice task that involved getting a questionnaire filled out. It was a hassle, so I put it off until school was about to start. Yesterday, under the sacred power of the deadline, I finally got going. After shopping around on Taobao, though, I found that paid questionnaire filling costs about 1 yuan per response on average. Can this be tolerated? It cannot, hence this article. It mainly looks at how to simulate real people filling out a questionnaire; see below for details.


Packet capture

The questionnaire has two types of questions: single-choice and multiple-choice. A screenshot of one of the questions is as follows:

insert image description here

The packet captured when answering this question is as follows:

Request header:

insert image description here

Request body:

insert image description here

Response body:

insert image description here

Now let's analyze the packet capture. In the request header, x-uuid and cookie change with every questionnaire. In the request body, id also changes with every questionnaire, answer holds the option I chose, and the other fields stay the same. The response body carries the content of the next question.

From this analysis, the fields to focus on are x-uuid and cookie in the request header and id in the request body.

Get the cookie

Let's get the cookie first. The first step is to find the request whose response sets the cookie, as follows:

insert image description here

No fields need special attention here. Just send the request directly and read the cookie from the response, as follows:

insert image description here
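
Since the original code only survives as a screenshot, here is a minimal sketch of the same step using only the Python standard library. The URL is whatever the cookie-setting request turned out to be in your own capture (it is a parameter here, not something from the article), and the sample Set-Cookie values are made up for illustration.

```python
from http.cookies import SimpleCookie
from urllib.request import urlopen


def cookie_header_from_set_cookie(set_cookie_values):
    """Turn Set-Cookie response header values into one Cookie request header value."""
    pairs = []
    for raw in set_cookie_values:
        jar = SimpleCookie()
        jar.load(raw)  # parses "name=value; Path=/; ..." and keeps attributes apart
        for name, morsel in jar.items():
            pairs.append(f"{name}={morsel.value}")
    return "; ".join(pairs)


def fetch_cookie_header(url):
    """Call the cookie-setting URL (hypothetical, taken from your own capture)."""
    with urlopen(url) as resp:
        return cookie_header_from_set_cookie(resp.headers.get_all("Set-Cookie") or [])


# Offline demonstration with made-up header values.
sample = ["sid=abc123; Path=/; HttpOnly", "token=xyz; Max-Age=3600"]
print(cookie_header_from_set_cookie(sample))
```

Because every questionnaire needs a fresh cookie, fetch_cookie_header would be called once per submission.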


Get x-uuid

After getting the cookie, we need the x-uuid. This requires reverse engineering the JS code, so first locate where the value is generated:

insert image description here

insert image description here

As you can see, the x-uuid is ultimately generated by the gl function, which is passed two parameters: Vl(new Date) and Math.random(). Let me test them in the console first:

insert image description here

You can see that Vl(new Date) gets the current time and formats it, Math.random() produces a random number, and gl(t, n) hashes the data passed in. Look at the result first: it is a 32-character hexadecimal string, so my first guess was MD5, with the time as the plaintext and the random number as the salt. Let's verify the guess:

insert image description here

That did not match. But when I dropped the random number, luck came:

insert image description here
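
The same guess-and-check can be sketched in Python. This is only an illustration of the idea: the three candidate combinations and the sample strings are my own assumptions, and in practice time_str, rand_str and observed would be copied from the browser console.

```python
import hashlib


def md5_hex(text):
    """Hex MD5 digest of a string (32 hexadecimal characters)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def matching_hypotheses(time_str, rand_str, observed):
    """Return the names of the candidate formulas that reproduce the observed value."""
    candidates = {
        "md5(time + random)": md5_hex(time_str + rand_str),
        "md5(random + time)": md5_hex(rand_str + time_str),
        "md5(time)": md5_hex(time_str),
    }
    return [name for name, digest in candidates.items() if digest == observed]


# Made-up console values; here the "observed" value is built from the time alone.
time_str = "2022-08-11 12:00:00"
rand_str = "0.123456789"
observed = md5_hex(time_str)
print(matching_hypotheses(time_str, rand_str, observed))
```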

It turns out that the gl function simply takes the MD5 of the formatted time, so the x-uuid is easy to get, as follows:

insert image description here
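
A minimal sketch of generating the x-uuid in Python, assuming the finding above that it is just the MD5 of the formatted time. The exact format produced by Vl(new Date) is only visible in the screenshot, so TIME_FORMAT below is an assumption that must be adjusted to match what the console prints.

```python
import hashlib
from datetime import datetime

# Assumption: the real format of Vl(new Date) must be copied from the console output.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_x_uuid(now=None):
    """MD5 of the formatted current time, as a 32-character hex string."""
    now = now or datetime.now()
    return hashlib.md5(now.strftime(TIME_FORMAT).encode("utf-8")).hexdigest()


def build_headers(cookie, now=None):
    """Request headers that change per questionnaire; other headers are omitted here."""
    return {"x-uuid": make_x_uuid(now), "cookie": cookie}


print(build_headers("sid=abc123"))
```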


Get the id

With the request header parameters in place, we need the id for the request body. Its source can be found in the captured packets, as follows:

insert image description here

So I just need to send a request and read the id from the response body. The code is as follows:

insert image description here
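
As a rough sketch of this step: I am assuming the response body is JSON and that the field is literally named "id", but where it sits inside the response is only shown in the screenshot. The helper below therefore searches the whole document for the first "id" key instead of hard-coding a path, and the sample response is invented.

```python
import json


def find_first_key(data, key):
    """Depth-first search for the first occurrence of key in nested JSON data."""
    if isinstance(data, dict):
        if key in data:
            return data[key]
        children = data.values()
    elif isinstance(data, list):
        children = data
    else:
        return None
    for child in children:
        found = find_first_key(child, key)
        if found is not None:
            return found
    return None


def extract_id(response_text):
    """Pull the questionnaire id out of a JSON response body."""
    return find_first_key(json.loads(response_text), "id")


# Invented response body, for illustration only.
sample = '{"code": 0, "data": {"id": "a1b2c3", "questions": []}}'
print(extract_id(sample))
```

Once the real path is known from the capture, a direct lookup is simpler and safer than the search.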


Simulate a real person filling out the questionnaire

Now that we have every field we need, we can simulate filling in the questionnaire. This part mainly concerns the answer field in the request body.

Single choice:

insert image description here

Multiple choice:

insert image description here
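
The exact shape of the body is only visible in the screenshots, so the following is a hypothetical sketch rather than the real format. It assumes a JSON body, keeps the two fields named earlier (id and answer), and passes everything that never changes as fixed_fields. Representing a single choice as one value and a multiple choice as a list is my own guess.

```python
import json


def build_body(questionnaire_id, answer, fixed_fields=None):
    """Assemble a request body: constant fields plus the per-questionnaire id and answer."""
    body = dict(fixed_fields or {})  # fields that stay the same for every request
    body["id"] = questionnaire_id    # changes with every questionnaire
    body["answer"] = answer          # the chosen option(s)
    return json.dumps(body, ensure_ascii=False)


# Hypothetical values: one single-choice and one multiple-choice answer.
print(build_body("a1b2c3", 0, {"question": 1}))
print(build_body("a1b2c3", [0, 2], {"question": 2}))
```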

After the breakdown above, the request body is not hard to assemble. The main problem is how the crawler should choose options so that its answers have the uncertainty and plausibility of a real person. That is not hard either: probabilities solve it easily.

For example, to choose between male and female, the code is as follows:

insert image description here
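
For readers who cannot see the screenshot, here is a sketch of the same idea, not the original code. The function name, the option labels and the 60/40 split are illustrative assumptions; the percentages are expected to add up to 100.

```python
import random


def pick_single(options, percents, rng=random):
    """Pick one option; percents are whole-number probabilities summing to 100."""
    numbers = list(range(100))   # the array 0..99
    roll = rng.choice(numbers)   # one random number from it
    threshold = 0
    for option, pct in zip(options, percents):
        threshold += pct
        if roll < threshold:     # e.g. 0..59 -> first option when pct is 60
            return option
    return options[-1]           # only reached if percents sum to less than 100


# Roughly 60% of the simulated respondents choose "male".
picks = [pick_single(["male", "female"], [60, 40]) for _ in range(1000)]
print(picks.count("male") / len(picks))
```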

In the above code, I generate an array from 0 to 99 and then randomly pick a number from it. Assuming the probability of choosing male is 60%, I only need to choose male when the random number is below 60 (that is, in the range 0 to 59). Single choice and multiple choice follow the same idea. Note, however, that in a multiple-choice question each option is independent of the others, so you need to generate a random number for each option and set its probability separately, as follows:

insert image description here
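
A sketch of the multiple-choice case under the same caveat: it illustrates the per-option idea and is not the code in the screenshot. The option labels and probabilities are invented, and falling back to the most likely option when nothing gets picked is my own assumption, on the grounds that a multiple-choice question usually needs at least one answer.

```python
import random


def pick_multiple(option_probs, rng=random):
    """Decide each option independently with its own probability (0.0 to 1.0)."""
    chosen = [opt for opt, p in option_probs.items() if rng.random() < p]
    if not chosen:
        # Assumption: never submit an empty answer; take the most likely option.
        chosen = [max(option_probs, key=option_probs.get)]
    return chosen


# Invented options: A is popular, C is rare.
print(pick_multiple({"A": 0.8, "B": 0.5, "C": 0.1}))
```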


Output the result

The program can now fill in the questionnaire automatically, but we still need to know whether each submission succeeded. The key information is in the last packet:

insert image description here

So we only need to extract this string from the response, as follows:

insert image description here
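
A small sketch of this check. The actual success string is only visible in the screenshot, so the marker is passed in as a parameter, and "success" in the demo is a placeholder, as are the sample responses.

```python
def is_submitted(response_text, marker):
    """True if the final response contains the success marker."""
    return marker in response_text


def summarize(responses, marker):
    """Count successful and failed submissions across a batch of final responses."""
    succeeded = sum(is_submitted(text, marker) for text in responses)
    return succeeded, len(responses) - succeeded


# Placeholder responses and marker, for illustration only.
final_responses = ['{"msg": "success"}', '{"msg": "error"}', '{"msg": "success"}']
ok, failed = summarize(final_responses, "success")
print(f"submitted: {ok}, failed: {failed}")
```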


Running result

Now that the program is finished, let's fill in 400 questionnaires to see the effect. ˆﻌˆ ა

insert image description here

insert image description here

Perfect.


Origin blog.csdn.net/weixin_56039202/article/details/126293209