"Want to learn Python crawler series" Introduction to the use of chrome in crawlers

Learning objectives

  1. Understand the purpose of creating a new incognito window

  2. Understand how to use the Network panel in Chrome

  3. Learn how to find the login interface


1 Create a new incognito window

When you open a website directly in the browser, the browser automatically sends the cookies it saved from earlier visits to that site. A crawler's first request for the page, however, carries no cookies. How can we reproduce that situation in the browser?

Open the website in an incognito window. The first request is then sent without cookies, so you can observe how the page is actually fetched, including how the server sets cookies on the client.
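The same behavior can be reproduced in code. The following is a minimal sketch using only the standard library; the local DemoHandler server and the sessionid cookie are hypothetical stand-ins for a real site. It starts with an empty cookie jar (like an incognito window), reads the Set-Cookie header of the first response, and then shows that only the second request carries the cookie.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real site; records each request's Cookie header."""

    seen_cookies = []

    def do_GET(self):
        DemoHandler.seen_cookies.append(self.headers.get("Cookie"))
        self.send_response(200)
        # The server sets a cookie on the client, as a real site would.
        self.send_header("Set-Cookie", "sessionid=abc123; Path=/")
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_twice():
    """Return (Set-Cookie of the first response, Cookie headers the server saw)."""
    DemoHandler.seen_cookies = []
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/"

    jar = http.cookiejar.CookieJar()  # starts empty, like an incognito window
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore any system proxy for the local demo
        urllib.request.HTTPCookieProcessor(jar),
    )
    try:
        with opener.open(url) as first:
            set_cookie = first.headers.get("Set-Cookie")
        with opener.open(url):
            pass  # second request: the jar now sends the cookie back
    finally:
        server.shutdown()
        server.server_close()
    return set_cookie, list(DemoHandler.seen_cookies)


if __name__ == "__main__":
    set_cookie, seen = fetch_twice()
    print("Set-Cookie on first response:", set_cookie)
    print("Cookie headers seen by the server:", seen)
```

The first entry in the server's list is None because the first request carried no cookie, which is exactly what the incognito window lets you observe in the browser.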

2 More features of the Network panel in Chrome

2.1 Preserve log

By default, when the page redirects, the earlier requests (their URLs and other details) disappear from the list. With Preserve log checked, the earlier requests are kept.
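A crawler can face the same problem: HTTP clients usually follow redirects silently, so the intermediate requests are easy to miss. As a rough analogue of Preserve log, the sketch below records every redirect hop with a subclass of urllib.request.HTTPRedirectHandler. The local RedirectingSite server and its /login and /home paths are hypothetical and exist only to make the example runnable.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Logs every redirect hop before following it."""

    def __init__(self):
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.hops.append((code, req.full_url, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


class RedirectingSite(BaseHTTPRequestHandler):
    """Hypothetical site: /login redirects to /home."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(302)
            self.send_header("Location", "/home")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Length", "4")
            self.end_headers()
            self.wfile.write(b"home")

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def trace_redirects():
    """Return (base URL, recorded hops, final URL) for a request to /login."""
    server = HTTPServer(("127.0.0.1", 0), RedirectingSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    recorder = RecordingRedirectHandler()
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), recorder)
    try:
        with opener.open(base + "/login") as resp:
            final_url = resp.geturl()
    finally:
        server.shutdown()
        server.server_close()
    return base, recorder.hops, final_url


if __name__ == "__main__":
    _, hops, final_url = trace_redirects()
    for code, old, new in hops:
        print(code, old, "->", new)
    print("final:", final_url)
```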

2.2 Filter

When there are many URLs, you can type part of a URL into the filter box to narrow the list to the matching requests. The filter box is at position 2 in the second picture above.
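The same idea is easy to apply to a list of URLs collected by a crawler. This is a small sketch, not a reproduction of Chrome's filter: the URLs are hypothetical, and the case-insensitive substring match is a choice made for this example.

```python
def filter_urls(urls, fragment):
    """Keep only the URLs that contain the fragment (case-insensitive)."""
    fragment = fragment.lower()
    return [url for url in urls if fragment in url.lower()]


# Hypothetical URLs standing in for the requests listed by the browser.
captured = [
    "http://www.example.com/static/app.js",
    "http://www.example.com/static/site.css",
    "http://www.example.com/ajax/Login?from=home",
    "http://www.example.com/img/logo.png",
]

if __name__ == "__main__":
    for url in filter_urls(captured, "login"):
        print(url)
```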

2.3 Observe specific types of requests

At position 3 in the second picture above there are many options. All is selected by default, which means every type of request is shown.

Often you can pick one of the options to the right of All to suit your purpose. Common options:

  • XHR: in most cases, an ajax request

  • JS: requests for JavaScript files

  • CSS: requests for CSS files

But often we cannot be sure which type of request we need, especially when we don't know whether a request is an ajax request. In that case just select All and go through the requests from first to last; js, css, and image requests can be skipped.

Don't be scared by the long list of requests in the browser. Apart from js, css, and image requests, there are not many others.
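When the list is long, the requests can also be exported from the Network panel as a HAR file and narrowed down in code. The sketch below assumes the standard HAR layout (log.entries, with request.url and response.content.mimeType on each entry); the sample data is hypothetical, and the set of MIME types treated as static is a judgment call for this example.

```python
STATIC_PREFIXES = ("image/", "font/", "text/css")


def is_static(mime_type):
    """True for js, css, image, and font responses, which can usually be skipped."""
    mime_type = (mime_type or "").lower()
    return mime_type.startswith(STATIC_PREFIXES) or "javascript" in mime_type


def interesting_urls(har):
    """Return the URLs of all entries that are not static resources."""
    urls = []
    for entry in har["log"]["entries"]:
        mime_type = entry["response"]["content"].get("mimeType", "")
        if not is_static(mime_type):
            urls.append(entry["request"]["url"])
    return urls


def make_entry(url, mime_type):
    """Build a minimal HAR-style entry (only the fields used above)."""
    return {"request": {"url": url}, "response": {"content": {"mimeType": mime_type}}}


# Hypothetical capture standing in for a real HAR export.
sample_har = {
    "log": {
        "entries": [
            make_entry("http://www.example.com/", "text/html"),
            make_entry("http://www.example.com/static/app.js", "application/javascript"),
            make_entry("http://www.example.com/static/site.css", "text/css"),
            make_entry("http://www.example.com/img/logo.png", "image/png"),
            make_entry("http://www.example.com/ajax/login", "application/json"),
        ]
    }
}

if __name__ == "__main__":
    for url in interesting_urls(sample_har):
        print(url)
```

With a real export, json.load on the saved file would give a dictionary of the same shape as sample_har.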

3 Find the login interface

Looking back at the earlier Renren crawler, we used a login interface. Where did we find that interface?

http://www.renren.com

3.1 Find the URL in the form's action attribute

This address is the URL given by the action attribute of the login form. Recalling front-end basics, the action is the address the form is submitted to. The submitted data then only needs two pairs: the name attribute of the username input tag as the key with the username as the value, and the name attribute of the password input tag as the key with the password as the value.

Think about it:

What can we do if the form has no action URL?
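The rule above can be sketched with the standard library's HTMLParser. The HTML snippet, its action URL, and its field names are hypothetical and are not Renren's actual markup. Matching the username and password fields by their type attribute is also an assumption made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collects the action URL and the named inputs of the first form on a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.inputs = []  # (name, type) pairs
        self._in_form = False
        self._form_seen = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._form_seen:
            self._form_seen = True
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.inputs.append((attrs["name"], (attrs.get("type") or "text").lower()))

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def build_login_request(page_url, html, username, password):
    """Return (submit URL, form data) with input names as keys."""
    parser = LoginFormParser()
    parser.feed(html)
    data = {}
    for name, input_type in parser.inputs:
        if input_type == "password":
            data[name] = password
        elif input_type in ("text", "email"):
            data[name] = username
    return urljoin(page_url, parser.action or ""), data


# Hypothetical login form, for illustration only.
SAMPLE_HTML = """
<form action="/login.do" method="post">
  <input type="text" name="email">
  <input type="password" name="password">
  <input type="submit" value="Login">
</form>
"""

if __name__ == "__main__":
    url, data = build_login_request("http://www.example.com/", SAMPLE_HTML, "alice", "secret")
    print(url, data)
```

The returned URL and dictionary are what a crawler would pass to its HTTP client as the POST address and form data.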

3.2 Find the login URL by capturing packets

By capturing packets, you can see that both the URL and the request body carry parameters, such as uniqueTimestamp, rkey, and the encrypted password.

At this point, we can check whether the login interface of the mobile version is the same.

It turns out that the mobile version also has parameters, but fewer of them, so we can use the mobile version as a reference. The next section covers how to analyze js.
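Once a login request has been captured, its parameters can be listed in code to see what a crawler would have to supply. This is a minimal sketch with urllib.parse; the URL, the body, every value in them, and the split of parameters between URL and body are hypothetical placeholders, not a real capture.

```python
from urllib.parse import parse_qs, urlsplit


def list_parameters(url, body):
    """Return the parameter names found in the URL query and in the form body."""
    query_params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    body_params = parse_qs(body, keep_blank_values=True)
    return sorted(query_params), sorted(body_params)


# Hypothetical capture: names follow the text, values are made up.
captured_url = "http://www.example.com/login?uniqueTimestamp=1234567890"
captured_body = "email=alice&password=3f2a9c&rkey=abcd1234"

if __name__ == "__main__":
    in_url, in_body = list_parameters(captured_url, captured_body)
    print("in URL: ", in_url)
    print("in body:", in_body)
```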
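In a crawler, one common way to ask for a site's mobile version is to send a mobile User-Agent header. Whether a given site actually switches on that header is an assumption here, since some sites use a separate mobile URL instead. The sketch below only builds the request and does not send it; the User-Agent string is an example value.

```python
import urllib.request

# Example iPhone Safari User-Agent string (illustrative value).
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1"
)


def mobile_request(url):
    """Build (but do not send) a request that identifies itself as a mobile browser."""
    return urllib.request.Request(url, headers={"User-Agent": MOBILE_UA})


if __name__ == "__main__":
    req = mobile_request("http://www.example.com/")
    print(req.full_url)
    print(req.header_items())
```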


Summary

  1. The main purpose of an incognito window is to avoid sending saved cookies when opening a website for the first time

  2. In Chrome's Network panel, the Preserve log option keeps earlier requests visible after the page jumps

  3. There are two ways to determine the login address:

    • Find the URL in the form's action attribute

    • Find it by capturing packets


Origin blog.csdn.net/weixin_45293202/article/details/114003476