Implementing paging collection in the Locomotive collector

Collecting list pages with the list paging acquisition function


The settings below are the most commonly used way to configure list paging.

clip_image004

Now I will show you another way to get the pagination: let the collector fetch it automatically through the paging acquisition function of the list page.

To use this function, the start page only needs the address of the first list page, as shown below:

clip_image006

The paging settings are under "List paging acquisition" in "Multi-level URL acquisition", as shown below:

clip_image008

In the figure above, "Extract list paging URL from this area" asks for the markers where the paging block starts and ends in the source code; the addresses found between them are the paging addresses.

This setting works when the page lists every page number. In many cases, though, the page numbers are not all listed and an ellipsis stands in for the middle ones, as in the following figure:
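As a rough illustration of what "extract from this area" amounts to, here is a minimal Python sketch. It is not the collector's own implementation; the function name `extract_paging_urls` and the sample markup are assumptions made up for this example.

```python
import re

def extract_paging_urls(html, start_marker, end_marker):
    """Return every href found between start_marker and end_marker."""
    start = html.find(start_marker)
    if start == -1:
        return []
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return []
    area = html[start:end]  # only this slice is searched for links
    return re.findall(r'<a\s[^>]*href="([^"]+)"', area)

# Hypothetical page source: a paging block followed by an unrelated link.
sample = (
    '<div class="pageNav"><strong>1</strong>'
    '<a href="list_2.htm">2</a><a href="list_3.htm">3</a></div>'
    '<a href="other.htm">unrelated</a>'
)
urls = extract_paging_urls(sample, '<div class="pageNav">', '</div>')
```

Links outside the marked area, such as `other.htm` here, are ignored because only the slice between the two markers is searched.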

clip_image010

The setup below works for both situations, whether the page numbers are all listed or not. I have always used this method, and it has worked on almost every website I have tried.

The important thing is to find what marks the current page in the source code. I will use the list page http://news.qq.com/newsgn/zhxw/shizhengxinwen.htm to illustrate.

Let's look at the paging source code of the first page, as shown below:

clip_image012

Then look at the source code of the second page, as shown below:

clip_image014

Instead of going through every page in turn, let's jump ahead. I chose the fifth page, as shown below:

clip_image016

Look at the parts marked in red. Do you see the pattern? The current page number is always wrapped in <strong></strong>, and the <a> that follows it holds the address of the next page.

In other words, we get the next page from the current page, and then step forward one page at a time until all the pages have been obtained.

In the collector, the rule starts at <div class="pageNav">. I use (*) for whatever comes in between, up to the first <strong>(*)</strong>; the page number inside changes from page to page, so it is also written as (*).
The rule ends at the first </a> after that, and the text in between contains the address of the next page.
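For readers who think in regular expressions, the rule above can be approximated as follows. This is only a sketch: the collector's (*) is assumed to behave like a lazy wildcard, and the page-5 markup is a hypothetical reconstruction from the pattern described, not the real page source.

```python
import re

# Each (*) in the collector rule becomes a lazy wildcard. The text between
# </strong> and the first following </a> is captured as the paging fragment.
NEXT_PAGE_RULE = re.compile(
    r'<div class="pageNav">.*?<strong>.*?</strong>(.*?)</a>',
    re.DOTALL,
)
HREF = re.compile(r'href="([^"]+)"')

def next_page_address(html):
    """Return the address of the page after the current one, or None."""
    block = NEXT_PAGE_RULE.search(html)
    if block is None:
        return None
    href = HREF.search(block.group(1))
    return href.group(1) if href else None

# Hypothetical paging block as it might look on page 5.
page5 = (
    '<div class="pageNav">'
    '<a href="http://news.qq.com/newsgn/zhxw/shizhengxinwen_4.htm">4</a>'
    '<strong>5</strong>'
    '<a href="http://news.qq.com/newsgn/zhxw/shizhengxinwen_6.htm">6</a>'
    '<a href="http://news.qq.com/newsgn/zhxw/shizhengxinwen_7.htm">7</a>'
    '</div>'
)
next_url = next_page_address(page5)
```

Because the match stops at the first </a> after the <strong> element, only the link to page 6 is picked up, even though page 7 is also listed.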

The paging address also follows a rule. In <a href="http://news.qq.com/newsgn/zhxw/shizhengxinwen_6.htm">, only the page number changes. The changing part is replaced by a parameter and the rest stays fixed, so capturing just the changing part is enough.

That is the principle. Every pagination I have encountered follows this kind of rule: the source code differs from site to site, but the rule is the same. The method is what matters!
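The parameter idea can be sketched in Python like this. The names `PAGE_URL_TEMPLATE`, `page_number` and `page_url` are made up for the illustration; the collector does this through its own parameter and combination settings.

```python
import re

# The fixed part of the address, with the page number left as a parameter.
PAGE_URL_TEMPLATE = "http://news.qq.com/newsgn/zhxw/shizhengxinwen_{page}.htm"
PAGE_PARAM = re.compile(r"shizhengxinwen_(\d+)\.htm")

def page_number(href):
    """Extract only the changing part (the page number) from an address."""
    match = PAGE_PARAM.search(href)
    return int(match.group(1)) if match else None

def page_url(number):
    """Combine the fixed part with a page number to rebuild the address."""
    return PAGE_URL_TEMPLATE.format(page=number)

number = page_number("http://news.qq.com/newsgn/zhxw/shizhengxinwen_6.htm")
rebuilt = page_url(number)
```

The first list page, shizhengxinwen.htm, has no number in its address, so `page_number` returns None for it.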

Written into the collector, the rule looks like this:

clip_image018

The "Maximum number of pages obtained" option in the figure above sets how many pages to get; 0 means get all of them.

On the right side we have set "Combination to generate list page paging", so "Automatic recognition of paging" in the figure above does not need to be checked. It is best left unchecked, because it sometimes makes mistakes.

It is checked in all the screenshots above because it is checked by default. After setting up the rules, uncheck it.
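To make the page-by-page walk and the page limit concrete, here is a small Python sketch. It is an assumption-laden illustration, not the collector's code: `collect_pages`, `make_fake_site` and the `page_N.htm` addresses are invented, the site is simulated in memory, and the limit is assumed to count the pages found after the start page.

```python
import re

# Current page number in <strong>, immediately followed by the next-page link.
NEXT_LINK = re.compile(r'<strong>[^<]*</strong>\s*<a\s[^>]*href="([^"]+)"')

def collect_pages(start_url, fetch, max_pages=0):
    """Follow next-page links from start_url; max_pages=0 means no limit."""
    found = []
    seen = {start_url}
    url = start_url
    while max_pages == 0 or len(found) < max_pages:
        match = NEXT_LINK.search(fetch(url))
        if match is None or match.group(1) in seen:
            break  # last page reached, or the link loops back
        url = match.group(1)
        seen.add(url)
        found.append(url)
    return found

def make_fake_site(total):
    """Build an in-memory stand-in for a site with `total` list pages."""
    pages = {}
    for n in range(1, total + 1):
        nxt = f'<a href="page_{n + 1}.htm">{n + 1}</a>' if n < total else ""
        pages[f"page_{n}.htm"] = (
            f'<div class="pageNav"><strong>{n}</strong>{nxt}</div>'
        )
    return pages

site = make_fake_site(5)
all_pages = collect_pages("page_1.htm", site.__getitem__, max_pages=0)
first_two = collect_pages("page_1.htm", site.__getitem__, max_pages=2)
```

With a real site, `fetch` would download the page source instead of reading from a dictionary.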
