Collecting list-page pagination with the paging acquisition function
When setting up list paging, the following settings are the ones used most often.
This tutorial shows another way to get the pagination: letting the collector fetch it automatically through the list page's paging acquisition function.
To use this function, the start page only needs the address of the first page of the list, as shown below:
The paging rule itself is set under "list paging acquisition" inside "multi-level URL acquisition", as shown below:
In the figure above, "Extract list paging URL from this area" means: find where the pagination block starts and ends in the page source; the addresses contained between those two points are the paging addresses.
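The idea of "extract the addresses between a start and an end" can be sketched in Python. This is only an illustration of the principle, not the collector's own code: the function name, the sample markup, and the relative addresses list_2.htm and list_3.htm are all made up for the example.

```python
import re

def extract_paging_urls(html, start_marker, end_marker):
    """Return every href found between start_marker and end_marker."""
    start = html.find(start_marker)
    if start == -1:
        return []          # pagination block not present
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return []          # block is not closed; give up
    region = html[start:end]
    # Collect the address of every link inside the region.
    return re.findall(r'<a\s[^>]*href="([^"]+)"', region)

# Hypothetical pagination block in which every page is listed.
sample = (
    '<div class="pageNav"><strong>1</strong>'
    '<a href="list_2.htm">2</a><a href="list_3.htm">3</a></div>'
)
urls = extract_paging_urls(sample, '<div class="pageNav">', '</div>')
```

A region-based extraction like this only finds the pages that are actually written out in the source, which is why it falls short when some page numbers are hidden behind an ellipsis.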
This step is enough for sites that list every page number. In many cases, though, the page numbers are not all listed, and an ellipsis stands in for the middle ones, as in the following figure:
So we need one setting that works in both situations, whether or not all the pages are listed. I have always used the method below, and it has worked on almost every website I have tried.
The key is to find what marks the current page in the source code. I use the list page http://news.qq.com/newsgn/zhxw/shizhengxinwen.htm to illustrate.
First, look at the paging source code of the first page, as shown below:
Then the source code of the second page, as shown below:
Rather than going through the pages one at a time, skip ahead to the source code of the fifth page, as shown below:
Do you see the pattern in the parts marked in red? The current page number is always wrapped in <strong></strong>, and the <a> that follows it holds the address of the next page.
In other words, we get the next page from the current page, and move on one page at a time until all the pages have been obtained.
In the collector, the rule starts with <div class="pageNav">. Whatever comes in between is written as (*), up to the first <strong>(*)</strong>; the page number inside it changes from page to page, so it is written as (*) as well.
The rule ends at the first </a> after that, and the part in between contains the address of the next page.
The paging address follows a pattern too: in <a href="http://news.qq.com/newsgn/zhxw/shizhengxinwen_6.htm"> only the page number changes. Replace the changing part with the parameter and leave the rest as it is; capturing just the part that changes is enough.
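For readers who think in regular expressions, the same rule can be approximated in Python. This is a rough sketch, not the collector's rule syntax: each (*) becomes a lazy `.*?`, and the parameter becomes a capture group. The names NEXT_PAGE_RULE and next_page_number and the sample markup are assumptions made for the example; only the pageNav class and the address pattern come from the page discussed above.

```python
import re

# (*) in the collector rule is written here as a lazy ".*?";
# the changing page number (the "parameter") is the capture group.
NEXT_PAGE_RULE = re.compile(
    r'<div class="pageNav">.*?'      # start of the pagination block
    r'<strong>.*?</strong>.*?'       # current page number, whatever it is
    r'<a href="http://news\.qq\.com/newsgn/zhxw/shizhengxinwen_(\d+)\.htm"',
    re.DOTALL,
)

def next_page_number(html):
    """Return the page number of the first link after <strong>, or None."""
    match = NEXT_PAGE_RULE.search(html)
    return int(match.group(1)) if match else None

# Hypothetical markup; the real page source may look different.
BASE = "http://news.qq.com/newsgn/zhxw/shizhengxinwen"
page_5 = (
    '<div class="pageNav">'
    f'<a href="{BASE}_4.htm">4</a>'
    '<strong>5</strong>'
    f'<a href="{BASE}_6.htm">6</a>'
    '</div>'
)
# Same block with nothing after the current page, as on a last page.
last_page = (
    '<div class="pageNav">'
    f'<a href="{BASE}_4.htm">4</a>'
    '<strong>5</strong>'
    '</div>'
)

next_from_5 = next_page_number(page_5)
next_from_last = next_page_number(last_page)
```

Links that come before <strong> (earlier pages) are ignored because the pattern only starts looking for an address after the current page marker. When no link follows the marker, the function returns None, which is the natural signal to stop.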
That is the principle. Every pagination I have come across follows a pattern like this: the source code differs from site to site, but the pattern is the same. What matters is the method.
Enter the rule in the collector as shown below:
The "Maximum number of pages obtained" setting in the figure above controls how many pages are fetched; 0 means fetch all of them.
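The page-by-page walk with a maximum page count can be sketched as a small loop. This is an assumption-laden illustration rather than a description of how the collector is implemented: collect_pages, fetch, find_next, and the in-memory FAKE_SITE are hypothetical, and whether the collector counts the start page toward the maximum is not stated in this tutorial, so the sketch simply counts every page it visits.

```python
def collect_pages(start_url, fetch, find_next, max_pages=0):
    """Follow next-page links starting from start_url.

    max_pages == 0 means no limit, mirroring the setting described above.
    """
    urls = []
    seen = set()
    url = start_url
    # Stop when there is no next page or a page repeats (avoids loops).
    while url is not None and url not in seen:
        if max_pages and len(urls) >= max_pages:
            break
        seen.add(url)
        urls.append(url)
        url = find_next(fetch(url))
    return urls

# Hypothetical in-memory "site": the content of each page is simply
# the address of the next page, and None on the last page.
FAKE_SITE = {"p1": "p2", "p2": "p3", "p3": "p4", "p4": None}

all_pages = collect_pages("p1", FAKE_SITE.get, lambda body: body)
first_two = collect_pages("p1", FAKE_SITE.get, lambda body: body, max_pages=2)
```

In real use, fetch would download the page source and find_next would apply the next-page rule to it; they are passed in as parameters here so the loop can be tried without any network access.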
On the right side we have set "combination to generate list page paging", so "automatic recognition of paging" in the figure above does not need to be checked. It is best left unchecked, because it sometimes makes mistakes.
It is checked in the screenshots above only because that is the default. Once the rule is set, uncheck it.