With So Many Novels Out There, Let Python Tell You Which Ones Are Worth Reading

Foreword

The text and images in this article come from the Internet and are for learning and exchange only, not for any commercial purpose. Copyright belongs to the original author; if you have any questions, please contact us so we can deal with them.

Author: Interesting Python


 

Page analysis

First, open WeChat Reading and scroll down to the recommended lists. There are 25 lists in total; some contain only a few hundred books, while others contain tens of thousands.

Open the "literary and artistic list" and you will see the information for 20 books. After scrolling down, it is easy to see that the book information is loaded through AJAX.

More to the point, getting the information for these books only requires the category ID and the maxIndex parameter. Testing shows, however, that each category returns at most 50 pages, that is, information for at most 1,000 books. If we crawl only the 25 lists, the amount of data is still rather small, so how do we get more?
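As a minimal sketch of the request described above, the snippet below builds the AJAX URL for one page and lists the maxIndex values for the 50 pages a category is willing to return. The host name in `BASE_URL`, the category ID "rising", and the assumption that maxIndex advances by the page size of 20 are illustrative guesses; the article itself only confirms the "bookListInCategory/" path and the maxIndex parameter.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint: only the "bookListInCategory/" path segment and the
# maxIndex query parameter are confirmed by the article.
BASE_URL = "https://weread.qq.com/web/bookListInCategory/"
PAGE_SIZE = 20   # books returned per request, per the article
MAX_PAGES = 50   # pages a category returns at most, per the article


def page_url(category_id, max_index):
    """Build the AJAX URL for one page of a category."""
    query = urlencode({"maxIndex": max_index})
    return BASE_URL + quote(str(category_id)) + "?" + query


def capped_offsets():
    """Return the maxIndex value of every page the server will return.

    Assumes maxIndex is an offset that grows by PAGE_SIZE per page.
    """
    return [page * PAGE_SIZE for page in range(MAX_PAGES)]


offsets = capped_offsets()
# Upper bound on the number of books obtainable from one category.
print(len(offsets) * PAGE_SIZE)
# "rising" is a placeholder category ID used only for illustration.
print(page_url("rising", offsets[1]))
```

With 50 pages of 20 books each, the cap works out to the 1,000 books mentioned above.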

Observant readers will notice that a category can be selected on the right-hand side of the page, as shown below:

However, inspecting these elements shows that no URL is displayed for them, as shown below:

That does not mean there is no way forward. A global search turns up the following:

CategoryId is the category ID, that is, the part of the URL that follows "bookListInCategory/". As for maxIndex, first set it to 0 and send a request to obtain the total number of books in the category, "totalCount". Then set the number of pages according to whether that total exceeds 1,000 books, which gives every URL that can be crawled under the category.

Crawling analysis

From the preceding steps we know that, given a category ID, we can send a request to obtain the total number of books and then construct the URLs of all pages in that category. So how do we get all the categories? The earlier global search already located the CategoryId information for each category, as shown below:

So we only need to request the page and match CategoryId with a regular expression. Then we send one request per category to obtain its total number of books and construct all the URLs under that category. This part of the code is as follows:
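The author's listing did not survive in this copy, so what follows is only a rough sketch of the steps just described, not the original code. It assumes the page source embeds each ID as a JSON-style `"CategoryId":"..."` pair (either capitalization of the first letter is accepted), that the endpoint is the `BASE_URL` shown, and that maxIndex advances by 20 per page; the sample IDs are made up.

```python
import json
import math
import re
import urllib.request

BASE_URL = "https://weread.qq.com/web/bookListInCategory/"  # assumed endpoint
PAGE_SIZE = 20   # books per page, per the article
MAX_PAGES = 50   # server-side cap, per the article

# Assumed markup: the ID appears as "CategoryId":"<value>" in the page source.
CATEGORY_RE = re.compile(r'"[Cc]ategoryId"\s*:\s*"([^"]+)"')


def extract_category_ids(page_source):
    """Return the unique category IDs in the page source, in first-seen order."""
    return list(dict.fromkeys(CATEGORY_RE.findall(page_source)))


def fetch_total_count(category_id):
    """Request the first page (maxIndex=0) and read totalCount. Needs network."""
    url = f"{BASE_URL}{category_id}?maxIndex=0"
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["totalCount"]


def category_page_urls(category_id, total_count):
    """Build every page URL for a category, capped at MAX_PAGES pages."""
    pages = min(math.ceil(total_count / PAGE_SIZE), MAX_PAGES)
    return [f"{BASE_URL}{category_id}?maxIndex={page * PAGE_SIZE}"
            for page in range(pages)]


# Offline demonstration with made-up page source and made-up totals.
sample_source = (
    '{"CategoryId":"100000","title":"A"},'
    '{"CategoryId":"200000","title":"B"},'
    '{"CategoryId":"100000","title":"A"}'
)
ids = extract_category_ids(sample_source)
print(ids)
print(len(category_page_urls(ids[0], 345)))     # small category: all pages
print(len(category_page_urls(ids[1], 52000)))   # large category: capped
```

In a real run, `fetch_total_count` would supply the totals that are hard-coded in the demonstration.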

 

Once this is done, the rest is simple: take the response of each request and parse it. The running program prints the following:

You can see that there are 7,091 links in total. So how many books were actually crawled? Because I saved the data in MongoDB, I opened Robo 3T to check: 141,137 in total, as shown below:
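The parsing step can be sketched roughly as follows, assuming the response is JSON. Only "totalCount" is named in the article; the "books" list and the "bookInfo", "title", "author", "rating" and "readingCount" keys are hypothetical field names chosen for illustration, and the sample response is invented.

```python
import json


def parse_books(response_text):
    """Turn one JSON response into a list of flat book records.

    The key names below are assumptions, not a documented schema.
    Missing fields become None instead of raising an error.
    """
    payload = json.loads(response_text)
    records = []
    for item in payload.get("books", []):
        info = item.get("bookInfo", {})
        records.append({
            "title": info.get("title"),
            "author": info.get("author"),
            "rating": info.get("rating"),
            "readingCount": item.get("readingCount"),
        })
    return records


# Invented response used only to exercise the parser.
sample_response = json.dumps({
    "totalCount": 2,
    "books": [
        {"bookInfo": {"title": "Book A", "author": "Author A", "rating": 91},
         "readingCount": 12000},
        {"bookInfo": {"title": "Book B", "author": "Author B"},
         "readingCount": 300},
    ],
})

for record in parse_books(sample_response):
    print(record)
```

Flat dictionaries like these can be passed to a MongoDB insert without further conversion.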

 

Drawing Analysis

Anyone familiar with Python knows that matplotlib is the most widely used 2D plotting library in Python. Here, though, I recommend an easy-to-use third-party library: pyecharts, a library for generating Echarts charts. Its charts are more refined and look better, but note that the usage of pyecharts differs between version 0.5 and version 1.0. Below are horizontal bar charts generated with the library, showing the top ten books by rating, the top ten books by reading count, and the total reading count:

It turns out that highly rated books do not necessarily have high reading counts; the most-read titles are more often web novels. Why do the famous classics seem less well loved nowadays, while web novels fascinate so many more people? My personal guess is that the worlds in these novels are better at satisfying the fantasies of today's young people: exhausted by real life, they become all the more absorbed in the "Xanadu" of the novel.
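A horizontal bar chart of this kind might be produced along the following lines. This is a sketch against the pyecharts 1.x API (not 0.5), and it is not the author's code: the record keys "title", "rating" and "readingCount", the helper names, and the sample data are all assumptions. pyecharts is imported inside the rendering function so that the ranking helper also runs without it installed.

```python
def top_n(books, key, n=10):
    """Return the n books with the largest value for `key`, highest first."""
    return sorted(books, key=lambda book: book[key], reverse=True)[:n]


def render_horizontal_bar(books, key, title, path):
    """Render the top ten books by `key` as a horizontal bar chart.

    Uses the pyecharts 1.x chained API and returns the output file path.
    """
    from pyecharts import options as opts
    from pyecharts.charts import Bar

    # Reverse the ranking so the largest bar is drawn at the top.
    ranked = top_n(books, key)[::-1]
    chart = (
        Bar()
        .add_xaxis([book["title"] for book in ranked])
        .add_yaxis(key, [book[key] for book in ranked])
        .reversal_axis()  # swap the axes to make the bars horizontal
        .set_series_opts(label_opts=opts.LabelOpts(position="right"))
        .set_global_opts(title_opts=opts.TitleOpts(title=title))
    )
    return chart.render(path)


# Made-up records standing in for the crawled data.
sample_books = [
    {"title": f"Book {i}", "rating": 70 + i, "readingCount": 1000 - i * 10}
    for i in range(12)
]

print([book["title"] for book in top_n(sample_books, "rating", 3)])
# With pyecharts 1.x installed, the chart itself could be written with:
# render_horizontal_bar(sample_books, "rating", "Top 10 by rating", "rating.html")
```

Calling the function once per metric gives one chart for ratings and one for reading counts.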


Origin www.cnblogs.com/python0921/p/12605194.html