2019-05-30 Total read count passes one million

My first Jianshu article was published on September 6, 2017. 630 days later, I have written 186 articles in total. Without ever using mutual promotion, my total read count reached 1,000,000+ two days before Children's Day! (A few of my articles were actually locked; otherwise it would have happened a few days earlier T_T)

Since Jianshu does not display the total read count, I wrote a small Python crawler to add it up (just swap in a different user_id and you can compute the total read count of any author):

# Scrape the total read count of my Jianshu posts
# My homepage: https://www.jianshu.com/u/130f76596b02

import re
import requests
from lxml import etree


header = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
    }

def get_all_article_links(user_id):
    links_list = []
    i = 1
    switch = True
    while switch:
        url = 'https://www.jianshu.com/u/{}?order_by=shared_at&page={}'.format(user_id, i)
        response = requests.get(url,
                                headers=header,
                                timeout=10
                                )
        tree = etree.HTML(response.text)
        # article links on this page; an empty result means we have run past the last page
        article_links = tree.xpath('//div[@class="content"]/a[@class="title"]/@href')
        if not article_links:
            break
        for item in article_links:
            article_link = 'https://www.jianshu.com' + item
            print(article_link)
            if article_link not in links_list:
                links_list.append(article_link)
            else:
                switch = False
                break
        i += 1
    return links_list

def get_read_num(user_id):
    num_list = []
    links_list = get_all_article_links(user_id)
    for url in links_list:
        response = requests.get(url,
                                headers=header,
                                timeout=30
                                )
        content = response.text
        # the article page embeds the read count as '"views_count":<n>'
        read_num_pattern = re.compile(r'"views_count":\d+')
        read_num = int(read_num_pattern.findall(content)[0].split(':')[-1])
        print(read_num)
        num_list.append(read_num)
    return num_list


if __name__ == '__main__':
    read_num_list = get_read_num(user_id='130f76596b02')
    print(read_num_list)
    print(sorted(read_num_list))
    print('Total reads =', sum(read_num_list))
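
The script above tallies my own account; as noted earlier, any other author can be tallied by swapping in their user_id, i.e. the trailing segment of their profile URL https://www.jianshu.com/u/<user_id>. A minimal usage sketch (the id below is a made-up placeholder):

# Minimal usage sketch: 'abcdef123456' is a made-up placeholder user_id,
# copied from a profile URL of the form https://www.jianshu.com/u/<user_id>
nums = get_read_num(user_id='abcdef123456')
print('Total reads =', sum(nums))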

The read count of each article is shown below:

[98, 308, 244, 205, 334, 528, 743, 131, 191, 438, 368, 754, 3901, 144, 234, 280, 468, 424, 1156, 549, 3043, 260, 464, 146, 135, 2960, 904, 3346, 85, 255, 2647, 1035, 875, 1119, 863, 469, 156, 1238, 637, 1329, 636, 1826, 1078, 362, 598, 1754, 1632, 761, 1011, 1640, 1591, 317, 1540, 689, 1116, 1062, 1791, 2176, 10573, 1774, 2340, 1197, 1606, 2806, 2168, 1680, 1896, 247, 3454, 571, 104, 147, 220, 1166, 180, 306, 1797, 829, 120, 333, 400, 2151, 96, 186, 232, 1425, 7985, 837, 201, 897, 584, 2584, 3940, 348, 8300, 16597, 229, 10810, 4055, 9930, 21782, 1367, 13142, 15105, 302, 18381, 647, 376, 137, 21397, 25279, 27036, 33929, 1133, 1266, 282, 1129, 17469, 34754, 64309, 149, 305, 1078, 672, 65754, 47316, 404, 72523, 208904, 231, 790, 55, 1377, 50161, 684, 166, 27, 771, 741, 1371, 435, 542, 1498, 1106, 4375, 3104, 182, 1961, 3416, 871, 1575, 343, 479, 333, 489, 204, 120, 370, 582, 1759, 38, 392, 798, 502, 410, 185, 271, 128, 228, 653, 447, 20, 47, 3051, 5275, 2105, 5201, 2795, 2515, 111, 2688, 3257, 11373, 2667, 9269, 6795]

(Figure: 7178691-412249e0977cc42f.png)

Plotted, it looks like this (with the read counts sorted in ascending order, the result follows a power-law distribution, a very typical SEO pattern):

(Figure: 7178691-e1c568325a60f6a4.png, read counts sorted in ascending order)
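
A minimal plotting sketch that reproduces this ascending-order curve from the script's output (assuming matplotlib is installed):

import matplotlib.pyplot as plt

# Sketch: plot the per-article read counts in ascending order
# to show the power-law-like shape described above
read_num_list = get_read_num(user_id='130f76596b02')
plt.plot(sorted(read_num_list), marker='.')
plt.xlabel('Article rank (ascending by reads)')
plt.ylabel('Read count')
plt.show()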

I originally wanted to build a profile of my followers, but Jianshu provides no ready-made labels, and extracting tags from user information (gender, posts, topics followed) would cost too much, so I did not do it. Instead, here are some purely subjective impressions:

  1. More female fans than I expected (doge, haha);
  2. A few fans paint and write prose and poetry (Jianshu is not just a programmers' platform, thumbs up);
  3. Quite a few are studying medicine or bioinformatics (I am actually very interested in biology and medicine too, and almost applied to medical school; I would love to exchange ideas with these friends);
  4. Some people follow and like my posts in the small hours of Beijing time, so they are probably abroad (knowledge has no borders; by the way, shame on IEEE for recently barring Huawei reviewers);
  5. For quite a few fans, mine is the first account they followed, which suggests they registered a Jianshu account specifically to read my articles (at a user-acquisition cost of 100 yuan per user, that saves Jianshu tens or even hundreds of thousands of yuan in new-user costs).

Thank you for reading!

(Sometimes I am really busy and fail to reply to some messages; please forgive me.)


Origin blog.csdn.net/weixin_34062469/article/details/90773838