Is Jianshu as good as Zhihu? I crawled 300,000 records with Python and ran a BI visual analysis to find out

Among domestic UGC platforms, Zhihu counts as the leader. But precisely because it is so popular, many other general-purpose platforms get overlooked. So what are those other platforms actually like?

Take Jianshu, for instance. It is a general-purpose platform similar to Zhihu, but because it has no memes like "I'm in the US, just got off the plane", it has slowly been forgotten...

Which high-quality users does Jianshu have? How many big-V accounts have thousands of followers, and how many likes have the top accounts earned? Which articles have the most reads? Which topic collections are most popular with users?

Part 1: Getting the data

It has to be said: when it comes to data, crawling it with Python is the way to go. Find the data you want, crawl it, and a few lines of simple code take care of the rest~

Because of Jianshu's official data protection and some other restrictions, only 900 followers can be fetched for a single user (the same goes for the accounts they follow), along with roughly the first 1,900 articles. After crawling 2 to 3 layers of data, I had information on 261,277 users in total. The fields are: user name, home page URL, whether the user is a signed author, follower count, likes received, following count, article count, total word count, and so on.
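
The layered crawl described above can be sketched as a breadth-first walk over the follower graph. This is a minimal sketch under stated assumptions, not the author's crawler: `fetch_followers` is a hypothetical callable standing in for whatever HTTP request and page parsing the real crawler performs, and the tiny graph is invented for illustration. Only the 900-follower cap comes from the text.

```python
from collections import deque
from typing import Callable, Dict, List

MAX_FOLLOWERS_PER_USER = 900  # per-user cap described in the text


def crawl(seed: str,
          fetch_followers: Callable[[str], List[str]],
          max_depth: int = 2) -> List[str]:
    """Breadth-first walk of the follower graph; returns user ids in visit order."""
    seen = {seed}
    order = [seed]
    queue = deque([(seed, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth >= max_depth:
            continue  # users on the deepest layer are recorded but not expanded
        for follower in fetch_followers(user)[:MAX_FOLLOWERS_PER_USER]:
            if follower not in seen:
                seen.add(follower)
                order.append(follower)
                queue.append((follower, depth + 1))
    return order


# Hypothetical stand-in for the real network call: a hard-coded follower graph.
FAKE_GRAPH: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": ["e"],
}


def fake_fetch(user: str) -> List[str]:
    return FAKE_GRAPH.get(user, [])


visited = crawl("a", fake_fetch, max_depth=2)
print(visited)
```

The `seen` set is what keeps the crawl from looping when two users follow each other, and `max_depth` is what limits it to the 2 to 3 layers mentioned above.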

Sorting these 1,916 articles by likes in descending order, the top article received 17,076 likes and the last one received 488. So the most popular articles on Jianshu have probably been captured (actually, not quite).
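
As a small illustration of that ranking step, the sketch below sorts article records by likes in descending order and picks out the first and last entries. The titles and the middle value are made up; only the 17076 and 488 like counts come from the text, and the dictionary layout is an assumption rather than the author's actual data format.

```python
# Hypothetical sample; only the 17076 and 488 like counts come from the text.
articles = [
    {"title": "article A", "likes": 488},
    {"title": "article B", "likes": 17076},
    {"title": "article C", "likes": 2310},
]

# Sort by likes, highest first (the DESC ordering described above).
ranked = sorted(articles, key=lambda a: a["likes"], reverse=True)
top, bottom = ranked[0], ranked[-1]
print(top["likes"], bottom["likes"])
```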

Part 2: BI analysis

Generally speaking, once the data has been collected with Python, the next step is data visualization.

When it comes to data visualization, there is no shortage of options. The front-end world has produced a whole array of third-party libraries: Highcharts, Echarts, Chart.js, D3.js and so on. But they all come down to the same thing: you need solid coding skills. They are also not all freely licensed; Highcharts, for instance, requires a paid license for commercial use.

Is there a way for those of us who are coding novices to do this too?

That is what I want to talk about today: BI, also known as business intelligence. Search Baidu for BI and you get a flood of results that is more confusing than helpful. In fact, very few BI products are done really well, though there are some outstanding ones both at home and abroad.

The leading foreign product is Tableau. It was acquired for 15.7 billion US dollars, which says enough about how powerful it is. For the domestic market, however, it is not a good fit:

  • It is a query-based data tool, and real-time data analysis is still lacking
  • It is very expensive (unless money is no object), and after-sales service from resellers is poor
  • It has no back-end data warehouse. It bills itself as in-memory BI, but in practice it is demanding on hardware; to analyze more than ten million rows, the data first has to be processed with a separate ETL tool before it can be analyzed in the front end
  • It cannot handle complex Chinese-style report tables

So I chose a domestic BI product, FineBI, an enterprise-grade data analysis tool. Most importantly, its Personal Edition is free.

It has the following advantages:

  • Automatic modeling: building models is simple, and the models are flexible
  • Rich visual analysis operations in the front end: data can be drilled into visually, and multidimensional operations such as slicing and pivoting are supported
  • Built-in ETL and real-time data analysis, with fast processing of large data sets

Part 3: Data visualization

As mentioned above, FineBI is enterprise-grade data analysis software, but it is free for individuals. FineBI also supports multiple data sources and several connection modes, so processing the data is no trouble at all.

All I had to do was install and activate it, import the data crawled with Python into FineBI, and the analysis could begin.
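
One plausible way to hand the crawled records over is to write them to a CSV file and upload that file as a data set. The sketch below is an assumption about the hand-off, not the author's actual pipeline: the `UserRecord` field names are illustrative (they mirror the fields listed earlier), the sample row and file name are invented, and it assumes CSV is among the file types your FineBI version accepts.

```python
import csv
import tempfile
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class UserRecord:
    """One crawled user; field names are illustrative, not the author's schema."""
    name: str
    home_url: str
    is_signed: bool
    followers: int
    likes: int
    following: int
    articles: int
    total_words: int


def export_users(records, path):
    """Write the records to a CSV file with a header row; returns the column names."""
    columns = [f.name for f in fields(UserRecord)]
    # utf-8-sig adds a byte-order mark, which some spreadsheet-style
    # importers rely on to detect UTF-8 (useful for Chinese user names).
    with open(path, "w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        for record in records:
            writer.writerow(asdict(record))
    return columns


# Hypothetical sample row and output location.
sample_users = [
    UserRecord("demo_user", "https://example.com/u/demo", False, 42, 7, 10, 3, 5000),
]
out_path = Path(tempfile.gettempdir()) / "jianshu_users.csv"
header = export_users(sample_users, out_path)
print(header)
```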

1. Signed-author analysis

On today's self-media platforms, the goal of everyone who writes is to become a signed author. Among these 260,000+ relatively high-quality users, only 126 explicitly display the "signed author" tag on their home page.

That share is tiny (126 out of 261,277, or about 0.05%), which indirectly shows how strict Jianshu's requirements for authors are.
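
For reference, the share can be worked out directly from the two figures given in the text. The snippet is only a quick arithmetic check; the variable names are illustrative.

```python
TOTAL_USERS = 261_277   # users crawled, from the text
SIGNED_AUTHORS = 126    # profiles showing the "signed author" tag, from the text

share = SIGNED_AUTHORS / TOTAL_USERS
users_per_signed = round(TOTAL_USERS / SIGNED_AUTHORS)

print(f"{share:.4%}")      # signed authors as a percentage of crawled users
print(users_per_signed)    # roughly one signed author per this many users
```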

Only 69 authors in total have contributed five of the popular articles, which also shows that writing is not easy.

2. User follower counts

Plotted by follower count, the 260,000+ users form a pyramid-like distribution. Only 5 users have more than 100,000 followers, which makes them roughly one in fifty thousand; I won't go through the other tiers one by one. It is worth noting that the largest group is users with 10 to 100 followers, at 40.38%, rather than users with 0 or 1 followers, which further suggests that the data crawled this time skews toward higher-quality users.
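
A tiered breakdown like this can be reproduced by bucketing follower counts and computing each bucket's share. The sketch below is illustrative only: the exact cut points behind the original chart are not given, so the bounds and labels are assumptions, and the sample counts are invented.

```python
from bisect import bisect_right
from collections import Counter

# Lower bound of each follower tier. The cut points used in the original
# chart are not stated, so these are assumed for illustration.
BOUNDS = [0, 1, 10, 100, 1_000, 10_000, 100_000]
LABELS = ["0", "1-9", "10-99", "100-999", "1,000-9,999", "10,000-99,999", "100,000+"]


def bucket(followers: int) -> str:
    """Return the label of the tier that a follower count falls into."""
    return LABELS[bisect_right(BOUNDS, followers) - 1]


def distribution(follower_counts):
    """Map each tier label to its share of all users (0.0 to 1.0)."""
    tally = Counter(bucket(n) for n in follower_counts)
    total = len(follower_counts)
    return {label: tally[label] / total for label in LABELS}


# Hypothetical follower counts for ten users.
sample_followers = [0, 3, 12, 57, 99, 450, 120_000, 15, 8, 2_500]
dist = distribution(sample_followers)
for label in LABELS:
    print(f"{label:>14}: {dist[label]:.0%}")
```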

3. Popular articles by hour of day

The largest number of popular articles was published at 11 o'clock, which I find strange. A nobody like me publishes in the evening; I had always felt that night is the best time to write. 11 o'clock is practically mealtime. Could it be that people write hard all morning (as the saying goes, a day's plan starts in the morning), finish the day's writing task early, and then relax? In addition, articles published in every one of the 24 hours have become popular.
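
An hour-of-day breakdown like this comes from grouping publish times by hour. The following is a minimal sketch with invented timestamps and an assumed timestamp format; the real values would come from the crawled article data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical publish timestamps; the format string is also an assumption.
published = [
    "2020-01-05 11:02:00",
    "2020-01-05 11:47:13",
    "2020-01-06 23:15:40",
    "2020-01-07 08:30:00",
    "2020-01-08 11:05:59",
]

# Count articles per hour of day (0-23).
by_hour = Counter(
    datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts in published
)
hourly = [by_hour.get(h, 0) for h in range(24)]  # one slot per hour, zeros included
peak_hour = max(range(24), key=lambda h: hourly[h])
print(peak_hour, hourly[peak_hour])
```

Filling in zeros for empty hours keeps the series at a fixed length of 24, which makes it straightforward to chart.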

4. Reads, likes, and comments

An article's popularity is reflected directly in its number of likes and number of comments, and the data bears this out.
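
One way to put a number on that relationship is a Pearson correlation between reads and likes, and between reads and comments. This is a sketch of one possible check, not the analysis done in FineBI: the `pearson` helper is written out by hand for illustration, and all the sample figures are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-article figures, for illustration only.
reads = [1200, 5400, 800, 15000, 3000]
likes = [30, 160, 12, 520, 75]
comments = [4, 25, 1, 90, 10]

r_likes = pearson(reads, likes)
r_comments = pearson(reads, comments)
print(f"reads vs likes: {r_likes:.3f}, reads vs comments: {r_comments:.3f}")
```

A value close to 1 would mean that articles with more reads also tend to collect more likes or comments; it says nothing about which causes which.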

Original link: FanRuan Software https://www.toutiao.com/a6782840504510841348/

Origin blog.csdn.net/wulishinian/article/details/105067677