Making a word cloud, explained in plain language

I have worked in the data industry for many years, moving repeatedly from the front-end UI through the middle service layer and finally down to database storage, and I remain deeply drawn to databases and database application architecture.

In interviews for data positions, the question interviewers kept pressing me on was: "Do you prefer ETL, data modeling, or data visualization?"

Honestly, I don't think that question is well posed. If you work in the data industry, how can you claim to favor only one sub-field? With so many wonderful technologies and fascinating problems, wouldn't it be a pity not to try them all yourself?

Don't let mediocre work limit your imagination, eh!

I have written a CSDN blog for many years, with over 300,000 visits. In the past couple of days it occurred to me to crawl my own blog and see which articles and technologies readers are actually interested in. Hence the following story.

Python, the jack-of-all-trades, has worked its way into every corner of computer software and found a foothold in every niche it can squeeze into. In data analysis especially, the saying goes: "Life is short, use Python." Spinning up a crawler to scrape a simple website (CSDN puts up little resistance; hey, let's save some face for Meng Yan) is a 3-to-5-hour job. I stored the results in MySQL.
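The author did this in Python; purely to keep this post's code in one language, here is what the same scrape looks like in R with rvest and DBI. The listing URL, CSS selectors, table name, and credentials are all hypothetical placeholders, not CSDN's real markup:

library(rvest)
library(DBI)

# hypothetical URL and CSS selectors; CSDN's real page structure differs
page = read_html("https://blog.csdn.net/yourname/article/list/1")
titles = html_text2(html_elements(page, ".article-title"))
views = as.integer(html_text2(html_elements(page, ".read-count")))

# store the scraped pairs in MySQL (credentials are placeholders)
con = dbConnect(RMySQL::MySQL(), dbname = "blog", host = "localhost", user = "root", password = "***")
dbWriteTable(con, "posts", data.frame(title = titles, views = views), append = TRUE, row.names = FALSE)
dbDisconnect(con)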

Then I simply filtered out the posts with more than 1,000 page views, computed word frequencies on their titles, and fed the result into a word cloud tool for analysis. In text mining, word frequency is where the fun starts: only once you have frequencies can you extend the work into things like related-article recommendation.
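A minimal sketch of that filtering step, assuming the crawl landed in a MySQL table called posts with columns title and views (both names are my assumption); it pulls the titles of posts with more than 1,000 views and writes them to the all_titles.txt file used later in this post:

# table and column names (posts, title, views) are assumptions
library(DBI)
con = dbConnect(RMySQL::MySQL(), dbname = "blog", host = "localhost", user = "root", password = "***")
titles = dbGetQuery(con, "SELECT title FROM posts WHERE views > 1000")
writeLines(titles$title, "all_titles.txt", useBytes = TRUE)
dbDisconnect(con)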

I tried the word cloud tools with the most upvotes on Zhihu. Time is an effective filter; it surfaces the excellent products for us. Of the many tools out there, only a few hold up to a closer look (judged on visualization quality and ease of getting started). An hour later, three remained usable:

Tuyue:

A very, very quick and easy word cloud tool, simple to the point of being outrageous: just paste in your text and it immediately renders your customized word cloud. If you have no special requirements for watermarks or polish, this one is enough. The one disappointment I hit while using it: for articles that mix Chinese and English, the English words are simply dropped, as shown in the following figure:

[figure: Tuyue word cloud; the English words in the mixed text are dropped]

BlueMc:

Blue Cursor. The word cloud tool from this company is very powerful (its founder is also very capable and has real flair!). Individuals can apply for a free trial of the product. The tool supports mixed Chinese and English text. Good things take grinding: it tells you up front that you will have to wait before the word cloud is generated, which is quite considerate. The default color scheme of the rendered output is very warm; my guess is the product manager is a woman with good taste.

[figure: BlueMc word cloud output]
R:

That's right, the analytical tool, the big R that goes head to head with SAS! Applause here, please, and tips are welcome! After reading the rest of this post, you too will be able to generate the same kind of word cloud, the R version, with the result entirely under your own control. It's the difference between tuning your own sports car and merely driving a Mercedes-Benz.

Steps to make a word cloud in R:

1 Word segmentation

2 Word frequency statistics

3 Drawing

1 Word segmentation:

There are many word segmentation algorithms, with both public and private packages available. A package like jiebaR is very easy to use and has excellent support for Chinese. There is also qseg, which does not seem to handle Chinese well and has gone unfixed for a long time.

jiebaR is on GitHub, and its documentation is very clean: http://qinwenfeng.com/jiebaR/

install.packages("jiebaR")
library(jiebaR)

# instantiate a segmenter; bylines = FALSE makes segment() return one flat vector of words
mySeg = worker()
mySeg$bylines = FALSE

# note the doubled backslashes the Windows path needs in R
texts = readLines("G:\\SideProjects\\all_titles.txt", encoding = "UTF-8")

# segment all the titles into words
newSegResult = segment(texts, mySeg)

# join tokens with spaces (with bylines = FALSE each element is already a single word, so this is effectively a pass-through)
merged = sapply(newSegResult, function(x) { paste(x, collapse = " ") })

Note: first load the jiebaR package, then instantiate a segmenter with worker(), segment the text, and merge the tokens back together.
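As a quick illustration (the sample sentence and its exact split are only indicative; the actual tokens depend on the dictionary), segment() returns a plain character vector of tokens:

# illustrative only; the exact split depends on the dictionary in use
segment("数据可视化很有趣", mySeg)
# e.g. "数据" "可视化" "很" "有趣"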

2 Word frequency statistics:

Use freq() to count the segmented and merged results. Only properly computed frequency counts can be recognized and rendered by the word cloud visualization.

freq(merged)

Note: freq() tabulates the frequency of each segmented word.
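freq() returns a data frame with a char column and a freq column, so a quick sanity check of the most frequent words is straightforward; a minimal sketch:

wordFreq = freq(merged)
# show the 20 most frequent words
head(wordFreq[order(-wordFreq$freq), ], 20)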

3 Drawing:

wordcloud2 : https://github.com/lchiffon/wordcloud2

First install the devtools library

install.packages("devtools")
library(devtools)

# wordcloud2 is installed straight from GitHub
devtools::install_github("lchiffon/wordcloud2")

library(wordcloud2)
# draw the word cloud in a star shape
wordcloud2(freq(merged), size = 1, shape = 'star')

[figure: star-shaped word cloud generated by wordcloud2]
By comparison, R is more powerful and more adaptable: it handles mixed languages, and you can also configure stop words (even the ubiquitous "的" can be filtered out through configuration).
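A minimal sketch of that stop-word filtering, assuming a plain-text file stopwords.txt with one word per line (the file name is my placeholder; stop_word is a real worker() argument in jiebaR):

# stopwords.txt contains one word per line, e.g. a single line with 的
mySeg = worker(stop_word = "stopwords.txt")
newSegResult = segment(texts, mySeg)  # 的 no longer appears among the tokens
wordcloud2(freq(newSegResult), size = 1, shape = 'star')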


Welcome to follow the WeChat public account [about SQL] and join the group to discuss technology.

