1, Import modules and configure Chinese font support
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# Support Chinese characters in plots
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
2, Read the crawled data from the CSV file
Crawler code: https://github.com/song-zhixue/lagou
data = pd.read_csv("./lagou_data.csv", sep=',', encoding='gbk')
data.head()
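The `encoding='gbk'` argument matters because the file must be decoded with the same encoding it was written in. The following is a minimal sketch of that point, using a small in-memory sample with hypothetical columns rather than the real `lagou_data.csv`:

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for lagou_data.csv.
raw_bytes = "城市,薪资\n北京,15k-25k\n上海,20k-30k\n".encode("gbk")

# Decoding with the matching encoding recovers the Chinese text.
sample = pd.read_csv(io.BytesIO(raw_bytes), sep=",", encoding="gbk")
print(sample.head())

# Decoding the same bytes as UTF-8 fails, because GBK byte sequences
# are generally not valid UTF-8.
try:
    pd.read_csv(io.BytesIO(raw_bytes), encoding="utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False
print("decoded as UTF-8:", decoded_as_utf8)
```

If `read_csv` raises a `UnicodeDecodeError` on your own export, the encoding argument is the first thing to check.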
3, Data cleaning
Here I only do a simple cleanup: removing rows that contain null values. In general, data cleaning covers:
- Deduplication
- Handling missing values
  - 1 Delete
  - 2 Replace
  - 3 Fill
- Handling anomalies
  - 1 Illegal data: for example, a column that should be numeric but is mixed with Chinese characters or symbols
  - 2 Abnormal data: values that are unusually large or unusually small
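The checklist above can be sketched on a toy table. This is only an illustration under stated assumptions: the column names, the sample values, and the plausible salary range of 1 to 200 are all hypothetical and are not taken from the crawled data.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "city": ["Beijing", "Beijing", "Shanghai", None, "Shenzhen"],
    "salary_k": ["15", "15", "20k", "18", "9999"],
})

# Deduplication: drop rows that exactly repeat an earlier row.
deduped = df.drop_duplicates()

# Missing values: here the missing city is filled with a placeholder.
filled = deduped.assign(city=deduped["city"].fillna("Unknown"))

# Illegal data: coerce the column to numbers; non-numeric entries become NaN
# and are then dropped.
numeric = filled.assign(salary_k=pd.to_numeric(filled["salary_k"], errors="coerce"))
valid = numeric.dropna(subset=["salary_k"])

# Abnormal data: keep only values inside an assumed plausible range.
clean = valid[valid["salary_k"].between(1, 200)]

print(clean)
```

Each step returns a new DataFrame, so the intermediate results can be inspected to see how many rows each rule removes.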
data.isnull()
data.isnull().any()  # check, column by column, whether the column contains any null values
City                   False
Company Name           False
corporate coding       False
About                  False
company logo           False
company size           False
Published              False
region                  True
degree                 False
financing situation    False
type                   False
nature                 False
longitude               True
latitude                True
subway                  True
welfare                False
Job Title              False
salary                 False
Work Experience        False
job                    False
dtype: bool
data = data.dropna()  # by default, drops every row that contains a missing value
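According to the null check above, only a few columns (region, longitude, latitude, subway) contain missing values, and a plain `dropna()` removes a row if any one of them is missing. As a hedged alternative, `dropna` accepts a `subset` argument so that only the columns a given analysis needs are required. The sketch below uses hypothetical rows, not the crawled data:

```python
import pandas as pd

# Hypothetical rows; only some columns contain missing values.
df = pd.DataFrame({
    "city": ["Beijing", "Shanghai", "Shenzhen"],
    "subway": ["Line 10", None, None],
    "longitude": [116.4, 121.5, None],
    "latitude": [39.9, 31.2, None],
})

# Default behaviour: drop rows with ANY missing value.
strict = df.dropna()

# Alternative: only require the coordinate columns, e.g. for the map step.
with_coords = df.dropna(subset=["longitude", "latitude"])

print(len(df), len(strict), len(with_coords))
```

Which variant is appropriate depends on the analysis; the charts in the following steps were produced after the plain `dropna()` call.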
4, Draw a pie chart of the top 10 cities by number of job postings
data["city"].value_counts()

Beijing         258
Shanghai        149
Shenzhen        136
Guangzhou        54
Chengdu          48
Hangzhou         31
Wuhan            22
Nanjing          12
Chongqing         7
Suzhou            5
Tianjin           4
Shijiazhuang      3
Changsha          3
Xiamen            3
Xi'an             2
Zhengzhou         2
Qingdao           2
Dalian            2
Foshan            2
Changchun         1
Guiyang           1
Name: city, dtype: int64
ret = data["city"].value_counts().head(10).plot(kind='pie', autopct='%1.2f%%', figsize=(10, 8))  # plot the top 10 results
plt.show()
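One detail worth keeping in mind when reading the pie chart: the `autopct` labels are percentages of the plotted slices only, that is, of the top 10 cities, not of all postings. A small sketch with the counts copied from the output above makes the difference concrete (the variable names are my own):

```python
import pandas as pd

# Top 10 city counts copied from the value_counts() output above.
top10 = pd.Series({
    "Beijing": 258, "Shanghai": 149, "Shenzhen": 136, "Guangzhou": 54,
    "Chengdu": 48, "Hangzhou": 31, "Wuhan": 22, "Nanjing": 12,
    "Chongqing": 7, "Suzhou": 5,
})
total_rows = 747  # sum of all 21 city counts listed above

# What the pie labels show: share within the plotted top 10.
share_of_top10 = (top10 / top10.sum() * 100).round(2)

# Share of all postings, for comparison.
share_of_all = (top10 / total_rows * 100).round(2)

print(pd.DataFrame({"of top 10 (%)": share_of_top10, "of all (%)": share_of_all}))
```

The two columns differ only slightly here because the top 10 cities cover most of the postings.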
5, Draw a bar chart by education level
data["degree"].value_counts()

Undergraduate    613
College           73
Any               50
Master            11
Name: degree, dtype: int64

data["degree"].value_counts().plot(kind='bar')
plt.xticks(rotation=0)
6, Draw a bar chart by work experience
data["working life"].value_counts()

3-5 years             317
1-3 years             193
5-10 years             90
Any                    79
Fresh graduates        62
Less than 1 year        5
More than 10 years      1
Name: working life, dtype: int64

data["working life"].value_counts().plot(kind='barh', color="orange")
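`value_counts()` sorts by count, so the bars above appear in order of frequency rather than in order of experience. If a logical ordering is preferred, one option is to reindex the counts before plotting. The sketch below reuses the counts from the output above; the English category labels and the ordering itself are my own assumptions:

```python
import pandas as pd

# Counts copied from the value_counts() output above (labels rendered in English).
counts = pd.Series({
    "3-5 years": 317, "1-3 years": 193, "5-10 years": 90, "Any": 79,
    "Fresh graduates": 62, "Less than 1 year": 5, "More than 10 years": 1,
})

# Assumed logical ordering, from least to most experience.
order = [
    "Any", "Fresh graduates", "Less than 1 year", "1-3 years",
    "3-5 years", "5-10 years", "More than 10 years",
]

ordered = counts.reindex(order)
print(ordered)

# ordered.plot(kind="barh", color="orange") would then draw the bars in this order.
```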
7, Draw pie and bar charts by company size
data["公司规模"].value_counts()

150-500 people            190
50-150 people             181
15-50 people              130
More than 2,000 people    112
500-2000 people           100
Fewer than 15 people       34
Name: 公司规模, dtype: int64

data["公司规模"].value_counts().plot(kind='pie', autopct='%1.2f%%')
data["公司规模"].value_counts().plot(kind='barh',color="red")
8, Draw a bar chart by financing stage
data["financing situation"].value_counts()

No financing needed    187
Series A               118
Series B               114
Listed company          96
Not financed            88
Angel round             55
Series C                54
Series D and above      35
Name: financing situation, dtype: int64

data["financing situation"].value_counts().plot(kind='bar')
plt.xticks(rotation=45)
9, Draw a word cloud of the benefits
# Use jieba for word segmentation
import jieba
# Draw the word cloud
import wordcloud
# Custom word cloud background
from PIL import Image

data["welfare"]
all_str = ''
for i in data["welfare"]:
    all_str += i
# Segment the text with jieba
lis = jieba.lcut(all_str)
txt = " ".join(lis)
# mask = np.array(Image.open("./word cloud.jpg"))  # custom background
w = wordcloud.WordCloud(
    font_path="msyh.ttc",
    width=400,
    height=400,
    background_color="white",
    # colormap="Reds",
    # mask=mask,
    # contour_width=1,
    # contour_color="red"
)
w.generate(txt)
w.recolor()  # randomize the font colors of the word cloud
# w.to_file("welfare.png")  # save the word cloud locally
w.to_image()  # view the generated word cloud
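A word cloud draws frequent words larger, so it can help to look at the raw counts behind the picture. The following is a minimal sketch using only the standard library; the benefit phrases are hypothetical and simply stand in for the space-joined, already-segmented text that `txt` holds above:

```python
from collections import Counter

# Hypothetical, already-segmented benefit text, joined with spaces
# in the same way txt is built above.
txt = "五险一金 带薪年假 五险一金 弹性工作 带薪年假 五险一金"

# Count each token; more frequent tokens would be drawn larger.
freq = Counter(txt.split())

for word, count in freq.most_common(3):
    print(word, count)
```

The wordcloud package also provides `generate_from_frequencies`, which accepts a word-to-count mapping like `freq` if you prefer to control the counts yourself.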
10, Map visualization
The map is drawn with Dituwuyou (地图无忧), an online map visualization tool: https://www.dituwuyou.com/
data[["longitude", "latitude"]]  # inspect the longitude and latitude columns
data[["longitude", "latitude"]].to_csv("./map latitude and longitude.csv", encoding="gbk")  # export a CSV to upload to Dituwuyou for mapping
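Note that `to_csv` writes the row index as an extra first column by default. Whether the mapping tool needs or tolerates that column is something I have not verified, so the sketch below only shows the difference, using hypothetical coordinates and an in-memory string instead of a file:

```python
import io
import pandas as pd

# Hypothetical coordinates standing in for the real columns.
coords = pd.DataFrame({
    "longitude": [116.40, 121.47],
    "latitude": [39.90, 31.23],
})

# Default: the row index is written as an extra, unnamed first column.
with_index = coords.to_csv()

# index=False writes only the two coordinate columns.
without_index = coords.to_csv(index=False)

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Reading the text back confirms the values survive the round trip.
restored = pd.read_csv(io.StringIO(without_index))
print(restored)
```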
Format of the exported CSV
- https://www.dituwuyou.com/orgs/321267/maps
- Account: xxxxxx
- Password: xxxxxx