Data Analysis Case Study: Lagou Job Listings

1. Import modules and configure Chinese font support

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
# Support Chinese characters in plots (SimHei font) and render minus signs correctly
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

2. Read the crawled data from the CSV file

Crawler code: https://github.com/song-zhixue/lagou

data = pd.read_csv("./lagou_data.csv",sep = ',',encoding = 'gbk')
data.head()
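
If it is not certain that the file is GBK-encoded, one option is to try several encodings in turn. The following is a minimal sketch of that idea, not part of the original analysis: the helper `read_csv_with_fallback` is hypothetical, and the two-row sample with its column names is made up to stand in for lagou_data.csv.

```python
import io
import pandas as pd

def read_csv_with_fallback(source_bytes, encodings=("utf-8", "gbk")):
    """Hypothetical helper: try each encoding until one decodes cleanly."""
    last_error = None
    for enc in encodings:
        try:
            text = source_bytes.decode(enc)
        except UnicodeDecodeError as err:
            last_error = err
            continue
        return pd.read_csv(io.StringIO(text))
    raise last_error

# Made-up two-row sample (city, education), encoded as GBK like the crawled file
sample = "城市,学历\n北京,本科\n上海,大专\n".encode("gbk")
df = read_csv_with_fallback(sample)
print(df.shape)
```

For the real file, the bytes could be obtained with `open("./lagou_data.csv", "rb").read()`.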

3. Data cleaning

Here I only do a simple cleaning step: removing rows with null values. In general, data cleaning covers the following:

- Deduplication
- Handling missing values
   - 1 Delete
   - 2 Replace
   - 3 Fill
- Handling anomalies
   - 1 Illegal data, for example a column that should be numeric but is mixed with Chinese characters or symbols
   - 2 Abnormal data: values that are unusually large or small
data.isnull()
data.isnull().any()  # check column by column whether any value is null
City                   False
Company Name           False
corporate coding       False
About                  False
company logo           False
company size           False
Published              False
region                  True
degree                 False
financing situation    False
type                   False
nature                 False
longitude               True
latitude                True
subway                  True
welfare                False
Job Title              False
salary                 False
Work Experience        False
job                    False
dtype: bool
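
`any()` only says whether a column has missing values. To see how many, `isnull().sum()` can be used instead. A small sketch on a made-up sample (the column names mirror the translated ones above; the values are invented):

```python
import numpy as np
import pandas as pd

# Made-up sample; the values are invented for illustration
sample = pd.DataFrame({
    "city": ["Beijing", "Shanghai", "Shenzhen"],
    "region": ["Haidian", None, None],
    "longitude": [116.3, np.nan, 114.1],
})

null_counts = sample.isnull().sum()   # number of missing values per column
null_share = sample.isnull().mean()   # fraction of rows missing per column
print(null_counts[null_counts > 0])   # show only the columns with gaps
```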
data = data.dropna()   # by default, drops every row that contains a missing value
data
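
The other cleaning steps listed at the start of this section (deduplication, filling, illegal and abnormal values) are not applied in this post. A minimal sketch of what they could look like in pandas, on a made-up sample: the `monthly_salary_k` column and the 1 to 200 range are assumptions for illustration only.

```python
import pandas as pd

# Made-up sample with a duplicate row, a missing value,
# an illegal value ("18k") and an abnormally large value (900)
raw = pd.DataFrame({
    "city": ["Beijing", "Beijing", "Shanghai", "Shenzhen", "Chengdu"],
    "subway": ["Line 1", "Line 1", None, "Line 2", "Line 3"],
    "monthly_salary_k": ["15", "15", "20", "18k", "900"],
})

clean = raw.drop_duplicates().copy()                  # deduplication
clean["subway"] = clean["subway"].fillna("unknown")   # fill instead of delete
# Illegal values: anything that is not a number becomes NaN, then is dropped
clean["monthly_salary_k"] = pd.to_numeric(clean["monthly_salary_k"], errors="coerce")
clean = clean.dropna(subset=["monthly_salary_k"])
# Abnormal values: keep only salaries inside an assumed plausible range
clean = clean[clean["monthly_salary_k"].between(1, 200)]
print(clean)
```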

4. Pie chart of the top 10 cities by number of job postings

data["city"].value_counts()

Beijing      258 
Shanghai      149 
Shenzhen      136 
Guangzhou       54 
Chengdu       48 
Hangzhou       31 
Wuhan       22 
Nanjing       12 
Chongqing        7 
Suzhou        5 
Tianjin        4 
Shijiazhuang       3 
Changsha        3 
Xiamen        3 
Xi'an        2 
Zhengzhou        2 
Qingdao        2 
Dalian        2 
Foshan        2 
Changchun        1 
Guiyang       1 
Name: city, dtype: int64
ret = data["city"].value_counts().head(10).plot(kind='pie', autopct='%1.2f%%', figsize=(10, 8))   # plot the top 10 results
ret
plt.show()
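
Note that a pie chart of `head(10)` computes its percentages over the ten plotted cities only, not over all postings. If shares of the whole data set are wanted, the remaining cities can be folded into an "Other" slice. A sketch of that variation, using the counts from the output above (the "Other" label is my own choice):

```python
import pandas as pd

# City counts copied from the value_counts() output above
city_counts = pd.Series({
    "Beijing": 258, "Shanghai": 149, "Shenzhen": 136, "Guangzhou": 54,
    "Chengdu": 48, "Hangzhou": 31, "Wuhan": 22, "Nanjing": 12,
    "Chongqing": 7, "Suzhou": 5, "Tianjin": 4, "Shijiazhuang": 3,
    "Changsha": 3, "Xiamen": 3, "Xi'an": 2, "Zhengzhou": 2, "Qingdao": 2,
    "Dalian": 2, "Foshan": 2, "Changchun": 1, "Guiyang": 1,
})

top10 = city_counts.head(10)
other = city_counts.iloc[10:].sum()           # postings outside the top 10
with_other = pd.concat([top10, pd.Series({"Other": other})])
share = with_other / with_other.sum() * 100   # percentages over all postings
print(share.round(2))
```

Calling `with_other.plot(kind='pie', autopct='%1.2f%%')` would then draw eleven slices whose percentages refer to all postings.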

5. Bar chart by education level

data["degree"].value_counts()

Bachelor's        613
Junior college     73
No requirement     50
Master's           11
Name: degree, dtype: int64

data["degree"].value_counts().plot(kind='bar')
plt.xticks(rotation=0)
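
To read the education levels as shares rather than raw counts, `value_counts(normalize=True)` can be used. A short sketch that rebuilds the column from the counts shown above (the English labels are my own translations of the original category values):

```python
import pandas as pd

# Rebuild the column from the counts shown above (labels are translations)
degree = pd.Series(
    ["Bachelor's"] * 613 + ["Junior college"] * 73
    + ["No requirement"] * 50 + ["Master's"] * 11
)
share = degree.value_counts(normalize=True) * 100   # percentage per level
print(share.round(1))
```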

6. Horizontal bar chart by work experience

data["working life"].value_counts()

3-5 years             317
1-3 years             193
5-10 years             90
No requirement         79
Fresh graduate         62
One year or less        5
More than 10 years      1
Name: working life, dtype: int64

data["working life"].value_counts().plot(kind='barh', color="orange")
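
`value_counts()` sorts by frequency, so the bars are not in order of experience. If a natural ordering is preferred, the counts can be reindexed before plotting. A sketch using the counts above; the labels are translations and the ordering itself is my assumption:

```python
import pandas as pd

# Counts copied from the output above (labels are translations)
experience = pd.Series({
    "3-5 years": 317, "1-3 years": 193, "5-10 years": 90,
    "No requirement": 79, "Fresh graduate": 62,
    "One year or less": 5, "More than 10 years": 1,
})

# Assumed ordering from least to most required experience
order = ["No requirement", "Fresh graduate", "One year or less", "1-3 years",
         "3-5 years", "5-10 years", "More than 10 years"]
ordered = experience.reindex(order)
print(ordered)
# ordered.plot(kind='barh', color="orange") would draw the bars in this order
```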

7. Pie and bar charts by company size

data["company size"].value_counts()  # "company size" is the translation of the 公司规模 column

150-500 people            190
50-150 people             181
15-50 people              130
More than 2,000 people    112
500-2000 people           100
Fewer than 15 people       34
Name: company size, dtype: int64

data["company size"].value_counts().plot(kind='pie', autopct='%1.2f%%')

data["公司规模"].value_counts().plot(kind='barh',color="red")
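
The two charts can also be placed side by side in one figure by passing an `ax` to each `plot` call. A sketch under a few assumptions: the labels are shortened translations, the `Agg` backend is chosen only so the code runs without a display, and the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from matplotlib import pyplot as plt
import pandas as pd

# Counts copied from the output above (labels are shortened translations)
size_counts = pd.Series({
    "150-500": 190, "50-150": 181, "15-50": 130,
    "2000+": 112, "500-2000": 100, "<15": 34,
})

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
size_counts.plot(kind="pie", autopct="%1.2f%%", ax=axes[0])
axes[0].set_ylabel("")                    # hide the default y-axis label
size_counts.plot(kind="barh", color="red", ax=axes[1])
axes[1].set_xlabel("Number of postings")
fig.tight_layout()
fig.savefig("company_size.png")           # hypothetical output file name
```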

8. Bar chart by financing stage

data["financing situation"].value_counts()

No financing needed    187
Series A               118
Series B               114
Listed company          96
Not financed            88
Angel round             55
Series C                54
Series D and above      35
Name: financing situation, dtype: int64

data["financing situation"].value_counts().plot(kind='bar')
plt.xticks(rotation=45)

9. Word cloud of the benefits

# Use jieba for word segmentation
import jieba
# Draw the word cloud
import wordcloud
# Custom word cloud background
from PIL import Image

data["welfare"]
all_str = ''
for i in data["welfare"]:
    all_str += i
# Segment the text with jieba
lis = jieba.lcut(all_str)

txt = " ".join(lis)
# mask = np.array(Image.open("./word cloud.jpg"))  # custom background
w = wordcloud.WordCloud(
    font_path="msyh.ttc",
    width=400,
    height=400,
    background_color="white",
    # colormap="Reds",
    # mask=mask,
    # contour_width=1,
    # contour_color="red"
)
w.generate(txt)
w.recolor()   # randomize the font colors of the word cloud
# w.to_file("welfare.png")  # save the word cloud locally
w.to_image()    # view the generated word cloud
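
Before drawing the cloud it can help to look at the word frequencies directly, for example to spot punctuation or single-character tokens that would clutter the image. A standard-library sketch: the token list is made up and stands in for the output of `jieba.lcut(all_str)`, and the length filter is my own assumption about what counts as noise.

```python
from collections import Counter

# Made-up tokens standing in for the output of jieba.lcut(all_str):
# 五险一金 (insurance and housing fund), 带薪年假 (paid leave), 弹性工作 (flexible hours)
tokens = ["五险一金", ",", "带薪年假", " ", "五险一金",
          "弹性工作", ",", "带薪年假", "五险一金"]

# Drop whitespace, punctuation and other single-character tokens before counting
words = [t for t in tokens if len(t.strip()) > 1]
freq = Counter(words)
print(freq.most_common(3))
```

A frequency mapping like this can also be passed to `WordCloud.generate_from_frequencies` instead of calling `generate` on the joined text.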

10. Map visualization

The map is drawn with Dituwuyou, an online map visualization tool: https://www.dituwuyou.com/

data[["longitude", "latitude"]]    # select the longitude and latitude columns

data[["longitude", "latitude"]].to_csv("./map latitude and longitude.csv", encoding="gbk")    # export to CSV for plotting in Dituwuyou
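
The export above also writes the DataFrame index as the first CSV column. If only the two coordinate columns are wanted, `index=False` leaves it out, and a range check can drop obviously invalid coordinates first. A sketch with made-up coordinates; whether the mapping tool requires either step is an assumption I have not verified.

```python
import io
import pandas as pd

# Made-up coordinates; the real values come from the crawled data
coords = pd.DataFrame({
    "longitude": [116.40, 121.47, 500.0],
    "latitude": [39.90, 31.23, 22.54],
})

# Keep only rows whose coordinates fall in the valid geographic ranges
valid = coords[coords["longitude"].between(-180, 180)
               & coords["latitude"].between(-90, 90)]

buffer = io.StringIO()              # stands in for the output file path
valid.to_csv(buffer, index=False)   # index=False omits the row-number column
print(buffer.getvalue())
```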

Format of the exported CSV

- https://www.dituwuyou.com/orgs/321267/maps 
- account: xxxxxx 
- Password: xxxxxx

Origin www.cnblogs.com/songzhixue/p/11612911.html