Crawling a recruitment site with Python to analyze data-analyst jobs: the perks at a glance, and high pay looks simple

This post is a case study of data-analyst job postings: crawl the listings, then run a simple analysis of the data.

Environment

Windows 8, Python 3.7, PyCharm, Jupyter Notebook


1. Define the purpose of the analysis

We analyze the latest data-analyst job postings, covering regional distribution, education requirements, experience requirements, salary levels, and so on.

2. Data Collection

Here we use a web crawler to scrape job postings from a recruitment site, and then analyze the salaries and hiring requirements.

2.1 Analyze the target site

By inspecting the target site, we determine how it should be requested (request method and headers) and how its pages are structured.
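For example, a quick way to confirm that a plain GET request works and to eyeball the page structure (a minimal sketch using the requests library; the list-page URL below is the same placeholder used later in the spider):

import requests

# Fetch one list page with a browser-like User-Agent
headers = {'User-Agent': 'Mozilla/5.0'}
resp = requests.get('https://xxxx.com/list/2,1.html', headers=headers)
print(resp.status_code)   # 200 means a simple GET is enough
print(resp.text[:500])    # inspect the HTML to locate the job-list nodes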

2.2 Create a new Scrapy project

1. Run the following commands in a cmd window from any path; for example, here we create the zhaopin project under the D:\pythonTests directory.

d:
cd D:\pythonTests
scrapy startproject zhaopin

2. Once the zhaopin project has been created, the next step is to generate the spider (the main crawling program) inside the project folder:

cd zhaopin
scrapy genspider zhaopinSpider zhaopin.com

This completes the creation of the zhaopin project; now we can start writing our code.
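For reference, after startproject and genspider the directory layout should look roughly like this (it may differ slightly between Scrapy versions):

zhaopin/
    scrapy.cfg
    zhaopin/
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            zhaopinSpider.py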

2.3 Define the items

Define the fields to be crawled in the items.py file:

import scrapy
from scrapy.item import Item, Field

class zhaopinItem(Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    JobTitle = Field()           # job title
    CompanyName = Field()        # company name
    CompanyNature = Field()      # company type
    CompanySize = Field()        # company size
    IndustryField = Field()      # industry
    Salary = Field()             # salary
    Workplace = Field()          # work location
    Workyear = Field()           # required work experience
    Education = Field()          # required education
    RecruitNumbers = Field()     # number of openings
    ReleaseTime = Field()        # release time
    Language = Field()           # required language
    Specialty = Field()          # required major
    PositionAdvantage = Field()  # benefits / perks

2.4 Write the spider

Write the crawling logic in the spider file zhaopinSpider.py:

import scrapy
from scrapy.selector import Selector
from scrapy.spiders import CrawlSpider
from scrapy.http import Request
from zhaopin.items import zhaopinItem

class ZhaoPinSpider(scrapy.Spider):
    name = "ZhaoPinSpider"
    allowed_domains = ['zhaopin.com']
    start_urls = ['https://xxxx.com/list/2,{0}.html?'.format(str(page)) for page in range(1, 217)]

    def parse(self, response):
        '''
        Entry point: forward each list page to the URL parser.
        :param response:
        :return:
        '''
        yield Request(
            url=response.url,
            callback=self.parse_job_url,
            meta={},
            dont_filter=True
        )

    def parse_job_url(self, response):
        '''
        Extract the detail-page URL of every job on the current list page.
        :param response:
        :return:
        '''
        selector = Selector(response)
        urls = selector.xpath('//div[@class="el"]/p/span')
        for url in urls:
            url = url.xpath('a/@href').extract()[0]
            yield Request(
                url=url,
                callback=self.parse_job_info,
                meta={},
                dont_filter=True
            )

    def parse_job_info(self, response):
        '''
        Parse the job detail page.
        :param response:
        :return:
        '''
        item = zhaopinItem()
        selector = Selector(response)
        JobTitle = selector.xpath('//div[@class="cn"]/h1/text()').extract()[0].strip().replace(' ', '').replace(',', ';')
        CompanyName = selector.xpath('//div[@class="cn"]/p[1]/a[1]/text()').extract()[0].strip().replace(',', ';')
        CompanyNature = selector.xpath('//div[@class="tCompany_sidebar"]/div/div[2]/p[1]/text()').extract()[0].strip().replace(',', ';')
        CompanySize = selector.xpath('//div[@class="tCompany_sidebar"]/div/div[2]/p[2]/text()').extract()[0].strip().replace(',', ';')
        IndustryField = selector.xpath('//div[@class="tCompany_sidebar"]/div/div[2]/p[3]/text()').extract()[0].strip().replace(',', ';')
        Salary = selector.xpath('//div[@class="cn"]/strong/text()').extract()[0].strip().replace(',', ';')
        infos = selector.xpath('//div[@class="cn"]/p[2]/text()').extract()
        Workplace = infos[0].strip().replace(' ', '').replace(',', ';')
        Workyear = infos[1].strip().replace(' ', '').replace(',', ';')
        if len(infos) == 4:
            Education = ''
            RecruitNumbers = infos[2].strip().replace(' ', '').replace(',', ';')
            ReleaseTime = infos[3].strip().replace(' ', '').replace(',', ';')
        else:
            Education = infos[2].strip().replace(' ', '').replace(',', ';')
            RecruitNumbers = infos[3].strip().replace(' ', '').replace(',', ';')
            ReleaseTime = infos[4].strip().replace(' ', '').replace(',', ';')
        if len(infos) == 7:
            Language, Specialty = infos[5].strip().replace(' ', ''), infos[6].strip().replace(' ', '').replace(',', ';')
        elif len(infos) == 6:
            # infos[5] may be either a language requirement or a major requirement
            if ('英语' in infos[5]) or ('话' in infos[5]):
                Language, Specialty = infos[5].strip().replace(' ', '').replace(',', ';'), ''
            else:
                Language, Specialty = '', infos[5].strip().replace(' ', '').replace(',', ';')
        else:
            Language, Specialty = '', ''
        Welfare = selector.xpath('//div[@class="t1"]/span/text()').extract()
        PositionAdvantage = ';'.join(Welfare).replace(',', ';')
        item['JobTitle'] = JobTitle
        item['CompanyName'] = CompanyName
        item['CompanyNature'] = CompanyNature
        item['CompanySize'] = CompanySize
        item['IndustryField'] = IndustryField
        item['Salary'] = Salary
        item['Workplace'] = Workplace
        item['Workyear'] = Workyear
        item['Education'] = Education
        item['RecruitNumbers'] = RecruitNumbers
        item['ReleaseTime'] = ReleaseTime
        item['Language'] = Language
        item['Specialty'] = Specialty
        item['PositionAdvantage'] = PositionAdvantage
        yield item

2.5 Save to a CSV file

Save the items to a CSV file through the project's pipelines.py:

class Job51Pipeline(object):
    def process_item(self, item, spider):
        with open(r'D:\Data\ZhaoPin.csv', 'a', encoding='gb18030') as f:
            job_info = [item['JobTitle'], item['CompanyName'], item['CompanyNature'], item['CompanySize'],
                        item['IndustryField'], item['Salary'], item['Workplace'], item['Workyear'],
                        item['Education'], item['RecruitNumbers'], item['ReleaseTime'], item['Language'],
                        item['Specialty'], item['PositionAdvantage'], '\n']
            f.write(",".join(job_info))
        return item
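As an aside, Scrapy's built-in feed export could achieve something similar without a custom pipeline; a command like the one below (run from the project directory; the output file name is arbitrary) writes all items to a CSV, although the column order and encoding then follow Scrapy's defaults rather than the pipeline above:

scrapy crawl ZhaoPinSpider -o ZhaoPin_feed.csv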

2.6 Configure settings

In settings.py, set the user agent, a 0.5 s download delay, disable cookie tracking, and enable the pipeline:

USER_AGENT = 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
DOWNLOAD_DELAY = 0.5
COOKIES_ENABLED = False
ITEM_PIPELINES = {
    'zhaopin.pipelines.Job51Pipeline': 300,
}

2.7 Run the program

Create a new main.py file and run the following code:

from scrapy import cmdline
cmdline.execute('scrapy crawl ZhaoPinSpider'.split())

This starts the crawl; in the end it collected more than 9,000 records. Before analyzing the data, let's take a look at what it contains in the data overview step.


3. Data Overview

3.1 Read the data

import pandas as pd
df = pd.read_csv(r'D:aPythonDataDataVisualizationshujufenxishiJob51.csv')
#The raw CSV has no header row, so add column names
df.columns = ['JobTitle','CompanyName','CompanyNature','CompanySize','IndustryField','Salary','Workplace','Workyear','Education','RecruitNumbers', 'ReleaseTime','Language','Specialty','PositionAdvantage']
df.info()

This throws an exception: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbd in position 0: invalid start byte

Solution: use Notepad++ to convert the file's encoding to UTF-8-BOM.
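Alternatively, since the pipeline wrote the file with gb18030, the conversion can be skipped by passing that encoding to pandas directly (a sketch under that assumption):

import pandas as pd
df = pd.read_csv(r'D:aPythonDataDataVisualizationshujufenxishiJob51.csv', encoding='gb18030')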

After the conversion, run the code again.

Another exception is thrown: ValueError: Length mismatch: Expected axis has 15 elements, new values have 14 elements

Solution: append an extra name 'NNN' to the column list ['JobTitle', ..., 'PositionAdvantage'] so that it has 15 elements (see the snippet below).
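In code, that simply means the list assigned to df.columns gets a 15th, throwaway name ('NNN', as above):

df.columns = ['JobTitle', 'CompanyName', 'CompanyNature', 'CompanySize', 'IndustryField', 'Salary',
              'Workplace', 'Workyear', 'Education', 'RecruitNumbers', 'ReleaseTime', 'Language',
              'Specialty', 'PositionAdvantage', 'NNN']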

After appending it, run again; the result is:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9948 entries, 0 to 9947
Data columns (total 15 columns):
JobTitle 9948 non-null object
CompanyName 9948 non-null object
CompanyNature 9948 non-null object
CompanySize 9948 non-null object
IndustryField 9948 non-null object
Salary 9948 non-null object
Workplace 9948 non-null object
Workyear 9948 non-null object
Education 7533 non-null object
RecruitNumbers 9948 non-null object
ReleaseTime 9948 non-null object
Language 901 non-null object
Specialty 814 non-null object
PositionAdvantage 8288 non-null object
NNN 0 non-null float64
dtypes: float64(1), object(14)
memory usage: 1.1+ MB

What we learn: the data currently has 9948 rows x 15 columns; Education, Language, Specialty, and PositionAdvantage are missing to varying degrees (NNN was added last, purely to pad the list to 15 elements); 14 columns are Python objects and 1 is a float.

3.2 Descriptive statistics

Since all the columns we care about are Python objects, use the following code:

#Note: the argument is an uppercase letter O
df.describe(include=['O'])

From the output (I did not screenshot the company-name part) we can see:

The most common job title is '数据分析师' (data analyst); most companies are private; company sizes of 150-500 people are the most common; the finance/investment/securities industry appears most often; the most common salary range is 6-8k CNY/month; most postings have no work-experience requirement; the typical education requirement is a bachelor's degree; and most positions hire just one person.


There are 4758 distinct job titles. Are they all data-analyst positions, the subject of this analysis? Let's check:

df.JobTitle.unique()
array(['零基础免费培训金融外汇数据分析师', '数据分析师(周末双休+上班舒适)', '数据分析师', ...,
 '数据分析实习(J10635)', '数据分析实习(J10691)', '数据分析实习(J10713)'], dtype=object)

This only shows a small portion of the job titles, and the ones displayed all look relevant. Let's take a different approach and look at the first 20:

JobTitle = df.groupby('JobTitle', as_index=False).count()
JobTitle.JobTitle.head(20)
0 (AI)机器学习开发工程师讲师
1 (ID67391)美资公司数据分析
2 (ID67465)美资公司数据分析
3 (ID67674)500强法资汽车制造商数据分析专员(6个月)
4 (ID67897)知名500强法资公司招聘数据分析专员
5 (Senior)DataAnalyst
6 (免费培训)数据分析师+双休+底薪
7 (实习职位)BusinessDataAnalyst/业务数据分析
8 (急)人力销售经理
9 (提供食宿)银行客服+双休
10 (日语)股票数据分析员/EquityDataAnalyst-Japanese/
11 (越南语)股票数据分析员/EquityDataAnalyst-Vietnam
12 (跨境电商)产品专员/数据分析师
13 (韩语)股票数据分析员/EquityDataAnalyst-Korean
14 ***数据分析
15 -数据分析师助理/实习生
16 -数据分析师助理/统计专员+双休五险+住宿
17 -无销售不加班金融数据分析师月入10k
18 -金融数据分析师助理6k-1.5w
19 -金融数据分析师双休岗位分红
Name: JobTitle, dtype: object

We can see that the data still contains irrelevant records such as machine-learning instructors, HR/sales managers, and bank customer-service positions.

Now that we have a general picture of the data, let's start preprocessing.

4. Data Preprocessing

4.1 Data cleaning

The goal of data cleaning is to keep erroneous or problematic records out of the later processing steps; it mainly covers duplicates, missing values, and empty values.

4.1.1 Remove duplicates

If the data contains a large number of duplicate records, they will inevitably distort the results, so we handle duplicates first.

#Drop duplicate records from the table and assign the result to zhaopin
zhaopin = df.drop_duplicates(inplace = False)
zhaopin.shape
(8927, 15)

Compared with the original data, 1021 duplicate records were removed.
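This can be verified directly from the row counts:

df.shape[0] - zhaopin.shape[0]   # 9948 - 8927 = 1021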

4.1.2 Filter out invalid records

We have seen that some job titles are irrelevant; our approach is simply to filter them out.

#Keep only job titles containing '数据', '分析', or 'Data'
zhaopin = zhaopin[zhaopin.JobTitle.str.contains('.*?数据.*?|.*?分析.*?|.*?Data.*?')]
zhaopin.shape
(7959, 15)
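Note that str.contains performs a regular-expression search rather than a full match, so the leading and trailing '.*?' are redundant; an equivalent, simpler filter would be:

zhaopin = zhaopin[zhaopin.JobTitle.str.contains('数据|分析|Data')]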

4.1.3 Handle missing values

In pandas, missing values appear as NaN or NaT; there are several ways to handle them (a minimal sketch follows the list):

1. Fill with a measure of central tendency such as the mean

2. Fill with a value predicted by a statistical model

3. Keep the missing values

4. Drop the missing values
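A minimal pandas sketch of options 1, 3, and 4 on a toy column (the real handling of our columns follows below):

import pandas as pd
toy = pd.DataFrame({'x': [1.0, None, 3.0]})
toy['x'].fillna(toy['x'].mean())   # 1. fill with a central-tendency value such as the mean
toy                                # 3. keep the missing value as it is
toy.dropna(subset=['x'])           # 4. drop rows that contain missing values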

#Count the missing values in each column
zhaopin.isnull().sum()
JobTitle 0
CompanyName 0
CompanyNature 0
CompanySize 0
IndustryField 0
Salary 0
Workplace 0
Workyear 0
Education 1740
RecruitNumbers 0
ReleaseTime 0
Language 7227
Specialty 7244
PositionAdvantage 1364
NNN 7959
dtype: int64

-- Education: missing ratio 1740/7959 = 21.86%; a missing value most likely means "no education requirement", so fill it with '不限学历':

zhaopin.Education.fillna('不限学历', inplace=True)

-- Language: missing ratio 7227/7959 = 90.80%; far too much is missing, so drop the column

-- Specialty: missing ratio 7244/7959 = 91.02%; likewise mostly missing, drop it:

zhaopin.drop(['Specialty','Language'], axis=1, inplace = True)

-- PositionAdvantage: missing ratio 1364/7959 = 17.14%; fill with the first mode, '五险一金':

zhaopin.PositionAdvantage.fillna(zhaopin.PositionAdvantage.mode()[0], inplace = True)

-- NNN: carries no information at all, so drop it directly:

zhaopin.drop(["NNN"], axis=1, inplace = True)

Finally, check that all missing values have been handled:

zhaopin.isnull().sum()
JobTitle 0
CompanyName 0
CompanyNature 0
CompanySize 0
IndustryField 0
Salary 0
Workplace 0
Workyear 0
Education 0
RecruitNumbers 0
ReleaseTime 0
PositionAdvantage 0
dtype: int64

4.2 Data transformation

The existing columns cannot directly answer our questions, so we need to split, convert, and compute on them.

The features that need processing are Salary and Workplace.

1. Salary

Split the salary into a minimum and a maximum salary. The salary units include 元/小时 (CNY/hour), 元/天 (CNY/day), 万/月 (10k CNY/month), 万/年 (10k CNY/year), and 千/月 (1k CNY/month); convert them all to 千/月 (1k CNY/month).

import re
import numpy as np
#Encode the 5 salary units
zhaopin['Standard'] = np.where(zhaopin.Salary.str.contains('元.*?小时'), 0,
                      np.where(zhaopin.Salary.str.contains('元.*?天'), 1,
                      np.where(zhaopin.Salary.str.contains('千.*?月'), 2,
                      np.where(zhaopin.Salary.str.contains('万.*?月'), 3,
                      4))))
#Split Salary on '-' into LowSalary and HighSalary
SalarySplit = zhaopin.Salary.str.split('-', expand = True)
zhaopin['LowSalary'], zhaopin['HighSalary'] = SalarySplit[0], SalarySplit[1]
#Encode whether Salary contains '以上' (and above), '以下' (and below), or neither
zhaopin['HighOrLow'] = np.where(zhaopin.LowSalary.str.contains('以.*?下'), 0,
                       np.where(zhaopin.LowSalary.str.contains('以.*?上'), 2,
                       1))
#Extract the number from LowSalary and convert it to float
Lower = zhaopin.LowSalary.apply(lambda x: re.search(r'(\d+\.?\d*)', x).group(1)).astype(float)
#Convert the LowSalary rows where HighOrLow is 1 to the unit '千/月'
zhaopin.LowSalary = np.where(((zhaopin.Standard==0)&(zhaopin.HighOrLow==1)), Lower*8*21/1000,
                    np.where(((zhaopin.Standard==1)&(zhaopin.HighOrLow==1)), Lower*21/1000,
                    np.where(((zhaopin.Standard==2)&(zhaopin.HighOrLow==1)), Lower,
                    np.where(((zhaopin.Standard==3)&(zhaopin.HighOrLow==1)), Lower*10,
                    np.where(((zhaopin.Standard==4)&(zhaopin.HighOrLow==1)), Lower/12*10,
                    Lower)))))
#Fill missing HighSalary values so the regex below cannot fail
zhaopin.HighSalary.fillna('0千/月', inplace =True)
#Extract the number from HighSalary and convert it to float
Higher = zhaopin.HighSalary.apply(lambda x: re.search(r'(\d+\.?\d*).*?', str(x)).group(1)).astype(float)
#Convert the HighSalary rows where HighOrLow is 1 to the unit '千/月'
zhaopin.HighSalary = np.where(((zhaopin.Standard==0)&(zhaopin.HighOrLow==1)), zhaopin.LowSalary/21*26,
                     np.where(((zhaopin.Standard==1)&(zhaopin.HighOrLow==1)), zhaopin.LowSalary/21*26,
                     np.where(((zhaopin.Standard==2)&(zhaopin.HighOrLow==1)), Higher,
                     np.where(((zhaopin.Standard==3)&(zhaopin.HighOrLow==1)), Higher*10,
                     np.where(((zhaopin.Standard==4)&(zhaopin.HighOrLow==1)), Higher/12*10,
                     np.where(zhaopin.HighOrLow==0, zhaopin.LowSalary,
                     zhaopin.LowSalary))))))
#Check which Standard values occur when HighOrLow is 0; the output is 2, 4
zhaopin[zhaopin.HighOrLow==0].Standard.unique()
#Unit conversion for rows where HighOrLow is 0
zhaopin.loc[(zhaopin.HighOrLow==0)&(zhaopin.Standard==2), 'LowSalary'] = zhaopin[(zhaopin.HighOrLow==0)&(zhaopin.Standard==2)].HighSalary.apply(lambda x: 0.8*x)
zhaopin.loc[(zhaopin.HighOrLow==0)&(zhaopin.Standard==4), 'HighSalary'] = zhaopin[(zhaopin.HighOrLow==0)&(zhaopin.Standard==4)].HighSalary.apply(lambda x: x/12*10)
zhaopin.loc[(zhaopin.HighOrLow==0)&(zhaopin.Standard==4), 'LowSalary'] = zhaopin[(zhaopin.HighOrLow==0)&(zhaopin.Standard==4)].HighSalary.apply(lambda x: 0.8*x)
#Check which Standard values occur when HighOrLow is 2; the output is 4
zhaopin[zhaopin.HighOrLow==2].Standard.unique()
#Unit conversion for rows where HighOrLow is 2
zhaopin.loc[zhaopin.HighOrLow==2, 'LowSalary'] = zhaopin[zhaopin.HighOrLow==2].HighSalary.apply(lambda x: x/12*10)
zhaopin.loc[zhaopin.HighOrLow==2, 'HighSalary'] = zhaopin[zhaopin.HighOrLow==2].LowSalary.apply(lambda x: 1.2*x)
#Format both columns to one decimal place
zhaopin.LowSalary, zhaopin.HighSalary = zhaopin.LowSalary.apply(lambda x: '%.1f'%x), zhaopin.HighSalary.apply(lambda x: '%.1f'%x)
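A quick sanity check on the converted values (a sketch; at this point both columns have been formatted back to strings, so cast to float first):

zhaopin[['LowSalary', 'HighSalary']].astype(float).describe()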

2. Workplace

Standardize the work locations.

#See what work locations there are
zhaopin.Workplace.unique()
#Check which location names contain '省' (province); the result shows they are all plain 'xx省' without a city name
zhaopin[zhaopin.Workplace.str.contains('省')].Workplace.unique()
#Unify locations to the city level
zhaopin['Workplace'] = zhaopin.Workplace.str.split('-', expand=True)[0]

3. Drop redundant columns

zhaopin.drop(['Salary','Standard', 'HighOrLow'], axis = 1, inplace = True)

Data preprocessing is now complete; next comes the analysis.


5. Visual Analysis

5.1 Company type

import matplotlib
import matplotlib.pyplot as plt
CompanyNature_Count = zhaopin.CompanyNature.value_counts()
#Set a Chinese font so the labels render correctly
font = {'family': 'SimHei'}
matplotlib.rc('font', **font)
fig = plt.figure(figsize = (8, 8))
#Draw a pie chart; pctdistance is the distance of the percentage text from the center,
#labeldistance is the distance of the labels, radius is the pie radius
patches, l_text, p_text = plt.pie(CompanyNature_Count, autopct = '%.2f%%', pctdistance = 0.6, labels = CompanyNature_Count.index, labeldistance=1.1, radius = 1)
#Nudge the overlapping labels of the smallest slices apart
m, n = 0.02, 0.028
for t in l_text[7: 11]:
    t.set_y(m)
    m += 0.1
for p in p_text[7: 11]:
    p.set_y(n)
    n += 0.1
plt.title('数据分析岗位中各类型企业所占比例', fontsize=24)

We can see that hiring is dominated by private companies, joint ventures, and listed companies.


5.2 Company size

CompanySize_Count = zhaopin.CompanySize.value_counts()
index, bar_width = np.arange(len(CompanySize_Count)), 0.6
fig = plt.figure(figsize = (8, 6))
plt.barh(index*(-1)+bar_width, CompanySize_Count, tick_label = CompanySize_Count.index, height = bar_width)
#Add data labels
for x, y in enumerate(CompanySize_Count):
    plt.text(y+0.1, x*(-1)+bar_width, '%s'%y, va = 'center')
plt.title('数据分析岗位各公司规模总数分布条形图', fontsize = 24)

Companies hiring data analysts mostly have 50-500 employees.


5.3 Region

from pyecharts import Geo
from collections import Counter
#Count occurrences of each region and convert the result to tuples
data = Counter(zhaopin.Workplace).most_common()
#Create the geo chart
geo = Geo("数据分析岗位各地区需求量", title_color="#fff", title_pos="center", width=1200, height=600, background_color='#404a59')
attr, value = geo.cast(data)
#Add the data points
geo.add('', attr, value, visual_range=[0, 100], visual_text_color='#fff', symbol_size=5, is_visualmap=True, is_piecewise=True)
geo.show_config()
geo.render()

We can see that economically developed regions such as Beijing, Shanghai, Guangzhou, and Shenzhen have the largest demand for data-analyst positions.


Adapted from: https://blog.csdn.net/qq_41841569/article/details/82811153?utm_source=blogxgwz1

5.4 Education and work experience

fig, ax = plt.subplots(1, 2, figsize = (18, 8))
Education_Count = zhaopin.Education.value_counts()
Workyear_Count = zhaopin.Workyear.value_counts()
#Pie chart of education requirements
patches, l_text, p_text = ax[0].pie(Education_Count, autopct = '%.2f%%', labels = Education_Count.index )
#Spread out the labels of the smallest slices
m = -0.01
for t in l_text[6:]:
    t.set_y(m)
    m += 0.1
    print(t)
for p in p_text[6:]:
    p.set_y(m)
    m += 0.1
ax[0].set_title('数据分析岗位各学历要求所占比例', fontsize = 24)
#Horizontal bar chart of work-experience requirements
index, bar_width = np.arange(len(Workyear_Count)), 0.6
ax[1].barh(index*(-1) + bar_width, Workyear_Count, tick_label = Workyear_Count.index, height = bar_width)
ax[1].set_title('数据分析岗位工作经验要求', fontsize= 24)

Most postings require a bachelor's or associate degree, and most have no work-experience requirement, so hiring is evidently aimed largely at fresh graduates.


5.5 Salary

1. Salary versus regional demand

fig = plt.figure(figsize = (9,7))
#Convert the salary columns to float
zhaopin.LowSalary, zhaopin.HighSalary = zhaopin.LowSalary.astype(float), zhaopin.HighSalary.astype(float)
#Average minimum and maximum salary per region
Salary = zhaopin.groupby('Workplace', as_index = False)['LowSalary', 'HighSalary'].mean()
#Number of data-analyst postings per region, sorted in descending order
Workplace = zhaopin.groupby('Workplace', as_index= False)['JobTitle'].count().sort_values('JobTitle', ascending = False)
#Merge the two tables and keep the top 20 regions for plotting
Workplace = pd.merge(Workplace, Salary, how = 'left', on = 'Workplace')
Workplace = Workplace.head(20)
plt.bar(Workplace.Workplace, Workplace.JobTitle, width = 0.8, alpha = 0.8)
plt.plot(Workplace.Workplace, Workplace.HighSalary*1000, '--', color = 'g', alpha = 0.9, label='平均最高薪资')
plt.plot(Workplace.Workplace, Workplace.LowSalary*1000, '-.', color = 'r', alpha = 0.9, label='平均最低薪资')
#Add data labels
for x, y in enumerate(Workplace.HighSalary*1000):
    plt.text(x, y, '%.0f'%y, ha = 'left', va='bottom')
for x, y in enumerate(Workplace.LowSalary*1000):
    plt.text(x, y, '%.0f'%y, ha = 'right', va='bottom')
for x, y in enumerate(Workplace.JobTitle):
    plt.text(x, y, '%s'%y, ha = 'center', va='bottom')
plt.legend()
plt.title('数据分析岗位需求量排名前20地区的薪资水平状况', fontsize = 20)

As can be seen, as regional demand decreases, salaries tend to decrease as well.


2. Salary versus work experience

#Average minimum and maximum salary for each work-experience level
Salary_Year = zhaopin.groupby('Workyear', as_index = False)['LowSalary', 'HighSalary'].mean()
#Average salary
Salary_Year['Salary'] = (Salary_Year.LowSalary.add(Salary_Year.HighSalary)).div(2)
#Swap two rows to get the desired ordering
Salary_Year.loc[0], Salary_Year.loc[6] = Salary_Year.loc[6], Salary_Year.loc[0]
#Horizontal bar chart
plt.barh(Salary_Year.Workyear, Salary_Year.Salary, height = 0.6)
for x, y in enumerate(Salary_Year.Salary):
    plt.text(y+0.1, x, '%.2f'%y, va = 'center')
plt.title('各工作经验对应的平均薪资水平(单位:千/月)', fontsize = 20)

The more work experience, the higher the salary.


3. Salary versus education

#Average salary per education level
Salary_Education = zhaopin.groupby('Education', as_index = False)['LowSalary', 'HighSalary'].mean()
Salary_Education['Salary'] = Salary_Education.LowSalary.add(Salary_Education.HighSalary).div(2)
Salary_Education = Salary_Education.sort_values('Salary', ascending = True)
#Bar chart
plt.bar(Salary_Education.Education, Salary_Education.Salary, width = 0.6)
for x, y in enumerate(Salary_Education.Salary):
    plt.text(x, y, '%.2f'%y, ha = 'center', va='bottom')
plt.title('各学历对应的平均工资水平(单位:千/月)', fontsize = 20)

The higher the education level, the higher the corresponding salary.


Summary

1. Data-analyst positions are mostly offered by private companies, joint ventures, and listed companies, typically with 50-500 employees.

2. The education requirement is mostly a bachelor's or associate degree, and most postings require no prior work experience, so hiring clearly targets fresh graduates.

3. Economically developed regions such as Beijing, Shanghai, Guangzhou, Shenzhen, and Hangzhou have a large demand for data-analyst positions and pay more than other regions; the higher the education level and the richer the experience, the higher the corresponding salary.


Source: blog.csdn.net/qq_40925239/article/details/83660221