Hands-on data analysis project: HR data analysis in R

I. Analysis background

At year end, Exxon reviewed the past year's work and found that its human-resources performance was unsatisfactory. Specifically: employees left during important periods of the year; some staff showed little enthusiasm for their work; training staff were in short supply, which slowed business development; and the HR department had an incomplete picture of the overall employee situation.

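II. Data import and preview

## Load packages
library(Rmisc)         # multiplot(): arrange several plots in one plotting area
                       # loaded first because Rmisc attaches plyr, which would otherwise mask dplyr verbs
library(dplyr)
library(ggplot2)

# Import the dataset
data <- read.csv('员工数据.csv')

# Summary statistics for each column
summary(data)

# Structure of the data (column types and first values)
str(data)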


# Show the first few rows
head(data)

The satisfaction field (员工满意度) contains missing values, so the data needs further cleaning before analysis.

III. Data cleaning and transformation

Fill the missing values with the column mean:

data[is.na(data$员工满意度), '员工满意度'] <- mean(data$员工满意度, na.rm = TRUE)
head(data, 15)
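
The imputation step above can be checked on a small example. This is a minimal sketch in base R on made-up data; the column name `satisfaction` is a placeholder and not a column of the original dataset.

```r
# Toy data standing in for the employee table (values are invented).
toy <- data.frame(id = 1:5, satisfaction = c(0.4, NA, 0.8, NA, 0.6))

# Mean of the observed values only; na.rm = TRUE skips the NAs.
observed_mean <- mean(toy$satisfaction, na.rm = TRUE)

# Overwrite only the rows where the value is missing.
toy$satisfaction[is.na(toy$satisfaction)] <- observed_mean

print(toy)
```

Mean imputation keeps the column mean unchanged but shrinks its variance, so it is best suited to columns with few missing values.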


Convert the categorical columns to factors to make the later analysis easier:

# Convert columns to types that suit the later analysis
data$职位变动 <- as.factor(data$职位变动)
data$离职 <- as.factor(data$离职)
data$过去5年是否有升职 <- as.factor(data$过去5年是否有升职)
str(data)
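
One consequence of this conversion matters later, when the modeling step calls as.integer() on 离职 and then tests for the value 2. The sketch below assumes the raw column is coded 0 (stayed) and 1 (left); that coding is an assumption, since the raw data is not shown here.

```r
# Assumed raw coding: 0 = stayed, 1 = left.
left <- c(0, 1, 0, 1, 1)
left_f <- as.factor(left)

# The factor levels are the strings "0" and "1".
print(levels(left_f))

# as.integer() on a factor returns level positions (1 and 2), not 0 and 1.
codes <- as.integer(left_f)

# Going through as.character() first recovers the original 0/1 values.
original <- as.integer(as.character(left_f))

print(codes)
print(original)
```

Under that assumption, a test such as 离职 == 2 after as.integer() selects the employees who left.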

Create new feature variables (columns): total working hours and average number of projects per year

# Create new feature variables: total working hours and average projects per year
new.data <- data %>%
  mutate(总工作时长 = 平均每月工作小时 * 12 * 在公司工作年限,
         年均项目数 = 项目数 / 在公司工作年限) %>%
  as.data.frame()

# Inspect the result
str(new.data)
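
As a quick check of the arithmetic behind the two derived columns, here is a minimal base R sketch on two invented employees. The English column names are placeholders for 平均每月工作小时, 在公司工作年限, and 项目数.

```r
# Two invented employees; column names are illustrative placeholders.
toy <- data.frame(monthly_hours = c(160, 250),
                  years = c(3, 5),
                  projects = c(6, 5))

# Total hours = monthly hours * 12 months * years with the company.
toy$total_hours <- toy$monthly_hours * 12 * toy$years

# Average projects per year = projects / years with the company.
toy$projects_per_year <- toy$projects / toy$years

print(toy)
```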


IV. Exploratory data analysis

Explore how satisfaction with the company, performance evaluation, average monthly working hours, and tenure relate to whether an employee left

# Boxplot of satisfaction with the company versus leaving
box_sat <- ggplot(new.data, aes(x = 离职, y = 员工满意度, fill = 离职)) +
  geom_boxplot() +
  theme_bw() +  # a built-in ggplot2 theme
  labs(x = '离职', y = '员工满意度') # set the x- and y-axis labels
box_sat

# Boxplot of performance evaluation versus leaving
box_eva <- ggplot(new.data, aes(x = 离职, y = 最后一次绩效评估, fill = 离职)) +
  geom_boxplot() +
  theme_bw() +
  labs(x = '离职', y = '最后一次绩效评估')
box_eva

# Boxplot of average monthly working hours versus leaving
box_mon <- ggplot(new.data, aes(x = 离职, y = 平均每月工作小时, fill = 离职)) +
  geom_boxplot() +
  theme_bw() +
  labs(x = '离职', y = '平均每月工作小时')
box_mon

# Boxplot of tenure with the company versus leaving
box_time <- ggplot(new.data, aes(x = 离职, y = 在公司工作年限, fill = 离职)) +
  geom_boxplot() +
  theme_bw() +
  labs(x = '离职', y = '在公司工作年限')
box_time

# Combine the plots in one plotting area; cols = 2 lays the four plots out in two columns (two rows)
multiplot(box_sat, box_eva, box_mon, box_time, cols = 2)

Summary:

Exploring how satisfaction with the company, performance evaluation, average monthly working hours, and tenure relate to leaving shows that employees who left tend to have:

  • Low satisfaction with the company, mostly concentrated around 0.4;
  • High performance evaluations, mostly concentrated above 0.8;
  • Longer average monthly working hours, mostly above the average (200 hours);
  • About four years of tenure.
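
Readings taken from boxplots can be backed up with group medians. The following is a minimal sketch in base R on invented data; the column names `left`, `satisfaction`, and `monthly_hours` are placeholders for 离职, 员工满意度, and 平均每月工作小时.

```r
# Invented data: three employees who stayed (0) and three who left (1).
toy <- data.frame(
  left = factor(c(0, 0, 0, 1, 1, 1)),
  satisfaction = c(0.7, 0.8, 0.6, 0.4, 0.45, 0.1),
  monthly_hours = c(180, 200, 190, 250, 260, 140)
)

# Median of each numeric column within each group of `left`.
group_medians <- aggregate(cbind(satisfaction, monthly_hours) ~ left,
                           data = toy, FUN = median)

print(group_medians)
```

The median is the thick line inside each box, so this table gives the same comparison as the plots in numeric form.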

Explore how the number of projects, promotion within five years, and salary relate to leaving

# Percentage stacked bar chart of number of projects versus leaving
bar_pro <- ggplot(new.data, aes(x = as.factor(项目数), fill = 离职)) +
  geom_bar(position = 'fill') + # position = 'fill' draws a percentage stacked bar chart
  theme_bw() +
  labs(x = '项目数', y = '离职')
bar_pro

# Percentage stacked bar chart of promotion within five years versus leaving
bar_5years <- ggplot(new.data, aes(x = 过去5年是否有升职, fill = 离职)) +
  geom_bar(position = 'fill') +
  theme_bw() +
  labs(x = '过去5年是否有升职', y = '离职')
bar_5years

# Percentage stacked bar chart of salary level versus leaving
bar_salary <- ggplot(new.data, aes(x = 薪资水平, fill = 离职)) +
  geom_bar(position = 'fill') +
  theme_bw() +
  labs(x = '薪资水平', y = '离职')
bar_salary

# Combine the plots in one plotting area; cols = 3 lays them out in one row of three columns
multiplot(bar_pro, bar_5years, bar_salary, cols = 3)

Exploring how the number of projects, promotion within five years, and salary relate to leaving gives the following results:

  • The more projects an employee took part in, the higher the turnover rate (excluding the samples with 2 projects);
  • Employees with no promotion within five years have a relatively high turnover rate;
  • The higher the salary, the lower the turnover rate.
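
The bar heights in a percentage stacked bar chart are within-group proportions, which can also be computed directly. This is a minimal base R sketch on invented data; `salary` and `left` are placeholder names for 薪资水平 and 离职, and the coding 1 = left is an assumption.

```r
# Invented data: salary level and whether the employee left (1) or stayed (0).
toy <- data.frame(
  salary = c("low", "low", "low", "low",
             "medium", "medium", "medium",
             "high", "high", "high"),
  left = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0)
)

# Cross-tabulate, then convert each row to proportions (margin = 1 means row-wise).
counts <- table(toy$salary, toy$left)
rates <- prop.table(counts, margin = 1)

# The column for left == 1 is the turnover rate per salary level.
turnover <- rates[, "1"]
print(turnover)
```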

Draw a percentage stacked bar chart of job series versus leaving

# A separate object name is used so the salary chart above is not overwritten
bar_job <- ggplot(new.data, aes(x = 职务序列, fill = 离职)) +
  geom_bar(position = 'fill') +
  theme_bw() +
  labs(x = '职务序列', y = '离职')
bar_job

Overall, the share of employees who left is roughly the same across job series, but core positions such as management and R&D have a relatively lower turnover share.


Draw a boxplot of satisfaction for each job series

# A separate object name is used so the performance-evaluation boxplot above is not overwritten
box_job_sat <- ggplot(new.data, aes(x = 职务序列, y = 员工满意度,
                                    fill = 职务序列)) +
  geom_boxplot() +
  theme_bw() +
  labs(x = '职务序列', y = '员工满意度')
box_job_sat

Overall, the distribution of employee satisfaction is roughly the same across job series, but HR and finance employees score relatively low.

Explore how total working hours and average number of projects per year relate to leaving

# Total working hours
box_total_time <- ggplot(new.data, aes(x = 离职, y = 总工作时长, fill = 离职)) +
  geom_boxplot() +
  theme_bw() +
  labs(x = '离职', y = '总工作时长')
box_total_time

# Distribution of average projects per year

box_avg_pro <- ggplot(new.data, aes(x = 离职, y = 年均项目数, fill = 离职)) +
  geom_boxplot() +
  theme_bw() +
  labs(x = '离职', y = '年均项目数')
box_avg_pro


V. Statistical analysis and modeling

# Step 1: build a table of turnover rate by tenure
# as.integer() on a factor returns level positions, so (assuming levels "0" and "1") the value 2 marks employees who left
new.data$离职 <- as.integer(new.data$离职)
library(dplyr)
data.Turnover <- new.data %>%
  group_by(在公司工作年限) %>%  # group by tenure
  arrange(在公司工作年限) %>%   # sort by tenure, ascending
  dplyr::summarise(该年限离职员工数 = sum(离职 == 2), 该年限员工总数 = n()) %>%  # leavers and headcount at each tenure
  mutate(该年限及以上所有员工数 = rev(cumsum(rev(该年限员工总数)))) %>%  # new column: headcount at this tenure or longer
  mutate(该年限员工流失率 = 该年限离职员工数 / 该年限及以上所有员工数) %>%  # new column: turnover rate at this tenure
  select(在公司工作年限, 该年限离职员工数, 该年限员工总数, 该年限及以上所有员工数, 该年限员工流失率) %>%   # keep only these columns
  as.data.frame()
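
The rev(cumsum(rev(...))) idiom in step 1 computes, for each tenure, the number of employees with that tenure or longer. A minimal sketch with invented counts shows the arithmetic; the numbers below are not from the dataset.

```r
# Invented headcount at each tenure (names are years with the company).
n_at_tenure <- c(`2` = 50, `3` = 30, `4` = 15, `5` = 5)

# Reverse, take the running total, reverse back:
# each entry becomes the headcount at that tenure or longer.
n_at_or_above <- rev(cumsum(rev(n_at_tenure)))

# Invented number of leavers at each tenure.
leavers <- c(10, 10, 5, 2)

# Turnover rate = leavers at this tenure / employees who reached this tenure.
rate <- leavers / n_at_or_above

print(n_at_or_above)
print(rate)
```

Dividing by the headcount that reached a given tenure, rather than by the headcount at exactly that tenure, treats everyone who stayed at least that long as having been at risk of leaving in that year.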

# Step 2: plot the turnover-by-tenure table to find the tenure with the highest turnover rate

#install.packages("ggplot2")
library(ggplot2)
g <- ggplot(data.Turnover, aes(在公司工作年限, 该年限员工流失率))  # turnover rate against tenure
g + geom_point() +
  geom_line() +
  theme_bw() +
  labs(x = "在公司工作年限") +
  labs(y = "该年限员工流失率") +
  labs(title = "员工流失率与工作年限的关系")

The turnover rate is highest at 5 years of tenure.

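# Step 3: turnover driver analysis, looking for the main factors behind leaving in the year with the highest turnover
data.5year <- new.data %>%
  filter(在公司工作年限 >= 5) %>%  # keep employees with at least 5 years of tenure (numeric comparison, not text)
  mutate(是否在第五年流失 = ifelse(在公司工作年限 == 5 & 离职 == 2, 1, 0))  # new column: left in the fifth year (1) or not (0)
summary(data.5year)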
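
The tenure threshold in this filter should be a number rather than a quoted string. When a numeric vector is compared with a character value, R converts the numbers to text and compares them as strings, which misorders multi-digit values. A minimal sketch with invented tenures:

```r
# Invented tenures in years.
tenure <- c(3, 5, 10)

# Quoted threshold: the numbers become "3", "5", "10" and are compared as text,
# so "10" sorts before "5" and the 10-year employee is dropped.
as_text <- tenure >= "5"

# Numeric threshold: compared as numbers, so the 10-year employee is kept.
as_number <- tenure >= 5

print(as_text)
print(as_number)
```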


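# Step 4: linear regression to see how each factor affects turnover

lm_fit <- lm(是否在第五年流失 ~ 项目数+平均每月工作小时+工作事故+过去5年是否有升职+职务序列+薪资水平+职位变动+员工满意度+最后一次绩效评估, data = data.5year)
library("car")
vif(lm_fit) # all GVIF values are below 10, so collinearity is not a concern

# Select the predictors for the model
library(olsrr)

model <- lm(是否在第五年流失 ~ 项目数+平均每月工作小时+工作事故+过去5年是否有升职+职务序列+薪资水平+职位变动+员工满意度+最后一次绩效评估, data = data.5year)
k <- ols_step_all_possible(model)  # fit every subset of the predictors
print(k)
k[k$adjr == max(k$adjr), ]  # the subset with the highest adjusted R2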
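
To make the selection step concrete, the sketch below does an all-subsets search by adjusted R2 in base R on simulated data. It illustrates the idea only and is not the olsrr implementation; the variables x1, x2, x3, and y are invented.

```r
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# y depends on x1 and x2 only; x3 is pure noise.
toy$y <- 1 + 2 * toy$x1 - 1 * toy$x2 + rnorm(n)

predictors <- c("x1", "x2", "x3")

# Every non-empty subset of the predictors (7 subsets for 3 predictors).
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one linear model per subset and record its adjusted R2.
adj_r2 <- sapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "y"), data = toy)
  summary(fit)$adj.r.squared
})

results <- data.frame(
  predictors = sapply(subsets, paste, collapse = " + "),
  adj_r2 = adj_r2,
  stringsAsFactors = FALSE
)

# The subset with the highest adjusted R2.
best <- results[which.max(results$adj_r2), ]
print(results)
print(best)
```

Adjusted R2 penalizes each extra predictor, so a variable that adds little explanatory power can lower the score even though plain R2 never decreases.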

# Refit the linear regression with the predictor set that has the highest adjusted R2
lm_fit <- lm(是否在第五年流失 ~ 项目数+平均每月工作小时+工作事故+过去5年是否有升职+薪资水平+职位变动+员工满意度+最后一次绩效评估, data = data.5year)
summary(lm_fit)

Linear regression was used to analyze the main factors behind turnover in the fifth year. The fitted model is:

Left in the fifth year = 1.31 + 0.76 * last performance evaluation + 0.30 * employee satisfaction + 0.10 * salary level (low) + 0.07 * number of projects + 0.05 * salary level (medium) - 0.11 * promoted in the past five years (1) - 0.11 * work accident - 0.22 * position change (1)
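
Terms such as "low", "medium", and "1" in the fitted equation come from how lm() encodes factors: one level serves as the baseline and each remaining level gets its own coefficient. The sketch below shows this on simulated data; the variable names and the levels high/low/medium are assumptions made for illustration.

```r
set.seed(42)
toy <- data.frame(
  salary = factor(rep(c("high", "low", "medium"), each = 20)),
  promoted = factor(rep(c(0, 1), times = 30)),
  evaluation = runif(60)
)
# Simulated outcome; the coefficients here are arbitrary.
toy$y <- 0.2 + 0.5 * toy$evaluation + 0.1 * (toy$salary == "low") +
  rnorm(60, sd = 0.1)

fit <- lm(y ~ evaluation + salary + promoted, data = toy)

# The first level of each factor ("high", "0") is the baseline and gets no
# coefficient; the other levels appear as factor name + level name.
coef_names <- names(coef(fit))
print(coef_names)
```

Each factor coefficient is therefore a difference from the baseline level, with the other predictors held fixed.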


VI. Conclusions and recommendations

Conclusions:

  • Satisfaction and salary are positively correlated: the higher the salary, the higher the satisfaction and the lower the turnover rate;
  • Employees with high performance evaluations, low satisfaction, and low salary levels leave very easily;
  • Employees with long average monthly working hours, above the average (200 hours), are more likely to leave;
  • The more projects an employee takes part in, the higher the turnover rate;
  • Employees with 4 to 6 years of tenure have a relatively high turnover rate.

Recommendations:

  • The HR department should focus on the promotion status, satisfaction, and psychological state of employees who have high performance evaluations but low satisfaction and low salary levels;
  • For staff with long working hours and heavy work pressure, adjust the workload promptly and arrange work plans sensibly;
  • A larger number of projects may mean higher demands on the employee; whether department heads give enough support in line with the employee's abilities is also a factor that needs attention;
  • Pay more attention to key employees with 4 to 6 years of tenure and, depending on the actual situation, offer suitable incentives and promotions to raise their satisfaction and improve retention.

Origin blog.csdn.net/weixin_44976611/article/details/104936420