# 1. Experimental Purpose

(1) Analyze the data to find which variables have a direct and significant impact on employee retention (i.e., whether an employee leaves the company or stays).
(2) Draw a bar chart showing the effect of salary on employee retention.
(3) Draw a bar chart showing the correlation between department and employee retention.
(4) Build a logistic regression model and calculate its accuracy.

# 2. Import the necessary modules and read the data

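``````
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Assumed file name: the post does not show the read step, so adjust the path to your copy of the data
df = pd.read_csv('HR_comma_sep.csv')

df.shape    # 14,999 rows, 10 columns
left = df[df.left==1]     # employees who left
left.shape
retained = df[df.left==0]   # employees who stayed
retained.shape
``````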
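The two subset shapes give head counts per class; the share of employees who left can also be read directly as the mean of the `left` column. A minimal sketch on a made-up six-row frame (the `toy` frame and its values are illustrative assumptions, not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for the HR data; the values are invented for illustration
toy = pd.DataFrame({"left": [1, 0, 0, 1, 0, 0]})

counts = toy["left"].value_counts()   # head count per class (0 = stayed, 1 = left)
attrition_rate = toy["left"].mean()   # fraction of rows with left == 1

print(counts.to_dict())
print(round(attrition_rate, 3))
```

On the real data the same two lines would apply to `df["left"]`. Knowing the class balance up front helps when judging the accuracy score later, since a heavily imbalanced label makes a high accuracy easy to reach.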

# 3. Visualize data

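``````
# Group by whether the employee left and average the numeric columns
# (numeric_only=True skips text columns such as salary, which recent pandas refuses to average)
df.groupby('left').mean(numeric_only=True)
# 0 = stayed, 1 = left
``````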
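To see which variables separate the two groups, it can help to look at the difference between the group means rather than the two rows side by side. A small sketch on invented rows (the `toy` frame is an assumption for illustration; only the column names come from the post):

```python
import pandas as pd

# Made-up rows that mimic the layout of the HR table
toy = pd.DataFrame({
    "satisfaction_level": [0.2, 0.4, 0.8, 0.6],
    "salary": ["low", "low", "high", "medium"],   # text column, excluded from the mean
    "left": [1, 1, 0, 0],
})

# Average only the numeric columns within each group
means = toy.groupby("left").mean(numeric_only=True)

# Difference between leavers (1) and stayers (0) for each numeric column
gap = means.loc[1] - means.loc[0]

print(means)
print(gap)
```

A large gap relative to a column's range suggests a candidate feature for the model; it is a rough screening step, not a significance test.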

``````
pd.crosstab(df.salary,df.left).plot(kind='bar')    # compare the effect of salary level on attrition
``````

``````
pd.crosstab(df.Department,df.left).plot(kind='bar')    # compare the effect of department on attrition
``````
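The bar charts above plot raw counts, so a large group can look like it loses more people simply because it is bigger. One way to compare groups fairly is to normalize each row of the crosstab. A minimal sketch on invented data (the `toy` frame is an assumption; `normalize='index'` is a standard `pd.crosstab` option):

```python
import pandas as pd

# Made-up salary levels and outcomes for illustration
toy = pd.DataFrame({
    "salary": ["low", "low", "low", "low", "medium", "medium", "high", "high"],
    "left":   [1,     1,     1,     0,     1,        0,        0,      0],
})

# Each row sums to 1, so column 1 is the attrition rate within that salary level
rates = pd.crosstab(toy["salary"], toy["left"], normalize="index")

print(rates)
```

Calling `rates.plot(kind='bar', stacked=True)` would draw the same comparison as proportions instead of counts.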

# 4. Data preprocessing

``````
subdf = df[['satisfaction_level','average_montly_hours','promotion_last_5years','salary']]  # select the 4 candidate factors

salary_dummies = pd.get_dummies(subdf.salary,prefix='salary')   # one-hot encode salary; new columns get the prefix "salary"

df_with_dummies = pd.concat([subdf,salary_dummies],axis='columns')  # append the dummy columns
``````

``````
df_with_dummies.drop('salary',axis='columns',inplace=True)   # drop the original text salary column
``````
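To make the encoding step concrete, the sketch below shows what `pd.get_dummies` produces for a text column with three levels. The four-row `toy` frame is an invented example. The `drop_first=True` variant is an optional alternative that the post does not use: it removes one redundant column, since the three dummies always sum to 1.

```python
import pandas as pd

# Made-up salary column with the three levels used in the post
toy = pd.DataFrame({"salary": ["low", "medium", "high", "low"]})

# One column per level, named <prefix>_<level>, in alphabetical order
dummies = pd.get_dummies(toy["salary"], prefix="salary")
print(list(dummies.columns))

# Optional variant (not in the post): drop the first level to avoid a redundant column
reduced = pd.get_dummies(toy["salary"], prefix="salary", drop_first=True)
print(list(reduced.columns))
```

With `drop_first=True`, the dropped level (`salary_high` here) becomes the reference category against which the other coefficients are read.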

``````
X = df_with_dummies    # features
y = df.left      # labels (0 = stayed, 1 = left)
``````

# 5. Training + prediction

``````
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression    # logistic regression estimator

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)
model = LogisticRegression()   # instantiate the model
model.fit(X_train, y_train)   # train

model.predict(X_test)   # predict labels for the test set
model.score(X_test,y_test)   # accuracy on the test set
model.coef_     # fitted coefficients
model.intercept_   # fitted intercept
``````
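The following is a self-contained sketch of the same train/score workflow on synthetic data, meant to show how the accuracy and the fitted parameters can be interpreted. Everything about the data is an assumption made for illustration: a single `satisfaction` feature and a made-up rule under which low satisfaction raises the chance of leaving. The `random_state` and `stratify` arguments are additions that make the split reproducible and keep the class balance; the post's code does not set them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data: one feature, and an invented rule linking it to leaving
rng = np.random.default_rng(0)
n = 400
satisfaction = rng.uniform(0, 1, n)
p_leave = 1 / (1 + np.exp(8 * (satisfaction - 0.4)))   # lower satisfaction -> higher probability
left = (rng.uniform(0, 1, n) < p_leave).astype(int)

X = satisfaction.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, left, test_size=0.2, random_state=0, stratify=left
)

model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy is the same number that model.score returns
acc = accuracy_score(y_test, y_pred)
# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
# Accuracy of always predicting the majority class, as a point of comparison
baseline = max(y_test.mean(), 1 - y_test.mean())

# coef_ and intercept_ define the linear score; the sigmoid turns it into P(left = 1)
z = X_test @ model.coef_.T + model.intercept_
manual_proba = (1 / (1 + np.exp(-z))).ravel()

print(acc, baseline)
print(cm)
print(np.allclose(manual_proba, model.predict_proba(X_test)[:, 1]))
```

Comparing `acc` with `baseline` shows how much the model adds over always guessing the majority class. The sign of each entry in `coef_` shows the direction of the effect: here the coefficient for `satisfaction` comes out negative, matching the rule used to generate the data.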

Origin blog.csdn.net/weixin_37763870/article/details/105442542