Data analysis practical case: Python analyzes why employees leave (complete code attached)

Hello everyone. Today I will walk you through a practical Python data analysis project that includes both the code and the dataset used for the analysis.

Employee turnover, or the rate at which employees leave the company, is an important concern for companies. Not only does it lead to the loss of valuable talent, it also creates costs and destroys productivity. Understanding why employees quit is critical for organizations to develop effective employee retention strategies.

In this data analysis project, we analyzed a data set to explore the factors behind employee turnover. By analyzing factors such as age, gender, salary, job role and work-life balance, we aim to uncover patterns and insights that can help organizations improve their employee retention efforts.

Through our analysis, we will identify the reasons why employees choose to leave and provide valuable information to organizations looking to enhance their employee retention strategies. By addressing these critical factors, organizations can create a work environment that promotes employee satisfaction and long-term commitment.

Through this data-driven journey, we delve into the complexities of employee turnover and gain insights that enable organizations to reduce turnover and build a strong, motivated workforce that drives business success.

For this project, IBM's HR analytics dataset will be used.

The data files and code for this article have been uploaded to the WeChat public account "Python Learning and Data Mining"; reply "Employee turnover" in the account's chat to obtain them.

/ 01 / Data Overview

Begin exploratory data analysis by importing the necessary Python libraries and datasets.

# Import the required Python libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import hvplot.pandas
import plotly.express as px

import warnings
import plotly.graph_objects as go
import scipy
from scipy.stats import chi2_contingency 
from plotly.subplots import make_subplots
from plotly.offline import init_notebook_mode
from statistics import stdev
from pprint import pprint
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler, StandardScaler
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
warnings.filterwarnings("ignore")
import plotly.figure_factory as ff

# Read the data
df = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")

Before diving into visualization, we should check what the data looks like. This helps us decide how to process it later in the project. Start by looking at the first 5 rows of the dataset.

df.head(5)

The dataset has 35 columns; the output above shows only the first five rows.

Here are some key issues to consider with your dataset:

  • Columns and rows: How many columns and rows are there in the dataset?
  • Missing data: Are there missing values in the dataset?
  • Data types: What are the different types of data used in this dataset?
  • Data distribution: Is the data distribution right-skewed, left-skewed, or symmetric? Understanding distributions is useful for statistical analysis and modeling.
  • Data meaning: What does the data represent? In this dataset, many variables are ordinal, representing an ordered categorical scale. For example, job satisfaction ranges from 1 (low) to 4 (very high).
  • Label: What is the output or label variable in the dataset?

Answering these questions gives a brief overview of the dataset, its characteristics, and the insights it can provide.

print(df.info())

The dataset consists of 1470 observations (rows) and 35 features (columns). There are no missing values, which simplifies the analysis. The columns hold either strings or integers.

The main focus of our analysis is the Attrition label, since our goal is to understand why employees leave the organization. The dataset is imbalanced: approximately 84% of the rows represent employees who stayed and 16% represent employees who left.
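The checks described above (size, missing values, data types, label balance) can be collected in one small helper. This is a minimal sketch: the function name `dataset_overview` and the toy rows are illustrative assumptions, not part of the original notebook, and the toy frame only stands in for the real CSV.

```python
import pandas as pd

def dataset_overview(df: pd.DataFrame, label: str = "Attrition") -> dict:
    """Summarize size, missing values, dtypes, and label balance."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing_values": int(df.isna().sum().sum()),
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
        "label_share": df[label].value_counts(normalize=True).round(3).to_dict(),
    }

# Hypothetical toy rows standing in for the real dataset
toy = pd.DataFrame({
    "Age": [41, 49, 37, 33, 27],
    "Attrition": ["Yes", "No", "No", "No", "No"],
    "MonthlyIncome": [5993, 5130, 2090, 2909, 3468],
})

summary = dataset_overview(toy)
print(summary)
```

Calling `dataset_overview(df)` on the full dataset should report the row and column counts and the label shares quoted above.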

Armed with these details, we can conduct a comprehensive analysis of the data set, explore the factors that lead to employee turnover, and understand the dynamics of employee retention within the organization.

/ 02 / Age Overview

A few questions to consider:

  • Are there differences in turnover rates among younger and older workers?
  • Are there gender-based differences in attrition patterns across age groups?
  • How do attrition rates trend with age? Is there a relationship between them?

age_att=df.groupby(['Age','Attrition']).apply(lambda x:x['DailyRate'].count()).reset_index(name='Counts')
fig=px.line(age_att,x='Age',y='Counts',color='Attrition',title='Age Overview')
fig.show()

Because the graph plots counts rather than rates, the peak for employees who left at ages 28-32 marks where the largest number of departures occurs. Attrition then falls as age increases, which suggests that people value stability in their roles as they get older.

At the youngest ages, particularly 18 to 20, employees are far more likely to leave as they explore different opportunities. The Yes and No lines reach a break-even point around age 21.

After the age of 35, attrition declines steadily.

plt.figure(figsize=(8,5))
# fill=True replaces the deprecated shade=True in recent seaborn versions
sns.kdeplot(x=df['Age'],color='MediumVioletRed',fill=True,label='Age')
plt.axvline(x=df['Age'].mean(),color='k',linestyle ="--",label=f"Mean Age: {df['Age'].mean():.3f}")
plt.legend()
plt.title('Distribution of Age')
plt.show()

fig, axes = plt.subplots(1, 2, sharex=True, figsize=(15,5))
fig.suptitle('Attrition Age Distribution by Gender')
sns.kdeplot(ax=axes[0],x=df[(df['Gender']=='Male')&(df['Attrition']=='Yes')]['Age'], color='r', fill=True, label='Yes')
sns.kdeplot(ax=axes[0],x=df[(df['Gender']=='Male')&(df['Attrition']=='No')]['Age'], color='#01CFFB', fill=True, label='No')
axes[0].set_title('Male')
axes[0].legend(title='Attrition')
sns.kdeplot(ax=axes[1],x=df[(df['Gender']=='Female')&(df['Attrition']=='Yes')]['Age'], color='r', fill=True, label='Yes')
sns.kdeplot(ax=axes[1],x=df[(df['Gender']=='Female')&(df['Attrition']=='No')]['Age'], color='#01CFFB', fill=True, label='No')
axes[1].set_title('Female')
axes[1].legend(title='Attrition')
plt.show()

  • Turnover and age: Attrition differs clearly between younger and older workers. The largest number of departures falls in the 28-32 age range. The youngest employees (18-20 years old) are the most likely to leave as they explore different opportunities, with a break-even point around age 21. After the age of 35, attrition declines further, indicating higher retention among older workers.
  • Gender-based differences: The distribution plots compare the age profiles of leavers and stayers for each gender. The attrition rates themselves are compared in the next section, where male employees show a higher rate than female employees. Exploring attrition across age groups by gender can provide insight into the factors that influence attrition within specific demographics.
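Since the line chart shows counts, a rate per age band makes the age comparison more direct. The sketch below is only an illustration: the helper name `attrition_rate_by_age_band`, the band edges, and the toy rows are assumptions chosen for the example, not values from the original analysis.

```python
import pandas as pd

def attrition_rate_by_age_band(df, bins, labels):
    """Share of leavers within each age band (a rate, not a raw count)."""
    left = df["Attrition"].eq("Yes")          # True where the employee left
    bands = pd.cut(df["Age"], bins=bins, labels=labels, right=True)
    return left.groupby(bands, observed=False).mean()

# Hypothetical toy rows; the real analysis would pass the full df
toy = pd.DataFrame({
    "Age": [18, 19, 20, 25, 30, 31, 40, 45, 50, 55],
    "Attrition": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No", "No", "Yes"],
})

# Assumed band edges: (17, 21], (21, 35], (35, 60]
rates = attrition_rate_by_age_band(
    toy, bins=[17, 21, 35, 60], labels=["18-21", "22-35", "36-60"]
)
print(rates)
```

A band that contains many employees can show the most departures while still having a modest rate, which is why the two views are worth reading together.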

/ 03 / Gender Overview

Here are a few points to consider.

  • How many male and female employees are there?
  • Do women and men have different attrition rates?
  • What is the median salary for the male and female groups?
  • Is there a relationship between total years with a company and gender?

In this section we will try to see if there are differences between men and women in organizations. Additionally, we'll look at other basic information such as age, job satisfaction level, and average salary by gender.

att1=df.groupby(['Gender'],as_index=False)['Age'].count()
att1.rename(columns={'Age':'Count'},inplace=True)
fig = make_subplots(rows=1, cols=2, specs=[[{"type": "pie"},{"type": "pie"}]],subplot_titles=('',''))
fig.add_trace(go.Pie(values=att1['Count'],labels=['Female','Male'],hole=0.7,marker_colors=['Red','Blue']),row=1,col=1)
fig.add_layout_image(
    dict(
        source="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcT8aOWxpkvZGU2EzQ0_USzl6PhuWLi_36xptjeWVXvSqQ2a13MNAjCWyBnhMlkr_ZbFACk&usqp=CAU",
        xref="paper",
        yref="paper",
        x=0.94, y=0.272,
        sizex=0.35, sizey=1,
        xanchor="right", yanchor="bottom", sizing= "contain",
    )
)
fig.update_traces(textposition='outside', textinfo='percent+label')
fig.update_layout(title_x=0.5,template='simple_white',showlegend=True,legend_title_text="<b>Gender",title_text='<b style="color:black; font-size:120%;">Gender Overview',font_family="Times New Roman",title_font_family="Times New Roman")
fig.update_traces(marker=dict(line=dict(color='#000000', width=1.2)))
fig.update_layout(title_x=0.5,legend=dict(orientation='v',yanchor='bottom',y=1.02,xanchor='right',x=1))
fig.add_annotation(x=0.715,
                   y=0.18,
                   text='<b style="font-size:1.2vw" >Male</b><br><br><b style="color:DeepSkyBlue; font-size:2vw">882</b>',
                   showarrow=False,
                   xref="paper",
                   yref="paper",
                  )
fig.add_annotation(x=0.89,
                   y=0.18,
                   text='<b style="font-size:1.2vw" >Female</b><br><br><b style="color:MediumVioletRed; font-size:2vw">588</b>',
                   showarrow=False,
                   xref="paper",
                   yref="paper",
                  )
fig.show()

att1=df.groupby('Attrition',as_index=False)['Age'].count()
att1['Count']=att1['Age']
att1.drop('Age',axis=1,inplace=True)
att2=df.groupby(['Gender','Attrition'],as_index=False)['Age'].count()
att2['Count']=att2['Age']
att2.drop('Age',axis=1,inplace=True)
fig = make_subplots(rows=1, cols=3, specs=[[{"type": "pie"}, {"type": "pie"}, {"type": "pie"}]],subplot_titles=('<b>Employee Attrition', '<b>Female Attrition','<b>Male Attrition'))

fig.add_trace(go.Pie(values=att1['Count'],labels=att1['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Employee Attrition',showlegend=False),row=1,col=1)
fig.add_trace(go.Pie(values=att2[(att2['Gender']=='Female')]['Count'],labels=att2[(att2['Gender']=='Female')]['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Female Attrition',showlegend=False),row=1,col=2)
fig.add_trace(go.Pie(values=att2[(att2['Gender']=='Male')]['Count'],labels=att2[(att2['Gender']=='Male')]['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Male Attrition',showlegend=True),row=1,col=3)
fig.update_layout(title_x=0,template='simple_white',showlegend=True,legend_title_text="<b style=\"font-size:90%;\">Attrition",title_text='<b style="color:black; font-size:120%;"></b>',font_family="Times New Roman",title_font_family="Times New Roman")
fig.update_traces(marker=dict(line=dict(color='#000000', width=1)))
fig.show()

In this company, the employee turnover rate is 16%.

Male employees have a higher turnover rate than female employees.

The attrition rate was 17% for male employees and 14.8% for female employees.
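These percentages can also be computed directly instead of being read off the pie charts. A minimal sketch, assuming the Attrition column holds 'Yes'/'No' strings as in this dataset; the helper name `attrition_rate_by` and the toy rows are illustrative and not part of the original code.

```python
import pandas as pd

def attrition_rate_by(df, column):
    """Percentage of employees in each group whose Attrition is 'Yes'."""
    left = df["Attrition"].eq("Yes")
    return left.groupby(df[column]).mean().mul(100).round(1)

# Hypothetical toy rows; the toy rates are not the real dataset's rates
toy = pd.DataFrame({
    "Gender": ["Male"] * 5 + ["Female"] * 4,
    "Attrition": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No", "No"],
})

rates = attrition_rate_by(toy, "Gender")
print(rates)
```

The same helper works for any categorical column, for example `attrition_rate_by(df, "Department")` or `attrition_rate_by(df, "MaritalStatus")`.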

fig=px.box(df,x='Gender',y='MonthlyIncome',color='Attrition',template='simple_white',color_discrete_sequence=['LightCoral','DeepSkyBlue'])
fig=fig.update_xaxes(visible=True)
fig=fig.update_yaxes(visible=True)
fig=fig.update_layout(title_x=0.5,template='simple_white',showlegend=True,title_text='<b style="color:black; font-size:105%;">Employee Attrition based on Monthly Income</b>',font_family="Times New Roman",title_font_family="Times New Roman")
fig.show()

Employees with lower wages have higher turnover rates.

fig=px.box(df,x='Gender',y='TotalWorkingYears',color='Attrition',template='simple_white',color_discrete_sequence=['LightCoral','DeepSkyBlue'])
fig=fig.update_xaxes(visible=True)
fig=fig.update_yaxes(visible=True)
fig=fig.update_layout(title_x=0.5,template='simple_white',showlegend=True,title_text='<b style="color:black; font-size:105%;">Employee Attrition based on Total working Years</b>',font_family="Times New Roman",title_font_family="Times New Roman")
fig.show()

The boxplot above suggests that women tend to stay at companies longer. Both male and female employees with 19 total working years have resigned.

fig, axes = plt.subplots(1, 2, sharex=True, figsize=(15,5))
fig.suptitle('Attrition Salary Distribution by Gender')
sns.kdeplot(ax=axes[0],x=df[(df['Gender']=='Male')&(df['Attrition']=='Yes')]['MonthlyIncome'], color='r', fill=True, label='Yes')
sns.kdeplot(ax=axes[0],x=df[(df['Gender']=='Male')&(df['Attrition']=='No')]['MonthlyIncome'], color='#00BFFF', fill=True, label='No')
axes[0].set_title('Male')
axes[0].legend(title='Attrition')
sns.kdeplot(ax=axes[1],x=df[(df['Gender']=='Female')&(df['Attrition']=='Yes')]['MonthlyIncome'], color='r', fill=True, label='Yes')
sns.kdeplot(ax=axes[1],x=df[(df['Gender']=='Female')&(df['Attrition']=='No')]['MonthlyIncome'], color='#00BFFF', fill=True, label='No')
axes[1].set_title('Female')
axes[1].legend(title='Attrition')
plt.show()

  • Number of male and female employees: Knowing the number of male and female employees helps in understanding the gender composition of the workforce. The company has 882 men and 588 women.
  • Turnover rates by gender: Attrition is higher among male employees, at 17%, while female employees have a slightly lower attrition rate of 14.8%. This suggests that gender may play a role in employee turnover.
  • Median salary by gender: The median salary of male and female employees can reveal salary-related factors behind turnover. Female employees who left had a median monthly income of 2,886, and male employees who left had a median monthly income of 3,400. This suggests that employees with lower wages are more likely to leave the organization.
  • Relationship between total working years and gender: The boxplot suggests a relationship between total working years and gender. Women tend to stay in companies longer than men. Notably, both male and female employees with 19 total working years have left the company, which may indicate a tipping point for retention.
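The median figures quoted in the list above come from grouping by gender and attrition status. A short sketch of that computation follows; the helper name `median_income_table` and the toy incomes are hypothetical and only mirror the column names used in this dataset.

```python
import pandas as pd

def median_income_table(df):
    """Median MonthlyIncome for each Gender (rows) and Attrition status (columns)."""
    medians = df.groupby(["Gender", "Attrition"])["MonthlyIncome"].median()
    return medians.unstack("Attrition")

# Hypothetical toy rows, not values from the real dataset
toy = pd.DataFrame({
    "Gender": ["Female"] * 5 + ["Male"] * 5,
    "Attrition": ["Yes", "Yes", "No", "No", "No", "Yes", "Yes", "Yes", "No", "No"],
    "MonthlyIncome": [2000, 3000, 4000, 6000, 8000, 3000, 3400, 4000, 5000, 7000],
})

median_income = median_income_table(toy)
print(median_income)
```

Reading across a row compares leavers with stayers of the same gender, which is the comparison the salary bullet relies on.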

/ 04 / Other factors

1. Is marital status a factor?

att1=df.groupby('Attrition',as_index=False)['Age'].count()
att1['Count']=att1['Age']
att1.drop('Age',axis=1,inplace=True)
att2=df.groupby(['MaritalStatus','Attrition'],as_index=False)['Age'].count()
att2['Count']=att2['Age']
att2.drop('Age',axis=1,inplace=True)
fig = make_subplots(rows=1, cols=4, specs=[[{"type": "pie"}, {"type": "pie"}, {"type": "pie"}, {"type": "pie"}]],subplot_titles=('<b>Employee Attrition', '<b>Married Attrition','<b>Single Attrition','<b>Divorced Attrition'))

fig.add_trace(go.Pie(values=att1['Count'],labels=att1['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Employee Attrition',showlegend=False),row=1,col=1)
fig.add_trace(go.Pie(values=att2[(att2['MaritalStatus']=='Married')]['Count'],labels=att2[(att2['MaritalStatus']=='Married')]['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Married Attrition',showlegend=False),row=1,col=2)
fig.add_trace(go.Pie(values=att2[(att2['MaritalStatus']=='Single')]['Count'],labels=att2[(att2['MaritalStatus']=='Single')]['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Single Attrition',showlegend=True),row=1,col=3)
fig.add_trace(go.Pie(values=att2[(att2['MaritalStatus']=='Divorced')]['Count'],labels=att2[(att2['MaritalStatus']=='Divorced')]['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Divorced Attrition',showlegend=True),row=1,col=4)
fig.update_layout(title_x=0,template='simple_white',showlegend=True,legend_title_text="<b style=\"font-size:90%;\">Attrition",title_text='<b style="color:black; font-size:120%;"></b>',font_family="Times New Roman",title_font_family="Times New Roman")
fig.update_traces(marker=dict(line=dict(color='#000000', width=1)))
fig.show()

Single employees have a higher attrition rate than married and divorced employees. Divorced employees have the lowest attrition rate.
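Whether such a difference between groups is more than chance can be checked with a chi-square test of independence. The sketch below computes the statistic by hand from a contingency table so each step is visible; the helper name `chi_square_statistic` and the toy counts are assumptions made for illustration. The `chi2_contingency` function from `scipy.stats`, which the import cell already loads, returns the same kind of statistic together with a p-value.

```python
import pandas as pd

def chi_square_statistic(df, row, col):
    """Pearson chi-square statistic and degrees of freedom for two categorical columns."""
    observed = pd.crosstab(df[row], df[col])
    row_totals = observed.sum(axis=1).to_numpy()
    col_totals = observed.sum(axis=0).to_numpy()
    total = observed.to_numpy().sum()
    # Expected counts if the two columns were independent
    expected = row_totals[:, None] * col_totals[None, :] / total
    stat = ((observed.to_numpy() - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return float(stat), dof

# Hypothetical toy counts: 10 employees per marital status
toy = pd.DataFrame({
    "MaritalStatus": ["Single"] * 10 + ["Married"] * 10 + ["Divorced"] * 10,
    "Attrition": ["Yes"] * 6 + ["No"] * 4
               + ["Yes"] * 2 + ["No"] * 8
               + ["Yes"] * 1 + ["No"] * 9,
})

stat, dof = chi_square_statistic(toy, "MaritalStatus", "Attrition")
print(round(stat, 3), dof)
```

With 2 degrees of freedom, a statistic above roughly 5.99 corresponds to significance at the 5% level.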

2. Is income the main factor in employee turnover?

# Round income to the nearest thousand, then count employees per income band and attrition status
rate_att=df.assign(MonthlyIncome=df['MonthlyIncome'].round(-3))
rate_att=rate_att.groupby(['MonthlyIncome','Attrition']).size().reset_index(name='Counts')
fig=px.line(rate_att,x='MonthlyIncome',y='Counts',color='Attrition',title='Monthly Income basis counts of People in an Organization')
fig.show()

As shown above, attrition is clearly highest at very low income levels (less than 5,000 per month).

It declines as income rises, with a small peak around 10,000, a level associated with a middle-class standard of living.

Employees in this band tend to pursue a better standard of living and therefore move to different jobs.

When monthly income is comfortably high, the likelihood of an employee leaving the organization is low, as shown by the flat line.

plt.figure(figsize=(8,5))
sns.kdeplot(x=df['MonthlyIncome'],color='MediumVioletRed',fill=True,label='Monthly Income')
plt.axvline(x=df['MonthlyIncome'].mean(),color='k',linestyle ="--",label='Average: 6502.93')
plt.xlabel('Monthly Income')
plt.legend()
plt.title('Distribution of Monthly Income')
plt.show()

# Use df (the DataFrame loaded earlier); the name `data` was never defined
plot_df = df.groupby(['Department', 'Attrition', 'Gender'])['MonthlyIncome'].median()
plot_df = plot_df.mul(12).rename('Salary').reset_index().sort_values('Salary', ascending=False).sort_values('Gender')
fig = px.bar(plot_df, x='Department', y='Salary', color='Gender', text='Salary',
             barmode='group', opacity=0.75,
             color_discrete_map={'Female': '#ACBCE3','Male': '#ACBCA3'},
             facet_col='Attrition', category_orders={'Attrition': ['Yes', 'No']})
fig.update_traces(texttemplate='$%{text:,.0f}', textposition='outside',
                  marker_line=dict(width=1, color='#28221F'))
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='#28221F')
fig.update_layout(title_text='Median Salaries by Department', font_color='#28221F',
                  yaxis=dict(title='Salary',tickprefix='$',range=(0,79900)),width=950,height=500,
                  paper_bgcolor='#F4F2F0', plot_bgcolor='#F4F2F1')
fig.show()

plot_df = df.copy()
plot_df['JobLevel'] = pd.Categorical(
    plot_df['JobLevel']).rename_categories(
    ['Entry level', 'Mid level', 'Senior', 'Lead', 'Executive'])
col=['#73AF8E', '#4F909B', '#707BAD', '#A89DB7','#C99193']
fig = px.scatter(plot_df, x='TotalWorkingYears', y='MonthlyIncome',
                 color='JobLevel', size='MonthlyIncome',
                 color_discrete_sequence=col,
                 category_orders={'JobLevel': ['Entry level', 'Mid level', 'Senior', 'Lead', 'Executive']})
fig =fig.update_layout(legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1),
                       title='Correlation between Monthly income and total number of years worked and job level <br>',
                       xaxis_title='Total Working Years', yaxis=dict(title='Income',tickprefix='$'), 
                       legend_title='', font_color='#28221D',
                       margin=dict(l=40, r=30, b=80, t=120),paper_bgcolor='#F4F2F0', plot_bgcolor='#F4F2F0')
fig.show()

According to the scatter plot above, monthly income is positively correlated with total years of service, and there is a strong correlation between an employee's income and their job level.
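The relationships read off the scatter plot can be quantified with a correlation matrix. The sketch below uses Spearman rank correlation, chosen here as an assumption because JobLevel is an ordinal scale. The toy values are hypothetical and only reuse the dataset's column names.

```python
import pandas as pd

# Hypothetical toy rows; the real analysis would use df[cols]
toy = pd.DataFrame({
    "TotalWorkingYears": [1, 3, 5, 8, 10, 15, 20, 25],
    "MonthlyIncome": [2000, 2500, 4000, 5000, 6500, 9000, 15000, 19000],
    "JobLevel": [1, 1, 2, 2, 3, 3, 4, 5],
})

cols = ["TotalWorkingYears", "MonthlyIncome", "JobLevel"]
# Rank-based correlation: 1.0 means a perfectly monotonic increasing relationship
corr = toy[cols].corr(method="spearman")
print(corr.round(2))
```

On the real data, values close to 1 between MonthlyIncome and JobLevel would support the strong relationship described above.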

3. Does job department affect attrition?

dept_att=df.groupby(['Department','Attrition']).apply(lambda x:x['DailyRate'].count()).reset_index(name='Counts')
fig=px.bar(dept_att,x='Department',y='Counts',color='Attrition',title='Department wise Counts of People in an Organization')
fig.show()

k=df.groupby(['Department','Attrition'],as_index=False)['Age'].count()
k.rename(columns={'Age':'Count'},inplace=True)
fig = make_subplots(rows=1, cols=3, specs=[[{"type": "pie"}, {"type": "pie"}, {"type": "pie"}]],subplot_titles=('Human Resources', 'Research & Development','Sales'))

fig =fig.add_trace(go.Pie(values=k[k['Department']=='Human Resources']['Count'],labels=k[k['Department']=='Human Resources']['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Human Resources',showlegend=False),row=1,col=1)
fig =fig.add_trace(go.Pie(values=k[k['Department']=='Research & Development']['Count'],labels=k[k['Department']=='Research & Development']['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Research & Development',showlegend=False),row=1,col=2)
fig =fig.add_trace(go.Pie(values=k[k['Department']=='Sales']['Count'],labels=k[k['Department']=='Sales']['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Sales',showlegend=True),row=1,col=3)

fig =fig.update_layout(title_x=0.5,template='simple_white',showlegend=True,legend_title_text="Attrition",title_text='<b style="color:black; font-size:100%;">Department wise Employee Attrition',font_family="Times New Roman",title_font_family="Times New Roman")
fig =fig.update_traces(marker=dict(line=dict(color='#000000', width=1)))
fig.show()

The data includes only 3 departments. Sales has the highest attrition rate (20.63%, 92 of 446 employees), followed by human resources (19.05%).

Research and development has the lowest attrition rate (13.84%), which points to the stability and contentment of this department.

bus=df.groupby(['EducationField','Attrition'],as_index=False)['Age'].count()
bus.rename(columns={'Age':'Count'},inplace=True)
fig = make_subplots(rows=2, cols=3, specs=[[{"type": "pie"}, {"type": "pie"}, {"type": "pie"}],[{"type": "pie"}, {"type": "pie"}, {"type": "pie"}]],subplot_titles=('Life Sciences', 'Medical','Marketing','Technical Degree','Human Resources','Other'))

fig.add_trace(go.Pie(values=bus[bus['EducationField']=='Life Sciences']['Count'],labels=bus[bus['EducationField']=='Life Sciences']['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Life Sciences',showlegend=False),row=1,col=1)
fig.add_trace(go.Pie(values=bus[bus['EducationField']=='Medical']['Count'],labels=bus[bus['EducationField']=='Medical']['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Medical',showlegend=False),row=1,col=2)
fig.add_trace(go.Pie(values=bus[bus['EducationField']=='Marketing']['Count'],labels=bus[bus['EducationField']=='Marketing']['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Marketing',showlegend=True),row=1,col=3)
fig.add_trace(go.Pie(values=bus[bus['EducationField']=='Technical Degree']['Count'],labels=bus[bus['EducationField']=='Technical Degree']['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Technical Degree',showlegend=False),row=2,col=1)
fig.add_trace(go.Pie(values=bus[bus['EducationField']=='Human Resources']['Count'],labels=bus[bus['EducationField']=='Human Resources']['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Human Resources',showlegend=False),row=2,col=2)
fig.add_trace(go.Pie(values=bus[bus['EducationField']=='Other']['Count'],labels=bus[bus['EducationField']=='Other']['Attrition'],hole=0.7,marker_colors=['DeepSkyBlue','LightCoral'],name='Other',showlegend=False),row=2,col=3)

fig.update_layout(title_x=0.5,template='simple_white',showlegend=True,legend_title_text="Attrition",title_text='<b style="color:black; font-size:100%;">Employee Attrition based on Education Field',font_family="Times New Roman",title_font_family="Times New Roman")
fig.update_traces(marker=dict(line=dict(color='#000000', width=1)))
fig.show()

Attrition rates are highest among employees whose education field is human resources, marketing, or technical degree.

Employees educated in the medical and life sciences fields have lower attrition rates.

k=df.groupby(['JobRole','Attrition'],as_index=False)['Age'].count()
a=k[k['Attrition']=='Yes'].copy()  # copy to avoid modifying a view of k
b=k[k['Attrition']=='No']
a['Age']=-a['Age']  # plot leavers below the axis as negative counts
k=pd.concat([a,b],ignore_index=True)
k['Count']=k['Age']
k.rename(columns={'JobRole':'Job Role'},inplace=True)
fig=px.bar(k,x='Job Role',y='Count',color='Attrition',template='simple_white',text='Count',color_discrete_sequence=['LightCoral','DeepSkyBlue'])
fig=fig.update_yaxes(range=[-200,300])
fig=fig.update_traces(marker=dict(line=dict(color='#000000', width=1)),textposition = "outside")
fig=fig.update_xaxes(visible=True)
fig=fig.update_yaxes(visible=True)
fig=fig.update_layout(title_x=0.5,template='simple_white',showlegend=True,title_text='<b style="color:black; font-size:105%;">Employee Attrition based on Job Roles</b>',font_family="Times New Roman",title_font_family="Times New Roman")
fig.show()

Most employees work as sales executives, research scientists, or laboratory technicians.

The roles with the most departures are sales executives, sales representatives, laboratory technicians, and research scientists.

The roles with the fewest departures are research directors, managers, and healthcare representatives.
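The bar chart shows counts, so a large role can have many departures without having the highest rate. Putting counts and rates side by side makes the distinction explicit. This is a sketch under assumptions: the helper name `leavers_and_rate` and the toy headcounts are hypothetical.

```python
import pandas as pd

def leavers_and_rate(df, column):
    """Leaver count, headcount, and attrition rate per group, sorted by rate."""
    left = df["Attrition"].eq("Yes")
    out = left.groupby(df[column]).agg(leavers="sum", headcount="count", rate="mean")
    return out.sort_values("rate", ascending=False)

# Hypothetical toy rows: a large role and two small ones
toy = pd.DataFrame({
    "JobRole": ["Sales Executive"] * 10 + ["Sales Representative"] * 4 + ["Manager"] * 5,
    "Attrition": ["Yes"] * 3 + ["No"] * 7
               + ["Yes"] * 2 + ["No"] * 2
               + ["No"] * 5,
})

table = leavers_and_rate(toy, "JobRole")
print(table)
```

In the toy data the sales executives have the most departures, while the sales representatives have the highest rate.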

4. How does job satisfaction affect employee turnover?

sats_att=df.groupby(['JobSatisfaction','Attrition']).apply(lambda x:x['DailyRate'].count()).reset_index(name='Counts')
fig = px.area(sats_att,x='JobSatisfaction',y='Counts',color='Attrition',title='Job Satisfaction level Counts of People in an Organization')
fig.show()

The chart above shows that higher job satisfaction is associated with lower employee turnover.

Additionally, attrition decreases between satisfaction levels 1 and 2 but increases from 2 to 3, indicating that individuals may leave for better opportunities.

5. Do the stock options the company offers affect employee turnover?

stock_att=df.groupby(['StockOptionLevel','Attrition']).apply(lambda x:x['DailyRate'].count()).reset_index(name='Counts')
fig = px.bar(stock_att,x='StockOptionLevel',y='Counts',color='Attrition',title='Stock facilities level wise People in an Organization')
fig.show()

Fewer stock options significantly increase the likelihood that an employee will leave the organization.

The availability of stock is a huge financial incentive for employees to stay with the company for a few years.

But individuals with few or no stock options will be more likely to leave an organization because they don't have the same financial incentives that tie them to the company.

6. How does work experience affect employee turnover?

ncwrd_att=df.groupby(['NumCompaniesWorked','Attrition']).apply(lambda x:x['DailyRate'].count()).reset_index(name='Counts')
fig = px.area(ncwrd_att,x='NumCompaniesWorked',y='Counts',color='Attrition',title='Work Experience Distribution: Analyzing the Level of Work Experience in an Organization')
fig.show()

The chart above clearly shows that employees who start their career with a company or join early in their career are more likely to leave for another organization.

Conversely, individuals who gain extensive work experience at multiple companies tend to exhibit greater loyalty and are more likely to stay with the companies they join.

7. Will salary increase percentage affect turnover?

hike_att=df.groupby(['PercentSalaryHike','Attrition']).apply(lambda x:x['DailyRate'].count()).reset_index(name='Counts')
px.line(hike_att,x='PercentSalaryHike',y='Counts',color='Attrition',title='Distribution of Hike Percentage')

Higher salary increases motivate people to work better and stay in the organization.

As a result, employees who receive low pay raises leave at much higher rates than those who receive good pay raises.

fig=px.box(df,x='JobRole',y='PercentSalaryHike',color='Attrition',color_discrete_sequence=['LightCoral','DeepSkyBlue'],template='simple_white')
fig.update_xaxes(visible=True)
fig.update_yaxes(visible=True)
fig.update_layout(title_x=0.5,template='simple_white',showlegend=True,title_text='<b style="color:black; font-size:105%;">Job Role wise Employee Attrition based on % Salary Hike </b>',font_family="Times New Roman",title_font_family="Times New Roman")
fig.show()

8. Are managers the reason people quit?

man_att=df.groupby(['YearsWithCurrManager','Attrition']).apply(lambda x:x['DailyRate'].count()).reset_index(name='Counts')
px.line(man_att,x='YearsWithCurrManager',y='Counts',color='Attrition',title='Count of people spending years with a Manager in an Organization')

When we analyzed how long employees had worked under their current manager, we noticed 3 major spikes in attrition.

The first comes early: people tend to leave after spending relatively little time with a manager, which may reflect the relationship they had with their previous manager.

The second comes at around two years, when employees tend to seek change if they feel they need to improve.

The third comes a little later (around 7 years), when people tend to find their career growth plateauing and look for a change.

But when the time spent with a manager is substantial, people become satisfied with their jobs, and the likelihood of quitting is very low.

/ 05 / Summary

The following recommendations come out of this analysis.

  • Address the gender disparity in employee turnover: Investigate and address factors that contribute to higher turnover among male employees.
  • Focus on salary: In-depth analysis of salary structure to ensure competitiveness.
  • Enhance work-life balance: Prioritize work-life balance with flexible schedules and employee support programs.
  • Strengthen manager-employee relationships: Invest in building strong relationships and provide management training.
  • Provide opportunities for growth and development: Provide training, mentoring and a clear career path.
  • Assess and improve job satisfaction: Regularly assess satisfaction and resolve issues.
  • Review and optimize compensation and benefits: Ensure competitive compensation and attractive incentives are provided.
  • Focus on retaining sales and HR employees: Implement retention strategies specific to these departments.

Origin blog.csdn.net/qq_34160248/article/details/134655714