Profile of churned telecom customers

The three major operators, China Telecom, China Unicom and China Mobile, all want to expand their customer base. Research shows that acquiring a new customer costs far more than retaining an existing one. To gain an edge in fierce competition, operators therefore need to predict in advance which users are likely to churn and take retention measures. This article explores the profile of churned telecom customers; subsequent articles will predict telecom customer churn.

  

1. Data reading and analysis

  

1 Introduction to data sets

  
First, the data set. It contains information on 7043 users, one row per user. Each row has 21 attributes covering basic user information, activated services, contract information, and the target variable. The attributes are listed in the following table:

[Image: table of the 21 attributes]

  

2 Read data

  
Then read the data into Python for preprocessing. The code for reading the data is as follows:

import os
import numpy as np
import pandas as pd 

os.chdir(r'F:\公众号\电信客户流失')
data = pd.read_csv('Customer_Churn.csv')
data.head(2)

Explanation of the code:

import: imports a library.

os.chdir: sets the working directory from which the data is read.

pd.read_csv: reads data in CSV format.

data.head(2): displays the first 2 rows of data.

Output:

[Image: first two rows of the data set]

  
  

3 Probability of customer churn

  
Then look at the probability of customer churn. The code is as follows:

data.Churn.value_counts()/len(data.Churn)

Output:

No     0.73463
Yes    0.26537
Name: Churn, dtype: float64

The overall churn rate is 0.265, i.e. about 26.5% of customers have churned.
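
The same proportions can also be obtained directly with `value_counts(normalize=True)`. A minimal sketch on a made-up `Churn` column (the toy frame below is illustrative only and is not the article's data set):

```python
import pandas as pd

# Toy stand-in for the article's `data` frame (values are made up).
toy = pd.DataFrame({"Churn": ["No", "No", "Yes", "No"]})

# normalize=True returns shares instead of raw counts.
churn_share = toy["Churn"].value_counts(normalize=True)
print(churn_share)
```

This avoids dividing by `len(...)` by hand and gives the same kind of result as the expression above.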

  
  

4 Customer churn rate corresponding to each category under different indicators

  

Finally, look at the churn rate for each category of every indicator. The code is as follows:

data['y'] = (data['Churn'] == 'Yes').astype(int)
# y is 1 for churned customers and 0 otherwise
for i in data.columns[1:-2]:
    # customer count, churned count and churn rate for each value of the indicator
    pivot_result = pd.pivot_table(data, values='y', index=[i], aggfunc=['count', 'sum', 'mean'], margins=True)
    # rename the pivot_table columns
    pivot_result.columns = ['customers', 'churned_customers', 'churn_rate']
    # keep 3 decimal places for the churn rate
    pivot_result['churn_rate'] = pivot_result['churn_rate'].round(3)
    # show the result (display is available in Jupyter/IPython)
    display(pivot_result)
    print('================================')

Because the loop covers every variable, only the result for the gender indicator is shown here; the other indicators are discussed in the next part.

For gender, the churn rates of the two categories are 0.269 and 0.262, both close to the overall churn rate of 0.265. Gender therefore has little effect on customer churn.
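
For a single indicator, the same three figures can be sketched with `groupby` and named aggregation. The helper name `churn_by_category` and the toy data are assumptions made for illustration; they are not part of the original code:

```python
import pandas as pd

# Toy stand-in for the article's data (values are made up).
toy = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Male", "Male", "Female"],
    "Churn":  ["Yes", "No", "No", "No", "Yes", "No"],
})
toy["y"] = (toy["Churn"] == "Yes").astype(int)  # 1 = churned, 0 = retained


def churn_by_category(df, column, target="y"):
    """Customer count, churned count and churn rate per category (hypothetical helper)."""
    out = df.groupby(column)[target].agg(
        customers="count", churned="sum", churn_rate="mean"
    )
    return out.round({"churn_rate": 3})


summary = churn_by_category(toy, "gender")
print(summary)
```

Unlike `pivot_table` with `margins=True`, this sketch does not add an overall total row.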

  
  

2. Profile of churned customers - details

  

1 Senior citizen or not

Churn rates by senior-citizen status: there are 1142 senior customers and 5901 non-senior customers, so seniors make up a much smaller share of the customer base. The churn rate for seniors is 0.417, well above the 0.236 for non-seniors. To improve retention, the operator could therefore consider discounts or other incentives aimed at senior customers.

  
  

2 Partner or not

Churn rates by partner status: there are 3402 customers with a partner and 3641 without, a fairly even split. However, the churn rate for customers without a partner is 0.33, much higher than the 0.197 for customers with a partner. Retention measures could be tailored to the customs and habits of different regions.

  
  

3 Family members or not

Churn rates by whether the customer has family members: 2110 customers have family members and 4933 do not, so customers with family members are the smaller group. Their churn rate is 0.155, well below the 0.313 for customers without family members.

  
  

4 Phone service activated or not

Churn rates by phone service: 6361 customers have activated phone service and 682 have not, so customers with phone service far outnumber those without. The churn rates with and without phone service are 0.267 and 0.249 respectively, so customers without phone service churn slightly less often than those with it. The phone service indicator therefore has little effect on customer churn.

  
  

5 Multi-line service activated or not

Churn rates by multi-line service: 3390 customers have not activated multi-line service, 682 have no phone service, and 2971 have activated it. Customers without multi-line service are the largest group and customers without phone service the smallest. The churn rates for the three groups are 0.25, 0.249 and 0.286 respectively. The first two are very close and slightly lower than the rate for customers with multi-line service. The multi-line indicator therefore has little effect on customer churn.

  
  

6 Internet service activated or not

Churn rates by Internet service: 2421 customers use a digital subscriber line (DSL), 3096 use optical fiber, and 1526 have no Internet service. Fiber customers are the largest group and customers without Internet service the smallest. Fiber customers have the highest churn rate at 0.419, followed by DSL customers at 0.19; customers without Internet service have the lowest at 0.074.

  
  

7 Network security service activated or not

Churn rates by network security service: 3498 customers have not activated network security, 1526 have no Internet service, and 2019 have activated it. Customers who have not activated network security have the highest churn rate at 0.418, followed by customers who have activated it at 0.146; customers without Internet service have the lowest at 0.074.

  
  

8 Online backup enabled or not

Churn rates by online backup: 3088 customers have not enabled online backup, 1526 have no Internet service, and 2429 have enabled it. Customers who have not enabled online backup have the highest churn rate at 0.399, followed by customers who have enabled it at 0.215; customers without Internet service have the lowest at 0.074.

  
  

9 Device protection enabled or not

Churn rates by device protection: 3095 customers have not enabled device protection, 1526 have no Internet service, and 2422 have enabled it. Customers who have not enabled device protection have the highest churn rate at 0.391, followed by customers who have enabled it at 0.225; customers without Internet service have the lowest at 0.074.

  
  

10 Technical support subscribed or not

Churn rates by technical support: 3473 customers have not subscribed to technical support, 1526 have no Internet service, and 2044 have subscribed. Customers who have not subscribed have the highest churn rate at 0.416, followed by customers who have subscribed at 0.152; customers without Internet service have the lowest at 0.074.

  
  

11 Internet TV subscribed or not

Churn rates by Internet TV: 2810 customers have not subscribed to Internet TV, 1526 have no Internet service, and 2707 have subscribed. Customers who have not subscribed have the highest churn rate at 0.335, followed by customers who have subscribed at 0.301; customers without Internet service have the lowest at 0.074.
  
  

12 Online movies subscribed or not

Churn rates by online movies: 2785 customers have not subscribed to online movies, 1526 have no Internet service, and 2732 have subscribed. Customers who have not subscribed have the highest churn rate at 0.337, followed by customers who have subscribed at 0.299; customers without Internet service have the lowest at 0.074.
  
  

13 Contract type

Churn rates by contract type: there are 3875 Month-to-month, 1473 One year and 1695 Two year customers. Month-to-month customers have the highest churn rate at 0.427, followed by One year customers at 0.113; Two year customers have the lowest at 0.028.
  
  

14 Electronic billing activated or not

Churn rates by electronic billing: 2872 customers have not activated electronic bills and 4171 have. Customers with electronic bills churn more often, at 0.336, compared with 0.163 for customers without them.

  
  

15 Payment method

Churn rates by payment method: there are 1544 Bank transfer, 1522 Credit card, 2365 Electronic check and 1612 Mailed check customers. Electronic check customers have the highest churn rate at 0.453, followed by Mailed check customers at 0.191. Bank transfer and Credit card customers have the lowest, at 0.167 and 0.152 respectively.

The remaining three indicators (length of product use, monthly charge and total charge) take many scattered values, so the pivot_table analysis above yields no clear conclusion for them. The IV (information value) method is used instead.
  
  

16 Length of product use (tenure)

First define a function that calculates IV. The code is as follows:

# Bin a variable and compute its WOE and IV
def bin_cut(data, x, y, n=10):  # x: variable to bin, y: target variable, n: number of bins
    total = y.count()         # total number of samples
    bad = y.sum()             # number of bad samples (churned customers)
    good = total - bad        # number of good samples
    if x.nunique() == 2:
        # binary variable: split into two bins
        d1 = pd.DataFrame({'x': x, 'y': y, 'bucket': pd.cut(x, 2)})
    elif x.nunique() <= 50:
        # few distinct values: chi-square (ChiMerge) binning.
        # ChiMerge_MaxInterval_Original is not defined in this article and must be provided separately;
        # x and y are assumed to be columns of data.
        cutOffPoints = ChiMerge_MaxInterval_Original(data, x.name, y.name)
        cutOffPoints.append(max(x))
        cutOffPoints.insert(0, min(x) - 0.1)
        d1 = pd.DataFrame({'x': x, 'y': y, 'bucket': pd.cut(x, cutOffPoints)})
    else:
        # equal-frequency binning with pd.qcut
        d1 = pd.DataFrame({'x': x, 'y': y, 'bucket': pd.qcut(x, n, duplicates='drop')})
    d2 = d1.groupby('bucket', as_index=True)     # group by bin
    d3 = pd.DataFrame({'min_bin': d2.x.min()})   # left boundary of each bin
    d3['max_bin'] = d2.x.max()  # right boundary of each bin
    d3['bad'] = d2.y.sum()      # number of bad samples in each bin
    d3['total'] = d2.y.count()  # number of samples in each bin
    d3['bad_rate'] = d3['bad']/d3['total']  # share of bad samples within each bin
    d3['badattr'] = d3['bad']/bad           # bin's share of all bad samples
    d3['goodattr'] = (d3['total'] - d3['bad'])/good    # bin's share of all good samples
    d3['woe'] = np.log(d3['badattr']/d3['goodattr'])   # WOE of each bin
    iv = ((d3['badattr']-d3['goodattr'])*d3['woe']).sum()      # IV of the variable
    d4 = (d3.sort_values(by='min_bin')).reset_index(drop=True) # sort bins by left boundary, ascending
    cut = []
    cut.append(float('-inf'))
    for i in d4.min_bin:
        cut.append(i)
    cut.append(float('inf'))
    woe = list(d4['woe'].round(3))
    return iv,cut,woe,d4
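
The last branch of `bin_cut` relies on equal-frequency binning. As a rough illustration of how `pd.qcut` behaves, including the effect of `duplicates='drop'`, here is a small sketch on made-up numbers (not the article's data):

```python
import pandas as pd

# Made-up data: the integers 1..100 split into 4 equal-frequency bins.
values = pd.Series(range(1, 101))
buckets = pd.qcut(values, 4, duplicates="drop")
counts = buckets.value_counts(sort=False)
print(counts)

# Heavily tied data: several quantile edges coincide, so
# duplicates="drop" merges them and fewer bins are returned.
tied = pd.qcut(pd.Series([0] * 8 + [1, 2]), 4, duplicates="drop")
print(tied.cat.categories)
```

Because tied edges are dropped, the number of bins actually produced can be smaller than the requested `n`.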

Then calculate the IV value of the product usage time, the code is as follows:

i = 'tenure'
iv,cut,woe,d4 = bin_cut(data,data[i],data['y'],n=10)
print('===========', i, '============')
print('iv=', iv)
d4

Input parameters:

data: the data set.

data[i]: the variable whose IV is to be calculated.

data['y']: the target variable y.

n: the number of bins.

Result:

The IV of the length of product use is 0.823. As a rule of thumb, a variable with an IV above 0.3 is considered strong, meaning it is strongly associated with customer churn. The bad_rate column shows that the shorter the time a customer has been with the operator, the higher the churn rate.
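
To make the WOE and IV arithmetic concrete, here is a minimal sketch that mirrors the `badattr`, `goodattr`, `woe` and `iv` lines of `bin_cut` on made-up bin counts. The helper name `information_value` and the counts are assumptions for illustration only:

```python
import numpy as np


def information_value(bad_counts, good_counts):
    """Return (WOE per bin, IV) from per-bin bad/good counts (hypothetical helper)."""
    bad = np.asarray(bad_counts, dtype=float)
    good = np.asarray(good_counts, dtype=float)
    bad_share = bad / bad.sum()      # corresponds to badattr in bin_cut
    good_share = good / good.sum()   # corresponds to goodattr in bin_cut
    woe = np.log(bad_share / good_share)
    iv = float(((bad_share - good_share) * woe).sum())
    return woe, iv


# Made-up counts for three bins: churned customers and retained customers.
woe, iv = information_value([60, 30, 10], [20, 30, 50])
print(woe.round(3), round(iv, 3))
```

A bin where churned and retained customers are equally represented contributes a WOE of 0 and adds nothing to the IV; bins that differ strongly in either direction increase it.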

  
  

17 Monthly charge

  
Calculate the IV value of the monthly fee, the code is as follows:

i = 'MonthlyCharges'
iv,cut,woe,d4 = bin_cut(data,data[i],data['y'],n=10)
print('===========', i, '============')
print('IV=', iv)
d4

Result:

The IV of the monthly charge is 0.364. Since an IV above 0.3 generally indicates a strong variable, the monthly charge is strongly associated with customer churn. The bad_rate column shows that, in general, the lower the monthly charge, the lower the churn rate; however, once the monthly charge exceeds 100 yuan, the churn rate falls again.
  
  

18 Total charge

  
Calculate the IV value of the total cost, the code is as follows:

i = 'TotalCharges'
data[i] = data[i].fillna(0)
data[i] = data[i].replace(' ', 0).astype(float)
iv,cut,woe,d4 = bin_cut(data,data[i],data['y'],n=10)
print('===========', i, '============')
print('IV=', iv)
d4

Result:

The IV of the total charge is 0.332. Since an IV above 0.3 generally indicates a strong variable, the total charge is strongly associated with customer churn. The bad_rate column shows that the lower the total charge, the higher the churn rate, which may be because these customers are new users.
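
The cleaning step for TotalCharges (blank strings replaced by 0, then cast to float) can also be sketched with `pd.to_numeric`. The values below are made up, and this is an alternative shown under that assumption rather than the article's own code:

```python
import pandas as pd

# Made-up values; " " mimics a blank entry in the raw column.
raw = pd.Series(["29.85", " ", "1889.5"])

# errors="coerce" turns anything that is not a number into NaN,
# which is then filled with 0 as in the article.
total = pd.to_numeric(raw, errors="coerce").fillna(0)
print(total.tolist())
```

Coercing also catches other non-numeric entries, not only a single space.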
  
  

3. Profile of churned customers - summary

  
The profile of churned customers is summarized as follows:

[Image: summary of the churned customer profile]

That completes the profile of churned telecom customers. A follow-up article will predict telecom customer churn, so stay tuned.


Origin blog.csdn.net/qq_32532663/article/details/132134997