Financial Risk Control in Practice: A Bank Credit Card Churn Prediction Model Based on an Artificial Neural Network (ANN)

Business background

According to data released by the central bank, year-on-year growth in the number of credit cards and combined credit-loan cards issued nationwide has fallen every year since its 2017 peak of 26.35%, dropping to 4.26% by 2020. With card issuance slowing this sharply, retaining existing customers has become increasingly important.

Suppose a business manager at a consumer credit card bank is facing customer churn. The manager wants to analyze the data, find out what is driving the churn, and use the data to predict which customers are likely to leave. The manager also wants to identify the main characteristics of churning customers and recommend ways to reduce churn.

Descriptive statistics

In this project, we will build an artificial neural network model for predicting credit card churn.

The first task in this business problem is to identify the customers who are churning.

Predicting a non-churning customer as a churner does little harm to the business.

Predicting a churner as a non-churner, however, does hurt, because the customer leaves before anyone has tried to retain them.

So recall, TP / (TP + FN), needs to be high.
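To make the point about recall concrete, here is a minimal sketch comparing two hypothetical models on 1,000 customers, 160 of whom churn. All counts are invented for illustration and do not come from the dataset. Both models have the same accuracy, but only one of them catches most churners.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all customers that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Share of real churners that the model caught: TP / (TP + FN)."""
    return tp / (tp + fn)


# Hypothetical confusion-matrix counts (not taken from the article's data).
model_a = {"tp": 30, "fn": 130, "tn": 820, "fp": 20}    # misses most churners
model_b = {"tp": 140, "fn": 20, "tn": 710, "fp": 130}   # catches most churners

acc_a = accuracy(**model_a)
acc_b = accuracy(**model_b)
rec_a = recall(model_a["tp"], model_a["fn"])
rec_b = recall(model_b["tp"], model_b["fn"])

print("model A: accuracy", acc_a, "recall", rec_a)
print("model B: accuracy", acc_b, "recall", rec_b)
```

Under these assumed counts, accuracy alone cannot tell the two models apart, while recall can, which is why recall is the metric to watch here.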

The dataset consists of about 10,000 customers (entries), each described by age, salary, marital status, credit card limit, credit card type, and so on.

These 19 attributes (features) are the input to the neural network.

The figure below shows the variable correlation analysis

The following figure is a histogram visualization of variables

Here is a KDE plot of the univariate one-year contract totals.
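As a rough sketch of how such descriptive statistics can be computed, the example below builds a small synthetic table and derives the correlation matrix and the summary statistics of one variable. The column names and values are stand-ins chosen for illustration, not the article's data.

```python
import numpy as np
import pandas as pd

# Synthetic numeric columns standing in for the real data (hypothetical values).
rng = np.random.default_rng(1)
n = 500
trans_count = rng.integers(10, 140, size=n)
demo = pd.DataFrame({
    "Total_Trans_Ct": trans_count,
    # Built to correlate with the transaction count, purely for the demo.
    "Total_Trans_Amt": trans_count * 60 + rng.normal(0, 500, size=n),
    "Customer_Age": rng.integers(26, 74, size=n),
})

# Pairwise Pearson correlation between all numeric columns.
corr = demo.corr()
print(corr.round(2))

# Summary statistics of a single variable, the numbers behind its histogram.
print(demo["Total_Trans_Ct"].describe())
```

With seaborn installed, `sns.heatmap(corr, annot=True)` draws the correlation matrix, and `sns.histplot` or `sns.kdeplot` draws the distribution of a single column.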

Because the dataset mixes data types, mostly strings and integers, it has to be prepared before it can be fed to the network.

Only 16.07% of our customers abandon credit card services.

Therefore, we have an unbalanced dataset.

To deal with this imbalance, we assign weights to the two classes of the target variable so that they are balanced.
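One common way to obtain such weights is scikit-learn's "balanced" heuristic, sketched below. The label vector is hypothetical: it assumes 1,607 churners among 10,000 customers, only to mirror the 16.07% figure.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical label vector: 1,607 churners (1) among 10,000 customers.
y_demo = np.array([1] * 1607 + [0] * 8393)

# 'balanced' weight for class c = n_samples / (n_classes * count(c)).
classes = np.unique(y_demo)
cw_demo = compute_class_weight(class_weight="balanced", classes=classes, y=y_demo)

# The same formula written out by hand, as a cross-check.
manual = len(y_demo) / (len(classes) * np.bincount(y_demo))

print(dict(zip(classes.tolist(), cw_demo.round(3).tolist())))
```

Under this assumption the churn class receives a weight of roughly 3.1 and the majority class roughly 0.6, so an error on a churner costs about five times as much during training.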

We prepare the dataset for the ANN by replacing the string values in the feature columns with integers.

We also remove the "CLIENTNUM" column, since it is only an identifier and does not affect the target variable.

#%% Importing Libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, RobustScaler, MinMaxScaler
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, matthews_corrcoef, precision_score, recall_score
from sklearn.utils import class_weight
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

#%% Loading the Dataset
df = pd.read_csv('C:/Users/Sahil Bagwe/Desktop/Python/dataset/Bank/BankChurners.csv')
df = df.drop(df.columns[21:23], axis=1)   # drop the two columns at positions 21 and 22
df = df.drop('CLIENTNUM', axis=1)         # customer ID, not a predictive feature

#%% Preparing the Dataset
# Map every string category to an integer code (same codes as before).
# Assigning the result back avoids chained inplace replace, which newer pandas versions warn about.
# Note: map() turns any value missing from a mapping into NaN.
encodings = {
    'Gender': {'M': 1, 'F': 0},
    'Education_Level': {'Unknown': 0, 'Uneducated': 1, 'High School': 2, 'College': 3,
                        'Graduate': 4, 'Post-Graduate': 5, 'Doctorate': 6},
    'Marital_Status': {'Unknown': 0, 'Single': 1, 'Married': 2, 'Divorced': 3},
    'Card_Category': {'Blue': 0, 'Gold': 1, 'Silver': 2, 'Platinum': 3},
    'Income_Category': {'Unknown': 0, 'Less than $40K': 1, '$40K - $60K': 2,
                        '$60K - $80K': 3, '$80K - $120K': 4, '$120K +': 5},
    'Attrition_Flag': {'Existing Customer': 0, 'Attrited Customer': 1},
}
for column, mapping in encodings.items():
    df[column] = df[column].map(mapping)

Preprocess the dataset

We start this phase by splitting the dataset into a feature matrix (x) and a target variable (y). Since the values in the data vary widely, they need to be scaled to a comparable range. RobustScaler subtracts the median and scales the data by a quantile range, by default the interquartile range (IQR). The IQR is the range between the first quartile (25th percentile) and the third quartile (75th percentile).
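The splitting and scaling step might look like the sketch below. It runs on a small synthetic table so that it is self-contained. The column values, the 80/20 split, the stratification and the random seed are assumptions made for illustration, not the article's actual settings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in for the prepared DataFrame (hypothetical values).
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    "Customer_Age": rng.integers(26, 74, size=200),
    "Credit_Limit": rng.uniform(1500, 35000, size=200),
    "Attrition_Flag": rng.choice([0, 1], size=200, p=[0.84, 0.16]),
})

# Feature matrix (x) and target variable (y).
x = df_demo.drop("Attrition_Flag", axis=1)
y = df_demo["Attrition_Flag"]

# test_size, stratify and random_state are assumed values.
X_train_raw, X_test_raw, Y_train, Y_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits,
# so that no information from the test set leaks into training.
scaler = RobustScaler()
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

print(X_train.shape, X_test.shape)
```

Because the median and IQR are barely affected by a few extreme values, this scaling is less sensitive to outliers than scaling by the mean and standard deviation.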

Build an artificial neural network

Since the dataset is unbalanced, we need to assign category weights to it.

This is done by calculating the ratio of churned customers to the total number of customers.

Next, we build a 3-layer neural network.

The input layer contains as many neurons as there are columns in the feature matrix.

The output layer consists of a single neuron that predicts the output: 1 for churned customers and 0 for existing customers.

The number of neurons in the hidden layer is usually a value between the number of neurons in the input and output layers.

A common rule of thumb is to set the number of hidden neurons to the average of the number of neurons in the input and output layers.
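The sizing rule can be worked through in a few lines. This is only a sketch, assuming the 19 features and single output described above. The `dense_params` helper is a hypothetical name introduced here to count the weights and biases of a fully connected layer.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


n_features = 19   # columns in the feature matrix, as stated in the text
n_outputs = 1     # one neuron: churned (1) or existing (0)

# Rule of thumb: hidden size = average of input and output sizes.
n_hidden = round((n_features + n_outputs) / 2)

# Units of the three Dense layers and the number of inputs each one receives.
layer_sizes = [n_features, n_hidden, n_outputs]
fan_ins = [n_features] + layer_sizes[:-1]

total_params = sum(dense_params(i, o) for i, o in zip(fan_ins, layer_sizes))
print("hidden units:", n_hidden)
print("trainable parameters:", total_params)
```

A network this small trains in seconds on a dataset of this size, which makes it cheap to try other hidden-layer sizes as well.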

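#%% Assigning weights to classes
# 'balanced' weights are inversely proportional to class frequency;
# recent scikit-learn versions require keyword arguments here.
cw = class_weight.compute_class_weight(class_weight='balanced', classes=np.unique(Y_train), y=Y_train)
a = y.value_counts()
ratio = a[1] / (a[1] + a[0])   # share of churned customers
weights = [ratio, 1 - ratio]   # kept for reference; training uses cw via class_weight in fit()

#%% Building the Model
model = Sequential()
model.add(Dense(19, activation="sigmoid"))
model.add(Dense(10, activation="sigmoid"))
# Sigmoid output so the network emits a probability, as binary_crossentropy expects by default.
model.add(Dense(1, activation="sigmoid"))
# loss_weights weights the losses of different model outputs, not classes,
# so it is dropped here; the class weights are passed to fit() instead.
model.compile(optimizer='rmsprop', loss="binary_crossentropy", metrics=["binary_accuracy"])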

Predict Churn

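#%% Predicting
history = model.fit(x=X_train, y=Y_train, epochs=100, class_weight={0: cw[0], 1: cw[1]})
# predict_classes() is no longer available in recent TensorFlow releases;
# threshold the predicted values at 0.5 instead.
predictions = (model.predict(X_test) > 0.5).astype("int32")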

In validation, the model reaches an accuracy of 0.89 and a recall of 0.90, which is good performance for this task.
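Metrics like these can be computed with the scikit-learn functions imported at the top of the script. The sketch below shows one way to do it. The `evaluate` helper is a hypothetical name, and the labels and predictions are invented for illustration, so the printed numbers are not the article's results.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             matthews_corrcoef, precision_score, recall_score)


def evaluate(y_true, y_pred):
    """Return the main classification metrics as a dictionary."""
    y_pred = np.asarray(y_pred).ravel()   # Keras predictions have shape (n, 1)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "mcc": matthews_corrcoef(y_true, y_pred),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }


# Hypothetical labels and predictions, for illustration only.
Y_test_demo = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
predictions_demo = np.array([[1], [1], [1], [0], [1], [0], [0], [0], [0], [0]])

report = evaluate(Y_test_demo, predictions_demo)
for name, value in report.items():
    print(name, value, sep=": ")
```

In the real script, `evaluate(Y_test, predictions)` would be called on the held-out test split.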

This concludes the introduction to the bank credit card churn prediction model.

Copyright statement: This article comes from the official account (python risk control model) and may not be plagiarized without permission. It follows the CC 4.0 BY-SA license; when reprinting, please attach the original source link and this statement.

Origin blog.csdn.net/fulk6667g78o8/article/details/131233398