【Detailed Explanation of Practical Cases of MATLAB Data Processing (15)】 Using a BP Neural Network for Personal Credit Evaluation

1. Problem description

Personal credit evaluation methods fall into two types: qualitative and quantitative. Qualitative evaluation relies mainly on the subjective judgment of credit officers. Quantitative evaluation analyzes customer information with tools such as scorecards and credit scoring models.
In this article, a BP neural network learns a model from training samples of known user information and credit status, and then evaluates new samples. The accuracy is stable at more than 70%.
All customers are classified into two classes only: good credit and bad credit. The data comes from the German credit database, compiled by Professor Hans Hofmann. It contains 1,000 customer records; each customer has 20 attributes and is labeled as having good or bad credit.
The dataset can be downloaded from:
http://ftp.ics.uci.edu/pub/machine-learning-databases/statlog/german

The original data is stored in the german.data file. Its attributes include:
checking account status, loan duration, credit history, loan purpose, loan amount, etc.

2. Principle of personal credit evaluation with a BP neural network

Each customer in the data set has 20 attributes. Numerical attributes can be used directly, and categorical attributes can be used after integer encoding. The database also provides a preprocessed file, german.data-numeric, which encodes the categorical attributes of the original file as integers to form 24 numerical attributes that can be used directly. This article mainly uses the original data for evaluation and gives the results at the end.

A three-layer BP neural network is implemented in MATLAB. The code in Section 3 reads german.data, in which each customer has 20 attributes, so the input layer contains 20 neuron nodes (24 if german.data-numeric is used instead). The problem is a binary classification of good/bad credit, so the output layer contains only one neuron. The number of neurons in the hidden layer affects the performance of the network and needs to be determined through experiments.
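The process (shown as a flowchart in the original post) follows the steps in Section 3: read in the data, divide it into training and test samples, normalize the samples, create and train the BP network, and test it.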
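
As a rough illustration of the three-layer structure described above, the forward pass of such a network can be written in plain MATLAB. This is only a minimal sketch, not the toolbox implementation: the hidden-layer size of 10, the random untrained weights, and the names W1, b1, W2, b2 are assumptions made for illustration, with tanh hidden units and a linear output unit.

```matlab
% Forward pass of a three-layer network: 20 inputs, 1 hidden layer, 1 output.
nIn     = 20;    % number of attributes per customer
nHidden = 10;    % assumed value; the article leaves it to experiment
nSample = 5;     % a few made-up customers

x  = rand(nIn, nSample);          % one column per customer (random stand-in data)
W1 = 0.1 * randn(nHidden, nIn);   % input-to-hidden weights (untrained)
b1 = zeros(nHidden, 1);           % hidden-layer biases
W2 = 0.1 * randn(1, nHidden);     % hidden-to-output weights (untrained)
b2 = 1.5;                         % output bias

h    = tanh(W1 * x + repmat(b1, 1, nSample));   % hidden activations in (-1, 1)
yhat = W2 * h + b2;                             % one real-valued output per customer
disp(size(yhat));
```

Training adjusts W1, b1, W2 and b2 so that yhat approaches the target labels; the toolbox functions used in Section 3 handle that step.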

3. Algorithm steps

3.1 Read in data

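close all;
clear,clc

%% Read in data
% Open the file
fid = fopen('german.data', 'r');

% Read each line according to the format
% Each line has 21 fields (20 attributes plus the label), a mix of strings and numbers
C = textscan(fid, '%s %d %s %s %d %s %s %d %s %s %d %s %d %s %s %d %s %d %s %s %d\n');

% Close the file
fclose(fid);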
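
The listings in the following steps use a numeric matrix C1 and a count N that the code above does not define. One possible way to build them is sketched here, assuming each category code in german.data has the form 'A' + attribute number + value (for example 'A410' is value 10 of attribute 4). The toy cell array C and the helper variable prefixLen are assumptions for illustration; with the real file, C would come from the textscan call above.

```matlab
% Toy stand-in for the textscan output: 4 attributes plus the label, 2 customers.
C = { {'A11'; 'A14'}, int32([6; 48]), {'A34'; 'A32'}, {'A43'; 'A410'}, int32([1; 2]) };

N  = numel(C) - 1;          % number of attributes (20 for german.data)
M  = numel(C{end});         % number of customers (1000 for german.data)
C1 = zeros(N + 1, M);       % one row per field, one column per customer

for i = 1:N+1
    if iscell(C{i})
        % Categorical field: drop 'A' and the attribute number, keep the value
        prefixLen = 1 + numel(num2str(i));
        for j = 1:M
            C1(i, j) = str2double(C{i}{j}(prefixLen+1:end));
        end
    else
        % Numerical field (or the label): copy as double
        C1(i, :) = double(C{i}(:)).';
    end
end
disp(C1);
```

Integer codes impose an ordering on categories that may not be meaningful; one-hot encoding is a common alternative, but the article's text describes integer encoding.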

3.2 Divide training samples and test samples

Of the 1000 samples, 700 are positive examples (good credit) and 300 are negative examples (bad credit). The first 350 positive examples and the first 150 negative examples are used as training samples, and the last 350 positive examples and the last 150 negative examples are used as test samples.

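% C1 is the (N+1)-by-1000 numeric matrix obtained from C after integer
% encoding (that step is not shown in this listing); N = 20 attributes.
% Input vectors: rows 1..N, one column per customer
x = C1(1:N, :);
% Target output: row N+1 (1 = good credit, 2 = bad credit)
y = C1(N+1, :);

% Positive examples
posx = x(:,y==1);
% Negative examples
negx = x(:,y==2);

% Training samples
trainx = [ posx(:,1:350), negx(:,1:150)];
trainy = [ones(1,350), ones(1,150)*2];

% Test samples
testx = [ posx(:,351:700), negx(:,151:300)];
% The test labels have the same layout as trainy: 350 ones, then 150 twos
testy = trainy;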

3.3 Sample normalization

Normalize the input samples using the mapminmax function:

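% Normalize the training samples (each row is mapped to [-1, 1])
[trainx, s1] = mapminmax(trainx);

% Normalize the test samples with the settings obtained from the training samples
testx = mapminmax('apply', testx, s1);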
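
To make the normalization step concrete, the following sketch reproduces what mapminmax does with its default range of [-1, 1], using only base MATLAB. The matrices trainDemo and testDemo and the helper scale are made-up names and values for illustration. Note that the minimum and maximum come from the training samples only, so scaled test values can fall outside [-1, 1].

```matlab
% Row-wise min-max scaling to [-1, 1], fitted on the training data only.
trainDemo = [1 2 3 4; 10 20 30 50];   % 2 attributes, 4 training samples (made up)
testDemo  = [2.5 5; 30 10];           % 2 attributes, 2 test samples (made up)

xmin = min(trainDemo, [], 2);         % per-attribute minimum over the training set
xmax = max(trainDemo, [], 2);         % per-attribute maximum over the training set

% y = 2*(x - xmin)/(xmax - xmin) - 1, applied row by row
scale = @(x) 2 * bsxfun(@rdivide, bsxfun(@minus, x, xmin), xmax - xmin) - 1;

trainScaled = scale(trainDemo);       % every row now spans exactly [-1, 1]
testScaled  = scale(testDemo);        % reuses the training min/max
disp(trainScaled);
disp(testScaled);
```

Reusing the training settings for the test samples, as the 'apply' call does, keeps information about the test set out of the preprocessing.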

3.4 Create a BP neural network and complete the training

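The code is as follows:

% Create the BP network
net = newff(trainx, trainy);
% Maximum number of training epochs
net.trainParam.epochs = 1500;
% Target error
net.trainParam.goal = 1e-13;
% Display interval (show progress every epoch)
net.trainParam.show = 1;

% Train the network
net = train(net,trainx, trainy);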
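
Section 2 notes that the number of hidden neurons has to be found by experiment. A possible way to run such an experiment is sketched below. It is not the article's code: it uses feedforwardnet, the function that replaces newff in newer toolbox releases, and the data X and T, the candidate sizes in hiddenSizes, and the 150/50 split are all made up for illustration. The neural network toolbox is required.

```matlab
% Toy stand-in data (assumption): 4 features, 200 samples, labels 1 or 2
X = rand(4, 200);
T = 1 + double(sum(X, 1) > 2);

hiddenSizes = [5 10 20];              % candidate hidden-layer sizes (assumed values)
acc = zeros(size(hiddenSizes));       % held-out accuracy for each candidate

for k = 1:numel(hiddenSizes)
    net = feedforwardnet(hiddenSizes(k));    % one hidden layer of the given size
    net.trainParam.epochs = 200;             % keep the demo short
    net.trainParam.showWindow = false;       % do not open the training window
    net = train(net, X(:, 1:150), T(:, 1:150));

    out  = net(X(:, 151:200));               % real-valued outputs on held-out samples
    pred = 1 + double(out >= 1.5);           % threshold at 1.5, as in Section 3.5
    acc(k) = mean(pred == T(:, 151:200));
end
disp([hiddenSizes; acc]);
```

Because the initial weights are random, results vary from run to run; repeating each setting several times and averaging gives a steadier comparison.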

3.5 Testing

The output of the BP network is a real number rather than exactly 1 or 2, so it has to be converted to an integer. With 1.5 as the threshold, an output smaller than the threshold is judged as 1 (good credit); otherwise it is judged as 2 (bad credit).

y0 = net(testx);

% y0 is the floating-point output. Quantize it to 1 or 2.
y00 = y0;
% Threshold at 1.5: below 1.5 becomes 1, otherwise 2
y00(y00<1.5)=1;
y00(y00>=1.5)=2;

% Display the accuracy
fprintf('Accuracy: %.4f\n', sum(y00==testy)/length(y00));
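
The thresholding and accuracy calculation can be checked by hand on a small example. The sketch below uses made-up outputs outDemo and labels labelDemo, and also reports accuracy separately for each class, which the article's code does not do.

```matlab
% Made-up network outputs and true labels (1 = good credit, 2 = bad credit)
outDemo   = [0.9 1.2 1.7 1.4 2.3 1.5 1.1 0.8];
labelDemo = [1   1   1   2   2   2   1   1  ];

predDemo = ones(size(outDemo));
predDemo(outDemo >= 1.5) = 2;                    % 1.5 itself counts as bad credit

accAll  = mean(predDemo == labelDemo);           % overall accuracy
accGood = mean(predDemo(labelDemo == 1) == 1);   % accuracy on good-credit customers
accBad  = mean(predDemo(labelDemo == 2) == 2);   % accuracy on bad-credit customers
fprintf('overall %.3f, good %.3f, bad %.3f\n', accAll, accGood, accBad);
```

With 350 good and 150 bad test samples, a classifier that always outputs 1 would already score 350/500 = 0.7, so the per-class rates are a useful complement to the overall figure.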

4. Running results

The training process is as follows:
[Figure: training window]
Performance curve:
[Figure: performance curve]


The source code is available at: https://download.csdn.net/download/didi_ya/87734937

Origin blog.csdn.net/didi_ya/article/details/130401851