Use Python to generate word clouds of Renrendai borrowers' loan reasons

Table of contents

1. Preface

2. About the code

3. Some word clouds

3.1 Filter condition: none

3.2 Filter condition: gender - male

3.3 Filter condition: gender - female

3.4 Filter condition: education level - postgraduate and above

3.5 Filter condition: education level - undergraduate

3.6 Filter condition: hometown - Fujian

3.7 Filter condition: hometown - Guangdong

3.8 Filter condition: loan reason - contains the word "Apple"

4. Code

4.1 Import libraries

4.2 Import data

4.3 Setting stop words

4.4 Word cloud generation code

5. Closing remarks


1. Preface

Previous blog posts about Renrendai: "Renrendai loan listing crawler example" and "Use Python to process 280,000 personal loan records and show you the most detailed breakdown of borrowers" (Xiaozhan Keji, CSDN Blog).

The previous post about Renrendai mentioned three possible follow-ups. First, we can keep mining the data, for example by analyzing the distribution of education levels across age groups. Second, we can use the Renrendai data to train a neural network model for credit evaluation. Third, we can use the loan-reason column to generate word clouds.

Since I have been busy with research on blockchain and supply chain finance recently, I will start with the easiest of the three this time. Generating a word cloud is very quick.

Finally, if you need Renrendai loan data, please private message me!

2. About the code

I won't go into detail about how to generate word clouds. There are plenty of tutorials online, such as "Python making cool word cloud graphs (including stop words and word frequency statistics)" (gjgfjgy's blog, CSDN) and "EDG won the championship, using Python to analyze a wave: fans are all fried" (The beauty of data analysis and statistics, CSDN blog).

Here I will only say a little about a fairly common pandas usage: selecting the rows of a column that contain a certain keyword.

The data is as shown in the picture above and contains a total of 284,316 loan reasons. How do I find the records whose loan reason contains the word "苹果" (Apple)?

conciseData[conciseData["借款理由"].str.contains("苹果",na=False)]["借款理由"]

As can be seen from the figure above, only 646 loan reasons contain the word "苹果" (Apple), about 0.23% of the 284,316 records. It seems that not many people borrow to buy Apple phones.
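To make the keyword filter easier to try out, here is a minimal sketch on a small made-up DataFrame. The toy rows are assumptions for illustration only; the column name 借款理由 and the `str.contains(..., na=False)` call are the same as in the one-liner above.

```python
import pandas as pd

# Toy stand-in for conciseData; these rows are invented for illustration.
toy = pd.DataFrame({
    "借款理由": ["想买苹果手机", "装修房子", None, "资金周转", "购买苹果电脑"],
})

# na=False treats missing reasons as "no match" instead of propagating NaN,
# so the mask stays purely boolean and can be used for row selection.
mask = toy["借款理由"].str.contains("苹果", na=False)

matches = toy.loc[mask, "借款理由"]  # the matching loan reasons
share = mask.mean()                  # fraction of all rows that match

print(len(matches), f"{share:.2%}")
```

Taking the mean of the boolean mask is a quick way to get the share of matching rows, which is how a percentage like the one quoted above can be computed.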

3. Some word clouds

3.1 Filter condition: none

3.2 Filter condition: gender - male

3.3 Filter condition: gender - female

3.4 Filter condition: education level - postgraduate and above

3.5 Filter condition: education level - undergraduate

3.6 Filter condition: hometown - Fujian

3.7 Filter condition: hometown - Guangdong

3.8 Filter condition: loan reason - contains the word "Apple"
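Each of the filter conditions listed in this section can be written as a pandas boolean mask. The sketch below uses a tiny made-up DataFrame, and the category labels (for example "研究生及以上" and "本科") are assumptions about how the real data is encoded, so check them against the actual column values before reusing the masks.

```python
import pandas as pd

# Toy rows invented for illustration; the labels are assumed encodings.
toy = pd.DataFrame({
    "性别": ["男", "女", "男", "女"],
    "教育程度": ["本科", "研究生及以上", "大专", "本科"],
    "籍贯": ["福建", "广东", "广东", "福建"],
    "借款理由": ["装修", "买苹果手机", "资金周转", None],
})

# One boolean mask per filter condition.
filters = {
    "all": pd.Series(True, index=toy.index),
    "male": toy["性别"] == "男",
    "female": toy["性别"] == "女",
    "postgraduate": toy["教育程度"] == "研究生及以上",
    "undergraduate": toy["教育程度"] == "本科",
    "fujian": toy["籍贯"].str.contains("福建", na=False),
    "guangdong": toy["籍贯"].str.contains("广东", na=False),
    "apple": toy["借款理由"].str.contains("苹果", na=False),
}

# Number of rows selected by each condition.
counts = {name: int(mask.sum()) for name, mask in filters.items()}

# Masks can be combined with & (and) or | (or).
female_undergrad = toy[filters["female"] & filters["undergraduate"]]

print(counts)
```

Keeping the masks in a dictionary makes it easy to loop over them and produce one word cloud image per condition.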

4. Code

4.1 Import libraries

import pandas as pd
import matplotlib.pyplot as plt

import jieba
from wordcloud import WordCloud

plt.rcParams['font.sans-serif'] = ['SimHei']  # use a font that can display Chinese characters
plt.rcParams['axes.unicode_minus'] = False    # make the minus sign render correctly with that font

4.2 Import data

# all.csv is GBK-encoded and has no header row, so column names are assigned by hand
data = pd.read_csv("all.csv",encoding="gbk",header=None,parse_dates=True)
data.columns = ["id","借款时间(月)","剩余还款时间(月)","借款金额","notPayInterest","productRepayType",
               "贷款类型","利率","性别","籍贯","出生日期","教育程度","工作单位","行业","公司规模","职位","收入",
               "车贷","汽车数量","婚姻状况","房贷","房子数量","信用等级","unused1","unused2","unused3","借款理由"]

# Keep only the columns that are actually used
conciseData = data[["id","借款时间(月)","剩余还款时间(月)","借款金额","贷款类型","利率","性别","籍贯","出生日期","教育程度","工作单位","行业","公司规模","职位","收入",
               "车贷","汽车数量","婚姻状况","房贷","房子数量","信用等级","借款理由"]]
conciseData = conciseData.set_index("id")
conciseData = conciseData.dropna(how="all")  # drop rows in which every column is missing
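As a small illustration of what `dropna(how="all")` does here, the sketch below compares it with the default `how="any"` on made-up rows (the values are assumptions for illustration). Rows with only some values missing survive `how="all"`, which is why the word cloud code later checks `isinstance(each, str)` before using a loan reason.

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration, indexed by id like conciseData.
toy = pd.DataFrame(
    {
        "借款金额": [3000.0, np.nan, np.nan],
        "借款理由": ["装修", np.nan, "买车"],
    },
    index=pd.Index([1, 2, 3], name="id"),
)

all_dropped = toy.dropna(how="all")  # removes a row only if every column is NaN
any_dropped = toy.dropna()           # default how="any": removes a row if any column is NaN

print(list(all_dropped.index), list(any_dropped.index))
```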

4.3 Setting stop words

# Every entry needs a trailing comma: adjacent string literals without one are
# silently concatenated by Python (e.g. "建立" "支持" would become "建立支持")
stopWords = [
    "人人","真实有效","同时","符合","借款人","提供","上述","考察","实地",
    "已经","希望","大家","认证","审核","此次","公司","众信","借款","谢谢","比较","第一次","压力",
    "贷","的","标准","方友","业","还款","收入","用于","信息","以上","问题","好","一下","通过",
    "稳定","全国","企业","位于","该","为","自己","现居","工作","单位","但","高","一些","还清",
    "行业","主要","从事","有","无","良好","贷款","累计","自","放心","家里","吱吱","为了","放款",
    "多","在","年","所","抵押","无担保","服务","本人","多多","小额贷款","想","与","借","给","建立",
    "支持","至今","安信","良好","最","多","探索","大","小","证大速贷","成立","于","信用","成立",
    "每月","流水","一家","因为","我","和","是","做","所以","迅速","以来","需",
    "快速","简便","可以","专门","资料","经","了","也","现在","由于",
    "测试","需要","元","也","还","个","月","人","申请","等",
    "能","了","及","没有","现在","就","进行","都","各位","急急",
    "每个","准备","有限公司","目前","保证","按时","因","可","持续","一个",
    "上","到","万","要","现","来","想","个人","左右","不","年底","能力",
]
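To show how a stop word list is applied, here is a minimal sketch that filters a list of already segmented tokens and counts what remains. The tokens and the three-word stop list are made up for illustration; in the real pipeline the tokens come from jieba and the stop list is the full one above. A set is used instead of a list because membership tests on a set take constant time, which helps with a few hundred thousand tokens.

```python
from collections import Counter

# Hypothetical, already segmented tokens (in practice produced by jieba).
tokens = ["借款", "用于", "装修", "房子", "的", "装修", "资金", "周转", "借款"]

# A tiny illustrative subset of the stop words, stored as a set.
stop_words = {"借款", "用于", "的"}

# Keep only the tokens that are not stop words.
kept = [t for t in tokens if t not in stop_words]

# Word frequencies of the remaining tokens.
freq = Counter(kept)

print(kept)
print(freq.most_common(1))
```

Looking at the most frequent remaining words is also a practical way to decide which words to add to the stop list next.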

4.4 Generate word cloud map code

Since 280,000 records are too many to process at once, the data is sliced with a step of 3 here, so only every third record is used!

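# Loan reasons of male borrowers, taking every third record
reasons = conciseData[conciseData["性别"]=="男"]["借款理由"][::3]

# Skip missing values (NaN is a float, not a str) and join the rest into one text
txt = "  ".join(each for each in reasons if isinstance(each,str))

words = jieba.cut(txt)  # word segmentation

# Drop stop words and join the remaining words with spaces
result = " ".join(each for each in words if each not in stopWords)

wordshow = WordCloud(background_color='black',
                     width=800,
                     height=800,
                     max_words=800,
                     max_font_size=100,
                     font_path="msyh.ttc",  # a font with Chinese glyphs (Microsoft YaHei)
                     ).generate(result)

wordshow.to_file('男.png')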
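Since the same steps are repeated for every filter condition, they can be wrapped in a small helper. The sketch below is only one possible arrangement: the function name `build_cloud_text` and its parameters are hypothetical, and the default tokenizer is `str.split` so that the example runs without jieba. With jieba installed, passing `tokenize=jieba.lcut` gives Chinese word segmentation. Unlike the code above, it segments each record separately instead of one large concatenated string.

```python
from typing import Callable, Iterable, List, Set


def build_cloud_text(
    reasons: Iterable[object],
    stop_words: Set[str],
    tokenize: Callable[[str], Iterable[str]] = str.split,  # stand-in tokenizer
    step: int = 3,
) -> str:
    """Return a space-separated string of words for WordCloud.generate()."""
    sampled = list(reasons)[::step]  # keep every step-th record
    words: List[str] = []
    for reason in sampled:
        if not isinstance(reason, str):  # skip None and NaN
            continue
        for word in tokenize(reason):
            # Drop whitespace-only tokens and stop words.
            if word.strip() and word not in stop_words:
                words.append(word)
    return " ".join(words)


# Made-up, pre-spaced sample records for illustration.
sample = ["买 苹果 手机", None, "x", "装修 的 房子", float("nan"), "y", "资金 周转"]
demo = build_cloud_text(sample, stop_words={"的"}, step=3)
print(demo)
```

The returned string can then be passed to `WordCloud(...).generate()` exactly as `result` is above, once per filtered subset.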

5. Closing remarks

All living beings suffer, not just you; letting go is freedom.

Origin blog.csdn.net/zsllsz2022/article/details/121355465