Big data analysis and modeling: basic ideas, techniques, and the characteristics of common algorithms


  In recent years, big data has become a hot topic in both technology and business circles, and more and more enterprises and researchers are paying attention to its applications. Research on big data analysis and mining is in full swing in the scientific community, and many new algorithms for big data have been developed.

  At the same time, the business application of big data analysis has been eagerly embraced, and successful cases keep emerging, such as the precisely targeted advertising of Target, the large American retailer. This article discusses big data analysis technology and its role in the retail banking industry.

 

What is big data

  In 2011, McKinsey first proposed the concept of big data in a research report titled "Big Data: The Next Frontier for Innovation, Competition, and Productivity". The report argued that data had penetrated every industry and business function area and that it holds tremendous value, which would make data an important factor of production. A 2012 New York Times op-ed declared that the era of "big data" had arrived: in business, economics, and beyond, decisions would increasingly be made on the basis of data and analysis rather than experience and intuition. In March 2012, the Obama administration announced a USD 200 million investment to launch the "Big Data Research and Development Initiative", the United States' most significant science and technology deployment since the "Information Superhighway" program announced in 1993. The U.S. government regards big data as "the new oil of the future", and elevating big data research to a national priority will have a profound impact on future technological and economic development.

  Entering the 21st century, the rise of the Internet drove a massive increase in the amount of data. In the Internet era almost everyone produces data, and the data takes extremely varied forms. On the one hand there is data actively generated through applications such as social networking, multimedia, collaborative creation, and virtual services; on the other hand there is data recorded and collected passively during search-engine use and web browsing. Data at this stage is characterized by user origination, initiative, and interactivity.

  According to a research report by the International Data Corporation (IDC), the total amount of data created and replicated worldwide in 2011 was 1.8ZB (zettabytes; one zettabyte equals 1024 exabytes, or 2^70 bytes), and growth follows a "new Moore's Law": global data volume is expected to double roughly every two years, reaching some 35ZB by 2020. It is the development of information technology that has allowed big data to emerge and grow. Big data technology is the ability to obtain valuable information quickly from massive and diverse data.
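  As a rough arithmetic check (not taken from the IDC report itself), doubling a 2011 base of 1.8ZB every two years lands in the same range as the 35ZB projection:

```python
# Sanity check of the growth figures quoted above: 1.8 ZB in 2011,
# doubling roughly every two years.
base_zb, start, end = 1.8, 2011, 2020
doublings = (end - start) / 2          # 4.5 doublings in 9 years
projected = base_zb * 2 ** doublings   # about 40.7 ZB
print(f"projected {end} volume: {projected:.1f} ZB")
```

  The result, about 40ZB, is on the same order as the 35ZB figure cited above.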

  Big data refers to data sets so large that they cannot be captured, managed, processed, and organized into human-interpretable information within a reasonable time. In "Big Data: A Revolution That Will Transform How We Live, Work, and Think", Viktor Mayer-Schönberger and Kenneth Cukier describe big data analysis as the practice of analyzing all the data rather than random samples of it.

  Based on the current understanding, big data is generally held to have four "V" characteristics: Volume, Variety, Velocity, and Value. These four characteristics describe big data analysis from four aspects. First, the data volume is huge, from the TB level to the PB level, and even jumping to the EB and ZB levels. Second, the data types are diverse: structured and unstructured data of all kinds, including web text, logs, videos, pictures, and geographic location information. All information is data. Third, the processing speed is fast: big data analysis tools such as Hadoop and SPSS can quickly extract high-value information from many types of data, which fundamentally distinguishes them from traditional data analysis techniques. Fourth, the value is high: as long as the data is used sensibly and analyzed correctly, mining the hidden correlations within it yields high-value returns.

  Unlike traditional research based on logical reasoning, big data research statistically searches, compares, clusters, and classifies enormous amounts of data, and it pays particular attention to correlation among the data. "Correlation" here means that the values of two or more variables exhibit some regularity; the purpose of correlation analysis is to find the hidden networks of interrelationships (associations) in a data set. Big data therefore focuses on finding correlations rather than causation. Perhaps it is precisely this focus on correlation that has made big data analysis so widely used in business: business aims at profit, so once data mining shows that some factor correlates strongly with profitability, that factor can be fully exploited.
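  To make the idea concrete, here is a minimal sketch of correlation analysis with pandas; the column names and figures are invented for illustration.

```python
# Minimal sketch: finding pairwise correlations in a dataset with pandas.
import pandas as pd

df = pd.DataFrame({
    "monthly_spend": [1200, 800, 150, 2300, 400],
    "login_count":   [30, 22, 5, 41, 9],
    "balance":       [9000, 5000, 700, 15000, 2000],
})

# Pearson correlation matrix; values near +1 or -1 indicate strong association.
print(df.corr())

# Pairs with |r| above a threshold are candidates for closer study.
# As the text notes, correlation alone says nothing about causation.
strong = df.corr().abs().stack()       # each pair appears twice (symmetric)
print(strong[(strong > 0.8) & (strong < 1.0)])
```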

Basic ideas and skills of big data analysis and modeling

 

  Once a large amount of data is available, the next step is to analyze it, in the hope of building a model, through appropriate analysis and mining techniques, that reveals the objective laws hidden in the data. Over the years of its development, big data analysis has formed some basic ideas for analysis and modeling. CRISP-DM (the Cross-Industry Standard Process for Data Mining) is an industry-recognized methodology for guiding big data analysis and mining work.

  CRISP-DM models big data analysis and mining as a life cycle with six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. Figure 1 shows the relationships among these six stages; the arrows indicate the most frequent and important dependencies between stages, and the order of the stages is not strict. In practice, most projects move back and forth between the phases as needed.

  Business understanding means understanding the actual type of business and the actual problem at hand, and grasping as much as possible about the business goals of the data mining effort. Data understanding means becoming deeply familiar with the data available for mining; it includes collecting the initial data, describing it, and verifying its quality. Data preparation is one of the most important stages of data mining and usually the most time-consuming: it is estimated that data preparation typically accounts for 50-70% of a project's time and effort.

  Data preparation typically involves the following tasks: merging data sets and records, selecting subsets of data samples, aggregating records, deriving new attributes, formatting data for modeling, removing or replacing blank or missing values, and splitting the data into training and test sets. After data preparation comes model building. Modeling is usually iterative: choose a suitable model algorithm, run several candidate models, fine-tune their parameters to optimize them, and finally select the best model. During the evaluation phase, the project results must be judged against the business success criteria; this presupposes a clear understanding of the stated business goals, which is why business understanding at the outset matters so much. After evaluation comes deployment, in which the best model selected earlier is applied to the actual business and the final report is produced.
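  As an illustration of the preparation and modeling stages just described, the following sketch uses pandas and scikit-learn; all column names and values are invented, and the decision tree is just one possible candidate model.

```python
# Minimal sketch of the data preparation and modeling stages above.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.DataFrame({
    "age":     [25, 40, 31, 52, 46, 29, 60, 35, 48, 27],
    "income":  [30, 80, 55, 120, 90, 40, 150, 60, 100, 35],
    "balance": [1.0, 9.5, 4.2, None, 11.0, 2.0, 20.0, 5.5, 12.0, 1.5],
    "churned": [1, 0, 0, 0, 0, 1, 0, 0, 0, 1],
})

# Data preparation: replace missing values, derive a new attribute,
# then split into training and test sets.
df["balance"] = df["balance"].fillna(df["balance"].median())
df["balance_per_income"] = df["balance"] / df["income"]

X = df[["age", "income", "balance", "balance_per_income"]]
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Modeling and evaluation: fit a candidate model, score on held-out data.
model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```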

  Big data analysis supports knowledge-based decisions by predicting future trends and behaviors. Its main functions in analysis and mining are as follows:

  First, automatic prediction of trends and behavior. Data mining automatically finds predictive information in large databases, so questions that once required extensive manual analysis can now be answered quickly and directly from the data itself. For example, Google Flu Trends predicted when and where flu outbreaks would occur.

  Second, association analysis. Associations in data are an important type of knowledge discoverable in a database: if the values of two or more variables exhibit some regularity, they are said to be associated. Association analysis aims to find sets of attributes with strong correlations. A classic case is the association between beer and diapers, and association analysis is often used for product recommendation in e-commerce, as sketched below.
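  The toy sketch below counts how often item pairs co-occur across transactions and keeps rules whose support and confidence clear a threshold; the transactions and thresholds are invented for illustration.

```python
# Toy association analysis in the beer-and-diapers spirit:
# count item-pair co-occurrence and compute support and confidence.
from itertools import combinations
from collections import Counter

transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
]

n = len(transactions)
pair_counts = Counter()
item_counts = Counter()
for t in transactions:
    item_counts.update(t)
    pair_counts.update(combinations(sorted(t), 2))

for (a, b), c in pair_counts.items():
    support = c / n                    # fraction of transactions with both items
    confidence = c / item_counts[a]    # confidence of the rule a -> b
    if support >= 0.4 and confidence >= 0.6:
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```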

  Third, clustering. Similar records in a database can be grouped together into clusters. Clustering often helps people see things afresh, and clustering techniques are widely used in social network analysis. The sketch below shows the idea.
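  Here customers are grouped by two behavioral features with scikit-learn's k-means; the features and the choice of three clusters are illustrative assumptions, not part of the original text.

```python
# Minimal k-means sketch with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

# Each row: [average transaction amount, transactions per month]
X = np.array([[50, 2], [60, 3], [55, 2],
              [500, 20], [480, 25],
              [2000, 5], [2200, 4]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster id assigned to each customer
print(kmeans.cluster_centers_)  # the prototypical member of each group
```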

  After several years of development, big data analysis has produced a number of relatively mature and stable model algorithms. Common ones include association rule analysis (Apriori), decision trees, neural networks, k-means clustering, support vector machines, multiple linear regression, generalized linear regression, Bayesian networks, Cox regression, and k-nearest neighbors. Some of these models suit trend and behavior prediction, some suit association analysis, and some suit cluster analysis; each has its own strengths and weaknesses, so a suitable algorithm can be chosen for each big data mining scenario. The advantages, disadvantages, and typical applications of some commonly used algorithms are shown in Table 1:

Table 1: Characteristics of common big data model algorithms

  Association rule analysis (Apriori)
  Advantages: easy to understand; describes relationships between data items with simple if-then rules; the resulting rules are readable; handles both continuous and discrete data.
  Disadvantages: strong rules may simply not exist in the data; searching the entire database for all possible rules can cause a combinatorial explosion.
  Applications: data in a regular form that is easy to group; retail and time-series analysis; product promotion in e-commerce.

  Decision tree
  Advantages: the easiest model to understand; performs well when predicting a specific target value from many complex attributes; can generate independent rules.
  Disadvantages: performs poorly when predicting continuous attribute values; cannot analyze time-dependent attribute variables.
  Applications: classification tasks; situations where the model must be highly interpretable.

  Neural network
  Advantages: highly versatile; handles nonlinear and noisy complex data well; can process large databases; predicts continuous values and classifies or clusters discrete data; tolerates noise and missing attribute values.
  Disadvantages: the learned rules cannot be explained intuitively, so results are hard to interpret; training may converge prematurely to local optima or overfit.
  Applications: classification and prediction, especially when relationships between variables are hard to express linearly.

  k-means clustering
  Advantages: simple to apply; requires no prior knowledge; handles both numerical and categorical data.
  Disadvantages: the number of clusters must be chosen in advance; suitable distance functions and attribute weights are hard to select.
  Applications: grouping records by attributes; finding outliers and data that do not fit a predictive model.

  Support vector machine
  Advantages: adapts well to data; strong robustness.
  Disadvantages: the classical algorithm only separates two classes; multi-class problems are more cumbersome.
  Applications: classification and prediction.

  Types of Big Data in Retail Banking

  In modern economic life, personal and family finances are closely tied to retail banking: investment and wealth management, e-commerce, mobile payment, household spending, and overseas travel all involve retail banking services. Precisely because retail banks serve a huge, widely distributed customer base with a large and complex volume of business, they face distinctive requirements in business management, risk control, and customer marketing. Moreover, with the development of Internet finance, retail banking is increasingly challenged by non-bank institutions, putting new pressure on banks and imposing new requirements for keeping their business stable and growing. To meet this challenge, keep expanding the business, and open up new profit margins, banks must research market demand thoroughly and identify points of value on that basis; this is exactly where big data analysis comes in.

  After many years of development, and especially with the rapid growth of the Internet and mobile Internet in recent years, retail banks have accumulated a large amount of data covering almost every aspect of the market and their customers. This data falls mainly into the following categories:

  First, attribute data of existing customers. Customer attribute data includes gender, age, income, and occupation, recorded when customers open accounts or purchase products. These attributes broadly describe a customer's situation, such as income level and asset status.

  Second, customer account information. This includes account balance, account type, and account status. Account information records the current state of a customer's assets and plays an important role when a retail bank analyzes and mines its customer base.

  Third, customer transaction information. This includes the date and time of each transaction, the amount, and the transaction type. From it the bank can determine the frequency and total volume of a customer's transactions, and infer the customer's trading preferences and asset capacity.

  Fourth, customer channel information. Channel information indicates whether a customer prefers to transact at the bank counter, through the online banking client, or through the mobile client. It is crucial for customer management and expansion.

  Fifth, customer behavior information. In the Internet era, every retail bank keeps online banking and mobile banking logs that record customers' behavior while they transact. Unlike the categories above, these logs are unstructured data.

  Comparing the data sources above, retail banking data mainly comprises customer attributes, transaction habits, channel preferences, and behavior information. It is stored in the retail bank's online banking system, customer management system, electronic payment platform, ECIF system, core banking system, and other systems, which make storing and analyzing the data highly convenient and accurate.

The business value of big data analysis for retail banks

  In recent years, big data analysis has developed rapidly in many fields, and retail banking is no exception. Given the business types and data types of retail banks, the business value of big data analysis for them lies mainly in the following areas.

  First, fine-grained customer classification and profile management. To provide customers with better service, a retail bank needs to classify and manage them by analyzing the customer data retained in its own systems.

  Industry statistics suggest that only about 20% of customers bring a bank the bulk of its returns, so finding these 20% of premium customers is a major goal for retail banks. From customer data the bank can uncover the social, economic, and consumption characteristics behind each customer, infer spending power, spending level, and spending habits, and compute each customer's contribution rate to the bank; customers can then be finely classified and managed according to these characteristics. Such classification and management bring the bank the greatest returns, and they can only be achieved through big data analysis.
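  As a minimal illustration (with fabricated revenue figures), selecting the top 20% of customers by annual contribution might look like this in pandas:

```python
# Sketch: pick out the most profitable ~20% of customers by contribution.
import pandas as pd

customers = pd.DataFrame({
    "customer_id": range(1, 11),
    "annual_revenue": [50, 900, 30, 1200, 75, 60, 2000, 45, 80, 1500],
})

cutoff = customers["annual_revenue"].quantile(0.8)   # top 20% by revenue
premium = customers[customers["annual_revenue"] >= cutoff]
print(premium.sort_values("annual_revenue", ascending=False))
```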

  Second, churn prevention and precision marketing. Industry experience shows that acquiring a new customer costs far more than retaining an existing one, especially a premium customer. Competition in retail banking is now fierce and regional markets are saturated, so retaining existing customers and preventing churn has become ever more important. Keeping existing customers while continually providing them with high-quality value-added services is a major challenge, and opportunity, for the retail banking industry. Big data analysis can help a retail bank locate and segment its customers precisely and thus identify existing customers at risk of leaving. By analyzing the data, the bank can work out the specific reasons for churn, which products and services customers are dissatisfied with, and how customers' consumption behavior is positioned. On that basis, big data analysis makes it possible to offer different customers highly attractive personalized marketing plans, helping the bank prevent churn and market with precision.
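  One common way to build such a churn-risk score, sketched here with hypothetical columns and fabricated data, is logistic regression on behavioral attributes; customers with the highest predicted probability become retention targets.

```python
# Sketch of a churn-risk score using logistic regression.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "months_inactive": [0, 1, 6, 8, 0, 5, 7, 2],
    "products_held":   [3, 2, 1, 1, 4, 1, 1, 2],
    "complaints":      [0, 0, 2, 3, 0, 1, 2, 0],
    "churned":         [0, 0, 1, 1, 0, 1, 1, 0],
})

X, y = df.drop(columns="churned"), df["churned"]
model = LogisticRegression().fit(X, y)

# Probability of churn for each existing customer; high scores flag
# candidates for targeted retention offers.
df["churn_risk"] = model.predict_proba(X)[:, 1]
print(df.sort_values("churn_risk", ascending=False))
```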

  Third, product analysis and management. Retail banks offer many products suited to different customer groups, and how to analyze, manage, and optimize them is another difficult problem. In the past, product analysis and management merely used statistical summaries to describe a product's current state, with no deep mining. In today's big data era, big data analysis can comprehensively mine a product's customer coverage, profitability, user response, user retention, marketing and promotion, and optimization and upgrading, and can find new points of value growth on that basis. With big data analysis, retail banks' command of their products is bound to improve greatly.

  Fourth, risk control and management. Credit card use is one of the risks retail banks face: malicious overdrafts and overdue repayments are potential risks for the bank. How to identify risky customers in advance, prevent malicious overdrafts, and manage risk are therefore difficult problems for retail banks. Before big data analysis was applied at scale, banks relied simply on customers' background information for prevention, which was both passive and ineffective. Today, with the help of big data, a bank can learn a customer's consumption habits from historical data; once a customer shows unusual consumption behavior, the risk index can be deemed exceeded and the transaction suspended, effectively preventing the risk from materializing. A toy version of this rule is sketched below.
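  The sketch flags a transaction as suspicious when it deviates sharply from the customer's own history; the three-standard-deviation cutoff and the figures are illustrative choices, not any bank's actual rule.

```python
# Toy anomaly rule: flag a transaction that lies far outside
# the customer's historical spending pattern.
import statistics

history = [120, 80, 150, 95, 110, 130, 105]   # past transaction amounts
mean = statistics.mean(history)
std = statistics.stdev(history)

def is_suspicious(amount, k=3.0):
    """Return True if amount lies more than k std devs from the mean."""
    return abs(amount - mean) > k * std

print(is_suspicious(115))    # False: in line with past behavior
print(is_suspicious(5000))   # True: candidate for suspension/review
```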

  In addition, big data analysis can assess customers' credit ratings. Customers with low credit scores can be made the focus of risk management and control, while for customers with high scores the bank can further tap their consumption potential and so improve its performance.

  Fifth, analysis of the bank's operating condition. Big data analysis can not only target, market to, and manage the risk of a retail bank's customers precisely; it can also analyze the bank's overall operating condition in depth. Data mining provides timely insight into key information such as business volume, funding position, and profit. Combined with operating data from the same period in the past, it can expose problems in the current operating condition and suggest improvement strategies, and thus propose the most profitable way of operating under the given conditions.

  The five points above are only the main areas in which big data analysis creates business value for retail banks, and the areas where its influence is greatest. As big data analysis is applied and developed further in retail banking, its business value for other banking businesses will become even more apparent.

  In short, big data is a new frontier for innovation, competition, and productivity, containing many market opportunities and profit margins. The enormous value hidden in big data is bound to drive major changes in business innovation and enterprise management across many industries, retail banking included. From now on, the influence of big data analysis on retail banks will only grow, and retail banking, propelled by big data, is bound to meet a new growth opportunity.

  Speaking of big data, many people know it is a major direction for the coming Internet era. Yet the rise of big data is driven not by the Internet, nor by the mobile Internet, but by the Internet of Everything.

  The Internet can be called the information 1.0 era, the mobile Internet information 1.5, and the Internet of Things information 2.0. The era of the Internet of Everything will be an era of information explosion, and big data will advance in it by leaps and bounds.

  At present, smart hardware, connected devices, and sensors of all kinds are springing up. Smart homes, smart wearables, smart cars, smart communities, and smart cities will soon be popular worldwide. Behind this Internet of Everything, the analysis, processing, recognition, and prediction of data become all the more important.

  For now, Alibaba Cloud is ahead in financial, government, and enterprise cloud services, while Baidu Cloud leads in personal cloud services and IoT data; Tencent naturally lags somewhat here, although since the start of this year Tencent Cloud has been catching up with Baidu and Alibaba.

  In the IoT era, however, no tech giant, neither Amazon nor Google nor anyone else, yet dares to claim that this field will be theirs. Who will be the first to taste the sweetness of big data? We shall see.
