How is a financial big data platform built?

The value of big data to the banking industry is self-evident.

On the business side, tapping customers' underlying needs and providing them with more valuable services is the key to the strategic transformation and business innovation of financial institutions. Big data technology is an important tool for financial institutions to mine their data assets in depth, compete through differentiation, and promote business innovation.

On the operations side, by applying and analyzing big data, financial institutions can locate internal management shortcomings, formulate effective improvement measures, and refine their management approach, thereby reducing management and operating costs.

On the risk side, big data technology helps reduce information asymmetry and strengthens risk control. Big data risk control is now widely used in the financial industry, with successful applications in areas such as microfinance.

Compared with an ordinary business system, a big data system is a large-scale distributed system with many components. Financial enterprises place many requirements on such systems, including scalability, reliability, controllability, security, ease of operation and maintenance, dynamic resource allocation, and multi-tenant support.

So how does the financial industry build a big data platform?

Xiaoyi wants to share something about this topic today.

1. Problems faced by the financial industry

Compared with small and medium-sized financial enterprises, banks have a huge number of customers, which makes their digital transformation more difficult. The main challenges are that data types are becoming more diverse and data volumes are growing day by day. Both data storage and data query have hit bottlenecks in software and hardware.

Users' applications and analysis results are becoming increasingly integrated, and requirements for real-time processing and response time keep rising. At the same time, data processing models are growing more complex, and so are the algorithms behind them. All of this calls for improvements and optimizations in data management and data processing (including data transmission). For example, some banks face the following problems in data processing and application:

1. The data storage space of traditional tools has become a bottleneck

As business grows, financial enterprises accumulate large amounts of cold data, low-value data, and historical data. This data exceeds the management limits of traditional data storage tools and consumes the effective storage space of expensive servers and databases. As data growth accelerates, this problem has become a major obstacle to the transformation and expansion of financial services.

2. The data processing efficiency of traditional tools is increasingly low

In areas such as credit risk management, customer relationship management, financial analysis, compliance management, operation monitoring, and data warehousing, TB, 10 TB, or even 100 TB of data must be processed every day. The processing cycle of traditional stored procedures keeps getting longer and can no longer meet application requirements.

3. The customer experience of the application system is getting worse

The sharp increase in data volume and the decline in processing efficiency make the customer experience of financial application systems worse and worse. This situation exists in many financial applications, and some of them have had to switch from real-time queries to offline queries, which degrades the customer experience further.

Given these problems in data analysis and processing in the financial industry, building a financial big data platform is all the more necessary.

2. Ideas for building a financial big data platform

1. Construction goals

The financial big data application platform collects massive amounts of structured and unstructured data. Through real-time analysis, it provides financial regulators, financial institutions, securities institutions, Internet finance companies, and others with all-round customer information, analyzes and mines customers' consumption habits, and accurately predicts customer behavior, so that financial regulators and financial service platforms can be targeted in marketing and risk control. Using big data for financial risk analysis, precision marketing, and building a sound credit system are also main goals of current comprehensive platform construction.

2. Financial big data platform architecture

The architecture of the big data platform is shown in the figure below:

The top layer is the big data application layer. The ultimate goal of the big data platform is to solve practical business problems. In the performance of the central bank's duties, for example, it can be applied to macro-prudential assessment, macroeconomic analysis, social credit system construction, anti-money laundering, and targeted poverty alleviation.

The second layer is the application interface layer. It includes components such as data collection, interactive query, algorithm library, and data display, covering the entire data life cycle of collection, processing, analysis, display, and deletion.

The third layer is the resource management layer. It is mainly used for unified management and allocation of storage resources and computing resources. It uses containers to allocate resources for computing frameworks and storage frameworks, and supports resource scheduling and elastic scaling.

The fourth layer is the infrastructure layer. It provides basic computing, network, and storage resources, and is the basis for upper-layer data storage, computing, and transmission.

Finally, the big data platform also needs unified platform security monitoring, which provides security management, operation and maintenance monitoring, and related functions for the platform.

3. Analysis of key technologies of big data

(1) Data collection and preprocessing

Data collection is the first link in the big data life cycle. It integrates structured and unstructured data scattered across different networks and systems so that the data can then be analyzed as a whole. Collection methods include file log collection, database log collection, relational database access, and application access. Because different data sets are heterogeneous, the collected data must be preprocessed, in particular by extracting and consolidating heterogeneous data into a new data set with a unified structure and schema. This yields a series of data views that are convenient to insert into, delete from, update, analyze, and process.
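To make the preprocessing step more concrete, here is a minimal Python sketch of consolidating two heterogeneous sources into one unified structure. Everything in it is an assumption for illustration: the field names (`cust_no`, `amountCents`, and so on), the two source formats, and the three-field target schema are hypothetical and do not come from any real platform.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical unified schema: every record ends up with these three fields.
UNIFIED_FIELDS = ("customer_id", "amount", "event_time")


def from_core_csv(text):
    """Parse rows exported as CSV from a (hypothetical) relational source."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "customer_id": row["cust_no"].strip(),
            "amount": float(row["amt"]),
            "event_time": datetime.strptime(row["txn_date"], "%Y%m%d"),
        }


def from_app_log(lines):
    """Parse JSON lines emitted by a (hypothetical) application log."""
    for line in lines:
        event = json.loads(line)
        yield {
            "customer_id": str(event["customerId"]),
            "amount": event["amountCents"] / 100,
            "event_time": datetime.fromisoformat(event["ts"]),
        }


def unify(*sources):
    """Merge all sources into one list sorted by event time."""
    records = [record for source in sources for record in source]
    return sorted(records, key=lambda r: r["event_time"])


# Sample inputs in two different shapes (made-up data).
core_csv = "cust_no,amt,txn_date\n C001 ,120.50,20230102\n"
app_log = ['{"customerId": 7, "amountCents": 9900, "ts": "2023-01-01T08:30:00"}']

unified = unify(from_core_csv(core_csv), from_app_log(app_log))
for record in unified:
    print(record)
```

Each source gets its own small adapter, and downstream analysis only ever sees the unified fields. In a real platform the adapters would typically sit in the collection or ETL components rather than in a single script.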

(2) Big data computing mode

The analysis and mining of big data is data-intensive computing. It requires huge computing power and data throughput, and it also places high demands on the scalability and cost-effectiveness of computing systems. A big data computing mode is a high-level abstraction or model distilled from the differing data characteristics and computing requirements of big data. As big data has developed, people have become more aware of the value hidden in data. At the same time, the characteristics of big data, namely large volume, diverse types, fast updates, and low value density, place stricter requirements on data processing. These challenges, together with the diversity of application scenarios, user needs, and data characteristics, call for higher-level big data computing modes.

A variety of typical and important big data computing modes have emerged for different computing needs, such as the MapReduce parallel computing abstraction, the distributed memory abstraction RDD in Spark, and the graph-parallel abstraction in GraphLab. Many big data computing systems and tools built around these computing modes have emerged alongside them.
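As a rough illustration of the MapReduce abstraction mentioned above, the following single-process Python sketch walks through the map, shuffle, and reduce phases on a tiny made-up list of transactions. It only mimics the programming model; it is not the Hadoop API, and a real framework would distribute each phase across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Map: emit one (key, value) pair per input record."""
    customer_id, amount = record
    return (customer_id, amount)


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold the values of one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


# Made-up input: (customer id, transaction amount).
transactions = [("C001", 100), ("C002", 40), ("C001", 25), ("C002", 60)]

mapped = map(map_phase, transactions)
totals = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(totals)
```

Because each key is reduced independently, the reduce step can run in parallel per key, which is what lets this model scale out.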

(3) Data visualization

Data visualization aims to convey and communicate information clearly and effectively by graphical means, and it is an intuitive way for users to interact with data sets. Visualization and visual analysis software can distill data characteristics according to user needs and present different types of data sets from different dimensions as relationship diagrams, sequence diagrams, or tables, helping users obtain useful information more quickly and reach accurate analysis results.

(4) Big data storage management technology

The first problem big data storage technology must solve is the massive scale and rapid growth of data. The storage hardware architecture and file system must be much more cost-effective than traditional technologies, and the storage must offer good throughput and scalability, robust fault tolerance, and high-performance concurrent reads and writes. Google's file system GFS and Hadoop's distributed file system HDFS have laid the foundation for big data storage technology. The second problem is handling data in many formats, which requires the storage management system to cope with all kinds of unstructured data. Representative products are non-relational databases such as Google's Bigtable and Hadoop's HBase.

4. Security protection of financial big data platform

As big data technology is widely applied in the financial industry, it promotes financial innovation but also brings security risks that cannot be ignored. Risks should be viewed objectively, with risk identification and emergency plans prepared in advance, and big data security issues should be addressed through data management, infrastructure protection, laws and regulations, and other means.

First, in terms of platform security management, it is necessary to strengthen data authority control, data desensitization, privacy protection, and data trustworthiness management.

The second is to strengthen the security of big data application systems: bring every link, including data collection, storage, analysis and processing, data mining, and data display, within the scope of information security, and deploy the corresponding security products to form a unified and controllable security system.

The third is to improve the security management system. Under the framework of big data security laws and regulations, improve the information security management system and information security supervision system, and cultivate big data security talents.
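The data desensitization mentioned in the first measure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the masking rules, the function names, and the `SECRET_KEY` value are all hypothetical, and a production system would follow the institution's own desensitization standards and key management.

```python
import hashlib
import hmac

# Hypothetical key for the demo only; real keys belong in a key management system.
SECRET_KEY = b"demo-key-change-me"


def mask_card(card_number, visible=4):
    """Keep only the last `visible` digits of a card number."""
    digits = card_number.replace(" ", "")
    return "*" * (len(digits) - visible) + digits[-visible:]


def mask_name(name):
    """Keep the first character of a name and mask the rest."""
    if not name:
        return name
    return name[0] + "*" * (len(name) - 1)


def pseudonymize(customer_id):
    """Replace an identifier with a keyed hash so records can still be joined."""
    digest = hmac.new(SECRET_KEY, customer_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


print(mask_card("6222 0212 3456 7890"))
print(mask_name("Alice"))
print(pseudonymize("C001"))
```

Masking suits data shown to people, while the keyed hash suits analysis: the same customer always maps to the same token, so data sets can still be linked without exposing the original identifier.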

3. Cases of financial big data platform

Having covered so much theory above, Xiaoyi would like to share a financial big data platform actually built by Yixin Huachen, to make these ideas easier to understand.

Agricultural Development Bank of China: Data Analysis Application Architecture Design

Application products: data collection and summary platform, Yixin ABI, metadata management platform

1. Project background

According to the Agricultural Development Bank of China's "Twelfth Five-Year Plan" for informatization, during the plan period the bank was to sort out its business operation data and carry out data quality engineering to ensure data consistency and accuracy; build a data warehouse on top of the exchange platform; and, by the end of 2015, achieve bank-wide sharing of operation and management data. On the basis of the data warehouse, it was to promote intelligent applications that are subject-oriented, market-oriented, and decision-oriented and that meet internal management and external policy requirements, striving to form a complete, unified bank-wide decision support platform whose parts each have their own focus, providing basic information and a basis for decisions in operation management and customer service.

In recent years, as internal management and external regulatory requirements have kept rising, demand for management applications centered on data analysis has grown. The several report systems in use at the Agricultural Development Bank of China each provided management analysis for their own business area, but the bank's report applications as a whole had the following main problems:

  • The report systems used different data sources and lacked unified data standards and specifications, making comprehensive cross-system business analysis difficult;
  • Statistical indicators with the same definition had to be obtained repeatedly in different systems, and the results could be inconsistent;
  • As internal management and external regulatory requirements kept rising, demand grew for data analysis-oriented management applications and for data collection and supplementary entry applications;
  • The data acquisition and submission processes of the various report systems were basically the same, yet each report system had to be developed separately;
  • In collection-type reports, business personnel faced a heavy workload of supplementary data entry, data quality could not be guaranteed, the submission process was hard to control, and the collected data could not be analyzed effectively. In addition, because regulators issued new regulatory reports or changed the statistical definitions of existing ones, report formats had to be adjusted frequently.

To integrate different business reports into a unified system framework, and to provide a quick way to customize and implement simple business reports in the future, the Agricultural Development Bank of China built a general comprehensive report platform and rolled it out to its provincial branches during 2009-2011, achieving an initial integration and sharing of operation and management data.

By the beginning of 2012, the comprehensive report platform had been in use for nearly three years. It had met its initial construction goals, but with the rapid growth of report applications, users, and data volume, it also exposed some problems, mainly:

  • ETL performance was not ideal, and data extraction, transformation, and loading took too long;
  • Business departments found it difficult to build reports on the platform, mainly because the data model was hard to understand;
  • Data volume was growing too fast, and even before the first optimization the data had shown explosive growth;
  • Data in the integrated business system and the credit management system was still out of sync;
  • Data timeliness could not be guaranteed, so some business departments with strict timeliness requirements could not rely on the report platform with confidence.

2. Project construction overview

To solve the above problems, to ensure that the system is sound, advanced, efficient, and easy to use, and to consolidate the technical framework of the Agricultural Development Bank's data analysis applications, a larger-scale upgrade project for the comprehensive report platform was launched within the bank, running from mid-2012 to early 2014.

After five years and two phases of project construction, the data applications of the Agricultural Development Bank of China have formed a relatively complete technical system covering the data warehouse, analysis applications, data management, data governance, peripheral data services, and other areas.

Project Construction Overview:

(1) Following the data modeling approach of the financial industry, a data warehouse has been built that covers the business analysis data of the four main business systems, namely the core accounting system (CBS), the credit system (CM2006), the foreign exchange system (EE), and the bond system (BOND), and that fully records their historical changes. It comprises an ODS layer, an integration layer, a summary layer, and an application mart layer;

(2) The comprehensive report platform takes the data warehouse as its main data source, so that all report applications share data of the same origin and structure with unified definitions. The bank's main indicator data is gradually being unified on one platform and shared, which solves the problem that report systems used different data sources, lacked unified data standards and specifications, and could not support comprehensive cross-system business analysis;

(3) A comprehensive report platform for analysis and application has been established with two modules, a data collection module and a display analysis module. The data collection module is implemented with i@Report, and the display analysis module with BI@Report. Both products can be quickly customized to meet the needs of various business reports, which reduces the cost and difficulty of report development, shortens the development cycle, standardizes how reports are used, reduces the complexity of management and maintenance, and flexibly meets the growing variety of report requirements;

(4) ETL performance has been greatly improved by introducing a scheduling platform, optimizing each ETL job, and managing the life cycle of the rapidly growing business data in the data warehouse, which solves the problem of long extraction, transformation, and loading times;

(5) Data marts have been built for various business applications. The data in the marts is mainly summary-level, business-oriented data organized into subject models that business personnel can easily understand and use, so that they can customize and produce reports on the comprehensive report platform and display and analyze them;

(6) An "Accounting Index Library" has been designed specifically for accounting statements in the comprehensive report platform. It supports the design of accounting-related reports by defining the subjects included in each accounting indicator, the attributes and conditions of those subjects, and the activation and deactivation times of the indicator;

(7) Yixin BI connects seamlessly with i@Report. i@Report can pull data from Yixin BI to initialize the data to be collected, and the data collected by i@Report can be displayed and analyzed in Yixin BI. The former greatly reduces the supplementary entry workload for business personnel; the latter allows rich statistical analysis in Yixin BI without any ETL processing of the data reported through i@Report;

(8) i@Report provides a complete solution covering report design, report release, data entry, data review, summary reporting, and the approval process. The whole workflow from report definition to data application is completed on this platform, without manual step-by-step handoffs. This removes many intermediate links and helps the Agricultural Development Bank improve the efficiency of data collection and shorten the collection cycle.
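The idea behind the "Accounting Index Library" in item (6) can be illustrated with a small Python sketch. It is not the actual design of the platform: the class name, the field names, the currency condition, and the sample balances are all hypothetical, chosen only to show how an indicator defined by subjects, conditions, and an active period might be evaluated.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AccountingIndicator:
    """Hypothetical indicator: subjects + one condition + an active period."""
    name: str
    subject_codes: tuple               # ledger subjects included in the indicator
    currency: str                      # example condition on the subjects
    active_from: date                  # activation date
    active_to: Optional[date] = None   # deactivation date; None means still active

    def is_active(self, on):
        """Return True if the indicator is in effect on the given date."""
        if on < self.active_from:
            return False
        return self.active_to is None or on <= self.active_to

    def value(self, balances, on):
        """Sum matching balances, or return None when the indicator is inactive."""
        if not self.is_active(on):
            return None
        return sum(
            amount
            for (subject, currency), amount in balances.items()
            if subject in self.subject_codes and currency == self.currency
        )


# Made-up balances keyed by (subject code, currency).
balances = {
    ("1001", "CNY"): 500,
    ("1002", "CNY"): 300,
    ("1001", "USD"): 70,
    ("2001", "CNY"): 900,
}
cash = AccountingIndicator(
    "cash_and_deposits", ("1001", "1002"), "CNY", date(2013, 1, 1), date(2014, 12, 31)
)
print(cash.value(balances, date(2014, 6, 30)))
print(cash.value(balances, date(2015, 1, 1)))
```

Keeping the definition as data rather than hard-coding it in each report means a change in an indicator's subjects or validity period only has to be made once.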

3. Project results

As of 2015, the Agricultural Development Bank's comprehensive report platform has completed the following tasks:

Built a data warehouse that includes all business analysis data of the main business systems and fully records their historical changes, comprising an ODS layer, an integration layer, a summary layer, and an application mart layer;

Built a comprehensive platform for analysis and application with two modules, a data collection module and a display analysis module, and on this basis completed a total of 14 sets of report applications for multiple business departments to meet internal management and external regulatory requirements;

Completed the construction and consolidation of the basic technical system for data extraction, conversion, and loading, realized the order-of-magnitude optimization of data processing efficiency, and realized the construction of intelligent scheduling, load balancing, and disaster recovery;

The construction of the data governance system has been improved, and the construction of metadata, data life cycle, data standards, and data quality system has been completed;

Completed the construction of 30 regional data/application centers, realized the external data service construction of data centers, and completed the construction of auxiliary systems such as data dynamic transmission and application version synchronization.

At the report application level, on the basis of the overall architecture, 14 sets of report applications for 9 business departments have been realized, and the number of applications will continue to grow rapidly and the forms will be more abundant.


By building a big data platform, financial enterprises can comprehensively sort out their bank-wide data assets, improve the bank-wide data structure, and form a global data view. Developing and using rich data resources through big data technologies such as batch processing, real-time stream analysis, and ad hoc queries is the mainstream direction of financial innovation. To get the greatest benefit, however, each enterprise needs to build a financial big data platform that suits its own situation.

Origin blog.csdn.net/esensoft123/article/details/131241766