Data governance practice in securities institutions: achieving the "management, governance, and use" of data

In 2016, the China Securities Regulatory Commission (CSRC) proposed in the "Comprehensive Risk Management Regulations for Securities Companies" that securities companies establish and improve data governance and quality control mechanisms.

In 2018, the CSRC issued the "Guidelines for the Operation of Securities Data Governance (Draft for Comment)" and the "Guidelines for the Classification and Grading of Data in the Securities and Futures Industry". In the same year, the Ministry of Industry and Information Technology released the DCMM (Data Management Capability Maturity Assessment Model), which defines eight process areas of organizational data management and divides data management capability maturity into five levels to support assessment.

In 2022, the "14th Five-Year Plan for the Development of Science and Technology in the Securities and Futures Industry" and the "14th Five-Year Plan for the Development of Financial Standardization" were released in succession, consolidating the general foundational standards system of the securities and futures industry and promoting the deep integration of digital transformation and standardization, thereby providing further support for the high-quality development of the industry. Standardization is playing an increasingly important role in leading and driving digital transformation.

However, in the course of promoting data governance, many securities institutions still face problems such as insufficient drive for data governance, a lack of data governance system planning, an imperfect data accountability system, and difficulty in improving data quality. Data governance capabilities urgently need to be improved.

To give full play to the asset value of data, the needs and particularities of regulatory big data governance in the securities and futures industry are sorted out and a big data governance system for the industry is established, covering the construction of an industry data model, a public data platform, a data service system, and an organizational safeguard system.

Securities Institution Data Governance Solution

New Paradigm of Data Governance Based on DataOps System

DataOps starts with environment management: each environment supports task orchestration, monitoring, and automated testing. For every cross-environment release, the version of the released code is recorded for later troubleshooting. Kangaroo Cloud's data middle-platform products connect these links for brokerage users: from the development stage, a task can be released to the test environment with one click; after verification in the test environment, once task instances and data output are confirmed to be correct, it can be released to the production environment.
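
The release record is what ties these links together. Below is a minimal Python sketch of the idea, assuming a hypothetical promotion helper (the function, fields, and file name are illustrative, not Kangaroo Cloud's actual interface): every cross-environment release appends an audit record of the released code version, which is what later troubleshooting falls back on.

```python
import json
import time
from dataclasses import dataclass, asdict


@dataclass
class ReleaseRecord:
    """One cross-environment release of a data task."""
    task_name: str
    code_version: str      # e.g. commit hash of the task's SQL/code (illustrative)
    source_env: str        # "dev" or "test"
    target_env: str        # "test" or "prod"
    released_at: float


def promote(task_name: str, code_version: str, source_env: str, target_env: str,
            log_path: str = "release_log.jsonl") -> ReleaseRecord:
    """Promote a task to the next environment and append an audit record.

    The record tells you exactly which version of the task was running in
    which environment and when, which is what later troubleshooting relies on.
    """
    record = ReleaseRecord(task_name, code_version, source_env, target_env, time.time())
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")
    return record


# One-click flow: dev -> test, verify instances and output, then test -> prod.
promote("margin_balance_rt", "a1b2c3d", "dev", "test")
promote("margin_balance_rt", "a1b2c3d", "test", "prod")
```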

As shown in the figure below, data on topics such as information, transactions, and risk control flows in from the source systems on the left. In the middle are various data processing tools, such as data warehouses, data marts, and AI analysis, where the data goes through cleaning, processing, summary statistics, data governance, and other steps; it finally serves demand-side users such as investment research, marketing, and business analysis through BI, customized reports, APIs, and other tools.

Efficient data warehouse construction and management based on the SDOM model

By sorting out the main trading behaviors in the market for securities, funds, futures, bonds, and repurchase agreements, an industry trading model is formed; by working backward from information disclosure items that are about to go live, an industry information disclosure model is formed. Based on the relevant laws and regulations, business rules, systems, and processes of the securities industry, the commonalities of the full business process and data across the market are extracted to form a universal, stable, and extensible data model centered on customers, companies, regulators, products, transactions, and so on.

OLAP-based information data verification

As one of the main data sources of securities companies, information data is widely used, covering fields such as investment trading, asset management, brokerage and wealth management, and asset custody. It is typically integrated into asset management, investment research, investment trading, asset custody, margin financing and securities lending, and other systems. At the same time, business needs often require purchasing a variety of heterogeneous data source interfaces from the market, such as Wind, Juyuan, Cailian, Tonglian, and Hong Kong and Macau information feeds; some securities companies purchase as many as 20 or 30 types of information data.

This information data faces the following problems: first, data quality is uneven, problems are hard to discover in time, and complaints from business parties are frequent; second, data quality management is costly and rule development is difficult; third, the troubleshooting chain is long, data problems are hard to locate quickly, and a global statistical view is lacking; fourth, experience with quality problems is not accumulated, so similar problems recur frequently.

For all kinds of information data, cross-source comparison is carried out and problems are addressed at the source, treating both symptoms and root causes. Through pre-event rule configuration, in-process rule verification, and post-event analysis and reporting, data is evaluated along multiple dimensions such as completeness, accuracy, standardization, uniqueness, and consistency to ensure the data quality of securities companies.
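
As a concrete illustration, here is a minimal Python sketch of the pre/in/post pattern described above, using hypothetical vendor records and rule fields (this is not the product's actual rule engine): a pre-configured rule is verified in process across two sources, and the results are rolled up into a post-event report.

```python
from dataclasses import dataclass


@dataclass
class QualityRule:
    """Pre-event configuration: which field to check and the allowed tolerance."""
    name: str
    field: str
    tolerance: float = 0.0  # allowed absolute difference for numeric fields


def cross_source_check(rule: QualityRule, source_a: dict, source_b: dict) -> list[str]:
    """In-process verification: compare one security's record from two vendors.

    Returns a list of issue descriptions; an empty list means the rule passed.
    """
    issues = []
    a, b = source_a.get(rule.field), source_b.get(rule.field)
    if a is None or b is None:                      # completeness
        issues.append(f"{rule.name}: field '{rule.field}' missing in one source")
    elif isinstance(a, (int, float)) and isinstance(b, (int, float)):
        if abs(a - b) > rule.tolerance:             # consistency / accuracy
            issues.append(f"{rule.name}: {rule.field} differs ({a} vs {b})")
    elif a != b:                                    # consistency for text fields
        issues.append(f"{rule.name}: {rule.field} differs ({a!r} vs {b!r})")
    return issues


# Post-event report: aggregate issues per rule for the day's batch (sample data).
rules = [QualityRule("close_price_consistency", "close_price", tolerance=0.01)]
vendor_a = {"code": "600000.SH", "close_price": 7.52}
vendor_b = {"code": "600000.SH", "close_price": 7.58}
report = {r.name: cross_source_check(r, vendor_a, vendor_b) for r in rules}
print(report)
```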

Label data governance based on data model

The rapid development of financial technology has brought the securities industry's channels ever closer to social media and e-commerce, and brokers use data strategies to break through data boundaries and build a more comprehensive marketing panorama. Facing long development cycles for customer campaigns, imprecise operations, a lack of timely tracking of marketing effects, and slow operational feedback, business staff can use labels transparently through Kangaroo Cloud's customer data insight platform, turning the black box of data into the white box of business language to support business decisions and drive business growth. Usage scenarios include marketing operations for the internet finance department, abnormal transaction monitoring, and user life cycle management.

Creating a data service market based on the OneService concept

The data platform provides investment research, information, and investment advisory data to data consumers in a service-oriented, interface-based manner, shielding the underlying data storage and computation details and simplifying data use. Data services can be generated and registered visually to quickly build data sharing services; services are standardized and controlled through multiple means, covering life cycle management from data interface creation, release, application/approval, to invocation. This forms a data market and data service management platform and improves the efficiency of data development and sharing.
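
From the consumer's point of view, using such a published service might look like the following sketch (the endpoint, parameters, and token handling are illustrative assumptions, not a documented API): the caller passes business parameters and receives JSON, without touching the underlying tables.

```python
import requests

# Hypothetical endpoint and parameters for a published investment-research data
# service; real URLs, tokens, and field names come from the data service market.
BASE_URL = "https://data-service.example.internal/api/v1"

resp = requests.get(
    f"{BASE_URL}/research/stock-rating",
    params={"stock_code": "600000.SH", "trade_date": "2023-06-30"},
    headers={"Authorization": "Bearer <token-issued-after-approval>"},
    timeout=5,
)
resp.raise_for_status()
# The caller works only with the returned JSON; table structure, sharding, and
# storage engine behind the service stay invisible.
print(resp.json())
```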

Data Governance Deliverables for Securities Institutions

Data platform construction

The data platform includes a real-time data development platform and a data application platform to realize the "management, governance, and use" of data.

The first is to build a big data real-time development platform covering the whole chain from real-time data collection to real-time data development, and providing operation and maintenance monitoring curves and log functions. The specific functions are as follows:

1) Real-time collection: supports log-based real-time collection and interval-polling-based real-time collection, capturing and replaying database changes so that the database's static insert, update, and delete actions are converted into dynamic data changes in message middleware, or written directly to Hive for storage.

2) Task management: the platform supports unified management of real-time synchronization tasks, FlinkSQL and Flink API job development, environment parameter configuration, historical version management, and so on.

3) Resource management: the platform supports unified management of the resources used in real-time development, allows users to upload local JAR resources and register custom functions, and supports multi-version management of resources.

4) Function management: the platform integrates functions commonly used during development and supports creating UDF, UDAF, and UDTF functions from local resources. After creation, users can invoke a function by its name; to update a function, they modify the corresponding resource file and resubmit the task, without changing any code. A minimal sketch of registering and using such a function follows this list.
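
The sketch below illustrates the function-management idea with PyFlink, under the assumption that a Python scalar UDF is acceptable as an example (the platform described above also supports JAR-based UDF/UDAF/UDTF). The function is registered under a name and invoked by that name in SQL, so replacing the underlying implementation later does not require changing the SQL.

```python
from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
from pyflink.table.udf import udf

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())


@udf(result_type=DataTypes.STRING())
def mask_account(account_no: str) -> str:
    """Mask the middle digits of an account number for desensitised output."""
    if account_no is None or len(account_no) < 8:
        return account_no
    return account_no[:4] + "****" + account_no[-4:]


# Register under a name; SQL jobs refer to the name, not the implementation.
t_env.create_temporary_function("mask_account", mask_account)

# Invoke the function by name, as a platform-registered UDF would be used.
t_env.execute_sql(
    "SELECT mask_account(account_no) AS masked "
    "FROM (VALUES ('6222020200112233')) AS t(account_no)"
).print()
```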

The second is to build a data application platform, mainly a data service platform, which is the top-level component of the data platform and the interface layer for providing data capabilities externally. The data service platform empowers front-end applications and is the outlet of data capabilities. Through its construction, data capabilities are abstracted and encapsulated, and the system achieves the following goals:

1) Encapsulate data and provide RESTful interfaces externally. Applications obtain data content by calling the RESTful services and do not need to know the details of table structures or database and table sharding.

2) Horizontal scalability to support high concurrency and data growth: growing data volumes are handled by adding storage nodes, and high concurrency by adding service processing nodes.

3) Configure and create data services. Based on the underlying database, developers can create and publish a new data service on the management interface by configuring the SQL query statement, parameters, database connection, permissions, and so on, as sketched below.
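
Here is a minimal sketch of this configuration-driven pattern, using Flask and SQLite purely as stand-ins for the real service gateway and warehouse (paths, table names, and fields are illustrative assumptions): one SQL configuration entry is turned into a parameterised RESTful endpoint.

```python
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)

# A service definition as it might be captured on a management console:
# a parameterised SQL statement plus the parameters callers may pass.
SERVICE_CONF = {
    "path": "/api/v1/position/summary",
    "sql": "SELECT client_id, SUM(market_value) AS total_mv "
           "FROM position WHERE trade_date = :trade_date GROUP BY client_id",
    "params": ["trade_date"],
}


def register_service(conf):
    """Turn one SQL configuration entry into a RESTful endpoint."""
    def handler():
        bind = {p: request.args.get(p) for p in conf["params"]}
        if any(v is None for v in bind.values()):
            return jsonify({"error": f"missing parameters: {conf['params']}"}), 400
        # Placeholder data source; in practice this would be the warehouse/MPP engine.
        with sqlite3.connect("demo.db") as conn:
            conn.row_factory = sqlite3.Row
            rows = conn.execute(conf["sql"], bind).fetchall()
        return jsonify([dict(r) for r in rows])
    app.add_url_rule(conf["path"], endpoint=conf["path"], view_func=handler)


register_service(SERVICE_CONF)

if __name__ == "__main__":
    app.run(port=8080)
```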

The construction of the data service platform will bring the following benefits:

1) Reduce data duplication and lower costs. Application systems do not need to copy data content; they obtain data through service calls, which reduces data storage costs, especially in scenarios involving historical data queries, and also reduces management costs such as backup and security.

2) Improve application development efficiency. Applications do not need to consider the database and table sharding design for big data or understand the underlying storage details; they obtain results through service calls. Data inconsistency conflicts are also largely avoided: since the same data no longer needs to be maintained in multiple copies, there is no need to copy and update it when it changes.

Data Application Implementation

Data lineage for the big data platform has been fully sorted out. The data models for contracts, account funds, positions, and special securities scenarios on the big data platform have been rebuilt and put into implementation. Real-time statistical push of the assets and liabilities of margin financing and securities lending customers, real-time statistical push of financial voucher flows, and the transformation of native Flink jobs into SQL data warehouse jobs have been completed.

Data Governance Construction Achievements of Securities Institutions

Converging massive business data to build a financial-grade data platform

The big data engine and stream-batch integrated data collection meet the data aggregation needs of the securities company's business systems: online and offline business data are collected and aggregated, centralized big data technology and storage capabilities are provided, and real-time, diversified data collection, storage, and computing are ensured, laying a powerful big data platform foundation that meets current and future needs for data collection, storage, and technology.

Unified data development, lowering the threshold for big data development

The real-time development platform enables the technology departments of securities companies to carry out centralized development and data processing for internal and external tenants on a single platform. It provides a unified, integrated data development platform that meets the requirements of big data, SQL, and graphical data development and processing, reducing the complexity of development tools and the cost of data development, and enabling rapid construction of data warehouses.

Satisfying regulatory submission requirements and realizing integrated data services

The data service platform provides unified internal data services and data exchange, meeting the regulatory reporting and data usage requirements of financial third-party institutions.

Data Analysis Scenario Services

Based on the application requirements of a securities company, Kangaroo Cloud designed the following three data analysis scenario services for it:

The first is real-time calculation of credit account assets and liabilities for customers' margin financing and securities lending business:

1) Obtain counter market data in real time and aggregate quotes at minute granularity.

2) Obtain customers' stock position data in real time, manage and dynamically update position data according to status, and calculate each customer's total position assets in real time by joining the customer's position stock codes with the minute-level quotes (see the sketch after this list).

3) Obtain the real-time contract flow data of margin financing and securities lending customers and calculate financing liabilities and securities lending liabilities by contract type. Based on the stock codes in a customer's securities lending contracts and the minute-level quotes, dynamically calculate and update the customer's securities lending liabilities in real time.

4) Obtain customer fund transfer data in real time and update the customer's total fund assets and available funds.

5) Obtain the customer's proceeds from securities lending sales and their usage in real time, and dynamically update the customer's balance sheet; also calculate position concentration, the market value of group positions, the market value of high-risk securities positions, and the market value of securities whose prices have fallen below the lower limit.

6) Correlate the data on positions, contracts, fund transfers, and securities lending transactions through multi-stream joins and finally output the customer's balance sheet.

7) Obtain customer credit application, credit granting, and credit approval data in real time, monitor changes in the customer's credit application limit, credit status, credit approval status, approved amount, and so on, and push them to downstream systems.
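
The core of steps 1), 2), and 6) can be sketched in Flink SQL via PyFlink as follows. The datagen connector stands in for the real counter-market and position feeds (Kafka/CDC in production), and all table and column names are illustrative assumptions rather than the production schema.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Stand-in source for counter market ticks.
t_env.execute_sql("""
    CREATE TABLE market_tick (
        stock_code STRING,
        price      DOUBLE,
        ts         TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH ('connector' = 'datagen', 'rows-per-second' = '10')
""")

# Stand-in source for customer position updates.
t_env.execute_sql("""
    CREATE TABLE client_position (
        client_id  STRING,
        stock_code STRING,
        volume     BIGINT
    ) WITH ('connector' = 'datagen', 'rows-per-second' = '1')
""")

# 1-minute bars per stock (MAX used here as a simple stand-in for the bar's close).
t_env.execute_sql("""
    CREATE TEMPORARY VIEW minute_price AS
    SELECT stock_code,
           TUMBLE_END(ts, INTERVAL '1' MINUTE) AS minute_end,
           MAX(price) AS close_price
    FROM market_tick
    GROUP BY stock_code, TUMBLE(ts, INTERVAL '1' MINUTE)
""")

# Correlate positions with minute bars and sum per client: a continuously
# updated total of position assets per customer.
result = t_env.execute_sql("""
    SELECT p.client_id,
           SUM(p.volume * m.close_price) AS total_position_assets
    FROM client_position AS p
    JOIN minute_price AS m ON p.stock_code = m.stock_code
    GROUP BY p.client_id
""")
result.print()  # streams the changelog of per-client totals
```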

The second is obtaining financial voucher flows in real time: join them with auxiliary accounting dimension tables (which may also change during the day), compute the intraday change data of financial indicators by department and subject dimensions according to the indicator calculation rules, and push the results downstream; the core of the dimension association is sketched below.
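
The dimension association at the heart of this job is typically an event-time temporal join. The snippet below is a sketch of what the core statements might look like in Flink SQL, with voucher_flow and accounting_dim as illustrative table names (connector options are abbreviated); accounting_dim is declared as a versioned table (primary key plus watermark) so that intraday changes to the dimension are respected.

```python
# Illustrative Flink SQL for the voucher-flow job; table and column names are assumed.
# The dimension table is versioned, so each voucher record is enriched with the
# dimension row that was valid at the voucher's own event time, even if the
# auxiliary accounting dimensions change during the day.
ACCOUNTING_DIM_DDL = """
    CREATE TABLE accounting_dim (
        subject_id   STRING,
        department   STRING,
        subject_code STRING,
        update_time  TIMESTAMP(3),
        PRIMARY KEY (subject_id) NOT ENFORCED,
        WATERMARK FOR update_time AS update_time - INTERVAL '5' SECOND
    ) WITH ('connector' = 'kafka', 'format' = 'debezium-json')  -- topic/broker options omitted
"""

ENRICH_SQL = """
    SELECT v.voucher_id,
           v.amount,
           d.department,
           d.subject_code,
           v.ts
    FROM voucher_flow AS v
    LEFT JOIN accounting_dim FOR SYSTEM_TIME AS OF v.ts AS d
        ON v.subject_id = d.subject_id
"""
```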

The third is that native Flink jobs lack monitoring; to strengthen operation and maintenance monitoring, the following data is planned to be migrated to the real-time platform:

1) The stock fund trading volume and wealth management trading volume of the day

2) Number of accounts opened on the day and account opening flow

3) Reminder of early redemption of convertible bonds

4) China Securities Easy Sign private placement return visits: generating return-visit task reminders and return-visit success reminders

"Dutstack Product White Paper": https://www.dtstack.com/resources/1004?src=szsm

"Data Governance Industry Practice White Paper" download address: https://www.dtstack.com/resources/1001?src=szsm If you want to know or consult more about Kangaroo Cloud big data products, industry solutions, and customer cases, visit Kangaroo Cloud official website: https://www.dtstack.com/?src=szkyzg

Students interested in big data open source projects are also welcome to join the "Kangaroo Cloud Open Source Framework DingTalk Technology Group" to exchange the latest open source technology information. Group number: 30537511; project address: https://github.com/DTStack
