Data migration problems caused by replacing old systems with new ones

Author: Xia Kai, Xi'an project team. Written: 2004.11.08

In the course of informatization, as technology develops, existing information systems are continually replaced by new, more capable ones: two-tier architectures give way to three-tier architectures, and Client/Server gives way to Browser/Server. Switching from an old system to a new one inevitably raises the problem of data migration.

The concept of data migration

From the time it goes into service until it is replaced, the old system inevitably accumulates a large amount of valuable historical data. Much of this data is needed for the new system to start up smoothly, and it is also an important basis for decision analysis. Data migration is the process of cleaning and transforming this historical data and loading it into the new system. It applies when one old system is switched to one new system, or when several old systems are switched to the same new system, and the historical data has to be carried over. It is generally required when systems are switched in fields such as taxation, telecommunications, industry and commerce, banking, insurance, and sales. The many-to-one case typically arises because informatization proceeded in stages: several different systems run side by side but cannot share information effectively, so a single new system has to take over from all of them.

Data migration is of great significance to system switching and to the operation of the new system. The quality of the migration is both a prerequisite for a successful launch and a guarantee of stable operation afterwards. If the migration fails, the new system cannot operate normally. If the quality of the migration is poor and garbage data is not fully screened out, it creates serious hidden risks: once the new system accesses the garbage data, that data may give rise to new erroneous data and, in severe cases, cause system failures.

Conversely, a successful data migration helps ensure the smooth operation of the new system and preserves valuable historical data. For any company or department, historical data is a precious resource: for example, the tax department's tax records, a company's customer information, or a bank's deposit records.

Features of data migration

Data migration during system switching is different from data extraction from a production OLTP (On-line Transaction Processing) system into a data warehouse (DW). The latter mainly synchronizes to the data warehouse the changes that have occurred in the production system since the last extraction. This synchronization is performed once per extraction cycle, usually daily. Data migration, by contrast, converts the required historical data into the new production system in one pass or in a few passes. Its main characteristic is that large volumes of data have to be extracted, cleaned, and loaded in a short time.

Deciding what to migrate is the foundation of the whole migration and needs to be considered as a whole from the perspective of information system planning. The content can be divided along two dimensions: horizontally by time and vertically by functional module.

Horizontal division

Horizontal division is based on when the data was generated; the main question is how to handle older historical data. As information technology develops and reliance on computers grows, a new system often has to store more information per day than the old one. To avoid the performance bottleneck caused by rapid data growth, the new system generally retains data only for a fixed period, such as one year. Data beyond the retention period, that is, data more than one year old, is transferred to a data warehouse for decision analysis, which requires a decision support system (DSS) to be established. For the new system in this project, the migration mainly covers data from the last year; historical data more than one year old has to be handled separately.
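The time-based split can be illustrated with a minimal Python sketch. The record layout, the `created` field, and the function name `split_by_retention` are assumptions made for illustration only; the one-year retention period is the example given in the text.

```python
from datetime import date, timedelta

def split_by_retention(records, switch_date, retention_days=365):
    """Split records into (for the new system, for the data warehouse)."""
    cutoff = switch_date - timedelta(days=retention_days)
    recent = [r for r in records if r["created"] >= cutoff]
    older = [r for r in records if r["created"] < cutoff]
    return recent, older

# Hypothetical sample records from the old system.
records = [
    {"id": 1, "created": date(2003, 5, 1)},
    {"id": 2, "created": date(2004, 3, 15)},
    {"id": 3, "created": date(2004, 10, 30)},
]
recent, older = split_by_retention(records, switch_date=date(2004, 11, 8))
```

Here `recent` holds the rows that go to the new production system and `older` holds the rows that would be handled separately, for example by loading them into the data warehouse.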

Vertical division

Vertical division is based on the functional modules that process the data. The question is how to handle data belonging to functional modules that the new system does not include. Such data generally does not need to be migrated, because no mapping to the new system can be established. However, where the modules of the old system are tightly coupled, care must be taken to preserve data integrity when dividing vertically. In this project, every functional module of the old system could be mapped to the new system, so the migration was essentially organized by module.

Methods of data migration

Data migration can be carried out in four main ways: migration with ETL tools before system switching, background database programs, manual entry before system switching, and generation by the new system after system switching.

Migration through tools before system switching

Before system switching, an ETL (Extract, Transform, Load) tool is used to extract, transform, and load historical data from the old system into the new system. The ETL tool can be a mature commercial product (such as Informatica or Data Integrator Designer from Business Objects) or a self-developed program. This is the primary and fastest method of data migration. It presupposes that the historical data is available and can be mapped into the new system.

Write a background database program

Before system switching, create table structures in the new system's database that correspond to the old system's data to be migrated, import the old data into those tables, and then write background programs on the new system side to move the historical data into the new system's own tables. In this project, we set up a migration database identical in structure to the production system and imported the old system's data into it. All coding and debugging were also done in this environment. Problems that could arise when importing into the new system, such as dictionary table conversions and foreign key constraints, were therefore resolved in the migration database, so that the resulting data could be imported into the new system smoothly.
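As a rough illustration of this approach, the sketch below uses Python's built-in sqlite3 module with an in-memory database standing in for the migration database. All table names, column names, and code values are hypothetical; the project's actual schema and database product are not given in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Raw data imported from the old system, unchanged.
    CREATE TABLE old_taxpayer (id INTEGER, name TEXT, industry_code TEXT);
    -- Dictionary mapping: old code -> new code.
    CREATE TABLE code_map (old_code TEXT PRIMARY KEY, new_code TEXT NOT NULL);
    -- Target tables with the same structure as the new system.
    CREATE TABLE industry (code TEXT PRIMARY KEY, label TEXT);
    CREATE TABLE taxpayer (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        industry_code TEXT NOT NULL REFERENCES industry(code)
    );
""")
conn.executemany("INSERT INTO old_taxpayer VALUES (?, ?, ?)",
                 [(1, "Acme", "01"), (2, "Bolt", "02"), (3, "Crux", "99")])
conn.executemany("INSERT INTO code_map VALUES (?, ?)", [("01", "A"), ("02", "B")])
conn.executemany("INSERT INTO industry VALUES (?, ?)",
                 [("A", "Retail"), ("B", "Mining")])

# Rows whose old code has no mapping cannot satisfy the foreign key;
# list them in the migration database before anything reaches production.
unmapped = conn.execute("""
    SELECT o.id FROM old_taxpayer o
    LEFT JOIN code_map m ON m.old_code = o.industry_code
    WHERE m.new_code IS NULL
""").fetchall()

# Convert the dictionary codes and load only the rows that can be mapped.
conn.execute("""
    INSERT INTO taxpayer (id, name, industry_code)
    SELECT o.id, o.name, m.new_code
    FROM old_taxpayer o JOIN code_map m ON m.old_code = o.industry_code
""")
conn.commit()
migrated = conn.execute(
    "SELECT id, industry_code FROM taxpayer ORDER BY id").fetchall()
```

The point of the sketch is that code conversion and foreign key problems surface in the staging copy (`unmapped` lists row 3), while the target tables only ever receive rows that already satisfy the new system's constraints.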

Manual input before system switching

Before the system is switched, the organization's staff manually enter the required data into the new system. This method consumes a great deal of manpower and resources, and its error rate is relatively high. It is mainly used for data that cannot be converted into the new system, and for data that the new system needs at startup but the old system cannot provide. It is a useful supplement to the first method.

Generated by the new system after the system switch

After the system is switched over, the required data is generated by the relevant functions of the new system or by a specially developed supporting program, usually from data that has already been migrated into the new system. This presupposes that the required data can be derived from other data.

Data Migration Strategy

The data migration strategy determines how the data is migrated. Combined with the different migration methods, the main strategies to choose from are one-time migration, staged migration, record first and then migrate, and migrate first and then supplement.

One-time migration

One-time migration moves all required historical data into the new system in a single pass, using data migration tools or migration programs. Its advantage is that the implementation is short and, compared with staged migration, involves fewer problems and lower risk. Its disadvantage is the high work intensity: the people carrying out the migration have to monitor the process throughout, so a long-running migration is very tiring for them. One-time migration presupposes that the old and new databases differ little and that the full data volume can be migrated within the allowed downtime.

Staged migration

Staged migration moves the required historical data into the new system in several stages, using a data migration tool or migration program. It splits the work into separate tasks and so resolves the conflict between a large data volume and a short downtime. However, multiple switchovers mean multiple data merges, which increases the probability of errors. In addition, to keep the overall data consistent, data migrated in earlier stages has to be kept synchronized during later stages, which makes the migration more complex. Staged migration generally migrates static data and infrequently changing data, such as codes and user information, before system switching, and then migrates dynamic data, such as transaction information, during system switching. Changes that occur after the static data has been migrated can either be synchronized to the new system every day or be synchronized incrementally at switch-over. This project adopted the staged strategy, migrating sub-bureau by sub-bureau; for some information, regular updates were also used to keep the data correct.
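The incremental synchronization of static data can be sketched as follows. This is only an illustration under assumed conditions: each row carries a hypothetical `version` field that the old system increments on every change, and the tables are modelled as plain dictionaries. Deleted rows are not handled here.

```python
def migrate_static(source, target):
    """Stage 1: copy all static rows and remember the versions that were copied."""
    target.update({key: dict(row) for key, row in source.items()})
    return {key: row["version"] for key, row in source.items()}

def sync_changes(source, target, snapshot):
    """Stage 2: copy only rows added or modified since the snapshot was taken."""
    changed = [key for key, row in source.items()
               if snapshot.get(key) != row["version"]]
    for key in changed:
        target[key] = dict(source[key])
    return sorted(changed)

# Hypothetical user table in the old system.
old_users = {
    "u1": {"name": "Li", "version": 1},
    "u2": {"name": "Wang", "version": 1},
}
new_users = {}
snapshot = migrate_static(old_users, new_users)

# The old system keeps running: one row is changed, one row is added.
old_users["u2"] = {"name": "Wang Wei", "version": 2}
old_users["u3"] = {"name": "Zhao", "version": 1}

synced = sync_changes(old_users, new_users, snapshot)
```

After the second stage, `synced` names only the rows that changed and `new_users` matches `old_users` again. The same function could be run daily or once at switch-over, which are the two options described above.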

Record first and then migrate

Before the system is switched, some data is entered manually into the new system, and the remaining historical data is migrated at switch-over. This strategy is mainly for cases where the data structures of the old and new systems differ in specific ways, so that the initial data the new system needs at startup cannot be obtained from the existing historical data. Such initial data can be entered manually before the switch. In this project, for example, verification information, collection and management appraisal information, tax reduction and exemption approval results, and invoice purchase applications and approvals were handled this way.

Migrate first and then supplement

Before the system is switched, the raw data is migrated to the new system with a data migration tool or migration program. The required result data is then generated from the migrated raw data, either by the relevant functions of the new system or by a supporting program written for this purpose. Migrating first and supplementing afterwards reduces the amount of data that has to be migrated.

Technical preparation for data migration

Data conversion and migration usually involves several tasks: compiling the data dictionary of the old system; analyzing the data quality of the old system; compiling the data dictionary of the new system; analyzing the data differences between the old and new systems; establishing the mapping between old and new system data; developing and deploying the conversion and migration programs; drawing up contingency plans for the conversion and migration; carrying out the conversion and migration of old system data into the new system; and checking the completeness and correctness of the data afterwards.

The data conversion and migration program, that is, the ETL process, can be roughly divided into three steps: extraction, transformation, and loading. Extraction and transformation follow the mapping between the old and new databases, and data difference analysis, including difference analysis of code data, is the prerequisite for establishing that mapping. The transformation step generally also includes data cleaning, which deals with source data that is ambiguous, duplicated, incomplete, or in violation of business or logic rules. Data quality analysis has to be carried out beforehand to identify the problematic data; without it, cleaning is impossible. Loading writes the extracted and transformed result data into the target database with loading tools or self-written SQL programs.
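The three steps can be outlined in a few lines of Python. This is a simplified sketch rather than the project's actual program: the field names, the `field_map` mapping, and the cleaning rules (drop incomplete and duplicate rows, trim padding) are assumptions chosen to keep the example small.

```python
def extract(source_rows):
    """Extract: read rows from the old system (here, an in-memory list)."""
    return list(source_rows)

def transform(rows, field_map):
    """Transform: clean the rows, then rename fields according to the mapping."""
    seen_ids, cleaned = set(), []
    for row in rows:
        name = (row.get("name") or "").strip()
        if row.get("id") is None or not name:
            continue  # incomplete record
        if row["id"] in seen_ids:
            continue  # duplicate record
        seen_ids.add(row["id"])
        tidy = {**row, "name": name}
        cleaned.append({new: tidy[old] for old, new in field_map.items()})
    return cleaned

def load(rows, target):
    """Load: append the converted rows to the target table; return the row count."""
    target.extend(rows)
    return len(rows)

# Hypothetical old-system rows and old-to-new field mapping.
old_rows = [
    {"id": 1, "name": " Acme "},
    {"id": 1, "name": "Acme"},
    {"id": 2, "name": ""},
    {"id": None, "name": "Ghost"},
    {"id": 3, "name": "Crux"},
]
field_map = {"id": "taxpayer_id", "name": "taxpayer_name"}

new_table = []
loaded = load(transform(extract(old_rows), field_map), new_table)
```

In a real migration the rejected rows would normally be written to a review list instead of being silently skipped, so that the quality analysis results can be followed up.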

Implementation of data migration

The realization of data migration can be divided into three stages: preparation before data migration, implementation of data migration and verification after data migration.

Because of the nature of data migration, a great deal of work has to be done in the preparation stage, and thorough preparation is the main foundation for a successful migration. Specifically, it is necessary to: describe the data sources to be migrated in detail, including the storage method, data volume, and time span; build the data dictionaries of the old and new databases (what we usually call the reference table); analyze the quality of the old system's historical data; analyze the differences in data structure and in code data between the old and new systems; establish the mapping between old and new database tables and decide how to handle fields that cannot be mapped; develop and deploy the ETL tools; write the test plan and verification programs for data conversion; and define contingency measures for the conversion. For some dictionary data that was too old to be matched, we created special codes that are disabled in the new system, so that the migration remains complete. A better way of correcting this data can be agreed with the customer later.
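One possible reading of the special-code approach for unmatched dictionary values is sketched below. The code values, the `SPECIAL_CODE` name, and the `enabled` flag are hypothetical; the article does not describe how the new system represents disabled codes.

```python
SPECIAL_CODE = "LEGACY"  # hypothetical placeholder code, not from the article

# Hypothetical dictionary table of the new system.
new_dictionary = {
    "A": {"label": "Retail", "enabled": True},
    "B": {"label": "Mining", "enabled": True},
    SPECIAL_CODE: {"label": "Unmapped legacy value", "enabled": False},
}
code_map = {"01": "A", "02": "B"}  # old code -> new code

def convert_codes(old_codes, code_map, special=SPECIAL_CODE):
    """Map old codes to new ones; unmatched codes fall back to the special code."""
    converted = [code_map.get(code, special) for code in old_codes]
    unmapped = sorted({code for code in old_codes if code not in code_map})
    return converted, unmapped

converted, unmapped = convert_codes(["01", "77", "02", "77"], code_map)

# Codes a user could still pick for new business in the new system.
selectable = [code for code, entry in new_dictionary.items() if entry["enabled"]]
```

Every migrated row keeps a valid reference, the disabled code cannot be chosen for new business, and `unmapped` gives the list of old values to discuss with the customer.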

Of the three stages, the implementation of data migration is the most important. It requires drawing up detailed implementation steps for the data conversion; preparing the migration environment; preparing the business side by completing unfinished business items or bringing them to a defined stopping point; testing the technologies involved in the migration; and finally carrying out the migration.

The verification after data migration is a check on the migration work, and the result of the data verification is an important basis for judging whether the new system can be officially launched. Data verification can be carried out through quality inspection tools or by writing inspection programs, and the accuracy of data can be checked by trial-running the functional modules of the new system, especially the query and report functions.

Inspection of data

Data length check: Check the valid length of the data. Pay special attention to the conversion of fields from type char to type varchar.

Interval range check: Check whether the data falls within the defined minimum and maximum values. An age of 300 or an entry date of 4000-1-1, for example, would fail this check.

Null value and default value check: Check whether the null values and default values defined by the old and new systems are the same. Different database systems may define null differently, so this needs special attention.

Integrity check: Check the referential integrity of the data, for example whether a referenced code value exists. Note that some systems remove foreign key constraints after a period of use in order to improve efficiency.

Consistency check: Check whether any data is logically inconsistent, especially in systems that commit related operations separately.
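The checks above can be combined into a small per-row inspection routine. The following Python sketch is illustrative only: the field names, the limits (a 20-character name, an age of 0 to 150, a latest valid date of 2004-11-08), and the total-versus-items consistency rule are assumptions, not rules taken from the project.

```python
from datetime import date

def check_row(row, valid_codes, max_name_len=20, latest_date=date(2004, 11, 8)):
    """Return the names of the checks that a single old-system row fails."""
    problems = []
    name = row["name"]
    if name is None:
        problems.append("null")
    elif len(name.rstrip()) > max_name_len:  # ignore char-type trailing padding
        problems.append("length")
    if not (0 <= row["age"] <= 150) or row["entry_date"] > latest_date:
        problems.append("range")
    if row["industry_code"] not in valid_codes:
        problems.append("integrity")
    if row["total"] != sum(row["items"]):
        problems.append("consistency")
    return problems

valid_codes = {"A", "B"}
good = {"name": "Acme      ", "age": 40, "entry_date": date(2001, 5, 1),
        "industry_code": "A", "total": 30, "items": [10, 20]}
bad = {"name": None, "age": 300, "entry_date": date(4000, 1, 1),
       "industry_code": "Z", "total": 30, "items": [10, 10]}

good_problems = check_row(good, valid_codes)
bad_problems = check_row(bad, valid_codes)
```

Stripping trailing spaces before measuring the length reflects the char-to-varchar point above: a padded char value should not be reported as too long for its varchar target.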

Validation after data migration

After the data migration is completed, the migrated data needs to be verified. The verification after data migration is to check the quality of the migration, and the result of the data verification is also an important basis for judging whether the new system can be officially launched. The migrated data can be verified in two ways.

The first way is quality analysis of the migrated data, carried out with data quality inspection tools or with purpose-written inspection programs. Verifying post-migration data differs from analyzing the quality of historical data before migration mainly in the inspection indicators used. Post-migration verification covers five main indicators: an integrity check (whether referenced foreign keys exist); a consistency check (whether data with the same meaning has the same value in different places); a total-to-subtotal balance check (for example, comparing the overall tax debt total with the totals at different granularities, by department and by household); a record count check (whether the number of records in corresponding tables of the old and new databases matches); and a special sample check (whether the same sample records are identical in the old and new databases).
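Three of these indicators (record count, balance by group, and sample comparison) can be sketched in Python as follows. The row layout and the field names `id`, `dept`, and `debt` are hypothetical, and amounts are held as integers to avoid floating-point rounding in the comparison.

```python
from collections import defaultdict

def totals_by(rows, group_field, amount_field):
    """Sum an amount field per group, e.g. tax debt per department."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += row[amount_field]
    return dict(totals)

def verify(old_rows, new_rows, sample_ids):
    """Compare old and new data on record count, group totals, and samples."""
    old_by_id = {r["id"]: r for r in old_rows}
    new_by_id = {r["id"]: r for r in new_rows}
    return {
        "record_count": len(old_rows) == len(new_rows),
        "balance_by_dept": (totals_by(old_rows, "dept", "debt")
                            == totals_by(new_rows, "dept", "debt")),
        "samples": all(i in old_by_id and old_by_id[i] == new_by_id.get(i)
                       for i in sample_ids),
    }

# Hypothetical data: record 3 was corrupted during migration.
old_rows = [
    {"id": 1, "dept": "North", "debt": 100},
    {"id": 2, "dept": "North", "debt": 50},
    {"id": 3, "dept": "South", "debt": 70},
]
new_rows = [
    {"id": 1, "dept": "North", "debt": 100},
    {"id": 2, "dept": "North", "debt": 50},
    {"id": 3, "dept": "South", "debt": 700},
]
report = verify(old_rows, new_rows, sample_ids=[1])
```

In this example the record counts match and the sampled record is identical, yet the per-department totals differ, which shows why the indicators are checked together instead of relying on any single one.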

The second way is to compare the old and new systems directly: query the same indicators through each system's own query tools and compare the results; or re-enter into the new system the business that took place on the old system, check for anomalies, and compare the final results with those of the old system.

In this migration, the data check verified that the record data of the old and new systems were consistent, and that data queried by industry, by taxpayer, by level, by conversion conditions, and so on matched the results of the same queries in the old system.
