[Test] An Excel table diff algorithm tailored for game planners and QA

In domestic game development teams, many planners are used to maintaining their data tables in Excel. For proofreading by planners and for QA acceptance, a tool that diffs Excel files is therefore needed, so that problems in the tables can be found as early as possible. To this end, the author started a small project, the game efficiency toolset gameff-toolset, and its first small script performs the Excel diff.

There are many diff algorithms, but the point that needs the most attention is how to reflect the characteristics of planning tables and of the planners' work. Many projects store their planning tables in SVN, so the SVN commit information already tells us which tables were changed or added. Therefore, we only need to care about how to diff each individual Excel file.

Planning Excel tables have the following characteristics:

  • Each sheet has a header, usually occupying a single row
  • Once set, the header generally does not change
  • Different sheets have substantially different table structures
  • Rows do not necessarily have a primary key; a sheet may even contain duplicate ids
  • In each change, the number of changed rows is small relative to the number of unchanged rows
  • A row may occasionally be moved to another position
  • Planners may add comments in otherwise empty areas of the sheet

Therefore, working at the granularity of a single sheet, the diff module can be designed as follows:

  • Define the header row index, the index of the row where data starts, and the index of the column where data starts.
  • Count the headers that were added and removed. If a header was simply renamed while the data below it is substantially unchanged, the entire column can still be considered changed. As a result, the number of columns can differ between the two versions.
  • For the headers common to both versions, collect the data rows below them and discard invalid rows. A row is considered invalid if its cell in the data starting column is empty.
  • Hash each row to build a mapping between the rows of the Excel file before and after the change, which tells us which rows are unchanged and which have changed.
  • For the unchanged rows, compute the LIS (longest increasing subsequence) of their indices in the changed Excel file. For the specific method, see a Stack Overflow post on the topic. The main idea is to maintain two arrays: one holds, for each length x, the index of the smallest last element of an increasing subsequence of length x; the other holds, for each element, the index of its predecessor in the increasing subsequence ending at that element. Walking the second array backwards then yields one LIS. Unchanged rows that fall outside the LIS are the ones that were merely moved, with their contents unaltered.
  • The "changed" rows fall into three cases: added rows, removed rows, and modified rows. Because the rows involved in each change are generally few, an O(n^2) procedure that compares every pair of rows cell by cell is acceptable. The cell-by-cell comparison gives a similarity between two rows, so we can define a similarity threshold to decide whether two rows are similar. If the similarity between a row after the change and a row before the change exceeds the threshold, the pair corresponds to a "modified row", and we only need to record which cells changed; if a row before the change has no similar row after the change, it is a "removed row"; if a row after the change has no corresponding row before the change, it is an "added row".
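The hashing and LIS steps in the list above can be sketched roughly as follows. This is only a minimal illustration, not the code in excel_differ.py: the function names (row_hash, match_unchanged, lis_positions, split_moved) are hypothetical, rows are assumed to be already loaded as lists of cell values, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
from bisect import bisect_left


def row_hash(row):
    """Hash a row's cells; the separator keeps ('ab', 'c') distinct from ('a', 'bc')."""
    joined = "\x1f".join("" if cell is None else str(cell) for cell in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def match_unchanged(old_rows, new_rows):
    """Pair rows with identical content as (old_index, new_index), ordered by old_index."""
    positions = {}
    for j, row in enumerate(new_rows):
        positions.setdefault(row_hash(row), []).append(j)
    pairs = []
    for i, row in enumerate(old_rows):
        candidates = positions.get(row_hash(row))
        if candidates:
            # Duplicate rows are paired in order of appearance.
            pairs.append((i, candidates.pop(0)))
    return pairs


def lis_positions(seq):
    """Return positions in seq that form one longest strictly increasing subsequence."""
    tail_values = []        # tail_values[k]: smallest last value of a run of length k + 1
    tail_pos = []           # position in seq of that value
    prev = [-1] * len(seq)  # predecessor position of each element in its run
    for pos, value in enumerate(seq):
        k = bisect_left(tail_values, value)
        if k == len(tail_values):
            tail_values.append(value)
            tail_pos.append(pos)
        else:
            tail_values[k] = value
            tail_pos[k] = pos
        prev[pos] = tail_pos[k - 1] if k > 0 else -1
    result = []
    pos = tail_pos[-1] if tail_pos else -1
    while pos != -1:  # walk the predecessor links backwards
        result.append(pos)
        pos = prev[pos]
    return result[::-1]


def split_moved(pairs):
    """Split unchanged pairs into rows that kept their relative order and rows that moved."""
    keep = set(lis_positions([j for _, j in pairs]))
    in_place = [p for k, p in enumerate(pairs) if k in keep]
    moved = [p for k, p in enumerate(pairs) if k not in keep]
    return in_place, moved


old = [["1", "a"], ["2", "b"], ["3", "c"], ["4", "d"]]
new = [["4", "d"], ["1", "a"], ["2", "b"], ["3", "c"]]
in_place, moved = split_moved(match_unchanged(old, new))
```

In this example the last row of the old sheet was moved to the top, so it is the only pair reported in moved; the other three rows lie on the LIS and count as untouched.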

The detailed code for this module is in excel_differ.py. It has not been meticulously tidied and there is probably plenty of room for optimization, but the module is split up clearly enough and its performance is adequate. If you compare the Excel files in two folders, you can export a JSON report of the differences. So for practical use, whether as a simple copy-paste script or deployed on a web server, it is more than sufficient.
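As a rough illustration of the row-by-row similarity step from the design list, and of how its result might be serialized, here is a minimal sketch. It does not reproduce the logic or the report format of excel_differ.py: the names similarity and classify_changed, the 0.5 threshold, and the layout of the report dictionary are all assumptions made for this example. The indices refer to positions within the lists of changed rows, and rows are assumed to have equal width because only the common columns are compared.

```python
import json


def similarity(old_row, new_row):
    """Fraction of cell positions whose values are equal."""
    width = max(len(old_row), len(new_row))
    if width == 0:
        return 1.0
    same = sum(1 for a, b in zip(old_row, new_row) if a == b)
    return same / width


def classify_changed(old_rows, new_rows, threshold=0.5):
    """Pair each changed old row with its most similar unused new row (O(n^2))."""
    report = {"modified": [], "removed": [], "added": []}
    used = set()
    for i, old_row in enumerate(old_rows):
        best_j, best_score = None, threshold
        for j, new_row in enumerate(new_rows):
            if j in used:
                continue
            score = similarity(old_row, new_row)
            if score > best_score:  # must exceed the threshold to count as similar
                best_j, best_score = j, score
        if best_j is None:
            # No sufficiently similar row after the change: the row was removed.
            report["removed"].append({"old_row": i})
            continue
        used.add(best_j)
        # Record only the cells that differ between the paired rows.
        cells = [
            {"col": c, "old": a, "new": b}
            for c, (a, b) in enumerate(zip(old_row, new_rows[best_j]))
            if a != b
        ]
        report["modified"].append({"old_row": i, "new_row": best_j, "cells": cells})
    # New rows that were never paired with an old row were added.
    report["added"] = [{"new_row": j} for j in range(len(new_rows)) if j not in used]
    return report


old_changed = [["1", "sword", "10"], ["2", "bow", "7"]]
new_changed = [["1", "sword", "12"], ["3", "staff", "5"]]
report = classify_changed(old_changed, new_changed)
report_json = json.dumps(report, indent=2)
```

The greedy pairing here takes the best available match for each old row in turn; a stricter global matching is possible, but with only a handful of changed rows per commit the simple version is probably enough.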

Origin blog.csdn.net/u013842501/article/details/104076218