Repairing damaged gzip files: principles and data recovery methods

This continues the earlier chapter on the principles of repairing damaged gzip files. The gzip structure is shown again here:
[Figure: gzip file structure]
The key to repairing a damaged gzip file is finding the start of the next intact compressed block. As the structure in the figure shows, each compressed block begins with a header that holds the final-block flag, the Huffman tree type, and the element counts of the three Huffman trees. If a bad sector falls in the middle of a gzip file, the start of the next intact block lies somewhere after it. Because the compressed stream is not byte-aligned, the search moves forward one bit at a time and attempts decompression at each position; a position from which decompression proceeds normally is likely the correct block start. Judging from gzip's 32 KB compression window, the scan should not need to cover more than about 64 KB. Such a loop runs quickly in memory, but it needs a clear criterion for rejecting wrong positions.
First, the final-block flag should be 0, because the search runs forward from the damage point and the block found there is normally not the last one. The Huffman tree type should generally be dynamic Huffman (0x02). The element count of cl1 should be between 257 and 286 inclusive, and the element count of cl2 should be between 1 and 30 inclusive. The ccl count is stored in a 4-bit field whose raw value is 0 to 15 (meaning 4 to 19 code-length codes), so every value of it is legal and it cannot be used to reject a position.
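The header test described above can be sketched in isolation, outside the gzip source. This is a minimal illustration rather than the author's code: the function names `read_bits`, `plausible_block_header` and `find_candidate` are hypothetical, and the bit layout (1-bit final flag, 2-bit type, 5-bit and 5-bit counts, 4-bit ccl count, all packed least significant bit first) follows the deflate format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read n bits (n <= 16) starting at absolute bit position pos.
   Deflate fills each byte starting from its least significant bit. */
unsigned read_bits(const uint8_t *buf, size_t len, size_t pos, unsigned n)
{
    unsigned v = 0;
    assert(n <= 16 && pos + n <= len * 8);
    for (unsigned i = 0; i < n; i++, pos++)
        v |= (unsigned)((buf[pos >> 3] >> (pos & 7)) & 1u) << i;
    return v;
}

/* Return 1 if the 17 header bits at pos look like the start of a
   non-final dynamic-Huffman block, 0 otherwise. */
int plausible_block_header(const uint8_t *buf, size_t len, size_t pos)
{
    if (pos + 17 > len * 8)
        return 0;                                        /* header would run past the buffer */
    unsigned bfinal = read_bits(buf, len, pos, 1);       /* final-block flag */
    unsigned btype  = read_bits(buf, len, pos + 1, 2);   /* 2 = dynamic Huffman */
    unsigned nl = 257 + read_bits(buf, len, pos + 3, 5); /* literal/length codes */
    unsigned nd = 1 + read_bits(buf, len, pos + 8, 5);   /* distance codes */
    /* The 4-bit ccl count that follows has no invalid values, so it is not checked. */
    return bfinal == 0 && btype == 2 && nl <= 286 && nd <= 30;
}

/* Scan forward one bit at a time; return the first plausible bit position,
   or (size_t)-1 if none is found within max_bits positions. */
size_t find_candidate(const uint8_t *buf, size_t len, size_t start_bit, size_t max_bits)
{
    for (size_t pos = start_bit; pos < start_bit + max_bits; pos++)
        if (plausible_block_header(buf, len, pos))
            return pos;
    return (size_t)-1;
}
```

For random data roughly one bit position in nine passes this 17-bit test (1/2 × 1/4 × 30/32 × 30/32 ≈ 0.11), so a hit is only a candidate. That is why the approach in this article goes on to decode the block, and several blocks after it, before accepting a position.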
Other checks are possible as well, such as whether building the Huffman trees fails, or whether the block ends with the end-of-block symbol 256 as the format requires. These are more troublesome to implement, however; applying the checks above to several consecutive compressed blocks is sufficient.
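As an illustration of the "Huffman tree fails to build" check mentioned above, the following sketch tests whether a set of code lengths can form a valid prefix code at all. It is not taken from the gzip source; the name `check_code_lengths` is hypothetical, and the method is the standard counting test: walking from the shortest to the longest code length, the number of codes used must never exceed the number still available.

```c
#include <stddef.h>

#define MAXBITS 15  /* longest code length allowed in deflate */

/* Check a set of Huffman code lengths (0 means "symbol unused").
   Returns 0 if the lengths form a complete code, a negative value if they
   are over-subscribed (impossible to build), and a positive value if the
   code is incomplete (some bit patterns are left unused). */
int check_code_lengths(const unsigned char *lengths, size_t n)
{
    int count[MAXBITS + 1] = {0};
    for (size_t i = 0; i < n; i++) {
        if (lengths[i] > MAXBITS)
            return -1;                 /* length out of range */
        count[lengths[i]]++;
    }
    int left = 1;                      /* one code of length zero is available */
    for (int len = 1; len <= MAXBITS; len++) {
        left <<= 1;                    /* each extra bit doubles the available codes */
        left -= count[len];            /* subtract the codes used at this length */
        if (left < 0)
            return -1;                 /* more codes requested than exist */
    }
    return left;
}
```

An over-subscribed result always marks a wrong position. How strictly to treat an incomplete result is a design choice for the scanner, since it is a weaker signal.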
In practice, the scan is implemented by modifying the gzip source code. For lack of time this was not turned into a general-purpose tool; only part of the code was quickly modified. The main changes are as follows:

First, find the point of damage:

In unzip.c, just before the line
error("invalid compressed data--format violated");
record the byte position currently being decoded.
 

Second, scan forward from the point of damage:

1. In inflate.c, change

if (nl > 286 || nd > 30)
#endif
return 1;

to:

if (nl > 286 || nd > 30 || nl < 257 || nd < 1)
#endif
return 1;

2. In inflate.c, in the function int inflate_block(e), before the following code

bb = b;
bk = k;

add:

if ((t != 2) || (*e != 0))
return 2;

3. In inflate.c, at the end of the function int inflate_block(e), make the if (t == 0) and if (t == 1) branches return the error value 2 directly.
 
4. In inflate.c, in the function int inflate(), change

if ((r = inflate_block(&e)) != 0)
return r;

to:

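unsigned t;           /* block type */
register ulg b;       /* bit buffer (local copy used by NEEDBITS/DUMPBITS) */
register unsigned k;  /* number of bits in bit buffer */
while (inptr <= insize)
{
    /* Save the decoder state so that a failed attempt can be rolled back. */
    unsigned int tptr = inptr;
    unsigned int tbk = bk;
    unsigned long tbb = bb;
    unsigned int twp = wp;
    long long tstart = *(long long*)(inbuf + tptr); /* first bytes at this position, for logging */
    if ((r = inflate_block(&e)) != 0)
    {
        /* Not a valid block start: restore the saved state, then skip one bit. */
        inptr = tptr;
        bb = tbb;
        bk = tbk;
        wp = twp;
        b = bb;
        k = bk;
        NEEDBITS(1)
        DUMPBITS(1)
        bb = b;   /* store the advanced bit position back into the globals, */
        bk = k;   /* otherwise the next attempt would not start one bit later */
    }
    else
    {
        /* tstart, bb and bk can also be printed here. When reposting, please keep the copyright notice: www.datahf.net, Zhang Yu (张宇) */
        printf("get by www.datahf.net!\n");
    }
}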

Once step 4 is done, run the damaged .gz file through the modified code in a debugger. You can of course also add a seek right after the code that parses the header structure, so that decoding jumps straight to the damaged location.
Normally, when the printf("get by www.datahf.net!") line is reached, the correct start position has been found.
After the start position has been found, construct a gzip file header, or copy one from a normal gzip file, and append the good bitstream to it; the result can then be decompressed. (If the bitstream does not start on a byte boundary, try each of the possible bit shifts.) Many files spliced this way can be opened and even fully decompressed. An error may still be reported, mainly because the checksum and size in the trailer no longer match, but this can be ignored.
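The splicing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's tool: the function name `splice_with_header` is hypothetical, the header is the minimal 10-byte form defined by the gzip format (magic bytes, deflate method, no flags, zero timestamp, unknown OS), and the caller is assumed to pass the damaged file's bytes starting at the byte that contains the block start, plus the bit offset within that byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal gzip header: ID1, ID2, CM=8 (deflate), FLG=0, MTIME=0 (4 bytes),
   XFL=0, OS=255 (unknown). */
static const uint8_t GZIP_HEADER[10] = {0x1f, 0x8b, 0x08, 0, 0, 0, 0, 0, 0, 0xff};

/* Write the header followed by src, realigned so that bit `shift` (0..7)
   of src[0] becomes bit 0 of the first payload byte.
   dst must have room for 10 + src_len bytes.
   Returns the number of bytes written. */
size_t splice_with_header(const uint8_t *src, size_t src_len, unsigned shift, uint8_t *dst)
{
    assert(shift < 8);
    memcpy(dst, GZIP_HEADER, sizeof GZIP_HEADER);
    uint8_t *out = dst + sizeof GZIP_HEADER;
    for (size_t i = 0; i < src_len; i++) {
        unsigned lo = src[i] >> shift;                     /* remaining bits of this byte */
        unsigned hi = (i + 1 < src_len) ? src[i + 1] : 0;  /* pad the tail with zero bits */
        out[i] = (uint8_t)(lo | (hi << (8 - shift)));      /* low bits of the next byte move up */
    }
    return sizeof GZIP_HEADER + src_len;
}
```

No valid trailer is produced here, so a decompressor will still report the checksum and size mismatch described above.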
If the splicing is done on Linux, do not decompress the result directly with "gzip -d": the CRC error makes it fail once decompression is about 99% complete, and gzip then deletes the output file. Decompress through a pipe instead, so that the data written before the error is kept.

Origin blog.51cto.com/sun510/2430980