A nasty encoding problem (continued)

Yesterday I wrote about the encoding problem and found the right way to build the package. One problem remains: many zip packages were already created by the old code before the fix. How can those be repaired?

The most straightforward approach is to extract first and then repackage. The code is roughly:

    // Requires: using System.IO; using System.Text; using Ionic.Zip;
    // rootFolder is the working directory and is defined elsewhere.
    string folderName = "";
    using (ZipFile zip = ZipFile.Read("foo.zip"))
    {
        foreach (ZipEntry entry in zip)
        {
            if (entry.IsDirectory)
            {
                // Remember the folder name, without the trailing slash
                folderName = entry.FileName.Replace("/", "");
            }
            entry.Extract(rootFolder);
        }
    }
    // Reinterpret the gb2312 bytes of the folder name as IBM437 text
    byte[] bytes = Encoding.GetEncoding("gb2312").GetBytes(folderName);
    var newName = Encoding.GetEncoding("IBM437").GetString(bytes);
    using (ZipFile zip = new ZipFile(Encoding.UTF8))
    {
        zip.CompressionMethod = Ionic.Zip.CompressionMethod.Deflate;
        zip.AlternateEncoding = Encoding.GetEncoding("IBM437");
        zip.AddDirectory(Path.Combine(rootFolder, folderName), newName);
        zip.Save(Path.Combine(rootFolder, folderName + ".uvz"));
    }

This does work, but extracting and recompressing means a lot of IO, so performance is relatively poor. I noticed that if I open a problematic zip package with 7-Zip and rename the folder inside it, UnicornViewer can then open the package. So is it possible to skip the recompression and get the same result by renaming alone, avoiding the IO? (My guess is that renaming extracts into memory and recompresses there without touching the disk. That is only a guess; I was too lazy to read the code and verify it.)

DotNetZip allows you to rename the entries of an existing zip package, so I started with the following code:

    using (ZipFile zip = ZipFile.Read("foo.uvz"))
    {
        // Changing ZipEntry properties inside a foreach over the zip throws,
        // so copy the entries to a list first and iterate over that.
        var list = zip.ToList();
        for (int i = 0; i < list.Count; i++)
        {
            string oldName = list[i].FileName;
            byte[] bytes = Encoding.GetEncoding("gb2312").GetBytes(oldName.ToCharArray());
            var newName = Encoding.GetEncoding("IBM437").GetString(bytes);
            list[i].FileName = newName;
            list[i].AlternateEncoding = Encoding.GetEncoding("IBM437");
        }
        zip.Save();
    }

After this change UnicornViewer can indeed open the package, but there is a new problem: if you open the modified zip package with 7-Zip, the folder names inside are garbled.

I tried many approaches without success, and in the end fell back on the old method: read both the zip packaged by 7-Zip and the zip packaged by my code, then compare the ZipEntry properties. That finally exposed the problem. In the zip packaged by 7-Zip the ZipEntry's BitField property is 0, while in the zip packaged by code it is 2048. But when I tried to set BitField in code, I was told that it is a read-only property and cannot be modified.

I checked the DotNetZip documentation, which says:

The bitfield for the entry as defined in the zip spec. You probably never need to look at this.

.........

You probably do not need to concern yourself with the contents of this property, but in case you do:

The documentation then explains the meaning of each bit. The one relevant here is bit 11 (2048 is 2^11, so in the code-packaged zip only this bit is set):

11 Language encoding flag (EFS). If this bit is set, the filename and comment fields for this file must be encoded using UTF-8. This library currently does not support UTF-8.
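
To make the connection between the value 2048 and bit 11 concrete, here is a minimal sketch of the bit arithmetic. The class and method names (`ZipFlags`, `HasUtf8Flag`, `ClearUtf8Flag`) are made up for illustration and are not part of DotNetZip.

```csharp
using System;

public static class ZipFlags
{
    // Bit 11 of the general purpose bit flag: language encoding flag (EFS).
    public const int Utf8Flag = 1 << 11; // 0x0800 == 2048

    // True when the entry claims that its file name and comment are UTF-8.
    public static bool HasUtf8Flag(int bitField)
    {
        return (bitField & Utf8Flag) != 0;
    }

    // Returns the bit field with bit 11 cleared and all other bits untouched.
    public static int ClearUtf8Flag(int bitField)
    {
        return bitField & ~Utf8Flag;
    }
}
```

With these helpers, `ZipFlags.HasUtf8Flag(2048)` is true and `ZipFlags.HasUtf8Flag(0)` is false, which matches the difference seen between the two packages.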

This is presumably the culprit. But what to do about a read-only property? Fortunately DotNetZip is open source. I found the source on GitHub, downloaded it, located the property, and made it writable (the original only had a get, so I added a set; anyone who works with .NET will know what that means), then recompiled. The build hit a few small problems. First, the signing key could not be found. That is easy to handle: open the project properties and turn off the "Signing" option. Next, the build reported an error while executing a VBS. It turned out that the prebuild step invokes a .vbs (VBScript) file. I found the script and ran it manually; it complained about a missing directory, and after I corrected the path it ran successfully. Since the script had now been executed, I removed its command line from the prebuild step, and the build succeeded. I then referenced the new Ionic.Zip.dll in my project and extended the code above to:

    using (ZipFile zip = ZipFile.Read("foo.uvz"))
    {
        // Changing ZipEntry properties inside a foreach over the zip throws,
        // so copy the entries to a list first and iterate over that.
        var list = zip.ToList();
        for (int i = 0; i < list.Count; i++)
        {
            string oldName = list[i].FileName;
            byte[] bytes = Encoding.GetEncoding("gb2312").GetBytes(oldName.ToCharArray());
            var newName = Encoding.GetEncoding("IBM437").GetString(bytes);
            list[i].FileName = newName;
            list[i].AlternateEncoding = Encoding.GetEncoding("IBM437");
            // The setter exists only in the locally patched Ionic.Zip.dll
            list[i].BitField = 0;
        }
        zip.Save();
    }

I tested again and it passed, and performance is indeed much better.

The main thing I gained from writing this program is a somewhat better understanding of garbled text. Garbled text is what you get when bytes are encoded with one Encoding and decoded with a different one. So you need to find the Encoding that the text was wrongly decoded with and the Encoding it was originally written in, and then convert as above. For example, for the garbled string "╩└╜τ╡┌╥╗╬╗┴∙╣┌═⌡╒╘╣·╚┘╩╡╒╜╫¿╝¡", the following code recovers the correct Chinese characters:

string strWrong = "╩└╜τ╡┌╥╗╬╗┴∙╣┌═⌡╒╘╣·╚┘╩╡╒╜╫¿╝¡";
byte[] bytes = Encoding.GetEncoding("IBM437").GetBytes(strWrong);
string strRight = Encoding.GetEncoding("gb2312").GetString(bytes);
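
To see where such a string comes from in the first place, here is a small round-trip sketch. The class `MojibakeDemo` and its methods are made-up names. It also assumes .NET Core or .NET 5+, where the gb2312 and IBM437 code pages are only available after registering `CodePagesEncodingProvider`; on the classic .NET Framework that registration is not needed.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    static MojibakeDemo()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered before GetEncoding("gb2312") or ("IBM437") works.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Produces garbled text: gb2312 bytes wrongly decoded as IBM437.
    public static string Garble(string text)
    {
        byte[] bytes = Encoding.GetEncoding("gb2312").GetBytes(text);
        return Encoding.GetEncoding("IBM437").GetString(bytes);
    }

    // Reverses the mistake: recover the original bytes via IBM437,
    // then decode them with the encoding they were written in.
    public static string Repair(string garbled)
    {
        byte[] bytes = Encoding.GetEncoding("IBM437").GetBytes(garbled);
        return Encoding.GetEncoding("gb2312").GetString(bytes);
    }
}
```

For example, `MojibakeDemo.Garble("世界")` gives "╩└╜τ", the first four characters of the garbled string above, and `MojibakeDemo.Repair` turns it back into "世界".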

The problem is that it is hard to know what the original encoding was and which one to decode with. As an answer on Stack Overflow puts it:

Notice: as already pointed out "determine encoding" makes sense only for byte streams. If you have a string it is already encoded from someone along the way who already knew or guessed the encoding to get the string in the first place. (https://stackoverflow.com/questions/1025332/determine-a-strings-encoding-in-c-sharp)

In other words, an encoding can only be detected from a byte stream. If what you have is an already decoded string, as in the snippet above, all you can do is try Encodings one by one. Of course, people are smarter than machines: there is no need to walk through every combination of encodings, testing a few likely ones is enough. For garbled Chinese, the Encoding to decode with (GetString) is generally gb2312/gbk, big5, hz or utf-8, and the Encoding to get the bytes back with (GetBytes) is mostly IBM437, IBM852 or the like.
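
Trying a handful of likely combinations can be sketched as follows. This is only an illustration: `EncodingGuesser`, `Candidates` and the two short candidate lists are assumptions of mine, not an exhaustive set, and the code again assumes .NET Core or .NET 5+ for the `CodePagesEncodingProvider` registration. It does not pick a winner; a person still has to look at the results.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EncodingGuesser
{
    // Encodings the text may have been wrongly decoded with (illustrative list).
    static readonly string[] WrongDecoders = { "IBM437", "IBM852" };

    // Encodings the bytes may originally have been written in (illustrative list).
    static readonly string[] RightDecoders = { "gb2312", "big5", "utf-8" };

    // Returns every (wrong, right) combination with the text it produces.
    public static List<Tuple<string, string, string>> Candidates(string garbled)
    {
        // Assumption: needed on .NET Core / .NET 5+ for legacy code pages.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        var results = new List<Tuple<string, string, string>>();
        foreach (string wrong in WrongDecoders)
        {
            // Step 1: get back the bytes the garbled text was decoded from.
            byte[] bytes = Encoding.GetEncoding(wrong).GetBytes(garbled);
            foreach (string right in RightDecoders)
            {
                // Step 2: decode those bytes with a candidate original encoding.
                string text = Encoding.GetEncoding(right).GetString(bytes);
                results.Add(Tuple.Create(wrong, right, text));
            }
        }
        return results;
    }
}
```

Printing each tuple from `EncodingGuesser.Candidates(strWrong)` and reading through the output is usually enough to spot the one combination that yields readable Chinese.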

Origin www.cnblogs.com/badnumber/p/12131184.html