Fixing garbled file names when extracting zip files in Python 3

In the zip standard, file names are not stored as Unicode by default; archiving software apparently writes them in the system's default character set (this is a guess). When zipfile reads an archive, it checks a flag on each entry and supports only two encodings: cp437 and utf-8.

Specifically, open the source code of zipfile.py and look for the following:

if flags & 0x800:
    # UTF-8 file names extension
    filename = filename.decode('utf-8')
else:
    # Historical ZIP filename encoding
    filename = filename.decode('cp437')

As the code shows, unless the flag marks the name as utf-8, the name is always decoded as cp437. If the bytes are actually gbk or some other encoding, the result is garbled. The fix is to encode the cp437-decoded string back to its original bytes and then decode those bytes with the correct encoding.
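To see why this round trip works, here is a minimal sketch that simulates the problem outside of zipfile. The sample name `中文.txt` is only an illustration and does not come from a real archive; the sketch assumes the archive was written on a system that uses gbk.

```python
# Bytes as an archiver on a gbk system would store the name (assumed sample name).
raw = "中文.txt".encode("gbk")

# What zipfile does when the UTF-8 flag is not set: decode as cp437.
garbled = raw.decode("cp437")

# cp437 maps every byte to a character, so encoding back recovers the
# original bytes exactly; decoding them as gbk yields the real name.
fixed = garbled.encode("cp437").decode("gbk")

print(garbled)  # box-drawing style mojibake
print(fixed)    # the original name again
```

The step is lossless only because cp437 assigns a character to all 256 byte values. If the archive used an encoding other than gbk, the final `decode` needs that encoding instead.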

The specific change to zipfile.py is as follows:

# Modified code
if flags & 0x800:
    # UTF-8 file names extension
    filename = filename.decode('utf-8')
else:
    # Historical ZIP filename encoding
    filename = filename.decode('cp437')
    # Added: undo the cp437 decoding, then decode the original bytes as gbk
    filename = filename.encode('cp437').decode('gbk')

The second place in zipfile.py that decodes the file name is modified in the same way:

if zinfo.flag_bits & 0x800:
    # UTF-8 filename
    fname_str = fname.decode("utf-8")
else:
    fname_str = fname.decode("cp437")
    # Added: undo the cp437 decoding, then decode the original bytes as gbk
    fname_str = fname_str.encode("cp437").decode('gbk')

I tested this myself and it works!
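If you would rather not edit the standard library, the same re-decoding can be applied from your own code. The following is only a sketch under a few assumptions: `fix_name` and `extract_all` are hypothetical helper names, not part of zipfile, and the default of gbk is a guess about how the archive was written.

```python
import shutil
import zipfile
from pathlib import Path


def fix_name(info, encoding="gbk"):
    """Return the entry name, re-decoded when zipfile fell back to cp437.

    `encoding` is an assumption about how the archive was written.
    """
    if info.flag_bits & 0x800:
        return info.filename  # UTF-8 flag set: zipfile already decoded it correctly
    try:
        return info.filename.encode("cp437").decode(encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return info.filename  # bytes do not fit the assumed encoding; keep as is


def extract_all(zip_path, dest, encoding="gbk"):
    """Extract every entry of zip_path into dest using the corrected names."""
    dest = Path(dest).resolve()
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            target = (dest / fix_name(info, encoding)).resolve()
            # Refuse entries that would land outside the destination directory.
            if dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {info.filename!r}")
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
```

A call would look like `extract_all("archive.zip", "out")`, where both paths are placeholders. On Python 3.11 and later, `zipfile.ZipFile` also accepts a `metadata_encoding` argument when reading, which should make a helper like this unnecessary.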
