In the ZIP standard, file names are not stored as Unicode by default; different tools probably write them in the system's default character set (this is a guess). Python's zipfile module decides how to decode a name from the entry's flag bits, and it supports only cp437 and UTF-8.
Specifically, open the source of zipfile.py and look for the following code:
if flags & 0x800:
    # UTF-8 file names extension
    filename = filename.decode('utf-8')
else:
    # Historical ZIP filename encoding
    filename = filename.decode('cp437')
As you can see, unless the flag marks the name as UTF-8, it is decoded as cp437. If the name was actually written in GBK or another encoding, the result is garbled. The fix is to take the string that was decoded as cp437, encode it back to cp437 to recover the original bytes, and then decode those bytes with the correct encoding.
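To see why this round trip works, here is a small sketch that simulates the wrong decode and then undoes it. The helper name redecode_zip_name and the sample file name are made up for illustration, and GBK is only assumed to be the real encoding.

```python
def redecode_zip_name(name: str, encoding: str = "gbk") -> str:
    """Undo a wrong cp437 decode and decode the original bytes as `encoding`.

    Hypothetical helper; `encoding` is an assumption about how the
    archive was actually written.
    """
    # Python's cp437 codec maps all 256 byte values, so encode() here
    # restores exactly the bytes that were stored in the archive.
    return name.encode("cp437").decode(encoding)


# Simulate what zipfile does with a GBK name when flag bit 0x800 is not set.
stored_bytes = "中文.txt".encode("gbk")
garbled = stored_bytes.decode("cp437")   # garbled text, not the real name

fixed = redecode_zip_name(garbled)       # fixed == "中文.txt"
```

If the archive was written in some other encoding, pass that encoding instead of "gbk"; a wrong guess either raises UnicodeDecodeError or produces different garbage.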
The specific code is as follows:
# Modified code
if flags & 0x800:
    # UTF-8 file names extension
    filename = filename.decode('utf-8')
else:
    # Historical ZIP filename encoding
    filename = filename.decode('cp437')
    # Added: recover the original bytes and decode them as GBK
    filename = filename.encode("cp437").decode('gbk')
The later occurrence in the same file is modified in the same way:
if zinfo.flag_bits & 0x800:
    # UTF-8 filename
    fname_str = fname.decode("utf-8")
else:
    fname_str = fname.decode("cp437")
    # Added: recover the original bytes and decode them as GBK
    fname_str = fname_str.encode("cp437").decode('gbk')
I tested this myself and it works!
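If you would rather not edit the installed zipfile.py, the same re-decoding can be done from your own code. The following is only a sketch of that idea, not what the steps above do: the function name corrected_names is hypothetical, and GBK is again assumed to be the real encoding.

```python
import zipfile


def corrected_names(zf: zipfile.ZipFile, encoding: str = "gbk"):
    """Yield (ZipInfo, corrected_name) pairs for every entry in `zf`.

    Hypothetical helper; `encoding` is an assumption about the archive.
    """
    for info in zf.infolist():
        if info.flag_bits & 0x800:
            # The entry is flagged as UTF-8, so zipfile already decoded it correctly.
            yield info, info.filename
        else:
            # zipfile decoded the raw bytes as cp437; get the bytes back
            # and decode them with the assumed real encoding.
            yield info, info.filename.encode("cp437").decode(encoding)
```

A caller could loop over these pairs, read each entry with zf.read(info), and write the data out under the corrected name. On Python 3.11 and later, zipfile.ZipFile also accepts a metadata_encoding argument when reading, which covers the same case without any re-decoding.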