Configuring PyPDF2 for Chinese Text


PyPDF2 encodes strings as Latin-1 by default, so it raises an encoding error when it processes documents that contain Chinese text.

The steps in this article apply to both Linux and Windows and have been tested on both.

Quick method (overwrite the files):

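Download the configuration files.
Copy the downloaded generic.py and utils.py into the directory C:\Users\currentuser\AppData\Local\Programs\Python\Python39\Lib\site-packages\PyPDF2, overwriting the existing files.
On Linux, use find to locate the PyPDF2 directory and copy the files there.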
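
If you would rather not search the filesystem by hand, Python itself can report where a package is installed on either platform. The following is a minimal sketch; the helper name package_dir is hypothetical and is not part of PyPDF2.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory that holds an importable package, or None if it is not installed.

    Hypothetical helper for illustration; it only uses the standard library.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; its parent is the package directory.
    return Path(spec.origin).parent


if __name__ == "__main__":
    # Prints the PyPDF2 install directory, or None if PyPDF2 is not installed.
    print(package_dir("PyPDF2"))
```

The directory it prints is the one that contains generic.py and utils.py.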

Manual method (edit the files yourself):

In utils.py, at around line 240, find:

r = s.encode('latin-1')
if len(s) < 2:
    bc[s] = r
return r

Change it to:

r = s.encode('utf-8')
if len(s) < 2:
    bc[s] = r
return r
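
To see why this one-line change matters, the sketch below encodes a short Chinese string both ways outside of PyPDF2. The variable names are illustrative only.

```python
text = "中文"

# Latin-1 can only represent code points 0-255, so Chinese characters fail.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can represent any Unicode character.
utf8_bytes = text.encode("utf-8")

print("latin-1 succeeded:", latin1_ok)
print("utf-8 byte length:", len(utf8_bytes))
```

The Latin-1 attempt raises UnicodeEncodeError, which is the error the original utils.py code hits on Chinese text, while the UTF-8 encoding succeeds.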

In generic.py, at around line 480, find:

try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    # Name objects should represent irregular characters
    # with a '#' followed by the symbol's hex number
    if not pdf.strict:
        warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
        return NameObject(name)
    else:
        raise utils.PdfReadError("Illegal character in Name Object")

Change it to:

try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    try:
        return NameObject(name.decode('gbk'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        # Name objects should represent irregular characters
        # with a '#' followed by the symbol's hex number
        if not pdf.strict:
            warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
            return NameObject(name)
        else:
            raise utils.PdfReadError("Illegal character in Name Object")
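
The change above tries UTF-8 first and falls back to GBK before giving up. The same pattern can be sketched as a standalone function; decode_with_fallback is a hypothetical name used only for illustration and does not exist in PyPDF2.

```python
def decode_with_fallback(raw, encodings=("utf-8", "gbk")):
    """Try each encoding in order and return the first successful decode.

    Returns None if every encoding fails. Hypothetical helper, not a PyPDF2 API.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return None


if __name__ == "__main__":
    gbk_name = "中文".encode("gbk")    # bytes that are not valid UTF-8
    utf8_name = "中文".encode("utf-8")
    print(decode_with_fallback(gbk_name))
    print(decode_with_fallback(utf8_name))
```

UTF-8 is tried first because valid UTF-8 byte sequences can often also be decoded as GBK, only into the wrong characters, so the stricter encoding has to come first.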

End of article. Everything above was tested successfully on both Windows and Linux on January 9, 2021.


Reposted from blog.csdn.net/qq_41238308/article/details/108572771