Writing a PDF file with PyPDF2 raises the following error: PyPDF2.utils.PdfReadError: Illegal character in Name Object

 

Today, while learning how to use PyPDF2 to write the pages of one PDF file into another PDF file, I ran into the following error message:

Traceback (most recent call last):
  File "D:\python35\Lib\site-packages\PyPDF2\generic.py", line 484, in readFromStream
    return NameObject(name.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 8: invalid continuation byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "D:\python35\Lib\apps\backstage\views\busi_contract_manage_view.py", line 703, in post
    merge_pdf_result = merge_pdf(final_files, pdf_path)
  File "D:\python35\Lib\apps\utils\doc_convert_util.py", line 86, in merge_pdf
    pdf_writer.write(new_file)
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "D:\python35\Lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "D:\python35\Lib\site-packages\PyPDF2\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "D:\python35\Lib\site-packages\PyPDF2\generic.py", line 579, in readFromStream
    value = readObject(stream, pdf)
  File "D:\python35\Lib\site-packages\PyPDF2\generic.py", line 60, in readObject
    return NameObject.readFromStream(stream, pdf)
  File "D:\python35\Lib\site-packages\PyPDF2\generic.py", line 492, in readFromStream
    raise utils.PdfReadError("Illegal character in Name Object")
PyPDF2.utils.PdfReadError: Illegal character in Name Object
  • Analyzing the traceback above, the error originates at line 484 of generic.py (E:\python_workspace\TornadoDemo\venv\Lib\site-packages\PyPDF2\generic.py). The original code at that location is:
try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    # Name objects should represent irregular characters
    # with a '#' followed by the symbol's hex number
    if not pdf.strict:
        warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
        return NameObject(name)
    else:
        raise utils.PdfReadError("Illegal character in Name Object")
  • Change the original code above as follows, so that a name that is not valid UTF-8 is decoded as GBK before giving up:
try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    # Name objects should represent irregular characters
    # with a '#' followed by the symbol's hex number
    try:
        return NameObject(name.decode('gbk'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        if not pdf.strict:
            warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
            return NameObject(name)
        else:
            raise utils.PdfReadError("Illegal character in Name Object")
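The idea behind this patch can be tried outside PyPDF2. The sketch below is a standalone illustration, not PyPDF2 code: `decode_name` and its `strict` parameter are hypothetical names, the Latin-1 last resort is a choice made for this sketch only, and the sample bytes (an ASCII prefix followed by GBK-encoded Chinese characters) are an assumption about what such a PDF name might contain. It shows UTF-8 decoding failing on the first GBK byte and the GBK fallback succeeding.

```python
import warnings


def decode_name(raw: bytes, strict: bool = True) -> str:
    """Decode a raw PDF name: try UTF-8 first, then fall back to GBK.

    Hypothetical helper mirroring the patched try/except; not part of PyPDF2.
    """
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        pass
    try:
        return raw.decode('gbk')
    except UnicodeDecodeError:
        if strict:
            raise ValueError("Illegal character in Name Object")
        warnings.warn("Illegal character in Name Object")
        # Assumption for this sketch: Latin-1 maps every byte to a character,
        # so this last resort never fails (the text may still be wrong).
        return raw.decode('latin-1')


# Assumed sample: an 8-byte ASCII prefix followed by GBK-encoded characters.
sample = b'/ABCDEF+' + '宋体'.encode('gbk')

try:
    sample.decode('utf-8')
except UnicodeDecodeError as exc:
    print(exc)  # UTF-8 rejects the first GBK byte

print(decode_name(sample))
```

One caveat with this approach: GBK accepts a wide range of byte sequences, so a name stored in some other legacy encoding may decode without error into the wrong characters.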
  • Next, modify line 238 of utils.py. The original code at that line is:
r = s.encode('latin-1')
if len(s) < 2:
    bc[s] = r
return r
  • Change the original code above as follows, so that text that cannot be encoded as Latin-1 is encoded as UTF-8 instead:
try:
    r = s.encode('latin-1')
except Exception as e:
    r = s.encode('utf-8')
if len(s) < 2:
    bc[s] = r
return r
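The utils.py change uses the same try-then-fall-back pattern, this time for encoding. A minimal standalone sketch follows; `encode_text` is a hypothetical name used only for illustration, and the small cache (`bc`) from the library function is left out. It catches `UnicodeEncodeError` specifically, which is a narrower choice than the `except Exception` used in the patch.

```python
def encode_text(s: str) -> bytes:
    """Encode text as Latin-1 when possible, otherwise as UTF-8.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    try:
        return s.encode('latin-1')
    except UnicodeEncodeError:
        # Characters outside the Latin-1 range (for example Chinese) land here.
        return s.encode('utf-8')


print(encode_text('abc'))   # plain ASCII fits in Latin-1
print(encode_text('é'))     # still within Latin-1, encoded as a single byte
print(encode_text('宋体'))  # outside Latin-1, so UTF-8 is used
```

Note that the returned bytes carry no marker of which encoding was chosen, so whatever reads them back has to try both.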

 

PyPDF2: writing the pages of a given PDF file into another PDF file

# encoding:utf-8
from PyPDF2 import PdfFileReader, PdfFileWriter

readFile = 'D:\\1.pdf'
outFile = 'D:\\2.pdf'
pdfFileWriter = PdfFileWriter()

# Get a PdfFileReader object
pdfFileReader = PdfFileReader(readFile)  # alternatively: pdfFileReader = PdfFileReader(open(readFile, 'rb'))
# Total number of pages in the document
numPages = pdfFileReader.getNumPages()

# Copy every page of the source document into the writer
for index in range(0, numPages):
    pageObj = pdfFileReader.getPage(index)
    pdfFileWriter.addPage(pageObj)

# Append two blank pages (sized like the last page added)
pdfFileWriter.addBlankPage()
pdfFileWriter.addBlankPage()

# Once every page has been added, save them all to the file in one go
with open(outFile, 'wb') as out:
    pdfFileWriter.write(out)

 


Source: blog.csdn.net/zhouzhiwengang/article/details/112727647