Python: I Heard It Beats the Paid Version (PDF to Word Converter)

Screenshots

PDF file

Converted Word file

Tools & Techniques

GUI framework: PySimpleGUI. For a detailed description, see the previous article, "Python: Write a GIF Image Generator (a handy little tool for meme battles)".

pdf2docx library

This is a third-party library hosted on GitHub. It makes converting PDF files to Word files very easy. Ordinary PDFs convert directly without problems, but complex ones may come out with some formatting issues, and scanned copies may not convert well. Overall, it is good enough for everyday use.

Install

pip install pdf2docx
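
Once the package is installed, the library can also be called directly, without any GUI. The following is a minimal sketch rather than the article's own code: `to_zero_based_range` and `convert_pages` are hypothetical helper names, and the sketch assumes that `Converter.convert` counts pages from 0 and treats `end` as exclusive, which is how the pdf2docx documentation describes it.

```python
def to_zero_based_range(first_page, last_page=None):
    """Turn 1-based, inclusive page numbers into (start, end) for convert().

    Assumption: pdf2docx counts pages from 0 and treats `end` as exclusive.
    """
    if first_page < 1:
        raise ValueError("first_page must be 1 or greater")
    if last_page is not None and last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")
    # An exclusive, 0-based end equals the inclusive, 1-based last page.
    return first_page - 1, last_page


def convert_pages(pdf_path, docx_path, first_page=1, last_page=None):
    """Convert the given page range of pdf_path into docx_path."""
    # Imported here so the helper above can be used without pdf2docx installed.
    from pdf2docx import Converter

    start, end = to_zero_based_range(first_page, last_page)
    cv = Converter(pdf_path)
    try:
        cv.convert(docx_path, start=start, end=end)
    finally:
        # Always release the PDF, even if the conversion raises
        cv.close()


# Example call (hypothetical file names), converting pages 2 and 3 only:
# convert_pages("sample.pdf", "sample.docx", first_page=2, last_page=3)
```

Leaving `last_page` as `None` converts everything from `first_page` to the end of the document.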

Code

Draw the graphical interface you want

def create_layout():
    # Set the theme
    sg.change_look_and_feel("GreenMono")

    # Define the window contents, one inner list per row
    layout = [
        [sg.InputText(key="in_file"), sg.FileBrowse("Select PDF file", button_color=sg.GREENS[0])],
        [sg.InputText(key="out_file"), sg.FolderBrowse("Select output directory")],
        [sg.Button("Start", button_color=(sg.YELLOWS[0], sg.BLUES[0])), sg.Button("Close")],
        [sg.Output(size=(80, 20))]
    ]
    return layout

Validate the input file and output directory entered in the interface

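# Validate the parameters
def check_file(in_file, out_file):
    if not out_file:
        print("Please select an output directory!")
        return False

    # Compare in lower case so that ".PDF" is accepted as well
    if not in_file.lower().endswith(".pdf"):
        print("The file is not a PDF. Please select another file!")
        return False

    return True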
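
The check above only looks at the file extension. As a hedged sketch of a stricter variant, the hypothetical helper `validate_paths` below also confirms that both paths exist and that the file starts with the `%PDF-` signature that PDF files normally begin with. It uses only the standard library and returns an error message instead of printing one.

```python
import os


def validate_paths(in_file, out_dir):
    """Return an error message, or None when both paths look usable."""
    if not out_dir or not os.path.isdir(out_dir):
        return "The output directory does not exist."
    if not os.path.isfile(in_file):
        return "The input file does not exist."
    # PDF files normally begin with the bytes "%PDF-"
    with open(in_file, "rb") as f:
        if f.read(5) != b"%PDF-":
            return "The input file does not look like a PDF."
    return None
```

In the GUI, the returned message could be printed to the output area, and the conversion started only when the result is `None`.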

PDF-to-Word conversion function

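def pdf2doc(pdf_name, out_file):
    # Open the PDF file for conversion
    cv = Converter(pdf_name)
    try:
        # First argument: path of the Word file to write.
        # start / end: first and last page; by default every page from 0 to the end is converted.
        cv.convert(str(out_file) + "/result.docx", start=0, end=None)
    except Exception as e:
        print("Conversion failed:", e)
        return False
    finally:
        # Release the PDF even if the conversion failed
        cv.close()
    return True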
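
Because `pdf2doc` always writes to `result.docx`, each run overwrites the previous result in the same directory. One possible way around this, shown here only as a sketch, is to name the output after the input PDF and add a counter when that name is taken. `output_path_for` is a hypothetical helper and is not part of pdf2docx.

```python
from pathlib import Path


def output_path_for(pdf_name, out_dir):
    """Build '<out_dir>/<pdf stem>.docx', adding a counter if that file exists."""
    stem = Path(pdf_name).stem
    candidate = Path(out_dir) / f"{stem}.docx"
    counter = 1
    # Try report.docx, then report_1.docx, report_2.docx, ... until one is free
    while candidate.exists():
        candidate = Path(out_dir) / f"{stem}_{counter}.docx"
        counter += 1
    return candidate
```

The result could then be passed to `cv.convert` as `str(output_path_for(pdf_name, out_file))` in place of the fixed file name.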

The complete code is as follows

import PySimpleGUI as sg
from pdf2docx import Converter


def create_layout():
    # Set the theme
    sg.change_look_and_feel("GreenMono")

    # Define the window contents, one inner list per row
    layout = [
        [sg.InputText(key="in_file"), sg.FileBrowse("Select PDF file", button_color=sg.GREENS[0])],
        [sg.InputText(key="out_file"), sg.FolderBrowse("Select output directory")],
        [sg.Button("Start", button_color=(sg.YELLOWS[0], sg.BLUES[0])), sg.Button("Close")],
        [sg.Output(size=(80, 20))]
    ]
    return layout


# Validate the parameters
def check_file(in_file, out_file):
    if not out_file:
        print("Please select an output directory!")
        return False

    # Compare in lower case so that ".PDF" is accepted as well
    if not in_file.lower().endswith(".pdf"):
        print("The file is not a PDF. Please select another file!")
        return False

    return True


def pdf2doc(pdf_name, out_file):
    # Open the PDF file for conversion
    cv = Converter(pdf_name)
    try:
        # First argument: path of the Word file to write.
        # start / end: first and last page; by default every page from 0 to the end is converted.
        cv.convert(str(out_file) + "/result.docx", start=0, end=None)
    except Exception as e:
        print("Conversion failed:", e)
        return False
    finally:
        # Release the PDF even if the conversion failed
        cv.close()
    return True


if __name__ == '__main__':
    layout = create_layout()
    window = sg.Window("Welcome to the PDF-to-Word converter, version 1.0.0!", layout)
    while True:
        event, values = window.read()
        if event in [None, "Close", "exit"]:
            break

        if event == "Start":
            in_file = values["in_file"]
            out_file = values["out_file"]
            # Report success only if validation and conversion both succeeded
            if check_file(in_file, out_file) and pdf2doc(in_file, out_file):
                print("Word file generated successfully!")
                print("Output file:", str(out_file) + "/result.docx")
    window.close()

The code can be copied and run as is. Running it opens the interface shown at the top of this article.

The blog "The Record of Programmers and Investment Life" has been renamed "Programmer Zhiqiu", matching the name of the WeChat official account. Feel free to follow!


Origin blog.csdn.net/qq_25702235/article/details/130122943