Reading .doc and .docx documents with python-docx

API:    http://python-docx.readthedocs.io/en/latest/#api-documentation

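Converting .doc to .docx (python-docx reads only .docx, so older .doc files must be converted first; this step needs Windows with Microsoft Word and pywin32 installed):

        from win32com import client as wc

        word = wc.Dispatch("Word.Application")
        # path and name are placeholders: the directory and the file name without extension
        doc = word.Documents.Open(path + name + ".doc")
        doc.SaveAs(path + name + ".docx", 12)   # 12 is the file-format code for docx
        doc.Close()
        word.Quit()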
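The conversion steps above can be wrapped in a small helper so that Word is always closed, even when opening or saving fails. This is a minimal sketch, not a definitive implementation: the names docx_target, convert_doc_to_docx and WD_FORMAT_XML_DOCUMENT are made up for this example, and it assumes Windows with Microsoft Word and pywin32 installed.

```python
from pathlib import Path

WD_FORMAT_XML_DOCUMENT = 12  # same file-format code as in the snippet above (docx)


def docx_target(doc_path):
    """Return the absolute path of the .docx file to write next to doc_path."""
    return Path(doc_path).resolve().with_suffix(".docx")


def convert_doc_to_docx(doc_path):
    """Convert one .doc file through Word's COM interface (Windows + Word only)."""
    # pywin32 is Windows-only, so it is imported here rather than at module level.
    from win32com import client as wc

    source = Path(doc_path).resolve()  # Word is given absolute paths here
    target = docx_target(source)
    word = wc.Dispatch("Word.Application")
    try:
        doc = word.Documents.Open(str(source))
        try:
            doc.SaveAs(str(target), WD_FORMAT_XML_DOCUMENT)
        finally:
            doc.Close()  # close the document even if saving failed
    finally:
        word.Quit()  # always shut down the Word process
    return target
```

Calling convert_doc_to_docx("report.doc") would, under these assumptions, write report.docx into the same directory and return its path.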

Reading paragraphs:

        import docx

        docStr = docx.Document(docName)   # open the document

        for paragraph in docStr.paragraphs:
                parStr = paragraph.text
                # Useful checks on each paragraph:
                #   paragraph.style.name == 'Heading 1'           -> level-1 heading
                #   paragraph.paragraph_format.alignment == 1     -> centered
                #   paragraph.style.font.color                    -> font color of the paragraph's style
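As an illustration of the style check above, the following sketch collects every heading in a document together with its level. The function names heading_outline and read_outline are hypothetical; the code relies only on paragraph.text and paragraph.style.name, and assumes the headings use the built-in styles named 'Heading 1', 'Heading 2', and so on.

```python
def heading_outline(paragraphs):
    """Return (level, text) for each paragraph styled 'Heading N'."""
    prefix = "Heading "
    outline = []
    for paragraph in paragraphs:
        style_name = paragraph.style.name
        if style_name.startswith(prefix):
            suffix = style_name[len(prefix):]
            if suffix.isdigit():  # skip any other style that merely starts with 'Heading '
                outline.append((int(suffix), paragraph.text))
    return outline


def read_outline(doc_name):
    """Open a .docx file with python-docx and return its heading outline."""
    # Imported here so heading_outline can be used without python-docx installed.
    import docx

    return heading_outline(docx.Document(doc_name).paragraphs)
```

For example, read_outline(docName) could be used to print a table of contents by indenting each heading text according to its level.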



Reading tables:

        numTables = docStr.tables
        for table in numTables:
                # number of rows and columns
                row_count = len(table.rows)
                col_count = len(table.columns)
                for i in range(row_count):
                        row = table.rows[i].cells
                        for j in range(col_count):
                                print(row[j].text)   # content of row i, column j

Alternatively, address each cell directly with table.cell(i, j):

        for table in docStr.tables:
                row_count = len(table.rows)
                col_count = len(table.columns)
                for i in range(row_count):
                        for j in range(col_count):
                                print(table.cell(i, j).text)
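Instead of printing each cell, the same traversal can collect a table into a list of rows, which is usually easier to process afterwards. This is a minimal sketch with hypothetical function names (table_to_rows, read_tables); it uses only table.rows, row.cells and cell.text as in the loops above.

```python
def table_to_rows(table):
    """Return the table's cell text as a list of rows, each a list of strings."""
    return [[cell.text for cell in row.cells] for row in table.rows]


def read_tables(doc_name):
    """Open a .docx file with python-docx and return every table as a list of rows."""
    # Imported here so table_to_rows can be used without python-docx installed.
    import docx

    return [table_to_rows(table) for table in docx.Document(doc_name).tables]
```

One caveat to check against your own documents: when a table contains merged cells, the merged cell is expected to show up once for each grid position it spans, so its text may appear repeated within a row.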




Reposted from blog.csdn.net/qq_22521211/article/details/80278371