Working with Word documents using Java POI, with some source code analysis (notes)

Working with Word using Java POI

  Two days ago I wrote some Java POI code for working with Word documents in a project, so today I am writing a short summary. POI is powerful.

Reading a doc file in Java

  The doc format is the Word document format of Office 97-2003; from Office 2007 onward the format is docx. POI has long supported the old binary formats such as doc, xls and ppt, and since POI 3.5 it also supports the newer OOXML formats such as docx, xlsx and pptx. I am using a somewhat older POI release, version 3.17.
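  As a rough illustration of the difference between the two families, the container type can be told apart from the first bytes of a file: the binary formats are OLE2 compound files, while OOXML files are ZIP packages. The class and method names below (WordFormatSniffer, sniff) are made up for this sketch and are not part of POI; the check only identifies the container, so it cannot tell a doc from an xls, or a docx from any other ZIP.

```java
import java.util.Arrays;

// Hypothetical helper, not part of POI: guesses the container format from a file header.
final class WordFormatSniffer {
    // OLE2 compound file signature, used by binary doc/xls/ppt files
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature; docx/xlsx/pptx are ZIP packages
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Returns "OLE2", "ZIP" or "UNKNOWN" depending on the leading bytes.
    static String sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    // True if data begins with the given prefix bytes.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

  A caller could read the first eight bytes of a file, pass them to sniff, and then choose HWPFDocument for "OLE2" or XWPFDocument for "ZIP".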
  Let's look at the code that reads a doc file and prints its content; it is very simple.

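public void readAndPrintDoc() throws IOException {
    // try-with-resources closes both the stream and the extractor
    try (FileInputStream in = new FileInputStream(new File("path/to/file.doc"));
         WordExtractor wordExtractor = new WordExtractor(in)) {
        String text = wordExtractor.getText();
        System.out.println(text);
    }
}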

  It really is that simple. Here I used the WordExtractor class to work with the document; there is another way, which I will come to in a moment. First, let us get to know the WordExtractor class.
Main methods of WordExtractor
  The screenshot (taken in IDEA) shows the main methods of WordExtractor; I did not bother typing them out.
  The first four methods are constructors. They take different parameters, but essentially all of them assign a value to the field at the bottom, doc, which is of type HWPFDocument. The other way of working with doc files mentioned above is through the HWPFDocument class, and this field is exactly that. Once the constructor has assigned the doc field, the other methods can use it; I picked a few of them to look at.
  The main() method can mostly be ignored. It takes the path of a Word document from the array passed in and prints all of the document's content. I will not paste its code; it is very simple and is called as follows.

// The source code takes the file path from the first element of this array, so one entry is enough
String[] strings = {"path/to/file.doc"};
WordExtractor.main(strings);

  The getParagraphText() method returns the text of all paragraphs as an array of strings.

    public String[] getParagraphText() {
        String[] ret;

        // Extract using the model code
        try {
            Range r = doc.getRange();

            ret = getParagraphText( r );
        } catch ( Exception e ) {
            // Something's up with turning the text pieces into paragraphs
            // Fall back to ripping out the text pieces
            ret = new String[1];
            ret[0] = getTextFromPieces();
        }

        return ret;
    }

    protected static String[] getParagraphText( Range r ) {
        String[] ret;
        ret = new String[r.numParagraphs()];
        for ( int i = 0; i < ret.length; i++ ) {
            Paragraph p = r.getParagraph( i );
            ret[i] = p.text();

            // Fix the line ending
            if ( ret[i].endsWith( "\r" )) {
                ret[i] = ret[i] + "\n";
            }
        }
        return ret;
    }
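
  To make the line-ending fix in the loop above concrete, here is a small sketch that applies the same rule to plain strings, without POI. The class and method names (ParagraphLineEndings, fixLineEndings) are made up for this example; only the rule itself, appending "\n" to a paragraph that ends with "\r", comes from the source shown above.

```java
// Hypothetical stand-alone version of the line-ending rule in getParagraphText(Range).
final class ParagraphLineEndings {
    // Returns a copy of the input where every paragraph ending in "\r" has "\n" appended.
    static String[] fixLineEndings(String[] paragraphs) {
        String[] ret = new String[paragraphs.length];
        for (int i = 0; i < ret.length; i++) {
            String text = paragraphs[i];
            // Word ends paragraphs with "\r"; appending "\n" turns that into "\r\n"
            ret[i] = text.endsWith("\r") ? text + "\n" : text;
        }
        return ret;
    }
}
```

  This is why each string returned by getParagraphText() normally ends with "\r\n", which matters if the paragraphs are later compared or trimmed.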

  This is related to the other approach mentioned earlier, and the principle is the same: both obtain a Range object from HWPFDocument. Range is the core of HWPFDocument; all the paragraph text is obtained through it as strings, which we can then work with.
  Getting paragraphs with WordExtractor

        WordExtractor wordExtractor = new WordExtractor(new FileInputStream(new File("path/to/file.doc")));
        String[] paragraphText = wordExtractor.getParagraphText();
        // The first paragraph is treated as the title here
        System.out.println("Title: " + paragraphText[0]);
        System.out.println("Paragraph count: " + paragraphText.length);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < paragraphText.length; i++) {
            sb.append(paragraphText[i]);
        }
        System.out.println(sb);
        wordExtractor.close();

  Getting paragraphs with HWPFDocument

        HWPFDocument document = new HWPFDocument(new FileInputStream(new File("src\\main\\resources\\templates\\sldkfj.doc")));
        Range range = document.getRange();
        System.out.println("Paragraph count: " + range.numParagraphs());
        System.out.println("Title: " + range.getParagraph(0).text());
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < range.numParagraphs(); i++) {
            // Append the paragraph text, not the Paragraph object itself
            sb.append(range.getParagraph(i).text());
        }
        System.out.println(sb);
        document.close();

  The getText method needs little explanation; just remember that it returns the full text. HWPFDocument is more complicated to use but also more powerful. WordExtractor is in fact a simple wrapper around it that makes getting at the document content easier.
  Basic operations with HWPFDocument:

        HWPFDocument document = new HWPFDocument(new FileInputStream(new File("path/to/file.doc")));
        Range range = document.getRange();
        // Bookmarks
        Bookmarks bookmarks = document.getBookmarks();
        System.out.println("Bookmark count: " + bookmarks.getBookmarksCount());
        for (int i = 0; i < bookmarks.getBookmarksCount(); i++) {
            Bookmark bookmark = bookmarks.getBookmark(i);
            System.out.println("Bookmark " + i + " name: " + bookmark.getName());
            System.out.println("Start position: " + bookmark.getStart());
            System.out.println("End position: " + bookmark.getEnd());
        }
        // Tables
        TableIterator tableIterator = new TableIterator(range);
        Table table;
        TableRow tableRow;
        TableCell tableCell;
        while (tableIterator.hasNext()) {
            table = tableIterator.next();
            int rowNum = table.numRows();
            for (int j = 0; j < rowNum; j++) {
                tableRow = table.getRow(j);
                int cellNum = tableRow.numCells();
                for (int k = 0; k < cellNum; k++) {
                    tableCell = tableRow.getCell(k);
                    // Print the text of the cell
                    System.out.println(tableCell.text().trim());
                }
            }
        }

  Here is a small example that fills in a small table.
Template
Result after replacement

        File file = new File("path/to/file.doc");
        HWPFDocument doc = new HWPFDocument(new FileInputStream(file));
        Range range = doc.getRange();
        // "生成" ("generate") is the placeholder text in the template; it becomes the heading
        range.replaceText("生成", "Personal information");
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd");
        range.replaceText("shijian", simpleDateFormat.format(new Date()));
        range.replaceText("Name", "nb爽");
        range.replaceText("Age", "123");
        range.replaceText("Gender", "Male");
        // Write the result back to the same file
        doc.write(file);
        doc.close();

  I wanted to find more ways of writing doc files, but unfortunately POI's write support for doc is still fairly basic and cannot produce more complex Word documents. There is another way to generate a complex Word document: save the Word file as XML, turn it into a FreeMarker template, and let FreeMarker generate the doc file. It can also be done through Jacob, but I have not studied that yet; I will write a post about it once I have.

Working with docx in Java

  A simple read of a docx file looks like this:

        // Print the content and some document properties
        XWPFDocument docx = new XWPFDocument(new FileInputStream("src\\main\\resources\\templates\\sldkfj.docx"));
        XWPFWordExtractor extractor = new XWPFWordExtractor(docx);
        System.out.println(extractor.getText());
        POIXMLProperties.CoreProperties coreProperties = extractor.getCoreProperties();
        System.out.println("Category: " + coreProperties.getCategory());
        System.out.println("Creator: " + coreProperties.getCreator());
        System.out.println("Created: " + coreProperties.getCreated());
        System.out.println("Title: " + coreProperties.getTitle());
        docx.close();

  The code above goes through XWPFWordExtractor. The code below does some simple work directly with XWPFDocument: it prints the paragraphs, the table contents, and the headers and footers.

        XWPFDocument docx = new XWPFDocument(new FileInputStream("path/to/file.docx"));
        // Print the paragraphs
        List<XWPFParagraph> paragraphs = docx.getParagraphs();
        for (XWPFParagraph paragraph : paragraphs) {
            System.out.println(paragraph.getText());
        }
        // Tables
        List<XWPFTable> tables = docx.getTables();
        List<XWPFTableRow> rows;
        List<XWPFTableCell> cells;
        for (XWPFTable table : tables) {
            rows = table.getRows();
            for (XWPFTableRow row : rows) {
                cells = row.getTableCells();
                for (XWPFTableCell cell : cells) {
                    System.out.println(cell.getText());
                }
            }
        }
        // Footers
        List<XWPFFooter> footerList = docx.getFooterList();
        for (XWPFFooter xwpfFooter : footerList) {
            System.out.println(xwpfFooter.getText());
        }

        // Headers
        List<XWPFHeader> headerList = docx.getHeaderList();
        for (XWPFHeader xwpfHeader : headerList) {
            System.out.println(xwpfHeader.getText());
        }
        docx.close();

  Writing docx files works much better than writing doc files. You are not limited to replacing values in an existing document, as with doc; you can simply create a new XWPFDocument.

        XWPFDocument document = new XWPFDocument();
        XWPFParagraph paragraph = document.createParagraph();
        // A run is a region of text that shares the same properties
        XWPFRun run = paragraph.createRun();
        // Bold
        run.setBold(true);
        // Text content
        run.setText("hahahahaha");
        run = paragraph.createRun();
        // Red
        run.setColor("FF0000");
        run.setText("red color");
        // Create a 3x3 table
        XWPFTable table = document.createTable(3, 3);
        // Append a row
        table.createRow();
        List<XWPFTableRow> rows = table.getRows();
        // Table properties: reuse the element the table already has instead of adding a second one
        CTTblPr ctTblPr = table.getCTTbl().getTblPr();
        if (ctTblPr == null) {
            ctTblPr = table.getCTTbl().addNewTblPr();
        }
        // Table width, as a fixed value (DXA) rather than auto
        CTTblWidth ctTblWidth = ctTblPr.isSetTblW() ? ctTblPr.getTblW() : ctTblPr.addNewTblW();
        ctTblWidth.setType(STTblWidth.DXA);
        ctTblWidth.setW(BigInteger.valueOf(10000));
        List<XWPFTableCell> cells;
        int i = 0;
        for (XWPFTableRow row : rows) {
            // Add one more cell to the row
            row.addNewTableCell();
            // Row height
            row.setHeight(500);
            cells = row.getTableCells();
            for (XWPFTableCell tableCell : cells) {
                tableCell.setColor("FF0000");
                // Cell properties: setColor has already created tcPr, so reuse it
                CTTcPr ctTcPr = tableCell.getCTTc().isSetTcPr()
                        ? tableCell.getCTTc().getTcPr() : tableCell.getCTTc().addNewTcPr();
                ctTcPr.addNewVAlign().setVal(STVerticalJc.CENTER);
                CTTblWidth ctTblWidth1 = ctTcPr.isSetTcW() ? ctTcPr.getTcW() : ctTcPr.addNewTcW();
                ctTblWidth1.setW(BigInteger.valueOf(1000));
                tableCell.setText("Cell " + ++i);
            }
        }
        FileOutputStream fileOutputStream = new FileOutputStream("D:\\writeDocx.docx");
        document.write(fileOutputStream);
        document.close();
        fileOutputStream.close();

  A small example, similar to the doc one
Template
Result
  Er, please ignore the letters in the two screenshots. It is actually better to use ${} as the placeholder syntax, which is clearer, but I skipped it here for convenience.
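
  As a minimal sketch of the ${} idea, the following replaces ${key} tokens in a piece of text from a map of values, using only the standard library. The class and method names (PlaceholderFiller, fill) are made up for this example, and leaving unknown keys untouched is my own choice rather than anything POI prescribes.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fills ${key} placeholders in a string from a map.
final class PlaceholderFiller {
    // Matches ${...} and captures the key between the braces
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every ${key} that has a non-null value in the map; other placeholders stay as they are.
    static String fill(String text, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = value != null ? value : m.group(0);
            // quoteReplacement keeps "$" and "\" in the value from being treated as group references
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

  With POI, the same function could be applied to the text of each run (XWPFRun.getText(0) and setText(text, 0)) or to the paragraph text of a Range. Keep in mind that Word may split one placeholder across several runs, in which case a per-run replacement will not find it.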

Summary

  That turned out to be quite a lot of words. There are better ways to write some of this, but I have not used them yet, and a lot still needs improving. I will leave it here for now and update the post later.

Origin blog.csdn.net/qq_42909545/article/details/102565102