Reading and writing Word doc and docx files on Android with Apache POI

A recent project of mine needed to generate Word doc and docx files. After some searching on Baidu and Google, I found that the mainstream Java implementation is Apache's POI component. Besides POI there is at least one other implementation, but I didn't study it; interested readers can look into it.

For more about POI, see the official Apache POI website.

Now, on to the main topic!

Since the project only needs doc and docx, the following covers only those two components.

1. How to use POI components in Android Studio

According to the POI website, building with IntelliJ IDEA did not appear to be supported at the time (see the figure below), so here we download the jar files directly and import them into the project.

(Figure: the HOWTOBUILD page on the official website)

On the official website, under Overview -> Components, you can see that doc and docx files are handled by the HWPF and XWPF components respectively, and that HWPF and XWPF are shipped in the poi-scratchpad and poi-ooxml artifacts.

File type   Component   Maven artifactId
doc         HWPF        poi-scratchpad
docx        XWPF        poi-ooxml
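Because the two formats map to different components, a program that accepts arbitrary Word files has to choose between HWPF and XWPF. Below is a minimal sketch, assuming the decision is made from the file's first bytes rather than its extension: a legacy .doc is an OLE2 compound file that starts with D0 CF 11 E0 A1 B1 1A E1, while a .docx is a ZIP package that starts with "PK\3\4". The class and method names are hypothetical, and a ZIP signature alone does not prove the file is a docx (xlsx and jar files are ZIPs too).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper: suggests HWPF or XWPF from a file's leading bytes.
public class WordFormatSniffer {

    // OLE2 compound-file signature: first 8 bytes of a legacy .doc (also .xls/.ppt)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature "PK\3\4": a .docx is a ZIP package
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Suggests "HWPF" (doc), "XWPF" (docx) or "UNKNOWN" from the leading bytes. */
    public static String suggestComponent(byte[] header) {
        if (header.length >= 8 && Arrays.equals(Arrays.copyOf(header, 8), OLE2_MAGIC)) {
            return "HWPF";
        }
        if (header.length >= 4 && Arrays.equals(Arrays.copyOf(header, 4), ZIP_MAGIC)) {
            return "XWPF";
        }
        return "UNKNOWN";
    }

    /** Reads up to 8 bytes from the stream (without closing it) and classifies them. */
    public static String suggestComponent(InputStream in) throws IOException {
        byte[] header = new byte[8];
        int read = 0;
        // read() may return fewer bytes than asked for, so loop until full or EOF
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) {
                break;
            }
            read += n;
        }
        return suggestComponent(Arrays.copyOf(header, read));
    }

    public static void main(String[] args) throws IOException {
        byte[] docxStart = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
        System.out.println(suggestComponent(new ByteArrayInputStream(docxStart))); // XWPF
    }
}
```

The stream version consumes the bytes it inspects, so in an app you would either open a second FileInputStream for POI or wrap the input in a BufferedInputStream, call mark(8) before sniffing and reset() afterwards.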



Download

Go to the Apache POI download page and pick the latest version. At the time of writing, clicking "The latest beta release is Apache POI 3.16-beta2" leads to poi-bin-3.16-beta2-20170202.tar.gz; click that file, choose a mirror, and the download starts.

Note:
On Linux, choose the .tar.gz archive.
On Windows, choose the .zip archive.



Decompress

Unzip the downloaded archive and you will get the following files:

File/folder                         Purpose
docs                                Documentation (API docs, usage guides and version information)
lib                                 Third-party jars that the doc functionality depends on
ooxml-lib                           Third-party jars that the docx functionality depends on
LICENSE                             License
NOTICE                              Notice
poi-3.16-beta2.jar                  Core POI library; required by poi-scratchpad-3.16-beta2.jar
poi-examples-3.16-beta2.jar         POI example code
poi-excelant-3.16-beta2.jar         ExcelAnt (Ant tasks for testing Excel workbooks)
poi-ooxml-3.16-beta2.jar            docx functionality
poi-ooxml-schemas-3.16-beta2.jar    Required by poi-ooxml-3.16-beta2.jar
poi-scratchpad-3.16-beta2.jar       doc functionality



Import

If you are not sure how to import jar files, look up a tutorial on importing jar packages in Android Studio.
1. doc
For doc files, copy the jars in the lib folder, together with poi-3.16-beta2.jar and poi-scratchpad-3.16-beta2.jar, into the libs directory of the Android project. (I did not copy junit-4.12.jar or log4j-1.2.17.jar from the lib folder into my project and saw no exceptions, so they can be left out.)


2. docx
For docx, you need to import the jars in the lib folder, poi-3.16-beta2.jar, poi-ooxml-3.16-beta2.jar, poi-ooxml-schemas-3.16-beta2.jar, and the jars in the ooxml-lib folder. I did not implement this: the build kept reporting "Warning: Ignoring InnerClasses attribute for an anonymous inner class", doc already met most of my needs, and importing that many jars noticeably increases the APK size.
Interested readers can dig into it further.


2. Reading and writing doc files

The HWPF module in Apache POI is dedicated to reading and generating doc files. In HWPF, a Word doc document is represented by HWPFDocument. Before looking at the code, it helps to understand several concepts in HWPFDocument:

Name          Meaning
Range         A range of the document: the whole document, a section, a paragraph, or a run of text with uniform character attributes (CharacterRun)
Section       A section of the Word document; a document can consist of multiple sections
Paragraph     A paragraph of the Word document; a section can consist of multiple paragraphs
CharacterRun  A run of text sharing the same attributes; a paragraph can consist of multiple CharacterRuns
Table         A table
TableRow      A row of a table
TableCell     A cell of a table

Note: Section, Paragraph, CharacterRun and Table all inherit from Range.


Note before reading and writing: the HWPFDocument class in Apache POI can only read and write genuine .doc files. If you produce a "doc" file by renaming another file's extension, or by simply creating a file with a .doc name, POI fails with the error "Your file appears not to be a valid OLE2 document":

Invalid header signature; read 0x7267617266202E31, expected 0xE11AB1A1E011CFD0 - Your file appears not to be a valid OLE2 document 
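The two hex values in this message can be decoded. The sketch below assumes, as POI's header check does, that the first 8 bytes of the file are read as a little-endian long. Turning the "read" value back into bytes shows that the file began with the plain text "1. fragr", i.e. it was a text file with a .doc name, while the "expected" value is the OLE2 signature D0 CF 11 E0 A1 B1 1A E1. HeaderSignatureDecoder and its methods are hypothetical helpers, not POI API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for interpreting POI's "Invalid header signature" message.
public class HeaderSignatureDecoder {

    /** Converts a signature from the error message back into the 8 bytes at the start of the file. */
    public static byte[] toFileBytes(long signature) {
        // The header is read as a little-endian long, so write it back the same way
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(signature).array();
    }

    /** Renders bytes as ASCII text, using '.' for non-printable bytes. */
    public static String asPrintable(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int c = b & 0xFF;
            sb.append(c >= 0x20 && c < 0x7F ? (char) c : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "read" value from the error: the file starts with plain text
        System.out.println(asPrintable(toFileBytes(0x7267617266202E31L))); // 1. fragr
        // "expected" value: the OLE2 compound-file signature
        for (byte b : toFileBytes(0xE11AB1A1E011CFD0L)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```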

doc read

There are two ways to read the doc file
(a) read the file through WordExtractor
(b) read the file through HWPFDocument

In everyday use we rarely need to read information from Word files; far more often we write content into them. POI offers the two ways of reading Word doc files listed above, and WordExtractor itself still obtains its information through an HWPFDocument internally.

Read with WordExtractor

With WordExtractor we can only read the text of the file and some document-level properties (such as the author or title); the formatting attributes of the content itself are not accessible. To read those, use HWPFDocument. Here is an example of reading a file with WordExtractor:

// Reading a file with WordExtractor
import android.os.Environment;
import android.util.Log;

import org.apache.poi.hpsf.DocumentSummaryInformation;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.hwpf.extractor.WordExtractor;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class WordExtractorTest {

    private final String PATH = Environment.getExternalStorageDirectory().getAbsolutePath() + "/" + "test.doc";
    private static final String TAG = "WordExtractorTest";

    private void log(Object o) {
        Log.d(TAG, String.valueOf(o));
    }

    public void testReadByExtractor() throws Exception {
        InputStream is = new FileInputStream(PATH);
        WordExtractor extractor = new WordExtractor(is);
        // Print all the text of the Word document
        log(extractor.getText());
        log(extractor.getTextFromPieces());
        // Print the header content
        log("Header: " + extractor.getHeaderText());
        // Print the footer content
        log("Footer: " + extractor.getFooterText());
        // Print the document metadata, including the author, modification time, etc.
        log(extractor.getMetadataTextExtractor().getText());
        // Get the text of each paragraph
        String[] paraTexts = extractor.getParagraphText();
        for (int i = 0; i < paraTexts.length; i++) {
            log("Paragraph " + (i + 1) + " : " + paraTexts[i]);
        }
        // Print summary information (author, character count, page count, ...)
        printInfo(extractor.getSummaryInformation());
        // Print document summary information (category, company)
        printInfo(extractor.getDocSummaryInformation());
        closeStream(is);
    }

    /**
     * Print SummaryInformation
     * @param info
     */
    private void printInfo(SummaryInformation info) {
        // Author
        log(info.getAuthor());
        // Character count
        log(info.getCharCount());
        // Page count
        log(info.getPageCount());
        // Title
        log(info.getTitle());
        // Subject
        log(info.getSubject());
    }

    /**
     * Print DocumentSummaryInformation
     * @param info
     */
    private void printInfo(DocumentSummaryInformation info) {
        // Category
        log(info.getCategory());
        // Company
        log(info.getCompany());
    }

    /**
     * Close the input stream
     * @param is
     */
    private void closeStream(InputStream is) {
        if (is != null) {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

Read using HWPFDocument

HWPFDocument represents the Word document itself and is more powerful than WordExtractor: with it we can read tables, lists and so on, and also add, modify and delete content. These additions, modifications and deletions only change the HWPFDocument in memory, not the file on disk. To make them take effect, call HWPFDocument's write method to send the modified document to an output stream: the output stream of the original file, of a new file (equivalent to Save As), or any other output stream. Here is an example of reading a file with HWPFDocument:

// Reading a file with HWPFDocument
import android.os.Environment;
import android.util.Log;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Bookmark;
import org.apache.poi.hwpf.usermodel.Bookmarks;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Section;
import org.apache.poi.hwpf.usermodel.Table;
import org.apache.poi.hwpf.usermodel.TableCell;
import org.apache.poi.hwpf.usermodel.TableIterator;
import org.apache.poi.hwpf.usermodel.TableRow;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class HWPFDocumentTest {

    private final String PATH = Environment.getExternalStorageDirectory().getAbsolutePath() + "/" + "test.doc";
    private static final String TAG = "HWPFDocumentTest";

    private void log(Object o) {
        Log.d(TAG, String.valueOf(o));
    }

    public void testReadByDoc() throws Exception {
        InputStream is = new FileInputStream(PATH);
        HWPFDocument doc = new HWPFDocument(is);
        // Print bookmark information
        this.printInfo(doc.getBookmarks());
        // Print the text
        log(doc.getDocumentText());
        Range range = doc.getRange();
        // Read the whole range
        this.printInfo(range);
        // Read tables
        this.readTable(range);
        // Read lists
        this.readList(range);
        this.closeStream(is);
    }

    /**
     * Close the input stream
     * @param is
     */
    private void closeStream(InputStream is) {
        if (is != null) {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    /**
     * Print bookmark information
     * @param bookmarks
     */
    private void printInfo(Bookmarks bookmarks) {
        int count = bookmarks.getBookmarksCount();
        log("Bookmark count: " + count);
        Bookmark bookmark;
        for (int i = 0; i < count; i++) {
            bookmark = bookmarks.getBookmark(i);
            log("Name of bookmark " + (i + 1) + ": " + bookmark.getName());
            log("Start: " + bookmark.getStart());
            log("End: " + bookmark.getEnd());
        }
    }

    /**
     * Read tables.
     * Every carriage return marks a paragraph, so in a table each cell contains
     * at least one paragraph, and every row ends with a paragraph.
     * @param range
     */
    private void readTable(Range range) {
        // Iterate over the tables inside the range
        TableIterator tableIter = new TableIterator(range);
        Table table;
        TableRow row;
        TableCell cell;
        while (tableIter.hasNext()) {
            table = tableIter.next();
            int rowNum = table.numRows();
            for (int j = 0; j < rowNum; j++) {
                row = table.getRow(j);
                int cellNum = row.numCells();
                for (int k = 0; k < cellNum; k++) {
                    cell = row.getCell(k);
                    // Print the cell text
                    log(cell.text().trim());
                }
            }
        }
    }

    /**
     * Read lists
     * @param range
     */
    private void readList(Range range) {
        int num = range.numParagraphs();
        Paragraph para;
        for (int i = 0; i < num; i++) {
            para = range.getParagraph(i);
            if (para.isInList()) {
                log("list: " + para.text());
            }
        }
    }

    /**
     * Print a Range
     * @param range
     */
    private void printInfo(Range range) {
        // Number of paragraphs
        int paraNum = range.numParagraphs();
        log(paraNum);
        for (int i = 0; i < paraNum; i++) {
            log("Paragraph " + (i + 1) + ": " + range.getParagraph(i).text());
            if (i == (paraNum - 1)) {
                this.insertInfo(range.getParagraph(i));
            }
        }
        int secNum = range.numSections();
        log(secNum);
        Section section;
        for (int i = 0; i < secNum; i++) {
            section = range.getSection(i);
            log(section.getMarginLeft());
            log(section.getMarginRight());
            log(section.getMarginTop());
            log(section.getMarginBottom());
            log(section.getPageHeight());
            log(section.text());
        }
    }

    /**
     * Insert content into the Range; this only changes the in-memory document
     * @param range
     */
    private void insertInfo(Range range) {
        range.insertAfter("Hello");
    }
}

doc write

Writing files using HWPFDocument

When writing Word doc files with POI, we must start from an existing doc file, because writing goes through HWPFDocument and an HWPFDocument is always based on a doc file. The usual practice is therefore to prepare a blank doc file on disk, create an HWPFDocument from it, add new content to the HWPFDocument, and then write it out to another doc file, which effectively generates a Word doc file with POI.

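// Writing strings into a Word document
// (a method for a class like HWPFDocumentTest above: it uses that class's PATH and closeStream(InputStream))
public void testWrite() throws Exception {
    InputStream is = new FileInputStream(PATH);
    HWPFDocument doc = new HWPFDocument(is);
    // The constructor has already read the whole file into memory, so the input can be closed here
    this.closeStream(is);

    // Get the Range covering the document
    Range range = doc.getRange();
    for (int i = 0; i < 100; i++) {
        if (i % 2 == 0) {
            range.insertAfter("Hello " + i + "\n");      // insert a String at the end of the document
        } else {
            range.insertBefore("      Bye " + i + "\n"); // insert a String at the start of the document
        }
    }
    // Write back to the original file
    OutputStream os = new FileOutputStream(PATH);
    // Or write to another file instead (equivalent to Save As); otherPath is a path of your choice
    //OutputStream os = new FileOutputStream(otherPath);
    doc.write(os);
    os.close();
}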

In practice, though, the files we generate are usually of one fixed type: the layout is always the same and only some fields differ. So instead of building the whole document through HWPFDocument, we first create a Word document on disk whose content is the document we want to generate, with the variable parts written as placeholders such as "${paramName}". To generate a file from some data, we open an HWPFDocument on that template, call Range's replaceText() method to replace each placeholder with its value, and then write the HWPFDocument to a new output stream. This approach is the more common one because it reduces our workload and keeps the document's formatting clear. Let's build an example using this method.

Suppose we have a template like this:

(Figure: doc template)

We then use this file as a template, replace the variables in it with the relevant data, and write the resulting document to another doc file, as follows:

import android.os.Environment;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;
import org.junit.Test;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.text.SimpleDateFormat;
import java.util.Date;

public class HWPFTemplateTest {

    /**
     * Use a doc file as a template, replace its placeholders, then write the result to the target file.
     * @throws Exception
     */
    @Test
    public void testTemplateWrite() throws Exception {
        String templatePath = Environment.getExternalStorageDirectory().getAbsolutePath() + "/" + "template.doc";
        String targetPath = Environment.getExternalStorageDirectory().getAbsolutePath() + "/" + "target.doc";
        InputStream is = new FileInputStream(templatePath);
        HWPFDocument doc = new HWPFDocument(is);
        Range range = doc.getRange();
        // Replace ${reportDate} within the range with the current date
        range.replaceText("${reportDate}", new SimpleDateFormat("yyyy-MM-dd").format(new Date()));
        range.replaceText("${appleAmt}", "100.00");
        range.replaceText("${bananaAmt}", "200.00");
        range.replaceText("${totalAmt}", "300.00");
        OutputStream os = new FileOutputStream(targetPath);
        // Write the doc to the output stream
        doc.write(os);
        this.closeStream(os);
        this.closeStream(is);
    }

    /**
     * Close the input stream
     * @param is
     */
    private void closeStream(InputStream is) {
        if (is != null) {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    /**
     * Close the output stream
     * @param os
     */
    private void closeStream(OutputStream os) {
        if (os != null) {
            try {
                os.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
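With several placeholders it is easy to forget one, and a placeholder without a value simply stays in the generated file. Here is a minimal pure-Java sketch, assuming placeholders follow the ${name} pattern used above and that the template text has been obtained beforehand (for example from range.text()): it lists the placeholder names and reports those that have no value. TemplatePlaceholders and its methods are hypothetical names; only the commented replaceText loop touches POI.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for checking a ${name}-style template before filling it.
public class TemplatePlaceholders {

    // Matches ${name}; group 1 captures the name without the braces
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    /** Returns the distinct placeholder names in order of first appearance. */
    public static Set<String> findNames(String templateText) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = PLACEHOLDER.matcher(templateText);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** Returns the placeholder names that have no entry in the values map. */
    public static List<String> missingValues(String templateText, Map<String, String> values) {
        List<String> missing = new ArrayList<>();
        for (String name : findNames(templateText)) {
            if (!values.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String text = "Date: ${reportDate} Apple: ${appleAmt} Total: ${totalAmt}";
        Map<String, String> values = new HashMap<>();
        values.put("reportDate", "2017-01-01");
        values.put("appleAmt", "100.00");
        System.out.println(missingValues(text, values)); // [totalAmt]
        // Once nothing is missing, the HWPF step would be (not compiled here):
        // for (Map.Entry<String, String> e : values.entrySet())
        //     range.replaceText("${" + e.getKey() + "}", e.getValue());
    }
}
```

Keeping the check as plain string logic means it can be unit-tested without a .doc file on the device.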

3. Reading and writing docx files

POI reads and writes Word docx files through the XWPF module, whose core class is XWPFDocument. An XWPFDocument represents a docx document and can be used both to read and to write it. XWPFDocument mainly contains the following objects:

Object          Meaning
XWPFParagraph   A paragraph
XWPFRun         A run of text with the same properties
XWPFTable       A table
XWPFTableRow    A row of a table
XWPFTableCell   A cell of a table

Unlike HWPFDocument, XWPFDocument can create a new docx file from scratch, without needing an existing template file.
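To show the XWPF objects listed above in use, here is a sketch that builds a docx entirely in memory, with no template, and reads it back. It assumes poi-ooxml, poi-ooxml-schemas and the ooxml-lib jars are on the classpath, and it is written for a desktop JVM: it has not been tried on Android, where the docx jars produced the build warning mentioned in the import section. XwpfSketch and its methods are hypothetical names; the POI calls (createParagraph, createRun, setText, setBold, createTable, write) are standard XWPF usermodel API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;

// Hypothetical sketch: create a docx without a template, then read it back.
public class XwpfSketch {

    /** Builds a small docx in memory: one formatted paragraph and a 2 x 2 table. */
    public static byte[] createDocx() {
        XWPFDocument doc = new XWPFDocument(); // no template file needed
        try {
            // An XWPFParagraph holds XWPFRuns; each run carries its own formatting
            XWPFParagraph para = doc.createParagraph();
            XWPFRun label = para.createRun();
            label.setBold(true);
            label.setText("Report: ");
            XWPFRun value = para.createRun();
            value.setText("fruit sales");

            // XWPFTable -> XWPFTableRow -> XWPFTableCell
            XWPFTable table = doc.createTable(2, 2);
            table.getRow(0).getCell(0).setText("apple");
            table.getRow(0).getCell(1).setText("100.00");
            table.getRow(1).getCell(0).setText("banana");
            table.getRow(1).getCell(1).setText("200.00");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            doc.write(out);
            return out.toByteArray();
        } catch (IOException e) {
            // Wrapped so callers of this sketch need no checked-exception handling
            throw new IllegalStateException("Could not build the docx", e);
        } finally {
            closeQuietly(doc);
        }
    }

    /** Reopens a docx from bytes and returns the text of its first body paragraph. */
    public static String firstParagraphText(byte[] docx) {
        XWPFDocument doc = null;
        try {
            doc = new XWPFDocument(new ByteArrayInputStream(docx));
            return doc.getParagraphs().get(0).getText();
        } catch (IOException e) {
            throw new IllegalStateException("Could not read the docx", e);
        } finally {
            closeQuietly(doc);
        }
    }

    private static void closeQuietly(XWPFDocument doc) {
        if (doc == null) {
            return;
        }
        try {
            doc.close();
        } catch (IOException ignored) {
            // nothing useful to do in this sketch
        }
    }

    public static void main(String[] args) {
        System.out.println(firstParagraphText(createDocx()));
    }
}
```

To save to a file instead, pass a FileOutputStream to doc.write(...) in place of the ByteArrayOutputStream.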

For details, see another developer's article on reading and writing docx files with POI.


4. Summary

Suggestions and corrections of any errors in this article are welcome. Thank you for your support.

WeChat public account: Yuan Shi Xoong

Follow the account for more high-quality technical articles.


Origin http://43.154.161.224:23101/article/api/json?id=325856371&siteId=291194637