Problem Description:
A Java backend uses Apache POI to export Word documents. When two Word templates are merged, Word reports an error on opening the merged file.
Problems found:
After searching around, I finally found a solution in the blog post "java development docx download prompts: The file is corrupt and cannot be opened".
Troubleshooting:
That blog post provides the following helper method:
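import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Finds the content of the given tag.
 * @param xml   the XML text to search
 * @param label the tag name, e.g. "w:sectPr"
 * @return the number of matching tags as a string, or "" if there are none
 */
public static String regex(String xml, String label) {
    String context = "";
    // Regular expression: opening tag, content up to the first closing tag, closing tag
    String rgex = "<" + label + "[^>]*>((?:(?!<\\/" + label + ">)[\\s\\S])*)<\\/" + label + ">";
    Pattern pattern = Pattern.compile(rgex);
    Matcher m = pattern.matcher(xml);
    // There may be several matches; collect the content of each one
    List<String> list = new ArrayList<String>();
    while (m.find()) {
        list.add(m.group(1));
    }
    if (list.size() > 0) {
        // Define the output as needed; here it is the number of matches
        context = String.valueOf(list.size());
    }
    return context;
}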
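To try the counting idea without a Word file, here is a minimal, self-contained sketch. The class name `SectPrCounter`, its `count` method, and the sample XML string are made up for illustration and are not part of the original post. Unlike the helper above it returns an `int`, quotes the tag name, and skips self-closing tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not taken from the original blog post.
class SectPrCounter {

    // Counts complete <label ...>...</label> elements in an XML string.
    // Self-closing elements such as <label/> are not counted.
    static int count(String xml, String label) {
        String q = Pattern.quote(label);
        Pattern p = Pattern.compile("<" + q + "(?:\\s[^>]*)?(?<!/)>[\\s\\S]*?</" + q + ">");
        Matcher m = p.matcher(xml);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Simplified stand-in for the body XML of a Word document (assumed sample data)
        String body = "<w:body><w:p/><w:sectPr><w:pgSz w:w=\"11906\"/></w:sectPr></w:body>";
        System.out.println(count(body, "w:sectPr"));
    }
}
```

A healthy single-section document body should give a count of 1; anything higher after a merge is a sign that section properties have been duplicated.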
Next, use it during the POI merge to check how many w:sectPr tags the Word XML contains:
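import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.apache.xmlbeans.XmlOptions;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;

/**
 * Appends one document to another.
 * Added by houzw, 2019-06-26
 * @param src    the target document
 * @param append the document to append
 * @return the target document with the appended content
 * @throws Exception
 */
public static XWPFDocument mergeWord(XWPFDocument src, XWPFDocument append) throws Exception {
    // XWPFParagraph paragraph = src.createParagraph();
    // // Set a page break
    // paragraph.setPageBreak(true);
    CTBody src1Body = src.getDocument().getBody();
    CTBody src2Body = append.getDocument().getBody();
    List<XWPFPictureData> allPictures = append.getAllPictures();
    // Maps each picture's relation ID before the merge to its ID after the merge
    Map<String, String> map = new HashMap<>();
    for (XWPFPictureData picture : allPictures) {
        String before = append.getRelationId(picture);
        // Add the appended document's picture to the target document,
        // keeping the picture's own type instead of assuming PNG
        String after = src.addPictureData(picture.getData(), picture.getPictureType());
        map.put(before, after);
    }
    appendBody(src1Body, src2Body, map);
    return src;
}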
private static void appendBody(CTBody src, CTBody append, Map<String, String> map) throws Exception {
    XmlOptions optionsOuter = new XmlOptions();
    optionsOuter.setSaveOuter();
    String appendString = append.xmlText(optionsOuter);
    String srcString = src.xmlText();
    // Print how many w:sectPr tags the target document has before this merge
    String regex = regex(srcString, "w:sectPr");
    System.out.println(regex);
    // Split the target XML into its opening tag, inner content, and closing tag
    String prefix = srcString.substring(0, srcString.indexOf(">") + 1);
    String mainPart = srcString.substring(srcString.indexOf(">") + 1, srcString.lastIndexOf("<"));
    String suffix = srcString.substring(srcString.lastIndexOf("<"));
    // Inner content of the appended body, without its outer tag
    String addPart = appendString.substring(appendString.indexOf(">") + 1, appendString.lastIndexOf("<"));
    if (map != null && !map.isEmpty()) {
        // Replace the picture IDs in the XML string
        for (Map.Entry<String, String> entry : map.entrySet()) {
            addPart = addPart.replace(entry.getKey(), entry.getValue());
        }
    }
    // Concatenate the XML content of the two documents
    CTBody makeBody = CTBody.Factory.parse(prefix + mainPart + addPart + suffix);
    src.set(makeBody);
}
When three Word documents are merged, the printed output is 2, which means that after the first two documents are merged the result contains two w:sectPr tags. (The count is printed for the target document before each merge, so the 2 appears at the start of the second merge.)
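The duplication can be reproduced on plain strings, without POI. The sketch below mirrors the prefix / main part / added part / suffix splitting from appendBody; the class name `BodyConcatDemo`, its methods, and the simplified XML strings are assumptions for illustration only (real document XML carries namespaces and far more content).

```java
// Hypothetical demo for illustration; it only imitates the string handling of appendBody.
class BodyConcatDemo {

    // Inserts the inner content of appendXml just before the closing tag of srcXml.
    static String concat(String srcXml, String appendXml) {
        String prefix = srcXml.substring(0, srcXml.indexOf(">") + 1);
        String mainPart = srcXml.substring(srcXml.indexOf(">") + 1, srcXml.lastIndexOf("<"));
        String suffix = srcXml.substring(srcXml.lastIndexOf("<"));
        String addPart = appendXml.substring(appendXml.indexOf(">") + 1, appendXml.lastIndexOf("<"));
        return prefix + mainPart + addPart + suffix;
    }

    // Counts non-overlapping occurrences of token in text.
    static int countOccurrences(String text, String token) {
        int n = 0;
        for (int i = text.indexOf(token); i >= 0; i = text.indexOf(token, i + token.length())) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Assumed sample data: each body ends with its own section properties
        String docA = "<w:body><w:p>A</w:p><w:sectPr><w:pgSz/></w:sectPr></w:body>";
        String docB = "<w:body><w:p>B</w:p><w:sectPr><w:pgSz/></w:sectPr></w:body>";
        String merged = concat(docA, docB);
        System.out.println(merged);
        System.out.println(countOccurrences(merged, "<w:sectPr>"));
    }
}
```

Because each body carries its own w:sectPr, plain concatenation leaves one per source document in the merged body.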
Solution:
Remove the w:sectPr tag from the appended Word content, so that the merged document contains only one w:sectPr tag:
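// In appendBody, after appendString is obtained and before addPart is computed:
// strip every <w:sectPr ...>...</w:sectPr> element from the appended XML
String rgex = "<[\\s]*?w:sectPr[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?w:sectPr[\\s]*?>";
appendString = appendString.replaceAll(rgex, "");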
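The effect of this replacement can be checked on a plain string. The sketch below reuses the same pattern; the class name `SectPrStripper`, its `strip` method, and the sample XML are hypothetical and only serve as an illustration.

```java
// Hypothetical wrapper for illustration; the pattern itself is the one shown above.
class SectPrStripper {

    static final String SECT_PR =
            "<[\\s]*?w:sectPr[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?w:sectPr[\\s]*?>";

    // Removes every <w:sectPr ...>...</w:sectPr> element from the given XML text.
    static String strip(String xml) {
        return xml.replaceAll(SECT_PR, "");
    }

    public static void main(String[] args) {
        // Assumed sample data standing in for the appended document's body XML
        String appended = "<w:body><w:p>B</w:p><w:sectPr><w:pgSz w:w=\"11906\"/></w:sectPr></w:body>";
        System.out.println(strip(appended));
    }
}
```

The pattern assumes the element is written with separate opening and closing tags. A self-closing `<w:sectPr/>` would need separate handling, so it is worth checking the XML your templates actually produce.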
With that, the problem is solved! My view of the problem is inevitably one-sided, so I sincerely welcome corrections. If you have new ideas, leave me a message and we can discuss them.