Printing Chinese characters in pdfbox

Mirko :

I'm using the following set-up:

  • Java 11.0.1

  • pdfbox 2.0.15

Objective: Render a PDF that contains Chinese characters

Problem: java.lang.IllegalArgumentException: U+674E is not available in this font's encoding: WinAnsiEncoding

I already tried:

  • Using different fonts for Chinese character support. The latest one is NotoSansCJKtc-Regular.ttf

  • Setting the font to Unicode as described in "Java: Write national characters to PDF using PDFBox". However, the loadTTF method used there is deprecated.

  • Using Arial-Unicode-MS_4302.ttf

My code looks like this (shortened a bit):

try (InputStream pdfIn = inputStream;
     PDDocument pdfDocument = PDDocument.load(pdfIn)) {

  // Assumption: this declaration was cut when the snippet was shortened
  PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();

  PDFont formFont;
  // Check if Chinese characters are present
  if (!Util.containsHanScript(queryString)) {
    formFont = PDType0Font.load(pdfDocument,
        PdfReportGenerator.class.getResourceAsStream("LiberationSans-Regular.ttf"),
        false);
  } else {
    formFont = PDType0Font.load(pdfDocument,
        PdfReportGenerator.class.getResourceAsStream("NotoSansCJKtc-Regular.ttf"),
        false);
  }

  List<PDField> fields = acroForm.getFields();

  // Load fields into Map, keyed by partial field name
  Map<String, PDField> pdfFields = new HashMap<>();
  for (PDField field : fields) {
    String key = field.getPartialName();
    pdfFields.put(key, field);
  }

  PDField currentField = pdfFields.get("someFieldID");
  PDVariableText pdfield = (PDVariableText) currentField;

  // Register the font in the form's default resources and reference it in the DA string
  PDResources res = acroForm.getDefaultResources();
  String fontName = res.add(formFont).getName();
  String defaultAppearanceString = "/" + fontName + " 10 Tf 0 g";

  pdfield.setDefaultAppearance(defaultAppearanceString);
  pdfield.setValue("李柱");

  acroForm.flatten(fields, true);

  ByteArrayOutputStream pdfOut = new ByteArrayOutputStream();
  pdfDocument.save(pdfOut);
}
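The Util.containsHanScript helper called above is not shown in the post. A minimal sketch of what such a check might look like, assuming it only needs to detect whether any code point belongs to the Han script (the class name HanScriptCheck is made up for illustration):

```java
public class HanScriptCheck {

    // Returns true if at least one code point in the text is in the Han script.
    // Hypothetical stand-in for the Util.containsHanScript helper from the question.
    static boolean containsHanScript(String text) {
        if (text == null) {
            return false;
        }
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        // U+674E is the character named in the exception message
        System.out.println(containsHanScript("\u674E"));  // true
        System.out.println(containsHanScript("Mirko"));   // false
    }
}
```

Iterating over code points rather than chars keeps the check correct for Han characters outside the Basic Multilingual Plane.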

Expected result: Chinese characters in the PDF.

Actual result: java.lang.IllegalArgumentException: U+674E is not available in this font's encoding: WinAnsiEncoding

So my question is about how to best support rendering of Chinese characters with pdfbox. Any help is appreciated.

Tilman Hausherr :

The following code works for me; it uses the file from PDFBOX-4629:

// Load the template attached to PDFBOX-4629
PDDocument doc = PDDocument.load(new URL("https://issues.apache.org/jira/secure/attachment/12977270/Report_Template_DE.pdf").openStream());
PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
PDVariableText field = (PDVariableText) acroForm.getField("search_query");
List<PDField> fields = acroForm.getFields();
// Third argument false = embed the full font instead of a subset
PDFont font = PDType0Font.load(doc, new FileInputStream("c:/windows/fonts/arialuni.ttf"), false);

// Register the font in the form's default resources and reference it in the DA string
PDResources res = acroForm.getDefaultResources();
String fontName = res.add(font).getName();
String defaultAppearanceString = "/" + fontName + " 10 Tf 0 g";

field.setDefaultAppearance(defaultAppearanceString);
field.setValue("李柱");

acroForm.flatten(fields, true);
doc.save("saved.pdf");
doc.close();
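As background on the exception in the question: the message says the font in use has WinAnsiEncoding, a single-byte encoding that has no slot for U+674E. The sketch below illustrates the idea with plain Java and no PDFBox calls, under the assumption that Java's windows-1252 charset is a close enough approximation of PDF's WinAnsiEncoding for this purpose (the two are not identical). The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class WinAnsiCheck {

    // Returns true if every character of the text can be encoded in windows-1252,
    // used here only as an approximation of PDF's WinAnsiEncoding.
    static boolean fitsWindows1252(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsWindows1252("Mirko"));   // true
        System.out.println(fitsWindows1252("\u674E"));  // false
    }
}
```

If the exception still appears after setting the default appearance, one thing worth checking is whether the field ends up using a different, simple font than the Type 0 font that was loaded.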
