Preface

There are plenty of guides online for ordinary HTML-to-PDF conversion. My case was a pile of problems and constraints all at once:

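Fetch HTML from an endpoint and convert it to PDF
No plugins: nothing can be installed on the server
The HTML is not valid XML, so approaches that parse it as XML (render-based) simply don't work
The HTML contains images that don't show up once converted to PDF
Worse, the "images" are not really images but JSP pages behind a login interceptor
Fonts come out wrong in the PDF (still unsolved)
JS in the HTML manipulates tags, which affects what gets rendered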

This article walks through how I tackled each problem and the details involved; the complete code is at the Gitee link at the end.

1. Download the HTML and its embedded JS to the local machine

First, request the main page to obtain the session cookie

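import static org.springframework.http.HttpHeaders.SET_COOKIE;

RestTemplate restTemplate = new RestTemplate();
String url = "http://ip:port/easp/easPrint?boeHeaderId=35687735&type=azBoe";
ResponseEntity<String> entity = restTemplate.getForEntity(url, String.class);
List<String> cookies = entity.getHeaders().get(SET_COOKIE);
// Keep only "name=value"; attributes such as Path and HttpOnly don't belong in a Cookie request header
String cookie = cookies.get(0).split(";", 2)[0];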
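The snippet above forwards only the first Set-Cookie header. If the server sets more than one cookie, a browser would send all of them as `name=value` pairs in a single Cookie header. A minimal stdlib sketch of that, assuming the raw Set-Cookie values are available as a list (the class name `CookieHeaders` and the sample header values are hypothetical):

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CookieHeaders {

    // Fold raw Set-Cookie response headers into one "Cookie" request header value.
    static String toCookieHeader(List<String> setCookieHeaders) {
        if (setCookieHeaders == null || setCookieHeaders.isEmpty()) {
            return "";
        }
        List<String> pairs = new ArrayList<>();
        for (String header : setCookieHeaders) {
            // HttpCookie.parse keeps name/value and drops attributes such as Path and HttpOnly
            for (HttpCookie c : HttpCookie.parse(header)) {
                pairs.add(c.getName() + "=" + c.getValue());
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        // Hypothetical headers, shaped like what a servlet container typically sends
        List<String> headers = Arrays.asList(
                "JSESSIONID=abc123; Path=/easp; HttpOnly",
                "lang=zh; Path=/");
        System.out.println(toCookieHeader(headers)); // JSESSIONID=abc123; lang=zh
    }
}
```

The resulting string could be passed as the `cookie` argument of `download(...)`.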

Parse the HTML and collect the attributes of every img tag

This uses jsoup, which is powerful and can also modify the HTML:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

The code below reads the attributes of every img tag, then downloads each image and inlines it:

// Excerpt of the loop in analysis() below: header, path, filePath, cookie and i are defined there
Document document = Jsoup.connect(url).get();
Elements images = document.getElementsByTag("img");

for (Element img : images) {
    String image = img.attr("src");
    String id = img.attr("id");
    System.out.println("image src::" + image);
    System.out.println("id::" + id);
    String allImage = header + image;

    // The "image" is really a JSP behind the login interceptor, so it is saved under a .jsp name
    String realPath = path + "/" + i + ".jsp";
    download(allImage, realPath, cookie);
    replaceTxtByStr(filePath, image, file2Base64(realPath));
    i++;
}
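Gluing `header + image` works when `src` is relative to `/easp/`, but a root-relative `src` such as `/easp/x.jsp` would come out with the path doubled. A sketch that resolves each `src` against the page URL with `java.net.URI` instead (the host `10.0.0.1:8000` stands in for `ip:port`, and `ImageUrls` is a hypothetical name):

```java
import java.net.URI;

class ImageUrls {

    // Resolve an img src against the URL of the page it came from.
    // Note: URI.create throws IllegalArgumentException for illegal characters such as spaces.
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        // Hypothetical host standing in for http://ip:port
        String page = "http://10.0.0.1:8000/easp/easPrint?boeHeaderId=35687735&type=azBoe";
        // Relative src: resolved next to easPrint, under /easp/
        System.out.println(resolve(page, "barcode.jsp?billCode=B001"));
        // Root-relative src: used as-is, no doubled /easp/
        System.out.println(resolve(page, "/easp/images/logo.jsp"));
    }
}
```

When the document comes from `Jsoup.connect(url).get()`, jsoup's `img.absUrl("src")` performs the same resolution against the document's base URI.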

Download remote files to the local disk

Nothing special here; just remember to send the cookie obtained earlier.

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

// Download a remote file to a local path, sending the session cookie
private void download(String httpUrl, String fileName, String cookie) throws IOException {
    File file = new File(fileName);
    if (file.exists()) {
        // Already downloaded: skip
        return;
    }
    URL url = new URL(httpUrl);
    HttpURLConnection http = (HttpURLConnection) url.openConnection();
    http.setConnectTimeout(3000);
    http.setReadTimeout(10000);
    // Browser-like headers so the request is not rejected
    http.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
    http.setRequestProperty("Accept-Language", "zh-CN,zh;q=0.9");
    http.setRequestProperty("Cache-Control", "max-age=0");
    http.setRequestProperty("Upgrade-Insecure-Requests", "1");
    http.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36");
    // The session cookie is what gets the image JSPs past the login interceptor
    http.setRequestProperty("Cookie", cookie);
    // Not sent: Accept-Encoding (HttpURLConnection does not gunzip the body for you),
    // and Host/Connection, which HttpURLConnection treats as restricted and ignores by default
    try {
        int code = http.getResponseCode();
        if (code != HttpURLConnection.HTTP_OK) {
            throw new IOException("HTTP " + code + " for " + httpUrl);
        }
        try (InputStream in = http.getInputStream();
             OutputStream out = new FileOutputStream(file)) {
            byte[] buff = new byte[1024 * 10];
            int len;
            while ((len = in.read(buff)) != -1) {
                out.write(buff, 0, len);
            }
        }
    } finally {
        http.disconnect();
    }
}
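`HttpURLConnection` does not decompress response bodies: if a request advertises `Accept-Encoding: gzip` and the server honours it, the bytes written to disk stay compressed and are useless for the later steps. If compression is wanted anyway, a minimal sketch (the class name `BodyCopier` is hypothetical; only `gzip` is handled, `deflate` is left out) that checks `Content-Encoding` before copying:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class BodyCopier {

    // Copy a response body to out, gunzipping it when contentEncoding is "gzip".
    // Closes the input stream; the caller owns (and closes) out. Returns the bytes written.
    static long copyBody(InputStream raw, String contentEncoding, OutputStream out) throws IOException {
        try (InputStream in = "gzip".equalsIgnoreCase(contentEncoding) ? new GZIPInputStream(raw) : raw) {
            byte[] buf = new byte[8192];
            long total = 0;
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
                total += len;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a gzip-compressed response body in memory
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream zip = new GZIPOutputStream(gz)) {
            zip.write("<html>ok</html>".getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyBody(new ByteArrayInputStream(gz.toByteArray()), "gzip", out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8)); // <html>ok</html>
    }
}
```

With a real connection the call would look like `copyBody(http.getInputStream(), http.getContentEncoding(), fileOut)`.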

2. Process the HTML file

A general-purpose method for replacing text in a file

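jsoup can modify HTML too, but once it has modified the page the JS code gets overwritten, so I had to take a different route: plain text replacement on the file.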

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Replaces every occurrence of a string in a text file, in place.
 *
 * @param filePath   file to modify
 * @param oldStr     literal text to find
 * @param replaceStr replacement text
 */
public void replaceTxtByStr(String filePath, String oldStr, String replaceStr) {
    Path file = Paths.get(filePath);
    try {
        // Assumption: the file is UTF-8; use the page's real charset (e.g. GBK) if it differs.
        // The platform default charset would behave differently on Windows and Linux.
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        // String.replace is literal (no regex) and also catches matches that span lines
        content = content.replace(oldStr, replaceStr);
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}
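`replaceTxtByStr` relies on `String.replace`, which matches literally. That matters here because image paths such as `barcode.jsp?billCode=...` contain regex metacharacters. If you ever switch to `String.replaceAll` (say, to reuse a regex-based utility), both arguments need quoting. A small sketch (the class name `LiteralReplace` is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplace {

    // replaceAll treats its arguments as regex and replacement syntax; quoting makes both literal.
    static String replaceAllLiteral(String text, String target, String replacement) {
        return text.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String html = "<img id=\"barCodeId\" src=\"barcode.jsp?billCode=B001\">";
        String src = "barcode.jsp?billCode=B001";
        String dataUri = "data:image/png;base64,iVBORw==";

        // Unquoted, "?" makes the preceding "p" optional, so the pattern never matches
        System.out.println(html.replaceAll(src, dataUri).equals(html)); // true
        // String.replace and the quoted replaceAll both treat the text literally
        System.out.println(html.replace(src, dataUri));
        System.out.println(replaceAllLiteral(html, src, dataUri));
    }
}
```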

Converting images to Base64

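Only images embedded in the HTML as Base64 make it into the PDF. Watch one detail: the Base64 string produced on the Java side and the form HTML expects differ in their header. HTML needs a data-URI prefix such as data:image/png;base64, in front of the encoded bytes.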

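import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

private String file2Base64(String filePath) {
    try {
        // readAllBytes reads the whole file; InputStream.available() is only an estimate
        byte[] data = Files.readAllBytes(Paths.get(filePath));
        // HTML needs the data-URI prefix in front of the plain Base64 text.
        // Assumes the JSP returns a PNG; change the MIME type if it serves another format.
        return "data:image/png;base64," + Base64.getEncoder().encodeToString(data);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}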
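`file2Base64` always labels the payload `image/png`, yet nothing here states what the JSP actually returns. A hedged sketch that picks the MIME type from the file's leading "magic" bytes (only the PNG, JPEG and GIF signatures are checked, anything else falls back to PNG; the class name `DataUris` is hypothetical):

```java
import java.util.Base64;

class DataUris {

    // Guess the image MIME type from the first bytes of the file.
    static String sniffImageType(byte[] data) {
        if (startsWith(data, 0x89, 'P', 'N', 'G')) {
            return "image/png";
        }
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) {
            return "image/jpeg";
        }
        if (startsWith(data, 'G', 'I', 'F', '8')) {
            return "image/gif";
        }
        // Assumption: unknown formats fall back to PNG, matching file2Base64
        return "image/png";
    }

    // Build an HTML-ready data URI: "data:<mime>;base64,<payload>"
    static String toDataUri(byte[] data) {
        return "data:" + sniffImageType(data) + ";base64," + Base64.getEncoder().encodeToString(data);
    }

    // Compare the leading bytes (as unsigned values) with the given signature.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

`DataUris.toDataUri(Files.readAllBytes(Paths.get(realPath)))` could then stand in for `file2Base64(realPath)`.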

Replacing the src paths in the HTML with Base64

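Order matters here: the HTML has to be downloaded before it can be processed, and each JSP file has to be downloaded before it can be converted to Base64.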

private String analysis(String path, String url, String cookie) throws Exception {
    // Local path of the downloaded HTML
    String filePath = path + "/orderPage.html";
    String key = "/easp/";
    Document document = Jsoup.connect(url).get();
    Elements images = document.getElementsByTag("img");
    download(url, filePath, cookie);
    // TODO: remove the JS statement that rewrites the barcode image's src
    // (the inner quotes must be escaped, otherwise this line does not compile)
    replaceTxtByStr(filePath,
            "document.getElementById(\"barCodeId\").src=strs[0]+\"barcode.jsp?billCode=\"+boeNum;", "");
    // Base URL for the images, e.g. "http://ip:port/easp/"
    String[] arr = url.split(key);
    String header = arr[0] + key;
    // Walk through every img tag: download it, then inline it as Base64
    int i = 0;
    for (Element img : images) {
        String image = img.attr("src");
        String id = img.attr("id");
        System.out.println("image src::" + image);
        System.out.println("id::" + id);
        String allImage = header + image;

        // The image is really a JSP behind the login interceptor, saved under a .jsp name
        String realPath = path + "/" + i + ".jsp";
        download(allImage, realPath, cookie);
        replaceTxtByStr(filePath, image, file2Base64(realPath));
        i++;
    }

    return filePath;
}
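`analysis()` rewrites the whole file once per img tag, and a `src` that appears twice is downloaded again under a new index. A possible variant, sketched under the assumption that `src` attributes are written in double quotes (the class and method names are hypothetical): collect the distinct `src` values, convert each once, and swap them in memory before writing the file a single time. One caveat for either approach: jsoup's `attr()` returns the entity-decoded value, so a `src` written as `...&amp;...` in the raw HTML will not match the file text literally.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class InlineImages {

    // Swap every distinct src for its data URI in memory; toDataUri runs once per distinct src.
    static String inline(String html, List<String> srcs, Function<String, String> toDataUri) {
        Map<String, String> dataUris = new LinkedHashMap<>();
        for (String src : srcs) {
            // Skip empty values and images that are already inlined
            if (!src.isEmpty() && !src.startsWith("data:")) {
                dataUris.computeIfAbsent(src, toDataUri);
            }
        }
        String result = html;
        for (Map.Entry<String, String> e : dataUris.entrySet()) {
            // Match the quoted attribute value so "a.jsp" cannot hit inside "data.jsp"
            result = result.replace("\"" + e.getKey() + "\"", "\"" + e.getValue() + "\"");
        }
        return result;
    }
}
```

In `analysis()`, `srcs` would be the `attr("src")` values collected by jsoup and `toDataUri` would wrap the download plus Base64 step; since `download` throws a checked exception, the lambda would have to wrap it.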

3. Convert to PDF

From here it's just a call to iText's HtmlConverter.

One detail: the font directory is currently the Windows one, but the service runs on Linux, so this still needs to be handled.

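import com.itextpdf.html2pdf.ConverterProperties;
import com.itextpdf.html2pdf.HtmlConverter;
import com.itextpdf.html2pdf.resolver.font.DefaultFontProvider;
import com.itextpdf.layout.font.FontProvider;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public class HtmlToPdf {

    public void htmlToPdf() throws Exception {
        // Forward slashes: unescaped backslashes ("\w", "\d", ...) are invalid escapes in Java strings
        String path = "D:/workspace/demo/src/main/resources/1656666705064/orderPage.html";
        String destPath = "D:/workspace/demo/src/main/resources/1656666705064/template.pdf";
        ConverterProperties converterProperties = new ConverterProperties();
        FontProvider dfp = new DefaultFontProvider();
        // Register system fonts (Windows path; the Linux server needs its own directory)
        dfp.addDirectory("C:/Windows/Fonts");
        converterProperties.setFontProvider(dfp);

        try (InputStream in = new FileInputStream(path);
             OutputStream out = new FileOutputStream(destPath)) {
            HtmlConverter.convertToPdf(in, out, converterProperties);
        }
    }

    public static void main(String[] args) throws Exception {
        new HtmlToPdf().htmlToPdf();
    }
}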
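The font note above means the directory should be chosen at runtime rather than hard-coded. A sketch that maps `os.name` to common system font directories and keeps only those that exist (the directory list is an assumption and varies by distribution; `FontDirs` is a hypothetical name):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class FontDirs {

    // Candidate system font directories for a given os.name value.
    static List<String> candidatesFor(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return Arrays.asList("C:/Windows/Fonts");
        }
        if (os.startsWith("mac")) {
            return Arrays.asList("/Library/Fonts", "/System/Library/Fonts");
        }
        // Assumption: typical Linux locations; adjust for the actual server
        return Arrays.asList("/usr/share/fonts", "/usr/local/share/fonts");
    }

    // Keep only the directories present on this machine.
    static List<String> existingFor(String osName) {
        List<String> result = new ArrayList<>();
        for (String dir : candidatesFor(osName)) {
            if (new File(dir).isDirectory()) {
                result.add(dir);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(existingFor(System.getProperty("os.name")));
    }
}
```

Each returned directory could be passed to `dfp.addDirectory(dir)`. Since nothing can be installed on the server, a Linux host may have no CJK fonts at all; one option then is to ship a `.ttf` file with the application and register it via `FontProvider.addFont(path)`.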

4. Link to the complete code:

gitee.com/yi_linran/h…
