GeoServer series: fixing garbled characters when saving GeoJSON to MongoDB

Foreword

The previous article, GeoServer series - publishing geojson data through mongodb, showed that common geographic file formats can be unified into GeoJSON and saved to MongoDB, which makes them easier to maintain and to publish through GeoServer. This article solves the problem of garbled Chinese characters in the data saved to MongoDB.

1. Solutions

  • Much spatial data is uploaded as a compressed package, and the GeoJSON produced after decompression can be very large (my 400+ MB gdb package became a 2.2 GB GeoJSON file), so the file is read line by line to reduce memory consumption.
  • The upload process involves several I/O steps: save slices -> merge slices -> format conversion -> coordinate conversion -> save as GeoJSON. No encoding needs to be set in the intermediate steps, because the final result is stored as GeoJSON; only the garbling of the GeoJSON file has to be solved.
  • Garbled text comes from reading a file with an encoding different from the one it was written in. For example, if the user's file is encoded in GBK, gdal reads it as ISO8859-1, and the GeoJSON attributes are then read as UTF-8, the result is bound to be garbled. The core task is therefore to find out how the file was originally encoded and to read it with that same encoding.
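
The mismatch described in the last point can be reproduced in a few lines. The following is a minimal sketch, not part of the original upload code: the class name `MojibakeDemo` and the helper `reencode` are made up for illustration, and it assumes the JVM ships the GBK charset (standard full JDK builds do). The sample text is written with Unicode escapes so the result does not depend on the source file's own encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows what happens when bytes written in one
// charset are read back with another.
class MojibakeDemo {

    /** Encodes text with one charset and decodes the resulting bytes with another. */
    static String reencode(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");   // assumed to be available in this JVM
        String original = "\u4e2d\u6587";       // the two characters of the word "Chinese"

        // Same charset on both sides: the text survives.
        System.out.println(reencode(original, gbk, gbk));
        // Written as GBK, read as ISO-8859-1: each byte becomes an unrelated Latin character.
        System.out.println(reencode(original, gbk, StandardCharsets.ISO_8859_1));
        // Written as GBK, read as UTF-8: invalid sequences become U+FFFD replacement characters.
        System.out.println(reencode(original, gbk, StandardCharsets.UTF_8));
    }
}
```

Only the first call returns the original text, which is why the reading side must use the encoding the file was actually written in.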

2. Read GeoJSON line by line and save to MongoDB

    /**
     * Save GeoJSON data to MongoDB.
     *
     * @param filePath full path of the GeoJSON file
     * @param collect  collection name = MD5 identifier of the file
     * @return list of attribute names
     */
    public List<String> geojsonToMongodb(String filePath, String collect) throws IOException {
        List<String> titleList = new ArrayList<>();
        // 1. Detect the file's encoding and read the GeoJSON file with it
        String chart = ReadUtil.detectCharset(new File(filePath)).name();
        // ISO-8859-1 cannot represent Chinese text, so treat that result as a misdetection and fall back to GB2312
        if ("ISO-8859-1".equals(chart)) {
            chart = "GB2312";
        }
        FileInputStream inputStream = new FileInputStream(filePath);
        InputStreamReader inputStreamReader = new InputStreamReader(inputStream, Charset.forName(chart));
        // 2. Connect to MongoDB; if the collection already exists, drop it first
        mongoTemplate2.dropCollection(collect);
        mongoTemplate2.createCollection(collect);
        // 3. Create the 2dsphere index
        GeospatialIndex index = new GeospatialIndex("geometry");
        index.typed(GeoSpatialIndexType.GEO_2DSPHERE);
        mongoTemplate2.indexOps(collect).ensureIndex(index);
        // 4. Read features line by line and write them to MongoDB
        BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
        String line;
        int successNum = 0;
        int failNum = 0;
        while ((line = bufferedReader.readLine()) != null) {
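            // Check whether the current line is a feature
            if (line.trim().startsWith("{") && (line.trim().endsWith("},") || line.trim().endsWith("}"))) {
                // Convert the current line to a JSONObject
                JSONObject feature = new JSONObject(line);
                // Append the feature's unique ID (used for map linkage) and its geometry type to the custom properties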
                int pid = (int) feature.get("id");
                String type = feature.getJSONObject("geometry").get("type") + "";
                feature.getJSONObject("properties").put("id", pid).set("type", type);
                Document document = Document.parse(feature.toString());
                try {
                    mongoTemplate2.insert(document, collect);
                    successNum++;
                } catch (Exception e) {
                    // Ignore features that fail to insert
                    //log.error("Failed to insert geojson data into mongodb: {}", e.getCause().getMessage());
                    failNum++;
                }
                // Collect the attribute names once (the first successfully inserted feature is enough)
                if (successNum == 1 && titleList.isEmpty()) {
                    JSONObject jsonObject = feature.getJSONObject("properties");
                    Set<String> sIterator = jsonObject.keySet();
                    for (String key:sIterator) {
                        titleList.add(key);
                    }
                }
            }
        }
        log.warn("Inserted geojson data into {}: {} succeeded, {} failed", collect, successNum, failNum);
        bufferedReader.close();
        inputStreamReader.close();
        return titleList;
    }
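
The line test in the loop above only works if the GeoJSON writer puts each feature on its own line, and every feature line except the last ends with a comma. As a hedged sketch, the check and the comma handling could be pulled into two small helpers. The names `FeatureLines`, `isFeatureLine`, and `stripTrailingComma` are hypothetical and do not appear in the original code.

```java
// Hypothetical helpers for the line-by-line reader; they assume one feature per line.
class FeatureLines {

    /** True if the trimmed line looks like a single feature object, optionally followed by a comma. */
    static boolean isFeatureLine(String line) {
        String t = line.trim();
        return t.startsWith("{") && (t.endsWith("}") || t.endsWith("},"));
    }

    /** Removes the separator comma so that the remaining text is a standalone JSON object. */
    static String stripTrailingComma(String line) {
        String t = line.trim();
        return t.endsWith(",") ? t.substring(0, t.length() - 1) : t;
    }

    public static void main(String[] args) {
        String line = "{ \"type\": \"Feature\", \"id\": 0, \"properties\": {}, \"geometry\": null },";
        if (isFeatureLine(line)) {
            System.out.println(stripTrailingComma(line));
        }
    }
}
```

Stripping the comma before parsing avoids relying on the JSON library tolerating trailing characters after the closing brace.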

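Charset detection is not always reliable. ISO-8859-1 is a single-byte encoding, and in Java every byte decodes to some character under it, so a file that is actually encoded in GB2312 can be reported as ISO-8859-1. ISO-8859-1 cannot represent Chinese characters, so this case requires special handling: the method above falls back to GB2312 whenever ISO-8859-1 is detected.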
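
Because Java's ISO-8859-1 decoder maps each byte to exactly one character and never rejects input, text that was wrongly decoded as ISO-8859-1 still carries the original bytes and can be decoded again. The sketch below illustrates this under the assumption that the real encoding is known (GBK here); `Latin1Recovery` and `recover` are illustrative names, not part of the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: undoes a wrong ISO-8859-1 decode when the real charset is known.
class Latin1Recovery {

    /** Turns the garbled characters back into the original bytes and decodes them again. */
    static String recover(String garbled, Charset actual) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");   // assumed to be available in this JVM
        String original = "\u4e2d\u6587";
        // Simulate the misread: GBK bytes decoded as ISO-8859-1.
        String garbled = new String(original.getBytes(gbk), StandardCharsets.ISO_8859_1);
        System.out.println(recover(garbled, gbk).equals(original));
    }
}
```

This round trip is lossless only for ISO-8859-1; text misread as UTF-8 has already had its invalid bytes replaced and cannot be restored this way.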

3. Detect the file's encoding

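    import com.ibm.icu.text.CharsetDetector;
    import com.ibm.icu.text.CharsetMatch;

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.util.Arrays;

    public class ReadUtil {

        /**
         * Guess the charset of a file from its content using ICU4J.
         * Falls back to the JVM default charset if nothing usable is detected.
         */
        public static Charset detectCharset(File file) throws IOException {
            CharsetDetector detector = new CharsetDetector();
            byte[] buffer = new byte[4096];
            try (FileInputStream input = new FileInputStream(file)) {
                int nread;
                while ((nread = input.read(buffer)) != -1) {
                    // Pass only the bytes actually read; the rest of the buffer is stale or zero-filled
                    detector.setText(Arrays.copyOf(buffer, nread));
                    CharsetMatch match = detector.detect();
                    // Skip names that ICU reports but this JVM cannot decode
                    if (match != null && Charset.isSupported(match.getName())) {
                        return Charset.forName(match.getName());
                    }
                }
            }
            return Charset.defaultCharset();
        }
    }

The detector is provided by ICU4J, added through Maven:
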
    <dependency>
        <groupId>com.ibm.icu</groupId>
        <artifactId>icu4j</artifactId>
        <version>72.1</version>
    </dependency>
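
Statistical detection can be cross-checked with a strict decode that uses only the JDK. This is a different technique from the ICU4J detector above, offered as a sketch: `StrictDecodeCheck` and `decodesCleanly` are hypothetical names. A decoder configured to report errors throws on the first byte sequence that is invalid in the candidate charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether a byte sample is valid in a candidate charset.
class StrictDecodeCheck {

    /** True if every byte sequence in the sample is valid in the given charset. */
    static boolean decodesCleanly(byte[] sample, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(sample));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] gbkBytes = "\u4e2d\u6587".getBytes(Charset.forName("GBK")); // GBK assumed available
        System.out.println(decodesCleanly(gbkBytes, StandardCharsets.UTF_8));      // GBK bytes are not valid UTF-8
        System.out.println(decodesCleanly(gbkBytes, StandardCharsets.ISO_8859_1)); // ISO-8859-1 accepts any bytes
    }
}
```

A clean ISO-8859-1 decode therefore proves nothing, while a clean UTF-8 decode is strong evidence. A sample cut off in the middle of a multi-byte character would also be reported as invalid, so the check is best run on whole lines.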

4. List all available encodings

    @Test
    public void test(){
        Map<String, Charset> charsets = Charset.availableCharsets();
        for (Map.Entry<String, Charset> entry : charsets.entrySet()) {
            String name = entry.getKey();
            Charset charset = entry.getValue();
            System.out.println(name + ": " + charset.displayName() + ", " + charset.aliases());
        }
    }
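
When only a few encodings matter, a targeted lookup may be more convenient than printing the whole list. The sketch below uses `Charset.isSupported` from the JDK; the class `CharsetLookup` and the method `isAvailable` are illustrative names, and the set of names checked in `main` is just an example.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper: checks whether this JVM can decode a given charset name.
class CharsetLookup {

    /** True if the name is a legal charset name and the JVM supports it. */
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not allowed in charset names.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-8", "GBK", "GB2312", "GB18030", "ISO-8859-1"}) {
            if (isAvailable(name)) {
                Charset cs = Charset.forName(name);
                System.out.println(name + " -> " + cs.name() + " " + cs.aliases());
            } else {
                System.out.println(name + " is not available in this JVM");
            }
        }
    }
}
```

Running a check like this before calling `Charset.forName` avoids an `UnsupportedCharsetException` when a detected name is not available.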

Origin blog.csdn.net/u012796085/article/details/129954298