Several common compression methods for JSON

JSON is widely used for both data transmission and storage, whatever the programming language. In many application scenarios you may want to shrink a JSON string further to improve transmission efficiency, and if you use a NoSQL database you may want to shrink it to save storage space. This article introduces the most commonly used JSON compression techniques, starting with CJSON and HPack.

1. CJSON

CJSON's compression algorithm separates the data into templates and values, so that duplicate keys are stored only once.

Raw data:

[
  {
    "x": 100,
    "y": 100
  },
  {
    "x": 100,
    "y": 100,
    "width": 200,
    "height": 150
  },
  {}
]

After compression:

{
  "templates": [
    [0, "x", "y"], [1, "width", "height"]
  ],
  "values": [
    { "values": [1, 100, 100] },
    { "values": [2, 100, 100, 200, 150] },
    {}
  ]
}
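
To make the layout above concrete, here is a minimal decoding sketch. It assumes, based only on the example, that the first element of each template is the 1-based index of a parent template (0 meaning none) and that the first element of each packed row is the 1-based index of its template. The class and method names are hypothetical, and this is not the CJSON library's own code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CjsonSketch {
    // Resolve the full key list of a template (1-based id) by following parent links.
    static List<String> keysOf(List<List<Object>> templates, int id) {
        List<String> keys = new ArrayList<>();
        if (id == 0) {
            return keys; // 0 means "no parent template" (assumption from the example)
        }
        List<Object> template = templates.get(id - 1);
        keys.addAll(keysOf(templates, (Integer) template.get(0)));
        for (int i = 1; i < template.size(); i++) {
            keys.add((String) template.get(i));
        }
        return keys;
    }

    // Expand one packed row [templateId, v1, v2, ...] back into a key/value object.
    static Map<String, Object> expand(List<List<Object>> templates, List<Object> row) {
        Map<String, Object> object = new LinkedHashMap<>();
        if (row.isEmpty()) {
            return object; // the empty object {} carries no template
        }
        List<String> keys = keysOf(templates, (Integer) row.get(0));
        for (int i = 0; i < keys.size(); i++) {
            object.put(keys.get(i), row.get(i + 1));
        }
        return object;
    }

    public static void main(String[] args) {
        List<List<Object>> templates = List.of(
                List.of(0, "x", "y"),
                List.of(1, "width", "height"));
        System.out.println(expand(templates, List.of(1, 100, 100)));
        // prints {x=100, y=100}
        System.out.println(expand(templates, List.of(2, 100, 100, 200, 150)));
        // prints {x=100, y=100, width=200, height=150}
    }
}
```

The second template only lists "width" and "height" because it inherits "x" and "y" from the first, which is how repeated key prefixes are stored once.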

2. HPACK

HPack's compression algorithm also separates keys from values. The first array in the output is the template (the list of keys), and the arrays that follow hold the values.

HPack is a lossless, cross-language, performance-focused data set compressor. It can reduce the number of characters needed to represent a generic homogeneous collection by up to 70%.
The algorithm provides several compression levels (from 0 to 4).
Level 0 performs the most basic compression: it removes the keys (property names) from every object and creates a header at index 0 that lists each property name. Higher levels reduce the JSON size further by assuming that duplicate entries are present.

Raw data:

[{
  name : "Andrea",
  age : 31,
  gender : "Male",
  skilled : true
}, {
  name : "Eva",
  age : 27,
  gender : "Female",
  skilled : true
}, {
  name : "Daniele",
  age : 26,
  gender : "Male",
  skilled: false
}]

After compression:

[["name","age","gender","skilled"],["Andrea",31,"Male",true],["Eva",27,"Female",true],["Daniele",26,"Male",false]]
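
The level 0 transformation shown above can be sketched in a few lines. This is only an illustration of the idea under the assumption that every object has the same keys as the first one; the class and method names are hypothetical and it is not the hpack library's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HpackSketch {
    // Pack: the first row holds the keys, each following row holds one object's values.
    static List<List<Object>> pack(List<Map<String, Object>> objects) {
        List<List<Object>> packed = new ArrayList<>();
        if (objects.isEmpty()) {
            return packed;
        }
        List<Object> header = new ArrayList<>(objects.get(0).keySet());
        packed.add(header);
        for (Map<String, Object> object : objects) {
            List<Object> row = new ArrayList<>();
            for (Object key : header) {
                row.add(object.get(key));
            }
            packed.add(row);
        }
        return packed;
    }

    // Unpack: re-attach the header keys to every value row.
    static List<Map<String, Object>> unpack(List<List<Object>> packed) {
        List<Map<String, Object>> objects = new ArrayList<>();
        if (packed.isEmpty()) {
            return objects;
        }
        List<Object> header = packed.get(0);
        for (List<Object> row : packed.subList(1, packed.size())) {
            Map<String, Object> object = new LinkedHashMap<>();
            for (int i = 0; i < header.size(); i++) {
                object.put((String) header.get(i), row.get(i));
            }
            objects.add(object);
        }
        return objects;
    }

    // Helper that builds one record with a stable key order.
    static Map<String, Object> person(String name, int age, String gender, boolean skilled) {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("name", name);
        p.put("age", age);
        p.put("gender", gender);
        p.put("skilled", skilled);
        return p;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> people = List.of(
                person("Andrea", 31, "Male", true),
                person("Eva", 27, "Female", true));
        System.out.println(pack(people));
        // prints [[name, age, gender, skilled], [Andrea, 31, Male, true], [Eva, 27, Female, true]]
    }
}
```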

Conclusion
Both methods extract the JSON keys into a shared index, but the final formats differ. HPack's simplified format uses far fewer characters than CJSON's, so HPack compresses more efficiently. If the JSON content is very small, the CJSON output may even contain more data than the original.

3. When studying the source code of the open source performance analysis tool PINPOINT, I discovered that it uses a method with a higher compression ratio.

For example:

raw data:

{
  name : "Andrea",
  age : 31,
  gender : "Male",
  skilled : true
}

[Figure not preserved: layout of the compressed data]

The compressed data becomes a string of binary data. name and gender are variable-length strings, so each is preceded by a length field: the first four digits represent the binary length of the value, for example the length of "Andrea" for name. Values of other types are written as defined by the API.

This approach can be regarded as compression with a side effect similar to encryption. If the receiver does not know the data structure, it cannot parse the target values directly, so the sender and receiver must agree on the field structure in advance.

From the examples above, CJSON and HPack only reduce the space taken by the JSON keys, while the square brackets and quotation marks that remain are still redundant. The binary method just described is a little more complex to use, but its compression ratio can be better than that of the other two, and it saves considerable resources in both storage and transmission.
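
The length-prefix idea can be illustrated with a small sketch. It assumes a 4-byte big-endian length before each string, a 4-byte int for age, and a single byte for the boolean, with both sides agreeing on the field order name, age, gender, skilled. These choices and the class name are assumptions made for illustration and are not Pinpoint's actual wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BinaryRecordSketch {
    // Encode the record in the agreed field order: name, age, gender, skilled.
    static byte[] encode(String name, int age, String gender, boolean skilled) {
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        byte[] g = gender.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + n.length + 4 + 4 + g.length + 1);
        buf.putInt(n.length).put(n)          // 4-byte length prefix, then the string bytes
           .putInt(age)                      // fixed-width int, no key and no quotes
           .putInt(g.length).put(g)
           .put((byte) (skilled ? 1 : 0));   // one byte for the boolean
        return buf.array();
    }

    // Read one length-prefixed UTF-8 string from the current buffer position.
    static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decode only works because the reader knows the same field order as the writer.
    static String decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        String name = readString(buf);
        int age = buf.getInt();
        String gender = readString(buf);
        boolean skilled = buf.get() != 0;
        return name + "," + age + "," + gender + "," + skilled;
    }

    public static void main(String[] args) {
        byte[] data = encode("Andrea", 31, "Male", true);
        System.out.println(data.length + " bytes"); // prints 23 bytes
        System.out.println(decode(data));           // prints Andrea,31,Male,true
    }
}
```

No keys, brackets, or quotation marks are stored at all, which is where the extra saving comes from, at the cost of a format that is unreadable without the agreed schema.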

4. Use GZIP to compress and decompress JSON

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
/**
 * @author
 * Compresses and decompresses data using gzip
 */
public class GZipUtils {
    // Compress
    public static byte[] compress(byte[] data) throws IOException {
        if (data == null || data.length == 0) {
            return null;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(out);
        gzip.write(data);
        gzip.close();
        return  out.toByteArray();//out.toString("ISO-8859-1");
    }
    
    public static byte[] compress(String str) throws IOException {
        if (str == null || str.length() == 0) {
            return null;
        }
        return compress(str.getBytes("utf-8"));
    }
    // Decompress
    public static byte[] uncompress(byte[] data) throws IOException {
        if (data == null || data.length == 0) {
            return data;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        GZIPInputStream gunzip = new GZIPInputStream(in);
        byte[] buffer = new byte[256];
        int n;
        while ((n = gunzip.read(buffer)) >= 0) {
            out.write(buffer, 0, n);
        }
        gunzip.close();
        in.close();
        return out.toByteArray();
    }
    
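    // Decompress a string whose chars each hold one gzip byte (ISO-8859-1)
    public static String uncompress(String str) throws IOException {
        if (str == null || str.length() == 0) {
            return str;
        }
        // ISO-8859-1 maps every byte to exactly one char, so the gzip bytes survive;
        // converting compressed bytes through UTF-8 would corrupt them
        byte[] data = uncompress(str.getBytes("ISO-8859-1"));
        return new String(data, "utf-8");
    }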
    /**
     * Reads one entry from a zip file and returns its content as a string.
     *
     * @param unZipfile path of the zip file to read
     * @param destFile  name of the entry inside the zip file whose content is returned
     * @return the entry content, or null if it could not be read
     */
    public static String unZip(String unZipfile, String destFile) {
        InputStream inputStream;
        String inData = null;
        try {
            // Open the zip file
            File f = new File(unZipfile);
            ZipFile zipFile = new ZipFile(f);
    
            // Look up the requested entry (getEntry returns null if it does not exist)
            ZipEntry entry = zipFile.getEntry(destFile);
            if (entry != null && !entry.isDirectory()) {
                // Get the input stream of the entry
                inputStream = zipFile.getInputStream(entry);
    
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] bys = new byte[4096];
                for (int p = -1; (p = inputStream.read(bys)) != -1;) {
                    out.write(bys, 0, p);
                }
                inData = out.toString();
                out.close();
                inputStream.close();
            }
            zipFile.close();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
        return inData;
    }
    public static void main(String[] args){
        String json = "{\"androidSdk\":22,\"androidVer\":\"5.1\",\"cpTime\":1612071603,\"cupABIs\":[\"armeabi-v7a\",\"armeabi\"],\"customId\":\"QT99999\",\"elfFlag\":false,\"id\":\"4a1b644858d83a98\",\"imsi\":\"460015984967892\",\"system\":true,\"systemUser\":true,\"test\":true,\"model\":\"Micromax R610\",\"netType\":0,\"oldVersion\":\"0\",\"pkg\":\"com.adups.fota.sysoper\",\"poll_time\":30,\"time\":1481634113876,\"timeZone\":\"Asia\\/Shanghai\",\"versions\":[{\"type\":\"gatherApks\",\"version\":1},{\"type\":\"kernel\",\"version\":9},{\"type\":\"shell\",\"version\":10},{\"type\":\"silent\",\"version\":4},{\"type\":\"jarUpdate\",\"version\":1},{\"type\":\"serverIps\",\"version\":1}]}";
        try {
            byte[] buf = GZipUtils.compress(json);

            // Append the compressed bytes to the end of the file
            File fin = new File("D:/temp/test4.txt");
            try (FileChannel fcout = new RandomAccessFile(fin, "rws").getChannel()) {
                fcout.write(ByteBuffer.wrap(buf), fcout.size());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Use GZIP to compress and decompress Json, mainly using java.util.zip.GZIPInputStream and java.util.zip.GZIPOutputStream.

Compression method:

    public static String compress(String str) throws IOException {
        if (null == str || str.length() <= 0) {
            return str;
        }
        // Create a new output stream
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Create a gzip output stream with the default buffer size
        GZIPOutputStream gzip = new GZIPOutputStream(out);
        // Write the bytes to the stream; the platform default charset may be GBK, so specify one explicitly
        gzip.write(str.getBytes("utf-8"));
        gzip.close();
        // Convert the buffer content to a string by decoding the bytes with the given charset name
        return out.toString("ISO-8859-1");
    }

Decompression method:

    public static String unCompress(String str) throws IOException {
        if (null == str || str.length() <= 0) {
            return str;
        }
        // Create a new output stream
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Create a ByteArrayInputStream that uses the compressed bytes as its buffer
        ByteArrayInputStream in = new ByteArrayInputStream(str.getBytes("ISO-8859-1"));
        // Create a gzip input stream with the default buffer size
        GZIPInputStream gzip = new GZIPInputStream(in);
        byte[] buffer = new byte[256];
        int n = 0;
 
        // Read the uncompressed data into the byte array
        while ((n = gzip.read(buffer)) >= 0) {
            out.write(buffer, 0, n);
        }
        // Convert the buffer content to a string by decoding the bytes with the given charset name
        return out.toString("utf-8");
    }

Test using a 31.8 KB JSON string:

[{\"CHANNEL\":2000,\"FREE_TICKET\":67,\"INCOME\":35499,… …}]

The test results are:

A tool class for Gzip compression, decompression and encoding using Base64

package com.oyp.sort.utils;
 
import android.text.TextUtils;
import android.util.Base64;
import android.util.Log;
 
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
 
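/**
 * Utility class for gzip compression and decompression with Base64 encoding
 */
public class GzipUtil {
    private static final String TAG = "GzipUtil";
    /**
     * Compresses a string with gzip and Base64-encodes the result
     *
     * @param data     the string to compress
     * @param encoding the charset used to convert the string to bytes
     * @return the Base64-encoded gzip data, or null if the input is empty
     */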
    public static String compress(String data, String encoding) {
        if (data == null || data.length() == 0) {
            return null;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GZIPOutputStream gzip;
        try {
            gzip = new GZIPOutputStream(out);
            gzip.write(data.getBytes(encoding));
            gzip.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return Base64.encodeToString(out.toByteArray(), Base64.NO_PADDING);
    }
 
    public static String uncompress(String data, String encoding) {
        if (TextUtils.isEmpty(data)) {
            return null;
        }
        byte[] decode = Base64.decode(data, Base64.NO_PADDING);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteArrayInputStream in = new ByteArrayInputStream(decode);
        GZIPInputStream gzipStream = null;
        try {
            gzipStream = new GZIPInputStream(in);
            byte[] buffer = new byte[256];
            int n;
            while ((n = gzipStream.read(buffer)) >= 0) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            Log.e(TAG, "e = " + Log.getStackTraceString(e));
        } finally {
            try {
                out.close();
                if (gzipStream != null) {
                    gzipStream.close();
                }
            } catch (IOException e) {
                Log.e(TAG, "e = " + Log.getStackTraceString(e));
            }
 
        }
        return new String(out.toByteArray(), Charset.forName(encoding));
    }
 
}

Compress raw STROKE.JSON data

// Original file: stroke.json
String strokeJson = LocalFileUtils.getStringFormAsset(context, "stroke.json");
mapper = JSONUtil.toCollection(strokeJson, HashMap.class, String.class, Stroke.class);
// Compress with GZIP
String gzipStrokeJson = GzipUtil.compress(strokeJson, CHARSET_NAME);
writeFile(gzipStrokeJson, "gzipStrokeJson.json");

After running, export gzipStrokeJson.json from the SD card and put it in the assets directory for later parsing.

The exported gzipStrokeJson.json file is 405 KB, which is not as good as the 387 KB obtained by compressing the JSON with Deflater (see section 5).

Restore to original STROKE.JSON data

Compressing alone is not enough. We have to use the data in the compressed JSON file, so we also need to decompress it. The operation is as follows:

// Decompress with GZIP
String gzipStrokeJson = LocalFileUtils.getStringFormAsset(context, "gzipStrokeJson.json");
String strokeJson = GzipUtil.uncompress(gzipStrokeJson,CHARSET_NAME);
mapper = JSONUtil.toCollection(strokeJson, HashMap.class, String.class, Stroke.class);

After decompression, json parsing is normal!

GZIP compression summary

After the steps above, the size of our JSON file has been reduced to 405 KB. Although that is not as good as the 387 KB obtained with Deflater, it is a full 662 KB smaller than the 1067 KB of the data before any compression algorithm is applied.

The compression rate is 62.04%, and the compressed size is 37.96% of the original, which is also good!
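
As a quick check of the arithmetic, the percentages follow directly from the two file sizes. The helper below is a hypothetical illustration, not part of the original project.

```java
import java.util.Locale;

class CompressionRate {
    // Percentage of the original size that compression saved.
    static double savedPercent(double originalKb, double compressedKb) {
        return (originalKb - compressedKb) / originalKb * 100.0;
    }

    public static void main(String[] args) {
        double saved = savedPercent(1067, 405);
        // (1067 - 405) / 1067 = 662 / 1067, roughly 62.04% saved and 37.96% remaining
        System.out.println(String.format(Locale.ROOT,
                "saved %.2f%%, remaining %.2f%%", saved, 100.0 - saved));
    }
}
```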

5. Use compression algorithm for compression

Use Deflater to compress JSON and Inflater to decompress JSON.

DEFLATE is a lossless data compression algorithm that combines the LZ77 algorithm with Huffman coding.

You can use the Deflater and Inflater classes provided by Java to compress and decompress JSON. The following is the tool class:

package com.oyp.sort.utils;
 
import android.support.annotation.Nullable;
import android.util.Base64;
 
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
 
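/**
 * DeflaterUtils: compresses strings
 */
public class DeflaterUtils {
    /**
     * Compress
     */
    public static String zipString(String unzipString) {
        /**
         *     https://www.yiibai.com/javazip/javazip_deflater.html#article-start
         *     Compression levels run from 0 (lowest) to 9 (highest)
         *     public static final int BEST_COMPRESSION = 9;            Compression level for best compression.
         *     public static final int BEST_SPEED = 1;                  Compression level for fastest compression.
         *     public static final int DEFAULT_COMPRESSION = -1;        Default compression level.
         *     public static final int DEFAULT_STRATEGY = 0;            Default compression strategy.
         *     public static final int DEFLATED = 8;                    Compression method for the deflate algorithm (the only one currently supported).
         *     public static final int FILTERED = 1;                    Compression strategy best used for data consisting mostly of small values with a somewhat random distribution.
         *     public static final int FULL_FLUSH = 3;                  Compression flush mode used to flush out all pending output and reset the deflater.
         *     public static final int HUFFMAN_ONLY = 2;                Compression strategy for Huffman coding only.
         *     public static final int NO_COMPRESSION = 0;              Compression level for no compression.
         *     public static final int NO_FLUSH = 0;                    Compression flush mode used to achieve the best compression result.
         *     public static final int SYNC_FLUSH = 2;                  Compression flush mode used to flush out all pending output; may degrade compression for some compression algorithms.
         */
        // Create a new compressor with the specified compression level.
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        // Set the input data for compression.
        deflater.setInput(unzipString.getBytes());
        // Indicate that compression should end with the current contents of the input buffer.
        deflater.finish();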
 
        final byte[] bytes = new byte[256];
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream(256);
 
        while (!deflater.finished()) {
            // Compress the input data and fill the given buffer with compressed data.
            int length = deflater.deflate(bytes);
            outputStream.write(bytes, 0, length);
        }
        // Close the compressor and discard any unprocessed input.
        deflater.end();
        return Base64.encodeToString(outputStream.toByteArray(), Base64.NO_PADDING);
    }
 
    /**
     * Decompress
     */
    @Nullable
    public static String unzipString(String zipString) {
        byte[] decode = Base64.decode(zipString, Base64.NO_PADDING);
        // Create a new decompressor  https://www.yiibai.com/javazip/javazip_inflater.html
        Inflater inflater = new Inflater();
        // Set the input data for decompression.
        inflater.setInput(decode);
 
        final byte[] bytes = new byte[256];
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream(256);
        try {
            // finished() returns true once the end of the compressed data stream has been reached.
            while (!inflater.finished()) {
                // Decompress bytes into the given buffer.
                int length = inflater.inflate(bytes);
                outputStream.write(bytes, 0, length);
            }
        } catch (DataFormatException e) {
            e.printStackTrace();
            return null;
        } finally {
            // Close the decompressor and discard any unprocessed input.
            inflater.end();
        }
 
        return outputStream.toString();
    }
}
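
The tool class above depends on android.util.Base64. For readers who want to try the same round trip on a plain JVM, here is a minimal sketch that swaps in java.util.Base64 and fixes the charset to UTF-8. The class name and the guard against truncated input are my own additions and are not part of the original project.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DeflateRoundTrip {
    // Compress a string with DEFLATE and return it as unpadded Base64 text.
    static String zip(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        byte[] chunk = new byte[256];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return Base64.getEncoder().withoutPadding().encodeToString(out.toByteArray());
    }

    // Reverse of zip(): Base64-decode, then inflate back to the original string.
    static String unzip(String encoded) {
        Inflater inflater = new Inflater();
        inflater.setInput(Base64.getDecoder().decode(encoded));
        byte[] chunk = new byte[256];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                // Guard against looping forever on truncated input
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"code\":\"33828\",\"order\":\"7298\",\"strokeSum\":\"11\"}";
        String packed = zip(json);
        System.out.println(unzip(packed).equals(json)); // prints true
    }
}
```

Note that Base64 inflates the compressed bytes by about one third, so storing the raw bytes is smaller whenever the storage or transport allows binary data.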

Compress raw STROKE.JSON data

First, we compress the original stroke.json data into deFlaterStrokeJson.json.

// Original file: stroke.json
String strokeJson = LocalFileUtils.getStringFormAsset(context, "stroke.json");
mapper = JSONUtil.toCollection(strokeJson, HashMap.class, String.class, Stroke.class);
// Compress with Deflater
String deFlaterStrokeJson = DeflaterUtils.zipString(strokeJson);
writeFile(deFlaterStrokeJson, "deFlaterStrokeJson.json");

The writeFile method writes the string to the SD card.

private static void writeFile(String mapperJson, String fileName) {
        Writer write = null;
        try {
            File file = new File(Environment.getExternalStorageDirectory(), fileName);
            Log.d(TAG, "file.exists():" + file.exists() + " file.getAbsolutePath():" + file.getAbsolutePath());
            // Create the parent directory if it does not exist
            if (!file.getParentFile().exists()) {
                file.getParentFile().mkdirs();
            }
            // Delete the old file if it already exists
            if (file.exists()) {
                file.delete();
            }
            file.createNewFile();
            // Write the string to the file
            write = new OutputStreamWriter(new FileOutputStream(file), "UTF-8");
            write.write(mapperJson);
            write.flush();
            write.close();
        } catch (Exception e) {
            Log.e(TAG, "e = " + Log.getStackTraceString(e));
        }finally {
            if (write != null){
                try {
                    write.close();
                } catch (IOException e) {
                    Log.e(TAG, "e = " + Log.getStackTraceString(e));
                }
            }
        }
    }

After running, export deFlaterStrokeJson.json from the SD card and put it in the assets directory for later parsing.

Compressing the JSON with Deflater gives a size of 387 KB, much smaller than the previous 1067 KB.

[Figure not preserved: contents of the deFlaterStrokeJson.json file after Deflater compression and Base64 encoding]

Restore to original STROKE.JSON data

Compressing alone is not enough. We have to use the data in the compressed JSON file, so we also need to decompress it. The operation is as follows:

// Decompress with Inflater
String deFlaterStrokeJson = LocalFileUtils.getStringFormAsset(context, "deFlaterStrokeJson.json");
String strokeJson = DeflaterUtils.unzipString(deFlaterStrokeJson);
mapper = JSONUtil.toCollection(strokeJson, HashMap.class, String.class, Stroke.class);

After decompression, everything runs normally! Perfect!

DEFLATER compression summary

After the steps above, the size of our JSON file has been reduced to 387 KB, a full 680 KB smaller than the 1067 KB of the data before the compression algorithm is applied.

The compression rate is 63.73%, and the size after compression is 36.27% of the original.

Optimization step                                                    Size
1. Unprocessed raw JSON                                              2.13 MB
2. Compress JSON into one line, removing newlines and spaces         1.39 MB
3. Shorten the JSON keys                                             1.04 MB
4. Compress the JSON with Deflater and Base64-encode it              0.38 MB

6. Shorten the JSON KEY

JSON is a key-value structure. If the conventions are well defined, keys can be shortened as much as possible, even to single meaningless letters, provided the mapping is clearly documented to avoid unnecessary trouble.

For example, the previous key-value structure is as follows:

{
      "33828": {
        "code": "33828",
        "name": "萤",
        "order": "7298",
        "strokeSum": "11"
      },
      "22920": {
        "code": "22920",
        "name": "妈",
        "order": "1051",
        "strokeSum": "6"
      },
      "20718": {
        "code": "20718",
        "name": "僮",
        "order": "13341",
        "strokeSum": "14"
      },
      "30615": {
        "code": "30615",
        "name": "瞗",
        "order": "15845",
        "strokeSum": "16"
      },
      "36969": {
        "code": "36969",
        "name": "適",
        "order": "13506",
        "strokeSum": "14"
      }
}

Now we optimize the keys as follows:

c replaces code
n replaces name
o replaces order
s replaces strokeSum
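
The renaming listed above can be sketched as a simple lookup applied to each record. This is a hypothetical helper operating on an already parsed record, not the tooling used in the original project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyShortener {
    // Mapping from the original keys to the shortened keys listed above.
    static final Map<String, String> SHORT_KEYS = Map.of(
            "code", "c",
            "name", "n",
            "order", "o",
            "strokeSum", "s");

    // Return a copy of the record with every known key replaced by its short form.
    static Map<String, Object> shorten(Map<String, Object> record) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : record.entrySet()) {
            // Unknown keys are kept unchanged
            String key = SHORT_KEYS.getOrDefault(entry.getKey(), entry.getKey());
            result.put(key, entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("code", "33828");
        record.put("order", "7298");
        record.put("strokeSum", "11");
        System.out.println(shorten(record)); // prints {c=33828, o=7298, s=11}
    }
}
```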

After shortening the keys, the JSON file is 1.77 MB, a full 0.36 MB smaller than the previous 2.13 MB. This is a considerable optimization on mobile!

Then, for the file with shortened keys, repeat the step of compressing the JSON into one line (removing newlines and space characters).

The file size is now 1.04 MB, a full 1.09 MB smaller than the original 2.13 MB. This is a considerable optimization on mobile!
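
Compressing JSON into one line means dropping whitespace that sits outside string values. A minimal sketch of that step, assuming the input is valid JSON, is shown below; in practice most JSON libraries can also write compact output, and the class name here is hypothetical.

```java
class JsonMinifier {
    // Remove spaces, tabs, and line breaks that appear outside string literals.
    static String minify(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c); // everything inside a string is kept as-is
                if (escaped) {
                    escaped = false;      // this char was escaped, e.g. \" or \\
                } else if (c == '\\') {
                    escaped = true;       // next char is escaped
                } else if (c == '"') {
                    inString = false;     // closing quote
                }
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pretty = "{\n  \"c\": \"33828\",\n  \"s\": \"11\"\n}";
        System.out.println(minify(pretty)); // prints {"c":"33828","s":"11"}
    }
}
```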

Of course, if the key names change, the Java entity bean used to parse the JSON must be updated as well.

Because I parse JSON with Jackson, I use the @JsonProperty annotation to map the shortened keys in the JSON file to the properties of the original Java bean, so parsing works without errors.

Summary

After the steps above, the size of our JSON file has been reduced to 1.04 MB, a full 1.09 MB smaller than the original 2.13 MB.

The compression rate is 51.174%, and the size after compression is 48.826% of the original.

 Reference source: [My Android Advanced Journey] How to compress Json format data and reduce the size of Json data? - Jianshu

References

Several common JSON compression methods - Gray Letter Network (Software Development Blog Aggregation)

 

Origin blog.csdn.net/yyongsheng/article/details/132080304