Cache space optimization practice

Introduction

Redis caching is one of our most heavily used services. It fits a wide range of business scenarios, and for that reason the cache has become a significant source of hardware cost. It is therefore worth optimizing cache space to reduce cost and improve performance.

The following case from our own system shows how to reduce cache space by more than 70%.

Scenario setup

1. We need to store a POJO in the cache. The class is defined as follows:

public class TestPOJO implements Serializable {
    private String testStatus;
    private String userPin;
    private String investor;
    private Date testQueryTime;
    private Date createTime;
    private String bizInfo;
    private Date otherTime;
    private BigDecimal userAmount;
    private BigDecimal userRate;
    private BigDecimal applyAmount;
    private String type;
    private String checkTime;
    private String preTestStatus;
    
    public Object[] toValueArray(){
        Object[] array = {testStatus, userPin, investor, testQueryTime,
                createTime, bizInfo, otherTime, userAmount,
                userRate, applyAmount, type, checkTime, preTestStatus};
        return array;
    }
    
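    public TestPOJO fromValueArray(Object[] valueArray){
        // The concrete data types are lost in the array form and must be restored here.
        // The original example leaves the implementation out.
        throw new UnsupportedOperationException("not implemented in this example");
    }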
}

2. Use the following instance as test data:

TestPOJO pojo = new TestPOJO();
pojo.setApplyAmount(new BigDecimal("200.11"));
pojo.setBizInfo("XX");
pojo.setUserAmount(new BigDecimal("1000.00"));
pojo.setTestStatus("SUCCESS");
pojo.setCheckTime("2023-02-02");
pojo.setInvestor("ABCD");
pojo.setUserRate(new BigDecimal("0.002"));
pojo.setTestQueryTime(new Date());
pojo.setOtherTime(new Date());
pojo.setPreTestStatus("PROCESSING");
pojo.setUserPin("ABCDEFGHIJ");
pojo.setType("Y");

Conventional approach

System.out.println(JSON.toJSONString(pojo).length());

Serializing directly to JSON prints length=284. This is the simplest and most common approach. The serialized data looks like this:

{"applyAmount":200.11,"bizInfo":"XX","checkTime":"2023-02-02","investor":"ABCD","otherTime":"2023-04-10 17:45:17.717","preTestStatus":"PROCESSING","testQueryTime":"2023-04-10 17:45:17.717","testStatus":"SUCCESS","type":"Y","userAmount":1000.00,"userPin":"ABCDEFGHIJ","userRate":0.002}

This output contains a lot of redundant data. In particular, the attribute names do not need to be stored.

Improvement 1 - remove the attribute name

System.out.println(JSON.toJSONString(pojo.toValueArray()).length());

Choosing an array structure instead of an object structure removes the attribute names. This prints length=144, about a 50% reduction in data size. The serialized data looks like this:

["SUCCESS","ABCDEFGHIJ","ABCD","2023-04-10 17:45:17.717",null,"XX","2023-04-10 17:45:17.717",1000.00,0.002,200.11,"Y","2023-02-02","PROCESSING"]

Two problems remain: null does not need to be stored as text, and the dates are serialized as strings. Inefficient serialization of this kind inflates the data, so we should choose a better serialization tool.
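To see why string-formatted dates inflate the data, the minimal sketch below compares the size of one timestamp stored as formatted text with the same timestamp stored as raw epoch milliseconds. The class name TimestampSizeDemo and the date pattern are assumptions chosen to match the JSON output above; they are not part of the original code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class TimestampSizeDemo {
    // Assumed pattern, chosen to match the JSON output shown above.
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss.SSS";

    /** Bytes needed when the date is stored as formatted text (quotes not counted). */
    static int textSize(Date date) {
        String text = new SimpleDateFormat(PATTERN, Locale.ROOT).format(date);
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes needed when the date is stored as raw epoch milliseconds. */
    static int binarySize(Date date) {
        return ByteBuffer.allocate(Long.BYTES).putLong(date.getTime()).array().length;
    }

    public static void main(String[] args) {
        Date now = new Date();
        System.out.println("text bytes:   " + textSize(now));
        System.out.println("binary bytes: " + binarySize(now));
    }
}
```

For any four-digit year the text form takes 23 bytes against 8 for the long. A binary format such as MessagePack adds a one-byte type tag to each value, and it encodes null as a single byte instead of the four characters of the word null.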

Improvement 2 - use better serialization tools

// We still use a JSON-style data model, but serialize it with a third-party tool (MessagePack)
System.out.println(new ObjectMapper(new MessagePackFactory()).writeValueAsBytes(pojo.toValueArray()).length);

A better serialization tool compresses the fields and uses more suitable data formats. This prints length=92, roughly 36% smaller than the previous step.

The result is binary data, so Redis must be accessed with binary operations. Converted to a string, the bytes print as follows:

��SUCCESS�ABCDEFGHIJ�ABCD��j�6���XX��j�6����?`bM����@i��Q�Y�2023-02-02�PROCESSING

Taking this idea further, we can choose the data types by hand. Replacing a field's type with a smaller one yields further savings.

Improvement 3 - Optimizing data types

In the example above, the three fields testStatus, preTestStatus, and investor are in fact enumeration-like strings. Replacing them with a simpler data type (such as byte or int) instead of String saves more space. Likewise, checkTime can be stored as a Long, which the serialization tool writes in fewer bytes.

public Object[] toValueArray(){
    // toInt and toLong are conversion helpers whose implementation the example leaves out
    Object[] array = {toInt(testStatus), userPin, toInt(investor), testQueryTime,
            createTime, bizInfo, otherTime, userAmount,
            userRate, applyAmount, type, toLong(checkTime), toInt(preTestStatus)};
    return array;
}

After this manual adjustment, with smaller data types in place of String, this prints length=69.
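The article does not show toInt and toLong, so the following is only a sketch of what such helpers might look like. The code table in CODES is hypothetical (the real enumeration values are not given), and storing checkTime as days since 1970-01-01 is one possible choice, not necessarily the one the author made.

```java
import java.time.LocalDate;
import java.util.Map;

class FieldCodecs {
    // Hypothetical code table; the real enumeration values are not given in the article.
    private static final Map<String, Integer> CODES =
            Map.of("SUCCESS", 1, "PROCESSING", 2, "ABCD", 3);

    /** Maps an enumeration-like string to a small int; null stays null. */
    static Integer toInt(String value) {
        if (value == null) {
            return null;
        }
        Integer code = CODES.get(value);
        if (code == null) {
            throw new IllegalArgumentException("Unknown value: " + value);
        }
        return code;
    }

    /** Converts an ISO date such as "2023-02-02" to days since 1970-01-01; null stays null. */
    static Long toLong(String isoDate) {
        return isoDate == null ? null : Long.valueOf(LocalDate.parse(isoDate).toEpochDay());
    }

    /** Reverse of toLong, needed when the value is read back from the cache. */
    static String fromLong(Long epochDay) {
        return epochDay == null ? null : LocalDate.ofEpochDay(epochDay).toString();
    }

    public static void main(String[] args) {
        System.out.println(toInt("PROCESSING"));
        System.out.println(toLong("2023-02-02"));
        System.out.println(fromLong(toLong("2023-02-02")));
    }
}
```

In MessagePack an integer from 0 to 127 takes one byte, while the string "PROCESSING" takes 11. An epoch-day value fits in a 3-byte integer, whereas epoch milliseconds need 9 bytes. Every mapping of this kind needs a matching reverse lookup on the read path.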

Improvement 4 - Consider ZIP compression

Beyond the points above, ZIP compression can shrink the data further. It is most effective when the content is large or repetitive. If the stored value is an array of TestPOJO, for example, ZIP compression may be a good fit.

However, ZIP compression does not always reduce the size: content shorter than about 30 bytes may actually grow, and content with little repetition gains almost nothing. It also costs CPU time.

After the optimizations above, ZIP compression is no longer essential. Whether it pays off has to be tested against real data.
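A quick way to run such a test is to compress representative payloads and compare sizes. The sketch below uses java.util.zip.Deflater (DEFLATE with a zlib wrapper) as a stand-in for "ZIP"; the class name, the helper names, and the sample payloads are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class CompressionCheck {
    /** Compresses the input with DEFLATE (zlib wrapper) at the default level. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int written = deflater.deflate(buffer);
            out.write(buffer, 0, written);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** True only when the compressed form is strictly smaller than the input. */
    static boolean worthCompressing(byte[] input) {
        return deflate(input).length < input.length;
    }

    /** Builds a repetitive payload by repeating one record several times. */
    static byte[] repeated(String record, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(record);
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] small = "ABCDEFGHIJ".getBytes(StandardCharsets.UTF_8);
        byte[] large = repeated("SUCCESS,ABCDEFGHIJ,ABCD,PROCESSING;", 100);
        System.out.println("small: " + small.length + " -> " + deflate(small).length);
        System.out.println("large: " + large.length + " -> " + deflate(large).length);
    }
}
```

If compression is applied only when it helps, the stored value also needs a marker (for example a leading flag byte) so that the reader knows whether to decompress.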

Final implementation

The improvement steps above illustrate the optimization ideas, but deserializing the array loses type information, which is cumbersome to handle. Deserialization therefore has to be considered as well.
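To make the type-loss problem concrete: after a generic round trip, the values in the array typically come back as plain numbers and strings rather than Date or BigDecimal. The sketch below shows one way to restore them. The class TypeRestorer and its method names are hypothetical, and which raw types actually arrive depends on the serialization library, so the types named in the comments are assumptions.

```java
import java.math.BigDecimal;
import java.util.Date;

class TypeRestorer {
    /** Timestamps are assumed to come back as some Number (Integer or Long). */
    static Date asDate(Object raw) {
        return raw == null ? null : new Date(((Number) raw).longValue());
    }

    /**
     * Decimals are assumed to come back as Double, Integer, or String.
     * Going through the text form avoids binary floating-point noise.
     */
    static BigDecimal asBigDecimal(Object raw) {
        return raw == null ? null : new BigDecimal(raw.toString());
    }

    /** Strings need no conversion beyond the null check. */
    static String asString(Object raw) {
        return raw == null ? null : raw.toString();
    }

    public static void main(String[] args) {
        Object[] values = {"SUCCESS", 1681119917717L, 200.11, null};
        System.out.println(asString(values[0]));
        System.out.println(asDate(values[1]).getTime());
        System.out.println(asBigDecimal(values[2]));
        System.out.println(asBigDecimal(values[3]));
    }
}
```

One caveat: a BigDecimal such as 1000.00 that passes through a Double comes back as 1000.0, so its scale is lost. Storing decimals as strings, as the manual msgpack code in this section does, avoids that.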

When the cached object is predefined, each field can be handled by hand. In practice we therefore recommend manual serialization, which gives fine-grained control, the best compression, and the lowest performance overhead.

The msgpack implementation below can serve as a reference. It is test code; please wrap Packer and Unpacker in more convenient utilities of your own:

<dependency>
    <groupId>org.msgpack</groupId>
    <artifactId>msgpack-core</artifactId>
    <version>0.9.3</version>
</dependency>

    public byte[] toByteArray() throws Exception {
        MessageBufferPacker packer = MessagePack.newDefaultBufferPacker();
        toByteArray(packer);
        packer.close();
        return packer.toByteArray();
    }

    public void toByteArray(MessageBufferPacker packer) throws Exception {
        if (testStatus == null) {
            packer.packNil();
        }else{
            packer.packString(testStatus);
        }

        if (userPin == null) {
            packer.packNil();
        }else{
            packer.packString(userPin);
        }

        if (investor == null) {
            packer.packNil();
        }else{
            packer.packString(investor);
        }

        if (testQueryTime == null) {
            packer.packNil();
        }else{
            packer.packLong(testQueryTime.getTime());
        }

        if (createTime == null) {
            packer.packNil();
        }else{
            packer.packLong(createTime.getTime());
        }

        if (bizInfo == null) {
            packer.packNil();
        }else{
            packer.packString(bizInfo);
        }

        if (otherTime == null) {
            packer.packNil();
        }else{
            packer.packLong(otherTime.getTime());
        }

        if (userAmount == null) {
            packer.packNil();
        }else{
            packer.packString(userAmount.toString());
        }

        if (userRate == null) {
            packer.packNil();
        }else{
            packer.packString(userRate.toString());
        }

        if (applyAmount == null) {
            packer.packNil();
        }else{
            packer.packString(applyAmount.toString());
        }

        if (type == null) {
            packer.packNil();
        }else{
            packer.packString(type);
        }

        if (checkTime == null) {
            packer.packNil();
        }else{
            packer.packString(checkTime);
        }

        if (preTestStatus == null) {
            packer.packNil();
        }else{
            packer.packString(preTestStatus);
        }
    }


    public void fromByteArray(byte[] byteArray) throws Exception {
        MessageUnpacker unpacker = MessagePack.newDefaultUnpacker(byteArray);
        fromByteArray(unpacker);
        unpacker.close();
    }

    public void fromByteArray(MessageUnpacker unpacker) throws Exception {
        if (!unpacker.tryUnpackNil()){
            this.setTestStatus(unpacker.unpackString());
        }
        if (!unpacker.tryUnpackNil()){
            this.setUserPin(unpacker.unpackString());
        }
        if (!unpacker.tryUnpackNil()){
            this.setInvestor(unpacker.unpackString());
        }
        if (!unpacker.tryUnpackNil()){
            this.setTestQueryTime(new Date(unpacker.unpackLong()));
        }
        if (!unpacker.tryUnpackNil()){
            this.setCreateTime(new Date(unpacker.unpackLong()));
        }
        if (!unpacker.tryUnpackNil()){
            this.setBizInfo(unpacker.unpackString());
        }
        if (!unpacker.tryUnpackNil()){
            this.setOtherTime(new Date(unpacker.unpackLong()));
        }
        if (!unpacker.tryUnpackNil()){
            this.setUserAmount(new BigDecimal(unpacker.unpackString()));
        }
        if (!unpacker.tryUnpackNil()){
            this.setUserRate(new BigDecimal(unpacker.unpackString()));
        }
        if (!unpacker.tryUnpackNil()){
            this.setApplyAmount(new BigDecimal(unpacker.unpackString()));
        }
        if (!unpacker.tryUnpackNil()){
            this.setType(unpacker.unpackString());
        }
        if (!unpacker.tryUnpackNil()){
            this.setCheckTime(unpacker.unpackString());
        }
        if (!unpacker.tryUnpackNil()){
            this.setPreTestStatus(unpacker.unpackString());
        }
    }
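The repeated null checks above are the part most worth wrapping in helpers. To keep the sketch runnable with the JDK alone, the example below illustrates the pattern with java.io.DataOutputStream and DataInputStream instead of msgpack, so the bytes it produces are not MessagePack: a one-byte presence flag plays the role of packNil. The class NullableIo and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class NullableIo {
    /** Writes a 1-byte presence flag, then the string only when it is non-null. */
    static void writeString(DataOutputStream out, String value) throws IOException {
        out.writeBoolean(value != null);
        if (value != null) {
            out.writeUTF(value);
        }
    }

    static String readString(DataInputStream in) throws IOException {
        return in.readBoolean() ? in.readUTF() : null;
    }

    /** Same pattern for a nullable long, for example a Date stored as epoch milliseconds. */
    static void writeLong(DataOutputStream out, Long value) throws IOException {
        out.writeBoolean(value != null);
        if (value != null) {
            out.writeLong(value);
        }
    }

    static Long readLong(DataInputStream in) throws IOException {
        return in.readBoolean() ? Long.valueOf(in.readLong()) : null;
    }

    /** Writes one nullable string and one nullable long, then reads both back. */
    static Object[] roundTrip(String text, Long number) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                writeString(out, text);
                writeLong(out, number);
            }
            try (DataInputStream in =
                         new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return new Object[] {readString(in), readLong(in)};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Object[] values = roundTrip("SUCCESS", null);
        System.out.println(values[0] + " / " + values[1]);
    }
}
```

With helpers of this shape, each field costs one line on the write path and one on the read path. The same structure carries over to msgpack by replacing the flag with packNil and tryUnpackNil, as in the code above.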

Extended scenario

Suppose we store data for 200 million users. Each user has 40 fields, each field key is 6 bytes long, and the fields are managed individually.

The usual choice would be the hash structure. However, a hash stores the field keys, which takes extra space, and the field keys are redundant data. Following the ideas above, a list can be used instead of a hash.

Measured with the official Redis tool, the list structure needs 144G of space while the hash structure needs 245G. (When more than 50% of the attributes are empty, test whether this still holds.)
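A rough back-of-the-envelope check shows how much of this comes from the field keys alone. The sketch below simply multiplies the figures given in the scenario; the class and method names are made up for illustration.

```java
class HashKeyOverhead {
    /** Raw bytes spent on field keys when every user stores every field in a hash. */
    static long fieldKeyBytes(long users, int fieldsPerUser, int keyLengthBytes) {
        return users * fieldsPerUser * keyLengthBytes;
    }

    public static void main(String[] args) {
        long bytes = fieldKeyBytes(200_000_000L, 40, 6);
        System.out.println(bytes + " bytes of field keys");
        System.out.println((bytes / 1_000_000_000L) + " GB (decimal)");
    }
}
```

The raw key bytes come to 48,000,000,000, about 48 GB. The measured gap between hash and list is larger than that, which presumably reflects per-entry overhead in addition to the key bytes themselves. With a list, each field is addressed by a fixed position instead, so the application must keep a stable mapping from field to index.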

In the case above we took a few very simple measures, and only a few lines of code reduced the space by more than 70%. They are highly recommended where data volume is large and performance requirements are high:

• Use an array instead of an object (if many fields are empty, make sure the serialization tool stores null compactly)

• Use better serialization tools

• Use smaller data types

• Consider using ZIP compression

• Use a list instead of a hash structure (if many fields are empty, test and compare first)
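For reference, the size figures reported for the POJO example (284, 144, 92, and 69) translate into the following step-by-step reductions. The small sketch below only does the arithmetic; the class name is made up for illustration.

```java
class SavingsSummary {
    /** Percentage by which the size shrinks when going from before to after. */
    static double reductionPercent(int before, int after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        int[] sizes = {284, 144, 92, 69};
        for (int i = 1; i < sizes.length; i++) {
            System.out.printf("step %d: %d -> %d (%.1f%%)%n",
                    i, sizes[i - 1], sizes[i], reductionPercent(sizes[i - 1], sizes[i]));
        }
        System.out.printf("total: %.1f%%%n", reductionPercent(sizes[0], sizes[sizes.length - 1]));
    }
}
```

The individual steps save roughly 49%, 36%, and 25%, and the total reduction from 284 to 69 is about 76%.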

 

Origin blog.csdn.net/APItesterCris/article/details/131164219