How to design a codec that is 10 times faster than JSON? YoMo Codec - Performance Benchmark Report for Y3


YoMo

YoMo is an open-source real-time edge computing gateway, development framework, and microservice platform. Its communication layer is built on the QUIC protocol (updated to Draft-31 on 2020-09-25), which helps unlock the value of next-generation low-latency networks such as 5G. The yomo-codec codec, designed for streaming computing, can greatly improve the throughput of computing services, and with the plug-in development model an IoT real-time edge computing system can be up and running in 5 minutes. YoMo has already been deployed in the industrial Internet field.

Official website: https://yomo.run

Introduction to YoMo Codec

yomo-codec-golang is a Go implementation of the YoMo Codec SPEC. It encodes and decodes basic data types and gives YoMo the codec tooling that its message processing relies on. You can extend it to handle more data types, or adapt it to other frameworks that need a codec.

[Figure: TLV structure]

Project introduction: README.md

Why do you need YoMo-Codec?

In HTTP communication, JSON is the usual message codec: its format is simple, it is easy to read and write, and it is supported in many languages, so it is very popular in Internet applications. So why develop YoMo Codec ourselves to support YoMo applications?

  • YoMo processes messages as a stream and extracts the monitored key-value pairs for business logic. With JSON, the decoder has to wait for the complete data packet, deserialize it into an object, and only then extract the value for the key. YoMo Codec instead describes object data as a series of TLV structures. While decoding a packet, it can tell from the T field, early in the decoding process, whether the current key is the monitored one. If it is not, the decoder skips straight to the next TLV group without decoding data that is not monitored, which improves decoding efficiency.
  • JSON decoding usually relies heavily on reflection, which hurts performance. Because YoMo Codec decodes only the key-value pairs that are actually monitored, it uses far less reflection.
  • In the industrial Internet, or in network applications with tight computing budgets, the same encoding and decoding work should cost less CPU so that limited computing resources are used more fully.
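The skip-ahead idea in the first point can be sketched in Go. This is a minimal illustration, not the Y3 wire format: it assumes a flat sequence of records with a one-byte tag and a one-byte length, and the names takeValue and encode are hypothetical helpers introduced only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// takeValue scans a flat sequence of [tag][length][value] records and returns
// the value of the first record whose tag equals key. Records with other tags
// are skipped by advancing the offset; their values are never parsed.
// Assumption: one-byte tag and one-byte length, chosen to keep the sketch
// short. The real Y3 encoding is not reproduced here.
func takeValue(key byte, data []byte) ([]byte, error) {
	for off := 0; off+2 <= len(data); {
		tag, length := data[off], int(data[off+1])
		start := off + 2
		end := start + length
		if end > len(data) {
			return nil, errors.New("truncated record")
		}
		if tag == key {
			return data[start:end], nil
		}
		off = end // not the monitored key: jump to the next record
	}
	return nil, errors.New("key not found")
}

// encode builds one record for a key-value pair (hypothetical helper;
// values longer than 255 bytes are out of scope for this sketch).
func encode(tag byte, value []byte) []byte {
	return append([]byte{tag, byte(len(value))}, value...)
}

func main() {
	// Build a packet with 63 records, tags 0x01..0x3f.
	var packet []byte
	for tag := byte(1); tag <= 63; tag++ {
		packet = append(packet, encode(tag, []byte(fmt.Sprintf("v%d", tag)))...)
	}
	// Extract only the value stored under tag 0x20.
	v, err := takeValue(0x20, packet)
	fmt.Println(string(v), err)
}
```

The loop touches only two header bytes per skipped record, whereas a general-purpose JSON decoder typically has to tokenize every value it passes over before it can reach the key of interest.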

This performance test verifies that YoMo Codec decodes data faster and consumes fewer resources than JSON, and can therefore give YoMo more real-time, efficient, low-overhead message processing.

Test introduction

1. Test method

  • The tests use Go benchmarks in both serial and parallel modes; the parallel mode shows performance when CPU resources are fully utilized.

  • The test data packets are generated by a program, which guarantees that the Codec and JSON tests use exactly the same key-value pairs.

  • The test data comes in four sizes: 3, 16, 32, and 63 key-value pairs. The monitored key is the middle one of each set; for example, K08 means that the value of the 8th key is monitored. This gives the following dimensions, which are used as labels in the result charts.

    Label     Number of key-value pairs   Monitored key position
    C63-K32   63                          Extract the value of the 32nd key
    C32-K16   32                          Extract the value of the 16th key
    C16-K08   16                          Extract the value of the 8th key
    C03-K02   3                           Extract the value of the 2nd key
  • The test results include:

    • A performance comparison of decoding a data packet and extracting the value of the monitored key.
    • A comparison of CPU time for the same decode-and-extract scenario.
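The dimension labels follow a simple rule that can be written down in a few lines of Go. The formula (n+1)/2 for the middle key is an inference that happens to reproduce all four labels in this report, and the function name dimension is hypothetical. The key byte 0x20 and the JSON key "k32" for the 63-pair case match the benchmark code in this report; the other three rows are assumed by analogy.

```go
package main

import "fmt"

// dimension returns the chart label for a data set of n key-value pairs,
// together with the Y3 key byte and the JSON key name of the monitored
// (middle) key. The (n+1)/2 rule is an assumption that reproduces the
// labels C63-K32, C32-K16, C16-K08 and C03-K02.
func dimension(n int) (label string, y3Key byte, jsonKey string) {
	k := (n + 1) / 2
	return fmt.Sprintf("C%02d-K%02d", n, k), byte(k), fmt.Sprintf("k%d", k)
}

func main() {
	for _, n := range []int{63, 32, 16, 3} {
		label, y3Key, jsonKey := dimension(n)
		fmt.Printf("%s  Y3 key 0x%02x  JSON key %q\n", label, y3Key, jsonKey)
	}
}
```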

2. Data structure

  • Y3 test data

    0x80
        0x01 value
        ....
        0x3f value 
    
  • Structure of JSON test data

    {
        "k1": value,
        ...
        "k63": value
    }
    

3. Data processing logic

[Figure: TakeValueFromXXX]

4. Test project

  • The code for this test report is available from the yomo-y3-stress-testing project.

  • Main code structure (only the files directly related to this test are listed):

[Figure: catalog]

5. Test environment

  • Hardware environment:
    • CPU: 2.6 GHz 6-core Intel Core i7, GOMAXPROCS=12
    • RAM: 32GB
    • Hard Disk: SSD
  • Software Environment:

Benchmark test

1. Serial test process

  • Code under test: ./internal/decoder/report_serial/report_benchmark_test.go, for example:

    // Benchmark for YoMo Codec Y3
    func Benchmark_Codec_C63_K32(b *testing.B) {
    	var key byte = 0x20
    	data := generator.NewCodecTestData().GenDataBy(63)
    	b.ResetTimer()
    	for i := 0; i < b.N; i++ {
    		if decoder.TakeValueFromCodec(key, data) == nil {
    			panic(errors.New("take is failure"))
    		}
    	}
    }
    
    // Benchmark for JSON
    func Benchmark_Json_C63_K32(b *testing.B) {
    	key := "k32"
    	data := generator.NewJsonTestData().GenDataBy(63)
    	data = append(data, decoder.TokenEnd)
    	b.ResetTimer()
    	for i := 0; i < b.N; i++ {
    		if decoder.TakeValueFromJson(key, data) == nil {
    			panic(errors.New("take is failure"))
    		}
    	}
    }
    
    • Benchmark_Codec_C63_K32: a serial benchmark that extracts the value of the 32nd key from a data set of 63 key-value pairs.
    • Default: GOMAXPROCS=12
  • Test script: ./internal/decoder/report_serial/report_benchmark_test.sh

    temp_file="../../../docs/temp.out"
    report_file="../../../docs/report.out"
    go test -bench=. -benchtime=3s -benchmem -run=none | grep Benchmark > ${temp_file} \
      && echo 'finished bench' \
      && cat ${temp_file} \
      && cat ${temp_file} | awk '{print $1,$3}' | awk -F "_" '{print $2,$3"-"substr($4,1,3),substr($4,7)}' | awk -v OFS=, '{print $1,$2,$3}' > ${report_file} \
      && echo 'finished analyse' \
      && cat ${report_file}
    

    Running the benchmarks in report_benchmark_test.go generates the test result set and saves it to ./docs/report.out.

  • Generate the result charts: ./docs/report_graphics.ipynb

    python --version # Python version > 3.2.x
    pip install runipy
    bar_ylim=70000 barh_xlim=20 runipy ./report_graphics.ipynb
    

2. Parallel testing process

To make full use of the CPU and observe how the decoder performs on multiple cores, a parallel test is added.

  • Code under test: ./internal/decoder/report_parallel/report_benchmark_test.go, for example:

    func Benchmark_Codec_C63_K32(b *testing.B) {
    	var key byte = 0x20
    	data := generator.NewCodecTestData().GenDataBy(63)
    	b.ResetTimer()
    	b.RunParallel(func(pb *testing.PB) {
    		for pb.Next(){
    			if decoder.TakeValueFromCodec(key, data) == nil {
    				panic(errors.New("take is failure"))
    			}
    		}
    	})
    }
    
    • The body is the same as in the serial test; the difference is that RunParallel is used to run it in parallel.
    • Default: GOMAXPROCS=12
  • Test script: ./internal/decoder/report_parallel/report_benchmark_test.sh. It generates the test result set and saves it to ./docs/report.out.

  • Generate the result charts:

    bar_ylim=18000 barh_xlim=25 runipy ./report_graphics.ipynb
    

3. Test results

  • Serial benchmark results:

    • Time per decode-and-extract operation: Figure 3.1

      [Figure 3.1: report1_serial]

    • Ratio of JSON decoding time to Y3 decoding time: Figure 3.2

      [Figure 3.2: report2_serial]

    • Chart description:
      • The axis labels in Figure 3.1, such as C63-K32, indicate that the data packet contains 63 key-value pairs and that the 32nd key is monitored to extract its value.
      • The Y axis of Figure 3.1 shows the time a single operation takes, in nanoseconds.
      • The X axis of Figure 3.2 shows the ratio (JSON decoding time / Y3 decoding time); for example, 43010/2077=20.7.
  • Parallel benchmark results:

    • Time per decode-and-extract operation: Figure 3.3

      [Figure 3.3: report1_parallel]

    • Ratio of JSON decoding time to Y3 decoding time: Figure 3.4

      [Figure 3.4: report2_parallel]

4. Test Analysis

The test results above show:

  • Y3 decodes much faster than JSON, and the more key-value pairs a data packet contains, the larger the gain. Across the four test dimensions the average speedup is about 11.5 times: (20.7+15.8+6.2+3.3)/4=11.5

  • Parallel decoding on multiple cores also greatly reduces ns/op; the parallel runs are roughly 3 times faster than the serial runs:

                          C63-K32   C32-K16   C16-K08   C03-K02
    Serial (ns/op)        2077      1361      1667      610
    Parallel (ns/op)      706       505       515       175
    Serial / parallel     290%      260%      320%      350%
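The arithmetic behind these figures is easy to recompute. The short Go program below uses only numbers quoted in this report; the helper names ratio and mean are introduced for illustration. The percentages in the table appear to be coarsely rounded forms of the serial/parallel ratios it prints.

```go
package main

import "fmt"

// ratio returns how many times larger a is than b.
func ratio(a, b float64) float64 { return a / b }

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	// JSON vs Y3 for C63-K32, using the ns/op values quoted in the report.
	fmt.Printf("JSON/Y3 for C63-K32: %.1f\n", ratio(43010, 2077))

	// Average of the four JSON/Y3 ratios quoted in the analysis.
	fmt.Printf("average speedup: %.1f\n", mean([]float64{20.7, 15.8, 6.2, 3.3}))

	// Serial vs parallel ns/op for each dimension.
	labels := []string{"C63-K32", "C32-K16", "C16-K08", "C03-K02"}
	serial := []float64{2077, 1361, 1667, 610}
	parallel := []float64{706, 505, 515, 175}
	for i, l := range labels {
		fmt.Printf("%s serial/parallel: %.2f\n", l, ratio(serial[i], parallel[i]))
	}
}
```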

CPU resource analysis

1. Test process

  • Code under test: ./cpu/cpu_pprof.go

    func main() {
    	dataCodec := generator.NewCodecTestData().GenDataBy(63)
    	dataJson := generator.NewJsonTestData().GenDataBy(63)
    	dataJson = append(dataJson, decoder.TokenEnd)
    
    	// pprof
    	fmt.Printf("start pprof\n")
    	go pprof.Run()
    	time.Sleep(5 * time.Second)
    
    	fmt.Printf("start testing...\n")
    	for {
    		if decoder.TakeValueFromCodec(0x20, dataCodec) == nil {
    			panic(errors.New("take is failure"))
    		}
    		if decoder.TakeValueFromJson("k32", dataJson) == nil {
    			panic(errors.New("take is failure"))
    		}
    	}
    }
    
    • pprof.Run(): starts pprof
  • The program decodes Y3 and JSON in an endless loop; the CPU share of each is read from the sampled CPU profile graph.

  • Run the test:

    # Run the code under observation; pprof listens on port 6060 by default
    go run ./cpu_pprof.go
    # Take a sample and view the analysis graph on port 8081
    go tool pprof -http=":8081" http://localhost:6060/debug/cpu/profile
    

2. Test results

[Figure: cpu]

3. Test analysis

As the figure above shows, decoding with YoMo Codec Y3 uses far less CPU than decoding with JSON; the difference is more than 10 times (0.73/0.07=10.4). This matches the benchmark results: CPU usage is lower, and decoding speed improves substantially at the same time.

Test conclusion

Compared with JSON, Y3 improves decoding performance by an order of magnitude, and the more keys a data packet contains, the larger the improvement. Y3's CPU usage is likewise about an order of magnitude lower. This performance test confirms that YoMo Codec Y3's decoding can provide real-time, efficient, low-overhead message processing for YoMo and for other scenarios that need high-performance decoding.
