Heterogeneous computing helps a customer encode WebP images during the Spring Festival

Abstract: Technology blog GigaOM has reported that when YouTube video thumbnails are served in WebP format, web page loading speed increases by 10%; after Google's Chrome Web Store adopted WebP images, it saved several terabytes of bandwidth every day and average page load time fell by about 1/3; and the Google+ mobile app saved 50 TB of data storage per day with the WebP image format.

Background and Challenges

Technology blog GigaOM has reported that when YouTube video thumbnails are served in WebP format, web page loading speed increases by 10%; after Google's Chrome Web Store adopted WebP images, it saved several terabytes of bandwidth per day and average page load time fell by about 1/3; and the Google+ mobile app saved 50 TB of data storage per day with the WebP image format. However, the biggest disadvantage of WebP is that the computational complexity of its compression algorithm is more than 10 times that of JPEG. A high-performance acceleration solution is urgently needed to reduce business costs.

Project

During this year's Spring Festival, a major customer asked Alibaba Cloud for WebP transcoding capacity to support its red-envelope-grabbing business.

Based on past experience, this would have required dozens of physical machines, each with 32 cores and 64 threads. To improve the user experience and reduce its own costs, Alibaba Cloud used FaaS (FPGA as a Service) F1 instances to accelerate WebP encoding. The FaaS team provided FPGA platform support, and the OSS team provided algorithm support. Thanks to the high-performance FPGA platform, 5 single-card FPGA cloud servers carried 40% of the daily WebP encoding traffic.

Results

The sample used in this performance test is a 512x512 image, and all tests were run on Alibaba Cloud FaaS F1 instances. At the request of the business side, some of the data values have been obfuscated.

1) Latency

With FPGA-accelerated WebP encoding, latency is reduced to 1/10 of the original.


2) Throughput

Each single-card F1 instance (8 vCPUs, 1 x Arria 10) delivers about 2 times the throughput of a 32-core, 64-thread physical machine. We also compared against a company in the industry that specializes in WebP encoding acceleration, using the same F1 instance. We found that this company's FPGA-accelerated WebP encoding depends heavily on the CPU, yet utilization is only 50-60%, which is puzzling.
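As a rough sanity check, the throughput figure can be combined with the deployment numbers quoted earlier (5 single-card instances carrying 40% of daily traffic). The sketch below is only back-of-the-envelope arithmetic: the function names are hypothetical, it assumes throughput scales linearly with the number of instances, and the published values were partly obfuscated, so the result is indicative at best.

```python
def physical_machine_equivalents(f1_instances, speedup_per_instance=2.0):
    """Estimate how many 32-core physical machines a group of F1 instances replaces.

    speedup_per_instance is the "about 2 times" throughput figure from the
    article; linear scaling across instances is an assumption.
    """
    return f1_instances * speedup_per_instance


def machines_for_full_traffic(f1_instances, traffic_share, speedup_per_instance=2.0):
    """Extrapolate the physical machines needed to carry 100% of the traffic."""
    if not 0 < traffic_share <= 1:
        raise ValueError("traffic_share must be in (0, 1]")
    return physical_machine_equivalents(f1_instances, speedup_per_instance) / traffic_share


if __name__ == "__main__":
    # 5 F1 instances carrying 40% of the traffic, as reported in the article.
    print(physical_machine_equivalents(5))
    print(machines_for_full_traffic(5, 0.4))
```

Under these assumptions, the extrapolated machine count for the full traffic is in the same range as the "dozens of physical machines" estimated from past experience.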


3) Picture quality

The figure below shows the PSNR curves of the software encoder (blue line), OSS (red line), and the other company (green line) at different quality settings. PSNR is calculated using ImageMagick's convert tool, and the larger the value, the better. The hardware acceleration algorithm provided by OSS almost completely overlaps with the software encoder in terms of image quality. The WebP encoding accelerator provided by the other company shows a noticeable gap of 0.1 to 0.5 dB.
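For readers unfamiliar with the metric, PSNR is defined as 10 * log10(MAX^2 / MSE), where MSE is the mean squared error between the reference and the encoded image and MAX is the largest possible pixel value (255 for 8-bit images). The following is a minimal sketch of that textbook definition over flat sequences of pixel values; it is not the ImageMagick implementation used in the test, which may handle color channels differently, and the function name is hypothetical.

```python
import math


def psnr(reference, encoded, max_value=255):
    """Return the PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(encoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and have the same length")
    # Mean squared error between corresponding pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [10, 20, 30, 40]
    decoded = [11, 19, 31, 39]  # every pixel is off by one, so MSE = 1
    print(round(psnr(original, decoded), 2))
```

Because the scale is logarithmic, a gap of a few tenths of a dB corresponds to a measurable difference in mean squared error at the same quality setting.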


4) Compression ratio

The same test frames from the image space are used here, with the same quality settings. The value is the compression ratio relative to the original JPEG image, and the smaller the value, the better. After testing, we found that the compression ratios of the software encoder, OSS, and the other company almost completely overlap, but the original ranking is still maintained: software > OSS > the other company.
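The ratio described here can be read as the encoded WebP size divided by the size of the original JPEG, so a value of 0.3 would mean the WebP file is 30% as large as the JPEG. The sketch below illustrates that reading; the function name and the byte counts are hypothetical and are not taken from the test data.

```python
def compression_ratio(encoded_bytes, original_jpeg_bytes):
    """Return the encoded size as a fraction of the original JPEG size (smaller is better)."""
    if original_jpeg_bytes <= 0:
        raise ValueError("original size must be positive")
    if encoded_bytes < 0:
        raise ValueError("encoded size must not be negative")
    return encoded_bytes / original_jpeg_bytes


if __name__ == "__main__":
    # Hypothetical sizes: a 100,000-byte JPEG re-encoded to a 30,000-byte WebP.
    print(compression_ratio(30_000, 100_000))
```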


Based on the test results above, Alibaba Cloud's OSS acceleration solution currently outperforms the other company on every metric in the WebP compression scenario. Its lead in compression ratio is slight, while its advantage on the other two metrics is very clear.

Future work

1) End-to-end (E2E) performance is expected to improve by 50% once performance optimization is complete. As for compression ratio, m6-level encoding will be used in the future, which will give a better compression ratio than the current one.
2) A single FPGA board costs much less than a server, so the key to reducing business costs is to increase FPGA density. In the future, the WebP accelerator will run on F3 instances, where the performance of a single FPGA chip is more than 2 times higher and the FPGA chip density per server is also doubled.

Original link: https://yq.aliyun.com/articles/561260?spm=a2c41.11181499.0.0
