The right way to develop for Function Compute: compressing large files with brotli

The large file problem

Function Compute limits the uploaded zip code package to 50M. In some scenarios the code package exceeds this limit, for example an untrimmed serverless-chrome, similar tools such as LibreOffice, or common machine learning model files. There are currently three ways to deal with large files:

  1. Use an algorithm with a higher compression ratio, such as the brotli algorithm introduced in this article
  2. Download from OSS at runtime
  3. Use NAS file sharing

A quick comparison of the pros and cons of the three methods:

Method | Advantages | Disadvantages
High-density compression | Simple to publish; fastest startup | Uploading the code package is slow; decompression code must be written; the compressed package must stay under 50M
Download from OSS | Decompressed files can be up to 512M | Files must be uploaded to OSS in advance; download and decompression code must be written; download speed is about 50M/s
NAS | No file size limit; no compression needed | Files must be uploaded to NAS in advance; the VPC environment adds cold-start latency (~5s)

In general, if the code package can be kept under 50M, startup is faster. The engineering is also simpler: data and code ship together, and there is no need to write extra scripts to keep OSS or NAS in sync.
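Read as a rule of thumb, the table above amounts to a small decision procedure. The sketch below is hypothetical (the class, enum, and method names are made up) and rests on two assumptions: "M" in the table means MiB, and the 512M limit listed for OSS applies to any approach that extracts files to local disk, including the compressed-package approach.

```java
/** Hypothetical helper: picks one of the three approaches from the comparison table. */
public final class LargeFileStrategy {

    /** The three approaches compared in the table. */
    public enum Strategy { COMPRESSED_PACKAGE, OSS_DOWNLOAD, NAS }

    // Assumption: "M" in the table means MiB (1024 * 1024 bytes).
    private static final long MIB = 1024L * 1024L;
    /** Code package upload limit from the table (50M). */
    static final long PACKAGE_LIMIT = 50 * MIB;
    /** Extracted-size limit from the OSS row (512M); assumed to apply to every extract-to-disk approach. */
    static final long EXTRACTED_LIMIT = 512 * MIB;

    /**
     * @param compressedBytes size of the files after compression (what would be uploaded)
     * @param extractedBytes  size of the files after decompression
     */
    public static Strategy choose(long compressedBytes, long extractedBytes) {
        if (extractedBytes > EXTRACTED_LIMIT) {
            // Too big to extract locally: only NAS has no size limit.
            return Strategy.NAS;
        }
        if (compressedBytes <= PACKAGE_LIMIT) {
            // Fits in the code package: simplest to publish, fastest to start.
            return Strategy.COMPRESSED_PACKAGE;
        }
        // Too big for the package, but small enough to download and extract.
        return Strategy.OSS_DOWNLOAD;
    }

    public static void main(String[] args) {
        // A 33M compressed archive that expands to 104M.
        System.out.println(choose(33 * MIB, 104 * MIB)); // COMPRESSED_PACKAGE
    }
}
```

The order of the checks mirrors the table: the extracted size rules out local extraction first, and only then does the 50M package limit decide between packaging and OSS. It ignores the cold-start and upload-speed trade-offs, which still need human judgment.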

Compression algorithm

Brotli is an open-source compression algorithm developed by Google engineers; newer mainstream browsers already support it for compressing HTTP transfers. An online benchmark [1] compares Brotli with other common compression algorithms in three figures.

Those three figures show that, compared with gzip, xz, and bz2, brotli has the highest compression ratio, decompression speed close to gzip's, and the slowest compression speed.

In our scenario, however, slow compression hardly matters: compression only has to run once, while preparing the package during development.
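The JDK ships no Brotli encoder, so the sketch below only shows how one might measure the gzip side of such a comparison on one's own files, using `java.util.zip.GZIPOutputStream` at its default level. It is an illustration under that substitution, not the benchmark cited above; the class name and command-line usage are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gzip size ratio and time for a byte array (JDK only, no Brotli). */
public final class GzipMeasure {

    /** One measurement: sizes in bytes and elapsed milliseconds. */
    public static final class Result {
        public final long originalBytes;
        public final long compressedBytes;
        public final long millis;

        Result(long originalBytes, long compressedBytes, long millis) {
            this.originalBytes = originalBytes;
            this.compressedBytes = compressedBytes;
            this.millis = millis;
        }

        /** Compressed size divided by original size; smaller means better compression. */
        public double ratio() {
            return (double) compressedBytes / originalBytes;
        }
    }

    /** Compresses data in memory with gzip's default level and times it. */
    public static Result measure(byte[] data) {
        long start = System.nanoTime();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(data);
        } catch (IOException e) {
            // Not expected for an in-memory stream, but GZIPOutputStream declares it.
            throw new UncheckedIOException(e);
        }
        // Closing the gzip stream wrote the trailer, so buffer.size() is the final size.
        long millis = (System.nanoTime() - start) / 1_000_000;
        return new Result(data.length, buffer.size(), millis);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java GzipMeasure <file>
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        Result r = measure(data);
        System.out.printf("%d -> %d bytes (ratio %.3f) in %d ms%n",
                r.originalBytes, r.compressedBytes, r.ratio(), r.millis);
    }
}
```

Running it on the uncompressed tar gives a local baseline to compare against the brotli command-line tool's output size and `time` result.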

Making the compressed file

First, here is how to create the compressed file. The code and examples below come from the packed-selenium-java-example project.

Install the brotli command

Mac users

brew install brotli

Windows users can download it from https://github.com/google/brotli/releases

Pack and compress

The two files are 7.5M and 97M respectively:

╭─ ~/D/test1[◷ 18:15:21]
╰─  ll
total 213840
-rwxr-xr-x  1 vangie  staff   7.5M  3  5 11:13 chromedriver
-rwxr-xr-x  1 vangie  staff    97M  1 25  2018 headless-chromium

Packed and compressed with gzip, the archive is 44M:

╭─ ~/D/test1[◷ 18:15:33]
╰─  tar -czvf chromedriver.tar chromedriver headless-chromium
a chromedriver
a headless-chromium
╭─ ~/D/test1[◷ 18:16:41]
╰─  ll
total 306216
-rwxr-xr-x  1 vangie  staff   7.5M  3  5 11:13 chromedriver
-rw-r--r--  1 vangie  staff    44M  3  6 18:16 chromedriver.tar
-rwxr-xr-x  1 vangie  staff    97M  1 25  2018 headless-chromium

Removing the z option and packing again with tar produces an uncompressed 104M archive:

╭─ ~/D/test1[◷ 18:16:42]
╰─  tar -cvf chromedriver.tar chromedriver headless-chromium
a chromedriver
a headless-chromium
╭─ ~/D/test1[◷ 18:17:06]
╰─  ll
total 443232
-rwxr-xr-x  1 vangie  staff   7.5M  3  5 11:13 chromedriver
-rw-r--r--  1 vangie  staff   104M  3  6 18:17 chromedriver.tar
-rwxr-xr-x  1 vangie  staff    97M  1 25  2018 headless-chromium

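Compressing that tar with brotli produces a 33M file, much smaller than gzip's 44M. The time cost is striking too: 6 minutes 18 seconds, versus only about 5 seconds for gzip. Here -q 11 selects the highest compression quality, -j removes the source file after compression (which is why chromedriver.tar no longer appears in the listing), and -f overwrites an existing output file.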

╭─ ~/D/test1[◷ 18:17:08]
╰─  time brotli -q 11 -j -f chromedriver.tar
brotli -q 11 -j -f chromedriver.tar  375.39s user 1.66s system 99% cpu 6:18.21 total
╭─ ~/D/test1[◷ 18:24:23]
╰─  ll
total 281552
-rwxr-xr-x  1 vangie  staff   7.5M  3  5 11:13 chromedriver
-rw-r--r--  1 vangie  staff    33M  3  6 18:17 chromedriver.tar.br
-rwxr-xr-x  1 vangie  staff    97M  1 25  2018 headless-chromium
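The sizes in the listings above can be turned into ratios. The small sketch below only does that arithmetic with the rounded sizes that ll reports (104M tar, 44M gzip, 33M brotli), so the results are approximate; the class and method names are illustrative.

```java
/** Illustrative arithmetic on the sizes reported by ll (in M, rounded). */
public final class CompressionRatios {

    /** Compressed size as a fraction of the original size. */
    public static double ratio(double compressed, double original) {
        return compressed / original;
    }

    public static void main(String[] args) {
        double tar = 104, gzip = 44, brotli = 33;
        System.out.printf("gzip:   %.1f%% of the tar%n", 100 * ratio(gzip, tar));   // 42.3%
        System.out.printf("brotli: %.1f%% of the tar%n", 100 * ratio(brotli, tar)); // 31.7%
        // Brotli output relative to gzip output: 33 / 44 = 0.75, i.e. 25% smaller.
        System.out.printf("brotli vs gzip: %.0f%% smaller%n", 100 * (1 - ratio(brotli, gzip)));
        // Space left under the 50M package limit for everything else in the package.
        System.out.println("headroom: gzip " + (50 - gzip) + "M, brotli " + (50 - brotli) + "M");
    }
}
```

In other words, brotli shrinks the archive to under a third of the tar size, and the 11M it saves over gzip is room left in the 50M package for the function's own code and dependencies.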

Decompress at runtime

The following example uses a Java Maven project.

Add the decompression dependencies

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.18</version>
</dependency>

<dependency>
    <groupId>org.brotli</groupId>
    <artifactId>dec</artifactId>
    <version>0.1.2</version>
</dependency>

commons-compress is a compression toolkit from Apache that provides a consistent abstract interface over many compression algorithms. For brotli it supports only decompression, which is all we need here. The org.brotli:dec package is Google's underlying implementation of the brotli decompression algorithm.

Implement the initialize method

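import com.aliyun.fc.runtime.Context;
import com.aliyun.fc.runtime.FunctionInitializer;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.brotli.BrotliCompressorInputStream;
import org.apache.commons.compress.utils.IOUtils;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.time.Duration;
import java.time.Instant;

public class ChromeDemo implements FunctionInitializer {

    @Override
    public void initialize(Context context) throws IOException {

        Instant start = Instant.now();

        // FileInputStream -> BufferedInputStream -> BrotliCompressorInputStream -> TarArchiveInputStream
        try (TarArchiveInputStream in =
                     new TarArchiveInputStream(
                             new BrotliCompressorInputStream(
                                     new BufferedInputStream(
                                             new FileInputStream("chromedriver.tar.br"))))) {

            TarArchiveEntry entry;
            while ((entry = in.getNextTarEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Extract every regular file under /tmp/bin, creating parent directories as needed
                File file = new File("/tmp/bin", entry.getName());
                File parent = file.getParentFile();
                if (!parent.exists()) {
                    parent.mkdirs();
                }

                System.out.println("extract file to " + file.getAbsolutePath());

                // The tar stream returns end-of-stream at the end of the current entry,
                // so this copies exactly one file
                try (FileOutputStream out = new FileOutputStream(file)) {
                    IOUtils.copy(in, out);
                }

                // Restore the permissions recorded in the tar entry (e.g. the executable bit).
                // getPosixFilePermission is omitted here; see packed-selenium-java-example.
                Files.setPosixFilePermissions(file.getCanonicalFile().toPath(),
                        getPosixFilePermission(entry.getMode()));
            }
        }

        Instant finish = Instant.now();
        long timeElapsed = Duration.between(start, finish).toMillis();

        System.out.println("Extract binary elapsed: " + timeElapsed + "ms");
    }
}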

The class implements the initialize method of the FunctionInitializer interface. Decompression starts with four layers of nested streams, which work as follows:

  1. FileInputStream reads the file
  2. BufferedInputStream provides buffering, which reduces the context switches caused by system calls and improves read speed
  3. BrotliCompressorInputStream decodes the brotli byte stream
  4. TarArchiveInputStream extracts the files in the tar package one by one
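The nesting described in the list above can be tried with nothing but the JDK. The sketch below is a stand-in, not the article's code: since the JDK has no Brotli decoder, `GZIPInputStream` takes the place of `BrotliCompressorInputStream`, and the fourth (tar) layer is left out, so the result is a single byte payload. The class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Stand-in demo of nested streams; gzip replaces brotli because the JDK has no brotli codec. */
public final class LayeredStreams {

    /** Reads a gzip file through FileInputStream -> BufferedInputStream -> GZIPInputStream. */
    public static byte[] readCompressed(Path file) throws IOException {
        try (InputStream in =
                     new GZIPInputStream(                   // layer 3: decode the compressed bytes
                             new BufferedInputStream(       // layer 2: batch reads into larger chunks
                                     new FileInputStream(file.toFile())))) { // layer 1: raw file bytes
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Writes data gzip-compressed to a file (only used here to create input). */
    public static void writeCompressed(Path file, byte[] data) throws IOException {
        try (OutputStream out = new GZIPOutputStream(new FileOutputStream(file.toFile()))) {
            out.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("layered", ".gz");
        writeCompressed(tmp, "hello, layers".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(readCompressed(tmp), StandardCharsets.UTF_8)); // hello, layers
        Files.delete(tmp);
    }
}
```

Each wrapper only knows about the stream directly beneath it, which is why swapping the decoder (gzip here, brotli in the article) or adding the tar layer on top does not change the other layers.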

Files.setPosixFilePermissions restores the permissions that the files had in the tar package. The getPosixFilePermission helper is too long and is omitted here; see packed-selenium-java-example.
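Purely as an illustration of what the omitted getPosixFilePermission helper has to do, and not the project's implementation, a conversion from a tar entry's numeric mode (for example 0755) to a `Set<PosixFilePermission>` could look like this. The class `TarModes` and method `toPermissions` are hypothetical names.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical stand-in for the omitted getPosixFilePermission helper. */
public final class TarModes {

    // Permission bits from 0400 (owner read) down to 0001 (others execute).
    private static final int[] BITS = {0400, 0200, 0100, 0040, 0020, 0010, 0004, 0002, 0001};
    private static final PosixFilePermission[] PERMS = {
            PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_WRITE, PosixFilePermission.OWNER_EXECUTE,
            PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_WRITE, PosixFilePermission.GROUP_EXECUTE,
            PosixFilePermission.OTHERS_READ, PosixFilePermission.OTHERS_WRITE, PosixFilePermission.OTHERS_EXECUTE
    };

    /** Converts the low 9 bits of a tar mode (e.g. 0755) to permissions; file-type bits are ignored. */
    public static Set<PosixFilePermission> toPermissions(int mode) {
        Set<PosixFilePermission> result = EnumSet.noneOf(PosixFilePermission.class);
        for (int i = 0; i < BITS.length; i++) {
            if ((mode & BITS[i]) != 0) {
                result.add(PERMS[i]);
            }
        }
        return result;
    }
}
```

With a helper like this, the call in initialize would read `Files.setPosixFilePermissions(path, TarModes.toPermissions(entry.getMode()))`. Restoring the owner execute bit is what keeps chromedriver and headless-chromium runnable after extraction.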

Instant start = Instant.now();
...

Instant finish = Instant.now();
long timeElapsed = Duration.between(start, finish).toMillis();

System.out.println("Extract binary elapsed: " + timeElapsed + "ms");

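This snippet prints the decompression time; in practice extraction takes about 3.7 s.

Finally, don't forget to configure Initializer and InitializationTimeout in template.yml, so that the initialize method is registered as the function's initializer and is given enough time to finish the extraction.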
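`Instant.now()` reads the wall clock, which can be adjusted while code is running; for measuring an elapsed interval, `System.nanoTime()` is the JDK's monotonic alternative. A small helper along those lines might look like the sketch below; `Stopwatch` and `timeMillis` are hypothetical names, and because `Runnable` cannot throw checked exceptions, an extraction step that throws `IOException` would need to be wrapped.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a task and returns its duration using the monotonic System.nanoTime(). */
public final class Stopwatch {

    public static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            // Placeholder work standing in for the decompression step.
            long sum = 0;
            for (int i = 0; i < 10_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println(sum); // uses the result so the loop is not trivially dead code
            }
        });
        System.out.println("Extract binary elapsed: " + elapsed + "ms");
    }
}
```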

References

  1. Brotli benchmarks (OpenCPU): https://www.opencpu.org/posts/brotli-benchmarks/
  2. packed-selenium-java-example: https://github.com/vangie/packed-selenium-java-example

Join us

Team introduction

Alibaba Cloud Function Compute is a brand-new computing service that supports the event-driven programming model. It lets users focus on their own business logic and build applications in a serverless manner, quickly implementing low-cost, scalable, highly available systems without managing underlying infrastructure such as servers. Users can prototype quickly, and the same architecture scales smoothly as the business grows, making computing more efficient, economical, elastic, and reliable. Small startups and large enterprises alike benefit from it. Our team is expanding rapidly and we are looking for talent. We want teammates with:

  1. Solid fundamentals: you can read papers to track industry trends, and you can also code quickly to solve practical problems.
  2. Rigorous, systematic thinking: you can weigh business opportunities, system architecture, operation and maintenance costs, and many other factors as a whole, own the complete design/development/testing/release process, and anticipate and control risks.
  3. Curiosity and a sense of purpose: you enjoy exploring the unknown and are both a dreamer and a practitioner.
  4. Tenacity, optimism, and confidence: you see opportunity amid pressure and difficulty, which makes work fun!

If you are passionate about cloud computing and want to build an influential computing platform and ecosystem, join us and realize your dreams with us!

Job description

Build a next-generation serverless computing platform, including:

  1. Design and implement a complete and scalable front-end system, including authentication/authority management, metadata management, traffic control, metering and billing, log monitoring, and more
  2. Design and implement flexible and reliable back-end systems, including resource scheduling, load balancing, fault-tolerant processing, etc.
  3. Rich and easy-to-use SDK/Tools/CLI/console
  4. Driven by user needs, track industry trends, and use technology to drive business growth

Job requirements

  1. Solid knowledge of algorithms, data structures, and operating systems, and excellent logical thinking.
  2. Mastery of at least one programming language, such as Java/Go/C/C#/C++.
  3. Experience developing large-scale, highly available distributed systems is preferred.
  4. Experience in Web/Mobile backend or microservice development is preferred.
  5. Good communication skills and team spirit, with some organizational and coordination skills.
  6. Bachelor's degree or above.
  7. More than 3 years of work experience; candidates who have passed the "Alibaba Coding Standards" certification are preferred (certification: https://edu.aliyun.com/certification/cldt02).

CV submission

yixian.dw AT alibaba-inc.com
