Large file problem
Function Compute limits an uploaded zip code package to 50 MB. Some code packages exceed this limit, for example an untrimmed serverless-chrome, LibreOffice, or common machine learning model files. There are currently three ways to handle large files:
- Use an algorithm with a higher compression ratio, such as the brotli algorithm introduced in this article
- Download the files from OSS at runtime
- Use NAS file sharing
A brief comparison of the pros and cons of these three methods:
Method | Advantages | Disadvantages |
---|---|---|
High-ratio compression | Simple to publish and the fastest to start | Uploading the code package is slow; you need to write decompression code; the compressed package still cannot exceed 50 MB |
OSS | Handles files of up to 512 MB after download and decompression | The files must be uploaded to OSS in advance; you need to write download and decompression code; the download speed is about 50 MB/s |
NAS | No file size limit, no compression required | The files must be uploaded to NAS in advance; the VPC environment adds a cold start delay (~5 s) |
In general, if the code package can be kept under 50 MB, startup is faster. The engineering is also simpler: data and code ship together, and no extra scripts are needed to keep OSS or NAS in sync.
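The trade-offs in the table can be written down as a small decision rule. The following is only a sketch: the class `LargeFileStrategy` and its methods are hypothetical names, not part of Function Compute, and it assumes that "M" means 1024 × 1024 bytes and that the quoted limits (50 MB, 512 MB, about 50 MB/s) are exact.

```java
/**
 * Hypothetical helper (not part of Function Compute or the example project)
 * that encodes the limits quoted in the comparison table.
 */
final class LargeFileStrategy {
    // Assumption: "M" in the article means 1024 * 1024 bytes.
    static final long MB = 1024L * 1024L;
    static final long CODE_PACKAGE_LIMIT = 50 * MB;   // zip upload limit
    static final long OSS_UNPACKED_LIMIT = 512 * MB;  // limit after download and decompression
    static final double OSS_MB_PER_SECOND = 50.0;     // approximate OSS download speed

    enum Method { HIGH_RATIO_COMPRESSION, OSS_DOWNLOAD, NAS }

    /** Picks the simplest method whose size limit is still satisfied. */
    static Method choose(long compressedBytes, long unpackedBytes) {
        if (compressedBytes <= CODE_PACKAGE_LIMIT) {
            return Method.HIGH_RATIO_COMPRESSION;
        }
        if (unpackedBytes <= OSS_UNPACKED_LIMIT) {
            return Method.OSS_DOWNLOAD;
        }
        return Method.NAS;
    }

    /** Rough download time from OSS at the quoted speed. */
    static double ossDownloadSeconds(long bytes) {
        return (bytes / (double) MB) / OSS_MB_PER_SECOND;
    }

    public static void main(String[] args) {
        System.out.println(choose(33 * MB, 104 * MB));    // HIGH_RATIO_COMPRESSION
        System.out.println(choose(120 * MB, 400 * MB));   // OSS_DOWNLOAD
        System.out.println(choose(900 * MB, 2048 * MB));  // NAS
    }
}
```

The rule prefers the in-package option whenever it fits, which matches the point above that keeping everything under 50 MB gives the fastest start and the simplest release process.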
Compression algorithm
Brotli is an open source compression algorithm developed by Google engineers. Newer mainstream browsers support it as a compression algorithm for HTTP transfer. A benchmark found online (see the reference reading) compares Brotli with other common compression algorithms.
Its three figures show that, compared with gzip, xz, and bz2, brotli has the highest compression ratio, a decompression speed close to gzip's, and the slowest compression speed.
In our scenario, however, slow compression matters little: the compression task runs only once, while preparing the material during development.
Make the compressed file
First, here is how to create the compressed file. The code and use case below come from the project packed-selenium-java-example.
Install the brotli command
Mac users
brew install brotli
Windows users can download it from the releases page: https://github.com/google/brotli/releases
Pack and compress
Before packing, the two files are 7.5 MB and 97 MB respectively.
╭─ ~/D/test1[◷ 18:15:21]
╰─ ll
total 213840
-rwxr-xr-x 1 vangie staff 7.5M 3 5 11:13 chromedriver
-rwxr-xr-x 1 vangie staff 97M 1 25 2018 headless-chromium
Packed and compressed with gzip, the size is 44 MB.
╭─ ~/D/test1[◷ 18:15:33]
╰─ tar -czvf chromedriver.tar chromedriver headless-chromium
a chromedriver
a headless-chromium
╭─ ~/D/test1[◷ 18:16:41]
╰─ ll
total 306216
-rwxr-xr-x 1 vangie staff 7.5M 3 5 11:13 chromedriver
-rw-r--r-- 1 vangie staff 44M 3 6 18:16 chromedriver.tar
-rwxr-xr-x 1 vangie staff 97M 1 25 2018 headless-chromium
Run tar again without the z option, so that the archive is not compressed; the size is 104 MB.
╭─ ~/D/test1[◷ 18:16:42]
╰─ tar -cvf chromedriver.tar chromedriver headless-chromium
a chromedriver
a headless-chromium
╭─ ~/D/test1[◷ 18:17:06]
╰─ ll
total 443232
-rwxr-xr-x 1 vangie staff 7.5M 3 5 11:13 chromedriver
-rw-r--r-- 1 vangie staff 104M 3 6 18:17 chromedriver.tar
-rwxr-xr-x 1 vangie staff 97M 1 25 2018 headless-chromium
Compressing the tar file with brotli gives 33 MB, much smaller than gzip's 44 MB. The time taken is also striking: 6 minutes 18 seconds, versus only 5 seconds for gzip.
╭─ ~/D/test1[◷ 18:17:08]
╰─ time brotli -q 11 -j -f chromedriver.tar
brotli -q 11 -j -f chromedriver.tar 375.39s user 1.66s system 99% cpu 6:18.21 total
╭─ ~/D/test1[◷ 18:24:23]
╰─ ll
total 281552
-rwxr-xr-x 1 vangie staff 7.5M 3 5 11:13 chromedriver
-rw-r--r-- 1 vangie staff 33M 3 6 18:17 chromedriver.tar.br
-rwxr-xr-x 1 vangie staff 97M 1 25 2018 headless-chromium
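To put the numbers from the listings side by side, the arithmetic can be sketched as follows. The sizes (104 MB, 44 MB, 33 MB) and times (5 s for gzip, 6:18.21 for brotli) are taken from the text above; the class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper that compares the measurements quoted in this article. */
final class CompressionRatio {
    /** Original size divided by compressed size; higher is better. */
    static double ratio(double originalMb, double compressedMb) {
        return originalMb / compressedMb;
    }

    /** Percentage saved by the smaller result relative to a baseline. */
    static double savingPercent(double baselineMb, double smallerMb) {
        return 100.0 * (baselineMb - smallerMb) / baselineMb;
    }

    public static void main(String[] args) {
        double tar = 104, gzip = 44, brotli = 33;   // sizes in MB from the listings
        double gzipSeconds = 5;                     // approximate, from the text
        double brotliSeconds = 6 * 60 + 18.21;      // 6:18.21 total

        System.out.println(String.format(Locale.ROOT, "gzip ratio:   %.2f", ratio(tar, gzip)));      // 2.36
        System.out.println(String.format(Locale.ROOT, "brotli ratio: %.2f", ratio(tar, brotli)));    // 3.15
        System.out.println(String.format(Locale.ROOT, "brotli is %.1f%% smaller than gzip",
                savingPercent(gzip, brotli)));                                                       // 25.0%
        System.out.println(String.format(Locale.ROOT, "brotli took about %.0f times as long",
                brotliSeconds / gzipSeconds));                                                       // 76
    }
}
```

For this pair of binaries, brotli at quality 11 saves a quarter of the gzip size, which is what brings the package under the 50 MB limit, at the cost of a one-time compression that is roughly 76 times slower.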
Decompress at runtime
The following example is a Java Maven project.
Add the decompression dependencies
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.18</version>
</dependency>
<dependency>
    <groupId>org.brotli</groupId>
    <artifactId>dec</artifactId>
    <version>0.1.2</version>
</dependency>
commons-compress is a compression and decompression toolkit provided by Apache. It offers a consistent abstract interface over various compression algorithms. For the brotli algorithm it supports decompression only, which is enough here.
org.brotli:dec is the underlying implementation of the brotli decompression algorithm, provided by Google.
Implement the initialize method
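import com.aliyun.fc.runtime.Context;
import com.aliyun.fc.runtime.FunctionInitializer;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.brotli.BrotliCompressorInputStream;
import org.apache.commons.compress.utils.IOUtils;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.time.Duration;
import java.time.Instant;

public class ChromeDemo implements FunctionInitializer {

    // getPosixFilePermission(int mode) is omitted here; see packed-selenium-java-example.
    public void initialize(Context context) throws IOException {
        Instant start = Instant.now();
        // Four nested streams: file -> buffer -> brotli decoder -> tar reader
        try (TarArchiveInputStream in =
                     new TarArchiveInputStream(
                             new BrotliCompressorInputStream(
                                     new BufferedInputStream(
                                             new FileInputStream("chromedriver.tar.br"))))) {
            TarArchiveEntry entry;
            while ((entry = in.getNextTarEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Extract every file under /tmp/bin
                File file = new File("/tmp/bin", entry.getName());
                File parent = file.getParentFile();
                if (!parent.exists()) {
                    parent.mkdirs();
                }
                System.out.println("extract file to " + file.getAbsolutePath());
                try (FileOutputStream out = new FileOutputStream(file)) {
                    IOUtils.copy(in, out);
                }
                // Restore the permissions recorded in the tar entry
                Files.setPosixFilePermissions(file.getCanonicalFile().toPath(),
                        getPosixFilePermission(entry.getMode()));
            }
        }
        Instant finish = Instant.now();
        long timeElapsed = Duration.between(start, finish).toMillis();
        System.out.println("Extract binary elapsed: " + timeElapsed + "ms");
    }
}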
The code above implements the initialize method of the FunctionInitializer interface. Decompression begins with four layers of nested streams, which work as follows:
- FileInputStream reads the file
- BufferedInputStream provides buffering, which reduces the context switching caused by system calls and improves read speed
- BrotliCompressorInputStream decodes the byte stream
- TarArchiveInputStream extracts the files in the tar package one by one
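The same layering can be tried without the Brotli and tar libraries. The sketch below uses only JDK classes as stand-ins, which is an assumption made purely for illustration: GZIPInputStream takes the place of BrotliCompressorInputStream, ZipInputStream takes the place of TarArchiveInputStream, and a ByteArrayInputStream replaces the file. The class `LayeredStreams` and its methods are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical demo of nested streams using JDK stand-ins for brotli and tar. */
final class LayeredStreams {

    /** Writes the entries into an archive and compresses the whole archive. */
    static byte[] pack(Map<String, String> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream archive = new ZipOutputStream(new GZIPOutputStream(bytes))) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                archive.putNextEntry(new ZipEntry(file.getKey()));
                archive.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                archive.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the entries back through four layers: source, buffer, decoder, archive. */
    static Map<String, String> unpack(byte[] packed) {
        Map<String, String> files = new LinkedHashMap<>();
        try (ZipInputStream archive =
                     new ZipInputStream(                            // splits the stream into entries
                             new GZIPInputStream(                   // decodes the compressed bytes
                                     new BufferedInputStream(       // batches reads from the source
                                             new ByteArrayInputStream(packed))))) {  // raw bytes
            ZipEntry entry;
            byte[] buffer = new byte[8192];
            while ((entry = archive.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int read;
                while ((read = archive.read(buffer)) != -1) {
                    content.write(buffer, 0, read);
                }
                files.put(entry.getName(), new String(content.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("bin/chromedriver", "driver bytes");
        files.put("bin/headless-chromium", "browser bytes");
        System.out.println(unpack(pack(files)));
    }
}
```

Each layer adds exactly one capability, so swapping the decoder or the archive reader for the commons-compress classes changes only the constructor calls, not the read loop.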
Files.setPosixFilePermissions then restores the permissions of the files in the tar package. The helper code for this is too long and is omitted here; see packed-selenium-java-example.
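For readers who do not want to open the project, here is one way such a mode conversion could look. This is a minimal sketch and not the code from packed-selenium-java-example: the class `TarModes` and the method `toPermissions` are hypothetical names, and it assumes that only the lowest nine permission bits of the tar mode matter.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper that maps Unix mode bits to PosixFilePermission values. */
final class TarModes {

    /** Converts a mode such as 0755 into a permission set; higher bits are ignored. */
    static Set<PosixFilePermission> toPermissions(int mode) {
        Set<PosixFilePermission> perms = EnumSet.noneOf(PosixFilePermission.class);
        if ((mode & 0400) != 0) perms.add(PosixFilePermission.OWNER_READ);
        if ((mode & 0200) != 0) perms.add(PosixFilePermission.OWNER_WRITE);
        if ((mode & 0100) != 0) perms.add(PosixFilePermission.OWNER_EXECUTE);
        if ((mode & 0040) != 0) perms.add(PosixFilePermission.GROUP_READ);
        if ((mode & 0020) != 0) perms.add(PosixFilePermission.GROUP_WRITE);
        if ((mode & 0010) != 0) perms.add(PosixFilePermission.GROUP_EXECUTE);
        if ((mode & 0004) != 0) perms.add(PosixFilePermission.OTHERS_READ);
        if ((mode & 0002) != 0) perms.add(PosixFilePermission.OTHERS_WRITE);
        if ((mode & 0001) != 0) perms.add(PosixFilePermission.OTHERS_EXECUTE);
        return perms;
    }

    public static void main(String[] args) {
        System.out.println(PosixFilePermissions.toString(toPermissions(0755)));     // rwxr-xr-x
        System.out.println(PosixFilePermissions.toString(toPermissions(0100644)));  // rw-r--r--
    }
}
```

Restoring the execute bits matters here because chromedriver and headless-chromium are binaries that must be executable after extraction.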
Instant start = Instant.now();
...
Instant finish = Instant.now();
long timeElapsed = Duration.between(start, finish).toMillis();
System.out.println("Extract binary elapsed: " + timeElapsed + "ms");
The code segment above prints the decompression time; the actual execution takes about 3.7 s.
Finally, don't forget to configure Initializer and InitializationTimeout in template.yml.
Reference reading
- https://www.opencpu.org/posts/brotli-benchmarks/
- https://github.com/vangie/packed-selenium-java-example