Meet WebAssembly

Origin

WebAssembly began as a side project of a Mozilla employee. In 2010, Alon Zakai, who was working on Firefox for Android at Mozilla, started writing a compiler called Emscripten in his spare time. He wanted to port a game engine he had written earlier to the browser. Emscripten compiles C++ code, via LLVM IR, into JavaScript.

By the end of 2011, Emscripten could successfully compile large C and C++ codebases such as Python and Doom. Mozilla saw that the project was very promising, so it formed a team and invited Alon to work on it full-time. In 2013, Alon and his colleagues proposed the asm.js specification. asm.js is a strict subset of JavaScript that gives browsers more room to optimize by "removing dynamic features" and "adding type hints". Compared with full JavaScript, the trimmed-down asm.js is closer to the machine and better suited as a compiler target language.

asm.js provides only two numeric types: 32-bit signed integers and 64-bit floating-point numbers. Other data types such as strings, booleans, and objects are not provided at all. They are stored as numbers in memory and accessed through TypedArrays. Type declarations also follow a fixed pattern: `variable | 0` marks an integer and `+variable` marks a floating-point number. For example:

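function MyAsmModule() {
    "use asm";  // tells the engine this is an asm.js module
    function add(x, y) {
        x = x | 0;  // `x | 0` declares x as an integer
        y = y | 0;
        return (x + y) | 0;  // the result is an integer too
    }
    return { add: add };
}

// Usage: MyAsmModule().add(2, 3) returns 5.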

Engines that support asm.js know the types in advance, so they can apply aggressive JIT (just-in-time) optimizations or even compile the code AOT (ahead of time), greatly improving performance. Engines that do not support asm.js simply run it as ordinary JavaScript, and the result is the same.
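To make the coercion rules concrete, here is a second small module in the same style. The name MathModule and its functions avg and inc are illustrative, not from any library. It uses `+x` to mark doubles and `x | 0` to mark integers. Because asm.js is plain JavaScript, it produces the same results whether or not the engine validates it as asm.js, and the `| 0` annotations give integer arithmetic 32-bit wraparound semantics.

```javascript
// An asm.js-style module: `+x` marks a double, `x | 0` marks a 32-bit int.
function MathModule() {
  "use asm";
  function avg(a, b) {
    a = +a;                       // a is a double
    b = +b;                       // b is a double
    return +((a + b) / 2.0);      // result is a double
  }
  function inc(n) {
    n = n | 0;                    // n is a 32-bit signed int
    return (n + 1) | 0;           // result wraps around at 32 bits
  }
  return { avg: avg, inc: inc };
}

const math = MathModule();
console.log(math.avg(1, 2));          // 1.5
console.log(math.inc(41));            // 42
console.log(math.inc(2147483647));    // -2147483648: 32-bit wraparound
console.log(math.inc(1.9));           // 2: the fraction is dropped by `| 0`
```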

However, asm.js also has obvious drawbacks: it does not go far enough toward the machine. The code is still text, writing it is still constrained by JavaScript syntax, and the browser still has to parse the script, interpret it, collect profiling data, JIT-compile it, and so on. A binary format like Java class files would shrink the files and reduce both network transfer time and parse time. It would also allow a bytecode closer to the machine, which makes AOT/JIT compilers easier to implement and more effective.

Meanwhile, Google's Chrome team was also working on JavaScript performance, but from a different direction. Chrome's answer was NaCl (Google Native Client) and PNaCl (Portable NaCl), which let Chrome execute native code directly inside a sandbox.

asm.js and NaCl/PNaCl each had strengths and weaknesses, and each could learn from the other. Mozilla and Google recognized this, and from 2013 the two teams communicated and collaborated frequently. They later decided to combine the strengths of both projects into a bytecode-based technology. In 2015, WebAssembly was officially named and announced. The W3C also established the WebAssembly Community Group to drive its development, with the Chrome, Edge, Firefox, and WebKit teams among its members.

In 2016, Rust 1.14 was released with support for WASM. In 2017, Google decided to abandon PNaCl, and new versions of the four major browsers (Chrome, Edge, Safari, and Firefox) began to support WASM. In 2018, Go 1.11 was released with WASM support. In 2019, Emscripten switched to compiling to WASM through LLVM by default and phased out asm.js. The same year, WebAssembly became a W3C Recommendation, joining HTML, CSS, and JavaScript as the fourth language of the Web.

Introduction

Official definition: WebAssembly (WASM) is a binary instruction format for a stack-based virtual machine. It is designed as a portable compilation target for programming languages and can be deployed in both web client and server applications.

WebAssembly has the following characteristics:

  • It is a low-level, assembly-like language that runs at near-native speed in all modern desktop browsers and many mobile browsers.
  • Its files are designed to be compact, so they transfer and download quickly. They are also designed to be parsed and initialized quickly.
  • It is designed as a compilation target, so code written in C++, Rust, and other languages can run on the web.

In other words, WebAssembly lets code written in many languages run in the browser at near-native speed.

WebAssembly is also designed to coexist and work alongside JavaScript. It addresses several of JavaScript's shortcomings, including those of asm.js:

  • Performance. WebAssembly is a low-level, statically typed format, so the browser can compile it directly to machine code and greatly improve performance. Because it is bytecode, its files are also small and transfer quickly over the network. Browser vendors have even introduced streaming compilation, which compiles a file while it downloads so that the module can be instantiated as soon as the download finishes.
  • Language interoperability. Previously, running another language on the Web meant translating it into JavaScript. That is not easy, and it significantly reduces execution performance. WebAssembly was designed from the start as a compilation target, so compiling other languages to it is straightforward. Performance is much less of a concern (though there is still some overhead), and code reuse becomes easier.
  • Code protection. JavaScript can usually be protected only through obfuscation, which reduces readability, but a few tools can still make the result readable again. Compiled WASM is far harder to read. Even after decompiling it with tools such as wasm2c, analyzing it is much more difficult than analyzing JS. This does not make the code fully secure, but raising the cost of reverse engineering greatly reduces the risk.

WebAssembly is not purely a browser technology, though. Just as JavaScript gained Node.js, WebAssembly now has its own runtimes. These are used outside the browser in cloud-native, blockchain, security, and other system applications.

Compiling

C/C++ is compiled with Emscripten:

emcc hello.c -o hello.wasm

Rust is compiled with Cargo for the wasm32-unknown-unknown target. Add the target first with rustup target add wasm32-unknown-unknown:

cargo build --target wasm32-unknown-unknown --release

The output can be shrunk further with wasm-gc:

wasm-gc target/wasm32-unknown-unknown/release/hello.wasm

Go has built-in support:

GOARCH=wasm GOOS=js go build -o hello.wasm main.go

Running

Running in JavaScript

To run WebAssembly from JavaScript, you must first load the module's bytes into memory before you can compile and instantiate it. You can do this with XMLHttpRequest or Fetch, which make the bytes available as an ArrayBuffer.

Example using Fetch:

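const importObject = {};  // functions/memory the module imports, if any

fetch('module.wasm')
  .then(response => response.arrayBuffer())  // read the raw bytes
  .then(bytes => WebAssembly.instantiate(bytes, importObject))
  .then(result => {
    // result.module is the compiled module; result.instance is ready to use
    console.log(result.instance.exports);
  });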

The approach above first obtains an ArrayBuffer containing the module's binary and then compiles and instantiates it with WebAssembly.instantiate().
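Fetching needs a server and a real .wasm file, so the sketch below passes WebAssembly.instantiate() bytes that are already in memory. The module is hand-assembled for illustration and is not compiler output. The import `env.scale` and the export `run` are hypothetical names. The sketch also shows what importObject is for: it supplies every function listed in the module's Import section.

```javascript
// Hand-assembled module, equivalent to this WAT:
// (module
//   (import "env" "scale" (func $scale (param i32) (result i32)))
//   (func (export "run") (param i32) (result i32)
//     local.get 0  call $scale  i32.const 1  i32.add))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // header: "\0asm", version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // type 0: (i32) -> i32
  0x02, 0x0d, 0x01, 0x03, 0x65, 0x6e, 0x76,       // import "env" ...
  0x05, 0x73, 0x63, 0x61, 0x6c, 0x65, 0x00, 0x00, // ... "scale", a function of type 0
  0x03, 0x02, 0x01, 0x00,                         // one local function, type 0
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" = func 1
  0x0a, 0x0b, 0x01, 0x09, 0x00,                   // code: one body, no locals
  0x20, 0x00, 0x10, 0x00, 0x41, 0x01, 0x6a, 0x0b, // local.get 0; call 0; i32.const 1; i32.add; end
]);

// The importObject supplies every function listed in the Import section.
const importObject = {
  env: { scale: (x) => x * 10 },
};

function instantiateDemo() {
  // Same call as in the fetch example, but with bytes already in memory.
  return WebAssembly.instantiate(bytes, importObject).then(
    (result) => result.instance.exports
  );
}

instantiateDemo().then((api) => {
  console.log(api.run(4)); // 41, i.e. 4 * 10 + 1
});
```

Imports are resolved at instantiation time, which is why importObject is passed to instantiate() rather than to each call.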

You can also use WebAssembly.instantiateStreaming(). It fetches, compiles, and instantiates a module directly from the response stream, without first converting it to an ArrayBuffer. The server must serve the file with the application/wasm MIME type:

WebAssembly.instantiateStreaming(fetch('simple.wasm'), importObject)
  .then(result => {
    console.log(result.instance.exports);
  });

WebAssembly plans to support loading and running modules directly through <script type="module"> and ES module import statements in the future.

Running outside the browser

The WASM community provides many runtimes that let WASM run outside the browser, in a sandboxed environment.

Popular runtimes currently include:

  • wasmtime: can be used as a CLI or embedded in other applications, for example in IoT or cloud-native systems.
  • WebAssembly Micro Runtime (WAMR): a virtual machine aimed at chip-level and embedded scenarios. As its name suggests, it is very small, with a startup time as low as 100 microseconds and a minimum memory footprint of 100 KB.
  • wasmer: supports running WASM instances from many programming languages and has its own package registry, WAPM.
  • WasmEdge: formerly known as SSVM, optimized for cloud-native, edge, and decentralized applications.

Underlying concepts

Module

The basic unit of a WebAssembly program is the module. The term refers both to the binary form of the code and to its compiled form in the browser.

A large WebAssembly application is often made up of multiple submodules. Each has its own independent data, so one submodule cannot tamper with another's data. In addition, the permissions each module can use are granted by its top-level caller, so a third-party submodule cannot use capabilities that the upper-level module has not knowingly given it. This kind of permission management resembles Android development, where an app must declare all the permissions it depends on in advance.
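Part of this isolation can be observed directly with the JavaScript API: every instance gets its own globals, memory, and tables, even when created from the same compiled module. The hand-assembled module below keeps a counter in a mutable global, and its export `next` is a hypothetical name. Two instances count independently, and neither can reach the other's state. This sketch illustrates per-instance isolation only, not the full permission model described above.

```javascript
// Hand-assembled module: a mutable i32 global and next() = ++counter.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // header
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,       // type 0: () -> i32
  0x03, 0x02, 0x01, 0x00,                         // one function, type 0
  0x06, 0x06, 0x01, 0x7f, 0x01, 0x41, 0x00, 0x0b, // mutable i32 global, initial value 0
  0x07, 0x08, 0x01, 0x04, 0x6e, 0x65, 0x78, 0x74, 0x00, 0x00, // export "next" = func 0
  0x0a, 0x0d, 0x01, 0x0b, 0x00,                   // code section: one body, no locals
  0x23, 0x00, 0x41, 0x01, 0x6a, 0x24, 0x00, 0x23, 0x00, 0x0b, // global.get; +1; global.set; global.get
]);

const compiled = new WebAssembly.Module(bytes);   // compile once
const a = new WebAssembly.Instance(compiled).exports; // instance A
const b = new WebAssembly.Instance(compiled).exports; // instance B, separate state

console.log(a.next(), a.next(), b.next()); // 1 2 1
```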

When a program in another high-level language is compiled to WebAssembly, the result is a binary module file with a .wasm suffix. The file begins with an 8-byte module header:

0000000: 0061 736d              ; WASM_BINARY_MAGIC
0000004: 0100 0000              ; WASM_BINARY_VERSION

The first 4 bytes are the "magic number", the string \0asm, which identifies a Wasm module. The next 4 bytes hold the version of the Wasm binary format the module uses, stored as a little-endian 32-bit integer. Modules produced by current toolchains use version 1.
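The header check takes only a few lines. readWasmHeader is a hypothetical helper, not part of any API. WebAssembly.validate() is the standard JavaScript API call that checks a whole module. The sample input is the smallest valid module: the 8-byte header and nothing else.

```javascript
// Check the 8-byte module header: magic number + version.
function readWasmHeader(bytes) {
  if (bytes.length < 8) {
    throw new Error("too short to be a Wasm module");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Bytes 0-3: the magic number, i.e. the string "\0asm".
  const magic = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  // Bytes 4-7: the version, stored as a little-endian uint32.
  const version = view.getUint32(4, true);
  return { isWasm: magic === "\0asm", version };
}

// An empty module: nothing but the header.
const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
console.log(readWasmHeader(emptyModule));       // { isWasm: true, version: 1 }
console.log(WebAssembly.validate(emptyModule)); // true
```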

Sections

After the module header comes the body of the module, which is divided into sections. Wasm groups each kind of definition (types, imports, functions, code, and so on) into its own section. Every section is optional: a module consisting of only the 8-byte header is valid.

A section can contain multiple entries. The WebAssembly 1.0 specification defines 12 sections and assigns each one an ID. Later versions add a data count section with ID 12. Custom sections may appear any number of times. Every other section may appear at most once, and sections must appear in increasing ID order.

The sections are described below:

ID | Section | Description
0 | Custom | Arbitrary named data, mainly used for debugging information such as function names
1 | Type | Function signatures (parameter and result types) used by both imported and internal functions
2 | Import | Imported functions, tables, memories, and globals, each with a module name, a field name, and a type
3 | Function | The type index (signature) of each function defined in the module
4 | Table | Declares tables of references. Through the call_indirect instruction, a table implements function pointers. Tables can be imported from or exported to the host environment
5 | Memory | Declares linear memory, which holds the program's runtime data. Memory can be imported from or exported to the host environment
6 | Global | Declares global variables: their type, mutability, and initial value
7 | Export | Exported functions, tables, memories, and globals, each with a name and an index
8 | Start | Index of a function that is called automatically when the module is instantiated
9 | Element | Initial contents of tables. The Table section only declares tables, and element segments fill them with function indices
10 | Code | The local variables and instruction bodies of each function
11 | Data | Static data used to initialize linear memory
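Every section starts with its one-byte ID followed by its size, so a tool can walk a module's sections without understanding their contents. The sketch below does this for a hand-assembled module that exports add(i32, i32) -> i32. listSections and readVarUint32 are hypothetical helpers, not library functions. The output shows the section IDs appearing in increasing order.

```javascript
const SECTION_NAMES = [
  "custom", "type", "import", "function", "table", "memory",
  "global", "export", "start", "element", "code", "data",
];

// Read an unsigned LEB128 integer; section sizes use this encoding.
function readVarUint32(bytes, pos) {
  let result = 0;
  let shift = 0;
  let byte;
  do {
    byte = bytes[pos++];
    result |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return { value: result >>> 0, next: pos };
}

// List (id, name, size) for each section after the 8-byte header.
function listSections(bytes) {
  const sections = [];
  let pos = 8;
  while (pos < bytes.length) {
    const id = bytes[pos++];
    const size = readVarUint32(bytes, pos);
    sections.push({ id, name: SECTION_NAMES[id] || "unknown", size: size.value });
    pos = size.next + size.value; // skip over the section's contents
  }
  return sections;
}

// A minimal module exporting add(i32, i32) -> i32.
const addModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,                    // header
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,              // type
  0x03, 0x02, 0x01, 0x00,                                            // function
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,              // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,  // code
]);

for (const s of listSections(addModule)) {
  console.log(`${s.id} ${s.name} (${s.size} bytes)`);
}
// 1 type (7 bytes)
// 3 function (2 bytes)
// 7 export (7 bytes)
// 10 code (9 bytes)
```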

Data types

The WASM binary encoding uses the following data types:

  • Unsigned integers. Three fixed-width non-negative integer types: uint8, uint16, and uint32. The number is the width in bits.
  • Variable-length unsigned integers. Three variable-length non-negative integer types: varuint1, varuint7, and varuint32. "Variable-length" means the number of bytes used depends on the value, and the number is the maximum width in bits. These are encoded in LEB128 format.
  • Variable-length signed integers. Like the above, but negative numbers are allowed: varint7, varint32, and varint64.
  • Floating-point numbers. As in JavaScript, these use the IEEE 754 format: 32 bits for single precision and 64 bits for double precision.
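In LEB128, each byte contributes 7 data bits, lowest bits first, and the high bit indicates whether another byte follows. The two decoders below are a minimal sketch that covers only the 32-bit types (varint64 would need BigInt). The function names are illustrative. The sample inputs are the standard LEB128 examples 624485 and -123456.

```javascript
// Decode an unsigned LEB128 integer (varuint1 / varuint7 / varuint32).
// Each byte carries 7 data bits, low bits first; a set high bit (0x80)
// means another byte follows.
function decodeVarUint32(bytes, pos) {
  const start = pos;
  let result = 0;
  let shift = 0;
  let byte;
  do {
    byte = bytes[pos++];
    result |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return { value: result >>> 0, size: pos - start };
}

// Decode a signed LEB128 integer (varint7 / varint32). Same layout, but
// bit 0x40 of the last byte is the sign bit and is extended upward.
function decodeVarInt32(bytes, pos) {
  const start = pos;
  let result = 0;
  let shift = 0;
  let byte;
  do {
    byte = bytes[pos++];
    result |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  if (shift < 32 && (byte & 0x40)) {
    result |= -1 << shift; // fill the remaining high bits with 1s
  }
  return { value: result | 0, size: pos - start };
}

console.log(decodeVarUint32([0xe5, 0x8e, 0x26], 0)); // { value: 624485, size: 3 }
console.log(decodeVarInt32([0x7f], 0));              // { value: -1, size: 1 }
console.log(decodeVarInt32([0xc0, 0xbb, 0x78], 0));  // { value: -123456, size: 3 }
```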

For the language itself, the following numeric types are provided:

  • i32: 32-bit integer
  • i64: 64-bit integer
  • f32: 32-bit floating point
  • f64: 64-bit floating point

Every parameter and local variable must have one of these four value types. A function signature consists of a sequence of zero or more parameter types and a sequence of zero or more result types. In the minimum viable product (MVP), a function can have at most one result. Note that i32 and i64 are not inherently signed or unsigned: how their bits are interpreted is decided by each individual operator.

Boolean values are represented as i32: 0 is false and any non-zero value is true. All other kinds of values, such as strings, must be represented in the module's linear memory.
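Two points above can be checked with a hand-assembled module: i32 has no built-in sign, and comparisons produce an i32 0 or 1. The module wraps the i32.lt_s and i32.lt_u instructions, and its exports lt_s and lt_u are illustrative names. The same 32-bit pattern, passed from JavaScript as -1, compares differently depending on the operator.

```javascript
// Hand-assembled module exporting two comparisons of the same two i32 values:
//   lt_s(a, b) uses i32.lt_s (bits read as signed)
//   lt_u(a, b) uses i32.lt_u (bits read as unsigned)
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // header
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0: (i32, i32) -> i32
  0x03, 0x03, 0x02, 0x00, 0x00,                         // two functions, both type 0
  0x07, 0x0f, 0x02,                                     // export section, two entries
  0x04, 0x6c, 0x74, 0x5f, 0x73, 0x00, 0x00,             // "lt_s" -> func 0
  0x04, 0x6c, 0x74, 0x5f, 0x75, 0x00, 0x01,             // "lt_u" -> func 1
  0x0a, 0x11, 0x02,                                     // code section, two bodies
  0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x48, 0x0b,       // local.get 0; local.get 1; i32.lt_s
  0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x49, 0x0b,       // local.get 0; local.get 1; i32.lt_u
]);

const { lt_s, lt_u } = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;

// -1 is the bit pattern 0xFFFFFFFF; only the operator decides its meaning.
console.log(lt_s(-1, 1)); // 1: true, since -1 < 1 as signed
console.log(lt_u(-1, 1)); // 0: false, since 4294967295 < 1 is false as unsigned
```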

WAT

WASM binary files are not human-readable. WAT (WebAssembly Text Format) is the corresponding text format. It is written as S-expressions and can be thought of, roughly, as the assembly language equivalent of the binary.

[Figure: converting between C, WAT, and WASM code]

Some browsers' developer tools can display WASM as WAT, which helps with debugging. The community also provides mature tools such as wasm2wat and wat2wasm for converting between the two. Both are part of the WABT (WebAssembly Binary Toolkit), which also makes it possible to write WAT by hand and convert it to WASM.
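WAT is a one-to-one textual rendering of the binary instructions, and a small decoder shows this directly. The toy below turns a function body's instruction bytes back into WAT instruction names. It is an illustrative sketch, not a real tool: it knows only six opcodes and assumes single-byte immediates. The real conversions are done by wasm2wat and wat2wasm.

```javascript
// A toy "wasm2wat" for a handful of opcodes.
const OPCODES = {
  0x20: { name: "local.get", immediate: true },
  0x21: { name: "local.set", immediate: true },
  0x41: { name: "i32.const", immediate: true },
  0x6a: { name: "i32.add", immediate: false },
  0x6c: { name: "i32.mul", immediate: false },
  0x0b: { name: "end", immediate: false },
};

function disassemble(code) {
  const lines = [];
  let pos = 0;
  while (pos < code.length) {
    const op = OPCODES[code[pos++]];
    if (!op) throw new Error(`unsupported opcode at offset ${pos - 1}`);
    let text = op.name;
    if (op.immediate) {
      // Simplification: assumes the LEB128 immediate fits in one byte (0..63).
      text += " " + code[pos++];
    }
    lines.push(text);
  }
  return lines;
}

// Instruction bytes of: (func (param i32 i32) (result i32) local.get 0 local.get 1 i32.add)
console.log(disassemble([0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b]).join("\n"));
// local.get 0
// local.get 1
// i32.add
// end
```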

WASI

Although WebAssembly was born for the web, it is neither limited to nor intended only for browsers. Developers want to take it beyond the browser, which requires a set of interfaces for interacting with the operating system.

WebAssembly is an assembly language for a conceptual machine rather than a physical one, so it provides a fast, scalable, and safe way to run the same code on any computer. To run on all the different operating systems, WebAssembly likewise needs a system interface for a conceptual operating system rather than for any single real one. Its developers therefore defined a unified standard for communicating with different operating systems: WASI (WebAssembly System Interface). WASI is an engine-independent, non-Web, system-oriented API standard designed specifically for WASM.

The design of WASI follows two principles:

  • Portability. A portable binary can be compiled once and run on different computers, which makes distributing code easier. For example, if Node's native modules were written in WebAssembly, users would not need to run node-gyp when installing applications that use native modules. Developers would also not need to configure and distribute dozens of binaries.
  • Security. When code asks the operating system to perform input or output, the operating system must decide whether the requested operation is safe. WebAssembly uses a sandbox, so code cannot interact with the operating system directly. Instead, the host (a browser or a WASM runtime) places the functions the code may use into the sandbox, which lets it restrict what each program can do. A sandbox alone does not make a system secure, because the host can still put every capability into it. It does, however, give the host the option of building a more secure system.
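The host-controlled sandbox is visible in the JavaScript API, where the importObject is a module's only route to the outside world. The sketch below uses a hand-assembled module whose import env.read_sensor and export run are hypothetical names. A host that grants the capability can run the module. A host that withholds it gets a WebAssembly.LinkError at instantiation time, before any Wasm code runs.

```javascript
// Hand-assembled module: imports env.read_sensor () -> i32,
// exports run() = read_sensor() * 2.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // header
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,             // type 0: () -> i32
  0x02, 0x13, 0x01, 0x03, 0x65, 0x6e, 0x76,             // import "env" ...
  0x0b, 0x72, 0x65, 0x61, 0x64, 0x5f, 0x73, 0x65, 0x6e, 0x73, 0x6f, 0x72, // ... "read_sensor"
  0x00, 0x00,                                           // kind: function, type 0
  0x03, 0x02, 0x01, 0x00,                               // one local function, type 0
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" = func 1
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x10, 0x00, 0x41, 0x02, 0x6c, 0x0b, // call 0; i32.const 2; i32.mul
]);
const sensorModule = new WebAssembly.Module(bytes);

// The host decides which capabilities go into the sandbox.
function tryRun(imports) {
  try {
    const instance = new WebAssembly.Instance(sensorModule, imports);
    return instance.exports.run();
  } catch (err) {
    // A missing capability makes instantiation fail before any Wasm code runs.
    return err instanceof WebAssembly.LinkError ? "LinkError" : err.name;
  }
}

console.log(tryRun({ env: { read_sensor: () => 21 } })); // 42
console.log(tryRun({ env: {} }));                        // LinkError
```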

Based on these two principles, WASI is designed as a set of modular standard interfaces. The most basic module is wasi-core. Other subsets, such as sensors, crypto, processes, and multimedia, are organized as separate submodules.

WASI module

wasi-core contains the basic interfaces that every program needs. It covers roughly the same ground as POSIX, providing WASI's abstract function interfaces for system calls related to files, network connections, clocks, random numbers, and so on.

WASI adds a "system call abstraction layer" between the WASM bytecode and the virtual machine. Take fopen as an example. When source code that calls fopen is compiled against wasi-libc, a C standard library implemented specifically for WASI, fopen is implemented by indirectly calling a function named __wasi_path_open. This __wasi_path_open function is an abstraction of the actual system call.

WASI's main job is to define a standard set of Import interfaces and to provide concrete implementations of those general interfaces on different systems, much as libc is implemented separately on different operating systems. Following the same design idea, one could also provide a higher-level WADSI (WebAssembly Domain Specific Interface) for a particular field. It would expose that field's common interfaces as Import interfaces that developers can use directly.
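The split between a standard import interface and per-platform implementations can be sketched with the plain JavaScript API. This is not WASI itself: the import env.now and the export elapsed are hypothetical, and the module is hand-assembled. The same compiled module is linked against two interchangeable hosts. One is backed by the system clock and the other returns a fixed time, much as one WASI binary can run on different runtimes.

```javascript
// Hand-assembled module: imports env.now () -> f64,
// exports elapsed(start) = now() - start.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // header
  0x01, 0x0a, 0x02,                               // type section, two types
  0x60, 0x00, 0x01, 0x7c,                         // type 0: () -> f64
  0x60, 0x01, 0x7c, 0x01, 0x7c,                   // type 1: (f64) -> f64
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6e, 0x6f, 0x77, 0x00, 0x00, // import env.now, type 0
  0x03, 0x02, 0x01, 0x01,                         // one local function, type 1
  0x07, 0x0b, 0x01, 0x07, 0x65, 0x6c, 0x61, 0x70, 0x73, 0x65, 0x64, 0x00, 0x01, // export "elapsed" = func 1
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x10, 0x00, 0x20, 0x00, 0xa1, 0x0b, // call now; local.get 0; f64.sub
]);
const compiled = new WebAssembly.Module(bytes);

// Two interchangeable implementations of the same import interface.
const systemClockHost = { env: { now: () => Date.now() } };
const fixedClockHost = { env: { now: () => 1000 } }; // deterministic, e.g. for tests

function elapsedSince(host, start) {
  return new WebAssembly.Instance(compiled, host).exports.elapsed(start);
}

console.log(elapsedSince(fixedClockHost, 400));              // 600
console.log(elapsedSince(systemClockHost, Date.now()) >= 0); // true
```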

Security

One source of WebAssembly's security is that it is the first language to share the JavaScript VM. That VM is sandboxed at runtime and has been hardened through years of validation and security testing. WebAssembly modules can access no more than JavaScript can, and they obey the same security rules, including stricter ones such as the same-origin policy.

Unlike desktop applications, WebAssembly modules have no direct access to device memory. Instead, the runtime gives the module a block of memory at initialization, which JavaScript sees as an ArrayBuffer. The module uses this buffer as its linear memory, and the WebAssembly runtime checks every access to ensure that the code cannot read or write outside it.
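The sketch below shows this bounds checking with a hand-assembled module. The module declares one 64 KiB page of memory and a load(addr) function that wraps i32.load, and the exports mem and load are hypothetical names. JavaScript sees the memory as an ordinary ArrayBuffer. A load that would read past the end traps with a WebAssembly.RuntimeError instead of touching anything outside the buffer.

```javascript
// Hand-assembled module: one page of memory exported as "mem",
// and load(addr) = i32.load at addr.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // header
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // type 0: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                         // one function, type 0
  0x05, 0x03, 0x01, 0x00, 0x01,                   // one memory: min 1 page (64 KiB), no max
  0x07, 0x0e, 0x02,                               // export section, two entries
  0x03, 0x6d, 0x65, 0x6d, 0x02, 0x00,             // "mem" -> memory 0
  0x04, 0x6c, 0x6f, 0x61, 0x64, 0x00, 0x00,       // "load" -> func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x28, 0x02, 0x00, 0x0b, // local.get 0; i32.load
]);
const { mem, load } = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;

// The host sees linear memory as an ordinary ArrayBuffer.
new DataView(mem.buffer).setInt32(8, 12345, true); // Wasm memory is little-endian
console.log(mem.buffer.byteLength); // 65536
console.log(load(8));               // 12345

function tryLoad(addr) {
  try {
    return load(addr);
  } catch (err) {
    // An out-of-bounds access traps instead of reading host memory.
    return err instanceof WebAssembly.RuntimeError ? "trap" : err.name;
  }
}
console.log(tryLoad(65532)); // 0: the last 4 bytes are still in bounds
console.log(tryLoad(65536)); // trap
```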

Items such as function pointers, which are stored in a table, are also not directly accessible to WebAssembly code. The code refers to an item by its index, and the WebAssembly runtime looks up that entry and calls it on the code's behalf. During the lookup, the runtime checks the index against the table size and, for call_indirect, verifies the function signature.
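The sketch below puts two functions into a table and calls them through call_indirect. It is again a hand-assembled module with hypothetical names (double, square, dispatch). The Wasm code supplies only an index, and the runtime checks it, so an out-of-range index traps rather than jumping to an arbitrary address.

```javascript
// Hand-assembled module: table = [double, square]; dispatch(i, x) = table[i](x).
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // header
  0x01, 0x0c, 0x02,                               // type section, two types
  0x60, 0x01, 0x7f, 0x01, 0x7f,                   // type 0: (i32) -> i32
  0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,             // type 1: (i32, i32) -> i32
  0x03, 0x04, 0x03, 0x00, 0x00, 0x01,             // funcs: double, square (type 0); dispatch (type 1)
  0x04, 0x04, 0x01, 0x70, 0x00, 0x02,             // one funcref table with 2 slots
  0x07, 0x0c, 0x01, 0x08, 0x64, 0x69, 0x73, 0x70, 0x61, 0x74, 0x63, 0x68, 0x00, 0x02, // export "dispatch" = func 2
  0x09, 0x08, 0x01, 0x00, 0x41, 0x00, 0x0b, 0x02, 0x00, 0x01, // element: table[0..1] = funcs 0, 1
  0x0a, 0x1b, 0x03,                               // code section, three bodies
  0x07, 0x00, 0x20, 0x00, 0x41, 0x02, 0x6c, 0x0b, // double: x * 2
  0x07, 0x00, 0x20, 0x00, 0x20, 0x00, 0x6c, 0x0b, // square: x * x
  0x09, 0x00, 0x20, 0x01, 0x20, 0x00, 0x11, 0x00, 0x00, 0x0b, // push x, push i; call_indirect (type 0)
]);
const { dispatch } = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;

function tryDispatch(i, x) {
  try {
    return dispatch(i, x);
  } catch (err) {
    // The runtime rejects indices outside the table instead of following them.
    return err instanceof WebAssembly.RuntimeError ? "trap" : err.name;
  }
}

console.log(dispatch(0, 5));    // 10 (double)
console.log(dispatch(1, 5));    // 25 (square)
console.log(tryDispatch(2, 5)); // trap: index 2 is outside the 2-slot table
```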

In C++, the execution stack lives in the same memory as the program's data. C++ code should not modify the execution stack, but it can do so through pointers. WebAssembly's execution stack is kept separate from linear memory, so code cannot access it.

Applications

Google Earth: Google Earth 9.0, released in 2017, was built with NaCl, so at the time it ran only in Chrome. In 2020, Google rebuilt the C++ project with WebAssembly, and since then it has also run in Firefox and Edge.

[Figure: Google Earth]

AutoCAD: AutoCAD is a well-known desktop design application with a history of nearly 40 years. It is widely used in civil engineering, interior decoration, industrial drawing, and many other fields. The web version of AutoCAD was released in 2014. It was built with Google Web Toolkit, a Google toolset for developing web applications in Java, which translated the existing Java code from the Android version into JavaScript. However, the generated JavaScript was very large and ran inefficiently in the browser. In 2015, the core features of the original C++ code were compiled directly to the Web platform through asm.js, which greatly improved performance. In March 2018, a WASM-based AutoCAD Web followed.

[Figure: AutoCAD]

Figma: Figma is a browser-based collaborative UI design tool. Its core interactive interface lives in a Canvas whose interactions are driven by WASM. Running in the browser makes it cross-platform, and WebAssembly's performance makes it faster, even on the web, than comparable applications built natively for desktop operating systems.

[Figure: Figma]

Epilogue

WebAssembly is therefore not meant to replace JavaScript entirely. It complements web technology by making up for JavaScript's limitations in performance and code reuse. A well-known WASM saying goes: "Everything that can be implemented with WebAssembly will be implemented with WebAssembly." WebAssembly's ultimate goal is to let code in any language be compiled and then run efficiently on any platform. Most importantly, it is backed by major players such as Google, Mozilla, and Microsoft, and it is likely to develop much further in the future.
