"The Definitive Guide to WebAssembly" (3) WebAssembly module

This is the third article in the series "The Definitive Guide to WebAssembly".

An operating system runs programs that are usually stored in compiled form. Each operating system has its own format that defines where execution starts, what data is required, and what the instructions are for the various pieces of functionality. WebAssembly is no exception. In this chapter, we'll see how this behavior is packaged and how the host knows what to do with it.

A software engineer can spend an entire career ignoring how programs are loaded and executed. Their world begins at int main(int argc, char **argv), static void main(String[] args), or if __name__ == "__main__":. These are the well-known entry points for C, Java, and Python programs, so this is where the programmer assumes responsibility for control flow. However, the operating system or program runtime needs to build up and tear down the executable structure before the program starts and after it exits. The loader needs to know where the instructions start, how the data elements are initialized, what other modules or libraries need to be loaded, and so on.

These details are usually defined by the executable file format. On Linux, this is the Executable and Linkable Format (ELF) ^[1]; on Windows, it is the Portable Executable (PE) format ^[2]; on macOS, it is the Mach-O format ^[3]. These are, of course, platform-specific formats for native executables. More portable systems such as Java and .NET use an intermediate bytecode representation, but they still have a well-defined structure and work in a similar way.

One of the first design considerations for the WebAssembly MVP was to define the module structure so that a WebAssembly host knows what to find and verify, and where to start when executing a deployed unit.

By the end of Chapter 2, you had seen a more complex module structure than the one this chapter starts with. We'll walk through the parts step by step, and then show you some tools for exploring the textual and visual structure of WebAssembly modules. In the previous chapter, we briefly discussed the binary format, which is compact and quick to transfer and load. You probably won't spend much time looking at binary details because you're focused on the software side, but it's useful to be familiar with the layout of modules, so let's take a look.

Module structure

The empty module is the most basic module of WebAssembly. An empty module does not need any content to be a valid module, as shown in Example 3-1.

Example 3-1. An empty but valid WebAssembly module.

(module)

Obviously, this is nothing to see, but it can be converted to binary form. You'll notice in the output below that it doesn't take up much space and it does nothing.

brian@tweezer ~/g/w/s/ch03> wat2wasm empty.wat
brian@tweezer ~/g/w/s/ch03> ls -alF
total 16
drwxr-xr-x 4 brian staff 128 Dec 21 14:45 ./ 
drwxr-xr-x 4 brian staff 128 Dec 14 12:37 ../ 
-rw-r--r-- 1 brian staff 8   Dec 21 14:45 empty.wasm
-rw-r--r-- 1 brian staff 8   Dec 14 12:37 empty.wat

If you're more visually oriented, you might like to use WebAssembly Code Explorer, available from the wasdk GitHub repository ^[4] . You can use it online in a browser ^[5] or clone it to run an HTTP server. I'll use the Python 3 web server as before.

brian@tweezer ~/g/wasmcodeexplorer> python3 -m http.server 10003
Serving HTTP on :: port 10003 (http://[::]:10003/) ...

Again, it doesn't look like much for an empty module, but it will be a useful summary once we start adding some elements to it. Operating systems often identify a file format from the first few bytes of the file, which are commonly called magic numbers. For WebAssembly, these bytes are 0x00 0x61 0x73 0x6D, the hexadecimal values for a null byte followed by the characters a, s, and m, followed by the version number 1 (the bytes 0x01 0x00 0x00 0x00, a little-endian 32-bit integer).

In Figure 3-1, you can see the magic bytes and version 1 of the WebAssembly file format as a series of numbers on the left, with the empty module structure on the right.

Figure 3-1. Visual representation of an empty module in the WebAssembly Code Explorer.
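The header check described above can be sketched in JavaScript. This is only an illustration, not how any particular engine is implemented: readHeader and emptyModule are hypothetical names, and the eight bytes are assumed to mirror the empty.wasm file produced earlier.

```javascript
// The 8 bytes of an empty module: magic number "\0asm" plus version 1.
// (Assumption: this mirrors the empty.wasm produced above.)
const emptyModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // "\0asm"
  0x01, 0x00, 0x00, 0x00, // version 1, little-endian
]);

// Hypothetical helper: decode the magic number and version from a module.
function readHeader(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const magic = String.fromCharCode(...bytes.subarray(0, 4));
  return {
    isWasm: magic === "\0asm",
    version: view.getUint32(4, true), // true = little-endian
  };
}

const header = readHeader(emptyModule);
// WebAssembly.validate() asks the engine whether the bytes form a valid module.
const engineAccepts = WebAssembly.validate(emptyModule);
console.log(header, engineAccepts);
```

Reading the version through a DataView with the little-endian flag makes the byte order explicit rather than relying on the platform.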

For command-line inspection of modules you have several options, and the executables in the WABT toolkit, such as wasm-objdump, are very useful. See the appendix for help installing the various tools discussed in this book.

If you run the command without a switch, it prints an error message listing the available switches. As you'll see, these make a bigger difference when you have more details to explore.

brian@tweezer ~/g/w/s/ch03> wasm-objdump empty.wasm
At least one of the following switches must be given:
     -d/--disassemble
     -h/--headers
     -x/--details
     -s/--full-contents

Now we just need to verify that our module, although useless, is valid by using the detail switch. This also indicates that we are dealing with version 1 of the format.

brian@tweezer ~/g/w/s/ch03> wasm-objdump -x empty.wasm 
empty.wasm: file format wasm 0x1
Section Details:

Explore the various parts of the module

There is a circular-dependency problem in how we introduce these concepts. The module format must support all the various elements included in WebAssembly, some of which we will cover in later chapters. We'll focus primarily on what we've seen so far, with the promise of revisiting elements from other sections soon.

The overall structure of the module is based on a series of optional numbered sections, each covering a specific feature of WebAssembly. In Table 3-1 we can see a list and description of these parts.

Table 3-1. WebAssembly module sections

ID	Name	Description
0	Custom	Debug or metadata information for use by third parties
1	Type	Type definitions used in modules
2	Import	Import elements used by a module
3	Function	Type signatures associated with functions in modules
4	Table	A table defining indirect, immutable references used by modules
5	Memory	The linear memory structure used by a module
6	Global	Global variables
7	Export	Export elements provided by a module
8	Start	An optional startup function used to start a module
9	Element	Elements defined by a module
10	Code	The body of a function defined by a module
11	Data	A data element defined by a module
12	Data Count	The number of data elements defined by the module

Consider the following example from Chapter 2.

Example 3-2. A simple WebAssembly text file

(module
    (func $how_old (param $year_now i32) (param $year_born i32) (result i32) ①
        local.get $year_now 
        local.get $year_born
        i32.sub)

    (export "how_old" (func $how_old)) ②
)

1. The internal function $how_old
2. The exported function how_old

We use the wat2wasm tool to convert it to binary form. If we try to interrogate the structure produced by this conversion, we see the following:

> wasm-objdump -x hello.wasm

hello.wasm: file format wasm 0x1

Section Details:

Type [1]:
 - type [0] (i32, i32) -> i32
Function [1]:
 - func [0] sig=0 <how_old>
Export [1]:
 - func [0] <how_old> -> "how_old"
Code [1]:
 - func [0] size=7 <how_old>

Note that there are many more sections than in our empty module. First, we have a Type section, which defines a single signature: a type that accepts two i32s and returns one i32. This is the appropriate signature for our how_old function. The type is not given a name, but it can still be used to set expectations and to validate how functions are configured.

Next we have a Function section, which links our type (type[0] in the Type section) to the named function. Because we export our function to make it available to the host environment or other modules, the Export section shows the internal function <how_old> exported under the name how_old. Finally, we have a Code section that contains the actual body of our only function.
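To make the section layout concrete, here is a minimal JavaScript sketch that walks the numbered sections of a binary module. The bytes are hand-assembled to be equivalent to the how_old module and may differ in detail from what your wat2wasm build emits; listSections, readLEB128, and helloBytes are hypothetical names introduced for illustration.

```javascript
// Hand-assembled bytes for a module equivalent to the how_old example
// (an assumption; real wat2wasm output may differ in detail).
const helloBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,       // Type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                                     // Function: func 0 uses type 0
  0x07, 0x0b, 0x01, 0x07, 0x68, 0x6f, 0x77, 0x5f, 0x6f, 0x6c, // Export: "how_old" -> func 0
  0x64, 0x00, 0x00,
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6b, // Code: local.get 0, local.get 1, i32.sub
  0x0b,
]);

const SECTION_NAMES = [
  "Custom", "Type", "Import", "Function", "Table", "Memory", "Global",
  "Export", "Start", "Element", "Code", "Data", "Data Count",
];

// Section sizes are stored as unsigned LEB128 integers.
// This simple decoder is only suitable for small values.
function readLEB128(bytes, offset) {
  let value = 0;
  let shift = 0;
  let byte;
  do {
    byte = bytes[offset++];
    value |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return { value, next: offset };
}

// Each section is: one id byte, a LEB128 payload size, then the payload.
function listSections(bytes) {
  const sections = [];
  let offset = 8; // skip magic number and version
  while (offset < bytes.length) {
    const id = bytes[offset];
    const { value: size, next } = readLEB128(bytes, offset + 1);
    sections.push({ id, name: SECTION_NAMES[id], size });
    offset = next + size;
  }
  return sections;
}

console.log(listSections(helloBytes));
```

Because every section announces its own size, a reader can skip sections it does not care about without decoding their contents.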

Figure 3-2 shows what our module looks like in WebAssembly Code Explorer.

Figure 3-2. Our Hello, World! module visualized in the WebAssembly Code Explorer.

Red indicates section boundaries, but you can also get more detail by hovering over the sections in the browser. For example, if you hover over one of the purple bytes in the Export section, it should show the name of the exported function, how_old. You can see the actual instructions in the green and blue bytes of the final Code section.

If you look closely at Example 3-2, you'll notice that our variable names are not carried over by default. The wasm-objdump output highlights this as well. For debugging purposes, you need to ask for them to be retained in the wat2wasm command:

> wat2wasm hello.wat -o hellodebug.wasm --debug-names
> wasm-objdump -x hellodebug.wasm

hellodebug.wasm: file format wasm 0x1

Section Details:

Type [1]:
 - type [0] (i32, i32) -> i32
Function [1]:
 - func [0] sig=0 <how_old>
Export [1]:
 - func [0] <how_old> -> "how_old"
Code [1]:
 - func [0] size=7 <how_old>
Custom:
 - name: "name"
 - func [0] <how_old>
 - func [0] local [0] <year_now>
 - func [0] local [1] <year_born>

Note that wat2wasm uses a custom section to preserve function and local variable details. Other tools may use custom sections for their own purposes, but this is generally how debugging information is captured. In Figure 3-3, you can see that there are more bytes in the module because of this custom section.

Figure 3-3. Our Hello, World! module with debugging details retained, visualized in the WebAssembly Code Explorer.
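Custom sections can also be read from JavaScript with WebAssembly.Module.customSections(). The sketch below uses a hand-assembled and otherwise empty module whose "name" section records only a module name, hello. That is an illustrative assumption, simpler than the function and local names in the dump above, and namedModule and moduleName are hypothetical names.

```javascript
// Hand-assembled module containing only a custom section called "name".
// Assumption: its payload records just a module name ("hello").
const namedModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x00, 0x0d,                                     // section id 0 (Custom), 13 bytes
  0x04, 0x6e, 0x61, 0x6d, 0x65,                   // section name: "name"
  0x00, 0x06,                                     // subsection 0 (module name), 6 bytes
  0x05, 0x68, 0x65, 0x6c, 0x6c, 0x6f,             // length 5, "hello"
]);

const namedMod = new WebAssembly.Module(namedModule);

// Returns one ArrayBuffer per custom section with the requested name.
const payloads = WebAssembly.Module.customSections(namedMod, "name");

// The payload starts after the section's own name string.
const payload = new Uint8Array(payloads[0]);
const nameLength = payload[2];
const moduleName = new TextDecoder().decode(payload.subarray(3, 3 + nameLength));
console.log(payloads.length, moduleName);
```

The engine hands back the raw payload and leaves the decoding to the caller, which is why tools are free to define their own custom section contents.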

Use modules

Once you understand the process of inspecting the static binary structure of a WebAssembly module, you'll want to move on to working with it in a more dynamic way. We've seen the basics of instantiating modules through the JavaScript API in a few examples, such as in Example 2-4, but there are other things we can do.

The code in Example 3-2 generates an Export section, but as we saw in Table 3-1, there is also a potential Import section that receives elements from the host environment. This can eventually include Memory and Table instances, as we'll see in subsequent chapters, but for now we can import a function into the module that lets us communicate more directly with the console window from WebAssembly. Keep in mind that we're still sorting out the low-level details, and your day-to-day experience with these technologies will likely be at a higher level.

Take a look at Example 3-3, a new version of our example that exports a second function. More importantly, it also imports a function.

(module
    (func $log (import "imports" "log_func") (param i32)) ①

    (func $how_old (param $year_now i32) (param $year_born i32) (result i32) ②
        local.get $year_now
        local.get $year_born
        i32.sub)

    (func $log_how_old (param $year_now i32) (param $year_born i32) ③
        local.get $year_now
        local.get $year_born
        call $how_old
        call $log)

    (export "how_old" (func $how_old)) ④
    (export "log_how_old" (func $log_how_old)) ⑤
)

1. Import a function from the host that expects an i32 parameter
2. Same as previous $how_old function
3. A new function that takes two parameters, calls $how_old, and then calls the function we imported
4. Export our old how_old function as before
5. Export our new log_how_old function

As you can see, we have a new function to call in the module, but we can't call it just yet. Our previous functionality is still available with no changes. Our new function calls the old function to do the math, but it requires a function named log_func to log the result. To clarify some of the differences, let's generate the .wasm output and then dump the module structure.

brian@tweezer ~/g/w/s/ch03> wat2wasm hellolog.wat
brian@tweezer ~/g/w/s/ch03> wasm-objdump -x hellolog.wasm
    hellolog.wasm:  file format wasm 0x1
    Section Details:
    Type [3]:
     - type [0] (i32) -> nil
     - type [1] (i32, i32) -> i32
     - type [2] (i32, i32) -> nil
    Import [1]:
     - func [0] sig=0 <imports.log_func> <- imports.log_func
    Function [2]:
     - func [1] sig=1 <how_old>
     - func [2] sig=2 <log_how_old>
    Export [2]:
     - func [1] <how_old> -> "how_old"
     - func [2] <log_how_old> -> "log_how_old"
    Code [2]:
     - func [1] size=7 <how_old>
     - func [2] size=10 <log_how_old>

This is the first time we have an entry in the Import section. It's defined with a type we haven't seen yet. If you look at the Type section, you'll see that we now specify three types: one that takes an i32 and returns nothing, one that takes two i32 parameters and returns an i32, and a new one that takes two i32 parameters and returns nothing.

The first of these types is the one used by our import. We want the host environment to give us a function that we can call with an i32. The purpose of this function is to print its argument in some way, not to return anything, so it doesn't need a return type. We expect to find this function in the importObject, which we ignored earlier on the JavaScript side. The second type is the same as before. The third belongs to $log_how_old, which calls our $how_old function with its arguments and then logs the result, so it doesn't need a return value either. The Import and Function sections show the links between functions and signatures.

To supply these elements through importObject, we need some HTML, as shown in Example 3-4.

Example 3-4. An HTML file that instantiates our module and calls a method, passing in the import object.

<!doctype html>

<html>
  <head>
      <meta charset="utf-8">
      <title>WASM Import test</title>
      <script src="utils.js"></script>
  </head>

  <body>
    <script>
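      var importObject = {
        imports: {
          // Supplied to the module as the "imports" "log_func" import.
          log_func: function (arg) {
            console.log("You are this old: " + arg + " years.");
          },

          // Not imported by the module; an alternative implementation.
          log_func_2: function (arg) {
            alert("You are this old: " + arg + " years.");
          }
        }
      };

      fetchAndInstantiate('hellolog.wasm', importObject).then(function (instance) {
        // log_how_old returns nothing, so this outer console.log prints
        // undefined after log_func has logged the message.
        console.log(instance.exports.log_how_old(2021, 2000));
      });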

    </script>
  </body>
</html>

Compare the import statement in Example 3-3 with the structure of this object. Note that there is an imports namespace, which contains a function named log_func. This is the structure specified by our import statement. The $log_how_old function pushes its two arguments onto the stack, and then the call instruction invokes our earlier $how_old function. Remember, this function subtracts one argument from the other and leaves the result on top of the stack. At this point, we don't need to push the value back onto the stack; we can simply call the imported function we named $log. The result of the previous function becomes the parameter of this new call. Take the time to make sure you understand the relationship between parameters, return values, and functions.

If you copy the utils.js file from the previous chapter (which provides the fetchAndInstantiate() function) and serve everything over HTTP as we did before, you can load the new HTML file. Initially you won't see anything, because our log_func just dumps its argument to console.log(). However, if you look at the console in your browser's developer tools, you should see something like Figure 3-4.

Figure 3-4. The result of calling our new function using an imported JavaScript function.

If you change importObject to something like Example 3-5 and then reload the HTML file in the browser, you will no longer see the console message; you should see a pop-up alert instead. Nothing has changed in our WebAssembly code; we're just passing in a different function from the JavaScript side, so we see a different result. We'll see more complex interactions as we delve deeper into this topic, but hopefully you're starting to understand how WebAssembly and JavaScript code interact through imports and exports.

Example 3-5. The same WebAssembly module can be instantiated and called in different ways

var importObject = {
  imports: {
    log_func: function (arg) {
      alert("You are this old: " + arg + " years.");
    }
  }
};
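The idea that one module behaves differently depending on what the host passes in can be tried without a web server. The following sketch is a self-contained approximation: the bytes are hand-assembled to be equivalent to the hellolog module (an assumption; real wat2wasm output may differ in detail), and runWith, consoleLines, and alertLines are hypothetical names that stand in for console.log() and alert().

```javascript
// Hand-assembled equivalent of the hellolog module (an assumption).
const helloLogBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  // Type section: (i32) -> (), (i32, i32) -> i32, (i32, i32) -> ()
  0x01, 0x10, 0x03,
  0x60, 0x01, 0x7f, 0x00,
  0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  0x60, 0x02, 0x7f, 0x7f, 0x00,
  // Import section: "imports" "log_func", a function of type 0
  0x02, 0x14, 0x01,
  0x07, 0x69, 0x6d, 0x70, 0x6f, 0x72, 0x74, 0x73,
  0x08, 0x6c, 0x6f, 0x67, 0x5f, 0x66, 0x75, 0x6e, 0x63,
  0x00, 0x00,
  // Function section: func 1 uses type 1, func 2 uses type 2
  0x03, 0x03, 0x02, 0x01, 0x02,
  // Export section: "how_old" -> func 1, "log_how_old" -> func 2
  0x07, 0x19, 0x02,
  0x07, 0x68, 0x6f, 0x77, 0x5f, 0x6f, 0x6c, 0x64, 0x00, 0x01,
  0x0b, 0x6c, 0x6f, 0x67, 0x5f, 0x68, 0x6f, 0x77, 0x5f, 0x6f, 0x6c, 0x64, 0x00, 0x02,
  // Code section: how_old subtracts; log_how_old calls func 1, then func 0
  0x0a, 0x14, 0x02,
  0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6b, 0x0b,
  0x0a, 0x00, 0x20, 0x00, 0x20, 0x01, 0x10, 0x01, 0x10, 0x00, 0x0b,
]);

// Compile once; synchronous compilation is fine for a module this small.
const helloLogModule = new WebAssembly.Module(helloLogBytes);

// Hypothetical helper: instantiate with a given logger and call log_how_old.
function runWith(logFunc) {
  const instance = new WebAssembly.Instance(helloLogModule, {
    imports: { log_func: logFunc },
  });
  instance.exports.log_how_old(2021, 2000);
}

// Two different hosts for the same compiled module.
const consoleLines = [];
const alertLines = [];
runWith(arg => consoleLines.push("console: You are this old: " + arg + " years."));
runWith(arg => alertLines.push("alert: You are this old: " + arg + " years."));
console.log(consoleLines, alertLines);
```

Compiling once and instantiating twice keeps the comparison honest: the only thing that changes between the two runs is the import object.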

Instantiating modules and calling their functions will be your main interaction with them through the JavaScript API, but there are some additional behaviors you can use. If you want to know which functions a module imports or exports, you can use the JavaScript API to ask the loaded module. If you do not call the fetchAndInstantiate() function in utils.js, but instead change the HTML to use the code shown in Example 3-6, you will see the results shown in Figure 3-5.

Example 3-6. We can do more with the JavaScript API, including streaming compilation.

WebAssembly.compileStreaming(fetch('hellolog.wasm'))
  .then(function (mod) {
    var imports = WebAssembly.Module.imports(mod);
    console.log(imports[0]);
    var exports = WebAssembly.Module.exports(mod);
    console.log(exports);
  });

Figure 3-5. Querying module structure through JavaScript API
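For readers without the figure at hand, here is a small sketch of the descriptor objects these calls return. It uses a hand-assembled, cut-down module (an assumption for illustration, not the hellolog module itself) that imports imports.log_func and re-exports it under the hypothetical name log.

```javascript
// Hand-assembled module with only Type, Import, and Export sections.
// It imports "imports" "log_func" and re-exports it as "log" (hypothetical).
const reflectBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x05, 0x01, 0x60, 0x01, 0x7f, 0x00,             // Type: (i32) -> nothing
  0x02, 0x14, 0x01,                                     // Import section, 1 entry
  0x07, 0x69, 0x6d, 0x70, 0x6f, 0x72, 0x74, 0x73,       // "imports"
  0x08, 0x6c, 0x6f, 0x67, 0x5f, 0x66, 0x75, 0x6e, 0x63, // "log_func"
  0x00, 0x00,                                           // kind: function, type 0
  0x07, 0x07, 0x01, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // Export: "log" -> func 0
]);

const reflectModule = new WebAssembly.Module(reflectBytes);

// Each import descriptor has module, name, and kind properties.
const importList = WebAssembly.Module.imports(reflectModule);
// Each export descriptor has name and kind properties.
const exportList = WebAssembly.Module.exports(reflectModule);

console.log(importList, exportList);
```

Both calls work on a compiled module, so a host can discover what a module needs before deciding how to instantiate it.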

Once we understand more concepts and start using higher-level languages to express our actions, the full power of WebAssembly begins to emerge.

So far, we've been using a block of code in a file called utils.js, which looks like Example 3-7. For simple modules this is fine, but as your modules get larger there is some built-in latency we can remove. Performance refers not only to runtime performance, but also to load-time performance.

Example 3-7. We have been instantiating modules in a simple way

function fetchAndInstantiate(url, importObject) {
  return fetch(url)
    .then(response => response.arrayBuffer())
    .then(bytes => WebAssembly.instantiate(bytes, importObject))
    .then(results => results.instance);
}

The problem here is that although we use Promises to avoid blocking the main thread, we read the whole module into an ArrayBuffer before instantiating it. We are effectively waiting for the entire network transfer to complete before compiling the module. One of the first post-MVP features was the ability to compile while bytes are still being transferred over the network. The module's format lends itself to this kind of optimization, so it would be a shame not to use it.

Although there is no single "right" way to instantiate your module (for example, in some cases you may want to create multiple instances of a module), in most cases the code in Example 3-8 is a slightly more efficient method.

Example 3-8. The recommended way to instantiate a module in most cases.

(async () => {
  const fetchPromise = fetch(url);
  const { instance } = await WebAssembly.instantiateStreaming(fetchPromise);

  // Use the module
  const result = instance.exports.method(param1, param2);
  console.log(result);
})();

Note that we're not creating an ArrayBuffer; we're passing the Promise from the fetch() call to the instantiateStreaming() method of the WebAssembly object. This allows the baseline compiler to start compiling functions as they arrive over the network. In most cases, the code compiles faster than it can be transferred, so by the time the download finishes, the code should be validated and ready to use. With JavaScript, the validation process typically starts only once the download finishes, so we see an improvement in startup time.
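The two approaches can be combined into one helper that streams where possible and otherwise falls back to the ArrayBuffer route. This is a sketch under assumptions rather than a replacement for utils.js: instantiateWasm is a hypothetical name, and it takes the Promise returned by fetch() instead of a URL.

```javascript
// Hypothetical helper: prefer streaming compilation, fall back to ArrayBuffer.
// `source` is a Response or a Promise for one, such as the result of fetch().
async function instantiateWasm(source, importObject = {}) {
  if (typeof WebAssembly.instantiateStreaming === "function") {
    // Compilation can begin while the response body is still arriving.
    const { instance } = await WebAssembly.instantiateStreaming(source, importObject);
    return instance;
  }
  // Environments without streaming: download everything first, then compile.
  const response = await source;
  const bytes = await response.arrayBuffer();
  const { instance } = await WebAssembly.instantiate(bytes, importObject);
  return instance;
}

// Usage sketch (assumes hellolog.wasm is served as application/wasm):
// instantiateWasm(fetch("hellolog.wasm"), {
//   imports: { log_func: arg => console.log(arg) },
// }).then(instance => instance.exports.log_how_old(2021, 2000));
```

Streaming instantiation expects the response to carry the application/wasm MIME type, so check your server's configuration if the streaming path is rejected.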

There is currently no official way to cache WebAssembly modules, but that would be another unobtrusive way to improve startup time. Cache control and other handling of network artifacts will avoid unnecessary redownloading of modules (that is, unless they have been updated).

Future integration with ES6 modules

As we can see, while being able to work through the JavaScript API is obviously useful, doing so is low-level and repetitive, which is why we put it in a reusable utility script file. In the future, we hope that it will be easier to use WebAssembly modules from HTML, since they will be available as ES6 modules.

This is a bit tricky because of the asynchronous processing required at the top level and because the module graph is loaded in three phases: construction, instantiation, and evaluation. There are subtle differences between binary WebAssembly modules and JavaScript-based modules in how validation works, when compilation occurs, and how module environment records are traversed and linked.

There is a proposal to add platform support that smooths over these differences. At the time of writing, it is in Phase 2 of the proposal process. Lin Clark gives a good introduction to its intricacies on YouTube ^[6].

The goal is to introduce a declarative form, as shown in Example 3-9.

Example 3-9. Proposed declarative form for loading WebAssembly modules

import {something} from "./myModule.wasm";
something ();

This not only helps simplify the instantiation of WebAssembly modules, but also lets them participate in the JavaScript module dependency graph. Because they would be managed as dependencies in the same way, developers will find it easier to mix behavior expressed in multiple languages into a complete solution.

The proposal is cleanly designed and well supported, but it involves careful orchestration across the HTML specification, the ES6 module specification, implementations, JavaScript bundlers, and the larger Node.js community. My guess is that it won't be long before we see progress on this proposal.

Now that we understand the structural elements of a WebAssembly binary, you should be able to easily inspect your own and third-party modules manually and programmatically. The next step is to look at the more dynamic elements of the WebAssembly module. We'll focus first on the Memory instance to simulate the functionality of contiguous memory blocks in a more traditional programming runtime.

Reference link

[1] Executable and Linkable Format (ELF): https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
[2] Portable Executable Format (PE): https://en.wikipedia.org/wiki/Portable_Executable
[3] Mach-O Format: https://en.wikipedia.org/wiki/Mach-O
[4] wasdk GitHub repository: https://github.com/wasdk/wasmcodeexplorer
[5] Online use: https://wasdk.github.io/wasmcodeexplorer/
[6] On YouTube: https://www.youtube.com/watch?v=qR_b5gajwug&ab_channel=MozillaHacks
