kuma


front-end developer / a lying-flat 🐻

Tinkered with WAMR

Preface#

Recently I've been planning to write a programming language (a flag planted at the end of the year), and I needed to pick a compilation backend. My initial plan was to compile to C and then use clang to produce the binary. The advantage is that the C toolchain is very mature: built on LLVM, it compiles easily to all kinds of targets and guarantees execution speed (aside: why not compile directly to LLVM IR = =). Then I had second thoughts (rebellious bear) and looked at cranelift, a backend implemented in Rust. It seemed like a good fit: cranelift focuses on JIT compilation (it can also emit binary files directly through its object module), so its code generation is much faster than LLVM's (with fewer optimization passes, of course) while still delivering very good runtime performance. But cranelift provides no bindings for other languages, so I studied Rust for a while and then gave up (my nth failed attempt to learn Rust, I'm hopeless 😭). Finally, in a flash of inspiration, I thought of using WebAssembly as the compilation backend. So I started researching wasm runtimes and ultimately chose WAMR.

Why Choose WAMR#

  • Extremely fast execution speed (very close to native performance)
  • Comes with an interpreter mode (perfect for quickly starting in dev mode)
  • Supports wasi libc
  • Very small binary size

Let's Get Started#

At first I just wanted to try WAMR out, so I didn't consider embedding it and instead tested with the two CLI tools the project provides. However, the official prebuilt binaries for macOS only cover x86_64, and I happen to be on an ARM chip, so I had to compile them myself. First, download the WAMR source code to your local machine. We need two CLI tools: iwasm and wamrc. Let's start by compiling iwasm.
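If you don't have the sources yet, cloning the upstream repository works (the Bytecode Alliance GitHub URL below is assumed to be the current home of the project):

```shell
# Fetch the WAMR sources and enter the repo
git clone https://github.com/bytecodealliance/wasm-micro-runtime.git
cd wasm-micro-runtime
```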

Compiling iwasm#

The iwasm sources for macOS live in the product-mini/platforms/darwin folder, where you'll find a CMakeLists.txt file. Interested readers can open it to see all the available compilation options. Based on my use case, I created a make.sh file in that folder to drive the build. Here is its content.

#!/bin/sh
set -e # Abort on the first failing command

mkdir -p build && cd build
# Pass compilation options
cmake .. -DWAMR_BUILD_TARGET=AARCH64 -DWAMR_BUILD_JIT=0
# Compile
make
cd ..

In this shell script, the key part is the cmake invocation, which passes two compilation options. Let's go through what they mean.

  • WAMR_BUILD_TARGET=AARCH64 — compile for the ARM 64-bit (AArch64) instruction set.
  • WAMR_BUILD_JIT=0 — leave the JIT out. (I had actually hoped to use a JIT in dev mode so that dev-mode speed wouldn't lag too far behind the final build. WAMR currently offers two JIT modes: Fast JIT and LLVM JIT. LLVM JIT adds too much binary size, so I never planned to compile it in; it would only serve dev mode and isn't necessary. Fast JIT, on the other hand, is lightweight and adds very little binary size, and according to the official docs it reaches about 50% of LLVM JIT performance, which is plenty for dev mode. Unfortunately, it failed to compile on my machine; I'll try again later.)
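For reference, enabling Fast JIT is just another cmake flag (WAMR_BUILD_FAST_JIT is the option name used in WAMR's build scripts; whether it actually links is another matter, as my failed attempt shows):

```shell
# Same build as before, but with Fast JIT compiled in
cmake .. -DWAMR_BUILD_TARGET=AARCH64 -DWAMR_BUILD_FAST_JIT=1
make
```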
After running the script, you'll find the iwasm executable in the build folder. On AArch64, the interpreter-only binary is just 426 KB, which is very lightweight. Next, let's generate a WebAssembly file and try it out. I chose to compile Rust to the wasm32-wasi target. First, add the wasm32-wasi target with rustup.

rustup target add wasm32-wasi

Then, let's create a new Rust project using cargo.

cargo new --bin hello_wasm

Next, let's write a program to calculate the Fibonacci sequence.

use std::io;

fn fib_recursive(n: usize) -> usize {
    match n {
        0 | 1 => 1,
        _ => fib_recursive(n - 2) + fib_recursive(n - 1),
    }
}

fn main() {
    println!("Please enter a number to calculate the Fibonacci sequence:");

    let mut input = String::new();
    io::stdin().read_line(&mut input).expect("Failed to read input");

    let n: usize = input.trim().parse().expect("Please enter a valid number");

    // Calculate the Fibonacci sequence and measure the time
    let start_time = std::time::Instant::now();
    let result = fib_recursive(n);
    let elapsed_time = start_time.elapsed();

    println!("The value of the {}th item in the Fibonacci sequence is: {}", n, result);
    println!("Calculation time: {:?}", elapsed_time);
}
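Before targeting wasm, it's worth running the program natively once, since that's the baseline the later measurements are compared against:

```shell
# Build and run an optimized native binary
cargo run --release
```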

Compile it to the wasi target.

cargo build --target wasm32-wasi --release

After compilation, you can find the compiled hello_wasm.wasm file in the target/wasm32-wasi/release folder. Let's use the previously compiled iwasm to execute this wasm file.

iwasm --interp hello_wasm.wasm
You can see that the program runs successfully. On my Mac mini (M1 chip), fib(40) takes about 3.7 seconds, versus 337 ms for the native Rust build, so interpreted WebAssembly runs at roughly 1/10 native speed here. (That's already a good result: WAMR ships a fast-interpreter implementation that first translates WebAssembly's stack-machine instructions into an internal IR and then executes that.)

Compiling wamrc#

Next, let's work on performance. We'll compile wamrc, use it to convert the wasm file into an aot file, and then run that with the iwasm we built earlier for much faster execution. Since wamrc relies on LLVM for its compilation optimizations, we need to build LLVM first. On macOS, install the dependencies required for building LLVM (skip this step if you already have cmake and ninja).

brew install cmake ninja

Execute build_llvm.sh in the wamr-compiler directory.

./build_llvm.sh

Then, an error occurs = =. Following the error message, it turns out the LLVM version I downloaded does not support the LLVM_CCACHE_BUILD option, so we need to edit the compilation options in the build-scripts/build_llvm.py file to disable the ccache option.

LLVM_COMPILE_OPTIONS.append("-DLLVM_CCACHE_BUILD:BOOL=OFF")

After modifying it, let's build LLVM again. Then, let's compile wamrc. There is not much difference from compiling iwasm. Just follow the compilation steps in the official readme.

mkdir build && cd build
cmake .. -DWAMR_BUILD_PLATFORM=darwin
make

After it finishes, we get the wamrc executable under the build path. Let's use wamrc to compile the wasm file from earlier.

./wamrc --size-level=3 -o hello_wasm.aot hello_wasm.wasm

Here, because I'm on an ARM64 chip, the --size-level=3 option is required; otherwise compilation fails (it's related to the size of the generated file).
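Incidentally, wamrc can also produce AOT files for machines other than the one you're on via its --target flag (the flag is listed in wamrc --help; the exact target strings accepted may vary with your wamrc build, so treat this as a sketch):

```shell
# Hypothetical cross-compile of the same module for an x86_64 host
./wamrc --size-level=3 --target=x86_64 -o hello_wasm_x64.aot hello_wasm.wasm
```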

Executing AOT WebAssembly#

Let's use the previously compiled aot artifact with iwasm to execute it.

./iwasm hello_wasm.aot

Let's run fib(40) again. This time it takes only 337 ms on my machine, the same as the native Rust program. A single micro-benchmark like this can't fully capture the performance gap between aot and native Rust, but it does show that, after LLVM's optimizations, WebAssembly can reach execution speeds close to native.

Anecdote#

Node.js uses V8, which is also a highly optimized JIT compiler. Unlike aot, though, V8 first compiles JavaScript to bytecode and interprets it; only hot functions get JIT-compiled. And because JavaScript is dynamically typed, it is harder to optimize than a statically typed target like WebAssembly. So how does the pinnacle of dynamic languages fare on the fib function above? On my machine, fib(40) takes about 959 ms, roughly 2.8 times the native Rust time, or about 35% of native speed. V8 really is impressive.
