Strings in WebAssembly
Problem
I was working on compiling a small subset of JavaScript to WebAssembly. Representing numbers, if/else conditions, and for loops was intuitive and straightforward. However, because I didn't have any prior experience working with low-level instructions, I got stuck on strings.
Compiling a "Hello World" program to WebAssembly was not as easy as I thought it would be.
Memory Layout
After researching for a bit, I found that a WebAssembly module has a data section, which initializes memory with data. So I could put "Hello World" in the data section and print it by its offset in memory.
(data (i32.const 0) "Hello World")
Unlike high-level languages, where strings are first-class citizens, WebAssembly treats them as raw bytes in memory. It follows a linear memory model: like an array, memory is one contiguous range of bytes. "Hello World" is stored as 11 consecutive bytes, one per character, at offsets 0 through 10.
You can learn more about linear memory from this post by Lin Clark.
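To make the layout concrete, here is a small Python simulation. It is not real Wasm tooling. Linear memory is modelled as a zero-filled bytearray, and a data segment is simply bytes copied to a fixed offset when the module is instantiated. The `instantiate` helper is hypothetical; only the 64 KiB page size comes from WebAssembly itself.

```python
# Toy model of WebAssembly linear memory: one flat, zero-initialised byte array.
# PAGE_SIZE matches the real Wasm page size (64 KiB); the rest is a simulation.
PAGE_SIZE = 64 * 1024


def instantiate(data_segments, pages=1):
    """Hypothetical helper: copy each (offset, bytes) data segment into a fresh
    memory, roughly what an engine does with active data segments."""
    memory = bytearray(pages * PAGE_SIZE)
    for offset, payload in data_segments:
        memory[offset:offset + len(payload)] = payload
    return memory


# Equivalent of (data (i32.const 0) "Hello World")
memory = instantiate([(0, b"Hello World")])
print(list(memory[0:11]))            # the character codes at offsets 0..10
print(memory[0:11].decode("ascii"))  # Hello World
```

Everything after offset 10 stays zero, because nothing else was written there.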
Storing the Strings
To store and print a string, we need two values:
- Start of the string
- Length of the string
Our compiler needs to track these values. Suppose the program declares two strings, "hello" and "world", on different lines:
1| var a = "hello"
....
18| var b = "world"
We can track the offsets by advancing the last offset by the length of each string we store:
new_offset = last_offset + length of the string
For example, if "hello" starts at offset 0, then "world" starts at:
new_offset = 0 + "hello".length = 0 + 5 = 5
In WebAssembly, this translates to:
(data (i32.const 0) "hello")
(data (i32.const 5) "world")
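The offset bookkeeping can be sketched in Python as a small table that the compiler fills as it meets string literals. `StringTable` is a hypothetical name and not the author's compiler. The sketch counts UTF-8 bytes rather than characters, because memory is byte-addressed. It also assumes that literals contain no quotes or backslashes, since no WAT escaping is done.

```python
class StringTable:
    """Hypothetical compiler helper: gives each string literal a start offset
    in linear memory and remembers the data segments to emit."""

    def __init__(self, start=0):
        self.next_offset = start  # the running "last_offset"
        self.segments = []        # (offset, text) pairs in emission order

    def add(self, text):
        offset = self.next_offset
        self.segments.append((offset, text))
        # new_offset = last_offset + length of the string (in bytes)
        self.next_offset = offset + len(text.encode("utf-8"))
        return offset

    def emit_wat(self):
        # One (data ...) segment per literal; assumes no quotes/backslashes.
        return [f'(data (i32.const {off}) "{text}")' for off, text in self.segments]


table = StringTable()
a = table.add("hello")   # var a = "hello"
b = table.add("world")   # var b = "world"
print(a, b)              # 0 5
for line in table.emit_wat():
    print(line)
```

The printed lines are the same two data segments as the hand-written ones.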
Okay, but how do we print these now? We only know where each string starts, so how do we get its length?
Length of the string
Then I found a blog post where the author stored the length of the string in its first byte. This technique is called length-prefixed encoding. It is common in network protocols and even in programming languages like Pascal.
String: "\05hello"
Offset: 0      1    2    3    4    5
Memory: [0x05] ['h']['e']['l']['l']['o']
        length data (5 bytes)
So if my string is "hello", it is stored together with its length as \05hello, and the first byte holds the length of the string. In a WAT string literal, \05 is a hex escape, so it is stored as the single byte 0x05.
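Here is a minimal Python sketch of the encoding, under the assumption of a one-byte prefix as described above. It also renders the WAT data segments. The helper names `length_prefixed`, `wat_literal` and `layout` are hypothetical. One consequence is that each string now takes its length plus 1 byte, so the running offset advances by that amount.

```python
def length_prefixed(text):
    """Encode a string as one length byte followed by its UTF-8 bytes."""
    data = text.encode("utf-8")
    if len(data) > 255:
        # A single prefix byte can only describe lengths 0..255.
        raise ValueError("string too long for a one-byte length prefix")
    return bytes([len(data)]) + data


def wat_literal(raw):
    """Render bytes as a WAT string literal: printable ASCII as-is,
    everything else (and quote/backslash) as a \\hh hex escape."""
    out = []
    for b in raw:
        ch = chr(b)
        if 0x20 <= b < 0x7F and ch not in '"\\':
            out.append(ch)
        else:
            out.append(f"\\{b:02x}")
    return '"' + "".join(out) + '"'


def layout(strings, start=0):
    """Place each prefixed string right after the previous one:
    new_offset = last_offset + 1 + length of the string."""
    offset, segments = start, []
    for s in strings:
        encoded = length_prefixed(s)
        segments.append((offset, encoded))
        offset += len(encoded)
    return segments


for off, enc in layout(["hello", "world"]):
    print(f"(data (i32.const {off}) {wat_literal(enc)})")
```

With the prefix, "world" moves from offset 5 to offset 6. The single length byte also caps each string at 255 bytes, which is one reason some formats use a wider prefix.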
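I defined this function to get the length of a string:
(func $len (param $addr i32) (result i32)
  local.get $addr   ;; push the start offset of the string
  call $nullthrow   ;; optional check (defined elsewhere); must leave an i32 address on the stack
  i32.load8_u       ;; load the prefix byte, zero-extended to i32, as the result
)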
- $addr is the byte offset of the string; to get the length of "hello", we pass 0.
- $nullthrow is optional; it checks whether a value exists at that offset.
- i32.load8_u loads the first byte at that offset (the length prefix) as an unsigned value, and the function returns it.
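To check the logic outside of WebAssembly, here is a Python model of what $len does. It also shows what a print routine would do with the result: read the length byte, then read that many bytes starting just after it. `str_len` and `read_string` are hypothetical names, and the sketch leaves out the $nullthrow check. Indexing a bytearray gives an int from 0 to 255, which mirrors the zero-extension done by i32.load8_u.

```python
def str_len(memory, addr):
    """Mirror of $len: read the unsigned length byte at addr (like i32.load8_u).
    The $nullthrow check is omitted; an out-of-range addr raises IndexError
    here, where real Wasm would trap."""
    return memory[addr]


def read_string(memory, addr):
    """Length from the prefix byte, then that many bytes right after it."""
    n = str_len(memory, addr)
    return memory[addr + 1:addr + 1 + n].decode("utf-8")


# Memory as if initialised by (data (i32.const 0) "\05hello")
# and (data (i32.const 6) "\05world")
memory = bytearray(64 * 1024)
memory[0:6] = b"\x05hello"
memory[6:12] = b"\x05world"

print(str_len(memory, 0))       # 5
print(read_string(memory, 0))   # hello
print(read_string(memory, 6))   # world
```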
End Notes
But we're not done yet. In my next post, we'll tackle the real challenge: dynamic strings that can grow, shrink, and be modified at runtime. We'll build our own memory allocator.