WebAssembly on Roku
For the source code, visit my github: https://github.com/TrevorSundberg/wasm2brs
I’ve been of huge fan of the TCL + Roku series of TVs. I’m a simple person with simple pleasures, and the button on the remote that instantly turns on the TV and goes to Netflix won me over. As with most technology, it didn’t take long for me to start tinkering.
Roku’s app development platform requires that all apps are written in a proprietary language known as BrightScript. It’s a dynamic language with limited optional types. After playing in BrightScript for a bit, I found myself searching “How to compile C++ for Roku?” only to realize that C++ support was discontinued or locked behind partnerships and NDAs.
Path to C/C++ in BrightScript
Up front I knew I was going to need to do some translating. The immediate options available were:
- C++ directly to BrightScript
- C++ to LLVM bitcode to BrightScript
- C++ to WebAssembly to BrightScript
Translation of the C++ AST provided by clang was a no-go due to the complexity of having to convert classes, operators, templates, etc. LLVM looked attractive because it boils down to basic instructions, pointers, and structs, and is a supported output by many languages. Ultimately I chose WebAssembly over LLVM bitcode because the number of instructions was smaller, there is no concept of structs (only bytes), pointers are just integers with no special type, and the linear memory model for WASM was simple.
When I started, I looked for a parser for .wasm files which led me to wabt, The WebAssembly Binary Toolkit. My first approach was going to be parse the .wasm, walk over instructions, generate variables, and emit BrightScript. This proved to be a little more complicated than I expected as I didn’t fully understand the type and label stack, or even how the br instruction jumps after the block, instead of to it like in LLVM. However in the wabt repository I came across wasm2c, a simple command line interface for translating WebAssembly into C. This gave me a clearer picture of the workings of WebAssembly and a template to work from, now called wasm2brs for WebAssembly To BrightScript. All I needed to do was modify the output of wasm2c to match BrightScript syntax.
To start I picked a simple example of adding integers with no imports, memory, globals, etc. You can use the wabt tool wat2wasm to convert this into a .wasm file:
(func (export "add") (param $x i32) (param $y i32) (result i32) (i32.add (local.get $x) (local.get $y) ) )
In fact it was so useful to be able to run on both .wat and .wasm files that I ended up integrating the source of wat2wasm into wasm2brs.
After some slight modification of the original wasm2c project, I was able to output the following BrightScript:
Function w2b_add(w2b_x As Integer, w2b_y As Integer) As Integer w2b_i0 = w2b_x w2b_i1 = w2b_y w2b_i0 += w2b_i1 Return w2b_i0 End Function
Ta-da! It actually worked and produced valid code that ran on the Roku. I was ecstatic at this point, but I was also piggybacking on the fact that wasm2c was able to use all of C’s built in operators such as
+= which happens to exist in BrightScript.
n in this case actually jumps by 4 bytes (
index = n * 4). This means it does not support unaligned loads, which is required by WebAssembly.
This presented a problem: we’re going to have to do everything in bytes. Roku employees, if you’re reading this, please add all the methods from DataView :)
To implement i32.load which loads a 32 bit integer from memory, I modified wasm2brs to emit a function call to the following:
Function I32Load(mem as Object, index as Integer) as Integer Return mem[index] + (mem[index + 1] << 8) + (mem[index + 2] << 16) + (mem[index + 3] << 24) End Function
Luckily BrightScript has logical bit-shift built in and I was able to read 4 bytes and shift them all into place. At the time, I didn’t realize that BrightScript also had the bitwise-or operator (literally
Or) as I tried the typical C
| operator and it failed, so I opted to use
What became immediately clear was that I needed a way to know whether my WASM instruction implementations were correct. For example, unsigned i32 division is different than signed i32 division, and BrightScript only supported signed integral types. How would I know my implementation was correct without coming up with all the pathological cases to test?
Again, wabt to the rescue! It was by luck that I noticed they had a sub-repository WebAssembly/testsuite which contained a massive set of WebAssembly .wast files that tested the behavior of each instruction. For those familiar with WebAssembly, a .wast is just a .wat file with extra syntax for testing. For example, you could test the unsigned i32 division by using the test file i32.wast:
(assert_return (invoke "div_u" (i32.const 1) (i32.const 1)) (i32.const 1)) (assert_return (invoke "div_u" (i32.const 0) (i32.const 1)) (i32.const 0)) (assert_return (invoke "div_u" (i32.const -1) (i32.const -1)) (i32.const 1)) (assert_return (invoke "div_u" (i32.const 0x80000000) (i32.const -1)) (i32.const 0)) (assert_return (invoke "div_u" (i32.const 0x80000000) (i32.const 2)) (i32.const 0x40000000)) (assert_return (invoke "div_u" (i32.const 0x8ff00ff0) (i32.const 0x10001)) (i32.const 0x8fef)) (assert_return (invoke "div_u" (i32.const 0x80000001) (i32.const 1000)) (i32.const 0x20c49b)) (assert_return (invoke "div_u" (i32.const 5) (i32.const 2)) (i32.const 2)) (assert_return (invoke "div_u" (i32.const -5) (i32.const 2)) (i32.const 0x7ffffffd)) (assert_return (invoke "div_u" (i32.const 5) (i32.const -2)) (i32.const 0)) (assert_return (invoke "div_u" (i32.const -5) (i32.const -2)) (i32.const 0)) (assert_return (invoke "div_u" (i32.const 7) (i32.const 3)) (i32.const 2)) (assert_return (invoke "div_u" (i32.const 11) (i32.const 5)) (i32.const 2)) (assert_return (invoke "div_u" (i32.const 17) (i32.const 7)) (i32.const 2))
assert_return in BrightScript and creating a framework that read .wast files using the wast2json tool, I had a way to progress forward:
result = test0_div_u(1%,1%) ' i32.wast(87) current.json(50) AssertEquals(result, 1%) result = test0_div_u(0%,1%) ' i32.wast(88) current.json(51) AssertEquals(result, 0%) result = test0_div_u(4294967295%,4294967295%) ' i32.wast(89) current.json(52) AssertEquals(result, 1%) result = test0_div_u(2147483648%,4294967295%) ' i32.wast(90) current.json(53) AssertEquals(result, 0%) result = test0_div_u(2147483648%,2%) ' i32.wast(91) current.json(54) AssertEquals(result, 1073741824%) result = test0_div_u(2414874608%,65537%) ' i32.wast(92) current.json(55) AssertEquals(result, 36847%) result = test0_div_u(2147483649%,1000%) ' i32.wast(93) current.json(56) AssertEquals(result, 2147483%) result = test0_div_u(5%,2%) ' i32.wast(94) current.json(57) AssertEquals(result, 2%) result = test0_div_u(4294967291%,2%) ' i32.wast(95) current.json(58) AssertEquals(result, 2147483645%) result = test0_div_u(5%,4294967294%) ' i32.wast(96) current.json(59) AssertEquals(result, 0%) result = test0_div_u(4294967291%,4294967294%) ' i32.wast(97) current.json(60) AssertEquals(result, 0%) result = test0_div_u(7%,3%) ' i32.wast(98) current.json(61) AssertEquals(result, 2%) result = test0_div_u(11%,5%) ' i32.wast(99) current.json(62) AssertEquals(result, 2%) result = test0_div_u(17%,7%) ' i32.wast(100) current.json(63)
My mission became clear: work through each .wast file and implement all the instructions until 100% of the tests pass.
Unsigned i32 Operations
One thing that I learned is that there is no difference between unsigned and signed addition (I believe due to the Two’s Complement, but don’t quote me on that). For example if you were to take the expression:
-123 + 20 = -103
Reinterpret the first two operands as unsigned you would get:
FFFFFF85 + 00000014
When added they result in:
Which if you reinterpret that back as signed you get:
However, the same cannot be said for unsigned division which is exactly why WebAssembly has the i32.div_u and i32.div_s instructions. In the land of unsigned, it’s very clear to see that:
00000001 / FFFFFFFF = 00000000
The number 00000001 (or just 1) divided by a very large number of 0xFFFFFFFF is clearly 0 due to integer division. However if you reinterpret those same operands as i32 signed values, you get:
1 / -1 = -1
This presents us with a problem: BrightScript does not have unsigned primitives and therefore does not have a way to run an unsigned i32 division operation. It’s time to get clever… Trevor (I make no apologies for this).
We can take advantage of the fact that the entire range of an unsigned i32 fits within a signed i64. The tricky part is in BrightScript, all conversions from Integer to LongInteger are done via sign extension. That means that the value 0xFFFFFFFF would turn into 0xFFFFFFFFFFFFFFFF, because 0xFFFFFFFF interpreted as signed is -1, and -1 in 64 bit is 0xFFFFFFFFFFFFFFFF.
There is a simple fix: we use bitwise and to discard the high 4 bytes and keep only the low 4 bytes after performing the sign extension. This can be done all in one BrightScript operation via:
value And &HFFFFFFFF&
BrightScript uses the
&H syntax for hex (similar to Basic) rather than the C style
0x. By adding the postfix
& we’re indicating to BrightScript that the hex is a LongInteger literal rather than just an Integer. Moreover, since value in this case is an Integer, BrightScript will implicitly promote it to LongInteger to do the bitwise operation (which sign extends) and then the
And will lop off the top bits, and viola! We have what looks like an unsigned i32 value but stored in an i64.
Now we can simply do our divion as such:
Function I32DivU(numerator as Integer, denominator as Integer) as Integer Return (value And &HFFFFFFFF&) / (value And &HFFFFFFFF&) End Function
And with only two added bitwise operations, we still get a decent performance out of unsigned division.
Unsigned i64 Operations
Unlike i32 where we can just promote to i64 and do the operation, we don’t have a 128 bit integer in BrightScript to help us do unsigned i64 operations. Unfortunately that means that our division algorithm needs to be implemented differently. I’m no math wizard, but ideally I’d find a fast solution that decomposes the i64 into two unsigned i32s (represented again as i64s) and then performs the division piece by piece. For example:
115 / 3 = 38.333... 100 / 3 + 10 / 3 + 5 / 3 = 38.333...
Decomposition of the numerator works in this way for floating point numbers, however it breaks down if you perform each division as an integer division:
100 / 3 = 33 10 / 3 = 3 5 / 3 = 1 33 + 3 + 1 = 37
This is where you would need to account for the remainder. Another problem is that the denominator cannot be decomposed the same way as the numerator:
100 / 11 = 9.0909... 100 / 10 + 100 / 1 = 110
I’m sure there’s a way to do this and that someone reading this article will point out how naive I’m being and how I should have paid attention in elementary school.
I ended up finding an algorithm to perform the division bit by bit in a loop. This is of course an incredibly slow algorithm, however we optimize it by first checking if both signed numbers are positive (meaning they would have the same value if converted to unsigned):
If numerator >= 0& And denominator >= 0& Return numerator / denominator
In practice this check gets hit very often as most programs I’ve tested that use unsigned i64 division aren’t using 64 bit numbers above 263 – 1 or
I will happily accept PRs that make this code path faster :)
In BrightScript you use the keyword’s
Not for operations with their built in Boolean type, and if you apply them to Integers you get bitwise operations instead. Think of them as being overloaded operators for Integer / LongInteger.
Unfortunately the have all the operators except XOR, which we need to implement WebAssembly’s i32.xor.
This is where remembering logic tables comes in handy. XOR is short for eXclusive-OR, and in layman’s term’s means “A or B, but not both”. The following is a truth table that describes XOR:
If we look at the top 3 entries, we see it looks exactly like the bitwise or. Now all we have to do is account for the bottom case where both bits are 1. It turns out this follows the English statement very well, we have the “A or B” and now to implement the “but not both”, or rather we could say “and not both”. By “both”, we mean if both are true, which is a bitwise and operation! So now we have “(A or B) and not (A and B)“, and sure enough that matches the table. There’s a more proper explanation out there probably using De Morgan’s laws but again I digress: I am a simple person.
And thus, here is the simple implementation:
Function I32Xor(a as Integer, b as Integer) as Integer Return (a Or b) And Not (a And b) End Function
And we’re done!
Floating Point Load / Store
As I mentioned before, we could only load and store individual bytes into the roByteArray. Unfortunately, BrightScript’s standard library has no way of extracting the bytes out of a float or composing a float from bytes. This means that we have to decompose the float using floating point and integer math alone.
A few of the incredibly helpful links that I used to verify my implementation came from the Wikipedia entry on single precision floating point, as well as these two online demos that let you play with the bits:
From the StackOverflow post I was able to glean the following approximate BrightScript implementation:
Function I32ReinterpretF32(value as Float) as Integer If value = FloatInf() Return &H7F800000 If value = -FloatInf() Return &HFF800000 If value = 3.4028234663852886e+38! Return &H7F7FFFFF If value = -3.4028234663852886e+38! Return &HFF7FFFFF If value = 1.17549435e-38! Return &H00800000 If value = -1.17549435e-38! Return &H80800000 If value = 1.17549421e-38! Return &H007FFFFF If value = -1.17549421e-38! Return &H807FFFFF If value = 1.401298464324817e-45! Return 1% If value = -1.401298464324817e-45! Return 2147483649% If value = 0 Then If IsNegativeZero(value) Return &H80000000 Return &H00000000 End If If IsNan(value) Return &HFFFFFFFF bytes = 0% If value <= -0.0! Then bytes = &H80000000 value = -value End If exponent = F32Floor(Log(value) / Log(2)) significand = ((value / (2 ^ exponent)) * (2 ^ 23)) exponent += 127 If exponent >= &HFF Then exponent = &HFF significand = 0 Else If exponent < 0 exponent = 0 End If exponentWhole% = exponent significandWhole% = significand bytes = bytes Or (exponentWhole% << 23) Return bytes Or (significandWhole% And Not (-1 << 23)) End Function
If you’re wondering why there are so many
If checks in the beginning with specific values, it’s because the method of conversion was incredibly close to the real result but not perfect. Even if the least significant bit was off, the tests would fail. This lead to a modification in the tests where float values were allowed to be off by an epsilon of
Moreover, we also convert the float value into a string and compare strings, and if Roku’s internal Float to String code rounded off insignificant digits then we would assume they were equal. This string comparison worked for both really tiny numbers and really large numbers.
You’ll also notice that when we check
IsNan we return
FFFFFFFF. There are in fact multiple bit patterns of NaN that are all well tested in the WebAssembly/testsuite however I could not come up with a way to check for each kind using what was available in BrightScript.
What this means in practice is that our floating point model is approximately correct, and only loses precision when you load or store the floats.
For conversion of bytes back into a float, I similarly found another StackOverflow answer that effectively details the reverse process:
Function F32ReinterpretI32(value as Integer) as Float b3 = value And &HFF b2 = (value >> 8) And &HFF b1 = (value >> 16) And &HFF b0 = (value >> 24) And &HFF signBit = (b0 And 1 << 7) >> 7 sign = (-1) ^ signBit exponent = (((b0 And 127) << 1) Or (b1 And (1 << 7)) >> 7) If exponent = 0 Return 0! mul = 2# ^ (exponent - 127 - 23) mantissa = b3 + b2 * (2 ^ (8 * 1)) + (b1 And 127) * (2 ^ (8 * 2)) + (2 ^ 23) If exponent = &HFF Then If mantissa = 0 Then Return sign * FloatInf() Else Return FloatNan() End If End If Return sign * mantissa * mul End Function
You may notice that both the instructions are reinterpret instructions from i32 rather than load / store. This is because f32.reinterpret_i32 and i32.reinterpret_f32 instructions exist in WebAssembly. Moreover, it was trivial to make
F32Store a composite of these with
Function F32Load(buffer as Object, index as Integer) as Float Return F32ReinterpretI32(I32Load(buffer, index)) End Function Function F32Store(buffer As Object, index As Integer, value As Float) I32Store(buffer, index, I32ReinterpretF32(value)) End Function
It should be noted that if BrightScript were to add the ability to load and store floats from the roByteArray it would entirely mitigate the issue of approximation and result in a huge performance win for reinterpreting floats!
Other Interesting Instructions
Most instructions were straightforward to implement. However some instructions such as
I32Clz which counts leading zeros, or
I64GtU which compares unsigned i64 values, required special implementations. You can check them out here:
Nicely enough, BrightScript has a built in square root Sqr which we could use to implement f32.sqrt. Unfortunately they don’t have a Double version for f64.sqrt, and my attempt to use the Float version for Doubles resulted in test failure with large and small numbers (obviously).
This meant that I ended up having to resort to Newton’s method to approximate the square root for f64 (see
F64Sqrt). This is another part of the reason that our test suite uses an epsilon for floating point comparisons.
This exact same issue exists with the BrightScript’s Log function. Log isn’t actually a WebAssembly instruction, but we use it for the reinterpreting of float and int. For f64 there was no 64 bit
Log, so we again used a loop to approximate it (see
With over 10,000 tests, getting to 100% tests passing with WebAssembly’s testsuite was a slog but well worth it. Without them, creating a trustworthy translation would be nearly impossible.
If you’d like to play around with this tool, head over to wasm2brs and give it a try!