Don’t be afraid to get your hands dirty.

If there’s one thing I’ve learned in this profession, it’s that sometimes you just need to go all in.

At the time of writing this post, an extensive search only turned up one article about code coverage and WebAssembly. From the looks of their approach, they did not actually run the WebAssembly to get coverage; rather, they ran a natively compiled version of the tests that exercised their code and got coverage from that. This is a very reasonable approach and probably the one I’d recommend; however, we’re going to have our cake and eat it too!

The following tools were used in this post:
Emscripten 1.38.30
CMake 3.10.2
LLVM Tools 7.0.0
npm / node 10.15.3 (optional)

To compile our C++ samples you’ll need to set up Emscripten. Though Emscripten comes with clang 6, we used clang 7 because some of the features we utilized were not available in clang 6. This was tested on Ubuntu 18.04.2 LTS, and it would be lovely to hear if this works on Windows or Mac :)

For those that are unfamiliar with Emscripten, you will use emcc instead of clang. This command can take many of the flags you would normally pass to clang, but it also handles the full transition from C++ code to WebAssembly (wasm). Note that newer versions of clang can compile directly to wasm using --target=wasm32 (or wasm64); however, emcc also handles binding to JavaScript, linking to a browser-based version of the C and C++ run-times, and producing a bare-bones .html file that is easy to execute.

I started with a simple program to test coverage (note that we don’t call square):

emcc main.cpp -o main.html

#include <iostream>

int square(int a, int b) {
    return a * b;
}

int main(void) {
    std::cout << "hello world!" << std::endl;
    return 0;
}

By specifying -o main.html it will generate a page with a terminal along with the main.js and main.wasm files.

Unless you use the newer -s SINGLE_FILE=1, you cannot open the main.html directly because the browser file policy will not open the local main.wasm file (for your protection). You need to host an HTTP server of your choice that points at the local directory. I personally use http-server installed globally:

sudo npm install -g http-server

And I run it with -c-1 to disable caching:

http-server -c-1

After opening the browser and navigating to http://localhost:8080 we see hello world! printed in the terminal. The next step is to enable code coverage.

Following the steps in clang’s source-based code coverage documentation, we run into our first issue:

emcc -fprofile-instr-generate -fcoverage-mapping main.cpp -o main.html

error: undefined symbol: __llvm_profile_register_function
warning: To disable errors for undefined symbols use `-s ERROR_ON_UNDEFINED_SYMBOLS=0`
error: undefined symbol: __llvm_profile_register_names_function
error: undefined symbol: __llvm_profile_runtime
Error: Aborting compilation due to previous errors
shared:ERROR: '/home/trevor/coverage/emsdk/node/8.9.1_64bit/bin/node /home/trevor/coverage/emsdk/emscripten/1.38.30/src/compiler.js /tmp/tmpuigVck.txt /home/trevor/coverage/emsdk/emscripten/1.38.30/src/library_pthread_stub.js' failed (1)

If you compile using regular clang, it handles linking in the libraries needed for code coverage. Emscripten appears to be either not linking them in or missing these libraries entirely (not a surprise given how young Emscripten is). Giving our google-fu a little practice, we find that __llvm_profile_register_function is defined within the compiler-rt library, which is part of the LLVM tool-chain and is typically included with clang.

We also find that Emscripten has its own copy of compiler-rt in the directory emsdk/emscripten/1.38.30/system/lib/compiler-rt/.

Upon further investigation it only includes the builtins library, however we care about the profile library. It appears they don’t compile it.

Roll up the sleeves.

I started by making a fork of compiler-rt because I knew I was going to be making some modifications. Note that we should probably be checking out a change-set that is closest to when Emscripten forked compiler-rt, but the below steps work even with the latest. Luckily they have a CMakeLists.txt. The standard pipeline for using CMake with Emscripten is to pass an Emscripten tool-chain file when calling cmake:

mkdir build
cd build
cmake -DCMAKE_TOOLCHAIN_FILE=$EMSCRIPTEN/cmake/Modules/Platform/Emscripten.cmake ..

And we get:

CMake Warning at cmake/Modules/CompilerRTUtils.cmake:239 (message):
  llvm-config finding xray failed with status 1
Call Stack (most recent call first):
  CMakeLists.txt:69 (load_llvm_config)


CMake Warning at cmake/Modules/CompilerRTUtils.cmake:263 (message):
  testingsupport library not installed, some tests will be skipped
Call Stack (most recent call first):
  CMakeLists.txt:69 (load_llvm_config)


CMake Error at cmake/Modules/CompilerRTUtils.cmake:287 (include):
  include could not find load file:

    /home/trevor/coverage/emsdk/clang/lib/cmake/llvm/LLVMConfig.cmake
Call Stack (most recent call first):
  CMakeLists.txt:69 (load_llvm_config)


-- Found PythonInterp: /usr/bin/python2.7 (found version "2.7.15") 
CMake Error at cmake/Modules/CompilerRTUtils.cmake:311 (string):
  string sub-command REPLACE requires at least four arguments.
Call Stack (most recent call first):
  CMakeLists.txt:149 (construct_compiler_rt_default_triple)

For debugging you can print lines and variables using message("some value: ${variable}").

Focusing on the errors, we’ll take a look at line 287 in CompilerRTUtils.cmake and we can see include("${LLVM_CMAKE_PATH}/LLVMConfig.cmake").

We probably need to pull the rest of LLVM to get LLVMConfig.cmake, however in my tests it’s not needed to get the profile library built, so we replace it with this:

if (EXISTS "${LLVM_CMAKE_PATH}/LLVMConfig.cmake")
    include("${LLVM_CMAKE_PATH}/LLVMConfig.cmake")
endif()

The next error we’ll look at is line 311 in CompilerRTUtils.cmake where we see string(REPLACE "-" ";" TARGET_TRIPLE_LIST ${COMPILER_RT_DEFAULT_TARGET_TRIPLE}).

If we follow the variable COMPILER_RT_DEFAULT_TARGET_TRIPLE we find that by default it gets set to TARGET_TRIPLE. Note that if COMPILER_RT_DEFAULT_TARGET_ONLY were on (default is off) then it would have been set to CMAKE_C_COMPILER_TARGET (which is empty under Emscripten).

Turns out TARGET_TRIPLE was supposed to be defined by LLVMConfig.cmake. For now we’re going to set COMPILER_RT_DEFAULT_TARGET_ONLY as well as CMAKE_C_COMPILER_TARGET. This approach ends up simplifying a future step where we try to enable the profile library (if you don’t believe me, try for yourself).

cmake\
    -DCMAKE_TOOLCHAIN_FILE=$EMSCRIPTEN/cmake/Modules/Platform/Emscripten.cmake\
    -DCMAKE_C_COMPILER_TARGET=asmjs-unknown-emscripten\
    -DCOMPILER_RT_DEFAULT_TARGET_ONLY=1\
    ..

Where did we get asmjs-unknown-emscripten? From here! This is typically passed as --target=... to vanilla clang and is known as a triple (even though it sometimes has more than 3 parts). Truthfully, I didn’t just stumble upon this code. Once I got to the linking step as we’ll talk about below, I received the warning:

Linking two modules of different target triples: x is 'wasm32' whereas y is 'asmjs-unknown-emscripten'

This is because I had passed in -DCMAKE_C_COMPILER_TARGET=wasm32, but now I know the exact triple Emscripten uses. Interestingly, we can also set the environment variable EMCC_WASM_BACKEND=1 to use the pipeline where clang directly outputs wasm; however, at the time of writing that pipeline is still experimental. Note that without that set, from my understanding Emscripten will generate an LLVM bitcode (.bc) file first, then convert it to asmjs, then run asm2wasm to convert it to wasm.
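
To make the triple discussion a little more concrete, here is a minimal C++ sketch of the dash-splitting that the string(REPLACE "-" ";" ...) call performs. The name split_triple is hypothetical and this is not how compiler-rt itself is implemented; it only illustrates why an empty triple leaves CMake with nothing to work with.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of compiler-rt): split a target triple such
// as "asmjs-unknown-emscripten" on '-', mirroring what the CMake code does
// when it replaces '-' with ';' to build a list.
// An empty triple yields an empty list.
std::vector<std::string> split_triple(const std::string& triple) {
    std::vector<std::string> parts;
    if (triple.empty()) {
        return parts;
    }
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type dash = triple.find('-', start);
        if (dash == std::string::npos) {
            // Last (or only) component.
            parts.push_back(triple.substr(start));
            break;
        }
        parts.push_back(triple.substr(start, dash - start));
        start = dash + 1;
    }
    return parts;
}
```

Calling split_triple("x86_64-unknown-linux-gnu") gives four parts, which is the "more than 3 parts" case mentioned above, while split_triple("wasm32") gives just one.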

Now all the errors are gone and CMake fully generates. We can now build:

make

Unfortunately if you look at the output it basically never built any C or C++ files…

[  6%] Copying compiler-rt's sanitizer/allocator_interface.h...
[ 12%] Copying compiler-rt's sanitizer/asan_interface.h...
[ 18%] Copying compiler-rt's sanitizer/common_interface_defs.h...
[ 25%] Copying compiler-rt's sanitizer/coverage_interface.h...
[ 31%] Copying compiler-rt's sanitizer/dfsan_interface.h...
[ 37%] Copying compiler-rt's sanitizer/hwasan_interface.h...
[ 43%] Copying compiler-rt's sanitizer/linux_syscall_hooks.h...
[ 50%] Copying compiler-rt's sanitizer/lsan_interface.h...
[ 56%] Copying compiler-rt's sanitizer/msan_interface.h...
[ 62%] Copying compiler-rt's sanitizer/netbsd_syscall_hooks.h...
[ 68%] Copying compiler-rt's sanitizer/scudo_interface.h...
[ 75%] Copying compiler-rt's sanitizer/tsan_interface.h...
[ 81%] Copying compiler-rt's sanitizer/tsan_interface_atomic.h...
[ 87%] Copying compiler-rt's xray/xray_interface.h...
[ 93%] Copying compiler-rt's xray/xray_log_interface.h...
[100%] Copying compiler-rt's xray/xray_records.h...
[100%] Built target compiler-rt-headers
Scanning dependencies of target builtins
[100%] Built target builtins
Scanning dependencies of target compiler-rt
[100%] Built target compiler-rt

After some investigation, I realized that the lib/CMakeLists.txt has checks for whether it should build profile:

if(COMPILER_RT_BUILD_PROFILE AND COMPILER_RT_HAS_PROFILE)
    compiler_rt_build_runtime(profile)
endif()

It would be convenient if COMPILER_RT_BUILD_PROFILE was just off… but sadly that’s not the case. The default is ON. After tracking down where COMPILER_RT_HAS_PROFILE gets set, we discover it checks both PROFILE_SUPPORTED_ARCH as well as the name of the current operating system!

Under Emscripten, the operating system name is… Emscripten! Surprise! Easy fix, we just add |Emscripten to the end. Tracking down PROFILE_SUPPORTED_ARCH we find that it gets initialized from ALL_PROFILE_SUPPORTED_ARCH (or through a more complicated path if APPLE is defined). ALL_PROFILE_SUPPORTED_ARCH is set to a list of supported architectures, easy fix! To support all Emscripten modes, we’re going to add ASMJS (needs to be manually defined above), WASM32, and WASM64:

set(ALL_PROFILE_SUPPORTED_ARCH ${X86} ${X86_64} ${ARM32} ${ARM64} ${PPC64}
    ${MIPS32} ${MIPS64} ${S390X} ${WASM32} ${WASM64} ${ASMJS})

After re-running cmake and make, the profile library finally shows up in the build output:

Scanning dependencies of target clang_rt.profile-asmjs
[  3%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/GCDAProfiling.c.o
[  6%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfiling.c.o
[  9%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfilingValue.c.o
[ 12%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfilingBuffer.c.o
[ 15%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfilingFile.c.o
[ 18%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfilingMerge.c.o
[ 21%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfilingMergeFile.c.o
[ 24%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfilingNameVar.c.o
[ 27%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfilingWriter.c.o
[ 30%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfilingPlatformDarwin.c.o
[ 33%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfilingPlatformFuchsia.c.o
[ 36%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfilingPlatformLinux.c.o
[ 39%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfilingPlatformOther.c.o
[ 42%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfilingPlatformWindows.c.o
[ 45%] Building CXX object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfilingRuntime.cc.o
[ 48%] Building C object lib/profile/CMakeFiles/clang_rt.profile-asmjs.dir/InstrProfilingUtil.c.o
[ 51%] Linking CXX static library ../emscripten/libclang_rt.profile-asmjs.a
[ 51%] Built target clang_rt.profile-asmjs
Scanning dependencies of target profile
[ 51%] Built target profile

HUZZAH! It built!

Know that I almost always recommend building with a project’s CMakeLists.txt if one is provided, as this is the blessed path. However, in this one instance it turns out you can just pass all the .c and .cc files inside of lib/profile/ into emcc and build. This does generate some errors; however, after careful examination you can define COMPILER_RT_HAS_UNAME, COMPILER_RT_HAS_ATOMICS, as well as COMPILER_RT_HAS_FCNTL_LCK. Normally these defines are generated via the CMakeLists.txt, but since we know we’re compiling on Emscripten we can manually compile the small test samples CMake would have used to decide them (or just try it and see if it works).

Now to revisit our original linker errors, we can link against libclang_rt.profile-asmjs.a:

emcc -fprofile-instr-generate -fcoverage-mapping main.cpp ./compiler-rt/build/lib/emscripten/libclang_rt.profile-asmjs.a -o main.html

Success! It built and linked, and __llvm_profile_register_function is no longer missing. Now when we open our page up again we still see hello world! but that’s it. If we look in the console we’ll see an error:

atexit() called, but EXIT_RUNTIME is not set, so atexits() will not be called. set EXIT_RUNTIME to 1

As per clang’s documentation, the code coverage file is written when the application exits. We can confirm this within the compiler-rt library. We could set -s EXIT_RUNTIME=1 for emcc; however, Emscripten uses a virtual file system because the browser does not have access to your real file system. This means that even if Emscripten were to write the file out, it would get written to memory and lost in the void when the page closed. Even if we were to persist the file using Emscripten’s FS.syncfs, at best we could open it on the next run, but we’d still need to download it to the local machine.

Instead we’re going to manually call the method that dumps the file and read the file ourselves. Starting at the atexit handler and following function calls from there, we come across __llvm_profile_write_file:

#include <iostream>
#include <string>
#include <fstream>
#include <streambuf>

extern "C" int __llvm_profile_write_file(void);

int square(int a, int b) {
    return a * b;
}

int main(void) {
    std::cout << "hello world!" << std::endl;

    __llvm_profile_write_file();

    std::ifstream stream("default.profraw");
    std::string str((std::istreambuf_iterator<char>(stream)),
                    std::istreambuf_iterator<char>());

    std::cout << str;
    return 0;
}

Now when you run the page, you should see hello world! followed by a spew of text. The first few characters should at least contain the letters rpl along with some other binary babble, which is indicative of the header for the profraw file format.
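
If eyeballing the babble feels too hand-wavy, the header can be checked in code. The following is a minimal sketch, assuming the raw profile starts with an 8-byte little-endian magic number built from the bytes 0xFF 'l' 'p' 'r' 'o' 'f' 'r' 0x81 (with 'R' in place of the last 'r' for 32-bit targets). That is my understanding of LLVM's InstrProfData.inc, so treat the constants as assumptions and check them against your LLVM version. The names ProfrawKind, profraw_magic and classify_profraw are hypothetical.

```cpp
#include <cstdint>
#include <string>

enum class ProfrawKind { NotProfraw, Raw64, Raw32 };

// Builds the assumed magic value: 0xFF 'l' 'p' 'r' 'o' 'f' <variant> 0x81,
// listed from the most significant byte to the least significant byte.
constexpr std::uint64_t profraw_magic(char variant) {
    return (static_cast<std::uint64_t>(0xFF) << 56) |
           (static_cast<std::uint64_t>('l') << 48) |
           (static_cast<std::uint64_t>('p') << 40) |
           (static_cast<std::uint64_t>('r') << 32) |
           (static_cast<std::uint64_t>('o') << 24) |
           (static_cast<std::uint64_t>('f') << 16) |
           (static_cast<std::uint64_t>(static_cast<unsigned char>(variant)) << 8) |
           static_cast<std::uint64_t>(0x81);
}

// Reads the first 8 bytes as a little-endian integer and compares them with
// the two assumed magic values. Anything shorter than 8 bytes is rejected.
ProfrawKind classify_profraw(const std::string& bytes) {
    if (bytes.size() < 8) {
        return ProfrawKind::NotProfraw;
    }
    std::uint64_t magic = 0;
    for (int i = 0; i < 8; ++i) {
        const auto byte = static_cast<unsigned char>(bytes[i]);
        magic |= static_cast<std::uint64_t>(byte) << (8 * i);
    }
    if (magic == profraw_magic('r')) {
        return ProfrawKind::Raw64;
    }
    if (magic == profraw_magic('R')) {
        return ProfrawKind::Raw32;
    }
    return ProfrawKind::NotProfraw;
}
```

In the sample above this could be called as classify_profraw(str) right after reading default.profraw. Read little-endian, the magic lands in the file as 0x81 r f o r p l 0xFF, which is where the visible rpl comes from.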

Using this JavaScript trick, we can download the file. Be sure to grab download.js. We can modify our code to call the download function:

#include <iostream>
#include <string>
#include <fstream>
#include <streambuf>
#include <emscripten.h>

extern "C" int __llvm_profile_write_file(void);

int square(int a, int b) {
    return a * b;
}

int main(void) {
    std::cout << "hello world!" << std::endl;

    __llvm_profile_write_file();

    const char* filename = "default.profraw";
    std::ifstream stream(filename);
    std::string str((std::istreambuf_iterator<char>(stream)),
                    std::istreambuf_iterator<char>());

    EM_ASM_({ window.download($0, $1, $2) }, filename, str.c_str(), str.size());
    std::cout << str;
    return 0;
}

emcc --pre-js download.js -fprofile-instr-generate -fcoverage-mapping main.cpp ./compiler-rt/build/lib/emscripten/libclang_rt.profile-asmjs.a -o main.html

After running it, I get a popup and it indeed downloaded the default.profraw file! Now we can run the rest of the clang profiling instructions.

llvm-profdata-7 merge -sparse default.profraw -o default.profdata
llvm-cov-7 show ./main.wasm -instr-profile=default.profdata

Now we’ve run into a snag again. Normally we pass in the compiled executable, however in this case we only have the wasm, js, and html file as our output. The wasm is the closest thing to a library or executable (it defines symbols, sections, etc), but instead we get this when running llvm-cov-7:

error: ./main.wasm: Failed to load coverage: No coverage data found

If you try and pass either the main.js or main.html you’ll see a different message:

error: ./main.js: Failed to load coverage: The file was not recognized as a valid object file

This tells us quite a bit actually: llvm-cov-7 is capable of loading the wasm file, but for some reason the coverage sections are missing. After some sleuthing we find which sections they are attempting to load, and the identifiers for the sections. This ends up being something like __llvm_covmap. We can use llvm-objdump-7 to see the sections:

llvm-objdump-7 -section-headers main.wasm

main.wasm:	file format WASM

Sections:
Idx Name          Size      Address          Type
  0 TYPE          000001c5 0000000000000000 
  1 IMPORT        0000056a 0000000000000000 
  2 FUNCTION      0000042e 0000000000000000 
  3 GLOBAL        00000063 0000000000000000 
  4 EXPORT        000002db 0000000000000000 
  5 ELEM          000044c8 0000000000000000 
  6 CODE          00058f6c 0000000000000000 TEXT 
  7 DATA          00007d28 0000000000000000 DATA 

Welp. None of those are coverage sections… :(
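
For the curious, a similar check can be done by hand without llvm-objdump. The following is a minimal sketch, assuming the standard WebAssembly binary layout: an 8-byte header, followed by sections of the form id, LEB128-encoded size, payload, where custom sections have id 0 and begin with a length-prefixed name. The function names are hypothetical, and whether any toolchain would store coverage data in a custom section named __llvm_covmap is an assumption, not something confirmed here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reads an unsigned LEB128 value (at most 5 bytes for a u32) starting at pos
// and advances pos past it. Returns false on truncated or overlong input.
bool read_uleb128(const std::string& data, std::size_t& pos, std::uint32_t& out) {
    out = 0;
    for (int shift = 0; shift < 35; shift += 7) {
        if (pos >= data.size()) {
            return false;
        }
        const unsigned char byte = static_cast<unsigned char>(data[pos++]);
        out |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            return true;
        }
    }
    return false;
}

// Returns the names of all custom sections (id 0) in a wasm binary.
// Returns an empty list if the header does not match, and stops early
// if a section is malformed.
std::vector<std::string> custom_section_names(const std::string& wasm) {
    std::vector<std::string> names;
    const std::string header("\0asm\x01\0\0\0", 8);  // magic + version 1
    if (wasm.compare(0, header.size(), header) != 0) {
        return names;
    }
    std::size_t pos = header.size();
    while (pos < wasm.size()) {
        const unsigned char id = static_cast<unsigned char>(wasm[pos++]);
        std::uint32_t size = 0;
        if (!read_uleb128(wasm, pos, size) || size > wasm.size() - pos) {
            break;
        }
        const std::size_t end = pos + size;
        if (id == 0) {
            std::size_t name_pos = pos;
            std::uint32_t name_len = 0;
            if (read_uleb128(wasm, name_pos, name_len) && name_pos <= end &&
                name_len <= end - name_pos) {
                names.push_back(wasm.substr(name_pos, name_len));
            }
        }
        pos = end;  // skip the payload, whatever the section type
    }
    return names;
}
```

Reading main.wasm into a std::string (opened in binary mode) and passing it to custom_section_names would list any named custom sections. For the section table above I would expect the list to be empty, since none of the sections shown are custom ones.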

Somehow the coverage sections are getting lost when the final wasm file is output. I initially thought this was a case of lost in translation inside Emscripten (it compiles to LLVM bitcode, converts to asmjs, then converts to wasm). However, I ran a test using clang 8 and output directly to wasm (no Emscripten) and sure enough the coverage sections were missing there too. At this point, I could attempt to debug their pipeline to figure out where it’s getting dropped.

I have a dumb idea.

What if we compiled our program into a native static library and ran llvm-cov-7 against that, using the default.profraw file generated from the web? The generated static library would be missing a ton of symbols since it would try to link to Emscripten functions that wouldn’t exist, but we don’t care! We’re really just using the static library as a vessel for symbols and the code coverage section. This may be entirely flawed, but I’m willing.

We can achieve this crazy idea by first asking emcc to output an LLVM bitcode .bc file:

emcc -fprofile-instr-generate -fcoverage-mapping main.cpp ./compiler-rt/build/lib/emscripten/libclang_rt.profile-asmjs.a -o main.bc

The bitcode file contains raw LLVM IR. We can use regular clang to load that bitcode file and turn it into a native static library:

clang-7 main.bc -c

We pass -c to indicate that we’re not trying to link into an executable; if we don’t we get a mess of linker errors. This does however issue a warning:

overriding the module target triple with x86_64-unknown-linux-gnu

This makes sense since part of the LLVM bitcode file includes the original triple we passed in, and since we didn’t specify a triple in the clang command line it assumes that we’re compiling against the platform that we’re currently on. You can run llvm-dis-7 to disassemble the main.bc file into a human readable main.ll text file, and at the top you can see the triple.

We can now inspect our newly generated main.o:

llvm-objdump-7 -section-headers main.o

And now we find our diamond in the rough:

321 __llvm_covmap 00000070 0000000000000000

We can now compile our main.bc file into the wasm file:

emcc --pre-js download.js main.bc -o main.html

And once we have our default.profraw and then default.profdata, we can run:

llvm-cov-7 show ./main.o -instr-profile=default.profdata

    1|       |#include <iostream>
    2|       |#include <string>
    3|       |#include <fstream>
    4|       |#include <streambuf>
    5|       |#include <emscripten.h>
    6|       |
    7|       |extern "C" int __llvm_profile_write_file(void);
    8|       |
    9|      0|int square(int a, int b) {
   10|      0|    return a * b;
   11|      0|}
   12|       |
   13|      1|int main(void) {
   14|      1|    std::cout << "hello world!" << std::endl;
   15|      1|
   16|      1|    __llvm_profile_write_file();
   17|      1|
   18|      1|    std::ifstream stream("default.profraw");
   19|      1|    std::string str((std::istreambuf_iterator<char>(stream)),
   20|      1|                    std::istreambuf_iterator<char>());
   21|      1|
   22|      1|    EM_ASM_({ window.download('default.profraw', $0, $1) }, str.c_str(), str.size());
   23|      1|    std::cout << str;
   24|      1|    return 0;
   25|      1|}

It worked! We can also see that the square function was never called (the 0 next to it is the number of times it was called). In this case it was pretty obvious that we didn’t call square and static analysis could have found that easily. However, as your project grows in complexity and you begin to include third party code, that is when this starts to become valuable. You can use this technique to find entire files that have no code executed within them, but aren’t removed by the linker because they are still reachable through some run-time code path.

Now if you’ll excuse me, I need to wash my hands.