Qemu.js with JIT support: you can turn the mince back into meat after all

A few years ago Fabrice Bellard wrote jslinux, a PC emulator written in JavaScript. After that there was at least Virtual x86. But all of them, as far as I know, were interpreters, whereas Qemu, written much earlier by the same Fabrice Bellard — and probably any self-respecting modern emulator — uses JIT compilation of guest code into code for the host system. It seemed to me it was time to tackle the inverse of the task that browsers solve: JIT compilation of machine code into JavaScript, and porting Qemu looked like the most logical way to do it. Why Qemu, you might ask — there are simpler, more user-friendly emulators: VirtualBox, for example, installs and just works. But Qemu has a few interesting properties:

  • open source
  • the ability to work without a kernel driver
  • the ability to work in interpreter mode
  • support for a large number of both host and guest architectures

Regarding the third point: I can now clarify that in TCI mode it is not the guest machine instructions themselves that are interpreted, but the bytecode derived from them. This doesn't change the essence, though — to build and run Qemu on a new architecture, if you're lucky, a C compiler is enough; writing a code generator can be postponed.

And now, after two years of leisurely poking at the Qemu sources in my spare time, a working prototype has appeared in which you can already run, for example, Kolibri OS.

What is Emscripten

Nowadays there are many compilers whose end result is JavaScript. Some, like TypeScript, were conceived from the start as a better way to write for the web. Emscripten, by contrast, is a way to take existing C or C++ code and compile it into a form a browser can understand. Many ports of well-known programs have been collected on this page: there, for example, you can look at PyPy — which, as stated, already comes with a JIT. Not every program can simply be compiled and run in a browser — there are a number of peculiarities you have to put up with — but, as the note on that same page says, "Emscripten can be used to compile almost any portable C/C++ code to JavaScript". The catch is that there are operations which are undefined behavior by the standard but usually work on x86 — for example, unaligned access to variables, which is outright forbidden on some architectures. Qemu is a cross-platform program, and I wanted to believe it therefore doesn't contain much undefined behavior: just take it and compile it, then tinker with the JIT a little — and you're done!
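As an aside, here is a minimal illustration of that kind of "works on x86" undefined behavior and its portable fix (my own example, not code from Qemu):

#include <stdint.h>
#include <string.h>

uint32_t read_u32_cast(const uint8_t *p) {
    /* Undefined behavior if p is not 4-byte aligned: usually fine on x86,
       but may trap on stricter architectures (and SAFE_HEAP catches it). */
    return *(const uint32_t *)p;
}

uint32_t read_u32_memcpy(const uint8_t *p) {
    /* Portable alternative: memcpy is well-defined for any alignment. */
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}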

First attempt

Generally speaking, I'm not the first to come up with the idea of porting Qemu to JavaScript. On the ReactOS forum someone asked whether it was possible with Emscripten. Even earlier there were rumors that Fabrice Bellard had done it personally, but that was about jslinux, which, as far as I know, is an attempt to reach sufficient performance in JS by hand, written from scratch. Later, Virtual x86 appeared — its unobfuscated sources were published, and, as stated, its greater emulation "realism" made it possible to use SeaBIOS as firmware. In addition, there was at least one attempt to port Qemu with Emscripten — socketpair tried it, but the development, as far as I understand, was frozen.

So, it would seem: here are the sources, here is Emscripten — take them and compile. But there are also the libraries Qemu depends on, and the libraries those libraries depend on, and so on, and one of them is libffi, which glib depends on. There were rumors on the Internet that the big collection of library ports for Emscripten included it too, but that was hard to believe: first, it wasn't going to build with the new compiler, and second, it is too low-level a library to just take and compile to JS. And it's not only a matter of assembler inserts — probably, if you contorted yourself, for some calling conventions you could lay out the necessary arguments on the stack and call the function without them. But Emscripten is a tricky contraption: to make the generated code look familiar to the browser's JS engine optimizer, it uses a few tricks. In particular, the so-called relooping — the code generator takes the LLVM IR, with its rather abstract branch instructions, and tries to recreate plausible ifs, loops, and so on. And how are arguments passed to functions? Naturally, as arguments of JS functions, that is, whenever possible, not through the stack.
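To give a feel for what the relooper has to do, here is the kind of transformation involved, written as two equivalent C functions (my own illustration — the real thing operates on LLVM IR, not on C source):

/* What the IR roughly looks like: a flat set of blocks joined by branches. */
int sum_goto(int n) {
    int i = 0, acc = 0;
loop:
    if (i >= n) goto done;
    acc += i;
    i++;
    goto loop;
done:
    return acc;
}

/* What the relooper tries to recover: structured control flow that a JS
   engine's optimizer recognizes as an ordinary loop. */
int sum_structured(int n) {
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += i;
    return acc;
}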

At the beginning the idea was simply to write a replacement for libffi in JS and run the regular tests, but in the end I got confused about how to make my header files work with the existing code — what can you do; as they say, "either the tasks are that complicated, or we are that stupid". I had to port libffi to another architecture, so to speak — fortunately, Emscripten has both macros for inline assembly (in JavaScript, yes — well, what the architecture is, such is the assembler) and the ability to run code generated on the fly. After fiddling for a while with the platform-specific fragments of libffi, I got some code that compiled, and ran it on the first test that came to hand. To my surprise, the test passed. Amazed at my own genius — no joke, it worked on the first run — and still not believing my eyes, I went to look at the resulting code again, to judge where to dig next. Here I was stunned a second time: the only thing my ffi_call did was report a successful call. There was no actual call. So I sent my first pull request, which fixed a bug in the test that any olympiad programmer would recognize: real numbers must not be compared as a == b, and not even as a - b < EPS — you must not forget the absolute value (fabs(a - b) < EPS), otherwise 0 turns out to be very much equal to 1/3... In the end I got a port of libffi that passes the simplest tests and with which glib compiles — I decided that would do; if anything more is needed, I'll add it later. Looking ahead, I'll say that, as it turned out, the compiler did not even include the libffi function in the final code.
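For reference, the "inline assembly" and run-time code generation mentioned above look roughly like this (a minimal sketch of the Emscripten facilities, not the actual libffi port):

#include <emscripten.h>
#include <stdio.h>

/* EM_ASM_INT embeds a JavaScript snippet directly in C and returns an int. */
int add_via_inline_js(int a, int b) {
    return EM_ASM_INT({ return $0 + $1; }, a, b);
}

/* emscripten_run_script_int evaluates a JS string built at run time. */
int add_via_generated_js(void) {
    return emscripten_run_script_int("(function () { return 20 + 22; })()");
}

int main(void) {
    printf("%d %d\n", add_via_inline_js(2, 2), add_via_generated_js());
}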

But, as I already said, there are some limitations, and alongside the liberal use of all kinds of undefined behavior a nastier feature crept in: JavaScript by design does not support multithreading with shared memory. In principle, that can usually even be called a good idea, but not for porting code whose architecture is tied to C threads. Generally speaking, Firefox has been experimenting with shared-memory workers, and Emscripten has a pthread implementation on top of them, but I didn't want to depend on it. I had to slowly root multithreading out of the Qemu code: find where threads are started, move the body of the loop running in each thread into a separate function, and call those functions one by one from the main loop.
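In outline, the rewrite looks like this (a sketch with made-up function names, not the actual Qemu changes): the bodies of the former thread loops become ordinary functions driven by Emscripten's main loop.

#include <emscripten.h>

/* One chunk of what the vCPU thread used to do in its endless loop. */
static void cpu_step(void) { /* ... execute a few translation blocks ... */ }

/* One chunk of what the I/O thread used to do. */
static void io_step(void)  { /* ... poll timers, update the display ... */ }

static void one_iteration(void) {
    cpu_step();
    io_step();
}

int main(void) {
    /* 0 fps = let the browser schedule iterations; the final 1 means
       "simulate an infinite loop", i.e. don't return from main(). */
    emscripten_set_main_loop(one_iteration, 0, 1);
}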

Second attempt

At some point it became clear that things were going nowhere and that unsystematically stuffing crutches into the code would lead to nothing good. Conclusion: the process of adding crutches had to be systematized somehow. So I took version 2.4.1, which was fresh at the time (not 2.5.0 — who knows, the new version would have bugs nobody had caught yet, and my own bugs were quite enough for me), and the first thing I did was safely rewrite thread-posix.c. "Safely" in the following sense: if someone tried to perform an operation that would lead to blocking, abort() was called immediately. Of course, this didn't solve all the problems at once, but at least it is more pleasant than silently getting inconsistent data.
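The idea, sketched very roughly (stand-in types, not the actual thread-posix.c rewrite): in a single-threaded build, any lock that would block indicates a logic error, so crash loudly instead.

#include <stdlib.h>

typedef struct { int locked; } FakeMutex;

static void fake_mutex_lock(FakeMutex *m) {
    if (m->locked) {
        /* In a single-threaded world this could only deadlock —
           better to abort than to silently corrupt state. */
        abort();
    }
    m->locked = 1;
}

static void fake_mutex_unlock(FakeMutex *m) { m->locked = 0; }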

In general, the Emscripten options -s ASSERTIONS=1 -s SAFE_HEAP=1 help a lot when porting code to JS: they catch some kinds of undefined behavior, such as accesses to an unaligned address (which doesn't agree at all with the typed-array code like HEAP32[addr >> 2] = 1) or calling a function with the wrong number of arguments.

By the way, alignment errors are a story of their own. As I already said, Qemu has a "degenerate" interpreting code-generation backend, TCI (tiny code interpreter), and to build and run Qemu on a new architecture, if you're lucky, a C compiler is enough. Key words: "if you're lucky". I was unlucky: it turned out that TCI uses unaligned access when parsing its bytecode. That is, on all sorts of ARMs and other architectures with strictly aligned access Qemu compiles, because they have a normal TCG backend that generates native code, but whether TCI would actually work on them is another question. However, as it turned out, the TCI documentation even hinted at something of the sort. As a result, calls to the unaligned-read helper functions already present in another part of Qemu were added to the code.

Heap corruption

So, the unaligned access in TCI was fixed, and a main loop was made that called the processor, RCU and a few other things in turn. And so I run Qemu with the option -d exec,in_asm,out_asm, which means: report which blocks of code are executed and, at translation time, also print what the guest code was and what host code (in this case, bytecode) it became. It starts, executes several translation blocks, prints the debug message I had left saying that RCU is about to start, and... crashes with abort() inside free(). By poking around in free() I managed to find out that the heap block header, which lives in the eight bytes preceding the allocated memory, contained garbage instead of the block size or anything of the sort.

Heap corruption — how nice... For a case like this there is a useful recipe: build a native binary from (if possible) the same sources and run it under Valgrind. After a while the binary was ready. I run it with the same options — and it crashes during initialization, before even getting to actual execution. Unpleasant, of course — apparently the sources were not exactly the same, which is no surprise, since configure had detected a slightly different set of options — but I have Valgrind: first I'll fix this bug, and then, with luck, the original one will show up. I run the same thing under Valgrind... and it starts, initializes normally, and walks right past the place of the original bug without a single warning about invalid memory access, let alone a crash. As they say, life didn't prepare me for this: a crashing program stops crashing when launched under Valgrind. What it was remains a mystery. My hypothesis: since gdb, after the crash during initialization, showed a memset working with a valid pointer somewhere near the current instruction, using either mmx or xmm registers, it may have been some kind of alignment error again, although that is still hard to believe.

Okay, Valgrind is clearly not going to help here. And then the nastiest part began: everything seems to start, but it crashes for completely unknown reasons because of an event that may have happened millions of instructions earlier. For a long time it wasn't even clear how to approach it. In the end I had to sit down and debug. Printing what the header had been overwritten with showed that it didn't look like a number, but rather like some binary data. And lo and behold, that binary string was found in the BIOS file — so now it could be said with reasonable confidence that it was a buffer overflow, and it was even clear what was being written into that buffer. Well, then it went something like this: in Emscripten, fortunately, there is no address space randomization, and there are no holes in it either, so you can write somewhere into the middle of the code a dump of data at a pointer taken from the previous run, look at the data, look at the pointer, and, if nothing has changed, get food for thought. True, linking takes a couple of minutes after any change, but what can you do. As a result, a specific line was found that copies the BIOS from a temporary buffer into guest memory — and indeed, there was not enough room in the buffer. Tracking down the source of that strange buffer address led to the function qemu_anon_ram_alloc in the file oslib-posix.c. The logic there is as follows: sometimes it is useful to align the address to a huge page 2 MB in size; to do that, we first ask mmap for a bit more and then return the excess with munmap. If such alignment is not needed, we pass the result of getpagesize() instead of 2 MB — mmap will return a page-aligned address anyway... But in Emscripten mmap simply calls malloc, which of course does not align to a page. In the end, a bug that had been upsetting me for a couple of months was fixed by a two-line change.
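The over-allocate-and-trim logic in question looks roughly like this (a simplified sketch, not the actual qemu_anon_ram_alloc); on Emscripten, where mmap is backed by malloc and guarantees no alignment, the "alignment equals the page size, so mmap is aligned anyway" shortcut has to go and the trimming must happen unconditionally:

#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>

/* align must be a power of two. */
static void *anon_ram_alloc_aligned(size_t size, size_t align) {
    /* Ask for extra space so an aligned address is guaranteed to fit. */
    size_t total = size + align;
    uint8_t *ptr = mmap(NULL, total, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED)
        return NULL;

    uint8_t *aligned = (uint8_t *)(((uintptr_t)ptr + align - 1)
                                   & ~(uintptr_t)(align - 1));
    size_t head = (size_t)(aligned - ptr);
    size_t tail = total - head - size;

    if (head) munmap(ptr, head);              /* give back the leading excess */
    if (tail) munmap(aligned + size, tail);   /* ...and the trailing excess   */
    return aligned;
}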

Peculiarities of function calls

And now the processor computes something, Qemu doesn't crash, but the screen doesn't turn on and the processor quickly gets stuck in a loop, judging by the -d exec,in_asm,out_asm output. A hypothesis appeared: timer interrupts don't arrive (or all interrupts, for that matter). And indeed, if you disable interrupts in the native build, which worked for some reason, you get a similar picture. But that wasn't the answer at all: comparing the traces produced with the option above showed that the execution paths diverge very early. Here it should be said that comparing what was recorded via the emrun launcher's debug output with the native build's output is not a purely mechanical process. I don't know exactly how a program running in the browser connects to emrun, but some lines in the output end up reordered, so a difference in the diff is not yet a reason to conclude that the trajectories have diverged. In the end it became clear that at the ljmpl instruction a jump to different addresses happens, and the generated bytecode is fundamentally different: one version contains an instruction to call a helper function, the other does not. After googling the instruction and studying the code that translates such instructions, it became clear that, first, just before it a write was made to the cr0 register — also via a helper — switching the processor into protected mode, and second, that the JS version never switched into protected mode. The thing is, another peculiarity of Emscripten is its refusal to tolerate code like TCI's implementation of the call instruction, which casts any function pointer to the type long long f(int arg0, ... int arg9): functions must be called with the correct number of arguments. If this rule is violated, then depending on the debug settings the program either crashes (which is good) or calls an entirely wrong function (which is going to be sad to debug). There is also a third option: enable the generation of wrappers that add or drop arguments, but together these wrappers take up a lot of space, even though I actually need only a little over a hundred of them. That alone is very sad, but a more serious problem surfaced: in the generated wrapper code the arguments were duly converted back and forth, but the function with the prepared arguments was sometimes never called — just like in my libffi implementation. That is, some helpers were simply not executed.
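The problematic pattern is something like the following (an illustration of the idea, not the actual TCI source): calling through a pointer whose type disagrees with the callee's real signature is undefined behavior in C, and asm.js/WebAssembly enforce the signature at the call site.

#include <stdint.h>
#include <stdio.h>

/* "Universal" helper type the interpreter casts everything to. */
typedef uint64_t (*helper_any_fn)(int, int, int, int, int,
                                  int, int, int, int, int);

static void helper_lock(void) { puts("lock"); }

int main(void) {
    /* Works on many native ABIs (extra arguments are simply ignored),
       but under Emscripten this either traps or calls the wrong function. */
    helper_any_fn f = (helper_any_fn)helper_lock;
    f(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);

    /* What Emscripten wants: a call through a pointer of the real type. */
    void (*g)(void) = helper_lock;
    g();
}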

Luckily, Qemu has machine-readable lists of helpers in the form of a header file like

DEF_HELPER_0(lock, void)
DEF_HELPER_0(unlock, void)
DEF_HELPER_3(write_eflags, void, env, tl, i32)

They are used in quite an amusing way: first the DEF_HELPER_n macros are redefined in the most bizarre fashion, and then helper.h is included — up to the point where a macro expands into a structure initializer plus a comma, an array is defined, and #include <helper.h> stands in place of its elements. As a result, I finally had a reason to try the pyparsing library, and a script was written that generates wrappers for exactly those functions that need them.
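For readers unfamiliar with the trick, here is the general shape of that X-macro pattern (a simplified, self-contained sketch, not Qemu's actual headers): the same list is expanded twice with different definitions of DEF_HELPER_0.

#include <stdio.h>

/* Pretend this macro is the contents of helper.h: */
#define HELPER_LIST \
    DEF_HELPER_0(lock, void) \
    DEF_HELPER_0(unlock, void)

/* Pass 1: expand the list into function declarations. */
#define DEF_HELPER_0(name, ret) ret helper_##name(void);
HELPER_LIST
#undef DEF_HELPER_0

/* Pass 2: expand the same list into table entries. */
typedef struct { const char *name; void (*fn)(void); } HelperInfo;
#define DEF_HELPER_0(name, ret) { #name, helper_##name },
static const HelperInfo helpers[] = { HELPER_LIST };
#undef DEF_HELPER_0

void helper_lock(void)   { puts("lock"); }
void helper_unlock(void) { puts("unlock"); }

int main(void) {
    for (unsigned i = 0; i < sizeof helpers / sizeof helpers[0]; i++)
        helpers[i].fn();
}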

And after that, the processor seemed to have started working. Seemed, because the screen never got initialized, although memtest86+ could be run in the native build. Here it should be clarified that Qemu's block I/O code is written with coroutines. Emscripten has its own very tricky coroutine implementation, but it still had to be wired into the Qemu code; meanwhile, the processor can already be debugged: Qemu supports the options -kernel, -initrd, -append, with which you can boot Linux or, say, memtest86+ without using block devices at all. But here's the problem: in the native build you could watch the Linux kernel's console output with the -nographic option, whereas from the browser no output reached the terminal emrun had been launched from. So it was unclear: is it the processor that doesn't work, or the graphics output? And then it occurred to me to wait a little. It turned out that "the processor isn't asleep, it's just blinking slowly": after about five minutes the kernel dumped a pile of messages onto the console and went on hanging. It became clear that the processor, on the whole, works, and it was time to dig into the code that talks to SDL2. Unfortunately, I don't know how to use that library, so in places I had to act at random. At some point the line parallel0 flashed on a blue background, which suggested a thought. It turned out that Qemu opens several virtual windows within one physical window, and you can switch between them with Ctrl-Alt-n: this works in the native build but not under Emscripten. After getting rid of the unnecessary windows with the options -monitor none -parallel none -serial none and telling it to forcibly redraw the whole screen on every frame, everything suddenly worked.

Coroutines

So, emulation works in the browser, but you can't run anything interesting from a disk, because there is no block I/O — coroutine support is needed. Qemu already has several coroutine backends, but because of the nature of JavaScript and the Emscripten code generator you can't just start juggling stacks. It would seem that "all is lost", but the Emscripten developers have already taken care of everything. It is implemented rather amusingly: let's call "suspicious" a call to a function like emscripten_sleep and several others that use the Asyncify mechanism, as well as calls through pointers and calls to any function where one of the previous two cases may occur further down the stack. Before each suspicious call we allocate an async context; immediately after the call we check whether an asynchronous call has occurred, and if it has, we save all local variables into that async context, record which function control should be handed to when execution needs to continue, and leave the current function. This is where there is room to study the bloat effect: to continue execution after returning from an asynchronous call, the compiler generates "stubs" of the function starting after each suspicious call, so with n suspicious calls the function blows up roughly n/2 times — and that's before you take into account that after each potentially asynchronous call the original function has to save part of its local variables. Later I even had to write a simple Python script that, given a set of especially bloated functions which supposedly "don't let asynchrony pass through them" (that is, stack unwinding and everything I just described doesn't work in them), tells the compiler which calls through pointers in which functions to ignore, so that those functions are not treated as asynchronous. Even so, JS files under 60 MB is overkill — let it be at least 30. Although, once I was setting up the build script and accidentally dropped the linker options, among which was -O3. I run the generated code, and Chromium eats up memory and crashes. I then happened to look at what it was trying to load... Well, what can I say — I would also freeze if I were asked to thoughtfully study and optimize 500+ MB of JavaScript.
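To make "suspicious call" concrete, here is the simplest possible example (my own, not from the Qemu port): from C's point of view emscripten_sleep() just blocks, but underneath it the entire call stack is unwound, control returns to the browser, and the stack is rebuilt later — which is exactly why every function on the path to it gets instrumented.

#include <emscripten.h>
#include <stdio.h>

static void boot_guest(void) {
    for (int step = 0; step < 3; step++) {
        printf("boot step %d\n", step);
        emscripten_sleep(100);   /* suspicious call: may unwind the stack */
    }
}

int main(void) {
    boot_guest();   /* also suspicious, since a suspicious call is reachable */
    puts("done");
}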

Unfortunately, the checks in the Asyncify support library didn't get along with the longjmps used in the virtual processor code, but after a small patch that disables those checks and forcibly restores the contexts as if everything were fine, the code worked. And then something strange began: sometimes the checks in the synchronization code would fire — the very ones that crash the code if, by the logic of execution, it would have to block — someone was trying to acquire an already acquired mutex. Fortunately, this turned out not to be a logic problem in the serialized code; I was simply using the standard main-loop functionality provided by Emscripten, but sometimes an asynchronous call would unwind the stack completely, and at that moment the setTimeout set by the main loop would fire — so the code entered a new main-loop iteration without leaving the previous one. I rewrote it as an infinite loop with emscripten_sleep, and the mutex problems stopped. The code even became more logical: after all, I don't really have any code that prepares the next animation frame — the processor just computes something and the screen is periodically refreshed. However, the problems didn't end there: sometimes Qemu's execution would simply end silently, with no exceptions or errors. At that point I gave up on it, but, looking ahead, I'll say what the problem was: the coroutine code does not, in fact, use setTimeout (or at least not as often as you might think) — the function emscripten_yield merely sets the asynchronous-call flag. The whole point is that emscripten_coroutine_next is not an asynchronous function: internally it checks the flag, resets it, and transfers control to the right place. That is, the stack unwinding stops there. The problem was that, because of a use-after-free that appeared when the coroutine pool was disabled (I had not copied an important line of code from the existing coroutine backend), the function qemu_in_coroutine returned true when it should have returned false. This led to a call to emscripten_yield with no emscripten_coroutine_next above it on the stack: the stack unwound all the way to the top, but no setTimeout, as I said, had been set.
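For orientation, this is roughly how the (old, fastcomp-era) Emscripten coroutine API mentioned above fits together — a usage sketch only; exact signatures may differ between Emscripten versions.

#include <emscripten.h>
#include <stdio.h>

static void worker(void *arg) {
    for (int i = 0; i < 3; i++) {
        printf("coroutine step %d\n", i);
        emscripten_yield();   /* sets the async flag and unwinds the stack */
    }
}

int main(void) {
    emscripten_coroutine co = emscripten_coroutine_create(worker, NULL, 32768);
    /* emscripten_coroutine_next() itself is synchronous: it notices the flag
       set by emscripten_yield() and resumes the main context right here. */
    while (emscripten_coroutine_next(co)) {
        puts("back in the main context");
    }
}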

JavaScript code generation

And here, in fact, is the promised "turning the mince back into meat". Well, not really. Of course, if you run Qemu in the browser, and Node.js inside it, then after code generation in Qemu we will get completely different JavaScript. But still, it is some kind of reverse transformation.

First, a little about how Qemu works. I ask your forgiveness right away: I am not a professional Qemu developer, and my conclusions may be wrong in places. As they say, "a student's opinion does not have to coincide with the teacher's opinion, Peano's axiomatics, or common sense". Qemu has a number of supported guest architectures, and each has a directory like target-i386. When building, you can enable support for several guest architectures, but the result is simply several binaries. The code supporting a guest architecture in turn generates certain internal Qemu operations, which TCG (Tiny Code Generator) then turns into machine code for the host architecture. As the readme file in the tcg directory states, it was originally part of an ordinary C compiler, which was later adapted for JIT. Therefore, for example, the "target architecture" in the terms of that document is no longer the guest but the host architecture. At some point another component appeared — the Tiny Code Interpreter (TCI) — which executes code (almost the same internal operations) in the absence of a code generator for a particular host architecture. In fact, as its documentation says, this interpreter may not always perform as well as a JIT code generator, not only quantitatively in terms of speed but also qualitatively. Although I'm not sure its description is fully up to date.

At first I tried to make a full-fledged TCG backend, but quickly got lost in the sources and the not-entirely-clear description of the bytecode instructions, so I decided to wrap the TCI interpreter instead. This gave several advantages at once:

  • when implementing the code generator, I could look not at the description of the instructions but at the interpreter's code
  • functions can be generated not for every translation block encountered but, say, only after its hundredth execution (see the sketch below)
  • if the generated code changes (and this seems to be possible, judging by the functions whose names contain the word patch), I will have to invalidate the generated JS code, but at least I will have something to regenerate it from

As for the third point, I'm not sure patching is possible after the code has been executed for the first time, but the first two points are enough.
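The second point, sketched in code (all names here are mine, not Qemu's): keep interpreting a cold translation block with TCI, and only generate a function for it once it has run often enough.

#include <stdio.h>

#define HOT_THRESHOLD 100

typedef struct {
    unsigned exec_count;
    int      compiled;   /* stand-in for "a generated JS function exists" */
} TranslationBlock;

static void interpret_tb(TranslationBlock *tb)    { /* ... run the TCI bytecode ... */ }
static void run_compiled_tb(TranslationBlock *tb) { /* ... call the generated JS ... */ }
static void compile_tb(TranslationBlock *tb)      { tb->compiled = 1; }

static void execute_tb(TranslationBlock *tb) {
    if (tb->compiled) {
        run_compiled_tb(tb);
        return;
    }
    if (++tb->exec_count >= HOT_THRESHOLD)
        compile_tb(tb);          /* picked up on the next execution */
    interpret_tb(tb);
}

int main(void) {
    TranslationBlock tb = {0};
    for (int i = 0; i < 200; i++)
        execute_tb(&tb);
    printf("interpreted %u times before compiling\n", tb.exec_count);
}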

Initially the code was generated as one big switch on the address of the original bytecode instruction, but then, remembering the article about Emscripten, optimization of generated JS and relooping, I decided to generate more human-looking code, especially since it turned out empirically that the only entry point into a translation block is its start. No sooner said than done: after a while there was a code generator producing code with ifs (though without loops). But then, bad luck: it crashed, reporting that some instruction had the wrong length. The last instruction at that recursion level was brcond. Fine, I add an identical check to the generation of that instruction before and after the recursive call — and neither fired, yet after the switch the assert still failed. In the end, after studying the generated code, I realized that after the switch the pointer to the current instruction is reloaded from the stack and was probably being clobbered by the generated JavaScript code. And so it turned out. Increasing the buffer from one megabyte to ten changed nothing, and it became clear that the code generator was running in circles. I had to check whether we had gone beyond the bounds of the current TB, and if so, emit the address of the next TB with a minus sign, so that execution could continue from there. In addition, this solves the problem of "which generated functions should be invalidated if this piece of bytecode has changed?" — only the function corresponding to this translation block needs to be invalidated. By the way, although I debugged everything in Chromium (since I use Firefox and it's easier for me to keep a separate browser for experiments), it was Firefox that helped me fix incompatibilities with the asm.js standard, after which the code started running faster in Chromium.

Generated code example

Compiling 0x15b46d0:
CompiledTB[0x015b46d0] = function(stdlib, ffi, heap) {
"use asm";
var HEAP8 = new stdlib.Int8Array(heap);
var HEAP16 = new stdlib.Int16Array(heap);
var HEAP32 = new stdlib.Int32Array(heap);
var HEAPU8 = new stdlib.Uint8Array(heap);
var HEAPU16 = new stdlib.Uint16Array(heap);
var HEAPU32 = new stdlib.Uint32Array(heap);

var dynCall_iiiiiiiiiii = ffi.dynCall_iiiiiiiiiii;
var getTempRet0 = ffi.getTempRet0;
var badAlignment = ffi.badAlignment;
var _i64Add = ffi._i64Add;
var _i64Subtract = ffi._i64Subtract;
var Math_imul = ffi.Math_imul;
var _mul_unsigned_long_long = ffi._mul_unsigned_long_long;
var execute_if_compiled = ffi.execute_if_compiled;
var getThrew = ffi.getThrew;
var abort = ffi.abort;
var qemu_ld_ub = ffi.qemu_ld_ub;
var qemu_ld_leuw = ffi.qemu_ld_leuw;
var qemu_ld_leul = ffi.qemu_ld_leul;
var qemu_ld_beuw = ffi.qemu_ld_beuw;
var qemu_ld_beul = ffi.qemu_ld_beul;
var qemu_ld_beq = ffi.qemu_ld_beq;
var qemu_ld_leq = ffi.qemu_ld_leq;
var qemu_st_b = ffi.qemu_st_b;
var qemu_st_lew = ffi.qemu_st_lew;
var qemu_st_lel = ffi.qemu_st_lel;
var qemu_st_bew = ffi.qemu_st_bew;
var qemu_st_bel = ffi.qemu_st_bel;
var qemu_st_leq = ffi.qemu_st_leq;
var qemu_st_beq = ffi.qemu_st_beq;

function tb_fun(tb_ptr, env, sp_value, depth) {
  tb_ptr = tb_ptr|0;
  env = env|0;
  sp_value = sp_value|0;
  depth = depth|0;
  var u0 = 0, u1 = 0, u2 = 0, u3 = 0, result = 0;
  var r0 = 0, r1 = 0, r2 = 0, r3 = 0, r4 = 0, r5 = 0, r6 = 0, r7 = 0, r8 = 0, r9 = 0;
  var r10 = 0, r11 = 0, r12 = 0, r13 = 0, r14 = 0, r15 = 0, r16 = 0, r17 = 0, r18 = 0, r19 = 0;
  var r20 = 0, r21 = 0, r22 = 0, r23 = 0, r24 = 0, r25 = 0, r26 = 0, r27 = 0, r28 = 0, r29 = 0;
  var r30 = 0, r31 = 0, r41 = 0, r42 = 0, r43 = 0, r44 = 0;
    r14 = env|0;
    r15 = sp_value|0;
  START: do {
    r0 = HEAPU32[((r14 + (-4))|0) >> 2] | 0;
    r42 = 0;
    result = ((r0|0) != (r42|0))|0;
    HEAPU32[1445307] = r0;
    HEAPU32[1445321] = r14;
    if(result|0) {
    HEAPU32[1445322] = r15;
    return 0x0345bf93|0;
    }
    r0 = HEAPU32[((r14 + (16))|0) >> 2] | 0;
    r42 = 8;
    r0 = ((r0|0) - (r42|0))|0;
    HEAPU32[(r14 + (16)) >> 2] = r0;
    r1 = 8;
    HEAPU32[(r14 + (44)) >> 2] = r1;
    r1 = r0|0;
    HEAPU32[(r14 + (40)) >> 2] = r1;
    r42 = 4;
    r0 = ((r0|0) + (r42|0))|0;
    r2 = HEAPU32[((r14 + (24))|0) >> 2] | 0;
    HEAPU32[1445307] = r0;
    HEAPU32[1445308] = r1;
    HEAPU32[1445309] = r2;
    HEAPU32[1445321] = r14;
    HEAPU32[1445322] = r15;
    qemu_st_lel(env|0, r0|0, r2|0, 34, 22759218);
if(getThrew() | 0) abort();
    r0 = 3241038392;
    HEAPU32[1445307] = r0;
    r0 = qemu_ld_leul(env|0, r0|0, 34, 22759233)|0;
if(getThrew() | 0) abort();
    HEAPU32[(r14 + (24)) >> 2] = r0;
    r1 = HEAPU32[((r14 + (12))|0) >> 2] | 0;
    r2 = HEAPU32[((r14 + (40))|0) >> 2] | 0;
    HEAPU32[1445307] = r0;
    HEAPU32[1445308] = r1;
    HEAPU32[1445309] = r2;
    qemu_st_lel(env|0, r2|0, r1|0, 34, 22759265);
if(getThrew() | 0) abort();
    r0 = HEAPU32[((r14 + (24))|0) >> 2] | 0;
    HEAPU32[(r14 + (40)) >> 2] = r0;
    r1 = 24;
    HEAPU32[(r14 + (52)) >> 2] = r1;
    r42 = 0;
    result = ((r0|0) == (r42|0))|0;
    if(result|0) {
    HEAPU32[1445307] = r0;
    HEAPU32[1445308] = r1;
    }
    HEAPU32[1445307] = r0;
    HEAPU32[1445308] = r1;
    return execute_if_compiled(22759392|0, env|0, sp_value|0, depth|0) | 0;
    return execute_if_compiled(23164080|0, env|0, sp_value|0, depth|0) | 0;
    break;
  } while(1); abort(); return 0|0;
}
return {tb_fun: tb_fun};
}(window, CompilerFFI, Module.buffer)["tb_fun"]

Conclusion

So, the work is still not finished, but, to tell the truth, I have grown tired of quietly polishing this long-term construction project to perfection, so I decided to publish what I have for now. The code is scary in places, because it is an experiment, and it was not clear in advance what would need to be done. Probably it would be worth producing proper atomic commits later, on top of some more modern Qemu version. In the meantime, there is a branch in git in blog format: every "level" that was somehow passed has a detailed commentary in Russian attached to it. In fact, this article is to a large extent a retelling of the output of git log.

You can try it all here (careful — heavy traffic).

What is already working:

  • The x86 virtual processor works
  • There is a working prototype of a JIT code generator from machine code to JavaScript
  • There is groundwork for building other 32-bit guest architectures: right now you can admire Linux for MIPS hanging in the browser at the boot stage

What else can be done

  • Speed up the emulation. Even in JIT mode it seems to run slower than Virtual x86 (but potentially there is a whole Qemu here, with its multitude of emulated hardware and architectures)
  • Make a normal interface — frankly, I'm a so-so web developer, so for now I have just reworked the standard Emscripten shell as best I could
  • Try more complex Qemu features — networking, VM migration, etc.
  • UPD: I will need to send my modest contributions and bug reports to Emscripten upstream, as previous porters of Qemu and other projects did. Thanks to them for the opportunity to implicitly use their contribution to Emscripten as part of this task.

Source: habr.com
