How we accelerated a web application 20 times using WebAssembly

This article discusses a case of speeding up a browser application by replacing JavaScript calculations with WebAssembly.

WebAssembly - what is it?

In short, it is a binary instruction format for a stack-based virtual machine. Wasm (the abbreviated name) is often called a programming language, but it is not one. Wasm code is executed in the browser alongside JavaScript.
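
To make the "binary instruction format for a stack machine" idea concrete, here is a minimal sketch: a hand-assembled module of a few dozen bytes that exports a single function, add, and is called from JavaScript. The module and its function name are made up for illustration and have nothing to do with the service discussed in this article.

```javascript
// A hand-assembled WebAssembly module exporting add(a, b) -> a + b (32-bit integers).
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" is function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one function body, no extra locals
  0x20, 0x00, // local.get 0 (push the first argument onto the stack)
  0x20, 0x01, // local.get 1 (push the second argument onto the stack)
  0x6a,       // i32.add (pop two values, push their sum)
  0x0b,       // end of function body
]);

// Synchronous compilation is fine for a module this small.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule);
const sum = wasmInstance.exports.add(2, 3);
console.log(sum); // 5
```

In practice nobody writes these bytes by hand; a compiler such as Emscripten produces them from C, C++ or Rust sources.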

Importantly, WebAssembly can be obtained by compiling sources written in languages such as C/C++, Rust, and Go. It uses static typing and a so-called flat memory model. The code, as mentioned above, is stored in a compact binary format, which lets it run almost as fast as an application launched from the command line. These features have driven the rise in WebAssembly's popularity.

Wasm is currently used in many applications, from games like Doom 3 to web ports of applications like AutoCAD and Figma. Wasm is also used in serverless computing.

This article walks through an example of using Wasm to speed up an analytics web service. For clarity, we take a working tool written in C and compile it to WebAssembly. The result replaces the slow JavaScript sections.

Application transformation

The example uses fastq.bio, a browser service intended for geneticists. The tool lets you evaluate the quality of DNA sequencing data.

Here is an example application in action:

[Image: the fastq.bio application in action]

The details are omitted here because they are fairly involved for non-specialists. In short, scientists use the plots above to judge whether the DNA sequencing run went smoothly and, if not, what problems arose.

Desktop alternatives to this service exist, but fastq.bio speeds up the work by visualizing the data. Most of the other tools require command-line skills, which not all geneticists have.

The workflow is simple. The input is a text file generated by specialized sequencing instruments. It contains a list of DNA sequences and a quality score for each nucleotide. The file format is .fastq, hence the name of the service.
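
For readers who have never seen the format, the sketch below parses FASTQ text into records. It assumes the common layout of four lines per record (identifier, sequence, a "+" separator, quality string) and the widespread Phred+33 quality encoding; the function name parseFastq is hypothetical and is not taken from fastq.bio.

```javascript
// Parse FASTQ text into records.
// Assumption: each record spans exactly four lines:
//   @identifier, sequence, "+" separator, quality string.
// Assumption: quality characters use Phred+33 encoding (score = char code - 33).
function parseFastq(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  const records = [];
  // A trailing incomplete record (fewer than four lines) is silently ignored.
  for (let i = 0; i + 3 < lines.length; i += 4) {
    const [header, sequence, separator, quality] = lines.slice(i, i + 4);
    const recordIndex = i / 4;
    if (!header.startsWith("@") || !separator.startsWith("+")) {
      throw new Error(`Malformed FASTQ record #${recordIndex}`);
    }
    if (sequence.length !== quality.length) {
      throw new Error(`Sequence/quality length mismatch in record #${recordIndex}`);
    }
    records.push({
      id: header.slice(1),
      sequence,
      scores: Array.from(quality, (ch) => ch.charCodeAt(0) - 33),
    });
  }
  return records;
}

const sampleRecords = parseFastq("@read1\nACGT\n+\nII!I\n");
console.log(sampleRecords[0].sequence); // ACGT
```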

Implementation in JavaScript

The first step for the user when working with fastq.bio is to select a file. Using the File object, the application reads a random sample of data from the file and processes that batch. The task of JavaScript here is to perform simple string operations and compute metrics. One of them is the number of A, C, G and T nucleotides at each position along the DNA fragments.
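
A metric of this kind can be sketched in a few lines of JavaScript. The function below counts bases per position across a batch of sequences; it is an illustration written for this article, not the code fastq.bio actually runs, and the extra N bucket for unknown bases is an assumption.

```javascript
// Count how often each base occurs at each position across a batch of sequences.
// Anything other than A, C, G or T is counted under "N" (an assumption of this sketch).
function countBasesPerPosition(sequences) {
  const counts = [];
  for (const sequence of sequences) {
    for (let pos = 0; pos < sequence.length; pos++) {
      if (!counts[pos]) {
        counts[pos] = { A: 0, C: 0, G: 0, T: 0, N: 0 };
      }
      const base = sequence[pos].toUpperCase();
      const key = "ACGT".includes(base) ? base : "N";
      counts[pos][key] += 1;
    }
  }
  return counts;
}

const baseCounts = countBasesPerPosition(["ACGT", "AAGT", "ACG"]);
console.log(baseCounts[0].A); // 3
```

String work like this is cheap per read but adds up quickly over millions of reads, which is why it becomes the bottleneck.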

Once the metrics are calculated, they are visualized with Plotly.js, and the service moves on to a new data sample. Processing in chunks improves the user experience: working with all the data at once would freeze the page for a while, because sequencing output files can take up hundreds of gigabytes. Instead, the service takes chunks of 0.5 to 1 MB and processes them step by step, updating the plots as it goes.
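
The sampling step can be sketched as follows. The helper pickChunkRange is hypothetical; it only chooses a random byte range, and the actual read would then go through the standard Blob.slice() and Blob.text() methods of the File object. How fastq.bio picks its offsets is not described in the source, so treat this as one possible approach.

```javascript
// Choose a random byte range of at most chunkSize bytes inside a file of fileSize bytes.
// The random source is injectable so the function can be tested deterministically.
function pickChunkRange(fileSize, chunkSize, random = Math.random) {
  const size = Math.min(chunkSize, fileSize);
  const start = Math.floor(random() * (fileSize - size + 1));
  return { start, end: start + size };
}

// Possible use in the browser, where `file` is a File chosen by the user:
//   const { start, end } = pickChunkRange(file.size, 1024 * 1024);
//   const chunkText = await file.slice(start, end).text();
const exampleRange = pickChunkRange(10_000_000, 1_000_000, () => 0.5);
console.log(exampleRange.end - exampleRange.start); // 1000000
```

A chunk cut at an arbitrary byte offset will usually begin and end in the middle of a record, so real code also has to discard the partial records at both edges.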

Here's how it works:

[Image: diagram of the JavaScript implementation]

The red rectangle marks the string-processing code that computes the metrics for rendering. This is the most computationally intensive part of the service, so it is a good candidate for replacement with Wasm.

Testing WebAssembly

To evaluate whether Wasm could be used, the project team looked for ready-made tools that compute QC (quality control) metrics from FASTQ files. The search was limited to tools written in C, C++ or Rust, so that the code could be ported to WebAssembly. In addition, the tool had to be mature: something already tested by scientists was required.

As a result, seqtk was chosen. It is a fairly popular open-source tool written in C.

Before converting to Wasm, it is worth looking at how seqtk is compiled for the desktop. According to the Makefile, this is what is needed:

# Compile to binary
$ gcc seqtk.c \
   -o seqtk \
   -O2 \
   -lm \
   -lz

seqtk can be compiled to WebAssembly with Emscripten. If Emscripten is not installed, a prebuilt Docker image will do:

$ docker pull robertaboukhalil/emsdk:1.38.26
$ docker run -dt --name wasm-seqtk robertaboukhalil/emsdk:1.38.26

If you prefer, you can build Emscripten yourself, but this takes time.

Inside the container, emcc can be used as a drop-in replacement for gcc:

# Compile to WebAssembly
$ emcc seqtk.c \
    -o seqtk.js \
    -O2 \
    -lm \
    -s USE_ZLIB=1 \
    -s FORCE_FILESYSTEM=1

The changes are minimal:

Instead of a binary executable, Emscripten generates a .wasm file and a .js file; the latter is used to load and run the WebAssembly module.

The USE_ZLIB flag enables support for the zlib library. The library is so widely used that it has already been ported to WebAssembly, and Emscripten includes it in the project.

Emscripten's virtual file system is enabled. It is a POSIX-like file system that lives in RAM inside the browser. When the page is refreshed, its contents are cleared.

To understand why a virtual file system is needed, it is worth comparing the way you run seqtk from the command line with the way you run a compiled WebAssembly module.

# On the command line
$ ./seqtk fqchk data.fastq
 
# In the browser console
> Module.callMain(["fqchk", "data.fastq"])

The virtual file system makes it unnecessary to rewrite seqtk to accept string input instead of file input. The data chunk is exposed as the file data.fastq in the virtual file system, and seqtk's main() is called on it.
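
The idea can be sketched with Emscripten's FS.writeFile, FS.unlink and Module.callMain. The wrapper function runSeqtkOnChunk and its parameters are hypothetical: in Emscripten-generated code Module and FS are normally globals, and they are passed in here only to keep the sketch self-contained. The original service may expose the chunk to the file system differently.

```javascript
// Expose a text chunk as a file in Emscripten's in-memory file system,
// run seqtk's main() on it, then remove the file again.
// `Module` and `FS` are the objects provided by the Emscripten-generated seqtk.js.
function runSeqtkOnChunk(Module, FS, chunkText, path = "/data.fastq") {
  FS.writeFile(path, chunkText); // create or overwrite the virtual file
  try {
    Module.callMain(["fqchk", path]); // same arguments as on the command line
  } finally {
    FS.unlink(path); // free the memory held by the virtual file
  }
}
```

Whatever seqtk prints to standard output is delivered through the print callback configured on Module when the module is loaded. Depending on the Emscripten version, callMain and FS may have to be exported explicitly at build time.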

Here is the new architecture:

[Image: diagram of the new architecture]

The figure shows that the calculations are no longer done in the browser's main thread; Web Workers are used instead. This makes it possible to run the calculations on a background thread without degrading the browser's responsiveness. A WebWorker controller starts the Worker and manages its communication with the main thread.

The seqtk command is run in the Worker on the mounted file. When execution finishes, the Worker returns the result to the main thread via a Promise. Once the main thread receives the message, it uses the result to update the plots. This repeats over several iterations.
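
One common way to get a Promise out of a Worker's message-based interface is a small wrapper like the one below. The function name callWorker and the message shape are assumptions made for this sketch; the source does not show how the fastq.bio controller is implemented.

```javascript
// Send one request to a Worker and return a Promise for its reply.
// Limitation of this sketch: only one request may be in flight at a time,
// because each call overwrites the worker's onmessage and onerror handlers.
function callWorker(worker, payload) {
  return new Promise((resolve, reject) => {
    worker.onmessage = (event) => resolve(event.data);
    worker.onerror = (error) => reject(error);
    worker.postMessage(payload);
  });
}

// Possible use in the browser (file name and message fields are hypothetical):
//   const worker = new Worker("worker.js");
//   const result = await callWorker(worker, { command: "fqchk", chunk: chunkText });
//   updatePlots(result);
```

Supporting several concurrent requests would require tagging each message with an id and matching replies to pending Promises.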

What about the performance of WebAssembly?

To evaluate the change in performance, the project team measured the number of reads processed per second. Interactive plotting time is not counted, because both implementations use JavaScript for it.
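
The arithmetic behind such a comparison is simple. The helpers below are hypothetical and the numbers in the example are invented purely to show the calculation; they are not measurements from this project.

```javascript
// Throughput in reads per second for a batch processed in elapsedMs milliseconds.
function readsPerSecond(readCount, elapsedMs) {
  if (elapsedMs <= 0) {
    throw new RangeError("elapsedMs must be positive");
  }
  return readCount / (elapsedMs / 1000);
}

// How many times faster the candidate is compared to the baseline.
function speedup(baselineThroughput, candidateThroughput) {
  return candidateThroughput / baselineThroughput;
}

// Illustrative numbers only: the same 1000 reads in 500 ms versus 50 ms.
const jsThroughput = readsPerSecond(1000, 500);
const wasmThroughput = readsPerSecond(1000, 50);
console.log(speedup(jsThroughput, wasmThroughput)); // 10
```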

With the out-of-the-box solution, the performance gain was ninefold.

[Image: performance comparison, JavaScript vs. out-of-the-box WebAssembly]

This is an excellent result, but, as it turned out, it can be improved further. Many of the QC results that seqtk outputs are not used by fastq.bio, so they can be removed. Once this is done, the result is 13 times better than in JS.

[Image: performance comparison after removing unused output]

This was achieved simply by commenting out the printf() calls.

But that's not all. At this stage, fastq.bio obtains the analysis results by calling two different C functions. Each of them calculates its own set of metrics, so each file chunk is read twice.

To get around this problem, the two functions were merged into one. As a result, performance was 20 times better than the JavaScript version.
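
The principle is the same in any language. The sketch below shows it in JavaScript rather than in seqtk's C, with two made-up metrics (mean read length and GC fraction): first as two functions that each walk over all the reads, then as one function that gathers both values in a single pass.

```javascript
// Two separate passes: each function iterates over every read.
function meanReadLength(reads) {
  let total = 0;
  for (const read of reads) total += read.length;
  return total / reads.length;
}

function gcFraction(reads) {
  let gc = 0;
  let total = 0;
  for (const read of reads) {
    for (const base of read) {
      if (base === "G" || base === "C") gc++;
      total++;
    }
  }
  return gc / total;
}

// One combined pass: the data is traversed once and both metrics are collected.
function statsSinglePass(reads) {
  let totalLength = 0;
  let gc = 0;
  for (const read of reads) {
    totalLength += read.length;
    for (const base of read) {
      if (base === "G" || base === "C") gc++;
    }
  }
  return { meanLength: totalLength / reads.length, gcFraction: gc / totalLength };
}

const combined = statsSinglePass(["ACGT", "GGCC"]);
console.log(combined.meanLength, combined.gcFraction); // 4 0.75
```

The saving is largest when reading and parsing the input dominates the cost, as it does when each pass has to re-read a file.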

[Image: performance comparison after merging the two functions]

It should be noted that such an outstanding result cannot always be achieved. In some cases performance even drops, so each specific case is worth evaluating.

In conclusion, Wasm does offer a way to improve application performance, but it needs to be used wisely.

Source: habr.com