Release of Cosmopolitan 2.0, a standard C library for portable executables

Cosmopolitan 2.0 has been released. The project develops a standard C library and a universal executable file format that can be used to distribute programs for different operating systems without interpreters or virtual machines. Code compiled with GCC or Clang is linked into a static universal executable that runs on any Linux distribution, macOS, Windows, FreeBSD, OpenBSD, and NetBSD, and can even be booted from the BIOS. The project code is distributed under the ISC license (a simplified variant of MIT/BSD).

The container for universal executables combines segments and headers specific to different operating systems (PE, ELF, MACHO, OPENBSD) in one file, merging several formats used on Unix, Windows, and macOS. To make a single executable run on both Windows and Unix systems, Windows PE files are encoded as a shell script, taking advantage of the fact that the Thompson Shell does not require the "#!" script marker. Programs that consist of several files can have all their resources linked into one file, because the executable can also be generated as a specially crafted ZIP archive. Scheme of the proposed format (using the hello.com application as an example):

    MZqFpD='
    BIOS BOOT SECTOR'
    exec 7<> $(command -v $0)
    printf '\177ELF…LINKER-ENCODED-FREEBSD-HEADER' >&7
    exec "$0" "$@"
    exec qemu-x86_64 "$0" "$@"
    exit 1
    REAL MODE…
    ELF SEGMENTS…
    OPENBSD NOTE…
    MACHO HEADERS…
    CODE AND DATA…
    ZIP DIRECTORY…
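The ZIP DIRECTORY at the end of this layout relies on a general property of the ZIP format: readers locate the central directory by searching from the end of the file, so other bytes may precede the archive. The following Python sketch illustrates only that property and is not the APE linker; the stub bytes, the member name hello.txt, and the helper names build_blob and read_member are made up for illustration.

```python
import io
import zipfile

# Hypothetical stand-in for the executable part of the file
# (headers, code and data); the real contents come from the linker.
STUB = b"MZqFpD='\n" + b"\x00" * 64


def build_blob(members):
    """Return STUB followed by a ZIP archive holding the given members."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return STUB + archive.getvalue()


def read_member(blob, name):
    """Read one member back; zipfile tolerates bytes prepended to the archive."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.read(name)


blob = build_blob({"hello.txt": b"hello from inside the executable\n"})
print(blob[:6])
print(read_member(blob, "hello.txt"))
```

The same file is therefore both a byte sequence that starts with the executable headers and an archive that ordinary ZIP tools can open.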

The file begins with the label "MZqFpD", which is recognized as a Windows PE format header. The same byte sequence also decodes as the x86-64 instructions "pop %r10 ; jno 0x4a ; jo 0x4a", and the string "\177ELF" decodes as the instruction "jg 0x47"; these are used to jump to the entry point. On Unix systems, the shell script portion runs instead: it writes an ELF header into the file through file descriptor 7 and then relaunches the program with the exec command. A limitation of the method is that on Unix-like operating systems the file can be launched only from shells that support Thompson Shell compatibility mode.
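The jump targets quoted above can be checked by hand. In x86 machine code, opcodes 0x70 to 0x7F are two-byte conditional short jumps whose second byte is a signed displacement relative to the next instruction, and the bytes 0x4D 0x5A form a REX-prefixed pop of %r10. The Python sketch below assumes each sequence is loaded at address 0 and only computes the targets; the function name short_jump_target is made up for this illustration.

```python
def short_jump_target(code: bytes, offset: int) -> int:
    """Return the target of the two-byte short conditional jump at `offset`.

    Assumes `code` is loaded at address 0. Opcodes 0x70..0x7F are Jcc rel8:
    the target is the address of the next instruction plus a signed
    8-bit displacement.
    """
    opcode, disp = code[offset], code[offset + 1]
    if not 0x70 <= opcode <= 0x7F:
        raise ValueError(f"not a short conditional jump: {opcode:#x}")
    if disp >= 0x80:  # interpret the displacement as a signed byte
        disp -= 0x100
    return offset + 2 + disp


mz = b"MZqFpD"    # 4d 5a | 71 46 | 70 44
elf = b"\x7fELF"  # 7f 45 | 4c 46

# Bytes 0-1 (4d 5a) are "pop %r10"; the jumps start at offsets 2 and 4.
print(hex(short_jump_target(mz, 2)))   # jno (0x71) → 0x4a
print(hex(short_jump_target(mz, 4)))   # jo  (0x70) → 0x4a
print(hex(short_jump_target(elf, 0)))  # jg  (0x7f) → 0x47
```

Since jno and jo test opposite conditions, one of the two is always taken, so execution reaches 0x4a regardless of the flag state.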

The qemu-x86_64 call is provided for additional portability and allows code compiled for the x86_64 architecture to run on non-x86 platforms, such as Raspberry Pi boards and Apple devices equipped with ARM processors. The project can also be used to create self-contained applications that work without an operating system (bare metal). In such applications, a bootloader is attached to the executable file, and the program acts as a bootable operating system.

The project's standard C library (libc) provides 2024 functions (the first release had about 1400). In terms of performance, Cosmopolitan is as fast as glibc and noticeably outperforms Musl and Newlib, even though it is an order of magnitude smaller than glibc in code size and roughly comparable to Musl and Newlib. To optimize frequently called functions such as memcpy and strlen, the “trickle-down performance” technique is also used: the function is called through a macro wrapper that tells the compiler which CPU registers the code uses, so that only the registers that actually change have to be saved when preserving the CPU state.

Among the changes in the new release:

  • Access to resources embedded in the zip file has changed: files are now opened through ordinary /zip/… paths instead of the zip:.. prefix. Similarly, Windows drives can be accessed with paths like “/c/…” instead of “C:/…”.
  • A new loader has been introduced for APE (Actually Portable Executable), the format of universal executable files. The new loader uses mmap to map the program into memory and no longer modifies the file's contents on the fly. If necessary, a universal executable can be converted into a regular executable tied to a specific platform.
  • On Linux, the binfmt_misc kernel module can be used to run APE programs; this is noted to be the fastest way to launch them.
  • For Linux, an implementation of the pledge() and unveil() system calls developed by the OpenBSD project is provided. An API is available for using these calls in C, C++, Python, and Redbean programs, along with a pledge.com utility for isolating arbitrary processes.
  • The build uses Landlock Make, an edition of GNU Make with stricter dependency checking that uses the Landlock system call to isolate the program from the rest of the system and improve caching efficiency. Building with ordinary GNU Make remains possible as an option.
  • The multithreading functions _spawn() and _join() have been implemented as universal wrappers over the APIs specific to different operating systems. Work is also underway to support POSIX Threads.
  • The _Thread_local keyword can now be used to give each thread its own storage (TLS, Thread-Local Storage). By default, the C runtime initializes TLS for the main thread, which has increased the minimum executable size from 12 KB to 16 KB.
  • Executables now accept the "--ftrace" and "--strace" parameters, which print information about all function calls and system calls to stderr.
  • Added support for the closefrom() system call, which is available in Linux 5.9+, FreeBSD 8+ and OpenBSD.
  • On Linux, clock_gettime and gettimeofday calls are now up to 10 times faster thanks to the vDSO (virtual dynamic shared object) mechanism, which moves the system call handler into user space and avoids context switches.
  • Mathematical functions for working with complex numbers have been ported from the Musl library. Many mathematical functions have been sped up.
  • A nointernet() function has been added, which disables network access.
  • Added new functions for efficient string appending: appendd, appendf, appendr, appends, appendw, appendz, kappendf, kvappendf and vappendf.
  • A protected variant of the kprintf() family of functions has been added, designed to work with elevated privileges.
  • The performance of the SSL, SHA, curve25519 and RSA implementations has been significantly improved.
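The drive-letter paths from the first item in the list above can be illustrated with a short Python sketch. It only shows how the two spellings correspond and is not Cosmopolitan's implementation; the function name to_unix_style is hypothetical, and lowercasing the drive letter is an assumption based on the “/c/…” example.

```python
import re

# Matches a Windows drive prefix such as "C:/".
_DRIVE = re.compile(r"^([A-Za-z]):/")


def to_unix_style(path: str) -> str:
    """Map "C:/dir/file" to "/c/dir/file"; leave other paths unchanged."""
    match = _DRIVE.match(path)
    if match is None:
        return path
    # Assumption: the drive letter becomes a lowercase top-level directory.
    return "/" + match.group(1).lower() + "/" + path[match.end():]


print(to_unix_style("C:/Users/demo/hello.txt"))  # → /c/Users/demo/hello.txt
print(to_unix_style("/zip/hello.txt"))           # → /zip/hello.txt
```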

Source: opennet.ru
