LLVM Creator Develops New Mojo Programming Language

Chris Lattner, founder and chief architect of LLVM and creator of the Swift programming language, together with Tim Davis, a former head of Google AI projects such as TensorFlow and JAX, has introduced Mojo, a new programming language that combines ease of use for research and rapid prototyping with suitability for high-performance production software. The former is achieved through familiar Python syntax, the latter through compilation to machine code, safe memory-management mechanisms, and tools for hardware-accelerated computation.

The project is focused on machine-learning development but is presented as a general-purpose language that extends Python with systems programming and is suitable for a wide range of tasks, such as high-performance computing and data processing and transformation. A curious feature of Mojo is that the emoji "πŸ”₯" can be used as the extension of code files (for example, "helloworld.πŸ”₯"), in addition to the plain-text extension ".mojo".

The language is currently under intensive development, and only an online interface is offered for testing. Standalone builds for running on local systems are promised later, once feedback on the interactive web environment has been collected. The source code of the compiler, the JIT, and other components of the project is planned to be opened after the internal architecture is finalized (this model of developing a working prototype behind closed doors resembles the early stages of LLVM, Clang, and Swift). Since Mojo's syntax is based on Python and its type system is close to that of C/C++, there are plans to develop tooling that eases the translation of existing C/C++ and Python projects to Mojo, as well as the development of hybrid projects combining Python and Mojo code.

The project is designed to harness the hardware resources of heterogeneous systems for computation. For example, GPUs, specialized machine-learning accelerators, and vector (SIMD) processor instructions can be used to run Mojo applications and parallelize computations. As reasons for developing a separate superset of the Python language rather than joining the optimization work on the existing CPython, the developers cite the focus on compilation, the integration of systems-programming capabilities, and a fundamentally different internal architecture that allows code to run on GPUs and various hardware accelerators. At the same time, the Mojo developers intend to remain compatible with CPython as far as possible.

Mojo can be used both in JIT interpretation mode and for ahead-of-time (AOT) compilation into executable files. The compiler incorporates modern technologies for automatic optimization, caching, and distributed compilation. Mojo source code is translated into the low-level intermediate representation MLIR (Multi-Level Intermediate Representation), developed within the LLVM project, which provides additional facilities for optimizing dataflow-graph processing. The compiler can use any backend that supports MLIR to generate machine code.

The use of additional hardware mechanisms to speed up calculations makes it possible to achieve performance that, on compute-intensive workloads, exceeds that of C/C++ applications. For example, in a test generating the Mandelbrot set, a compiled Mojo application running in the AWS cloud (r7iz.metal-16xl) was 6 times faster than the C++ implementation (0.03 s vs. 0.20 s), 35 thousand times faster than a Python application on stock CPython 3.10.9 (0.03 s vs. 1027 s), and 1500 times faster than one running on PyPy (0.03 s vs. 46.1 s).
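The benchmark code itself is not reproduced here; as an illustration of the workload being measured, a simplified scalar escape-time kernel for the Mandelbrot set might look as follows in Mojo-style syntax (a sketch only; the benchmarked version would additionally use SIMD vectorization and multi-core parallelization, and exact standard-library details are assumed):

```mojo
# Illustrative sketch: scalar Mandelbrot escape-time kernel.
# Returns the number of iterations before the point (cx, cy) escapes,
# or max_iter if it is assumed to belong to the set.
fn mandelbrot(cx: Float64, cy: Float64, max_iter: Int) -> Int:
    var x: Float64 = 0.0
    var y: Float64 = 0.0
    for i in range(max_iter):
        if x * x + y * y > 4.0:
            return i  # point escaped after i iterations
        var xn = x * x - y * y + cx
        y = 2.0 * x * y + cy
        x = xn
    return max_iter
```

Because `fn` functions carry explicit types (see below), the compiler can translate such a loop directly to machine code instead of interpreting dynamically typed bytecode, which is where the reported speedups over CPython come from.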

In an evaluation of machine-learning performance, the Modular Inference Engine AI stack, written in Mojo, was compared with a TensorFlow-based solution: on a system with an Intel processor it was 3 times faster when processing a language model, 6.4 times faster when running a recommendation-generation model, and 2.1 times faster when working with computer-vision models. On AMD processors, the gains from Mojo were 3.2, 5, and 2.2 times, and on ARM processors 5.3, 7.5, and 1.7 times, respectively. A PyTorch-based solution lagged behind Mojo by factors of 1.4, 1.1, and 1.5 on the Intel CPU; 2.1, 1.2, and 1.5 on the AMD CPU; and 4, 4.3, and 1.3 on the ARM CPU.

The language supports static typing and low-level memory-safety features reminiscent of Rust, such as reference-lifetime tracking and a borrow checker. In addition to facilities for safe pointer handling, the language also provides low-level capabilities: for example, it is possible to access memory directly in unsafe mode using the Pointer type, invoke individual SIMD instructions, or access hardware extensions such as Tensor Cores and AMX.
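A hedged sketch of what such low-level access could look like (type and method names here are assumptions based on early Mojo documentation and may differ from the shipped standard library):

```mojo
# Hypothetical sketch of unsafe pointer and SIMD use; Pointer and
# SIMD names are assumed from early Mojo documentation.
fn low_level_demo():
    # Direct memory access in unsafe mode via the Pointer type.
    var p = Pointer[Int].alloc(4)   # allocate room for 4 Ints
    p.store(0, 42)                  # write 42 at offset 0
    var value = p.load(0)           # read it back
    p.free()                        # manual deallocation

    # A hardware vector of four 32-bit floats; arithmetic on it
    # maps onto SIMD instructions of the target CPU.
    var v = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    var doubled = v * 2.0           # element-wise vector multiply
```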

To simplify separating classic Python code from optimized code in which all variables have explicit type definitions, it is proposed to use the separate keyword "fn" instead of "def". Similarly for classes: if data needs to be packed statically in memory at compile time (as in C), the "struct" type can be used instead of "class". It is also possible to simply import modules from C/C++ code; for example, to import the cos function from the math library one can write "from "math.h" import cos".
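Taken together, the constructs described above might combine as in the following sketch (illustrative only, based directly on the syntax the article describes; member-declaration details are assumptions):

```mojo
# Importing a symbol directly from a C header, as described above.
from "math.h" import cos

# "fn" declares a function in which every variable and parameter
# has an explicit type, unlike the dynamic Python-style "def".
fn add(a: Int, b: Int) -> Int:
    return a + b

# "struct" lays its fields out statically in memory at compile
# time (as in C), unlike the dynamic Python-style "class".
struct Point:
    var x: Float64
    var y: Float64
```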

Source: opennet.ru
