Facebook develops TransCoder to translate code from one programming language to another

Facebook engineers have published TransCoder, a transcompiler that uses machine learning techniques to convert source code from one high-level programming language to another. Translation is currently supported between Java, C++, and Python. For example, TransCoder can convert Java source code to Python and Python code to Java. The project puts into practice theoretical research on building a neural network for efficient automatic code transpilation. It is distributed under the Creative Commons Attribution-NonCommercial 4.0 license, for non-commercial use only.

The machine learning system is implemented in PyTorch. Two pre-trained models are available for download: the first translates C++ to Java, Java to C++, and Java to Python, and the second translates C++ to Python, Python to C++, and Python to Java. The models were trained on the source code of projects hosted on GitHub. If desired, translation models can be created for other programming languages. To check translation quality, a collection of unit tests has been prepared, along with a test suite of 852 parallel functions.
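As a rough illustration of how unit tests can be used to judge a translation, the sketch below runs a candidate Python function against input/output pairs and reports the share of candidates that pass. It is not TransCoder's evaluation code: the names `load_function`, `passes_tests`, and `pass_rate`, and the test-case layout, are assumptions made for this example.

```python
from typing import Any, Callable, List, Tuple

# One test case: (positional arguments, expected return value).
TestCase = Tuple[tuple, Any]


def load_function(source: str, name: str) -> Callable[..., Any]:
    """Execute candidate source in a fresh namespace and return the named function."""
    namespace: dict = {}
    # exec runs arbitrary code: only use it on trusted or sandboxed input.
    exec(source, namespace)
    return namespace[name]


def passes_tests(source: str, name: str, cases: List[TestCase]) -> bool:
    """Return True if the candidate loads and matches every expected output."""
    try:
        func = load_function(source, name)
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing functions and runtime errors all count as failures.
        return False


def pass_rate(candidates: List[Tuple[str, str, List[TestCase]]]) -> float:
    """Fraction of (source, function name, test cases) entries that pass their tests."""
    if not candidates:
        return 0.0
    passed = sum(passes_tests(src, name, cases) for src, name, cases in candidates)
    return passed / len(candidates)
```

A candidate that returns the wrong value, fails to parse, or raises an exception is simply counted as a failed translation.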

TransCoder is claimed to be significantly more accurate than commercial translators based on hand-written conversion rules, and to work without requiring review by experts in the source and target languages. Most of the errors the model makes can be eliminated by adding simple constraints to the decoder that ensure the generated functions are syntactically correct.
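The idea of enforcing syntactic correctness can be approximated without touching the decoder at all. The sketch below is a simpler stand-in for decoder constraints: it takes several candidate outputs (for example, beam search hypotheses) and keeps the first one that parses as Python. The function names are hypothetical and this is not how TransCoder itself is implemented.

```python
import ast
from typing import List, Optional


def is_valid_python(source: str) -> bool:
    """Return True if the text parses as Python source code."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        # ValueError covers inputs such as source containing null bytes.
        return False


def first_valid_candidate(candidates: List[str]) -> Optional[str]:
    """Return the first syntactically valid candidate, or None if none parses.

    Candidates are assumed to be ordered from most to least likely.
    """
    for candidate in candidates:
        if is_valid_python(candidate):
            return candidate
    return None
```

Filtering after generation only checks syntax. It says nothing about whether the surviving candidate behaves like the original function, which is what the unit tests are for.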

The model is built on the Transformer, a neural network architecture for sequence modeling in which recurrence is replaced by attention (a seq2seq model with attention). This removes some dependencies in the computational graph and makes it possible to parallelize computations that previously could not be parallelized. A single shared model is used for all supported languages and is trained using three principles: initialization, language modeling, and back-translation.
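To show why attention lends itself to parallel computation, here is a minimal pure-Python sketch of scaled dot-product attention, the core operation of the Transformer. It is a teaching example under simplifying assumptions (one head, no masking, no learned projections), not code from TransCoder.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = math.sqrt(len(keys[0]))
    outputs: Matrix = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(key dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs
```

Each pass through the loop over `queries` uses only that query and the shared keys and values, never the output for another position. A recurrent network, by contrast, must finish step t before starting step t+1, so here all positions can be computed at once.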

Source: opennet.ru
