GitHub launches Copilot machine learning system that generates code

GitHub has announced the end of testing for GitHub Copilot, an intelligent assistant that generates standard constructs as you write code. The system was developed jointly with OpenAI and uses the OpenAI Codex machine learning model, trained on a large body of source code hosted in public GitHub repositories. The service is free for students and for maintainers of popular open source projects. For other users, GitHub Copilot costs $10 per month or $100 per year, with a 60-day free trial.

Code generation is supported for Python, JavaScript, TypeScript, Ruby, Go, C# and C++ with various frameworks. Modules are available to integrate GitHub Copilot with the Neovim, JetBrains IDE, Visual Studio, and Visual Studio Code development environments. Judging by telemetry collected during testing, the generated code is of fairly high quality: for example, developers accepted 26% of the recommendations proposed by GitHub Copilot as is.

GitHub Copilot differs from traditional code completion systems in its ability to generate fairly complex code blocks, up to complete functions synthesized from the current context. It adapts to the way the developer writes code and takes into account the APIs and frameworks used in the program. For example, if a comment contains a sample JSON structure, then when you start writing a function to parse that structure, GitHub Copilot will offer ready-made code. Likewise, when you write routine lists of repetitive declarations, it will generate the remaining entries.
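The JSON case described above can be illustrated with a hand-written Python sketch. It is not actual Copilot output: the JSON structure, the `User` class, and the `parse_user` function are hypothetical names chosen only to show the kind of context (a comment with sample data) and the kind of function an assistant might propose for it.

```python
import json
from dataclasses import dataclass
from typing import List

# Sample JSON structure given in a comment (hypothetical example data):
# {"id": 7, "name": "Ada", "languages": ["Python", "Go"]}


@dataclass
class User:
    id: int
    name: str
    languages: List[str]


def parse_user(text: str) -> User:
    """Parse a JSON document shaped like the comment above into a User."""
    data = json.loads(text)
    return User(
        id=int(data["id"]),
        name=str(data["name"]),
        # A missing "languages" key is treated as an empty list.
        languages=[str(item) for item in data.get("languages", [])],
    )
```

Here the comment and the function name are the context; the body of `parse_user` is the part a developer would otherwise type by hand.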

GitHub Copilot's ability to generate ready-made code blocks has led to controversy over potential violations of copyleft licenses. The machine learning model was trained on real source code from open source project repositories hosted on GitHub. Many of these projects are distributed under copyleft licenses, such as the GPL, which require derivative works to be distributed under a compatible license. By inserting existing code suggested by Copilot, developers may unwittingly violate the license of the project from which the code was borrowed.

It is not yet clear whether work generated by a machine learning system can be considered a derivative work. Questions also arise as to whether a machine learning model is itself subject to copyright and, if so, who owns those rights and how they relate to the rights to the code on which the model was trained.

On the one hand, the generated blocks can repeat passages from existing projects; on the other hand, the system recreates the structure of the code rather than copying the code itself. According to a GitHub study, a Copilot recommendation may include code snippets from existing projects longer than 150 characters only 1% of the time. In most cases, such repetition occurs when Copilot cannot correctly determine the context or when it offers a standard solution to a problem.

To prevent the insertion of existing code, Copilot includes a filter that blocks suggestions overlapping with existing projects. During setup, the developer can enable or disable this filter at their discretion. A further concern is that synthesized code may repeat errors and vulnerabilities present in the code used to train the model.
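The general idea of such an overlap filter can be sketched in Python. This is an illustration under stated assumptions, not GitHub's implementation: the function names, the whitespace normalization, and the reuse of the 150-character figure from the study as a default threshold are all hypothetical choices.

```python
from typing import List


def normalize(code: str) -> str:
    """Collapse whitespace runs to single spaces so formatting differences are ignored."""
    return " ".join(code.split())


def overlaps_known_code(suggestion: str, known_sources: List[str],
                        min_chars: int = 150) -> bool:
    """Return True if the suggestion shares a run of at least min_chars
    normalized characters with any of the known source texts.

    Hypothetical sketch: the threshold and matching rule are assumptions.
    """
    candidate = normalize(suggestion)
    if len(candidate) < min_chars:
        return False
    corpus = [normalize(source) for source in known_sources]
    # Any shared run of min_chars or more contains a shared run of exactly
    # min_chars, so sliding a fixed-size window is sufficient.
    for start in range(len(candidate) - min_chars + 1):
        window = candidate[start:start + min_chars]
        if any(window in source for source in corpus):
            return True
    return False
```

A suggestion for which `overlaps_known_code` returns True would be suppressed when the filter is enabled and shown when it is disabled. The linear scan is only meant to show the matching rule; it would not scale to a large body of code.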

Source: opennet.ru
