How we translated 10 million lines of C++ code to the C++14 standard (and then to C++17)

Some time ago (in autumn 2016), while we were developing the next version of the 1C:Enterprise technology platform, the development team raised the question of supporting the new C++14 standard in our code. We expected the transition to let us write many things more elegantly, simply and reliably, and to make the code easier to support and maintain. There would have been nothing extraordinary about such a migration were it not for the scale of the code base and the specific features of our code.

For those who do not know, 1C:Enterprise is an environment for rapid development of cross-platform business applications, plus a runtime for executing them on different operating systems and DBMSs. In general terms, the product includes:

  • Application server cluster, runs on Windows and Linux
  • Client, which talks to the server via http(s) or its own binary protocol, runs on Windows, Linux, macOS
  • Web client, which works in Chrome, Internet Explorer, Microsoft Edge, Firefox, Safari browsers (written in JavaScript)
  • Development environment (Configurator), works on Windows, Linux, macOS
  • Application server administration tools, run on Windows, Linux, macOS
  • Mobile client, connecting to the server via http(s), works on mobile devices running Android, iOS, Windows
  • Mobile platform: a framework for creating offline mobile applications with synchronization support, runs on Android, iOS, Windows
  • Development environment 1C:Enterprise Development Tools, written in Java
  • Interaction System server

We try to share as much code as possible across operating systems: the server code base is 99% common, and the client code base about 95%. The 1C:Enterprise technology platform is written mainly in C++; the approximate characteristics of the code are:

  • 10 million lines of C++ code,
  • 14 thousand files,
  • 60 thousand classes,
  • half a million methods.

All of this had to be moved to C++14. Today we will tell you how we did it and what we ran into along the way.

Disclaimer

Everything written below about slow or fast operation and large (or not so large) memory consumption of standard class implementations in various libraries means one thing: it is true FOR US. It is quite possible that the standard implementations suit your tasks best. We started from our own tasks: we took data typical of our clients, ran typical scenarios on it, looked at the speed, the amount of memory consumed and so on, and analyzed whether we and our clients were satisfied with the results. Then we acted accordingly.

What we had

Initially, we wrote the code for the 1C:Enterprise 8 platform in Microsoft Visual Studio. The project started in the early 2000s, and we had a Windows-only version. Naturally, the code has been actively developed since then, and many mechanisms have been completely rewritten. But the code was written to the 1998 standard, so, for example, we had to separate right angle brackets with a space for the code to compile, like this:

vector<vector<int> > IntV;

In 2006, with the release of platform version 8.1, we started supporting Linux and switched to a third-party standard library, STLPort. One of the reasons for the switch was wide strings. Throughout our code we use std::wstring, which is based on the wchar_t type. wchar_t is 2 bytes on Windows and 4 bytes by default on Linux. This made our binary protocols between the client and the server, as well as various persistent data, incompatible. With gcc options you can make wchar_t 2 bytes at compile time as well, but then you can forget about using the compiler's own standard library, because it relies on glibc, which in turn is built for a 4-byte wchar_t. Other reasons were a better implementation of the standard classes, support for hash tables, and even emulation of move semantics inside containers, which we actively used. One more reason, last but not least, was string performance. We had our own string class because, due to the specifics of our software, string operations are used very widely and their performance is critical for us.

Our string is based on string optimization ideas expressed by Andrei Alexandrescu back in the early 2000s. Later, when Alexandrescu worked at Facebook, a string built on similar principles was adopted in the Facebook engine at his suggestion (see the folly library).

Our string used two main optimization techniques:

  1. For short values, an internal buffer is used in the string object itself (no additional memory allocation required).
  2. For all other values, Copy On Write is used. The string value is stored in one place, and a reference count is maintained on assignment and modification.
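The two techniques can be illustrated with a minimal sketch. This is not the platform's actual string class: the name SketchString, the 15-character inline capacity and the use of std::shared_ptr for the reference count are all assumptions made for illustration, and a production implementation would avoid the extra indirection used here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical inline capacity; real implementations pick their own value.
constexpr std::size_t kInlineCapacity = 15;

class SketchString {
public:
    SketchString(const char* s = "") {
        size_ = std::strlen(s);
        if (size_ <= kInlineCapacity) {
            std::memcpy(small_, s, size_ + 1);              // short value: no allocation
        } else {
            heap_ = std::make_shared<std::string>(s, size_); // long value: shared buffer
        }
    }
    // The implicit copy constructor copies the inline buffer or shares the
    // heap buffer, bumping the reference count held by shared_ptr.

    std::size_t size() const { return size_; }
    const char* c_str() const { return heap_ ? heap_->c_str() : small_; }
    bool isInline() const { return !heap_; }
    bool sharesBufferWith(const SketchString& other) const {
        return heap_ && heap_ == other.heap_;
    }

    void append(char c) {
        if (!heap_ && size_ < kInlineCapacity) {             // still fits inline
            small_[size_++] = c;
            small_[size_] = '\0';
            return;
        }
        if (!heap_) {
            heap_ = std::make_shared<std::string>(small_, size_); // outgrew inline buffer
        } else if (heap_.use_count() > 1) {
            heap_ = std::make_shared<std::string>(*heap_);   // copy on write: detach
        }
        heap_->push_back(c);
        ++size_;
    }

private:
    char small_[kInlineCapacity + 1] = {};
    std::shared_ptr<std::string> heap_;
    std::size_t size_ = 0;
};

inline void demoSketchString() {
    SketchString shortValue("short");
    assert(shortValue.isInline());

    SketchString original("a value that is longer than the inline buffer");
    SketchString copy = original;           // no character data is copied here
    assert(copy.sharesBufferWith(original));
    copy.append('!');                       // the first write detaches the copy
    assert(!copy.sharesBufferWith(original));
    assert(copy.size() == original.size() + 1);
}
```

The sketch ignores thread safety, which is one of the hard parts of a real Copy On Write string.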

To speed up compilation of the platform, we excluded the stream implementation (which we did not use) from our STLPort variant, which made compilation about 20% faster. Later we had to make limited use of Boost. Boost makes extensive use of streams, in particular in its service APIs (for example, for logging), so we had to modify it to remove the use of streams. This, in turn, made it difficult for us to migrate to new versions of Boost.

The Third Way

When moving to the C++14 standard, we considered the following options:

  1. Bring our modified STLPort up to the C++14 standard. This option is very hard, because STLPort support was discontinued in 2010 and we would have to bring all of its code up to date ourselves.
  2. Switch to another STL implementation compatible with C++14. It is highly desirable that the implementation work on both Windows and Linux.
  3. When compiling for each OS, use the library that ships with the corresponding compiler.

The first option was rejected immediately due to too much work.

We considered the second option for a while, with libc++ as a candidate, but at that time it did not work on Windows. Porting libc++ to Windows would have taken a lot of work: for example, we would have had to write everything related to threads, thread synchronization and atomics ourselves, since libc++ used the POSIX API in these areas.

And we chose the third way.

The transition

So, we had to replace the use of STLPort with the libraries of the corresponding compilers (Visual Studio 2015 for Windows, gcc 7 for Linux, clang 8 for macOS).

Fortunately, our code was written mostly according to guidelines and did not use any clever tricks, so the migration to the new libraries went relatively smoothly, with the help of scripts that replaced the names of types, classes, namespaces and includes in the source files. The migration affected 10,000 source files (out of 14,000). wchar_t was replaced with char16_t; we decided to stop using wchar_t because char16_t occupies 2 bytes on all OSes and does not break code compatibility between Windows and Linux.
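The difference between the two character types can be shown with a small sketch. The helper payloadBytes is hypothetical, and the static_assert states an assumption about mainstream Windows, Linux and macOS toolchains rather than a guarantee of the language standard.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: on mainstream toolchains a char16_t code unit is exactly
// 2 bytes. wchar_t gives no such guarantee across platforms.
static_assert(sizeof(char16_t) == 2, "char16_t is expected to be 2 bytes");

// Hypothetical helper: bytes the character data takes when written as is,
// for example into a binary protocol or a persistent file.
template <class Char>
std::size_t payloadBytes(const std::basic_string<Char>& s) {
    return s.size() * sizeof(Char);
}

inline void demoPayloadSizes() {
    const std::u16string portable = u"1C";
    const std::wstring platformDependent = L"1C";
    // Same size everywhere the static_assert above holds.
    assert(payloadBytes(portable) == 4);
    // Depends on the platform's choice of wchar_t width.
    assert(payloadBytes(platformDependent) == 2 * sizeof(wchar_t));
}
```

With std::u16string the serialized size of the same text does not depend on the operating system, which is the property the binary protocols and persistent data needed.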

There were some small adventures. For example, in STLPort an iterator could be implicitly converted to a pointer to the element, and some places in our code relied on this. The new libraries no longer allowed it, so these places had to be reviewed and rewritten by hand.
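A rewrite of this kind might look like the sketch below. The functions sumRange and sumFrom are hypothetical stand-ins for code that passes a position in a vector to a pointer-based API; the original call sites are not shown in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pointer-based API of the kind older code often calls.
inline long sumRange(const int* first, std::size_t count) {
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) total += first[i];
    return total;
}

inline long sumFrom(const std::vector<int>& v,
                    std::vector<int>::const_iterator it) {
    // Code such as sumRange(it, ...) compiles only where the iterator
    // converts implicitly to a pointer. A portable form computes the offset
    // and goes through data(), which is also valid when it == v.end().
    const std::size_t offset = static_cast<std::size_t>(it - v.begin());
    return sumRange(v.data() + offset, v.size() - offset);
}

inline void demoSumFrom() {
    const std::vector<int> v{1, 2, 3, 4};
    assert(sumFrom(v, v.begin() + 1) == 9);
    assert(sumFrom(v, v.end()) == 0);
}
```

Going through data() rather than dereferencing the iterator avoids undefined behavior for the end iterator, which debug iterators would otherwise flag.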

So, the code migration is finished, and the code compiles on all operating systems. It's time for tests.

Tests after the transition showed a drop in performance (in some places up to 20-30%) and an increase in memory consumption (up to 10-15%) compared to the old version of the code. This was due, in particular, to the suboptimal performance of the standard strings. Therefore, we again had to use our own string, slightly modified.

The tests also revealed an interesting feature of the container implementations in the built-in libraries: empty (element-free) std::map and std::set from the built-in libraries allocate memory. Because of the way our code is implemented, it creates quite a lot of empty containers of these types in some places. The standard containers allocate only a little memory, for a single root element, but for us this turned out to be critical: in a number of scenarios performance dropped significantly and memory consumption grew (compared to STLPort). So we replaced these two container types from the built-in libraries with their Boost implementations, which did not have this feature, and that solved the problem with the slowdown and increased memory consumption.
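One way to check whether a given standard library allocates for an empty map is a counting allocator, sketched below. CountingAllocator and the helper functions are hypothetical names; the count for an empty container differs between library implementations and build modes, so the sketch reports it instead of assuming a value.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Global counter of allocate() calls made through CountingAllocator.
inline std::size_t& allocationCount() {
    static std::size_t count = 0;
    return count;
}

template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++allocationCount();
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

using CountedMap = std::map<int, int, std::less<int>,
                            CountingAllocator<std::pair<const int, int>>>;

// Allocations performed just by constructing and destroying an empty map.
inline std::size_t allocationsForEmptyMap() {
    const std::size_t before = allocationCount();
    { CountedMap empty; (void)empty; }
    return allocationCount() - before;
}

// Allocations performed by a map that holds one element.
inline std::size_t allocationsForOneElement() {
    const std::size_t before = allocationCount();
    { CountedMap m; m.emplace(1, 2); }
    return allocationCount() - before;
}

inline void demoAllocationCounts() {
    // Holding an element always needs at least one node allocation.
    assert(allocationsForOneElement() >= 1);
    assert(allocationsForOneElement() >= allocationsForEmptyMap());
}
```

Calling allocationsForEmptyMap() on each target toolchain shows whether the empty-container cost described above applies there.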

As often happens after large-scale changes in large projects, the first iteration of the sources did not work without problems, and here support for debug iterators in the Windows implementation, in particular, proved very useful. Step by step we moved forward, and by the spring of 2017 (1C:Enterprise version 8.3.11) the migration was complete.

Results

The transition to the C++14 standard took us about 6 months. Most of the time, one (but very highly qualified) developer worked on the project, and at the final stage, representatives of teams responsible for specific areas joined - UI, server cluster, development and administration tools, etc.

The transition greatly simplified our later moves to newer versions of the standard. For example, 1C:Enterprise version 8.3.14 (in development, with the release scheduled for the beginning of next year) has already been moved to the C++17 standard.

After the migration, developers have more options. Previously we had our own modified version of the STL and a single std namespace. Now the std namespace holds the standard classes from the compiler's built-in libraries, the stdx namespace holds our strings and containers optimized for our tasks, and boost holds a fresh version of Boost. Developers use the classes best suited to the problem at hand.

The "native" implementation of move constructors for a number of classes also helps in development. If a class has a move constructor and objects of this class are placed in a container, the STL optimizes the copying of elements inside the container (for example, when the container grows and has to change capacity and reallocate memory).
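The effect can be observed with a small counting class, sketched below. Tracked and Counters are hypothetical names. Note the noexcept on the move constructor: std::vector moves elements during reallocation only when the move constructor cannot throw (or the type is not copyable), otherwise it falls back to copying.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Counters {
    int copies = 0;
    int moves = 0;
};

// Hypothetical element type that records how it gets relocated.
class Tracked {
public:
    explicit Tracked(Counters& counters) : counters_(&counters) {}
    Tracked(const Tracked& other) : counters_(other.counters_) {
        ++counters_->copies;
    }
    // noexcept lets std::vector choose the move over the copy on growth.
    Tracked(Tracked&& other) noexcept : counters_(other.counters_) {
        ++counters_->moves;
    }

private:
    Counters* counters_;
};

// Fills a vector without reserve(), so it has to reallocate as it grows.
inline Counters growVector(int elementCount) {
    Counters counters;
    std::vector<Tracked> items;
    for (int i = 0; i < elementCount; ++i) {
        items.emplace_back(counters);  // constructed in place, no copy or move
    }
    return counters;
}

inline void demoGrowVector() {
    const Counters result = growVector(100);
    assert(result.copies == 0);  // reallocation moved the elements
    assert(result.moves > 0);
}
```

Removing noexcept from the move constructor in this sketch makes the vector copy the elements on reallocation instead.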

Fly in the Ointment

Perhaps the most unpleasant (but not critical) consequence of the migration is that the obj files have grown, and the full build output with all the intermediate files now takes up 60 - 70 GB. This is due to the way modern standard libraries work: they have become less careful about the size of the generated service files. It does not affect the compiled application, but it causes a number of inconveniences in development; in particular, it increases compilation time. It also raises the requirements for free disk space on build servers and development machines. Our developers work on several versions of the platform in parallel, and hundreds of gigabytes of intermediate files sometimes get in the way. The problem is unpleasant but not critical, so we have postponed solving it for now. One of the options we are considering is the unity build technique (used, in particular, by Google in developing the Chrome browser).

Source: habr.com
