The evolution of CI in the mobile development team

Today, most software products are developed in teams. The conditions for success in team development can be summarized in a simple diagram.


After writing the code, you must make sure that it:

  1. Works.
  2. Doesn't break anything, including the code your colleagues wrote.

If both conditions are met, you are on your way to success. To make these conditions easy to check and to stay on that path, Continuous Integration was invented.

CI is a workflow where you integrate your code into the overall product code as often as possible. And not just integrate, but also constantly check that everything works. Since you need to check a lot and often, you should think about automation. You could check everything manually, but you shouldn't, and here's why.

  • People are expensive. An hour of a programmer's time costs more than an hour of any server's time.
  • People make mistakes. Situations arise where tests are run on the wrong branch, or the wrong commit gets built for the testers.
  • People are lazy. Every now and then, when I finish a task, I catch myself thinking: "What is there to check? I wrote two lines - of course everything works!" I suspect some of you have such thoughts too. But you should always check.

Nikolay Nesterov (nnesterov), who took part in all the evolutionary changes of the Android app's CI/CD, tells how Continuous Integration was introduced and developed in Avito's mobile development team, how they went from 0 to 450 builds per day, and how the build machines now build for 200 hours a day.

The story is based on the example of the Android team, but most of the approaches are applicable on iOS too.


Once upon a time, one person worked in Avito's Android team. By definition, he didn’t need anything from Continuous Integration: there was no one to integrate with.

But the application grew, more and more new tasks appeared, and the team grew accordingly. At some point, it was time to more formally establish the process of integrating the code. It was decided to use Git flow.


The concept of Git flow is well known: the project has one shared develop branch, and for each new feature developers cut a separate branch, commit to it, push, and when they want to get their code into develop, they open a pull request. To share knowledge and discuss approaches, we introduced code review: colleagues must check and approve each other's code.

Checks

Looking at code with your eyes is great, but not enough. So automatic checks were introduced.

  • First of all, the APK build.
  • Lots of JUnit tests.
  • Code coverage, since we're running the tests anyway.

To understand how to run these checks, let's look at the development process in Avito.

Schematically, it can be represented as follows:

  • A developer writes code on his laptop. You can run integration checks right here - either as a commit hook, or just run checks in the background.
  • After the developer has pushed the code, they open a pull request. For their code to get into the develop branch, it has to pass code review and collect the required number of approvals. Checks and builds can be enabled here: until all builds are successful, the pull request cannot be merged.
  • After the pull request is merged and the code gets into develop, you can choose a convenient time: for example, at night, when all servers are free, and run checks as much as you like.

Nobody liked running checks on their laptop. When a developer has finished a feature, they want to push it quickly and open a pull request. If at this moment some long checks are launched, this is not only not very pleasant, but also slows down development: while the laptop is checking something, it is impossible to work on it normally.

We really liked running checks at night, because there is plenty of time and plenty of servers - you can go wild. But, unfortunately, once the feature code has reached develop, the developer has much less motivation to fix the bugs that CI found. I periodically caught myself thinking, looking at all the errors in the morning report, that I would fix them sometime later, because right now there is a cool new task in Jira that I really want to start.

If the checks block the pull request, then there is enough motivation, because until the builds turn green, the code will not get into develop, which means the task will not be completed.

As a result, we chose the following strategy: at night we run the maximum possible set of checks, and the most critical of them - and, most importantly, the fastest - we run on the pull request. But we don't stop there: in parallel, we optimize the speed of the checks so that they can be moved from the nightly run to the pull request checks.

At that time all our builds ran quite quickly, so we simply made the APK build, the JUnit tests, and the code coverage calculation blockers for the pull request. We turned it all on, thought about it, and dropped code coverage, deciding that we didn't need it.

It took us two days to set up the basic CI (here and below, the time estimates are approximate and given for a sense of scale).

After that, we began to think further: are we even checking the right thing? Are we running the builds on a pull request correctly?

We ran the build on the last commit of the branch from which the pull request was opened. But checks on that commit can only show that the code the developer wrote works. They don't prove that he didn't break anything. In fact, you need to check the state of the develop branch after the feature is merged into it.


To do this, we wrote a simple bash script premerge.sh:

#!/usr/bin/env bash
set -e   # abort on the first error

# Pull the latest develop and merge it into the current branch,
# so the checks run against the state develop will be in after the merge.
git fetch origin develop
git merge origin/develop

Here, all the latest changes from develop are simply pulled up and merged into the current branch. We added the premerge.sh script as the first step in all builds and started checking exactly what we want, i.e. integration.

It took three days to localize the problem, find a solution, and write this script.

The application grew, more and more tasks appeared, the team grew, and premerge.sh sometimes began to let us down: conflicting changes that broke the build were getting into develop.

An example of how this happens:


Two developers start working on features A and B at the same time. The developer of feature A discovers an unused function answer() in the project and, like a good scout, removes it. Meanwhile, the developer of feature B adds a new call to this function in his branch.

The developers finish their work and open pull requests at the same time. The builds are launched, premerge.sh checks both pull requests against the latest state of develop - all checks are green. Then the pull request of feature A is merged, the pull request of feature B is merged… Boom! Develop breaks, because it now contains a call to a function that no longer exists.


When develop doesn't build, it's a local catastrophe: the whole team can't build anything or hand it over for testing.

It so happened that I was most often the one working on infrastructure tasks: analytics, networking, databases. That is, I wrote the functions and classes that other developers use. Because of this, I often ended up in situations like that. For a while I even kept this picture around.


Since this did not suit us, we began to work out options for how to prevent this.

How not to break develop

First option: rebuild all pull requests whenever develop is updated. If, in our example, the pull request with feature A gets into develop first, the pull request for feature B will be rebuilt, and accordingly its checks will fail with a compilation error.

To understand how long it will take, consider the example with two PRs. We open two PRs: two builds, two launches of checks. After the first PR is merged into develop, the second must be rebuilt. In total, two PRs take three runs of checks: 2 + 1 = 3.

Basically, it's fine. But we looked at the statistics, and a typical situation in our team was 10 open PRs, and then the number of checks is the sum of the progression: 10 + 9 + ... + 1 = 55. That is, to accept 10 PRs, you need to rebuild 55 times. And this is in an ideal situation, when all checks pass the first time, when no one opens an additional pull request while this dozen is being processed.
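A quick sketch of that arithmetic (just the triangular-number formula, nothing Avito-specific):

// Rebuilding every remaining open PR after each merge costs
// n + (n-1) + ... + 1 = n * (n + 1) / 2 check runs for n open pull requests.
fun rebuildsFor(openPrs: Int): Int = openPrs * (openPrs + 1) / 2

fun main() {
    println(rebuildsFor(2))   // 3  - the two-PR example
    println(rebuildsFor(10))  // 55 - a typical day with ten open PRs
}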

Imagine yourself as a developer who needs to be the first to press the "merge" button, because if a neighbour does it first, you'll have to wait until all the builds run through again… No, that won't do; it would seriously slow down development.

Second possible way: run the builds only after code review. That is, you open a pull request, collect the required number of approvals from colleagues, fix what needs fixing, and only then launch the builds. If they are successful, the pull request is merged into develop. In this case there are no extra restarts, but feedback becomes much slower. As a developer, when I open a pull request, I want to see right away whether it builds. If, say, a test has failed, I need to fix it quickly. With a delayed build, feedback slows down, and with it the whole development process. This didn't suit us either.

As a result, only the third option remained: rolling our own. All our code and sources live in a Bitbucket Server repository, so we had to develop a plugin for Bitbucket.


This plugin overrides the pull request merge mechanism. It starts out as usual: the PR is opened, all the builds are launched, code review takes place. But once the code review has passed and the developer clicks "merge", the plugin checks which state of develop the checks were run against. If develop has managed to move on since the builds, the plugin will not let such a pull request be merged into the main branch. Instead, it simply restarts the builds against the fresh develop.
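A minimal sketch of that decision logic (hypothetical types and callbacks, not the real Bitbucket plugin API):

// If develop has not moved since the PR's green builds, merge; otherwise rebuild against fresh develop.
enum class MergeResult { MERGED, REBUILD_SCHEDULED }

data class PullRequest(val id: Long, val lastGreenBuildBase: String)

class MergeGuard(
    private val developHead: () -> String,                 // current HEAD of develop
    private val merge: (PullRequest) -> Unit,               // perform the actual merge
    private val rebuild: (PullRequest, String) -> Unit      // restart the builds against a commit
) {
    fun tryMerge(pr: PullRequest): MergeResult =
        if (pr.lastGreenBuildBase == developHead()) {
            merge(pr)                                       // builds ran against the current develop
            MergeResult.MERGED
        } else {
            rebuild(pr, developHead())                      // develop moved on - rerun the checks first
            MergeResult.REBUILD_SCHEDULED
        }
}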


In our example with conflicting changes, such builds will fail with a compilation error. The developer of feature B will have to fix the code and restart the checks, after which the plugin will merge the pull request automatically.

Before implementing this plugin, we averaged 2.7 check runs per pull request. With the plugin it became 3.6 runs. That suited us.

It's worth noting that this plugin has a drawback: it only restarts the builds once. So there is still a small window through which conflicting changes can slip into develop. But the probability of that is low, and we accepted this compromise between the number of build runs and the probability of a breakage. In two years it has fired only once, so it was probably not in vain.

It took us two weeks to write the first version of the plugin for Bitbucket.

New checks

Meanwhile, our team continued to grow, and new checks were added.

We thought: why fix mistakes if they can be prevented? So we introduced static code analysis. We started with Lint, which ships with the Android SDK. But at the time it couldn't handle Kotlin code at all, and 75% of our application was already written in Kotlin. Therefore, we added the built-in Android Studio inspections alongside Lint.

To do that, we had to resort to some ugly hacks: take Android Studio, package it in Docker, and run it on CI with a virtual display so that it thinks it's running on a real laptop. But it worked.

Around the same time we began to write a lot of instrumentation tests and implemented screenshot testing. The idea is that a reference screenshot is generated for an individual small view, and the test takes a screenshot of that view and compares it with the reference pixel by pixel. Any discrepancy means the layout has gone wrong somewhere or something is off in the styles.
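A simplified sketch of the pixel-by-pixel comparison behind screenshot testing (a hypothetical helper running on the JVM host, not the library we actually used):

import java.io.File
import javax.imageio.ImageIO

// Returns true if the actual screenshot matches the reference exactly, pixel by pixel.
fun screenshotsMatch(reference: File, actual: File): Boolean {
    val expected = ImageIO.read(reference)
    val current = ImageIO.read(actual)

    if (expected.width != current.width || expected.height != current.height) return false

    for (x in 0 until expected.width) {
        for (y in 0 until expected.height) {
            if (expected.getRGB(x, y) != current.getRGB(x, y)) return false  // layout or style drifted
        }
    }
    return true
}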

But instrumentation tests and screenshot tests have to be run on devices: on emulators or on real hardware. Given that there are many tests and they run often, a whole device farm is needed. Building our own farm was too labor-intensive, so we found a ready-made option: Firebase Test Lab.

Firebase Test Lab

We chose it because Firebase is a Google product, which means it should be reliable and is unlikely to ever die. The prices are reasonable: $5 per hour for a real device, $1 per hour for an emulator.

It took about three weeks to implement Firebase Test Lab into our CI.

But the team continued to grow, and Firebase, unfortunately, began to let us down. At that time it had no SLA at all. Sometimes Firebase made us wait until the required number of devices was freed up instead of starting the tests right away, as we wanted. Waiting in the queue took up to half an hour, which is a very long time. Instrumentation tests ran on every PR, the delays slowed development down considerably, and then the monthly bill arrived with a round sum. In short, we decided to abandon Firebase and build our own solution in-house, since the team had grown enough.

Docker + Python + bash

We took Docker, stuffed emulators into it, and wrote a simple Python program that, at the right moment, spins up the right number of emulators of the right version and stops them when they're no longer needed. And, of course, a couple of bash scripts - where would we be without them?

It took five weeks to create our own test environment.

The result was an extensive, merge-blocking list of checks for every pull request:

  • APK build;
  • JUnit tests;
  • Lint;
  • Android Studio inspections;
  • Instrumentation tests;
  • Screenshot tests.

This prevented many possible breakdowns. Technically everything worked, but the developers complained that it took too long to wait for the results.

Too long - how long exactly? We exported data from Bitbucket and TeamCity into an analytics system and found that the average waiting time was 45 minutes. That is, a developer who opens a pull request waits on average 45 minutes for the build results. In my opinion, that's a lot, and you can't work like that.

Of course, we decided to speed up all our builds.

We are accelerating

Seeing that builds often sat in the queue, we first of all bought more hardware - extensive growth is the simplest kind. The builds stopped queuing, but the waiting time dropped only slightly, because some of the checks themselves ran for a very long time.

Removing checks that take too long

Our Continuous Integration could catch the following types of errors and problems:

  • Doesn't build. CI can catch a compilation error when conflicting changes prevent something from building. As I said, when that happens no one can build anything, development stops, and everyone gets nervous.
  • Bug in behavior. For example, the application builds, but crashes when a button is pressed, or the button doesn't respond at all. This is bad, because such a bug can reach the user.
  • Bug in layout. For example, a button works but is shifted 10 pixels to the left.
  • Increase in technical debt.

After looking at this list, we realized that only the first two points are critical. We want to catch such problems in the first place. Bugs in the layout are detected at the design-review stage and are easily fixed at the same time. Working with technical debt requires a separate process and planning, so we decided not to check it for a pull request.

Based on this classification, we shook up the entire list of checks. We crossed out Lint and moved it to the nightly run, just so it could report how many problems exist in the project. We agreed to deal with technical debt separately, and dropped the Android Studio inspections altogether. Android Studio in Docker running inspections sounds interesting, but it's a lot of trouble to maintain: every Android Studio update means wrestling with obscure bugs. Screenshot tests were also hard to maintain, because the library wasn't very stable and produced false positives, so they were removed from the list of checks as well.

As a result, we are left with:

  • APK build;
  • JUnit tests;
  • Instrumentation tests.

Gradle remote cache

Without heavy checks, everything got better. But there is no limit to perfection!

Our application had already been split into about 150 Gradle modules. Gradle remote cache usually works well in this case, so we decided to give it a try.

Gradle remote cache is a service that caches build artifacts for individual tasks in individual modules. Instead of actually compiling the code, Gradle queries the remote cache over HTTP and asks whether someone has already executed this task. If so, it simply downloads the result.

Running Gradle remote cache is easy because Gradle provides a Docker image. We managed to do it in three hours.

All we had to do was run Docker and add one line to the project. But although you can get it started quickly, making everything work well takes a lot of time.
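For reference, this is roughly what that "one line" expands to - enabling a remote HTTP build cache in settings.gradle.kts (the cache URL is a placeholder, and the push condition is just one common convention):

buildCache {
    remote<HttpBuildCache> {
        url = uri("https://gradle-cache.example.com/cache/")   // your cache service
        isPush = System.getenv("CI") == "true"                 // only CI agents publish to the cache
    }
}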

Below is a graph of cache misses.


At the very beginning the cache-miss rate was about 65%. After three weeks we managed to bring it down to 20%. It turned out that the tasks building the Android application have strange transitive dependencies, because of which Gradle missed the cache.

Enabling the cache sped up the build a lot. But besides the build itself, the instrumentation tests also run, and they run for a long time. Perhaps not all of them need to be run on every pull request. To find out which ones do, we use impact analysis.

Impact analysis

On a pull request, we take the git diff and find the modified Gradle modules.


It makes sense to run only those instrumentation tests that check the modified modules and all modules that depend on them. It makes no sense to run tests for neighboring modules: the code has not changed there, and nothing can break.
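A rough sketch of that idea (a hypothetical in-memory module graph, not Avito's actual implementation): starting from the modules touched by the diff, collect them plus everything that depends on them, directly or transitively.

// dependents maps a module to the modules that depend on it (the reverse dependency graph).
fun affectedModules(
    changed: Set<String>,
    dependents: Map<String, Set<String>>
): Set<String> {
    val result = mutableSetOf<String>()
    val queue = ArrayDeque(changed)
    while (queue.isNotEmpty()) {
        val module = queue.removeFirst()
        if (result.add(module)) {
            queue.addAll(dependents[module].orEmpty())   // walk upwards through reverse dependencies
        }
    }
    return result
}

Only the instrumentation tests belonging to the returned modules need to run on the pull request.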

With instrumentation tests things are not so simple: they have to live in the top-level application module, so we applied bytecode-analysis heuristics to figure out which module each test belongs to.

It took about eight weeks to modernize how the instrumentation tests work so that they only test the modules involved.

The measures to speed up the checks worked. From 45 minutes we got down to roughly 15. A quarter of an hour of waiting for the build is already tolerable.

But now developers began to complain that they didn't understand which builds were running, where to find the log, why a build was red, which test had failed, and so on.


Problems with feedback slow down development, so we tried to provide the clearest and most detailed information possible about every PR and build. We started with comments in Bitbucket on the PR, indicating which build had failed and why, and sent targeted messages in Slack. In the end, we made a dashboard for the PR page listing all the builds currently running and their status: queued, running, failed or completed. You can click on a build and get to its log.


Six weeks were spent on detailed feedback.

Plans

Let's move on to recent history. Having solved the issue of feedback, we reached a new level - we decided to build our own emulator farm. When there are many tests and emulators, they are difficult to manage. As a result, all our emulators moved to a k8s cluster with flexible resource management.

In addition, there are other plans.

  • Bring back Lint (and other static analysis). We are already working in this direction.
  • Run all end-to-end tests on all SDK versions as PR blockers.

So, we have traced the history of Continuous Integration at Avito. Now I want to give some advice from an experienced point of view.

Tips

If I could only give one piece of advice, it would be this one:

Please be careful with shell scripts!

Bash is a very flexible and powerful tool, and writing scripts in it is quick and convenient. But it can lure you into a trap, and we, unfortunately, fell into it.

It all started with simple scripts that ran on our build machines:

#!/usr/bin/env bash
./gradlew assembleDebug

But, as you know, everything grows and gets more complicated over time: let's call one script from another, let's pass some parameters into it - and in the end we had to write a function that determines the current level of bash nesting in order to substitute the right quotes and get it all to run.


You can imagine the labor involved in developing such scripts. I advise you not to fall into this trap.

What can it be replaced with?

  • Any real programming language. Writing in Python or Kotlin Script is more convenient, because it's programming, not scripting.
  • Or describe all the build logic as custom Gradle tasks for your project.

We decided to go with the second option, and now we are systematically deleting the bash scripts and writing lots of custom Gradle tasks.
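As an illustration, here is a minimal custom task in build.gradle.kts (the task name and the check itself are invented for the example, not one of our real tasks):

// Registers a verification task that fails the build if CHANGELOG.md is missing or empty.
tasks.register("checkChangelog") {
    group = "verification"
    description = "Fails the build if CHANGELOG.md was not updated"

    doLast {
        val changelog = file("CHANGELOG.md")
        if (!changelog.exists() || changelog.readText().isBlank()) {
            throw GradleException("CHANGELOG.md is missing or empty")
        }
    }
}

Such tasks are versioned together with the project, are easy to test, and plug straight into the existing Gradle build graph.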

Tip #2: Store infrastructure in code.

It is convenient when the Continuous Integration configuration is stored not in the Jenkins or TeamCity UI, but as text files directly in the project repository. This gives you versioning: it's easy to roll back or to build the code on another branch.

Scripts can be stored in a project. And what to do with the environment?

Tip #3: Docker can help with the environment.

It will definitely help Android developers, unfortunately not iOS yet.

Here is an example of a simple Dockerfile that contains the JDK and the Android SDK:

FROM openjdk:8

ENV SDK_URL="https://dl.google.com/android/repository/sdk-tools-linux-3859397.zip" \
    ANDROID_HOME="/usr/local/android-sdk" \
    ANDROID_VERSION=26 \
    ANDROID_BUILD_TOOLS_VERSION=26.0.2

# Download Android SDK
RUN mkdir "$ANDROID_HOME" .android \
    && cd "$ANDROID_HOME" \
    && curl -o sdk.zip $SDK_URL \
    && unzip sdk.zip \
    && rm sdk.zip \
    && yes | $ANDROID_HOME/tools/bin/sdkmanager --licenses

# Install Android Build Tool and Libraries
RUN $ANDROID_HOME/tools/bin/sdkmanager --update
RUN $ANDROID_HOME/tools/bin/sdkmanager "build-tools;${ANDROID_BUILD_TOOLS_VERSION}" \
    "platforms;android-${ANDROID_VERSION}" \
    "platform-tools"

RUN mkdir /application
WORKDIR /application

After writing this Dockerfile (a little secret: you don't have to write it yourself - just download a ready-made one from GitHub) and building the image, you get an environment in which you can build the application and run JUnit tests.

The two main arguments for why this makes sense are scalability and repeatability. Using Docker, you can quickly bring up a dozen build agents that all have exactly the same environment as the previous one. This makes life much easier for CI engineers. Getting the android-sdk into Docker is quite simple; with emulators it's a bit harder: you'll have to put in a little more work (or, again, download a ready-made image from GitHub).

Tip #4: don't forget that checks are done not for the sake of checks, but for people.

Fast and, most importantly, understandable feedback is very important for developers: what broke, which test failed, where to find the build log.

Tip #5: Be pragmatic when developing Continuous Integration.

Clearly understand what types of errors you want to prevent and how many resources - time, money, machine time - you are willing to spend. Checks that take too long can, for example, be moved to the nightly run. And those that catch only unimportant errors should be dropped entirely.

Tip #6: Use ready-made tools.

Now there are many companies that provide cloud CI.


For small teams, this is a good solution. You don't need to maintain anything, just pay some money, build your application and even run instrumentation tests.

Tip #7: in-house solutions pay off in a large team.

But sooner or later, as the team grows, in-house solutions become more cost-effective. There is one caveat here, though. Economics has the law of diminishing returns: in any project, each successive improvement is harder to achieve and requires more and more investment.

Economics describes our whole life, including Continuous Integration. I plotted a graph of the labor costs for each stage of the development of our Continuous Integration.


You can see that every improvement comes harder and harder. Looking at this graph, you realize that Continuous Integration needs to develop in step with the team's growth. For a team of two, spending 50 days developing an internal emulator farm is not a good idea. But for a large team, not doing Continuous Integration at all is just as bad an idea, because integration problems, fixing communication and so on will take even more time.

We started from the premise that automation is needed because people are expensive, make mistakes and are lazy. But automation is also done by people, so all the same problems apply to it.

  • Automation is expensive. Remember the labor-cost graph.
  • When automating, people make mistakes.
  • Sometimes it's tempting to skip automating, because everything works anyway. Why improve anything else, why all this Continuous Integration?

But I have statistics: errors are caught in 20% of builds. And that is not because our developers write bad code. It's because developers are confident that if they make a mistake, it won't get into develop - the automated checks will catch it. So they can spend more time writing code and doing interesting things instead of running and checking everything locally.

Do Continuous Integration. But in moderation.

By the way, Nikolay Nesterov not only gives great talks himself, but is also a member of the AppsConf program committee and helps others prepare meaningful talks for you. The completeness and usefulness of the upcoming conference's program can be judged from the topics in the schedule. And for the details, come to Infospace on April 22-23.

Source: habr.com
