Monorepositories: please

Translation of the article prepared for students of the course "DevOps practices and tools" in the educational project OTUS.

You should choose a monorepo because the behavior it promotes in your teams is transparency and shared responsibility, especially as teams grow. Either way, you will have to invest in tooling, but it is always better when the default behavior is the behavior you want to see in your teams.

Why are we talking about this?

Matt Klein wrote an article, "Monorepos: Please don't!" (translator's note: a translation is available on Habr as "Monorepositories: please don't"). I like Matt, I think he is very smart, and you should read his point of view. He originally posted a poll on Twitter:

The tweet, in translation:
This New Year's Day, I'm going to argue about how ridiculous monorepositories are. 2019 started off uneventfully. In that spirit, I offer you a poll. Who are the bigger fanatics? Supporters of:
  • Monorepositories
  • Rust
  • Wrong poll / both

My response was, "I am literally both of those people." Instead of talking about what a drug Rust is, let's look at why I think he is wrong about monorepositories. A little about me: I'm the CTO of Chef Software. We have about 100 engineers, a code base that is about 11-12 years old, and 4 core products. Some of this code is in a polyrepository (my starting position), some is in a monorepository (my current position).

Before I begin: every argument I make here will apply to both kinds of repositories. In my opinion, there is no technical reason why you should choose one or the other type of repository. You can make any approach work. I'm happy to talk about it, but I'm not interested in artificial technical reasons why one is superior to the other.

I agree with the first part of Matt's point:

Because at scale, a monorepo has to solve all the same problems that a polyrepository has to solve, while also encouraging you to couple your code tightly and demanding incredible effort to scale your version control system.

You have to solve the same problems regardless of whether you choose a monorepository or a polyrepository. How do you ship releases? What is your approach to upgrades? Backward compatibility? Cross-project dependencies? What architectural styles are acceptable? How do you manage your build and test infrastructure? The list is endless. And you will solve all of them as you grow. There is no free lunch.

I think Matt's argument is similar to the views shared by many engineers (and managers) that I respect. It comes from the perspective of an engineer working on a component, or a team working on a component. You hear things like:

  • The codebase is cumbersome - I don't need all this junk.
  • It's harder to test because I have to test all this crap that I don't need.
  • It is more difficult to work with external dependencies.
  • I need my own virtual version control systems.

Of course, all these points are valid. They arise in both cases: in a polyrepository I have my own junk, and in addition to what is needed for the build ... I may need other junk. So I "just" create tools that check out the entire project. Or I create a fake monorepo with submodules. We could go around in circles on this all day. But I think Matt's argument misses the main reason, the one that tipped me firmly in favor of a monorepo:

It provokes communication and exposes problems

When we separate repositories, we create a de facto problem of coordination and transparency. This matches how we think about teams (especially how individual members think about them): we are responsible for a certain component. We work in relative isolation. The boundaries are drawn around my team and the component(s) we are working on.

As the architecture becomes more complex, one team can no longer manage it alone. Very few engineers keep the whole system in their head. Let's say you manage a shared component A that is used by teams B, C, and D. Team A is refactoring, improving the API, and changing the internal implementation. As a result, the changes are not backward compatible. What advice would you give?

  • Find all places where the old API is used.
  • Are there places where the new API cannot be used?
  • Can you fix and test other components to make sure they don't break?
  • Can these teams test your changes right now?

Note that these questions are independent of the repository type. You will need to find teams B, C and D. You will need to talk to them, figure out the time, understand their priorities. At least we hope you do.
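
The first question in the list, finding every place where the old API is used, is the one a single checkout makes easiest to answer. Here is a minimal sketch, assuming a hypothetical layout where each top-level directory in the repository belongs to one team; the function name, the directory names, and the `.py` filter are illustrative assumptions, not something the original article prescribes.

```python
import re
from collections import defaultdict
from pathlib import Path


def find_old_api_usages(repo_root, symbol, suffixes=(".py",)):
    """Group files that mention `symbol` by their top-level directory.

    Assumption: every top-level directory under repo_root is owned by one
    team, so the directory name tells us whom to talk to. A file that sits
    directly in the root is reported under its own file name.
    """
    # Match the symbol only as a whole word, so `old_call_extra`
    # does not count as a usage of `old_call`.
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    root = Path(repo_root)
    usages = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if pattern.search(text):
            rel = path.relative_to(root)
            usages[rel.parts[0]].append(rel.as_posix())
    return dict(usages)
```

Calling `find_old_api_usages(".", "old_call")` from the repository root returns a dictionary keyed by top-level directory. The search is purely textual, so it only produces the list of teams to go and talk to; it does not replace the conversation.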

In fact, no one wants to do this. It's a lot less fun than just fixing the damn API. All of this is human and messy. In a polyrepository, you can just make your changes, have them reviewed by the people who work on that component (probably not B, C, or D), and move on. Teams B, C, and D can just stay on their current version for now. They will upgrade when they recognize your genius!

In a monorepository, responsibility is shifted by default. Team A changes their component and, if they are not careful, immediately breaks B, C, and D. As a result, B, C, and D show up at A's door, wondering why Team A broke the build. This teaches A that they cannot skip my list above. They have to talk about what they are going to do. Can B, C, and D migrate? What if B and C can, but D depended closely on a side effect of the old algorithm's behavior?

Then we should talk about how we will get out of this situation:

  1. Support for several internal APIs, with the old algorithm deprecated until D can stop using it.
  2. Support for multiple release versions, one with the old interface, one with the new one.
  3. Delaying the release of A's changes until B, C, and D can all accept it at the same time.

Let's say we chose option 1, multiple APIs. In this case, we have two pieces of code: old and new. This is pretty handy in some situations. We bring the old code back, mark it as deprecated, and agree on a removal schedule with team D. This is essentially identical for poly and mono.
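
A minimal sketch of what option 1 can look like in code, assuming a hypothetical function `summarize` in component A whose new version rejects empty input, while team D still relies on the old behavior of returning 0.0. All names and behaviors here are invented for illustration.

```python
import warnings


def summarize(values, *, strict=True):
    """New API (hypothetical): return the mean, rejecting empty input."""
    if not values:
        if strict:
            raise ValueError("values must not be empty")
        return 0.0
    return sum(values) / len(values)


def summarize_legacy(values):
    """Old API, kept only until team D stops using it.

    Preserves the old side effect D depends on: empty input yields 0.0.
    """
    warnings.warn(
        "summarize_legacy is deprecated; use summarize instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return summarize(values, strict=False)
```

Teams B and C call `summarize`, while team D keeps calling `summarize_legacy` and sees a `DeprecationWarning` pointing at its own call site. The removal date itself is still something the teams have to agree on; the code only makes the debt visible.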

To release multiple versions, we need a branch. Now we have two components: A1 and A2. Teams B and C use A2, and D uses A1. We need every component to be ready for release, because security updates and other bug fixes may be needed before D can move on. In a polyrepository, we can tuck this away in a long-lived branch, which feels fine. In a monorepository, we force the code to be created in a new module. Team D will still have to make changes to the "old" component. Everyone can see the cost we are paying here: we now have twice as much code, and any bug fix that is relevant to both A1 and A2 has to be applied to both of them. With the branch approach in a polyrepository, this is hidden behind a cherry-pick. We perceive the cost as lower because there is no visible duplication. In practice, the cost is the same: you will build, release, and maintain two largely identical code bases until you can remove one of them. The difference is that in a monorepository this pain is direct and visible. It hurts more, and that is a good thing.

Finally, we get to the third option: delaying the release. It is possible that the changes made by A only improve the lives of team A. Important, but not urgent. Can we just delay? In a polyrepository, we do this by pinning the artifact version. Of course, we tell team D about it. Just stay on the old version until you catch up! This sets you up to play the coward. Team A continues to work on their component, ignoring the fact that Team D is using an increasingly outdated version (that is Team D's problem, they are stupid). Meanwhile, Team D speaks badly about Team A's careless attitude towards code stability, if they talk about it at all. Months pass. Finally, Team D decides to look into upgrading, but the changes in A have only piled up. Team A barely remembers when and how they broke D. The upgrade is more painful and will take longer, which sends it further down the priority stack. Until the day a security issue in A forces us to branch. Team A has to go back in time, find a point where D was stable, fix the problem there, and make it release-ready. This is the de facto choice people make, and it is by far the worst. It seems good for both Team A and Team D as long as we can ignore each other.

In a monorepository, the third option is really not available. You have to deal with the situation in one of the first two ways. You have to see the overhead of having two release branches. You learn to protect yourself from updates that break backward compatibility. But most importantly: you cannot avoid the difficult conversation.

In my experience, when teams get big, it is no longer possible to keep the entire system in your head, and that is the most important part. You must improve the visibility of disagreements in the system. You must actively work to get teams to lift their eyes from their own components and look at the work of other teams and consumers.

Yes, you can create tools that try to solve the problems of polyrepositories. But my experience of learning continuous delivery and automation in large enterprises tells me this: the default behavior, without any additional tools, is the behavior you should expect to see. The default behavior of a polyrepository is isolation, that's the whole point. The default behavior of a monorepository is shared responsibility and transparency, that's the whole point. In both cases, I am going to build tooling that smooths the rough edges. As a leader, I will choose a monorepository every time, because the tools should reinforce the culture I want, and culture comes from tiny decisions and the day-to-day work of the team.


Source: habr.com
