How Uma.Tech developed infrastructure

We launched new services, traffic grew, we replaced servers, connected new sites and redesigned data centers - and now we will tell this story, the beginning of which we shared with you five years ago.

Five years is a natural point for summing up intermediate results, so we decided to talk about the development of our infrastructure, which over this period has taken a surprisingly interesting path that we are proud of. The quantitative changes we made have turned into qualitative ones: the infrastructure can now operate in modes that seemed fantastic in the middle of the past decade.

We support the most complex projects with the most stringent requirements for reliability and load, including PREMIER and Match TV. Sports broadcasts and premieres of popular TV series require delivering traffic measured in terabits per second; we handle this easily, and so often that working at such speeds has long become commonplace for us. Five years ago the heaviest project running on our systems was Rutube, which has since grown in volume and traffic, and that growth had to be taken into account when planning loads.

We have already described how we developed the hardware of our infrastructure ("Rutube 2009-2015: the history of our hardware") and the system responsible for delivering video ("From zero to 700 gigabits per second - how one of the largest video hosting sites in Russia delivers video"). A lot of time has passed since those texts were written; many other solutions have been designed and implemented, and they allow us to meet modern requirements and remain flexible enough to adapt to new tasks.


The network core is something we develop constantly. We switched to Cisco equipment back in 2015, as we mentioned in the previous article. Back then it was still 10/40G, but for obvious reasons we upgraded the existing chassis a few years later, and now we also actively use 25/100G.


100G links have long been neither a luxury (in our segment they are rather a pressing requirement of the times) nor a rarity (more and more operators offer connections at such speeds). However, 10/40G remains relevant: over these links we continue to connect operators with small traffic volumes, for whom a higher-capacity port is not yet justified.
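
To make that port-selection idea concrete, here is a minimal sketch in Python; the tier list and the headroom factor are illustrative assumptions for the example, not our actual planning rules.

```python
# A minimal sketch of the port-selection logic described above.
# The tier list and headroom factor are illustrative assumptions,
# not actual planning rules.
PORT_TIERS_GBPS = [10, 25, 40, 100]

def pick_port_speed(peak_traffic_gbps: float, headroom: float = 1.5) -> int:
    """Return the smallest standard port speed that still leaves
    headroom above the operator's peak traffic."""
    required = peak_traffic_gbps * headroom
    for speed in PORT_TIERS_GBPS:
        if speed >= required:
            return speed
    return PORT_TIERS_GBPS[-1]

# A small operator peaking at 4 Gbit/s stays on a 10G port,
# while one peaking at 55 Gbit/s already warrants a 100G port.
print(pick_port_speed(4))   # -> 10
print(pick_port_speed(55))  # -> 100
```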

The network core we built deserves separate consideration and will become the topic of its own article a little later. There we will dive into the technical details and the logic behind our decisions when building it. For now we will continue to sketch the infrastructure more schematically, since your attention, dear readers, is not unlimited.

Video delivery servers evolve rapidly, and we put a lot of effort into them. Where we previously used mainly 2U servers with 4-5 network cards, each with two 10G ports, most of the traffic now goes out from 1U servers with 2-3 cards, each with two 25G ports. 10G and 25G cards cost almost the same, while the faster cards can serve traffic over both 10G and 25G. The result is clear savings: fewer server components and cables means lower cost (and higher reliability), and the components take up less rack space, so more servers fit per unit of area and rental costs go down.

But the gain in speed is even more important! Now we can deliver more than 100G from a single 1U server, and this against the backdrop of some large Russian projects calling 40G of output from 2U an "achievement". We would have their problems!
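
For a rough, back-of-the-envelope feel for that density gain, the arithmetic looks roughly like this; the card counts mirror the configurations described above, and driving every port at line rate is an assumption made only for the example.

```python
# Back-of-the-envelope comparison of traffic density per rack unit.
# NIC counts mirror the configurations described above; assuming every
# port is driven at line rate, which real servers only approach.

def density_gbps_per_u(rack_units, cards, ports_per_card, port_speed_gbps):
    """Aggregate NIC capacity of one server divided by the rack units it occupies."""
    return cards * ports_per_card * port_speed_gbps / rack_units

old = density_gbps_per_u(rack_units=2, cards=4, ports_per_card=2, port_speed_gbps=10)
new = density_gbps_per_u(rack_units=1, cards=2, ports_per_card=2, port_speed_gbps=25)

print(f"old 2U server: {old:.0f} Gbit/s per U")  # 40 Gbit/s per U
print(f"new 1U server: {new:.0f} Gbit/s per U")  # 100 Gbit/s per U
```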


Note that we still use the previous generation of network cards that can only operate at 10G. This equipment works stably and is very familiar to us, so instead of throwing it away we found a new use for it: we installed these cards in video storage servers, for which one or two 1G interfaces are clearly not enough to operate efficiently, and 10G cards turned out to be just right.

Storage systems are also growing. Over the past five years they have changed from twelve-disk chassis (12x HDD, 2U) to thirty-six-disk ones (36x HDD, 4U). Some are afraid to use such capacious "carcasses": if one such chassis fails, it can threaten the performance - or even the availability! - of the entire system. But this will not happen with us: we provide redundancy at the level of geo-distributed copies of data. We spread the chassis across different data centers - we use three in total - and this eliminates problems both with a chassis failure and with an entire site going down.


Of course, this approach made hardware RAID redundant, and we abandoned it. By eliminating that redundancy we simultaneously increased the reliability of the system: the solution became simpler, and one of the potential points of failure was removed. Recall that our storage systems are "home-made"; we took that route quite consciously, and the result has fully suited us.
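
To illustrate the idea of geo-distributed copies, here is a small sketch of replica placement across independent data centers. It is only a sketch: the data-center names, chassis layout and placement policy are invented for the example and are not our actual storage code.

```python
# Sketch of geo-distributed replica placement: each object gets copies on
# chassis in different data centers, so losing one chassis (or a whole site)
# never removes the last copy. All names and the policy are illustrative.
import hashlib

DATACENTERS = {
    "dc1": ["dc1-chassis-01", "dc1-chassis-02"],
    "dc2": ["dc2-chassis-01", "dc2-chassis-02"],
    "dc3": ["dc3-chassis-01", "dc3-chassis-02"],
}

def place_replicas(object_id: str, copies: int = 2) -> list[str]:
    """Pick one chassis in each of `copies` different data centers,
    deterministically derived from the object id."""
    digest = int(hashlib.sha256(object_id.encode()).hexdigest(), 16)
    dcs = sorted(DATACENTERS)              # ['dc1', 'dc2', 'dc3']
    start = digest % len(dcs)
    placement = []
    for i in range(copies):
        dc = dcs[(start + i) % len(dcs)]   # always distinct data centers
        chassis = DATACENTERS[dc]
        placement.append(chassis[digest % len(chassis)])
    return placement

if __name__ == "__main__":
    # Losing any single chassis, or even a whole site, leaves at least one copy.
    print(place_replicas("video/12345.mp4"))
```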

Data centers are something we have changed several times over the past five years. Since the previous article was written, only one data center - DataLine - has stayed the same; the rest had to be replaced as our infrastructure developed. All transfers between sites were planned.

Two years ago we migrated within MMTS-9, moving to a site with a high-quality fit-out, a good cooling system, stable power supply and no dust - the dust that used to lie in thick layers on every surface and clog the insides of our equipment. The choice in favor of quality of service - and the absence of dust! - was the reason for the move.


Almost always "one move equals two fires", but the challenges of migration differ each time. This time the main difficulty of moving within one data center was "provided" by the optical cross-connects: their abundance across floors, never consolidated by the telecom operators into a single cross-connect. The process of updating and re-routing the cross-connects (in which the MMTS-9 engineers helped us) was perhaps the hardest stage of the migration.

The second migration took place a year ago: in 2019 we moved from a not very good data center to O2xygen. The reasons for the move were similar to those discussed above, with the added problem that the original data center was unattractive to telecom operators - many providers had to "stretch" their networks to that site on their own.


The migration of 13 racks to a high-quality site in MMTS-9 made it possible to develop this location not just as an operator point of presence (a couple of racks and operator interconnects) but also as one of our main sites. This somewhat simplified the migration out of the not-so-good data center: we moved most of its equipment to another site, while O2xygen was given the role of a growing one, with 5 racks of equipment sent there as well.

Today O2xygen is already a full-fledged site, where the operators we need have "arrived" and new ones keep connecting. For the operators, O2xygen has also proved attractive from the standpoint of strategic development.

We always carry out the main phase of a move overnight, and we followed this rule during the migrations within MMTS-9 and to O2xygen. We stress that we strictly observe the "move overnight" rule regardless of the number of racks! There was even a case when we moved 20 racks and still finished within one night. Migration is a fairly straightforward process that demands accuracy and consistency, but there are tricks here too - in the preparation, in the move itself, and in the deployment at the new location. We are ready to talk about migration in detail if you are interested.

The results

We like five-year development plans. We have finished building a new fault-tolerant infrastructure distributed across three data centers. We have dramatically increased the density of traffic output: where recently we were happy with 40-80G from 2U, it is now normal for us to deliver 100G from 1U. Even a terabit of traffic now feels commonplace to us. We are ready to develop our infrastructure further; it has turned out to be flexible and scalable.

A question: what should we tell you about in future texts, dear readers? Why we started building our own data storage systems? The network core and its features? The tricks and subtleties of migrating between data centers? Optimizing our solutions by selecting components and fine-tuning parameters? Building resilient solutions with multiple levels of redundancy and horizontal scalability inside a data center, implemented across a structure of three data centers?

Author: Petr Vinogradov - Technical Director of Uma.Tech

Source: habr.com
