SRE online intensive: we'll tear everything down to the ground, fix it, break it a couple more times, and then build it all again

How about we break something for a change? All we ever do is build and build, repair and repair. It is deadly boring.

Let's break it in such a way that we don't get into trouble for it, and better still, so that we even get praised for the mess. And then we will build everything again, an order of magnitude better, more fault-tolerant and faster.

And then we'll break it again.

Do you think this is a contest in wielding the most secret tool of our entire space program, the Big Russian Space Hammer?

No, this is the online SRE intensive. As it happens, no two Slurm SRE courses are ever alike. That is simply because, in a huge, complex system with an audience of several million and thousands of users connecting every second, you can never guess what will fall over, break, slow to a crawl, shut down, or find one of a hundred other ways to ruin the mood of the SRE engineers on duty.

In December we will hold another SRE intensive.

Let's do a little retrospective. Recall how, just a few years ago, HR departments were racing to see who could grab the most DevOps engineers for their company. The prize has changed. Now, like the Pantsir-S1 tracking system, they scan the surrounding area looking for SRE engineers. In the article “Eugene Varavva, developer at Google. How to describe Google in 5 words”, I described how an SRE engineer lives at Google, and how even a corporation of that size is short of SRE specialists.

At the Slurm SRE online intensive in December, over three days from 10:00 to 19:00, you will learn how to ensure the speed, fault tolerance and availability of sites with limited resources, resolve IT incidents, and run postmortems so that problems do not recur.

Course speakers:

Ivan Kruglov. Staff Software Engineer at Databricks. He has enterprise experience in distributed message delivery and processing, big data and the web stack, search, building an internal cloud, and service mesh.

Pavel Selivanov. Senior DevOps Engineer at Mail.ru Cloud Solutions. He has dozens of infrastructures built and hundreds of CI/CD pipelines written to his name. Certified Kubernetes Administrator. Author of several courses on Kubernetes and DevOps. Regular speaker at Russian and international IT conferences.

Everything will be tough, unpredictable and practical. You will build, break and repair, and not always in that order.

Build: You will define SLIs, SLOs and SLAs for a site consisting of several microservices; design the architecture and infrastructure that can meet them; build, test and deploy the site; set up monitoring and alerting.

Break: You will examine the internal and external factors that degrade SLOs: developer errors, infrastructure failures, traffic surges, DoS attacks. You will get to grips with resiliency, error budgets, testing practices, interrupt management and operational load.

Repair: You will practice organizing the work of the incident response team quickly and effectively: bring in colleagues, notify stakeholders, and set priorities.

Study: You will review the approach taken to the site from an SRE point of view. You will analyze incidents and determine how to avoid them in the future: improve monitoring, change the architecture, the approaches to development and operations, and the procedures. You will automate processes.
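To give a feel for the arithmetic behind an SLO and its error budget, here is a minimal Python sketch. It is not taken from the course materials: the function names, the 30-day window and the 99.9% target are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: the names, the window and the target are
# assumptions, not part of the Slurm SRE program.

def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime (in minutes) that an availability SLO allows over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent.

    1.0 means nothing has been spent; a negative value means the SLO is violated.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = total_requests * (1.0 - slo_target)
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # A 99.9% availability target over 30 days.
    print(error_budget_minutes(0.999, 30))
    # 250 failed requests out of one million against the same target.
    print(budget_remaining(0.999, 1_000_000, 250))
```

With a 99.9% target, a 30-day window leaves roughly 43.2 minutes of downtime, and 250 failures per million requests use up about a quarter of the budget. How a team reacts once the budget runs out (for example, by freezing releases) is a policy decision that the sketch does not cover.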

The SRE online intensive simulates real conditions: the time to restore the service will be extremely limited, just as in a real work situation.

You can find the terms of the SRE course and the full program at the link.

The online intensive is scheduled for December 2020. For those who pay in advance, we have prepared a discount.

Are you ready for intense learning, unusual tasks and sudden incidents?

It won't be easy. But there will be professional growth.

Source: habr.com
