Slurm SRE - learning to ensure user happiness

On February 3, Slurm SRE starts in Moscow.

This is the first intensive where we have abandoned the "repeat after the teacher" format. You will work on an SRE project under conditions as close to real production as possible.

You will be handed a full-fledged working project and will work with it in real time. Typical SRE challenges await you: unfamiliar code, synchronization problems in distributed systems, and difficulties communicating with colleagues.

Non-trivial system failures taken from real life await you. (From time to time I hear from the speakers: “Colleagues, I’m sorry, I won’t be able to join the meetings for the next two days, but an excellent case for our program has just come up.”)

Incidents will develop rapidly, because every second means lost profit for our training company.

We will divide the participants into teams. Each team will have a mentor, one of the course speakers, and each team is responsible for its own backend. As incidents develop, you will need to organize work within your team and coordinate with other teams. We play for points: the judges will deduct and award points so that each team can see how appropriate and effective its actions are. At the end we will announce the winner.

After each incident there will be a debriefing where we will identify and correct systemic problems in the processes. Mentors will enforce a blameless postmortem culture. In our area, the blameless approach is not yet widespread, but it is one of the keys to implementing SRE and DevOps.

We expect to achieve a complete paradigm shift in three days: to teach you to think like an SRE engineer and to look at a project the way an SRE engineer does.

To participate, you will need a laptop, a headset, and basic knowledge of Kubernetes. If you lack the last item, you can complete the Slurm Kubernetes online course in the time remaining.

Register here.

Source: habr.com
