Slurm Mega. Setting up a production-ready cluster, 3 tips from speakers and Slurm with Luke Skywalker and R2D2

Slurm Mega, an advanced Kubernetes intensive, ended on November 24. The next Mega will be held in Moscow on May 18-20.

The idea of Slurm Mega: we look under the hood of the cluster, work through the intricacies of installing and configuring a production-ready cluster in theory and in practice (“the-not-so-easy-way”), and examine the mechanisms that keep applications secure and fault-tolerant.

Mega Bonus: those who complete Slurm Basic and Slurm Mega get all the knowledge needed to pass the CNCF's CKA exam, plus a 50% discount on the exam.

Special thanks to Selectel for providing a cloud for practice, thanks to which each participant worked in their own full-fledged cluster, and we did not have to add an extra 5 thousand to the ticket price for this.

I won't explain who Bondarev and Selivanov are; anyone who is interested can read about them here.

Slurm Mega. First day.

On the first day of Slurm Mega, we loaded the participants with 4 topics. Pavel Selivanov explained how a fault-tolerant cluster is built from the inside and how kubeadm works, and covered testing and troubleshooting the cluster.

First coffee break. Usually “the bell rings for the teacher”, but at Slurm the teachers keep answering questions while the students drink coffee.

And although a “Break II” cloud is hovering over Pavel Selivanov's head, he is not destined to get a break.

Sergey Bondarev and Marcel Ibraev are waiting for their turn at the lectern.

During a break, I approached Sergey Bondarev and asked: “What advice would you give to all Kubernetes engineers based on your experience with our clients’ clusters?”

Sergey gave a simple recommendation: “Block access to the API server from the Internet, because vulnerabilities that let unauthorized users into the cluster turn up periodically.”

A couple of minutes and a bottle of mineral water later, Pavel Selivanov threw himself back into the fight with the topic “Authorization in a cluster using an external provider”, namely LDAP (Nginx + Python) and OIDC (Dex + Gangway).

During the next break, Marcel Ibraev, Slurm speaker and Certified Kubernetes Administrator, gave his advice to Kubernetes engineers: “I will say something that seems banal, but given how often I run into it, I suspect not everyone takes it into account. Do not blindly trust any how-to from the Internet that tells you how well this or that solution works. In the context of Kubernetes this matters even more. Kubernetes is a complex system, and adding a solution that has not been tested specifically in your project and your cluster installation can have sad consequences, no matter how much it is praised on the Internet. Even Kubernetes itself, without a balanced approach, can harm your project: ‘what is good for a Russian is death for a German.’ So test, check, and break in any solution before rolling it out in your own environment. Only then will you account for all the nuances that may arise.”

After lunch, Sergey Bondarev joined the fight. His topic was Network Policy, namely an introduction to CNI and Network Security Policy.

The Internet is full of articles about Network Policy. There is an opinion among admins that you can do without Network Policies, but security people love this tool very much and require Network Policies to be enabled.
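
As a rough illustration of what “enabling Network Policies” can look like, here is a minimal sketch (not taken from the course materials) that builds a default-deny ingress NetworkPolicy as a plain Python dictionary and prints it as JSON. The function name, the policy name, and the namespace "demo" are assumptions made for the example; the manifest fields follow the standard networking.k8s.io/v1 API.

```python
import json


def default_deny_ingress(namespace):
    """Build a NetworkPolicy that selects every pod in the namespace
    and lists no ingress rules, so all inbound traffic is denied."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # The policy name is an arbitrary choice for this example.
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            # An empty podSelector matches all pods in the namespace.
            "podSelector": {},
            # Declaring "Ingress" with no ingress rules blocks all inbound traffic.
            "policyTypes": ["Ingress"],
        },
    }


if __name__ == "__main__":
    # "demo" is a hypothetical namespace used only for illustration.
    print(json.dumps(default_deny_ingress("demo"), indent=2))
```

The printed JSON can be saved to a file and applied with kubectl, which accepts JSON manifests as well as YAML. Keep in mind that such a policy only takes effect if the cluster's CNI plugin actually enforces Network Policies.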

Pavel Selivanov then took the Kubernetes helm from Sergey Bondarev with the topic “Secure and Highly Available Applications in a Cluster”. It covers his favorite subjects: PodSecurityPolicy, PodDisruptionBudget, LimitRange/ResourceQuota.

This is the Mega topic that Pavel also presented at DevOpsConf: how to break a Kubernetes cluster easily and quickly and get all rights in 5 minutes.

After hearing how easily a Kubernetes cluster can be hacked, skeptical admins say: “Yeah, I told you, your Kubernetes is bullshit full of holes.” Pavel explains that security in a cluster can be set up and that it is not difficult; the security settings are simply disabled by default. Details are in the transcript of the talk.

“Who broke the cluster? He's the one who broke it! I can see perfectly well from here!”

At Slurm nothing is ever simple and easy, so that nobody gets bored. But this time it was Telegram that decided to let everyone down:

Marcel Ibraev, [Nov 22, 2019, 16:52:52]:
Colleagues, Telegram is experiencing outages at the moment, please keep this in mind

On this bright note the first day, packed with practical knowledge, came to an end. The second day promised even more practice: launching a database cluster using PostgreSQL as an example, launching a RabbitMQ cluster, and managing secrets in Kubernetes.

Slurm Mega. Second day.

The host started the second day with a cheerful announcement: “This morning, as Pavel put it yesterday, real hardcore awaits us. In the language of surgeons, we are going to dig into the guts of Kubernetes!”

The entertainer is a story in itself. One of Slurm's problems is that people switch off and fall asleep from information overload. We have always been looking for a way to deal with this, and at the previous Slurm small games with the audience worked well. This time we hired a specially trained person. There were plenty of jokes in the chat about “interesting contests”, but the fact remains that we have never seen such cheerful participants.

Marcel Ibraev came to the rescue and took on stateful applications in the cluster, namely launching a database cluster using PostgreSQL as an example and launching a RabbitMQ cluster.

After lunch Sergey Bondarev took over K8S. His theme was “Keeping Secrets”, and Mulder and Scully had his back. We studied secret management in Kubernetes and Vault. And also, “The truth is out there”.

This continued until late in the evening, when Pavel Selivanov spoke about the Horizontal Pod Autoscaler.
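
For readers who have not met the Horizontal Pod Autoscaler, the scaling rule described in the Kubernetes documentation can be sketched in a few lines. This is an illustration only, not code from the course: the function name and parameters are assumptions, and the real controller also handles missing metrics, pods that are not ready, stabilization windows, and the min/max replica limits.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Simplified HPA rule: scale in proportion to how far the observed
    metric is from its target, rounding up to a whole number of pods."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Inside the tolerance band (0.1 by default in Kubernetes) nothing changes.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 pods at 200m CPU each against a 100m target: the ratio is 2.0.
    print(desired_replicas(4, 200, 100))
```

With these inputs the ratio is 2.0, so the sketch asks for 8 replicas; at half the target load it would ask for 2.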

Slurm Mega. Third day.

Bright and early, Sergey Bondarev woke up the audience with backup and disaster recovery. Cluster backup and restore with Heptio Velero and etcd were tested hands-on.

Sergey continued with the annual rotation of certificates in the cluster: renewing control-plane certificates using kubeadm. Just before lunch, either to whet the participants' appetite or to kill it completely, Pavel Selivanov raised the topic of application deployment.

Templating and deployment tools were covered, as well as deployment strategies.

Pavel Selivanov presented a new topic: Service Mesh and installing Istio. The topic turned out to be so rich that it could fill a separate intensive. We are discussing plans; stay tuned for announcements.

The main thing is that everything works correctly, because it's time for practice: building CI/CD that runs an application deployment and a cluster upgrade at the same time. In training projects everything works well. Real life, though, is sometimes full of surprises.

May the Slurm be with you!

Source: habr.com