DevOps vs DevSecOps: how it looked in one bank

The bank outsources its projects to many contractors. The "outsiders" write the code and then hand over the results in a rather inconvenient form. Specifically, the process looked like this: a project would pass functional tests on the contractor's side and was then tested again inside the banking perimeter for integration, load, and so on. It often turned out that the tests failed, and everything went back to the external developer. As you can guess, this meant long bug-fix cycles.

The bank decided that it was both possible and necessary to bring the entire pipeline in-house, "under its wing", from commit to release, so that everything would be uniform and under the control of the bank's own product teams. In effect, the external contractor would work as if in the next room of the office, on the corporate stack. This is ordinary DevOps.

Where did the "Sec" come from? The bank's security team placed high demands on how an external contractor may work inside the network segment, who gets access to what, and how and by whom the code is handled. Information security simply didn't know yet that when contractors work outside, few banking standards are followed. And then, within a couple of days, everyone has to start complying with all of them.

The simple revelation that a contractor had full access to the product code was enough to turn their world upside down.

This is where the DevSecOps story I want to tell began.

What practical conclusions did the bank draw from this situation?

There was a lot of controversy about everything being done wrong in the field. Developers said that security was only busy getting in the way of development and, like watchmen, tried to prohibit things without thinking. The security people, in turn, wavered between two points of view: "developers create vulnerabilities in our perimeter" and "developers don't create vulnerabilities, they are the vulnerability." The debate would have dragged on for a long time if not for new market requirements and the emergence of the DevSecOps paradigm. It became possible to explain that automating processes with information security requirements built in out of the box would leave everyone satisfied: the rules are written down up front and do not change mid-game (security will not unexpectedly ban something), and developers keep security informed about everything that happens (security is not caught off guard). Each team is also responsible for the end-to-end security of its product, not some abstract big brothers.

  1. Since external employees already had access to the code and a number of internal systems, the requirement "development must be carried out entirely on the bank's infrastructure" could probably be removed from the documents.
  2. On the other hand, control over what was happening had to be strengthened.
  3. The compromise was the creation of cross-functional teams, in which bank employees work closely with external people. In that case, the teams must be made to work with tools on the bank's servers, from beginning to end.

That is, contractors can be allowed in, but separate segments must be created for them, so that they cannot drag some infection from outside into the bank's infrastructure and cannot see more than necessary. And, of course, so that their actions are logged. DLP for protection against data leaks; all of this was applied.

In principle, all banks come to this sooner or later. Here we went down the beaten path and agreed on the requirements for the environments where the "outsiders" work. A maximum set of access control tools appeared, plus vulnerability scanners and anti-virus analysis on the environments, on builds, and in tests. This is called DevSecOps.

It suddenly became clear that, while before DevSecOps banking security had no control over what happens on the developer side, in the new paradigm security is monitored in the same way as regular infrastructure events. Only now there are alerts on builds, library control, and so on.

All that remained was to move the teams to the new model and create the infrastructure. But those are trifles, like "drawing the rest of the owl" in the famous joke. Actually, we helped with the infrastructure, and meanwhile the development processes were changing.

What has changed

We decided to implement it in small steps, because we understood that many processes would fall apart and that many "outsiders" might not withstand the new working conditions under everyone's supervision.

First, we created cross-functional teams and learned how to organize projects taking the new requirements into account: we discussed, organizationally, which processes would work and how. The result was a build pipeline scheme with everyone responsible identified.

  • CI: Git, Jenkins, Maven, Roslyn, Gradle, JUnit, Jira, MF Fortify, CA Harvest, GitLab CI.
  • CD: Ansible, Puppet, TeamCity, GitLab, TFS, Liquibase.
  • Test: SonarQube, SoapUI, JMeter, Selenium, MF Fortify, Performance Center, MF UFT, Ataccama.
  • Presentation (reporting, communication): Grafana, Kibana, Jira, Confluence, RocketChat.
  • Operations (maintenance, management): Ansible, Zabbix, Prometheus, Elastic + Logstash, MF Service Manager, Jira, Confluence, MS Project.

Chosen stack:

  • Knowledge base - "Atlassian Confluence";
  • Task tracker - "Atlassian Jira";
  • Artifact Repository - "Nexus";
  • Continuous integration system - "GitLab CI";
  • Continuous analysis system - "SonarQube";
  • Application security analysis system - "Micro Focus Fortify";
  • Communication system - "GitLab Mattermost";
  • Configuration management system - "Ansible";
  • Monitoring system - "ELK", "TICK Stack" ("InfluxData").
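
To give a feel for how this stack ties together in practice, here is a minimal GitLab CI sketch in the spirit of the pipeline described; all job names, CI variables, and the Fortify invocation are illustrative assumptions, not the bank's actual configuration:

```yaml
# A minimal sketch: build, quality gate, security scan, publication to Nexus.
# SONAR_HOST_URL and NEXUS_URL are assumed CI variables.
stages: [build, analyze, publish]

build:
  stage: build
  script:
    - mvn -B package

sonarqube:
  stage: analyze
  script:
    - mvn -B sonar:sonar -Dsonar.host.url=${SONAR_HOST_URL}

fortify:
  stage: analyze
  script:
    # Fortify SCA invocation is environment-specific; shown schematically.
    - sourceanalyzer -b "${CI_PROJECT_NAME}" mvn -B package
    - sourceanalyzer -b "${CI_PROJECT_NAME}" -scan -f result.fpr

publish:
  stage: publish
  script:
    - mvn -B deploy -DaltDeploymentRepository=nexus::default::${NEXUS_URL}
```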

Then we began to assemble a team that would be ready to bring the contractors inside. We came to realize several important things:

  • Everything should be unified, at least in how code is handed over. There were as many different development processes, each with its own quirks, as there were contractors; everyone had to be fit into roughly one process, but with options.
  • There are many contractors, and manual creation of infrastructure will not do. Any new task must be launched very quickly: an instance must be deployed almost instantly, so that developers receive a ready set of solutions for managing their pipeline.

To take the first step, it was necessary to understand what was being done in general and how to get there. We started by helping to draw the architecture of the target solution, both for the infrastructure and for the CI/CD automation. Then we began to assemble this pipeline. We needed one infrastructure, the same for everyone, on which identical pipelines would run. We offered options with cost calculations, the bank thought it over, and then decided what would be built and with what funds.

Next came the creation of the environment: software installation and configuration, and the development of scripts for infrastructure deployment and management. After that, the transition to pipeline support.

We decided to test everything on a pilot. Interestingly, in the course of the pilot a certain stack appeared in the bank for the first time. Among other things, a domestic vendor was proposed for one of the solutions, for an early launch at pilot scale. Security got to know this vendor during the pilot, and it left an unforgettable impression. When we decided to move on, the infrastructure layer was, fortunately, replaced with a Nutanix solution that was already in use at the bank. It had previously served VDI, and we reused it for infrastructure services. At small volumes it did not make economic sense, but at large volumes it became an excellent environment for development and testing.

The rest of the stack is more or less familiar to everyone. The automation tooling around Ansible was used, and the security people worked closely with it. The Atlassian stack had been used by the bank before the project. The Fortinet security tools were proposed by the security people themselves. The testing framework was created by the bank, no questions asked. What did raise questions was the repository system; we had to get used to it.

The contractors were given the new stack. They were given time to rewrite for GitLab CI, to migrate Jira into the bank's segment, and so on.

Step by step

Step 1. First, a solution from a domestic vendor was used; the product was connected to a newly created DSO network segment. The platform was chosen for delivery time, scalability, and full automation capabilities. We tested:

  • The possibility of flexible and fully automated management of the virtualization platform's infrastructure (network, disk subsystem, compute subsystem).
  • Automation of virtual machine lifecycle management (templating, snapshots, backups).

After installation and basic configuration, the platform was used as a hosting point for the second-stage subsystems (DSO tools, development environments for the retail systems). The necessary sets of pipelines were created: creating, deleting, changing, and backing up virtual machines. These pipelines were the first step of the deployment process.
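
Schematically, such lifecycle pipelines can be sketched as manual GitLab CI jobs that delegate the actual work to Ansible playbooks talking to the platform API (all playbook and variable names here are hypothetical):

```yaml
# A minimal sketch: manual jobs covering the VM lifecycle.
stages: [vm]

.vm_job:
  stage: vm
  when: manual

create_vm:
  extends: .vm_job
  script:
    - ansible-playbook vm_create.yml -e "vm_name=${VM_NAME} template=${VM_TEMPLATE}"

modify_vm:
  extends: .vm_job
  script:
    - ansible-playbook vm_modify.yml -e "vm_name=${VM_NAME} cpu=${CPU} ram=${RAM}"

backup_vm:
  extends: .vm_job
  script:
    - ansible-playbook vm_backup.yml -e "vm_name=${VM_NAME}"

delete_vm:
  extends: .vm_job
  script:
    - ansible-playbook vm_delete.yml -e "vm_name=${VM_NAME}"
```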

As a result, the equipment provided turned out not to meet the bank's performance and fault-tolerance requirements. The bank's IT department decided to build a complex based on a Nutanix hardware-software appliance.

Step 2. We took the stack that had been defined and wrote automated installation and post-configuration scripts for all subsystems, so that everything could be transferred from the pilot to the target environment as quickly as possible. All systems were deployed in a fault-tolerant configuration (except where this was limited by the vendor's licensing policy) and connected to the subsystems for collecting metrics and events. Security analyzed compliance with its requirements and gave the green light.
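
The idea behind these scripts can be sketched as one Ansible playbook per subsystem (role and host names here are hypothetical, not the bank's actual inventory):

```yaml
# A minimal sketch: deploy a subsystem and wire it into monitoring.
- name: Deploy and post-configure a DSO subsystem
  hosts: sonarqube_nodes
  become: true
  roles:
    - role: install_subsystem    # packages, configs, fault-tolerant setup
    - role: security_baseline    # bank security settings
    - role: monitoring_agent     # connect to the metrics/events subsystems
```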

Step 3. Migration of all subsystems and their settings to the new appliance. The infrastructure automation scripts were rewritten, and the migration of the DSO subsystems was performed fully automatically. The information systems' development environments were re-created by the development teams' own pipelines.

Step 4. Automation of application software installation. These tasks were set by the team leads of the new teams.

Step 5. Operation.

Remote access

The development teams asked for maximum flexibility in working with the environment, and the requirement for remote access from personal laptops was raised at the very beginning of the project. The bank already had remote access, but it was not suitable for developers: the scheme relied on connecting the user to a secure VDI, which was fine for those who needed only mail and an office suite at their workplace. Developers needed heavy clients and high performance, with lots of resources. And of course the machines had to be static, since losing a user session is unacceptable for someone working in Visual Studio (for example) or another SDK. Organizing a large number of thick static VDIs for all the development teams would have greatly increased the cost of the existing VDI solution.

We decided to provide remote access directly to the resources of the development segment: Jira, the wiki, GitLab, Nexus, build and test benches, virtual infrastructure. Security demanded that access be granted only if the following conditions were met:

  1. Using technologies already available in the bank.
  2. The infrastructure must not use the existing domain controllers that store production accounts.
  3. Access must be limited to only the resources required by a particular team (so that a team on one product cannot access another team's resources).
  4. Maximum control over RBAC in the systems.

As a result, a separate domain was created for this segment. It hosted all the resources of the development segment: both user credentials and infrastructure. The lifecycle of accounts in this domain is managed by the bank's existing IdM.

Direct remote access was organized using the bank's existing equipment. Access control was split into AD groups that mapped to rule sets on the firewall contexts (one product group = one group of rules).
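
A minimal sketch of the group side of this scheme (group and host names are hypothetical; the matching firewall rule sets are assumed to be maintained under the same names):

```yaml
# A minimal sketch: one AD group per product team in the dev segment domain.
- name: Create per-team access groups
  hosts: dev_domain_controllers
  tasks:
    - name: Ensure the team group exists
      community.windows.win_domain_group:
        name: "dso-team-{{ item }}"
        scope: global
        state: present
      loop:
        - retail
        - cards
        - payments
```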

VM template management

The speed of creating build and test environments is one of the main KPIs set by the head of the development unit, because the speed of preparing an environment directly affects total pipeline execution time. Two options for preparing base VM images were considered. The first was a minimum-size image, the default for all product systems, with maximum compliance with the bank's configuration policies. The second was a base image with heavy application software pre-installed, since its installation time could greatly affect pipeline speed.

Infrastructure and security requirements were also taken into account: keeping images up to date (patches, etc.), integration with the SIEM, and security settings according to bank standards.

As a result, it was decided to use minimal images in order to minimize the cost of keeping them up to date. It is much easier to update the base OS than to patch each image for every new version of the application software.

Based on the results, a list of the minimum required set of operating systems was formed. The operations team keeps these base images updated, while the pipeline scripts are fully responsible for installing and updating the software: to change the version of the installed software, it is enough to pass the required tag to the pipeline. Yes, this demands more complex deployment scenarios from the product DevOps team, but it greatly reduces the time operations spends supporting base images, which otherwise could have meant maintaining more than a hundred base VM images.
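
Schematically, the tag-driven scheme looks like this (a sketch with hypothetical job, playbook, and variable names):

```yaml
# A minimal sketch: the pipeline receives the desired software version
# and the deployment playbook installs it onto a minimal base image.
deploy_app:
  stage: deploy
  variables:
    APP_VERSION: "1.0.0"            # overridden per run via the UI or API
  script:
    - ansible-playbook deploy_app.yml -e "app_version=${APP_VERSION}"
```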

Access to the Internet

Another stumbling block with banking security was access from the development environment to Internet resources. This access falls into two categories:

  1. Infrastructure access.
  2. Developer access.

Infrastructure access was organized by proxying the external repositories through Nexus, so no direct access from the virtual machines was provided. This made it possible to reach a compromise with information security, which was categorically against providing any access to the outside world from the development segment.
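
From a build job's point of view this can be sketched as follows (host and repository names are hypothetical): package managers are simply pointed at the Nexus proxy repositories.

```yaml
# A minimal sketch: dependencies are pulled only through Nexus proxies,
# never directly from the Internet.
variables:
  PIP_INDEX_URL: "https://nexus.dso.bank.local/repository/pypi-proxy/simple"
  NPM_CONFIG_REGISTRY: "https://nexus.dso.bank.local/repository/npm-proxy/"
```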

Developers needed internet access for obvious reasons (Stack Overflow). And although all the teams, as mentioned above, had remote access to the environment, it is not always convenient when you cannot Ctrl+V from the developer's workplace into the IDE in the bank.

An agreement was reached with security that initially, at the testing stage, access would be provided through a banking proxy based on a whitelist, and by the end of the project access would move to a blacklist. Huge access tables were prepared listing the main resources and repositories needed at the start of the project. Coordinating these accesses took a decent amount of time, which made it possible to insist on the fastest possible transition to blacklists.

The results

The project ended a little less than a year ago. Oddly enough, all the contractors switched to the new stack in time, and no one left because of the new automation. The security team is in no hurry to share positive feedback, but it doesn't complain either, from which we can conclude that they like it. Conflicts have subsided, because information security feels in control again, yet at the same time does not interfere with development processes. More responsibility fell on the teams, and in general the attitude towards information security has improved. The bank understood that the transition to DevSecOps was almost inevitable, and made it, in my opinion, in the most gentle and correct way possible.

Alexander Shubin, system architect.

Source: habr.com
