Bloomberg Storage Support Team Relies on Open Source and SDS

TL;DR: The Bloomberg Storage Engineering team has built cloud storage for internal use that keeps infrastructure out of the way and withstood the heavy load of volatile trading during the pandemic.

When Matthew Leonard talks about his work as a technical manager on the Bloomberg Storage Engineering team, he often uses the words "complex" and "fun". The complexity comes from the wide scope of storage, from the latest NVMe-based SAN arrays to open source software-defined storage (SDS) in DevOps. This is where the "fun" begins (see my avatar on Habr; translator's note).

Leonard and his team of 25 colleagues oversee over 100 petabytes of capacity and an internal cloud for 6000 engineers developing applications for Bloomberg Terminal, the technology that made Michael Bloomberg a billionaire. The team designs, builds and maintains storage systems for Bloomberg Engineering.

As for other IT professionals, 2020 has been an unusual year for the Storage Engineering team: COVID-19 forced its members to work remotely. Leonard said the pandemic has affected his "close-knit team" socially, since face-to-face interaction has faded, but employees quickly adjusted to working from home on laptops and over videoconferencing.

It's amazing, but I would say that things did not get worse. There was a short adaptation period, since not everyone was set up to work from home. After a week or two, everyone had figured it out. We were able to find ways to get our work done, buy and upgrade equipment, and increase spending to support the company during this time. We had to get creative, but we didn't get hurt.

The biggest challenge may have come before the peak of COVID-19. It was driven by volatile market trading amid fears that the pandemic would hit the global economy. The volume of data flowing into Bloomberg terminals from global capital markets nearly doubled, reaching 240 billion pieces of information on some days at the end of March. That is a serious test for storage systems.

When your storage requirements double in a single day, it does create interesting challenges. We were able to handle it and make sure application development teams got the space and performance they needed. Much of that comes down to how we think about storage systems. We do not build only for today. We don't say, "We use ABC, so we'll build the infrastructure for ABC." We do what we call "data budgeting" with our teams: we forecast usage, analyze usage and performance trends, and take security into account. This kind of planning and methodical due diligence lets us take drastic action during bursts of workload without breaking a sweat. I was nervous, of course, but I was comfortable with where we stood.
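As a rough illustration of the usage forecasting that "data budgeting" implies, the sketch below fits a straight line to daily usage figures and estimates how many days remain before capacity runs out. The function names, the linear-trend assumption, and all numbers are hypothetical; the interview does not describe how Bloomberg actually forecasts usage.

```python
def fit_line(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def days_until_full(daily_used_tb, capacity_tb):
    """Estimate days left before usage reaches capacity, or None if usage is not growing."""
    slope, intercept = fit_line(daily_used_tb)
    if slope <= 0:
        return None  # flat or shrinking usage: no projected exhaustion
    full_day = (capacity_tb - intercept) / slope
    last_day = len(daily_used_tb) - 1
    return max(0.0, full_day - last_day)


# Hypothetical team: usage grows about 2 TB per day against a 120 TB budget.
usage = [100.0, 102.0, 104.0, 106.0, 108.0]
remaining = days_until_full(usage, capacity_tb=120.0)
```

A straight-line fit is the simplest possible model. A sudden doubling of demand like the one described above would break it, which is why a forecast of this kind is a planning input rather than a guarantee.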

Leonard recently spoke to SearchStorage in detail about managing storage for a data-driven business. He discussed what it takes to offer a private storage cloud that gives its users AWS-like functionality while keeping all data in Bloomberg data centers.

Pandemic aside, what challenges do Bloomberg engineers face in managing storage?

We have many needs, and we are pulled in many different directions. So we have to provide many different types of products at different SLA levels, so that our application developers can focus on their own tasks instead of worrying about storage.

And what strategy do you follow for this?

Part of what we're trying to do is improve storage performance. Think of the AWS model, where a development engineer walks in, presses a button, and with a click magically gets the right type of storage to solve their problem.

What does your storage infrastructure look like?

Since we have a very diverse ecosystem, and we also have many different developers, we cannot offer a single product. We have object, file and block storage. These are different products and we offer different types of technologies to provide them. For block we use SAN. We also have SDS, which provides another block storage option with a different set of performance requirements. We use NFS for files. SDS is also used for object storage. The block and object parts form an internal private cloud for computing and storage.

So you don't use public cloud storage?

That's right. Some development teams have permission to use public clouds. But because of the nature of our business, we prefer to have more control over what leaves our walls. So yes, we have our own clouds, running on equipment in our own data centers and under our control.

In our data centers we prefer a multi-vendor strategy. We work with major vendors, but we will not say which ones (it is Bloomberg's policy not to endorse any vendor; translator's note).

Are you using hyperconverged infrastructure to build your private cloud?

No. At Bloomberg we are not moving towards hyperconvergence. We're trying to separate compute from storage so that we can scale them independently, and that is the direction we're taking, especially for our cloud. The reason is that some of our workloads are compute-intensive, while others are storage-intensive. If you scale the two together, you waste resources, whether that is money, data center space, or capacity you do not need. This is why we like to have a common interface between these two entities, but let them be completely separate systems run by different teams.
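The waste from scaling compute and storage together can be shown with simple arithmetic. The sketch below assumes a hypothetical hyperconverged node that always adds a fixed number of CPU cores and a fixed amount of capacity; the node sizes and demand figures are invented for illustration and do not come from Bloomberg.

```python
import math


def coupled_nodes(cores_needed, tb_needed, cores_per_node, tb_per_node):
    """Nodes required when every node adds both cores and capacity.

    The larger of the two demands decides the node count.
    """
    by_compute = math.ceil(cores_needed / cores_per_node)
    by_storage = math.ceil(tb_needed / tb_per_node)
    return max(by_compute, by_storage)


def stranded_resources(cores_needed, tb_needed, cores_per_node, tb_per_node):
    """Report how many cores and terabytes are bought but not needed."""
    nodes = coupled_nodes(cores_needed, tb_needed, cores_per_node, tb_per_node)
    return {
        "nodes": nodes,
        "idle_cores": nodes * cores_per_node - cores_needed,
        "idle_tb": nodes * tb_per_node - tb_needed,
    }


# Hypothetical storage-heavy workload: 200 cores and 2000 TB,
# on nodes that each provide 50 cores and 100 TB.
report = stranded_resources(200, 2000, cores_per_node=50, tb_per_node=100)
```

In this made-up case, capacity demand forces 20 nodes, so 800 of the 1000 purchased cores sit idle. With compute and storage scaled separately, only 4 compute nodes' worth of cores would be needed.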

What obstacles must be overcome to build a private cloud?

Scale. As with most things, the devil is in the details. When you think about how these things work, how to make them fault-tolerant, how to handle the operational load, and how to coordinate with the physical asset teams, things get interesting. The challenge is to turn everything into a scalable, maintainable product that our application developers want to use, while enriching the feature set and keeping up with what the public clouds are doing. And it all has to work together and keep working. That is our main problem: we work across every area of the business and try to satisfy each need without neglecting the others.

Do you think you need the latest features available on AWS and other public clouds?

The fun thing about S3 is that it is a living, constantly changing standard: new features are always being added. It's like a new toy. If someone sees a new feature in the latest release, they will want it. Not all AWS features apply in our environment, so it is both important and interesting to work out what will help developers and how to provide it ourselves.

What storage hardware do you use?

We use the latest equipment. Our internal cloud is based entirely on NVMe flash, which makes these systems very fast. That makes our lives a little easier, and it is also a nice feature for our developers, since they don't have to worry about storage performance.

What are you using object storage for?

We have 6000 developers working on the infrastructure, and they are not united by any single use case. Any use case you can think of, we probably have in object storage. Some teams use it for cold archival storage, some for data transfer, and others for transactional applications. All of these require different SLA levels, so we have many types of traffic and all kinds of needs among the users of our infrastructure. No single homogeneous use case runs on any of our storage, which clearly complicates the task.

How big a role do Kubernetes and containers play for you, and how does that affect storage?

We're pushing storage productivity to create a cloud-like, something-as-a-service feel, where developers have a button that speeds up their work and gets infrastructure out of their way.

We have three teams. The first is the storage API team. They build programmatic access, endpoints, and predefined workflows for Bloomberg's application developers, who are our clients. This is a team of full-stack web developers who use Node.js, Python, and open source technologies like Apache Airflow, so they work with containerization and virtualization.

We also have two technical teams that actually move the bits and bytes. They work more directly with the hardware. We have a lot of hardware, and these teams do not use virtualization or containers.

We're trying to keep up with what's happening in the industry. We are looking into the Kubernetes CSI drivers and working closely with the Kubernetes implementation team at Bloomberg to see whether we can get Kubernetes storage to run smoothly on the technology we have, and it's working for us. We use SDS to give Kubernetes persistent storage. We have successfully piloted this technology, and the two teams are discussing how to make it available to everyone else at Bloomberg. We have shown that it is feasible.
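From a developer's point of view, persistent storage in Kubernetes is usually requested through a PersistentVolumeClaim, which a CSI driver then satisfies. The sketch below builds such a claim as a plain Python dictionary and serializes it to JSON, a format Kubernetes accepts for manifests. The claim name, size, and the storage class name "sds-block" are hypothetical; the interview does not say how Bloomberg names or configures its storage classes.

```python
import json


def build_pvc(name, size, storage_class):
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            # Hypothetical storage class backed by an SDS cluster.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


pvc = build_pvc("app-data", "10Gi", "sds-block")
manifest_json = json.dumps(pvc, indent=2)
```

The developer states only how much storage is needed and which class to use. Which backend provisions the volume is decided by the storage class, which keeps the storage system and its consumers loosely coupled.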

What other open source software do you use, specifically for storage?

We use Apache Airflow, and HAProxy to limit application traffic. We also use Ceph as our SDS platform. With it, our teams can run a single system while offering clients multiple interfaces. One of our virtualization platforms runs on OpenStack, and we work closely with that team. So we have an open source virtualization platform that uses an open source SDS platform for storage. It's fun.

What storage technologies are you considering for the next two to three years?

We are always looking into the exciting new things happening in the storage industry. That is part of our job. It is not a matter of "here's your SAN, manage it here; here's your NFS, manage it there." We try to communicate with our clients, that is, our application developers. We work together to understand what problems they are trying to solve and how that will affect Bloomberg's external customers: the banks and other organizations that use our software. Then we go back to the world of storage to find a way to help them reach their goal. How can we help them find the right storage technology for their SLA, or for whatever they're trying to do? Because we have so many engineers doing cool things, it never gets boring.

We are currently looking for ways to improve the performance of SDS, which could potentially run on general-purpose servers. So we are working on NVMe over TCP, which is a very interesting initiative and one of many. We are also working with key people in the industry and with some of our existing vendors to find out what they offer, what the actual performance will be, and whether we can start using it in production. This opens up possibilities that were not available before.

Source: habr.com
