Cloudflare chooses AMD processors for 10th generation edge servers


More than a billion unique IP addresses pass through the Cloudflare network every day; it serves more than 11 million HTTP requests per second and is within 100 ms of 95% of the Internet population. Our network spans 200 cities in over 90 countries, and our engineering team has built an extremely fast and reliable infrastructure.

We are proud of this work and are determined to help make the Internet better and safer. Cloudflare's hardware engineers know servers and their components in depth, which lets them select the hardware that maximizes performance.

Our software stack handles compute-heavy workloads and depends heavily on CPU speed, so our engineers constantly optimize Cloudflare's performance and reliability at every level of the stack. On the server side, the easiest way to increase processing power is to add CPU cores: the more cores you can fit in a server, the more data it can process. This matters to us because our range of products and customers grows over time, and the growth in requests demands more performance from each server. To deliver it, we needed to increase core density, and that is what we have done. Below is a breakdown of the processors in the servers we have deployed since 2015, including the number of cores:

| | Gen 6 | Gen 7 | Gen 8 | Gen 9 |
|---|---|---|---|---|
| Start of service | 2015 | 2016 | 2017 | 2018 |
| CPU | Intel Xeon E5 2630 v3 | Intel Xeon E5 2630 v4 | Intel Xeon Silver 4116 | Intel Xeon Platinum 6162 |
| Physical cores | 2 x 8 | 2 x 10 | 2 x 12 | 2 x 24 |
| TDP | 2 x 85W | 2 x 85W | 2 x 85W | 2 x 150W |
| TDP per core | 10.63W | 8.50W | 7.08W | 6.25W |

In 2018, Gen 9 brought a big jump in total cores per server. Its physical footprint shrank by 33% compared to Gen 8, which let us increase capacity and processing power per rack. Thermal Design Power (TDP) is listed to show that our energy efficiency has also improved over time. This matters to us for two reasons: first, we want to release less carbon into the atmosphere; second, we want to make the best use of the power available in our data centers. But we know there is still room to improve.

Our defining metric is the number of requests per watt. We can increase requests per second by adding cores, but we have to stay within our power budget. We are constrained by the data center power infrastructure, which, together with the power distribution units we select, sets an upper bound for each server rack. Adding servers to a rack increases its power consumption. Operational costs rise sharply if we exceed the per-rack power limit and have to add new racks. We need to increase processing power while staying within the same power envelope, which increases requests per watt, our key metric.
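To make the rack-level reasoning concrete, here is a minimal sketch of how a fixed power budget ties rack throughput to requests per watt. The function names and all numbers (rack budget, server power, requests per second) are hypothetical placeholders chosen for illustration, not Cloudflare figures.

```python
def servers_per_rack(rack_budget_w: float, server_power_w: float) -> int:
    """Number of whole servers that fit under the rack's power limit."""
    return int(rack_budget_w // server_power_w)


def requests_per_watt(requests_per_second: float, power_w: float) -> float:
    """The key metric: requests served per second for each watt drawn."""
    return requests_per_second / power_w


def rack_throughput(rack_budget_w: float, server_power_w: float, server_rps: float) -> float:
    """Total requests per second a full rack can serve within its power budget."""
    return servers_per_rack(rack_budget_w, server_power_w) * server_rps


# Hypothetical values, for illustration only.
RACK_BUDGET_W = 10_000
CANDIDATES = {
    "current server": {"power_w": 400, "rps": 10_000},
    "faster server, same power": {"power_w": 400, "rps": 12_000},
}

for name, spec in CANDIDATES.items():
    total = rack_throughput(RACK_BUDGET_W, spec["power_w"], spec["rps"])
    rpw = requests_per_watt(spec["rps"], spec["power_w"])
    print(f"{name}: {rpw:.1f} requests/s per watt, {total:,.0f} requests/s per rack")
```

With the rack budget fixed, the only way to raise rack throughput in this model is to raise requests per watt, which is why the metric drives CPU selection.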

As you might guess, we studied power consumption carefully at the design stage. The table above shows that it is not worth deploying a more power-hungry CPU if its TDP per core is higher than the current generation's, since that would hurt our requests-per-watt metric. We carefully evaluated the production-ready systems on the market for Gen X and made a decision: we are moving from our dual-socket design with 48 cores of Intel Xeon Platinum 6162 to a single-socket, 48-core AMD EPYC 7642.


| | Intel | AMD |
|---|---|---|
| CPU | Xeon Platinum 6162 | EPYC 7642 |
| Microarchitecture | Skylake | Zen 2 |
| Code name | Skylake SP | Rome |
| Process | 14nm | 7nm |
| Cores | 2 x 24 | 48 |
| Frequency | 1.9 GHz | 2.4 GHz |
| L3 cache/socket | 24 x 1.375MiB | 16 x 16MiB |
| Memory/socket | 6 channels, up to DDR4-2400 | 8 channels, up to DDR4-3200 |
| TDP | 2 x 150W | 225W |
| PCIe/socket | 48 lanes | 128 lanes |
| ISA | x86-64 | x86-64 |

The specifications show that the AMD chip lets us keep the same number of cores while lowering TDP. Gen 9 had a TDP per core of 6.25W, while Gen X comes in at 4.69W, a 25% reduction. Thanks to the higher frequency, and perhaps the simpler single-socket design, we can assume that the AMD chip will perform better in practice. We are running various tests and simulations to see how much better AMD will perform.
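The TDP-per-core figures can be reproduced from the socket counts, core counts, and rated TDP values given in the tables. The sketch below is only that arithmetic; the function and variable names are illustrative.

```python
# (label, sockets, cores per socket, rated TDP per socket in watts), taken from the tables
SERVERS = [
    ("Gen 6", 2, 8, 85),
    ("Gen 7", 2, 10, 85),
    ("Gen 8", 2, 12, 85),
    ("Gen 9", 2, 24, 150),
    ("Gen X", 1, 48, 225),
]


def tdp_per_core(sockets: int, cores_per_socket: int, tdp_per_socket_w: float) -> float:
    """Total rated TDP divided by total physical cores, in watts."""
    return (sockets * tdp_per_socket_w) / (sockets * cores_per_socket)


def reduction_percent(old: float, new: float) -> float:
    """Relative reduction from old to new, as a percentage of old."""
    return (old - new) / old * 100.0


for label, sockets, cores, tdp in SERVERS:
    print(f"{label}: {tdp_per_core(sockets, cores, tdp):.2f} W per core")

gen9 = tdp_per_core(2, 24, 150)
genx = tdp_per_core(1, 48, 225)
print(f"Gen 9 -> Gen X reduction: {reduction_percent(gen9, genx):.0f}%")
```

Because this uses rated TDP rather than measured power, it is only a first-pass screening number.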

In the meantime, note that TDP is a simplified metric from the manufacturer's specifications that we used in the early stages of server design and CPU selection. A quick Google search reveals that AMD and Intel have different approaches to defining TDP, which makes this spec unreliable. The actual power consumption of the CPU, and more importantly the power consumption of the server, is what we actually use when making the final decision.

Ecosystem readiness

Early on in our journey to picking the next processor, we looked at a wide range of CPUs from different vendors that were well suited to our software stack and services (written in C, LuaJIT, and Go). We have already described in detail a set of tools for measuring speed in one of our blog posts. In this case, we used the same set - it allows us to evaluate the efficiency of the CPU in a reasonable time, after which our engineers can begin to adapt our programs to a particular processor.

We tested different processors with different numbers of cores, sockets and frequencies. Since we are describing why we settled on the AMD EPYC 7642 in this article, all of the charts in this blog focus on how AMD's processors compare to the Intel Xeon Platinum 6162 from our 9th generation.

The results are measurements of one server with each processor option: two 24-core Intel processors in a dual-socket server, or one 48-core AMD EPYC processor in a single-socket server. In the BIOS, we used the same settings as on our production servers, which gives 3.03 GHz for AMD and 2.5 GHz for Intel. As a rough simplification, we expect AMD to perform 21% better than Intel with the same number of cores.
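The 21% expectation follows directly from the ratio of the two operating frequencies, assuming performance scales linearly with clock speed at an equal core count (a deliberate oversimplification that ignores microarchitecture, cache, and memory differences). A minimal sketch, with an illustrative function name:

```python
def expected_gain_percent(freq_ghz: float, baseline_freq_ghz: float) -> float:
    """Naive expected speedup if performance scaled linearly with clock frequency."""
    return (freq_ghz / baseline_freq_ghz - 1.0) * 100.0


AMD_FREQ_GHZ = 3.03    # operating frequency quoted for the AMD server
INTEL_FREQ_GHZ = 2.5   # operating frequency quoted for the Intel server

print(f"Expected gain: {expected_gain_percent(AMD_FREQ_GHZ, INTEL_FREQ_GHZ):.0f}%")
```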

Cryptography


This looks promising for AMD. In public-key cryptography it performs 18% better. In symmetric-key cryptography it loses on the AES-128-GCM encryption variants, but overall its performance is comparable.

Compression

On edge servers, we compress a lot of data to save on bandwidth and increase the speed of content delivery. We pass data through the zlib and brotli C libraries. All tests were run on the blog.cloudflare.com HTML file in memory.
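As an illustration of this kind of in-memory test, here is a minimal throughput sketch using Python's standard `zlib` module, which wraps the same zlib C library. It is not the benchmark tool used for the charts; the payload is a synthetic stand-in for the HTML file, and the function name, levels, and repeat count are assumptions.

```python
import time
import zlib


def compression_throughput(data: bytes, level: int, repeats: int = 20) -> float:
    """Compress `data` in memory `repeats` times and return throughput in MiB/s."""
    start = time.perf_counter()
    for _ in range(repeats):
        zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return (len(data) * repeats) / (1024 * 1024) / elapsed


# Synthetic stand-in payload; a real test would load the actual HTML file into memory.
SAMPLE_HTML = b"<html><body>" + b"<p>Hello from the edge.</p>" * 2000 + b"</body></html>"

for level in (1, 6, 9):
    ratio = len(SAMPLE_HTML) / len(zlib.compress(SAMPLE_HTML, level))
    mib_s = compression_throughput(SAMPLE_HTML, level)
    print(f"zlib level {level}: {mib_s:.1f} MiB/s, compression ratio {ratio:.1f}x")
```

Running the same script on two machines gives a rough per-core comparison; a brotli test would need a third-party binding, which is left out here.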


AMD wins by an average of 29% with gzip. With brotli the results are even better at quality 7, the setting we use for dynamic compression. There is a sharp drop on the brotli-9 test, which we attribute to Brotli consuming a lot of memory and overflowing the cache. Even so, AMD wins by a wide margin.

Many of our services are written in Go. In the following graphs, we repeat the cryptography and compression tests in Go, along with RegExp tests on 32 KB strings and tests of the strings library.

Go cryptography


Go Compression


Go Regexp


Go Strings


AMD performs better in all the Go tests except ECDSA P256 Sign, where it was 38% behind. That is strange, given that it performed 24% better on the same test in C, and it is worth investigating. Overall, AMD's margin in Go is smaller, but it still shows the better results.

LuaJIT

We use LuaJIT heavily in our stack. It is the glue that holds all the parts of Cloudflare together, and we are glad that AMD won here as well.

Overall, the tests show that one EPYC 7642 performs better than two Xeon Platinum 6162s. AMD loses on a couple of tests, for example AES-128-GCM and Go OpenSSL ECDSA-P256 Sign, but wins on all the others, by 25% on average.

Workload Simulation

After these quick benchmarks, we ran the servers through another set of simulations in which a synthetic load is applied to our edge software stack. Here we simulate workloads with the different types of requests that occur in production. Requests vary in data volume, HTTP or HTTPS protocols, WAF sources, Workers, and many other variables. Below is a comparison of throughput between the two CPUs for the types of requests we encounter most often.


The results in the chart are measured against the baseline of Gen 9 machines with Intel processors, normalized to a value of 1.0 on the x-axis. For example, for simple 10 KiB requests over HTTPS, AMD does 1.5 times better than Intel in terms of requests per second. On average across these tests, AMD performed 34% better than Intel. Given that the TDP of a single AMD EPYC 7642 is 225W and that of two Intel processors is 300W, the 10 KiB HTTPS case works out to AMD being 2 times better than Intel in terms of requests per watt!
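The requests-per-watt comparison is the relative throughput divided by the relative power. The sketch below applies that to the figures quoted in this section, using rated TDP as a stand-in for power (an approximation, since TDP is not measured consumption). The function name is illustrative.

```python
INTEL_TDP_W = 2 * 150  # two Xeon Platinum 6162 sockets
AMD_TDP_W = 225        # one EPYC 7642 socket


def relative_requests_per_watt(relative_throughput: float, power_w: float,
                               baseline_power_w: float) -> float:
    """Requests per watt relative to the baseline, given throughput relative to the baseline."""
    return relative_throughput / (power_w / baseline_power_w)


# Throughput normalized to Gen 9 = 1.0, as in the chart description.
for label, throughput in (("10 KiB HTTPS", 1.5), ("average of tests", 1.34)):
    gain = relative_requests_per_watt(throughput, AMD_TDP_W, INTEL_TDP_W)
    print(f"{label}: {gain:.2f}x requests per watt vs Gen 9")
```

With these inputs, the 1.5x HTTPS case gives 2.0x and the 34% average gives about 1.8x.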

At this point, we were already clearly leaning towards the single-socket AMD EPYC 7642 as our future Gen X CPU.

Real-world performance

The first step, of course, was to get the servers ready for production. All machines in our fleet run the same processes and services, which gives us an excellent opportunity to compare performance fairly. Like most data centers, we have several generations of servers deployed, and we group our servers into clusters so that each cluster contains servers of roughly the same generation. In some setups this can cause utilization curves to differ between clusters. But not in ours: our engineers have optimized CPU utilization across all generations, so whether a particular machine has 8 cores or 24, CPU usage is usually the same.


The graph illustrates our comment on the similarity of utilization - there is no significant difference between the use of AMD CPUs in Gen X servers and the use of Intel processors in Gen 9 servers. This means that both test and base servers are equally loaded. Great. This is exactly what we are trying to achieve in the operation of our servers, and we need it for a fair comparison. The two graphs below show the number of requests processed by one CPU core and all cores at the server level.

Requests per core

Requests per server

On average, AMD processes 23% more requests. Not bad! We've blogged a lot about ways to improve Gen 9 performance. Here we have the same number of cores, yet AMD does more work with less power. The core counts and TDP figures in the specifications already suggest that AMD delivers more speed with greater energy efficiency.

But, as we already mentioned, TDP is not a standard specification, and it is not the same for all manufacturers, so let's look at the real energy use. By measuring the power consumption of the server in parallel with the number of requests per second, we got the following graph:


In terms of requests per second per watt, the Gen X server with AMD processors is 28% more efficient. You might expect more, given that AMD has a 25% lower TDP, but remember that TDP is an ambiguous figure. We have seen AMD's actual power consumption nearly match its rated TDP while running well above base frequency; that is not the case for Intel. This is another reason why TDP is not a reliable measure of power consumption. The Intel CPUs in our Gen 9 servers are integrated into a multi-node system, while the AMD CPUs run in standard 1U form factor servers. This does not favor AMD, since multi-node servers should provide more density with less power consumption per node, but AMD still beat Intel on power consumption per node.
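One way to see why the measured efficiency gain falls short of what TDP suggests is to back out the power ratio implied by the two percentages. This is a rough sketch that assumes the 23% throughput gain and the 28% efficiency gain describe the same servers over the same period; the derived ratio is an inference from those two numbers, not a measurement reported in this article.

```python
def implied_power_ratio(throughput_gain_pct: float, efficiency_gain_pct: float) -> float:
    """Power ratio (new / old) implied by a throughput gain and a requests-per-watt gain.

    efficiency = throughput / power, so power_ratio = throughput_ratio / efficiency_ratio.
    """
    return (1 + throughput_gain_pct / 100) / (1 + efficiency_gain_pct / 100)


tdp_ratio = 225 / 300                      # rated TDP, AMD vs two Intel sockets
measured_ratio = implied_power_ratio(23, 28)

print(f"TDP suggests the AMD CPU draws {(1 - tdp_ratio) * 100:.0f}% less power")
print(f"Measured figures imply the server draws about {(1 - measured_ratio) * 100:.0f}% less power")
```

Under these assumptions the server-level saving is only a few percent, far smaller than the 25% gap in rated TDP, which is consistent with the point that TDP alone is a poor predictor of real consumption.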

In most comparisons across specs, test simulations, and real-world performance, the 1P configuration of the AMD EPYC 7642 performed significantly better than the 2P Intel Xeon Platinum 6162. Under some conditions AMD can perform up to 36% better, and we believe that by optimizing the hardware and software we can achieve this improvement on a permanent basis.

It turns out that AMD won.

Additional graphs show NGINX's average latency and p99 latency over a 24 hour period. On average, processes on AMD ran 25% faster. On p99 it runs 20-50% faster depending on the time of day.
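For readers unfamiliar with the two latency metrics, the sketch below shows one common way to compute an average and a p99 (nearest-rank method) from a list of latency samples, and how a "percent faster" figure could be derived. The sample values are made up for illustration, and reading "faster" as a reduction in latency is an assumption.

```python
import statistics


def p99(samples: list[float]) -> float:
    """99th percentile by the nearest-rank method: the value below which 99% of samples fall."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def percent_faster(baseline_latency: float, candidate_latency: float) -> float:
    """Latency reduction of the candidate relative to the baseline, in percent."""
    return (baseline_latency - candidate_latency) / baseline_latency * 100.0


# Made-up latency samples in milliseconds, for illustration only.
baseline_ms = [10.0] * 98 + [40.0, 80.0]
candidate_ms = [7.5] * 98 + [30.0, 50.0]

print(f"mean: {percent_faster(statistics.mean(baseline_ms), statistics.mean(candidate_ms)):.0f}% faster")
print(f"p99:  {percent_faster(p99(baseline_ms), p99(candidate_ms)):.0f}% faster")
```

The average summarizes typical requests, while p99 captures the slow tail, which is why the two can improve by different amounts.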

Conclusion

The hardware and performance engineers at Cloudflare do a significant amount of testing and research to select the best server configuration for our customers. We love working here because we get to solve big problems like this one, and to help you solve yours with services such as serverless edge computing and an array of security solutions such as Magic Transit, Argo Tunnel and DDoS protection. All servers in the Cloudflare network are tuned to work reliably, and we always try to make each generation of servers better than the last. We believe the AMD EPYC 7642 is the answer for Gen X processor selection.

With Cloudflare Workers, developers deploy their applications across our growing network around the world. We are proud to let our customers focus on writing code while we take care of security and reliability in the cloud. And today, we're even more excited to announce that their work will be deployed on our Gen X servers powered by 2nd generation AMD EPYC processors.

AMD EPYC 7642 processors, codenamed "Rome"

By using AMD's EPYC 7642, we were able to increase our performance and make it easier to expand our network to new cities. Rome was not built in a day, but soon it will be closer to many of you.

In the last couple of years, we have been experimenting with many x86 chips from Intel and AMD, as well as processors from ARM. We expect these CPU vendors to work with us in the future so that we can all build a better internet together.

Source: habr.com
