HPE Superdome Flex: A New Level of Performance and Scale

Last December, HPE announced HPE Superdome Flex, the world's most scalable modular in-memory computing platform. It is a breakthrough in computing, designed to support mission-critical applications, real-time analytics, and data-intensive high-performance computing.

The HPE Superdome Flex platform has a number of characteristics that make it unique in the industry. Below is a translation of an article from the blog Servers: The Right Compute that discusses the platform's modular and scalable architecture.

Scalable beyond Intel

Like most x86 server vendors, HPE uses the latest Intel Xeon Scalable processor family, codenamed Skylake, in its latest generation of servers, including the HPE Superdome Flex. Intel's reference architecture for these processors uses the new UltraPath Interconnect (UPI) technology, which has a scaling limit of eight sockets. Most vendors that use these processors build servers with "glueless" connections, but HPE Superdome Flex uses a unique modular architecture that goes beyond Intel's limit, scaling from 4 to 32 sockets in a single system.

We chose this architecture because we saw a need for platforms that scale beyond Intel's eight sockets; this is especially true today, when data volumes are growing at an unprecedented rate. In addition, because Intel designed UPI primarily for two- and four-socket servers, "glueless" eight-socket servers run into throughput issues. Our architecture ensures high throughput even as the system grows to its maximum configuration.

Price/performance ratio as a competitive advantage

The HPE Superdome Flex modular architecture is based on a four-socket chassis and scales up to eight chassis and 32 sockets in one server system. A wide range of processors is available for the server, from inexpensive Gold models to the top-end Platinum series of the Xeon Scalable processor family.

This choice between Gold and Platinum processors across the entire scaling range provides excellent price/performance advantages, even over entry-level systems. For example, in a typical 6 TB configuration, the Superdome Flex provides a cheaper and higher-performance solution than competing four-socket offerings. Why? Because of their architecture, other vendors of four-socket systems must use 128 GB DIMMs and more expensive processors that support 1.5 TB per socket. This is significantly more expensive than using 64 GB DIMMs in an eight-socket Superdome Flex. As a result, the eight-socket Superdome Flex platform with 6 TB of memory delivers twice the processing power, twice the memory bandwidth, and twice the I/O, and is still more cost-effective than competing products with four sockets and 6 TB of memory.
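The DIMM arithmetic behind this comparison can be checked with a short sketch. It assumes 12 DIMM slots per socket, derived from the 48 slots per four-socket chassis that the article quotes for Superdome Flex; applying the same slot count to competing four-socket systems is an assumption made here for illustration, and the function names are hypothetical.

```python
# Assumption: 12 DIMM slots per socket (48 slots per four-socket chassis).
# The figure comes from the Superdome Flex chassis; using it for other
# vendors' four-socket systems is an illustrative assumption.
SLOTS_PER_SOCKET = 12


def dimm_size_gb(total_tb: float, sockets: int) -> float:
    """DIMM capacity (GB) needed to reach total_tb with every slot populated."""
    return total_tb * 1024 / (sockets * SLOTS_PER_SOCKET)


def memory_per_socket_tb(total_tb: float, sockets: int) -> float:
    """Memory each processor has to address, in TB."""
    return total_tb / sockets


for sockets in (4, 8):
    print(
        f"{sockets} sockets, 6 TB: "
        f"{dimm_size_gb(6, sockets):.0f} GB DIMMs, "
        f"{memory_per_socket_tb(6, sockets):.2f} TB per socket"
    )
```

Under these assumptions, the four-socket case needs 128 GB DIMMs and 1.5 TB per socket, while the eight-socket case reaches the same 6 TB with 64 GB DIMMs and 0.75 TB per socket.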

Likewise, for an 8-socket configuration with 6 TB of memory, the Superdome Flex platform can provide a lower cost, higher performance eight-socket solution. How? Other manufacturers of 8-socket systems are forced to use the more expensive Platinum processors, while the eight-socket Superdome Flex can use inexpensive Gold processors, providing the same amount of memory.

In fact, among platforms based on the Intel Xeon Scalable processor family, only Superdome Flex can support the more economical Gold processors in 8-socket or higher configurations (Intel's "glueless" architecture only supports 8 sockets with expensive Platinum processors). We also offer a wide range of processors ranging from 4 to 28 cores per processor, allowing you to match the number of cores to your workload requirements.

The importance of scaling within a single system

The ability to scale within a single system, or scale up, provides a number of benefits for the mission-critical workloads and databases for which HPE Superdome Flex is best suited. These include traditional and in-memory databases, real-time analytics, ERP, CRM, and other transactional applications. For these types of workloads, a single scale-up environment is easier and cheaper to manage than a scale-out cluster; it also significantly reduces latency and improves performance.

Check out the blog post Scale-up and scale-out speed with SAP S/4HANA to understand why vertical scaling is much more efficient than horizontal scaling (clustering) for these types of workloads. Essentially, it comes down to speed and the ability to perform at the level these mission-critical applications require.

Consistently high performance up to the highest configurations

The high scalability of the Superdome Flex is achieved with the unique HPE Superdome Flex ASIC chipset, which connects the individual four-socket chassis as shown in Figures 1 and 2. All ASICs are directly interconnected (a single hop apart), which minimizes the latency of access to remote resources and maximizes performance. HPE Superdome Flex ASIC technology provides adaptive routing to load-balance the fabric and to optimize latency and throughput, improving system performance and availability. The ASIC integrates the chassis into a cache-coherent fabric and maintains cache consistency across processors using a large directory of cache line state records built directly into the ASIC. This coherence scheme plays a critical role in allowing the Superdome Flex to support near-linear performance scaling from 4 to 32 sockets. Typical "glueless" architectures already show more limited performance scaling between four and eight sockets because coherency requests are broadcast.
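To illustrate why a directory scales better than broadcasting, here is a deliberately simplified toy model that counts coherency messages per cache miss. It is not HPE's actual protocol: the message counts, the function names, and the assumed number of sharers are all illustrative assumptions.

```python
def broadcast_snoops(sockets: int) -> int:
    """Toy broadcast model: a miss queries every other socket."""
    return sockets - 1


def directory_messages(sharers: int) -> int:
    """Toy directory model: one directory lookup plus one message per
    socket that the directory records as holding the cache line."""
    return 1 + sharers


# Hypothetical workload assumption: a line is typically cached by 2 sockets.
ASSUMED_SHARERS = 2

for sockets in (4, 8, 32):
    print(
        f"{sockets:>2} sockets: "
        f"broadcast={broadcast_snoops(sockets):>2} messages, "
        f"directory={directory_messages(ASSUMED_SHARERS)} messages"
    )
```

In this model the broadcast cost grows with the socket count, while the directory cost depends only on how many processors actually share the line, which is the intuition behind the near-linear scaling described above.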

Figure 1. HPE Flex Grid wiring diagram of a 32-socket Superdome Flex server

Figure 2. Four-socket chassis

Shared memory

As with processor resources, the amount of memory can be increased by adding chassis to the system. Each chassis has 48 DDR4 DIMM slots that accept 32 GB RDIMMs, 64 GB LRDIMMs, or 128 GB 3DS LRDIMMs, for a maximum of 6 TB of memory per chassis. Accordingly, the total RAM of an HPE Superdome Flex in the maximum 32-socket configuration reaches 48 TB, enough to run the most resource-intensive in-memory applications.
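The capacity figures follow directly from the slot count and DIMM sizes. The sketch below reproduces them; the constants come from the article, while the function name is hypothetical and the sketch assumes every slot holds a DIMM of the same size.

```python
DIMM_SLOTS_PER_CHASSIS = 48  # from the article
SOCKETS_PER_CHASSIS = 4      # from the article
MAX_CHASSIS = 8              # from the article


def total_memory_tb(chassis: int, dimm_gb: int) -> float:
    """Total RAM in TB, assuming all slots hold DIMMs of the same size."""
    if not 1 <= chassis <= MAX_CHASSIS:
        raise ValueError(f"chassis must be between 1 and {MAX_CHASSIS}")
    return chassis * DIMM_SLOTS_PER_CHASSIS * dimm_gb / 1024


for dimm_gb in (32, 64, 128):
    per_chassis = total_memory_tb(1, dimm_gb)
    maximum = total_memory_tb(MAX_CHASSIS, dimm_gb)
    print(
        f"{dimm_gb:>3} GB DIMMs: {per_chassis} TB per chassis, "
        f"{maximum} TB at {MAX_CHASSIS * SOCKETS_PER_CHASSIS} sockets"
    )
```

With 128 GB DIMMs this gives 6 TB per chassis and 48 TB across eight chassis, matching the figures quoted above.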

High I/O flexibility

In terms of I/O, each Superdome Flex chassis can be equipped with either a 16-slot or a 12-slot I/O cage, providing a wide range of options for installing standard PCIe 3.0 cards and the flexibility to balance the system for any workload. In either case, the I/O slots are connected directly to the processors without bus repeaters or expanders, which could increase latency or reduce throughput. This ensures the best possible performance for each I/O card.

Low latency

Low-latency access to the entire shared RAM space is a key factor in Superdome Flex's high performance. Whether data resides in local memory or in remote memory (in another chassis), copies of it may reside in the caches of different processors within the system. The cache coherency mechanism ensures that cached copies remain consistent when a process modifies the data. Processor access latency to local memory is about 100 ns. The latency of accessing data in the memory of another processor over a UPI link is about 130 ns. A processor accessing data that resides in the memory of another chassis goes through two Flex ASICs (which are always directly connected), with a latency of less than 400 ns regardless of which chassis the processor is in. As a result, Superdome Flex provides over 210 GB/s of bisection bandwidth in an 8-socket configuration, over 425 GB/s in a 16-socket configuration, and over 850 GB/s in a 32-socket configuration. This is more than enough for the most demanding and resource-intensive workloads.
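The three latency tiers and the bandwidth figures can be summarized in a small sketch. The numbers are the approximate values quoted above (the remote figure is an upper bound and the bandwidth figures are lower bounds); the contiguous socket numbering, four sockets per chassis, and the function name are assumptions made for illustration.

```python
LOCAL_NS = 100           # approximate, from the article
UPI_NS = 130             # approximate, another socket in the same chassis
REMOTE_CHASSIS_NS = 400  # upper bound, memory in another chassis

# Quoted minimum bisection bandwidth in GB/s, keyed by socket count.
BISECTION_GBPS = {8: 210, 16: 425, 32: 850}


def access_latency_ns(src_socket: int, dst_socket: int,
                      sockets_per_chassis: int = 4) -> int:
    """Approximate latency tier for src_socket reading dst_socket's memory.

    Assumes sockets are numbered contiguously, chassis by chassis.
    """
    if src_socket == dst_socket:
        return LOCAL_NS
    if src_socket // sockets_per_chassis == dst_socket // sockets_per_chassis:
        return UPI_NS
    return REMOTE_CHASSIS_NS


print(access_latency_ns(0, 0), access_latency_ns(0, 3), access_latency_ns(0, 31))

for sockets, gbps in BISECTION_GBPS.items():
    print(f"{sockets:>2} sockets: at least {gbps / sockets:.2f} GB/s per socket")
```

The per-socket bandwidth stays roughly constant as the system grows, which is consistent with the claim that throughput holds up in the largest configurations.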

Why is high modular scalability important?

It's no secret that the amount of data is growing at an unprecedented rate; this means that the infrastructure must be able to handle the increasingly demanding processing and analysis of mission-critical and ever-expanding data. But growth rates can be unpredictable.

When deploying RAM-intensive applications, you may ask: how much will the next terabyte of memory cost me? Superdome Flex lets you expand memory without replacing hardware, because you are not limited to the DIMM slots in a single chassis. In addition, mission-critical applications must continue to deliver high performance as the number of users grows, regardless of workload size.

Today, in-memory databases require low latency, high throughput hardware platforms. With its innovative architecture, the HPE Superdome Flex Platform delivers exceptional performance, high throughput, and consistently low latency, even in the largest configurations. What's more, you can get all of this for your mission-critical workloads and databases at a very attractive price/performance ratio compared to other vendors' systems.

You can learn about the unique reliability, availability, and serviceability (RAS) features of the Superdome Flex server from the blog post HPE Superdome Flex: RAS Unique Features and the technical description HPE Superdome Flex: Server Architecture and RAS Specifications. A blog post on the HPE Superdome Flex updates announced at HPE Discover was also published recently.

From this article you can learn how HPE Superdome Flex is used to solve cosmology challenges and how the platform is being prepared for Memory-Driven Computing, a new memory-centric computing architecture.

You can also learn more about the platform from webinar recordings.

Source: habr.com
