Tuesday, January 22, 2008

[PS] High-Performance Computing - High-Productivity Servers, Clusters and Supercomputers

Global Shared Memory
SGI® NUMAflex® Architecture

• Up to 24 TB globally addressable memory
• All memory in system addressable by any processor
• Best message passing performance too

Most PC users have already learned that they gain more by adding memory to their systems than by adding compute power, and this is even more true in supercomputing. SGI Altix offers the industry's most scalable memory architecture, with up to 24TB of globally addressable memory in a single system. This stands in stark contrast to clustered architectures, where the memory addressable by any one node is typically limited to 32GB.


[Figure: two diagrams contrasting global shared memory, in which any processor can access all data in the system's memory directly, with a traditional cluster, which must pass copies of data between nodes because memory is not shared.]

Systems with global shared memory allow all data in the system's memory to be accessed directly and efficiently, without moving data through I/O or networking bottlenecks. The impact can be dramatic: the latency, or 'wait time', for messages that must go through system I/O is on the order of 1,000 to 10,000 times greater than for communication within the memory domain. Clusters without global shared memory must instead pass copies of data, often in the form of messages, which greatly complicates programming and slows performance by increasing the time processors must wait for data.
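
As a rough, non-SGI sketch of the difference, the MPI program below must ship an explicit copy of an array between two cluster nodes before rank 1 can use it; on a global-shared-memory system the same read is just a load from memory. The array size and ranks are arbitrary choices for illustration.

    /* Illustrative sketch only (not SGI code): contrasts explicit
     * message passing with direct shared-memory access.
     * Build and run: mpicc demo.c && mpirun -np 2 ./a.out */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 1000000

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double *data = malloc(N * sizeof *data);

        if (rank == 0) {
            for (int i = 0; i < N; i++)
                data[i] = (double)i;
            /* Cluster model: rank 1 cannot see this memory, so a full
             * copy of the array must cross the interconnect. */
            MPI_Send(data, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* The consumer stalls here until the copy arrives. */
            MPI_Recv(data, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("last element: %.0f\n", data[N - 1]);
        }

        /* Shared-memory model: no transfer at all. Any processor could
         * read data[N - 1] in place, paying only a memory-load latency. */

        free(data);
        MPI_Finalize();
        return 0;
    }

The receive call is where the copy-and-wait cost of the cluster model shows up; with global shared memory that step simply disappears.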

Why does this matter? Users can do more and get results sooner. Entire databases can be driven directly out of memory. Hardware and software costs are lower. System administrators and users spend less time feeding and tuning the cluster. Developers have the flexibility to choose any programming model for application scalability. And it is all standards-based Linux. Read more about how SGI's customers and partners have already proven the benefits of global shared memory in a variety of industries.
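
To illustrate the programming-model point, here is a generic OpenMP sketch (not SGI-specific code): on a shared-memory machine a serial loop can be parallelized with a single directive, because every thread already sees the whole array.

    /* Generic OpenMP sketch: every thread sees the whole array, so the
     * loop needs no data distribution, just a directive.
     * Build: cc -fopenmp sum.c */
    #include <stdio.h>

    #define N 10000000

    static double a[N];

    int main(void)
    {
        double sum = 0.0;

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = (double)i;
            sum += a[i];
        }
        printf("sum = %.0f\n", sum);
        return 0;
    }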

Message Passing Performance:
Latency and Bandwidth performance of common interconnect technologies

Technology                 Vendor     MPI latency          Bandwidth per link
                                      (usec, short msg)    (unidirectional, MB/s)
NUMAlink 4 (Altix)         SGI        1                    3200
RapidArray (XD1)           Cray       1.8                  2000
QsNet II                   Quadrics   2                    900
Infiniband                 Voltaire   3.5                  830
High Performance Switch    IBM        5                    1000
Myrinet XP2                Myricom    5.7                  495
SP Switch 2                IBM        18                   500
Ethernet                   Various    30                   100
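
The two columns of this table combine into the usual first-order cost model for moving one message: time = latency + size / bandwidth. The sketch below applies that model to the NUMAlink 4 and Ethernet rows above; the 8KB message size is an arbitrary choice, and the model ignores contention and software overhead.

    /* First-order message cost: T = latency + size / bandwidth.
     * With bandwidth in MB/s (10^6 bytes/s), size/bandwidth comes out
     * in microseconds directly, since 1 MB/s = 1 byte per usec. */
    #include <stdio.h>

    static double msg_usec(double latency_usec, double bw_mbs, double bytes)
    {
        return latency_usec + bytes / bw_mbs;
    }

    int main(void)
    {
        double size = 8192.0;  /* an 8 KB message, chosen arbitrarily */
        printf("NUMAlink 4: %6.1f usec\n", msg_usec( 1.0, 3200.0, size));
        printf("Ethernet:   %6.1f usec\n", msg_usec(30.0,  100.0, size));
        return 0;
    }

At 8KB this works out to roughly 3.6 usec over NUMAlink 4 against roughly 112 usec over Ethernet: latency dominates for short messages, bandwidth for long ones.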


Comparison of Linpack system efficiencies in the November 2004 Top 500 list

System/interconnect        Avg. Linpack efficiency    Sample size (number of
                           for 256P systems, %*       systems on list)*
SGI Altix/NUMAlink         84                         14
HP Superdome               79                         18
Various/Quadrics           75                         4
Various/Infiniband         75                         3 (one system @288P)
Various/Myrinet            63                         19
Various/Gigabit Ethernet   59                         14

* Linpack Rmax/Rpeak for 256P systems listed on the November 2004 Top 500 list - see
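
As the footnote indicates, the efficiency figure is simply Rmax divided by Rpeak. A minimal worked example with made-up numbers (not actual Top 500 entries):

    /* Linpack efficiency as used in the table: Rmax / Rpeak.
     * The figures below are hypothetical placeholders, not real
     * Top 500 entries. */
    #include <stdio.h>

    int main(void)
    {
        double rpeak = 1638.4;  /* hypothetical peak GFLOPS, 256P system */
        double rmax  = 1376.3;  /* hypothetical measured Linpack GFLOPS */
        printf("efficiency = %.0f%%\n", 100.0 * rmax / rpeak);
        return 0;
    }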