Tuesday, January 22, 2008

[PS] High-Performance Computing - High-Productivity Servers, Clusters and Supercomputers

Global Shared Memory
SGI® NUMAflex® Architecture

• Up to 24 TB globally addressable memory
• All memory in system addressable by any processor
• Best message passing performance too

Most PC users have already learned that they gain more by adding memory to their systems than by adding compute power, and this is even more true in supercomputing. SGI Altix offers the industry's most scalable memory architecture, with up to 24TB of globally addressable memory in a single system. This stands in stark contrast to clustered architectures, where the memory addressable by any one node is typically limited to 32GB.


[Figure: two diagrams contrasting global shared memory, in which any processor can access all data in the system's memory directly, with a traditional cluster, which must pass copies of data between nodes because memory is not shared.]

Systems with global shared memory allow all data in the system's memory to be accessed directly and efficiently, without moving data through I/O or networking bottlenecks. The impact can be dramatic: the latency, or 'wait time', for messages that must go through system I/O is on the order of 1,000 to 10,000 times greater than for communication within the memory domain. Clusters without global shared memory must instead pass copies of data, often in the form of messages, which greatly complicates programming and slows performance by increasing the time processors must wait for data.
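
As a rough, non-SGI sketch of the difference, the MPI program below must ship an explicit copy of an array between two cluster nodes before rank 1 can use it; on a global-shared-memory system the same read is just a load from memory. The array size and ranks are arbitrary choices for illustration.

    /* Illustrative sketch only (not SGI code): contrasts explicit
     * message passing with direct shared-memory access.
     * Build and run: mpicc demo.c && mpirun -np 2 ./a.out */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 1000000

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double *data = malloc(N * sizeof *data);

        if (rank == 0) {
            for (int i = 0; i < N; i++)
                data[i] = (double)i;
            /* Cluster model: rank 1 cannot see this memory, so a full
             * copy of the array must cross the interconnect. */
            MPI_Send(data, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* The consumer stalls here until the copy arrives. */
            MPI_Recv(data, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("last element: %.0f\n", data[N - 1]);
        }

        /* Shared-memory model: no transfer at all. Any processor could
         * read data[N - 1] in place, paying only a memory-load latency. */

        free(data);
        MPI_Finalize();
        return 0;
    }

The receive call is where the copy-and-wait cost of the cluster model shows up; with global shared memory that step simply disappears.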

Why does this matter? Users can do more and get results sooner. Entire databases can be driven directly out of memory. Hardware and software costs are lower. System administrators and users spend less time feeding and tuning the cluster. Developers have the flexibility to choose any programming model for application scalability. And it is all standards-based Linux. Read more about how SGI's customers and partners have already proven the benefits of global shared memory in a variety of industries.
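
To illustrate the programming-model point, here is a generic OpenMP sketch (not SGI-specific code): on a shared-memory machine a serial loop can be parallelized with a single directive, because every thread already sees the whole array.

    /* Generic OpenMP sketch: every thread sees the whole array, so the
     * loop needs no data distribution, just a directive.
     * Build: cc -fopenmp sum.c */
    #include <stdio.h>

    #define N 10000000

    static double a[N];

    int main(void)
    {
        double sum = 0.0;

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = (double)i;
            sum += a[i];
        }
        printf("sum = %.0f\n", sum);
        return 0;
    }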

Message Passing Performance:
Latency and Bandwidth performance of common interconnect technologies

Technology                 Vendor     MPI latency          Bandwidth per link
                                      (usec, short msg)    (unidirectional, MB/s)
NUMAlink 4 (Altix)         SGI        1                    3200
RapidArray (XD1)           Cray       1.8                  2000
QsNet II                   Quadrics   2                    900
Infiniband                 Voltaire   3.5                  830
High Performance Switch    IBM        5                    1000
Myrinet XP2                Myricom    5.7                  495
SP Switch 2                IBM        18                   500
Ethernet                   Various    30                   100
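
The two columns of this table combine into the usual first-order cost model for moving one message: time = latency + size / bandwidth. The sketch below applies that model to the NUMAlink 4 and Ethernet rows above; the 8KB message size is an arbitrary choice, and the model ignores contention and software overhead.

    /* First-order message cost: T = latency + size / bandwidth.
     * With bandwidth in MB/s (10^6 bytes/s), size/bandwidth comes out
     * in microseconds directly, since 1 MB/s = 1 byte per usec. */
    #include <stdio.h>

    static double msg_usec(double latency_usec, double bw_mbs, double bytes)
    {
        return latency_usec + bytes / bw_mbs;
    }

    int main(void)
    {
        double size = 8192.0;  /* an 8 KB message, chosen arbitrarily */
        printf("NUMAlink 4: %6.1f usec\n", msg_usec( 1.0, 3200.0, size));
        printf("Ethernet:   %6.1f usec\n", msg_usec(30.0,  100.0, size));
        return 0;
    }

At 8KB this works out to roughly 3.6 usec over NUMAlink 4 against roughly 112 usec over Ethernet: latency dominates for short messages, bandwidth for long ones.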


Comparison of Linpack system efficiencies in the November 2004 Top 500 list

System/interconnect        Avg. Linpack efficiency    Sample size (number of
                           for 256P systems, %*       systems on list)*
SGI Altix/NUMAlink         84                         14
HP Superdome               79                         18
Various/Quadrics           75                         4
Various/Infiniband         75                         3 (one system @288P)
Various/Myrinet            63                         19
Various/Gigabit Ethernet   59                         14

* Linpack Rmax/Rpeak for 256P systems listed on the November 2004 Top 500 list - see
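
As the footnote indicates, the efficiency figure is simply Rmax divided by Rpeak. A minimal worked example with made-up numbers (not actual Top 500 entries):

    /* Linpack efficiency as used in the table: Rmax / Rpeak.
     * The figures below are hypothetical placeholders, not real
     * Top 500 entries. */
    #include <stdio.h>

    int main(void)
    {
        double rpeak = 1638.4;  /* hypothetical peak GFLOPS, 256P system */
        double rmax  = 1376.3;  /* hypothetical measured Linpack GFLOPS */
        printf("efficiency = %.0f%%\n", 100.0 * rmax / rpeak);
        return 0;
    }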