Red Storm (computing)

Red Storm was a supercomputer architecture designed for the US Department of Energy’s National Nuclear Security Administration Advanced Simulation and Computing Program. Cray Inc. developed it in 2004 based on contracted architectural specifications provided by Sandia National Laboratories.[1] The architecture was later commercially produced as the Cray XT3.[2]

Red Storm was a partitioned, space-shared, tightly coupled, massively parallel processing machine with a high-performance 3D mesh network. The processors were commodity AMD Opteron CPUs with off-the-shelf memory DIMMs. The combined NIC and router, called SeaStar, was the only custom ASIC in the system and used a PowerPC 440-based core. When deployed in 2005, Red Storm’s initial configuration consisted of 10,880 single-core 2.0 GHz Opterons, of which 10,368 were dedicated to scientific calculations. The remaining 512 Opterons ran a version of Linux and acted as service processors, supporting the computations and providing the user interface to the system. This initial installation consisted of 140 cabinets occupying 280 square metres (3,000 sq ft) of floor space.
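To make the mesh topology more concrete, the following sketch models nodes as (x, y, z) coordinates and routes a message by correcting one dimension at a time (X, then Y, then Z). Dimension-ordered routing is a common scheme for mesh networks and is used here purely as an illustration; it is an assumption, not a description of SeaStar's actual routing logic. The function names (`mesh_hops`, `dimension_order_route`, `neighbors`) and the mesh size are hypothetical, and the model ignores any wraparound links.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def mesh_hops(a: Coord, b: Coord) -> int:
    """Minimal hop count between two nodes in a 3D mesh (no wraparound links)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dimension_order_route(src: Coord, dst: Coord) -> List[Coord]:
    """Path that fully corrects X, then Y, then Z (dimension-ordered routing)."""
    path: List[Coord] = [src]
    cur = list(src)
    for dim in range(3):
        step = 1 if dst[dim] > cur[dim] else -1
        while cur[dim] != dst[dim]:
            cur[dim] += step
            path.append((cur[0], cur[1], cur[2]))
    return path


def neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Up to six direct mesh neighbours of `node` in a mesh of size `dims`."""
    result: List[Coord] = []
    for d in range(3):
        for delta in (-1, 1):
            c = list(node)
            c[d] += delta
            if 0 <= c[d] < dims[d]:  # edge nodes have fewer neighbours
                result.append((c[0], c[1], c[2]))
    return result


# Hypothetical 4 x 4 x 4 mesh, far smaller than the real machine.
DIMS: Coord = (4, 4, 4)
route = dimension_order_route((0, 0, 0), (2, 1, 3))
```

In this model the minimal hop count between two nodes is the Manhattan distance between their coordinates, so the worst-case distance grows with the sum of the mesh's three dimensions rather than with the total number of nodes.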

The Red Storm supercomputer was designed to scale from a single cabinet to hundreds of cabinets, and it was scaled up twice during its lifetime. In 2006 the system was upgraded to 2.4 GHz dual-core Opterons, and an additional fifth row of cabinets was brought online, bringing the total to over 26,000 processor cores. The upgraded system had a peak performance of 124.4 teraflops and achieved 101.4 teraflops on the Linpack benchmark.[3] A second major upgrade in 2008 introduced Cray XT4 technology: quad-core Opteron processors and an increase in memory to 2 GB per core. This raised the theoretical peak performance to 284 teraflops.[4]
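The peak figures above follow from simple arithmetic: cores × clock rate × floating-point operations per cycle. The sketch below assumes 2 double-precision flops per cycle per core for the single- and dual-core Opterons of this period, and it assumes that only compute processors count toward peak. Both are assumptions made for illustration. Under them, the 124.4 teraflop peak implies roughly 25,900 compute cores, which would be consistent with the "over 26,000" total once service processors are included. The Linpack result then corresponds to about 81.5% of peak. The helper names are hypothetical.

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in teraflops: cores x clock x flops per cycle."""
    return cores * clock_ghz * 1e9 * flops_per_cycle / 1e12


def cores_for_peak(peak_tf: float, clock_ghz: float, flops_per_cycle: int) -> float:
    """Invert peak_tflops to estimate how many cores a quoted peak implies."""
    return peak_tf * 1e12 / (clock_ghz * 1e9 * flops_per_cycle)


def linpack_efficiency(rmax_tf: float, rpeak_tf: float) -> float:
    """Fraction of theoretical peak actually sustained on Linpack."""
    return rmax_tf / rpeak_tf


# Assumption: 2 double-precision flops per cycle per Opteron core of that era.
FLOPS_PER_CYCLE = 2

# 2005: 10,368 compute Opterons at 2.0 GHz (compute partition only, our assumption).
peak_2005 = peak_tflops(10_368, 2.0, FLOPS_PER_CYCLE)
eff_2005 = linpack_efficiency(36.19, peak_2005)

# 2006: 2.4 GHz dual-core Opterons, 124.4 TF peak, 101.4 TF on Linpack.
implied_cores_2006 = cores_for_peak(124.4, 2.4, FLOPS_PER_CYCLE)
eff_2006 = linpack_efficiency(101.4, 124.4)

print(f"2005 estimated peak: {peak_2005:.1f} TF, Linpack efficiency {eff_2005:.1%}")
print(f"2006 implied compute cores: {implied_cores_2006:,.0f}, efficiency {eff_2006:.1%}")
```

Under these assumptions the 2005 estimate works out to about 41.5 teraflops, with the 36.19 teraflop Linpack result at about 87.3% of that figure. If the official peak also counted the service processors, the efficiency would be somewhat lower.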

Top500 ranking for Red Storm after its initial deployment and after each upgrade:

  • November 2005: Rank 6 (36.19 TFLOPS)[5]
  • November 2006: Rank 2 (101.4 TFLOPS)[6]
  • November 2008: Rank 9 (204.2 TFLOPS)[7]

Red Storm was intended for capability computing, in which a single application can run across the entire system. This contrasts with cluster-style capacity computing, in which portions of a cluster are assigned to run different applications. To achieve adequate application progress across the entire machine, the performance of the memory subsystem, the processors, and the network had to be in proper balance.

System software played a key role as well. The Portals network programming API was used so that inter-processor communication could scale to the size of the entire system; Portals had been used on several other supercomputers, including the Intel Teraflops (ASCI Red) and Paragon systems. The compute processors ran a custom lightweight kernel operating system named Catamount, which was based on "Cougar", the operating system of ASCI Red.[8] A userspace implementation of the Lustre file system, named liblustre, was ported to the Catamount environment using the libsysio library[9] to provide POSIX-like semantics. This file system client ran in the single-threaded Catamount environment without interrupts,[10] and it serviced I/O requests only when the application explicitly allowed it, reducing the jitter that background file system operations would otherwise introduce.
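The no-interrupt I/O model described above can be illustrated with a small single-threaded client in which submitted requests wait in a queue and advance only when the application calls a progress routine. This is a generic sketch of application-driven (polled) progress under our own assumptions. `PolledIOClient`, `submit`, `progress`, `compute_step`, and the file names are hypothetical and do not reflect the actual interfaces of liblustre or libsysio.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class IORequest:
    """A pending write the application has handed to the file system client."""
    path: str
    data: bytes


@dataclass
class PolledIOClient:
    """Single-threaded client: queued I/O advances only inside progress()."""
    pending: Deque[IORequest] = field(default_factory=deque)
    completed: List[IORequest] = field(default_factory=list)

    def submit(self, req: IORequest) -> None:
        # Only enqueue; no background thread or interrupt handler touches it.
        self.pending.append(req)

    def progress(self, max_requests: int = 1) -> int:
        """Service up to max_requests queued requests; return how many finished."""
        done = 0
        while self.pending and done < max_requests:
            req = self.pending.popleft()
            # A real client would transfer req.data to storage here;
            # this sketch only records the completion.
            self.completed.append(req)
            done += 1
        return done


def compute_step(step: int) -> int:
    """Stand-in for the application's computation between I/O points."""
    return sum(i * i for i in range(step * 1000))


client = PolledIOClient()
for step in range(3):
    compute_step(step)                             # runs without I/O interference
    client.submit(IORequest(f"out.{step}", b"x"))  # hypothetical file names
queued_before = len(client.pending)                # nothing serviced yet
finished = client.progress(max_requests=10)        # application-chosen I/O point
```

The trade-off is latency: if the application rarely calls `progress()`, queued I/O waits longer, but in exchange no background activity interrupts the computation between those points.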

Red Storm was decommissioned in 2012.[11]

  1. ^ "Red Storm 2004 fact sheet" (PDF) (Press release). June 2004. Archived from the original (PDF) on 2009-05-11. Retrieved 2009-08-11.
  2. ^ "Sandia Red Storm press release" (Press release). 2004-07-27. Archived from the original on 2009-08-26. Retrieved 2009-08-11.
  3. ^ "Red Storm upgrade lifts Sandia supercomputer to 2nd in world, but 1st in scalability, say researchers" (Press release). 2006-11-14. Retrieved 2009-08-11.
  4. ^ "Cray and Sandia Announce Agreement to Upgrade "Red Storm" Supercomputer to 284 Teraflops" (Press release). 2008-02-06. Archived from the original on 2019-05-17. Retrieved 2009-08-11.
  5. ^ "Top 500 rankings for Nov 2005". November 2005. Retrieved 2009-08-11.
  6. ^ "Top 500 rankings for Nov 2006". November 2006. Retrieved 2009-08-11.
  7. ^ "Top 500 rankings for Nov 2008". November 2008. Retrieved 2009-08-11.
  8. ^ "Red Storm 2008 fact sheet" (PDF) (Press release). 2008. Archived from the original (PDF) on 2009-08-26. Retrieved 2009-08-11.
  9. ^ "libsysio". 2006. Retrieved 2016-02-16.
  10. ^ "Catamount Software Architecture with Dual Core Extensions" (PDF). 2005. Retrieved 2016-02-16.
  11. ^ "Red Storm Passes". June 2012. Retrieved 2012-11-02.