CSC World: Putting Innovation to Work
October/December 2006, Departments

IN PRACTICE: CSC Builds NASA’s Supercomputer Super Fast and Super Cheap


By Rob Woodward

The National Aeronautics and Space Administration wanted CSC to help it build the world’s fastest computer. The catch: It had to be done at a tenth of the cost and time it had taken to build the existing record-holder, Japan’s Earth Simulator.  

In 2004, NASA knew that to accomplish upcoming missions it would need 10 times the computing power it had at the start of the year. Only a top supercomputer could perform the complex data modeling required to safely return the space shuttle to flight, build a Mars rover, track climate patterns, and make long-range weather forecasts, all at the same time.

Meanwhile, the United States had fallen behind in the high-performance computing race. Japan’s Earth Simulator had been the world’s fastest supercomputer for three years. Intel and SGI wanted to overtake Earth Simulator and offered NASA substantial discounts on hardware, including more than 10,000 processors. However, Intel and SGI also wanted the supercomputer to produce competitive results in time for the November edition of the University of Mannheim’s list of the world’s 500 fastest supercomputers. NASA would have to top the list within four months.

To make it happen, NASA initiated Project Columbia, named in honor of the fallen space shuttle Columbia, and partnered with CSC and Advanced Management Technology Inc. (AMTI). In July, the CSC-AMTI team started work at the NASA Advanced Supercomputing (NAS) facility at Ames Research Center in California. The team designed Columbia’s facilities and architecture, then integrated the system and provided operational support. “All I asked was a miracle a day,” says Walt Brooks, then chief of NASA’s supercomputer staff. “120 days later, we did it.”

Setting a blistering pace

NASA faced challenges unsurpassed in supercomputing history, says Christopher Buchanan, CSC’s site manager for high-end computing. Budget constraints had forced a 40 percent reduction in NAS staff and restricted Project Columbia’s funding to a fraction of the cost of similar projects. While the offer by Intel and SGI made Project Columbia simpler financially, its four-month deadline made the project more difficult technically. Top-tier supercomputers typically require years of planning and development and cost several hundred million dollars. Earth Simulator, for example, took at least five years and $500 million to complete. Building Columbia in 120 days with less money and fewer people looked like an impossible task.

The immense workload compelled the team to develop a complex and demanding schedule. “Our team worked in shifts around the clock, each member putting in 60-80 hours a week,” says Buchanan. “We came up with a new process, coordinating groups to work simultaneously, performing repeatable operations almost in assembly-line fashion.” With each of the 20 systems CSC installed, the team was able to refine and speed the process. At its peak, the team installed nine systems and moved them into production in 10 days.

A work in progress

Throughout the build, NASA’s existing computing projects could not be disrupted. The usual procedure of installing all systems at once before using the computer could not be followed. The team had to maintain the production environment, minimizing disruption to users, while incrementally adding new systems. This had never been done before.

The NASA team had previously set a record by installing a single 512-processor SGI system in 30 days. It now had to install, power, cool, and network 20 of these systems, at five days apiece. The team did this by replicating an existing system that had been well tested, secured, and heavily used. It divided into parallel work groups, each with its own process outline and responsibilities for tasks such as facilities construction, hardware installation, networking, security, and software.

Project Columbia included new SGI hardware called the Altix 3700Bx2. The Altix system provided direct access to all 2,048 processors simultaneously, allowing scientists to run larger jobs. The system had not been available for testing before it was delivered to NASA, and the tight schedule didn’t allow for thorough testing.

As work progressed, the team faced many unforeseen challenges: The elevator to move hardware to the second floor broke; the forklift to move hardware downstairs broke; defective cables had to be replaced; and a major water main break caused a loss of cooling power. Then some of the hardware shipments were delayed by several weeks. Nine of the processor systems didn’t arrive until 10 days before the system installation deadline, and they all had to be put into production simultaneously.

On time, on budget, on the mark

Project Columbia dramatically reduced the cost and time required to build a top-level supercomputer. Benchmarked in October at 51.9 trillion calculations per second (51.9 teraflops), Columbia became the world’s fastest production supercomputer and the first US computer to eclipse the record-holding Earth Simulator since it debuted in Japan in 2001. With a $50 million budget, Columbia cost one-tenth as much as the Japanese system, and it took one-tenth of the time to launch.

According to Buchanan, Columbia’s power makes the NASA Ames Research Center one of the world’s premier scientific resources. Columbia is accessible to scientists via the Internet from anywhere in the world. It is used by NASA missions, government agencies, major universities, and industry. Current applications on Columbia include a hurricane weather simulator; applications supporting the New Horizons mission to Pluto; Return-to-Flight, a suite of programs supporting the return of NASA’s space shuttle fleet to operation; ECCO, an ocean-modeling application; and a suite of exploration systems applications used to design the Mars Crew Exploration Vehicle. These and other applications are delivering scientific results at a scale and in a time frame that were not possible before Columbia.

For their work on the supercomputer, four members of CSC’s 45-person Columbia staff were selected to receive the 2006 Award for Technical Excellence, CSC’s highest honor for IT innovation: Davin Chan, Ed Hook, George Myers, and Herbert Yeung. “I had the difficult task to choose four to represent the larger group,” says Buchanan. “Even though these four people were chosen for the award, it took the entire CSC staff working together with AMTI and NASA to make this happen.”

Rob Woodward is a writer in CSC’s Corporate Communications & Marketing department.
