Tera Computer Company (Nasdaq: TERA) designs, builds and sells high-performance, general-purpose, shared-memory computers that are both easy to program and scalable. Tera's Multithreaded Architecture (MTA) systems constitute a significant breakthrough in high performance computing, enabling these systems to outperform similarly priced supercomputers running important industrial, scientific and engineering applications. MTA systems for delivery in 2000 are available in configurations of between 4 and 64 processors.
Web address: www.tera.com
Information: info@tera.com
International Sales Offices:

| Region | Contact | Phone | Email |
|---|---|---|---|
| Europe | Pierre Hassid | 33-1-46-840-815 | europe@tera.com |
| Japan | Susumu Kobayashi, Takeshi Jinnai | 81-3-3535-7664 | japan@tera.com |

U.S. Sales Offices:

| Office | Contact | Phone |
|---|---|---|
| Houston | George Stephenson | 713-266-2106 |
| Seattle | Dick Russell | 206-701-2068 |
| Washington, DC | Charles Puglisi | 410-381-0077 |
| West Coast | Joe Grisillo | 520-297-3312 |

Tera MTA systems represent a significant breakthrough in high performance computing, offering performance improvements over both parallel vector processor and massively parallel or clustered systems.
MTA systems exploit parallelism that is inherent in most application programs but that other parallel systems usually execute sequentially. MTA systems offer scalable uniform shared memory, so the programmer is freed completely from data layout concerns irrespective of system size. Concerns about data cache misses, poor computation-to-communication ratios, and parallelism that is too fine or too coarse to allow scalability are all irrelevant on the MTA.
Each MTA processor has up to 128 RISC-like virtual processors. Each virtual processor is a hardware stream with its own instruction counter, register set, stream status word and target and trap registers. A different hardware stream is activated every clock period. This fundamental hardware innovation provides scalable memory latency tolerance. An extremely high bandwidth interconnection network lets each processor access arbitrary locations in uniform shared memory at up to 2.5 gigabytes per second. About 25 active streams per MTA processor are needed to overlap all memory latency with computational processing. In practice, such levels of multithreading are easy to achieve.
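
The latency-hiding argument above can be illustrated with a back-of-the-envelope model. The sketch below assumes a processor that issues at most one instruction per clock period and a stream that can issue again only a fixed number of periods after its previous instruction. The function name `issue_utilization` and the 25-period reissue interval are illustrative assumptions chosen to match the "about 25 active streams" figure; they are not hardware specifications taken from this text.

```python
def issue_utilization(active_streams: int, reissue_interval: int = 25) -> float:
    """Fraction of clock periods in which some stream is ready to issue.

    Simplified model (an assumption, not Tera's specification): the processor
    issues at most one instruction per clock period, and a stream can issue
    again `reissue_interval` periods after its previous instruction.
    """
    if active_streams < 0 or reissue_interval <= 0:
        raise ValueError("need active_streams >= 0 and reissue_interval > 0")
    # With fewer streams than the interval, some issue slots go unused.
    return min(1.0, active_streams / reissue_interval)


if __name__ == "__main__":
    for streams in (1, 5, 25, 128):
        share = issue_utilization(streams)
        print(f"{streams:3d} streams -> {share:.0%} of issue slots used")
```

Under this model, adding streams beyond the reissue interval does not raise utilization further, but it does leave headroom when individual streams stall for longer than average.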
A sophisticated, easy-to-use parallel programming environment is provided with the MTA. Tera's Fortran 77, Fortran 90, C, and C++ compilers offer a high level of automatic parallelization. Compiler analysis and performance programming tools are available. These tools, canal and traceview, have a user-friendly graphical interface.
Existing programs written for Cray Research supercomputers can be ported to the MTA system easily. Tera's compilers support Cray syntax wherever possible.
Tera's scalable uniform shared memory allows fast prototyping of parallel code and high levels of programmer productivity. Scientific application programmers are freed to concentrate on physics, not computer science.
MTA systems are constructed from resource modules. Each resource module measures approximately 5 by 7 by 32 inches and contains up to four resources:
The three-dimensional torus topology used in MTA systems has eight or sixteen routing nodes per resource module with the resources sparsely distributed among the nodes. In other words, there are several routing nodes per computational processor rather than the several processors per routing nodes that many systems employ. As a result, the bisection bandwidth of the network scales linearly with the number of processors.
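
The bisection property of a torus can be made concrete by counting the links that a planar cut severs. The following is a minimal sketch under stated assumptions: the function name `torus_bisection_links` is hypothetical, it counts links only, and the per-link bandwidth and the exact mapping of routing nodes to processors are not given in this text.

```python
from math import prod


def torus_bisection_links(dims: tuple[int, int, int]) -> int:
    """Count the links severed when a 3-D torus is cut into two equal halves.

    The planar cut is made perpendicular to the longest dimension, which
    gives the smallest cross-section. Every ring along that dimension is
    severed twice (once directly, once at the wrap-around link), hence
    the factor of 2.
    """
    longest = max(dims)
    if min(dims) < 1 or longest < 4 or longest % 2:
        raise ValueError("longest dimension must be even and at least 4")
    cross_section = prod(dims) // longest
    return 2 * cross_section


if __name__ == "__main__":
    for dims in ((4, 4, 4), (4, 4, 8), (8, 8, 8)):
        print(dims, "nodes:", prod(dims), "bisection links:", torus_bisection_links(dims))
```

Because the cut grows with the cross-section rather than the node count, spreading processors sparsely over a larger torus is one way to keep the bisection capacity per processor from shrinking as the system grows.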
Just as MTA system bandwidth scales with the number of processors, so too does its latency tolerance. The current implementation can tolerate hundreds of cycles of average memory latency, representing a comfortable margin; future versions of the architecture will be able to extend this limit without changing the programming model as seen by either the compilers or the users.
MTA systems offer an unsurpassed combination of performance, programmability, and portability to the high performance computer customer, both now and for many years to come.

| Model | Processors | Memory | Performance | Bisection Bandwidth | I/O Bandwidth |
|---|---|---|---|---|---|
| MTA 8 | 8 | 8 to 32 GB | 7.2 Gflops | 76.8 GB/s | 3.2 GB/s |
| MTA 16 | 16 | 16 to 32 GB | 14.4 Gflops | 153.6 GB/s | 6.4 GB/s |
| MTA 32 | 32 | 32 to 64 GB | 28.8 Gflops | 153.6 GB/s | 12.8 GB/s |
| MTA 64 | 64 | 64 to 256 GB | 57.6 Gflops | 307.2 GB/s | 25.6 GB/s |
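
As a quick check on how the listed configurations scale, the per-processor figures can be derived from the table. The sketch below copies the processor count, performance, and I/O bandwidth columns from the table above; the names `CONFIGS` and `per_processor` are illustrative.

```python
# (model, processors, performance in Gflops, I/O bandwidth in GB/s),
# copied from the configuration table.
CONFIGS = [
    ("MTA 8", 8, 7.2, 3.2),
    ("MTA 16", 16, 14.4, 6.4),
    ("MTA 32", 32, 28.8, 12.8),
    ("MTA 64", 64, 57.6, 25.6),
]


def per_processor(configs):
    """Return {model: (Gflops per processor, I/O GB/s per processor)}."""
    return {
        model: (round(gflops / procs, 3), round(io / procs, 3))
        for model, procs, gflops, io in configs
    }


if __name__ == "__main__":
    for model, (gflops, io) in per_processor(CONFIGS).items():
        print(f"{model}: {gflops} Gflops and {io} GB/s of I/O per processor")
```

Every listed model works out to the same per-processor performance and I/O bandwidth, which is what linear scaling of those two columns with processor count implies.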
Further information about Tera Computer Company, including recent papers, benchmark results, and current news releases, is available at www.tera.com.