Tera Computer Company (Nasdaq: TERA) designs, builds and sells high-performance, general-purpose, shared-memory computers that are both easy to program and scalable. Tera's Multithreaded Architecture (MTA) systems constitute a significant breakthrough in high performance computing, enabling these systems to outperform similarly priced supercomputers running important industrial, scientific and engineering applications. MTA systems for delivery in 2000 are available in configurations of between 4 and 64 processors.

Web address: www.tera.com
Information: info@tera.com

International Sales Offices:

Europe
Pierre Hassid
33-1-46-840-815
europe@tera.com

Japan
Susumu Kobayashi
Takeshi Jinnai
81-3-3535-7664
japan@tera.com

U.S. Sales Offices:

Houston
George Stephenson
713-266-2106

Seattle
Dick Russell
206-701-2068

Washington, DC
Charles Puglisi
410-381-0077

West Coast
Joe Grisillo
520-297-3312


MTA

Beyond Massive Parallelism

Tera's Breakthrough

Tera MTA systems represent a significant breakthrough in high performance computing, offering performance improvements over both parallel vector processor and massively parallel or clustered systems.

MTA systems exploit parallelism that is inherent in most application programs but that other parallel systems usually execute sequentially. MTA systems offer scalable uniform shared memory, so the programmer is freed completely from data layout concerns irrespective of system size. Concerns about data cache misses, poor computation-to-communication ratios, and parallelism that is too fine or too coarse to allow scalability are all irrelevant on the MTA.

Virtual processors

Each MTA processor has up to 128 RISC-like virtual processors. Each virtual processor is a hardware stream with its own instruction counter, register set, stream status word and target and trap registers. A different hardware stream is activated every clock period. This fundamental hardware innovation provides scalable memory latency tolerance. An extremely high bandwidth interconnection network lets each processor access arbitrary locations in uniform shared memory at up to 2.5 gigabytes per second. About 25 active streams per MTA processor are needed to overlap all memory latency with computational processing. In practice, such levels of multithreading are easy to achieve.
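The latency-hiding idea above can be illustrated with a toy model. The sketch below is not a description of the MTA pipeline: it assumes a processor that issues at most one instruction per clock, choosing among its streams in round-robin order, and it assumes each stream must wait a fixed number of clocks (the hypothetical parameter `wait_cycles`) before it can issue again. The function name `utilization` and the value 25 used for `wait_cycles` are illustrative assumptions.

```python
def utilization(streams, wait_cycles, total_cycles=10_000):
    """Fraction of clocks in which some stream issues an instruction.

    Toy model (an assumption, not the MTA design): one issue slot per clock,
    round-robin selection, and a fixed wait of `wait_cycles` clocks between
    consecutive instructions of the same stream.
    """
    ready_at = [0] * streams  # clock at which each stream may issue again
    issued = 0
    nxt = 0  # round-robin starting point
    for clock in range(total_cycles):
        for k in range(streams):
            s = (nxt + k) % streams
            if ready_at[s] <= clock:
                ready_at[s] = clock + wait_cycles  # stream is now busy
                issued += 1
                nxt = (s + 1) % streams
                break  # at most one issue per clock
    return issued / total_cycles


if __name__ == "__main__":
    for n in (5, 25, 128):
        print(n, "streams ->", utilization(n, wait_cycles=25))
```

In this model the issue slot is kept full once the number of ready streams reaches the per-stream wait, and adding more streams beyond that point changes nothing; with fewer streams, utilization falls in proportion.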

Parallel programming

A sophisticated, easy-to-use parallel programming environment is provided with the MTA. Tera's Fortran 77, Fortran 90, C, and C++ compilers offer a high level of automatic parallelization. Compiler analysis and performance programming tools are available. These tools, canal and traceview, have a user-friendly graphical interface.

Existing programs written for Cray Research supercomputers can be ported to the MTA system easily. Tera's compilers support Cray syntax wherever possible.

Tera's scalable uniform shared memory allows fast prototyping of parallel code and high levels of programmer productivity. Scientific application programmers are freed to concentrate on physics, not computer science.

Scaling

MTA systems are constructed from resource modules. Each resource module measures approximately 5 by 7 by 32 inches and contains up to four resources:

Each resource module is individually connected to a separate routing node in the system's 3D toroidal interconnection network. This connection is capable of supporting data transfers to and from memory at full processor rate in both directions, as are all of the connections between the network routing nodes themselves.

The three-dimensional torus topology used in MTA systems has eight or sixteen routing nodes per resource module, with the resources sparsely distributed among the nodes. In other words, there are several routing nodes per computational processor rather than the several processors per routing node that many systems employ. As a result, the bisection bandwidth of the network scales linearly with the number of processors.
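Why sparse population helps can be seen from a standard property of torus networks. The sketch below counts the links cut when a 3D torus is split in half across its longest dimension. The function name `torus_bisection_links` and the example dimensions are illustrative assumptions and are not actual MTA network sizes.

```python
from math import prod


def torus_bisection_links(dims):
    """Bidirectional links cut when a 3D torus is halved across its longest dimension.

    Assumes every dimension is at least 3 and the longest dimension is even,
    so the cut severs two links (one direct, one wraparound) in every ring
    that runs along that dimension.
    """
    longest = max(dims)
    rings = prod(dims) // longest  # rings running along the cut dimension
    return 2 * rings


if __name__ == "__main__":
    for k in (4, 8):
        dims = (k, k, k)
        print(dims, "nodes:", prod(dims), "bisection links:", torus_bisection_links(dims))
```

Going from a 4 x 4 x 4 torus to an 8 x 8 x 8 torus multiplies the node count by eight but the bisection link count only by four. If every node carried a processor, bisection bandwidth per processor would shrink as the system grew. Letting the number of routing nodes grow faster than the number of processors is one way to keep the per-processor figure constant, which is consistent with the sparse layout described above.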

Just as MTA system bandwidth scales with the number of processors, so too does its latency tolerance. The current implementation can tolerate hundreds of cycles of average memory latency, representing a comfortable margin; future versions of the architecture will be able to extend this limit without changing the programming model as seen by either the compilers or the users.

MTA systems offer an unsurpassed combination of performance, programmability, and portability to the high performance computer customer, both now and for many years to come.

Configurations

Model    Processors  Memory        Performance  Bisection Bandwidth  I/O Bandwidth
MTA 8    8           8 to 32 GB    7.2 Gflops   76.8 GB/s            3.2 GB/s
MTA 16   16          16 to 32 GB   14.4 Gflops  153.6 GB/s           6.4 GB/s
MTA 32   32          32 to 64 GB   28.8 Gflops  153.6 GB/s           12.8 GB/s
MTA 64   64          64 to 256 GB  57.6 Gflops  307.2 GB/s           25.6 GB/s
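As a quick cross-check of the configurations listed above, the following sketch divides each figure by the processor count. The data are copied from the table; the names `CONFIGS` and `per_processor` are illustrative assumptions.

```python
# (model, processors, Gflops, bisection GB/s, I/O GB/s), copied from the table
CONFIGS = [
    ("MTA 8", 8, 7.2, 76.8, 3.2),
    ("MTA 16", 16, 14.4, 153.6, 6.4),
    ("MTA 32", 32, 28.8, 153.6, 12.8),
    ("MTA 64", 64, 57.6, 307.2, 25.6),
]


def per_processor(configs):
    """Return {model: (Gflops, bisection GB/s, I/O GB/s)} per processor."""
    return {
        name: (round(gflops / p, 3), round(bisection / p, 3), round(io / p, 3))
        for name, p, gflops, bisection, io in configs
    }


if __name__ == "__main__":
    for name, figures in per_processor(CONFIGS).items():
        print(name, figures)
```

With the figures as listed, every model works out to 0.9 Gflops and 0.4 GB/s of I/O bandwidth per processor. Bisection bandwidth per processor is 9.6 GB/s for the MTA 8 and MTA 16 and 4.8 GB/s for the MTA 32 and MTA 64.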

Specifications

Further information about Tera Computer Company, including recent papers, benchmark results, and current news releases, is available at www.tera.com.