
Features:
TeraShake: SDSC SIMULATES THE 'BIG ONE'
by Paul Tooby, SDSC Senior Science Writer
Everyone knows that the "big one" is coming -- a major earthquake on the San
Andreas fault. The southern part of the fault has not seen a major event since
1690, and the accumulated movement may amount to as much as six meters,
setting the stage for an earthquake that could be as large as magnitude 7.7.
But scientists and engineers want to know in more detail just how intensely
the earth will shake during such an event -- and what impact this will have on
structures, particularly in the populated, sediment-filled basins of
Southern California and northern Mexico.
Now, a collaboration of 33 earthquake scientists, computer scientists, and
others from eight institutions has produced the largest and most detailed
simulation yet of just what may happen during a major earthquake on the
southern San Andreas fault. The simulation, known as TeraShake, used the new
10 teraflops DataStar supercomputer and large-scale data resources of the San
Diego Supercomputer Center (SDSC) at UC San Diego (UCSD).
The collaboration is led by Tom Jordan, director of the Southern California
Earthquake Center (SCEC) and professor of Earth Sciences at the University of
Southern California (USC), under the SCEC Community Modeling Environment
(SCEC/CME) NSF Information Technology Research grant. "In addition to
enriching our understanding of the basic science of earthquakes," said Jordan,
"the TeraShake simulation will contribute to estimating seismic risk, planning
emergency preparation, and designing the next generation of
earthquake-resistant structures, potentially saving lives and property." The
co-PIs of the project are Professor J. Bernard Minster of the Institute of
Geophysics and Planetary Physics (IGPP/SIO/UCSD); Reagan Moore, Distinguished
Scientist and director of SDSC's SRB program; and Carl Kesselman, Director of
the Center for Grid Technologies at USC's Information Sciences Institute (ISI).
The TeraShake simulation is a good example of cyberinfrastructure, involving
not only large computation but also massive data and visualization. "The
simulation generated 47 TB of data in a little more than four days," said
Moore. "This required archiving 10 TB of data per day, the highest rate ever
sustained for a single simulation at SDSC." Forty-seven TB, or 47,000 GB, is
equivalent to about 47 million books, or nearly five times the printed
collection of the Library of Congress.
Carrying out this complex simulation required sustained cooperation among many
people. "TeraShake is an outstanding example of interdisciplinary
collaboration between the SCEC earthquake scientists and the core groups at
SDSC, as well as the other participants in this groundbreaking research," said
Moore.
Big Earthquake Impacts
The TeraShake simulation modeled the earth shaking that would rattle Southern
California if a 230-kilometer section of the San Andreas fault ruptured from
north to south, beginning near Wrightwood, California, and producing a
magnitude 7.7 earthquake.
The scientists emphasize that this research is not designed to predict when an
earthquake will happen, but rather to predict in detail the resulting ground
motion once the earthquake occurs. A key factor the TeraShake simulation will
shed light on is the response of Southern California's deep, sediment-filled
basins, from the Santa Clara Valley to the Los Angeles basin and down to the
Coachella Valley. "In a major earthquake, a basin can jiggle like a bowl of
jelly," said Minster. "The energy bounces off the boundaries and can produce
unexpectedly large and long-lasting ground motions and resulting damage."
Scientists have long known that the valley floors in these basins can
experience extended shaking, but TeraShake filled in the details, with the
southward-rupturing earthquake showing peak velocities of more than two meters
per second and lower velocities lasting for more than three minutes in the
Coachella Valley. For comparison, the strong motion in the 1906 San Francisco
earthquake has been estimated by the USGS to have lasted in the range of 45 to
60 seconds.
Beyond helping scientists better understand the details of earthquakes, the
TeraShake simulation will help answer questions
such as which regions of Southern California will be hit hardest under various
scenarios of large earthquakes, and the ground velocities that can be expected
to shake buildings and infrastructure.
Big Simulation
"The TeraShake simulation is the fulfillment of a dream we've had for over ten
years," said Minster. Previous simulations of Southern California have been
limited to smaller domains and coarser resolutions, and advances in both
supercomputers and related data technologies made the current simulation
possible. "If we want to be able to understand big earthquakes and how they
will impact sediment-filled basins, and finally structures, we need as much
detail as possible," said Minster. "And this means massive amounts of data,
produced by a high-resolution model running on the biggest supercomputer we
can get, and this can only be done at a facility with the combined data and
computing resources of SDSC."
The geographic region for the simulation was a large rectangular volume or box
600 km by 300 km by 80 km deep, spanning Southern California from the Ventura
Basin, Tehachapi, and the southern San Joaquin Valley in the north, to Los
Angeles, San Diego, out to Catalina Island, and down to the Mexican cities of
Mexicali, Tijuana, and Ensenada in the south.
To model this region, the simulation used a 3,000 by 1,500 by 400 mesh,
dividing the volume into 1.8 billion cubes 200 meters on a side, with a
maximum frequency of 0.5 hertz. This made it the biggest and most detailed
simulation of this region to date. In such a large simulation, a key challenge
is to handle the enormous range of length scales, which extends from 200
meters (especially important near the ground surface and the rupturing fault)
to hundreds of kilometers across the entire domain.
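The mesh figures can be cross-checked with simple arithmetic. The Python sketch below divides the domain by the grid dimensions and multiplies out the cell count. It also shows one common rule of thumb linking grid spacing to the highest resolvable frequency, f_max = v_min / (N * h). The minimum shear-wave speed of 500 m/s and the N = 5 points per wavelength used here are assumptions for illustration, not values given in this article.

```python
# Cross-check of the TeraShake mesh figures quoted in the article.
# DOMAIN_M and GRID_POINTS come from the article; MIN_VS_M_PER_S and
# POINTS_PER_WAVELENGTH are illustrative assumptions, not article values.

DOMAIN_M = (600_000, 300_000, 80_000)  # length, width, depth in meters
GRID_POINTS = (3_000, 1_500, 400)      # mesh points along each axis

MIN_VS_M_PER_S = 500.0       # assumed slowest shear-wave speed in the model
POINTS_PER_WAVELENGTH = 5.0  # assumed sampling rule of thumb


def cell_sizes(domain, points):
    """Grid spacing in meters along each axis."""
    return tuple(d / n for d, n in zip(domain, points))


def cell_count(points):
    """Total number of cells in the mesh."""
    nx, ny, nz = points
    return nx * ny * nz


def max_frequency_hz(min_speed, spacing, ppw):
    """Highest frequency whose shortest wavelength still spans `ppw` cells."""
    return min_speed / (ppw * spacing)


spacing = cell_sizes(DOMAIN_M, GRID_POINTS)
cells = cell_count(GRID_POINTS)
f_max = max_frequency_hz(MIN_VS_M_PER_S, spacing[0], POINTS_PER_WAVELENGTH)

print(spacing)   # (200.0, 200.0, 200.0)
print(cells)     # 1800000000
print(f_max)     # 0.5
```

Under these assumed values the rule of thumb reproduces the 0.5 hertz limit. A slower minimum speed or a stricter sampling rule would lower it.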
Another task was to prepare accurate input data for the domain. These inputs
included the San Andreas fault geometry, and the subsurface 3-D crustal
structure based on the SCEC Community Velocity Model. Seismologist Steven Day,
professor of geological sciences at SDSU, provided the earthquake source,
modeling the fault rupture as a 60-second slip, scaled down from the
2002 magnitude 7.9 Alaska earthquake on the Denali fault. In the future, the
researchers plan to integrate a physics-based spontaneous fault rupture model
to initiate the simulation.
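The article does not say exactly how Day scaled the Denali source. For a sense of the size of the step from magnitude 7.9 to 7.7, the standard Hanks-Kanamori relation between moment magnitude and seismic moment, Mw = (2/3)(log10 M0 - 9.1) with M0 in newton-meters, can be inverted. A difference of 0.2 magnitude units corresponds to about a factor of two in seismic moment. This is a sketch of that conversion only, not the procedure used for TeraShake.

```python
import math

# Hanks-Kanamori relation: Mw = (2/3) * (log10(M0) - 9.1), M0 in newton-meters.
# Inverting it gives M0 = 10 ** (1.5 * Mw + 9.1).


def seismic_moment_nm(mw):
    """Seismic moment (N*m) for a given moment magnitude."""
    return 10 ** (1.5 * mw + 9.1)


def magnitude_from_moment(m0_nm):
    """Moment magnitude for a given seismic moment (N*m)."""
    return (2.0 / 3.0) * (math.log10(m0_nm) - 9.1)


denali_m0 = seismic_moment_nm(7.9)     # 2002 Denali fault earthquake
terashake_m0 = seismic_moment_nm(7.7)  # TeraShake scenario
ratio = denali_m0 / terashake_m0       # about 2, since 0.2 units is a factor of 10**0.3

print(f"Denali:    {denali_m0:.2e} N*m")
print(f"TeraShake: {terashake_m0:.2e} N*m")
print(f"ratio:     {ratio:.2f}")
```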
Using some 18,000 CPU hours on 240 processors of the new 10 teraflops IBM
Power4+ DataStar supercomputer at SDSC, the model computed 20,000 time steps
of about 1/100 second each for the first 220 seconds of the earthquake,
producing a flood of data.
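The time-stepping figures are consistent with each other, as a quick sketch shows. Dividing 220 seconds by 20,000 steps gives a step of 0.011 seconds. The sketch also assumes the 18,000 CPU hours were spread evenly across all 240 processors, because the article does not give a wall-clock time. Under that assumption, the computation would have taken about 75 hours.

```python
# Time-stepping arithmetic for the first TeraShake run.
# Figures are from the article; spreading CPU hours evenly over all
# processors is an assumption, since no wall-clock time is given.

TIME_STEPS = 20_000
SIMULATED_SECONDS = 220.0
CPU_HOURS = 18_000
PROCESSORS = 240

dt_seconds = SIMULATED_SECONDS / TIME_STEPS   # earthquake time per step
wall_clock_hours = CPU_HOURS / PROCESSORS     # if all processors ran throughout
steps_per_hour = TIME_STEPS / wall_clock_hours

print(f"time step: {dt_seconds:.3f} s")
print(f"wall clock: {wall_clock_hours:.0f} h (~{wall_clock_hours / 24:.1f} days)")
print(f"throughput: {steps_per_hour:.0f} steps per hour")
```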
Data Challenges
"The TeraShake team faced unprecedented issues of data management," said
Moore. "The simulation generated so much data (47 TB in some 150,000 files) and
so rapidly that it pushed the envelope of SDSC's capabilities." Dealing with
this data deluge required the efforts of the High-End Systems and Scientific
Applications groups, as well as the Data Grids Technologies group at SDSC, to
transfer the data first to the disk-based General Parallel File System (GPFS)
and then to SDSC's archival tape storage, moving it at about 100 MB per second
to keep up with the 10 TB per day of simulation output.
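Converting between daily volumes and sustained transfer rates helps put these numbers in perspective. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes). Under that assumption, 10 TB per day corresponds to a sustained rate of a little under 116 MB per second. The 47 TB spread over 150,000 files averages roughly 313 MB per file.

```python
# Unit conversions for the TeraShake data flow.
# Assumes decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.

TB = 10**12
MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60


def mb_per_second(tb_per_day):
    """Sustained rate (MB/s) needed to move `tb_per_day` terabytes per day."""
    return tb_per_day * TB / MB / SECONDS_PER_DAY


def tb_per_day(mb_per_s):
    """Daily volume (TB) moved at a sustained `mb_per_s` megabytes per second."""
    return mb_per_s * MB * SECONDS_PER_DAY / TB


def mean_file_size_mb(total_tb, n_files):
    """Average file size (MB) for a collection."""
    return total_tb * TB / n_files / MB


print(f"10 TB/day -> {mb_per_second(10):.1f} MB/s")
print(f"100 MB/s  -> {tb_per_day(100):.2f} TB/day")
print(f"47 TB in 150,000 files -> {mean_file_size_mb(47, 150_000):.0f} MB per file")
```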
This massive data collection, a valuable resource for further research, was
then registered into the SCEC Digital Library, which is managed by the SDSC
Storage Resource Broker (SRB). The collection is being annotated with
simulation metadata, which will allow powerful data discovery operations using
metadata-based queries. In addition, each surface and volume velocity file was
fingerprinted with MD5 checksums to preserve and validate data integrity. Data
access, management, and data product derivation are provided through various
interfaces to the SRB, including Web service and data grid workflow
interfaces.
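Checksum fingerprinting of the kind described above can be sketched with Python's standard hashlib module. The code below is a generic illustration, not the SRB's own mechanism. It computes a file's MD5 digest in chunks, so very large files need not fit in memory. Later, it compares the recomputed digest with the one recorded at archive time. The 8 MiB chunk size and the function names are arbitrary choices for this example.

```python
import hashlib
import os
import tempfile

# Generic MD5 fingerprinting sketch; not the SRB's own implementation.
CHUNK_BYTES = 8 * 1024 * 1024  # read 8 MiB at a time so large files need not fit in memory


def md5_fingerprint(path):
    """Return the hex MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_BYTES), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, recorded_hex):
    """True if the file still matches the digest recorded at archive time."""
    return md5_fingerprint(path) == recorded_hex.lower()


# Usage: fingerprint a small stand-in file, then re-check it later.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

recorded = md5_fingerprint(demo_path)  # store alongside the file's metadata
ok = is_intact(demo_path, recorded)
print(recorded, ok)
os.remove(demo_path)
```

MD5 serves here only as an integrity fingerprint against accidental corruption. It is not a defense against deliberate tampering.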
The TeraShake simulation is also part of a larger SCEC scientific program with
data collections currently totalling 80 TB. To support research on this scale,
SDSC is working to provide efficient online access to the growing SCEC data
collections archived at SDSC.
Computational Challenges
"The large TeraShake simulation stretched SDSC resources across the board,
facing us with major computational as well as data challenges," said Nancy
Wilkins-Diehr, Manager of Consulting and Training at SDSC.
To simulate the earthquake, the scientists used the Anelastic Wave Model
(AWM), a fourth-order finite-difference code developed by Kim B. Olsen,
associate professor of geological sciences at SDSU, that models 3-D velocity
in the volume and on the surface of the domain. To enhance the code so it could
scale up to the very large mesh of 1.8 billion points and its correspondingly
large memory requirements, SDSC computational experts Yifeng Cui, Giri
Chukkapalli, and others in the Scientific Applications Group worked closely
with Olsen and the other scientists who developed AWM. To successfully "build
bridges" between the earthquake scientists and SDSC resources, the SDSC staff
made use of their multidisciplinary expertise, which includes degrees in
scientific and engineering disciplines, combined with extensive experience in
the intricacies of today's parallel supercomputers.
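The article says only that AWM is a fourth-order finite-difference code. Staggered-grid velocity-stress schemes of this kind commonly use a fourth-order first-derivative stencil with weights 9/8 and -1/24. The 1-D Python sketch below shows just that operator as an illustration. It is not AWM's actual code, and it leaves out the 3-D wave equations, boundaries, and parallel decomposition.

```python
# Fourth-order staggered-grid first derivative in 1-D (illustrative sketch).
# Values live on cell faces x = (j + 1/2) * h; the derivative is evaluated
# at nodes x = i * h using the four nearest faces.

C1 = 9.0 / 8.0    # weight for the faces at i - 1/2 and i + 1/2
C2 = -1.0 / 24.0  # weight for the faces at i - 3/2 and i + 3/2


def staggered_derivative(f_faces, h):
    """Return (node_positions, dfdx) for every node with two faces on each side."""
    nodes, dfdx = [], []
    for i in range(2, len(f_faces) - 1):
        inner = f_faces[i] - f_faces[i - 1]      # f(i + 1/2) - f(i - 1/2)
        outer = f_faces[i + 1] - f_faces[i - 2]  # f(i + 3/2) - f(i - 3/2)
        nodes.append(i * h)
        dfdx.append((C1 * inner + C2 * outer) / h)
    return nodes, dfdx


# Usage: the stencil is exact for cubics, so d/dx of x**3 matches 3x**2.
h = 0.5
faces = [((j + 0.5) * h) ** 3 for j in range(10)]
nodes, dfdx = staggered_derivative(faces, h)
for x, d in zip(nodes, dfdx):
    print(f"x={x:4.1f}  numeric={d:8.4f}  exact={3 * x * x:8.4f}")
```

The wider stencil is what buys fourth-order accuracy. Each node needs values two cells away on either side, which is one reason that distributing such a code across many processors requires exchanging extra boundary layers between neighbors.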
For a large-scale run such as TeraShake, new problems tend to emerge that are
not significant in smaller runs. It took months of effort by the SDSC
researchers and 30,000 allocation hours to port the code to the DataStar
platform and to work through the parallel computing issues, testing,
validation, and performance scaling that the large simulation required.
SDSC's computational effort was supported through the NSF-funded SDSC
Strategic Applications Collaborations (SAC) and Strategic Community
Collaborations (SCC) programs. "TeraShake is a great example of why these
programs are so important," said Wilkins-Diehr. "Allowing us to develop close
collaborations between the computational scientists who use SDSC's
supercomputers and our computational experts is crucial to achieving new
science like TeraShake." The effort will also provide lasting value, with the
enhanced AWM code now available to the earthquake community for future large-
scale simulations.
Big Collaboration
"TeraShake owes its success to the enthusiastic teamwork over a number of
months among groups with very different skills: seismologists, computer
scientists, the computational experts in SDSC's Scientific Applications Group,
the storage, HPC, and visualization groups at SDSC, and many others," said
Marcio Faerman, a postdoctoral researcher in SDSC's Data Grids Technologies
group who coordinated the team at SDSC. "These activities are not always
visible, but they are essential."
For example, researchers from SIO provided the checkpoint restart capability,
executed cross-validation runs, and helped define the metadata. SDSC's
Scientific Applications Group and High-End Systems Group executed DataStar
benchmarks to determine the best resource configuration for the run, and
scheduled these resources for the simulation. The Data Grids Technologies
group, which develops the SDSC SRB, designed and benchmarked the archival
process. Steve Cutchin and Amit Chourasia of SDSC's visualization group
labored long and hard to produce high resolution visualizations, including
movies, of how the earthquake waves propagated, even while the simulation was
still running. This helped the scientists ensure that the simulation was
producing valid data and produced dramatic views of the enormous energy that
may strike areas near the San Andreas fault during the "big one."
Earthquake Science
The long term goal of SCEC is to integrate information into a comprehensive,
physics-based and predictive understanding of earthquake phenomena. TeraShake
is an important step forward in this process, and the researchers presented
the simulation results at the recent SCEC Annual meeting, attended by nearly
400 of the best earthquake seismologists in the country and world. "This is a
very tough audience," said Minster, "and they positively loved the TeraShake
results; many scientists who had been skeptical of large-scale simulations came
to us using words like 'fantastic,' and 'amazing.'"
Seismologists see the TeraShake results as very valuable. "Because the
TeraShake simulation is such high resolution, we can see things we've never
seen before," explained Minster. "For example, we were surprised to see that
the strong shaking in the Coachella Valley made it behave like a secondary
earthquake source, and despite the southward-moving rupture, it reflected
waves back northward to shake Los Angeles."
The earthquake research community is enthusiastic about making use of the
capabilities demonstrated in TeraShake. "Many want to participate, they want
the movies of TeraShake on the Web, and many want to know how to get the
archived output to use in further research," said Minster. "Others want to
team up for new simulations."
In the near future, the researchers plan to run multiple scenarios at the same
resolution, for example, having the fault rupture from south to north, instead
of north to south as in the first TeraShake run. Eventually, the scientists
would like to be able to extend the simulations to even higher resolution to
more accurately model the intricate details and higher frequency shaking of
earthquakes, which affects structures.
But even doubling the spatial resolution from 200 to 100 meters, for example,
will produce eight times the spatial data, along with twice as many time
steps, for a total of 16 times more information, in the range of 800 TB. This
exceeds the current capabilities of even the large resources of SDSC. And
scaling the code to run in larger simulations will require additional efforts
from SDSC's computational experts. These challenges will drive future
cyberinfrastructure growth to support such simulations with one to two PB of
disk and 10 to 20 PB of tape, and with GB/sec parallel I/O so that researchers
can access and compute with these massive and fast-growing collections.
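The 16-fold estimate follows from a simple scaling argument. Halving the grid spacing in all three dimensions multiplies the number of cells by 2^3 = 8. If the time step is halved along with it, the number of time steps also doubles. The sketch below applies that factor to the 47 TB from the first run, assuming output volume grows in direct proportion to cells times steps. The result, about 750 TB, is consistent with the article's figure of roughly 800 TB.

```python
# Scaling of output volume with grid refinement.
# Assumes output grows in proportion to (cells) x (time steps), and that the
# time step shrinks by the same factor as the grid spacing.

BASELINE_TB = 47  # output of the first TeraShake run


def growth_factor(refinement):
    """Data multiplier when grid spacing is divided by `refinement` in 3-D."""
    spatial = refinement ** 3  # more cells along each of three axes
    temporal = refinement      # proportionally more time steps
    return spatial * temporal


for spacing_m, refinement in [(200, 1), (100, 2)]:
    factor = growth_factor(refinement)
    print(f"{spacing_m} m grid: x{factor} -> ~{BASELINE_TB * factor} TB")
```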
"Beyond TeraShake, expanding our capability to handle large simulations and
data at SDSC is useful for other large-data simulations such as ENZO, an
astrophysics simulation of the early universe, as well as data-intensive
analyses of observed data collections like the multi-TB all-sky image
collections of the National Virtual Observatory," said Moore. TeraShake
demonstrates SDSC's capabilities as a leading site for end-to-end data-
intensive computing, and is expected to encourage more researchers to explore
how far the capabilities have grown to support their own large-scale
computational and data problems.
In addition to SDSC, IGPP/SIO, USC, and ISI, other institutions taking part
include San Diego State University (SDSU), the University of California Santa
Barbara (UCSB), and Carnegie Mellon University (CMU), along with the
Incorporated Research Institutions for Seismology (IRIS) and the US Geological
Survey (USGS), which participate in the SCEC/CME Project. -Paul Tooby.
Project Leaders
J. Bernard Minster, IGPP/SIO/UCSD, Kim B. Olsen and Steven Day, SDSU, Tom
Jordan and Phil Maechling, SCEC/USC, Reagan Moore and Marcio Faerman,
SDSC/UCSD
Participants
Bryan Banister, Leesa Brieger, Amit Chourasia, Giridhar Chukkapalli, Yifeng
Cui, Steve Cutchin, Larry Diegel, Yuanfang Hu, Arun Jagatheesan, Christopher
Jordan, Patricia Kovatch, George Kremenek, Amit Majumdar, Richard Moore, Tom
Sherwin, Donald Thorp, Nancy Wilkins-Diehr, and Qiao Xin, SDSC/UCSD
Jacobo Bielak and Julio Lopez, CMU, Marcus Thiebaux, ISI, Ralph Archuleta,
UCSB, Geoffrey Ely and Boris Shkoller, UCSD, David Okaya, USC