
Features - Enterprise Data Insights:
UNLOCKING THE MYSTERY OF MAID
By Aloke Guha, Founder/CTO, COPAN Systems
Recently there has been much discussion about the management of active archive
data, 85 percent of which resides on automated tape libraries (ATL) or optical
disks. Whether for faster recovery, compliance with government regulations or
improving business access, companies have been gradually migrating their
long-term data to lower cost, traditional disk arrays. This has improved
access and recovery speeds and has even offered RAID protection to some of the
data.
Most data ultimately moves to tape due to its superior scalability and
affordability, since tape costs three to five times less than even the lowest
priced disk technology. As a result, companies have to manage data both on
accessible, protected disk and on slower and more labor-intensive tape. To
fill this gap, the MAID (massive array of idle disks) architecture is being
introduced.
This idea was conceived at the University of Colorado, where researchers
examined the tradeoffs between disk power consumption and performance. Their
findings show that with a modest disk cache, MAID can effectively support most
reads from a large database archive.
The Basis For MAID
The basic concept behind MAID is to build storage designed specifically for
WORO (write once, read occasionally) applications, where the focus is on
bandwidth rather than IOPS. These applications typically follow the 80/20
rule: most accesses are directed to a small fraction of the data. The
remaining disks can therefore be powered only as needed, reducing heat,
vibration and sympathetic resonance.
This enables very high density and spreads the controller cost across hundreds
of drives.
In a typical MAID architecture, 25 percent of the data is on spinning disks at
any one time. Access time ranges from milliseconds for data on a spinning
drive to about 10 seconds, the time it takes to spin up an idle drive. This is
far superior to tape, which typically keeps only 2 percent of the data
accessible and takes 60 seconds or longer to reach the first byte of data.
An advantage of MAID is increased reliability. In long-term and petabyte-sized
storage environments, the life of SATA drives becomes a factor. SATA drive
MTBF is rated at 400,000 hours, about a third of that of SCSI drives. If a
data center has 1,000 drives powered on all the time, a drive failure would be
expected roughly every 17 days (400,000 hours spread across 1,000 drives is
one failure per 400 hours).
This failure rate is unacceptable in many data centers. However, when the
drives are powered on only 25 percent of the time, drive service life is
extended four-fold, for an effective MTBF of 1.6M hours, which is longer than
that of SCSI drives.
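The arithmetic behind this reliability argument can be sketched in a few
lines. This is a rough illustration, not a vendor model: it assumes a constant,
independent failure rate per drive and assumes that wear accrues only while a
drive is powered on. The function names are made up for this example. Under
those assumptions, 1,000 always-on drives see a failure roughly every 17 days,
and a 25 percent duty cycle stretches the calendar-time MTBF about four-fold.

```python
HOURS_PER_DAY = 24


def days_between_failures(mtbf_hours: float, drive_count: int) -> float:
    """Expected days between failures across a population of drives,
    assuming a constant, independent failure rate per drive."""
    return mtbf_hours / drive_count / HOURS_PER_DAY


def effective_mtbf(mtbf_hours: float, duty_cycle: float) -> float:
    """Calendar-time MTBF for a drive powered on only a fraction of the
    time (duty_cycle), assuming wear accrues only while powered on."""
    return mtbf_hours / duty_cycle


sata_mtbf = 400_000   # rated hours, as quoted in the article
drives = 1_000

always_on_days = days_between_failures(sata_mtbf, drives)
maid_mtbf = effective_mtbf(sata_mtbf, duty_cycle=0.25)
maid_days = days_between_failures(maid_mtbf, drives)

print(f"Always on:   one failure every {always_on_days:.1f} days")
print(f"25% powered: effective MTBF {maid_mtbf:,.0f} hours, "
      f"one failure every {maid_days:.1f} days")
```

The model ignores any extra wear from spinning drives up and down, which is
why start/stop counts are managed separately.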
Power-Managed RAID
To further improve storage density and the number of I/O operations, a new
RAID architecture is being used. Power-managed RAID provides parity protection
with only a subset of the RAID disks actually powered on. This minimizes the
large power swings experienced when full RAID sets are powered up.
For a write, only the parity and associated data drive(s) are powered on. When
reading the data, only the disk being read needs to be spinning. To enable
non-disruptive and sequential read/writes, the data is staged to an
always-spinning drive while the next drive is being powered up. The result is
a subsystem that can scale to hundreds of terabytes in a single footprint.
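The power-on rule described above can be illustrated with a minimal sketch. It
assumes a RAID set with one dedicated parity drive. The RaidSet type, the drive
names and the drives_to_power function are hypothetical and are not taken from
any product. Staging to the always-spinning drive is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RaidSet:
    data_drives: tuple[str, ...]  # drives holding data
    parity_drive: str             # drive holding parity


def drives_to_power(raid: RaidSet, op: str, target: str) -> set[str]:
    """Return the drives that must be spinning for one operation on target."""
    if target not in raid.data_drives:
        raise ValueError(f"{target} is not a data drive in this set")
    if op == "read":
        # A read needs only the drive that holds the data.
        return {target}
    if op == "write":
        # A write needs the data drive plus the parity drive.
        return {target, raid.parity_drive}
    raise ValueError(f"unknown operation: {op}")


raid = RaidSet(data_drives=("d0", "d1", "d2", "d3"), parity_drive="p0")
print(sorted(drives_to_power(raid, "read", "d2")))
print(sorted(drives_to_power(raid, "write", "d2")))
```

In this five-drive example, at most two drives spin for any single operation,
which is what keeps the power swing small.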
For long-term data, a MAID design can be further simplified by eliminating
such features as high-performing cache and high-speed, non-blocking
interconnects used in typical RAID systems. Furthermore, a multi-tiered
interconnection architecture that balances capacity, bandwidth and data
protection while minimizing cost proves most effective. In this design, many
parallel RAID engines are placed in the middle tier to increase both total
bandwidth and density, which is similar to adding tape drives in an ATL.
To further improve drive life, start/stops are deliberately managed. SATA
drives are rated for up to 50,000 start/stops. Typical desktop users start and
stop their computers two to three times a day. This translates to about 10,000
start/stops over a five-year period. Thus, even if start/stops are managed so
that every single drive is power-cycled four times a day, the total over the
same period stays below 50 percent of the rated level.
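As a quick check of the start/stop budget, the sketch below counts one
power-down and power-up pair as a single start/stop. That is an assumption,
since the article does not define the unit. Even if each start and each stop
were counted separately, doubling the total, the result would remain under
half of the 50,000 rating.

```python
RATED_START_STOPS = 50_000   # rating quoted in the article
DAYS_PER_YEAR = 365


def cycles_used(cycles_per_day: float, years: float) -> float:
    """Total power cycles accumulated over a service period."""
    return cycles_per_day * DAYS_PER_YEAR * years


used = cycles_used(cycles_per_day=4, years=5)
fraction = used / RATED_START_STOPS

print(f"{used:,.0f} cycles over five years, {fraction:.1%} of the rating")
```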
By managing the drives for start/stops, power-on hours, temperature levels,
observed data errors, etc., the system can proactively migrate data to spare
drives before a drive failure. This becomes critical, especially with larger
capacity drives, since it reduces the likelihood of having to perform a long
RAID rebuild.
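A proactive sparing check of this kind might look like the following sketch.
The monitored attributes are the ones listed above. The DriveStats type, the
function names and every threshold value are hypothetical, chosen only for
illustration. A real system would take its limits from drive specifications
and field experience.

```python
from dataclasses import dataclass


@dataclass
class DriveStats:
    start_stops: int
    power_on_hours: float
    temperature_c: float
    data_errors: int


# Hypothetical limits, for illustration only.
LIMITS = DriveStats(start_stops=40_000, power_on_hours=30_000,
                    temperature_c=55.0, data_errors=10)

MONITORED = ("start_stops", "power_on_hours", "temperature_c", "data_errors")


def exceeded_limits(stats: DriveStats, limits: DriveStats = LIMITS) -> list[str]:
    """Return the names of monitored attributes that exceed their limits."""
    return [name for name in MONITORED
            if getattr(stats, name) > getattr(limits, name)]


def should_migrate(stats: DriveStats) -> bool:
    """Flag a drive for migration to a spare if any limit is exceeded."""
    return bool(exceeded_limits(stats))


healthy = DriveStats(start_stops=7_300, power_on_hours=11_000,
                     temperature_c=38.0, data_errors=0)
suspect = DriveStats(start_stops=7_300, power_on_hours=11_000,
                     temperature_c=41.0, data_errors=25)

print(should_migrate(healthy), exceeded_limits(healthy))
print(should_migrate(suspect), exceeded_limits(suspect))
```

Copying data off a drive that is still readable is a plain disk-to-disk copy,
whereas waiting for the failure forces a parity rebuild across the RAID set.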
Disks with inactive data can be periodically spun up and checked, which is
similar to the labor-intensive random tape media sampling required by many
institutions. The difference is, no manual intervention is needed. Power is
also reserved for background activities such as RAID rebuilds, sparing, disk
caching, etc.
MAID Applications
Backup and recovery continues to be a core problem for data centers. In a
study conducted by Enterprise Storage Group, over a third of respondents
reported that 25 percent or more of their backups fail, primarily due to the
hardware and the media. Disk-based backup has a higher success rate than tape.
However, most companies still eventually end up moving data to tape, which
exposes it to risk and adds a new layer of storage management and cost.
With its density, affordability, reliability and speed, MAID offers a new
solution wherein data is only migrated to tape for deep archive or offsite
storage. With this approach, known as disk-to-disk (D2D), the tape library is
a fraction of the size needed for disk-to-disk-to-tape (D2D2T).
Active archiving spans applications such as scientific computing, data
warehousing and government compliance. Because these workloads are largely
WORO, MAID can justify keeping more data online, which increases the data's
business value. In the
case of research, online data means faster turnarounds for analysis and
experiments. For government compliance, companies can lower their associated
labor and management requirements.
MAID technology is filling the gap between disk and tape. As companies grapple
with the future of petabyte data centers, the reality may be that all active
archive data will finally be managed on disk because of the many advantages of
MAID.
About Aloke Guha
Aloke Guha is founder and CTO of COPAN Systems in Longmont, Colo.
For more information, and to obtain a free whitepaper by Fred Moore, president
of Horison Inc., "Introducing MAID Architecture: Massive Arrays of Idle
Disks," see www.copansys.com/solutions/request_PDF.htm.