In a server or workstation environment, hard disk performance is always a concern. While system
memory may have a latency of only 7ns for needed information, and a bandwidth of over 1GB per
second, hard drives are significantly slower. The average server-level hard drive has access times
measured in milliseconds, several orders of magnitude slower than memory, and transfer speeds that top out at about 40MB per second. Server hard drives
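most often have to store critical information, so there has to be a constant backup in case of hard drive failure
or malfunction. RAID addresses both concerns, speed and reliability, by combining several hard drives into one array.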
For RAID to be implemented, it must be supported by the SCSI or IDE controller (or by the operating
system), but it does not have to be supported by the hard drives. The hard drives are never aware that
they are being used in RAID, because all RAID control is done by the hardware controller or OS. This
means that RAID can be done with any hard drive, no matter how old or how poor in quality. Some
operating systems, such as Unix and Windows NT, are able to implement RAID levels 0 and 5 in software,
but this is generally slower than having the hard drive controller do it.
RAID was first introduced to SCSI because of SCSI's ability to multitask requests, its generally higher
performance, and its device capacity. Most servers use SCSI because it uses less CPU time and offers
more bandwidth than IDE. SCSI is generally more expensive than IDE and is the higher performance
standard, while IDE was designed to be the budget interface. Adding multiple hard drives works against
IDE's budget role, and the performance gain is smaller than it is for SCSI.
Some of the latest motherboards and expansion IDE controllers have brought RAID to IDE, so higher
hard disk performance is no longer only for the high end.
Striping
RAID introduces the ability to interleave information between disks, called striping.
The data is divided between two or more disks, so that all of the hard disks collectively store the
information. With two disks this effectively doubles transfer speeds, in both reading and writing.
If one hard drive is able to store data at 20MB/sec, then two hard drives in RAID 0
would be able to store information at 40MB/sec. Another advantage is that, because data is
divided between all of the hard drives, the storage capacity is the sum of the drives. If one hard drive is
able to store 4GB of data, then two disks in RAID 0 would be able to store 8GB. RAID 0 is able to
work with different disks, but the speed of the slowest disk and the capacity of the smallest disk are
used for all of the other disks. If a 4GB hard drive that can transfer at 30MB/sec and a 7GB hard drive that
can transfer at 20MB/sec are used in RAID 0, then the result would be an 8GB drive
with a speed of 40MB/sec. Note that while speed is increased,
reliability is decreased: if one hard drive in the array fails, all data is lost.
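The interleaving and the mixed-disk arithmetic can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how any particular controller works: the function names and the `chunk_size` parameter are hypothetical, disks are modelled as byte strings, and throughput is assumed to scale perfectly with the number of disks, as in the figures quoted in the text.

```python
def stripe(data, num_disks, chunk_size=4):
    """Deal fixed-size chunks of data out to the disks in round-robin order."""
    disks = [bytearray() for _ in range(num_disks)]
    for i in range(0, len(data), chunk_size):
        disks[(i // chunk_size) % num_disks] += data[i:i + chunk_size]
    return [bytes(d) for d in disks]


def unstripe(disks, chunk_size=4):
    """Reassemble the data by reading chunks back in the same round-robin order."""
    out = bytearray()
    offsets = [0] * len(disks)
    total = sum(len(d) for d in disks)
    disk = 0
    while len(out) < total:
        out += disks[disk][offsets[disk]:offsets[disk] + chunk_size]
        offsets[disk] += chunk_size
        disk = (disk + 1) % len(disks)
    return bytes(out)


def raid0_estimate(capacities_gb, speeds_mb_s):
    """Idealized array size and speed: every member is treated like the
    smallest and slowest disk, then the results are added up."""
    n = len(capacities_gb)
    return n * min(capacities_gb), n * min(speeds_mb_s)


disks = stripe(b"ABCDEFGHIJ", num_disks=2)
restored = unstripe(disks)
capacity, speed = raid0_estimate([4, 7], [30, 20])  # the 4GB + 7GB example
```

Here `restored` equals the original bytes, and `capacity` and `speed` come out as 8 and 40, matching the 8GB, 40MB/sec array described in the text. Losing either entry of `disks` makes the original data unrecoverable, which is the reliability cost of striping.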
Duplication
RAID has the ability to duplicate information onto one or more disks. This means that there is an
identical backup of all of the information if the original hard drives were to fail.
For every disk using duplication, one hard drive with at least its capacity and speed
is used as a mirrored image. Duplication offers faster recovery from failures than parity because data
only needs to be copied to the replacement disk.
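As a rough sketch of duplication, the toy model below writes every block to two disks and rebuilds a replacement by plain copying. The `MirroredPair` class and its methods are hypothetical names invented for this example, and each disk is modelled as a simple dictionary of blocks.

```python
class MirroredPair:
    """Toy model of duplication: every write goes to both disks."""

    def __init__(self):
        # Each "disk" is a dict mapping block number -> bytes (illustrative model).
        self.disks = [{}, {}]
        self.failed = [False, False]

    def write(self, block, data):
        # Store the same data on every disk that is still working.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block] = data

    def read(self, block):
        # Any working disk can answer, since both hold identical data.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block]
        raise IOError("both disks have failed; data is lost")

    def fail(self, index):
        # Simulate a crash: the disk's contents are gone.
        self.failed[index] = True
        self.disks[index] = {}

    def replace(self, index):
        # Rebuild by copying from the survivor; nothing has to be calculated.
        if self.failed[1 - index]:
            raise IOError("no surviving disk to copy from")
        self.disks[index] = dict(self.disks[1 - index])
        self.failed[index] = False


pair = MirroredPair()
pair.write(0, b"boot")
pair.write(1, b"data")
pair.fail(0)                  # first disk crashes
still_there = pair.read(1)    # served from the mirror
pair.replace(0)               # new disk is filled by a straight copy
```

The `replace` step is a straight copy, which is why recovery from a mirror is quicker than recovery from parity.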
Parity
Parity provides the same kind of protection as duplication, but doesn't require as much storage for a backup. It uses a scheme
where it stores the difference between the disks to be backed up, so if one were to fail, the parity
information could be read and the data recovered. It does not protect against more than one hard
drive crash, but that is very rare. Parity information works as follows: the parity disk stores whatever bit
is needed to make the sum of the bits across all of the disks, the parity disk included, an odd number.
| 2 Disks + Parity |
| Disk 1 | Disk 2 | Parity Disk |
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
| Number 2 Disk Crash |
| Disk 1 | Disk 2 | Parity Disk |
| 0 | x (0+x+1 must be odd, so x=0) | 1 |
| 0 | x (0+x+0 must be odd, so x=1) | 0 |
| 1 | x (1+x+0 must be odd, so x=0) | 0 |
| 1 | x (1+x+1 must be odd, so x=1) | 1 |
This setup can work with any number of disks (2, 3, 7, ...); no matter how many disks are used,
only one parity disk is needed. This scheme only protects against a single disk failure. If more than one
disk fails, all data is lost. Recovery is slower than with duplication, because all of the data on the
surviving disks needs to be read and processed so that each missing bit can be determined. Parity is
often used when a cheap but reliable backup is needed and a duplication RAID isn't economically feasible.
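The parity rule from the tables can be sketched in Python as follows. The function names are hypothetical, and the sketch follows the convention of the tables above, where the parity bit makes the count of 1s across all disks odd; many real implementations instead use a plain XOR of the data, which differs only by inverting the parity bit.

```python
from functools import reduce


def parity_bit(data_bits):
    """Return the bit that makes the count of 1s (data plus parity) odd."""
    return (sum(data_bits) + 1) % 2


def recover_bit(surviving_bits, parity):
    """Rebuild the single missing data bit from the survivors and the parity bit."""
    return (sum(surviving_bits) + parity + 1) % 2


def parity_block(blocks):
    """Byte-wise version of parity_bit for any number of equal-length blocks."""
    xor = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)
    return bytes(x ^ 0xFF for x in xor)  # invert to get the odd-parity convention


def recover_block(surviving_blocks, parity):
    """Rebuild one lost block; every surviving block and the parity must be read."""
    # Under this convention, recovery is the same calculation as parity itself.
    return parity_block(list(surviving_blocks) + [parity])


# Reproduce the parity column of the "2 Disks + Parity" table.
table = [parity_bit(row) for row in [(0, 0), (0, 1), (1, 0), (1, 1)]]

# Three data disks, one parity disk; the second data disk crashes.
disks = [b"\x0f\xa0", b"\x33\x01", b"\xff\x10"]
parity = parity_block(disks)
rebuilt = recover_block([disks[0], disks[2]], parity)
```

Here `table` is `[1, 0, 0, 1]`, matching the parity column above, and `rebuilt` equals the contents of the crashed disk. Unlike a mirror rebuild, `recover_block` has to read every surviving disk, which is the reason recovery is slower.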