SCSI, (P)ATA, SAS, NL-SAS and SATA, what’s the difference? (part 1)

Everybody needs storage space nowadays. Whether it is used for high performance computing or simply storing family snapshots, we all need room to store data which is important to us.

In the old days (the 1990s) things were fairly easy: you had either ATA or SCSI. The much older RLL and MFM are ancient by now and therefore not covered in this article. ATA was the mainstream choice for about 10 years; SCSI was expensive, but also very fast. Both standards used a flat ribbon cable over which data was sent to and from the drive in parallel. But as speeds increased, keeping the separate parallel signals in step with each other became difficult, and just like CD players in the 1980s, manufacturers started using serial lines. This made higher speeds possible, and the huge flat cables were traded in for much thinner cables, which improved the airflow as well.

The pictures below show a long flat cable with 7 connectors and a much smaller (red) 4-to-1 SAS or SATA cable.

[Pictures: cascaded flat cable; much smaller SAS/SATA cable]

RPM / IOps

In the beginning mainstream drives rotated at 3600 RPM, while industry drives were always a step ahead (5400 RPM). When the mainstream ones reached 5400, the industry reached 7200, and when mainstream reached 7200 the industry drives reached 10k and even 15k. Some mainstream SATA drives spin at 10k RPM, but these are rarely used outside the gaming community. The number of rotations per minute largely determines the number of random IOs per second (IOps) a drive can deliver: the faster the platter spins, the sooner a requested block passes under the head.

As a rule of thumb (ROT) the following figures are commonly used (they are conservative estimates of what drives can actually sustain):

  • 7200 RPM = 80 IOps
  • 10k RPM = 130 IOps
  • 15k RPM = 180 IOps

When IOs are smaller than the size assumed in the ROT (4 kB, I believe), more IOps can be reached, and when IOs are much larger, fewer IOps are reached. The more IOs per second a drive can handle, the faster each IO is processed and the more data your application can read or write per second.
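To see where figures like these come from, here is a rough sketch that estimates random IOps for a single drive as one divided by the average seek time plus the average rotational latency (half a revolution). The function name and the seek times are my own assumptions for illustration, not datasheet values, and transfer time is ignored, which is only reasonable for small IOs.

```python
def estimate_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random IOps for one drive: 1 / (average seek + average rotational latency)."""
    # On average the platter has to turn half a revolution before the block arrives.
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000.0 / service_time_ms


# Assumed average seek times in milliseconds (illustrative guesses only).
ASSUMED_SEEK_MS = {7200: 8.5, 10_000: 4.7, 15_000: 3.5}

for rpm, seek_ms in ASSUMED_SEEK_MS.items():
    print(f"{rpm:>6} RPM: ~{estimate_iops(rpm, seek_ms):.0f} IOps")
```

With these assumed seek times the estimates land close to the rule-of-thumb values in the list above. Different seek times will shift the numbers, so treat this as a way to reason about the ROT, not as a benchmark.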

Command Queueing

Besides the number of rotations per minute, the other main difference was that SCSI drives were faster because of added intelligence called Command Queueing (CQ). SCSI drives were able to change the order of IOs, so the drive’s arm didn’t have to cross the whole surface to reach each block in the order the requests arrived at the drive. During a rotation, the movement of the arm was also optimized so that as many blocks as possible could be reached in as few rotations as possible. Two different CQ techniques were introduced: TCQ (Tagged Command Queueing) and NCQ (Native Command Queueing). You can think of Command Queueing like an elevator. The elevator goes up and down and people get in and out where they’re supposed to, but the elevator doesn’t follow the sequence in which people pushed the buttons. If the elevator is going up, it keeps going up until it reaches the top, and on the way down people can get in or out, but if a person needs to go up, he or she simply has to wait until the elevator goes up again.
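The elevator idea can be sketched in a few lines. This is a simplified model, not what any particular drive firmware does: requests are reduced to made-up track numbers, and the sweep turns around at the last pending request instead of travelling all the way to the edge of the disk. The function names and the example numbers are hypothetical.

```python
def elevator_order(pending, head, moving_up=True):
    """Order pending track numbers the way an elevator-style sweep would serve them."""
    ahead = sorted(t for t in pending if t >= head)                   # served on the way up
    behind = sorted((t for t in pending if t < head), reverse=True)  # served on the way down
    return ahead + behind if moving_up else behind + ahead


def arm_travel(order, head):
    """Total number of tracks the arm crosses when serving requests in the given order."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


# Hypothetical pending requests (track numbers) and starting arm position.
requests = [98, 183, 37, 122, 14, 124, 65, 67]
start = 53

swept = elevator_order(requests, start)
print("arrival order:", requests, "->", arm_travel(requests, start), "tracks crossed")
print("elevator order:", swept, "->", arm_travel(swept, start), "tracks crossed")
```

In this example the arm crosses fewer than half as many tracks when the requests are served in sweep order instead of arrival order.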

TCQ was introduced with SCSI-2 in the 90s and was very commonly found on SCSI hard drives from then on. Since SCSI drives were mainly used in server hardware, TCQ was targeted at enterprise-level hard drives.

The lack of CQ in the newer SATA drives in the early years of this millennium led to a new CQ technique: NCQ. It was introduced with SATA-2 to give home computers the same benefit of IO reordering that server drives had already had for a decade.

In order to use NCQ or TCQ, both the hard drive port and the hard drive must support the standard. So, if you have an NCQ-capable port (the Serial ATA-2 ports found on most motherboards these days, for example) but install a hard drive without this feature, you won’t notice any performance improvement.

Command Queueing improves the performance of the hard drive when the computer sends a sequence of commands to read sectors that are far apart from each other. The hard drive takes these commands and reorders them so that it reads as much data as possible in a single rotation of the disk.

Command Queueing Example

In the picture I tried to show how the elevator principle works. The computer asked the hard drive to read (or write) blocks A, B, C and D on the disk. Without any Command Queueing feature, the hard drive would need two and a half rotations of the disk to read all requested data (blue line). With Command Queueing, the hard drive reorders the commands to B, D, A and C, and needs only one rotation to read all requested data (red line).
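The same example can be replayed in a small simulation. The picture does not give exact positions, so the angles below are assumptions that I picked so that the arrival order costs two and a half rotations; arm movement between tracks is ignored and only the rotation of the platter is counted.

```python
def rotations_needed(order, angle_of, start_angle=0.0):
    """Platter rotations needed to pass over each block in the given order."""
    travelled = 0.0
    current = start_angle
    for name in order:
        # The platter spins in one direction only, so a block that has just
        # passed the head costs almost a full extra turn.
        travelled += (angle_of[name] - current) % 360.0
        current = angle_of[name]
    return travelled / 360.0


def reorder(order, angle_of, start_angle=0.0):
    """Serve blocks in the order in which they pass under the head."""
    return sorted(order, key=lambda name: (angle_of[name] - start_angle) % 360.0)


# Assumed angular positions in degrees (hypothetical, chosen to match the example).
ASSUMED_ANGLES = {"A": 270.0, "B": 90.0, "C": 350.0, "D": 180.0}

arrival = ["A", "B", "C", "D"]
queued = reorder(arrival, ASSUMED_ANGLES)

print(arrival, rotations_needed(arrival, ASSUMED_ANGLES))
print(queued, rotations_needed(queued, ASSUMED_ANGLES))
```

With these angles the arrival order A, B, C, D takes 2.5 rotations, while the reordered sequence B, D, A, C finishes within a single rotation.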

NCQ can deal with up to 32 commands at a time, while TCQ can deal with up to 2^16 (65,536) commands (TCQ hard drives, however, can usually support a queue of “only” 64 commands). TCQ also has two extra features over NCQ: the initiator (the computer, i.e., the port) can specify that commands must be executed in the same order in which they were sent to the hard drive, and the initiator can send a high-priority command that will be executed before all other commands in the queue.

So in short:

  • NCQ = up to 32 commands
  • TCQ = up to 2^16 commands (64 is typical in practice) + extra command ordering and priority features
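The two extra TCQ features can be pictured with a toy queue. This is a heavily simplified model under my own assumptions: the class and method names are hypothetical, the tag names are borrowed from the SCSI task attributes (simple, ordered, head of queue), and sorting by block address merely stands in for whatever reordering real firmware performs.

```python
from collections import deque
from enum import Enum


class Tag(Enum):
    SIMPLE = "simple"                # the drive may reorder these freely
    ORDERED = "ordered"              # must run after everything queued before it
    HEAD_OF_QUEUE = "head of queue"  # high priority, runs before the rest


class TaggedQueue:
    """Toy model of a tagged command queue (hypothetical, not a real driver API)."""

    def __init__(self, max_depth=64):
        self.max_depth = max_depth
        self._queue = deque()

    def submit(self, lba, tag=Tag.SIMPLE):
        """Queue a command for block address `lba`; returns False if the queue is full."""
        if len(self._queue) >= self.max_depth:
            return False
        if tag is Tag.HEAD_OF_QUEUE:
            self._queue.appendleft((lba, tag))
        else:
            self._queue.append((lba, tag))
        return True

    def drain(self):
        """Return block addresses in execution order and empty the queue."""
        order, batch = [], []
        for lba, tag in self._queue:
            if tag is Tag.SIMPLE:
                batch.append(lba)            # may be reordered within its batch
            else:
                order.extend(sorted(batch))  # finish the reorderable batch first
                batch = []
                order.append(lba)            # then the ordered or high-priority command
        order.extend(sorted(batch))
        self._queue.clear()
        return order


q = TaggedQueue()
q.submit(500)
q.submit(100)
q.submit(300, Tag.ORDERED)
q.submit(50)
q.submit(900, Tag.HEAD_OF_QUEUE)
q.submit(20)
execution_order = q.drain()
print(execution_order)
```

Here the high-priority command for block 900 runs first even though it was submitted late, and the ordered command for block 300 acts as a barrier: the simple commands on either side of it are reordered among themselves but never across it.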

So what else is there that differentiates SCSI, (P)ATA, SAS, NL-SAS and SATA? –> Part 2 (April 2, 2013)

  1. Great topic! I love the mention of MFM & RLL from the early days. My first hard drive had an RLL controller (I paid extra to get the extra 10MB capacity out of it).

    I did want to point out though that the IOPS estimates you use are for long periods of sustained IO. For short bursts the drives are capable of significantly more performance. The numbers you presented are the modest estimates that assume long sustained “hammering” on the drive. It’s the right way to calculate for an enterprise environment, but the numbers are very conservative for average SMB or home use.

    • Damn, I just deleted my reply. I’ll try to recover from my RAM.

      I didn’t want to mention MFM or RLL at all since those techniques are totally irrelevant right now. I remember my first hard card. It was a Western Digital 40MB 26ms drive on an 8-bit card in my XT, and I think it used an interleave factor of about 7. I remember trying to optimize the thing all the time, trying to get the most space out of it I could, but realtime compression techniques just didn’t do it on my 8MHz machine. I remember paying something like 1050DM for it, which was about 1150 Dutch Guilders, which would be somewhere between 400 and 600 US dollars I guess. And it was just 40MB!

