Today we rely on our computers more than ever. They are critical to running your business, and just as important at home, where you pay your bills online, sell your wares on eBay, or save your baby photos.

Smartphones and tablets are becoming more and more intertwined with our daily lives, so we are storing more and more information on these devices as well.

Let’s say this together: “My hard drive will fail.”

Do you really know where your data is being saved, other than “on a hard drive”? Most of us are unaware of what’s going on underneath it all. And I’ll bet that after you read this, you’ll be surprised!

Most hard drives are mechanical (more on this later). We all know that everything built by mankind will, sooner or later, fail. It’s a fact. And the fact that your hard drive has moving parts only exacerbates the issue.

Mechanical hard drives suffer two main types of failure, logical and physical, and sometimes both at once. When it’s both, diagnosis is extra difficult, because the logical failure generally won’t show up until the physical failure is repaired.

Before explaining what failures are, let’s review how a hard drive is constructed.

All hard drives have a circuit board (also known as a PCBA, or Printed Circuit Board Assembly) that lets the hard drive and the computer communicate with each other. It also controls the drive itself: positioning the read/write head, recording data onto the platters, and reading data back from them. Inside the hard drive enclosure sit the read/write head and its associated actuator, along with the platters that the data is actually stored on. Depending on the size of the hard drive (by “size” we mean the amount of data the platters can hold, for example a 250GB versus a 500GB drive), there may be multiple platters. Think of an old LP record player: the platter is your record and the read/write head is the “arm”. The platters are coated with a very thin layer of lubricant, which is designed to let the read/write head glance off the surface of the platter in the event of a minor collision (such as a bump). Keep in mind that the read/write head hovers only nanometers above the platter’s surface, which makes a collision all the more likely.

Here are two photos of the inside of a hard drive:

courtesy: REUK.co.uk

courtesy: rg011.k12.sd.us

Now that we understand how a hard drive is constructed, let’s review the types of failures.  Hopefully after you read this, you’ll realize that it’s inevitable that your hard drive will fail. 

Physical failures are either mechanical or electronic.

Mechanical failure is usually the result of either a breakdown of the moving parts or a read/write head crash. As we pointed out before, everything mechanical will eventually fail, and these parts are moving constantly. They also generate heat, and that heat is detrimental to the moving parts, so over time they wear out. That is the breakdown of the moving parts. A read/write head “crash” is something different: the head literally crashes down onto the platter. Examples would be dropping the computer, banging it against a door frame, or slamming your fist down on the laptop. If the computer is turned off, a crash is less likely, because when the computer is off the read/write head is “parked” (in our record player analogy, the needle is back on its stand). Another cause of mechanical failure is a faulty air filter. The filters on today’s hard drives equalize the air pressure and moisture between the inside of the enclosure and the outside world. If the filter fails to stop even a single dust particle, that particle can land on the platter and cause the head to crash if the head happens to “sweep” over it. Here’s a good link to watch a failing hard drive in action: (http://www.dataclinic.co.uk/data-recovery-ticking-hard-disk-head-problem.htm).

Electronic failures usually mean the circuit board has failed. Again, heat is its worst enemy, so be sure to keep your system cool. For example, don’t leave your laptop lying in the sun, and don’t keep your desktop in a spot where the sun shines on it, even for a few hours. A power spike or electrical surge can also knock out the controller board.

Now let’s cover logical failures. Again, before we do, let’s discuss how data is saved onto the platters.

Your data is stored (saved) on the platter as magnetic impressions, which the drive electronically converts into the 1s and 0s that we all call “data”. This data is stored in “sectors”, which the file system groups into “clusters”. A hard drive has millions and millions of sectors (a 500GB drive with 512-byte sectors has nearly a billion of them!). With a logical failure, the hard drive is usually physically healthy, but you cannot boot into the operating system (such as Windows or OS X). Logical failures happen when the “data” becomes inaccessible, for example through accidentally formatting the hard drive, deleting important registry entries, viruses, or a failing read/write head that accidentally writes data to the wrong area of the platter. The file system’s index, such as the File Allocation Table (FAT) on FAT-formatted drives, can get corrupted as well. Think of it as the index of your hard drive: without it, Windows or your operating system does not know where your data is on the drive. Logical failures can also come from normal wear and tear on the platters themselves. This is very slow and gradual, but nevertheless, physical deterioration is happening.
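To see why a damaged index is so disruptive, here is a simplified Python sketch of the idea behind a FAT. It is not the real on-disk format: the table is modeled as a dictionary mapping each cluster number to the next cluster of the same file, and `END`, `read_chain`, and the cluster numbers are hypothetical names and values chosen for illustration.

```python
END = -1  # hypothetical end-of-chain marker (real FAT file systems reserve special values for this)

def read_chain(fat, start_cluster):
    """Follow a file's cluster chain through a simplified FAT.

    `fat` maps cluster number -> next cluster number (or END).
    Raises ValueError if the chain points at a missing entry or loops
    back on itself, which is roughly what a corrupted FAT looks like.
    """
    chain = []
    seen = set()
    cluster = start_cluster
    while cluster != END:
        if cluster in seen:
            raise ValueError(f"loop detected at cluster {cluster}")
        if cluster not in fat:
            raise ValueError(f"broken chain: no entry for cluster {cluster}")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

# A file that starts at cluster 2 and is stored in clusters 2 -> 3 -> 7.
healthy_fat = {2: 3, 3: 7, 7: END}
print(read_chain(healthy_fat, 2))

# Corrupt a single entry: cluster 3 now points to a cluster with no entry.
corrupt_fat = {2: 3, 3: 9, 7: END}
try:
    read_chain(corrupt_fat, 2)
except ValueError as err:
    print(err)
```

The data in cluster 7 is still physically on the platter in the corrupted case; the operating system simply has no way to find it, which is exactly why a healthy drive can still lose your files.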

If a hard drive has a manufacturing defect, it will usually fail within a short time of purchase. If the hard drive proves reliable for the first few months, the chances of it lasting a long time are greater. Keep in mind that with millions and millions of sectors on a hard drive, some of them will fail. Hard drives use ECC (Error-Correcting Code) to detect and fix errors in the data they read, and they quietly swap bad sectors for spare ones, usually without you even realizing it. However, a manufacturing defect or a read/write head crash can produce more failures than ECC can handle, and special software will need to be run against the hard drive to pull the data off. Most hard drive manufacturers publish a rating called MTBF, which stands for Mean Time Between Failures: the predicted average time that elapses between failures. They come up with this number by running timed tests on the hard drive, looking at how the unit deteriorates during that timed run, and then extrapolating over the lifetime of the hard drive.
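MTBF is easy to misread as “how long my drive will last”. A common simplification, assumed here, is that failures happen at a constant rate (an exponential model), in which case the chance of a drive failing within t hours is 1 − e^(−t/MTBF). The Python sketch below applies that formula; the 1,000,000-hour MTBF is an illustrative figure, not the rating of any particular drive, and the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # ignores leap years for simplicity

def failure_probability(mtbf_hours, hours_running):
    """Probability of at least one failure within `hours_running` hours,
    assuming a constant failure rate (exponential model)."""
    return 1 - math.exp(-hours_running / mtbf_hours)

def fleet_failures_per_year(mtbf_hours, drive_count):
    """Expected failures per year across many drives running around
    the clock, under the same constant-rate assumption."""
    return drive_count * HOURS_PER_YEAR / mtbf_hours

mtbf = 1_000_000  # illustrative MTBF in hours (about 114 years!)
one_year = failure_probability(mtbf, HOURS_PER_YEAR)
print(f"Chance one drive fails within a year: {one_year:.2%}")
print(f"Expected failures per year among 1,000 drives: "
      f"{fleet_failures_per_year(mtbf, 1000):.1f}")
```

With that illustrative figure, a single drive has a little under a 1% chance of failing in a given year, yet a company running 1,000 such drives should still expect roughly nine failures a year. A huge MTBF does not mean your drive is safe.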

What are the signs of a failing hard drive? Failure can be gradual or catastrophic. Catastrophic failure is immediate and can result from dropping the computer or slamming something down on your laptop (here’s a sad tidbit of info: most laptops have their hard drive located right below the palm rests!). Gradual failure, of course, happens over time, for example from the computer overheating. One sign of a failing hard drive is clicking noises from your computer. Here’s a link to what it sounds like: http://datacent.com/hard_drive_sounds.php. Other signs include your computer slowing down, random error messages, and disappearing data. In severe cases your computer will not even boot up (if the BIOS cannot detect your hard drive, it has nowhere to “go” after it turns on). Another telltale sign: your computer is working normally and then freezes. You turn it off, and when you turn it back on you receive a disk error. In this scenario your hard drive’s read/write head has literally frozen up (locked up).

So far this article has covered mechanical hard drives. The new kid on the block is the SSD, or Solid State Drive. You’re already familiar with this technology, since it’s the same kind of flash memory used in your USB thumb drive. The major advantage of SSDs is that they have no moving parts, so they don’t have the same issues as mechanical drives. No moving parts means less heat (which equals less wear and tear on the unit), lower power consumption (important for battery life in a laptop), and faster boot-up and access times. And there’s no concern over read/write head crashes. A big drawback of SSDs is something called “garbage”: when files are deleted or moved, the drive doesn’t automatically know that the old data is no longer needed, so it can’t clean it up on its own. However, the latest operating systems support the TRIM command, which lets the operating system tell the SSD which data is no longer in use so the drive can clean it up. Make sure your operating system supports the TRIM command and that it is enabled. And no, TRIM does not stand for anything; it’s just a command. Also, because SSDs are new, they are still very expensive.
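If you’re on Linux, one way to see whether a drive supports TRIM (Linux calls it “discard”) is the sysfs file `/sys/block/<device>/queue/discard_max_bytes`: a value of 0 means the device does not accept discard requests. The Python sketch below, assuming a Linux system, reads that file for each block device; `trim_support` is a hypothetical helper name. It only shows that the drive supports TRIM; whether your system actually sends TRIM (for example through a periodic `fstrim` job) is a separate setting. On Windows, the built-in `fsutil behavior query DisableDeleteNotify` command reports the equivalent setting, where 0 means TRIM is enabled.

```python
from pathlib import Path

def trim_support(sys_block=Path("/sys/block")):
    """Return {device_name: True/False} for TRIM (discard) support.

    Linux-only sketch: a nonzero queue/discard_max_bytes means the
    device accepts discard (TRIM) requests. Devices without that file
    are skipped.
    """
    result = {}
    if not sys_block.is_dir():
        return result  # not Linux, or sysfs is not mounted
    for dev in sorted(sys_block.iterdir()):
        attr = dev / "queue" / "discard_max_bytes"
        if attr.is_file():
            result[dev.name] = int(attr.read_text().strip()) > 0
    return result

for name, supported in trim_support().items():
    print(f"{name}: {'TRIM supported' if supported else 'no TRIM'}")
```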

Most tablets and smartphones use the same kind of solid-state storage as SSDs (which is a good thing, given how they’re handled). Even though we just discussed the benefits of SSDs and why they fail less often, they will still fail!

How should you protect yourself? Back up! Can we say that again? Back up! Don’t keep your desktop computer in the sun, even for a couple of hours, and the same goes for laptops. Don’t abuse your computers. Some laptops have hard drive protection built in: they detect a sudden movement (such as a fall) and “park” the read/write head within milliseconds. However, don’t rely on this. Don’t throw your laptop down on the counter, especially if it is on. Hard drives also have S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) built in, and many BIOSes can use it to warn you of a pending hard drive failure (another sad tidbit of info: a lot of BIOSes have this turned off by default!). And learn to recognize the signs of a failing hard drive.
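If you want to check S.M.A.R.T. data yourself, the free smartmontools package provides `smartctl`, and `smartctl -A /dev/sda` prints the drive’s attribute table. Counts such as Reallocated_Sector_Ct (sectors the drive has already swapped out for spares), Current_Pending_Sector and Offline_Uncorrectable are widely watched as early warning signs. The Python sketch below parses text in that table layout and flags those attributes when their raw value is above zero. The helper name `warning_attributes` is hypothetical, the sample table is made up for illustration rather than captured from a real drive, and attribute names can vary between manufacturers.

```python
# Attributes commonly treated as early warnings of a failing drive.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

def warning_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes whose
    raw value is above zero, parsed from `smartctl -A` style text."""
    warnings = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            warnings[name] = int(raw)
    return warnings

# Illustrative sample in the smartctl -A layout (not real drive output).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   097   097   010    Pre-fail  Always       -       112
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10542
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

for name, count in warning_attributes(SAMPLE).items():
    print(f"Warning: {name} = {count}")
```

To try it on a real drive, you could pass in the text printed by `smartctl -A /dev/sda` (usually run with administrator rights), for example captured with `subprocess.run([...], capture_output=True, text=True).stdout`. Any nonzero count here is a good reason to make sure your backup is current.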

I hope this article has helped you understand why your data needs to be backed up, and why hard drives fail in the first place. If you think about it, it’s not only your data that is stored on hard drives, but everyone’s, including companies such as Google and Microsoft, your bank, and the White House. It’s amazing that we rely on these devices so much, knowing that their weaknesses (and there are a lot!) will result in failure one of these days.