Random Bits

Monitoring Hard Drive Health on Linux with smartmontools

S.M.A.R.T. is a system in modern hard drives designed to report conditions that may indicate impending failure. smartmontools is a free software package that can monitor S.M.A.R.T. attributes and run hard drive self-tests. Although smartmontools runs on a number of platforms, I will only cover installing and configuring it on Linux.

Why Use S.M.A.R.T.?

Basically, S.M.A.R.T. may give you enough warning to safely back up all your data before your hard drive dies. There is conflicting information on the internet about how reliable the warnings are. The best research that I found is a paper from Google that describes an internal study of hard drive failure. A quick summary: certain events, including reallocation events and failed self-tests, greatly increase the chance of hard drive failure, but only about 60% of the drives that failed in the study had any negative S.M.A.R.T. attributes. Obviously, nothing replaces regular backups.

A good source for more information is the S.M.A.R.T. Wikipedia page.

Installation

On Debian or Ubuntu systems:

$ sudo apt-get install smartmontools

On Fedora:

$ sudo yum install smartmontools

Capabilities and Initial Tests

smartmontools comes with two programs: smartctl, which is meant for interactive use, and smartd, which continuously monitors S.M.A.R.T. Let’s look at smartctl first:

$ sudo smartctl -i /dev/sda

Replace /dev/sda with your hard drive’s device file in this command and all subsequent commands. If there’s only one hard drive in the system, it should be /dev/sda or /dev/hda. If this command fails, you may need to let smartctl know what type of hard drive interface you’re using:

$ sudo smartctl -d TYPE -i /dev/sda

where TYPE is usually one of ata, scsi, or sat (SCSI to ATA Translation, which is what most serial ATA drives need). See the smartctl man page for more information. Note that if you need -d here, you will need to add it to all smartctl commands. Either way, the command should print information similar to:

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint T133 series
Device Model:     SAMSUNG HD300LJ
Serial Number:    S0D7J1UL303628
Firmware Version: ZT100-12
User Capacity:    300,067,970,560 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Fri Jan  2 03:08:20 2009 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
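
If you want to check the last two lines of that output from a script, something like the following should work. This is only a minimal sketch: smart_is_enabled is a name I made up, and it assumes the "SMART support is:" lines look like the ones shown above.

```shell
# Hypothetical helper (not part of smartmontools): reads `smartctl -i`
# output on stdin and succeeds only if SMART is reported as both
# available and enabled.
smart_is_enabled() {
    info=$(cat)
    printf '%s\n' "$info" | grep -q '^SMART support is: *Available' &&
    printf '%s\n' "$info" | grep -q '^SMART support is: *Enabled'
}

# Example use (needs root and a real drive):
#   sudo smartctl -i /dev/sda | smart_is_enabled && echo "SMART is on"
```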

Now that smartctl can access the drive, let’s turn on some features. Run the following command:

$ sudo smartctl -s on -o on -S on /dev/sda

  • -s on: This turns on S.M.A.R.T. support or does nothing if it’s already enabled.
  • -o on: This turns on offline data collection. Offline data collection periodically updates certain S.M.A.R.T. attributes. Theoretically this could have a performance impact. However, from the smartctl man page:

    Normally, the disk will suspend offline testing while disk accesses are taking place, and then automatically resume it when the disk would otherwise be idle, so in practice it has little effect.

  • -S on: This enables “autosave of device vendor-specific Attributes”.

The command should return:

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
SMART Attribute Autosave Enabled.
SMART Automatic Offline Testing Enabled every four hours.

Next, let’s check the overall health:

$ sudo smartctl -H /dev/sda

This command should return:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
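
The same check can be scripted. Here is a rough sketch that looks for the PASSED line shown above; health_passed is a hypothetical helper name, and it assumes ATA-style output like mine (other drive types may word the result differently).

```shell
# Hypothetical helper (not part of smartmontools): reads `smartctl -H`
# output on stdin and succeeds only if the overall-health line ends
# in PASSED.
health_passed() {
    grep -q 'overall-health self-assessment test result: PASSED$'
}

# Example use (needs root and a real drive):
#   sudo smartctl -H /dev/sda | health_passed || echo "Back up your data now"
```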

If smartctl doesn’t report PASSED, you should immediately back up all your data. Your hard drive is probably failing. Next, let’s make sure that the drive supports self-tests. I have yet to see a drive that doesn’t, but the following command also gives time estimates for each test:

$ sudo smartctl -c /dev/sda

I won’t list the complete output because it’s somewhat lengthy. Make sure “Self-test supported” appears in the “Offline data collection capabilities” section. Also, look for output similar to:

Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 127) minutes.
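
If you would rather pull those numbers out with a script, the following sketch is one way to do it. polling_minutes is a name I invented, and the awk patterns assume the two-line layout shown above.

```shell
# Hypothetical helper (not part of smartmontools): reads `smartctl -c`
# output on stdin and prints one "<test name> <minutes>" line per
# self-test routine.
polling_minutes() {
    awk '
        # Remember which routine the next polling time belongs to,
        # e.g. "Short" or "Extended".
        /^[A-Za-z]+ self-test routine[ \t]*$/ { name = $1 }

        /recommended polling time/ && name != "" {
            line = $0
            gsub(/[^0-9]/, "", line)   # keep only the digits
            print name, line
            name = ""
        }
    '
}

# Example use (needs root and a real drive):
#   sudo smartctl -c /dev/sda | polling_minutes
```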

The polling times are rough estimates of how long the short and long self-tests will take, respectively. Let’s run the short test:

$ sudo smartctl -t short /dev/sda

On my drive, this test should take 2 minutes, but this obviously varies. You can run:

$ sudo smartctl -l selftest /dev/sda

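to check results. The results only show up once the test has finished, so just keep running that command until they appear. In the meantime, the “Self-test execution status” field in the output of smartctl -c reports how much of the test remains. A successful run will look like: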

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     21472         -
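
To test the most recent log entry from a script, a sketch along these lines should do. latest_selftest_ok is a hypothetical helper, and it assumes the newest entry is the line numbered "# 1", as in the log above.

```shell
# Hypothetical helper (not part of smartmontools): reads
# `smartctl -l selftest` output on stdin and succeeds only if the
# newest entry (the "# 1" line) completed without error.
latest_selftest_ok() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}

# Example use (needs root and a real drive):
#   sudo smartctl -l selftest /dev/sda | latest_selftest_ok || echo "Last self-test did not pass"
```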

Now, do the same for the long self-test:

$ sudo smartctl -t long /dev/sda

The long test can take a significant amount of time. You might want to run it overnight and check for the results in the morning. If either test fails, you should immediately back up all your data and read the last section of this guide.

Configuring smartd

We’ve now enabled some features and run the basic tests. Instead of repeating the previous section daily, we can set up smartd to do it all automatically. If your system has an /etc/smartd.conf file, check for a line that begins with DEVICESCAN. If you find one, comment it out by adding a ‘#’ to the beginning of the line. DEVICESCAN doesn’t work on my system, and specifying a device file is easy. Add the following line to /etc/smartd.conf:

/dev/sda -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/share/smartmontools/smartd-runner

Here’s what each option does:

  • /dev/sda: Replace this with the device file you’ve been using in smartctl commands.
  • -a: This enables some common options. You almost certainly want to use it.
  • -d sat: On my system, smartctl correctly guesses that I have a serial ATA drive. smartd, on the other hand, does not. If you had to add a “-d TYPE” parameter to the smartctl commands, you’ll almost certainly have to do the same here. If you didn’t, try leaving it out initially. You can add it later if smartd fails to start.
  • -o on, -S on: These have the same meaning as the smartctl equivalents.
  • -s (S/../.././02|L/../../6/03): This schedules the short and long self-tests. In this example, the short self-test will run daily at 2:00 A.M. The long test will run on Saturdays at 3:00 A.M. For more information, see the smartd.conf man page.
  • -m root: If any errors occur, smartd will send email to root. On my system, mail for root is forwarded to my normal email account. If you don’t have a similar setup, replace root with your normal email address. This option also requires a working email setup. Most Linux distributions automatically have working outbound email.
  • -M exec /usr/share/smartmontools/smartd-runner: This last part may be specific to the Debian and Ubuntu smartmontools packages. Check if your system has /usr/share/smartmontools/smartd-runner. If it doesn’t, remove this option. Instead of sending email directly, “-M exec” makes smartd run a different command when errors occur. On Debian, smartd-runner will run each script in /etc/smartmontools/run.d/, one of which emails the user specified by the “-m” option.
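
To get a feel for how the -s schedule works: smartd compares a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour) against the regular expression. The sketch below imitates that comparison with grep. The schedule_matches helper is my own name, not part of smartmontools, and requiring a whole-string match with grep -x is an assumption about how smartd applies the pattern.

```shell
# Hypothetical helper: succeeds if a smartd-style schedule regex ($1)
# matches a "T/MM/DD/d/HH" string ($2).
schedule_matches() {
    printf '%s\n' "$2" | grep -Eqx "$1"
}

schedule='(S/../.././02|L/../../6/03)'

# Which tests would be due during the current hour?
# %u is the day of the week, 1 = Monday through 7 = Sunday.
now=$(date +%m/%d/%u/%H)
for type in S L; do
    if schedule_matches "$schedule" "$type/$now"; then
        echo "test type $type is due this hour"
    fi
done
```

For example, S/01/02/5/02 (a Friday at 2:00 A.M.) matches the short-test half of the pattern, while L/01/02/5/03 does not match anything because the long test is limited to day 6.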

If you have more than one hard drive in your system, add a line for each one, replacing /dev/sda with a different device file.
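
With several drives, it may be easier to generate the lines than to type them. This is just a sketch: smartd_lines is a made-up helper, /dev/sdb is a placeholder for your second drive, and I have left out “-d sat” and “-M exec” because they depend on your system. Add them back if you needed them above.

```shell
# Hypothetical helper: prints one smartd.conf line per device file
# given as an argument, using the same schedule as the example above.
smartd_lines() {
    for dev in "$@"; do
        printf '%s -a -o on -S on -s (S/../.././02|L/../../6/03) -m root\n' "$dev"
    done
}

# Review the output before pasting it into /etc/smartd.conf.
smartd_lines /dev/sda /dev/sdb
```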

Update on 2009-01-06:

Thanks to commenter robert for pointing out an omission on my part. If your system has the file /etc/default/smartmontools, uncomment the “#start_smartd=yes” line by removing the “#”.

Finally, restart smartd:

$ sudo /etc/init.d/smartmontools restart

If this command fails, the end of /var/log/daemon.log should have some diagnostic information. If smartd started fine, we should still test that email notifications are working. Add “-M test” to the end of the configuration line in /etc/smartd.conf. This will make smartd send out a test notification when it’s next started. Once again, restart smartd:

$ sudo /etc/init.d/smartmontools restart

You should receive an email similar to:

This email was generated by the smartd daemon running on:

   host name: polar
  DNS domain: shadypixel.com
  NIS domain: (none)

The following warning/error was logged by the smartd daemon:

TEST EMAIL from smartd for device: /dev/sda

For details see host's SYSLOG (default: /var/log/syslog).

Afterward, you can delete “-M test”.

What To Do If smartd Detects Problems

First, immediately back up everything. Depending on the error, your drive might be close to death or it may still have a long life ahead. Consult the smartmontools FAQ. It has some recommendations for specific errors. Otherwise, ask for help on the smartmontools-support mailing list.


This entry was posted on Friday, January 2nd, 2009 at 6:18 AM and is filed under Linux.

50 Comments on “Monitoring Hard Drive Health on Linux with smartmontools”

  1. robert says:
    January 6, 2009 at 3:19 PM

    Hey, nice intro. Small addition: on my Xubuntu intrepid and jaunty (alpha) installation, I had to uncomment the line ‘#start_smartd=yes’ in the file /etc/default/smartmontools.

    Cheers

  2. btmorex says:
    January 6, 2009 at 3:46 PM

    Thanks robert, I updated the article.

  3. robert says:
    January 6, 2009 at 5:10 PM

    No problem, I expected smartd to start as well after running ‘sudo /etc/init.d/smartmontools restart’ :) Once again, nice article.

  4. Agip says:
    January 11, 2009 at 11:55 PM

    Hi btmorex, nice howto.

    I configured my smartd.conf like this:

    /dev/sdb -I 194 -a -o on -S on -s (S/../.././03|L/../../6/04) \
    -m sys@base.com \
    -M exec /usr/share/smartmontools/smartd-runner

    Also, by adding “-M test”, I tested email notifications and received the test email message.

    As you see, each morning my HDD is tested, but I didn’t receive any email notification about test results.

    Probably, notifications are sent only when something is going wrong; am I right on this point?

    Right now my drive reports OK status with the “smartctl -H” command.

    Thanks a lot again.

  5. btmorex says:
    January 12, 2009 at 7:10 AM

    Agip,

    It sounds like you’ve set it up correctly. You’re right that smartd will only email you if there is a problem. If you want to look at the test results, you can do:

    smartctl -l selftest /dev/sdb

  6. James says:
    January 16, 2009 at 12:06 PM

    I appreciate this effort very much. Many thanks.

  7. Dan says:
    January 18, 2009 at 7:48 PM

    You can view the progress of a self test by running smartctl -Hc /dev/XXX. It will be across from the “Self Test Execution Status”. Should look something like this:

    Self-test routine in progress…
    70% of test remaining.

  8. Bruno says:
    January 27, 2009 at 3:06 AM

    Hi btmorex

    Nice work.
    I had to remove the first line in the /etc/smartd.conf:
    # *SMARTD*AUTOGENERATED* /etc/smartd.conf
    Without doing this, all changes are lost after restarting the daemon.

    Cheers

  9. Heqtek says:
    February 12, 2009 at 3:49 PM

    Nice work! Using SMART to do checks before it’s too late.

  10. robert says:
    February 14, 2009 at 4:03 AM

    For Xubuntu users, I’ve made a little script that will work together with smartd to pop up a notification in case of any hard disk trouble. See http://ubuntuforums.org/showthread.php?t=1031244

    I think it can be easily adapted to Ubuntu though.

    @Agip: a mail server needs to be installed (and probably configured) if you’d like e-mail notifications. You can either try my script (:)), or use the smart-notifier package if available for your distribution.

    Cheers

  11. Johan says:
    February 27, 2009 at 8:36 PM

    Great tutorial!

    Just one question: what is a reasonable schedule for the short, long and offline tests?
    Short: every day?
    Long: Once per week?
    Offline: ???

  12. Mack says:
    April 14, 2009 at 8:26 PM

    Great tut and really useful! Really easy to understand. Thanks!

  14. Andrew says:
    April 19, 2009 at 9:45 PM

    Great tutorial, I’ve been using smartctl for years, but never got around to setting up smartd. Thanks for making it easy to setup.

  16. Alex says:
    April 22, 2009 at 11:28 AM

    Great info!
    One question, will SMART tool function correctly on an un-formatted drive?
    Say I found an old drive that is raw, can I run SMART on it?
    Thanks,
    Alex

    • btmorex says:
      April 22, 2009 at 12:47 PM

      Yes, it will work fine. The SMART tests are completely independent of what data is stored on the drive.

    • mark says:
      October 31, 2009 at 12:29 PM

      here didn’t work :(
      takes forever with no result.
      no error message anyway.

      • btmorex says:
        October 31, 2009 at 4:21 PM

        Are you talking about running a self test? You need to check results too. Run smartctl -l selftest <device>

      • dunk says:
        April 2, 2010 at 5:15 AM

        hi Mark,
        i’m still not 100% sure about this but from my initial testing of smartmontools it appears smartctl needs at least one disk partition to be mounted otherwise it just stays at the “90% completed” point for some time until smartctl eventually gives up and kills the test with a message like this:
        “# 7 Short offline Interrupted (host reset) 90% 2455 -”

        if you have a partition on your disks, try mounting it before the test.
        if not, maybe it is possible to mount the disk itself as a raw device? (i don’t know. haven’t tested it yet.)

        dunk.

        • btmorex says:
          April 2, 2010 at 11:15 AM

          That’s odd that it works when you mount something. I know that having a partition mounted is not a requirement though, as I run tests daily against drives that are almost never mounted.

  17. David Grant says:
    April 24, 2009 at 11:48 PM

    Any tips on how to run these commands on a hard drive connected via USB?

    • btmorex says:
      April 25, 2009 at 12:00 AM

      USB support for SMART commands isn’t great, and it depends on the specific chipset that your USB enclosure uses. See http://smartmontools.wiki.sourceforge.net/USB and http://smartmontools.wiki.sourceforge.net/overview_USB-Support

      If you’re lucky, something like:

      # smartctl -i -d sat,12 <device>

      will work. If you’re unlucky, nothing will work.

      • David Grant says:
        April 26, 2009 at 10:48 PM

        Thanks so much, that worked like a charm with my USB enclosure.

  18. Dao says:
    May 6, 2009 at 1:35 AM

    Thank you. Clearly written and informative. Other explanations always left me a bit dazed and confused.

  19. Karti says:
    June 8, 2009 at 3:00 AM

    Many thanks,

    Clear, concise and just what I needed

    Cheers

    K
    ;)

  20. Karti says:
    June 8, 2009 at 3:18 AM

    Just thought to add….

    http://tazbuntu.blogspot.com/2008/12/check-your-hard-drive-smart-status.html

    It adds GsmartControl, what appears to be a useful gui. I liked the fact that you were able to read the results of the test logs described above.

    Cheers

    K
    ;)

    • btmorex says:
      June 8, 2009 at 4:14 AM

      I actually have a half-written post about gsmartcontrol :)

      It’s a nice program, although I prefer the set-it-up-and-forget-about-it nature of smartd.

  21. Zdenek says:
    June 12, 2009 at 1:26 PM

    Thank you, setting up smartd went fine, however I cannot persuade the system to send mail which somehow makes the whole thing useless.
    I use postfix on Ubuntu server 8.04. I can send mail from the command line; I installed logwatch which can mail as well, but when smartd tries to send out mail, it always fails with the following error message:
    “Test of mail to root produced unexpected output (90 bytes) to STDOUT/STDERR: send-mail: invalid option -- i Can’t send mail: sendmail process failed with error code 1”

    I spent a lot of time on Google, configuration files, forwarding etc., but the result is always the same.

    Does anybody have an idea of what might go wrong? Thank you.

    Zdenek

    • Terje says:
      June 19, 2010 at 6:54 AM

      I haven’t been able to get it to send a mail either, but logwatch manages just fine.

      Wondering if I have the same problem you do, however I haven’t been able to locate any kind of error message in the logs, where exactly do you find yours?

  22. Mattia says:
    June 22, 2009 at 11:30 AM

    Thanks for the guide, very useful. However I got a problem: my second hard disk has some unreadable sector, every time I boot up the PC a new mail is sent.
    Is there a way to get smartd to send mail only when a new short/long test is performed? I just want to monitor the situation, not receive the same mail every time i boot the PC….
    Thanks for any help.

    • Mattia says:
      June 23, 2009 at 1:01 PM

      Resolved: I added “-U 0 -C 0” to disable reporting of unreadable sectors.

  23. Charles says:
    June 25, 2009 at 6:14 AM

    Thanks for the guide; worked perfectly and easy to follow. The hardest part was setting up postfix!

    Thoughts on usefulness of doing more than regular short and long tests? For example from the sample config file:
    # Monitor all attributes except normalized Temperature (usually 194),
    # but track Temperature changes >= 4 Celsius, report Temperatures
    # >= 45 Celsius and changes in Raw value of Reallocated_Sector_Ct (5).
    # Send mail on SMART failures or when Temperature is >= 55 Celsius.
    #/dev/hdc -a -I 194 -W 4,45,55 -R 5 -m admin@example.com

    Best

    Charles

  24. Liuc says:
    June 29, 2009 at 6:12 AM

    Thanks!! Your tutorial is very useful!

  26. David says:
    August 17, 2009 at 11:53 PM

    Ok, I’m running the long test, I’ve figured out how to see the progress it’s making as it goes along, but I can’t figure out how to view the results. I have 4 drives testing simultaneously – so I want to view the results separately, and maybe several times.

    THANK YOU,
    David

    • David says:
      August 18, 2009 at 12:14 AM

      I found it –

      smartctl -Hc /dev/sdx

      Shows the progress and the code with interpretation if the test has completed.

      Thanks for a great tutorial!

  27. Jan says:
    September 29, 2009 at 2:57 AM

    Great tutorial, thank you very much!

  28. peshkira says:
    October 6, 2009 at 9:44 AM

    Hi!

    Is there any chance that one can estimate the remaining lifetime of the hard drive based on some of the S.M.A.R.T. attributes? (A very rough estimation is perfect for what I am doing.) I know that SMART data is correct but you cannot rely on it to catch a failure; however, if there is a formula to roughly estimate the remaining lifetime, I will be very grateful.

    10x in advance.

    • btmorex says:
      October 6, 2009 at 11:23 AM

      No, you can’t really make an estimate like that. Actually, Google did their S.M.A.R.T. study (http://labs.google.com/papers/disk_failures.pdf) to find out exactly what you’re asking. The conclusion they reached is that even though some values have predictive value, they are nowhere near good enough to actually preemptively replace hard drives (which is very similar to estimating remaining life).

      • peshkira says:
        October 7, 2009 at 9:13 AM

        Hey!

        10x for your reply. However, the study says that if you combine all parameters, only 36% of all failed drives could not be predicted or had zero values, so actually this is quite good for me. What is more, even if I take only the 4 important parameters into account, I will be successful in 44% of the cases. Combining this with the age of the hard drive will be enough for me… So are you aware of a formula or combination of these parameters that would let me estimate the health (or the remaining lifetime) of an HDD?

        Thanks in advance…

        • btmorex says:
          October 7, 2009 at 12:59 PM

          To answer your question right away: I don’t have any formula.

          I want to add though that I think what you’ll find is that you’ll be able to split drives into two groups: one group will have no predictive S.M.A.R.T. values, and one group will have one or more values that indicate imminent failure. There’s no doubt that that’s valuable, but I don’t think you’ll be able to estimate remaining life with any accuracy for most of the drives.

  29. Pol says:
    October 28, 2009 at 3:06 PM

    cool!
    thanks :-)

    Pol

  31. Justin says:
    February 2, 2010 at 3:51 PM

    Here’s an oddity. I’m testing smartmontools vs. cciss_vol_status on an HP with external array, and getting some inconsistent results.
    I know that one drive has failed.
    I know that another drive is in jeopardy (which is why I’m testing on the box I’m testing on).
    Running smartctl, I see in my health report that the second drive is in danger, but it makes no mention of the failed drive.
    Then, running cciss_vol_status, I see that the first drive has failed, but no mention is made of the second.

    I’ll post this in the cciss_vol_status forum as well, but I find it interesting that the two utilities show such different results!

    • btmorex says:
      February 2, 2010 at 4:33 PM

      What is cciss_vol_status actually checking? One possibility is that the drive is completely dead. There would be no S.M.A.R.T. status, but cciss_vol_status would know that there was supposed to be a drive there so it could determine it was dead.

      As for the one that’s failing, probably cciss_vol_status isn’t checking S.M.A.R.T. status (I have no idea because I haven’t used it).

    • wobbe says:
      February 7, 2010 at 6:48 AM

      Hi, check out:

      http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1404978

  34. Gabriel says:
    July 5, 2010 at 7:44 AM

    You can pass sudo smartctl -l selftest /dev/sda as argument to watch in order to follow its progress.


Copyright © 2008-2009 Random Bits