Random Bits

Monitoring Hard Drive Health on Linux with smartmontools

S.M.A.R.T. is a system in modern hard drives designed to report conditions that may indicate impending failure. smartmontools is a free software package that can monitor S.M.A.R.T. attributes and run hard drive self-tests. Although smartmontools runs on a number of platforms, I will only cover installing and configuring it on Linux.

Why Use S.M.A.R.T.?

Basically, S.M.A.R.T. may give you enough of a warning that you can safely back up all your data before your hard drive dies. There is some conflicting information on the internet about how reliable the warnings are. The best research I found is a paper from Google that describes an internal study of hard drive failure. A quick summary: certain events, including reallocation events and failed self-tests, greatly increase the chance of hard drive failure, but only about 60% of the drives that failed in the study had any negative S.M.A.R.T. attributes. Obviously, nothing replaces regular backups.

A good source for more information is the S.M.A.R.T. Wikipedia page.

Installation

On Debian or Ubuntu systems:

$ sudo apt-get install smartmontools

On Fedora (recent releases use dnf in place of yum):

$ sudo yum install smartmontools

Capabilities and Initial Tests

smartmontools comes with two programs: smartctl, which is meant for interactive use, and smartd, which continuously monitors S.M.A.R.T. Let’s look at smartctl first:

$ sudo smartctl -i /dev/sda

Replace /dev/sda with your hard drive’s device file in this command and all subsequent commands. If there’s only one hard drive in the system, it should be /dev/sda or /dev/hda. If this command fails, you may need to let smartctl know what type of hard drive interface you’re using:

$ sudo smartctl -d TYPE -i /dev/sda

where TYPE is usually one of ata, scsi, or sat (SCSI-to-ATA translation, which is what serial ATA drives typically need). See the smartctl man page for more information. Note that if you need -d here, you will need to add it to all smartctl commands. The command should print information similar to:

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint T133 series
Device Model:     SAMSUNG HD300LJ
Serial Number:    S0D7J1UL303628
Firmware Version: ZT100-12
User Capacity:    300,067,970,560 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Fri Jan  2 03:08:20 2009 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
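
If you want a script to confirm the last two lines of that output, here is a minimal sketch. It assumes the output format shown above, and smart_ready is a made-up helper name, not something that ships with smartmontools.

```shell
# Reads `smartctl -i` output on stdin.
# Succeeds (exit status 0) only if both "SMART support is:" lines
# report Available and Enabled, as in the sample output above.
smart_ready() {
    awk '
        /^SMART support is:/ && /Available/ { available = 1 }
        /^SMART support is:/ && /Enabled/   { enabled = 1 }
        END { exit !(available && enabled) }
    '
}
```

It could be used as: sudo smartctl -i /dev/sda | smart_ready && echo "S.M.A.R.T. is available and enabled"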

Now that smartctl can access the drive, let’s turn on some features. Run the following command:

$ sudo smartctl -s on -o on -S on /dev/sda

  • -s on: This turns on S.M.A.R.T. support or does nothing if it’s already enabled.
  • -o on: This turns on offline data collection. Offline data collection periodically updates certain S.M.A.R.T. attributes. Theoretically this could have a performance impact. However, from the smartctl man page:

    Normally, the disk will suspend offline testing while disk accesses are taking place, and then automatically resume it when the disk would otherwise be idle, so in practice it has little effect.

  • -S on: This enables “autosave of device vendor-specific Attributes”.

The command should return:

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
SMART Attribute Autosave Enabled.
SMART Automatic Offline Testing Enabled every four hours.

Next, let’s check the overall health:

$ sudo smartctl -H /dev/sda

This command should return:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
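
For use in a script, the verdict can be pulled out of that output. The sketch below assumes the “test result:” line looks exactly like the one above; smart_health is a hypothetical helper name.

```shell
# Reads `smartctl -H` output on stdin and prints the overall-health verdict
# (the text after "test result: "), or UNKNOWN if that line is missing.
smart_health() {
    awk -F': ' '
        /overall-health self-assessment test result/ { print $2; found = 1 }
        END { if (!found) print "UNKNOWN" }
    '
}

# Example use:
#   status=$(sudo smartctl -H /dev/sda | smart_health)
#   [ "$status" = "PASSED" ] || echo "Health is $status: back up now"
```

Printing UNKNOWN when the line is absent keeps a failed smartctl call from being mistaken for a healthy drive.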

If the command doesn’t return PASSED, you should immediately back up all your data. Your hard drive is probably failing. Next, let’s make sure that the drive supports self-tests. I have yet to see a drive that doesn’t, but the following command also gives time estimates for each test:

$ sudo smartctl -c /dev/sda

I won’t list the complete output because it’s somewhat lengthy. Make sure “Self-test supported” appears in the “Offline data collection capabilities” section. Also, look for output similar to:

Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 127) minutes.
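
If you want those numbers in a script (for example, to decide how long to sleep before checking results), here is a rough sketch. It assumes each “recommended polling time” line directly follows the line naming the test, as in the excerpt above; selftest_minutes is a made-up name.

```shell
# Reads `smartctl -c` output on stdin and prints "<test name>: <minutes>"
# for each "recommended polling time" line, taking the test name from
# the line immediately before it.
selftest_minutes() {
    awk '
        /recommended polling time:/ {
            minutes = $0
            gsub(/[^0-9]/, "", minutes)                 # keep only the digits
            sub(/ self-test routine[ \t]*$/, "", name)  # "Short self-test routine" -> "Short"
            print name ": " minutes
        }
        { name = $0 }                                   # remember the line for the next match
    '
}
```

Run against the excerpt above, it prints “Short: 2” and “Extended: 127”.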

The polling times are rough estimates of how long the short and long self-tests will take. Let’s run the short test:

$ sudo smartctl -t short /dev/sda

On my drive, this test should take 2 minutes, but this obviously varies. You can run:

$ sudo smartctl -l selftest /dev/sda

to check results. While a test is running, smartctl -c reports its progress in the “Self-test execution status” field as a percentage remaining. A successful run will look like:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     21472         -
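
To read the most recent result from a script, something like the following sketch may help. It assumes the log columns are separated by two or more spaces, as in the output above; latest_selftest is a hypothetical helper name.

```shell
# Reads `smartctl -l selftest` output on stdin and prints the test type and
# status of the most recent entry (the row numbered "# 1").
latest_selftest() {
    awk -F'  +' '/^# *1 / { print $2 ": " $3 }'
}

# Example use:
#   sudo smartctl -l selftest /dev/sda | latest_selftest
```

For the row shown above, this prints “Short offline: Completed without error”.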

Now, do the same for the long self-test:

$ sudo smartctl -t long /dev/sda

The long test can take a significant amount of time. You might want to run it overnight and check for the results in the morning. If either test fails, you should immediately back up all your data and read the last section of this guide.

Configuring smartd

We’ve now enabled some features and run the basic tests. Instead of repeating the previous section daily, we can set up smartd to do it all automatically. If your system has an /etc/smartd.conf file, check for a line that begins with DEVICESCAN. If you find one, comment it out by adding a ‘#’ to the beginning of the line. DEVICESCAN doesn’t work on my system, and specifying a device file is easy. Add the following line to /etc/smartd.conf:

/dev/sda -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/share/smartmontools/smartd-runner

Here’s what each option does:

  • /dev/sda: Replace this with the device file you’ve been using in smartctl commands.
  • -a: This enables some common options. You almost certainly want to use it.
  • -d sat: On my system, smartctl correctly guesses that I have a serial ATA drive. smartd, on the other hand, does not. If you had to add a “-d TYPE” parameter to the smartctl commands, you’ll almost certainly have to do the same here. If you didn’t, try leaving it out initially. You can add it later if smartd fails to start.
  • -o on, -S on: These have the same meaning as the smartctl equivalents.
  • -s (S/../.././02|L/../../6/03): This schedules the short and long self-tests. Each entry is a regular expression of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 for Monday, and hour), and a “.” matches any digit. In this example, the short self-test will run daily at 2:00 A.M. The long test will run on Saturdays at 3:00 A.M. For more information, see the smartd.conf man page.
  • -m root: If any errors occur, smartd will send email to root. On my system, mail for root is forwarded to my normal email account. If you don’t have a similar setup, replace root with your normal email address. This option also requires a working email setup. Most Linux distributions automatically have working outbound email.
  • -M exec /usr/share/smartmontools/smartd-runner: This last part may be specific to the Debian and Ubuntu smartmontools packages. Check if your system has /usr/share/smartmontools/smartd-runner. If it doesn’t, remove this option. Instead of sending email directly, “-M exec” makes smartd run a different command when errors occur. On Debian, smartd-runner will run each script in /etc/smartmontools/run.d/, one of which emails the user specified by the “-m” option.
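
If you want a different schedule, the -s pattern can be generated instead of typed by hand. As I read the smartd.conf man page, each entry has the form T/MM/DD/d/HH, where d is the day of the week (1 for Monday through 7 for Sunday). The sketch below only covers the “short test daily, long test weekly” shape used in this article, and smartd_schedule is a made-up helper name.

```shell
# Builds a smartd "-s" schedule: short test every day, long test once a week.
#   $1 = hour of the daily short test (0-23, no leading zero)
#   $2 = weekday of the long test (1 = Monday ... 7 = Sunday)
#   $3 = hour of the weekly long test (0-23, no leading zero)
smartd_schedule() {
    printf '(S/../.././%02d|L/../../%d/%02d)\n' "$1" "$2" "$3"
}

# Example: reproduces the schedule from the configuration line above.
#   smartd_schedule 2 6 3    # prints (S/../.././02|L/../../6/03)
```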

If you have more than one hard drive in your system, add a line for each one replacing /dev/sda with a different device file.
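
With several drives, writing the lines by hand gets repetitive. Here is a small sketch that prints one configuration line per device and comments out an active DEVICESCAN line. The option string is copied from the example in this article; the function names are hypothetical, and both functions only print text, so nothing is changed until you redirect the output yourself.

```shell
# Prints one smartd.conf line per device given as an argument, using the
# options from the example line in this article. Adjust the options
# (for instance, drop "-d sat" or the "-M exec" part) to suit your system.
smartd_lines() {
    local opts='-a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/share/smartmontools/smartd-runner'
    local dev
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$opts"
    done
}

# Filter: reads a smartd.conf on stdin and writes it to stdout with any
# active DEVICESCAN line commented out. Already commented lines are untouched.
disable_devicescan() {
    sed 's/^[[:space:]]*DEVICESCAN/#&/'
}

# Example use (writes a new file for review instead of editing in place):
#   { disable_devicescan < /etc/smartd.conf; smartd_lines /dev/sda /dev/sdb; } > /tmp/smartd.conf.new
```

Writing to a scratch file first lets you compare it with the existing /etc/smartd.conf before replacing anything.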

Update on 2009-01-06:

Thanks to commenter robert for pointing out an omission on my part. If your system has the file /etc/default/smartmontools, uncomment the “#start_smartd=yes” line by removing the “#”.

Finally, restart smartd:

$ sudo /etc/init.d/smartmontools restart

If this command fails, the end of /var/log/daemon.log should have some diagnostic information. If smartd started fine, we should still test that email notifications are working. Add “-M test” to the end of the configuration line in /etc/smartd.conf. This will make smartd send out a test notification when it’s next started. Once again, restart smartd:

$ sudo /etc/init.d/smartmontools restart

You should receive an email similar to:

This email was generated by the smartd daemon running on:

   host name: polar
  DNS domain: shadypixel.com
  NIS domain: (none)

The following warning/error was logged by the smartd daemon:

TEST EMAIL from smartd for device: /dev/sda

For details see host's SYSLOG (default: /var/log/syslog).

Afterward, you can delete “-M test”.

What To Do If smartd Detects Problems

First, immediately back up everything. Depending on the error, your drive might be close to death or it may still have a long life ahead. Consult the smartmontools FAQ, which has recommendations for specific errors. Otherwise, ask for help on the smartmontools-support mailing list.

This entry was posted on Friday, January 2nd, 2009 at 6:18 AM and is filed under Linux.

75 Comments on “Monitoring Hard Drive Health on Linux with smartmontools”

  1. robert says:
    January 6, 2009 at 3:19 PM

    Hey, nice intro. Small addition: on my Xubuntu intrepid and jaunty (alpha) installation, I had to uncomment the line ‘#start_smartd=yes’ in the file /etc/default/smartmontools.

    Cheers

  2. btmorex says:
    January 6, 2009 at 3:46 PM

    Thanks robert, I updated the article.

  3. robert says:
    January 6, 2009 at 5:10 PM

    No problem, I expected smartd to start as well after running ‘sudo /etc/init.d/smartmontools restart’ :) Once again, nice article.

  4. Agip says:
    January 11, 2009 at 11:55 PM

    Hi btmorex, nice howto.

    I configured my smartd.conf like this:

    /dev/sdb -I 194 -a -o on -S on -s (S/../.././03|L/../../6/04) \
    -m [email protected] \
    -M exec /usr/share/smartmontools/smartd-runner

    Also, by adding “-M test”, I tested email notifications and received test email message.

    As you see, my HDD is tested each morning, but I didn’t receive any email notification about the test results.

    Notifications are probably sent only when something goes wrong; am I right on this point?

    Right now my drive reports OK status with the “smartctl -H” command.

    Thanks a lot again.

  5. btmorex says:
    January 12, 2009 at 7:10 AM

    Agip,

    It sounds like you’ve set it up correctly. You’re right that smartd will only email you if there is a problem. If you want to look at the test results, you can do:

    smartctl -l selftest /dev/sdb

  6. Dan says:
    January 18, 2009 at 7:48 PM

    You can view the progress of a self test by running smartctl -Hc /dev/XXX. It will be across from the “Self Test Execution Status”. Should look something like this:

    Self-test routine in progress…
    70% of test remaining.

  7. Bruno says:
    January 27, 2009 at 3:06 AM

    Hi btmorex

    Nice work.
    I had to remove the first line in the /etc/smartd.conf:
    # *SMARTD*AUTOGENERATED* /etc/smartd.conf
    Without doing this, all changes are lost after restarting the daemon.

    Cheers

  8. robert says:
    February 14, 2009 at 4:03 AM

    For Xubuntu users, I’ve made a little script that will work together with smartd to pop up a notification in case of any hard disk trouble. See http://ubuntuforums.org/showthread.php?t=1031244

    I think it can be easily adapted to Ubuntu though.

    @Agip: a mail server needs to be installed (and probably configured) if you’d like e-mail notifications. You can either try my script (:)), or use the smart-notifier package if it is available for your distribution.

    Cheers

  9. Johan says:
    February 27, 2009 at 8:36 PM

    Great tutorial!

    Just one question: what is a reasonable schedule for the short, long and offline tests?
    Short: every day?
    Long: Once per week?
    Offline: ???

    • btmorex says:
      November 17, 2013 at 7:32 AM

      Offline data collection is just on/off, so turn it on. As for schedule, the config above does a short test every day and a long test once a week.

  10. Mack says:
    April 14, 2009 at 8:26 PM

    Great tut and really useful! Really easy to understand. Thanks!

  11. Andrew says:
    April 19, 2009 at 9:45 PM

    Great tutorial, I’ve been using smartctl for years, but never got around to setting up smartd. Thanks for making it easy to setup.

  12. Alex says:
    April 22, 2009 at 11:28 AM

    Great info!
    One question, will SMART tool function correctly on an un-formatted drive?
    Say I found an old drive that is raw, can I run SMART on it?
    Thanks,
    Alex

    • btmorex says:
      April 22, 2009 at 12:47 PM

      Yes, it will work fine. The SMART tests are completely independent of what data is stored on the drive.

    • mark says:
      October 31, 2009 at 12:29 PM

      here didn’t work :(
      takes forever with no result.
      no error message anyway.

      • btmorex says:
        October 31, 2009 at 4:21 PM

        Are you talking about running a self test? You need to check results too. Run smartctl -l selftest <device>

      • dunk says:
        April 2, 2010 at 5:15 AM

        hi Mark,
        i’m still not 100% sure about this but from my initial testing of smartmontools it appears smartctl needs at least one disk partition to be mounted otherwise it just stays at the “90% completed” point for some time until smartctl eventually gives up and kills the test with a message like this:
        “# 7 Short offline Interrupted (host reset) 90% 2455 -”

        if you have a partition on your disks, try mounting it before the test.
        if not, maybe it is possible to mount the disk itself as a raw device? (i don’t know. haven’t tested it yet.)

        dunk.

        • btmorex says:
          April 2, 2010 at 11:15 AM

          That’s odd that it works when you mount something. I know that having a partition mounted is not a requirement though, as I run tests daily against drives that are almost never mounted.

  13. David Grant says:
    April 24, 2009 at 11:48 PM

    Any tips on how to run these commands on a hard drive connected via USB?

    • btmorex says:
      April 25, 2009 at 12:00 AM

      USB support for SMART commands isn’t great, and it depends on the specific chipset that your USB enclosure uses. See http://smartmontools.wiki.sourceforge.net/USB and http://smartmontools.wiki.sourceforge.net/overview_USB-Support

      If you’re lucky, something like:

      # smartctl -i -d sat,12 <device>

      will work. If you’re unlucky, nothing will work.

      • David Grant says:
        April 26, 2009 at 10:48 PM

        Thanks so much, that worked like a charm with my USB enclosure.

  14. Dao says:
    May 6, 2009 at 1:35 AM

    Thank you. Clearly written and informative. Other explanations always left me a bit dazed and confused.

  15. Karti says:
    June 8, 2009 at 3:00 AM

    Many thanks,

    Clear, concise and just what I needed

    Cheers

    K
    ;)

  16. Karti says:
    June 8, 2009 at 3:18 AM

    Just thought to add….

    http://tazbuntu.blogspot.com/2008/12/check-your-hard-drive-smart-status.html

    It adds GsmartControl, what appears to be a useful gui. I liked the fact that you were able to read the results of the test logs described above.

    Cheers

    K
    ;)

    • btmorex says:
      June 8, 2009 at 4:14 AM

      I actually have a half-written post about gsmartcontrol :)

      It’s a nice program, although I prefer the set-it-up-and-forget-about-it nature of smartd.

  17. Zdenek says:
    June 12, 2009 at 1:26 PM

    Thank you, setting up smartd went fine, however I cannot persuade the system to send mail which somehow makes the whole thing useless.
    I use postfix on Ubuntu server 8.04. I can send mail from the command line; I installed logwatch which can mail as well, but when smartd tries to send out mail, it always fails with the following error message:
    “Test of mail to root produced unexpected output (90 bytes) to STDOUT/STDERR: send-mail: invalid option -- i Can’t send mail: sendmail process failed with error code 1”

    I spent a lot of time on Google, configuration files, forwarding etc., but the result is always the same.

    Does anybody have an idea of what might go wrong? Thank you.

    Zdenek

    • Terje says:
      June 19, 2010 at 6:54 AM

      I haven’t been able to get it to send a mail either, but logwatch manages just fine.

      Wondering if I have the same problem you do, however I haven’t been able to locate any kind of error message in the logs, where exactly do you find yours?

    • btmorex says:
      November 17, 2013 at 7:36 AM

      Try a different mail package, eg mailutils

  18. Mattia says:
    June 22, 2009 at 11:30 AM

    Thanks for the guide, very useful. However, I have a problem: my second hard disk has some unreadable sectors, and every time I boot up the PC a new mail is sent.
    Is there a way to get smartd to send mail only when a new short/long test is performed? I just want to monitor the situation, not receive the same mail every time I boot the PC….
    Thanks for any help.

    • Mattia says:
      June 23, 2009 at 1:01 PM

      Resolved: I added “-U 0 -C 0” to disable reporting of unreadable sectors.

      • btmorex says:
        November 17, 2013 at 7:37 AM

        You might be better off replacing the drive…

  19. Charles says:
    June 25, 2009 at 6:14 AM

    Thanks for the guide; worked perfectly and easy to follow. The hardest part was setting up postfix!

    Thoughts on usefulness of doing more than regular short and long tests? For example from the sample config file:
    # Monitor all attributes except normalized Temperature (usually 194),
    # but track Temperature changes >= 4 Celsius, report Temperatures
    # >= 45 Celsius and changes in Raw value of Reallocated_Sector_Ct (5).
    # Send mail on SMART failures or when Temperature is >= 55 Celsius.
    #/dev/hdc -a -I 194 -W 4,45,55 -R 5 -m [email protected]

    Best

    Charles

  20. Liuc says:
    June 29, 2009 at 6:12 AM

    Thanks!! Your tutorial is very useful!

  21. David says:
    August 17, 2009 at 11:53 PM

    Ok, I’m running the long test, I’ve figured out how to see the progress it’s making as it goes along, but I can’t figure out how to view the results. I have 4 drives testing simultaneously – so I want to view the results separately, and maybe several times.

    THANK YOU,
    David

    • David says:
      August 18, 2009 at 12:14 AM

      I found it –

      smartctl -Hc /dev/sdx

      Shows the progress and the code with interpretation if the test has completed.

      Thanks for a great tutorial!

  22. Jan says:
    September 29, 2009 at 2:57 AM

    Great tutorial, thank you very much!

  23. peshkira says:
    October 6, 2009 at 9:44 AM

    Hi!

    Is there any chance that one can estimate the remaining lifetime of a hard drive based on some of the S.M.A.R.T. attributes? (A very rough estimate is perfect for what I am doing.) I know that SMART data is correct but that you cannot rely on it to catch a failure; however, if there is a formula to roughly estimate the remaining lifetime, I would be very grateful.

    10x in advance.

    • btmorex says:
      October 6, 2009 at 11:23 AM

      No, you can’t really make an estimate like that. Actually, Google did their S.M.A.R.T. study to find out exactly what you’re asking. The conclusion they reached is that even though some values have predictive value, they are nowhere near good enough to actually preemptively replace hard drives (which is very similar to estimating remaining life).

      • peshkira says:
        October 7, 2009 at 9:13 AM

        Hey!

        10x for your reply. However, the study says that even when you combine all parameters, only 36% of the failed drives showed zero counts and so could not have been predicted, which is actually quite good for me. What is more, even if I take only the 4 important parameters into account, I will be successful in 44% of the cases. Combining this with the age of the hard drive will be enough for me… So are you aware of a formula or combination of these parameters that would let me estimate the health (or the remaining lifetime) of an HDD?

        Thanks in advance…

        • btmorex says:
          October 7, 2009 at 12:59 PM

          To answer your question right away: I don’t have any formula.

          I want to add though that I think what you’ll find is that you’ll be able to split drives into two groups: one group will have no predictive S.M.A.R.T. values, and one group will have one or more values that indicate imminent failure. There’s no doubt that that’s valuable, but I don’t think you’ll be able to estimate remaining life with any accuracy for most of the drives.

  24. Pol says:
    October 28, 2009 at 3:06 PM

    cool!
    thanks :-)

    Pol

  25. Justin says:
    February 2, 2010 at 3:51 PM

    Here’s an oddity. I’m testing smartmontools vs. cciss_vol_status on an HP with external array, and getting some inconsistent results.
    I know that one drive has failed.
    I know that another drive is in jeopardy (which is why I’m testing on the box I’m testing on).
    Running smartctl, I see in my health report that the second drive is in danger, but it makes no mention of the failed drive.
    Then, running cciss_vol_status, I see that the first drive has failed, but no mention is made of the second.

    I’ll post this in the cciss_vol_status forum as well, but I find it interesting that the two utilities show such different results!

    • btmorex says:
      February 2, 2010 at 4:33 PM

      What is cciss_vol_status actually checking? One possibility is that the drive is completely dead. There would be no S.M.A.R.T. status, but cciss_vol_status would know that there was supposed to be a drive there so it could determine it was dead.

      As for the one that’s failing, probably cciss_vol_status isn’t checking S.M.A.R.T. status (I have no idea because I haven’t used it).

    • wobbe says:
      February 7, 2010 at 6:48 AM

      Hi, check out:

      http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1404978

  26. Gabriel says:
    July 5, 2010 at 7:44 AM

    You can pass sudo smartctl -l selftest /dev/sda as argument to watch in order to follow its progress.

  27. Anonymous says:
    June 19, 2011 at 5:38 PM

    I have a huge problem with my disks: they are being killed by bad I/O synchronisation. I ran iotop and every process shows 99.90% in the IO column. What can I do to make my disks stable again?

    I run game servers on my dedicated server, and all my customers have lag and can’t play. What can I do? Please help :?

    • btmorex says:
      November 17, 2013 at 7:42 AM

      Get a better server?

  28. richard says:
    July 6, 2011 at 4:29 PM

    Great tut there.

    There seems to be some interest in an automatic periodic “all-is-well” email notification perhaps containing the info from the last health checks. Since smartmon normally only sends an email when there is a problem, can we add a line to smartd.conf that forces an “all-is-well” email to be sent, say, monthly? Or would we have to cron a scratch built script which uses smartctl to do that?
    Does anybody know how to do this with smartd, or have a “cronable” script for smartctl, or perhaps a how-to for getting some monitoring program to provide this feature?

    • btmorex says:
      November 17, 2013 at 7:45 AM

      There’s no way to get that with standard smartd, so it would have to be a script that parsed the output of smartctl.

  29. Janet says:
    July 15, 2011 at 11:12 PM

    smartctl has the “-s on” option to turn on S.M.A.R.T. support on the hard disk. For some new hard disks, it has to be set at the beginning. However, since a new hard disk doesn’t contain any SMART information yet, the SMART health check shows as failed. After a day, the result changes to “passed”. I am wondering how to reset the values of the old-age attributes.

  30. Cliff says:
    November 29, 2011 at 7:38 PM

    Thanks for the awesome tutorial! I was hoping you could help me with emailing to 2 separate email accounts. The current line I have in /etc/smartd.conf is DEVICESCAN -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m [email protected] -M exec /usr/share/smartmontools/smartd-runner which works without a problem. I have tried adding [email protected],[email protected] but that doesn’t work. What is the best way to accomplish sending results to 2 emails. Thanks for the help.

    • btmorex says:
      November 17, 2013 at 7:46 AM

      I believe you can add multiple -m options to the same line, but I haven’t tried.

  31. Jamie says:
    December 27, 2011 at 1:30 AM

    Thanks for your hard work. If anyone has suggestions for the frequency of the various tests, I would be very interested. Obviously, running the long test over and over would be enough to wear out the drive, but some advice on which tests to run and how often would be appreciated.

    • btmorex says:
      November 17, 2013 at 7:47 AM

      I run the short test daily and long test weekly.

  32. John McLean says:
    January 13, 2012 at 11:35 AM

    On a Debian system (wheezy/sid) I needed to install bsd-mailx to get smartd to send emails via sendmail:

    apt-get install bsd-mailx

    Thanks for the guide!

    • btmorex says:
      November 17, 2013 at 7:48 AM

      You can also get the mail program from mailutils

  33. Mbwun says:
    September 14, 2012 at 4:10 PM

    Excellent post. Thanks for your time in putting this together. One of the better walkthroughs of smartmontools configuration out there.

  34. laurens says:
    May 19, 2013 at 10:04 AM

    Hello, nice tutorial, but I have hardware RAID and need to use the command below to view the data. How can I use the runner to execute it periodically?

    smartctl -c -a /dev/cciss/c0d0p1 -d cciss,0

    • btmorex says:
      November 17, 2013 at 7:52 AM

      If you mean configuring smartd, you can just add those options to the smartd.conf line. The smartd-runner program just executes scripts in /etc/smartmontools/run.d on failures.

  35. Mak says:
    June 8, 2013 at 11:17 PM

    Very informative and precise tutorial with all the correct commands and screen shots.
    I just got round to testing my HD, as it’s making a buzzing noise that is worse under Linux.
    Preliminary results are promising; I will run the extended test and see what it spits out.

    Thanks dude.

  36. Barun Saha says:
    August 17, 2013 at 1:20 PM

    Excellent tutorial! Thanks!

  37. jmac says:
    October 12, 2013 at 1:33 PM

    Thanks for the tutorial, found it helpful. Below is a short script used on an ubuntu 12.04 system run via cron once a week to email a summary of disk information and self tests completed for each disk found at boot time. The formatting of the summary output can be changed to add or remove info as needed. It assumes you have the system configured to send mail, and will email the output to the root user.

    #!/bin/bash

    #
    # script created to provide the general disk information and smartmon test completion status for
    # all disk devices found at boot time by OS
    # 10/12/2013 jmm
    #

    export PATH=/usr/bin:/bin:/usr/sbin
    export Smart_Out=/tmp/smart.out
    export Device_file=/tmp/devs
    export HoSt=`hostname`
    export emailsubj="$(hostname) - SMART self-test summary for $(date "+%A %B %d %Y")"
    export SendTo=root

    #
    # get the devices seen at OS boot time
    #

    ls /dev/sd? > $Device_file

    #
    # for each device found in /dev get the general drive info and SMART self test status
    # send both to a temp file and do simple formatting
    #

    while IFS= read -r line
    do
    if [ "$line" = "/dev/sda" ]; then
    echo -e "The SMART status for Hard disk $line is: \n\n" > /tmp/smart.out
    smartctl -a $line|awk ‘NR>=4&&NR> /tmp/smart.out
    smartctl -l selftest $line >> /tmp/smart.out
    echo -e "=== END OF READ SMART DATA SECTION === \n\n" >> /tmp/smart.out
    else
    echo -e "The SMART status for Hard disk $line is: \n\n" >> /tmp/smart.out
    smartctl -a $line |awk ‘NR>=4&&NR> /tmp/smart.out
    smartctl -l selftest $line >> /tmp/smart.out
    echo -e "=== END OF READ SMART DATA SECTION === \n\n" >> /tmp/smart.out
    fi

    done < "$Device_file"

    #
    # send output to the appropriate user
    #

    cat $Smart_Out | mailx -s "$emailsubj" $SendTo
    logger $emailsubj

    rm $Device_file $Smart_Out

    • btmorex says:
      November 17, 2013 at 7:56 AM

      Thanks. Could you upload it to github or one of those text hosting sites and link it? WordPress automatically changes a lot of characters.

  38. Clement says:
    November 20, 2013 at 4:23 PM

    Hi,

    I think it has been spotted in a previous post but with another command, for this part of the article:
    “Unfortunately, there

    • Clement says:
      November 20, 2013 at 4:24 PM

      End of the comment was:

      You can find out the advancement of your test using the command:

      smartctl --capabilities /dev/sdX

      It will show the advancement for your test in percentage.

      Thanks for the tutorial, really helpful!

      Cheers,
      Clem

  39. arpee says:
    April 1, 2014 at 5:42 PM

    I tried setting this up on CentOS 6.5 and tried to restart the service using “/etc/init.d/smartmontools restart”, but got an error.
    “smartmontools” is not located in the “/etc/init.d/” directory, but smartd is. Is getting the smartd service started enough to get smartmontools working?

    • btmorex says:
      April 3, 2014 at 9:08 PM

      Yup, should be.

  40. felipe1982 says:
    December 22, 2014 at 8:54 PM

    You should include what ERRORS look like, so I can write a script which greps then emails sysadmin if errors found.

  41. Satish says:
    February 27, 2015 at 1:22 AM

    Thanks for publishing this article. I have a few questions:
    1. Can I rely on smartctl -H to see if the device is in good health, or do I also need to run the short or long self-test? My aim is just to find out whether the disk is fine for reading and writing. We have a high availability solution, and we want to use this utility to fail over to the standby node in case of any issue with the disk.

    2. Are the health check with the -H option and the short self-test handled by the drive independently, or do they consume CPU cycles? Is any data read or written while running these tests that takes CPU time?

  42. Mick says:
    October 3, 2015 at 7:24 AM

    I cannot get the mail notification to work. After some tries I got sendmail to work from command line using a gmail account, but having set it to test using “-M test” it now generates a mail every 20 minutes but the subject is
    Cron test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp

    and the message is
    /usr/share/sendmail/sendmail: 899: /usr/share/sendmail/sendmail: /usr/sbin/sendmail-msp: not found

    Any idea what I need to do?

    This is running on Linux Mint 17

  43. PsySc0rpi0n says:
    November 29, 2015 at 10:03 AM

    Hello…

    I’m trying the short test on an SSD, but it looks like it never ends! It freezes at 10% remaining and doesn’t get to 0% remaining! I have to abort it with the “-X” flag!

    I’m using:
    sudo smartctl -t short /dev/sda

    then when I issue:
    sudo smartctl -l selftest /dev/sda

    I get this, which is an aborted previous test:
    === START OF READ SMART DATA SECTION ===
    SMART Self-test log structure revision number 1
    Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
    # 1 Short offline Aborted by host 10% 14687 -

    If I try to run again the command for the short test it says:
    smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.1.0-0.bpo.2-amd64] (local build)
    Copyright (C) 2002-14, Bruce Allen, Christian Franke, http://www.smartmontools.org

    === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
    Can’t start self-test without aborting current test (10% remaining),
    add ‘-t force’ option to override, or run ‘smartctl -X’ to abort test.

  44. D hari shankar says:
    May 9, 2016 at 11:20 PM

    I want to use smarttool for C++ code, is there any exposed C/C++ api available

    Thanks
    Hari Shankar

  45. Royce Lithgo says:
    October 30, 2016 at 4:58 PM

    Weekly long test is generating a temp warning as the drive temp reaches 46 degrees near the end of the test. Normal operating temp is 38 degrees. Short test isn’t a problem.

    Perhaps weekly long tests are doing more harm than good?

  46. NssY says:
    January 7, 2017 at 3:01 AM

    Thanks for the information. I’ve managed to set everything up as indicated. It’s a shame that I’m doing this only after my disk died.
    Much appreciated for this blog.

  47. Naveed Iqbal says:
    March 11, 2017 at 11:28 AM

    i have question….can you tell me how to recover linux window without reinstalling if i got some issue in system?

Copyright © 2008-2013 Random Bits