S.M.A.R.T. is a system in modern hard drives designed to report conditions that may indicate impending failure. smartmontools is a free software package that can monitor S.M.A.R.T. attributes and run hard drive self-tests. Although smartmontools runs on a number of platforms, I will only cover installing and configuring it on Linux.
Why Use S.M.A.R.T.?
Basically, S.M.A.R.T. may give you enough warning to back up all your data safely before your hard drive dies. There is some conflicting information on the internet about how reliable the warnings are. The best research I found is a paper from Google that describes an internal study of hard drive failures. A quick summary: certain events, including reallocation events and failed self-tests, greatly increase the chance of hard drive failure, but only about 60% of the drives that failed in the study had any negative S.M.A.R.T. attributes. Obviously, nothing replaces regular backups.
A good source for more information is the S.M.A.R.T. Wikipedia page.
Installation
On Debian or Ubuntu systems:
$ sudo apt-get install smartmontools
On Fedora:
$ sudo yum install smartmontools
Capabilities and Initial Tests
smartmontools comes with two programs: smartctl, which is meant for interactive use, and smartd, which continuously monitors S.M.A.R.T. Let’s look at smartctl first:
$ sudo smartctl -i /dev/sda
Replace /dev/sda with your hard drive’s device file in this command and all subsequent commands. If there’s only one hard drive in the system, it should be /dev/sda or /dev/hda. If this command fails, you may need to let smartctl know what type of hard drive interface you’re using:
$ sudo smartctl -d TYPE -i /dev/sda
where TYPE is usually one of ata, scsi, or sat (SCSI-to-ATA Translation, which many SATA drives and some USB enclosures need). See the smartctl man page for more information. Note that if you need -d here, you will need to add it to all smartctl commands. The -i command should print information similar to:
=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint T133 series
Device Model:     SAMSUNG HD300LJ
Serial Number:    S0D7J1UL303628
Firmware Version: ZT100-12
User Capacity:    300,067,970,560 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Fri Jan 2 03:08:20 2009 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
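If you have several drives, a short loop can run this check on each of them. The sketch below makes a few assumptions. It expects your disks to be named /dev/sd? or /dev/hd?, so other naming schemes need a different glob. It expects to run as root. It treats a drive whose output contains “SMART support is: Enabled” as ready to use. The function names smart_enabled and survey_drives are made up for this example.

```shell
#!/bin/sh
# Sketch: report which whole disks have SMART enabled.
# Assumptions: disks are named /dev/sd? or /dev/hd?; run as root.
# Function names are hypothetical, chosen for this example.

# smart_enabled: reads `smartctl -i` output on stdin and succeeds only
# if it contains the "SMART support is: Enabled" line.
smart_enabled() {
  grep -q '^SMART support is:[[:space:]]*Enabled'
}

# survey_drives: runs `smartctl -i` against each matching block device.
survey_drives() {
  for dev in /dev/sd? /dev/hd?; do
    [ -b "$dev" ] || continue   # an unmatched glob stays literal; skip it
    if smartctl -i "$dev" | smart_enabled; then
      echo "$dev: SMART enabled"
    else
      echo "$dev: SMART not enabled or not reachable (try -d TYPE)"
    fi
  done
}
```

After sourcing the file in a root shell, run survey_drives to get one line per drive. If a drive is reported as not reachable, it probably needs an explicit -d TYPE.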
Now that smartctl can access the drive, let’s turn on some features. Run the following command:
$ sudo smartctl -s on -o on -S on /dev/sda
- -s on: This turns on S.M.A.R.T. support or does nothing if it’s already enabled.
- -o on: This turns on offline data collection. Offline data collection periodically updates certain S.M.A.R.T. attributes. Theoretically this could have a performance impact. However, from the smartctl man page:
Normally, the disk will suspend offline testing while disk accesses are taking place, and then automatically resume it when the disk would otherwise be idle, so in practice it has little effect.
- -S on: This enables “autosave of device vendor-specific Attributes”.
The command should return:
=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
SMART Attribute Autosave Enabled.
SMART Automatic Offline Testing Enabled every four hours.
Next, let’s check the overall health:
$ sudo smartctl -H /dev/sda
This command should return:
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
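In a script, it’s easier to check smartctl’s exit status than to parse this text. The exit status is a bit mask described in the EXIT STATUS section of the smartctl man page. For example, bit 3 (value 8) means the health check returned “DISK FAILING”. The minimal sketch below turns the number back into words. The helper name describe_exit_status is hypothetical.

```shell
#!/bin/sh
# Sketch: decode smartctl's exit status, a bit mask whose meanings are
# listed in the EXIT STATUS section of the smartctl man page.
# describe_exit_status is a hypothetical helper name.

describe_exit_status() {
  status=$1
  if [ "$status" -eq 0 ]; then
    echo "OK"
    return 0
  fi
  # print one line for every bit that is set
  [ $((status & 1))   -ne 0 ] && echo "command line did not parse"
  [ $((status & 2))   -ne 0 ] && echo "device open failed"
  [ $((status & 4))   -ne 0 ] && echo "a SMART or other ATA command failed"
  [ $((status & 8))   -ne 0 ] && echo "SMART status: DISK FAILING"
  [ $((status & 16))  -ne 0 ] && echo "prefail attributes at or below threshold"
  [ $((status & 32))  -ne 0 ] && echo "some attributes were at or below threshold in the past"
  [ $((status & 64))  -ne 0 ] && echo "device error log contains errors"
  [ $((status & 128)) -ne 0 ] && echo "self-test log contains errors"
  return 0
}
```

For example, after sourcing the function, the following command prints “OK” for a drive that passes:
$ sudo smartctl -q silent -H /dev/sda; describe_exit_status $?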
If the health check doesn’t report PASSED, back up all your data immediately; your hard drive is probably failing. Next, let’s make sure that the drive supports self-tests. I have yet to see a drive that doesn’t, but the following command also gives time estimates for each test:
$ sudo smartctl -c /dev/sda
I won’t list the complete output because it’s somewhat lengthy. Make sure “Self-test supported” appears in the “Offline data collection capabilities” section. Also, look for output similar to:
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 127) minutes.
These are rough estimates of how long the short and long self-tests will take, respectively. Let’s run the short test:
$ sudo smartctl -t short /dev/sda
On my drive, this test should take 2 minutes, but this obviously varies. You can run:
$ sudo smartctl -l selftest /dev/sda
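to check results. Keep running that command until the results show up. To see how far along a running test is, look at the “Self-test execution status” section of the smartctl -c output, which reports the percentage of the test remaining. A successful run will look like: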
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%            21472  -
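If you’d rather have a script check the log, note that entry # 1 is the most recent test. The sketch below assumes the log layout shown above. It succeeds only if that newest entry says “Completed without error”. The names latest_selftest and selftest_passed are hypothetical.

```shell
#!/bin/sh
# Sketch: check the newest entry of `smartctl -l selftest` output.
# Assumption: the log layout matches the example in this guide, where
# "# 1" is the most recent test. Function names are hypothetical.

# latest_selftest: prints the "# 1" line from the log on stdin
# (the pattern deliberately does not match "#10", "#11", ...)
latest_selftest() {
  grep -E '^#[[:space:]]*1[[:space:]]'
}

# selftest_passed: succeeds only if that newest entry completed without error
selftest_passed() {
  latest_selftest | grep -q 'Completed without error'
}
```

For example:
$ sudo smartctl -l selftest /dev/sda | selftest_passed && echo ok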
Now, do the same for the long self-test:
$ sudo smartctl -t long /dev/sda
The long test can take a significant amount of time. You might want to run it overnight and check the results in the morning. If either test fails, back up all your data immediately and read the last section of this guide.
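If you’d rather not poll by hand, you can script the wait. This sketch works in four steps:
1. It starts a test.
2. It sleeps for the drive’s own recommended polling time, read from smartctl -c.
3. It checks the “Self-test execution status” section of smartctl -c once a minute until no test is reported in progress.
4. It prints the self-test log.

The sketch assumes root privileges and the smartctl output wording shown in this guide. run_and_wait and its helpers are hypothetical names.

```shell
#!/bin/sh
# Sketch: start a SMART self-test and wait until the drive reports it done.
# Assumptions: run as root; smartctl -c uses the wording quoted in this
# guide. Function names are hypothetical.

# polling_minutes KIND: KIND is "Short" or "Extended". Reads `smartctl -c`
# output on stdin and prints the recommended polling time in minutes,
# whether the label and the time share one line or span two.
polling_minutes() {
  awk -v kind="$1" '
    index($0, kind " self-test routine") { want = 1 }
    want && /recommended polling time:/ {
      sub(/.*\(/, ""); sub(/\).*/, ""); gsub(/[ \t]/, "")
      print
      exit
    }
  '
}

# test_running: succeeds while `smartctl -c` output on stdin shows a test in progress
test_running() {
  grep -q 'Self-test routine in progress'
}

# run_and_wait DEV TYPE: TYPE is "short" or "long"
run_and_wait() {
  dev=$1
  ttype=$2
  kind=Short
  [ "$ttype" = long ] && kind=Extended
  minutes=$(smartctl -c "$dev" | polling_minutes "$kind")
  smartctl -t "$ttype" "$dev" > /dev/null || return 1  # give up if the test did not start
  sleep $(( ${minutes:-1} * 60 ))   # first wait about as long as the drive estimates
  while smartctl -c "$dev" | test_running; do
    sleep 60                        # then re-check once a minute
  done
  smartctl -l selftest "$dev"       # show the log, newest entry first
}
```

For example, if you saved the functions in a file called selftest.sh (a hypothetical name), run:
$ sudo sh -c '. ./selftest.sh; run_and_wait /dev/sda long'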
Configuring smartd
We’ve now enabled some features and run the basic tests. Instead of repeating the previous section daily, we can set up smartd to do it all automatically. If your system has an /etc/smartd.conf file, check for a line that begins with DEVICESCAN. If you find one, comment it out by adding a ‘#’ to the beginning of the line. DEVICESCAN doesn’t work on my system, and specifying a device file is easy. Add the following line to /etc/smartd.conf:
/dev/sda -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/share/smartmontools/smartd-runner
Here’s what each option does:
- /dev/sda: Replace this with the device file you’ve been using in smartctl commands.
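- -a: This turns on a set of common monitoring directives, including the overall health check (-H), attribute tracking (-f, -t), and error and self-test log monitoring (-l error, -l selftest). You almost certainly want to use it.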
- -d sat: On my system, smartctl correctly guesses that I have a serial ATA drive. smartd, on the other hand, does not. If you had to add a “-d TYPE” parameter to the smartctl commands, you’ll almost certainly have to do the same here. If you didn’t, try leaving it out initially. You can add it later if smartd fails to start.
- -o on, -S on: These have the same meaning as their smartctl equivalents.
- -s (S/../.././02|L/../../6/03): This schedules the short and long self-tests. In this example, the short self-test will run daily at 2:00 A.M. and the long test will run on Saturdays at 3:00 A.M. For more information, see the smartd.conf man page.
- -m root: If any errors occur, smartd will send email to root. On my system, mail for root is forwarded to my normal email account. If you don’t have a similar setup, replace root with your normal email address. This option also requires a working email setup. Depending on your distribution, outbound mail may not work out of the box, in which case you will need to install and configure a mail transfer agent such as postfix.
- -M exec /usr/share/smartmontools/smartd-runner: This last part may be specific to the Debian and Ubuntu smartmontools packages. Check if your system has /usr/share/smartmontools/smartd-runner. If it doesn’t, remove this option. Instead of sending email directly, “-M exec” makes smartd run a different command when errors occur. On Debian, smartd-runner will run each script in /etc/smartmontools/run.d/, one of which emails the user specified by the “-m” option.
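The -s expression is easier to read once you know what it is matched against. According to the smartd.conf man page, smartd builds a string of the form T/MM/DD/d/HH and runs a test when the expression matches it. The fields are the test type, the month, the day of the month, the day of the week (1 for Monday to 7 for Sunday), and the hour. The sketch below imitates that check with grep, so you can try out a schedule before using it. It is an approximation, not smartd’s own code: anchoring the expression to the whole string is this sketch’s assumption. schedule_matches and tests_due_now are hypothetical names.

```shell
#!/bin/sh
# Sketch: see which self-test a smartd "-s" expression selects for a given
# hour. smartd.conf(5) describes matching against T/MM/DD/d/HH (test type,
# month, day of month, weekday 1=Mon..7=Sun, hour). Anchoring the match to
# the whole string is an assumption of this sketch. Names are hypothetical.

SCHEDULE='(S/../.././02|L/../../6/03)'

# schedule_matches TYPE STAMP: TYPE is S or L; STAMP is MM/DD/d/HH
schedule_matches() {
  printf '%s/%s\n' "$1" "$2" | grep -Eq "^${SCHEDULE}\$"
}

# tests_due_now: prints the test types the schedule selects for the current hour
tests_due_now() {
  stamp=$(date +%m/%d/%u/%H)    # %u gives weekday 1 (Monday) .. 7 (Sunday)
  for t in S L; do
    schedule_matches "$t" "$stamp" && echo "$t"
  done
  return 0
}
```

For instance, schedule_matches L 01/03/6/03 succeeds because 6 is Saturday and 03 is the hour chosen for the long test.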
If you have more than one hard drive in your system, add a line for each one replacing /dev/sda with a different device file.
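With several drives, generating the lines may be easier than editing each one by hand. This minimal sketch reuses the options from the example line above. It assumes every drive needs the same -d TYPE (pass an empty string to leave it out) and that the Debian smartd-runner path exists. smartd_lines is a hypothetical name.

```shell
#!/bin/sh
# Sketch: print one smartd.conf line per drive using this guide's options.
# Assumptions: all drives share one -d TYPE (empty string omits it) and
# /usr/share/smartmontools/smartd-runner exists. The name is hypothetical.

# smartd_lines TYPE DEV...: writes config lines to stdout
smartd_lines() {
  dtype=$1
  shift
  for dev in "$@"; do
    line="$dev -a"
    [ -n "$dtype" ] && line="$line -d $dtype"
    line="$line -o on -S on -s (S/../.././02|L/../../6/03) -m root"
    line="$line -M exec /usr/share/smartmontools/smartd-runner"
    printf '%s\n' "$line"
  done
}
```

Review the output before appending it, for example:
$ smartd_lines sat /dev/sda /dev/sdb | sudo tee -a /etc/smartd.conf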
Update on 2009-01-06:
Thanks to commenter robert for pointing out an omission on my part. If your system has the file /etc/default/smartmontools, uncomment the “#start_smartd=yes” line by removing the “#”.
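On systems with GNU sed, the following command makes this change in one step:
$ sudo sed -i 's/^#start_smartd=yes/start_smartd=yes/' /etc/default/smartmontools

The sketch below does the same thing without relying on sed -i, which not every sed supports. enable_start_smartd is a hypothetical name, and the default path is the one mentioned above.

```shell
#!/bin/sh
# Sketch: uncomment start_smartd=yes in a Debian-style defaults file.
# The path comes from this guide; the function name is hypothetical.
# Uses a temporary copy instead of the non-portable "sed -i".

enable_start_smartd() {
  file=${1:-/etc/default/smartmontools}
  tmp=$(mktemp) || return 1
  sed 's/^#[[:space:]]*start_smartd=yes/start_smartd=yes/' "$file" > "$tmp" &&
    cat "$tmp" > "$file"           # overwrite in place, keeping owner and permissions
  status=$?
  rm -f "$tmp"
  return $status
}
```

Call it as root, optionally passing a different file path as its only argument.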
Finally, restart smartd:
$ sudo /etc/init.d/smartmontools restart
If this command fails, the end of /var/log/daemon.log should have some diagnostic information. If smartd started fine, we should still test that email notifications are working. Add “-M test” to the end of the configuration line in /etc/smartd.conf. This will make smartd send out a test notification when it’s next started. Once again, restart smartd:
$ sudo /etc/init.d/smartmontools restart
You should receive an email similar to:
This email was generated by the smartd daemon running on:
   host name: polar
  DNS domain: shadypixel.com
  NIS domain: (none)
The following warning/error was logged by the smartd daemon:
TEST EMAIL from smartd for device: /dev/sda
For details see host's SYSLOG (default: /var/log/syslog).
Afterward, you can delete “-M test”.
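If you expect to repeat the email test, for example after changing your mail setup, you can script both edits to /etc/smartd.conf. This sketch assumes each device’s configuration sits on a single line that starts with the device file followed by a space, with no backslash continuations. add_mail_test, remove_mail_test, and edit_conf are hypothetical names.

```shell
#!/bin/sh
# Sketch: add or remove "-M test" on the smartd.conf line for one device.
# Assumptions: the device's line starts with its device file and a space,
# and is not split with "\" continuations. Names are hypothetical.

# edit_conf FILE SED_SCRIPT: apply a sed script to FILE via a temp copy
edit_conf() {
  tmp=$(mktemp) || return 1
  sed "$2" "$1" > "$tmp" && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return $status
}

# add_mail_test FILE DEV: append " -M test" to DEV's line
add_mail_test() {
  edit_conf "$1" "\\|^$2 |s|\$| -M test|"
}

# remove_mail_test FILE DEV: drop a trailing " -M test" from DEV's line
remove_mail_test() {
  edit_conf "$1" "\\|^$2 |s| -M test\$||"
}
```

Run add_mail_test /etc/smartd.conf /dev/sda as root and restart smartd. Once the test email arrives, run remove_mail_test with the same arguments.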
What To Do If smartd Detects Problems
First, back up everything immediately. Depending on the error, your drive might be close to death or it may still have a long life ahead. Consult the smartmontools FAQ, which has recommendations for specific errors. Otherwise, ask for help on the smartmontools-support mailing list.
Hey, nice intro. Small addition: on my Xubuntu intrepid and jaunty (alpha) installation, I had to uncomment the line ‘#start_smartd=yes’ in the file /etc/default/smartmontools.
Cheers
Thanks robert, I updated the article.
No problem, I expected smartd to start as well after running ‘sudo /etc/init.d/smartmontools restart’ :) Once again, nice article.
Hi btmorex, nice howto.
I configured my smartd.conf like this:
/dev/sdb -I 194 -a -o on -S on -s (S/../.././03|L/../../6/04) \
-m sys@base.com \
-M exec /usr/share/smartmontools/smartd-runner
Also, by adding “-M test”, I tested email notifications and received test email message.
As you see, my HDD is tested each morning, but I didn’t receive any email notification about the test results.
Presumably notifications are only sent when something goes wrong; am I right on this point?
Right now my drive reports OK status with the “smartctl -H” command.
Thanks a lot again.
Agip,
It sounds like you’ve set it up correctly. You’re right that smartd will only email you if there is a problem. If you want to look at the test results, you can do:
smartctl -l selftest /dev/sdb
I appreciate this effort very much. Many thanks
You can view the progress of a self test by running smartctl -Hc /dev/XXX. It will be across from the “Self Test Execution Status”. Should look something like this:
Self-test routine in progress…
70% of test remaining.
Hi btmorex
Nice work.
I had to remove the first line in the /etc/smartd.conf:
# *SMARTD*AUTOGENERATED* /etc/smartd.conf
without doing this, all changes are lost after restarting the daemon.
Cheers
Nice work! using smart to do checks before its too late.
For Xubuntu users, I’ve made a little script that will work together with smartd to pop up a notification in case of any hard disk trouble. See http://ubuntuforums.org/showthread.php?t=1031244
I think it can be easily adapted to Ubuntu though.
@Agip: a mail server needs to be installed (and probably configured) if you’d like e-mail notifications. You either try my script (:)), or use the smart-notifier package if available for your distribution.
Cheers
Great tutorial!
Just one question: what is a reasonable schedule for the short, long and offline tests?
Short: every day?
Long: Once per week?
Offline: ???
Great tut and really useful! Really easy to understand. Thanks!
Great tutorial, I’ve been using smartctl for years, but never got around to setting up smartd. Thanks for making it easy to setup.
Great info!
One question, will SMART tool function correctly on an un-formatted drive?
Say I found an old drive that is raw, can I run SMART on it?
Thanks,
Alex
Yes, it will work fine. The SMART tests are completely independent of what data is stored on the drive.
here didn’t work :(
takes forever with no result.
no error message anyway.
Are you talking about running a self test? You need to check results too. Run
smartctl -l selftest <device>
hi Mark,
i’m still not 100% sure about this but from my initial testing of smartmontools it appears smartctl needs at least one disk partition to be mounted otherwise it just stays at the “90% completed” point for some time until smartctl eventually gives up and kills the test with a message like this:
“# 7 Short offline Interrupted (host reset) 90% 2455 -”
if you have a partition on your disks, try mounting it before the test.
if not, maybe it is possible to mount the disk it’s self as a raw device? (i don’t know. haven’t tested it yet.)
dunk.
That’s odd that it works when you mount something. I know that having a partition mounted is not a requirement though, as I run tests daily against drives that are almost never mounted.
Any tips on how to run these commands on a hard drive connected via USB?
USB support for SMART commands isn’t great, and it depends on the specific chipset that your USB enclosure uses. See http://smartmontools.wiki.sourceforge.net/USB and http://smartmontools.wiki.sourceforge.net/overview_USB-Support
If you’re lucky, something like:
# smartctl -i -d sat,12 <device>
will work. If you’re unlucky, nothing will work.
Thanks so much, that worked like a charm with my USB enclosure.
Thank you. Clearly written and informative. Other explanations always left me a bit dazed and confused.
Many thanks,
Clear, concise and just what I needed
Cheers
K
;)
Just thought to add….
http://tazbuntu.blogspot.com/2008/12/check-your-hard-drive-smart-status.html
It adds GsmartControl, what appears to be a useful gui. I liked the fact that you were able to read the results of the test logs described above.
Cheers
K
;)
I actually have a half-written post about gsmartcontrol :)
It’s a nice program, although I prefer the set-it-up-and-forget-about-it nature of smartd.
Thank you, setting up smartd went fine, however I cannot persuade the system to send mail which somehow makes the whole thing useless.
I use postfix on Ubuntu server 8.04. I can send mail from the command line; I installed logwatch which can mail as well, but when smartd tries to send out mail, it always fails with the following error message:
“Test of mail to root produced unexpected output (90 bytes) to STDOUT/STDERR: send-mail: invalid option -- i Can’t send mail: sendmail process failed with error code 1”
I spent a lot of time on Google, configuration files, forwarding etc., but the result is always the same.
Does anybody have an idea of what might go wrong? Thank you.
Zdenek
I haven’t been able to get it to send a mail either, but logwatch manages just fine.
Wondering if I have the same problem you do, however I haven’t been able to locate any kind of error message in the logs, where exactly do you find yours?
Thanks for the guide, very useful. However, I have a problem: my second hard disk has some unreadable sectors, and every time I boot up the PC a new mail is sent.
Is there a way to get smartd to send mail only when a new short/long test is performed? I just want to monitor the situation, not receive the same mail every time i boot the PC….
Thanks for any help.
Resolved: I added “-U 0 -C 0” to disable reporting of unreadable sectors.
Thanks for the guide; worked perfectly and easy to follow. The hardest part was setting up postfix!
Thoughts on usefulness of doing more than regular short and long tests? For example from the sample config file:
# Monitor all attributes except normalized Temperature (usually 194),
# but track Temperature changes >= 4 Celsius, report Temperatures
# >= 45 Celsius and changes in Raw value of Reallocated_Sector_Ct (5).
# Send mail on SMART failures or when Temperature is >= 55 Celsius.
#/dev/hdc -a -I 194 -W 4,45,55 -R 5 -m admin@example.com
Best
Charles
Thanks!! Your tutorial is very useful!
Ok, I’m running the long test, I’ve figured out how to see the progress it’s making as it goes along, but I can’t figure out how to view the results. I have 4 drives testing simultaneously – so I want to view the results separately, and maybe several times.
THANK YOU,
David
I found it –
smartctl -Hc /dev/sdx
Shows the progress and the code with interpretation if the test has completed.
Thanks for a great tutorial!
Great tutorial, thank you very much!
Hi!
Is there any chance that one can estimate the remaining lifetime of the hard drive based on some of the S.M.A.R.T. attributes? (A very rough estimation is perfect for what i am doing.) I know that SMART data is correct but you cannot rely on it to catch a failure; however, if there is such a formula to roughly estimate the remaining lifetime, i will be very grateful.
10x in advance.
No, you can’t really make an estimate like that. Actually, Google did their S.M.A.R.T. study (http://labs.google.com/papers/disk_failures.pdf) to find out exactly what you’re asking. The conclusion they reached is that even though some values have predictive value, they are nowhere near good enough to actually preemptively replace hard drives (which is very similar to estimating remaining life).
Hey!
10x for your reply. However, the study says that if you combine all parameters only 36 % of all failed drives were unable to predict or have zero values, so actually this is quite good for me. What is more, even if I take only the 4 important parameters into account I will be successful in 44% of the cases. Combining this with the age of the hard drive will be enough for me… So are you aware of a formula or combination of these parameters in a way that I can estimate the health (or the remaining life time) of a hdd.
Thanks in advance…
To answer your question right away: I don’t have any formula.
I want to add though that I think what you’ll find is that you’ll be able to split drives into two groups: one group will have no predictive S.M.A.R.T. values, and one group will have one or more values that indicate imminent failure. There’s no doubt that that’s valuable, but I don’t think you’ll be able to estimate remaining life with any accuracy for most of the drives.
cool!
thanks :-)
Pol
Here’s an oddity. I’m testing smartmontools vs. cciss_vol_status on an HP with external array, and getting some inconsistent results.
I know that one drive has failed.
I know that another drive is in jeopardy (which is why I’m testing on the box I’m testing on).
Running smartctl, I see in my health report that the second drive is in danger, but it makes no mention of the failed drive.
Then, running cciss_vol_status, I see that the first drive has failed, but no mention is made of the second.
I’ll post this in the cciss_vol_status forum as well, but I find it interesting that the two utilities show such different results!
What is cciss_vol_status actually checking? One possibility is that the drive is completely dead. There would be no S.M.A.R.T. status, but cciss_vol_status would know that there was supposed to be a drive there so it could determine it was dead.
As for the one that’s failing, probably cciss_vol_status isn’t checking S.M.A.R.T. status (I have no idea because I haven’t used it).
Hi, check out:
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1404978
You can pass “sudo smartctl -l selftest /dev/sda” as an argument to watch in order to follow its progress.