S.M.A.R.T. is a system in modern hard drives designed to report conditions that may indicate impending failure. smartmontools is a free software package that can monitor S.M.A.R.T. attributes and run hard drive self-tests. Although smartmontools runs on a number of platforms, I will only cover installing and configuring it on Linux.
Why Use S.M.A.R.T.?
Basically, S.M.A.R.T. may give you enough of a warning that you can safely back up all your data before your hard drive dies. There is some conflicting information on the internet about how reliable the warnings are. The best research I found is a paper from Google that describes an internal study of hard drive failure. A quick summary: certain events, including reallocation events and failed self-tests, greatly increase the chance of hard drive failure, but only about 60% of the drives that failed in the study had any negative S.M.A.R.T. attributes. Obviously, nothing replaces regular backups.
A good source for more information is the S.M.A.R.T. Wikipedia page.
Installation
On Debian or Ubuntu systems:
$ sudo apt-get install smartmontools
On Fedora:
$ sudo yum install smartmontools
Capabilities and Initial Tests
smartmontools comes with two programs: smartctl which is meant for interactive use and smartd which continuously monitors S.M.A.R.T. Let’s look at smartctl first:
$ sudo smartctl -i /dev/sda
Replace /dev/sda with your hard drive’s device file in this command and all subsequent commands. If there’s only one hard drive in the system, it should be /dev/sda or /dev/hda. If this command fails, you may need to let smartctl know what type of hard drive interface you’re using:
$ sudo smartctl -d TYPE -i /dev/sda
where TYPE is usually one of ata, scsi, or sat (SCSI to ATA Translation, used for ATA and SATA disks that the kernel presents through the SCSI layer). See the smartctl man page for more information. Note that if you need -d here, you will need to add it to all smartctl commands. This should print information similar to:
=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint T133 series
Device Model:     SAMSUNG HD300LJ
Serial Number:    S0D7J1UL303628
Firmware Version: ZT100-12
User Capacity:    300,067,970,560 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Fri Jan 2 03:08:20 2009 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Now that smartctl can access the drive, let’s turn on some features. Run the following command:
$ sudo smartctl -s on -o on -S on /dev/sda
- -s on: This turns on S.M.A.R.T. support or does nothing if it’s already enabled.
- -o on: This turns on offline data collection. Offline data collection periodically updates certain S.M.A.R.T. attributes. Theoretically this could have a performance impact. However, from the smartctl man page:
Normally, the disk will suspend offline testing while disk accesses are taking place, and then automatically resume it when the disk would otherwise be idle, so in practice it has little effect.
- -S on: This enables “autosave of device vendor-specific Attributes”.
The command should return:
=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
SMART Attribute Autosave Enabled.
SMART Automatic Offline Testing Enabled every four hours.
Next, let’s check the overall health:
$ sudo smartctl -H /dev/sda
This command should return:
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
If it doesn’t return PASSED, you should immediately back up all your data. Your hard drive is probably failing. Next, let’s make sure that the drive supports self-tests. I have yet to see a drive that doesn’t, but the following command also gives time estimates for each test:
$ sudo smartctl -c /dev/sda
I won’t list the complete output because it’s somewhat lengthy. Make sure “Self-test supported” appears in the “Offline data collection capabilities” section. Also, look for output similar to:
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 127) minutes.
These are rough estimates of how long the short and long self-tests will take, respectively. Let’s run the short test:
$ sudo smartctl -t short /dev/sda
On my drive, this test should take 2 minutes, but this obviously varies. You can run:
$ sudo smartctl -l selftest /dev/sda
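to check results. The self-test log only lists finished tests, so keep running that command until the results show up. To see progress in the meantime, run smartctl -c /dev/sda and look at the “Self-test execution status” entry, which reports the percentage of the test remaining. A successful run will look like: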
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        21472            -
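Rather than re-running the command by hand, a small loop can wait for the test to finish. This is only a sketch: it assumes that smartctl -c prints the text "Self-test routine in progress" while a test is running (the wording may vary by drive and smartmontools version), and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: wait for a running self-test to finish, then print the self-test log.
# selftest_in_progress and wait_for_selftest are hypothetical helper names.

# Succeeds (exit 0) if the `smartctl -c` output on stdin says a test is running.
# Assumption: a running test is reported as "Self-test routine in progress".
selftest_in_progress() {
    grep -q "Self-test routine in progress"
}

# Polls DEVICE every INTERVAL seconds (default 60), then prints the log.
wait_for_selftest() {
    device=$1
    interval=${2:-60}
    while smartctl -c "$device" | selftest_in_progress; do
        sleep "$interval"
    done
    smartctl -l selftest "$device"
}

# Usage (as root, after starting a test with smartctl -t short /dev/sda):
#   wait_for_selftest /dev/sda 30
```

If you needed a -d TYPE option earlier, add it to both smartctl calls inside wait_for_selftest.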
Now, do the same for the long self-test:
$ sudo smartctl -t long /dev/sda
The long test can take a significant amount of time. You might want to run it overnight and check for the results in the morning. If either test fails, you should immediately back up all your data and read the last section of this guide.
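If you want to run these checks from a script instead of reading the output, smartctl's exit status can help. The sketch below decodes it as a bitmask, with the bit meanings paraphrased from the return values section of the smartctl man page; describe_smartctl_status is a hypothetical helper name, and you should check the man page of your installed version before relying on it.

```shell
#!/bin/sh
# Sketch: decode the bitmask that smartctl returns as its exit status.
# Bit meanings are paraphrased from the smartctl man page (return values section).
# describe_smartctl_status is a hypothetical helper name.
describe_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    [ $(( $status & 1 )) -ne 0 ]   && echo "command line did not parse"
    [ $(( $status & 2 )) -ne 0 ]   && echo "device open failed"
    [ $(( $status & 4 )) -ne 0 ]   && echo "a SMART command to the disk failed"
    [ $(( $status & 8 )) -ne 0 ]   && echo "SMART status: DISK FAILING"
    [ $(( $status & 16 )) -ne 0 ]  && echo "prefail attributes at or below threshold"
    [ $(( $status & 32 )) -ne 0 ]  && echo "attributes were at or below threshold in the past"
    [ $(( $status & 64 )) -ne 0 ]  && echo "device error log contains errors"
    [ $(( $status & 128 )) -ne 0 ] && echo "self-test log contains errors"
    return 0
}

# Usage (as root):
#   smartctl -H -l selftest -l error /dev/sda > /dev/null
#   describe_smartctl_status $?
```

Several bits can be set at once, so the function prints one line per problem rather than stopping at the first.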
Configuring smartd
We’ve now enabled some features and run the basic tests. Instead of repeating the previous section daily, we can set up smartd to do it all automatically. If your system has an /etc/smartd.conf file, check for a line that begins with DEVICESCAN. If you find one, comment it out by adding a ‘#’ to the beginning of the line. DEVICESCAN doesn’t work on my system, and specifying a device file is easy. Add the following line to /etc/smartd.conf:
/dev/sda -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/share/smartmontools/smartd-runner
Here’s what each option does:
- /dev/sda: Replace this with the device file you’ve been using in smartctl commands.
- -a: This enables some common options. You almost certainly want to use it.
- -d sat: On my system, smartctl correctly guesses that I have a serial ata drive. smartd on the other hand does not. If you had to add a “-d TYPE” parameter to the smartctl commands, you’ll almost certainly have to do the same here. If you didn’t, try leaving it out initially. You can add it later if smartd fails to start.
- -o on, -S on: These have the same meaning as the smartctl equivalents.
- -s (S/../.././02|L/../../6/03): This schedules the short and long self-tests. Each pattern has the form T/MM/DD/d/HH (test type, month, day of month, day of week with Monday as 1, and hour), and a dot matches any character. In this example, the short self-test will run daily at 2:00 A.M. The long test will run on Saturdays at 3:00 A.M. For more information, see the smartd.conf man page.
- -m root: If any errors occur, smartd will send email to root. On my system, mail for root is forwarded to my normal email account. If you don’t have a similar setup, replace root with your normal email address. This option also requires a working email setup. Most Linux distributions automatically have working outbound email.
- -M exec /usr/share/smartmontools/smartd-runner: This last part may be specific to the Debian and Ubuntu smartmontools packages. Check if your system has /usr/share/smartmontools/smartd-runner. If it doesn’t, remove this option. Instead of sending email directly, “-M exec” makes smartd run a different command when errors occur. On Debian, smartd-runner will run each script in /etc/smartmontools/run.d/, one of which emails the user specified by the “-m” option.
If you have more than one hard drive in your system, add a line for each one replacing /dev/sda with a different device file.
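To see how the -s pattern in the configuration line behaves, here is a minimal sketch. It assumes, per the smartd.conf man page, that smartd compares the pattern against a 12-character string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with Monday as 1, hour) and that the whole string has to match. schedule_matches is a hypothetical helper, not part of smartmontools.

```shell
#!/bin/sh
# Sketch: check which self-tests a smartd "-s" schedule would start at a given time.
# schedule_matches REGEX TYPE MM DD DOW HH
#   TYPE is S (short) or L (long); DOW is 1 (Monday) through 7 (Sunday).
schedule_matches() {
    # -E: extended regex, -x: the whole line must match, -q: no output
    printf '%s/%s/%s/%s/%s\n' "$2" "$3" "$4" "$5" "$6" | grep -Eqx "$1"
}

SCHEDULE='(S/../.././02|L/../../6/03)'

# Example: Saturday (day 6), January 3, at 03:00
if schedule_matches "$SCHEDULE" L 01 03 6 03; then
    echo "long test would start"
fi
```

Trying a few dates and hours this way is a quick sanity check before you restart smartd with a new schedule.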
Update on 2009-01-06:
Thanks to commenter robert for pointing out an omission on my part. If your system has the file /etc/default/smartmontools, uncomment the “#start_smartd=yes” line by removing the “#”.
Finally, restart smartd:
$ sudo /etc/init.d/smartmontools restart
If this command fails, the end of /var/log/daemon.log should have some diagnostic information. If smartd started fine, we should still test that email notifications are working. Add “-M test” to the end of the configuration line in /etc/smartd.conf. This will make smartd send out a test notification when it’s next started. Once again, restart smartd:
$ sudo /etc/init.d/smartmontools restart
You should receive an email similar to:
This email was generated by the smartd daemon running on:
host name: polar
DNS domain: shadypixel.com
NIS domain: (none)
The following warning/error was logged by the smartd daemon:
TEST EMAIL from smartd for device: /dev/sda
For details see host's SYSLOG (default: /var/log/syslog).
Afterward, you can delete “-M test”.
What To Do If smartd Detects Problems
First, immediately back up everything. Depending on the error, your drive might be close to death or it may still have a long life ahead. Consult the smartmontools FAQ. It has some recommendations for specific errors. Otherwise, ask for help on the smartmontools-support mailing list.
78 replies on “Monitoring Hard Drive Health on Linux with smartmontools”
Hey, nice intro. Small addition: on my Xubuntu intrepid and jaunty (alpha) installation, I had to uncomment the line ‘#start_smartd=yes’ in the file /etc/default/smartmontools.
Cheers
Thanks robert, I updated the article.
No problem, I expected smartd to start as well after running ‘sudo /etc/init.d/smartmontools restart’ :) Once again, nice article.
Hi btmorex, nice howto.
I configured my smartd.conf like this:
/dev/sdb -I 194 -a -o on -S on -s (S/../.././03|L/../../6/04) \
-m sys@base.com \
-M exec /usr/share/smartmontools/smartd-runner
Also, by adding “-M test”, I tested email notifications and received test email message.
As you see, each morning my HDD is tested, but I didn’t receive any email notification about the test results.
Probably, notifications are sent only when something goes wrong, am I right on this point?
Right now my drive reports OK status with the “smartctl -H” command.
Thanks a lot again.
Agip,
It sounds like you’ve set it up correctly. You’re right that smartd will only email you if there is a problem. If you want to look at the test results, you can do:
smartctl -l selftest /dev/sdb
You can view the progress of a self test by running smartctl -Hc /dev/XXX. It will be across from the “Self Test Execution Status”. Should look something like this:
Self-test routine in progress…
70% of test remaining.
Hi btmorex
Nice work.
I had to remove the first line in the /etc/smartd.conf:
# *SMARTD*AUTOGENERATED* /etc/smartd.conf
Without doing this, all changes are lost after restarting the daemon.
Cheers
For Xubuntu users, I’ve made a little script that will work together with smartd to pop up a notification in case of any hard disk trouble. See http://ubuntuforums.org/showthread.php?t=1031244
I think it can be easily adapted to Ubuntu though.
@Agip: a mail server needs to be installed (and probably configured) if you’d like e-mail notifications. You either try my script (:)), or use the smart-notifier package if available for your distribution.
Cheers
Great tutorial!
Just one question: what is a reasonable schedule for the short, long and offline tests?
Short: every day?
Long: Once per week?
Offline: ???
Offline data collection is just on/off, so turn it on. As for schedule, the config above does a short test every day and a long test once a week.
Great tut and really useful! Really easy to understand. Thanks!
Great tutorial, I’ve been using smartctl for years, but never got around to setting up smartd. Thanks for making it easy to setup.
Great info!
One question, will SMART tool function correctly on an un-formatted drive?
Say I found an old drive that is raw, can I run SMART on it?
Thanks,
Alex
Yes, it will work fine. The SMART tests are completely independent of what data is stored on the drive.
here didn’t work :(
takes forever with no result.
no error message anyway.
Are you talking about running a self test? You need to check results too. Run
smartctl -l selftest
hi Mark,
i’m still not 100% sure about this but from my initial testing of smartmontools it appears smartctl needs at least one disk partition to be mounted otherwise it just stays at the “90% completed” point for some time until smartctl eventually gives up and kills the test with a message like this:
“# 7 Short offline Interrupted (host reset) 90% 2455 -”
if you have a partition on your disks, try mounting it before the test.
if not, maybe it is possible to mount the disk it’s self as a raw device? (i don’t know. haven’t tested it yet.)
dunk.
That’s odd that it works when you mount something. I know that having a partition mounted is not a requirement though, as I run tests daily against drives that are almost never mounted.
Any tips on how to run these commands on a hard drive connected via USB?
USB support for SMART commands isn’t great and it depends on the specific chipset that your USB enclosure uses. See http://smartmontools.wiki.sourceforge.net/USB and http://smartmontools.wiki.sourceforge.net/overview_USB-Support
If you’re lucky, something like:
# smartctl -i -d sat,12
will work. If you’re unlucky, nothing will work.
Thanks so much, that worked like a charm with my USB enclosure.
Thank you. Clearly written and informative. Other explanations always left me a bit dazed and confused.
Many thanks,
Clear, concise and just what I needed
Cheers
K
;)
Just thought to add….
http://tazbuntu.blogspot.com/2008/12/check-your-hard-drive-smart-status.html
It adds GsmartControl, what appears to be a useful gui. I liked the fact that you were able to read the results of the test logs described above.
Cheers
K
;)
I actually have a half-written post about gsmartcontrol :)
It’s a nice program although I prefer the set-it-up-and-forget-about-it nature of smartd.
Thank you, setting up smartd went fine, however I cannot persuade the system to send mail which somehow makes the whole thing useless.
I use postfix on Ubuntu server 8.04. I can send mail from the command line; I installed logwatch which can mail as well, but when smartd tries to send out mail, it always fails with the following error message:
“Test of mail to root produced unexpected output (90 bytes) to STDOUT/STDERR: send-mail: invalid option -- i Can’t send mail: sendmail process failed with error code 1”
I spent a lot of time on Google, configuration files, forwarding etc., but the result is always the same.
Does anybody have an idea of what might go wrong? Thank you.
Zdenek
I haven’t been able to get it to send a mail either, but logwatch manages just fine.
Wondering if I have the same problem you do, however I haven’t been able to locate any kind of error message in the logs, where exactly do you find yours?
Try a different mail package, eg mailutils
Thanks for the guide, very useful. However I got a problem: my second hard disk has some unreadable sector, every time I boot up the PC a new mail is sent.
Is there a way to get smartd to send mail only when a new short/long test is performed? I just want to monitor the situation, not receive the same mail every time i boot the PC….
Thanks for any help.
Resolved: I added “-U 0 -C 0” to disable reporting of unreadable sectors.
You might be better off replacing the drive…
Thanks for the guide; worked perfectly and easy to follow. The hardest part was setting up postfix!
Thoughts on usefulness of doing more than regular short and long tests? For example from the sample config file:
# Monitor all attributes except normalized Temperature (usually 194),
# but track Temperature changes >= 4 Celsius, report Temperatures
# >= 45 Celsius and changes in Raw value of Reallocated_Sector_Ct (5).
# Send mail on SMART failures or when Temperature is >= 55 Celsius.
#/dev/hdc -a -I 194 -W 4,45,55 -R 5 -m admin@example.com
Best
Charles
Thanks!! Your tutorial is very useful!
Ok, I’m running the long test, I’ve figured out how to see the progress it’s making as it goes along, but I can’t figure out how to view the results. I have 4 drives testing simultaneously – so I want to view the results separately, and maybe several times.
THANK YOU,
David
I found it –
smartctl -Hc /dev/sdx
Shows the progress and the code with interpretation if the test has completed.
Thanks for a great tutorial!
Great tutorial, thank you very much!
Hi!
Is there any chance that one can estimate the remaining lifetime of a hard drive based on some of the S.M.A.R.T. attributes? (A very rough estimate is perfect for what I am doing.) I know that SMART data is correct but that you cannot rely on it to catch a failure. However, if there is a formula to roughly estimate the remaining lifetime, I will be very grateful.
10x in advance.
No, you can’t really make an estimate like that. Actually, Google did their S.M.A.R.T. study to find out exactly what you’re asking. The conclusion they reached is that even though some values have predictive value, they are nowhere near good enough to actually preemptively replace hard drives (which is very similar to estimating remaining life).
Hey!
10x for your reply. However, the study says that if you combine all parameters, only 36% of the failed drives showed no signal at all (zero counts on every parameter), so this is actually quite good for me. What is more, even if I take only the 4 important parameters into account, I will be successful in 44% of the cases. Combining this with the age of the hard drive will be enough for me… So are you aware of a formula or combination of these parameters that would let me estimate the health (or the remaining lifetime) of an HDD?
Thanks in advance…
To answer your question right away: I don’t have any formula.
I want to add though that I think what you’ll find is that you’ll be able to split drives into two groups: one group will have no predictive S.M.A.R.T. values, and one group will have one or more values that indicate imminent failure. There’s no doubt that that’s valuable, but I don’t think you’ll be able to estimate remaining life with any accuracy for most of the drives.
cool!
thanks :-)
Pol
Here’s an oddity. I’m testing smartmontools vs. cciss_vol_status on an HP with external array, and getting some inconsistent results.
I know that one drive has failed.
I know that another drive is in jeopardy (which is why I’m testing on the box I’m testing on).
Running smartctl, I see in my health report that the second drive is in danger, but it makes no mention of the failed drive.
Then, running cciss_vol_status, I see that the first drive has failed, but no mention is made of the second.
I’ll post this in the cciss_vol_status forum as well, but I find it interesting that the two utilities show such different results!
What is cciss_vol_status actually checking? One possibility is that the drive is completely dead. There would be no S.M.A.R.T. status, but cciss_vol_status would know that there was supposed to be a drive there so it could determine it was dead.
As for the one that’s failing, probably cciss_vol_status isn’t checking S.M.A.R.T. status (I have no idea because I haven’t used it).
Hi, check out:
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1404978
You can pass sudo smartctl -l selftest /dev/sda as argument to watch in order to follow its progress.
I have a huge problem with my disks: they are being killed by bad IO synchronisation. I ran iotop and I see all processes at 99.90% in the IO column. What can I do to make my disks stable again and regain synchronisation?
I run game servers on my dedicated server, and all my customers have lag and can’t play. What should I do? Please help :?
Get a better server?
Great tut there.
There seems to be some interest in an automatic periodic “all-is-well” email notification perhaps containing the info from the last health checks. Since smartmon normally only sends an email when there is a problem, can we add a line to smartd.conf that forces an “all-is-well” email to be sent, say, monthly? Or would we have to cron a scratch built script which uses smartctl to do that?
Does anybody know how to do this with smartd? Or have a “cronable” script for smartctl? Or perhaps a how-to for getting some monitoring program to provide this feature?
There’s no way to get that with standard smartd, so it would have to be a script that parsed the output of smartctl.
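A minimal sketch of what such a script could look like, under some assumptions: it parses the "SMART overall-health self-assessment test result" line shown earlier in the article (other device types may word this line differently), health_result and smart_report are hypothetical helper names, and the mail command in the usage comment assumes a working local mail setup.

```shell
#!/bin/sh
# Sketch of a cron-able "all is well" report built on smartctl -H.
# health_result and smart_report are hypothetical helper names.

# Extracts the verdict (e.g. PASSED) from `smartctl -H` output on stdin.
health_result() {
    sed -n 's/^SMART overall-health self-assessment test result: //p'
}

# Prints one summary line per device given as an argument.
smart_report() {
    for device in "$@"; do
        result=$(smartctl -H "$device" | health_result)
        echo "$device: ${result:-no health result returned}"
    done
}

# Usage (as root, for example from a monthly cron job):
#   smart_report /dev/sda /dev/sdb | mail -s "SMART report" root
```

This only reports the overall health verdict; smartd should stay in place for the actual failure warnings.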
smartctl has a “-s on” option to enable S.M.A.R.T. support on the hard disk. For some new hard disks, it has to be set at the beginning. However, since a new hard disk doesn’t contain any SMART information yet, the SMART health check shows failed. But after a day, the result changes to “passed”. I am wondering how to reset the values of the old-age attributes.
Thanks for the awesome tutorial! I was hoping you could help me with emailing to 2 separate email accounts. The current line I have in /etc/smartd.conf is DEVICESCAN -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m email1@gmail.com -M exec /usr/share/smartmontools/smartd-runner which works without a problem. I have tried adding email1@gmail.com,email2@gmail.com but that doesn’t work. What is the best way to accomplish sending results to 2 emails. Thanks for the help.
I believe you can add multiple -m options to the same line, but I haven’t tried.
Thanks for your hard work. If anyone has suggestions for the frequency of the various tests, I would be very interested. Obviously, continually running the long test over and over would be enough to wear out the drive, but some advice on which tests to run and how often would be appreciated.
I run the short test daily and long test weekly.
On a Debian system (wheezy/sid) I needed to install bsd-mailx to get smartd to send emails via sendmail:
apt-get install bsd-mailx
Thanks for the guide!
You can also get the mail program from mailutils
Excellent post. Thanks for your time in putting this together. One of the better walkthroughs of smartmontools configuration out there.
Hello, nice tutorial, but I have hardware RAID and need to use this to view data. How can I use the runner to execute this periodically?
smartctl -c -a /dev/cciss/c0d0p1 -d cciss,0
If you mean configuring smartd, you can just add those options to the smartd.conf line. The smartd-runner program just executes scripts in /etc/smartmontools/run.d on failures.
Very informative and precise tutorial with all the correct commands and screen shots
just got round to testing my HD as it’s making a buzzing noise that is worse under Linux
preliminary results are promising i will run the extended test and see what it spits out.
Thanks dude.
Excellent tutorial! Thanks!
Thanks for the tutorial, found it helpful. Below is a short script used on an ubuntu 12.04 system run via cron once a week to email a summary of disk information and self tests completed for each disk found at boot time. The formatting of the summary output can be changed to add or remove info as needed. It assumes you have the system configured to send mail, and will email the output to the root user.
#!/bin/bash
#
# script created to provide the general disk information and smartmon test completion status for
# all disk devices found at boot time by OS
# 10/12/2013 jmm
#
export PATH=/usr/bin:/bin:/usr/sbin
export Smart_Out=/tmp/smart.out
export Device_file=/tmp/devs
export HoSt=`hostname`
export emailsubj="$(hostname) - SMART self-test summary for $(date "+%A %B %d %Y")"
export SendTo=root
#
# get the devices seen at OS boot time
#
ls /dev/sd? > $Device_file
#
# for each device found in /dev get the general drive info and SMART self test status
# send both to a temp file and do simple formatting
#
# NOTE: the awk filter in the posted script was cut off ('NR>=4&&NR...'), so
# 'smartctl -i' (drive information section) is used in its place as an assumption.
while IFS= read -r line
do
if [ "$line" = "/dev/sda" ]; then
echo -e "The SMART status for Hard disk $line is: \n\n" > /tmp/smart.out
smartctl -i "$line" >> /tmp/smart.out
smartctl -l selftest "$line" >> /tmp/smart.out
echo -e "=== END OF READ SMART DATA SECTION === \n\n" >> /tmp/smart.out
else
echo -e "The SMART status for Hard disk $line is: \n\n" >> /tmp/smart.out
smartctl -i "$line" >> /tmp/smart.out
smartctl -l selftest "$line" >> /tmp/smart.out
echo -e "=== END OF READ SMART DATA SECTION === \n\n" >> /tmp/smart.out
fi
done < "$Device_file"
#
# send output to the appropriate user
#
cat $Smart_Out | mailx -s "$emailsubj" $SendTo
logger $emailsubj
rm $Device_file $Smart_Out
Thanks. Could you upload it to github or one of those text hosting sites and link it? WordPress automatically changes a lot of characters.
Hi,
I think it has been spotted in a previous post but with another command, for this part of the article:
“Unfortunately, there
End of the comment was:
You can find out the advancement of your test using the command:
smartctl --capabilities /dev/sdX
It will show the advancement for your test in percentage.
Thanks for the tutorial, really helpful!
Cheers,
Clem
Tried setting up on CentOs 6.5, tried to restart the service using “/etc/init.d/smartmontools” restart but got an error.
“smartmontools” is not located in the “/etc/init.d/” directory, but smartd is. is getting the smartd service started enough to get smartmontools working?
Yup, should be.
You should include what ERRORS look like, so I can write a script which greps then emails sysadmin if errors found.
Thanks for publishing this article. I had few questions:
1. Shall I rely on smartctl -H to see of the device is in good health ? Or do I need to do further selftest, short or long test ? My aim is just find if the disk is fine or not for read and write. We have a high availability solution and we want to use this utility to failover to standby node in case of any issue with the disk.
2. Is health check with -H option or selftest or short test – are they handled by the device driver independently ? Or they consume some CPU cycles ? Any data read or write is involved to run these tests that takes CPU times ?
I cannot get the mail notification to work. After some tries I got sendmail to work from command line using a gmail account, but having set it to test using “-M test” it now generates a mail every 20 minutes but the subject is
Cron test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp
and the message is
/usr/share/sendmail/sendmail: 899: /usr/share/sendmail/sendmail: /usr/sbin/sendmail-msp: not found
Any idea what I need to do?
This is running on Linux Mint 17
Hello…
I’m trying the short test on an SSD drive, but it looks like it never finishes! It freezes at 10% remaining and never gets to 0% remaining! I have to abort it with the “-X” flag!
I’m using:
sudo smartctl -t short /dev/sda
then when I issue:
sudo smartctl -l selftest /dev/sda
I get this, which is an aborted previous test:
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Aborted by host 10% 14687 -
If I try to run again the command for the short test it says:
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.1.0-0.bpo.2-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, http://www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Can't start self-test without aborting current test (10% remaining),
add '-t force' option to override, or run 'smartctl -X' to abort test.
I want to use smarttool for C++ code, is there any exposed C/C++ api available
Thanks
Hari Shankar
Weekly long test is generating a temp warning as the drive temp reaches 46 degrees near the end of the test. Normal operating temp is 38 degrees. Short test isn’t a problem.
Perhaps weekly long tests are doing more harm than good?
Thanks for the information. I’ve managed to set everything up as indicated. It’s a shame that I am doing this only after my disk died.
Much appreciated for this blog.
i have question….can you tell me how to recover linux window without reinstalling if i got some issue in system?
Great Content!!
I lost an HD some time ago coz I wasn’t caring about it.. lost 1tb of content x.x
cool.