Hard Drive Monitor


rbautch
04-12-2008, 06:29 PM
Here is a script that detects errors on your Tivo hard drive(s) by:
1. Scanning your kernel log looking for errors indicative of hard drive failure.
2. Running smartctl and checking for S.M.A.R.T. status of "failed".
3. Running extended offline drive checks using smartctl.
4. Comparing the temperature of your hard drive(s) to a user-defined threshold.
5. Comparing the number of reallocated sectors to manufacturer thresholds.
If an error is found or a value exceeds a threshold, the script writes it to a log and sends it to your Tivo UI as a message.
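
As a rough illustration of step 1 only (this is not the code from drivemonitor.sh), a kernel log scan could be sketched as below. The function name count_drive_errors and the match pattern are assumptions; the real script may look for different messages.

```shell
# Hypothetical sketch: count kernel log lines that look like IDE drive errors.
# Reads log text on stdin. The pattern is an assumption, not taken from drivemonitor.sh.
count_drive_errors() {
    grep -c -i -E 'hd[ab]: .*error'
}

# Example use (the log path is an assumption; adjust it for your box):
# count_drive_errors < /var/log/kernel
```

A count of 0 means no matching lines were seen. Any other number only says that something worth reading is in the log, not that the drive has failed.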

Usage:
Run it as a shell script from bash without arguments. If the script detects a cron installation, it will automatically append an entry to your crontab to run drivemonitor.sh every night.

Note:
Features 3, 4, and 5 above require a newer version of smartctl than the one that ships with stock Tivos. Jamie was good enough to compile one and attach it to Post #5 below (http://www.dealdatabase.com/forum/showthread.php?p=295690&postcount=5). I suggest replacing your existing smartctl in /bin with this one.

Tested on 6.2 Dtivo, but should work on all platforms.

http://www.mastersav.com/tivo_tweak/kernel-50.jpg

jwciv
04-12-2008, 07:46 PM
Cool tool. I'm running 6.2a on a DSR7000 with a single Seagate 300GB drive. Here's the output. Does it look like what you'd expect?
root@lvrmtivo:/var> drivemonitor.sh
No hard drive errors were found in your kernel log.
Device open failed: No such device or address
Your primary drive hda S.M.A.R.T. satus is: PASSED
Your secondary drive hdb S.M.A.R.T. satus is: FAILED
Your secondary drive hdb S.M.A.R.T. satus is: FAILED
root@lvrmtivo:/var>

rbautch
04-12-2008, 07:54 PM
Not quite. It erroneously thinks you have a secondary drive (hdb). What version of smartctl do you have? (usually in /bin)

jwciv
04-12-2008, 07:57 PM
version 1.2

Jamie
04-12-2008, 08:30 PM
That's pretty ancient. I've been using 5.32 for quite a while -- looks like I built it back in 2005.

smartctl -H /dev/hdX is useful with this version, as well as smartctl -t short /dev/hdX. smartctl -a /dev/hdX displays a ton of info that you might find useful, typically including the temperature of the drive, the number of remapped sectors, etc.
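
A minimal sketch of how the -H result could be reduced to a single word for scripting. The helper name smart_health is hypothetical, and the result-line wording it matches is what smartctl 5.x prints for ATA drives, so check it against your own build.

```shell
# Hypothetical helper: reduce `smartctl -H` output (read from stdin) to one word.
# Prints PASSED or FAILED, or UNKNOWN if no result line is found.
smart_health() {
    awk '/overall-health self-assessment test result/ {
             result = $NF
             sub(/!$/, "", result)   # a failing drive is reported as "FAILED!"
             print result
             found = 1
         }
         END { if (!found) print "UNKNOWN" }'
}

# Example use on the Tivo:
# smartctl -H /dev/hda | smart_health
```

UNKNOWN is the case to watch for with very old smartctl builds, which may not print this line at all.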

jwciv
04-12-2008, 08:56 PM
Thanks, now I'm up to date with a version from this century:
root@lvrmtivo:/var> drivemonitor.sh
No hard drive errors were found in your kernel log.
Your primary drive hda S.M.A.R.T. satus is: PASSED
No secondary drive was detected.
root@lvrmtivo:/var>

rbautch
04-13-2008, 12:52 PM
Thanks for the updated smartctl, Jamie. I'll plan to make some updates to the script to take advantage of those features. I just updated the script to be backwards compatible with older versions of smartctl too.

RandC
04-13-2008, 02:10 PM
Just checked, Alphawolf's All-In-One file has the 5.32 version of smartctl included.

Soapm
04-13-2008, 02:44 PM
Cool...

FamRoom-bash# ./drivemonitor.sh
No hard drive errors were found in your kernel log.
Your primary drive hda S.M.A.R.T. satus is: PASSED
No secondary drive was detected.

How do I make a cron entry to run this, say, once a week just before the logs get wiped? I put the file in /enhancements.

RandC
04-13-2008, 03:05 PM
Set the day of the week and execution time for drivemonitor so that it runs before your logs are cleared:

# cron settings
# * * * * * command to be executed
# - - - - -
# | | | | |
# | | | | +----- day of week (1 - 7) (monday = 1)
# | | | +------- month (1 - 12)
# | | +--------- day of month (1 - 31)
# | +----------- hour (0 - 23)
# +------------- min (0 - 59)
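# time is based on UTC; no time zone offset is applied

# Template only: with all five fields left as asterisks the job runs every minute.
# Set the minute, hour, and day of week you want, and use the full path to the script.
* * * * * /enhancements/drivemonitor.sh; echo "Ran DriveMonitor on `date`" >> /var/log/crontab.log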
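
To make the field order concrete, here is a small sketch that prints a weekly entry. The helper name weekly_cron_entry and the 08:30 Monday schedule are made up for illustration; pick a time that falls before your own log cleanup, and remember the time is UTC.

```shell
# Hypothetical helper: print a crontab line that runs a script once a week.
# Arguments: minute, hour, day of week (monday = 1), full path to the script.
weekly_cron_entry() {
    printf '%s %s * * %s %s\n' "$1" "$2" "$3" "$4"
}

# Example: Mondays at 08:30 UTC (an arbitrary choice).
weekly_cron_entry 30 8 1 /enhancements/drivemonitor.sh
```

The day-of-month and month fields stay as asterisks, so only the day-of-week field limits when the job runs.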

rbautch
04-13-2008, 03:14 PM
FYI, the info module of TivoWebPlus uses "smartctl -c" to check SMART status, which is compatible only with older versions of smartctl. Editing TivoWebPlus/info.itcl to change the argument from -c to -H fixes this. Not a huge deal, but I'll make the TWP developers aware of it.

Jamie
04-13-2008, 03:16 PM
Yes, I think I sent the 5.32 smartctl to Alphawolf for inclusion in the All-In-One file (link (http://www.dealdatabase.com/forum/showpost.php?p=271894&postcount=15)).

The issue may be that, for some people, /bin comes before the hacks bin directory on the PATH, so they get the old version that Tivo ships unless they specify the full path.
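
One way to check which copy wins is to list every smartctl on the PATH in search order. This is only a sketch; list_on_path is a hypothetical helper name, and it uses nothing beyond shell builtins.

```shell
# Hypothetical helper: print every executable named $1 found on $PATH,
# in the order the shell would search. The first line is the one that runs.
list_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH element means the current directory.
        if [ -x "${dir:-.}/$name" ]; then
            printf '%s\n' "${dir:-.}/$name"
        fi
    done
    IFS=$old_ifs
}

# Example use:
# list_on_path smartctl
```

If the first line printed is /bin/smartctl, either call the newer binary by its full path or replace the copy in /bin.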

Soapm
04-13-2008, 05:13 PM
Wow, something in the 9.3 update messed up my Joe command. This is what I get when I type root for my CRON config. I looked in /tivo-bin and there is no /etc there so the error appears to be correct. Thoughts how to fix this?

FamRoom-bash# root
Couldn't open '/tivo-bin/etc/joerc'

Omikron
04-13-2008, 05:45 PM
joe looks for joerc in two places. First it looks for "/.joerc", then it looks for "/tivo-bin/etc/joerc".

If you used an automated script to install your hacks then most likely it's using "/.joerc", which you simply need to copy over from your previous install.

rbautch
04-13-2008, 05:57 PM
This is not a support thread! Please respect the clearly posted moderator wishes:

"Dedicated to a clean and concise listing of the most current versions of all the hacks out there. NO SUPPORT REQUESTS OR DISCUSSION AND NO DTV HACKING TALK ALLOWED!! - Non-File posts will be summarily deleted without notice"

If you have any questions on cron, post in the newbie forum.

rbautch
04-13-2008, 11:51 PM
The script is now updated to monitor the temperature of your drive(s) and compare to a user-defined threshold. It also compares the number of reallocated sectors to manufacturer thresholds. These new features only work with the updated version of smartctl posted above. Thanks Jamie!
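
For anyone curious how such checks can be done by hand, here is a sketch that reads the attribute table printed by smartctl -A. It is not the code from drivemonitor.sh. The helper names are hypothetical, it assumes awk is available, and it assumes the drive reports temperature as attribute 194 and reallocated sectors as attribute 5, which not every drive does.

```shell
# Hypothetical helpers; both read `smartctl -A` output on stdin.

# Print the RAW_VALUE column (10th field) of the attribute with the given ID.
attr_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Print FAILING if the normalized VALUE (4th field) is at or below the
# manufacturer THRESH (6th field), otherwise OK.
attr_status() {
    awk -v id="$1" '$1 == id { if ($4 + 0 <= $6 + 0) print "FAILING"; else print "OK" }'
}

# Example use (max_temp is a user-defined threshold; 50 is only an example value):
# max_temp=50
# temp=$(smartctl -A /dev/hda | attr_raw 194)
# [ "$temp" -gt "$max_temp" ] && echo "hda is running hot: $temp C"
# smartctl -A /dev/hda | attr_status 5
```

The temperature is compared as a raw reading against your own limit, while the reallocated sector check uses the normalized value against the threshold stored on the drive.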

Soapm
04-14-2008, 01:54 AM
Thanks again Russ.

FamRoom-bash# /enhancements/drivemonitor.sh
No hard drive errors were found in your kernel log.
Your primary drive hda S.M.A.R.T. satus is: PASSED
No secondary drive was detected.
Your primary drive (hda) temperature is 43 degrees Celsius.
Number of reallocated sectors on your primary drive (hda) is 0. That's good.

rbautch
04-27-2008, 08:28 PM
I updated the original post with a new version that:
- Runs smartctl extended offline self tests on all drives.
- If the script detects a cron installation, it will (after prompting) append an entry to your crontab to run drivemonitor.sh every night.
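
A sketch of how a crontab append can be made safe to repeat, so that running the script twice does not add the entry twice. This is an illustration rather than the script's actual logic. The name append_cron_entry is hypothetical, and it assumes a grep that supports -q, -F and -x.

```shell
# Hypothetical helper: append a line to a crontab file only if an identical
# line is not already present.
# Arguments: path to the crontab file, the entry to add.
append_cron_entry() {
    crontab_file=$1
    entry=$2
    # -F treats the entry as a fixed string (it contains asterisks),
    # -x requires the whole line to match.
    if grep -q -F -x -e "$entry" "$crontab_file" 2>/dev/null; then
        echo "Entry already present in $crontab_file"
    else
        printf '%s\n' "$entry" >> "$crontab_file"
        echo "Appended entry to $crontab_file"
    fi
}
```

Pass it the path to your own crontab file and the full line you want added.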

bnm81002
04-27-2008, 09:34 PM
My drive status showed FAILED rather than PASSED on both my receivers. What can I do to get both drives to pass? Thanks.

Soapm
04-28-2008, 02:06 AM
Thanks again Russ...

Run drivemonitor.sh again after two hours for results.
Commencing extended offline self test for hda now.
***
Found a crontab named "root" in
Would you like to append it to run drivemonitor.sh every night? [y/n]: y
/enhancements/drivemonitor.sh: dirname: command not found
Appending crontab to run drivemonitor from
Done!

rbautch
04-28-2008, 05:15 PM
You can safely ignore the error. Looks like you don't have the dirname utility on your Tivo. I'll take it out of the script since it's not a stock Tivo utility.

rbautch
05-03-2008, 11:46 PM
Made this change. The script now uses awk to echo the crontab path if dirname fails. Cosmetic fix only.

ronsch
06-27-2008, 11:35 AM
I don't suppose there's an updated smartctl binary for Series 1?

Rorschach
08-20-2008, 11:52 AM
Is there anything I can do when smartctl spots a bad area on the disc? I got a Pre-Tivo Central message informing me:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)  40%        6092             -
# 2  Extended offline    Interrupted (host reset)  80%        6092             -
# 3  Short offline       Completed: read failure   60%        1                705085
# 4  Short offline       Completed without error   00%        0                -

Is there a utility where I can lock out the bad spot at 705085?
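
For reference, the failing LBA can be pulled out of a self-test log like the one above with a short filter. This is only a sketch with a hypothetical helper name. It reports where the read failure was logged and does nothing to repair or remap the sector.

```shell
# Hypothetical helper: read `smartctl -l selftest` output on stdin and print
# the LBA of every logged test that ended in a read failure.
failed_lbas() {
    awk '/^# *[0-9]+ / && /read failure/ { print $NF }'
}

# Example use:
# smartctl -l selftest /dev/hda | failed_lbas
```

The self-test log keeps old entries, so a failure from an earlier run will keep showing up here even after later tests complete cleanly.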

rbautch
08-20-2008, 12:03 PM
Try spinrite. This forum has lots of info on it.

Rorschach
08-21-2008, 12:09 PM
Is there a way to clear the logged data so that I don't keep getting a Pre-Tivo Central message every day informing me of the one read error that occurred four days ago? Apparently subsequent daily runs did not have the same result, but every morning I get a new message about the old test.

rbautch
08-21-2008, 12:21 PM
Yes, just use the "clear" argument when you run the script: ./drivemonitor.sh clear