Tivo Interface Froze but Telnet Still Working & No mfscheck Errors
Pete77
01-24-2007, 07:31 AM
I came back last night at 1am after around 3 days away and was too tired after a long drive to watch any TV or look at my Tivo.
This morning I turned on the television and found the Tivo user interface completely frozen in the middle of a recording. The picture was frozen on screen, no Tivo play bar or EPG data would show, and it would not go back to Tivo Central. However, my PC showed that Telnet access was available and working normally, and mfscheck reported a totally healthy hard disk status.
After exhausting all other options I used the Telnet reboot command and the machine rebooted as normal to Tivo Central. A check on the recordings made while I was away shows the Tivo UI freeze happened at 1.14am last night in the middle of a recording, after which the recording is blank.
Now that the Tivo has been rebooted it works as normal. Could this be a sign of imminent hard drive failure, or has the normally reliable Linux software had a very rare headache and crashed for reasons unknown?
blindlemon
01-24-2007, 07:39 AM
Worth keeping an eye on, but I suspect it's a one off. Are you using a UPS?
Pete77
01-24-2007, 07:53 AM
Worth keeping an eye on, but I suspect it's a one off. Are you using a UPS?
Just to add further to the mystery, the daily call at 6.50am this morning shows as having succeeded and my Tivo now has data to Wednesday 14th Feb, even though the UI and the recording that hung would still have been frozen at that point.
I suppose this strengthens the case for providing remote access to my Tivo via a 24/7 server on my network, attached to my router and running GotoMyPC or similar. If this kind of freeze happened again while I was away (revealed by the DailyMail Jazz emails failing to arrive and an inability to log in to TivoWeb), I could then Telnet to the box and reboot it remotely?
Yes, I am using a UPS, but it didn't do me much good when the power went off here for 19 hours from 4.15pm last Thursday afternoon. A circuit breaker at the local substation tripped out and the network operator didn't have adequate manpower available to reset it until the following morning (due to all the storm damage and also the fact that they don't like working between midnight and 8am).
My UPS currently supports only my Tivo, Sky and Freeview boxes, powered switch and RF modulator (total load 70 watts according to my Maplin-purchased energy monitor) for 7 minutes after a power cut, although it would probably be 20 minutes if I replaced the battery.
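To put those UPS figures in perspective, here is a rough bit of arithmetic in Python. It assumes the load stays at a constant 70 watts and ignores inverter losses, so treat the numbers as ballpark only. The variable and function names are just for illustration.

```python
# Rough UPS runtime arithmetic using the figures quoted in the post.
LOAD_WATTS = 70          # measured total load on the UPS
RUNTIME_MIN_NOW = 7      # runtime with the current, aged battery
RUNTIME_MIN_NEW = 20     # poster's estimate with a replacement battery
OUTAGE_HOURS = 19        # length of the power cut


def energy_wh(load_watts, minutes):
    """Energy delivered to the load, in watt-hours."""
    return load_watts * minutes / 60.0


usable_now = energy_wh(LOAD_WATTS, RUNTIME_MIN_NOW)
usable_new = energy_wh(LOAD_WATTS, RUNTIME_MIN_NEW)
needed = LOAD_WATTS * OUTAGE_HOURS  # watt-hours needed to ride out the outage

print(f"Current battery delivers about {usable_now:.1f} Wh")
print(f"Replacement battery would deliver about {usable_new:.1f} Wh")
print(f"A {OUTAGE_HOURS}-hour outage needs about {needed} Wh")
```

Either way the UPS is only good for riding out short interruptions; a 19-hour cut is far beyond what any battery of this size can cover.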
Ian_m
01-25-2007, 05:09 AM
Try running smartctl on the drive to check for disk errors.
I had a TiVo that had occasional random hangs. At each hang the reallocated sector count went up by one, a clear indication the disk was on its way out. I wrote a script that logs the relevant SMART parameters to a log file to keep a check on the replacement drive(s). It went up to an error count of about 10 before leveling out, but I am watching....
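For anyone wanting to do something similar, here is a minimal sketch of that kind of logging in Python. It is not the script mentioned above. It assumes you capture the text of `smartctl -a` on a PC (for example over Telnet) and that the attribute table uses the ten-column layout of recent smartmontools releases; the older TiVo build may print something different. The function names, the log path and the sample values are all made up for illustration.

```python
import time

# Attributes to record; these are names smartctl prints in its attribute table.
WATCHED = ("Reallocated_Sector_Ct", "Hardware_ECC_Recovered", "Temperature_Celsius")

# Invented sample of the attribute table, for demonstration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       3
194 Temperature_Celsius     0x0022   124   100   000    Old_age   Always       -       38
"""


def parse_attributes(smartctl_text):
    """Return {attribute_name: raw_value} from smartctl's attribute table.

    Assumes the raw value is the tenth whitespace-separated column.
    """
    attrs = {}
    for line in smartctl_text.splitlines():
        fields = line.split(None, 9)
        # Attribute rows start with a numeric ID and carry a hex flag in column 3.
        if len(fields) == 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            raw = fields[9].split()[0]
            if raw.isdigit():
                attrs[fields[1]] = int(raw)
    return attrs


def format_log_line(when, device, attrs):
    """Build one log line holding the watched attributes that were found."""
    pairs = " ".join(f"{name}={attrs[name]}" for name in WATCHED if name in attrs)
    return f"{when} {device} {pairs}"


def append_log(path, line):
    """Append a line to the log file (the path is the caller's choice)."""
    with open(path, "a") as log:
        log.write(line + "\n")


if __name__ == "__main__":
    stamp = time.strftime("%Y-%m-%d %H:%M")
    print(format_log_line(stamp, "/dev/hda", parse_attributes(SAMPLE)))
```

Run once a day, this gives one line per drive, which makes it easy to spot a reallocated sector count that has started to climb.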
BrianHughes
01-25-2007, 03:46 PM
Ian, I'd make sure I had a good backup ready - just in case.
Pete77
01-25-2007, 04:24 PM
Try running smartctl on the drive to check for disk errors.
I had a TiVo that had occasional random hangs. At each hang the reallocated sector count went up by one, a clear indication the disk was on its way out. I wrote a script that logs the relevant SMART parameters to a log file to keep a check on the replacement drive(s). It went up to an error count of about 10 before leveling out, but I am watching....
Two unexplained hangs here to date.
Does that mean my 250GB Samsungs only have another three months' life left then?
They have done 19 months so far.
blindlemon
01-25-2007, 04:48 PM
Have you tried running a smartctl test on them yet?
Unfortunately the Samsung HA250JC now seems to be unavailable as the last UK supplier www.ultratec.co.uk have sold out :(
I still have a few in stock though... :up:
Pete77
01-25-2007, 05:34 PM
Have you tried running a smartctl test on them yet?
If I type smartctl at the Telnet prompt it just seems to tell me how to use it. But if I try some of the switches the same thing still happens? :confused:
blindlemon
01-25-2007, 05:44 PM
You probably need to install the updated version from this forum :)
Pete77
01-25-2007, 06:16 PM
You probably need to install the updated version from this forum :)
This is the disk health status report from smartctl provided by the Info function of TivoWebPlus 1.3.1. But are you saying I would get more info than this with the new enhanced smartctl?
File System/Disk Information
Filesystem  Type  Size  Used  Avail  Capacity  Mounted on
/dev/hda4   ext2  124M   27M    90M       23%  /
/dev/hda9   ext2  124M   23M    95M       19%  /var
/dev/hda:
multcount = 0 (off)
I/O support = 0 (default 16-bit)
using_dma = 1 (on)
readahead = 8 (on)
geometry = 16383/16/63, sectors = 488397168, start = 0
drive state is: active/idle
Device: SAMSUNG HA250JC Supports ATA Version 7
Drive supports S.M.A.R.T. and is enabled
Check S.M.A.R.T. Passed
/dev/hdb:
multcount = 0 (off)
I/O support = 0 (default 16-bit)
using_dma = 1 (on)
readahead = 8 (on)
geometry = 16383/16/63, sectors = 488397168, start = 0
drive state is: active/idle
Device: SAMSUNG HA250JC Supports ATA Version 7
Drive supports S.M.A.R.T. and is enabled
Check S.M.A.R.T. Passed
RichardJH
01-26-2007, 06:51 AM
Pete, you may be able to help with my problem. When I go to Info in TWP I do not get any file system/disk information; instead I get the following error:
INTERNAL SERVER ERROR
--cut here--
action_info '/' ''
df: cannot read table of mounted filesystems: No such file or directory
while executing
"exec df -h -T"
(procedure "::action_info" line 285)
invoked from within
"::action_$action $chan $part $env"
("eval" body line 1)
invoked from within
"eval {::action_$action $chan $part $env}"
--cut here--
All the other bits of the info page respond correctly.
blindlemon
01-26-2007, 08:51 AM
This is the disk health status report from smartctl provided by the Info function of TivoWebPlus 1.3.1
What version is that, and where are you running it from?
It should be in /var/hack/bin and should be at least version 5.1.9. Type
/var/hack/bin/smartctl -a /dev/hda
and
/var/hack/bin/smartctl -a /dev/hdb
to see the data on your drives.
Pete77
01-28-2007, 06:08 PM
What version is that, and where are you running it from?
It should be in /var/hack/bin and should be at least version 5.1.9.
I have installed the updated smartctl file, dated 2003 rather than 2002 and rather larger in size. This is the one posted at http://archive.tivocommunity.com/tivo-vb/showthread.php?s=&threadid=153428. However, I see from http://smartmontools.sourceforge.net/ that smartctl has continued to be developed since then, but no one has recently ported those later versions to run on the Tivo.
The overall health assessment of my two drives by this later Tivo-compatible version of smartctl is reported as PASSED, although I am slightly concerned about the largish numbers reported for "Hardware_ECC_Recovered" of 50,285,406 on HDA and 61,484,256 on HDB.
However, other values like Spin_Up_Time, Start_Stop_Count and Power_On_Hours disagree significantly between the two drives when they really ought to be virtually the same. The one value I would reasonably expect to differ between the two drives is Temperature_Celsius, as one drive is nearer to the Cachecard and power supply than the other. It also seems reasonable for the Hardware_ECC_Recovered value to be significantly different, but not Spin_Up_Time or Power_On_Hours.
Spin_Up_Time for HDA is 5824 hours and for HDB is 5056 hours. Start_Stop_Count is 248 for HDA but only 102 for HDB. Power_Cycle_Count is 247 for HDA and 102 for HDB, and Temperature_Celsius is 38 for HDA and 32 for HDB.
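To see how far apart the two drives really are, the quoted raw values can be compared as a fraction of the larger reading. This is only a quick sketch in Python using the figures from this post; `relative_gap` is a made-up helper name, and the comparison says nothing about what the drive firmware actually means by each raw value.

```python
# Raw values as quoted in the post: (hda, hdb).
READINGS = {
    "Start_Stop_Count": (248, 102),
    "Power_Cycle_Count": (247, 102),
    "Temperature_Celsius": (38, 32),
    "Hardware_ECC_Recovered": (50_285_406, 61_484_256),
}


def relative_gap(a, b):
    """Difference between two readings as a fraction of the larger one."""
    larger = max(a, b)
    return abs(a - b) / larger if larger else 0.0


for name, (hda, hdb) in READINGS.items():
    print(f"{name}: hda={hda} hdb={hdb} gap={relative_gap(hda, hdb):.0%}")
```

Seen this way, the start/stop and power cycle counts differ proportionally far more than the Hardware_ECC_Recovered figures do.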
Any light you can shed on the information provided and especially the differing and apparently substantial Hardware_ECC_Recovered values would be appreciated.
blindlemon
01-29-2007, 05:19 AM
A quick Google search for Hardware_ECC_Recovered - eg. http://www.mail-archive.com/linux-raid@vger.kernel.org/msg06508.html - suggests this may not be indicative of problems for Samsung drives.
PhilG
01-29-2007, 05:23 AM
My Samsung (virtually brand new) also has huge (and rapidly increasing) numbers here.
I've just had an off-line chat with blindlemon and I am coming to the conclusion that I'm just going to ignore them!
Pete77
01-29-2007, 06:34 AM
A quick Google search for Hardware_ECC_Recovered - eg. http://www.mail-archive.com/linux-raid@vger.kernel.org/msg06508.html - suggests this may not be indicative of problems for Samsung drives.
This post doesn't really seem to explain either way whether this error count is a problem, though.
However, I note that the overall health status of the disk is still pronounced as PASSED.
As things like the Power On Hours, Spin Up Time and Power Cycle Count differ by a large margin between the two drives (fitted at the same time 19 months ago), one wonders how much faith one can have in any of these RAW readings. They should have identical or near-identical values, unless one can believe that Samsung spends weeks bench testing some of its drives (but not others) before they are shipped, which seems rather unlikely.
PhilG
01-29-2007, 06:48 AM
I tend not to worry so much about the absolute values, but rather about how much (and how fast) they are changing in my Tivo.
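As a rough illustration of watching the trend rather than the absolute number, the Python sketch below works out the average change per day between the first and last reading of a single attribute. The readings are hypothetical, not taken from any drive in this thread, and `change_per_day` is a made-up helper name.

```python
from datetime import datetime


def change_per_day(readings):
    """Average daily change between the first and last (timestamp, value) pair."""
    (t0, v0), (t1, v1) = readings[0], readings[-1]
    days = (t1 - t0).total_seconds() / 86400
    if days <= 0:
        raise ValueError("readings must span a positive amount of time")
    return (v1 - v0) / days


# Hypothetical reallocated sector counts logged a week apart.
history = [
    (datetime(2007, 1, 22, 6, 0), 3),
    (datetime(2007, 1, 29, 6, 0), 10),
]

print(f"Average change: {change_per_day(history):.2f} per day")
```

A count that sits still for weeks is much less worrying than one that keeps creeping up, whatever the absolute value.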