View Full Version : Oh, this is weird!


lrhorer
03-07-2009, 02:34 PM
To say the least. I have no idea how this could have happened. I recorded a show on my THD and transferred it to the Video Server. I lopped off the station intro and the trailers after the end of the program using Video Redo and saved it as a .mpg file. SOP. The video looks great, and I chalk up another addition to my video library. I went to transfer the video back to one of my S3s, just to make sure there are no fatal transfer errors, as sometimes happens, and although the program transfers in toto, the video is garbage! It has serious pixelation, video skips, and audio dropouts every two or three seconds throughout the entire program.

WTF?!?

My first thought was that the high-bandwidth bug experienced prior to the release of version 8 had crept back into the code. It's been quite some while since I noticed an upgrade on the S3s, though, so I was puzzled. Just to make sure, I transferred the program to the other S3. It's perfect! Now I was worried maybe the 1st S3 (a 1T purchased from Weaknees) was going bad. I double-checked, and the program still looked lousy, but other transferred programs are fine. Just for giggles, I transferred the program to the 1st S3 again, and lo and behold: it's flawless.

I'm flabbergasted. The read routines from the RAID array guarantee an accurate read or else the read fails and the process either halts or branches to an error handler. The transfer to the TiVo is via TCP, which guarantees intact data transfers. The write routines on the TiVo are essentially identical to the read routines on the RAID array in terms of guaranteeing data integrity. The encryption routines in the TiVo could certainly suffer an error, but with those as with the rest of the routines, while a random error is possible, how could one suddenly encounter tons of errors throughout a single hour-and-a-half-long program transfer, but not in any other?

Yoav
03-07-2009, 03:09 PM
To say the least. I have no idea how this could have happened.

Well, I don't *know* what happened, but your conclusions on 'guarantees' are not as 'strong' as they should be:

RAID 'guarantees' that the raid driver will perform a CRC check before transmitting data up. But nothing says that the raid driver may not be buggy and once in a rare while 'munge' the data.

The same holds for TCP (except that TCP also provides VERY weak checks, which are trivially spoofable by an attacker -- probably not the case here).
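As a rough illustration of why the TCP check is considered weak, here is a minimal Python sketch of the 16-bit ones' complement checksum that TCP uses. It is computed here over a bare payload only (real TCP also covers a pseudo-header and the TCP header), and the sample byte strings are made up for the example. Because the sum does not depend on word order, swapping two 16-bit words goes undetected, while a single flipped bit is caught.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Illustrative payloads (assumptions, not data from the thread).
original = b"\x12\x34\x56\x78\x9a\xbc"
swapped = b"\x56\x78\x12\x34\x9a\xbc"  # first two 16-bit words exchanged
flipped = b"\x12\x35\x56\x78\x9a\xbc"  # one bit changed in the second byte

same_after_swap = internet_checksum(original) == internet_checksum(swapped)
same_after_flip = internet_checksum(original) == internet_checksum(flipped)
```

Here `same_after_swap` is true (the corruption slips through) and `same_after_flip` is false (the corruption is detected).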

Between the RAID and the sending via TCP, there is a program that accepts the data, does stuff with it, and sends it on -- so data can easily get corrupted here.

The write routines on the tivo make no guarantees. Linux writes do not have ANY guarantee that the hard drive correctly wrote the data. Only that if an error is reported by the hard drive, that the writer will immediately know about it (and only if he doesn't perform asynchronous writes -- I don't know what tivo does).
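To make the point about write guarantees concrete, here is a small Python sketch of what an application has to do itself if it wants more than "no error was reported": flush, call `fsync`, then read the file back and compare hashes. The function name `write_and_verify` and the sample payload are hypothetical, and nothing here reflects what the TiVo software actually does. Note also that the read-back is usually served from the page cache, so even this does not prove the bits on the platter are correct.

```python
import hashlib
import os
import tempfile


def write_and_verify(path: str, payload: bytes) -> bool:
    """Write payload, force it to the device, then re-read and compare hashes."""
    expected = hashlib.sha256(payload).hexdigest()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push its cache to the device
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    return actual == expected


# Usage with a throwaway file (illustrative payload).
with tempfile.TemporaryDirectory() as tmp:
    ok = write_and_verify(os.path.join(tmp, "sample.bin"), b"\x00\x01" * 1024)
```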

You point out correctly that the tivo could fail in processing the data (tons of code involved there -- and it wouldn't be the first time a bug gets tickled).

And something about bugs: once they are tickled, they often continue until the program/transfer is done -- for example if it accidentally added one in some computation, the rest of the stream would have incorrect computations all the way to the end until there's some reset....

So yeah. I have no clue what happened. But I also know that tivos often get into 'weird' modes that are only fixable via reboots, so it doesn't surprise me to hear that you may have found yet another bug. I haven't seen it happen though. My first instinct would actually be to blame some sectors on your tivo hard drive (that are now being avoided, which is why you're not seeing it any more)... but it could also be a bug or a truly random 'beta particle hit the memory chip at the wrong time'.

ggieseke
03-07-2009, 03:26 PM
I don't know how the heck the internal encryption on the TiVo works, but if a random cosmic ray strike or some kind of EM interference screwed with it just as it was setting up the encryption key I can imagine that the entire file would be hosed.

Probably random bad luck (unless it happens again). :eek:

lrhorer
03-08-2009, 12:49 AM
Well, I don't *know* what happened, but your conclusions on 'guarantees' are not as 'strong' as they should be:

RAID 'guarantees' that the raid driver will perform a CRC check before transmitting data up. But nothing says that the raid driver may not be buggy and once in a rare while 'munge' the data.
Yes, but hundreds (maybe thousands) of errors in a 10GB string of data, but none in any other string? At the RAID layer, the only thing unique about the transfer is the file itself. Subsequent access of the file and others produce no errors. A single error in a 10G file transfer isn't supposed to happen, but is quite understandable. So is a totally corrupt bit stream. A 10G file which is mostly intact but spattered with thousands of errors is another matter.

The same holds for TCP (except that TCP also provides VERY weak checks, which are trivially spoofable by an attacker -- probably not the case here).
No, it's a secure network, and a spoof wouldn't likely cause a splattering of errors in a single connection. Admittedly, this would be a unique connection, but there is nothing fundamentally different about the way packets are handled from one TCP connection to the next; nothing which could be corrupted at the connection layer which would cause sporadic errors throughout the connection lifetime, but not in any other. The likelihood of an undetected checksum failure in one TCP packet is quite low. The likelihood of hundreds or thousands of such failures in a string of a few million packets is so low as to be essentially zero. One would have to wait a time far more than the age of the universe to see such a string of errors. Of course, if this really is a problem which will only occur every 100 trillion years or so, I'm not going to worry about it. :)
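The "essentially zero" estimate can be sanity-checked with a back-of-the-envelope bound. The Python sketch below assumes each packet independently carries undetected corruption with probability `p`, and bounds the chance of at least `k` such packets out of `n` using the union bound C(n, k) * p^k. Both numbers plugged in are assumptions for illustration only: roughly 7 million packets for a 10 GB transfer in ~1460-byte segments, and a made-up per-packet rate of 1e-7.

```python
import math


def log10_prob_at_least(n: int, k: int, p: float) -> float:
    """Upper bound on log10 P(X >= k) for X ~ Binomial(n, p).

    Uses the union bound P(X >= k) <= C(n, k) * p**k, evaluated in log
    space with lgamma so the huge binomial coefficient never overflows.
    """
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return (log_comb + k * math.log(p)) / math.log(10)


n_packets = 7_000_000   # assumption: ~10 GB split into ~1460-byte segments
p_undetected = 1e-7     # assumption: purely illustrative per-packet rate

expected_bad = n_packets * p_undetected
bound = log10_prob_at_least(n_packets, 1000, p_undetected)
```

Under these assumed numbers fewer than one bad packet is expected per transfer, and the bound on seeing a thousand of them is far below 10^-2000. Put differently, independent random failures cannot account for the pattern; whatever happened would have to be systematic for the duration of that one transfer.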

Between the RAID and the sending via TCP, there is a program that accepts the data, does stuff with it, and sends it on -- so data can easily get corrupted here.
Well "easily" is a relative term. Of course any program can go haywire, but going sporadically haywire is another matter. More to the point, there is very little which would be unique to the single transfer session. A unique thread is spawned to handle the transfer, no doubt, but what about that thread could ordinarily cause a small, very intermittent, but persistent bunch of errors being passed to the TCP stack? None of the transfer program (pyTivo / Java), the RAID server, or the TiVo had been restarted for many days before this event. The stream is not being transcoded, so the "does stuff with it" bit of code is pretty trivial.

The write routines on the tivo make no guarantees. Linux writes do not have ANY guarantee that the hard drive correctly wrote the data. Only that if an error is reported by the hard drive, that the writer will immediately know about it (and only if he doesn't perform asynchronous writes -- I don't know what tivo does).
Yes, and drives most certainly fail or can have bad sectors. This is not a new TiVo however, and again there is nothing unique about the string of sectors chosen for writing by a particular thread. The TiVo has been full and partially emptied many times without such problems ever occurring before. (Discounting the problem prior to release 8 which produced similar errors on any data stream above 12 Mbps, reproducible on every S3 TiVo.)

You point out correctly that the tivo could fail in processing the data (tons of code involved there -- and it wouldn't be the first time a bug gets tickled).
Yes, but why for only one program, and for the entire duration of that program? There is nothing of which I can think which would synchronize the tickling of a bug with the program boundaries. The fact I can't think of it doesn't necessarily mean it doesn't exist, of course.

And something about bugs: once they are tickled, they often continue until the program/transfer is done -- for example if it accidentally added one in some computation, the rest of the stream would have incorrect computations all the way to the end until there's some reset....
Yes, but why sporadically? I do not think the error was related to a GOP boundary, although I suppose it's possible. I'll check. A stream of totally corrupted frames is understandable, but a smattering of mildly corrupted frames every few seconds is a head scratcher. If it turns out every I frame or every B frame #10 in every GOP is corrupted or something like that, then it makes some sense.
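The GOP check mentioned above could be sketched roughly like this in Python. Everything here is hypothetical: the function names, the fixed 15-frame GOP length (real streams may use variable GOPs), and the two sample lists of glitch positions are invented for illustration and are not measurements from the actual recording. The idea is simply to take the frame number of each visible glitch modulo the GOP length and see whether the hits pile up at one position.

```python
from collections import Counter


def gop_position_histogram(bad_frames, gop_length):
    """Count how often each position within the GOP shows a glitch."""
    return Counter(frame % gop_length for frame in bad_frames)


def looks_gop_aligned(bad_frames, gop_length, threshold=0.8):
    """True if at least `threshold` of the glitches share one GOP position."""
    hist = gop_position_histogram(bad_frames, gop_length)
    if not hist:
        return False
    _, top_count = hist.most_common(1)[0]
    return top_count / len(bad_frames) >= threshold


# Made-up glitch positions, assuming a fixed 15-frame GOP.
aligned = [10, 25, 40, 55, 70]     # always position 10 within the GOP
scattered = [3, 19, 41, 52, 88]    # no common position

aligned_result = looks_gop_aligned(aligned, 15)
scattered_result = looks_gop_aligned(scattered, 15)
```

With these sample lists, `aligned_result` is true and `scattered_result` is false. A clustered result would point toward something tied to the stream structure; a scattered one would point back toward a more random cause.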

So yeah. I have no clue what happened. But I also know that tivos often get into 'weird' modes that are only fixable via reboots, so it doesn't surprise me to hear that you may have found yet another bug.
Except this hasn't required a reboot. It started the very first second of the program in question, continued all the way to the very last second of the film, and apparently quit the moment the transfer was done. There was a transfer immediately preceding this program (this program being in queue to be transferred), and only a few hours passed between this transfer and the next.

I haven't seen it happen though. My first instinct would actually be to blame some sectors on your tivo hard drive (that are now being avoided, which is why you're not seeing it any more)... but it could also be a bug or a truly random 'beta particle hit the memory chip at the wrong time'.
All of those are pretty thin hypotheses given the nature of the evidence at hand (although no thinner than any others I have considered). Obviously something caused this, but if it does not recur, I'm not going to worry too much about it. I'm just racking my brain over a likely explanation.

lrhorer
03-08-2009, 12:55 AM
I don't know how the heck the internal encryption on the TiVo works, but if a random cosmic ray strike or some kind of EM interference screwed with it just as it was setting up the encryption key I can imagine that the entire file would be hosed.
Yes, but then the entire stream would be complete garbage. Instead, what we have is a stream of corruption of significantly less than 1%.

Yoav
03-08-2009, 01:14 AM
Yes, but then the entire stream would be complete garbage. Instead, what we have is a stream of corruption of significantly less than 1%.

I'll simply say:

There is absolutely no 'intended design behavior' that would lead to this. I can think of MANY possible bugs, some listed earlier, that could cause this, none of which you accept (I could discount each of your replies, but I don't see the point). Sorry I've been unable to help you. If it sounds like a one-off that hasn't repeated, I'd move on. IF it happens again, then it is probably worth investigating more.

txporter
03-08-2009, 12:47 PM
Ok, this doesn't sound exactly the same...but maybe something about the recent software change is causing this. I have been pushing a lot of mp4s to my tivos since wmcbrine added it to his fork of pytivo. At any rate, I will sometimes have what looks like corrupted or poorly encoded video when playing back. It has what looks like macroblocking and the lip sync can get screwed up. I can "fix" this by either rewinding to the beginning of the video (resetting the buffer, sort of) or doing some sort of trickplay.

It doesn't sound exactly the same as your problem...but who knows?

Jason

P.S. I forgot to add, I have started to see what I believe is a similar issue with live tv playback. I am seeing interlacing defects and stuttering (slight) and sometimes also lip sync issues. I can often fix this by changing channels or doing trickplay.

Rdian06
03-09-2009, 12:42 PM
To say the least. I have no idea how this could have happened. I recorded a show on my THD and transferred it to the Video Server. I lopped off the station intro and the trailers after the end of the program using Video Redo and saved it as a .mpg file. SOP. The video looks great, and I chalk up another addition to my video library. I went to transfer the video back to one of my S3s, just to make sure there are no fatal transfer errors, as sometimes happens, and although the program transfers in toto, the video is garbage! It has serious pixelation, video skips, and audio dropouts every two or three seconds throughout the entire program.

...

Just for giggles, I transferred the program to the 1st S3 again, and lo and behold: it's flawless.


It sounds like your Tivo hiccuped while it was demuxing the incoming MPEG2 container which probably messed up the timestamps for the audio and video streams.

Remember that when you do TTCB, the incoming MPEG2 is demuxed into separate audio and video streams for storage in Tivo's native ty format.

In playing around with ffmpeg demuxing and remuxing, I've done similar things on my PC where because of errors in the timestamp handling code the resultant video is messed up as you describe.

I'd consider it a fluke (like when my Tivo freezes occasionally when I'm setting the channel while creating a manual time-based recording. I don't have CableCards in my S3, so I set time-based recordings for the clear QAM OTA equivalents, and I've locked up the unit at least 6 times this way in the year I've owned it. Most recently just a few weeks ago.)