Thursday, October 12, 2006

Recovering a corrupted .wav file with Python

Last night, I lost the recording of the Procrastination Cure workshop I was giving when PowerGramo, the recording software I use with Skype, crashed.  Luckily, a participant had a pair of recordings he had made with Gizmo, which he was using to call in to the teleconference.

Unfortunately, the reason he had two recordings was that his machine had crashed about two and a half hours into the workshop, so the first recording was corrupted and couldn't be played back in Media Player.  I tried several audio editors to recover it, but it was a no-go.  The only thing I found on the net about recovering corrupted .wav files was this Gizmo FAQ answer, so it's apparently a common problem.  I tried the audio editor they recommended there, but it didn't work either.

So it was time to pull out the big guns.  I found this handy .wav format guide via Wikipedia, and then whipped out my Python interpreter.  After a little bit of playing around, I figured out that the problem was that the program writing the file simply hadn't filled in the length fields in the header structure.  So I manually recovered the file, using steps roughly equivalent to this:

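from struct import pack, unpack

# Canonical 44-byte header: RIFF chunk, 16-byte 'fmt ' chunk, 'data' chunk
wav_header = "4si4s4sihhiihh4si"

# filename is the path to the corrupted .wav file
f = open(filename, 'rb+')
data = list(unpack(wav_header, f.read(44)))
assert data[0]==b'RIFF'
assert data[2]==b'WAVE'
assert data[3]==b'fmt '
assert data[4]==16
assert data[-2]==b'data'
assert data[1]==data[-1]+36

# Work out the real sizes from the length of the file on disk
f.seek(0,2)
filesize = f.tell()
datasize = filesize - 44

data[-1] = datasize       # length of the 'data' chunk
data[1]  = datasize+36    # RIFF length: file size minus 8

# Overwrite the header in place
f.seek(0)
f.write(pack(wav_header, *data))
f.close()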
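
For anyone who wants to reuse this, here is a minimal sketch of the same repair wrapped in a function. The names fix_wav_lengths and WAV_HEADER are made up for this example, and it assumes the file has the plain 44-byte PCM header described above (a 16-byte "fmt " chunk followed directly by the "data" chunk). Files with extra chunks would need a real chunk walker, and it is probably wise to work on a copy.

```python
import struct

# Canonical 44-byte PCM header: RIFF chunk, 16-byte "fmt " chunk, "data" chunk.
# "<" forces little-endian with no padding, which is how WAV stores integers.
WAV_HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")


def fix_wav_lengths(path):
    """Rewrite the RIFF and data chunk sizes from the actual file size.

    Hypothetical helper; returns the new data chunk size in bytes.
    """
    with open(path, "rb+") as f:
        fields = list(WAV_HEADER.unpack(f.read(WAV_HEADER.size)))

        # Refuse to touch anything that is not the simple layout assumed here.
        tags = (fields[0], fields[2], fields[3], fields[-2])
        if tags != (b"RIFF", b"WAVE", b"fmt ", b"data"):
            raise ValueError("not a canonical 44-byte PCM WAV header")
        if fields[4] != 16:
            raise ValueError("unexpected fmt chunk size")

        f.seek(0, 2)                            # jump to the end of the file
        data_size = f.tell() - WAV_HEADER.size  # everything after the header
        fields[-1] = data_size                  # length of the "data" chunk
        fields[1] = data_size + 36              # RIFF length: file size minus 8

        f.seek(0)                               # back to the start
        f.write(WAV_HEADER.pack(*fields))       # overwrite the header in place
    return data_size
```

Calling fix_wav_lengths("copy_of_recording.wav") on a copy of the broken file (the filename here is only a placeholder) should leave the audio data untouched and change only the two length fields.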

And now the file plays. Yay! Unfortunately, the recordings aren't of very good quality, but there's nothing that Python can do about that, as far as I know. ;-)