Saturday, April 12, 2008

MP3 cutting adventures

I just tested out Medieval CUE Splitter version 1.0. It is broken: it creates output files whose first few frames may be unplayable.

An MP3 file consists of a series of frames, and each frame stores a fixed number of audio samples (1152 for MPEG-1 Layer III). If one wants to losslessly cut an MP3 file, the cut has to be at a frame boundary. However, the data for a frame may actually start in previous frames. (In the side information, right after the 4 byte frame header and the optional 2 byte CRC, there is a 9 bit value which tells how far back the frame's data begins. This is used to implement the bit reservoir. For more information see this PDF on the MP3 format.) So if a cut happens right before a frame whose data starts in a previous frame, that frame's data will be unavailable. At best, some data will be skipped. At worst, a poorly designed decoder will crash. In order to avoid this, an MP3 cutting utility must restructure the file after the cut.
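To see whether a given frame leans on the bit reservoir, one can read that 9 bit value directly. Here is a minimal sketch in Python, assuming an MPEG-1 Layer III frame; the function name main_data_begin is just my label for the field, and the sketch does not handle MPEG-2 or MPEG-2.5 files, where the side information is laid out differently.

```python
def main_data_begin(frame):
    """Return the 9 bit back-pointer of one MPEG-1 Layer III frame.

    `frame` must start at the 4 byte frame header. A result of 0 means
    the frame's data starts inside the frame itself, so cutting right
    before this frame loses nothing.
    """
    if len(frame) < 4:
        raise ValueError("need at least the 4 byte header")
    header = int.from_bytes(frame[:4], "big")
    if header >> 21 != 0x7FF:
        raise ValueError("no frame sync at start of data")
    version = (header >> 19) & 0b11  # 0b11 means MPEG-1
    layer = (header >> 17) & 0b11    # 0b01 means Layer III
    if version != 0b11 or layer != 0b01:
        raise ValueError("this sketch only handles MPEG-1 Layer III")
    # Protection bit 0 means a 2 byte CRC follows the header.
    has_crc = ((header >> 16) & 1) == 0
    offset = 4 + (2 if has_crc else 0)
    if len(frame) < offset + 2:
        raise ValueError("frame too short to hold the side information")
    # The value is the first 9 bits of the side information.
    return (frame[offset] << 1) | (frame[offset + 1] >> 7)
```

A cutting utility could call this on the first frame after the cut point: anything other than 0 means that frame needs bytes which are about to be cut away.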

Edit 1: I found that MP3Cutter 4 is broken in the same way. I'm also left wondering if it creates a VBR header when saving a VBR MP3.

Edit 2: mp3DirectCut 2.07 also simply cuts the file like that. At least it puts a VBR header on the output file if it's VBR.

Using mp3packer with the -b 320 -r switches one can transform an MP3 file into a 320 kbit/s file where all the frames are self-contained. I'm not certain if this works on all files, but it should work on all non-corrupt MP3 files, because the data for any MP3 frame should always fit in a 320 kbit/s frame. After this one can cut with any stupid MP3 cutting utility and get a valid MP3 file. However, when the files are decoded, 529x4 bytes (presumably 529 samples of 16 bit stereo audio) go missing at the start of all files other than the first, apparently because most decoders have a 529 sample delay.
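The numbers involved are easy to check. This is a rough sketch of the arithmetic, assuming MPEG-1 Layer III and 16 bit stereo output; the function names are my own, and the frame size formula is the usual one of 144 times the bitrate divided by the sample rate, plus one byte if the padding bit is set.

```python
def frame_size_bytes(bitrate_kbps, sample_rate_hz, padding=False):
    """Size in bytes of one MPEG-1 Layer III frame, header included."""
    return 144 * bitrate_kbps * 1000 // sample_rate_hz + (1 if padding else 0)

def pcm_bytes(samples, channels=2, bytes_per_sample=2):
    """Bytes of decoded audio taken up by a number of samples."""
    return samples * channels * bytes_per_sample

print(frame_size_bytes(320, 44100))  # 1044
print(frame_size_bytes(128, 44100))  # 417
print(pcm_bytes(529))                # 2116
```

So at 44.1 kHz a 320 kbit/s frame has two and a half times the room of a 128 kbit/s frame, which is why data pulled out of the bit reservoir can be expected to fit, and the 529 lost samples come to 2116 bytes of decoded audio.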

Edit 3: Another utility I've tried is pcutmp3. The name is short for properly cut MP3, and what it does is more proper, but even with self-contained frames the encoder delay it writes can be around 2000 samples, and some players such as the iPod ignore encoder delays that big. My goal is to have a smaller and more compatible encoder delay.

Edit 4: mp3splt 2.1 also cuts MP3 files in the same broken way.

Edit 5: If the frames are first made independent using mp3packer, and every file other than the first is prefixed with a LAME header (encoder delay set to 623) followed by the last frame of the previous file, then the cut is lossless with mpg123 and Winamp.
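The value 623 is not arbitrary. Assuming a decoder that honors the LAME header skips the encoder delay plus its own 529 sample decoder delay (that is my reading of what mpg123 and Winamp do here, not something I have verified in their source), the total comes to exactly one frame, which is the extra frame that was prepended. A small sketch of the arithmetic, with names of my own choosing:

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III
DECODER_DELAY = 529       # decoder delay assumed throughout this post

def samples_skipped(encoder_delay):
    """Samples dropped at the start by a decoder that honors the header."""
    return encoder_delay + DECODER_DELAY

print(samples_skipped(623))  # 1152
print(samples_skipped(576))  # 1105
```

With 623 the decoder throws away the prepended frame's 1152 samples and nothing else, so playback starts exactly at the cut. The prepended frame still gets decoded, which warms up the decoder the same way it would have been in the uncut file.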

I also see that the stupid cutting programs strip off the LAME part of the header, leading to 576 (LAME encoder delay) extra samples of silence at the start of the first file. That should be easy to fix by putting a LAME header there.
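For reference, the LAME header stores the encoder delay and the end padding as two 12 bit numbers packed into 3 bytes. Below is a hedged sketch of just that packing; the helper names are hypothetical, and finding where those 3 bytes sit inside the header frame is left out because it depends on the rest of the header layout.

```python
def pack_delay_padding(encoder_delay, end_padding):
    """Pack two 12 bit values into 3 bytes, delay in the high bits."""
    if not (0 <= encoder_delay < 4096 and 0 <= end_padding < 4096):
        raise ValueError("both values must fit in 12 bits")
    return ((encoder_delay << 12) | end_padding).to_bytes(3, "big")

def unpack_delay_padding(data):
    """Inverse of pack_delay_padding: returns (encoder_delay, end_padding)."""
    if len(data) != 3:
        raise ValueError("expected exactly 3 bytes")
    value = int.from_bytes(data, "big")
    return value >> 12, value & 0xFFF

print(pack_delay_padding(576, 0).hex())  # 240000
```

The 12 bit limit means the largest delay the header can express is 4095 samples.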

Note that the splitting process I propose should be entirely reversible. You might not end up with the MP3 file you started with, but you'd end up with the same MP3 data within it and it would lead to the same output. Lossless transformations should be reversible like that.

Friday, April 04, 2008

Another reason why the Windows console is worthless

If I view a batch file in a windowed text editor, I see é (lower case e with acute accent), but if I view it from the command prompt I see Ú (upper case u with acute accent). If I run the batch file, the character is also interpreted as Ú, which means the command fails because it is given the wrong filename.

It seems this is happening because the console uses code page 850 while graphical applications use ISO-8859-1 (strictly speaking Windows-1252, which agrees with ISO-8859-1 for this character), and they're quite different: the byte 0xE9 is é in one and Ú in the other. NTFS filenames are Unicode UTF-16 and they are translated to code page 850 for the console and ISO-8859-1 for the GUI. Sometimes, such as when pasting from the GUI to a console window, the translation works. In other situations, such as copying from the console window and pasting into the GUI, it fails. What a horrible kludge!
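The mismatch is easy to reproduce outside Windows. Here is a small Python sketch using the standard cp850 and latin-1 codecs; the function name is mine, and it only models the case where a file saved by a GUI editor is read by the console.

```python
def as_seen_in_console(text):
    """Text saved as ISO-8859-1 by a GUI editor, read back as code page 850."""
    return text.encode("latin-1").decode("cp850")

print(as_seen_in_console("é"))  # Ú
print("é".encode("latin-1"))    # b'\xe9'
print("é".encode("cp850"))      # b'\x82'
```

So for the batch file to work from the command prompt, the é would have to be stored as byte 0x82, which a GUI editor would then display as something else.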

Fortunately, Cygwin doesn't have this problem.

UnxUtils are broken; Cygwin is great

Years ago when I wanted some Unix utilities in Windows I got them from UnxUtils. Since then I've deleted them one by one because of various bugs. For example, I just found that find(1) doesn't find the files in one directory from one starting point but finds them from one directory deeper. The find from Cygwin works fine, though one has to get used to using Unix-style paths in Windows with it.

UnxUtils actually has a newer version on the project downloads page which isn't mentioned on the home page. I could try that, but Cygwin is more widely used, tested and kept up to date. There's also MSYS from MinGW, but Cygwin is easier to install and update, and it can also compile MinGW binaries which don't depend on Cygwin.

If you want Unix tools in Windows I highly recommend Cygwin.