I wanted to capture video from an NTSC VHS tape that had been converted from a PAL tape. Of course, it would be better to capture from the original PAL tape, but I only had the NTSC tape and so I was forced to work from that.
Some fields on the NTSC tape have a break in them. The area above the break comes from the next field, and the area below the break comes from the previous field. These fields cause a slight motion stutter, and motion is perfectly smooth without them. It's clear that they were inserted to increase the field rate from PAL's 50 Hz to NTSC's 59.94 Hz.
Usually, an added field occurs every 6 fields. This means there are 5 original fields, 1 fake field, 5 original fields, and so on. However, the break slowly moves down the field. When the break moves off the bottom of the field, there are a few inserted fields that are total duplicates. After that, the break moves into the top of the next field. Before that happens, it is necessary to skip ahead 7 fields once (outputting 6 original fields) to stay in sync with the inserted field. Another way to look at this is that there is a "go back one field" signal which repeats at intervals slightly longer than 6 field times. The interval is not exactly 6 because NTSC uses 59.94 Hz instead of 60 Hz, so the ratio of field rates (59.94/50, about 1.199) is slightly less than 6/5.
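The cadence arithmetic above can be checked with a short Python sketch. It assumes the nominal NTSC field rate of 60000/1001 Hz and an exact 50 Hz PAL rate, which a real converter and VCR only approximate. The variable names are made up for this illustration.

```python
from fractions import Fraction

# Assumed nominal rates; real hardware drifts slightly from these.
NTSC_FIELD_RATE = Fraction(60000, 1001)  # about 59.94 Hz
PAL_FIELD_RATE = Fraction(50)

# Output fields per source field: slightly less than 6/5.
rate_ratio = NTSC_FIELD_RATE / PAL_FIELD_RATE

# One extra field is needed each time the output gets a whole field ahead
# of the source, so the average spacing between inserted fields is:
insert_period = NTSC_FIELD_RATE / (NTSC_FIELD_RATE - PAL_FIELD_RATE)

# The spacing is 6 most of the time; the fractional part says how often a
# spacing of 7 is needed to catch up.
gaps_per_long_gap = 1 / (insert_period - 6)

print(float(rate_ratio))         # just under 1.2
print(insert_period)             # 1200/199, just over 6
print(float(gaps_per_long_gap))  # roughly one 7 per 33 gaps
```

With these nominal rates, 199 inserted fields span exactly 1200 NTSC fields: 193 gaps of 6 and 6 gaps of 7. A real tape will deviate from this because neither clock is exact.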
Each inserted field flips the correspondence between PAL and NTSC odd and even fields, and in any case, PAL interlace can't really translate to NTSC interlace. (Imagine two overlapping combs with different teeth spacing.) Because of this, the converter had to deinterlace to 50 fps progressive, or more likely, just resize fields vertically while shifting them to compensate for vertical displacement due to interlace.
When outputting NTSC video, the converter bobbed the fields up and down to simulate interlace. However, this was done incorrectly. The image should bob up and down by one pixel at 480 lines or half a pixel at 240 lines, but instead, it bobbed up and down by one whole pixel at 240 lines. This finally made me give up on trying to recover interlaced video. Instead, I decided to capture at 320x480 and end up with 50 frames per second progressive video at 320x240.
The first processing step corrected for the bobbing. Using AviSynth, I added a 1 pixel border at the bottom of the top fields and cropped off 1 pixel from the top. This didn't lose any image data, because the top row in these fields was black.
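For illustration, the same one-line shift can be written in Python on a field stored as a list of rows. This is only a sketch of the idea in a different language, not the AviSynth script itself. The function name and the list-of-rows representation are assumptions.

```python
def shift_field_up(field, black=0):
    """Drop the first row and append a black row at the bottom.

    `field` is a list of rows, each row a list of pixel values. The height
    is unchanged, and the picture moves up by one line.
    """
    width = len(field[0])
    return field[1:] + [[black] * width]


# Tiny 4-line "top field" whose first row is black, as on the tape.
top_field = [
    [0, 0, 0],
    [10, 11, 12],
    [20, 21, 22],
    [30, 31, 32],
]
corrected = shift_field_up(top_field)
```

Only the top fields would be passed through this function. The bottom fields stay as they are, so both end up at the same vertical position.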
Once the bobbing was corrected, I used AviSynth's RGBDifferenceFromPrevious function to collect data on inserted fields. By cropping the video so that only a few lines at the top or bottom remained, I could detect when the top part of a frame came from the next frame or the bottom part came from the previous frame. By using the function on the whole image, I could detect when the switchover was in the vertical retrace interval and frames were total duplicates. In all cases, it was necessary to crop off blackness, the very edges, which are noisy, and the video head switching at the very bottom.
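The measurement on a cropped strip can be sketched in Python as follows. This is a stand-in for the AviSynth function, using a plain mean absolute difference on frames stored as lists of rows. The function names, the frame representation and the `first_row` and `num_rows` parameters are all assumptions made for the example.

```python
def mean_abs_diff(rows_a, rows_b):
    """Mean absolute pixel difference between two equally sized row lists."""
    total = 0
    count = 0
    for row_a, row_b in zip(rows_a, rows_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def top_strip_diffs(frames, first_row, num_rows):
    """Difference of a strip near the top of each frame from the previous frame.

    `first_row` skips black or noisy lines at the very top. The result has
    one entry per frame, with None for frame 0, which has no predecessor.
    """
    diffs = [None]
    for prev, cur in zip(frames, frames[1:]):
        strip_prev = prev[first_row:first_row + num_rows]
        strip_cur = cur[first_row:first_row + num_rows]
        diffs.append(mean_abs_diff(strip_prev, strip_cur))
    return diffs


# Three tiny frames: the strip (rows 1 and 2) changes between frames 0 and 1
# but is identical in frames 1 and 2, while the bottom row keeps changing.
f0 = [[0, 0], [10, 10], [20, 20], [30, 30]]
f1 = [[0, 0], [50, 50], [60, 60], [70, 70]]
f2 = [[0, 0], [50, 50], [60, 60], [90, 90]]
diffs = top_strip_diffs([f0, f1, f2], first_row=1, num_rows=2)
```

A near-zero entry at frame n means the strip did not change between frames n-1 and n, which is the signature being looked for. On a real capture the value will not reach zero because of tape noise, so it has to be compared with its neighbours rather than with a fixed threshold.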
After a bit of experimentation, I chose to use data from frame tops, and to process it using a program which decides whether each duplicate frame occurs 5, 6 or 7 frames after the previous one. A gap of 5 never actually occurs, but allowing it lets the program resynchronize if it chooses 7 when it should have chosen 6. After this, another simple program replaced the "7, 5" combinations with "6, 6" and counted the lengths of the spans of 6 that occurred before a 7.
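The two programs described above might look roughly like this in Python. This is a reconstruction from the description, not the original code: it assumes one difference score per frame where a low score marks a duplicate, and it picks the lowest score among the three candidate positions. All function names are hypothetical.

```python
def choose_gaps(diffs, start):
    """Choose each gap to the next duplicate as 5, 6 or 7 frames.

    `diffs[i]` is a difference score for frame i (low means "looks like a
    duplicate"), and `start` is the index of a known duplicate.
    """
    gaps = []
    pos = start
    while pos + 7 < len(diffs):
        # Candidates are listed as (6, 7, 5) so that ties favour 6.
        gap = min((6, 7, 5), key=lambda g: diffs[pos + g])
        gaps.append(gap)
        pos += gap
    return gaps


def merge_resyncs(gaps):
    """Replace each adjacent "7, 5" pair with "6, 6"."""
    out = []
    i = 0
    while i < len(gaps):
        if gaps[i] == 7 and i + 1 < len(gaps) and gaps[i + 1] == 5:
            out.extend([6, 6])
            i += 2
        else:
            out.append(gaps[i])
            i += 1
    return out


def spans_of_six(gaps):
    """Count how many 6s occur before each 7. Any leftover 5 is ignored."""
    spans = []
    run = 0
    for gap in gaps:
        if gap == 6:
            run += 1
        elif gap == 7:
            spans.append(run)
            run = 0
    return spans


# Synthetic scores with duplicates at frames 0, 6, 12, 19 and 25.
scores = [100] * 30
for frame in (0, 6, 12, 19, 25):
    scores[frame] = 0
gaps = choose_gaps(scores, start=0)
cleaned = merge_resyncs([6, 6, 7, 5, 6, 7, 6])
spans = spans_of_six(cleaned)
```

In the synthetic example, `gaps` comes out as 6, 6, 7, 6. The hand-written list shows the cleanup step: the mistaken "7, 5" becomes "6, 6", leaving a span of five 6s before the real 7.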
The resulting data was good, but it had some glitches. I used a spreadsheet to work with it. There, I automatically removed some minor jitter and manually fixed a few larger glitches. When I tried to fit a line to the data, I found that it was actually a hockey stick curve with the bend at the start. This was probably because oscillators drifted during warmup and then stabilized. I considered trying to fit some kind of function to the graph, but minor fixes were sufficient.
Once I was satisfied with the data, I wrote another simple program which created a VirtualDub script. First, I had it keep only the frames I wanted to remove, to check that I was indeed removing the frames which had tears in them. Then I created the final script, which removed those frames.
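A script generator along these lines can be sketched in Python. The `VirtualDub.subset.Clear()` and `VirtualDub.subset.AddRange(start, length)` commands are written here as they appear in VirtualDub's saved processing settings, but treat the exact names as an assumption and check them against your version. The helper names are hypothetical, and the sketch assumes every frame number to remove lies inside the video.

```python
def duplicate_positions(start, gaps):
    """Frame numbers of the inserted frames, given the first one and the gaps."""
    positions = [start]
    for gap in gaps:
        positions.append(positions[-1] + gap)
    return positions


def kept_ranges(total_frames, remove):
    """Return (start, length) ranges covering every frame not in `remove`."""
    ranges = []
    range_start = 0
    for frame in sorted(remove):
        if frame > range_start:
            ranges.append((range_start, frame - range_start))
        range_start = frame + 1
    if range_start < total_frames:
        ranges.append((range_start, total_frames - range_start))
    return ranges


def subset_script(ranges):
    """Format ranges as VirtualDub subset commands (command names assumed)."""
    lines = ["VirtualDub.subset.Clear();"]
    for start, length in ranges:
        lines.append("VirtualDub.subset.AddRange(%d,%d);" % (start, length))
    return "\n".join(lines)


remove = duplicate_positions(3, [6, 7])          # frames 3, 9 and 16
final_script = subset_script(kept_ranges(20, remove))
check_script = subset_script([(frame, 1) for frame in remove])
```

Here `check_script` selects only the frames marked for removal, which corresponds to the verification pass, and `final_script` selects everything else.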
After the video was finished, it was time to synchronize the audio. This can be done either by resampling the audio or by changing the frame rate. I chose to resample the audio. For perfect sync, I could have used the data I generated to create a variable sample rate, but I instead just used a fixed sample rate based on a linear approximation. The errors were small enough to be unnoticeable.
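The fixed resampling factor follows from the frame counts before and after removal. The Python sketch below shows the arithmetic under the assumption that the capture ran at the nominal 60000/1001 fields per second and the result plays at exactly 50 frames per second. The function name and the frame counts are hypothetical, not figures from the actual tape.

```python
from fractions import Fraction


def audio_stretch(frames_in, frames_out, capture_rate, output_rate=Fraction(50)):
    """Factor by which the audio must be lengthened to match the new video."""
    duration_in = Fraction(frames_in) / capture_rate
    duration_out = Fraction(frames_out) / output_rate
    return duration_out / duration_in


capture_rate = Fraction(60000, 1001)

# At exactly nominal rates, 1200 captured frames lose 199 and no stretch is needed.
nominal = audio_stretch(1200, 1001, capture_rate)

# If one more frame had been removed, the audio would have to be slightly shorter.
shorter = audio_stretch(1200, 1000, capture_rate)

# Number of samples the resampled audio should have, for 48 kHz audio
# covering the 1200 captured frames.
samples_in = round(48000 * 1200 / capture_rate)
samples_out = round(samples_in * shorter)
```

A factor different from 1 reflects how far the converter's clock was from nominal. Using a single factor for the whole tape is the linear approximation mentioned above.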
Finally, it was time to encode the audio and video. I used a low (high quality) CRF in x264, because at 320x240 the video was quite sharp and I didn't want to degrade it.