Friday, February 06, 2015

SDL 2 text input and Emscripten

SDL 1 used SDL_EnableUNICODE() to enable and disable text input. It started off disabled, and you needed to call SDL_EnableUNICODE() if you depended on the unicode member of SDL_keysym.

SDL 2 has a different text input API which is designed to accommodate international users and touchscreen devices. There, SDL video initialization turns on text input if an on-screen keyboard is not needed. This is normally harmless, but it causes a problem in the Emscripten port. 

Normally, many keys trigger browser actions. Those actions should be prevented if you want to use the same keys in an SDL application. In Firefox, it is possible to prevent browser actions from keypress events, but in Chrome, Safari and IE they need to be prevented in keydown events. However, preventing default actions from keydown events prevents keypress events which are needed to get text characters.

If you need to prevent browser actions and don't need SDL 2 text input, simply disable text input after initializing SDL video:
if (SDL_IsTextInputActive()) SDL_StopTextInput();

You can find more information about these JavaScript events on quirks.org. They also have a test page which you can use to examine their behaviour on different browsers.

Tuesday, February 03, 2015

Fixing the hard problem in Em-DOSBox using Emscripten emterpreter sync

If you just want to use this

Use Emscripten incoming until a new version is released, after 1.29.4. Configure with: ./configure --enable-sync --with-sdl2 . The dosbox.html.mem file that is produced must be in the same directory as dosbox.js. The .mem file is big, but will compress well. Ensure your web server can serve files in compressed format. Ideally pre-compress them so they don't need to be compressed repeatedly.

Technical explanation

I previously wrote about the hard problem with porting DOSBox to Emscripten. Basically, DOSBox cannot easily work as one Emscripten main loop. I had an idea about how I could modify functions to make them resumable, but it would be messy and many functions would need modifications.

Fortunately, around the same time I learned about a new Emscripten feature: emterpreter sync. Instead of compiling to JavaScript, Emscripten can compile functions to a bytecode, and include an interpreter for that bytecode. Code running in emterpreter can be interrupted using emscripten_sleep(). This will call JavaScript setTimeout() and stop execution. The timeout function will then restore state as if the emscripten_sleep() call returned.

There is a catch here. Code running in emterpreter will run much more slowly than asm.js JavaScript. It is far too slow for DOSBox. The solution is to only use emterpreter for functions which could be interrupted by emscripten_sleep(). This means all functions that may be on the call stack when calling emscripten_sleep(). JavaScript functions must not be in the call stack, because their state cannot be resumed.

Once again, the way DOSBox is structured is a problem. The CPU interpreter can recursively call itself, leading to emscripten_sleep(), but emterpreter would make it too slow. A reasonable compromise is possible there: preventing sleep during those nested calls. They are typically CPU exceptions which should return quickly, so there is no need for sleep there.

The main remaining problem is the DOSBox paging code. It recursively runs the CPU interpreter, and only returns when execution returns to where the page fault occurred. This is a problem if execution never returns there, or another page fault happens and they don't return in a last in first out order. This is also a problem with ordinary DOSBox, but it is worse with Em-DOSBox. DOSBox would just run slowly and Em-DOSBox would hang the browser. A timeout check is used to prevent browser hangs.

Finding all the functions which require emterpreter was made easier by linking with -profiling, then using csplit dosbox.js '/^function /' '{*}' to split up functions and using grep to search for calls. Using console.trace() at the start of emscripten_sleep() is also helpful. If an abort occurs after a resume, the previous backtrace will show what function requires emterpreter. Note that -O3 is required, because otherwise functions may have too many local variables for emterpreter.

Originally emterpreter only supported a blacklist of functions which should not use emterpreter. That was unusable in this situation due to the number of functions and changing name mangling of some. Alon Zakai helpfully added a whitelist, so functions which need emterpreter can be listed instead. Right now, there are 40 functions on that list. That's a small number compared to the thousands of functions in Em-DOSBox, but manually transforming all those functions to make them resumable would be a lot of work. It would also complicate use of new changes from SVN or applying of DOSBox patches. Based on performance and compatibility with DOS games, emterpreter sync was the right choice.

Sunday, January 25, 2015

The DOSBox FPU emulator and ole2disp.dll

Running Netscape 3 in Windows 3.11 caused Em-DOSBox to fail with an "FPU stack underflow". At first I couldn't reproduce this in DOSBox in Linux, but then when I recompiled with --disable-fpu, I reproduced it.

DOSBox has two FPU (floating point unit) emulators. One is used by default when running DOSBox on x86 hardware. It uses actual x86 FPU instructions, and it should provide the full 80-bit long double precision. The other one does not require x86 hardware, and uses standard doubles. That means it does not give the full precision one would expect from a real FPU.

In OLE2DISP.DLL 2.3.3027.1, there is a check for precision loss. The code loads a 64-bit integer using FILD, stores a 10 byte BCD number using FBSTP, and then tests the last four digits. If the digits aren't correct, it pops another value, causing the stack underflow.

Normally, this would cause an FPU underflow exception to be noticed by Windows, but DOSBox doesn't pass on those exceptions and instead quits emulation. This is of course an inaccuracy in CPU emulation. DOSBox isn't a very good general purpose x86 CPU emulator, partly because it is a hybrid of an operating system and CPU emulator. DOSBox is designed for running old games and good at that, but not general purpose emulation. I'm tempted to try to port Bochs to make a better general purpose emulator available. For now, I simply disabled the FPU stack underflow check, because that allows many Windows 3.x apps to work.

Here is an image of execution diverging after the test. I used this to find the location where the test was. Note that due to relocations, searching for this in files can be tricky. The image at the right is from the very helpful http://ref.x86asm.net/coder32.html table.



Finally, here's some x86 assembler code performing this test. This fails in DOSBox 0.74 in 64 bit Linux, but works in my SDL 2 branch, which is based on r3869. This is probably due to r3851. I hadn't written a DOS assembler program in so long, so this was fun:

; FPU test like OLE2DISP.DLL 2.3.3027.1
; Build using: nasm ole2disp.asm  -o ole2disp.com
segment code
    org 100h

; This is the test
    wait
    fild qword [input]
    wait
    fbstp tword [output]
    nop
    wait

    mov dx, header
    mov ah, 9
    int 21h

; This displays output
    mov cx, 10
    mov si, output+9
    std
outloop:
    mov al, [si]
    shr al, 1
    shr al, 1
    shr al, 1
    shr al, 1
    call outdig
    lodsb
    call outdig
    loop outloop
    int 20h

; Display a single hex digit.
outdig:
    xor ah, ah
    and al, 0fh
    mov bx, hex
    add bx, ax
    mov dl, byte [bx]
    mov ah, 2
    int 21h
    ret
 
segment data
input:   times 6 db 0
         db 0dfh, 00dh
output:  times 10 db 0
hex:     db "0123456789ABCDEF"
header:  db "FILD, FBSTP FPU test like OLE2DISP.DLL 2.3.3027.1", 13, 10
         db "00999517642299539456 is correct result. "
         db "Last 4 digits of following must match:", 13, 10, '$'




Friday, January 09, 2015

Introducing an SDL 2 version of Em-DOSBox


When I first got my Emscripten port of DOSBox working, I was happy with the sound. It was not perfect. There were occasional small glitches at some CPU intensive moments. However, it was perfect most of the time in Firefox and pretty good in Chrome. (IE 11 does not support sound and IE 12 will hopefully support it in the future.) Then Firefox discontinued its Mozilla Audio Data API, and Emscripten began using the Web Audio API in both. That made Firefox sound significantly worse. When I recompiled Em-DOSBox with a recent version of Emscripten, sound was very bad.

I always had doubts about the way Emscripten's SDL plays audio via the Web Audio API. I don't think AudioBufferSource nodes are designed to be played one after another in a continuous stream. If it worked well, I wouldn't care about theoretical correctness, but it doesn't seem to work very well, so this may be a problem. I think the obvious way to play continuous sound from JavaScript is using a ScriptProcessorNode. (Those are depreciated, but their replacement, Audio Workers, are not implemented yet in browsers.)

[ Update: The SDL 2 version described below is old. SDL 2 is now part of Emscripten ports. To use it, simply add -s USE_SDL=2 to the command line arguments and Emscripten will take care of the rest. ]

I just found that SDL 2 has been ported to Emscripten. It is an actual port of SDL itself, not an SDL-compatible library like what is currently found in Emscripten. Looking at its audio code, I saw it using ScriptProcessorNode, and I decided to give it a try.

There is DOSBox SDL 2 patch by NY00123. It's based on a later version of DOSBox, but Git makes dealing with such things fun. First I merged with that revision of DOSBox, and then I merged the patch.

At first, sound was totally terrible. Then I found that the SDL callback needed float data, because ScriptProcessorNode needs floats. When I made that change, the sound was pretty good. It still isn't as good as old Firefox with the Audio Data API, but I think it's acceptable.

There is a problem with red and blue being swapped, and DOSBox won't run in IE, but SDL 2 for Emscripten definitely shows some promise. It may even be faster than Emscripten SDL, because more of it consists of C code which gets compiled into asm.js. Here is the Em-DOSBox branch.

The hard problem with porting DOSBox to Emscripten.

Getting DOSBox to successfully run many programs in a web browser wasn't hard, thanks to Emscripten. Improving performance was a bit harder. Here I'm going to describe the hardest problem, which still remains unsolved. Because of it, some programs cause web browsers to hang, and the interactive command prompt is unusable.

JavaScript code must return to the browser regularly. That's the only way the browser can regain control so it can update the display and handle new input. If JavaScript code doesn't return, the page or the whole browser appear to hang. The script may be producing output, but the user won't see it until the browser regains control. There isn't any function you can call to let the browser do its work. Your functions literally must return.

DOSBox emulates a PC running DOS using a mix of x86 assembly running under CPU emulation and C++ code running on the host. This can result in deeply nested calls. Here is the call stack from a program reading from the keyboard via the DOS device CON:

#0  DOSBOX_RunMachine () at dosbox.cpp:244
#1  0x000000000040e1f4 in CALLBACK_RunRealInt (intnum=22 '\026')
    at callback.cpp:106
#2  0x00000000004a1ed5 in device_CON::Read (this=0x3a2c430,
    data=0x7fffffffa15d "", size=0x7fffffffa12a) at dev_con.h:66
#3  0x00000000004a2f9b in DOS_Device::Read (this=0x3a4c1e0,
    data=0x7fffffffa15d "", size=0x7fffffffa12a) at dos_devices.cpp:67
#4  0x00000000004a73b7 in DOS_ReadFile (entry=0, data=0x7fffffffa15d "",
    amount=0x7fffffffa176) at dos_files.cpp:371
#5  0x000000000049e429 in DOS_21Handler () at dos.cpp:196
#6  0x00000000004073cf in Normal_Loop () at dosbox.cpp:135
#7  0x00000000004077bb in DOSBOX_RunMachine () at dosbox.cpp:244
#8  0x000000000040e1f4 in CALLBACK_RunRealInt (intnum=33 '!')
    at callback.cpp:106
#9  0x00000000006a8ecc in DOS_Shell::Execute (this=0x3a4c2c0,
    name=0x7fffffffbaf0 "debug", args=0x7fffffffcbe5 "") at shell_misc.cpp:492
#10 0x00000000006a0613 in DOS_Shell::DoCommand (this=0x3a4c2c0,
    line=0x7fffffffcbe5 "") at shell_cmds.cpp:153
#11 0x000000000069d96f in DOS_Shell::ParseLine (this=0x3a4c2c0,
    line=0x7fffffffcbe0 "debug") at shell.cpp:251
#12 0x000000000069ded8 in DOS_Shell::Run (this=0x3a4c2c0) at shell.cpp:329
#13 0x000000000069e8d2 in SHELL_Init () at shell.cpp:653
#14 0x00000000006978a8 in Config::StartUp (this=0x7fffffffddc0)
    at setup.cpp:853


There you can see the program running at #6. Normal_Loop() usually keeps calling the CPU emulator to run x86 code, but some instructions cause the emulator to quit, returning a value that tells Normal_Loop() to call a different function. In this case, the CPU emulator encountered int 21h, the main way to access DOS services. That is why Normal_Loop() called DOS_21Handler() at #5. After that, things are happening directly via C++ code, without CPU emulation. Then at #2, device_CON::Read() is calls int 16h (22 decimal), the BIOS interrupt for the keyboard. It calls either ah=0 or ah=10h functions, both of which wait for a key to be pressed and then return its value. The interrupt handler is implemented in x86 code, so you see DOSBOX_RunMachine() at #0. There is also a loop in device_CON::Read(), which keeps calling int 16h until it has the requested number of characters.

Such inner loops at #2 and #0 are not compatible with JavaScript. It's not possible to just keep waiting for input like that. Instead, the functions need to return, and then run again in the next iteration of the Emscripten main loop. That would involve re-establishing the entire call stack, with parameters and local variables.

Actually, my port has a shortcut. That backtrace is normal DOSBox running in Linux. My port establishes the Emscripten main loop at #7, in DOSBOX_RunMachine(). Because of that, there is no need to worry about #14 through #7. Because of this, when the running program exits, you can't get back to the DOS prompt. That's okay for now, because the interactive command prompt can't work anyways. It similarly gets stuck in a loop..

This is not impossible to fix, but I can't imagine a nice elegant fix yet. Adding code to re-establish that call stack on the next main loop iteration would be messy. It would also degrade performance. Maybe it would be possible to strip out DOS emulation and run FreeDOS instead? Currently DOSBox does not have a disk controller and relies on its DOS emulation to access files. The DOXBox-X branch adds an IDE controller.

Monday, January 05, 2015

Access to physical disk devices in Windows 7

I just noticed that PhotoViewer, the program for putting photos on the first digital photo frame I hacked, did not need to run as Administrator. It simply opens the drive using CreateFileA() and a path like "\\.\D:". I could use the same sort of path to open another USB storage device, so Windows isn't somehow recognizing the photo frame and allowing this for compatibility.

The CreateFileA documentation tells you to use a path like "\\.\PhysicalDrive1", but that only works for me if run as Administrator.

There is a difference between the two paths: "\\.\D:" is a partition and "\\.\PhysicalDrive1" is an entire drive. However, if the drive is not partitioned, the two are effectively the same.

Cygwin works the same way. The physical drive is "/dev/sdb", requiring Administrator access, and the partition is "/dev/sdb1", not requiring administrator access. The first partition exists even if the drive is not partitioned, and then it is the same as the whole drive.

Opening for writing can fail sometimes if the device is in use, but otherwise, even a non-admin user can write to sectors. I can have a FAT32 formatted USB drive open in an Explorer window and simultaneously write to sectors altering the files there. It does not work with an NTFS formatted drive, so this is not as insecure as it might seem at first.


Saturday, December 20, 2014

The earliest sunset is not on the winter solstice

I thought the earliest sunset occurs on the winter solstice. That's not true. As you can see here, the sun set at 5 PM for the first half of December, and started setting later after that. It will continue to rise later every day, until a plateau in early January.

Monday, December 01, 2014

Music visualization using an RGB lamp

I wrote Winamp and Audacious music visualization plugins for my RGB lamp. The code is published on GitHub. Pitch is represented by colour, and loudness is represented by intensity. This post explains the method in detail.

The code starts with an array of FFT bin values. This is provided by the music player via its visualization plugin API. If only waveform data was available, the FFT could be computed using the FFTW library.

Each value in that array corresponds to sound intensity in a particular range of frequencies. Those ranges are all of equal width in hertz, but humans perceive pitch in a roughly logarithmic way. This means perceived pitch varies a lot between entries at the start of the array, and a little at the end of the array. This is handled by makeramps.rb. The frequency at the midpoint of each bin is mapped to pitch using the MIDI pitch formula. This allows pitch to be mapped to colour in a sort of linear way. Here's how the various FFT bins sum into the different colours:
The lowest frequency bin sums to red. Increasing frequencies sum less into red and more into green. The pitch midpoint sums into green. Then increasing frequencies sum less into green and more into blue. Finally, the highest frequency sums into blue. The code uses a single table called green_tab, with values corresponding to the fraction of the bin that is to be summed into green. Before the pitch midpoint, one minus the table value times the bin is summed into red, and after the midpoint the same is summed into blue.

Before summing, bins can be scaled based on human ear audio sensitivity at that frequency. This ensures that frequencies where the human ear is less sensitive make less of a contribution. I found an ISO 226 Equal-Loudness-Level Contour Signal script for MATLAB which can run in GNU Octave. This isn't very important though.

When summing bins, it is important to sum power, not amplitude. Summing amplitudes does not make sense, but fortunately they are easy to convert to power: just square them before summing. Afterwards, they are converted back to amplitude via square root, because they are to be used for setting PWM values.

If the output of this algorithm was used to directly set lamp brightness, there would be a lot of rapid brightness changes. I find this unpleasant, so I added some smoothing. The code calculates an exponential moving average for each colour, but with more smoothing for decreases and less smoothing for increases. The faster response to increases ensures the lamp will appear responsive to loud sounds.

After all this, the colours still did not appear to be in balance. To fix this, I calibrated using pink noise, and created scale factors for red and blue which cause white light when playing pink noise. It may be better to calibrate using music, but that would be more difficult and the resulting calibration might be biased toward the music used to calibrate it.

I'm posting about this because of how I'm satisfied with the end result. I haven't done any tweaking of this algorithm in many months, and I still really like the effect.

Recently I have been experimenting with an on-screen visualization plugin inspired by this. Each moment is displayed on a horizontal line with stereo determining horizontal position. Pitch sets colour, intensity sets brightness, and very intense sounds cause a wider area to be coloured.  Old data scrolls down. This part is still a work in progress, and I haven't published that code yet.