Sunday, January 25, 2015

The DOSBox FPU emulator and ole2disp.dll

Running Netscape 3 in Windows 3.11 caused Em-DOSBox to fail with an "FPU stack underflow". At first I couldn't reproduce this in DOSBox in Linux, but then when I recompiled with --disable-fpu, I reproduced it.

DOSBox has two FPU (floating point unit) emulators. One is used by default when running DOSBox on x86 hardware. It uses actual x86 FPU instructions, and it should provide the full 80-bit long double precision. The other one does not require x86 hardware and uses standard 64-bit doubles. A double has a 53-bit significand rather than the 64-bit significand of the 80-bit format, so it does not give the full precision one would expect from a real FPU.

In OLE2DISP.DLL 2.3.3027.1, there is a check for precision loss. The code loads a 64-bit integer using FILD, stores it as a 10-byte BCD number using FBSTP, and then tests the last four digits. If the digits aren't correct, it pops another value, causing the stack underflow.
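To see how a double-based conversion can get those last digits wrong, here is a minimal Python model. It is only a sketch: bcd_digits_double() is a hypothetical digit-peeling loop that uses 64-bit doubles, not DOSBox's actual FBSTP code. The input value is the one from the test program further down (bytes 00 00 00 00 00 00 DF 0D, little-endian).

```python
import math

# Value loaded by FILD in the test program: 0x0DDF000000000000.
VALUE = 0x0DDF << 48

def bcd_digits_exact(n, count=20):
    """Decimal digits of n, most significant first, using integer arithmetic."""
    return [int(c) for c in str(n).zfill(count)]

def bcd_digits_double(n, count=20):
    """Hypothetical model: peel off decimal digits using 64-bit double arithmetic."""
    x = float(n)  # exact here: the value needs only 12 significant bits
    digits = []
    for _ in range(count):
        q = float(math.floor(x / 10.0))    # quotient is rounded to a double
        digits.append(int(x - 10.0 * q))   # so the remainder can be wrong
        x = q
    return digits[::-1]

exact = bcd_digits_exact(VALUE)
approx = bcd_digits_double(VALUE)
print("".join(map(str, exact)))     # 00999517642299539456
print(exact[-4:] == approx[-4:])    # False
```

The input itself fits in a double exactly. The error appears in the division by ten: the quotient is larger than 2^53, so it gets rounded, and the remainder computed from it no longer gives the right digit.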

Normally, this would cause an FPU underflow exception to be noticed by Windows, but DOSBox doesn't pass on those exceptions and instead quits emulation. This is of course an inaccuracy in CPU emulation. DOSBox isn't a very good general-purpose x86 CPU emulator, partly because it is a hybrid of an operating system and a CPU emulator. DOSBox is designed for running old games and is good at that, but not at general-purpose emulation. I'm tempted to try to port Bochs to make a better general-purpose emulator available. For now, I simply disabled the FPU stack underflow check, because that allows many Windows 3.x apps to work.

Here is an image of execution diverging after the test. I used this to find the location where the test was. Note that due to relocations, searching for this in files can be tricky. The image at the right is from the very helpful http://ref.x86asm.net/coder32.html table.



Finally, here's some x86 assembler code performing this test. This fails in DOSBox 0.74 in 64-bit Linux, but works in my SDL 2 branch, which is based on r3869. This is probably due to r3851. I hadn't written a DOS assembler program in a long time, so this was fun:

; FPU test like OLE2DISP.DLL 2.3.3027.1
; Build using: nasm ole2disp.asm  -o ole2disp.com
segment code
    org 100h

; This is the test
    wait
    fild qword [input]
    wait
    fbstp tword [output]
    nop
    wait

    mov dx, header
    mov ah, 9
    int 21h

; This displays output
    mov cx, 10
    mov si, output+9
    std
outloop:
    mov al, [si]
    shr al, 1
    shr al, 1
    shr al, 1
    shr al, 1
    call outdig
    lodsb
    call outdig
    loop outloop
    int 20h

; Display a single hex digit.
outdig:
    xor ah, ah
    and al, 0fh
    mov bx, hex
    add bx, ax
    mov dl, byte [bx]
    mov ah, 2
    int 21h
    ret
 
segment data
input:   times 6 db 0
         db 0dfh, 00dh
output:  times 10 db 0
hex:     db "0123456789ABCDEF"
header:  db "FILD, FBSTP FPU test like OLE2DISP.DLL 2.3.3027.1", 13, 10
         db "00999517642299539456 is correct result. "
         db "Last 4 digits of following must match:", 13, 10, '$'




Friday, January 09, 2015

Introducing an SDL 2 version of Em-DOSBox


When I first got my Emscripten port of DOSBox working, I was happy with the sound. It was not perfect. There were occasional small glitches at some CPU intensive moments. However, it was perfect most of the time in Firefox and pretty good in Chrome. (IE 11 does not support sound and IE 12 will hopefully support it in the future.) Then Firefox discontinued its Mozilla Audio Data API, and Emscripten began using the Web Audio API in both. That made Firefox sound significantly worse. When I recompiled Em-DOSBox with a recent version of Emscripten, sound was very bad.

I always had doubts about the way Emscripten's SDL plays audio via the Web Audio API. I don't think AudioBufferSource nodes are designed to be played one after another in a continuous stream. If it worked well, I wouldn't care about theoretical correctness, but it doesn't seem to work very well, so this may be a problem. I think the obvious way to play continuous sound from JavaScript is using a ScriptProcessorNode. (Those are deprecated, but their replacement, Audio Workers, is not implemented yet in browsers.)

[ Update: The SDL 2 version described below is old. SDL 2 is now part of Emscripten ports. To use it, simply add -s USE_SDL=2 to the command line arguments and Emscripten will take care of the rest. ]

I just found that SDL 2 has been ported to Emscripten. It is an actual port of SDL itself, not an SDL-compatible library like what is currently found in Emscripten. Looking at its audio code, I saw it using ScriptProcessorNode, and I decided to give it a try.

There is a DOSBox SDL 2 patch by NY00123. It's based on a later version of DOSBox, but Git makes dealing with such things fun. First I merged with that revision of DOSBox, and then I merged the patch.

At first, sound was totally terrible. Then I found that the SDL callback needed float data, because ScriptProcessorNode needs floats. When I made that change, the sound was pretty good. It still isn't as good as old Firefox with the Audio Data API, but I think it's acceptable.
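The kind of conversion involved can be sketched as follows. This is an illustration in Python, not the change made in Em-DOSBox. It assumes the mixer produces signed 16-bit samples and uses the common convention of dividing by 32768 to reach the float range Web Audio expects. The function name is hypothetical.

```python
def s16_to_float(samples):
    """Convert signed 16-bit PCM samples to floats in [-1.0, 1.0)."""
    # Assumption: scale by 32768 so that -32768 maps exactly to -1.0.
    return [s / 32768.0 for s in samples]

print(s16_to_float([0, 16384, -32768]))  # [0.0, 0.5, -1.0]
```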

There is a problem with red and blue being swapped, and DOSBox won't run in IE, but SDL 2 for Emscripten definitely shows some promise. It may even be faster than Emscripten SDL, because more of it consists of C code which gets compiled into asm.js. Here is the Em-DOSBox branch.

The hard problem with porting DOSBox to Emscripten.

Getting DOSBox to successfully run many programs in a web browser wasn't hard, thanks to Emscripten. Improving performance was a bit harder. Here I'm going to describe the hardest problem, which still remains unsolved. Because of it, some programs cause web browsers to hang, and the interactive command prompt is unusable.

JavaScript code must return to the browser regularly. That's the only way the browser can regain control so it can update the display and handle new input. If JavaScript code doesn't return, the page or the whole browser appears to hang. The script may be producing output, but the user won't see it until the browser regains control. There isn't any function you can call to let the browser do its work. Your functions literally must return.

DOSBox emulates a PC running DOS using a mix of x86 assembly running under CPU emulation and C++ code running on the host. This can result in deeply nested calls. Here is the call stack from a program reading from the keyboard via the DOS device CON:

#0  DOSBOX_RunMachine () at dosbox.cpp:244
#1  0x000000000040e1f4 in CALLBACK_RunRealInt (intnum=22 '\026')
    at callback.cpp:106
#2  0x00000000004a1ed5 in device_CON::Read (this=0x3a2c430,
    data=0x7fffffffa15d "", size=0x7fffffffa12a) at dev_con.h:66
#3  0x00000000004a2f9b in DOS_Device::Read (this=0x3a4c1e0,
    data=0x7fffffffa15d "", size=0x7fffffffa12a) at dos_devices.cpp:67
#4  0x00000000004a73b7 in DOS_ReadFile (entry=0, data=0x7fffffffa15d "",
    amount=0x7fffffffa176) at dos_files.cpp:371
#5  0x000000000049e429 in DOS_21Handler () at dos.cpp:196
#6  0x00000000004073cf in Normal_Loop () at dosbox.cpp:135
#7  0x00000000004077bb in DOSBOX_RunMachine () at dosbox.cpp:244
#8  0x000000000040e1f4 in CALLBACK_RunRealInt (intnum=33 '!')
    at callback.cpp:106
#9  0x00000000006a8ecc in DOS_Shell::Execute (this=0x3a4c2c0,
    name=0x7fffffffbaf0 "debug", args=0x7fffffffcbe5 "") at shell_misc.cpp:492
#10 0x00000000006a0613 in DOS_Shell::DoCommand (this=0x3a4c2c0,
    line=0x7fffffffcbe5 "") at shell_cmds.cpp:153
#11 0x000000000069d96f in DOS_Shell::ParseLine (this=0x3a4c2c0,
    line=0x7fffffffcbe0 "debug") at shell.cpp:251
#12 0x000000000069ded8 in DOS_Shell::Run (this=0x3a4c2c0) at shell.cpp:329
#13 0x000000000069e8d2 in SHELL_Init () at shell.cpp:653
#14 0x00000000006978a8 in Config::StartUp (this=0x7fffffffddc0)
    at setup.cpp:853


There you can see the program running at #6. Normal_Loop() usually keeps calling the CPU emulator to run x86 code, but some instructions cause the emulator to quit, returning a value that tells Normal_Loop() to call a different function. In this case, the CPU emulator encountered int 21h, the main way to access DOS services. That is why Normal_Loop() called DOS_21Handler() at #5. After that, things are happening directly via C++ code, without CPU emulation. Then at #2, device_CON::Read() calls int 16h (22 decimal), the BIOS interrupt for the keyboard. It calls either ah=0 or ah=10h functions, both of which wait for a key to be pressed and then return its value. The interrupt handler is implemented in x86 code, so you see DOSBOX_RunMachine() at #0. There is also a loop in device_CON::Read(), which keeps calling int 16h until it has the requested number of characters.

Such inner loops at #2 and #0 are not compatible with JavaScript. It's not possible to just keep waiting for input like that. Instead, the functions need to return, and then run again in the next iteration of the Emscripten main loop. That would involve re-establishing the entire call stack, with parameters and local variables.
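For a single frame, returning and resuming could look something like the sketch below. It is written in Python for brevity, and every name in it is hypothetical. The class stands in for the loop in device_CON::Read(), with its local state moved into an object so that it survives between main loop iterations.

```python
from collections import deque

class ResumableConRead:
    """Hypothetical stand-in for the read loop, with its locals kept in an object."""

    def __init__(self, size):
        self.size = size   # number of characters requested
        self.data = []     # characters collected so far

    def step(self, keyboard):
        """Run one main loop iteration. Returns True once the read is complete."""
        while len(self.data) < self.size:
            if not keyboard:
                return False   # no key yet: return to the browser instead of waiting
            self.data.append(keyboard.popleft())
        return True

keyboard = deque()
read = ResumableConRead(size=2)
print(read.step(keyboard))                      # False
keyboard.append("d")
print(read.step(keyboard))                      # False
keyboard.append("i")
print(read.step(keyboard), "".join(read.data))  # True di
```

This handles only one frame. The backtrace above has several, each with its own parameters and locals, and every one of them would need the same treatment.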

Actually, my port has a shortcut. That backtrace is from normal DOSBox running in Linux. My port establishes the Emscripten main loop at #7, in DOSBOX_RunMachine(), so there is no need to worry about #14 through #7. The cost is that when the running program exits, you can't get back to the DOS prompt. That's okay for now, because the interactive command prompt can't work anyway. It similarly gets stuck in a loop.

This is not impossible to fix, but I can't imagine a nice elegant fix yet. Adding code to re-establish that call stack on the next main loop iteration would be messy. It would also degrade performance. Maybe it would be possible to strip out DOS emulation and run FreeDOS instead? Currently DOSBox does not have a disk controller and relies on its DOS emulation to access files. The DOSBox-X branch adds an IDE controller.

Monday, January 05, 2015

Access to physical disk devices in Windows 7

I just noticed that PhotoViewer, the program for putting photos on the first digital photo frame I hacked, did not need to run as Administrator. It simply opens the drive using CreateFileA() and a path like "\\.\D:". I could use the same sort of path to open another USB storage device, so Windows isn't somehow recognizing the photo frame and allowing this for compatibility.

The CreateFileA documentation tells you to use a path like "\\.\PhysicalDrive1", but that only works for me if run as Administrator.

There is a difference between the two paths: "\\.\D:" is a partition and "\\.\PhysicalDrive1" is an entire drive. However, if the drive is not partitioned, the two are effectively the same.
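As a rough illustration of the two path forms, here is a Python sketch. Python's open() does not go through CreateFileA() itself, so this only shows the same device paths being used from another language. The helper names and the 512-byte sector size are assumptions, and read_first_sector() is only meaningful on Windows.

```python
def volume_path(letter):
    """Path for a drive letter, e.g. \\\\.\\D: (the form that worked without Administrator)."""
    return "\\\\.\\" + letter + ":"

def physical_drive_path(index):
    """Path for a whole drive, e.g. \\\\.\\PhysicalDrive1 (needed Administrator for me)."""
    return "\\\\.\\PhysicalDrive%d" % index

def read_first_sector(path, sector_size=512):
    """Read one sector from the start of the device (Windows only)."""
    # Unbuffered, so the read size stays a multiple of the assumed sector size.
    with open(path, "rb", buffering=0) as dev:
        return dev.read(sector_size)

print(volume_path("D"))
print(physical_drive_path(1))
```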

Cygwin works the same way. The physical drive is "/dev/sdb", which requires Administrator access, and the partition is "/dev/sdb1", which does not. The first partition exists even if the drive is not partitioned, and then it is the same as the whole drive.

Opening for writing can fail sometimes if the device is in use, but otherwise, even a non-admin user can write to sectors. I can have a FAT32 formatted USB drive open in an Explorer window and simultaneously write to sectors altering the files there. It does not work with an NTFS formatted drive, so this is not as insecure as it might seem at first.