Friday, February 06, 2015

SDL 2 text input and Emscripten

SDL 1 used SDL_EnableUNICODE() to enable and disable text input. It started off disabled, and you needed to call SDL_EnableUNICODE() if you depended on the unicode member of SDL_keysym.

SDL 2 has a different text input API, designed to accommodate international users and touchscreen devices. With it, SDL video initialization turns on text input unless an on-screen keyboard is needed. This is normally harmless, but it causes a problem in the Emscripten port.

Normally, many keys trigger browser actions. Those actions should be prevented if you want to use the same keys in an SDL application. In Firefox, it is possible to prevent browser actions from keypress events, but in Chrome, Safari and IE they need to be prevented in keydown events. However, preventing the default action of a keydown event also suppresses the keypress event, which is needed to get text characters.

If you need to prevent browser actions and don't need SDL 2 text input, simply disable text input after initializing SDL video:
if (SDL_IsTextInputActive()) SDL_StopTextInput();

You can find more information about these JavaScript events on quirksmode.org. The site also has a test page which you can use to examine their behaviour in different browsers.

Tuesday, February 03, 2015

Fixing the hard problem in Em-DOSBox using Emscripten emterpreter sync

If you just want to use this

Use the Emscripten incoming branch until a version newer than 1.29.4 is released. Configure with: ./configure --enable-sync --with-sdl2. The dosbox.html.mem file that is produced must be in the same directory as dosbox.js. The .mem file is big, but it compresses well. Ensure your web server can serve files in compressed format. Ideally, pre-compress them so they don't need to be compressed on every request.

Technical explanation

I previously wrote about the hard problem with porting DOSBox to Emscripten. Basically, DOSBox cannot easily work as one Emscripten main loop. I had an idea about how I could modify functions to make them resumable, but it would be messy and many functions would need modifications.

Fortunately, around the same time I learned about a new Emscripten feature: emterpreter sync. Instead of compiling to JavaScript, Emscripten can compile functions to a bytecode, and include an interpreter for that bytecode. Code running in emterpreter can be interrupted using emscripten_sleep(). This will call JavaScript setTimeout() and stop execution. The timeout function will then restore state as if the emscripten_sleep() call returned.
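As a rough illustration of the idea, here is a minimal sketch of a long-running loop that periodically yields with emscripten_sleep(). The names run_emulation() and run_one_tick() are hypothetical and are not taken from Em-DOSBox. The native stub for emscripten_sleep() is also an assumption of this sketch, added only so the same file builds outside Emscripten.

```c
#include <assert.h>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>          /* provides the real emscripten_sleep() */
#else
/* Assumption for this sketch: a native build has no browser to yield to,
   so the stand-in does nothing. */
static void emscripten_sleep(unsigned int ms) { (void)ms; }
#endif

/* Hypothetical stand-in for one unit of emulator work. */
static unsigned long work_done = 0;
static void run_one_tick(void) { work_done++; }

/* Runs total_ticks ticks in what looks like an ordinary blocking loop,
   yielding to the browser after every ticks_per_slice ticks.
   Returns how many times it yielded. */
unsigned run_emulation(unsigned total_ticks, unsigned ticks_per_slice)
{
    unsigned yields = 0;
    assert(ticks_per_slice > 0);
    for (unsigned i = 1; i <= total_ticks; i++) {
        run_one_tick();
        if (i % ticks_per_slice == 0) {
            emscripten_sleep(1);  /* execution stops here and resumes later */
            yields++;
        }
    }
    return yields;
}
```

The point of the sketch is that the loop keeps its natural shape: the local variables i and yields survive each sleep, which is exactly what a plain Emscripten main loop callback cannot offer.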

There is a catch here. Code running in emterpreter runs much more slowly than asm.js JavaScript. It is far too slow for DOSBox. The solution is to use emterpreter only for functions which could be interrupted by emscripten_sleep(). This means all functions that may be on the call stack when emscripten_sleep() is called. Functions compiled to ordinary JavaScript must not be on the call stack at that point, because their state cannot be saved and resumed.
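The rule "every function that may be on the call stack" amounts to collecting the direct and indirect callers of emscripten_sleep() in the call graph. The following is a minimal sketch of that computation under stated assumptions: the function IDs and the example call graph are hypothetical and do not reflect the real Em-DOSBox call graph.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical function IDs; these are not real Em-DOSBox names. */
enum fn { FN_MAIN, FN_MAIN_LOOP, FN_CPU_STEP, FN_DRAW_FRAME, FN_SLEEP, FN_COUNT };

/* One "caller calls callee" edge of the call graph. */
struct call_edge { enum fn caller, callee; };

/* Example graph: only main and the main loop can reach the sleep call. */
static const struct call_edge example_edges[] = {
    { FN_MAIN,      FN_MAIN_LOOP  },
    { FN_MAIN_LOOP, FN_CPU_STEP   },
    { FN_MAIN_LOOP, FN_DRAW_FRAME },
    { FN_MAIN_LOOP, FN_SLEEP      },
};
static const size_t example_edge_count =
    sizeof example_edges / sizeof example_edges[0];

/* Marks target and every function that can be on the call stack while
   target runs, i.e. all of its direct and indirect callers. */
void mark_stack_functions(const struct call_edge *edges, size_t n_edges,
                          enum fn target, bool needed[FN_COUNT])
{
    bool changed = true;
    assert(target < FN_COUNT);
    for (size_t i = 0; i < FN_COUNT; i++)
        needed[i] = false;
    needed[target] = true;
    /* Repeat until no new caller is found (simple fixed-point iteration). */
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n_edges; i++) {
            if (needed[edges[i].callee] && !needed[edges[i].caller]) {
                needed[edges[i].caller] = true;
                changed = true;
            }
        }
    }
}
```

Calling mark_stack_functions(example_edges, example_edge_count, FN_SLEEP, needed) marks FN_MAIN and FN_MAIN_LOOP along with FN_SLEEP itself, and leaves FN_CPU_STEP and FN_DRAW_FRAME free to stay as fast asm.js code.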

Once again, the way DOSBox is structured is a problem. The CPU interpreter can recursively call itself, and those nested calls can lead to emscripten_sleep(), but running the CPU interpreter in emterpreter would make it too slow. A reasonable compromise is possible: prevent sleep during those nested calls. They are typically CPU exceptions which should return quickly, so there is no need to sleep there.
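One way to express that compromise is a nesting counter that turns the sleep into a no-op while a nested call is active. This is only a sketch of the idea and not necessarily how Em-DOSBox implements it; nested_depth, maybe_yield() and run_nested() are hypothetical names, and the native emscripten_sleep() stub is an assumption so the file builds outside Emscripten.

```c
#ifdef __EMSCRIPTEN__
#include <emscripten.h>          /* provides the real emscripten_sleep() */
#else
/* Assumption for this sketch: no-op stand-in for native builds. */
static void emscripten_sleep(unsigned int ms) { (void)ms; }
#endif

static int nested_depth = 0;     /* > 0 while a nested interpreter call runs */
static unsigned yield_count = 0; /* how many times we actually slept */

/* Sleeps only when no nested call is active, so the functions used by
   nested calls never have emscripten_sleep() below them on the stack. */
void maybe_yield(void)
{
    if (nested_depth > 0)
        return;                  /* inside a nested call: skip the sleep */
    emscripten_sleep(1);
    yield_count++;
}

/* Runs handler as a nested call with sleeping disabled. */
void run_nested(void (*handler)(void))
{
    nested_depth++;
    handler();
    nested_depth--;
}
```

With this guard, maybe_yield() called from the top level sleeps as usual, while run_nested(maybe_yield) returns without sleeping.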

The main remaining problem is the DOSBox paging code. It recursively runs the CPU interpreter, and only returns when execution returns to where the page fault occurred. This is a problem if execution never returns there, or if another page fault happens and the faults don't return in last-in, first-out order. This is also a problem with ordinary DOSBox, but it is worse with Em-DOSBox: in that situation DOSBox would just run slowly, whereas Em-DOSBox would hang the browser. A timeout check is used to prevent browser hangs.
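A timeout check of this kind can be sketched as a loop that gives up once a time limit has passed. Everything here is an assumption made for illustration: the function names, the callback types and the fake clock are hypothetical, and what Em-DOSBox does after a timeout is not covered.

```c
#include <stdbool.h>

typedef unsigned long (*clock_fn)(void); /* returns current time in ms */
typedef bool (*step_fn)(void);           /* runs one step; true when done */

enum wait_result { WAIT_DONE, WAIT_TIMED_OUT };

/* Keeps running step() until it reports completion or limit_ms have
   elapsed according to now_ms(), whichever comes first. */
enum wait_result run_until_done(step_fn step, clock_fn now_ms,
                                unsigned long limit_ms)
{
    unsigned long start = now_ms();
    for (;;) {
        if (step())
            return WAIT_DONE;
        if (now_ms() - start >= limit_ms)
            return WAIT_TIMED_OUT;       /* avoid spinning forever */
    }
}

/* Fake clock and fake work, used only to exercise the sketch:
   each clock read advances time by 10 ms. */
static unsigned long fake_time = 0;
static unsigned long fake_clock(void) { return fake_time += 10; }

static long steps_left = 0;
static bool fake_step(void) { return --steps_left <= 0; }
```

Passing the clock as a parameter keeps the loop easy to test. A real build would supply a function that reads an actual timer instead of fake_clock().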

Finding all the functions which require emterpreter was made easier by linking with --profiling, then using csplit dosbox.js '/^function /' '{*}' to split the output into one file per function and using grep to search for calls. Adding console.trace() at the start of emscripten_sleep() is also helpful: if an abort occurs after a resume, the previous backtrace shows which function requires emterpreter. Note that -O3 is required, because otherwise functions may have too many local variables for emterpreter.

Originally, emterpreter only supported a blacklist of functions which should not use emterpreter. That was impractical here because of the number of functions, and because the mangled names of some functions change. Alon Zakai helpfully added a whitelist, so functions which need emterpreter can be listed instead. Right now, there are 40 functions on that list. That's a small number compared to the thousands of functions in Em-DOSBox, but manually transforming all those functions to make them resumable would be a lot of work. It would also complicate merging new changes from SVN or applying DOSBox patches. Based on performance and compatibility with DOS games, emterpreter sync was the right choice.