Monday, May 11, 2015

Performance of the Em-DOSBox CPU interpreter

When I first got DOSBox to run in a web browser, performance was terrible. The problem was the CPU interpreter. A single function fetches, decodes and executes most x86 instructions. Most of the function consists of a big switch statement with many cases. It is big because there are many x86 instructions.

The first problem was Emscripten converting the switch statement into a long chain of else if comparisons. Strictly speaking, a switch statement was still emitted, but in most cases it merely set a variable which was later tested via comparisons. Instruction decoding, which needs to be done for every instruction, went from O(1) to O(n) in the number of cases.
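To make the cost difference concrete, here is a minimal Python sketch. It is not Emscripten output: the 256-entry opcode space and the handler names are invented for illustration. It counts how many comparisons an else-if chain needs before reaching a case, versus a single indexed lookup.

```python
# Hypothetical toy decoder: 256 one-byte opcodes, each with a trivial handler.
NUM_OPCODES = 256


def make_handler(opcode):
    # Each handler simply reports which opcode it handled.
    return lambda: opcode


HANDLERS = [make_handler(op) for op in range(NUM_OPCODES)]


def dispatch_chain(opcode):
    """Emulate an else-if chain: test each case in order until one matches."""
    comparisons = 0
    for case in range(NUM_OPCODES):
        comparisons += 1
        if opcode == case:
            return HANDLERS[case](), comparisons
    raise ValueError("unknown opcode")


def dispatch_table(opcode):
    """Emulate a jump table: one indexed lookup, whatever the opcode is."""
    return HANDLERS[opcode](), 1


for op in (0x00, 0x90, 0xFF):
    _, chain_cost = dispatch_chain(op)
    _, table_cost = dispatch_table(op)
    print(f"opcode {op:#04x}: chain={chain_cost} comparisons, table={table_cost} lookup")
```

The chain cost grows with the position of the case, which is what hurts when every emulated instruction has to go through it.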

Emscripten could generate a much better switch statement with a patch. This made DOSBox run fast in Firefox, but it was much too slow to use in Chrome. When I profiled it, I saw a warning triangle by the CPU interpreter function, telling me it was not optimized because the switch statement was too big. There was already a V8 bug filed about this issue.

I solved this problem by transforming the cases of the big switch statement into functions using extractfun.py. This reduces function size and allows a function pointer to be used instead of a switch statement. The process is somewhat convoluted because the switch statement is normally built using the preprocessor. First, preprocessor output files need to be produced. In order to get Automake to create them using proper dependencies, it needs to create a library which is otherwise unnecessary. Then the Python script parses the preprocessor files. It stores functions into a function store, removing duplicates and fixing name collisions. Finally, it creates header files which are used when building the final version of the CPU interpreters. Three CPU interpreters are processed this way: the simple, normal and prefetch cores.
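The function store step can be sketched roughly as follows. This is not the code from extractfun.py: the class name, the `static void name(void)` signature, the opcode values and the case bodies are all assumptions made up for illustration, and real case bodies contain control flow that a real script has to deal with. The sketch only shows the two ideas named above, reusing a function when the body is a duplicate and renaming when two different bodies want the same name.

```python
class FunctionStore:
    """Collects case bodies as functions (hypothetical, simplified)."""

    def __init__(self):
        self._name_by_body = {}  # body text -> function name
        self._body_by_name = {}  # function name -> body text

    def add(self, wanted_name, body):
        """Store a body and return the function name to call for it."""
        body = body.strip()
        if body in self._name_by_body:
            # Duplicate body: reuse the function that already exists.
            return self._name_by_body[body]
        name = wanted_name
        suffix = 1
        while name in self._body_by_name:
            # Name collision with a different body: append a counter.
            suffix += 1
            name = f"{wanted_name}_{suffix}"
        self._name_by_body[body] = name
        self._body_by_name[name] = body
        return name

    def emit_header(self):
        """Render the stored functions as C-like text for a header file."""
        return "\n".join(
            f"static void {name}(void) {{ {body} }}"
            for name, body in self._body_by_name.items()
        )


store = FunctionStore()
# Maps a (made-up) case value to the function that replaces the case body.
table = {
    0x40: store.add("op_inc", "reg_ax++;"),
    0x48: store.add("op_dec", "reg_ax--;"),
    0x90: store.add("op_nop", ";"),
    0x1090: store.add("op_nop", ";"),           # same body: reuses op_nop
    0x2040: store.add("op_inc", "reg_eax++;"),  # same name, new body: op_inc_2
}
print(store.emit_header())
```

The resulting `table` plays the role of the function pointer table: dispatch becomes a lookup followed by a call, and each function is small enough for a JavaScript engine to optimize on its own.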

Since then, the Emscripten bug has been fixed, I assume by the switch to Fastcomp. When Chrome started using TurboFan for asm.js, it could finally get good performance with an un-transformed CPU interpreter. This led me to check whether extractfun.py is still necessary.

Safari 8.0.6 and Internet Explorer 11 still get terrible performance without the transformation. Use of --llvm-opts '["-lowerswitch"]' doesn't seem to help. Looking at the JavaScript, I can confirm that it changes the big switch into a binary search, so this probably means the problem is due to the size or complexity of the function, and not just due to switch statements. I also experimented with the Emscripten outlining limit, with or without -lowerswitch. I assume that transforming switch cases into functions is a more efficient split than what's done by Emscripten outlining.
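The reasoning about the binary search can be checked with a small Python sketch. The case count of 1024 is an arbitrary assumption, not the real number of cases in the interpreter, and this is ordinary binary search rather than the exact code LLVM generates. It shows that once the switch is lowered this way, the number of comparisons per dispatch is small, so dispatch cost alone is unlikely to explain terrible performance.

```python
def dispatch_binary(case_values, opcode):
    """Search sorted case_values for opcode, counting three-way comparisons."""
    lo, hi = 0, len(case_values) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if opcode == case_values[mid]:
            return mid, comparisons
        if opcode < case_values[mid]:
            hi = mid - 1
        else:
            lo = mid + 1
    return None, comparisons  # no matching case (the "default" branch)


CASES = list(range(1024))  # hypothetical: 1024 sorted case values
worst = max(dispatch_binary(CASES, op)[1] for op in CASES)
print(f"worst case over {len(CASES)} cases: {worst} comparisons")
```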

Friday, May 01, 2015

Gigabyte GA-P35-DS3R enables wake on any unicast packet by default

Recently I found that if I totally cut power to my computer (including standby power), booted to Linux and went into suspend, it would wake unexpectedly very soon. Booting into Windows would prevent this problem from happening until power is totally cut again. I suppose this problem existed before, but I never noticed it because I rarely fully cut power and I was using Linux less.

At first I thought this was a Linux bug, but actually, it was the result of a crazy default setting for wake on LAN. By default, wake on unicast packet and wake on magic packet are both enabled. If some network activity caused ordinary unicast packets to arrive while the computer was sleeping, it woke up. That's why I found that with a minimal X setup using twm, the wakeups only happened if I was running a web browser.

This setting can be seen by running sudo ethtool eth0. Its output included:
        Supports Wake-on: pumbg
        Wake-on: ug

From the ethtool man page:
              u Wake on unicast messages
              g Wake on MagicPacket™

The solution was adding ethtool -s eth0 wol d to /etc/rc.local to disable wake on LAN. Then sudo ethtool eth0 would report Wake-on: d, which means "Disable (wake on nothing)". It's also possible to use g instead of d, which should leave only wake on magic packet enabled.
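For checking the setting from a script, the Wake-on line can be decoded mechanically. The following Python sketch works on a saved copy of the ethtool output rather than running ethtool itself. The function names are my own, and the flag descriptions follow the ethtool man page as I remember it, so check them against the man page on your system.

```python
import re

# Flag letters and meanings, following the ethtool man page.
WOL_FLAGS = {
    "p": "Wake on PHY activity",
    "u": "Wake on unicast messages",
    "m": "Wake on multicast messages",
    "b": "Wake on broadcast messages",
    "a": "Wake on ARP",
    "g": "Wake on MagicPacket",
    "s": "Enable SecureOn password for MagicPacket",
    "d": "Disable (wake on nothing)",
}


def parse_wake_on(ethtool_output):
    """Return the active Wake-on flag letters, or None if the line is absent."""
    # Anchoring at line start skips the "Supports Wake-on:" line.
    match = re.search(r"^[ \t]*Wake-on:[ \t]*(\S+)", ethtool_output, re.MULTILINE)
    return match.group(1) if match else None


def describe(flags):
    """Translate flag letters into their man page descriptions."""
    return [WOL_FLAGS.get(f, f"unknown flag {f!r}") for f in flags]


def wakes_on_more_than_magic_packet(flags):
    """True if anything besides MagicPacket-related flags or 'd' is set."""
    return any(f not in "gsd" for f in flags)


SAMPLE = """\
        Supports Wake-on: pumbg
        Wake-on: ug
"""

flags = parse_wake_on(SAMPLE)
for line in describe(flags):
    print(line)
if wakes_on_more_than_magic_packet(flags):
    print("Warning: ordinary network traffic may wake this machine")
```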

The GA-P35-DS3R rev 1.0 motherboard F13 BIOS does not seem to have any options for changing wake on LAN settings, so this seems to be the only way to do it. I had already disabled wake on LAN in Windows via the Advanced settings in Device Manager. That setting persisted until standby power was cut.

It sure seemed like a Linux bug at first, so here's the Ubuntu bug I reported. Now I just wish Linux would tell me the wake reason. If something had told me these wakeups were a result of wake on LAN, I would have wasted a lot less time on this.