Monday, May 11, 2015

Performance of the Em-DOSBox CPU interpreter

When I first got DOSBox to run in a web browser, performance was terrible. The problem was the CPU interpreter. A single function fetches, decodes and executes most x86 instructions. Most of the function consists of a big switch statement with many cases. It is big because there are many x86 instructions.

The first problem was Emscripten converting the switch statement into a long chain of else if comparisons. Actually, a switch statement was used, but in most cases it merely set a variable which was later tested via comparisons. Instruction decoding, which needs to be done for every instruction, changed from O(1) into O(n).

Emscripten could generate a much better switch statement with a patch. This made DOSBox run fast in Firefox, but it was much too slow to use in Chrome. When I profiled it, I saw a warning triangle by the CPU interpreter function, telling me it was not optimized because the switch statement was too big. There was already a V8 bug filed about this issue.

I solved this problem by transforming the cases of the big switch statement into functions using a Python script. This reduces function size and allows a function pointer to be used instead of a switch statement. The process is somewhat convoluted because the switch statement is normally built using the preprocessor. First, preprocessor output files need to be produced. In order to get Automake to create them using proper dependencies, it needs to create a library which is otherwise unnecessary. Then the Python script parses the preprocessor files. It stores functions into a function store, removing duplicates and fixing name collisions. Finally, it creates header files which are used when building the final version of the CPU interpreters. Three CPU interpreters are processed this way: the simple, normal and prefetch cores.
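The idea of replacing switch cases with functions reached through function pointers can be sketched with a toy interpreter. This is only an illustration under stated assumptions: `ToyCpu`, the four opcodes and `toy_run` are hypothetical names invented here, and the real DOSBox cores and the generated headers are far larger and look different.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical toy CPU state; the real DOSBox cores keep far more. */
typedef struct {
    uint32_t acc;   /* accumulator */
    size_t   ip;    /* index of the next opcode byte */
} ToyCpu;

/* Each former switch case becomes its own small function. */
typedef void (*OpHandler)(ToyCpu *cpu);

static void op_nop(ToyCpu *cpu) { (void)cpu; }
static void op_inc(ToyCpu *cpu) { cpu->acc += 1; }
static void op_dec(ToyCpu *cpu) { cpu->acc -= 1; }
static void op_dbl(ToyCpu *cpu) { cpu->acc *= 2; }

/* Invented opcodes, used directly as table indexes. */
enum { OP_NOP, OP_INC, OP_DEC, OP_DBL, OP_COUNT };

/* Indexing the table by opcode keeps decoding O(1),
   and no single function grows with the instruction count. */
static const OpHandler op_table[OP_COUNT] = {
    [OP_NOP] = op_nop,
    [OP_INC] = op_inc,
    [OP_DEC] = op_dec,
    [OP_DBL] = op_dbl,
};

/* Fetch each opcode and dispatch through the table.
   Stops at the end of the code or at an unknown opcode.
   Returns the number of instructions executed. */
size_t toy_run(ToyCpu *cpu, const uint8_t *code, size_t len)
{
    size_t executed = 0;
    while (cpu->ip < len) {
        uint8_t opcode = code[cpu->ip];
        if (opcode >= OP_COUNT)
            break;
        cpu->ip++;
        op_table[opcode](cpu);
        executed++;
    }
    return executed;
}
```

Calling `toy_run()` on a `ToyCpu` with both fields set to zero and a byte array of opcodes runs the program. The dispatch loop stays tiny no matter how many handlers exist, which is the property that matters for JavaScript engines that refuse to optimize very large functions.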

Since then, the Emscripten bug has been fixed, I assume by the switch to Fastcomp. When Chrome started using TurboFan for asm.js, it could finally get good performance with an un-transformed CPU interpreter. This led me to check whether the transformation is still necessary.

Safari 8.0.6 and Internet Explorer 11 still get terrible performance without the transformation. Use of --llvm-opts '["-lowerswitch"]' doesn't seem to help. Looking at the JavaScript, I can confirm that it changes the big switch into a binary search, so this probably means the problem is due to the size or complexity of the function, and not just due to switch statements. I also experimented with the Emscripten outlining limit, with or without -lowerswitch. I assume that transforming switch cases into functions is a more efficient split than what's done by Emscripten outlining.

Friday, May 01, 2015

Gigabyte GA-P35-DS3R enables wake on any unicast packet by default

Recently I found that if I totally cut power to my computer (including standby power), booted to Linux and went into suspend, it would wake unexpectedly very soon. Booting into Windows would prevent this problem from happening until power is totally cut again. I suppose this problem existed before, but I never noticed it because I rarely fully cut power and I was using Linux less.

At first I thought this was a Linux bug, but actually, it is the result of a crazy default setting for wake on LAN. By default, wake on unicast packet and magic packet are both enabled. If there was some network activity which caused ordinary unicast packets to arrive while the computer was sleeping, it woke up. That's why I found that with a minimal X setup using twm, the wakeups only happened if I was running a web browser.

This setting can be seen by running sudo ethtool eth0. Its output included:
        Supports Wake-on: pumbg
        Wake-on: ug

From the ethtool man page:
              u Wake on unicast messages
              g Wake on MagicPacket™

The solution was adding ethtool -s eth0 wol d to /etc/rc.local to disable wake on LAN. Then sudo ethtool eth0 would report Wake-on: d, which means "Disable (wake on nothing)." It's also possible to use g instead of d, which should enable wake on magic packet only.
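For reading the Wake-on letters at a glance, here is a small sketch that maps them to their meanings. The function names `wol_meaning` and `wol_has_flag` are hypothetical, and the descriptions are paraphrased from the ethtool man page. It only interprets the letters that ethtool prints; it does not query the network card.

```c
#include <stddef.h>
#include <string.h>

/* One Wake-on letter as printed by ethtool, with a paraphrased meaning. */
typedef struct {
    char        letter;
    const char *meaning;
} WolFlag;

static const WolFlag wol_flags[] = {
    { 'p', "wake on PHY activity" },
    { 'u', "wake on unicast messages" },
    { 'm', "wake on multicast messages" },
    { 'b', "wake on broadcast messages" },
    { 'a', "wake on ARP" },
    { 'g', "wake on MagicPacket" },
    { 's', "enable SecureOn password for MagicPacket" },
    { 'd', "disable (wake on nothing)" },
};

/* Return the description for one letter, or NULL if it is unknown. */
const char *wol_meaning(char letter)
{
    for (size_t i = 0; i < sizeof wol_flags / sizeof wol_flags[0]; i++) {
        if (wol_flags[i].letter == letter)
            return wol_flags[i].meaning;
    }
    return NULL;
}

/* Return 1 if the "Wake-on:" string (for example "ug") contains the letter. */
int wol_has_flag(const char *wake_on, char letter)
{
    return letter != '\0' && strchr(wake_on, letter) != NULL;
}
```

With the default reported above, `wol_has_flag("ug", 'u')` is true, which is the setting that lets ordinary unicast traffic wake the machine.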

The GA-P35-DS3R rev 1.0 motherboard F13 BIOS does not seem to have any options for changing wake on LAN settings, so this seems to be the only way to do it. I had already disabled wake on LAN in Windows via the Advanced settings in Device Manager. That setting persisted until standby power was cut.

It sure seemed like a Linux bug at first, so here's the Ubuntu bug I reported. Now I just wish Linux would tell me the wake reason. If something told me these wakeups were a result of wake on LAN, I would have wasted a lot less time on this.

Thursday, April 30, 2015

Sleep and wake notifications with systemd

After I upgraded to Ubuntu 15.04 Vivid Vervet, the Audacious plugin I used to control a display device I built had a problem. Functionality related to sleep and wake didn't work anymore. That's because UPower doesn't handle sleep and wake anymore. Notification events instead come from systemd-logind. In particular, it is the "PrepareForSleep" signal on the "org.freedesktop.login1.Manager" interface. When its argument is true, the system is preparing to go to sleep, and when it's false, the system is waking up. Here's the new code.

I still need to find a way to determine the last wake time. Formerly I was using the /var/log/pm-suspend.log modification time, but that is also now unavailable because systemd is handling it instead. If the plugin observes a wakeup, it gets the correct time from that, but when started it needs another method.

This illustrates a major difference between Windows and Linux. The Windows version of the plugin can still use the WM_POWERBROADCAST message, with parameters dating from Windows 2000. Linux is a moving target, and things will break unless you keep updating them. This probably also extends to Linux application APIs.

Tuesday, April 28, 2015

CPUID HWMonitor 1.20 causes blue screens

32-bit Windows 7 is normally totally stable for me. Today I was running CPUID HWMonitor 1.20 to troubleshoot a UPS problem, and I got two blue screens. One was BAD_POOL_HEADER (19) and the other DRIVER_CORRUPTED_EXPOOL (c5). Both of these point to memory corruption.

HWMonitor is a nice utility which shows temperature, voltage and other measurements. It supports the motherboard, hard drive, graphics card and UPS. However, I've seen blue screens after using it in the past, and this experience further confirms that HWMonitor causes blue screens.

I just upgraded to version 1.27. I hope that version will work better.

Edit: Nope, it's not fixed. I didn't trust version 1.27, so I wanted to restart after using it. I got another memory corruption crash on shutdown.

I wonder what's wrong with this APC BE500U-CN Back-UPS ES 500?

I saw Cinnamon claiming that the UPS is empty (0%), which made me assume there was some bug in Linux. It even hibernated the PC even though AC power wasn't interrupted, which is definitely a bug. Then I double-checked via apcupsd in Linux, and then in Windows, with PowerChute Personal Edition 3.0.2 and CPUID HWMonitor 1.20.0. The UPS was definitely claiming it was empty. However, PowerChute said it was operating normally and charging.

HWMonitor provides the most interesting information. It constantly updates the battery voltage and levels, and keeps minimum and maximum values. The voltage was fluctuating from below 12V to above 13V. That does not seem like proper charging. When I took the battery out, it was around 12.6V. My first thought was that there was a bad capacitor causing excessive ripple and giving bad readings. This led me to open up the UPS:

I saw high ripple at the leftmost capacitor, near the red wiring fault light and the heat sinks. After replacing that 220µF 50V capacitor, ripple went down from 5V to 3V. I'm not sure that this helped though. When it started charging, HWMonitor reported voltage steadily rising toward 13.50V. It stayed stable for a while. Then it fluctuated, going down as low as 11.54V. Now, the voltage is stable again, around 13.26V, and the percentage is rising.

The stability makes me think the fluctuations were the firmware interrupting charging and doing some discharge tests. Surely 11.54V is too low though. Maybe the battery is bad and the firmware isn't smart enough to report that. I saw it quickly go below 12V with a car headlight, so it probably is worn out.

Sunday, April 26, 2015

Goodbye KDE!

I switched to KDE when GNOME 3 was released. Over time, version 4 generally got better and less buggy. I was satisfied and even happy with it. Now Plasma 5 seems to have thrown away a lot of that progress. There are far more bugs and fewer features. It's not as bad as the GNOME 3 change, but Plasma 5 is bad enough that I don't want to use it.

It's tragic how these free software desktop environments get to be good and then revert to a state which should be called an alpha release. Of course GNOME 2 and KDE 4 had some limitations and disadvantages. A big bold change can help there, but I think the only way to truly improve is to slowly evolve into something better. Even Microsoft can't successfully make sudden big changes.

GNOME 3 has improved in the meantime. I still think the window switching, application launcher and top bar are inefficient and limiting, and it wastes screen space. So, I definitely won't be choosing GNOME 3. MATE reminded me that things have improved since GNOME 2, and I don't want that either.

Cinnamon seems good now. It seems to be a combination of the best of GNOME 2 and GNOME 3. Its web page may be unimpressive, but the software works well and has enough features. It seems significantly faster than KDE, and I'm forced to install a lot less stuff. Losing KDE 4 was annoying at first, but now it seems I'm switching to something even better.

Thursday, April 23, 2015

My favourite music visualization program running in a web browser

Synaesthesia is my favourite music visualization program. I created an updated Windows port and also ported it to Emscripten. The code is on GitHub. Here is a screen shot, but you have to see it in action to really appreciate it.

Click here to run the program. Then start visualization by dragging an audio file from a file browser window on your computer to the web page. No information is sent over the network. The file is played by the web browser and visualized by asm.js running in the browser.

If you find that Firefox can't play MP3 files in Linux, install gstreamer1.0-fluendo-mp3.

Optimizing Emscripten SDL 1 settings

Emscripten's SDL tries to emulate desktop SDL. This involves some costly operations which many programs don't need. Performance can be improved by changing settings to prevent those operations.

Consider the multiple copies

The image exists in 3 places in the web page: program memory, canvas ImageData, and the canvas element. Normally, SDL_LockSurface() copies from the canvas, to ImageData and then to program memory, and SDL_UnlockSurface() copies from program memory to ImageData and to the canvas. Conversion may be needed between program memory and ImageData.

SDL.defaults.copyOnLock = false

SDL_LockSurface() will copy from canvas to ImageData but not from ImageData into program memory.

SDL.defaults.discardOnLock = true

SDL_LockSurface() will use createImageData() once to initially create ImageData, and never copy from the canvas. Copying from ImageData to program memory is prevented regardless of SDL.defaults.copyOnLock.

SDL.defaults.opaqueFrontBuffer = false

With normal SDL you can write only the RGB values and get opaque pixels of the requested colour. Canvas pixels also have an alpha value, which needs to be set to 255 to make pixels fully opaque.

Normally, both SDL_LockSurface() and SDL_UnlockSurface() set alpha values in ImageData to 255. This option prevents those operations. With it, the SDL_HWPALETTE 8 bpp mode still works normally, but in other modes your code that writes pixels into memory must set the alpha values itself. You can simply bitwise OR pixels with Amask from the surface's SDL_PixelFormat.
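Setting alpha by OR-ing with the mask can be sketched as follows. The helper names `make_opaque` and `set_opaque_row` are hypothetical. In real SDL code the mask would be read from surface->format->Amask; here it is passed in as a plain parameter so the fragment stands alone without SDL headers.

```c
#include <stdint.h>
#include <stddef.h>

/* Combine an RGB value with the alpha mask so the pixel is fully opaque.
   amask is assumed to be the surface's Amask (all alpha bits set). */
uint32_t make_opaque(uint32_t rgb, uint32_t amask)
{
    return rgb | amask;
}

/* Force every 32 bpp pixel in a row to be opaque.
   Useful when existing drawing code writes only RGB values. */
void set_opaque_row(uint32_t *pixels, size_t count, uint32_t amask)
{
    for (size_t i = 0; i < count; i++)
        pixels[i] |= amask;
}
```

It is usually cheaper to OR the mask in when each pixel is first computed, as `make_opaque()` does, than to make a second pass over the frame with `set_opaque_row()`.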

Use SDL_HWPALETTE flag for 8 bpp modes

It's possible to use 8 bpp without SDL_HWPALETTE. However, that uses less optimized code when converting to 32 bpp for the canvas, and doesn't work with SDL.defaults.opaqueFrontBuffer = false.

SDL_HWPALETTE requires that SDL_LockSurface() copying is disabled. 8 bpp modes without the flag don't have that requirement, but you'll end up with 32 bpp RGB values copied back, which you probably don't want.

Module.screenIsReadOnly = true

This prevents SDL_LockSurface() copying. You could use it instead of SDL.defaults.discardOnLock = true. The only difference is that ImageData is copied from the canvas the first time SDL_LockSurface() is called instead of being created via createImageData(). It's probably better to use the SDL.defaults options instead because they're better documented and better named.

Sample code

Here is a code fragment which uses the recommended optimization settings and enters 8 bpp mode in the recommended way. Some of this is redundant as noted above, but there's no harm in that.

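    // Assumption: the SDL.defaults settings are JavaScript, so they are run
    // here via EM_ASM (from <emscripten.h>) before setting the video mode.
    EM_ASM(
        SDL.defaults.copyOnLock = false;
        SDL.defaults.discardOnLock = true;
        SDL.defaults.opaqueFrontBuffer = false;
    );
    surface = SDL_SetVideoMode(WIDTH, HEIGHT, 8, SDL_HWPALETTE);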