Wednesday, November 11, 2015

Simple Linux boot repair by loading GRUB stage 1 from a file.

When booting on a PC with a legacy BIOS and MBR partitioning, Linux has a problem. At the start of the boot process, the BIOS can only load the first sector from the drive. That's only 512 bytes, and because it contains the partition table, only 446 bytes are available for code which could continue boot-up. The standard MBR code would load the first sector from the active partition, which is once again 512 bytes.
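The 446-byte limit follows from the layout of the sector: 446 bytes of code, four 16-byte partition entries, and a 2-byte boot signature add up to 512. Here is a minimal Python sketch that reads that layout from a sector you have already dumped to memory. The function name and the returned dictionary keys are my own choices for illustration, not part of any tool.

```python
import struct

SECTOR_SIZE = 512
CODE_SIZE = 446            # boot code area at the start of the sector
ENTRY_SIZE = 16            # size of one partition table entry
NUM_ENTRIES = 4            # 446 + 4 * 16 + 2 = 512
SIGNATURE = b"\x55\xaa"    # boot signature at offsets 510 and 511


def parse_mbr(sector):
    """Return the non-empty partition entries of a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR sector is exactly 512 bytes")
    if bytes(sector[510:512]) != SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    partitions = []
    for index in range(NUM_ENTRIES):
        offset = CODE_SIZE + index * ENTRY_SIZE
        # Entry layout: status byte, 3 CHS bytes (skipped), type byte,
        # 3 CHS bytes (skipped), 32-bit start LBA, 32-bit sector count.
        status, ptype, start_lba, sectors = struct.unpack_from(
            "<B3xB3xII", sector, offset)
        if ptype == 0:
            continue  # type 0 marks an unused entry
        partitions.append({
            "index": index,
            "active": status == 0x80,
            "type": ptype,
            "start_lba": start_lba,
            "sectors": sectors,
        })
    return partitions
```

The "active" flag is what the standard MBR code looks at when deciding which partition's first sector to load.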

That's not enough space for GRUB, and it's not even enough space for code which could load a file from ext4 or other complex file systems used by Linux. The default solution is to put some more code right after the first sector, in the free space between it and the first partition. The BIOS loads the stage 1 code in the MBR, that code loads the stage 1.5 code which follows, and then finally it can load stage 2 from the Linux file system.

The problem is that it depends on GRUB stage 1 being in the MBR sector. When Windows and other operating systems are installed, they can overwrite stage 1 with their own code. The MBR code installed by Windows is reasonable. It will boot from whatever partition is active. However, if you installed GRUB in the MBR, the Linux partition is not bootable, so that won't help you.

There is plenty of information online on how to fix this by booting from removable media and re-installing GRUB. I'm proposing a much simpler fix: load GRUB stage 1 from a file or write it to the MBR sector yourself. If you're running Windows you can take GRUB stage1 or boot.img files, put them somewhere on the Windows partition, and set up the Windows boot loader to boot them. In Windows Vista and later, you can do this via bcdedit. In earlier versions, it's just a single line of text in C:\boot.ini, like C:\stage1="GRUB stage 1".

If you want to write stage 1 to the MBR sector yourself, you can do that with a disk editor or dd in Cygwin. It's critical to remember to only overwrite the first 446 bytes, so you don't overwrite the partition table! This would mean using bs=446 count=1 arguments with dd. (If you overwrite the partition table, all is not lost though. TestDisk should be able to recover it.) GRUB stage 1 has some code after the first 446 bytes which doesn't get copied, but that code is only needed when booting from floppy. It's also critical to understand where you're writing, because writing to the wrong place could corrupt a file system! It is easier and safer to boot stage 1 from a file and then once booted into Linux use grub-install to do this.
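For illustration, here is a small Python sketch of that partial write. It copies only the first 446 bytes of a boot image over the start of a target and leaves everything after them alone. The function name and both paths are hypothetical, and I would only point it at a copy or a disk image file, not a real drive, until the result has been checked.

```python
CODE_SIZE = 446  # everything after this offset is partition table and signature


def install_boot_code(boot_img_path, target_path):
    """Overwrite only the first 446 bytes of target_path with boot code."""
    with open(boot_img_path, "rb") as src:
        code = src.read(CODE_SIZE)
    if len(code) != CODE_SIZE:
        raise ValueError("boot image is shorter than 446 bytes")
    # "r+b" opens for in-place update, so the rest of the target is
    # neither truncated nor modified.
    with open(target_path, "r+b") as dst:
        dst.seek(0)
        dst.write(code)
    return len(code)
```

When practising with dd on an image file instead of a device, conv=notrunc is also needed, because dd otherwise truncates a regular output file after the bytes it writes.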

On a system with multiple operating systems, I prefer installing GRUB in the Linux partition. In that situation, grub-install will complain about blocklists, and --force will be needed to make it install. The problem here is that the location of the file to load next is hard-coded in the boot sector, and if that file moves you can't boot anymore. It hasn't been a problem in practice when booting normally from a Linux partition. It is however a problem when using EasyBCD, because it makes a copy of the Linux boot sector which becomes out of date. My solution there is to chain load the Linux boot sector.

Tuesday, October 13, 2015

Anatomy of a miswired power supply

This Thermaltake power supply was working fine with an IDE/PATA hard drive. Then when I added a SATA SSD, the computer would turn off immediately after turning on. I first thought the motherboard was doing something because of some weird incompatibility, but the same thing happened with the data cable disconnected.

After finding that the SSD doesn't work anymore in another PC, I checked the SATA power connectors. The colours seemed right, but the black wires were at +12V. The other wires were fine. That means the SSD got -7V power instead of +5V, and its ground was at -7V relative to SATA signal ground. Surprisingly, I managed to fix the SSD by bypassing a damaged component. This post is about fixing the power supply.

Here's the power supply label:

It has passed all the quality control checks, and the warranty sticker is intact:

The SATA power connectors look fine based on the wire colours:

Here's an external demonstration that they're not okay, by measuring resistance from a yellow +12V wire to their black wires. This is a cheap multimeter, and less than 3 ohms basically means zero ohms.

Now it's time to open up the PSU. It looks nice inside, without obviously bulging capacitors, and with very little dust considering how long it has been in use:

Note that the two big capacitors at the bottom of the picture could hold a dangerous high voltage charge, which could give you a big shock. The low voltage capacitors at the top should not be a risk.

The circuit board is held in by 4 screws, but the fan also had to be unscrewed to free it more. I couldn't unplug the fan connector, because it was glued.

Here's a closeup of the problem. There are markings on top of the circuit board, indicating areas which connect to a specific voltage. One of the black ground wires connects to the wrong area, among the yellow +12V wires. Further up the wire, you can see heat shrink tubing covering where the wire splits into two wires. That way this one connection goes to both of the SATA ground wires.

There already was an unused hole in the ground area. A high wattage soldering iron made the repair easy.

Putting the power supply back together was a bit tricky. There are several places where things need to interlock properly. Take care around the grommet where the wires come out. The top of the case is supposed to fit into a groove in the grommet, so the wires don't chafe against the metal edge.

Here's a photo of the nicely voided warranty sticker. I still hope Thermaltake will reimburse me for this. Power supplies should not have defects like this, and most users have no way to protect themselves from this. It would be easy to check for miswiring with a tester at the factory.

Finally, here's a photo of the SSD repair. I'm not sure what was damaged, because it's hard to find information on some surface mount parts. I'm guessing it's a regulator that comes before the regulators which provide voltages that the SSD actually needs, providing greater voltage stability. I just bypassed it with a diode, which had to be filed down to fit inside the SSD case.

Monday, July 27, 2015

Intel Rapid Storage Technology may be incompatible with SSDs

After installing a PNY CS1111 SSD, I started getting occasional hangs in Windows 7. Sometimes shutdown or standby would take way too long, and sometimes a hang would happen during normal use. Often, the hard drive LED would stay on all the time, but the computer would work normally, with no signs of actual constant disk activity. Also, trimcheck 0.7 reported that TRIM isn't working.

I just upgraded Intel Rapid Storage Technology to a newer version. The hard drive LED is behaving properly now, and trimcheck shows that TRIM works. I haven't gotten any hangs, though it hasn't been long enough to be sure.

There are newer versions of Intel Rapid Storage Technology available, but I don't know of any newer versions which work with the ICH9R on this GA-P35-DS3R motherboard.

Sunday, June 28, 2015

Getting rid of the Tools pane in Acrobat Reader DC

Acrobat Reader is the best PDF viewer. It has the best performance on documents with huge amounts of stuff on a page, such as maps. It also has the fastest searching on documents with huge amounts of text.

Adobe stuffs it with all sorts of crap I do not want in an attempt to sell their online services, but the program starts up reasonably fast despite being bloated. There's just one important annoyance that the preferences can't fix: the Tools pane. It wastes the right side of the screen for a bunch of functions which I never use. It can be hidden, but there's no option to hide it permanently, so it has to be hidden again every time I open a document.

The Tools pane can be disabled by moving or deleting the Viewer.aapp plugin which creates it. For me, it is located at C:\Program Files\Adobe\Acrobat Reader DC\Reader\AcroApp\ENU\Viewer.aapp. In 64-bit Windows it would be in C:\Program Files (x86)\, and the ENU part could be different if you have a different language installed. I created a new folder within ENU and put the file there, so I can move it back if I ever actually need it or if updates break because it's not there. It may be necessary to repeat this procedure after every Reader update.
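Since this may need repeating after updates, the move can be scripted. This is a rough Python sketch, not something Adobe provides. The folder name "disabled" and the function name are my own choices, and the script will probably need to run with administrator rights because the file is under Program Files.

```python
import shutil
from pathlib import Path


def disable_tools_pane(lang_dir, backup_name="disabled"):
    """Move Viewer.aapp into a subfolder of lang_dir so it can be restored.

    Returns the new location, or None if the plugin was not found there
    (for example because it has already been moved).
    """
    lang_dir = Path(lang_dir)
    plugin = lang_dir / "Viewer.aapp"
    if not plugin.is_file():
        return None
    backup_dir = lang_dir / backup_name
    backup_dir.mkdir(exist_ok=True)
    target = backup_dir / plugin.name
    shutil.move(str(plugin), str(target))
    return target


if __name__ == "__main__":
    # Path from the post; adjust for 64-bit Windows or another language.
    print(disable_tools_pane(
        r"C:\Program Files\Adobe\Acrobat Reader DC\Reader\AcroApp\ENU"))
```

Moving the file back out of the subfolder restores the Tools pane.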

Wednesday, June 10, 2015

AA cell diameter differs

I got some AA to D adapters so I can use AA rechargeables with the RCA RC3000A digital boombox.

The cell I tried was a new Duracell DX1500 pre-charged rechargeable NiMH, rated 2400 mAh. It was a very tight fit, and hard to remove. These adapters lock in the cell when you push it in all the way. You can start removal by pushing on the positive contact, which is like a button. After that, all you can do is pull on the negative side, which sticks out a bit. It's hard to apply a lot of pulling force by hand to a cell that sticks out very little. When I finally got the cell out, I saw the adapter had damaged it.

My first thought was that the inexpensive no-name adapters sucked. Then I tried three other kinds of cells. They all fit nicely. There was a bit of friction with some, but it wasn't a problem. Also, there was no damage. Even the older 2000 mAh Duracell DX1500 (same model as the problem cells) fit properly. Here are the cells. Only the cell at the right has a problem fitting in the adapters.

Does this PNY CS1111 SSD use a SandForce controller?

I recently finally upgraded to an SSD. I'm not too impressed. Some things are much faster, but those are generally rare operations such as rebooting and installing software or updates. Operating system caching and preloading was taking care of common operations. Practically speaking, going from a 160 GB Seagate 7200.7 to a 1 TB WD Black and later upgrading from 2 GB to 6 GB RAM were both more useful.

I chose a PNY CS1111 series 120 GB drive. It doesn't have the fastest read speeds, but it's faster than 3 Gbit/s SATA or 1x PCI Express 1.x, so the GA-P35-DS3R motherboard is the bottleneck.

According to PNY's 2015 SSD Product Comparison PDF, the CS1111 series uses a Silicon Motion SM2246EN controller. However, the SMART attributes don't make sense as SM2246EN attributes, and make sense as SandForce attributes. For example, 241 and 242 are definitely measuring gigabytes. So, is PNY's information wrong? Are there multiple versions of this drive with different controllers? PNY's support didn't answer. I don't feel like opening up the drive to see if there's a SandForce controller inside, because that would void the warranty. Here are the SMART attributes, as reported by smartmontools.

  1 Raw_Read_Error_Rate     0x0000   100   100   000    Old_age   Offline      -
  5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline      -
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline      -
 12 Power_Cycle_Count       0x0000   100   100   000    Old_age   Offline      -
171 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -
172 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -
174 Unknown_Attribute       0x0000   000   000   000    Old_age   Offline      -
177 Wear_Leveling_Count     0x0000   000   000   000    Old_age   Offline      -
181 Program_Fail_Cnt_Total  0x0000   100   100   000    Old_age   Offline      -
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      -
187 Reported_Uncorrect      0x0000   100   100   000    Old_age   Offline      -
194 Temperature_Celsius     0x0000   033   100   000    Old_age   Offline      -
195 Hardware_ECC_Recovered  0x0000   100   100   000    Old_age   Offline      -
196 Reallocated_Event_Count 0x0000   098   098   003    Old_age   Offline      -
201 Unknown_SSD_Attribute   0x0000   100   100   000    Old_age   Offline      -
204 Soft_ECC_Correction     0x0000   100   100   000    Old_age   Offline      -
230 Unknown_SSD_Attribute   0x0000   100   100   000    Old_age   Offline      -
231 Temperature_Celsius     0x0000   100   100   010    Old_age   Offline      -
233 Media_Wearout_Indicator 0x0000   000   000   000    Old_age   Offline      -
234 Unknown_Attribute       0x0000   000   000   000    Old_age   Offline      -
241 Total_LBAs_Written      0x0000   000   000   000    Old_age   Offline      -
242 Total_LBAs_Read         0x0000   000   000   000    Old_age   Offline      -

If the Malicious Software Removal Tool won't go away, next month's updates might fix it

In May I stopped installation of Windows 7 updates because one simple update seemed to be taking a long time with no activity. After that, the May 2015 Malicious Software Removal Tool would not go away. It would install successfully every time, writing to C:\Windows\debug\mrt.log and setting HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\RemovalTools\MRT\Version to the proper GUID. Nevertheless it would reappear after the next check for updates.

There were various ideas online, and I tried everything except deleting C:\Windows\SoftwareDistribution. Nothing helped. Eventually I just created a DWORD at HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\MRT\DontOfferThroughWUAU, setting it to 1 to disable offering of the tool via Windows Update. You cannot simply hide the update, because then last month's tool is offered instead, and so on.
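One way to apply that setting without clicking through regedit is to import a .reg file. Here is a small Python sketch which generates the text of such a file. The key and value names are the ones given above; the function name is just something I made up for this example.

```python
def make_reg_file(key_path, value_name, dword_value):
    """Return the text of a .reg file which sets one DWORD value."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        # DWORD values are written as eight hexadecimal digits.
        f'"{value_name}"=dword:{dword_value:08x}',
        "",
    ]
    return "\r\n".join(lines)


MRT_POLICY_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\MRT"
reg_text = make_reg_file(MRT_POLICY_KEY, "DontOfferThroughWUAU", 1)

if __name__ == "__main__":
    print(reg_text)
```

Save the output as a .reg file and import it from an administrator account. To undo the setting later, delete the value in regedit.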

I just removed that setting, wondering if the June updates might fix it. They did. I don't know what happened. Maybe the June Malicious Software Removal Tool set something that Windows Update finally recognized. All I know is I spent way too much time on this little problem.

BTW, there is some useful information in KB891716.

Monday, May 11, 2015

Performance of the Em-DOSBox CPU interpreter

When I first got DOSBox to run in a web browser, performance was terrible. The problem was the CPU interpreter. A single function fetches, decodes and executes most x86 instructions. Most of the function consists of a big switch statement with many cases. It is big because there are many x86 instructions.

The first problem was Emscripten converting the switch statement into a long chain of else if comparisons. Actually, a switch statement was used, but in most cases it merely set a variable which was later tested via comparisons. Instruction decoding, which needs to be done for every instruction, changed from O(1) into O(n).

Emscripten could generate a much better switch statement with a patch. This made DOSBox run fast in Firefox, but it was much too slow to use in Chrome. When I profiled it, I saw a warning triangle by the CPU interpreter function, telling me it's not optimized because the switch statement is too big. There was already a V8 bug filed about this issue.

I solved this problem by transforming the cases of the big switch statement into functions using a Python script. This reduces function size and allows a function pointer to be used instead of a switch statement. The process is somewhat convoluted because the switch statement is normally built using the preprocessor. First, preprocessor output files need to be produced. In order to get Automake to create them using proper dependencies, it needs to create a library which is otherwise unnecessary. Then the Python script parses the preprocessor files. It stores functions into a function store, removing duplicates and fixing name collisions. Finally, it creates header files which are used when building the final version of the CPU interpreters. Three CPU interpreters are processed this way: the simple, normal and prefetch cores.

Since then, the Emscripten bug has been fixed, I assume by the switch to Fastcomp. When Chrome started using TurboFan for asm.js, it could finally get good performance with an un-transformed CPU interpreter. This led me to check whether the transformation is still necessary.
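To show the idea in miniature, here is a toy interpreter in Python where each opcode has its own small handler function and decoding is a single table lookup. This is only an illustration of function-based dispatch. The four opcodes and all the names are invented, and none of it is DOSBox code or output of the transformation.

```python
# Each handler stands in for one case of the big switch statement.
def op_nop(state):
    pass


def op_inc(state):
    state["acc"] += 1


def op_dec(state):
    state["acc"] -= 1


def op_double(state):
    state["acc"] *= 2


# The table index is the opcode, so finding the handler takes the same
# time for every opcode, unlike a chain of else-if comparisons.
HANDLERS = [op_nop, op_inc, op_dec, op_double]


def run(program, acc=0):
    """Execute a sequence of opcodes and return the final accumulator."""
    state = {"acc": acc}
    for opcode in program:
        HANDLERS[opcode](state)
    return state["acc"]
```

The main loop stays tiny no matter how many opcodes exist, and each handler is a small function that a JavaScript engine can optimize on its own.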

Safari 8.0.6 and Internet Explorer 11 still get terrible performance without the transformation. Use of --llvm-opts '["-lowerswitch"]' doesn't seem to help. Looking at the JavaScript, I can confirm that it changes the big switch into a binary search, so this probably means the problem is due to the size or complexity of the function, and not just due to switch statements. I also experimented with the Emscripten outlining limit, with or without -lowerswitch. I assume that transforming switch cases into functions is a more efficient split than what's done by Emscripten outlining.