Saturday, October 18, 2008

AHCI and freezes during startup while detecting SATA devices

You know how sometimes you're dealing with a computer problem and you find yourself trying a bunch of things, almost randomly, in a desperate attempt to get it to work? There's a name for that. If you're doing it while writing a program, it's called voodoo chicken coding. If you're doing it while trying to debug some sort of operating system problem, then I suppose it's voodoo chicken troubleshooting. It's called voodoo chicken coding (or troubleshooting) because you don't actually know what the problem is, so you try a bunch of random things to get it to work. It's the equivalent of mumbling things to the gods while waving some kind of voodoo chicken over the computer in a vain attempt to get it to work.

This is not a good strategy. Even if you do manage to solve the problem this way, it will probably recur, since whatever you did probably fixed it only by accident. (I say this as someone with a background in debugging heavily threaded software.)

What is not often stated, however, is that knowing how a system works can sometimes be of no help either. I recently had a problem on my PC where the computer would freeze early in the startup process while the BIOS was still scanning for IDE devices. I ran into this issue when I was trying to switch my SATA bus over to AHCI. AHCI stands for Advanced Host Controller Interface. It's a newer standard for talking to Serial ATA devices that offers more features than the legacy parallel ATA (IDE) mode, such as hot swapping and native command queuing. I wanted to enable it on my internal hard drive for three reasons: 1) I thought I had already enabled it at some point in the past. 2) I wanted to enable native command queuing because it sounds cool. 3) I need to enable it in order to run Mac OS, which I've been trying, unsuccessfully, to get running on my computer since I bought it about a year and a half ago. At least part of the reason I haven't managed to get it running is that I haven't managed to get my computer running with AHCI.

Anyway, when I turned on AHCI in my BIOS, I started getting this freezing problem while the BIOS was scanning the SATA bus looking for new devices. I was a bit confused because, as far as I knew, there was absolutely no way that anything I had done to the hard drive, in terms of formatting, partitioning or installing software, could cause this problem. Scanning for new devices shouldn't involve reading anything on the hard drive. That's just weird. Well, it turned out that something on the drive was indeed the problem.

After trying everything I could think of, I decided simply to wipe the drive in a voodoo chicken debugging attempt to get the system to recognize the hard drive without freezing. Amazingly, after I low-level formatted the drive and rebooted, it worked fine. I still don't understand why. Why the heck is it reading from the drive during the device detection routine? Well, I don't know. If anyone wants to explain this to me, feel free. Anyway, I'm just happy it's working now. Native command queuing is indeed cool!

So, I'm not sure what the moral of this story is. I think it's that once you've eliminated everything as impossible, the only thing that's left is the impossible, which is impossible. This in turn means you have absolutely no clue what you're doing and might as well start trying a whole bunch of random stuff that shouldn't work.

Now, if you'll excuse me, I have some chicken stew to eat. Yummy!

Sidenote: Switching from parallel ATA to AHCI requires installing drivers under Windows. Unfortunately, Windows will only install the AHCI drivers once AHCI is enabled, and enabling AHCI renders Windows unbootable unless the drivers for the AHCI controller are already installed. This is a very fun situation: you can't install the AHCI drivers until AHCI is enabled, and you can't load Windows if AHCI is enabled. I managed to get around this on my machine by making use of its two SATA controllers. I enabled AHCI on the second controller, installed the Windows drivers for that controller, then manually moved the hard drive over to the second controller. I then booted Windows from the second controller and turned on AHCI for the primary controller. Finally, I moved my hard drive back to the primary controller and started up my machine again. The net effect of all these shenanigans is that I now have AHCI enabled on both controllers, with the drivers for both AHCI controllers installed and enabled. There are apparently ways of installing the drivers by booting from a CD-ROM or other startup disk and then inserting a floppy disk with those drivers at some point. I didn't bother reading up on how to do that, as the method above seemed far simpler to me.


levi said...

Hi Andrew. I have exactly the same problem with AHCI and software RAID. There are no issues unless I try to partition the raw array. After a reboot the AHCI BIOS freezes, and nothing helps except wiping the partition table of the first hard drive. Still perplexed by it...

Schwartz said...

Hi Andrew, there is very little information on this problem around, so I thought I would stop by and post.

I encountered this problem very recently when upgrading my system to Windows 7. Here is what I have discovered through many hours of experimentation:

It has something to do with the MBR (master boot record). I can cause this to happen by copying partitions using Paragon Partition Manager 11 in Windows 7. I have also caused it to happen when installing BootIt NG. However, I have also used BootIt NG to FIX the problem several times.

I have repaired the problem in several ways. First, disconnect the drive, boot into Windows using AHCI (assuming it is not your boot drive), then plug the drive in (AHCI allows hot swapping). Windows can now see the drive (both XP and Windows 7).

I have fixed the problem IN SOME CASES by setting or clearing the active partition in Windows Disk Management. With a dynamic disk, this will not work.

I have also fixed the problem from BootIt NG. First, however, you must connect the drive to a controller set to IDE mode (change the BIOS setting to IDE) so the BIOS will recognize the drive at boot time.

In BootIt NG, I fixed a couple of disks by inspecting the MBR through VIEW MBR, which reports an error in the CHS (cylinder/head/sector) values that I then tell it to fix (inspecting the properties of each partition may show a similar error). I fixed the hard disks with dynamic disks by selecting the FILL E-CHS option.

I think this is a huge BUG or problem with the AHCI BIOS, but no one seems to talk about it.
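Schwartz's fixes all come back to the CHS fields in the MBR. Each of the four 16-byte partition entries starting at byte offset 446 records where a partition starts and ends twice: once as packed cylinder/head/sector (CHS) triples and once as 32-bit LBA sector numbers. Whether a BIOS really hangs on a mismatch between the two is not established here, but the consistency check itself is easy to sketch. The Python below is a hypothetical helper (not what BootIt NG does internally) and assumes the common 255-head, 63-sector BIOS translation geometry; a BIOS that translates differently would expect different CHS values.

```python
import struct

# Assumed BIOS translation geometry: 255 heads x 63 sectors per track.
# A BIOS using a different translation would expect other CHS values.
HEADS = 255
SECTORS_PER_TRACK = 63
TABLE_OFFSET = 446  # the four 16-byte partition entries start here


def lba_to_chs(lba, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Convert an LBA to (cylinder, head, sector) under the assumed geometry."""
    cyl = lba // (heads * spt)
    if cyl > 1023:
        # Past the 10-bit cylinder limit; many tools store the maximum 1023/254/63.
        return (1023, heads - 1, spt)
    return (cyl, (lba // spt) % heads, lba % spt + 1)


def decode_chs(raw):
    """Unpack the 3-byte CHS field: head byte, then 6-bit sector + 10-bit cylinder."""
    return (((raw[1] & 0xC0) << 2) | raw[2], raw[0], raw[1] & 0x3F)


def encode_chs(chs):
    """Pack (cylinder, head, sector) back into the 3-byte on-disk layout."""
    cyl, head, sector = chs
    return bytes([head, ((cyl >> 2) & 0xC0) | (sector & 0x3F), cyl & 0xFF])


def check_mbr(mbr):
    """Yield (slot, problems) for each used entry whose fields look inconsistent."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no 0x55AA boot signature")
    for slot in range(4):
        entry = mbr[TABLE_OFFSET + 16 * slot:TABLE_OFFSET + 16 * (slot + 1)]
        status, ptype = entry[0], entry[4]
        if ptype == 0:
            continue  # unused slot
        start, count = struct.unpack("<II", entry[8:16])
        problems = []
        # 0x80 marks the active (bootable) partition, 0x00 an inactive one.
        if status not in (0x00, 0x80):
            problems.append(f"invalid status byte 0x{status:02x}")
        if decode_chs(entry[1:4]) != lba_to_chs(start):
            problems.append("start CHS does not match start LBA")
        if decode_chs(entry[5:8]) != lba_to_chs(start + count - 1):
            problems.append("end CHS does not match end LBA")
        if problems:
            yield slot, problems
```

To check a real disk you would read its first 512 bytes (from a disk image, or from a raw device such as `/dev/sda` on Linux with root rights) and pass them to `check_mbr`; an empty result means every used entry is consistent under the assumed geometry.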

Adam said...

Hey, I have been having freezing problems when idle myself. Windows 7, ASUS Sandy Bridge P8P67 Pro mobo, i7-2600K.

I disabled AHCI in the BIOS and it seems to have cleared up. I am using an SSD for my Windows install. I installed Windows with AHCI enabled rather than IDE, but switching back to IDE seems to have fixed my problem (3 days with no freezes, a new record for me).

At one point I was not able to boot from either the hard drive or the optical drive, which got me thinking it was a SATA problem and not just my mobo or SSD. If anyone has a good explanation of why this may have fixed my problem, or of whether it is worth messing around to get AHCI working (I'm not really sure it is in my case), that would be really helpful :)

DanielM said...

I came across this post today while Googling. I am having the exact same problem on an older Core 2 system that I am trying to set up as a media center. I had AHCI mode working properly, went to install Windows 7, let the Win7 installer partition the drives automatically, and now the system hangs during the AHCI scan on POST. I initially assumed the older HDD I was using had picked a bad time to die, but now I'm thinking perhaps Win7 messed up the partitioning.
I've had luck fixing similar problems with the DISKPART tool that comes with Windows. It's a real pain to use; I can partition a 'nix system with parted without much difficulty, but I still have trouble with DISKPART. IIRC the "clean" command is the one that blows away partition tables. I'll have to give it a shot.
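On an MBR disk, the partition table that DISKPART's "clean" removes, and that levi describes wiping, lives in the first sector: four 16-byte entries at byte offsets 446 to 509, just before the 0x55AA signature. A minimal Python sketch of wiping it, working on a disk image file rather than a live disk, follows. `clear_mbr_partition_table` is a hypothetical helper, not a reimplementation of DISKPART; it does not touch GPT structures or any extended-partition chain stored elsewhere on the disk.

```python
import os

PART_TABLE_OFFSET = 446   # the four 16-byte entries start here
PART_TABLE_SIZE = 4 * 16  # and end just before the 0x55AA signature at 510


def clear_mbr_partition_table(image_path):
    """Zero the four primary partition entries in the first sector of a disk image.

    The boot code (bytes 0-445) and the 0x55AA signature are left alone,
    so the disk still looks like an MBR disk, just with no partitions.
    """
    with open(image_path, "r+b") as f:
        sector = f.read(512)
        if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
            raise ValueError(f"{image_path}: no MBR signature in the first sector")
        f.seek(PART_TABLE_OFFSET)
        f.write(bytes(PART_TABLE_SIZE))  # bytes(n) is n zero bytes
        f.flush()
        os.fsync(f.fileno())
```

Working on a copy of the disk's first sectors first is the cautious choice here: the function makes every primary partition unreachable, although the data inside them is not erased.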

Mozgalica said...

I have a Gigabyte GA-A75-D3H (BIOS F1) with 2 × 2TB SATA II drives and 1 × 3TB SATA III drive,
running Windows 7 SP1 x64 Ultimate.

1) The SATA HDDs work in AHCI mode on the SATA0 and SATA2 channels, EFI off.
2) Then I added the 3TB SATA III HDD on channel 4 (eSATA or sata^^ settings), EFI on.
3) Only 1 SATA HDD works, SMART data reports errors, and the other 2 HDDs are not visible.
4) Nothing works at all if all channels are set to AHCI.
5) It works if one HDD is set to SATA and the 3TB is set to IDE, but then one HDD is invisible.
6) Everything works if all are set to IDE.
7) Speed is 100/200 if the HDDs are set to AHCI or SATA.
8) Speed is 50/100 if the HDDs are set to IDE.

The 3TB HDD is GPT, not MBR.
The 2 × 2TB HDDs are MBR.
Windows 7 was installed with AHCI.

What is wrong with having 3 SATA HDDs?
Why don't they all work in AHCI mode?

Deep Freeze is not an issue here (it is thawed).
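The GPT/MBR split is expected for Mozgalica's 3TB drive. An MBR partition entry stores its start and length as 32-bit sector counts, so with 512-byte sectors it can address at most 2^32 × 512 = 2,199,023,255,552 bytes (about 2.2 TB), which is why a 3TB disk is normally partitioned with GPT. A GPT disk keeps a "protective" MBR in sector 0 containing a partition of type 0xEE and puts its real header, starting with the signature `EFI PART`, in sector 1. The sketch below assumes 512-byte logical sectors and only classifies the first two sectors of a disk; it identifies which scheme each drive uses but does not by itself explain the AHCI behaviour described above.

```python
SECTOR = 512  # assumed logical sector size; 4K-native drives put the GPT header at byte 4096


def mbr_limit_bytes(sector_size=SECTOR):
    """Largest size an MBR entry can describe: 2**32 sectors of sector_size bytes."""
    return (2 ** 32) * sector_size


def partition_scheme(first_two_sectors):
    """Return 'gpt', 'mbr' or 'none' given the first 1024 bytes of a disk."""
    lba0 = first_two_sectors[:SECTOR]
    lba1 = first_two_sectors[SECTOR:2 * SECTOR]
    if len(lba0) < SECTOR or lba0[510:512] != b"\x55\xaa":
        return "none"
    # The partition type byte sits at offset 4 of each 16-byte entry.
    types = [lba0[446 + 16 * i + 4] for i in range(4)]
    if 0xEE in types and lba1[:8] == b"EFI PART":
        return "gpt"
    return "mbr"
```

Fed the first 1024 bytes of each drive, this should report `gpt` for the 3TB disk and `mbr` for the two 2TB disks, matching the description above.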