Friday, January 30, 2009

Sql 2005 Install Woes on Shiny New Big Server

I had a new one this time.

I (of course) ran into the install-hangs-on-setting-file-security issue (KB910070), but I was expecting that. What really threw me was that, after that, the install just kept dying, leaving this in the logs:

Faulting application sqlservr.exe, version 2005.90.1399.0, faulting module sqlservr.exe, version 2005.90.1399.0, fault address 0x0000000000b323f0.

This really threw the installer too - after an uninstall, and even after a manual cleanup, the installer still thought there was an instance hanging around. Which there was. I had to manually delete the SQL services (using SC), a bunch of instance registry settings and the instance files directory (the MSSQL.1 folder) before I could finally get it to re-install. I guess the uninstall died too.

So then I tried installing again, and again. And again.

So I started speculating. Was it the virus scanner... No. Could it be the Sql 2005 installer didn't like .Net 3.5 SP1? Uninstall... No. Was I definitely using the 64 bit version... Yes. Could I slipstream SP2 and work around some issue I didn't understand yet... No. Was it the monster 24 cores the server had (4 x hex core)... maybe.

There is a known issue with Sql 2005 install failing with an odd number of cores (i.e. Phenoms). That (obviously) doesn't count: but maybe Sql can't either. So I used the instructions in KB954835 to cripple my monster server down to a single CPU, and then it all installed just fine. I can now install SP2 (3 actually) which allegedly should then make it all work.
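For what it's worth, here is a rough sketch of the kind of check that might tell you in advance whether a box is in the danger zone. It assumes (my reading of KB954835, not something verified here) that the trouble starts when the number of logical processors per physical socket is not a power of 2. The function names and the rule itself are illustrative assumptions, not anything the installer exposes.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... (uses the classic bit trick)."""
    return n > 0 and (n & (n - 1)) == 0


def may_hit_install_issue(logical_processors: int, sockets: int) -> bool:
    """Guess whether a Sql 2005 RTM install might fail on this box.

    ASSUMPTION: the install is at risk when the ratio of logical
    processors to physical sockets is not a power of 2. This is a
    hypothetical helper, not an official test.
    """
    if logical_processors <= 0 or sockets <= 0:
        raise ValueError("processor and socket counts must be positive")
    if logical_processors % sockets != 0:
        # A non-integer ratio can't be a power of 2 either.
        return True
    return not is_power_of_two(logical_processors // sockets)


if __name__ == "__main__":
    print(may_hit_install_issue(24, 4))  # 4 x hex core: ratio 6
    print(may_hit_install_issue(3, 1))   # triple-core Phenom: ratio 3
    print(may_hit_install_issue(8, 2))   # 2 x quad core: ratio 4
    print(may_hit_install_issue(1, 1))   # crippled down to a single CPU
```

Under that assumption my 4 x hex core box gets flagged (6 per socket) just like a triple-core Phenom does, while the same box knocked down to one CPU passes.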

Obviously I should have been installing Sql 2008 instead
It's clearly becoming way too easy - with multi-multi-core boxes - to drop into some massively unexplored race condition territory in something that's otherwise really quite stable and well tested.
You can have too many cores

Other thoughts:
There must be a better way to restrict an installer or app to run on only one core without farting about with BOOT.INI
Once I put SP3 on, it had better all work, otherwise the boss is going to be really pissed ('What, those other 23 cores? They're um ... spares')


Unknown said...

This is the most unbelievable "bug" ever. After trying 3 times to install a cluster I kept having the same error. Doing everything I've done before for other cluster installations didn't help. So I thought to myself "It must be this extreme server we have - 48 cores/128GB", so is it the memory or the CPU? There were no max memory limits for SQL, so it has to be the CPU. I thought it was the number of cores, maybe it was too many. But a divisor error? No way...

After the server reboots I will install a clustered instance again and apply SP3 and see if that works. Otherwise I'm off to SQL2008.

Thanks for the info!

Anonymous said...

I just ran into this issue today. Did SP2 or SP3 let you "unleash" all the processors again?

Anonymous said...

This is a little worrying, as I too have just bumped my head on this one. I have installed 48 CPU Itanium platforms before and never hit this. I have a new HP DL580, an Intel 4 x 6-core 2.7GHz system with 32GB RAM, and guess what: SQL will not install. I am going to try and scale down as you did and see if that helps. Thanks for blogging this. If I get a resolution to it I will update.

NY said...

Same issue for me last week. Now, after SP3, 2 of my 3 clustered servers are running on all 12 CPUs, but 1 server can't start the SQL service with multiple CPUs re-enabled. The same instance works fine on the other 2 servers.


ALEX said...

Thanks so much.
You resolved my problems with a cluster environment.
I found only two links about this issue, thanks for your blog.

Mathieu Mitchell said...

Wow, this fixed my problem, we've been installing this server for 2 days for a client and couldn't figure out what was going on... Damn AMD and their triple core :@ Ended up upgrading the client to a Phenom II X4.
