Friday, January 30, 2009

Sql 2005 Install Woes on Shiny New Big Server

I had a new one this time.

I (of course) ran into the install-hangs-on-setting-file-security issue (KB910070), but I was expecting that. What really threw me was that after that, the install just kept dying, leaving this in the logs:

Faulting application sqlservr.exe, version 2005.90.1399.0, faulting module sqlservr.exe, version 2005.90.1399.0, fault address 0x0000000000b323f0.

This really threw the installer too - after an uninstall, and even after a manual cleanup, the installer still thought there was an instance hanging around. Which, it turned out, there was. I had to manually delete the SQL services (using SC), a bunch of instance registry settings and the instance files directory (the MSSQL.1 folder) before I could finally get it to reinstall. I guess the uninstall died too.
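For next time, here's a rough dry-run sketch of the service and folder part of that cleanup. It only builds the command lines so you can eyeball them before running anything. The service names are an assumption: they're the usual ones for a SQL Server 2005 *default* instance (a named instance uses `MSSQL$<name>` and `SQLAgent$<name>`), and the folder is the default install path, so check both against what `sc query` and Explorer actually show. I've deliberately left the registry keys out; do those by hand, with a backup.

```python
from pathlib import PureWindowsPath
from typing import List

# ASSUMPTION: default-instance service names; named instances differ.
SERVICES = ["MSSQLSERVER", "SQLSERVERAGENT"]
# ASSUMPTION: default SQL 2005 install location for the first instance.
INSTANCE_DIR = PureWindowsPath(r"C:\Program Files\Microsoft SQL Server\MSSQL.1")


def cleanup_commands(services: List[str], instance_dir: PureWindowsPath) -> List[str]:
    """Build (but do not run) the commands for a manual instance cleanup."""
    # Stop everything first; 'sc stop' on an already-stopped service just errors harmlessly.
    cmds = [f"sc stop {svc}" for svc in services]
    # Then remove the service registrations themselves.
    cmds += [f"sc delete {svc}" for svc in services]
    # Finally remove the leftover instance files directory (quoted: path has spaces).
    cmds.append(f'rmdir /s /q "{instance_dir}"')
    return cmds


if __name__ == "__main__":
    # Dry run: print the plan only.
    for cmd in cleanup_commands(SERVICES, INSTANCE_DIR):
        print(cmd)
```

Printing rather than executing is the point: on a half-uninstalled box you want to see exactly what's about to be deleted.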

So then I tried installing again, and again. And again.

So I started speculating. Was it the virus scanner? No. Could it be that the Sql 2005 installer didn't like .NET 3.5 SP1? Uninstalled it... No. Was I definitely using the 64-bit version? Yes. Could I slipstream SP2 and work around some issue I didn't understand yet? No. Was it the monster 24 cores the server had (4 x hex core)? Maybe.

There is a known issue with the Sql 2005 install failing with an odd number of cores (i.e. triple-core Phenoms). Mine (obviously) doesn't count: but then maybe Sql can't count either. So I used the instructions in KB954835 to cripple my monster server down to a single CPU, and then it all installed just fine. I can now install SP2 (SP3, actually), which allegedly should then make it all work.
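My working assumption (unconfirmed, not checked against the KB text) is that the real trigger is a logical processor count that isn't a power of two: 24 certainly isn't, and neither is 3. If that holds, you may not need to go all the way down to one CPU; capping at the largest power of two might do. Here's a small Python sketch that works out a value to try with the BOOT.INI `/NUMPROC` switch, assuming that's the mechanism the KB954835 workaround uses. The helper names are made up for illustration, and it only prints a suggestion; it doesn't touch BOOT.INI.

```python
import os
from typing import Optional


def is_power_of_two(n: int) -> bool:
    # A positive integer is a power of two iff exactly one bit is set.
    return n > 0 and (n & (n - 1)) == 0


def largest_power_of_two_at_most(n: int) -> int:
    # Keep only the highest set bit, e.g. 24 -> 16, 3 -> 2.
    if n < 1:
        raise ValueError("processor count must be positive")
    return 1 << (n.bit_length() - 1)


def suggested_numproc(n: int) -> Optional[int]:
    # Hypothetical helper: a /NUMPROC value to try, or None if the
    # count is already a power of two (under the assumption above).
    return None if is_power_of_two(n) else largest_power_of_two_at_most(n)


if __name__ == "__main__":
    n = os.cpu_count() or 1
    cap = suggested_numproc(n)
    if cap is None:
        print(f"{n} logical processors: already a power of two")
    else:
        print(f"{n} logical processors: try /NUMPROC={cap} for the install")
```

On this box that would suggest 16 rather than 1, which would at least leave the boss with fewer "spares" during the install.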

Moral:
Obviously I should have been installing Sql 2008 instead
It's clearly becoming way too easy - with multi-multi-core boxes - to drop into some massively unexplored race condition territory in something that's otherwise really quite stable and well tested.
You can have too many cores

Other thoughts:
There must be a better way to restrict an installer or app to run on only one core without farting about with BOOT.INI
Once I put SP3 on, it had better all work, otherwise the boss is going to be really pissed ('What, those other 23 cores? They're, um ... spares')

6 comments:

Unknown said...

This is the most unbelievable "bug" ever. After trying 3 times to install a cluster I kept getting the same error. Doing everything I'd done before for other cluster installations didn't help. So I thought to myself, "It must be this extreme server we have - 48 cores/128GB", so is it the memory or the CPU? There were no max memory limits for SQL, so it had to be the CPU. I thought it was the number of cores, maybe it was too many. But a divisor error? No way...

After the server reboots I will install a clustered instance again, apply SP3 and see if that works. Otherwise I'm off to SQL 2008.

Thanks for the info!

Anonymous said...

I just ran into this issue today. Did SP2 or SP3 let you "unleash" all the processors again?

Anonymous said...

This is a little worrying, as I too have just bumped my head on this one. I have installed on 48-CPU Itanium platforms before and never hit this. I have a new HP DL580 with four 6-core 2.7GHz Intel CPUs and 32GB RAM and, guess what, SQL will not install. I am going to try scaling down as you did and see if that helps. Thanks for blogging this. If I find a resolution I will update.

NY said...

Same issue for me last week. Now after SP3 2 of my 3 Clustered Servers are running on all 12 CPUs, but 1 server can't start the SQL service with multiple cpus re-enabled. The same instance works fine on the other 2 servers.

ARGH@!@#

ALEX said...

Thanks so much.
You resolved my problem with a cluster environment.
I found only two links about this issue; thanks for your blog.

Mathieu Mitchell said...

Wow, this fixed my problem, we've been installing this server for 2 days for a client and couldn't figure out what was going on... Damn AMD and their triple core :@ Ended up upgrading the client to a Phenom II X4.
