Wednesday, June 10, 2009

Visual Studio AddIns and remoted profiles

Here at work we have our profiles, including our documents folders, remoted to the network. It’s a constant pain, and this morning the fun was how to get the CopySourceAsHTML Visual Studio add-in working as a result, since its ‘normal’ install location, ‘My Documents\Visual Studio 2008\AddIns’, is within that remoted profile.

Irrespective of fiddling around with CASPOL and security zones, Visual Studio seems intransigent about loading add-ins from network locations. Fortunately it turns out there are a number of different places you can put add-ins to get them to load, which is probably better explained in this conversation.

As a result I dumped the addin into

%ALLUSERSPROFILE%\Application Data\Microsoft\VisualStudio\9.0\AddIns

(which expands to C:\Documents and Settings\All Users\Application Data\Microsoft\VisualStudio\9.0\AddIns)

…and it worked just fine.

Tuesday, April 07, 2009

TechEd 2008 DVD Silverlight Compatibility Issue

Shows how often I actually refer back to those DVDs, but now that Silverlight 2 is out they don’t work (well, the navigation doesn’t, which is nearly as bad).

There are a number of crappy workarounds, but fortunately at least one clever fix (short of re-authoring the DVDs with the updated online content). Naively I thought you could just add the application/x-silverlight-2-b2 mime-type mapping into the registry and it would work, but apparently not.

The Silverlight version detection / browser sniffing / forwards compatibility story has been monumentally shite so far, because I’ve seen a whole heap of the ‘Install Silverlight’ button in lieu of content, but immortalizing it onto countless DVDs really takes the biscuit.

Flash – if I recall correctly – got this right first time, that being over a decade ago.

Friday, March 20, 2009

Do One Thing Per Line

Yeah, so your fancy-pants Linq query and your fluent interface all look so neat and expressive, but after you’ve played ‘pin the null reference exception on the donkey’ in production for the nth time, you might consider that all those intermediate variables were generally considered good style for a reason. And it’s not just that you can pinpoint an error given a line number:

“Putting each statement on a line of its own provides an accurate view of a program’s complexity” – Steve McConnell [1]

That, and naming the variables documents the code[2], right?

Everyone’s oooohh so up in arms if a method spans multiple pages on screen, but a 10 line fluent interface chain mixed with some linq and some lambdas is somehow ok!? Be serious now[3].

Dumb it down. One day you will appreciate it.
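To make the point concrete, here’s a sketch (hypothetical names, and in Python rather than C# for brevity): the chained form gives you one line number for four possible failures, while the expanded form names each step and pinpoints the one that blew up.

```python
# Which step raised AttributeError? In the chained form the traceback
# points at a single line, whatever actually failed.
def chained(order):
    return order.customer.primary_address.postcode.strip().upper()

# One thing per line: the traceback's line number now names the failing
# step, and the intermediate names document what each step yields.
def expanded(order):
    customer = order.customer
    address = customer.primary_address  # fails here if there's no address
    postcode = address.postcode
    return postcode.strip().upper()
```

Same result when everything’s populated, but when `primary_address` comes back null in production, only one of these tells you which donkey got the pin.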

 

[1] Code Complete, Chapter 31.5 Para ‘Using Only One Statement Per Line’. See http://www.cc2e.com/Page.aspx?hid=291
[2] …and chapter 11.1 ‘Considerations in Choosing Good Names’
[3] Not that I’m advocating avoiding either Linq or Lambdas, but moderation in everything, right? Really not sold at all on Fluent Interfaces mind. So last year.

Old School JavaScript Officially ‘Old School’

I feel I have been rumbled:

Signs That Your JavaScript Skills Need Updating

Tuesday, March 17, 2009

Easy Upserts in SSIS 2005

Merging in updated or newly created data from an external source is a pretty common task in SSIS. The canonical way[1] these upserts are handled is the SSIS Lookup Pattern:

[image: the SSIS Lookup pattern data flow]

Incoming rows are run through the Lookup component to identify which already exist in the destination. Those that do become UPDATEs, and those that don’t (the Lookup’s error output) become INSERTs. You can further refine this to only retrieve the inserted-or-updated rows from the datasource in the first place and save on bandwidth and so on (a future post on delta loads I think).

It’s fairly straightforward, but to my mind this has a few major problems.

Firstly, it’s a pain in the arse. For something that’s such a common requirement, this seems like a lot of shape-twiddling-clicky-click-dragging to get it up and running[1]. And I hate how brittle the OLE DB Command parameter mapping is. But that’s another story.

More concerning is that the update flow uses the OLE DB Command component, and that’s something that fires per row, unlike the OLE DB Destination that your INSERTs go to, which can batch and/or use fast load. Normally the updates are going to be in the minority compared to the inserts, but if you get a significant number of them it still costs time.

Probably the bigger problem for me with this pattern is that the performance of the lookup degrades as the data volume progressively increases: every time you insert data there's more for the lookup to match against next time around. You can (and should) limit the columns in the lookup to restrict its memory blow-out, but it's always going to get progressively worse over time.

To me this is symptomatic that the problem is being solved at the wrong tier. Why bring large quantities of data up from the database to determine 'newness' of a row, when that's clearly something best done closer to the database itself? Especially as both sets of rows are going to the same place anyway.

So then you think about pouring data into a temporary table, then doing the upsert within Sql (based on a join), then dropping the temporary table. But you’ve got to create and drop the temporary table. What you need is a temporary table that’s transparently created for you. And what, after all, is an INSTEAD OF trigger, if not precisely that?

The Upsert View

So my solution for this is a series of 'upsert views'. An upsert view is a view with an INSTEAD OF INSERT trigger that instead of the insert performs an upsert. In this case (being Sql 2005) the upsert is an IF … INSERT … ELSE … UPDATE construct, but a 'true' Sql 2008 upsert would be a minor tweak.

An upsert view looks a bit like this:

Create view [table_upsert] as
Select all-the-updatable-columns
From [table]

Non-updatable columns include timestamp and identity columns (you'd never mark a column you brought in from a source system as an identity column, so there's no need to mess about with identity inserts).

The instead of trigger looks a bit like this:

-- Insert all the genuinely new rows
Insert into [table]
(all-the-updatable-columns)
Select all-the-updatable-columns
From inserted
Where not exists (select 1 from [table] where keys = inserted.keys)

-- Update all the modified rows
Update [table]
Set all-the-updatable-columns=inserted.all-the-updatable-columns
From
Inserted
Inner join [table] on (keys = inserted.keys)
Where (concurrency_indicators <> inserted.concurrency_indicators)

(We’ll explain the keys and the concurrency indicators in a moment)

What this produces is a view over your data table that takes records INSERTED into the view and performs an UPSERT into the base table. As a result your SSIS can be simplified to this:


[image: the simplified data flow: the source pours straight into an OLE DB Destination pointed at the upsert view]


In reality of course you normally have more than this: you'd probably have some other lookups and some data type conversions. However you've not had to put the upsert functionality into the SSIS (by hand): instead the database is handling it all for you.

Provided you enable the FIRE_TRIGGERS option, this is entirely compatible with using Fast Load, and this can make a significant performance difference for the time of your package execution. You've kept all the row-matching down in Sql where it can use indexes, optimisations and the page cache, and you've reduced the memory footprint of your SSIS packages. I shame myself by not having any hard comparison data to back this up however.

We (of course) generate[2] these upsert views based on various bits of metadata we store to indicate which are the keys and the concurrency indicators. In a data warehouse it’s quite common for the primary key to be different to that defined in the source system, so the key used for row matching – the ‘business key’ – has to be indicated somewhere. Column-level extended properties seemed a fairly natural fit:

[image: a column-level extended property marking the business key]

Similarly the columns that are used for concurrency control – rowversion / datetime updated – are marked as follows:

[image: a column-level extended property marking the concurrency indicator]

(If you don’t have one then you use the ‘fallback’ concurrency model that so many of Microsoft’s design-time tools like to kick out: you compare every column to see if it’s changed (mark all non-key columns). This is clearly a bit yuck).

And that’s pretty much it. Provided you can cope with INSERTing into a view and getting UPDATEs to your table, (which is a bit odd, but can be catered for with appropriate naming conventions) then maybe you can benefit from this approach too.
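The whole pattern fits in a runnable sketch. SQLite also supports INSTEAD OF triggers on views, so here’s a minimal, hypothetical-schema version you can poke at from Python (the real thing is T-SQL, where you get the inserted pseudo-table rather than SQLite’s NEW row, but the shape is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    BusinessKey TEXT PRIMARY KEY,  -- the source system's key
    Name        TEXT,
    RowVersion  INTEGER            -- concurrency indicator from the source
);

-- The upsert view: just the updatable columns
CREATE VIEW Product_Upsert AS
    SELECT BusinessKey, Name, RowVersion FROM Product;

-- INSTEAD OF INSERT: update matched-and-changed rows, insert the rest
CREATE TRIGGER Product_Upsert_Trigger INSTEAD OF INSERT ON Product_Upsert
BEGIN
    UPDATE Product
       SET Name = NEW.Name, RowVersion = NEW.RowVersion
     WHERE BusinessKey = NEW.BusinessKey
       AND RowVersion <> NEW.RowVersion;

    INSERT INTO Product (BusinessKey, Name, RowVersion)
    SELECT NEW.BusinessKey, NEW.Name, NEW.RowVersion
     WHERE NOT EXISTS (SELECT 1 FROM Product
                        WHERE BusinessKey = NEW.BusinessKey);
END;
""")

# First load: all rows are new
conn.executemany("INSERT INTO Product_Upsert VALUES (?,?,?)",
                 [("A", "Widget", 1), ("B", "Sprocket", 1)])
# Second load: A unchanged, B modified, C new -- one INSERT handles all three
conn.executemany("INSERT INTO Product_Upsert VALUES (?,?,?)",
                 [("A", "Widget", 1), ("B", "Sprocket Mk2", 2),
                  ("C", "Gadget", 1)])

print(conn.execute("SELECT * FROM Product ORDER BY BusinessKey").fetchall())
# [('A', 'Widget', 1), ('B', 'Sprocket Mk2', 2), ('C', 'Gadget', 1)]
```

The caller just does plain INSERTs against the view; whether a row becomes an INSERT or an UPDATE is the database’s problem, which is rather the point.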

[1] Sometimes you can get the Slowly Changing Dimension wizard to kick out something similar, but that’s pretty hacky, and it doesn’t like using certain types of columns as the update indicator (IIRC). It also re-creates the downstream flow every time you open it, which obviously removes any customisations you make. Avoid.
[2] The generation process was a fairly simple PowerShell script. I’ll post the important bits up one day.

Saturday, March 14, 2009

Dear HP / Dell / Microsoft

Please stop printing your product labels, serial numbers and OEM certificate of authenticity stickers in sweat soluble ink, then fixing them to the underside of our laptops.


Yours: everyone who ever actually used a laptop on their lap, or handled it using their hands.

Friday, March 13, 2009

Vista thinks I work for Borg

[image: Vista registration screenshot]

Hey did you see that thing that guy did where he brute-forced blur-redacted text back into plaintext? That was pretty cool. Damned if I can find the link again mind…

Monday, March 09, 2009

Say what you like about Vista…

…but the XP Start bar really did need a going over:

[image: Start menu screenshot]

Avoiding Reporting Services ‘StreamNotFound’ Exception when Viewing Multiple Instances of the Same Report

I’ve been using Reporting Services 2005 to build a series of BI dashboards, initially using the SharePoint reports library, but more recently using straight ASP.Net pages. In essence we’ve been using SSRS as a headless charting engine.

On some of our dashboards we occasionally saw little red crosses rather than our charts – IE’s way of telling us the image hadn’t loaded. Didn’t happen very often, and at the time we had bigger fish to fry, but unfortunately when we moved to our big new multi-core server, it happened a lot more.

In the logs we saw a lot of this:

w3wp!library!f!01/21/2009-12:40:25:: Call to GetItemTypeAction(http://myserver/reports/BlahBlahReport.rdl).

w3wp!library!f!01/21/2009-12:40:26:: e ERROR: Throwing Microsoft.ReportingServices.Diagnostics.Utilities.StreamNotFoundException: The stream cannot be found. The stream identifier that is provided to an operation cannot be located in the report server database., ;

Hotfix 913363 looked promising, but that was included in SP1, and we were SP3, so unless it had regressed, that wasn’t going to help.

What we determined was that this only happened when the same user requested the same report multiple times (with different parameters) at more or less the same time. And that happens a fair bit for us: our dashboards (both the SharePoint versions, and the later ASP.Net ones) basically rendered down to pages of IFRAMES pointing back at SSRS, and sometimes multiple charts on one dashboard would actually just be the same report re-executed with different parameters. So my ‘race condition’ alarm was off and running.

For any report on the dashboard, IE does this:

  • Load the HTML in the IFRAME
  • Go back to the server and load all the images, javascript etc…

Any generated images in the report have to be cached momentarily on the server between when the HTML is generated and when the browser comes and asks for them. And of course the browser has to supply some kind of reference back to the server so it gets the right images, and not the images from a different report or what a different user is viewing. So there’s a Reporting Services session (which you probably read about the first time you got a rsExecutionContextNotFound exception, right?).

So we speculated that SSRS was getting its sessions in a mess, and somehow a subsequent request to the same report, using different parameters, was throwing away the results of the previous execution (the chart image we are after) in the process of creating a new one. At which point Graeme discovered you can disable session cookies (in the ConfigurationInfo table) which (after an IISReset) indeed fixed the problem entirely.

But why the problem in the first place?

Looking at the HTTP traffic between the browser and the server, one sees that the image request is indeed scoped only to an image ID and a report as part of the GET URL, and to an execution session as part of a cookie:

GET /ReportServer?BlahBlahReport&rs%3aFormat=HTML4.0&rs%3aImageID=d95ebcc7-deba-4734-93c9-270468bd133b HTTP/1.1

Cookie: ConsoleVisible5dd14218-294b-424e-a33a-013236be5290=false; RSExecutionSession%3ahttp%3a%2f%2fserver%2f%2fBlahBlahReport=54m0qr55hytlvn55nzkjxr55

I’ve not followed this through entirely, but basically the trouble is your browser only has one RSExecutionSession cookie, which appears to contain only one execution ID per report URI. You’d imagine then that what you’d get is just the wrong image (outstanding requests for images from the first execution are submitted using the execution ID from the second session), but you don’t, so the most likely scenario I can come up with is that starting the second execution for the report implicitly removes the artefacts (images etc…) still in cache from the first execution. If this happens before the browser’s loaded them then it’s little broken crosses time…
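My mental model of the failure, as a toy simulation (all names invented, and this is my speculation rather than documented SSRS internals): one cookie slot per report URI means starting execution two evicts execution one’s cached images before the browser has fetched them.

```python
# Toy model of the server-side image cache and the per-report cookie slot.
cache = {}   # (report, execution_id, image_id) -> image bytes
cookie = {}  # report URI -> current execution id (only ONE slot per report)

def execute_report(report, execution_id, image_id):
    # Speculation: starting a new execution evicts the previous
    # execution's cached artefacts for that report.
    old = cookie.get(report)
    if old is not None:
        for key in [k for k in cache if k[0] == report and k[1] == old]:
            del cache[key]
    cookie[report] = execution_id
    cache[(report, execution_id, image_id)] = b"<chart png>"

def fetch_image(report, image_id):
    # The browser's image request is scoped by the *current* cookie value.
    key = (report, cookie[report], image_id)
    return cache.get(key)  # None -> little red cross

execute_report("/BlahBlahReport", "exec1", "img1")
execute_report("/BlahBlahReport", "exec2", "img2")  # same report, new params
print(fetch_image("/BlahBlahReport", "img2"))  # b'<chart png>'
print(fetch_image("/BlahBlahReport", "img1"))  # None: StreamNotFound
```

Make the cache key part of the GET URL (as the cookieless mode does) and the eviction race simply can’t happen.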

Once you turn cookieless sessions on, however, the GET requests look more like this:

GET /ReportServer?%2fBlahBlahReport&rs%3aSessionID=ucqtvo55wacqugqjln3l2045&rs%3aFormat=HTML4.0&rs%3aImageID=e9a2d08b-23cb-4586-8399-eb42ed5558a7 HTTP/1.1

ie they contain all three of report, execution and image. As such executing the same report multiple times in parallel doesn’t create any race conditions. This seems like such a better way of doing it that I’m surprised this isn’t the default behaviour (I guess it can’t clean up the temporary cache quite as aggressively, but other than that I can’t see the downside).

It’s worth noting that this kind of thing is an absolutely classic pitfall with any form of HTTP session state, which application frameworks like ASP and ASP.Net typically sidestep by serializing requests within a session and thus avoiding order-of-execution issues. However in this case, even if SSRS serialized access it would have to ensure it serialized it in the right sequence to really fix the problem. Tricky.

 

PS: Reporting Services 2005. Haven’t tested on 2008.

Tablet PCs: Still Too Heavy

I went out and bought a nice new HP Touchsmart TX2 the other day. And I’ve been very happy with it. For one thing I’ve hankered after a tablet for some time, but the sheer ‘double the price’ barrier has always stopped me. But that’s not the case with the TX2, which goes for under $2k and is fairly comparable with its non-tablet brethren.

And it’s a touch-screen as well, though the multi-touch capabilities are pretty much just gimmicks supported by a handful of built-in HP apps until Windows 7 comes along. That being said, I’ve already got very used to flicking the screen for page up / page down operations (though I still think a scroll-wheel on the side would be just as easy).

But it’s just too heavy.

It’s not heavy by current standards: it’s sub 2 kilos, and even the (far more expensive) Fujitsu Lifebooks only go down to 1.6 kg (in a 12”). And it’s definitely not too heavy to carry in a bag on your shoulder. But in my totally non-scientific ‘picking things up with my hand’ testing I’ve determined that the ideal weight for a tablet PC is less than a kilo (the weight of a Dell Mini 9 in fact). Tablets are for holding, right? Well at 2kg (and without much grip on the edges) your thumb is getting a good workout, let me tell you.

So my requirements for a tablet PC that will genuinely shift usage patterns:

  • <1kg
  • 4 hours battery life minimum, ideally 10, included in the weight above.
  • Some decent ergonomics around the frame so one can actually hold onto the damn thing.
  • About $600

Various rumours regarding tablet netbooks seem to be coming to fruition, so provided the dual-core Atom has enough grunt to handle the handwriting recognition, then this will be all good. Many a time I’ve sat there with the Mini 9 on my lap thinking ‘if only the screen would just flip round…’

Coming soon: less hardware ramblings. Promise.

Friday, March 06, 2009

I <3 Dell Mini 9

I’ll ‘fess up to being totally unimpressed with progress in the PC industry in the last 10 years. I appreciate we’ve made some fairly big steps technologically speaking (multi-core is right up there), but the growth of power consumption and (as a result) heat and noise dissipation has had me absolutely appalled. Kilowatt PSUs! Heatsinks the size of a fist! 3D graphics may have come on in leaps and bounds, but does it have to sound like a 747 about to take off? It’s all just so crude. I feel like opening the drive bay and shovelling in more coal.

I constantly think back to the Acorn Risc PC I had in uni, which seemed pretty powerful at the time, but whose CPU ran cool to the touch. It’s been downhill ever since.

Similarly I like to complain about laptop battery lives. What’s the point of a mobile computer if you have to be tied to the wall socket every two hours? But then laptops aren’t actually mobile computers, they are portable computers: portable like a portable TV is portable: you can pick them up and carry them around, but realistically to use them you’ve got to plug them in. This drives me *nuts*, particularly as I can easily burn through two hours before I really get started on anything, by which time it’s time for a battery break.

Enter the netbook.

The wife’s old laptop (one of my cast offs) finally went back to the Dell recycling plant in the sky, and so we got her a netbook as a replacement. A Mini 9 to be exact.

And it’s fantastic.

I can’t really say how it compares to its other netbook brethren, but if this is what the rest are like then viva the revolution, because for on-sofa / in-kitchen[1] / in-bed casual browsing this form factor and weight (1kg) is absolutely on the mark[2]. Not too big, not too small. The Mini is SSD so it’s totally silent (excepting occasional sound card cross-chatter). The Atom CPU runs cool, so it doesn’t make your lap all sweaty. And it’s cheap enough to buy on a whim ($550 currently)[3]. Oh and it’s XP, so it can run .Net.

I don’t totally buy into it as the nirvana of mobile computing, because 3.5 hours is still not enough, but as something you can happily take room to room in the house it’s bang on.

Here it is cosying up to a Dell 15” Latitude: it’s exactly half the size[4]

[photo: the Mini 9 next to a Dell 15” Latitude]

Here it is dwarfed by my 17” Inspiron 9300, which is only its grandfather in laptop generations:

[photo: the Mini 9 next to the 17” Inspiron 9300]

Ok, so the 9300 still beats the pants off it in terms of performance, but the Atom team have seriously lit a fire under the rest of the industry, because it’s already good enough for most people, and it’s only going to get better. You’d have to imagine that when the average punter catches on that Atom-based desktops, like the EeeBox, can happily handle their Word/Email/Web demands, there’s going to be a fairly major correction in the market, whatever the critics say. Just check out the price differential. That 90% of your horsepower you donate to SETI, Folding@Home or whatever: why not just not buy it in the first place and save your kilowatt hours? And that’s before you even consider any cloud computing offload.

In years to come we will look at our current CPU technologies in the same way we view CRTs, incandescent bulbs and the internal combustion engine. Here’s hoping we are making the first footsteps now.

 

[1] For recipes. I mean, no-one really bought an internet fridge, right?
[2] Though it would be better as a tablet
[3] Frankly it’s cheap enough to use as a universal remote control, if only all your devices talked Bluetooth
[4] If it looks like it’s overhanging on the right, it’s just the perspective

Oracle OLE DB provider precision issues with SSIS 2005

We had this problem[3] before where SSIS would keep complaining that the Oracle client wasn’t giving it the correct precision information for numeric columns:

Error at Import Broken Stock On Hand Levels [DTS.Pipeline]: The "output column "WORKD_TONNES" (42)" has a precision that is not valid. The precision must be between 1 and 38.

We were using the Oracle 9r2 client + OLE DB provider against an Oracle 8 database, and I just assumed that one of those wasn’t passing through metadata it should have, leaving SSIS to see the precision and scale as 0:

[image: the Advanced Editor showing the column’s precision and scale as 0]

You can explicitly set the scale/precision in the advanced editor, but it keeps wanting to overwrite it every time you look at the query, which is a major pain in the arse. And explicitly TRUNCing or ROUNDing the column in the source query doesn’t seem to make any difference, even while you’d think that would make the precision explicit. So instead we used the Microsoft OLE DB Provider for Oracle (MSDAORA), and everything was happy.

Well nearly.

Actually it turns out that rather than MSDAORA ‘working’, all it actually does is default the precision and scale to 38,0 in scenarios where it can’t determine them. And this happens rather a lot, since an Oracle Numeric doesn’t actually have to have a length or precision specified (though it should): it can be some kind of weird variable-precision float/numeric hybrid, which isn’t something that SSIS 2005 really caters for (nor other parts of Sql for that matter). Anyway, the point is that 38,0 isn’t the right answer either: whilst it’s fine for integer data, anything after the decimal is lost, and that can take a while to pick up on.

So I’m not sure which is worse. On the one hand you can use the Oracle OLE DB driver, and all your precisions will be wrong until you fix them, but at least it’s in your face enough you know about it (though sometimes you can’t even OK the dialog, to get into the advanced editor and fix it, and/or it whinges about mis-matches with error output columns).

On the other hand the Microsoft driver makes a good stab at ‘what you probably want’, but as a result you lose precision on decimal data. And even if you set the precision up, it’ll overwrite it again (silently) when you touch the query, so it can go from working to not working very inconspicuously.

Probably the only safe way here is to use the Microsoft driver, and modify all your queries to cast all non-integer variable Numeric columns to strings[2]. That way the conversion into decimal data has to be explicitly defined, and can’t be lost quite as easily. It’s a bit more work of course, but at least the output is right.
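The failure mode is easy to demo with Python’s decimal module standing in for the pipeline’s numeric handling (the value and names are invented): treat a NUMBER as precision 38, scale 0, and everything after the decimal point silently disappears, whereas the string round-trip keeps it until you explicitly choose a scale.

```python
from decimal import ROUND_HALF_UP, Decimal

source_value = "123.45"  # a variable-precision Oracle NUMBER, say

# What the (38,0) default effectively does: scale 0, fraction silently lost
as_38_0 = Decimal(source_value).quantize(Decimal("1"),
                                         rounding=ROUND_HALF_UP)
print(as_38_0)  # 123

# Cast to a string in the source query; the downstream conversion to
# decimal is then explicit, at a precision/scale you chose
explicit = Decimal(source_value)
print(explicit)  # 123.45
```

The extra CAST-to-string clutter in every source query is the price of never losing a decimal place without noticing.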

Which brings me back to why I started looking at this again: there is no 64-bit version of MSDAORA[1] (I’m not even sure if MSDAORA is actively supported any more), and we’ve had all manner of fun getting the 32-bit version talking to either of the 32 and 64-bit versions of the Oracle client. We failed, but I may have just configured it wrong.

So instead we are in this ridiculous situation where we use MSDAORA at design time, and the Oracle OLE DB driver on the server at execution time, which creates all manner of warnings in the logs and also means some Sql (with inline comments) works on a developers machine but doesn’t work on the server.

It’s a mess.

[1] Nor Jet either, if you were crazy enough to load something from an Excel file. Who would do such a thing?
[2] Interestingly this is exactly how SSIS 2008 gets round this problem: it gives up, maps the variable-Numeric column as a string and leaves the conversion up to you
[3] Obviously the first problem we had was the whinging about the DefaultCodePage issue
[4] Managed to get all the footnotes in backwards for this post, which is pretty poor.

Sunday, March 01, 2009

Consistency in SSIS

It just occurred to me that the Derived Column Transformation editor (and other SSIS expressions) take a C (ie C#) style approach to equality:

[image: a Derived Column expression using C-style == for equality]

…whereas SSIS 2005 can only use VB.Net for Script Tasks / Components.

Go figure.

 

[ok, yeah, fixed in SSIS 2008. I know. Lucky you]

Friday, February 27, 2009

GDR Grrr...

I've totally had it with Visual Studio Database Edition 2008 GDR.

CREATE VIEW [dbo].[DateRangeExpanded] WITH SCHEMABINDING
AS
select n.Number, n.Date
from dbo.Numbers n
where n.Date >= CONVERT(DATETIME,'2007-12-01') and
n.Date <= datediff(day, 0, DATEADD(MONTH, 6, GETDATE()))

Results in
Error TSD03127: Cannot schema bind view '[dbo].[DateRangeExpanded]'
because name 'DATETIME' is invalid for schema binding. Names must be in two-part format and an object cannot reference itself.
WTF? SQL Server is quite happy with the above; it's just GDR getting its knickers in a twist yet again: DATETIME is a type, not a column. I have found no way of making it happy with this one[1], other than taking the schemabinding off, and I am pretty pissed about limiting my database design just to accommodate GDR's inadequacies.

If this project wasn't finishing up in a few weeks, GDR would be in the bin by now.


[1] CAST vs CONVERT doesn't make any difference, for one thing.

Thursday, February 26, 2009

Dang! SSAS deployment configurations are held in the .user file

I really wish Microsoft would make up their mind where deployment target locations should live (ie setting the destinations for Publish / Deploy)

For SSRS this data is stored in the project file (.rptproj), per configuration, so anyone can get latest on a project, change their solution configuration to (say) 'Test', hit deploy and away they go, happily deploying to your Test environment. This is how it should be.

For SSAS (2005 / 2008) this data is stored in the .dwproj.user file, so for any new developer / PC / workspace that comes along all those deployment configuration settings have to be re-setup (or copied from another workspace), or all their deployments are to localhost. This is just really irritating. As I discovered today.

For an XBAP (or just a WinForms ClickOnce project) a single deployment target is stored in the project file. Other, previously-used deployment locations, are stored per-user. None are correlated with a project configuration in any way. This 'deploy to prod' design mentality drives me nuts.

For an ASP.Net website the (single) deployment location isn't even stored in the project at all, but squirreled away in %userprofile%\Local Settings\Application Data\Microsoft\WebsiteCache\Websites.xml. Of course! Web application projects put it solely in the .user file, which is at least less obscure, if no more useful.

Now I'd be the first to say that deployment to test environments etc... is best done from a separate scripted process, but I accept that not everyone has my affinity for long PowerShell deployment scripts. Plus sometimes I just can't be arsed. So am I completely insane to wish that the in-IDE Publish / Deploy mechanisms would work in a manner that would actually be usable within a team, and not scatter the deployment target information quite so liberally?

Wednesday, February 18, 2009

Mini USB Convergence Finally Coming

In a rare victory for common sense, it seems that the world’s mobile phone makers have at last agreed to converge on using mini-USB as a universal mobile phone charging standard. Not only that, but chargers will be optional at point of sale, so you won’t end up with a house full of them.

The plan is to do this by 2012, which is a bit slow in my book, but it’s all progress, right?

Thursday, February 12, 2009

Substituting view/synonym causes naming collisions in Database Edition GDR

The other day we discovered a duplicated table in the database. We're doing some refactoring, and a certain amount of duplication during this process is inevitable, but to cut a long story short, the two instances of this table needed collapsing into one.

So I dropped one instance, and created a synonym in its place (pointing to the other). This saves fixing all the views and external references (in the SSAS cube) to that table. Those references will go away over time.

But when I tried to sync the database using GDR it just didn't get it. It's quite happy to create the new synonym, just too dumb to drop the original table first - so there's a naming collision I had to fix by hand. It basically does this:

CREATE SYNONYM [dbo].[Table_2] FOR [dbo].[Table_1];
GO
DROP TABLE [dbo].[Table_2];

Dur! And it does the same thing if you substitute a view for a table too:

CREATE VIEW [dbo].[Table_2]
AS
-- Create view instead of original table
SELECT One FROM dbo.Table_1;

GO
-- Drop original table
DROP TABLE [dbo].[Table_2];
Anyone see the problem here? This is not an edge case guys.

I should point out that this only happens when sync'ing a database with the database model. If you sync two live databases, GDR gets the execution order right.

Not impressed.

(By way of comparison Sql Compare 7 is quite happy with both these scenarios, whether comparing against a live database or a snapshot. Drop table; create synonym - it's not that hard. And Sql Compare 7's been out a while (8 is nearly out))


[Update: 2009-02-20] And another thing: GDR's idea of dropping an object (eg: if you sync to a dev database and the object isn't there any more) is to comment out the object definition in the database project. All well and good, but it's too stupid to also comment out any extended properties the object might have (say, the ones the view designer put in to store the view layout), so obviously those extended properties cause the build to crap out as the object they reference doesn't exist.

Thursday, February 05, 2009

Not sold on Database Edition 2008 GDR

Just in time for the end of 2008 Microsoft finally got Database Edition 2008 GDR out the door. This replaces the version that came with VS 2008, primarily bringing with it Sql 2008 compatibility.

So GDR is now Microsoft's third bite at integrating the database development workflow into Visual Studio. Let's start with what it does right:

Obviously the main benefit of bringing the database into Visual Studio is you get source control for free. Sure - you've been able to script out your database and add it to source control before, but this way you can check in database changes and the code that depends on it as one atomic checkin, against a work item. That's actually pretty neat from a tracking / branching point of view.

And because it's a fully model-driven system, all the SQL in the project is tested for consistency and correctness at build time. You could do this before of course (script out and build a structure only database to check it builds), but this way you know any problems you find are local to your workspace, so it really was you that broke them.

The output from the build process is a model that completely describes your database schema, so this can be used to sync up test and production databases, either via the IDE, MSBuild or a redistributable command line utility that can be incorporated into your deployment / install process. Not that you couldn't have done all this with RedGate's SqlComparison SDK of course, but if you struggle to get licences other than MSDN purchased, this does make your life a bit easier, and at least it's been constructed with a build-once-deploy-many philosophy, unlike the clowns behind ClickOnce.

The model also (allegedly) provides a safer platform for database refactoring: it's supposed to 'remember' things like column renames as renames and treat them as such the next time a database is sync'd to the model. I guess only time will tell how well that works in the real world given, for example, multiple databases at different schema versions (ie Prod database 5 versions behind Test etc… ). Are the refactorings going to be journalled away and somehow replayed in the right order? If so, how does it know at which version to start? It's not a simple matter.

But Microsoft has still totally, totally missed the point, because there is still no design support. Not one. Nada. Not even a schema browser.

Hey I can see where this is going, and I can see they're laying foundations here, but in my mind the fundamentals of a database development tool are that they assist in database development. You know - the core stuff: creating objects and sprocs. Source control and a deployment workflow are secondary to that. But with GDR you'd better be hankering after the Sql 6.5 days, because you're going to be writing lots and lots of TSQL. The best I can say is it's colour coded, but didn't we only just get IntelliSense in Management Studio? It just doesn't work for me.

But the alternative - do your dev / design in Management Studio and sync back to the model - is equally flawed, if only because you've got to keep syncing back again (and the sync UI is still a bit ragged). That's a whole lot of friction for relatively nebulous benefits (especially if you already have your database schema under some level of source control), and to my mind it calls the whole utility of the product into question. What's the point of 'integrating' into the Visual Studio IDE if you have to keep tabbing out to do anything even remotely useful (think: execution plans)?

Similarly, whilst there's a Data Compare feature, it's not actually part of your database project, just something you do on an ad-hoc basis. So there's no easy way of designating a certain subset of tables as 'reference data' that should essentially be considered part of the schema, rather than part of the contents, and sync'd up when you redeploy. I'll accept that's probably a vNext feature, but its absence calls into question the viability of using GDR as an end-to-end database deployment manager in the first place, compared to tools like SqlPackager (or scripting using SqlCompare \ SqlDataCompare APIs).
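In the meantime the usual workaround is to hand-roll reference data into a post-deployment script - something idempotent like a MERGE, so it can safely run on every deploy. All the table and column names here are invented for illustration:

```sql
-- Post-deployment script: keep a reference table in step with the schema.
-- Idempotent, so it can run on every deployment. Names are illustrative.
MERGE dbo.OrderStatus AS target
USING (VALUES
    (1, 'Open'),
    (2, 'Shipped'),
    (3, 'Cancelled')
) AS source (StatusId, StatusName)
ON target.StatusId = source.StatusId
WHEN MATCHED AND target.StatusName <> source.StatusName
    THEN UPDATE SET StatusName = source.StatusName
WHEN NOT MATCHED BY TARGET
    THEN INSERT (StatusId, StatusName)
         VALUES (source.StatusId, source.StatusName)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;
```

(MERGE needs Sql 2008, of course; on 2005 you'd be writing the equivalent UPDATE / INSERT / DELETE triple by hand.)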

And of course the model parser isn't perfect, so occasionally a valid bit of SQL creates build errors because the model can't fathom what it's doing. In my case it can't seem to resolve the columns in a CTE that queries another (properly referenced) database, and barfs about ambiguous references where there are none. Unfortunately fidelity is always the problem with Model Driven Architectures, even ones that use a DSL as the model, because they have to - at some point - cater for a lowest common denominator, and at any rate they are in an arms race with whatever it is they're modelling. Microsoft is touting its extensible architecture, and IBM are apparently writing a DB2 provider, but what do you think the chances of 100% functionality coverage are? And for Oracle? I'll wager pretty low. So be prepared to take an 80/20 view, and put some bits into pre/post build scripts and / or re-structure them to keep the parser happy.
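For the record, the sort of construct that trips it up looks something like this (names invented) - perfectly valid T-SQL that the model nonetheless flags as ambiguous:

```sql
-- Valid T-SQL, but the GDR model parser can fail to resolve the CTE's
-- columns because they come from a (properly referenced) external database.
WITH RecentOrders AS
(
    SELECT o.OrderId, o.CustomerId
    FROM [OtherDb].dbo.Orders AS o   -- cross-database reference
    WHERE o.OrderDate > DATEADD(day, -30, GETDATE())
)
SELECT c.CustomerName, r.OrderId
FROM dbo.Customers AS c
JOIN RecentOrders AS r ON r.CustomerId = c.CustomerId;
```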

Then there are the little things. It creates a schema and object hierarchy when your database is initially imported, but it doesn't enforce it, and the Add \ New (whatever) templates don't pick up on it, so all your new objects end up in the wrong schema unless you're concentrating. If you do anything with any system views you'll need to reference the master database ( %ProgramFiles%\Microsoft Visual Studio 9.0\VSTSDB\Extensions\SqlServer\2008\DBSchemas), so why is it not referenced to start with? And the doco's not been updated for RTM so contains all kinds of outdated information around the command line interfaces, and absolutely no content in the API section other than listing the namespaces.

http://social.msdn.microsoft.com/Forums/en-US/vstsdb/thread/3f0f0888-9b37-4279-81d4-b924595630a8/
http://social.msdn.microsoft.com/Forums/en-US/vstsdb/thread/a966ac3c-c039-4c3f-9b6d-2882d9874282

So this still strikes me as neither fish nor fowl. Given the designers already exist in Management Studio it's a bit pisspoor that the DataDude team haven't borrowed them yet. I imagine I'll still be doing most of my work in Management Studio, and at best attempting to keep the database model in sync. And I suspect that for me, and many others, this will be just too much friction to bother with and so the whole effort will be more or less wasted.

Which is all a bit sad, given it's 2009, I've written a decade's worth of database-centric apps, and we've still not got this right. At least no-one's actually paying for this any more.

Friday, January 30, 2009

Sql 2005 Install Woes on Shiny New Big Server

I had a new one this time.

I (of course) ran into the install-hangs-on-setting-file-security issue (KB910070), but I was expecting that. What really threw me was that after that the install just kept dying, leaving this in the logs:

Faulting application sqlservr.exe, version 2005.90.1399.0, faulting module sqlservr.exe, version 2005.90.1399.0, fault address 0x0000000000b323f0.

This really threw the installer too - after an uninstall, and even after a manual cleanup, the installer still thought there was an instance hanging around. Which it was. I had to manually delete the SQL Services (using SC), a bunch of instance registry settings and the instance files directory (the MSSQL.1 folder) before I could finally get it to re-install. I guess the uninstall died too.
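For anyone in the same hole, the cleanup amounted to something like the following. Service names and registry paths will vary by instance (and obviously back things up before deleting registry keys):

```
REM Remove the orphaned SQL services (default instance shown; named
REM instances are MSSQL$InstanceName / SQLAgent$InstanceName).
sc delete MSSQLSERVER
sc delete SQLSERVERAGENT

REM Then delete the instance registry settings, e.g. under
REM HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1
REM ...and finally remove the MSSQL.1 folder on disk.
```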

So then I tried installing again, and again. And again.

So I started speculating. Was it the virus scanner .... No. Could it be the Sql 2005 installer didn't like .Net 3.5 sp1? Uninstall... No. Was I definitely using the 64 bit version... Yes. Could I slipstream SP2 and work around some issue I didn't understand yet... No. Was it the monster 24 cores the server had (4 x hex core)... maybe.

There is a known issue with the Sql 2005 installer failing on machines with an odd number of cores (ie Phenoms). Twenty-four (obviously) isn't odd: but maybe Sql can't count either. So I used the instructions in KB954835 to cripple my monster server down to a single CPU, and then it all installed just fine. I can now install SP2 (3 actually) which allegedly should then make it all work.
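The KB boils down to a BOOT.INI switch. Something like the line below (the partition path is just an example from my box) - and remember to take it off again after the service pack goes on:

```
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /NUMPROC=1
```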

Moral:
Obviously I should have been installing Sql 2008 instead
It's clearly becoming way too easy - with multi-multi-core boxes - to drop into some massively unexplored race condition territory in something that's otherwise really quite stable and well tested.
You can have too many cores

Other thoughts:
There must be a better way to restrict an installer or app to run on only one core without farting about with BOOT.INI
Once I put SP3 on it better all work otherwise the boss is going to be really pissed ('What, those other 23 cores? They're um ... spares')
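(On the first of those thoughts: later versions of Windows - Vista / Server 2008 onwards - do let you pin a process to a core without rebooting, via cmd's start command, though I haven't tried it against the Sql 2005 installer:)

```
REM Launch setup pinned to CPU 0 only; the affinity argument is a hex
REM mask of allowed processors, so 1 = first core, 3 = first two, etc.
start /affinity 1 setup.exe
```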