Monday, September 28, 2009

Implementing Asynchronous Operations in WCF

Just as the WCF plumbing enables clients to call server operations asynchronously, without the server needing to know anything about it, WCF also allows service operations to be defined asynchronously. So an operation like:

        [OperationContract]

        string DoWork(int value);

…might instead be expressed in the service contract as:

        [OperationContract(AsyncPattern = true)]

        IAsyncResult BeginDoWork(int value, AsyncCallback callback, object state);

 

        string EndDoWork(IAsyncResult result);

Note that the two forms are equivalent, and indistinguishable in the WCF metadata: they both expose an operation called DoWork[1]:

image

... but in the async version you are allowing the original WCF dispatcher thread to be returned to the pool (and hence service further requests) while you carry on processing in the background. When your operation completes, you call the WCF-supplied callback, which then completes the request by sending the response to the client down the (still open) channel. You have decoupled the request lifetime from that of the original WCF dispatcher thread.

From a scalability perspective this is huge, because you can minimise the time WCF threads are tied up waiting for your operation to complete. Threads are an expensive system resource, and not something you want lying around blocked when they could be doing other useful work.

Note this only makes any sense if the operation you’re implementing really is asynchronous in nature, otherwise the additional expense of all this thread-switching outweighs the benefits. Wenlong Dong, one of the WCF team, has written a good blog article about this, but basically it all comes down to I/O operations, which are the ones that already have Begin/End overloads implemented.

Since Windows NT, most I/O operations (file handles, sockets etc…) have been implemented using I/O Completion Ports (IOCP). This is a complex subject, and not one I will pretend to really understand, but basically Windows already handles I/O operations asynchronously, as IOCPs are essentially work queues. Up in .Net land, if you are calling an I/O operation synchronously, someone, somewhere, is waiting on a waithandle for the IOCP to complete before freeing up your thread. By using the Begin/End methods provided in classes like System.IO.FileStream, System.Net.Socket and System.Data.SqlClient.SqlCommand, you start to take advantage of the asynchronicity that’s already plumbed into the Windows kernel, and you can avoid blocking that thread.

So how to get started?

Here’s the simple case: a WCF service with an async operation, the implementation of which calls an async method on an instance field which performs the actual IO operation. We’ll use the same BeginDoWork / EndDoWork signatures we described above:

    public class SimpleAsyncService : IMyAsyncService

    {

        readonly SomeIoClass _innerIoProvider = new SomeIoClass();

 

        public IAsyncResult BeginDoWork(int value, AsyncCallback callback, object state)

        {

            return _innerIoProvider.BeginDoSomething(value, callback, state);

        }

 

        public string EndDoWork(IAsyncResult result)

        {

            return _innerIoProvider.EndDoSomething(result);

        }

    }

I said this was simple, yes? All the service has to do to implement the Begin/End methods is delegate to the composed object that performs the underlying async operation.

This works because the only state the service has to worry about – the ‘SomeIoClass’ instance – is already stored on one of its instance fields. It’s the caller’s responsibility (i.e. WCF’s) to call the End method on the same instance that the Begin method was called on, so all our state management is taken care of for us.

Unfortunately it’s not always that simple.

Say the IO operation is something you want to / have to new-up each time it’s invoked, like a SqlCommand that uses one of the parameters (or somesuch). If you just modify the code to create the instance:

        public IAsyncResult BeginDoWork(int value, AsyncCallback callback, object state)

        {

            var ioProvider = new SomeIoClass();

            return ioProvider.BeginDoSomething(value, callback, state);

        }

…you are instantly in a world of pain. How are you going to implement your EndDoWork method, since you just lost the reference to the SomeIoClass instance you called BeginDoSomething on?

What you really want to do here is something like this:

        public IAsyncResult BeginDoWork(int value, AsyncCallback callback, object state)

        {

            var ioProvider = new SomeIoClass();

            return ioProvider.BeginDoSomething(value, callback, ioProvider); // !

        }

 

        public string EndDoWork(IAsyncResult result)

        {

            var ioProvider = (SomeIoClass) result.AsyncState;

            return ioProvider.EndDoSomething(result);

        }

We’ve passed the ‘SomeIoClass’ as the state parameter on the inner async operation, so it’s available to us in the EndDoWork method by casting from the AsyncResult.AsyncState property. But now we’ve lost the caller’s state, and worse, the IAsyncResult we return to the caller has our state not their state, so they’ll probably blow up. I know I did.

One possible fix is to maintain a state lookup dictionary, but what to key it on? Both the callback and the state may be null, and even if they’re not, there’s nothing to say they have to be unique. You could use the IAsyncResult itself, on the basis that almost certainly is unique, but then there’s a race condition: you don’t get that until after you call the call, by which time the callback might have fired. So your End method may get called before you can store the state into your dictionary. Messy.

Really what you’re supposed to do is implement your own IAsyncResult, since this is only thing that’s guaranteed to be passed from Begin to End. Since we’re not actually creating the real asynchronous operation here (the underlying I/O operation is), we don’t control the waithandle, so the simplest approach seemed to be to wrap the IAsyncResult in such a way that the original caller still sees their expected state, whilst still storing our own.

I’m not sure if there’s another way of doing this, but this is the only arrangement I could make work:

        public IAsyncResult BeginDoWork(int value, AsyncCallback callback, object state)

        {

            var ioProvider = new SomeIoClass();

            AsyncCallback wrappedCallback = null;

            if (callback!=null)

                wrappedCallback = delegate(IAsyncResult result1)

                                      {

                                          var wrappedResult1 = new AsyncResultWrapper(result1, state);

                                          callback(wrappedResult1);

                                      };

 

            var result = ioProvider.BeginDoSomething(value, wrappedCallback, ioProvider);

            return new AsyncResultWrapper(result, state);

        }

Fsk!

Note that we have to wrap the callback, as well as the return. Otherwise the callback (if present) is called with a non-wrapped IAsyncResult. If you passed the client’s state as the state parameter, then you can’t get your state in EndDoWork(), and if you passed your state then the client’s callback will explode. As will your head trying to follow all this.

Fortunately the EndDoWork() just looks like this:

        public string EndDoWork(IAsyncResult result)

        {

            var state = (AsyncResultWrapper) result;

            var ioProvider = (SomeIoClass)state.PrivateState;

            return ioProvider.EndDoSomething(result);

        }

And for the sake of completeness, here’s the AsyncResultWrapper:

        private class AsyncResultWrapper : IAsyncResult

        {

            private readonly IAsyncResult _result;

            private readonly object _publicState;

 

            public AsyncResultWrapper(IAsyncResult result, object publicState)

            {

                _result = result;

                _publicState = publicState;

            }

 

            public object AsyncResult

            {

                get { return _publicState; }

            }

 

            public object PrivateState

            {

                get { return _result.AsyncResult; }

            }

 

            // Elided: Delegated implementation of IAsyncResult using composed IAsyncResult

        }

By now you are thinking ‘what a mess’, and believe me I am right with you there. It took a bit of trail-and-error to get this working, it’s a major pain to have to implement this for each-and-every nested async call and – guess what – we have lots of them to do. I was pretty sure there must be a better way, but I looked and I couldn’t find one. So I decided to write one, which we’ll cover next time.

Take-home:

Operations that are intrinsically asynchronous – like IO - should be exposed as asynchronous operations, to improve the scalability of your service.

Until .Net 4 comes out, chaining and nesting async operations is a major pain in the arse if you need to maintain any per-call state.

 

Update: 30/9/2009 – Fixed some typos in the sample code

[1] Incidentally, if you implement both the sync and async forms for a given operation (and keep the Action name for both the same, or the default), it appears the sync version gets called preferentially.

Monday, September 21, 2009

Calling a WCF Service Asynchronously from the Client

In ‘old school’ ASMX web services, the generated proxy allowed you to call a service method asynchronously by using the auto-generated Begin/End methods for the operation in question. This is important for client applications, to avoid blocking the UI thread when doing something as (potentially) slow as a network call.

WCF continues this approach: if you use ‘add service reference’ with ‘Generate Asynchronous Operations’ checked (in the Advanced dialog):

image

... or SvcUtil.exe with the /a (async) option, your client proxy will contain Begin/End methods for each operation, in addition to the original, synchronous version. So for an operation like:

        [OperationContract]

        string DoWork(int value);

…the proxy-generated version of the interface will contain two additional methods, representing the Asynchronous Programming Model (APM)-equivalent signature:

        [OperationContract(AsyncPattern=true)]

        IAsyncResult BeginDoWork(int value, AsyncCallback callback, object asyncState);

 

        string EndDoWork(IAsyncResult result); // No [OperationContract] here

All it takes then to call the operation is to call the relevant Begin method, passing in a callback that is used to process the result. The UI thread is left unblocked and your request is executed in the background.

You can achieve the same result[1] with a ChannelFactory, but it takes a bit more work. Typically you use the ChannelFactory directly when you are sharing interface types, in which case you only get Begin/End methods if they are defined on the original interface – the ChannelFactory can’t magically add them. What you can do, however, is create another copy of the interface, and manually implement the additional ‘async pattern’ signatures, as demonstrated above.

Either way, what’s really important to realise here is that the transport is still synchronous. What the Begin/End methods are doing is allowing you to ‘hand off’ the message dispatch to WCF, and be called back when the result returns. And this explains why we can pull the trick above where we change the client’s version of the interface and it all ‘just still works’: from WCF’s perspective the sync and begin/end-pair signatures are considered identical. The service itself has only one operation ‘DoWork’.

And the reverse applies too. If you have a service operation already defined following the APM - i.e. BeginDoWork / EndDoWork - unless you choose to generate the async overloads your client proxy is going to only contain the sync version: DoWork(). Your client is calling – synchronously – an operation expressly defined asynchronously on the server. Weird, yes? But (as we will see later), it doesn’t matter. It makes no difference to the server.

Take-home

Clients should call service operations using async methods, to avoid blocking the UI thread. Whether the service is or isn’t implemented asynchronously is completely irrelevant[2].

[1] For completeness I should point out there are actually two more ways to call the operation asynchronously, but it all amounts to the same thing:

  • The generated service proxy client also contains OperationAsync / OperationCompleted method/event pairs. This event-based approach is considered easier to use (I don’t see it personally). Note this is on the generated client, not on the proxy’s version of the interface, so you can’t do this with a ChannelFactory (unless you implement it yourself).
  • If you’re deriving from ClientBase directly you can use the protected InvokeAsync method directly. Hardcore! This is how the generated proxy client implements the Async / Completed pattern above.

[2] …and transparent, unless you’re sharing interface types

Sunday, September 20, 2009

Tablet Netbooks

I said back in March that what I really wanted was a 10 hour tablet netbook. Well, they’re starting to come out now:

http://www.dynamism.com/viliv_s7.shtml

It’s actually too small, and not dual core (there must eventually be a netbook version of the Atom 300 yes?), but we’re so close now.

Friday, September 18, 2009

Asynchronous Programming With WCF

I’m currently spending a lot of time exposing intrinsically asynchronous server operations to a client application, and I’ve been really impressed with the way WCF caters for this. Asynchronicity is very important for performance and scalability of high-volume systems, but it’s a confusing area, and often something that’s poorly understood.

So I thought I’d write a series of posts to try and shed some light into these areas, as well as share some of the more interesting things I’ve learnt along the way.

Before we start you’ll have to have at least passing familiarity with the Asynchronous Programming Model (APM), which describes the general rules surrounding the expression of an operation as a pair of Begin/End methods. At very least, you need to get your head around the fact that these signatures:

	// sync version
int DoWork();

// async version as Begin/End pair
IAsyncResult BeginDoWork(AsyncCallback callback, object state);
int EndDoWork(IAsyncResult result);


…are equivalent.

Thursday, September 17, 2009

Christmas Come Early for Certifications

At TechEd Australia Microsoft were giving out 25% discount vouchers for certification exams, as part of a ‘get-certified’ push.

From now until Christmas you can also apply for (depending on the exam) 25%-off vouchers via this campaign site:

http://www.microsoft.com/learning/en/us/offers/career.aspx

Doesn’t seem to be any one-per-person restriction either.

Friday, September 11, 2009

TechEdAu 2009: Day 3

Highlights:

  • All the parallel tasks support in .Net 4

Lowlights:

  • No go-live licence for the parallel tasks support in .Net 4, or the existing CTPs.
  • The ‘real world’ EF talk, that blew

Update: Hey, somewhere in these last 3 posts I should have mentioned Windows Identity Foundation, which looked pretty cool (rationalizing all the claims-based identity stuff out of WCF). But I didn’t. So this’ll just have to do.

TechEdAu 2009: Day 2

Highlights:

Somewhere-in-the-middle-lights:

  • Windows API code pack: Use XP, Vista, Win7 features easily without interop, just going to cop a whole heap of if(win7) os-sniffing around anything interesting. Didn’t we get really sick of all that browser sniffing stuff?

Lowlights:

  • Diagnosing your misbehaving MDX is still too damn hard, even Darren Gosbell’s talk was really good.
  • The Claw

Wednesday, September 09, 2009

TechEdAu 2009: Day 1

Highlights:

  • Windows 7 Problem Steps Recorder: build-in screen flow capture can be used to troubleshoot – or just document – your own UIs
  • SketchFlow: Microsoft getting into the ‘paper-prototyping’ space
  • Sql 2008 R2 ‘Database Application Components’ – packages schema, logins and jobs as one deployment unit, mainly intended for dynamic virtualisation, but obvious implications for ‘over the fence’ UAT/Prod deployments (good commentary). And CEP (‘StreamInsight’) for real-time in-memory data analysis.
  • Gemini: heterogeneous data query and BI for the Excel guy. I think the business will go nuts, the question is how easy it’ll be to upscale it to SSAS when it becomes necessary. Some suggestion the Gemini code may be used to improve SSAS’s ROLAP performance.
  • Dublin: making your WF / WCF apps manageable via IIS manager, and visible to SCOM. One click admin UI to resume suspended workflows
  • Workflow: It’s back! And now looking more like SSIS than ever (variables as part of the pipeline etc..). XAML only all the way now, so no more CodeActivities, and no backwards compatibility :-/
  • Using SetToString for MDX debugging. Genius.
  • WCF for Net 4: Default bindings! Standard endpoints! Default behaviour configurations!
  • I should probably mention the free HP Mini netbook as well.
  • Printed schedule. At last.
  • Passing WCF exam

Lowlights:

  • Schedule builder designed for larger screens than HP Mini. Can we just get the data from a feed please?
  • Coffee queue
  • No pen or paper in bag
  • No outbound SMTP from the WiFi (again). IMAP users roll your eyes…

(Update 17/9: Added some hyperlinks in lieu of more detailed explanations. And realized I missed Sql CEP altogether…)

Monday, August 31, 2009

Active Directory Powershell Utils

These four basic fuctions make life so much better when dealing with AD:


$ds = new-object system.directoryservices.directorysearcher

function Find-Group($samName){
$ds.filter = ('(samaccountname={0})' -f $samName)
$ds.FindAll()
}

function Find-User($samName){
$ds.filter = ('(samaccountname={0})' -f $samName)
$ds.FindAll()
}

function Find-Members($groupName){
$group = Find-Group $groupName
$group.Properties.member | Get-Entry
}

function Get-Entry($cn){
begin{
if($cn){
new-object System.directoryservices.DirectoryEntry("LDAP://$cn")
}
}
process{
if($_){
new-object System.directoryservices.DirectoryEntry("LDAP://$_")
}
}
end{
}
}


(ok, three functions. Two are the same, I know)

Thursday, August 20, 2009

When to Automate (Revisited)

I'm going to add another scenario to my 'When to Automate' list: when the execution time is more important than the preperation time.

The classic example of this is a typical data migration effort. Months of work goes into creating a one-off artefact, primarily to ensure the system down-time is minimized. There are other advantages (testing, pre-transition sign-off, auditing), but in most cases these just exist to mitigate against the risk of downtime, so it's still all about the execution interval.

What I'd not really thought about until the other day was that this also applies to refactoring efforts on code. Any large refactoring burns a considerable amount of time just keeping up with the changes people check-in whilst you're still working away. Given a sizable enough refactoring, and enough code-churn, you get to a point beyond which you actually can't keep up, and end up trapped in merge integration hell.

There are two normal responses to this. The first is to keep the size down in the first place, and bite off smaller, more manageable chunks. This can involve some duplication of effort, which is why many team-leads will take the second option: come in in the evening or over the weekend for a big-bang macho refactorfest.

But there is a 'third way': the refactoring can be as big as you like, as long as you keep it brief. If you can take the latest version of the codebase, refactor it and check it in before anyone else commits any more changes then you only have to worry about the pained expressions on the face of your co-workers as the entire solution restructures in front of their eyes. But you'll get that anyway, right?

Right now I've got some 50 projects, spread over 4 solutions, most of which have inconsistent folder name to project names to assemblyname to base namespace, some of which we just want to rename as the solution has matured.

To do this by hand would be an absolute swine of a job, and highly prone to checkin-integration churn (all those merges on .csproj / .sln files! the pain! the pain!). Fortunately however Powershell just eats this kind of stuff up. It's taken the best part of the day to get the script ready, but now it has the whole thing can be done and checked in in some 15 mins, solutions and source control included. And there's no point-of-no-return: any point up till checkin all I've done is improve the script for next time, no time has been wasted.

And by de-costing[1] the whole process, we can do it again some day...



[1] ok, I'm sorry for 'de-costing'. But I struggled to come up with a better term for 'reducing the overhead to next-to-nothing' that wasn't unnecessarily verbose, or had 'demean' connotations.

Monday, August 10, 2009

To Windows 7… and back again

This weekend I uninstalled the RTM of Windows 7.

Whilst I could have probably got the glide pad and sound card working with Vista drivers (had I spent enough time on the Dell website lying about what system I had to find the current versions of drivers, rather than just the aged XP ones listed under Inspiron 9300), what killed it was my Kaiser Baas Dual DVB USB tuner wouldn’t play ball: driver appeared all ok, device recognised etc… but it just wouldn’t detect any TV channels anymore. And for a media centre PC that’s just not on. And with a limited window before it’d miss recording Play School and earn me the scorn of my sons, I pulled the plug and dug out the Vista RTM disk (yeah: do you think I could successfully download the with-SP2 version from MSDN this weekend? Of course not).

This is the only time in living memory I’ve upgraded a PC without pulling the old HD as a fall-back, and predictably it’s the one that bit me in the arse…

Mind you, a fresh install’s made it feel pretty snappy again, Vista or no.

Wednesday, August 05, 2009

Ticks

(No, not the ones you get camping. This is a developer blog).

What’s a Tick? Depends who you ask. ‘Ticks’ are just how a given component chooses to measure time. For example,

DateTime.Now.Ticks = 100 ns ticks, i.e. 10,000,000 ticks/sec
Environment.TickCount = 1 ms ticks since process system start

These are just fine for timing webservices, round-trips to the database and even measuring the overall performance of your system from a use-case perspective (UI click to results back). But for really accurate timings you need to use system ticks, i.e. ticks of the high-resolution timer: System.Diagnostics.Stopwatch.GetTimestamp() [1]. This timer’s frequency (and hence resolution) depends on your system. On mine:

System.Diagnostics.Stopwatch.Frequency = 3,325,040,000 ticks/sec

…which I make to be 0.3ns ticks. So some 300x more accurate than DateTime.Now, and accurate enough to start measuring fine-grained performance costs, like boxing/unboxing vs generic lists.

Clearly it’s overkill to use that level of precision in many cases, but if you’re logging to system performance counters you must use those ‘kind of ticks’, because that’s what counter types like PerformanceCounterType.AverageTimer and PerformanceCounterType.CounterTimer require.

[1] Thankfully since .Net 2, we don’t have to use P/Invoke to access this functionality anymore. On one occasion I mistyped the interop signature (to return void), and created a web-app that ran fine as a debug build, but failed spectacularly when built for release. Took a while to track that one down...

Tuesday, August 04, 2009

GC.Collect: Whatever doesn’t kill you makes you stronger

Just a meme that was bouncing around the office this morning, in relation to the nasty side-effect of calling GC.Collect explicitly in code: promotion of everything that doesn’t get collected.

What this means of course is if you call GC.Collect way before you get any memory pressure, objects will get promoted to L1, then to L2, and quite likely not get cleaned up for a long, long time, if ever.

LLBLGen take note

Monday, July 27, 2009

OneNote Screen Clipping Failing

Since we got these new PC’s at work, we’ve had an issue where we couldn’t take any screen clippings with OneNote (or Alt-Print Screen, or anything actually): the resulting clipping looked like this:

image

Turns out someone turned the 3GB switch on (to give Visual Studio more headroom on another project[1]). And on a dual-monitor developer PC, one of the consequences is that the amount of memory the system uses to address video memory is going to be tight.

Turned it off: right as rain[2].

Note also that – as far as I can infer – Visual Studio 2008 is not, by default, marked LARGEADDRESSAWARE so wouldn’t have got the extra address space anyway without being mod’d, which strikes me as brave given the potential consequences (read the exercise in the link).

(Update: I checked, and that flag is definitely not on the exe out-of-the-box)

[1] Side note: Good article explaining the benefits from running Visual Studio on a 64 bit OS
[2] PS: Where the heck does that expression come from?
Oh
[3] And there’s no LARGEADDRESSAWARE flag on the C# compiler either

Birthday Present

Whilst I was unwrapping my other presents, Microsoft RTM’d Windows 7 and Powershell 2. Just for me! You shouldn’t have, etc… Only, apparently I can’t unwrap it until Aug 6th.

But who to send the ‘thank you’ to?

Wednesday, July 08, 2009

What makes a VS Test Project?

Ever wondered what exactly makes a project a test project from the perspective of the VS IDE? Or – to put it another way – did you ever right click on a test and ‘Run Tests’ wasn’t there?

image

Turns out it’s just a case of this being in the primary PropertyGroup for the project:

<ProjectTypeGuids>{3AC096D0-A1C2-E12C-1390-A8335801FDAB};{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}</ProjectTypeGuids>

(that and the reference to Microsoft.VisualStudio.QualityTools.UnitTestFramework, but you had that already right?)

Friday, June 19, 2009

Stupid Naming Themes [WiX]

This is WiX:

candle.exe
dark.exe
heat.exe
light.exe
lit.exe
melt.exe
pyro.exe
smoke.exe
torch.exe

Jesus wept. Even the ‘fresh ground .jar of Java beans’ crowd outgrew this childishness…

Wednesday, June 17, 2009

Nasty SSIS 2008 issue with ‘Table or View’ access mode

I’m still not 100% sure I understand this, but here goes:

I swapped one of my packages to use a ‘table or view’ for a data source, rather than (as previously) a named query, and the performance dive-bombed. Think less ‘put the kettle on’, more ‘out for lunch and an afternoon down the pub’.

image

Profiler shows that when you use ‘table or view’, SSIS executes a select * from that_table_or_view on your behalf. But pasting exactly the same select * from profiler into a management studio window still ran pretty good, and completed the query in 3000-odd reads, compared to the 6 million + that SSIS had burnt before I cancelled it.

The view in question is just a straight join of two tables, fairly simple stuff. But profiler showed that SSIS was getting a very different execution plan from running the query in management studio, joining the tables in the opposite (wrong) order. This presumably explains why the number of reads went from 3000-odd to SSIS’s 6 million+. And why it was running a bit s..l..o..w.

Bizarrely even when I put the same select * back into SSIS using ‘Sql Command’ mode, it still ran quickly, provided there was a line break before the FROM. This made me think there must be a bad execution plan getting used somehow, and that extra whitespace was just enough to avoid it:

image

Looking closer at profiler, it appear that when using ‘Table or View’ SSIS first issues the same query with a SET ROWCOUNT 1 on, in order to get the metadata. This doesn’t happen when SSIS uses ‘Sql Command’ as its source: it executes sp_prepare and seems to get all its metadata from that.

So the only conclusion I can come to is that the execution plan is being poisoned by executing it with SET ROWCOUNT 1 on, picking a plan that’s more appropriate for one row than many. In ‘Sql Command’ mode this doesn’t appear to be an issue because SSIS gets the metadata a different way.

Which makes me think I will be avoiding ‘table or view’ like the plague from now on.

[Update 2009-06-19]

For the sake of completeness, here are the extra screenshots I wanted to put in originally, but my screen clipper was playing up (yet again).

Here’s the ‘right’ version you get from management studio or SSIS in ‘SQL Command’ mode (last two columns are ‘READS’ and ‘DURATION’ respectively):

image

image

And here’s what SSIS produces in ‘table or query mode’ (sorry I clipped off the bit above showing it executing the same query with SET ROWCOUNT 1 on, but it did happen):

image

image

…a very different plan.

I thought a good proof here would be to log in as a different user (plans are cached per-user), and execute the equivalent of what SSIS produced:

set rowcount 1
select * from lm.vw_RoadDatumWithRoadDetails
go
set rowcount 0
select * from lm.vw_RoadDatumWithRoadDetails

…unfortunately that didn’t seem to work (or rather fail), so there must be something else in the SSIS preamble that’s also involved in this case :-(

However Microsoft UK’s Simon Sabin has already posted a repro based on the Adventure Works 2008 database (which was pretty quick work). He says:

“Nested loop joins don’t perform if you are processing large numbers of rows, do to the lookup nature”

…which is exactly the strategy I see in the ‘bad’ plan above.

So who’s ‘fault’ is all this? You could make a good case for saying that SQL server should play safe and include the ROWCOUNT as part of its ‘does-a-query-plan-match?’ heuristic (along with a basket of other options it already includes), and that would probably be a good thing generally.

But I think the main question has got to be why does SSIS use SET ROWCOUNT at all? Any time I’ve ever wanted to get metadata I’ve always done a SELECT TOP 0 * FROM x. The ‘top’ is part of the query, so already generates a unique query plan, and it’s not like SSIS has to parse and re-write the query to insert the TOP bit : SSIS is already generating that whole query, based on the ‘table or view’ name: adding the ‘top’ clause would be trivial.

I feel a Connect suggestion coming on.

Wednesday, June 10, 2009

Preserving Output Directory Structure With Team Build 2008

The default TFS drop folder scheme is a train-wreck where all the binaries and output files from the entire solution are summarily dumped into a single folder (albeit with some special-case handling for websites). Its just gross.

What I wanted, of course, was a series of folders by project containing only the build artefacts relevant for that project: ‘xcopy ready’ as I like to say. Pretty sensible yes? Quite a popular request if one Googles around: one wonders why the train-wreck scheme was implemented at all. But I digress.

Contrary to what you (and I) may have read, you actually have to do this using AfterBuild, not AfterCompile (which fires way too early in the build process[1]). So the canned solution is:

Put this in your TFSBuild.proj:

  <PropertyGroup>

    <CustomizableOutDir>true</CustomizableOutDir>

 

    <!--  TEAM PROJECT

(this turns off the ‘train-wreck’ scheme, and goes back to each project’s output going to their bin\debug or whatever folder)

Put this in any project file that you want to capture the output from:

  <Target Name="AfterBuild" Condition=" '$(IsDesktopBuild)'!='true' ">

    <Message Text="Copying output files from $(OutDir) to $(TeamBuildOutDir)" />

    <ItemGroup>

      <FilesToCopy Include="$(OutDir)\**\*.*" />

    </ItemGroup>

    <Copy SourceFiles="@(FilesToCopy)" DestinationFiles="@(FilesToCopy ->'$(TeamBuildOutDir)\$(AssemblyName)\%(RecursiveDir)%(Filename)%(Extension)')" />

  </Target>

And hey presto:

image

image

A series of output folders, by assembly, containing only the artefacts relevant to that assembly, as opposed to the

Note however that with this scheme you get all files marked ‘Copy to Output Directory’ but not files marked as Content, which makes this currently unusable for deploying websites and means it’s not strictly speaking xcopy-ready[2]. Hopefully there is an easy fix to this, otherwise I’ll be diving back into the SNAK codebase where I’ve solved this previously.

 

[1] before serialization assemblies are generated, and before the obj folder has been copied to the bin folder. A good diagnostic is to put <Exec Command="dir" WorkingDirectory="$(OutDir)"/> into an AfterCompile target, and take a look at what you get back.
[2] I have a big thing about this, which is why I dislike the ClickOnce build output so much. Build and deploy need to be considered separately.

Visual Studio AddIns and remoted profiles

Here at work we have our profiles, including our documents folders, remoted to the network. It’s a constant pain, and this morning the fun was how to get the CopySourceAsHTML visual studio Add-in working as a result, since its ‘normal’ install location, My Documents\Visual Studio 2008\AddIn’ is within that remoted profile.

Irrespective of fiddling around with CASPOL and security zones, Visual Studio seems intransigent in not wanting to load AddIns from network locations. Fortunately it turns out there are a number of different places that you can put add-ins to get them to load, which is probably better explained in this conversation.

As a result I dumped the addin into

%ALLUSERSPROFILE%\Application Data\Microsoft\VisualStudio\9.0\AddIns

C:\Documents and Settings\All Users\Application Data\Microsoft\VisualStudio\9.0\AddIns

…and it worked just fine.

Wednesday, May 13, 2009

Tuesday, April 07, 2009

TechEd 2008 DVD Silverlight Compatibility Issue

Shows how often I actually refer back to those DVDs, but now that Silverlight 2 is out they don’t work (well the navigation doesn’t, which is nearly as bad)

There are a number of crappy workarounds, but fortunately at least one clever fix (short of re-authoring the DVDs with the updated online content). Naively I thought you could just add the application/x-silverlight-2-b2 mime-type mapping into the registry and it would work, but apparently not.

The Silverlight version detection / browser sniffing / forwards compatibility story has been monumentally shite so far, because I’ve seen a whole heap of the ‘Install Silverlight’ button in lieu of content, but immortalizing it onto countless DVDs really takes the biscuit.

Flash – if I recall correctly – got this right first time, that being over a decade ago.

Friday, March 20, 2009

Do One Thing Per Line

Yeah, so your fancy-pants Linq query, and your fluent interface all looks so neat and expressive, but after you’ve tried to play ‘pin the null reference exception on the donkey’ in production for the n’th time you might consider that all those intermediate variables were generally considered good style for a reason. And it’s not just that you can pinpoint an error given a line number:

“Putting each statement on a line of its own provides an accurate view of a program’s complexity” – Steve McConnell [1]

That and naming the variables documented the code[2], right?

Everyone’s oooohh so up in arms if a method spans multiple pages on screen, but a 10 line fluent interface chain mixed with some linq and some lambdas is somehow ok!? Be serious now[3].

Dumb it down. One day you will appreciate it.

 

[1] Code Complete, Chapter 31.5 Para ‘Using Only One Statement Per Line’. See http://www.cc2e.com/Page.aspx?hid=291
[2] …and chapter 11.1 ‘Considerations in Choosing Good Names’
[3] Not that I’m advocating avoiding either Linq or Lambdas, but moderation in everything, right? Really not sold at all on Fluent Interfaces mind. So last year.

Old School JavaScript Officially ‘Old School’

I feel I have been rumbled:

Signs That Your JavaScript Skills Need Updating

Tuesday, March 17, 2009

Easy Upserts in SSIS 2005

Merging in updated or newly created data from an external source is a pretty common task in SSIS. The canonical way[1] these upserts are handled is the SSIS Lookup Pattern:

clip_image001

Incoming rows are run through the Lookup component to identify which already exist in the destination. Those that do become UPDATEs, and those that don’t (the Lookup’s error output) become INSERTs. You can further refine this to only retrieve the inserted-or-updated rows from the datasource in the first place and save on bandwidth and so on (a future post on delta loads I think).

It’s fairly straightforward, but to my mind this has a few major problems.

Firstly, it’s a pain in the arse. For something that’s such a common requirement, this seems like a lot of shape-twiddling-clicky-click-dragging to get it up and running[1]. And I hate how brittle the OLE DB Command parameter mapping is. But that’s another story.

More concerning is that the update flow uses the OLE DB Command component, and that’s something that fires per row, unlike the OLE DB Destination that your INSERTs go to, which can batch and/or use fast load. Normally the updates are going to be in the minority compared to the inserts, but if you get a significant number of them it still costs time.

Probably the bigger problem for me with this pattern is that the performance of the lookup degrades as the data volume progressively increases. Every time you insert data there's more for the lookup to subsequently lookup against. You can (and should) limit the columns in the lookup to restrict it's memory blow out, but it's always going to just progressively get worse over time.

To me this is symptomatic that the problem is being solved at the wrong tier. Why bring large quantities of data up from the database to determine 'newness' of a row, when that's clearly something best done closer to the database itself? Especially as both sets of rows are going to the same place anyway.

So then you think about pouring data into a temporary table, then doing the upsert within Sql (based on a join), then dropping the temporary table. But you’ve got to create and drop the temporary table. What you need is a temporary table that’s transparently created for you. And what, after all, is an INSTEAD OF trigger, if not precisely that?

The Upsert View

So my solution for this is a series of 'upsert views'. An upsert view is a view with an INSTEAD OF INSERT trigger that instead of the insert performs an upsert. In this case (being Sql 2005) the upsert is an IF … INSERT … ELSE … UPDATE construct, but a 'true' Sql 2008 upsert would be a minor tweak.

An upsert view looks a bit like this:

Select all-the-updatable-columns
From sometable

Non-updatable columns include timestamp and identity columns (you'd never mark a column you brought in from a source system as an identity column, so there's no need to mess about with identity inserts).

The instead of trigger looks a bit like this:

-- Insert all the genuinely new rows
Insert into [table]
(all-the-updatable-columns)
From
Select (all-the-updatable-columns)
From inserted
Where not exists (select 1 from [table] where (keys=inserted.keys)

-- Update all the modified rows
Update [table]
Set all-the-updatable-columns=inserted.all-the-updatable-columns
From
Inserted
Inner join [table] on (keys = inserted.keys)
Where (concurrency_indicators <> inserted.concurrency_indicators)

(We’ll explain the keys and the concurrency indicators in a moment)

What this produces is a view over your data table that takes records INSERTED into the view and performs an UPSERT into the base table. As a result your SSIS can be simplified to this:


clip_image002


In reality of course you normally have more than this: probably have some other lookups and some data type conversions. However you've not had to put the Upsert functionality into the SSIS (by hand): instead the database is handling it all for you.

Provided you enable the FIRE_TRIGGERS option, this is entirely compatible with using Fast Load, and this can make a significant performance difference for the time of your package execution. You've kept all the row-matching down in Sql where it can use indexes, optimisations and the page cache, and you've reduced the memory footprint of your SSIS packages. I shame myself by not having any hard comparison data to back this up however.

We (of course) generate[2] these upsert views based on various bits of metadata we store to indicate which are the keys and the concurrency indicators. In a data warehouse it’s quite common for the primary key to be different to that defined in the source system, so the key used for row matching – the ‘business key’ – has to be indicated somewhere. Column-level extended properties seemed a fairly natural fit:

clip_image003

Similarly the columns that are used for concurrency control – rowversion / datetime updated – are marked as follows:

clip_image004

(If you don’t have one then you use the ‘fallback’ concurrency model that so many of Microsoft’s design-time tools like to kick out: you compare every column to see if it’s changed (mark all non-key columns). This is clearly a bit yuck).

And that’s pretty much it. Provided you can cope with INSERTing into a view and getting UPDATEs to your table, (which is a bit odd, but can be catered for with appropriate naming conventions) then maybe you can benefit from this approach too.

[1] Sometimes you can get the Slowly Changing Dimension wizard to kick out something similar, but that’s pretty hacky, and doesn’t like using certain types of columns as the update indicator (IRRC). It also re-creates the downstream flow every time you open it, which obviously removes any customisations you make. Avoid.
[2] The generation process was a fairly simple PowerShell script. I’ll post the important bits up one day.

Saturday, March 14, 2009

Dear HP / Dell / Microsoft

Please stop printing your product labels, serial numbers and OEM certificate of authenticity stickers in sweat soluble ink, then fixing them to the underside of our laptops.


Yours: everyone who ever actually used a laptop on their lap, or handled it using their hands.

Friday, March 13, 2009

Vista thinks I work for Borg

image

Hey did you see that thing that guy did where he brute-forced blur-redacted text back into plaintext? That was pretty cool. Damned if I can find the link again mind…

Monday, March 09, 2009

Say what you like about Vista…

…but the XP Start bar really did need a going over:

image

Avoiding Reporting Services ‘StreamNotFound’ Exception when Viewing Multiple Instances of the Same Report

I’ve been using Reporting Services 2005 to build a series of BI dashboards, initially using the SharePoint reports library, but more recently using straight ASP.Net pages. In essence we’ve been using SSRS as a headless charting engine.

On some of our dashboards we occasionally saw little red crosses rather than our charts – IE’s way of telling us the image hadn’t loaded. Didn’t happen very often, and at the time we had bigger fish to fry, but unfortunately when we moved to our big new multi-core server, it happened a lot more.

In the logs we saw a lot of this:

w3wp!library!f!01/21/2009-12:40:25:: Call to GetItemTypeAction(http://myserver/reports/BlahBlahReport.rdl).

w3wp!library!f!01/21/2009-12:40:26:: e ERROR: Throwing Microsoft.ReportingServices.Diagnostics.Utilities.StreamNotFoundException: The stream cannot be found. The stream identifier that is provided to an operation cannot be located in the report server database., ;

Hotfix 913363 looked promising, but that was included in SP1, and we were SP3, so unless it had regressed, that wasn’t going to help.

What we determined was that this only happened when the same user requested the same report multiple times (with different parameters) at more or less the same time. And that happens a fair bit for us: our dashboards (both the SharePoint versions, and the later ASP.Net ones) basically rendered down to pages of IFRAMES pointing back at SSRS, and sometimes multiple charts on one dashboard would actually just be the same report re-executed with different parameters. So my ‘race condition’ alarm was off and running.

For any report on the dashboard, IE does this:

  • Load the HTML in the IFRAME
  • Go back to the server and load all the images, javascript etc…

Any generated images in the report have to be cached momentarily on the server between when the HTML is generated, and when the browser comes and ask for them. And of course the browser has to supply some kind of reference back to the server so it gets the right images, and not the images from a different report or a what a different user is viewing. So there’s a Reporting Services Session (which you probably read about the first time you got a rsExecutionContextNotFound exception, right?).

So we speculated that SSRS was getting its sessions in a mess, and somehow a subsequent request to the same report, using different parameters, was throwing away the results of the previous execution (the chart image we are after) in the process of creating a new one. At which point Graeme discovered you can disable session cookies (in the ConfigurationInfo table) which (after an IISReset) indeed fixed the problem entirely.

But why the problem in the first place?

Looking at the HTTP traffic between the browser and the server, one sees that the scoping of the image request is indeed only scoped to an imageID and report as part of the GET url, and within a execution session as part of a cookie:

GET /ReportServer?BlahBlahReport&rs%3aFormat=HTML4.0&rs%3aImageID=d95ebcc7-deba-4734-93c9-270468bd133b HTTP/1.1

Cookie: ConsoleVisible5dd14218-294b-424e-a33a-013236be5290=false; RSExecutionSession%3ahttp%3a%2f%2fserver%2f%2fBlahBlahReport=54m0qr55hytlvn55nzkjxr55

I’ve not followed this through entirely, but basically the trouble is your browser only has one RSExecutionSession cookie, which appears to contains only one execution ID per report URI. You’d imagine then that what you’d get is just the wrong image (outstanding requests for images from the first execution are submitted using the execution ID from the second session), but you don’t, so the most likely scenario I can come up with is that starting the second execution for the report implicitly removes the artefacts (images etc…) still in cache from the first execution. If this happens before the browser’s loaded them then it’s little broken crosses time…

Once you turn cookieless sessions on, however, the GET requests look more like this:

GET /ReportServer?%2fBlahBlahReport&rs%3aSessionID=ucqtvo55wacqugqjln3l2045&rs%3aFormat=HTML4.0&rs%3aImageID=e9a2d08b-23cb-4586-8399-eb42ed5558a7 HTTP/1.1

ie they contain all three of report, execution and image. As such executing the same report multiple times in parallel doesn’t create any race conditions. This seems like such a better way of doing it that I’m surprised this isn’t the default behaviour (I guess it can’t clean up the temporary cache quite as aggressively, but other than that I can’t see the downside).

It’s worth noting that this kind of thing is an absolutely classic pitfall with any form of HTTP session state, which application frameworks like ASP and ASP.Net typically sidestep by serializing requests within a session and thus avoiding order-of-execution issues. However in this case, even if SSRS serialized access it would have to ensure it serialized it in the right sequence to really fix the problem. Tricky.

 

PS: Reporting Services 2005. Haven’t tested on 2008.

Tablet PCs: Still Too Heavy

I went out and bought a nice new HP Touchsmart TX2 the other day. And I’ve been very happy with it. For one thing I’ve hankered after a tablet for some time, but the shear ‘double the price’ barrier has always stopped me. But that’s not the case with the TX2, which goes for under $2k and is fairly comparable with it’s non-tablet brethren.

And it’s a touch-screen as well, though the multi-touch capabilities are pretty much just gimmicks supported by a handful of built-in HP apps until Windows 7 comes along. That being said, I’ve already got very used to flicking the screen for page up / page down operations (though I still think a scroll-wheel on the side would be just as easy).

But it’s just too heavy.

It’s not heavy by current standards: it’s sub 2 kilos, and even the (far more expensive) Fujitsu Lifebooks only go down to 1.6 kg (in a 12”). And it’s definitely not too heavy to carry in a bag on your shoulder. But in my totally non-scientific ‘picking things up with my hand’ testing I’ve determined that the ideal weight for a tablet PC is less than a kilo (the weight of a Dell Mini 9 in fact). Tablets are for holding, right? Well at 2kg (and without much grip on the edges) your thumb is getting a good workout, let me tell you.

So my requirements for a tablet PC that will genuinely shift usage patterns:

  • <1kg
  • 4 hours battery life minimum, ideally 10, included in the weight above.
  • Some decent ergonomics around the frame so one can actually hold onto the damn thing.
  • About $600

Various rumours regarding tablet netbooks seem to be coming to fruition, so provided the dual-core Atom has enough grunt to handle the handwriting recognition, then this will be all good. Many a time I’ve sat there with the Mini 9 on my lap thinking ‘if only the screen would just flip round…’

Coming soon: less hardware ramblings. Promise.

Friday, March 06, 2009

I <3 Dell Mini 9

I’ll ‘fess up to being totally unimpressed with progress in the PC industry in the last 10 years. I appreciate we’ve made some fairly big steps technologically speaking (multi-core is right up there), but the growth of power consumption and (as a result) heat and noise dissipation has had me absolutely appalled. Kilowatt PSUs! Heatsinks the size of a fist! 3D graphics may have come on in leaps and bounds, but does it have to sound like a 747 about to take off? It’s all just so crude. I feel like opening the drive bay and shovelling in more coal.

I constantly think back to the Acorn Risc PC I had in uni, which seemed pretty powerful at the time, but whose CPU ran cool to the touch. It’s been downhill ever since.

Similarly I like to complain about laptop battery lives. What’s the point of a mobile computer if you have to be tied to the wall socket every two hours? But then laptops aren’t actually mobile computers, they are portable computers: portable like a portable TV is portable: you can pick them up and carry them around, but realistically to use them you’ve got to plug them in. This drives me *nuts*, particularly as I can easily burn through two hours before I really get started on anything, by which time it’s time for a battery break.

Enter the netbook.

The wife’s old laptop (one of my cast offs) finally went back to the Dell recycling plant in the sky, and so we got her a netbook as a replacement. A Mini 9 to be exact.

And it’s fantastic.

I can’t really say how it compares to it’s other netbook brethren, but if this is what the rest are like then viva the revolution because for on-sofa / in-kitchen[1] / in-bed casual browsing this form factor and weight (1kg) is absolutely on the mark[2]. Not too big, not too small. The Mini is SSD so it’s totally silent (excepting occasional sound card cross-chatter). The Atom CPU runs cool, so it doesn’t make your lap all sweaty. And it’s cheep enough to buy on a whim ($550 currently)[3]. Oh and it’s XP, so it can run .Net.

I don’t totally buy into it as the nirvana of mobile computing, because 3.5 hours is still not enough, but as something you can happily take room to room in the house it’s bang on.

Here it is cosying up to a Dell 15” Latitude: it’s exactly half the size[4]

P2030003

Here it is dwarfed by my 17” Inspiron 9300, which is only it’s grandfather in Laptop-generations:

P2030005

Ok, so the 9300 still beats the pants off it in terms of performance, but the Atom team have seriously lit a fire under the rest of the industry because it’s already good enough for most people, and it’s only going to get better. You’d have to imagine when the average punter catches on that Atom-based desktops, like the the EeeBox, can happily handle their Word/Email/Web demands there’s going to be a fairly major correction in the market, whatever the critics say. Just check out the price differential. That 90% of your horsepower you donate to SETI, Folding@Home or whatever: why not just not buy it in the first place and save your kilowatt hours? And that’s before you even consider any cloud computing offload.

In years to come we will look at our current CPU technologies in the same way we view CRTs, incandescent bulbs and the internal combustion engine. Here’s hoping we are making the first footsteps now.

 

[1] For recipes. I mean, no-one really bought an internet fridge, right?
[2] Though it would be better as a tablet
[3] Frankly it’s cheep enough to use as a universal remote control, if only all your devices talked Bluetooth
[4] If it looks like its overhanging on the right, it’s just the perspective

Oracle OLE DB provider precision issues with SSIS 2005

We had this problem[3] before where SSIS would keep complaining that the Oracle client wasn’t giving it the correct precision information for numeric columns:

Error at Import Broken Stock On Hand Levels [DTS.Pipeline]: The "output column "WORKD_TONNES" (42)" has a precision that is not valid. The precision must be between 1 and 38.

We were using Oracle 9r2 client + OLE DB provider against an Oracle 8 database, and I just assumed that one of those wasn’t passing though metadata it should have, leaving SSIS to see the precision and scale as 0:

image

You can explicitly set the scale/precision in the advanced editor, but it keeps wanting to overwrite it every time you look at the query, which is a major pain in the arse. And explicitly TRUNCing or ROUNDing the column in the source query doesn’t seem to make any difference, even while you’d think that would make the precision explicit. So instead we used the Microsoft OLE DB Provider for Oracle (MSDAORA), and everything was happy.

Well nearly.

Actually it turns out that rather than MSDAORA ‘working’, all it actually does is default the precision and scale as 38,0 in scenarios where it can’t determine the precision/scale. And this happens rather a lot, since an Oracle Numeric doesn’t actually have to have a length or precision specified (though it should): it can be some kind of wierd variable precision float/numerical hybrid, which isn’t something that SSIS 2005 really caters for (nor other parts of Sql for that matter). Anyway, the point is that 38,0 isn’t the right answer either: whilst it’s fine for integer data anything after the decimal is lost, and that can take a while to pick up on.

So I’m not sure which is worse. On the one hand you can use the Oracle OLE DB driver, and all your precisions will be wrong until you fix them, but at least it’s in your face enough you know about it (though sometimes you can’t even OK the dialog, to get into the advanced editor and fix it, and/or it whinges about mis-matches with error output columns).

On the other hand the Microsoft driver makes a good stab at ‘what you probably want’, but as a result you lose precision on decimal data. And even if you set the precision up, it’ll overwrite it again (silently) when you touch the query, so it can go from working to not very inconspicously.

Probably the only safe way here is to use the Microsoft driver, and modify all your queries to cast all non-integer variable Numeric columns to strings[2]. That way the conversion into decimal data has to be explicitly defined, and can’t be lost quite as easily. It’s a bit more work of course, but at least the output is right.

Which brings me back to why I started looking at this again because there is no 64 bit version of MSDAORA [1] (I’m not even sure if MSDAORA is actively supported anymore), and we’ve had all manner of fun getting the 32 bit version running talking to either of the 32 and 64 bit versions of the Oracle client. We failed, but I may have just configured it wrong.

So instead we are in this ridiculous situation where we use MSDAORA at design time, and the Oracle OLE DB driver on the server at execution time, which creates all manner of warnings in the logs and also means some Sql (with inline comments) works on a developers machine but doesn’t work on the server.

It’s a mess.

[1] Nor Jet either, if you were crazy enough to load something from an Excel file. Who would do such a thing?
[2] Interestingly this is exactly how SSIS 2008 gets round this problem: it gives up, maps the variable-Numeric column as a string and leaves the conversion up to you
[3] Obviously the first problem we had was the whinging about the DefaultCodePage issue
[4] Managed to get all the footnotes in backwards for this post, which is pretty poor.

Sunday, March 01, 2009

Consistency in SSIS

It just occurred to me that the Derived Column Transformation editor (and other SSIS expressions) take a C (ie C#) style approach to equality:

image

…whereas SSIS 2005 can only use VB.Net for Script Tasks / Components.

Go figure.

 

[ok, yeah, fixed in SSIS 2008. I know. Lucky you]

Friday, February 27, 2009

GDR Grrr...

I've totally had it with Visual Studio Database Edition 2008 GDR.
CREATE VIEW [dbo].[DateRangeExpanded] WITH SCHEMABINDING
AS
select n.Number, n.Date
from dbo.Numbers n
where n.Date >= CONVERT(DATETIME,'2007-12-01') and
n.Date <= datediff(day, 0, DATEADD(MONTH, 6, GETDATE()))
Results in
Error TSD03127: Cannot schema bind view '[dbo].[DateRangeExpanded]'
because name 'DATETIME' is invalid for schema binding. Names must be in two-part format and an object cannot reference itself.
WFT? SQL Server is quite happy with the above, it's just GDR getting it's knickers in a twist yet again: DateTime is a type not a column. I have found no way of making it happy with this one[1], other than taking the schemabinding off, and I am pretty pissed about limiting my database design just to accommodate GDR's inadequacies.

If this project wasn't finishing up in a few weeks, GDR would be in the bin by now.


[1] CAST vs CONVERT doesn't make any difference, for one thing.

Thursday, February 26, 2009

Dang! SSAS deployment configurations are held in the .user file

I really wish Microsoft would make up their mind where deployment target locations should live (ie setting the destinations for Publish / Deploy)

For SSRS this data is stored in the project file (.rptproj) , per configuration, so anyone can get latest on a project, change their solution configuration to (say) 'Test', hit deploy and away they go, happily deploying to your Test environment. This is how it should be.

For SSAS (2005 / 2008) this data is stored in the .dwproj.user file, so for any new developer / PC / workspace that comes along all those deployment configuration settings have to be re-setup (or copied from another workspace), or all their deployments are to localhost. This is just really irritating. As I discovered today.

For an XBAP (or just a WinForms ClickOnce project) a single deployment target is stored in the project file. Other, previously-used deployment locations, are stored per-user. None are correlated with a project configuration in any way. This 'deploy to prod' design mentality drives me nuts.

For an ASP.Net website the (single) deployment location isn't even stored against in the project at all, but squirreled away in %userprofile%\Local Settings\Application Data\Microsoft\WebsiteCache\Websites.xml. Of course! Web application projects put it soley in the .user file, which is at least less obscure, if no more useful.

Now I'd be the first to say that deployment to test environments etc... is best done from a seperate scripted process, but I accept that not everyone has my affinity for long PowerShell deployment scripts. Plus sometimes I just can't be arsed. So am I completely insane to wish that the in-IDE Publish / Deploy mechanisms would work in a manner that could be actually used would be usable within a team, and not scatter the deployment target information quite so liberally?

Wednesday, February 18, 2009

Mini USB Convergence Finally Coming

In a rare victory for common sense, it seems that the world’s mobile phone makers have at last agreed to converge on using mini-USB as a universal mobile phone charging standard. Not only that, but chargers will be optional at point of sale, so you won’t end up with a house full of them.

The plan is to do this by 2012, which is a bit slow in my book, but it’s all progress, right?

Thursday, February 12, 2009

Substituting view/synonym causes naming collisions in Database Edition GDR

The other day we discovered a duplicated table in the database. We're doing some refactoring, and a certain amount of duplication during this process is inevitable, but to cut a long story short, the two instances of this table needed collapsing into one.

So I dropped one instance, and created a synonym in it's place (pointing to the other). This saves fixing all the views and external references (in the SSAS cube) to that table. Those references will go away over time.

But when I tried to sync the database using GDR it just didn't get it. It's quite happy to create the new synonym, just too dumb to drop the original table first - so there's a naming collision I had to fix by hand. It basically does this:

CREATE SYNONYM [dbo].[Table_2] FOR [dbo].[Table_1];
GO
DROP TABLE [dbo].[Table_2];

Dur! And it does the same thing if you substitute a view for a table too:



CREATE VIEW [dbo].[Table_2]
AS
-- Create view instead of original table
SELECT One FROM dbo.Table_1;

GO
-- Drop original table
DROP TABLE [dbo].[Table_2];
Anyone see the problem here? This is not an edge case guys.

I should point out that this only happens when sync'ing a database with the database model. If you sync two live databases, GDR gets the execution order right.

Not impressed.

(By way of comparison Sql Compare 7 is quite happy with both these scenarios, whether comparing against a live database or a snapshot. Drop table; create synonym - it's not that hard. And Sql Compare 7's been out a while (8 is nearly out))


[Update: 2008-02-20] And another thing: GDR's idea of dropping an object (eg: if you sync to a dev database and the object isn't there any more) is to comment out the object definition in the database project. All well and good, but it's too stupid to also comment out any extended properties the object might have (say the ones the view designer put in to store the view layout), so obviously those extended properties cause the build to crap out as the object they reference doesn't exist.

Thursday, February 05, 2009

Not sold on Database Edition 2008 GDR

Just in time for the end of 2008 Microsoft finally got Database Edition 2008 GDR out the door. This replaces the version that came with VS 2008, primarily bringing with it Sql 2008 compatibility.

So GDR is now Microsoft's 3rd bite at integrating the database development workflow into Visual Studio. Lets start with what it does right:

Obviously the main benefit of bringing the database into Visual Studio is you get source control for free. Sure - you've been able to script out your database and add it to source control before, but this way you can check in database changes and the code that depends on it as one atomic checkin, against a work item. That's actually pretty neat from a tracking / branching point of view.

And because it's a fully model-driven system, all the SQL in the project is tested for consistency and correctness at build time. You could do this before of course (script out and build a structure only database to check it builds), but this way you know any problems you find are local to your workspace, so it really was you that broke them.

The output from the build process is a model that completely describes your database schema, so this can be used to sync up test and production databases, either via the IDE, MSBuild or a redistributable command line utility that can be incorporated into your deployment / install process. Not that you couldn't have done all this with RedGate's SqlComparison SDK of course, but if you struggle to get licences other than MSDN purchased, this does make your life a bit easier, and at least it's been constructed with a build-once-deploy-many philosophy, unlike the clowns behind ClickOnce.

The model also (allegedly) provides a safer platform for database refactoring: it's supposed to 'remember' things like column renames as renames and treat them as such the next time a database is sync'd to the model. I guess only time will tell how well that works in the real world given, for example, multiple databases at different schema versions (ie Prod database 5 versions behind Test etc… ). Are the refactorings going to be journalled away and somehow replayed in the right order? If so, how does it know at which version to start? It's not a simple matter.

But Microsoft has still totally, totally missed the point, because there is still no design support. Not one. Nada. Not even a schema browser.

Hey I can see where this is going, and I can see they're laying foundations here, but in my mind the fundamentals of a database development tool are that they assist in database development. You know - the core stuff: creating objects and sprocs. Source control and a deployment workflow are secondary to that. But with GDR you'd better be hankering after the Sql 6.5 days, because you're going to be writing lots and lots of TSQL. The best I can say is it's colour coded, but didn't we only just get IntelliSense in Management Studio? It just doesn't work for me.

But the alternative - do your dev / design in Management Studio and sync back to the model - is equally flawed, if only because you've got to keep syncing back again (and the sync UI is still a bit ragged). That's a whole lot of friction for relatively nebulous benefits (especially if you already have your database schema under some level of source control), and in my mind questions the whole utility of the product. What's the point of 'integrating' into the Visual Studio IDE if you have to keep tabbing out to do anything even remotely useful (think: execution plans)?

Similarly, whilst there's a Data Compare feature, it's not actually part of your database project, just something you do on an ad-hoc basis. So there's no easy way of designating a certain subset of tables as 'reference data' that should essentially be considered part of the schema, rather than part of the contents, and sync'd up when you redeploy. I'll accept that's probably a vNext feature, but its absence calls into question the viability of using GDR as an end-to-end database deployment manager in the first place, compared to tools like SqlPackager (or scripting using SqlCompare \ SqlDataCompare APIs).

And of course the model parser isn't perfect, so occasionally a valid bit of SQL creates build errors because the model can't fathom what it's doing. In my case it can't seem to resolve the columns in a CTE that queries another (properly referenced) database, and barfs about ambiguous references where there are none. Unfortunately fidelity is always the problem with Model Driven Architectures, even ones that use a DSL as the model, because they have to - at some point - cater for a lowest common denominator, and at any rate they are in an arms race with whatever it is they're modelling. Microsoft is touting it's extensible architecture, and IBM are apparently writing a DB2 provider, but what do you think the chances of 100% functionality coverage are? And for Oracle? I'll wager pretty low. So be prepared to take an 80/20 view, and put some bits into pre/post build scripts and / or re-structure them to keep the parser happy.

Then there are the little things. It creates a schema and object hierarchy when your database is initially imported, but it doesn't enforce it, and the Add \ New (whatever) templates don't pick up on it, so all your new objects end up in the wrong schema unless you're concentrating. If you do anything with any system views you'll need to reference the master database ( %ProgramFiles%\Microsoft Visual Studio 9.0\VSTSDB\Extensions\SqlServer\2008\DBSchemas), so why is it not referenced to start with? And the doco's not been updated for RTM so contains all kinds of outdated information around the command line interfaces, and absolutely no content in the API section other than listing the namespaces.

http://social.msdn.microsoft.com/Forums/en-US/vstsdb/thread/3f0f0888-9b37-4279-81d4-b924595630a8/
http://social.msdn.microsoft.com/Forums/en-US/vstsdb/thread/a966ac3c-c039-4c3f-9b6d-2882d9874282

So this still strikes me as neither fish nor fowl. Given the designers already exist in Management Studio it's a bit pisspoor that the DataDude team haven't borrowed them yet. I imagine I'll still be doing most of my work in Management Studio, and at best attempting to keep the database model in sync. And I suspect that for me, and many others, this will be just too much friction to bother with and so the whole effort will be more or less wasted.

Which is all a bit sad, given it's 2009, I've written a decades worth of database-centric apps, and we've still not got this right. At least no-one's actually paying for this any more.

Friday, January 30, 2009

Sql 2005 Install Woes on Shiny New Big Server

I had a new one this time.

I (of course) ran into the install-hangs-on-setting-file-security issue (KB910070), but I was expecting that. What really threw me was after that then install then just kept dying, leaving this in the logs:

Faulting application sqlservr.exe, version 2005.90.1399.0, faulting module sqlservr.exe, version 2005.90.1399.0, fault address 0x0000000000b323f0.

This really threw the installer too - after an uninstall, and even after a manual cleanup, the installer still though there was an instance hanging around. Which it was. I had to manually delete the SQL Services (using SC), a bunch of instance registry settings and the instance files directory (the MSSQL.1 folder) before I could finally get it to re-install. I guess the uninstall died too.

So then I tried installing again, and again. And again.

So I started speculating. Was it the virus scanner .... No. Could it be the Sql 2005 installer didn't like .Net 3.5 sp1? Uninstall... No. Was I definately using the 64 bit version... Yes. Could I slipstream SP2 and workaround some issue I didn't understand yet... No. Was it the monster 24 cores the server had (4 x hex core)... maybe.

There is a known issue with Sql 2005 instal failing with odd number of cores (ie Phenoms). That (obviously) doesn't count: but maybe Sql can't either. So I used the instructions in KB954835 to criple my monster server down to a single CPU, and then it all installed just fine. I can now install SP2 (3 actually) which allegedly should then make it all work.

Moral:
Obviously I should have been installing Sql 2008 instead
It's clearly becoming way too easy - with multi-multi-core boxes - to drop into some massively unexplored race condition territory in something that's otherwise really quite stable and well tested.
You can have too many cores

Other thoughts:
There must be a better way to restrict an installer or app to run on only one core without farting about with BOOT.INI
Once I put SP3 on it better all work otherwise the boss is going to be really pissed ('What, those other 23 cores? They're um ... spares')

Wednesday, January 28, 2009

Forcing Web Part pages into Edit Mode

(Before I forget again :)

Even if you've hacked about with a SharePoint master page to remove all the SharePoint chrome (including the edit button and suchlike) there are URL hacks you can use to get the page back into edit mode:
pageurl?ToolPaneView=2
(There's a few more like this, see Sharepoint Tweaks... )

...as well as the hack to get a page into the Web Part Maintanance page:

pageurl?contents=1

Special cleaning instructions for Nokia 6110 Classic

Mine had a bit of a smudge on the screen so I thought I'd pop it through the wash and clean it up. The critical bit here was to make sure the phone was in the pocket of my cycling shorts, so it didn't get bashed / scratched by the washing machine drum.

After an afternoon out on the line drying (still in shorts pocket) bingo: one nice clean phone.

Nokia: tough as.

Monday, January 19, 2009

Recordset to Powershell object collection

I've found this pretty damn useful: it converts an ADO.Net IDbDataReader into a collection of powershell objects, mapping column names to property names so you can sort and filter and whatnot using standard PS syntax.


# Executes an IDbCommand and converts the resultset
# to a Powershell object collection
function ExecuteCommand(
$command = { throw 'Command must be supplied' }
){
$rows = @();
$reader = $command.executereader()
while($reader.read()){
$values = new-object object;

for($i = 0; $i -lt $reader.FieldCount; $i++){
$name = $reader.GetName($i);
$value = $reader.GetValue($i);

Add-Member -in $values noteproperty $name $value
}
$rows+= $values;
}
$reader.Close();
$rows
}


Next week: why I was doing this in the first place (much more interesting)

Popular Posts