Sunday, November 20, 2011

Gotchas with the Kinect SDK for Windows

Playing with the Kinect SDK for Windows and having a ball, but the doco is (understandably) a bit rubbish in places; more specifically, it lacks critical details about the form a parameter takes, just where that detail matters.

Anyway, this is my list of gotchas so far:

Depth Data Inverted when Player Index tracking enabled

Bizarrely, whether you initialize and open your depth image stream with ImageType.Depth or ImageType.DepthAndPlayerIndex determines whether what you get back is ‘right way round’ or horizontally inverted.

Inverted is generally more useful, because it matches with the ‘mirror image’ video stream. So why isn’t the stream like that always? Seems like an unnecessary inconsistency to me, and one you might want to spell out in the doco.

Different Depth Data Pixel Values when Player Index Tracking Enabled

When you do turn player index tracking on, the depth stream ‘pixels’ are left-shifted 3 bits, leaving the lower 3 bits for the player index. This is documented, and I understand you’ve got to put the player index somewhere, but why not make the format consistent in both cases, and just leave the lower bits zero if tracking isn’t enabled? Better still, why not put the (optional) player index in the high bits?
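
For reference, here’s roughly how I’m unpacking the two bytes per depth ‘pixel’ at the moment. It’s just a minimal C# sketch (not the SDK’s own sample code), and the little-endian byte order is my assumption:

static class DepthBits
{
    // Unpack the depth value from one depth 'pixel' (two bytes, assumed little-endian).
    // With ImageType.DepthAndPlayerIndex the low 3 bits hold the player index, so the
    // depth proper sits 3 bits higher; with plain ImageType.Depth the raw value is the depth.
    public static int GetDepth(byte[] bits, int offset, bool playerIndexEnabled)
    {
        int raw = bits[offset] | (bits[offset + 1] << 8);
        return playerIndexEnabled ? raw >> 3 : raw;
    }

    // Low 3 bits: the player index (0 = no player). Only meaningful when the stream
    // was opened with ImageType.DepthAndPlayerIndex.
    public static int GetPlayerIndex(byte[] bits, int offset)
    {
        return bits[offset] & 0x07;
    }
}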

This is especially irritating because...

GetColorPixelCoordinatesFromDepthPixel() Requires Bit-Shifted Input

The nuiCamera.GetColorPixelCoordinatesFromDepthPixel() mapping method expects its ‘depthValue’ parameter to be in the format it would have been had you enabled player tracking. If you didn’t, you’ll have to left-shift it 3 bits yourself just to make it work. So depending on how you set up the runtime, the pixels from one part of the API can or can’t be passed to another part of the API. That’s poor form, if you ask me.
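
So for now I just normalise the value before making the call, something like this (same assumption as above about the ‘player-indexed’ layout being what the method wants):

static class DepthValues
{
    // With ImageType.Depth the raw value is 'pure' depth, so shift it left 3 bits into
    // the player-indexed layout before passing it as the depthValue parameter; with
    // ImageType.DepthAndPlayerIndex it's already in that layout.
    public static short ToMappingDepthValue(int rawDepth, bool playerIndexEnabled)
    {
        return playerIndexEnabled ? (short)rawDepth : (short)(rawDepth << 3);
    }
}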

Not that you’ll find that in the doco of course, least of all the parameter doco.

No GetDepthPixelFromColorPixelCoordinates Method

Ok, so I understand that the depth to video coordinate space translation is a lossy one, but I still don’t see why this method doesn’t exist.

I picked up the Kinect SDK and the first thing I wanted to do was depth-clipping background removal. And the easy way to do this is to loop through the video pixels, and for each find the corresponding depth pixel and see what its depth was. And you can’t do that.

Instead you have to loop through the depth pixels and call the API method to translate them to video pixels, but because there are fewer of them than video pixels, you have to paint each one out as a 2x2 block, and even then there’ll be lots of video pixels you never process; so many that you have to run the loop twice: once to set all the video pixels to some kind of default state, and once more for those that map to depth pixels to put the depth ‘on’.
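
In code the shape of it ends up something like this sketch, where the mapping delegate just stands in for the SDK call, and the widths/heights are whatever resolutions you opened the streams at:

static class DepthClip
{
    // Stand-in for nuiCamera.GetColorPixelCoordinatesFromDepthPixel().
    public delegate void DepthToColorMap(int depthX, int depthY, short depthValue,
                                         out int colorX, out int colorY);

    public static void BuildForegroundMask(short[] depth, int depthWidth, int depthHeight,
                                           bool[] mask, int colorWidth, int colorHeight,
                                           DepthToColorMap map, short clipDepth)
    {
        // Pass 1: default every video pixel to 'background'.
        for (int i = 0; i < mask.Length; i++) mask[i] = false;

        // Pass 2: walk the (smaller) depth image, map each depth pixel to video
        // coordinates, and paint a 2x2 block wherever it's nearer than the clip distance.
        for (int dy = 0; dy < depthHeight; dy++)
            for (int dx = 0; dx < depthWidth; dx++)
            {
                short d = depth[dy * depthWidth + dx];
                if (d <= 0 || d > clipDepth) continue;   // unknown or too far away: leave as background

                int cx, cy;
                map(dx, dy, d, out cx, out cy);

                for (int oy = 0; oy < 2; oy++)
                    for (int ox = 0; ox < 2; ox++)
                    {
                        int x = cx + ox, y = cy + oy;
                        if (x >= 0 && x < colorWidth && y >= 0 && y < colorHeight)
                            mask[y * colorWidth + x] = true;
                    }
            }
    }
}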

Just didn’t feel right.

Thursday, September 22, 2011

Geolocation in HTML 5

Ok, so it’s not actually part of HTML 5 (the spec), but conceptually at least it’s definitely part of HTML 5 (the brand).

So what’s actually involved? Hmm. OH MY GOD IS IT THAT EASY!?
function showMap(position) {
// Show a map centered at (position.coords.latitude, position.coords.longitude).
}

// One-shot position request.
navigator.geolocation.getCurrentPosition(showMap);


[from the W3 geolocation spec]

So you just rock up to html5demos.com/geo and ...
image

Holy crap. I won’t show you the resulting map because it shows where I live. What’s really freaky about that is this netbook doesn’t have a GPS. So either Windows 7 or IE 9 has fallen back to IP-based location inference, and somehow still got me only one house out.

I’m totally freaked out.

Anyway, the point of all this is that IE 9 is the browser for Windows Phone 7.5 (Mango), which – if it actually supports this API (and Wikipedia says yes it does) - means you can write location-aware mobile apps targeting Mango without having to ‘go native’. And for the demo I want to put together, this can only be a good thing...

Thursday, September 15, 2011

Windows 8: First Impressions

Q: "Your first impression? Love or hate?"

Well that's a really good question.

I was always horribly disappointed with my HP TX2 multitouch laptop, and whilst some of that was about the hardware (rubbish battery life, noisy fan), some of it was just how non-touch-capable Windows 7 was to actually use. Windows Media Center achieves many of my 'media center kiosk' wants, but doesn't let me Skype or browse the web without dropping back to the desktop and so forth. Then you've got to go and find the mouse and all that crap. I realised that, like Media Center, a different usage type required a very different UI experience.

So in many ways what I was after was absolutely where Windows 8 is going. And (in the 30 mins I've actually played with it) I love it for that.

That being said, they're going to have to be really careful they don't throw the baby out with the bath water. Ok, the desktop's still there for 'traditional' apps, but the Start bar is gone, as is (apparently) ALT-TAB task switching. And without a touch screen, that Metro UI really sucks, actually.

It'd be more than a shame if embracing a device/cloud future required ditching 10 years' worth of desktop productivity; it'd be a Vista-scale corporate-desktop disaster.

Wednesday, September 14, 2011

Yes, Silverlight is Dead

Not exactly quick off the block predicting this, but I didn't want to rush to judgement. But yes, Silverlight is dead, on the desktop at least.[1]

Why? Reach. Silverlight was always going to be playing catchup to Flash, which took pretty much a decade, remember, to get ubiquity. Silverlight just didn't have time on its side. Today if you build an app in Silverlight you can target contemporary browsers on Windows/Mac. By contrast if you build it in HTML you can target Macs, Linux, iPods, iPads, Android, Windows Phone, Kindles, PS3... the list goes on.

To put Silverlight out to all those individual devices is going to take Microsoft a heap of time and effort. By contrast all of those devices have web browsers already, most of them pretty good ones, and getting better all the time.

It's a numbers game. The browsers finally won.

But can HTML realistically replace Silverlight? Absolutely. Not entirely, not today, but surprisingly close, and getting closer by the day. The foundations for mature, maintainable web-client development are finally being put down. And the tooling: Visual Studio 2010's JavaScript IntelliSense is pretty damn impressive, and already supports jQuery, for example. Add support for MVVM development (a la knockout.js) and you've got a decent development workflow to rival what you might be used to in WPF/Silverlight/WinForms land (we'll probably see more about this out of Build this week). And don't forget there's a JS version of Rx.

Sure, browser-based javascript is somewhat limited compared to the Silverlight runtime. The touch support isn't quite there yet, for example. But it's more than enough to support UI interaction, and the gap's closing awfully fast.

As a developer who started in web, then moved to the desktop I'm really excited about all of this because I can see a future that finally blends the best of both worlds.


[1] I'll clarify before I get flamed: Silverlight is not dead today. I'll be starting a new project using it real soon actually. But the transition is going to be pretty abrupt. I'll be amazed if you start any new Silverlight projects next year.

Saturday, September 03, 2011

#AUTechEd 2011

The condensed version

Updated: Links now point to the Channel 9 site, where the videos will end up.

Tuesday, August 23, 2011

IIS WebAdministration module failing with 80040154

Apart from the normal stuff about the IIS 7 PowerShell module failing because it's not registered, you're not elevated, or you've not allowed script execution in your PowerShell session, there's one more little gotcha:

Get-Website : Retrieving the COM class factory for component with CLSID {688EEEE5-6A7E-422F-B2E1-6AF00DC944A6} failed due to the following error: 80040154.

...which is that the COM objects are only registered for x64 (if on an x64 machine), so if you're running a 32 bit PowerShell prompt (for various reasons related to VS GDR) you'll get the error above :-(

See http://stackoverflow.com/questions/3501001/windows-powershell-snap-in-for-iis-fails-on-32-bit for some unsavory workarounds, or just run as x64 if you want it to actually work :-(

What proxy is 'automatically detect settings' actually giving me?

Chrome doesn't seem to like the 'automatically detect settings' (at least not where I am right now), so I have to change over to have an explicitly configured proxy.

But what to enter? PowerShell to the rescue:

[System.Net.WebRequest]::DefaultWebProxy.GetProxy("http://www.test.com")

...tells you what proxy the system will use for the uri provided, and that's pretty much the only one you ever have to worry about.

Friday, August 19, 2011

Clustered Hyper-V Live Migration for $450

Well, ok, not counting the MSDN licence I had to play with this, but the point is thanks to the iSCSI support in Windows 2008 R2 (initiator and target), you can now build test clusters without having to have a ‘real’ (as in expensive) shared disk array, so you too can amaze your friends by live-migrating a virtual machine in front of their very eyes, or dispel your own lingering doubts that this stuff is all smoke and mirrors.

I used:

  • 3x old Dell Optiplex 745’s that we got for a song
  • A 100mb hub I borrowed from IT
  • Er… that’s it

Using the Microsoft iSCSI target for Windows Server 2008 R2, one box pretends to be a SAN. You could use Windows Storage Server, or a high-end NAS that supported iSCSI also.

On the other two boxes I stuck Windows 2008 R2 with the Hyper-V role. I could have used Hyper-V Server. Using the out-of-the-box iSCSI initiator, I bound both of them to virtual drives I fronted up from the storage server, and after a few goes made a cluster.

I’m not going to do the blow by blow, because there’s actually a couple of really good posts on doing this:

…though you will have to wade through them a bit, because the landscape has been changing, but before you know it (well, it took maybe a few days, on and off) you have a VM flitting from box to box like a sprite[1].

Couple of things I will mention:

  • The doco says you can’t do this with only one NIC per box, but you can. Wouldn’t want to in production, sure, but you can
  • Through experience I suggest that the safest course is to only have the quorum disk target attached when creating the cluster (and add more disks later). That’ll prevent the wrong disk being used as the quorum disk, which I couldn’t work out how to prevent otherwise
  • If you destroy the cluster (as I did, several times, when it kept getting the disks round the wrong way) and find your machines don’t talk to each other any more, try removing them from the domain and re-adding. Worked for me
  • If something doesn’t work, don’t be an idiot like me and later try exactly the same thing again and waste a whole day rebuilding everything. Try it a different way :-/

[1] The actual moment of cut-over took the VM out for about 4 seconds, which isn’t terrible considering the appallingly low-spec setup I was running: disk, heartbeat and client access all hitting one NIC through a 100Mb hub. It was only getting about 7Mb/s on the disk too.

Tuesday, August 16, 2011

Custom Folders in SQL Server Management Studio

It’s about bloody time, but it took a 3rd party to hack it together. This really should be out-of-the-box behaviour.

I like the way he’s used extended properties as the persistence medium, rather than, say, a table with a special name. It’s an approach I’ve used myself for other extensions to the standard schema metadata, like for my Upsert View Generator (which is long overdue an updated post).

Tuesday, July 19, 2011

Partitioned, Rolling Averages on SQL Server 2008

Had a bit of fun a while back implementing a rolling average for an aggregation over a large table in a data warehouse.
Like you I’d been drilled into the ‘avoid cursors and use sets’ mentality, and when I see something like a rolling average I immediately think of it in set-based terms: come up with a way to join the table to itself using a window, and average over that window.
Something like this:
select t1.group, t1.date, avg(t2.value)
from table t1
inner join table t2 on (t1.group = t2.group and t2.date <= t1.date and t2.date > dateadd(day, -15, t1.date))
group by t1.group, t1.date
And this all works fine in dev, but once the data volumes start stacking up, WHOAH have you got a problem and ohmygod is that really the number of logical reads being done!?
This problem is generally known as a triangular join, and can be found anytime there is a non-equality join predicate between two tables, without any further conditions applied. It’s a big gotcha for the ‘sets good, cursors bad’ mantra, because in many cases a cursor based approach would be significantly better.
In this particular case, even whilst it’s not an unlimited triangular join (each row should only be joined to the previous 15 days, not to all prior rows), you get the same effect. The optimizer figures: ‘hey – I might just do the full cross join, and then filter the rows’. And for a large data warehouse table this is very much a no-no.
So what’s the answer? Well, cursors, or even the quirky update approach if you are brave. But to provide a partitioned rolling average, both of these approaches require the data to be sorted by group, date. The cursor can apply that sort as an ‘order by’; the quirky update requires the destination table to be in that order (and some other things too).
My table, like many in a data warehouse, is sorted by date, group, because inserts are done incrementally over time, and so never cause any page splits. Sorting the data for a table this size (several TBs) is expensive.
So I am thinking there must be a better way. One that can scan the table in the order it exists, store rows in buckets indexed by the ‘group’ column, prune as necessary, and quickly provide an average for the last 15 days for any incoming row. And I thought about how I could do it in T-SQL, and I thought about how easy it would be in C#…
…and then I remembered about CLR stored procs, and about 30 mins later I had a working solution that took my process down from 5+ hours to some handful of minutes. The implementation was trivial (think Dictionary<int,List<SomeValueWithDate>> and you’re there already), and deployment was amazingly easy thanks to Visual Studio database projects (aka GDR2) which just do it all for you.
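
For the curious, the guts of it look something like the sketch below. It’s a minimal version, and the table and column names (dbo.FactTable, GroupId, Date, Value) are made-up stand-ins rather than my real schema:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public static class RollingAverages
{
    [SqlProcedure]
    public static void RollingAverage15Days()
    {
        // One bucket of (date, value) pairs per group, pruned to the last 15 days.
        var buckets = new Dictionary<int, List<KeyValuePair<DateTime, decimal>>>();

        var row = new SqlDataRecord(
            new SqlMetaData("GroupId", SqlDbType.Int),
            new SqlMetaData("Date", SqlDbType.DateTime),
            new SqlMetaData("RollingAvg", SqlDbType.Decimal, 18, 4));

        using (var conn = new SqlConnection("context connection=true"))
        {
            conn.Open();
            // NB: dbo.FactTable / GroupId / Date / Value are hypothetical names.
            var cmd = new SqlCommand(
                "SELECT GroupId, [Date], Value FROM dbo.FactTable ORDER BY [Date]", conn);

            SqlContext.Pipe.SendResultsStart(row);
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    int group = reader.GetInt32(0);
                    DateTime date = reader.GetDateTime(1);
                    decimal value = reader.GetDecimal(2);

                    List<KeyValuePair<DateTime, decimal>> bucket;
                    if (!buckets.TryGetValue(group, out bucket))
                        buckets[group] = bucket = new List<KeyValuePair<DateTime, decimal>>();

                    // Prune anything that's fallen out of the 15-day window, then add the new row.
                    bucket.RemoveAll(p => p.Key < date.AddDays(-15));
                    bucket.Add(new KeyValuePair<DateTime, decimal>(date, value));

                    decimal sum = 0;
                    foreach (var p in bucket) sum += p.Value;

                    row.SetInt32(0, group);
                    row.SetDateTime(1, date);
                    row.SetDecimal(2, sum / bucket.Count);
                    SqlContext.Pipe.SendResultsRow(row);
                }
            }
            SqlContext.Pipe.SendResultsEnd();
        }
    }
}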
The only slight problem was that, like a normal stored proc, about all you can do with the output is INSERT – you can’t use it in a JOIN for example. I settled for capturing the output into a table variable, then doing the update as a second step back in T-SQL land.

Friday, July 01, 2011

Model-first with Entity Framework and Oracle 11g (Part 2)

So having given up on the Oracle provider for now, and because people had said good things about them, I tried devArt’s dotConnect:

-- Table "Customers"
CREATE TABLE "Customers" (
"Id" NUMBER(10) NOT NULL,
"FirstName" VARCHAR2(50) NOT NULL,
"AnInteger" NUMBER(10) NOT NULL,
"ALong" NUMBER(18) NOT NULL,
"IsSomethingTrue" NUMBER(1) NOT NULL,
PRIMARY KEY ("Id")
)
/
CREATE SEQUENCE "Customers_SEQ"
/
CREATE OR REPLACE TRIGGER "Customers_INS_TRG"
BEFORE INSERT ON "Customers" FOR EACH ROW
BEGIN
SELECT "Customers_SEQ".NEXTVAL INTO :NEW."Id" FROM DUAL;
END;
/

Better. We got our sequence triggers, and some slightly saner data type choices. We've still got the casing problem (and no check constraints for our boolean flag), but we can fix that now because the devArt DDL generation template is a real T4 template, not just a stub:

image

You’ll find yours in C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\Extensions\Microsoft\Entity Framework Tools\DBGen. There’s not loads in there; all the real work is in the include file at .\Entity Framework Tools\Templates\Includes\Devart SSDLToOracle.ttinclude

A quick case-conversion modification to the OracleGenerator class:

public override string GetValidIdentifier(string identifier) {
    identifier = base.GetValidIdentifier(identifier);
    return ToOracleCase(identifier);
}

public static string ToOracleCase(string value) {
    StringBuilder output = new StringBuilder();
    bool wasLower = false;
    foreach (var c in value.ToCharArray())
    {
        bool isLower = char.IsLower(c);
        if (wasLower && !isLower)
            output.Append('_');

        output.Append(Char.ToUpper(c));
        wasLower = isLower;
    }
    return output.ToString();
}

...and we are ready to go (ok, it's a crude implementation, but look, we are off and away):

-- Table "CUSTOMERS"
CREATE TABLE "CUSTOMERS" (
"ID" NUMBER(10) NOT NULL,
"FIRST_NAME" VARCHAR2(50) NOT NULL,
"AN_INTEGER" NUMBER(10) NOT NULL,
"ALONG" NUMBER(18) NOT NULL,
"IS_SOMETHING_TRUE" NUMBER(1) NOT NULL,
PRIMARY KEY ("ID")
)
/
CREATE SEQUENCE "CUSTOMERS__SEQ"
/
CREATE OR REPLACE TRIGGER "CUSTOMERS__INS_TRG"
BEFORE INSERT ON "CUSTOMERS" FOR EACH ROW
BEGIN
SELECT "CUSTOMERS__SEQ".NEXTVAL INTO :NEW."ID" FROM DUAL;
END;
/

Much better.



Update: The next day I realised I’d made a cardinal error. All any of this does is change the generated DDL, not the store schema in the EDMX model. I had to do that separately, which rather defeated the point of what I was trying to do. In the end I gave up, went database-first and wrote an XSLT that performs a PascalCasing conversion on the conceptual model object / attribute names. And re-mapped any badly mapped types. What a PITA.

Wednesday, June 29, 2011

Model-first with Entity Framework and Oracle 11g (Part 1)

In the very latest beta release of the Oracle .Net drivers (11.2.0.2.30), Oracle now ‘supports’ Entity Framework (database-first or model-first) and Linq-to-entities. Oh, and 32bit only for now.

(If you’re tempted to try this, and get a 'ORA-00904: "SUBPROGRAM_ID": invalid identifier' error it’s probably because you – as a developer - are using 10g Express Edition, which isn’t yet supported. You can wait for the EF beta 2 (soon, allegedly), or alternatively you can download 11g Express Edition which is also in beta. And 32 bit only (on Windows, anyway). Oh, and make sure you install the ODAC after you install XE, or you’ll get the wrong Oracle home as your default on your path, and you’ll get errors in Visual Studio setting up Oracle database connections)

Prior to EF I’ve been quite happy doing database-first development in the SQL Server world, mostly because the visual tooling (database diagrams and the like) has been pretty good, and because of the type fidelity with mainstream languages (esp. .net) there’s relatively little loss of intent when you derive a model from the schema.

In the Oracle world of course this is very different. I’ve been massively underwhelmed with the quality of the tooling for schema design, and Oracle’s unconventional[1] stance on 4-byte integer and boolean data types (i.e. lack thereof) means that any model derived from the schema is going to need a fair bit of metadata tweaking. That and fixing up all the SHOUTING_IDENTIFIERS.

So for Oracle, model-first looks like a really good idea. Unfortunately, in the current beta at least, Oracle’s support for this is pretty lame:

  • All integer types are written out as Number(38,0) (irritating)
  • Sequences are not generated for identity fields (really painful)
  • The PascalCasing of your entities is preserved in the database schema (totally unusable).

e.g.:

-- Creating table 'Customers'
CREATE TABLE "myschema"."Customers" (
"Id" number(38,0) NOT NULL,
"FirstName" nvarchar2(50) NOT NULL,
"AnInteger" number(38,0) NOT NULL,
"ALong" number(38,0) NOT NULL,
"IsSomethingTrue" number(38,0) NOT NULL
);


Why unusable? Because, as any Oracle developer will tell you with a bitter look, Oracle is case sensitive. Generally when you query it, it all works as if your query was upper-cased for you: select * from mytable works against MYTABLE. But if you use Mixed Case and create a MyTable, you’d have to write select * from “MyTable” (note the quotes) and get the capitalization right. Yeah, you can see how much fun you could have with this, and it’s the same with columns. So stick to the SHOUTING_IDENTIFIERS convention, or suffer the horrible consequences. It’s ok, I didn’t know either.



So what any sane SSDL / DDL generation template would do is a conversion, right? Entities named MyTable would spit out a table called MY_TABLE, and the MyTableId attribute would become a MY_TABLE_ID column. But no.



Worse than that, because Oracle’s implemented the entire generation process in a workflow, not in the T4 template (which is just a stub), you can’t fix this on your own.



That’s about where I gave up on Oracle. Bastards.



Next time: 3rd party drivers to the rescue!

[1] I’m trying to be polite. Prehistoric would be more apt. DB2, MySql, Postgres (and SQL Server of course) have 4 byte integers because that’s what applications / CPUs actually use. Oracle act like it’s still 1990 and the ‘database IS the application’

Tuesday, June 07, 2011

Australian Dev User Groups

I was looking for (and failed to find) a Perth Java User Group, and came across this list on builder.com.au. If accurate, it seems to depict a bleak state for Java, losing out 4:1 to .Net in user-group count terms:

image

Ok, massive caveats apply. Number of user groups is a terrible metric, maybe the Java ones have much higher membership / do more with less / do everything online, or maybe they’re all so good they don’t need user groups unlike the poor fumbling .Net developers. But I don’t think so…

Thursday, June 02, 2011

“PhpStorm 2.1 doesn't work too nice on my Windows + Samba anymore”

Got a really bizarre email from JetBrains’ bug tracking system the other day: a ‘won’t be fixed’ for a bug I had absolutely no interest in.

Like many users, I’d enabled notifications for ‘issues reported by me’. Turns out that someone had registered a user with the name ‘me’…

…yup…

…and so managed to spam the entire registered user base when anything happened to any of their bugs :-/ So some fairly serious design failure there: user-entered data being mistaken for a control sequence - basically SQL injection (in spirit, at least).

Good job he didn’t register as Little Bobby Tables.

 

PS: Of course this is all a bit passé compared to hacking the electoral roll
PPS: Why isn’t data tainting a compulsory part of all mainstream languages these days?

Monday, May 30, 2011

Permissions Issues in Dynamics CRM 2011 when Users Inherit Roles from Teams

Setting up a CRM demo I noticed what appears to be a problem in the CRM 2011 security model which affects users who only inherit a Security Role from their membership of a Team, and have no roles of their own.

My user is in a Team, and the Team has the out-of-the-box Customer Service Representative role. But my user can’t create a Service Activity like he should be able to:

image

I thought the team permissions were right for creating Activities, but MSDN just has a ‘coming soon’ page for both of the roles, and doesn’t even list the Activity entity, so I wasn’t getting much help there.

Some error dialogs in CRM have a little ‘view log’ button, which helps, but this one didn’t. However the TechNet article ‘Troubleshooting Microsoft Dynamics CRM’ was quite helpful here (written for CRM 4.0 but much is still relevant), describing how you activate the detailed error dialog by modifying this setting in the app.config:

<add key="DevErrors" value="On"/>

image

This gave me something a bit more concrete to go on:

SecLib::AccessCheckEx failed. Returned hr = -2147187962, ObjectID: 00000000-0000-0000-0000-000000000000, OwnerId: {blah}, OwnerIdType: 8 and CallingUser: {blah}. ObjectTypeCode: 2500, objectBusinessUnitId: {blah}, AccessRights: CreateAccess

Ok, so I can’t create something. But interestingly a bit of poking about in the metadata schema in the database (Metadata.Entity) reveals ObjectTypeCode 2500 is not actually Activity, as I suspected, but UserEntityUISettings.

Hmm. The team’s got permissions for that too I thought:

image

... but that permission can only be applied at the user level. So I wondered if it wasn’t transitive over the Team correctly, and added the user directly to the role.

And then it all worked.

Perhaps there is an explanation somewhere, but I’m damned if I can find it. Services and Service Activities are new in 2011 and seem to be skipped entirely by the two CRM books I looked at. There is some documentation around the permissions model, but it’s incomplete, and most blog posts I’ve read seem to suggest diagnosing issues is a bit of a crapshoot, which is pretty piss poor for a business-orientated application if you ask me. And I’d hate to have had to diagnose this in the cloud, without access to the database...

Monday, May 23, 2011

More whinging about Hash Aggregate

I previously posted on why Hash Aggregate is to be avoided if possible. Unfortunately Hash Aggregate vs. Stream Aggregate appears to be an ‘all or nothing’ proposition. And that sucks.

A Stream Aggregate can be relatively cheap (in terms of memory pressure) because it can leverage the ordering (typically clustered index ordering) of the input data and aggregate on-the-fly. By contrast the Hash aggregate has to cope with completely disordered data streams, so must buffer all the output aggregates in memory until the input data has been entirely read:

"The other aggregation operator, hash aggregate, is similar to hash join.  It does not require (or preserve) sort order, requires memory, and is blocking (i.e., it does not produce any results until it has consumed its entire input).  Hash aggregate excels at efficiently aggregating very large data sets.”
[Craig Freedman - http://blogs.msdn.com/b/craigfr/archive/2006/09/20/hash-aggregate.aspx ]

This is a good strategy because:

  • It copes with totally unordered data
  • Provided you are aggregating to a significantly higher grain than the source data, the volume of aggregates in the hash (and hence memory consumption) can be relatively small.

In my case, of course, the latter isn’t true: I’m aggregating across a significant amount of data, but not up to a much higher grain, so I am experiencing vast quantities of tempdb spill.

My input data is partially aligned to the aggregation (the most significant column is sorted correctly), so you’d expect that the aggregation would be able to leverage that smartly, to output aggregates when they were clearly ‘done’. So, for example, if you had a clustered index on Month/Day/Category, and you were doing sum(something) GROUP BY Month, Category, you and I both know that after you’ve finished spooling data for any given month, you can output all those buckets, rather than hang on to them waiting for more. Unfortunately this doesn’t happen, which means given enough data, something will spill to disk:

“if we run out of memory, we must begin spilling rows to tempdb.  We spill one or more buckets or partitions including any partially aggregated results along with any additional new rows that hash to the spilled buckets or partitions”
[Craig’s blog again, same link as before]

One can only hope it’s the older buckets that get spilt first, but even so, when this happens you are taking the IO hit of streaming all that aggregate data to disk and then subsequently loading it all back in again. Ouch. Or you force the stream aggregate (using a hint) and take the pain of the sort. More ouch.
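
To make the buffering point concrete, here’s a toy C# illustration (nothing to do with SQL Server’s actual internals): the stream version can emit each group the moment the key changes, while the hash version has to hold every group until the input runs out.

using System.Collections.Generic;

static class AggregateSketch
{
    // 'Stream aggregate': input already ordered by key, so each group is emitted
    // as soon as the key changes. Only one group is ever held in memory.
    public static IEnumerable<KeyValuePair<string, int>> StreamCount(IEnumerable<string> orderedKeys)
    {
        string current = null;
        int count = 0;
        foreach (var key in orderedKeys)
        {
            if (current != null && key != current)
            {
                yield return new KeyValuePair<string, int>(current, count);
                count = 0;
            }
            current = key;
            count++;
        }
        if (current != null)
            yield return new KeyValuePair<string, int>(current, count);
    }

    // 'Hash aggregate': order unknown, so every group has to be buffered until the
    // whole input has been read. That buffer is the memory pressure / spill risk.
    public static Dictionary<string, int> HashCount(IEnumerable<string> unorderedKeys)
    {
        var counts = new Dictionary<string, int>();
        foreach (var key in unorderedKeys)
        {
            int c;
            counts.TryGetValue(key, out c);
            counts[key] = c + 1;
        }
        return counts;
    }
}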

For many workloads this probably isn’t much of an issue, but for my data warehouse this is starting to look like a major, major bottleneck.

Thursday, May 19, 2011

Setting up CRM 2011? Set the Calendar format *first*

When you create a new Dynamics CRM 2011 organizational instance, the localization settings for that Organization default to US format, totally ignoring the settings of the installing user, defaults for the server etc…

image

You can change this in Settings \ Administration \ System Settings \ Format, but make sure you do so before you add any users to the CRM instance, because when you change it, it doesn’t apply retrospectively to users already created. Hardly intuitive, I think you’ll agree, and you’d at the very least expect a note (or warning) on the dialog that all you are changing (apparently) is the default for new users.

eg: After changing to Australian format, new users correctly see this:

image

However, a user created before the change still sees this:

image

Each and every one of these users must go into their dashboard options and change their personal format settings to get what is now the system setting.

image

image

You’d notice this fairly early on I imagine, but you could still have created a bunch of content and a bunch of users, so this could be a real administrative pain in the arse.

Silverlight Spy

This looks cool – a Hawkeye for Silverlight apps (in and out of browser). Basic version is free.

Good overview; there’s also a video on C9.

Wednesday, May 18, 2011

NineMSN vs IE9

NineMSN still shows up as ‘compatibility view’ in both IE 8 and 9, despite what the banner ad below might suggest:

image

Note: I wrote this two months ago and forgot to publish it. But it’s still true...

Tuesday, May 17, 2011

Excel Multiselect Issues with SSAS Calculated Member

Ok, it’s an old one, but I struggled to find the right answer, so I thought it was worth a post. I knew it was an issue, and I thought I knew the answer. I was wrong.

Say you create a calculated member to return some kind of item count against a dimension, a bit like this:

CREATE MEMBER CURRENTCUBE.[Measures].[Days Count] as (
count( descendants(
  [Date].[Year-Month-Date].CurrentMember,
  [Date].[Year-Month-Date].[Date]
)))

and it works just fine when you test it:

SELECT {
    [Measures].[Days Count]
    ,[Measures].[Minutes Of Day Count]
} ON COLUMNS
FROM [My Cube]
WHERE {
    [Date].[Year-Month-Date].[Year].[2011]
}

image

…and then you find it falls apart when someone queries the cube in Excel and uses multi-select in the pivot table filter. So you follow Mosha’s advice to fix it (using Existing):

CREATE MEMBER CURRENTCUBE.[Measures].[Days Count] as
count( existing [Date].[Year-Month-Date].[Date] )

…but then someone points out that that doesn’t work either. More recent versions of Excel (2007, 2010) implement multi-select via sub-queries; with just a few days selected it generates a query that (simplified) looks a bit like this:

SELECT {
    [Measures].[Days Count]
    ,[Measures].[Minutes Of Day Count]
} ON COLUMNS
FROM (
    SELECT (
        {[Date].[Year-Month-Date].[Date].&[20110101]
        ,[Date].[Year-Month-Date].[Date].&[20110102]
        })
    ON COLUMNS  FROM [My Cube]
)

and executing that doesn’t give you the right answer (it just counts the number of members in the dimension):

image

So the old advice, still commonly quoted, is actually wrong. The solution (as of SSAS 2008 and onwards) is to use a dynamic named set:

CREATE DYNAMIC SET [Selected Days] as (
    [Date].[Year-Month-Date].[Date]
)
CREATE MEMBER CURRENTCUBE.[Measures].[Days Count] as (
    count([Selected Days])
)

I found this a bit bizarre at first, but some of the posts I read were themselves a bit confused and left the ‘existing’ in. It’s not required, and distracts from what’s really going on here: this works by design, because this is exactly one of the problems that dynamic sets are intended to fix. A (more recent) Mosha post spells it out:

“Dynamic sets are not calculated once. They are calculated before each query, and, very important, in the context of that's query WHERE clause and subselects”

image

Ah. The answer, and the explanation. Finally.

Thursday, May 12, 2011

When Tabs go Bad

Maybe it’s just me, but I don’t think a tabbed UI is cutting it any more for the New User dialog in Active Directory:

image

Also just me, but ‘Remote Desktop Services Profile’ strikes me as about twice as long as it should be for a tab header.

Update: Later I found the Active Directory Administrative Center, which is an apparently WPF-based replacement for all this, and looks much nicer. So I guess the dialog above is deprecated anyway.

Monday, April 11, 2011

Configuring SSAS 2008 HTTP access on IIS 7.5 using PowerShell

…was way too much like hard work / blindly poking about in the dark, and there’s still a nasty hacky bit in the script where I gave up trying to make the IIS snap-in work, and just used appcmd, but it does actually work:

<#
.Synopsis
Configures HTTP access for SSAS on the local box

.SeeAlso
http://technet.microsoft.com/en-sg/library/gg492140(en-us).aspx
http://learn.iis.net/page.aspx/436/powershell-snap-in-changing-simple-settings-in-configuration-sections/
http://www.iis.net/ConfigReference/system.webServer/security/isapiCgiRestriction
#>
$ErrorActionPreference = 'stop'
$scriptDir = split-path $myInvocation.MyCommand.Path;
$isapiPath = "$scriptDir\msmdpump.dll"

import-module WebAdministration
cd IIS:\

# Register msmdpump as a (globally) acceptable ISAPI handler
$isapiRestrictions = Get-WebConfiguration /system.webServer/security/isapiCgiRestriction
$handler = @($isapiRestrictions.Collection | ? { $_.description -eq 'OLAP' } )
if(-not $handler){
# This way the object always appears locked?
# but should be fine according to http://forums.iis.net/t/1158906.aspx
$null = @"
$restrictions = $isapiRestrictions.GetCollection();
$handler = $restrictions.CreateElement('add');
$handler.SetAttributeValue('path', $isapiPath);
$handler.SetAttributeValue('allowed',$true);
$handler.SetAttributeValue('description','OLAP');
$restrictions.Add($handler);
"@

# instead try
# http://serverfault.com/questions/134361/programmatically-add-an-isapi-extension-dll-in-iis-7-using-adsi
& "$env:windir\system32\inetsrv\appcmd.exe" set config /section:isapiCgiRestriction /+"[path='$isapiPath', description='OLAP',allowed='True']"

}

# Create the app pool
cd IIS:\AppPools
if(-not (Test-Path OLAP)){
$olapAppPool = New-WebAppPool OLAP
}else{
$olapAppPool = Get-Item OLAP
}
$olapAppPool.managedPipelineMode = 'classic'
$olapAppPool | Set-Item

# Create the website
cd IIS:\Sites
$defaultSite = @(dir)[0]
cd $defaultSite.Name

if(-not (Test-Path OLAP)){
$olapApp = New-WebApplication OLAP -physicalPath:$scriptDir
}else{
$olapApp = Get-Item OLAP
}
$olapLocation = '{0}/{1}' -f $defaultSite.Name,$olapApp.Name
cd OLAP

# Setup the web application: first associate the app pool
set-itemproperty $pwd -name applicationPool -value 'OLAP'

# ...then enable anonymous access
$basicAuth = Get-WebConfiguration -Filter /system.webServer/security/authentication/anonymousAuthentication
$basicAuth.Enabled = $true
$basicAuth | Set-WebConfiguration -Filter /system.webServer/security/authentication/anonymousAuthentication -PSPath IIS:\ -Location $olapLocation

# ...and create a mapping for that handler for this web app
$mapping = Get-WebHandler OLAP
if(-not $mapping){
$mapping = New-WebHandler OLAP -ScriptProcessor:$isapiPath -Path:*.dll -Verb:* -Location:$olapLocation -PSPath:IIS:\ -Modules:IsapiModule
}
$mapping


Wow. IIS administration is just as bizarre and arcane as it always was. Did someone say ‘where is the setup wizard’?

Wednesday, March 30, 2011

Avoid Hash Aggregation

I’d been wondering about this, but Simon Sabin’s recent post just spelt it out for me: Hash Aggregation is all too easy to do, but eats performance for breakfast. We’re all used to the problem of SARGable arguments, right? You perform any operations on the parameter, not the column, in order to correctly hit the indexes you’d expect to hit. So for example you wouldn’t do this:

select blah
from table
where year(datecolumn) = @myyear


but instead you’d do this:

select blah
from table
where datecolumn >= dateadd(year, @myyear - 1900, 0) and datecolumn < dateadd(year, @myyear - 1899, 0)


Well, exactly the same problem occurs with aggregates, but without the fix. Say datecolumn was datetime2(0) (i.e. down to the second), and you want an annual aggregate of something. For example:

select year(datecolumn) as Year, itemId, sum(cost)
from table
group by year(datecolumn), ItemId


Look what we did. We had to apply the function to the column, and hey presto, despite our underlying table having a clustered index in (datecolumn, itemId) order, our aggregation is a Hash Aggregation, because SQL can’t infer the relationship between a Year and a DateTime. It can’t use Stream Aggregation unless the data is in the right order, which it now can’t infer, and it thinks the hash is going to be cheaper than a full sort.

But Hash Aggregation is expensive, as you only have to read the doco to realise:
“For the distinct or aggregate operators, use the input to build the hash table (removing duplicates and computing any aggregate expressions). When the hash table is built, scan the table and output all entries.”
[http://technet.microsoft.com/en-us/library/ms189582.aspx]

That hash just buffered all our data in order to do the aggregation. Oops, TempDb spill here we come…

A Stream Aggregation is much more performant because it can do the aggregation ‘on the fly’, as the data is already in order. But to do this in our example we’d have to have a calculated, indexed column ‘Year’, and indexing means it’s persisted, so bang: we just added 4 bytes to our fact rows and increased my table size by about 20% (my table is not very wide, but very, very long). And I still need the existing clustered index, so we’re creating a non-clustered index, and the relative cost difference for scanning the table might mean we still don’t get what we want.

How much worse is the Hash? I cooked up an example with some of our data against a couple of scenarios:

  • doing an annual aggregate based on munging dates into years (the group by above)

  • doing the aggregate based on adding the year column and putting it in the clustered index

image

88% : 12%. Quite a difference (though the rowcount here is fairly small so this may not be entirely representative).

Ideally Sql would ‘see through’ the Year (and other date related functions) and deduce that it can still use the date-ordered clustered index to service a stream aggregation, and this is exactly what Simon’s opened a Connect issue asking for.

That article seems to suggest that there are workarounds, like joining to the date dimension table and grouping based on columns from that, which I tried in vain. However, this may only be relevant when you’re limiting to a subset of the data. More importantly, this is only viable if the original column was of type Date: if you had a DateTime2(0) you’d need to join against a combined Date-And-Time dimension (getting fairly large). If you had DateTime2(2), you’d need to join against a date dimension that went down to the 100’th of a second. That’s a big dimension (3,153,600,000 rows / year).

Ultimately given my real world scenario is doing lots of aggregations from the sub-minute up to the minute grain, I had to bite the bullet, and change the column in the fact table to smalldatetime (and put the seconds portion in a different column). And that seems like a really poor outcome for what you would have thought would be a bread-and-butter warehousing requirement[1].

Here’s a really excellent write up of the two different operators from Craig Freedman’s blog:


[1] I’m only doing any of this at all because of limitations in SSAS and trying to avoid calc-and-sum performance issues, so it’s all a bit frustrating.

Tuesday, March 29, 2011

Static Lookup Data: Prefer CASE to JOIN when possible

Getting into the nitty-gritty of fine-tuning the datamart I’m working on now, so spending a lot of time looking at execution plans, and wondering ‘why?’. Today: lookup data.

Normally in a datamart you’d denormalize as much as possible, ideally avoiding JOINs altogether, but there’s a trade-off with space and in some cases the space cost just can’t be justified. Sometimes you still have to add some lookup data (cost rates etc…) in your fact view.

Given a tiny amount of lookup data (10’s of rows), I thought putting it in a little table and doing a join would be pretty performant. The lookup table will basically just sit in RAM all the time, and it’s tiny compared to the fact data, so the optimizer can’t possibly get it the wrong way round. And I thought searched CASE statements were expensive. Turns out they are nothing on a JOIN, even in the optimal case.

Adding columns using a CASE statement performs significantly better than the static lookup table strategy. Even when the CASE statement is rolled up into a table-valued function, the costs in my case are about 2:1 in favour (the TVF is successfully ‘optimized away’).

In both strategies SQL has to look at each-and-every row, so I’m surprised it’s that different, but I guess maybe the Hash Match is optimized for larger amounts of data (on the right side), and can’t match a CASE for trivial lookups. There’s also the cardinality to consider: a CASE devolves to a COMPUTE SCALAR, which means no change in the number of rows, but a Hash Match might output a row more than once (or drop rows) based on the number of matches on the right side of the query. I’m guessing this constraint means there’s a fundamentally cheaper way to process the data in memory.

Here’s the before and after, with a relatively small number of rows (70,000 on my dev box, but it’s near exactly the same with 100+ million in the test environment):

image

Now you’ll have to excuse me, because I have some views to change…

Friday, March 25, 2011

Enabling Sql Server CLR Integration if not currently enabled

I don’t know, is calling RECONFIGURE unnecessarily an issue? Anyway, if you really want to only enable the integration if it’s not already enabled (and maybe want to do something else at the same time):

DECLARE @config TABLE (
NAME sysname,
minimum INT,
maximum INT,
config_value INT,
run_value int
)

INSERT INTO @config
EXEC sp_configure 'clr enabled'

IF EXISTS (SELECT run_value FROM @config WHERE run_value = 0)
BEGIN
PRINT 'Enabling CLR integration'
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
EXEC sp_configure 'clr enabled'
END

Wednesday, March 16, 2011

What bits to download for SQL Cumulative Updates

If you look at the download page for Sql 2008 R2 CU6, you’ll see a choice of six different download packages, all for your platform:

image

Whu? Actually it turns out that all but one of these are subset packages of the main hotfix, that you’d only need to apply if – say – you only wanted to patch PowerPivot or SQL Native Client, but you didn’t want to have to download the whole update*.

Quite why this isn’t deemed worth mentioning anywhere in the relevant knowledge base article or on the download page itself is beyond me, but fortunately I happened to stumble upon an article on the SQL CSS team blog that explained it all (after I’d already downloaded half of them) so now I know.

And now, so do you.

 

[*] Note the rules are different for Sql 2005 – you do have to download all of them. So no potential confusion there.

Wednesday, March 09, 2011

Extract Zip from PowerShell

Adapted from an old Powershell Guy post, and flipped around to unzip the files:

function unzip($zipPath, $destination){
    $shell = new-object -com shell.application;
    $zip = $shell.NameSpace($zipPath);
    foreach($item in $zip.items()){
        $shell.Namespace($destination).copyhere($item)
    }
}



Evil use of shell object (where is the managed API?), but seems to work quite nicely actually. Note carefully the brackets after $zip.Items().



Now I’ve worked it out it seems really easy to find others doing the same thing. Oh well.



Tuesday, March 01, 2011

Configuring HTTP Access for Analysis Services 2008 R2 64 bit

It’s basically no different from 2005, so follow these instructions, ensuring that (if needed) you download the OLE-DB provider for AS from the 2008 R2 feature pack site, and not one of the vanilla 2008 releases, and also ensuring that you don’t mark your app pool as ‘enable 32 bit’.

You may find that the web role services ‘ISAPI Extensions’ or ‘Windows Authentication’ aren’t even enabled on your box, and you will need those. You can set these up by right-clicking the web server node:

image

A good way to test this is to try and connect using Management Studio. If it fails with an ‘unsupported data format’ error, and you put the pump on a different box to the SSAS instance, it’s probably a double-hop issue, and you’re going to have to send yourself insane talking to IT about configuring SSAS to use Kerberos authentication, constrained delegation, SPNs and all the normal fun stuff. I feel for you.

I bit the bullet and installed IIS directly on the SSAS box, and got a bit further. A good way to test at this point is to try and connect to SSAS locally using Management Studio, using the http://<server>/olap/msmdpump.dll path. If that works, at least you know the SSAS/IIS end is ok. But then I still couldn’t connect from Excel on my workstation, in a different domain, which was why I was doing all this in the first place.

I used the old ‘create local user with matching login/password’ trick, and ran Excel as that, but that didn’t work. Guessing the lack of trust between the domains would be a problem, I attempted to disable Kerberos and force NTLM (which seems to have changed a bit from how it used to be done on Windows 2003), but that still didn’t work either (the IIS logs helped here). Eventually I just enabled anonymous access to the website, changed the app pool credentials to use Local Service (Network Service would have done), granted IUSR access to the cube (this bit I don’t quite follow, why not Local Service?) and finally it all worked. But note that now I was using anonymous access I didn’t need to install IIS at all, since I could have done this from any box. Ho hum.

Finally go and vote for this to be supported out-of-the-box. With all the work that went into separating out the HTTPSys bits (for WCF et al), this should be a total no-brainer.
