Monday, March 15, 2010

Why No ‘Average’ Aggregation Type in SSAS?

Just coming back onto some cube work for the first time in 12 months or so, and managed to get myself confused by the AverageOfChildren semi-additive aggregation type, just like I did the first time round :-/

(Aside: AverageOfChildren is a semi-additive average: it sums over all dimensions except time, and averages over time. This makes sense (for example) if you want to compare average monthly sales over the year for each of your branch offices. It’s useless, however, when you want to know the average sale price per unit)

Whilst creating calculated measures of the type [Measures].[Something Sum] / [Measures].[Something Count] is all very well, I don’t understand why this isn’t just supported out of the box, as a native Average aggregation type. Sure there are always ‘flavours’ of averages that need specific MDX (weighted averages for example), but surely the simple flat average over row count is so common as to warrant ‘out of the box’ support.

You might say having two ‘average’ aggregations would confuse, but AverageOfChildren causes plenty of confusion on it’s own: having two (and documentation on which to use when) would probably make the situation considerably better.

Fool me twice: shame on me.

Update June 2011: Rather than just complain, I raised a Connect Issue: Add AverageOfRows aggregation type for simple averages. Please vote for it


David Stachon said...

thanks for saying it. absolutely true.

I have a cube I'm working on with about a hundred measures that require averaging. I'm dreading having to code that calculation so many times.

...and the values can be null so I'm going to have to have a counter for each measure as to not obscure the average. ugh.

Samantha said...

David, did you have success? My cube also has many measures and my code has grown unwieldy. Was hoping to use built-in Average over Time measures but they display nothing if a measurement is NULL for any date within the time range I select. The sum/count methodology I've found suggestions to use, is slow in MDX with our complex queries. Any updates on how you solved the problem would be most appreciated.

David Stachon said...

Hi Samantha,

I did end up going with the sum/count methodology. Every single one of my measures (~200 now) has a corresponding counter, and corresponding calculated member)

(in my case, I'm not dealing with a great deal of data so I'm not having a problem with performance...yet!)

What I did end up doing is creating a "meta data" database in Access that contained all my measures, then it would generate my view, the cube xml (i.e. the content of the tag), and calculated members. (as you've likely noticed, measures don't allow for multi-select in order to perform a mass property change).

This is a great time saver, but won't help you with your performance issue.

Good luck! unfortunate that such a basic thing has to be such a challenge.

piers7 said...

@David for that many measures I'd probably attempt to script something up. I've had some success recently adding calculations dynamically into cubes (using AMO and PowerShell), and I imagine something similar could work here. At worst, a script that spits out all the MDX for you to cut-and-paste into your script view would at least save you some time (though this wouldn't generate the backing measures for you)

One of the questions you'd have to answer would be how you 'tag up' measures in the underlying data source so that the 'average generator' will know to spit out an average calc for them.

piers7 said...

I also created a connect issue (see link above)

piers7 said...

@David: You *can* multi-select measures if you change to the grid view rather than the tree view. And re-reading your comment I see you've already started to generate the MDX script from your access db - I should have read that more carefully before I replied.

@Samantha: The sum/count method should not be slow. With the correct aggregations in place it should be the fastest possible implementation. However if the sum/count measures are themselves calculations, that's when things can go south.

Popular Posts