Getting into the nitty-gritty of fine-tuning the datamart I’m working on now, so spending a lot of time looking at execution plans, and wondering ‘why?’. Today: lookup data.
Normally in a datamart you’d denormalize as much as possible, ideally avoiding JOINs altogether, but there’s a trade-off with space, and in some cases the space cost just can’t be justified. Sometimes you still have to add some lookup data (cost rates etc…) in your fact view.
Given a tiny amount of lookup data (tens of rows), I thought putting it in a little table and doing a join would be pretty performant. The lookup table will basically just sit in RAM all the time, and it’s tiny compared to the fact data, so the optimizer can’t possibly get it the wrong way round. And I thought searched CASE statements were expensive. Turns out they’re nothing compared to a JOIN, even in the optimal case.
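For concreteness, here’s a sketch of the lookup-table strategy. All the names (`FactSales`, `CostRateLookup`, the rates) are hypothetical stand-ins, not the actual schema from this datamart:

```sql
-- Hypothetical tiny lookup table: a handful of rows, unique key.
CREATE TABLE dbo.CostRateLookup (
    ResourceType varchar(20)  NOT NULL PRIMARY KEY,
    CostRate     decimal(9,2) NOT NULL
);

-- The JOIN version of the fact view: even with the lookup fully
-- cached in RAM, the plan still gets a join operator (Hash Match
-- or Nested Loops) touching every fact row.
CREATE VIEW dbo.vFactSales_Join
AS
SELECT f.SaleId,
       f.ResourceType,
       f.Quantity * l.CostRate AS Cost
FROM dbo.FactSales AS f
JOIN dbo.CostRateLookup AS l
    ON l.ResourceType = f.ResourceType;
```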
Adding columns using a CASE statement performs significantly better than the static lookup table strategy. Even when the CASE statement is rolled up into a table-valued function, the costs in my case are about 2:1 in favour of the CASE approach (the TVF is successfully ‘optimized away’).
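And the CASE version, using the same hypothetical names. The point is that the lookup compiles down to a single Compute Scalar in the plan, with no join at all; wrapping it in an inline TVF keeps the view tidy and still gets inlined by the optimizer:

```sql
-- CASE version: the lookup becomes a Compute Scalar, no join operator.
CREATE VIEW dbo.vFactSales_Case
AS
SELECT f.SaleId,
       f.ResourceType,
       f.Quantity *
           CASE f.ResourceType
               WHEN 'Staff'      THEN 45.00
               WHEN 'Contractor' THEN 90.00
               ELSE 0.00
           END AS Cost
FROM dbo.FactSales AS f;

-- Or rolled up into an inline table-valued function so the mapping
-- lives in one place; the optimizer inlines it into the calling plan.
CREATE FUNCTION dbo.fnCostRate (@ResourceType varchar(20))
RETURNS TABLE
AS RETURN
    SELECT CASE @ResourceType
               WHEN 'Staff'      THEN 45.00
               WHEN 'Contractor' THEN 90.00
               ELSE 0.00
           END AS CostRate;
```

The rates and resource types above are invented purely for illustration; a multi-statement (non-inline) TVF would not be optimized away like this.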
In both strategies SQL has to look at each and every row, so I’m surprised it’s that different, but I guess the Hash Match is optimized for larger amounts of data (on the right side), and can’t match a CASE for trivial lookups. There’s also the cardinality to consider: a CASE devolves to a Compute Scalar, which means no change in the number of rows, but a Hash Match might output a row more than once (or drop rows altogether) depending on the number of matches on the right side of the query. I’m guessing that constraint means there’s a fundamentally cheaper way to process the data in memory.
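A quick self-contained illustration of that cardinality point, using throwaway table-value constructors rather than anything from the real schema: a join’s output row count depends on the right-hand side, while a CASE can never change it.

```sql
-- Left side has rows 1 and 2; right side has a duplicate key 1
-- and no key 2 at all.
SELECT f.n, l.rate
FROM (VALUES (1), (2)) AS f(n)
JOIN (VALUES (1, 10.0), (1, 20.0)) AS l(n, rate)
    ON l.n = f.n;
-- Row 1 comes out twice and row 2 disappears entirely. A unique
-- constraint on the lookup key is what lets the optimizer rule
-- this out, but the operator itself still has to allow for it.
```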
Here’s the before and after, with a relatively small number of rows (70,000 on my dev box, but the result is near-identical with 100+ million in the test environment):
Now you’ll have to excuse me, because I have some views to change…