Tuesday, April 15, 2014

It's time to bring Data Tainting to the CLR

Last week's Heartbleed bug once again exposed the shaky foundations of current software development processes. Decades-old issues - inadvertent trust of user input, and unintended access to memory (whether read or write) - continue to pose critical risks to the infrastructure on which our economies are increasingly reliant.

How do we deal with these risks? We implore developers to be more careful. This is clearly not working.

Risk-exposed industries (like resources) have mature models for categorizing risk and rating the effectiveness of mitigation strategies. Procedural fixes (telling people not to do something, putting up signs) rank at the bottom, below physical guards and the like. At the top are approaches that address the risk by doing something fundamentally less risky - engineering the risk away. As an industry, we could learn a thing or two here.

At times we have. The introduction of managed languages all but eliminated direct memory access bugs from those environments - no more buffer overruns, or read-after-free - but did little or nothing to address the issue of user input. And yet this is arguably the more important of the two: you still need untrustworthy data to turn direct memory access into a security hole. We just moved the problem elsewhere, and had a decade of SQLi and XSS attacks instead.

I think it's time to fix this. I think it's time we made the trustworthiness of data a first-class citizen in our modern languages. I think it's time to bring Data Tainting to the CLR.

Data Tainting is a language feature where all objects derived[1] from user input are flagged as 'tainted' unless explicitly cleared, a bit like that infamous 'downloaded from the internet' flag that blocks you from running your downloads. Combined with other code paths asserting that their arguments are untainted, this can largely eliminate the problem of unsanitized user input being inadvertently trusted.
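To make that concrete, here's a sketch of what it might look like in C# - entirely hypothetical, since no such type exists in the CLR today; the type, its operator, and the Assert method are all invented for illustration:

public sealed class TaintedString
{
    public string Value { get; private set; }
    public bool IsTainted { get; private set; }

    public TaintedString(string value, bool tainted)
    {
        Value = value;
        IsTainted = tainted;
    }

    // Taint spreads through operators: combining tainted data with
    // anything else yields tainted data (see [1])
    public static TaintedString operator +(TaintedString a, TaintedString b)
    {
        return new TaintedString(a.Value + b.Value, a.IsTainted || b.IsTainted);
    }

    // Explicitly vouch for the data, e.g. after sanitizing it
    public TaintedString AsSanitized()
    {
        return new TaintedString(Value, false);
    }
}

public static class Taint
{
    // Trusted sinks (SQL, HTML output, file paths...) assert their
    // inputs are clean before using them
    public static void AssertUntainted(TaintedString s)
    {
        if (s.IsTainted)
            throw new System.Security.SecurityException(
                "Tainted data reached a trusted sink");
    }
}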

Taint Checking is not a new idea, it's just not very common outside of academia[2]. Perl and Ruby are the only extant languages I know of that support it, and it strangely failed to take hold in JavaScript after Netscape introduced it last century. But it's exactly what we need if we are to stop making the same security mistakes, over and over again.

It has bugged me for over a decade that this never took off, but the last straw for me was questions like this one on Security Stack Exchange: would Heartbleed have been prevented if OpenSSL was written in (blah)... Why? Because the depressing answer is no. Whilst a managed implementation of TLS could not have a direct memory-scraping vulnerability, the real bug here - that the output buffer was sized based on what the client wrote in the input header - is not prevented. So the flaw could still be misused, perhaps allowing an attacker to DoS the SSL endpoint somehow.
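To see why, here's the essence of the bug sketched in C# (the names are invented for illustration):

using System;

static class Heartbeat
{
    // claimedLength comes straight off the wire, i.e. it is tainted
    static byte[] BuildReply(byte[] payload, ushort claimedLength)
    {
        // The reply is sized from what the client *claimed*, never
        // checked against the payload actually received
        var reply = new byte[claimedLength];

        // In C this copy overreads adjacent heap memory (the Heartbleed
        // leak); managed code throws instead - safer, but the tainted
        // length still drives behaviour (think huge allocations: a DoS
        // vector)
        Buffer.BlockCopy(payload, 0, reply, 0, claimedLength);
        return reply;
    }
}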

Raw memory vulnerabilities are actually quite hard to exploit: you need to know a bit about the target memory model, and have a bit of luck too. Unsanitized input vulnerabilities, once you know about the flaw, are like shooting fish in a barrel: this input string here is passed straight to your database / shown to other users / used as your balance / written to the filesystem, etc. The myriad ways we find to exploit these holes are not a testament to our ingenuity; they highlight an elephant in the room: it's still too hard to write secure code. Can we do more to fall into the pit of success? I think so.

Implementing taint checking in the CLR will not be a trivial task by any means, so Microsoft are going to take a fair bit of persuading that it matters enough to commit to. And that's where I need your help:
  • If any of this resonates with you, please vote for my UserVoice suggestion: bring Data Tainting to the CLR
  • If you think it sucks, comment on it (or this post) and tell me why
Imagine a world where Heartbleed could not happen. How do we get there?

Next time: what taint checking might look like on the CLR.

[1] Tainting spreads through operators, so combining tainted data with other data results in data with the taint bit set.
[2] Microsoft Research did a good review of the landscape here, if you can wade through the overly-theoretical bits: http://research.microsoft.com/pubs/176596/tr.pdf

Thursday, April 10, 2014

Could not load file or assembly ApprovalTests

If you get the following error when running ApprovalTests...
Could not load file or assembly 'ApprovalTests, Version=3.0.0.0, Culture=neutral, PublicKeyToken=11bd7d124fc62e0f' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)
...make sure the assembly that your tests are in is not itself called ApprovalTests. As perhaps it might be if you were just demoing it to someone quickly. Doh!

Saturday, March 15, 2014

Another PowerShell Gotcha - Order of precedence and type inference

There was a fun moment at last week's Perth Dot Net User Group when the presenter (demonstrating Octopus Deploy's new scripting console) put in the following PowerShell:

> Write-Host [Environment]::MachineName

And got back (literally)

[Environment]::MachineName

...which wasn't what he expected at all. It's worth pausing to consider why this happens.

Remember of course that when passing strings as arguments in PowerShell, the quotes are optional - it would make a fairly poor shell replacement otherwise. So for example the following is completely valid:

> Write-Host Hello!

This is the reason[1] why variables in PowerShell have the dollar prefix - it makes their use in binding fairly unambiguous (just as the @ symbol in Razor achieves the same thing).

> $a = 'Moo'
> Write-Host $a

Moo

If you really wanted to write '$a' you'd have to enclose it in single quotes (as I just did) or escape the dollar symbol.

Anyway, back to the original problem: you can see that PowerShell has two possible ways of interpreting

> Write-Host [Environment]::MachineName

...and since it doesn't start with a $, you get the 'bind it as an object' behavior, which - in this case - gives you a string (since it's clearly not a number).

What you really wanted was one of the following:

> Write-Host ([Environment]::MachineName)
> Write-Host $([Environment]::MachineName)

SOMECOMPUTERNAME

They both give the intended result, by forcing the expression within the brackets to be evaluated first (which on its own is unambiguous to the parser), and then passing the result of that as an argument to the bind for Write-Host.

This is a really important trick to know, because it will otherwise bite you again and again when you try to call a .Net method and supply a parameter via an expression, for example:

> $someClass.SomeMethod($a.length -1)

...when what you need to say is:

> $someClass.SomeMethod(($a.length -1))

Key take-home: When in doubt, add more brackets[2]

[1] presumably
[2] parenthesis

Saturday, May 25, 2013

Sql trivia: BINARY_CHECKSUM

You know how in SQL Books Online the doco tells you to be wary of the CHECKSUM and BINARY_CHECKSUM functions, because they will miss some updates? Here's a trivial example:

select
    BINARY_CHECKSUM(A),
    BiggestInt,
    BINARY_CHECKSUM(BiggestInt)
from (
    select
        cast(null as int) as A,
        Power(cast(2 as bigint), 31) - 1 as BiggestInt
) x

Which returns 2147483647 in all 3 cases. So essentially BINARY_CHECKSUM is blind to the difference between a null and an int's max value. Which is fair enough... but it does illustrate the point that even if you are only doing a checksum on a single nullable 4-byte field, you have 2^32 + 1 possible values, so you can't stuff them into 32 bits without getting at least one collision.

Thursday, March 14, 2013

Google Reader In Memoriam

So, Farewell Then Google Reader
Your feed
Has expired
Like Bloglines,
Which you killed
Off
I guess
Now
We'll just have to use
Something else
Instead
Google Reader was 7 1/2
(With apologies to E.J.Thribb)

Installing the Analysis Services Deployment Utility

The Analysis Services Deployment Utility deploys Analysis Services build outputs (.asdatabase files) to a server, or generates the XMLA for offline deployment.

I've often used this as part of an automated installation process to push releases out into an integration environment, but on this project I wanted to perform the installation as part of a nightly build. It failed - because the utility (and its dependencies) wasn't installed on the build server.

I wasn't entirely sure what I needed to get it installed (and I was attempting to install the minimum amount of stuff). It turns out this tool is distributed as part of Sql Management Studio, not the SSDT/BIDS projects (as I'd previously assumed). I'm not sure whether the Basic or Complete option is required, because I picked Complete and that fixed it.

Also, for 2012 the path has changed, and the utility is now at:

C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn\ManagementStudio\Microsoft.AnalysisServices.Deployment.exe
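Once it's there, wiring it into an automated deployment is straightforward. Going from the documented usage (the file names below are examples, so check the switches against your version), /s does a silent deploy using the settings files alongside the .asdatabase, while /d /o skips the server and just writes out the XMLA:

> Microsoft.AnalysisServices.Deployment.exe MyCube.asdatabase /s:deploy.log

> Microsoft.AnalysisServices.Deployment.exe MyCube.asdatabase /d /o:MyCube.xmla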

Thursday, March 07, 2013

Installing BIDS project templates for SQL 2012

You can absolutely be forgiven for being confused about how to install the BIDS project templates for SQL 2012. They've moved to the new Sql Server Data Tools (SSDT), but Microsoft are themselves inconsistent about what SSDT is. To make matters worse, there are Visual Studio 2010 and 2012 versions of each package, depending on which IDE you want to work in.

I'll attempt to clarify:

  • SSDT is only the updated SQL database project template, that replaces the Database Edition GDR / datadude stuff (and subsumes the SQL-CLR project type)
  • SSDT-BI is the other, ex-BIDS projects: SSIS, SSRS, SSAS
Unfortunately the SQL 2012 install media uses the term 'Sql Server Data Tools' to refer to both at the same time, and up until last week SSDT-BI didn't exist outside of the SQL install media. Much confusion and delay. Hopefully the following guidance clears it up a bit:

If you only care about the SQL Server Database project

(e.g. you are a C# developer, and keep your SQL database schema as a project in your solution)

Install the appropriate version of SSDT to match the version of Visual Studio (2010 or 2012) you're using right now (or both, if necessary).
Note these do not include the BIDS project templates (SSRS, SSIS, SSAS); they only include the new SQL Server Database project template.

If you are a BI developer, and want the lot

...in a Visual Studio 2010 Shell

Install the 'Sql Server Data Tools' component from the SQL 2012 install media. This gets you everything you need.
Optionally, also install the updated version of the SQL project template by installing SSDT for Visual Studio 2010

...in a Visual Studio 2012 Shell

Install the standalone version of SSDT for Visual Studio 2012 (for the database project) and SSDT-BI for Visual Studio 2012 (for the SSRS, SSIS, SSAS templates)

...but don't know which shell to use

If you plan to create a single Visual Studio solution (.sln) combining BIDS artifacts, database projects and other project types (e.g. C# or VB projects), then that will determine your choice here. It's certainly easier working in just one IDE than having to have two open.

Otherwise just pick one. You might be swayed by some of the new VS 2012 features; then again, you get the 2010 version already on the install media, so that option means less downloading. Given they shipped against one version but now support the next, as far as I can see they'll have to support both versions going forwards, at least for a couple of years.


Editor's note: this rewrite replaces the sarcastic rant I had here previously, which was quite cathartic, but not desperately helpful in navigating the landscape here.

Thursday, February 21, 2013

INotifyPropertyChanged: Worst. Interface. Ever

As any .Net UI developer will tell you, INotifyPropertyChanged is a fundamental part of 'binding' an object to a UI control. Without it, binding is essentially one-way: changes in the control change the object, but if that has a ripple effect on other properties, or properties are changed by other 'below the UI' processes, the UI can't know to repaint. This is essentially an implementation of the Observer pattern[1].

Unfortunately it doesn't come for free - you have to implement it yourself - and that's where the problems start. So much has been written on the pain of implementing INotifyPropertyChanged (INPC for short) that I need not repeat it all here. It's generated so many questions on StackOverflow you'd think it was due its own StackExchange site by now.

The principal complaints are around all the boilerplate code and magic strings required to implement it, so for the sake of completeness I'll summarize the main flavours of solution available: raising the event by hand with magic strings, refactoring-safe lambda/expression-tree helpers, [CallerMemberName] (on .Net 4.5), and IL-weaving tools that inject the plumbing at build time.
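For a flavour of the boilerplate, here's a minimal sketch of the [CallerMemberName] approach (the class and property are invented for illustration):

using System.ComponentModel;
using System.Runtime.CompilerServices;

public class Account : INotifyPropertyChanged
{
    private decimal _balance;

    public event PropertyChangedEventHandler PropertyChanged;

    public decimal Balance
    {
        get { return _balance; }
        set
        {
            if (_balance == value) return;
            _balance = value;
            OnPropertyChanged(); // the compiler supplies "Balance"
        }
    }

    protected void OnPropertyChanged(
        [CallerMemberName] string propertyName = null)
    {
        var handler = PropertyChanged;
        if (handler != null)
            handler(this, new PropertyChangedEventArgs(propertyName));
    }
}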

Pick one of these, stick with it and you're done. Ok, there's still a bit of griping about separation of concerns, and whether this is an anti-pattern, but you're done, right?

No.

It's so much worse than that.

Do you remember the first time you implemented GetHashCode(), and later when you realized you'd done it wrong? And later still, when you really realized you'd done it wrong - that there was no good way to override Equals() for a mutable object - and the sneaking realization that this whole problem existed only for the benefit of Hashtables? It's a bit like that.

What we have with INotifyPropertyChanged is an implicit contract, which is to say that a large part of the contract can't be formally defined in code. Which means you have to validate your implementation manually. In this case the implicit bit is about threading. INotifyPropertyChanged exists to support UI frameworks, and (bizarrely, in this day and age) they are still single threaded: they can only execute on the thread that constructed them - including event handlers. Think about this a bit, and you will eventually conclude:
An object that implements INotifyPropertyChanged must raise the PropertyChanged event only on the thread that was originally used to construct any registered subscribers for that event

Now there's a problem[2].

Clearly this is something that's just not possible to check for at runtime, so your design has to cater for it. Passing objects that might have been bound to business logic that might mutate them? UI thread please. Adding an item to a collection that might be an ObservableCollection? UI thread please. Doing some calculations in the background to pass back to an object that may have been bound? Marshal via the UI thread please. And so on. And don't even get me started on what you do if you have two (or more) 'UI' threads[3].
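In WPF terms, that last case looks something like this - a minimal sketch, reusing the hypothetical Account class from the earlier example:

using System.Threading.Tasks;
using System.Windows;

public class BalanceRefresher
{
    // Recalculate in the background, but mutate the (possibly bound)
    // object only on the UI thread, so any PropertyChanged event is
    // raised on the thread the subscribing controls were built on
    public void Refresh(Account account)
    {
        Task.Run(() =>
        {
            var newBalance = RecalculateBalance(); // slow work, off the UI thread

            Application.Current.Dispatcher.Invoke(
                () => account.Balance = newBalance);
        });
    }

    private decimal RecalculateBalance()
    {
        return 42m; // placeholder for something expensive
    }
}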

This is a horrible, horrible creeping plague of uncertainty that spreads through your UI, where the validity of an operation can't be determined at the call site, but must also take into account the underlying type of the object (violating polymorphism), where that object came from (violating encapsulation), and which thread is being used to process the call (violating all that is sacred). These are aspects that we just can't model or visualize well with current tooling, at least not at design time, and none of the solutions above will save you here.

So there you go. INotifyPropertyChanged: far, far worse than you imagined.


[1] ok, it could be argued that any use of .net events is Observer, but the intent here is the relevant bit: the object is explicitly signalling that it's changed state.
[2] Actually I've over-simplified, because you can have whole chains of objects listening to each other, and if any one of them is listened to by an object with some type of thread-affinity, that's the constraint you have to consider.
[3] Don’t try this at home. There are any number of lessons you’ll learn the hard way.
