Cup(Of T): It's time to bring Data Tainting to the CLR

Last week's Heartbleed bug once again exposes the shaky foundations of current software development processes. Decades-old issues such as inadvertent use of user-input values, and unintended access to memory (whether read or write) continue to be critical risks to the infrastructure on which our economies are increasingly reliant.

How do we deal with these risks? We implore developers to be more careful. This is clearly not working.

Risk-exposed industries (like resources) have mature models to categorize risk, and rate the effectiveness of mitigation strategies. Procedural fixes (telling people not to do something, putting up signs) rate at the bottom, are ranked lower than physical guards and such like. At the top are approaches that address the risk by doing something fundamentally less risky - engineering the risk away. As an industry, we could learn a thing or two here.

At times we have. The introduction of managed languages all but eliminated direct memory access bugs from those environments - no more buffer overruns, or read-after-free - but did little or nothing to address the issue of user input . And yet this is arguably the more important of the two - you still need untrustworthy data to turn direct memory access into a security hole. We just moved the problem elsewhere, and had a decade of SQLi and XSS attacks instead.

I think it's time to fix this. I think it's time we made the trustworthiness of data a first-class citizen in our modern languages. I think it's time to bring Data Tainting to the CLR.

Data Tainting is a language feature where all objects derived[1] from user input are flagged as 'tainted' unless explicitly cleared, a bit like that infamous 'downloaded from the internet' flag that block you running your downloads. Combined with other code paths asserting that their arguments are untainted, this can largely eliminate the problem of unsanitized user input being inadvertently trusted.

Taint Checking is not a new idea, it's just not very common outside of academia[2] (Perl and Ruby are the only extant languages I know of that support it, having strangely failed to take hold in JavaScript after Netscape introduced it last century). But it's exactly what we need if we are to stop making the same security mistakes, over and over again.

It has bugged me for over a decade that this never took off, but the last straw for me was questions like this on security overflow: Would Heartbleed have been prevented if OpenSSL was written in (blah) ... Why? Because the depressing answer is no. Whilst a managed implementation of TLS could not have a direct memory scraping vulnerability, the real bug here - that the output buffer sizing was based on what the client wrote in the input header - is not prevented. So the flaw could still be misused, perhaps allowing to DOS the SSL endpoint somehow.

Raw memory vulnerabilities are actually quite hard to exploit: you need to know a bit about the target memory model, and have a bit of luck too. Unsanitized input vulnerabilities, once you know about the flaw, are like shooting fish in a barrel: this input string here is directly passed to your database / shown to other users / used as your balance / written to the filesystem etc... The myriad ways we can find to exploit these holes should not be a testament not to our ingenuity, but highlight an elephant in the room: it's still too hard to write secure code. Can we do more to fall into the pit of success? I think so.

Implementing taint checking in the CLR will not be a trivial task by any means, so Microsoft are going to take a fair bit of persuading that matters enough to commit to. And that's where I need your help:

If any of this resonates with you, please vote for my user voice suggestion: bring Data Tainting to the CLR
If you think it sucks, comment on it (or this post) and tell me why

Imagine a world where Heartbleed could not happen. How do we get there?

Next time: what taint checking might look like on the CLR.

[1] Tainting spreads through operators, so combining tainted data with other data results in data with the taint bit set.
[2] Microsoft Research did a good review of the landscape here, if you can wade through the overly-theoretical bits: http://research.microsoft.com/pubs/176596/tr.pdf

Cup(Of T)

Tuesday, April 15, 2014

It's time to bring Data Tainting to the CLR

No comments:

Popular Posts

Search This Blog

Previous Posts

Tags

Approved Stuff:

About Me

ORG