Wednesday, May 23, 2018

FP = Functional PowerShell

I have been writing a lot of F# over the last few years, and it has made me look critically at some of my PowerShell scripts. While we may use pipelines all the time in PowerShell, often we still treat it as yet-another-curly-bracket-language when it comes to flow control.

But this is our preconceptions letting us down, because PowerShell is already much more expression-orientated than that. So, for example, 'if' and 'switch' in PowerShell are (since v2 or so) expressions, not just pure imperative branching.

This means that, rather than writing this:

# imperative version
if ($inputValue % 2 -eq 0){
    $result_a = 'even'
}else{
    $result_a = 'odd'
}
Write-Host "Result_A is now $result_A"
... you could instead write the far superior:

# expression-orientated version
$result_b = `
    if ($inputValue % 2 -eq 0){
        'even'
    }else{
        'odd'
    }

Write-Host "Result_B is now $result_b"

This works with 'switch' also:

# expression-orientated switch version
$c = `
    switch($inputValue){
        1 { 'One' }
        2 { 'Two' }
        default { 'Other' } 
    }

Write-Host "C is now $c"

Sure, PowerShell isn't going to give you any nice warnings about not providing output values on all branches and so forth, but the expression-orientated versions are just plain *easier to follow*.
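
To be clear about that last point: if one of your branches just doesn't produce a value, nothing warns you - the assignment quietly ends up as $null. A contrived example (assuming $inputValue is set as above):

# no 'else' branch, and no warning either - for odd inputs $result_c is just $null
$result_c = if ($inputValue % 2 -eq 0) { 'even' }
Write-Host "Result_C is now '$result_c'"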

Wednesday, March 09, 2016

Still no migration path from PowerPivot + PowerQuery to SSAS Tabular in SQL 2016 (RC0)

PowerPivot is a great product, and PowerQuery can be fairly awesome as well, but when you run out of steam running your spreadsheet in Excel/SharePoint you are a bit screwed: if you convert a PowerPivot model to SSAS Tabular (using the SSDT ‘Import from PowerPivot’ project), all your PowerQuery tables get converted into pasted data, because SSAS can’t talk PowerQuery.

So in order to ‘upsize’ between PowerPivot and SSAS Tabular you have to:

  • Re-author all your PowerQueries as traditional SSIS ETLs (or similar)
  • Land the data in a relational database
  • Change your PowerPivot model to source from those tables, rather than from PowerQuery.
  • This actually involves recreating them all, since for tables sourced from PowerQuery, the connection type can’t be changed.
  • And the calculated columns and measures
  • And the formatting and sort orders
  • etc…

That looks a lot like ‘rewrite from scratch’ to me, which is a pretty poor option (and a major gotcha with the PowerPivot/PowerQuery approach). So I was pleased to read (somewhere I can’t find now) that this will be addressed in the SQL 2016 timeframe, with PowerQuery supported as a data source for SSAS[1], SSRS[2] and SSIS[3].

Only… seems like it’s actually not.

I’ve been doing a trial of SQL 2016 using CTP3.3 and RC0, to determine if this fixes an issue we had with PowerPivot KPIs, and it seems like it does. However, if SSIS or SSAS can source from PowerQuery I’m blowed if I can see where that functionality is, and the release notes have been very quiet on this front.

The only concrete thing I’ve found is this tantalizing (and presumably unintentional) bit in the SQL 2016 Preview site’s Deeper Insights Across Data white paper:

(Screenshot: a highlighted excerpt from the white paper)

I stress that’s not my highlighting :-(

 

[1] Can’t find where I originally got this idea from. May have just got the wrong end of the wrong stick
[2] See https://gqbi.wordpress.com/2015/05/07/bi-nsight-sql-server-2016-power-bi-updates-microsoft-azure-stack/. That being said, SSRS can use SSIS as a source, so if that was to be implemented…
[3] See https://social.msdn.microsoft.com/Forums/en-US/68a06982-4166-4ac9-93c0-3c247a9c64a7/powerquery-within-ssis-in-sql-2016?forum=sqlintegrationservices and http://sqlmag.com/blog/what-coming-sql-server-2016-business-intelligence, or just vote for https://connect.microsoft.com/SQLServer/Feedback/Details/1046883

Tuesday, May 19, 2015

Driving RGB LED strips from the Raspberry Pi - A Brief Overview

In 2013 I put my first set of programmable Christmas lights together, a very simple project which involved hanging a 5m addressable RGB LED strip (based on the LPD8806) outside my house, driven by a Raspberry Pi and my first ever experiments in Python. It was awesome, and because the RPi was on WiFi, I could SSH in from my bed (where I had a clear view of the lights outside the house) and incrementally tweak the light sequence, or add extra patterns I'd dreamt up during the day.

The LPD8806 strips were a simple way to get started, because the protocol they speak is basically SPI. Since the RPi has hardware SPI support, this means talking to them is pretty much as simple as writing bytes to /dev/spidev0.0. You have to write the right bytes of course, but even that's pretty simple - write as many G,R,B bytes as you have pixels, then a trailing zero. And again, Adafruit have great tutorials with ready-to-roll sample code (including gamma correction).
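
To make the framing concrete, here's the byte layout for a (tiny) three-pixel strip, sketched in PowerShell rather than the Python I actually used, and glossing over the gamma correction the Adafruit samples apply:

# one G,R,B byte triple per pixel (green first!), then the trailing zero
[byte[]]$frame = @(
    255, 0, 0      # pixel 0: green
    0, 255, 0      # pixel 1: red
    0, 0, 255      # pixel 2: blue
    0              # the trailing zero
)
# ...and that frame is, morally, what gets written to /dev/spidev0.0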

I was hooked. But what became immediately apparent was that 5m wasn't going to cut it. Wrapped round the 'Christmas tree' in a spiral it suddenly looked a lot smaller. I needed moar pixels.

So last year (2014) I started playing with WS2812-based lights (aka NeoPixels). These have the advantage over the LPD8806s of being significantly cheaper per LED; however, because of the tight timing requirements of their data signal, they are traditionally driven using a microcontroller like an Arduino (which isn't going to get preempted or multitask).


(The WS2812 LED, with integrated controller chip visible)

So the question arises: how do you control WS2812s from the Raspberry Pi? That will be the topic of the next post.

(Of course, just when you think you're on top of it all, everything changes again. Adafruit is now selling a range of LED strips under the brand 'DotStar', which claim to be a fair bit easier to talk to than the WS2812, and as best I can tell from the RPi Python demo code they talk SPI.)

Thursday, August 14, 2014

SMO issues with SQL 2012 SP2

After rolling out SQL 2012 SP2 (CU1) across all our environments, we noticed all our TeamCity builds were going a lot slower, or never completing. I didn't make the connection at first, but I eventually tracked it down to an SMO call in our install script that was taking a long time to return (actually I think it either deadlocks or never returns on our busy build server).

Firing up profiler I was able to confirm that the SMO call in question (querying a database's DatabaseSnapshotBaseName property) was resulting in multiple SQL queries hitting the server, one of which in particular was fairly massive. We were doing this in an inner loop, so the performance was fairly rubbish - we had a classic ORM-lazy-load issue.

Clearly we should have looked at SetDefaultInitFields before, but the performance was always good enough that we didn't have to bother. Which makes me strongly suspect that either the defaults have changed with SP2 (CU1), or the 'get all properties for the database' query has changed to include additional data (probably something cloud related), which reduces the performance substantially in the 'get everything' case.

What's a bit nasty here is that you also get this query executed if you reference the SMO JobServer object, since (looking through ILSpy) one of the first things that class's constructor does is access the StringComparer property of the MSDB database object:
this.m_comparer = parentsrv.Databases["msdb"].StringComparer;
This object depends on a couple of server properties, but I think 'Collation' is the key one here - if that's not loaded then the newer 'all properties' query kicks in, which is again much slower than it was before.

Moral: be sure to add 'Collation' to your SetDefaultInitFields() list for Databases.
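
For what that looks like in practice, here's a rough sketch from PowerShell (the server and database names are made up, and load SMO however you normally do - the point is just the SetDefaultInitFields call):

# load SMO and connect (adjust assembly loading for your SMO version)
Add-Type -AssemblyName 'Microsoft.SqlServer.Smo'
$server = New-Object Microsoft.SqlServer.Management.Smo.Server 'MYBUILDSERVER'

# only fetch the Database properties we actually use, and make sure Collation is
# one of them, so that touching JobServer / StringComparer doesn't trigger the
# expensive 'get all properties' query for each database
$server.SetDefaultInitFields([Microsoft.SqlServer.Management.Smo.Database],
    'Name', 'Collation', 'DatabaseSnapshotBaseName')

$server.Databases['MyDatabase'].DatabaseSnapshotBaseName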

Tuesday, July 22, 2014

'Specified wildcard pattern is invalid' with Octopus Deploy and PSake

So we upgraded to Octopus Deploy v2, and all our deployments are now failing with this error:

22/07/2014 11:51:08 AM: An Error Occurred:
Test-Path : Cannot retrieve the dynamic parameters for the cmdlet. The
specified wildcard pattern is not valid:
Octopus.Environment.MachinesInRole[myproject-sql-node]
At D:\Octopus\Applications\Test\ORDW.Staging.TPPS\7.0.310-trunk_1\Tools\Install
\psake.psm1:357 char:17
+ if (test-path "variable:\$key") {

Our install process is quite complex, so we use psake to wrangle it. Integration between the two is relatively straightforward (in essence we just bind $octopusParameters straight onto psake's -properties), and I could see from the stack trace that the failure was actually happening within the psake module itself. And given that the error spat out the variable that caused the issue, I figured it was something to do with the variable name.

Most of the variable names are the same as per Octopus Deploy v1, but we do now get some extra ones, in particular the 'Octopus.Environment.MachinesInRole[role]' one. But that's not so different from the type of variables we've always got from Octopus, eg: 'Octopus.Step[0].Name', so what's different?

Where psake is failing is where it pushes properties into variable scope for each of the tasks it executes as part of the 'build', and apparently it's choking because Test-Path doesn't like the variable name. So I put together some tests to exercise Test-Path with different variable names, to find out when it squealed. This all works against the same code as runs in psake, ie:

test-path "variable:\$key"

$key        Result
a           Ok
a.b         Ok
a-b         Ok
b.a         Ok
b-a         Ok
a.b.c       Ok
a-b-c       Ok
c.b.a       Ok
c-b-a       Ok
a[a]        Ok
a[a.b]      Ok
a[a-b]      Ok
a[b.a]      Ok
a[b-a]      Cannot retrieve the dynamic parameters for the cmdlet. The specified wildcard pattern is not valid: a[b-a]
a[a.b.c]    Ok
a[a-b-c]    Ok
a[c.b.a]    Ok
a[c-b-b]    Cannot retrieve the dynamic parameters for the cmdlet. The specified wildcard pattern is not valid: a[c-b-b]
a[b-b-a]    Ok

I've highlighted the failure cases, but what's just as interesting is which cases pass. This gives a clue as to the underlying implementation, and why the failure happens.

To cut a long story short, it appears that any time you use square brackets in your variable name, PowerShell uses wildcard matching to parse the content within the brackets. If that content contains a hyphen, then the letters before and after the first hyphen are treated as a range for matching, and if the range end comes before the range start (ie: is alphabetically earlier), you get an error.
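
You can demonstrate the same wildcard-range parsing without Test-Path being involved at all - the snippet below (illustrative only) should produce the same sort of complaint from the wildcard matcher itself:

# 'b' falls within the range a-b, so this happily returns True
'abc' -like 'a[a-b]c'
# reversed range: this throws 'the specified wildcard character pattern is not valid'
'abc' -like 'a[b-a]c'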

Nasty.

It's hard to know who to blame here. Octopus makes those variables based on what roles you define in your environment, though I'd argue the square brackets are potentially a bad choice. psake is probably partly culpable, though you'd be forgiven for thinking that what they were doing was just fine, and there's no obvious way of suppressing the wildcard detection. Ultimately I think this is probably a PowerShell bug. Whichever way you look at it, the chances of me getting it fixed soon are fairly slight.

In this case I can just change all my Octopus role names to use dots not hyphens, and I think I'll be out of the woods, but this could be a right royal mess to fix otherwise. I'd probably have to forcefully remove variables from scope just to keep psake happy, which would be ugly.

Interestingly, the documentation for Test-Path is a bit confused as to whether wildcard matching is or isn't allowed here - the description says wildcards are permitted, but 'Accept wildcard characters?' claims otherwise:

PS C:\> help Test-Path -parameter:Path

-Path 
    Specifies a path to be tested. Wildcards are permitted. If the path includes spaces, enclose it in quotation marks. The parameter 
    name ("Path") is optional.
    
    Required?                    true
    Position?                    1
    Default value                
    Accept pipeline input?       True (ByValue, ByPropertyName)
    Accept wildcard characters?  false

Also interesting is that Set-Variable suffers from the same issue, for exactly the same cases (and wildcarding definitely doesn't make any sense there). Which means you can do this:
${a[b-a]} = 1
but not this
Set-Variable 'a[b-a]' 1
Go figure.


Update 23/7:
  • You can work around this by escaping the square brackets with backticks, eg Set-Variable 'a`[b-a`]'
  • I raised a Connect issue for this, because I think this is a bug
  • I've raised an issue with Psake on this, because I think they should go the escaping route to work around it.

Tuesday, April 15, 2014

It's time to bring Data Tainting to the CLR

Last week's Heartbleed bug once again exposes the shaky foundations of current software development processes. Decades-old issues such as inadvertent use of user-input values, and unintended access to memory (whether read or write) continue to be critical risks to the infrastructure on which our economies are increasingly reliant.

How do we deal with these risks? We implore developers to be more careful. This is clearly not working.

Risk-exposed industries (like resources) have mature models for categorizing risks and rating the effectiveness of mitigation strategies. Procedural fixes (telling people not to do something, putting up signs) rate at the bottom, below physical guards and the like. At the top are approaches that address the risk by doing something fundamentally less risky - engineering the risk away. As an industry, we could learn a thing or two here.

At times we have. The introduction of managed languages all but eliminated direct memory access bugs from those environments - no more buffer overruns, or read-after-free - but did little or nothing to address the issue of user input. And yet this is arguably the more important of the two - you still need untrustworthy data to turn direct memory access into a security hole. We just moved the problem elsewhere, and had a decade of SQLi and XSS attacks instead.

I think it's time to fix this. I think it's time we made the trustworthiness of data a first-class citizen in our modern languages. I think it's time to bring Data Tainting to the CLR.

Data Tainting is a language feature where all objects derived[1] from user input are flagged as 'tainted' unless explicitly cleared, a bit like that infamous 'downloaded from the internet' flag that blocks you from running your downloads. Combined with other code paths asserting that their arguments are untainted, this can largely eliminate the problem of unsanitized user input being inadvertently trusted.

Taint Checking is not a new idea, it's just not very common outside of academia[2] (Perl and Ruby are the only extant languages I know of that support it; strangely, it failed to take hold in JavaScript after Netscape introduced it last century). But it's exactly what we need if we are to stop making the same security mistakes, over and over again.

It has bugged me for over a decade that this never took off, but the last straw for me was questions like this on security overflow: Would Heartbleed have been prevented if OpenSSL was written in (blah) ... Why? Because the depressing answer is no. Whilst a managed implementation of TLS could not have a direct memory scraping vulnerability, the real bug here - that the output buffer sizing was based on what the client wrote in the input header - is not prevented. So the flaw could still be misused, perhaps allowing an attacker to DoS the SSL endpoint somehow.

Raw memory vulnerabilities are actually quite hard to exploit: you need to know a bit about the target memory model, and have a bit of luck too. Unsanitized input vulnerabilities, once you know about the flaw, are like shooting fish in a barrel: this input string here is directly passed to your database / shown to other users / used as your balance / written to the filesystem etc... The myriad ways we find to exploit these holes are not a testament to our ingenuity; they highlight the elephant in the room: it's still too hard to write secure code. Can we do more to fall into the pit of success? I think so.

Implementing taint checking in the CLR will not be a trivial task by any means, so Microsoft are going to take a fair bit of persuading that it matters enough to commit to. And that's where I need your help:
  • If any of this resonates with you, please vote for my user voice suggestion: bring Data Tainting to the CLR
  • If you think it sucks, comment on it (or this post) and tell me why
Imagine a world where Heartbleed could not happen. How do we get there?

Next time: what taint checking might look like on the CLR.

[1] Tainting spreads through operators, so combining tainted data with other data results in data with the taint bit set.
[2] Microsoft Research did a good review of the landscape here, if you can wade through the overly-theoretical bits: http://research.microsoft.com/pubs/176596/tr.pdf

Thursday, April 10, 2014

Could not load file or assembly ApprovalTests

If you get the following error when running ApprovalTests...
Could not load file or assembly 'ApprovalTests, Version=3.0.0.0, Culture=neutral, PublicKeyToken=11bd7d124fc62e0f' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)
...make sure the assembly that your tests are in is not itself called ApprovalTests. As perhaps it might be if you were just demoing it to someone quickly. Doh!
