This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.

Wednesday, 27 August 2014

Why LINQ requires you to Func?

When LINQ appeared on our screens it brought along a requirement under the guise of 'Func' whenever you wanted to do anything substantial, such as supply the contents of a where clause:

What exactly is Func? What are we actually being asked for here?  This is a journey that begins with delegates.

First let's consider how we create an object:

As you can see this takes three steps:

  • We define a class (Car)
  • We create a variable of that type
  • Then we create an instance of the class and set the variable to a reference of it
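Roughly, those three steps look like this. It's a minimal sketch: the Describe method is my own invention, there only so the program has something to print.

```csharp
using System;

// Step 1: define a class.
class Car
{
    // Hypothetical member, only here so we have something to call.
    public string Describe() { return "I am a car"; }
}

class Program
{
    static void Main()
    {
        // Step 2: declare a variable of that type.
        Car car;

        // Step 3: create an instance and set the variable to a reference to it.
        car = new Car();

        Console.WriteLine(car.Describe()); // I am a car
    }
}
```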

A bit simple? Stick with me...

Now let's see how we do this with delegates.  I invite you to spot the difference:

In both examples Car acted as a pointer to some functionality: the first was a reference to an instance of a class, whereas the second was a reference to a method.  You've spotted the difference.

Class type = Reference to an object
Delegate type = Reference to a method
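Here is a sketch of the same three steps with a delegate. The delegate is named Car to mirror the class example; the Drive method is a made-up name of mine.

```csharp
using System;

// Step 1: define a delegate type (no parameters, returns a string).
delegate string Car();

class Program
{
    // Hypothetical method whose parameter list and return type match Car.
    static string Drive() { return "Driving"; }

    static void Main()
    {
        // Step 2: declare a variable of the delegate type.
        Car car;

        // Step 3: create a delegate instance that refers to a method.
        car = new Car(Drive);

        // Invoking the variable calls the method it refers to.
        Console.WriteLine(car()); // Driving
    }
}
```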

Directly from the C# Spec:
1.11 Delegates

A delegate type represents references to methods with a particular parameter list and return type. Delegates make it possible to treat methods as entities that can be assigned to variables and passed as parameters.

That last sentence is extremely important.  Delegates allow us to pass references to methods in the same way we can pass references to objects.

This is the purpose of Func.  Instead of being forced to define our own delegate type (as we did in the second example, where it was named Car), we're given a ready-made delegate type and only have to supply a suitable method reference.

Let me hammer this home:
Usually we call methods with a reference to (or the value of) an object. The Where method is instead asking for a reference to a method whose signature matches the Func type.
Let's see a concrete example:

Scenario: Select all names beginning with M

To do this I need to create a method that tests a given item, and return true if it begins with M:
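A minimal sketch of that long-hand version follows. The method name BeginsWithM and the sample names are my own assumptions; the point is that the method's signature (string in, bool out) matches Func<string, bool>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // Matches Func<string, bool>: takes a string, returns a bool.
    static bool BeginsWithM(string name)
    {
        return name.StartsWith("M", StringComparison.Ordinal);
    }

    static void Main()
    {
        var names = new List<string> { "Mark", "Sarah", "Mary", "John" };

        // Explicitly create the delegate instance and hand it to Where.
        Func<string, bool> predicate = new Func<string, bool>(BeginsWithM);
        IEnumerable<string> result = names.Where(predicate);

        Console.WriteLine(string.Join(", ", result)); // Mark, Mary
    }
}
```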

However, because of these features (introduced in C# 2*):

- Removing the awkward delegate syntax
- Anonymous methods, allowing you to define a delegate instance's action in-line

we can supply it in a way I suspect you're familiar with:

* Surprising isn't it, this has been available since 2005!

Hopefully you can see that the only difference is that we've compressed the method definition, moving towards a more fluid, expressive form of programming.  That, in conjunction with anonymous methods, is the only purpose of Func.

There are 17 varieties of Func, each one taking a different number of parameters (from none up to 16) so you can pick the right one for your task.  There is also a sibling, Action, which has the same purpose except that it doesn't return a value.
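To make the compression concrete, here's a sketch of the same predicate supplied three ways. The first two rely on the C# 2 features listed above; the lambda in the third arrived with C# 3 alongside LINQ. BeginsWithM and the sample names are assumptions of mine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static bool BeginsWithM(string name)
    {
        return name.StartsWith("M", StringComparison.Ordinal);
    }

    static void Main()
    {
        var names = new List<string> { "Mark", "Sarah", "Mary", "John" };

        // 1. Pass the method name directly; the compiler creates the delegate.
        var viaMethodGroup = names.Where(BeginsWithM);

        // 2. Anonymous method: the delegate's body is defined in-line.
        var viaAnonymousMethod = names.Where(
            delegate(string name) { return name.StartsWith("M", StringComparison.Ordinal); });

        // 3. Lambda expression: the same thing in its shortest form.
        var viaLambda = names.Where(name => name.StartsWith("M", StringComparison.Ordinal));

        Console.WriteLine(string.Join(", ", viaMethodGroup));     // Mark, Mary
        Console.WriteLine(string.Join(", ", viaAnonymousMethod)); // Mark, Mary
        Console.WriteLine(string.Join(", ", viaLambda));          // Mark, Mary
    }
}
```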
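A small sketch of a few of those varieties, with variable names that are purely illustrative:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Func<TResult>: no parameters, returns a value.
        Func<int> answer = () => 42;

        // Func<T1, T2, TResult>: two parameters, returns a value.
        Func<int, int, int> add = (a, b) => a + b;

        // Action<T>: one parameter, returns nothing.
        Action<string> print = message => Console.WriteLine(message);

        print(answer().ToString());  // 42
        print(add(2, 3).ToString()); // 5
    }
}
```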

So...why not have a glance over your codebase? How many times have you used Func without giving it a second thought?  Even better, how could you produce your own methods that accept functionality as a parameter in the form of Func?
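As a starting point, here's a sketch of a method that accepts functionality as a parameter. Keep is a hypothetical helper of mine, essentially a hand-rolled cousin of Where, shown only to illustrate the shape.

```csharp
using System;
using System.Collections.Generic;

static class Program
{
    // Hypothetical helper: keeps the items for which 'test' returns true.
    static List<T> Keep<T>(IEnumerable<T> items, Func<T, bool> test)
    {
        var kept = new List<T>();
        foreach (T item in items)
        {
            if (test(item))
            {
                kept.Add(item);
            }
        }
        return kept;
    }

    static void Main()
    {
        var numbers = new[] { 1, 2, 3, 4, 5, 6 };

        // The caller decides what "keep" means by passing a method in.
        List<int> evens = Keep(numbers, n => n % 2 == 0);

        Console.WriteLine(string.Join(", ", evens)); // 2, 4, 6
    }
}
```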

Thursday, 17 July 2014

Unassuming Unicode, the secret to characters on the web

Recently I got an e-mail with an interesting title:

How did they do that?

Just how did KLM insert an airplane into the subject of an e-mail? Unicode!

I needn't put a full description here, but Unicode is the system that provides a unique identifier for every single character your computer is capable of displaying.  Yes: Chinese, Yiddish, Maldivian, airplane symbols, the lot!

So what does this look like under the hood?

To find out I copied the character into Notepad and saved it, ensuring I selected 'Unicode' as the encoding at the bottom of the 'Save As' dialog.

Then I viewed the raw binary of the file in a hex editor (I just happened to pick this online one).  The results were simply:

FF FE 08 27

What we're seeing here is the hexadecimal representation of the binary in the file.  You can confirm this using Windows Calculator in programmer mode, but for simplicity this is:

FF     11111111
FE     11111110
08     00001000
27     00100111

The first two bytes tell us that the file is little-endian UTF-16; these are the byte order mark (BOM).  Endianness simply tells us which end of a multi-byte value is stored first. Little-endian stores the least significant byte first, which in this case means we read each pair of bytes from right to left.

So doing this we now have (omitting the byte order marks):

27 08
Which just so happens to be the unique identifier for the airplane symbol, U+2708 (✈).
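If you'd rather check this in code than by eye, here's a minimal sketch that performs the same steps on the four bytes from the hex editor. The variable names are mine; Encoding.Unicode is .NET's name for UTF-16 little-endian.

```csharp
using System;
using System.Text;

class Program
{
    static void Main()
    {
        // The raw bytes from the saved file.
        byte[] file = { 0xFF, 0xFE, 0x08, 0x27 };

        // FF FE is the UTF-16 little-endian byte order mark.
        bool littleEndian = file[0] == 0xFF && file[1] == 0xFE;
        Console.WriteLine(littleEndian); // True

        // Little-endian: the low byte is stored first, so 08 27 means 0x2708.
        int codeUnit = file[2] | (file[3] << 8);
        Console.WriteLine(codeUnit.ToString("X4")); // 2708

        // Let the framework decode the two bytes after the BOM.
        string text = Encoding.Unicode.GetString(file, 2, 2);
        Console.WriteLine(text == "\u2708"); // True
    }
}
```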

But why do you care about this?  You could've just copied and pasted the original symbol, right?

Well, it just so happens that HTML character references closely follow these Unicode code points.  So if I wanted to use this character myself I'd want to be absolutely certain it'll render correctly.

To do this I'd first make sure my page declares a Unicode encoding (here UTF-8) using the correct meta tag:
<meta charset="utf-8">
Then I can create the character using &#xnnnn; where nnnn is the Unicode code point in hexadecimal.  Therefore &#x2708; creates our airplane: ✈
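Building the reference by hand is easy to get wrong, so here's a small sketch that formats a code point for you. ToHtmlEntity is a hypothetical helper of mine, not a framework method.

```csharp
using System;

class Program
{
    // Hypothetical helper: formats a code point as an HTML hex character reference.
    static string ToHtmlEntity(int codePoint)
    {
        return "&#x" + codePoint.ToString("X") + ";";
    }

    static void Main()
    {
        Console.WriteLine(ToHtmlEntity(0x2708)); // &#x2708;

        // char.ConvertFromUtf32 turns a code point into the matching string.
        Console.WriteLine(char.ConvertFromUtf32(0x2708) == "\u2708"); // True
    }
}
```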

That's just one.  There are 109,383 other characters out there; go and use 'em.

Saturday, 7 June 2014

Keeping your source, safe

Too many times now I have seen a fear of committing code, with many developers waiting until they are absolutely certain their code is damn near perfect before hitting commit.  I blame the terminology: "commit" sounds so final, as if it carries consequences for your reputation.  That's why I prefer to call them checkpoints.

A checkpoint is a point in time that you can return to - no matter what happens:

- Your hard drive fails
- You find yourself needing to backtrack
- You take a holiday
- You lose a 'life'

The more checkpoints you have, the more choice you're giving yourself in the future to return to.

That's why I advocate checking your code in early and often. Don't worry if it's a work in progress, if there are missing tests, or if it's not perfect. Check it in!

Of course I'm not advocating checking in crap, so there have to be rules:

- It should compile
- All tests pass
- You keep it on your own branch
- You include any new code since the last commit
- A commit message is nice (although not mandatory for every commit)

These are just common courtesy to your fellow developers, meaning they'll be able to pick up from where you left off for whatever reason.

Using source control like this keeps your code safe, provides an audit trail, and allows others to see your work.

Therefore I urge you to commit often. After all, it's your branch.

Thursday, 27 March 2014

Reliance on implementation details

Recently I stumbled across an issue in a legacy app which didn't appear to make any sense.  The issue involved determining the precision of a Decimal which was giving different results for exactly the same value.

First of all I wrote a quick test to attempt to replicate the problem, which appeared to happen for 0.01:

This passed. Then I noticed that one particular method call had a signature expecting a Decimal but was instead being supplied a Float (yes, Option Strict was off [1]), meaning the Float was being implicitly converted. Quickly writing a test incorporating the conversion reproduced the issue:

It seems to think 0.01 is to 3 decimal places!

So what's going on here? How can a conversion affect the result of Precision()? Looking at the implementation I could see it was relying on the individual bits the Decimal is made up from, using Decimal.GetBits() to access them:

The result of Decimal.GetBits() is a 4 element array, of which the first 3 elements hold the bits of the integer value of the Decimal.  However this method relies only on the fourth element, which holds the sign and the exponent (in bits 16 to 23). In the first test the decimal had a value of 1 and a fourth element of 131072; in the failed test they were 10 and 196608.

When converting to binary we see the difference more clearly. I've named them bitsSingle for the failed test and bitsDecimal for the passing test:

As you can see the exponent for bitsSingle is 3 (00000011) whereas the exponent for bitsDecimal is 2 (00000010), which represent negative powers of 10.
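The numbers above can be reproduced with a short sketch. Rather than depending on how any particular runtime converts a Float, I build the two decimals explicitly with the five-argument Decimal constructor, using the values from the two tests; Dump and the variable names are my own.

```csharp
using System;

class Program
{
    static void Dump(string label, decimal value)
    {
        int[] bits = decimal.GetBits(value);

        // Elements 0-2 hold the 96-bit integer value; element 3 holds sign and exponent.
        int exponent = (bits[3] >> 16) & 0xFF;

        Console.WriteLine(label + ": value=" + bits[0]
            + " flags=" + bits[3]
            + " exponent=" + exponent
            + " binary=" + Convert.ToString(bits[3], 2).PadLeft(32, '0'));
    }

    static void Main()
    {
        // new decimal(lo, mid, hi, isNegative, scale)
        decimal bitsDecimal = new decimal(1, 0, 0, false, 2);  // 1 x 10^-2
        decimal bitsSingle = new decimal(10, 0, 0, false, 3);  // 10 x 10^-3

        Dump("bitsDecimal", bitsDecimal); // value=1 flags=131072 exponent=2
        Dump("bitsSingle", bitsSingle);   // value=10 flags=196608 exponent=3
    }
}
```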

Looking back at the original numbers we can see how these both accurately represent 0.01:

bitsSingle has a value of 10, with an exponent of -3: 10 × 10^-3 = 0.01
bitsDecimal has a value of 1, with an exponent of -2: 1 × 10^-2 = 0.01

As you can see, Decimal can represent the same value even though the underlying data differs. Precision() relies only on the exponent and ignores the value, meaning it's not taking into account the full picture.
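Here's a sketch of the trap. The Precision method below is my guess at the shape of the legacy code (it reads only the exponent), not the actual implementation, and the two decimals are built by hand to match the values from the tests.

```csharp
using System;

class Program
{
    // Assumed shape of the legacy method: looks only at the exponent bits.
    static int Precision(decimal value)
    {
        int flags = decimal.GetBits(value)[3];
        return (flags >> 16) & 0xFF;
    }

    static void Main()
    {
        decimal direct = new decimal(1, 0, 0, false, 2);     // 1 x 10^-2
        decimal converted = new decimal(10, 0, 0, false, 3); // 10 x 10^-3

        // The values are equal as far as the public interface is concerned...
        Console.WriteLine(direct == converted); // True

        // ...but peeking at the bits gives two different answers.
        Console.WriteLine(Precision(direct));    // 2
        Console.WriteLine(Precision(converted)); // 3
    }
}
```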

But why is the conversion storing this number differently than when instantiated directly?  It just so happens that creating a new Decimal (which uses the Decimal constructor) uses a slightly different logic than that of the cast. So even though the number is correct, the underlying data is slightly different.

This brings us to the point of the article.  The big picture here is to remember that you should never rely on implementation details, only on what can be accessed through defined interfaces, whether that be a webservice, reflection on a class, or peeking into the individual bits of a datatype.  Implementation details not only can change but, in the world of software, are expected to.

If you want to play around with the examples above I've uploaded them to GitHub.

[1] I know it's not okay and there isn't a single good reason for this; however, as usual with a legacy app, we simply don't have the time / money to explicitly convert every single type in a 20,000+ LOC project.

Wednesday, 18 December 2013

Highlights of the year (literally)

As the end of the year approaches, I thought it'd be prudent to make a list of all the nuggets of advice and insight I've read this year:

Effective Programming: More Than Writing Code (Jeff Atwood)

It’s amazing how much you find you don’t know when you try to explain something in detail to someone else. It can start a whole new process of discovery.
There's no question that, for whatever time budget you have, you will end up with better software by releasing as early as practically possible, and then spending the rest of your time iterating rapidly based on real-world feedback. So trust me on this one: even if version 1 sucks, ship it anyway. 

Lehman's laws of software evolution
As an evolving program is continually changed, its complexity, reflecting deteriorating structure, increases unless work is done to maintain or reduce it.

Scrum: A Breathtakingly Brief and Agile Introduction (Chris Sims, Hillary Louise Johnson)
The daily scrum should always be held to no more than 15 minutes.

(Matt Asay)
Oracle has never been particularly community-friendly. Even the users that feed it billions in sales every quarter don't particularly love it.

The Art of Unit Testing: with Examples in .NET (Roy Osherove)
Finally, as a friend once said, a good bottle of vodka never hurts when dealing with legacy code.

Thursday, 17 October 2013

Recently I had the need to decode a Base64 string and make a PDF of it.  Usually I would've written a small utility app, but this time I rolled with PowerShell:

I'm impressed with how quickly I can knock out a script like this (yes they are .NET assemblies) without having to load a new VS solution. Of course a lot more could be done to this (file format via an argument for example) but I thought I'd share it raw as I know I'll need to use it again one day.
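For comparison, this is roughly what the small utility app would have looked like in C#. It's a sketch built on the same .NET calls a script can reach (Convert.FromBase64String and File.WriteAllBytes); SaveBase64, the sample text and the output file name are all assumptions of mine.

```csharp
using System;
using System.IO;
using System.Text;

class Program
{
    // Hypothetical helper: decodes a Base64 string and writes the bytes to a file.
    static void SaveBase64(string base64, string outputPath)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        File.WriteAllBytes(outputPath, bytes);
    }

    static void Main()
    {
        // Stand-in input: a real run would use the Base64 text of an actual PDF.
        string base64 = Convert.ToBase64String(Encoding.ASCII.GetBytes("%PDF-1.4 sample"));

        string outputPath = Path.Combine(Path.GetTempPath(), "decoded.pdf");
        SaveBase64(base64, outputPath);

        Console.WriteLine(File.ReadAllText(outputPath)); // %PDF-1.4 sample
    }
}
```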

Monday, 26 August 2013

The myth of software development

When you're developing software, have you ever thought "once this feature is complete I'll be done"? I'm the first to admit that there is always an end point in sight, believing once I've reached it I'll be able to say I'm finished.

Well, guess what: software can never be considered finished. Don't believe me?  Then why is Windows XP still being updated almost 12 years after its initial release?

Psychologically, a lot of people compare a software project with more traditional types of project such as construction; however, they are completely incomparable:

  • Software only ever reaches a state of acceptable functionality
  • Software is infinitely malleable meaning it can never reach a state of 'done'

Both of these reasons, as well as showing that software isn't comparable to construction, show that starting a software project again is very rarely the right choice. Instead, adapt the software into its new state of acceptable functionality.

This is because software is the cumulative sum of all previous work; even reasonably small products will be the culmination of many man-years.  In addition, users understand how it works and all of the quirks of its features, including how to use them to the organisation's advantage.

Therefore no matter how spaghetti-like, ill-named and awkward that legacy project is, it is almost never the right decision to start again from scratch.

Which is exactly why code needs to be maintainable: you almost certainly won't be the only person who has to look after it.  Tools such as ReSharper can help with this, and are great for transforming a spaghetti-ridden legacy project into something you can work with (you may even manage to get some unit test coverage!).

Therefore next time you want to start again from scratch think very carefully, as it's almost never the right choice.