This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.

Thursday, 31 May 2012

Important things to remember about the humble string

Strings in C# are a strange bunch of objects that don't quite fit in with the normal rules. We all use them, and they seem simple enough. However, there are a few rules about them that are really important to keep in mind.

Difference between string and String

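You can use either string or String, and neither trumps the other. string is merely a C# keyword alias for System.String, which means it's purely a shortcut that is only there for convenience. Both spellings compile to exactly the same IL, referring to the same System.String type, so there is no difference at all at run time. (ldstr, which is sometimes mentioned here, is just the IL instruction that loads a string literal; it is emitted for literals whichever spelling you use.) In fact string isn't alone; other built-in types have aliases too: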

int, System.Int32
long, System.Int64
bool, System.Boolean
float, System.Single

The full list is in the built-in types table of the C# language reference.
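A quick way to convince yourself that the keyword and the framework type name really are the same type is to compare their Type objects. This is a minimal sketch written as C# 9 top-level statements (so it assumes a .NET 5 or later project, much newer than this post); the variable names are just for illustration.

```csharp
using System;

// Each keyword and its framework type name resolve to the same Type,
// so all of these comparisons are true.
bool stringIsString = typeof(string) == typeof(String);
bool intIsInt32 = typeof(int) == typeof(Int32);
bool boolIsBoolean = typeof(bool) == typeof(Boolean);

// Variables declared either way are interchangeable: no conversion happens here.
string keywordStyle = "hello";
String typeStyle = keywordStyle;

Console.WriteLine(stringIsString);               // True
Console.WriteLine(intIsInt32);                   // True
Console.WriteLine(boolIsBoolean);                // True
Console.WriteLine(typeStyle.GetType().FullName); // System.String
```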

They are a reference type

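When you create a string, only a reference to that string (not the string itself) is stored in the variable. I always found this confusing, as you don't explicitly create a new object with new in order to get a reference to one. String literals are a special case: they are interned, which means each distinct literal is only created once and every use of it shares that same instance. Strings built at run time (by concatenation, a StringBuilder, reading a file and so on) are not interned automatically, but you can get the pooled instance for a string, adding it to the pool if needed, with String.Intern.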
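To see interning in action you have to compare references with object.ReferenceEquals, because == on strings compares characters. A minimal sketch as C# 9 top-level statements with illustrative names; the reference results noted in the comments are what the usual .NET runtimes produce, since literal interning is runtime behaviour rather than something the C# language itself spells out.

```csharp
using System;
using System.Text;

// Two identical literals end up as one interned object.
string literal1 = "hello";
string literal2 = "hello";
bool literalsShareInstance = object.ReferenceEquals(literal1, literal2);

// A string built at run time is a new object, even with the same characters.
string built = new StringBuilder().Append("hel").Append("lo").ToString();
bool builtSharesInstance = object.ReferenceEquals(literal1, built);

// String.Intern returns the pooled instance for these characters
// (adding this one to the pool first if the characters weren't there yet).
string pooled = String.Intern(built);
bool pooledSharesInstance = object.ReferenceEquals(literal1, pooled);

Console.WriteLine(literalsShareInstance); // True
Console.WriteLine(builtSharesInstance);   // False
Console.WriteLine(pooledSharesInstance);  // True
```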

However, the equality operators compare the value

By default the equality operator (==) compares what the variable holds, so for a reference type it only compares the references, not the contents of the objects. Strings are the exception: the string type overloads the equality operators (== and !=) to compare the characters instead. An example:
object a = new object();
object b = new object();

bool isObjectReferenceSame = a == b; // false: two distinct objects, and == on object compares references

string c = "hello";
string d = new string(new[] { 'h', 'e', 'l', 'l', 'o' }); // same characters, but a separate instance

bool isStringSame = c == d; // true: string overloads == to compare the characters, even though string is a reference type
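One consequence worth knowing: the string overload of == is chosen at compile time from the static types of the operands. If the same two strings are held in variables typed as object, you get the plain reference comparison back, while the virtual Equals method still reaches string's character comparison. A minimal sketch as C# 9 top-level statements with illustrative names:

```csharp
using System;

string c = "hello";
string d = new string(new[] { 'h', 'e', 'l', 'l', 'o' }); // same characters, different instance

// Static type is string: the overloaded == compares characters.
bool asStrings = c == d;

// Same two objects, but the static type is object, so the compiler
// picks object's reference comparison instead of string's overload.
object oc = c;
object od = d;
bool asObjects = oc == od;

// Equals is virtual, so string's override runs even through an object reference.
bool viaEquals = oc.Equals(od);

Console.WriteLine(asStrings); // True
Console.WriteLine(asObjects); // False
Console.WriteLine(viaEquals); // True
```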

They're also immutable

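Strings are also immutable (see the Wikipedia article on immutable objects!), which means that once created a string cannot be modified in any way (ie it can't be mutated). Every method that appears to change a string, such as Replace or ToUpper, actually returns a new string. Immutability is also what makes interning safe: if two variables share one interned instance, neither can change it behind the other's back.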
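A minimal sketch of what immutability looks like in practice, as C# 9 top-level statements with illustrative names: every "modifying" call hands back a new string, and the original object stays exactly as it was.

```csharp
using System;

string original = "hello";

// These return brand new strings; original is untouched.
string upper = original.ToUpper();
string replaced = original.Replace('l', 'L');

// += doesn't change the string either: it points the variable
// at a new string made from the old one plus the suffix.
string alias = original;
original += " world";

Console.WriteLine(upper);    // HELLO
Console.WriteLine(replaced); // heLLo
Console.WriteLine(alias);    // hello (the old object is unchanged)
Console.WriteLine(original); // hello world

// original[0] = 'H'; // would not compile: the string indexer is read-only
```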

Which is the reason, the whole reason, and nothing but the reason why we have StringBuilder

Because strings are immutable, every concatenation (+=) has to create a new string to hold the result and copy the existing characters into it, which of course incurs a penalty, and in a loop that cost grows with every iteration. StringBuilder sidesteps this overhead by appending characters into its own mutable, growable buffer and only creating a string instance when you call ToString().
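A minimal sketch comparing the two approaches, as C# 9 top-level statements (the loop count of 1000 is arbitrary). Both produce the same text; the difference is how many intermediate strings get allocated along the way, which this example does not measure.

```csharp
using System;
using System.Text;

const int count = 1000;

// += in a loop: each iteration allocates a new string and copies
// everything built so far into it.
string concatenated = "";
for (int i = 0; i < count; i++)
{
    concatenated += i.ToString();
}

// StringBuilder appends into a growable internal buffer;
// only the final ToString() call creates a string.
var builder = new StringBuilder();
for (int i = 0; i < count; i++)
{
    builder.Append(i);
}
string built = builder.ToString();

Console.WriteLine(concatenated == built); // True
Console.WriteLine(built.Length);          // 2890
```

For a handful of concatenations in a single expression, plain + is fine: the compiler turns a + b + c into a single String.Concat call, so StringBuilder mainly pays off when a string is built up piece by piece in a loop.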

As you can see, strings are not quite as simple as they appear, and they have a few behaviours that really should be understood.