Rely on your experience and knowledge over some tools’ recommendations

Whenever .NET developers tell me they know C# really well, I can’t help but wonder whether they mean 1) the programming language itself (its syntax, its grammar and its constructs), 2) the compiler or 3) both. The compiler is the heart of any programming language, and if you respect it when you code, you can write very efficient and productive code with it. On the other hand, if you focus only on the language and disregard the inner workings of the compiler, chances are you will hurt the health of your software.

For example, I tend to see a severe, chronic problem among C# developers when it comes to boxing and unboxing operations. Reading newsgroup posts and blog comments, and working with many .NET developers who had never heard these terms before, made me aware of how widespread this problem is. The phenomenon is even more pronounced among developers who are too keen on coding/refactoring tools like ReSharper, because they tend to rely on the suggestions of those tools rather than on deeper knowledge to decide whether a change in the code is warranted. Before building on this idea, it’s worth refreshing our understanding of boxing and unboxing in .NET. For this, I’ll share some notes from Jeffrey Richter’s excellent book on .NET programming, CLR via C# (second edition).

Let us consider what happens when boxing occurs:

It’s possible to convert a value type to a reference type by using a mechanism called boxing. Internally, here’s what happens when an instance of a value type is boxed:

  1. Memory is allocated from the managed heap. The amount of memory allocated is the size required by the value type’s fields plus the two additional overhead members (the type object pointer and the sync block index) required by all objects on the managed heap.
  2. The value type’s fields are copied to the newly allocated heap memory.
  3. The address of the object is returned. This address is now a reference to an object; the value type is now a reference type.

The C# compiler automatically produces the IL code necessary to box a value type instance, but you still need to understand what’s going on internally so that you’re aware of code size and performance issues.

Note that last sentence in particular. What Jeffrey is saying there is extremely important: because some tools aren’t smart enough to account for every mechanism of the compiler, “you still need to understand what’s going on internally so that you’re aware of code size and performance issues”. Let us move on.
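The three boxing steps can be observed from plain C#. The following is a minimal sketch (C# 9+ top-level statements; the variable names are my own, for illustration only) showing that boxing copies the value into a separate heap object: changing the original afterwards leaves the boxed copy untouched, and boxing the same value twice yields two distinct objects.

```csharp
using System;

int original = 5;

// Boxing: the compiler emits a "box" instruction here. A new object is
// allocated on the managed heap and the int's value is copied into it.
object boxed = original;

// The box holds its own copy, so changing the original afterwards
// does not affect it.
original = 10;
Console.WriteLine(original);        // 10
Console.WriteLine(boxed);           // 5

// The boxed object still knows its real type through its type object pointer.
Console.WriteLine(boxed.GetType()); // System.Int32

// Every boxing operation allocates a separate heap object.
object first = original;
object second = original;
Console.WriteLine(object.ReferenceEquals(first, second)); // False
```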

Let us now consider what happens during unboxing:

Unboxing is not the exact opposite of boxing. The unboxing operation is much less costly than boxing. Unboxing is really just the operation of obtaining a pointer to the raw value type (data fields) contained within an object. In effect, the pointer refers to the unboxed portion in the boxed instance. So, unlike boxing, unboxing doesn’t involve the copying of any bytes in memory. Having made this important clarification, it is important to note that an unboxing operation is typically followed by copying the fields. Internally, here’s exactly what happens when a boxed value type instance is unboxed:

  1. If the variable containing the reference to the boxed value type is null, a NullReferenceException is thrown.
  2. If the reference doesn’t refer to an object that is a boxed instance of the desired value type, an InvalidCastException is thrown.
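Both unboxing rules can be triggered on purpose. This is a small sketch (C# 9+ top-level statements; the variable names are illustrative) that catches each exception and also shows the usual workaround for the second rule: unbox to the exact type first, then convert.

```csharp
using System;

object boxedInt = 42;

// Unboxing to the exact value type succeeds.
int value = (int)boxedInt;
Console.WriteLine(value); // 42

// Rule 1: unboxing a null reference throws NullReferenceException.
bool sawNullReference = false;
object nothing = null;
try
{
    int fromNull = (int)nothing;
}
catch (NullReferenceException)
{
    sawNullReference = true;
}

// Rule 2: the object must be a boxed instance of exactly the requested type.
// A boxed Int32 cannot be unboxed as Int64, even though int widens to long.
bool sawInvalidCast = false;
try
{
    long asLong = (long)boxedInt;
}
catch (InvalidCastException)
{
    sawInvalidCast = true;
}

// Workaround: unbox to the exact type first, then widen the int to a long.
long widened = (int)boxedInt;

Console.WriteLine(sawNullReference); // True
Console.WriteLine(sawInvalidCast);   // True
Console.WriteLine(widened);          // 42
```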

Now that we have a general idea of what boxing and unboxing are and how they work, we should consider the following as an important principle:

Obviously, boxing and unboxing/copy operations hurt your application’s performance in terms of both speed and memory, so you should be aware of when the compiler generates code to perform these operations automatically and try to write code that minimizes this code generation.

In a nutshell, you get boxing whenever a value type is converted to a reference type. Conceptually, unboxing is the reverse operation: a reference type (such as an instance of Object) is converted back to a value type. In .NET, value types are all the types that derive from System.ValueType, such as the familiar primitive types (Int32, Boolean, Char, Decimal, etc.), structures and enumerations. Reference types are classes (including exceptions), interfaces, delegates, arrays and strings (basically anything that doesn’t derive from System.ValueType). The key point is not necessarily to master the mechanisms the compiler uses, but rather to be conscious that there is a performance hit whenever boxing and unboxing occur. Since the introduction of generics in .NET 2.0, we don’t have to deal with these concepts as much, because generics provide type safety and better performance by avoiding boxing/unboxing operations. But not everything in .NET is generic, and it’s extremely important to discover the potential “danger zones” where boxing/unboxing can occur within the framework, especially if you’re using development tools like ReSharper.
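The difference generics make is easiest to see by comparing a non-generic collection with its generic counterpart. Here is a minimal sketch (C# 9+ top-level statements) that sums the same five integers stored once in an ArrayList, which holds object references, and once in a List&lt;int&gt;, which holds the ints directly.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Non-generic ArrayList stores object references: each Add boxes the int,
// and each read needs a cast, which unboxes it.
var untyped = new ArrayList();
for (int i = 0; i < 5; i++)
{
    untyped.Add(i);          // box
}
int untypedSum = 0;
foreach (object item in untyped)
{
    untypedSum += (int)item; // unbox
}

// Generic List<int> stores the ints directly: no boxing, no casts, and the
// element type is checked at compile time (typed.Add("six") would not compile,
// whereas untyped.Add("six") would compile and only fail later, at the cast).
var typed = new List<int>();
for (int i = 0; i < 5; i++)
{
    typed.Add(i);            // no box
}
int typedSum = 0;
foreach (int item in typed)
{
    typedSum += item;        // no unbox
}

Console.WriteLine(untypedSum); // 10
Console.WriteLine(typedSum);   // 10
```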

I’m a big fan of JetBrains and of all their products that help me be more effective as a developer. Their continuous integration tool, TeamCity, is in my opinion the most innovative and complete software for setting up a CI environment. Their dotTrace profiler is also one of those tools I install whenever I set up a fresh development environment. But the one tool I cannot code without is ReSharper. I believe so strongly that it makes me more productive and effective that I even wrote a post about why the enterprise should seriously consider ReSharper for its developers. Unfortunately, ReSharper isn’t perfect. Excellent, yes, but not perfect. This means developers should still use their knowledge of the programming language and the compiler to take full advantage of the .NET platform, and this is even more true wherever boxing/unboxing can occur. For example, consider the following code:

private static void Main()
{
  string name = "Brian";
  int age = 27;
  System.Console.WriteLine("Hello, my name is " + name + " and I'm " + age + " years old.");
}

At first sight, the above code seems right, and I would agree with you to some extent: it compiles just fine and outputs the correct message on the console. If you had ReSharper installed, it would also report that the code is well formatted and that you’re good to check it in (the feedback for this is a little green icon on the top right side of the code window in Visual Studio). The underlying problem is that your application takes a performance hit because a boxing operation occurs. In fact, if you view the IL emitted by the compiler, you will see a box instruction, which makes the CLR box the value type. The reason is that the compiler turns the + concatenation into a call to one of the overloads of String.Concat() that accept Object parameters, and only then passes the resulting string to Console.WriteLine() (a tool like Reflector or ILDASM can help you see this for yourself).
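To make the hidden box visible in source form, here is a hand-written approximation of what the compiler produced for that concatenation at the time: one String.Concat call over an object array, with the int boxed to fit into it. This is a sketch (C# 9+ top-level statements), not the exact IL; which String.Concat overload gets picked depends on the compiler version and the number of operands, and newer compilers may generate different code, so check the IL for your own toolchain.

```csharp
using System;

string name = "Brian";
int age = 27;

// The concatenation as written in the post.
string viaPlus = "Hello, my name is " + name + " and I'm " + age + " years old.";

// Approximation of the compiled form: a single String.Concat call over
// Object operands. The int has to be boxed to fit into the object[] array.
object boxedAge = age; // corresponds to the IL "box" instruction
string viaConcat = string.Concat(new object[]
{
    "Hello, my name is ", name, " and I'm ", boxedAge, " years old."
});

Console.WriteLine(viaConcat);            // Hello, my name is Brian and I'm 27 years old.
Console.WriteLine(viaPlus == viaConcat); // True
```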

This is where knowledge and experience kick in. The proper way to write the first example is as follows:

private static void Main()
{
  string name = "Brian";
  int age = 27;
  System.Console.WriteLine("Hello, my name is " + name + " and I'm " + age.ToString() + " years old.");
}

The code above is superior because you get all the benefits of the first version without the performance hit, since no boxing occurs (again, you can check this yourself by viewing the IL emitted by the compiler). Because Int32 overrides ToString(), calling ToString() on the value type makes the CLR invoke that method directly on that particular instance, instead of boxing the value type into a reference type and then calling ToString() on the boxed object. If you call Console.WriteLine( age ) instead, you might be surprised to learn that no boxing occurs either, because one of the overloads of that method accepts an Int32 as a parameter.
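The distinction comes down to which method each call binds to. A small sketch (C# 9+ top-level statements) follows; the comments describe the overloads C# overload resolution picks, and the absence of a box instruction in the first two calls is something to confirm in the IL with ILDASM rather than something the program itself can print.

```csharp
using System;

string name = "Brian";
int age = 27;

// Int32 overrides ToString(), so the compiler can call Int32.ToString()
// directly on the value: no box instruction is needed.
string ageText = age.ToString();

// All operands are now strings, so the concatenation involves only strings.
string message = "Hello, my name is " + name + " and I'm " + ageText + " years old.";
Console.WriteLine(message);

// Overload resolution picks Console.WriteLine(int) here, so no boxing occurs.
Console.WriteLine(age);

// Storing the value in an object variable is what forces a box; this call
// then binds to Console.WriteLine(object).
object boxedAge = age;
Console.WriteLine(boxedAge);
```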

This is why I said that knowledge and experience count: unless you know these nifty facts, you could blindly follow your tools’ recommendations, which might include some false positives. An example of a false positive is ReSharper graying out the ToString() call on the age variable and recommending that you remove it. Most developers using ReSharper will blindly follow such a recommendation and remove the ToString() invocation on the value type, either by hitting ALT+ENTER on the line and letting ReSharper remove it, or by reformatting the whole code file with CTRL+ALT+SHIFT+F. Though the above example seems negligible, imagine this kind of operation in a loop, or anywhere else a boxing/unboxing operation might occur in a library or within the framework itself. Profiling the application should reveal such a performance bottleneck. In other words, don’t checkmate yourself by blindly following your tools’ recommendations!
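To picture the loop scenario, here is a hypothetical sketch (C# 9+ top-level statements; the loop and the count of 1,000 are made up for illustration) that builds the same comma-separated text twice. The first loop passes each int to StringBuilder.AppendFormat(string, object), which boxes it on every iteration; the second uses the Int32 overload of StringBuilder.Append, which does not box. A profiler such as dotTrace is the right tool to measure how much the difference matters in a real application.

```csharp
using System;
using System.Text;

// Hypothetical loop that writes 1,000 numbers as a comma-separated list.
const int count = 1000;

// AppendFormat(string, object) takes the number as an object, so the int
// is boxed once per iteration.
var withBoxing = new StringBuilder();
for (int i = 0; i < count; i++)
{
    withBoxing.AppendFormat("{0},", i);
}

// StringBuilder.Append has an Int32 overload, so this loop does not box.
var withoutBoxing = new StringBuilder();
for (int i = 0; i < count; i++)
{
    withoutBoxing.Append(i).Append(',');
}

Console.WriteLine(withBoxing.ToString() == withoutBoxing.ToString()); // True
```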

Even though I only touched on the subject of boxing and unboxing, it’s important to understand that there are many other areas to which this post’s ideas apply: for example, knowing when to explicitly implement an interface, deciding when to favor composition over inheritance, justifying the encapsulation level of a class and its members, etc. These are decisions some tools can’t give you a hint for, so you should rely on your knowledge and experience of the platform and of the programming paradigms (object-oriented, functional, etc.) you’re using. If you have the chance to work in a larger team, I highly recommend that your team conduct code review sessions so that each team member can leverage the others’ knowledge and experience of the platform. On the other hand, if you don’t have the privilege of participating in code reviews, or worse, if your organization doesn’t encourage the practice, then you’re pretty much on your own, but you’re not alone! In that case, I recommend reading the following books, which will no doubt make you a better C#/.NET programmer:


4 Comments

  1. Jon Skeet:

    I’d say that in this case the call to ToString() clutters up the code and makes *almost no* difference to performance. The readability cost of avoiding boxing is worse than the performance hit of boxing.

    What do you think is really going to be the significant cost here – a boxing operation, or writing to the console? In my experience, the cost of boxing is significantly overstated. It’s there, and like any other cost it should be avoided where you can write non-boxing code which is just as readable, but it shouldn’t be given as much prominence as it tends to be. I analysed an example of this from “CLR via C#” in this blog post:

    http://msmvps.com/blogs/jon_skeet/archive/2008/10/08/why-boxing-doesn-t-keep-me-awake-at-nights.aspx

    Having said that, thanks for plugging my book :) (And I can thoroughly recommend “CLR via C#” in general – I have yet to read “Effective C#”.)

  2. Brian Di Croce:

    @Jon. You’re right about the readability of the code, and as always there’s a tradeoff when factors such as performance, security, readability, maintainability, etc., come into play. My intention wasn’t primarily to deal with boxing/unboxing through that little example of writing to the console, but rather to show that a boxing operation will occur even though a tool (like ReSharper in this case) will recommend a “false positive” action, such as not calling ToString() on the value type. I think that if all the members who participate in the code review see no problem with this, then it’s OK. I just don’t want too many developers to follow the recommendations of some tool just because they might be unaware of what’s going on in the background, i.e., boxing/unboxing in this case.

    Thanks for your comment, and also for bringing up other aspects of the code such as readability.

  3. Jon Skeet:

    Yes, it’s definitely worth being aware of it to start with. Mind you, there are some odd cases where boxing might happen unexpectedly – or might *not* happen unexpectedly.

    In the tech review of C# in Depth, Eric Lippert commented: “Though, confusingly enough, calling an interface method on a struct does NOT box it. Casting a struct to an interface type DOES box it. That is, ((IFoo)s).Foo() boxes but s.Foo does not. Lots of people, including myself, get confused by this point.”

    If something confuses Eric, us mortals have little chance of getting it right all the time :) It gets even weirder with generics and constraints, IIRC. But yes, the “simple” cases should be well understood – and a developer should (IMO) have a solid grasp of the language they’re using. It doesn’t have to be perfect, but solid enough.

    My pet peeve at the moment (well one of them) is people using query expressions even when they’re only doing a “select”, or only a “where” followed by a no-op select. Using dot notation is almost always more readable there – but because many developers don’t understand the pure syntactic sugar of query expressions, they’re afraid to move away from it. The good news is that I think the online community is gradually picking up on it. Of course, there’s the main body of the developer iceberg under the surface – those who never read blogs, ask questions on newsgroups etc :(

  4. haile:

    more reference:
    Troelsen explains the baggage of performance issues (in both speed of execution and code size) and a lack of type safety in his book “Pro C# 2008 and the .NET 3.5 Platform, 4th Edition”, page 319.
