
String vs StringBuilder for the .NET Concatenation Performance Championship

Does your code compile successfully? Good. How about your application’s performance? Could it use some improvement? Did you take the time to answer these questions before releasing it to the Quality Control team or to a potential customer?

You must’ve heard and read about it many times over since the .NET framework was shipped in early 2002. Unfortunately, I still see some .NET developers using the String class as a concatenation harlot in some contexts where using a StringBuilder would be a better match instead.

This post is about knowing when and why you should use the StringBuilder’s appending function in some cases and the String’s concatenation function in others. I’ll be using Red Gate’s ANTS Profiler to profile a simple C# console application to see exactly how much execution time was spent in each string concatenation method for both the String class and the StringBuilder class.

The StringBuilder Class

Before going further into the discussion, I think it is worthwhile to define the purposes of each class. Let us start with the StringBuilder class.

According to the .NET Framework API MSDN documentation, the StringBuilder class represents a mutable string of characters. In order not to reinvent the wheel, let us see what the remarks are about this class [1]:

This class represents a string-like object whose value is a mutable sequence of characters. The value is said to be mutable because it can be modified once it has been created by appending, removing, replacing, or inserting characters.

Most of the methods that modify an instance of this class return a reference to that same instance. Since a reference to the instance is returned, you can call a method or property on the reference. This can be convenient if you want to write a single statement that chains successive operations one after another.

A StringBuilder object maintains a buffer to accommodate the concatenation of new data. New data is appended to the end of the buffer if room is available; otherwise, a new, larger buffer is allocated, data from the original buffer is copied to the new buffer, then the new data is appended to the new buffer.
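To make the remarks above a bit more concrete, here is a minimal sketch of the chaining behaviour they describe. The class and method names (ChainingDemo, BuildGreeting, AppendReturnsSameInstance) are my own and purely illustrative; only StringBuilder and its Append overloads come from the framework.

```csharp
using System;
using System.Text;

public static class ChainingDemo
{
    // Builds a greeting in a single chained statement.
    public static string BuildGreeting()
    {
        StringBuilder sb = new StringBuilder();
        // Each Append returns the same StringBuilder, so the calls can be chained.
        sb.Append("Hello").Append(", ").Append("world").Append('!');
        return sb.ToString();
    }

    // True when Append hands back the very instance it was called on.
    public static bool AppendReturnsSameInstance()
    {
        StringBuilder sb = new StringBuilder();
        return object.ReferenceEquals(sb, sb.Append("x"));
    }

    public static void Main()
    {
        Console.WriteLine(BuildGreeting());
        Console.WriteLine(AppendReturnsSameInstance());
    }
}
```

The second method is the interesting one: it checks that no new object is created by Append, which is exactly what "mutable" means here.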

The String Class

As for the String class, the MSDN documentation has these remarks to say about it [2]:

A string is a sequential collection of Unicode characters that is used to represent text. A String object is a sequential collection of System.Char objects that represent a string. The value of the String object is the content of the sequential collection, and that value is immutable.

A String object is called immutable (read-only) because its value cannot be modified once it has been created. Methods that appear to modify a String object actually return a new String object that contains the modification.
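A short sketch of what "methods that appear to modify a String" means in practice. The names ImmutabilityDemo and ReplaceLeavesOriginalAlone are hypothetical; String.Replace is the real framework method.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns the original string and the "modified" one side by side.
    public static string[] ReplaceLeavesOriginalAlone()
    {
        string original = "test";
        // Replace does not touch 'original'; it returns a new String object.
        string modified = original.Replace('t', 'b');
        return new[] { original, modified };
    }

    public static void Main()
    {
        string[] pair = ReplaceLeavesOriginalAlone();
        Console.WriteLine(pair[0]); // still "test"
        Console.WriteLine(pair[1]); // "besb"
    }
}
```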

Performance Analysis

As stated in the beginning of this post, I am using a very simple application to note the performance differences between the StringBuilder class and the String class when performing multiple concatenations. The application exercises both these classes in two ways.

Concerning the String class, I am performing the following code with a different number of concatenations defined by XX:

    private static void ConcatenateString_XX_Times()
    {
      string s = string.Empty;

      for (int i = 0; i < XX; i++)
      {
        s += "test";
      }
    }

Concerning the StringBuilder class, I am performing the following code with a different number of concatenations defined by XX:

    private static void ConcatenateStringBuilder_XX_Times()
    {
      StringBuilder sb = new StringBuilder();

      for (int i = 0; i < XX; i++)
      {
        sb.Append("test");
      }
    }

NOTE: In both cases, XX defines a number with the following values: 1 to 20 inclusively, 100, 1000, 10 000 and 100 000.
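If you don't have a profiler at hand, the two methods above can be approximated with a parameterized version timed by System.Diagnostics.Stopwatch. This is only a sketch under my own assumptions: the count parameter stands in for XX, the class name ConcatBenchmark and the TimeSeconds helper are made up for the example, and a Stopwatch measures wall-clock time, so the numbers will not match what ANTS Profiler reports.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatBenchmark
{
    // Same loop as ConcatenateString_XX_Times, with XX passed as 'count'.
    public static string ConcatenateString(int count)
    {
        string s = string.Empty;
        for (int i = 0; i < count; i++)
        {
            s += "test";
        }
        return s;
    }

    // Same loop as ConcatenateStringBuilder_XX_Times, with XX passed as 'count'.
    public static string ConcatenateStringBuilder(int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append("test");
        }
        return sb.ToString();
    }

    // Runs one method once and returns the elapsed wall-clock time in seconds.
    public static double TimeSeconds(Func<int, string> method, int count)
    {
        Stopwatch watch = Stopwatch.StartNew();
        method(count);
        watch.Stop();
        return watch.Elapsed.TotalSeconds;
    }

    public static void Main()
    {
        foreach (int count in new[] { 100, 1000, 10000 })
        {
            Console.WriteLine("{0,7}: String {1:F4} s, StringBuilder {2:F4} s",
                count,
                TimeSeconds(ConcatenateString, count),
                TimeSeconds(ConcatenateStringBuilder, count));
        }
    }
}
```

A single run per size is noisy; repeating each measurement a few times and keeping the median would give steadier figures.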

The following figure shows a summary generated by ANTS Profiler which represents the execution time (in seconds) each method took to run on a Dell Inspiron 9300 (1.86 GHz Pentium M CPU with 2 GB of RAM).


Figure 1. A summary of the execution time it took to run each method
Results generated by ANTS Profiler 3.1.0

Here's the same representation shown in Table 1 (for the String class) and Table 2 (for the StringBuilder class). I have omitted showing the results for the first 20 concatenations because both the String and StringBuilder seem to have taken approximately the same time to perform their corresponding methods (compare lines 11-30 for the String class and lines 35-54 for the StringBuilder class).

Table 1. Summary of the time taken to run each concatenation method for the String class for 100, 1000, 10 000 and 100 000 string concatenations

# of concatenations   100      1000     10 000   100 000
Time (sec.)           0.0004   0.0145   0.785    189

Table 2. Summary of the time taken to run each concatenation method for the StringBuilder class for 100, 1000, 10 000 and 100 000 string concatenations

# of concatenations   100      1000     10 000   100 000
Time (sec.)           0.0003   0.0029   0.0295   0.297

Table 3. Proportional differences between the String and StringBuilder classes for 100, 1000, 10 000 and 100 000 concatenations

# of concatenations   Observations
100                   The StringBuilder takes approximately the same time as the String class
1000                  The StringBuilder is 5 times faster than the String
10 000                The StringBuilder is approx. 27 times faster than the String
100 000               The StringBuilder is approx. 636 times faster than the String

We can interpret the above tables as follows:

  • In the first round, the StringBuilder and the String classes perform about the same when concatenating 100 strings in a for loop.
  • In the second round, we multiply the previous number of concatenations by 10 (100 x 10 = 1000). The StringBuilder takes approximately 10 times as long as in the first round (0.0029/0.0003 = 9.67, which we'll round to 10 to simplify our reasoning). That follows a directly proportional line: 10 times more strings to concatenate, approximately 10 times more time to complete, therefore a 1:1 ratio. In contrast, the String class takes 36.25 times as long as in the first round (0.0145/0.0004 = 36.25). Its running time therefore grew about 3.6 times faster than the StringBuilder's, and for 1000 string concatenations the StringBuilder is already 5 times faster than the String class (0.0145/0.0029 = 5). Let us continue...
  • In the third round, we again multiply the previous number of concatenations by 10 (1000 x 10 = 10 000). The StringBuilder takes (again) approximately 10 times as long as in the previous round (0.0295/0.0029 = 10.1724, which we'll round to 10 again), so the 1:1 ratio still applies. In contrast, the String class takes 54.14 times as long as in the previous round (0.785/0.0145 = 54.14 approximately). Its running time grew about 5.4 times faster than the StringBuilder's, and for 10 000 string concatenations the StringBuilder is approximately 27 times faster than the String class (0.785/0.0295 = 26.6). Let us continue...
  • In our last round, we yet again multiply the previous number of concatenations by 10 (10 000 x 10 = 100 000). The StringBuilder takes (AGAIN!) approximately 10 times as long as in the previous round (0.297/0.0295 = 10.0678), so the 1:1 ratio still applies. In contrast, the String class takes 240.76 times as long as in the previous round (189/0.785 = 240.76 approximately). Its running time grew about 24 times faster than the StringBuilder's, and for 100 000 string concatenations the StringBuilder is approximately 636 times faster than the String class (189/0.297 = 636.4). Ok, that's enough for now.
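The arithmetic in the rounds above is easy to get wrong by hand, so here is a small sketch that recomputes it from the figures in Table 1 and Table 2. The class RatioCheck and its Growth and Speedup helpers are names I made up for this check; the only data in it are the measured times already shown in the tables.

```csharp
using System;

public static class RatioCheck
{
    // Times in seconds, copied from Table 1 (String) and Table 2 (StringBuilder).
    public static readonly double[] StringTimes = { 0.0004, 0.0145, 0.785, 189 };
    public static readonly double[] BuilderTimes = { 0.0003, 0.0029, 0.0295, 0.297 };
    public static readonly int[] Counts = { 100, 1000, 10000, 100000 };

    // How much longer a class took in this round than in the previous one.
    public static double Growth(double[] times, int round)
    {
        return times[round] / times[round - 1];
    }

    // How many times faster the StringBuilder was than the String in this round.
    public static double Speedup(int round)
    {
        return StringTimes[round] / BuilderTimes[round];
    }

    public static void Main()
    {
        for (int round = 1; round < Counts.Length; round++)
        {
            Console.WriteLine(
                "{0,7} concatenations: String x{1:F2}, StringBuilder x{2:F2}, speedup {3:F1}",
                Counts[round],
                Growth(StringTimes, round),
                Growth(BuilderTimes, round),
                Speedup(round));
        }
    }
}
```

Keeping the growth factor (round over round) separate from the speedup (String over StringBuilder in the same round) avoids mixing the two ratios.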

In computer science, we use the Big-O notation to evaluate an algorithm's performance. For instance, we'd say that up to 100 000 string concatenations (because that's how far we've gone with our tests), the StringBuilder's Append method gives us O(n) or linear performance. On the other hand, due to the immutable nature of the String class, every time we concatenate a string to it in a loop the whole existing string is copied into a new one, so we end up with roughly O(n^2) or quadratic performance, which in itself is a performance disaster compared to O(n). This observation is only valid for our context. We should try the same test with more than a million string concatenations to see if our observation still fits.
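One way to see where the quadratic behaviour comes from is to count characters instead of seconds. The sketch below is a simplified model, not a measurement: it assumes that each s += piece copies the whole new string once, and that a StringBuilder copies each appended character once (it ignores the occasional buffer reallocation). The names CopyModel, CharsCopiedByString and CharsCopiedByBuilder are hypothetical.

```csharp
using System;

public static class CopyModel
{
    // Model: concatenation number i produces a new string of i * pieceLength
    // characters, all of which have to be written into the new object.
    public static long CharsCopiedByString(int n, int pieceLength)
    {
        long total = 0;
        for (int i = 1; i <= n; i++)
        {
            total += (long)i * pieceLength;
        }
        return total; // equals pieceLength * n * (n + 1) / 2
    }

    // Model: every appended character is written into the buffer exactly once.
    public static long CharsCopiedByBuilder(int n, int pieceLength)
    {
        return (long)n * pieceLength;
    }

    public static void Main()
    {
        foreach (int n in new[] { 100, 1000, 10000, 100000 })
        {
            Console.WriteLine("{0,7}: String {1}, StringBuilder {2}",
                n, CharsCopiedByString(n, 4), CharsCopiedByBuilder(n, 4));
        }
    }
}
```

With 4-character pieces, the model gives 20 200 copied characters for 100 concatenations and 2 002 000 for 1000: ten times more work items, roughly a hundred times more copying, which is the n^2 shape. The StringBuilder column simply grows from 400 to 4000.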

In order to respect the DRY principle, I'm not going to show the IL code generated when using a StringBuilder's Append method or a String's Concat method, because you can already find such information in several posts (see the References section at the end of this post). I do recommend taking some time to view the internals of both of these methods using a tool like ILDASM or Reflector.

Discussion and conclusion

Some people who wrote about the same subject recommend using the StringBuilder class over the String class once you are concatenating more than a certain number of strings. For instance, Mahesh Chand recommends using the StringBuilder if you have to concatenate a string more than 10 times, and he supports his recommendation with a pretty simple and realistic demo. In his excellent article on the same subject, Jouni Heikniemi concludes that there's a "magic number" that helps decide when to use the StringBuilder's concatenation over the String's, and that number is between four and eight concatenations. Finally, in his thorough and very well detailed article on the same subject, David Cumps concludes with the following:

  • If you can avoid concatenating, do it! This is a no brainer, if you don't have to concatenate but want your source code to look nice, use the first method. It will get optimized as if it was a single string.
  • Don't use += concatenating ever. Too much changes are taking place behind the scene, which aren't obvious from my code in the first place. I advise to rather use String.Concat() explicitly with any overload (2 strings, 3 strings, string array). This will clearly show what your code does without any surprises, while allowing yourself to keep a check on the efficiency.
  • Try to estimate the target size of a StringBuilder. The more accurate you can estimate the needed size, the less temporary strings the StringBuilder will have to create to increase its internal buffer.
  • Do not use any Format() methods when performance is an issue. Too much overhead is involved in parsing the format, when you could construct an array out of pieces when all you are using are {x} replaces. Format() is good for readability, but one of the things to go when you are squeezing all possible performance out of your application.
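Two of the recommendations above translate directly into code, so here is a minimal sketch of them. It uses the StringBuilder(int capacity) constructor and the three-string String.Concat overload, both of which exist in the framework; the class CapacityDemo and the methods Repeat and JoinThree are names I invented for the illustration.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    // Estimating the target size: the final length is known in advance here,
    // so the buffer is reserved once and should not need to grow.
    public static string Repeat(string piece, int count)
    {
        StringBuilder sb = new StringBuilder(piece.Length * count);
        for (int i = 0; i < count; i++)
        {
            sb.Append(piece);
        }
        return sb.ToString();
    }

    // A fixed number of strings: one explicit String.Concat call
    // instead of a chain of += statements.
    public static string JoinThree(string a, string b, string c)
    {
        return string.Concat(a, b, c);
    }

    public static void Main()
    {
        Console.WriteLine(Repeat("test", 3));
        Console.WriteLine(JoinThree("String", " vs ", "StringBuilder"));
    }
}
```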

Though I firmly agree with the previous recommendations, I think that we should also consider what the MSDN documentation has to say about the performance and memory allocation of both classes:

The performance of a concatenation operation for a String or StringBuilder object depends on how often a memory allocation occurs. A String concatenation operation always allocates memory, whereas a StringBuilder concatenation operation only allocates memory if the StringBuilder object buffer is too small to accommodate the new data. Consequently, the String class is preferable for a concatenation operation if a fixed number of String objects are concatenated.

In that case, the individual concatenation operations might even be combined into a single operation by the compiler. A StringBuilder object is preferable for a concatenation operation if an arbitrary number of strings are concatenated; for example, if a loop concatenates a random number of strings of user input. [1]

If it is necessary to modify the actual contents of a string-like object, use the System.Text.StringBuilder class. [2]

My conclusion is that you shouldn't use the String class to concatenate multiple strings just to properly align and format your code (the visual aspect of the code). Even if you're concatenating 8, 9 or 10 strings inside a method, prefer the StringBuilder's Append method, especially if that method will be invoked repeatedly.

I also believe that using tools like Red Gate ANTS Profiler, JetBrains dotTrace Profiler and the Microsoft .NET CLR Profiler to analyze your application's performance and memory allocation will help you further in improving your design and implementation code, as well as pinpointing the bottlenecks in your application.

Sometimes a successful compilation just isn't enough.

References

  1. "StringBuilder Class", MSDN Documentation, Microsoft Corp.
  2. "String Class", MSDN Documentation, Microsoft Corp.
  3. "StringBuilder and String Concatenation" by Mahesh Chand (2002)
  4. ".NET String vs. StringBuilder - Concatenation Performance" by Jouni Heikniemi (2004)
  5. "String Concatenation vs Memory Allocation" by David Cumps (2007)
  6. "Immutable types: understand their benefits and use them" by Patrick Smacchia (2008)
  7. "Everything Is Fast For Small n" by Jeff Atwood (2007)
  8. "Big-O Notation", Wikipedia

Explicit Interface Members Implementation in C#

A popular and accepted definition of craftsmanship is

Aptitude, skill, or manual dexterity in the use of tools or material. Taking time to make sure a project is done well.

The tools or materials involved in software development are whatever you need and use to build your software product well. These might include various things such as IDEs, compilers, CASE tools, frameworks, APIs, external tools, and much more. These kinds of tools and materials are things that you can buy or try, and then use them as a black box, without much knowledge on their intrinsic mechanics. You know what they’re supposed to do, so you use them in a specific context.

The aptitude, skill, or manual dexterity part of the definition is more subtle, as it can mean a lot of things to a lot of people, depending on the experience and knowledge of each individual. I believe that your attitude towards your work is also a key factor that can make the difference between delivering good software and delivering better software. That attitude is what separates artists and craftsmen from the rest of the crowd. I also believe that anyone who has the will to improve their aptitude, skill, or manual dexterity to develop and deliver better software can make that leap.

This post has to do with going the extra mile (or the extra bit I should probably say…) when designing your software. I’ll take a few minutes to write about explicit interface member implementation, a useful pattern when you need your design to respect the framework’s rules (as well as the compiler’s) without fully submitting to them. Submitting to those rules might force you to please the frameworks and tools instead of pleasing the actual user of your class in terms of code readability and testability. As you probably know, even in software development, some rules can be broken and some others can be bent. In this case, we’re going to bend them…else good luck arguing with the compiler!

According to Krzysztof Cwalina and Brad Abrams’s book “Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries”,

Explicit member implementation allows an interface member to be implemented such that it is only available when cast to the interface type.

In other words, there are two ways to access an interface member that is being explicitly implemented:

  • Access it by explicitly casting the object into the interface type holding that member.
  • Access it from inside the type declaring that member, which still requires going through a reference of the interface type (for example by casting this).

It is a very useful concept to know and to use because it helps to design according to the specification of the type and not according to the specification of a framework or an API.
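Here is a minimal sketch of those two ways of reaching an explicitly implemented member. The IGreeter interface and the Robot class are hypothetical and exist only for this illustration.

```csharp
using System;

public interface IGreeter
{
    string Greet();
}

public class Robot : IGreeter
{
    // Explicit implementation: no access modifier, name qualified by the interface.
    string IGreeter.Greet()
    {
        return "beep";
    }

    // From inside the class, the member is still reached through the interface type.
    public string GreetTwice()
    {
        IGreeter self = this;
        return self.Greet() + " " + self.Greet();
    }
}

public static class ExplicitAccessDemo
{
    public static void Main()
    {
        Robot robot = new Robot();
        // robot.Greet();  // would not compile: Greet is not part of Robot's public surface
        Console.WriteLine(((IGreeter)robot).Greet());
        Console.WriteLine(robot.GreetTwice());
    }
}
```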

For example, suppose we have a generic collection of books, such as described in Listing 1. This class offers the possibility to add a bunch of books to a book collection, but for some reason we don’t want the user of this class to just add a single book or “multiple single books”. If a single book needs to be added, we just create a list of one book and pass that list to the Add method of the class.  Hey, it’s a specification!  What can you do?

Listing 1.

using System.Collections;
using System.Collections.Generic;

public class BookCollection : ICollection<Book>
{
    public void Add( Book[] books )
    {
        // This is our own Add method.
        // We want the user of this class to add a list of books to the book collection.
    }

    public void Add( Book item )
    {
        // This is the Add method from the ICollection<Book> interface that this class needs to implement.
        // We're respecting the rule of the framework...and satisfying Resharper
    }

    // The rest of this class's members will be ignored for now...
}

Now suppose we want to use this class under Visual Studio 2005. Listing 2 shows what Intellisense offers us when we use the Add method of our BookCollection class.

Listing 2.

“So which one should we use?” Actually that’s the wrong question to ask. The real question is “Which one do we need to use?” The answer is in our specification. The specification for this class states that we want to add a list of books. Therefore, we don’t want Intellisense to show, nor do we want to implicitly use, the Add method required by the ICollection interface implementation. Therefore, we should hide it. As a matter of fact, since we can’t remove it (because doing so will break the contract defined by the interface), we have no other choice but to hide it or make it private. But just because we’ll hide it, it doesn’t mean that the user can’t explicitly invoke it. Remember that rules can be broken and bent. I’ll show later how to use the hidden implementation of the method (in the name of human curiosity…)

In order to hide this member, we’ll have to explicitly implement it in our class. In C#, you explicitly implement a member of an interface by qualifying the member’s name with the interface name and omitting the access modifier, in the following form:

Return-value InterfaceName<GenericType>.MethodName(Parameters with their types) { // body… }

In our case, we’ll explicitly implement the Add (Book item) method in the BookCollection class, as shown in Listing 3.

Listing 3.

void ICollection<Book>.Add( Book item )
{
    throw new NotImplementedException();
}

In this case, I decided to throw a NotImplementedException() to force the user to use the other Add method if he decides to explicitly use this method. But you can do other things here, such as call another method, or write a log entry every time this method is being explicitly accessed in order to know if the design should be refactored or changed in future releases. Who knows, maybe our users do want the possibility to add just a single book. The current version of the specification doesn’t say so, but it doesn’t mean it will always be that way!
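For readers who want something that compiles, here is one possible complete version of the class. It is a sketch under my own assumptions: the empty Book class is a placeholder, the members other than the two Add methods simply delegate to a private List<Book>, and the BookCollectionDemo class is only there to exercise it. None of that is prescribed by the specification discussed in this post.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical placeholder; the real Book type is not shown in this post.
public class Book { }

public class BookCollection : ICollection<Book>
{
    private readonly List<Book> _books = new List<Book>();

    // Our own Add method: the only Add visible on a BookCollection reference.
    public void Add(Book[] books)
    {
        _books.AddRange(books);
    }

    // Required by ICollection<Book>, but hidden from the class's public surface.
    void ICollection<Book>.Add(Book item)
    {
        throw new NotImplementedException();
    }

    public int Count { get { return _books.Count; } }
    public bool IsReadOnly { get { return false; } }
    public void Clear() { _books.Clear(); }
    public bool Contains(Book item) { return _books.Contains(item); }
    public void CopyTo(Book[] array, int arrayIndex) { _books.CopyTo(array, arrayIndex); }
    public bool Remove(Book item) { return _books.Remove(item); }
    public IEnumerator<Book> GetEnumerator() { return _books.GetEnumerator(); }

    // The non-generic GetEnumerator has to be explicit as well.
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class BookCollectionDemo
{
    public static void Main()
    {
        BookCollection collection = new BookCollection();
        collection.Add(new[] { new Book() });
        Console.WriteLine(collection.Count);

        try
        {
            // Only reachable through the interface type.
            ((ICollection<Book>)collection).Add(new Book());
        }
        catch (NotImplementedException)
        {
            Console.WriteLine("Single-book Add is not supported.");
        }
    }
}
```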

After explicitly implementing this member in our BookCollection class, Intellisense will show the following possible signature when using the Add method from our BookCollection class, as shown in Listing 4.

Listing 4.

As we can see, Intellisense now shows us the only Add method signature that we can use, therefore respecting the specification for this class.  Congratulations.

If you take a look under the hood of that method with Lutz Roeder’s .NET Reflector and view the code with its IL, you’ll see that the compiler has marked the scope of the method to be private instead of public as it is shown in Listing 5.

Listing 5.

[Image: explicitimplementationil.PNG]
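If you don't have Reflector at hand, reflection can confirm the same thing. The sketch below uses a deliberately tiny, hypothetical interface (IPing) and class (Pinger) rather than BookCollection, and looks up the implementing method through Type.GetInterfaceMap so that it doesn't have to guess the compiler-generated method name.

```csharp
using System;
using System.Reflection;

public interface IPing
{
    void Ping();
}

public class Pinger : IPing
{
    // Explicitly implemented, so it is not part of Pinger's public surface.
    void IPing.Ping() { }
}

public static class VisibilityDemo
{
    // Finds the method on Pinger that implements IPing.Ping.
    public static MethodInfo FindExplicitPing()
    {
        InterfaceMapping map = typeof(Pinger).GetInterfaceMap(typeof(IPing));
        return map.TargetMethods[0];
    }

    public static void Main()
    {
        MethodInfo method = FindExplicitPing();
        Console.WriteLine("Implementing method: " + method.Name);
        Console.WriteLine("IsPrivate: " + method.IsPrivate);

        // A lookup among the public methods finds nothing called "Ping".
        Console.WriteLine("Public Ping found: " + (typeof(Pinger).GetMethod("Ping") != null));
    }
}
```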

At this point, you might ask yourself “Why is the class implementing the generic ICollection interface anyway, if you want to provide your own Add method?” The answer is that I also want this class to have other collection-oriented methods, such as a Contains(), Clear() and Remove() method, and a Count property. Furthermore, I want this class to be used like a generic collection by other classes or components of my system. Therefore, implementing the provided generic ICollection interface is a good way to satisfy this requirement. Plus, I want to be able to enumerate my elements.

There is also another member that I might not want this class to implicitly expose, and that is the IsReadOnly property. According to the MSDN documentation of the .NET Framework, “a collection that is read-only does not allow the addition, removal, or modification of elements after the collection is created.” Therefore, I can decide to explicitly implement this interface member in the BookCollection class as well. Following that idea for the other members implemented by the BookCollection class, the class will expose only the methods, properties, events and delegates that match my domain model more accurately, hiding from its public surface the other “necessities” or “domain noise” required by some framework or API.

But (ah yes, that little word again…) you should be very careful when opting for this strategy. For instance, if some component uses an object of type ICollection<Book> (via dependency injection for example), and uses the IsReadOnly property before doing some other operation on the collection, it’ll call the implementation of the IsReadOnly property in the BookCollection. Therefore, if you are throwing an exception because you don’t want the user of the class to use this method implicitly, for example, it might backfire when some other component in the system will need to invoke it. You can limit the use of your class to some human users, but it’s much harder to do so when that user is a component already in operation.
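To illustrate how that can backfire, here is a small sketch. Everything in it is hypothetical: StrictShelf is a stand-in collection of strings whose explicitly implemented IsReadOnly throws, and Importer plays the role of a component that only knows the ICollection<T> interface.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection that hides IsReadOnly and throws when it is used.
public class StrictShelf : ICollection<string>
{
    private readonly List<string> _items = new List<string>();

    bool ICollection<string>.IsReadOnly
    {
        get { throw new NotImplementedException(); }
    }

    public void Add(string item) { _items.Add(item); }
    public int Count { get { return _items.Count; } }
    public void Clear() { _items.Clear(); }
    public bool Contains(string item) { return _items.Contains(item); }
    public void CopyTo(string[] array, int arrayIndex) { _items.CopyTo(array, arrayIndex); }
    public bool Remove(string item) { return _items.Remove(item); }
    public IEnumerator<string> GetEnumerator() { return _items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

// Hypothetical component that works against the interface only.
public static class Importer
{
    public static bool TryAdd(ICollection<string> target, string item)
    {
        // This call is dispatched to the explicit implementation, hidden or not.
        if (target.IsReadOnly)
        {
            return false;
        }
        target.Add(item);
        return true;
    }
}

public static class BackfireDemo
{
    public static void Main()
    {
        Console.WriteLine(Importer.TryAdd(new List<string>(), "a book"));
        try
        {
            Importer.TryAdd(new StrictShelf(), "a book");
        }
        catch (NotImplementedException)
        {
            Console.WriteLine("The hidden IsReadOnly was invoked by the component.");
        }
    }
}
```

Returning a sensible value from the hidden member, rather than throwing, keeps such components working while still keeping the member out of Intellisense.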

Nevertheless, the user can still invoke the explicitly implemented member of an interface by casting the object to the interface holding that member, as shown in Listing 6.

Listing 6.

BookCollection bookCollection = new BookCollection();
((ICollection<Book>) bookCollection).Add( new Book() ); // Using the ICollection<Book>'s Add method explicitly...

For instance, if the Add (Book item) method implements the following body,

void ICollection<Book>.Add( Book item )
{
    // This is the Add method from the ICollection<Book> interface that this class needs to implement.
    // We're respecting the rule of the framework...and satisfying Resharper

    System.Console.WriteLine( "Using the interface member explicitly..." );
}

…then we’d see the “Using the interface member explicitly…” in the console window once the method is invoked. 

So when should you explicitly implement interface members?

  • When you want your class to implement an interface, but prohibit the use of one service or substitute it for another service to the user.
  • When a class implements two (or more) interfaces with a common set of services, such as the ones provided by the IEnumerable and IEnumerable<T> interfaces. In fact, if you implement the generic IEnumerable<T> interface, you’ll have no choice but to implement the IEnumerable interface as well. You’ll have to decide which members to explicitly implement.
  • When you want to hide the implementation of an interface member in your class.
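The IEnumerable and IEnumerable<T> case from the list above is probably the one you'll meet most often, so here is a minimal sketch of it. The Countdown class is hypothetical; the pattern of a public generic GetEnumerator plus an explicit non-generic one is the usual way to satisfy both interfaces.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class Countdown : IEnumerable<int>
{
    private readonly int _start;

    public Countdown(int start)
    {
        _start = start;
    }

    // Public, strongly typed enumerator: the one callers normally see.
    public IEnumerator<int> GetEnumerator()
    {
        for (int i = _start; i > 0; i--)
        {
            yield return i;
        }
    }

    // Same name and parameters, different return type: the two methods cannot
    // both be public, so the non-generic one is implemented explicitly.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class CountdownDemo
{
    public static void Main()
    {
        foreach (int value in new Countdown(3))
        {
            Console.WriteLine(value);
        }
    }
}
```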

That concludes this post. The main idea was to show that you should shift your design to reflect and respect the domain and its specification rather than pleasing the framework or the API. Sometimes, rules in software development can be broken, and most of the time, they can be bent, especially if it can help you to design a more consistent and concise system design. A design should speak in terms of the domain and not (or at least avoid) in the terms of the technology, framework or tool. When possible, try to abstract whatever is needed by the framework or API, but NOT required by the user of the system; the user being a human being or another system.
