String vs StringBuilder for the .NET Concatenation Performance Championship

Does your code compile successfully? Good. How about your application’s performance? Could it use some improvement? Did you take the time to answer these questions before releasing it to the Quality Control team or to a potential customer?

You must’ve heard and read about it many times since the .NET Framework shipped in early 2002. Unfortunately, I still see some .NET developers using the String class as a concatenation harlot in contexts where a StringBuilder would be a better match.

This post is about knowing when and why you should use the StringBuilder’s appending function in some cases and the String’s concatenation function in others. I’ll be using Red Gate’s ANTS Profiler to profile a simple C# console application to see exactly how much execution time was spent in each string concatenation method for both the String class and the StringBuilder class.

The StringBuilder Class

Before going further into the discussion, I think it is worthwhile to define the purposes of each class. Let us start with the StringBuilder class.

According to the .NET Framework API MSDN documentation, the StringBuilder class represents a mutable string of characters. In order not to reinvent the wheel, let us see what the remarks are about this class [1]:

This class represents a string-like object whose value is a mutable sequence of characters. The value is said to be mutable because it can be modified once it has been created by appending, removing, replacing, or inserting characters.

Most of the methods that modify an instance of this class return a reference to that same instance. Since a reference to the instance is returned, you can call a method or property on the reference. This can be convenient if you want to write a single statement that chains successive operations one after another.

A StringBuilder object maintains a buffer to accommodate the concatenation of new data. New data is appended to the end of the buffer if room is available; otherwise, a new, larger buffer is allocated, data from the original buffer is copied to the new buffer, then the new data is appended to the new buffer.
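
To make these remarks concrete, here is a minimal sketch of the two behaviours they describe: chaining calls on the returned instance, and the buffer growing when it runs out of room. The class and method names are mine, chosen only for illustration, and the exact capacity values depend on the runtime, so the sketch only checks that the capacity increased.

```csharp
using System.Text;

public static class StringBuilderRemarksDemo
{
    // Chains several modifying calls on one StringBuilder instance.
    public static string BuildChained()
    {
        StringBuilder sb = new StringBuilder();

        // Each call returns the same instance, so the calls can be chained.
        sb.Append("test").Append("-").Append(42).Insert(0, "[").Append("]").Replace("test", "TEST");

        return sb.ToString();
    }

    // Returns true if Append hands back the very instance it was called on.
    public static bool AppendReturnsSameInstance()
    {
        StringBuilder sb = new StringBuilder();
        return object.ReferenceEquals(sb, sb.Append("test"));
    }

    // Appends far more than the requested initial capacity and reports
    // whether the buffer grew to accommodate the new data.
    public static bool BufferGrowsWhenFull()
    {
        StringBuilder sb = new StringBuilder(4); // ask for a small buffer
        int before = sb.Capacity;
        sb.Append('x', 100);                     // 100 characters cannot fit in it
        return sb.Capacity > before && sb.Length == 100;
    }
}
```

Calling StringBuilderRemarksDemo.BuildChained() yields "[TEST-42]", built entirely in the same buffer.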

The String Class

As for the String class, the MSDN documentation has these remarks to say about it [2]:

A string is a sequential collection of Unicode characters that is used to represent text. A String object is a sequential collection of System.Char objects that represent a string. The value of the String object is the content of the sequential collection, and that value is immutable.

A String object is called immutable (read-only) because its value cannot be modified once it has been created. Methods that appear to modify a String object actually return a new String object that contains the modification.
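
A small sketch of what immutability means in practice may help here. The class and method names are hypothetical; the only library calls used are String.ToUpper and object.ReferenceEquals.

```csharp
public static class StringImmutabilityDemo
{
    // ToUpper returns a new string; the original keeps its value.
    public static bool OriginalSurvivesToUpper()
    {
        string original = "test";
        string upper = original.ToUpper();
        return original == "test" && upper == "TEST";
    }

    // += builds a new String object and rebinds the variable to it.
    public static bool ConcatenationCreatesNewObject()
    {
        string s = "test";
        string alias = s;   // second reference to the same object
        s += "test";        // s now points at a brand-new string
        return alias == "test"
            && s == "testtest"
            && !object.ReferenceEquals(alias, s);
    }
}
```

Both methods return true: the object that alias refers to is never modified, which is exactly why every += in a loop has to allocate a fresh string.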

Performance Analysis

As stated in the beginning of this post, I am using a very simple application to note the performance differences between the StringBuilder class and the String class when performing multiple concatenations. The application exercises both these classes in two ways.

Concerning the String class, I am performing the following code with a different number of concatenations defined by XX:

    private static void ConcatenateString_XX_Times()
    {
      string s = string.Empty;

      for (int i = 0; i < XX; i++)
      {
        s += "test";
      }
    }

Concerning the StringBuilder class, I am performing the following code with a different number of concatenations defined by XX:

    private static void ConcatenateStringBuilder_XX_Times()
    {
      StringBuilder sb = new StringBuilder();

      for (int i = 0; i < XX; i++)
      {
        sb.Append("test");
      }
    }

NOTE: In both cases, XX takes the following values: 1 to 20 inclusive, 100, 1000, 10 000 and 100 000.

The following figure shows a summary generated by ANTS Profiler of the execution time (in seconds) each method took on a Dell Inspiron 9300 (1.86 GHz Pentium M CPU with 2 GB of RAM).
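
If you don't have a profiler at hand, a rough way to reproduce the experiment is to time the same two loops with System.Diagnostics.Stopwatch. This is only a sketch under my own assumptions, not the setup used for the figures in this post: the ConcatenationTimer class is hypothetical, XX becomes a parameter, and the methods return their result so the work cannot be optimized away. Stopwatch timings of a single run are noisy, so expect different absolute numbers.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatenationTimer
{
    // Same loop as ConcatenateString_XX_Times, with XX passed in as count.
    public static string ConcatenateString(int count)
    {
        string s = string.Empty;
        for (int i = 0; i < count; i++)
        {
            s += "test";
        }
        return s;
    }

    // Same loop as ConcatenateStringBuilder_XX_Times, with XX passed in as count.
    public static string ConcatenateStringBuilder(int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append("test");
        }
        return sb.ToString();
    }

    // Wall-clock seconds for one call of the given method.
    public static double TimeInSeconds(Func<int, string> method, int count)
    {
        Stopwatch watch = Stopwatch.StartNew();
        method(count);
        watch.Stop();
        return watch.Elapsed.TotalSeconds;
    }
}
```

For example, ConcatenationTimer.TimeInSeconds(ConcatenationTimer.ConcatenateString, 10000) times ten thousand String concatenations; pass ConcatenateStringBuilder instead to time the other class.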


Figure 1. A summary of the execution time it took to run each method
Results generated by ANTS Profiler 3.1.0

Table 1 (for the String class) and Table 2 (for the StringBuilder class) show the same results. I have omitted the results for the first 20 concatenations because both the String and the StringBuilder took approximately the same time there (compare lines 11-30 for the String class and lines 35-54 for the StringBuilder class in Figure 1).

Table 1. Summary of the time taken to run each concatenation method for the String class for 100, 1000, 10 000 and 100 000 string concatenations

# of concatenations    100       1000      10 000    100 000
Time (sec.)            0.0004    0.0145    0.785     189

Table 2. Summary of the time taken to run each concatenation method for the StringBuilder class for 100, 1000, 10 000 and 100 000 string concatenations

# of concatenations    100       1000      10 000    100 000
Time (sec.)            0.0003    0.0029    0.0295    0.297

Table 3. Proportional differences between the String and StringBuilder classes for 100, 1000, 10 000 and 100 000 concatenations

# of concatenations    Observations
100                    The StringBuilder takes approximately the same time as the String
1000                   The StringBuilder is 5 times faster than the String
10 000                 The StringBuilder is 27 times (approx.) faster than the String
100 000                The StringBuilder is 636 times faster than the String

We can interpret the above tables as follows:

  • In the first round, the StringBuilder and the String classes perform pretty much the same when concatenating 100 strings in a for loop.
  • In the second round, we multiply the previous number of concatenations by 10 (100 x 10 = 1000). The StringBuilder takes approximately 10 times as long as in the first round (0.0029/0.0003 = 9.67, which we'll round to 10 to simplify our reasoning) to concatenate 1000 strings. That follows a directly proportional line: 10 times more strings to concatenate => approximately 10 times more time to complete, therefore a 1:1 ratio. In contrast, the String class takes 36.25 times as long as in the first round (0.0145/0.0004 = 36.25). In other words, for the same tenfold increase in work, the String's running time grew about 3.6 times faster than the StringBuilder's (36.25/10), and Table 3 shows that the StringBuilder is now 5 times faster (0.0145/0.0029 = 5). Let us continue...
  • In the third round, we again multiply the previous number of concatenations by 10 (1000 x 10 = 10 000). The StringBuilder takes (again) approximately 10 times as long as in the previous round (0.0295/0.0029 = 10.1724, which we'll round to 10 again) to concatenate 10 000 strings, so the 1:1 ratio still applies. In contrast, the String class takes 54.14 times as long as in the previous round (0.785/0.0145 = 54.14 approximately). Its running time therefore grew about 5.4 times faster than the StringBuilder's (54.14/10), and the StringBuilder is now about 27 times faster (0.785/0.0295 = 26.6). Let us continue...
  • In our last round, we yet again multiply the previous number of concatenations by 10 (10 000 x 10 = 100 000). The StringBuilder takes (AGAIN!) approximately 10 times as long as in the previous round (0.297/0.0295 = 10.0678) to concatenate 100 000 strings, so the 1:1 ratio still applies. In contrast, the String class takes 240.76 times as long as in the previous round (189/0.785 = 240.76 approximately). Its running time grew about 24 times faster than the StringBuilder's (240.76/10), and the StringBuilder is now about 636 times faster (189/0.297 = 636.4). Ok, that's enough for now.
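
The arithmetic in the list above is easy to get wrong by hand, so here is a small sketch that recomputes it from the numbers in Table 1 and Table 2. The TableRatios class and its member names are hypothetical; only the measured values come from the tables.

```csharp
public static class TableRatios
{
    // Number of concatenations per round, and the measured times in
    // seconds copied from Table 1 (String) and Table 2 (StringBuilder).
    public static readonly int[] Counts = { 100, 1000, 10000, 100000 };
    public static readonly double[] StringSeconds = { 0.0004, 0.0145, 0.785, 189 };
    public static readonly double[] BuilderSeconds = { 0.0003, 0.0029, 0.0295, 0.297 };

    // How many times longer round i took than round i - 1 (i >= 1).
    public static double Growth(double[] seconds, int i)
    {
        return seconds[i] / seconds[i - 1];
    }

    // How many times faster the StringBuilder was than the String in round i.
    public static double Speedup(int i)
    {
        return StringSeconds[i] / BuilderSeconds[i];
    }
}
```

Growth tells you how each class scales from one round to the next, while Speedup compares the two classes within a round; these are the two different ratios discussed above, and only Speedup says how much faster one class is than the other.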

In computer science, we use Big-O notation to describe how an algorithm's running time grows with the size of its input. For instance, we'd say that up to 100 000 string concatenations (because that's how far we've gone with our tests), the StringBuilder's Append method gives us O(n) or linear performance. On the other hand, due to the immutable nature of the String class, every time we concatenate a string to it in a loop, a new string is allocated and the existing characters are copied into it, so we end up with roughly O(n^2) or quadratic performance, which is a performance disaster compared to O(n). This observation is only valid for our context. We should try the same test with more than a million string concatenations to see if our observation still fits.

In order to respect the DRY principle, I'm not going to show the IL code generated when using a StringBuilder's Append method or a String's Concat method because you can already find such information in several posts (see the References section at the end of this post), but I do recommend taking some time to view the internals of both of these methods using a tool like ILDASM or Reflector.
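
To see where the quadratic behaviour comes from, it helps to count how many characters get written. The sketch below is a deliberately simplified model of my own, not a description of the runtime's internals: it assumes each s += "test" writes a complete new string, and it ignores allocation cost, garbage collection and the StringBuilder's occasional buffer regrowth.

```csharp
public static class CopyCountModel
{
    private const int PieceLength = 4; // "test".Length

    // Characters written when s += "test" runs n times: the k-th pass
    // builds a fresh string of 4k characters, so the total is
    // 4(1 + 2 + ... + n) = 2n(n + 1), which grows as n squared.
    public static long StringCharsWritten(int n)
    {
        long total = 0;
        for (int k = 1; k <= n; k++)
        {
            total += (long)PieceLength * k;
        }
        return total;
    }

    // Characters written by n Append("test") calls, ignoring buffer
    // regrowth: 4n, which grows linearly with n.
    public static long BuilderCharsWritten(int n)
    {
        return (long)PieceLength * n;
    }
}
```

Under this model, 100 000 concatenations write about 20 billion characters with the String class against 400 000 with the StringBuilder. The measured times don't follow the model exactly, but the shape of the growth is the same.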

Discussion and conclusion

Some people who have written on the same subject recommend using the StringBuilder class over the String class once you concatenate more than a certain number of strings. For instance, Mahesh Chand recommends using the StringBuilder if you have to concatenate a string more than 10 times, and he supports his recommendation with a pretty simple and realistic demo. In his excellent article on the same subject, Jouni Heikniemi concludes that there's a "magic number" that helps decide when to use the StringBuilder's concatenation over the String's, and that number is between four and eight concatenations. Finally, in his thorough and very detailed article on the same subject, David Cumps concludes with the following:

  • If you can avoid concatenating, do it! This is a no brainer, if you don't have to concatenate but want your source code to look nice, use the first method. It will get optimized as if it was a single string.
  • Don't use += concatenating ever. Too much changes are taking place behind the scene, which aren't obvious from my code in the first place. I advise to rather use String.Concat() explicitly with any overload (2 strings, 3 strings, string array). This will clearly show what your code does without any surprises, while allowing yourself to keep a check on the efficiency.
  • Try to estimate the target size of a StringBuilder. The more accurate you can estimate the needed size, the less temporary strings the StringBuilder will have to create to increase its internal buffer.
  • Do not use any Format() methods when performance is an issue. Too much overhead is involved in parsing the format, when you could construct an array out of pieces when all you are using are {x} replaces. Format() is good for readability, but one of the things to go when you are squeezing all possible performance out of your application.
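
Here is how I read those four recommendations in code. This is my own sketch, not code from David Cumps' article: the ConcatenationTips class, its members and the sample strings are all hypothetical.

```csharp
using System.Text;

public static class ConcatenationTips
{
    // Literals joined with + are folded into one constant by the C# compiler,
    // so splitting them for readability costs nothing at run time.
    public const string Folded = "SELECT * " + "FROM Customers " + "WHERE Id = 1";

    // One explicit Concat call over an array instead of repeated +=.
    public static string ConcatPieces(string[] pieces)
    {
        return string.Concat(pieces);
    }

    // Requests the estimated final size up front so the buffer
    // does not have to grow while appending.
    public static string BuildWithEstimate(string piece, int count)
    {
        StringBuilder sb = new StringBuilder(piece.Length * count);
        for (int i = 0; i < count; i++)
        {
            sb.Append(piece);
        }
        return sb.ToString();
    }

    // Pieces placed in an array and concatenated, in place of
    // string.Format("{0} has {1} items", name, itemCount).
    public static string DescribeWithoutFormat(string name, int itemCount)
    {
        return string.Concat(new string[] { name, " has ", itemCount.ToString(), " items" });
    }
}
```

Whether the Format() replacement is worth the loss in readability is something only a profiler can tell you for your own application.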

Though I firmly agree with the previous recommendations, I think that we should also consider what the MSDN documentation has to say about the performance and memory allocation of both classes:

The performance of a concatenation operation for a String or StringBuilder object depends on how often a memory allocation occurs. A String concatenation operation always allocates memory, whereas a StringBuilder concatenation operation only allocates memory if the StringBuilder object buffer is too small to accommodate the new data. Consequently, the String class is preferable for a concatenation operation if a fixed number of String objects are concatenated.

In that case, the individual concatenation operations might even be combined into a single operation by the compiler. A StringBuilder object is preferable for a concatenation operation if an arbitrary number of strings are concatenated; for example, if a loop concatenates a random number of strings of user input. [1]

If it is necessary to modify the actual contents of a string-like object, use the System.Text.StringBuilder class. [2]
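
The distinction the documentation draws between a fixed and an arbitrary number of strings can be sketched as follows. The class and method names are hypothetical, chosen only to illustrate the two situations.

```csharp
using System.Collections.Generic;
using System.Text;

public static class FixedVersusArbitrary
{
    // Fixed number of pieces: a single + expression, which the C# compiler
    // turns into one String.Concat call.
    public static string FullName(string first, string middle, string last)
    {
        return first + " " + middle + " " + last;
    }

    // Arbitrary number of pieces, known only at run time: a StringBuilder.
    public static string JoinLines(IEnumerable<string> lines)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string line in lines)
        {
            sb.Append(line).Append('\n');
        }
        return sb.ToString();
    }
}
```

FullName allocates its result once no matter how it is called, whereas JoinLines may be handed three lines or three million, which is the case where the StringBuilder pays off.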

My conclusion is that you shouldn't use the String class to concatenate multiple strings just to properly align and format your code (the visual aspect of the code). Even if you're concatenating 8, 9 or 10 strings inside a method, prefer the StringBuilder's Append method, especially if that method will be invoked repeatedly.

I also believe that using tools like Red Gate ANTS Profiler, JetBrains dotTrace Profiler and the Microsoft .NET CLR Profiler to analyze your application's performance and memory allocation will help you further in improving your design and implementation code, as well as pinpointing those bottlenecks in your application.

Sometimes a successful compilation just isn't enough.

References

  1. "StringBuilder Class", MSDN Documentation, Microsoft Corp.
  2. "String Class", MSDN Documentation, Microsoft Corp.
  3. "StringBuilder and String Concatenation" by Mahesh Chand (2002)
  4. ".NET String vs. StringBuilder - Concatenation Performance" by Jouni Heikniemi (2004)
  5. "String Concatenation vs Memory Allocation" by David Cumps (2007)
  6. "Immutable types: understand their benefits and use them" by Patrick Smacchia (2008)
  7. "Everything Is Fast For Small n" by Jeff Atwood (2007)
  8. "Big-O Notation", Wikipedia

5 Comments

  1. String vs StringBuilder - [assembly: AssemblyTitle("U Can C Sharp 2.0")]:

    [...] finally an "independent" test that examines exactly this question - you can read it here. Published February 6, 2008, 22:17 by Fülöp Dávid Filed under: [...]

  2. Mike Griffin:

    First, you really should correct the mistake in the opening line of your post. Second, you are wrong.

    do this:

    s += "foo" + "foo1" + "foo2" + "foo3" + "foo4";

    And see what IL code is executed. We’ve done the timings too and never use StringBuilder for anything whatsoever ….

  3. Robert Nilsson:

    Mike; using your above code will produce the following crap for the garbage collector to handle:

    "foofoofoo1foofoo1foo2foofoo1foo2foo3foofoo1foo2foo3foo4"

    This is because when concatenating strings every instance of the previous (concatenated) string is removed by the garbage collector, producing a lot of crap for the framework to handle. Many websites that produce dynamically created HTML with string concatenation (e.g. menu trees) and have a visitor load of a couple of thousand every day may cause the webserver to freeze entirely for minutes before the garbage collector manages to clean out all old string objects. These hiccups can be avoided by using a StringBuilder object instead (which in size would be 1/4 of your string example when disposed).
    This is performance at another level; not just “runtime” performance…

  4. Shyam:

    Thats a good one….

  5. Siddharth:

    Thanks a ton for this. I am making an app and needed to preprocess a 6GB text log. It had to be output back to a file about 250 MB in size. Using Stringbuilder instead of Strings cut down the time about 10 times.
