Psychedelic Panorama of Foo

Á¦ À̸§ÀÌ Inigo Montoya ÀÔ´Ï´Ù. ´ç½ÅÀÌ ³» ¾Æ¹öÁö¸¦ Á׿´¾î. Á×À» ÁغñÇØ.

±Ý¿äÀÏ, 5¿ù 02, 2008

 

String Concatenation in Java

¾Æ³çÇϼ¼¿ä. For awhile now, there has been a lot of controversy over whether to use '+' or StringBuilder. Usually, when one wants to compare the two, a strong inductive proof is technique is used. We assume that

if p(i)p(i+1)p(k) is true, then p(k+1) is true.

How does this do anything for us? We use it when we concatenate suppose we set k = 10,000. Suppose we do an iteration of k concatenations and appends. We assume that if p(i)p(i+1)p(k) is true, then p(k+1) is true.That makes this kind of test a very sensible proof, but not very practical. Why isn't it practical?

I want to first go over the tests that I've seen people have created for showing how much faster StringBuilder is than '+'. Here is a test, for example, considers what I illustrated above. Let's review it.

String Pool

According to the intern method all String literals are added to the String Pool. There is a lot of speculation around how the String Pool is managed, but it's important enough to just know that it exists.

for(int i = 0; i < iterations; i++) { foo += "foo"; }

With the String Pool "foo" gets added to the String Pool; furthermore, with each iteration foo grows by "foo". This is impractical because with k = 10000, who in the world is ever going to append the same literal from the String Pool 10,000 times? It'll never happen. Even though it does cover several cases, it doesn't cover ALL cases. More importantly, it doesn't cover any of the cases that programmers are likely to encounter. In the case where append is used, the literal string "foo" gets appended 10,000 times

Further, '+=' is considered less efficient than just '+' because '+' can be optimized to call toString() less, but '+=' is forced to call toString() each time. See Optimization of String Concatenation.

Compiler Optimizations

public static void main(String[] args) { StringTest test = new StringTest(100); for(int i = 0; i < 10; i++) { test.concat(); test.append(); } }

100 is hardcoded as the iteration count in the test. The compiler will optimize this and replace each variable instance of iterations with 100 wherever it sees it. Basically this ...

for(int i = 0; i < iterations; i++) {

... becomes ...

for(int i = 0; i < 100; i++) {

This is important because the capacity of a StringBuilder normally is initialized with a default fixed capacity as a buffer. If it starts to exceed this capacity, it has to grow that capacity. What if it knew at compile-time exactly how long it was required to be? It could optimize that default capacity to be large enough that it wouldn't need to grow.

String concat took 1812 ms String append took 0 ms String concat took 1766 ms String append took 0 ms String concat took 1766 ms String append took 0 ms String concat took 1812 ms String append took 0 ms String concat took 1750 ms String append took 0 ms String concat took 1828 ms String append took 0 ms String concat took 1750 ms String append took 0 ms String concat took 1750 ms String append took 0 ms String concat took 1782 ms String append took 0 ms String concat took 1750 ms String append took 0 ms

As for normal string concatenation, it doesn't look like it benefits from this optimization at all. This is probably because of '+=' that forces a new String to be created and thrown out with each iteration.

Resource Management and the Garbage Collector

Here is the iteration loop for the tests. It iterates each test 10 times. This will produce 10 strings that have "foo" concatenated 10,000 times.

for(int i = 0; i < 10; i++) { test.concat(); test.append(); }

Notice that concat() runs before append(), and also notice that they are running together in the same loop. This implies they are running in the same VM. concat() produces the same string that append() does. This is important because a majority of all the setup and overhead is handled by the concat() method.

Another thing to consider is heap size, stack size, and the garbage collector. For tests like these, we not only want to increase the heap size, but the stack size as well. The reason for it is we don't want the garbage collector or virtual memory to interfere with our test. It would be a shame if append() suddenly had to operate with the garbage collector cleaning up after the concat() run.

To avoid both of these scenarios, it is best to adjust stack and heap size with -XmssNm -XmxNm, and run each test in a separate VM.

Leo's Version

I decided to create my own set of tests that ran out of ant. I concatenated the same string 10000 times for 10 strings. It yielded similar results.

[java] INFO - Starting test concat [java] INFO - concat finished. Created 10 strings in 10314 milliseconds [java] INFO - Starting test append [java] INFO - append finished. Created 10 strings in 45 milliseconds

You'll notice that my 10 strings took several thousand milliseconds longer than Paul Barry's. 2 things changed are the jvmargs and fork="true"

  • jvmargs. An overview of JVM arguments shows that
    • -Xint will disable HotSpot and thus the native code. Everything runs interpreted.
    • -Xincgc will use an incremental garbage collector that is more predictable. The incremental garbage collector uses small increments of garbage collection to minimize impact by spreading out the garbage collection. IMHO, that's better than having all the garbage collection happen at once and skew test results. There's no guarantee that it will ever run.
  • fork="true" means that the particular instance will be forked from ant and run in its own JVM.

·¹À̺í: , , ,





This page is powered by Blogger. Isn't yours?

¿¡ °¡ÀÔ °Ô½Ã¹° [Atom]