Psychedelic Panorama of Foo
My name is Inigo Montoya. You killed my father. Prepare to die.
Saturday, May 03, 2008
String Concatenation in Java: A Case Study of Logging
Hello. In my entry on String Concatenation in Java, I went over a test that compares appending and concatenating the string literal "foo" to itself 10,000 times. The main reasons that test was impractical were:
I decided to write my own tests specifically for logging in Java. Here are my requirements:
To summarize, the goal is more strings, but fewer concatenations per string. How does StringBuilder hold up now?
leo@sark~/test
(18:03:51) [28] ant
Buildfile: build.xml
test-compile:
test:
[java] INFO - Starting test usingStringConcat
[java] INFO - usingStringConcat finished. Created 10000000 strings in 154691 milliseconds
[java] INFO - Starting test usingPlusConcat
[java] INFO - usingPlusConcat finished. Created 10000000 strings in 114850 milliseconds
[java] INFO - Starting test usingStringBuilder
[java] INFO - usingStringBuilder finished. Created 10000000 strings in 122337 milliseconds
[java] INFO - Starting test usingStringBuilderNoToString
[java] INFO - usingStringBuilderNoToString finished. Created 10000000 strings in 108929 milliseconds
[java] INFO - Starting test concat
[java] INFO - concat finished. Created 10 strings in 10006 milliseconds
[java] INFO - Starting test append
[java] INFO - append finished. Created 10 strings in 97 milliseconds
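To put the four 10,000,000-string timings above in perspective, here is a small sketch that turns them into a per-string cost and a relative difference against '+'. The class and method names (TimingSummary, microsPerString, percentSlower) are made up for this example; only the millisecond figures come from the run above, and it is a single run, so treat the percentages as rough.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TimingSummary {
    /** Average cost of building one string, in microseconds. */
    static double microsPerString(long totalMillis, long strings) {
        return totalMillis * 1000.0 / strings;
    }

    /** How much slower (positive) or faster (negative) a run is than a baseline, in percent. */
    static double percentSlower(long millis, long baselineMillis) {
        return (millis - baselineMillis) * 100.0 / baselineMillis;
    }

    public static void main(String[] args) {
        long strings = 10_000_000L;

        // Timings copied from the Ant output above (milliseconds).
        Map<String, Long> millis = new LinkedHashMap<>();
        millis.put("usingStringConcat", 154_691L);
        millis.put("usingPlusConcat", 114_850L);
        millis.put("usingStringBuilder", 122_337L);
        millis.put("usingStringBuilderNoToString", 108_929L);

        long baseline = millis.get("usingPlusConcat");
        for (Map.Entry<String, Long> e : millis.entrySet()) {
            System.out.printf("%-30s %6.2f us/string  %+6.1f%% vs '+'%n",
                    e.getKey(),
                    microsPerString(e.getValue(), strings),
                    percentSlower(e.getValue(), baseline));
        }
    }
}
```

On this one run, usingStringBuilder works out to about 6.5% slower than usingPlusConcat.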
First, I should explain the difference between the tests.
You can see now that the margin between StringBuilder and '+' concatenation has shrunk. It has shrunk so much that '+' concatenation is now actually faster, by roughly 7,500 milliseconds (114,850 vs. 122,337). Statistically speaking, I should run this several times and calculate the standard deviation, but I'm not willing to do that much work. I have noticed that '+' is consistently faster, though. Why!? I don't know, but let's review the code, and I will fire off some speculations.
import java.io.PrintWriter;
import java.io.StringWriter;

import static org.kuali.kra.logging.BufferedLogger.*;

/**
 * Simple unit test class to compare various {@link String} concatenation methodologies in Java.
 */
public class StringConcatenationTest extends ConcatenationTest {

    public StringConcatenationTest(String name) {
        super(name);
    }

    /**
     * Tests concatenating {@link String} literal instances. Handles conversion of integers to {@link String}
     * instances as well. The main part to realize is that this is using the '+' operator for this test.
     */
    public void usingPlusConcat() {
        new StackSimulation(20).execute(new DispatchableStackElement() {
            public void dispatch() {
                for (int i = 0; i < iterations; i++) {
                    String foo = "foo" + i + "foofoo" + i + "foofoofoo" + i;
                }
            }
        });
    }

    /**
     * Tests concatenating {@link String} literal instances. Handles conversion of integers to {@link String}
     * instances as well. The main part to realize is that this is using the {@link String#concat(String)} method
     * for this test.
     */
    public void usingStringConcat() {
        new StackSimulation(20).execute(new DispatchableStackElement() {
            public void dispatch() {
                for (int i = 0; i < iterations; i++) {
                    String foo = "foo".concat(String.valueOf(i))
                            .concat("foofoo").concat(String.valueOf(i))
                            .concat("foofoofoo").concat(String.valueOf(i));
                }
            }
        });
    }

    /**
     * Tests appending {@link String} literal instances. Handles conversion of integers to {@link String}
     * instances as well. The main part to realize is that {@link StringBuilder} is used for this
     * exercise.
     */
    public void usingStringBuilder() {
        new StackSimulation(20).execute(new DispatchableStackElement() {
            public void dispatch() {
                for (int i = 0; i < iterations; i++) {
                    String foo = new StringBuilder()
                            .append("foo").append(i)
                            .append("foofoo").append(i)
                            .append("foofoofoo").append(i).toString();
                }
            }
        });
    }

    /**
     * Same as {@link #usingStringBuilder()}, except that {@code toString()} is never called, so no
     * final {@link String} is created from the builder.
     */
    public void usingStringBuilderNoToString() {
        new StackSimulation(20).execute(new DispatchableStackElement() {
            public void dispatch() {
                for (int i = 0; i < iterations; i++) {
                    StringBuilder foo = new StringBuilder()
                            .append("foo").append(i)
                            .append("foofoo").append(i)
                            .append("foofoofoo").append(i);
                }
            }
        });
    }

    /**
     * C-style formats are available in Java. This is just testing the impact of such a thing.
     * Runs in a loop and executes <code>iterations</code>. It will construct a completely separate
     * {@link String} instance each time. Note that the format string appends <code>i</code> only once,
     * so the result differs from the strings built by the other tests.
     */
    public void usingCStyle() {
        for (int i = 0; i < iterations; i++) {
            StringWriter writer = new StringWriter();
            new PrintWriter(writer).printf("%s%s%s%d", "foo", "foofoo", "foofoofoo", i);
            String foo = writer.getBuffer().toString();
        }
    }

    /**
     * Entry point for test
     */
    public static void main(String[] args) {
        if (args.length < 1) {
            error("Come on! Give me a test to run!");
            System.exit(1);
        }
        new StringConcatenationTest(args[0]).runTest();
    }
}
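As a sanity check on the listing above, here is a sketch that pulls each test's expression into a plain method and compares the resulting strings. ConcatVariantsCheck and its method names are invented for this example and are not part of the original test suite; the expressions themselves are copied from the tests.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class ConcatVariantsCheck {
    static String plus(int i) {
        return "foo" + i + "foofoo" + i + "foofoofoo" + i;
    }

    static String concat(int i) {
        return "foo".concat(String.valueOf(i))
                .concat("foofoo").concat(String.valueOf(i))
                .concat("foofoofoo").concat(String.valueOf(i));
    }

    static String builder(int i) {
        return new StringBuilder()
                .append("foo").append(i)
                .append("foofoo").append(i)
                .append("foofoofoo").append(i).toString();
    }

    static String cStyle(int i) {
        StringWriter writer = new StringWriter();
        new PrintWriter(writer).printf("%s%s%s%d", "foo", "foofoo", "foofoofoo", i);
        return writer.getBuffer().toString();
    }

    public static void main(String[] args) {
        System.out.println(plus(7));                     // foo7foofoo7foofoofoo7
        System.out.println(plus(7).equals(concat(7)));   // true
        System.out.println(plus(7).equals(builder(7)));  // true
        System.out.println(cStyle(7));                   // foofoofoofoofoofoo7
    }
}
```

The first three variants build identical strings. The C-style variant formats the three literals and then i a single time, so it is not doing quite the same work as the others, which is worth keeping in mind when comparing its timing.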
Why the Integer Conversion of i? Why Not Just Append String Literals?
Let's look at usingPlusConcat():
for (int i = 0; i < iterations; i++) {
    String foo = "foo" + i + "foofoo" + i + "foofoofoo" + i;
}
At first, I used "foo" + "foofoo" + "foofoofoo". What's wrong with that? Well, it's exactly the same as "foofoofoo" + "foofoofoo", or for that matter the single literal "foofoofoofoofoofoo". According to the Java Language Specification's section on the string concatenation operator, an expression built only from literals is a compile-time constant. That means the concatenation happens at compile time, not at run time. This won't yield a realistic test for us because log messages are not compile-time constants. To mix things up a little, I made each test convert an int to a String and concatenate the result, so the string has to be built at run time. The string pool doesn't get used as much, and naturally, the test ran much slower. I replicated the same effect in the usingStringBuilder test.
for (int i = 0; i < iterations; i++) {
    String foo = new StringBuilder()
            .append("foo").append(i)
            .append("foofoo").append(i)
            .append("foofoofoo").append(i).toString();
}
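The compile-time constant point can be checked directly with reference comparison. This is a minimal sketch; the class ConstantFoldingDemo and its methods are hypothetical names for illustration. It relies on the language rule that constant String expressions are interned, while strings concatenated at run time are newly created objects.

```java
final class ConstantFoldingDemo {
    /** Built purely from literals: a constant expression, folded by the compiler. */
    static String fromLiterals() {
        return "foo" + "foofoo" + "foofoofoo";
    }

    /** Involves a non-constant int, so the concatenation happens at run time. */
    static String withInt(int i) {
        return "foo" + i;
    }

    public static void main(String[] args) {
        // Same reference as the single literal: both resolve to one interned constant.
        System.out.println(fromLiterals() == "foofoofoofoofoofoo"); // true

        // Equal content, but a fresh object created at run time.
        System.out.println(withInt(1).equals("foo1"));               // true
        System.out.println(withInt(1) == "foo1");                    // false
    }
}
```

Using == on strings is deliberate here: it is the only way to see that the literal-only expression never allocates anything in the loop, which is why it makes a poor stand-in for a log message.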
StringBuilder Looks Fast Without toString()
You may have also noticed a test, usingStringBuilderNoToString. This is exactly the same test, except it doesn't call toString() at the end of the appends. Clearly, this has a noticeable impact on the results: 108,929 milliseconds without toString() versus 122,337 with it. Without the call, StringBuilder beats concatenating with '+' (114,850 milliseconds). If you look at the toString() method of StringBuilder, you'll see:
public String toString() {
    // Create a copy, don't share the array
    return new String(value, 0, count);
}
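The "create a copy" behaviour in toString() can be observed from the outside without reading the JDK source. A small sketch, with ToStringCopyDemo and its method names invented for this example:

```java
final class ToStringCopyDemo {
    /** Returns true if a String taken from the builder is unaffected by later edits to the builder. */
    static boolean snapshotIsIndependent() {
        StringBuilder sb = new StringBuilder("foo");
        String snapshot = sb.toString();
        sb.append("bar");       // mutate the builder after taking the snapshot
        sb.setCharAt(0, 'X');
        return snapshot.equals("foo") && sb.toString().equals("Xoobar");
    }

    /** Each toString() call allocates a new String, even with no changes in between. */
    static boolean eachCallAllocates() {
        StringBuilder sb = new StringBuilder("foo");
        return sb.toString() != sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(snapshotIsIndependent()); // true
        System.out.println(eachCallAllocates());     // true
    }
}
```

The copy is what keeps the String immutable while the builder stays mutable, and it is also the extra allocation that usingStringBuilder pays for on every iteration.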
So why is it so heavyweight? I'll cut to it. The String constructor does this:
this.value = Arrays.copyOfRange(value, offset, offset+count);
It literally copies the buffer char by char. Ouch! Because String is immutable, just about everything you do to a String means copying it. Even when you append a String to a StringBuilder, you are copying the char[] out of the String and into the buffer of the StringBuilder.
It All Comes Down to Optimization
After J2SE 1.4, the Sun Java compiler is ALWAYS optimizing. There's just no way to turn it off. In its section on optimizing string concatenation, the Java Language Specification says that a compiler can optimize it with a buffer class such as StringBuilder. I can't really say how the optimization is happening, but it's pretty obvious that it is. My best guess is that the compiler optimizes the object-to-string conversion better for '+' concatenation than for explicit StringBuilder calls. I know, that sounds kind of lame, but there's just no way to know for sure.
Conclusion
I'm going to summarize the facts I discovered in the testing.
+= for String Concatenation Does Not Optimize Well
If you look at Paul Barry's test, you'll see that += is used. This is actually a really inefficient way to concatenate strings. It's also an unlikely one: you just don't see it used very often for string concatenation.
primitive/Object->String Conversion is Inefficient
If you can avoid it, then avoid it! The Integer.getChars() method has a lot of overhead. If you look at the Java source code for Integer.toString(int), you'll find:
if (i == Integer.MIN_VALUE)
    return "-2147483648";
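The reason it is hard-coded as a string literal is most likely that Integer.MIN_VALUE has no positive counterpart in a 32-bit int: the general digit-extraction code negates negative values first, and negating Integer.MIN_VALUE overflows right back to Integer.MIN_VALUE.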
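One property of Integer.MIN_VALUE bears on this special case: it cannot be negated in 32-bit arithmetic. The sketch below shows what happens to a naive conversion routine that flips the sign before extracting digits. The method naiveToString is hypothetical and deliberately simplistic; it is not the JDK's implementation.

```java
final class MinValueDemo {
    /** Naive int-to-decimal conversion that negates negative input first (not the JDK's code). */
    static String naiveToString(int i) {
        if (i == 0) {
            return "0";
        }
        boolean negative = i < 0;
        if (negative) {
            i = -i;                      // overflows for Integer.MIN_VALUE and stays negative
        }
        StringBuilder digits = new StringBuilder();
        while (i > 0) {                  // collects digits least significant first
            digits.append((char) ('0' + i % 10));
            i /= 10;
        }
        if (negative) {
            digits.append('-');
        }
        return digits.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(-Integer.MIN_VALUE == Integer.MIN_VALUE); // true
        System.out.println(naiveToString(-42));                      // -42
        System.out.println(naiveToString(Integer.MIN_VALUE));        // - (the digit loop never runs)
        System.out.println(Integer.toString(Integer.MIN_VALUE));     // -2147483648
    }
}
```

Returning a hard-coded literal for that one value is a cheap way to keep the general path simple.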
toString() Has Some Overhead
You really can't avoid using it completely, but you can minimize the number of times you call it. Leaning on optimization to keep from throwing away Strings is a good idea. For example, '+=' is considered less efficient than a single '+' expression because '+' can be optimized to call toString() once for the whole expression, while '+=' is forced to call toString() on every statement.
With Optimization StringBuilder is NOT Faster than '+'
If you consider that the compiler optimizes to keep from throwing away strings, then there is no advantage to using one over the other. It is possible that '+' optimizes object-to-string conversion better than StringBuilder. That's just speculation though.
Source Code
The source code is a self-extracting shell script. That means it doesn't work on Windows. Really. It should work fine on OS X, BSD, Linux, Solaris, etc. Just not Windows.
Tags: java, programming, software
