Brian Goetz:
Pop quiz: Which language boasts faster raw allocation performance, the Java language, or C/C++? The answer may surprise you -- allocation in modern JVMs is far faster than the best performing malloc implementations. The common code path for new Object() in HotSpot 1.4.2 and later is approximately 10 machine instructions ..., whereas the best performing malloc implementations in C require on average between 60 and 100 instructions per call ... And allocation performance is not a trivial component of overall performance -- benchmarks show that many real-world C and C++ programs, such as Perl and Ghostscript, spend 20 to 30 percent of their total execution time in malloc and free -- far more than the allocation and garbage collection overhead of a healthy Java application.
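The "approximately 10 machine instructions" figure is possible because, in the common case, a JVM allocates by bumping a pointer within a thread-local buffer. Below is a toy Java model of that fast path. The class name and the 8-byte alignment are illustrative assumptions, and int offsets stand in for real addresses, so this is a sketch of the idea rather than HotSpot's code.

```java
// Toy model of bump-pointer allocation (hypothetical class, not HotSpot code).
// "Addresses" are int offsets into a fixed-size region.
final class BumpArena {
    private final int capacity;
    private int top = 0; // next free offset

    BumpArena(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the offset of a fresh block, or -1 if the region is full. */
    int allocate(int size) {
        int aligned = (size + 7) & ~7; // round up to 8-byte alignment (an assumption)
        if (aligned > capacity - top) {
            return -1; // slow path: a real VM would refill the buffer or trigger a GC
        }
        int result = top;
        top += aligned;
        return result;
    }

    int used() {
        return top;
    }

    /** No per-block free: the whole region is discarded at once. */
    void reset() {
        top = 0;
    }
}
```

Note what is missing compared with malloc: there is no free list to search and no per-block bookkeeping. The cost moves to the slow path and to the collector, which reclaims the buffer wholesale (modeled here by reset()).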
If this is your area, read the rest. His earlier article is also worth a visit if you're a Java professional. (And I know some of you are.)
For the rest of you: Look! Pretty sun-storm movies! (Thanks, Mike!)
Okay, I have to say, this is very much a straw man argument. malloc in C is wasteful because it's a general-purpose allocator: you can call malloc(2 * 1024 * 1024 * 1024) or malloc(1). new Object() can allocate objects from a pool. I have recently written allocators in C that get down to, and below, 10 machine instructions, but they're for very specific things. Most people don't want to go through that trouble. I do, because I need as many IOPS and GB/s as possible. Therefore C is what I use to attain the highest performance and the closest control of the hardware. (And I have been known to drop to assembly quite often. ;-) Gee, and even to Verilog now. This is a scary trend...)
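For comparison, a special-purpose allocator of this kind usually gets its speed from serving a single block size. Below is a sketch of a fixed-size pool threaded through an index-based free list. It is written in Java only to keep one language for these examples; in C it would link raw memory blocks instead. The class name is hypothetical.

```java
// Toy fixed-size slot pool with an index-based free list (hypothetical class).
// Every slot has the same size, so allocate and free are O(1) with no searching.
final class FixedSlotPool {
    private final int[] next; // next[i] = next free slot after i, or -1
    private int freeHead;     // first free slot, or -1 when exhausted

    FixedSlotPool(int slots) {
        next = new int[slots];
        for (int i = 0; i < slots; i++) {
            next[i] = (i + 1 < slots) ? i + 1 : -1;
        }
        freeHead = slots > 0 ? 0 : -1;
    }

    /** Pops a slot off the free list; returns -1 if the pool is exhausted. */
    int allocate() {
        int slot = freeHead;
        if (slot != -1) {
            freeHead = next[slot];
        }
        return slot;
    }

    /** Pushes a slot back onto the free list (no double-free checking in this sketch). */
    void free(int slot) {
        next[slot] = freeHead;
        freeHead = slot;
    }
}
```

Each operation is a couple of loads and stores, with no size classes and no coalescing. That missing generality is exactly what a general malloc has to pay for.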
The memory interaction they refer to in terms of the cache is also something you have *much* tighter control over in C++. However, in general C++ is inferior at it, because the default operator new typically just calls malloc(). There are image-processing algorithms I've written that, simply because of their cache behavior, I could not have written in Java with even decent performance.
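The example below illustrates the kind of cache behavior at stake. It is in Java to match the other examples here. It shows two traversals of the same image stored row-major in a flat int[]; the class is hypothetical. Both return the same sum. The row-major loop walks memory sequentially, while the column-major loop strides by width elements per step, which on large images tends to defeat the cache in any language. No timings are claimed here.

```java
// Two traversals of a W x H grayscale image stored row-major in a flat int[]
// (hypothetical helper). Same result, very different memory access patterns.
final class ImageTraversal {
    static long sumRowMajor(int[] pixels, int width, int height) {
        long sum = 0;
        for (int y = 0; y < height; y++) {
            int rowStart = y * width;
            for (int x = 0; x < width; x++) {
                sum += pixels[rowStart + x]; // stride 1: sequential access
            }
        }
        return sum;
    }

    static long sumColumnMajor(int[] pixels, int width, int height) {
        long sum = 0;
        for (int x = 0; x < width; x++) {
            for (int y = 0; y < height; y++) {
                sum += pixels[y * width + x]; // stride = width elements
            }
        }
        return sum;
    }
}
```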
What is also interesting is that they don't mention the cache-busting problems of a copying garbage collector. It's very fast in terms of instantaneous allocation, but it adds large latency when it runs, and it totally trashes the cache, thereby completely invalidating their point about cache locality anyway. The memory footprint is also typically increased considerably, putting pressure on multipurpose machines.
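To make the trade-off concrete, here is a toy model of a semispace copying collector in the style of Cheney's algorithm. The class and its object layout are hypothetical and are not how HotSpot lays out its heap. The collector touches every live object once, which is the cache and pause cost described above. It also leaves the survivors packed contiguously, which is the locality benefit the article claims. Garbage is never visited at all.

```java
// Toy semispace copying collector, Cheney-style (hypothetical and heavily simplified:
// each object is one record with a payload and two reference fields, and references
// are indices into the current space).
final class ToySemispace {
    static final int NIL = -1;

    int[] left, right, payload; // the current ("from") space
    int top = 0;                // bump pointer into the current space

    private int[] toLeft, toRight, toPayload, forward;
    private int free;           // bump pointer into the to-space during a collection

    ToySemispace(int capacity) {
        left = new int[capacity];
        right = new int[capacity];
        payload = new int[capacity];
    }

    int alloc(int value, int l, int r) {
        if (top == left.length) throw new IllegalStateException("space full");
        left[top] = l;
        right[top] = r;
        payload[top] = value;
        return top++;
    }

    /** Copies every object reachable from roots into a fresh space; returns the new roots. */
    int[] collect(int[] roots) {
        int n = left.length;
        toLeft = new int[n];
        toRight = new int[n];
        toPayload = new int[n];
        forward = new int[n];
        for (int i = 0; i < n; i++) forward[i] = NIL;
        free = 0;

        int[] newRoots = new int[roots.length];
        for (int i = 0; i < roots.length; i++) newRoots[i] = copy(roots[i]);

        // Breadth-first scan: fix up each copied object's fields, copying their targets.
        for (int scan = 0; scan < free; scan++) {
            toLeft[scan] = copy(toLeft[scan]);
            toRight[scan] = copy(toRight[scan]);
        }

        // Flip: the to-space becomes current; unreachable objects are simply abandoned.
        left = toLeft;
        right = toRight;
        payload = toPayload;
        top = free;
        return newRoots;
    }

    private int copy(int obj) {
        if (obj == NIL) return NIL;
        if (forward[obj] != NIL) return forward[obj]; // already moved
        toLeft[free] = left[obj];
        toRight[free] = right[obj];
        toPayload[free] = payload[obj];
        forward[obj] = free;
        return free++;
    }
}
```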
Optimizing C++ compilers have also done the "defensive copy avoidance" stuff for a while. (That, and C++ has const references anyway, which alleviates 98% of the problem. Too bad very few people use const correctly.) It's a global optimization covered in every modern compiler textbook worth its salt. And fast stack allocation of small things like a Point is, well, fast.
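The "defensive copy" pattern in question looks roughly like the following in Java; the classes are hypothetical. Each getLocation() call allocates a fresh Point so that callers cannot mutate internal state. When a copy never escapes its caller, a JVM that performs escape analysis may be able to eliminate the allocation, which is presumably what the article means. Whether a given JVM actually does this depends on its version and flags. In C++ the usual alternative is to hand out a const Point& instead.

```java
// A mutable point and a class that hands out defensive copies of it (hypothetical names).
final class Point {
    int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    Point(Point other) {
        this(other.x, other.y);
    }
}

final class Component {
    private final Point location = new Point(0, 0);

    /** Returns a copy, so callers cannot mutate our internal state. Allocates on every call. */
    Point getLocation() {
        return new Point(location);
    }

    void moveBy(int dx, int dy) {
        location.x += dx;
        location.y += dy;
    }

    /** A caller whose copy never escapes: a candidate for escape analysis to remove the allocation. */
    static int distanceSquared(Component c) {
        Point p = c.getLocation();
        return p.x * p.x + p.y * p.y;
    }
}
```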
The biggest difference is that Java is good for high level, very complex code that has a lot of simple, object-based dynamic allocation requirements. I hope to use it for some of our maintenance software so I, at least, don't have to completely re-write it three times for three different platforms. :-)
The whole programming language debate should focus on *what* a particular tool is good for, in contrast to being "superior" in a general sense. Java can never replace C++ for all tasks, nor C++ Java. They are tailored to different crowds.
Then again, I suppose this article was written for technical managers who have heard the "Java is slow" argument from lazy programmers who don't want to learn a new language.
Except for LISP -- it's a real man's programming language. ;-)
(setq bad-joke t)
** footnote -- I also noticed that his references for C/C++ are hopelessly outdated: '93 and '94. What's really funny is that I was writing a virtual machine at the time they were written. Not long after that I was optimizing various algorithms for the x86, down to memset, even. (Once again, written for a general audience.) It's amazing the bandwidth you can get from a system when you try. :-)
Heya!
You can call malloc(2 * 1024 * 1024 * 1024) or malloc(1). new Object() can allocate objects off of a pool...
If you knew you were only going to do "new Object()", and ONLY "new Object()", that's true. (Though somewhat irrelevant, since that's not at all what's being debated there, except as a side note about pooling now being inefficient.)
Java's "new ...()" is also general, in the sense that it constructs objects of many different sizes. A Java programmer can also do "new byte[0]" or "new byte[2 * 1024 * 1024 * 1024]", so I'm not sure what you mean here.
I have recently written allocators in C that can get down to, and below, 10 machine instructions...
Question: Does that include cleanup and 'free' in the overall picture? Or are you just talking about the cost of allocation specifically?
(And yes, even back in the 1990s, at a large CAD/CAM company I worked for, we used to use memory pools -- not because "malloc" was slow, but because we were morons at managing it correctly. And I'm sure "free" was very cheap under those circumstances.)
But in either case, most people probably aren't using your allocators. (And I'd also point out that Java's allocators are themselves written in C/C++, so he's not saying it's absolutely impossible to get such performance in C.) As you say:
... but they're for very specific things.
Right. So you're talking past each other. The guy is clearly talking about general users of both languages -- people who just use "malloc()" versus typical Java use. The author says this at the outset.
And he even mentions that C/C++ can be improved by the use of a specialized malloc/free, so it's hardly as if he's trying to tar the entire realm of C/C++ through some deception:
One study ... also measured the effects of replacing malloc with the conservative Boehm-Demers-Weiser (BDW) garbage collector for a number of common C++ applications, and the result was that many of these programs exhibited speedups when running with a garbage collector instead of a traditional allocator.
And indeed, back when Java was generally slower, it would have been fair to say it was slow, even though there were specific tricks (pooling), which could have made specific tasks faster.
So I beg to differ with your characterization of "straw man" -- he clearly stated, right up front, he was talking about "malloc()", not specialized algorithms useful only in certain circumstances.
I agree with your point about the cache and C.
What is also interesting is that they don't mention the cache-busting problems of a copying garbage collector. It's very fast in terms of instantaneous allocation, but adds large latency when running...
Huh?
The malloc/free approach deals with blocks of memory one at a time, whereas the garbage collection approach tends to deal with memory management in large batches, yielding more opportunities for optimization (at the cost of some loss in predictability).
It's true there's no explicit, extended discussion of the unpredictability of garbage collection, but please remember that this is written for Java programmers, and we're generally aware of that drawback.
On the other hand, I'd point out that an optimizing compiler might be able to make reasonable choices about when to do such things. Perhaps not as good as a human's (or, heh, perhaps even better, given what we've seen in other optimization problems), but surely at least reasonable.
Lastly, it might be a moot point anyway: yes, the cache gets screwed up. But so what? You're doing garbage collection at that moment, and the latency of that process far overshadows the cost of reloading the cache. That's very poor at a microscopic level (looking only at that slice of time), but, if the numbers are to be believed, it pays off overall under typical circumstances.
As he said:
... benchmarks show that many real-world C and C++ programs, such as Perl and Ghostscript, spend 20 to 30 percent of their total execution time in malloc and free -- far more than the allocation and garbage collection overhead of a healthy Java application...
The memory footprint is also typically increased quite considerably, putting pressure on multipurposed machines.
This is indeed true in the typical case. But Java allows you to choose among memory management algorithms, based on the needed tradeoffs.
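On HotSpot, for instance, the collector is selected with command-line flags such as -XX:+UseSerialGC or -XX:+UseParallelGC; the available set varies by JVM version. The sketch below checks, from inside a program, which collectors are actually in use via the standard java.lang.management API. The GcReport class is hypothetical. The reported names depend on the JVM and the flags chosen; HotSpot's serial collector, for example, reports "Copy" and "MarkSweepCompact".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the collectors the running JVM is using, with cumulative counts and times
// (hypothetical helper class; getCollectionCount/Time may return -1 if undefined).
final class GcReport {
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        return lines;
    }
}
```

Running the same program under different flags and comparing the output shows which trade-off (throughput, pause time, footprint) you actually picked.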
The biggest difference is that Java is good for high level, very complex code that has a lot of simple, object-based dynamic allocation requirements
I'm not trying to be annoying, but I think that was his thesis.
The whole programming language debate should focus on *what* a particular tool is good for, in contrast to being "superior" in a general sense. Java can never replace C++ for all tasks, nor C++ Java. They are tailored to different crowds.
I don't think he was advocating writing kernels in pure Java, if that's your drift. :-)
But my read is that his target is Java programmers who think they're going to gain speed by doing various "tricks", such as object pooling or misusing the "final" keyword, at the cost of good code design. That old tendency: "I'm so brilliant, I'm going to optimize up front!"
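For readers who haven't run into it, below is the kind of hand-rolled pooling that, on this reading, the article discourages, shown next to the plain version. Both classes are hypothetical. The pooled version isn't wrong, but it adds release discipline and manual resets. The payoff is a saving that, per the article, a generational collector may already provide for short-lived objects.

```java
import java.util.ArrayDeque;

// A hand-rolled object pool (hypothetical). Callers must release, must not keep
// references after release, and each pooled object must be reset by hand.
final class BuilderPool {
    private final ArrayDeque<StringBuilder> free = new ArrayDeque<>();

    StringBuilder acquire() {
        StringBuilder sb = free.poll(); // null when the pool is empty
        return (sb != null) ? sb : new StringBuilder();
    }

    void release(StringBuilder sb) {
        sb.setLength(0); // manual reset: forgetting this leaks state between users
        free.push(sb);
    }

    int idle() {
        return free.size();
    }
}

final class PoolingVsAllocating {
    // Pooled version: more code and more ways to get it wrong.
    static String joinPooled(BuilderPool pool, String a, String b) {
        StringBuilder sb = pool.acquire();
        try {
            return sb.append(a).append(b).toString();
        } finally {
            pool.release(sb);
        }
    }

    // Plain version: a short-lived allocation left to the garbage collector.
    static String joinPlain(String a, String b) {
        return new StringBuilder().append(a).append(b).toString();
    }
}
```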
Most of us aren't as smart as we think we are.
And I'd be more than glad to do an image-processing comparison (or some close equivalent) sometime (in our copious spare time!) in the future. When did you try this? Things have changed considerably in the past several years.
I'm not saying you're wrong, but it would be interesting to try.
I also noticed that his references for C/C++ are hopelessly outdated.
Any newer references you'd like to cite?
It's amazing the bandwidth you can get from a system when you try.
The beauty of optimizing compilers (in whatever language) is that they give the average programmer much of the benefit a top person in the field could achieve with explicit control. Or, in some cases, even better.
But you're right, in that this is all moot, as we should be doing everything in Lisp. (In Emacs, mind you.) ;-)
Posted by: Michael Zappe on April 3, 2007 10:06 PM