Sunday, June 2, 2013

"Why do you prefer Java to C++?"


This is a question I was asked during an interview a long time ago. While it is a nice question, it is not an easy one, not only because your interviewer could be a C++ evangelist and you'll never know (shit happens!), but also because there may be tons of reasons that are hard to explain in a context where you only have time to name a few. So let's go through the long answer here.
Let's start with the usual and quite widespread considerations about the major shortcomings of the C++ language:

  • The syntax. It is really annoying and hard to read. It's definitely the worst I have ever seen (yes, I know, Perl is also bad, but I'm not a Perl developer). Stroustrup once said that C++ was designed to be an elegant language, but other languages, such as Python, make this statement quite embarrassing. As for Java, it is somewhat too verbose and bureaucratic, requiring a lot of boilerplate code at times, but to some extent I agree with those who think that this redundancy helped make it an easy-to-read and robust language, well suited to large development teams. By the way, take a look at this slide by Josh Bloch about this topic.

  • No real platform, just an overly wide and complicated language (with the addition of a scanty yet intricate STL), which also implies that you have to keep retraining yourself. I really can't stand the C++ policy that whenever you need something you are “free” (well, rather forced) to do it your own way, even for very basic needs. More often than not you waste a lot of time searching for libraries or writing things from scratch that turn out to be not as good as they would be if they were supported by the language/platform itself. In general, frequent and basic tasks are almost always more complicated than in any other OO language, which is clearly not a pro. Moreover, my own experience tells me that it is easier to change part of the design of a Java application than of a C++ one: simpler languages tend to be more “agile”.

  • Low productivity. There is still a fair number of developers who either neglect this or simply justify it by asserting that, on the other hand, you get a dramatic increase in performance. That may actually be true sometimes, but most of the time you don't care, so it's not a compelling reason. I tend to see myself as a software architect (or at least this is what I wish to become, although right now I'm nothing in particular), hence I'm rather “obsessed” with finding a good, future-proof and/or scalable design; performance usually comes later (remember, premature optimization is an anti-pattern).

  • Dated design, old-fashioned feel. Think about how modern Python was when it came out in 1991 and how old C++ already looked when it was standardized in 1998. For instance, consider std::function or std::tuple. I mean, 1991 vs. 2011. OK, they are very different languages and the comparison is a bit far-fetched, but the point is that it is not reasonable to wait until 2010/2011 for stuff like smart pointers, portable multi-threading, regex, a type-safe null (nullptr) and type-safe enums, hash maps and hash sets and so on. We are talking about pretty basic stuff, software building blocks. Does it look like a modern language? Really? Even in the early days it looked exactly like what it was, that is, an extension to the C language rather than a new design with a C-like syntax. And today, being C compatible is a con rather than a pro (if you are interested in this topic, have a look at this message from Bruce Eckel).

  • Many more, such as the lack of memory safety, bad unit testing support, the madness of operator overloading coupled with RAII, friend classes/methods, private inheritance and so on.
 
Apart from the above, here are some other deficiencies I happened to see during my working life using C++:
 
Memory management
While smart pointers are a fundamental and useful tool, they don't come without a price. First off, you lose covariant return types. They also require you to pay attention to circular references. Nonetheless they are so beneficial that many experienced programmers say that smart pointers should be used (more or less) everywhere, and I do agree with them. But... hey, wait a minute! If I have to use std::shared_ptr<> (and its siblings) all over the place, cluttering my code, why should I use C++ in the first place? Is the timely release of resources a must-have?
Well, usually it isn't. Aside from the fact that a new proxy object must be allocated every time you enter a new scope, the real point is that memory management is something that steals time and attention from the developer, and to some extent this remains true even when using smart pointers. Although I'm familiar with them, I'm also aware that they will never make reasoning about pointers/references as easy and natural as it is in Java.
But I'm just scratching the surface: complicated concurrent programs are hard to write without a garbage collector. This is the reason why in this talk you can see Rob Pike from Google stating that the adoption of a GC was a day-one decision when designing the Go language (alternatively take a look at http://talks.golang.org/2012/splash.article#TOC_14.). The topic would require an entire post of its own, but I'd like to mention a few concrete examples related to non-blocking synchronization. Non-blocking data structures usually take advantage of the Compare And Swap (CAS) instruction (or emulate it, e.g. on architectures that have LL/SC), which unfortunately suffers from the ABA problem. This is even more problematic where you have no GC [see Java Concurrency in Practice, 15.4.4, or What every programmer should know about memory, 8.1]. In other cases a reference counting system is used to avoid memory leaks, which increases complexity (think of Read Copy Update (RCU)). In contrast, using CopyOnWriteArrayList and CopyOnWriteArraySet is very easy and natural.
Back to C++, RAII is usually nice (well, not always, that's why the C++ committee introduced the move constructor), but heap memory management can really be a huge source of trouble, so a GC is almost a necessity today.
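To give an idea of what "easy and natural" means here, this is a minimal sketch (the class and method names are mine, invented for illustration). It modifies a CopyOnWriteArrayList while iterating over it: the iterator keeps working on its own snapshot of the backing array, and the stale snapshot is simply left to the garbage collector, with no reference counting in sight.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CopyOnWriteDemo {

    // Adds to the list while iterating over it and returns how many
    // elements the iterator actually visited.
    static int iterateWhileAdding(List<String> listeners) {
        int seen = 0;
        for (String listener : listeners) {
            // On a plain ArrayList this add() would make the next iteration
            // step fail with ConcurrentModificationException. Here the
            // iterator walks an immutable snapshot, so it is unaffected.
            listeners.add(listener + "-copy");
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> listeners = new CopyOnWriteArrayList<>(List.of("a", "b"));
        int seen = iterateWhileAdding(listeners);
        System.out.println(seen + " seen, " + listeners.size() + " stored");
    }
}
```

Every write copies the whole array, so this only pays off when reads vastly outnumber writes (listener lists are the typical case).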

Concurrency
Wow, as of 2011 we have std::thread. But what do you do when you need to stop a std::thread? You design your own “cooperative interruption” mechanism (you may use a poison pill [see Java Concurrency in Practice, 7.2.3]). So, what about an interrupt in std::thread? Can you Java developers out there imagine getting rid of the interrupt signaling mechanism provided by the language? And what about the concept of executors? It is good to see std::atomic<T>, but it would have been nice to see a new Java-like volatile keyword too.
To get to the point, C++11 introduced some other stuff like async and promise (well, async is not really async...), but let's be honest, all the “new” features are way too few and too poor for modern concurrency scenarios. Solid threading support is a must-have for a modern programming language, and C++ just doesn't have it. Someone might point out that even Java cannot compete with different, more recent concurrency models like Scala's agents or goroutines, and that's true. But still, every time I use C++ I badly miss the concurrency package and all that amazing code written by Doug Lea, Josh Bloch and many more.
By the way, here are some interesting resources about std::thread, showing that multi-threading support is not mature yet (will it ever be?):
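For those who have never used it, here is roughly what the Java interrupt mechanism looks like in practice. It is only a sketch with made-up names (InterruptDemo, stopWorkerWithInterrupt), and the long sleep stands in for any blocking call a real worker would make.

```java
import java.util.concurrent.CountDownLatch;

class InterruptDemo {

    // Starts a worker that blocks, interrupts it from the calling thread,
    // and reports whether the worker noticed the interruption and finished.
    static boolean stopWorkerWithInterrupt() throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        boolean[] sawInterrupt = {false};

        Thread worker = new Thread(() -> {
            started.countDown();
            try {
                // Placeholder for real blocking work (sleep, wait,
                // BlockingQueue.take, ...): all of these respond to interrupt().
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // The worker decides how to react: here it records the
                // request and lets the thread terminate.
                sawInterrupt[0] = true;
            }
        });

        worker.start();
        started.await();     // make sure the worker is running
        worker.interrupt();  // ask it to stop
        worker.join(5_000);  // wait for it to finish
        return sawInterrupt[0] && !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker stopped: " + stopWorkerWithInterrupt());
    }
}
```

The interruption is still cooperative, since the worker chooses what to do in the catch block, but the signaling itself comes for free with every blocking call of the platform.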
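As a small taste of the concurrency package, the following sketch runs a handful of tasks on a thread pool and collects the results through futures. The class name, the pool size and the toy task are arbitrary choices of mine, picked only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorDemo {

    // Squares each value on a pool thread and sums the results.
    static int sumOfSquares(List<Integer> values) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int v : values) {
                tasks.add(() -> v * v);
            }
            int sum = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                sum += result.get();
            }
            return sum;
        } finally {
            pool.shutdown(); // let the pool threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(List.of(1, 2, 3)));
    }
}
```

Thread creation, work distribution and result hand-off are all handled by the executor; the calling code only describes the tasks.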

Lack of reflection (and annotations)
Whenever you need to perform double dispatching in a language that does not support it natively, the usual approach to avoid the ugly if-else RTTI chain, in either Java or C++, is the visitor pattern. A smarter solution, which spares the other side from having to accept the visitor, is to use a dispatch table that retrieves a specific handler given a parameter that acts as a key. This is the solution my colleague adopted in our C++ DROP middleware when necessary, even though it required a considerable amount of partially obscure code. Now let's talk about another solution that may be adopted in Java: a few lines of the reflection API. No doubt it would be better to avoid reflection altogether, and a fast method is preferable to a slow one (quite often good design and performance are at odds with each other), but in C++ you are left with no option at all.
More generally, reflection is a powerful tool that C++ lacks. Granted, as we all know, it should be used with care, but the reflection API is a cornerstone of many frameworks: it is at the foundation of almost every serialization framework (e.g. standard Java serialization, Google's Protocol Buffers, etc.), code inspection tool and plug-in based functionality and, along with annotations, it makes Java flexible. By the way, annotations really deserve a mention too: just think of how handy they are when you use JAXB, JPA, JUnit or any kind of source code analysis. When I switch from Java to C++ it takes some time to accept that both reflection and annotations are not there.
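The "few lines of the reflection API" could look something like the sketch below. The event and handler types are hypothetical, invented just for this example, and the lookup only matches the exact runtime class of the event (a real implementation would probably walk up the class hierarchy and cache the Method objects).

```java
import java.lang.reflect.Method;

class ReflectiveDispatcher {

    interface Event {}
    static class Started implements Event {}
    static class Stopped implements Event {}

    // One public overload per concrete event type.
    static class Handler {
        public String handle(Started e) { return "started"; }
        public String handle(Stopped e) { return "stopped"; }
    }

    // Picks the overload of handle() that matches the runtime type of the
    // event, which ordinary (static) overload resolution cannot do.
    static String dispatch(Object handler, Event event)
            throws ReflectiveOperationException {
        Method m = handler.getClass().getMethod("handle", event.getClass());
        return (String) m.invoke(handler, event);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Event e = new Stopped(); // static type is just Event
        System.out.println(dispatch(new Handler(), e));
    }
}
```

No accept() method on the events, no hand-written table: adding a new event type only requires a new overload in the handler.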
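To show how annotations and reflection work together, here is a toy exporter in the spirit of what JAXB or JPA do on a much larger scale. Everything in it (the @Exported annotation, the User class, the output format) is made up for the example and is not taken from any of those frameworks.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class AnnotationDemo {

    // Hypothetical marker: fields carrying it are exported under `name`.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Exported {
        String name();
    }

    static class User {
        @Exported(name = "user_name")
        String login = "alice";

        String password = "secret"; // not annotated, so never exported
    }

    // Returns "name=value" for every field annotated with @Exported.
    static List<String> export(Object bean) throws IllegalAccessException {
        List<String> out = new ArrayList<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            Exported tag = field.getAnnotation(Exported.class);
            if (tag != null) {
                field.setAccessible(true);
                out.add(tag.name() + "=" + field.get(bean));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IllegalAccessException {
        System.out.println(export(new User()));
    }
}
```

The class being exported only declares what it wants; the generic code discovers it at runtime.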

Static init fiasco
Consider a messaging/event system, where you have a hierarchy of events/messages and some factories or dispatchers which perform specific actions depending on the class type. I'm referring to what Thinking in Java 4th ed. calls Registered factories. The approach is to use some static construct to let the factories know the whole event/message hierarchy and, if you have dealt with this problem in Java at least once, you surely know that it is not possible to place this self-registration logic inside the static block of the derived classes. This is because the class loader loads and initializes a class only on its first use. Conversely, in C++ this technique works because static code is usually executed before the main (entry point) function gets called. However, you must take great care over the initialization order between the events and the factory, otherwise you'll end up with the so-called “static init fiasco”. Fortunately you can avoid it by using the construct-on-first-use idiom.
In my opinion the problem here is that the solution doesn't really fit the language design and looks like a dirty hack: having an object whose destructor never gets executed is a bad language corner case, and it's not the only one you can come across in C++. In this particular example Java isn't really shining either; nonetheless, if you can live with this and a few other small downsides, class loaders will compensate you with other benefits. Think of their use in Java EE application containers, for instance.
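For the Java side of the story, the following sketch shows why self-registration from a static block does not work as one might hope. The registry and the event classes are hypothetical names chosen for the example: the static block of LoginEvent runs only when the class is first used, so until then the registry knows nothing about it.

```java
import java.util.ArrayList;
import java.util.List;

class RegistryDemo {

    // Hypothetical registry that factories would consult.
    static final List<String> REGISTRY = new ArrayList<>();

    static class Event {}

    static class LoginEvent extends Event {
        // Self-registration attempt: runs at class initialization,
        // which happens lazily, on first use of LoginEvent.
        static {
            REGISTRY.add("LoginEvent");
        }
    }

    // Returns the registry size before and after LoginEvent is first used.
    // Meaningful only on the first call, before LoginEvent is initialized.
    static int[] registrySizeBeforeAndAfterFirstUse() {
        int before = REGISTRY.size();
        new LoginEvent(); // first use triggers the static block
        int after = REGISTRY.size();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] sizes = registrySizeBeforeAndAfterFirstUse();
        System.out.println("before: " + sizes[0] + ", after: " + sizes[1]);
    }
}
```

A dispatcher that looks into the registry before anything has touched LoginEvent will find it empty, which is why in Java the hierarchy usually has to be listed explicitly somewhere (or discovered by other means).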

Header and namespace hell
The C/C++ include system is crap and the namespace mechanism is just a lousy technique to prevent name clashes. I won't spend time here elaborating on either statement, as many people before me have already done so thoroughly (one such example is, again, this talk about Go and its design). I'm simply going to report that I have experienced or heard of every possible problem, from include ordering issues and double declarations to weird clashes, especially during integration. I don't think that packages are the best solution ever, but they are a clean design decision, whereas namespaces are just a limited workaround.


So, thanks to the lack of many standard frameworks and the complexity of the language, I spend a considerable amount of time reasoning about how to translate my ideas into C++ rather than about the ideas themselves, while writing Java feels way more natural, even after some time away from it. In other words, with the latter I concentrate more on where I want to go (the design) than on how to get there (the mere code and syntax). Albeit not perfect, Java provides a decent set of tools (language features and toolkits/frameworks) to produce useful work with a reasonable amount of time and effort, and I can't quite say the same for C++.

Fortunately, since C++11 came out, you are no longer forced to use the Boost libraries even for basic functionality, but the language keeps going down a road I really don't like. It is now absurdly huge and way too complex, yet not really effective. To be clear, the latest revision is definitely a good improvement, but I see it as a late (if not desperate) and only partially successful attempt to port some well-established features long present in other languages (mainly Java), before C++ loses a considerable number of developers for not being competitive enough with languages that provide a whole platform (e.g. Java and C#) or great strength in some specific domain (e.g. Ruby on Rails, JavaScript, Dart, etc.). Lately I have become more and more intrigued by Scala and Go, given their use at Twitter and Google respectively, especially for their modern approach to concurrency. Here the gap with C++ is enormous.

There is one more aspect that I consider relevant: the JVM. The one from Sun/Oracle is an excellent piece of software packed with many valuable technologies and ideas, and I see many positive aspects in using a managed language with an “insulation” layer, much like a hypervisor for servers, even when portability is not a requirement. Moreover, it is reasonable to say that many dynamic optimizations now yield very good performance, not to mention a bunch of other crucial and well-known advantages. Even if you are not aiming at coding in Java, good knowledge of and experience with the JVM and a GC is a good starting point for Scala, Clojure and Groovy, but also for JavaScript or Dart.

Technical aspects aside, I do believe C++ is the past. I had to move from C++ to C++11 while working on the DROP project, and I admit that to some extent it has been helpful. But at the same time I'm more convinced than ever that I don't want C++ to be my everyday coding tool. I have no problem with it, it's simply not much fun: there are much better and newer languages that I enjoy much more, and Java is one of them. As soon as my TODO list shrinks a bit I will sit down with a few books about Scala and Go... and maybe JavaScript (likely abandoning my plans to improve my Python skills, although I like the language, and to learn Ruby).

In the end, it's just a matter of fun, that's why “I prefer Java to C++”.
