Java vs. C++ for Server Development - Still Not Good Enough!

May 9, 2015 | 11:21 pm PDT

Perspective: Fox

Synopsis

Over the past couple of months, I've been developing functionally equivalent implementations of a simple, high-performance, scalable server system in both C++ and Java, in order to determine the current viability of Java as a server language/platform.

The motivation of that project is twofold:

For my own professional knowledge, when considering how to architect and implement future projects; and
If it turns out my suspicions are correct, then to have concrete proof to show the hordes of Java proponents who keep trying to convince me that Java is just as performant and scalable as C++.

At this point, both implementations of the server are, in fact, functionally equivalent. They both contain the same abstractions and perform the same (minimal) functionality. Currently, the servers listen for incoming socket connections, then hand each connection off to a dedicated thread to perform the handling of that connection. At this time, the handling only consists of reading the available data from the socket then echoing it back to the remote process. Very simple, and very minimal.

As I have time, I will be adding more "typical" server side processing to the connection handlers. For example, certain text parsing for HTTP messages, creating instances of "typical" server side objects (e.g. a "User" object to represent the user associated with the connection). But, for now, let's consider the simple case of the echo server.

I have, very deliberately, chosen to use a thread-per-connection model. If you are one of the many who believe an event-driven, SEDA, or asynchronous messaging model is more scalable, then I'm sorry, but you are among the masses of the misinformed (we'll deal with that in a later post). For the Java version, I've had to create a thread pool and pre-start all of the threads, because I found Java's thread creation performance to be abysmal. The C++ version just calls pthread_create() each time a connection is received. Otherwise, the two versions are pretty much identical.
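To make the thread-per-connection model concrete, here is a minimal sketch of what the C++ side of such an echo server can look like. It is not the actual server code: the function names (echo_connection, spawn_echo_thread, serve), the detached threads, and the 512-byte buffer are assumptions chosen for illustration, and error handling is reduced to the bare minimum.

```cpp
// Sketch of a thread-per-connection echo server (POSIX sockets + pthreads).
// All names here are hypothetical; this is not the code that was benchmarked.
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <string>

// Per-connection thread body: read whatever is available, echo it back,
// and exit when the peer closes the connection or an error occurs.
extern "C" void* echo_connection(void* arg) {
    const int fd = static_cast<int>(reinterpret_cast<std::intptr_t>(arg));
    char buf[512];  // assumed size, matching the 512-byte test messages
    for (;;) {
        const ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n <= 0) break;  // 0 = peer closed, < 0 = error
        ssize_t sent = 0;
        while (sent < n) {  // send() may accept fewer bytes than requested
            // MSG_NOSIGNAL avoids SIGPIPE if the peer has already gone away.
            const ssize_t w = send(fd, buf + sent,
                                   static_cast<size_t>(n - sent), MSG_NOSIGNAL);
            if (w <= 0) break;
            sent += w;
        }
        if (sent < n) break;  // give up on this connection after a send error
    }
    close(fd);
    return nullptr;
}

// Hand one accepted connection to its own detached thread.
// Returns false (and closes fd) if the thread could not be created.
bool spawn_echo_thread(int fd) {
    assert(fd >= 0);
    pthread_t tid;
    void* arg = reinterpret_cast<void*>(static_cast<std::intptr_t>(fd));
    if (pthread_create(&tid, nullptr, echo_connection, arg) != 0) {
        close(fd);
        return false;
    }
    pthread_detach(tid);  // nobody joins these threads
    return true;
}

// Accept loop: one pthread_create() per incoming connection.
// listen_fd is assumed to be a socket that is already bound and listening.
void serve(int listen_fd) {
    for (;;) {
        const int fd = accept(listen_fd, nullptr, nullptr);
        if (fd < 0) {
            if (errno == EINTR) continue;  // interrupted, try again
            break;                         // any other error ends the loop
        }
        spawn_echo_thread(fd);
    }
}
```

A caller would create a listening socket with socket(), bind(), and listen(), then pass it to serve(). The Java version described above differs mainly in that its threads come from a pre-started pool instead of being created per connection.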

Build and Test Environment

All development and testing is being done on OpenSUSE 13.2, running Linux kernel 3.16.7.

The hardware configuration is as follows:

CPU: Intel Core i3-3220, 3.30GHz, 2 cores with 2 hyperthreads each
RAM: 16GB (2x 8GB), DDR3-1600
Network: Realtek RTL8169 Gigabit Ethernet controller
Hard Drive: Seagate Barracuda ST1500DM003, SATA drive, 6Gb/s

None of the other hardware in the test machine should have any relevance. For that matter, the network interface and the hard drive are not really relevant either, since the client application runs on the same host as the server and the system is not swapping.

The C++ version has been compiled with GNU g++ 4.8.3. The exact command lines used to compile and link are shown in the makefile. Generally, though, I used optimization level 2 ("-O2"), and link time optimization ("-flto").

The Java version was compiled (into bytecode) using Oracle's Linux Java compiler, version 1.8.0 update 40. The same version of the JVM is used at runtime.

The client takes, as a command line argument, the maximum number of connections to maintain (--maxconn). While it is running, it pseudo-randomly establishes between 75% and 100% of "maxconn" connections, and pseudo-randomly terminates between 75% and 100% of "maxconn" connections. This simulates an environment where users are regularly connecting to, and eventually disconnecting from, the server. While a given connection is active, it pseudo-randomly sends/receives 512-byte "messages" which the server accepts then echoes back.

I ran the tests with 4 client instances, each with a "maxconn" of 7,500, so that the server will always have between 22,500 and 30,000 active connections and there will always be between 0 and 7,500 new connections being established and/or terminated. I believe this reasonably simulates a real world environment.
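The load figures above follow from simple arithmetic, which the sketch below spells out. It is not the client's actual code: pick_burst_size, LoadRange, and aggregate_load are hypothetical names, and the uniform distribution is an assumption (the client is only described as "pseudo random").

```cpp
// Sketch of the client's burst sizing and the resulting aggregate load.
// Names and the choice of a uniform distribution are assumptions.
#include <cassert>
#include <random>

// Choose how many connections to establish (or terminate) in one burst:
// somewhere between 75% and 100% of maxconn, inclusive.
int pick_burst_size(int maxconn, std::mt19937& rng) {
    assert(maxconn > 0);
    const int low = (maxconn * 3) / 4;  // 75% of maxconn
    std::uniform_int_distribution<int> dist(low, maxconn);
    return dist(rng);
}

// Aggregate bounds seen by the server when several clients run at once.
struct LoadRange {
    int min_active;  // every client sitting at 75% of maxconn
    int max_active;  // every client sitting at 100% of maxconn
    int max_churn;   // connections that may be opening or closing
};

LoadRange aggregate_load(int clients, int maxconn) {
    assert(clients > 0 && maxconn > 0);
    const int max_active = clients * maxconn;
    const int min_active = clients * ((maxconn * 3) / 4);
    return {min_active, max_active, max_active - min_active};
}
```

With 4 clients and a maxconn of 7,500, aggregate_load(4, 7500) gives 22,500 to 30,000 active connections and up to 7,500 in flux, matching the figures quoted above.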

After starting the server and the 4 client instances, I wait for the clients to complete their first large block of connections (6,883 connections), then I start running the Linux 'top' command on the server's PID. I run 'top' with a delay interval of 5 seconds, for 60 iterations. This provides a 5-minute sample (60 x 5 seconds).

Results

In short, what we find is the following:

CPU Usage Graph
The C++ version shows a median CPU utilization of 11.38%, while the Java version shows a median CPU utilization of 52.69%. (Output of 'top' for the C++ version, and for the Java version.)
Memory Usage Graph
The C++ version shows a median memory usage (resident set size) of 304MB, while the Java version shows a median memory usage of 3,527MB. (Output of 'top' for the C++ version, and for the Java version.)

In other words, the Java version uses 4.6 times (463%) the amount of CPU, and 11.6 times (1,160%) the amount of memory, to perform essentially the same amount of work as the C++ version.
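For anyone who wants to check the arithmetic, the following sketch shows how a median and the Java-to-C++ ratios can be computed from the 'top' samples. The helper names (median, usage_ratio) are hypothetical, and the sample extraction from the 'top' output is left out.

```cpp
// Sketch: median of a set of samples and the ratio between two medians.
// Helper names are assumptions; parsing the 'top' output is not shown.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of the samples; for an even count, the mean of the two middle values.
double median(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    if (samples.size() % 2 == 1) return samples[mid];
    return (samples[mid - 1] + samples[mid]) / 2.0;
}

// How many times larger the Java figure is than the C++ figure.
double usage_ratio(double java_value, double cpp_value) {
    assert(cpp_value > 0.0);
    return java_value / cpp_value;
}
```

Plugging in the reported medians, usage_ratio(52.69, 11.38) is roughly 4.63 for CPU and usage_ratio(3527.0, 304.0) is roughly 11.60 for memory, which is where the 463% and 1,160% figures come from.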

Two other very significant points, which must be mentioned:

In my testing, the JVM never released memory back to the OS. So, let's say the server load is typically around 20,000 active connections but, for whatever reason, it spikes to 40,000 for a short period and then goes back down to 20,000. The additional memory required to support that 40,000 spike will not be released until you restart the JVM - not a very acceptable solution for a server system!
If you look at the accompanying CPU utilization chart, you see that the Java version is plagued by frequent extremes. It is constantly fluctuating between about 30% and 90%. I can't say that I'd be very comfortable with that - especially when compared to the C++ version, which remains fairly consistent between 10% and 13%. What worries me about the spikes in the Java version is that, if other processing is occurring on the host at the same time, you're going to have sporadic contention.

Conclusions

It's important to keep in mind that all the server is doing is accepting connections, receiving messages, then sending those messages back to the client. In a real server we'd actually be doing something with, or in response to, the received messages (such pseudo logic will be added as I have time).

A lot of Java proponents have the belief that you can address performance and scalability issues by simply throwing more cheap commodity servers at the system to spread the load. I've never considered that approach to be particularly feasible once you factor in the total cost of ownership/operation. When you take into account the additional space requirements of 5 to 10 times the number of servers; the additional electricity consumption, particularly for cooling; the additional manpower to maintain those additional servers; and the additional complications of having a higher number of hardware failures due to using lower quality components combined with a higher quantity of such components; the numbers become far less attractive over time.

The other issue I have with the idea of "horizontal scalability" (as opposed to vertical scalability) is that you now have a significantly larger number of processes having to communicate and synchronize with each other over a network - which results in dramatically more complicated code and systems, overall. On this I speak from long and painful experience. A single process running on a single machine requires a fraction of the development and maintenance effort of a system composed of n instances of the process running on n distributed servers.

In closing, I would have difficulty imagining a scenario where I could justify implementing a server system in Java, as opposed to C++. Sure, you might save some time on initial development, but if the system has any reasonable performance and scalability requirements, I am confident that what was saved in initial development effort will be lost (and then some) on trying to meet such requirements. It's been my experience that those performance and scalability requirements end up not being met, but by that point so much time and so many resources have been put into the Java implementation that the project management is not willing to admit a mistake, and upper management is not willing to start over with C++.
