Which Java sort is faster – MergeSort, QuickSort or Arrays.sort?

I had a discussion the other day about writing your own sorting code versus using the built in Java sort.  So tonight I decided to do a little geek research for the fun of it.  Now, I will say, I have rarely ever rolled my own sorting algorithm in Java (if ever).  In C, yes, C++, maybe – until STL came around.  So I decided to do a test.  I scoured the internet (ok, a couple of Google searches) for a good sample of a Quick Sort(link here) and a Merge Sort(from here) and then I compared them to the built in Java sort (Arrays.sort).  The two code samples came from Java Tips, great site, check it out.

I did see some interesting numbers when I changed the number of elements or the kind of elements I sorted.  I ended up using a string concatenated with the current time in milliseconds to fill the array.  I ended up running a series of tests where the sample ranged from 10,000 entries to 200,000 entries.  Is this likely in the real world? Probably not, but it makes for some good geek discussion!

Code to fill the array:

private static void fillArray(List<String> array) {
     Random generator2 = new Random( System.currentTimeMillis() );

     for (int x = 0; x<50000; x++){
         array.add("Value = " + generator2.nextInt());
     }
}

The following is a series of results  after running the code 10 times.  The results are the time it took to make each sort call in milliseconds.

I think the key takeaway from this table is you can consistently see the Java (Arrays.sort) consistently faster across the different element counts.

I do not know this for sure, I could look into the Java source to verify but I think the Arrays.sort must be using some kind of Introsort algorithm where it recognizes the recursion depth and dynamically switches to a heap sort when the recursion depth gets too high.  One issue that is not shown here, is the Quick Sort and Merge Sort actually run out of memory on sets that are over a million (in a basic configuration).  So this recursion optimization is actually a really big deal with very large sets and in short the Arrays.sort should probably be used in most cases.

The reality is, most data has some kind of sort out of the box and is rarely this random.  The key however, is to really understand the various things going on within your sort code and running a few tests like this may in the end save you a lot of programming time.

13 thoughts on “Which Java sort is faster – MergeSort, QuickSort or Arrays.sort?

  1. Quicksort’s performance is largely based on how you choose the pivot. For example, if do simple partitioning on input that is already sorted Quicksort will die. The implementation you reference uses median-of-three partitioning, which is good but there are better methods.

    I’m not sure why this Quicksort would run out of memory. It appears to be doing the sort in place but I didn’t look too closely. It’s obvious why Merge sort blows up.

    Like

  2. Actually, it was only the mergesort that dumped. When the set was several million the program never even got to the QuickSort. I should move the quicksort up first and see what it does with extremely large sets and see how it performs.

    Like

  3. Having had a quick look at Arrays.sort, it would also appear to use a Merge Sort, but with a couple of optimisations. The two I immediately spotted (thanks to helpful comments) were checking to see if the data was already sorted (not helpful in your case), and doing an insertion sort if there were fewer than 7 elements in one of the sub arrays.

    A quick test on my laptop indicates that it will happily work with 10,000,000 million random strings in an array. 20 Million caused it to run out of memory, even with –Xmx2048m.

    Like

  4. When doing benchmarks, please take into account the granularity of the timer you’re using. The standard windows millisecond timer has a granularity of about 15 or 16ms and is largely dependent on operating system context switches. In your first two examples, you only get 0-4 clicks which is much too low to get any numbers that actually mean something.

    There’s System.nanoTime() which does a much better job and has a much finer granularity (although not nanosecond granularity, it does take some time between “clicks”). I’ve also tried using JNI to hook into windows’ high performance counters to get more accurate times for benchmakrs, but found that the overhead simply isn’t worth it.

    Another suggestion would be to repeat the test a large amount of times so it runs long enough for each iteration (if you want to sort small arrays, repeat the sort until it runs at least several seconds).

    Like

    • Thanks Kim, that is why I did it multiple times but I understand what you are saying. In general over time and multiple runs you should be able to get some kind of average. Whether its 100% accurate may not be that important but if I had to do this for my actual job I would have used something similar to what you mention and have run for probably 1000 times each – programmatically of course. 🙂

      Like

  5. Pingback: Which Java sort is faster? Part 2 » Balfes.net

  6. I Haven’t really written a sort since the early 1970s, but sorting was already extensively studied way back then (by Knuth and others). Sorting via magnetic tape or punched cards was fun, often with frequent manual intervention to mount new tapes or feed partially sorted card decks back in for the subsequent sort phase.

    There’s a ton (or tonne, if you prefer — of Java sorting algorithm examples out there, including some effective animations. A few examples are:
    http://www.cs.cf.ac.uk/Dave/C/ANIM/anim.html
    http://www.cs.ubc.ca/~harrison/Java/sorting-demo.html
    http://maven.smith.edu/~thiebaut/java/sort/demo.html
    http://www.ansatt.hig.no/frodeh/algmet/animate.html

    And many more! … The key thing is to remember that a sort algorithm that works well in one environment (hardware platform, OS, available memory, nature of the problem, etc) mightn’t do so well when applied to another problem, so you’ve gotta be careful which one you choose each and every time.

    Like

  7. Pingback: Sorting a HashMap in Java: SortedMap to the rescue! | Bob's Blog

  8. I have a text that describes a Parallel Sorting by Random Sampling algorithm that purports linear performance improvement up to high processor counts.

    You can take a look at my Java implementation of PSRS sort on Github; https://github.com/broadbear/sort.

    I’ve encapsulated some parallel search algorithms in a Collections type API:

    List sorted = PCollections.sort(List unsorted);

    Like

    • Mike,

      I tried your code, and run into a number of issues. The immediate one was the instantiation of aFinal array in PSRSSort class (line 60). Array of Integer type will cause the ArrayStoreException exception in PSRSSort, line 231.

      PCollections.sort() was passed a List of Comparable classes other than Integer. However, after changing Integer to Object in line 60, I still had an ArrayIndexOutOfBounds exception in line 160.

      I do not yet have a test case to demonstrate this to you. If you cannot reproduce the problem at your end, let me know.

      Thanks,
      Alex

      Like

  9. Pingback: Thank you for a great 2015 and Happy New Year! | Bob's Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s