Sorting in Rexx

Introduction

After a thread about sorting appeared on comp.lang.rexx recently, I implemented a compendium of sorting algorithms in Rexx and ran some time trials.

These are the nicknames of the algorithms I implemented. For further details, please see the comments in the code.

bubble Classic bubble sort (left-to-right pairwise exchanging of values)
bubble2 Bi-directional bubble sort (left-to-right then right-to-left passes)
selection Selection sort - find the minimum value then place it on the left
shaker Shaker sort - find the minimum and maximum values then place them on the left and right
insertion Insertion sort - move each value leftwards into its place in the sorted data
sshell Simplified Shell sort - a development of bubble sort which compares each element i with an element i+d; when no more exchanges are possible, divide d by two
shell Shell sort - similar to the simplified version but using insertion instead of bubble sort
comb A variation on the simplified Shell sort which doesn't repeat any passes but only divides d by 1.3
tournament  Play a knock-out tournament to find the smallest datum (the algorithm which started the thread which prompted me to carry out this investigation!)
merge Merge sort - divide the array into two halves using a temporary area, sort each, then merge the halves back into the original array
lmerge Merge sort based on a linked-list (an array of subscripts)
smerge Merge sort based on strings of subscripts
heap Classic heap sort
quick Hoare's quicksort - divide the list into two partitions, then sort each partition
radix Left-to-right Radix sort - place items in buckets according to their first character, then sort each bucket
iradix In-place version of the radix sort

The code and the tests

All the above algorithms were placed in a single file of Rexx code called sorts.rexx together with a prologue which takes care of calling the selected algorithm and timing it. The file is just over 1,000 lines long and measures a bit over 33K.

The program can either generate its own test data, in the form of random numbers (in the format 12345.67, complete with leading spaces if necessary so that all numbers are the same width), or can input data from a file (treating each file line as one item). For testing, the "-v" flag will generate a (possibly abbreviated) display of the items before and after sorting, and in any case the program will verify that the items are in order after they are sorted. This doesn't guarantee that the output contains all the input items, once each, but the sorted items can be written to a file for further analysis and so I am fairly confident that all the supplied sorting procedures work as advertised.

The algorithms were tested both on numbers (generated randomly for each individual test) and on text. The text was taken from archives of the comp.lang.rexx newsgroup; this means the data mostly consists of lines of 0 to 80 alphanumeric and symbol characters, and that the same data was used for each test. Up to 15% of the data strings were empty (blank lines).

The theory

The algorithms fall into several groups according to how efficient they are in theory.

In addition, some of the algorithms require extra working space (particularly the tournament sort, which needs space to store the results of all the matches, and merge sort, which needs a work-space to store one of the array halves while they are being merged together).

A summary of the average time and space complexity of each algorithm can be found in the following table. Note that the worst-case time complexity of quicksort is N2, but this is very unlikely to happen in the given implementation.

TimeSpace
bubble(2)N21
selectionN21
shakerN21
insertionN21
(s)shellN1.5(see above)1
combsee above1
tournamentN log(N)N
(l,s)mergeN log(N)N
heapN log(N)1
quickN log(N)log(N)
radixNb (see above)N
iradixNb (see above)log(N)

The complexity of an algorithm (as shown in the above table) is not the whole story, however, because it does not take into account the constant of proportionality. For instance, the bubble sort is almost always much slower than any of the other N2 algorithms because it carries out a lot of redundant comparisons on each pass through the data. With this in mind, the list in which the algorithms were introduced was given in approximate order according to how they are expected to perform (slowest first).

Of the five N2 algorithms, as mentioned above, the bubble sort is generally expected to be the slowest. The bi-directional bubble sort is usually faster because it often requires fewer passes to get the elements in place: a small element near the end of the array only moves back one place for each left-to-right pass, but will traverse the whole array on a right-to-left pass. Shaker should be faster than selection, firstly because it reduces loop overhead by finding two elements on each pass, and secondly because an element is guaranteed not to be the maximum if it is already the minimum and so a comparison is saved in that case. Insertion is often the fastest of the N2 algorithms because on average each element only has to be shifted half way up the array before it is in place. However, it performs more assignments than shaker, so the latter may sometimes win.

The quicksort algorithm is generally considered to be the fastest of the N log(N) methods, although heapsort is more predictable and doesn't have the N2 worst-case that quicksort has. Merge sort can be an efficient algorithm, but is generally not preferred because of the extra space that it requires (which can also slow the algorithm down).

Radix sort is often claimed to be the fastest of all methods for sorting strings, but care must be taken when it is used for numbers. In compiled languages it can be used for sorting numbers stored in big-endian binary format or in IEEE floating-point format (provided the sign is handled properly), but in Rexx it cannot be used on unformatted numbers. Rexx also lets the algorithm down by not having an efficient way to fetch "the ith character of the nth string" - something that can be accessed directly in C, for example. The in-place radix sort is bound to be faster than the plain version in Rexx because there is less copying of the data involved.

The practice

Tests were carried out on two different interpreters, namely REXX/imc and IBM Object Rexx, as the different implementations may well favour the sorting methods in different ways. The actual timing data is shown below.

It was unsurprising that bubble, bubble2 and selection were the three worst methods in all cases, followed by shaker and insertion (in either order). More of a surprise in the Object Rexx tests was that it often took longer to sort short numbers than long text strings and that heapsort with 100 numbers took consistently longer than with 200 numbers! Also, some of Object Rexx's numeric tests seemed unstable (heapsort with 5000 numbers took either about 27 seconds or about 66 seconds with very few results in between; shellsort with 20000 numbers varied between 137 and 250 seconds but with no results between 170 and 220).

In general, heapsort was outperformed by quicksort on Object Rexx; however, contrary to expectations, the reverse was true on REXX/imc. Radix sorting methods were faster than both in most cases, despite the fact that they are not particularly suited to Rexx (for example, the operation `substr(str,n,1)' in REXX/imc causes the whole string `str' to be copied to a workspace before the nth character is selected).

Shell and comb sort were often surprisingly good, despite the fact that they are not true N log(N) algorithms. The tournament sorting method performed unexpectedly poorly in all cases, given that it ought to be very similar to the heapsort aside from the extra bookkeeping (it employs a similar tree structure). Possibly the extra space and/or the pattern of access used by the bookkeeping structure was a factor in dragging the performance down. It may also be the case that Rexx's DROP instruction (which is used in the tournament sorting routine to eliminate each winner) is relatively expensive to execute.

The plain merge sort performed better than its two variants in three of the four cases, possibly because each step involves straightforward moving of each array element into its final position, while the other two methods require the whole array to be reordered once the subscript pointers have been sorted (and they also require a greater number of temporary variables). However, it performed significantly worse on REXX/imc when sorting strings - probably because an assignment on REXX/imc causes the whole string to be copied, and this is faster with short subscripts than long text strings. The string merge sort is an attractive proposition because it is well suited to argument passing and Rexx's parse instruction, and it was declared the winner for sorting strings on REXX/imc. However, it performed rather worse on Object Rexx which perhaps means that this interpreter isn't so good at building strings and splitting them apart, relative to other operations. Sorting many elements using this method does require very long strings to be built and returned from functions, and this might be an inefficient use of memory.

Almost all the tests got significantly slower as the number of elements approached 50,000. The amount of memory consumed by the program was probably not enough to cause swapping, so it probably just means that both interpreters are relatively inefficient when dealing with large arrays of data.

A least-squares fit was attempted in order to express the time taken by each method in the form m·Ne for some multiplier m and exponent e. An expression of the form m·Nelog(N) was also attempted, but there was no case in which that provided a better fit. Probably the log(N) term was swamped by other factors such as the efficiency of each interpreter in dealing with arrays. The multipliers and exponents thus obtained are displayed in the table below. In most cases these values do not fit very well the inflated timings obtained with 20,000 or 50,000 elements. The multipliers are in units of microseconds.

Estimated multipliers and exponents to fit the timing data for various sorting methods
REXX/imcObject REXX
NumbersText NumbersText
Method multexpmultexp multexpmultexp
bubble 18.82.12 13.92.17 14.12.19 13.22.12
bubble2 14.62.13 10.12.20 13.92.16 11.62.12
selection 12.02.12 9.492.14 11.32.15 7.702.13
shaker 6.562.18 5.142.20 7.112.14 4.592.14
insertion 6.342.15 5.022.19 4.282.23 7.132.12
sshell 1111.53 61.91.62 1471.54 90.21.53
tournament  1591.41 1201.47 1331.73 1431.71
shell 82.71.41 28.91.62 37.51.51 71.61.39
comb 1011.41 40.61.58 1371.40 1301.38
smerge 1221.32 69.71.42 5.441.95 6.531.88
lmerge 1681.30 1081.39 27.41.62 23.81.63
merge 1911.26 45.91.54 55.91.51 75.71.43
quick 1081.31 27.91.58 1231.27 53.91.41
radix 1061.31 16.91.64 1531.29 75.71.37
heap 2551.22 53.71.51 95.31.49 1631.32
iradix 1331.23 31.31.53 1831.20 95.61.30

The least exponent achieved was 1.20 (for `iradix' on Object Rexx) which is quite far from N log(N). The explanation for this is undoubtedly due to overheads within the Rexx interpreter. For example, we use a stem variable rather than a true array, and accessing a stem variable has a certain time penalty associated with it (this should be O(log(N)) in REXX/imc, but this interpreter also has other time penalties associated with modifying the value of a variable - particularly when it is one of several large arrays in memory).

It may (or may not) be relevant that while heap sort beat quick sort on REXX/imc it has a much larger multiplier. It was quicker for larger values of N because it has a lower exponent (although this isn't really explained by the theory).

A slightly surprising result is the exponent of 1.41 for the Shell sort. This is the only case in which the exponent came out less than that predicted by the theory.

The net result of this is that the attempted fit of the timing data to a simple formula in N is probably not very accurate or useful. In any case, the theory often doesn't predict a simple formula: a method described as O(N log(N)) might actually take time A·N+B·N·log(N) for some constants A and B. The N log(N) component dominates for larger values of N but the other term could influence the result significantly for the smaller values used in the tests.

Conclusions

Quicksort and heapsort are good general sorting algorithms and should be the natural choice for more than a few hundred data items. Radix sorting methods are fairly complex and do not work in all cases, but they are also fast and could be the best choice in some specialised applications. Merge sort can work well if the items are in a linked list or provided as a string of numbers, but for most applications the extra space required discounts it from being an efficient solution. For small numbers of items, combsort is the simplest and one of the fastest methods available. Insertion sort also has some merits for small datasets. Bubble sort is probably only ever useful as a programming exercise!

Numerical data

The average times (in seconds) for the program to run the various sorting algorithms are given below. Many of the tests which took more than about fifteen minutes were only run once (and the slower sorting algorithms were not tested at all on the larger data sets).

REXX/imc 1.76 was tested on a Sun Ultra-5 SparcStation running Solaris 9 on which it executes about 91,000 clauses per second.

Times taken by REXX/imc to sort random numbers by various methods
100 200 500 1000 2000 5000 10000 20000 50000
bubble    0.331    1.40     9.57    41.9    190     - - - -
bubble2    0.279    1.15     8.36    35.7    166     - - - -
selection    0.213    0.863    5.92    27.3    118     - - - -
shaker    0.160    0.659    4.55    20.9     92.4    822     - - -
insertion    0.135    0.595    4.04    16.9     76.1    664     - - -
sshell    0.138    0.393    1.35     3.79    10.9     52.9    160      544     -
tournament    0.120    0.298    0.928    2.39     6.38    27.4     82.5    294     1625    
comb    0.066    0.196    0.583    1.59     4.26    16.2     46.1    121      713    
shell    0.060    0.150    0.490    1.23     3.42    14.0     39.5    121      651    
lmerge    0.072    0.175    0.529    1.25     3.07    11.4     29.9     84.2    390    
smerge    0.060    0.143    0.403    0.950    2.36     9.40    27.6     99.0    572    
merge    0.065    0.157    0.457    1.08     2.55     8.84    21.8     57.3    243    
quick    0.045    0.115    0.349    0.844    2.10     7.33    19.1     52.6    253    
heap    0.068    0.166    0.490    1.11     2.60     8.09    18.8     46.4    162    
radix    0.040    0.108    0.515    0.654    1.85     9.71    15.7     47.8    325    
iradix    0.039    0.100    0.268    0.595    1.64     4.66    12.0     38.1    130    
 
Times taken by REXX/imc to sort lines of text by various methods
100 200 500 1000 2000 5000 10000 20000 50000
bubble    0.302    1.37     9.32    42.6    205     - - - -
bubble2    0.252    1.24     8.37    40.0    192     - - - -
selection    0.184    0.792    5.33    23.5    114     - - - -
shaker    0.140    0.564    4.09    18.2     87.3    760     - - -
insertion    0.129    0.568    3.68    17.6     85.4    689     - - -
sshell    0.116    0.409    1.32     3.77    13.3     65.4    226      953     -
shell    0.058    0.164    0.578    1.63     5.35    28.5    101      476     3416    
tournament    0.125    0.302    1.01     2.60     7.32    34.4    111      440     2885    
comb    0.069    0.177    0.628    1.92     5.69    27.8     97.6    396     2871    
merge    0.071    0.167    0.568    1.55     4.74    23.8     87.5    386     2885    
radix    0.050    0.089    0.367    1.08     3.49    21.6     81.9    360     2678    
quick    0.048    0.127    0.432    1.21     5.06    18.9     68.5    334     2527    
heap    0.071    0.162    0.565    1.56     4.48    21.5     74.6    302     2266    
iradix    0.051    0.093    0.376    0.936    2.97    15.6     53.0    224     1613    
lmerge    0.076    0.173    0.556    1.36     3.59    15.3     46.1    168     1036    
smerge    0.060    0.137    0.423    1.06     2.88    13.1     43.8    168     1222    

IBM Object Rexx 2.2.0.0 was tested on a Pentium 233MMX machine running Red Hat Linux 7.3 on which it executes about 96,000 clauses per second. (By comparison, REXX/imc executes about 57,000 c.p.s. on this machine.)

Times taken by Object Rexx to sort random numbers by various methods
100 200 500 1000 2000 5000 10000 20000 50000
bubble    0.336    1.64     9.56    58.0    233     - - - -
bubble2    0.299    1.42     7.85    45.6    199     - - - -
selection    0.233    1.09     6.02    33.6    147     1052     - - -
insertion    0.142    0.621    3.60    19.4    106      872     - - -
shaker    0.144    0.633    3.77    19.9     87.2    620     - - -
tournament    0.509    1.30     4.95    16.2     58.3    333     1452     - -
smerge    0.077    0.165    0.623    2.57    10.1     90.3    566     2622     -
sshell    0.187    0.584    1.70     5.86    17.2     71.4    235      918     -
heap    0.215    0.184    0.673    2.04     4.48    55.6    116      900     3778    
lmerge    0.076    0.152    0.392    1.54     5.09    32.2    109      427     2695    
merge    0.085    0.160    0.534    1.33     4.39    19.7     88.9    342     2597    
comb    0.083    0.255    0.685    2.35     5.75    17.9     57.8    204      909    
shell    0.050    0.111    0.350    1.07     3.45    13.6     51.0    178      710    
radix    0.063    0.137    0.593    0.748    2.40    12.4     20.9     74.6    503    
quick    0.054    0.104    0.292    0.659    1.90     6.11    18.4     56.5    236    
iradix    0.050    0.119    0.262    0.641    1.77     4.49    13.9     47.6    219    

(The appearance of the numbers 20008 and 50008 in the table below is due to Object Rexx's different interpretation of some "^Z" characters in the test data.)

Times taken by Object Rexx to sort lines of text by various methods
100 200 500 1000 2000 5000 10000 20008 50008
bubble    0.252    1.01     6.54    29.0    153     - - - -
bubble2    0.218    0.896    5.29    23.6    136     - - - -
selection    0.154    0.583    3.80    15.6     92.0    580     - - -
insertion    0.131    0.545    3.26    14.0     84.6    477     - - -
shaker    0.098    0.376    2.43    10.4     58.3    397     - - -
tournament    0.501    1.20     4.63    14.8     51.3    291     1275     - -
smerge    0.066    0.152    0.497    1.67     6.86    59.8    386     1498     -
sshell    0.118    0.361    1.04     2.78    10.9     38.9    155      711     -
lmerge    0.061    0.159    0.369    1.28     4.16    29.8    103      386     2454    
comb    0.086    0.193    0.588    1.59     4.58    14.6     50.3    176     1101    
merge    0.070    0.157    0.453    1.05     3.31    14.9     51.1    157      944    
heap    0.076    0.164    0.546    2.13     3.07    10.5     37.0    119      659    
shell    0.049    0.124    0.371    0.871    2.91     9.35    32.5    154      498    
quick    0.043    0.099    0.278    0.808    3.11     6.94    30.6     85.9    811    
radix    0.056    0.095    0.401    0.736    2.46     9.56    28.3    102      488    
iradix    0.052    0.082    0.288    0.567    1.61     6.01    19.2     64.9    336