Shellsort Algorithm and Gap Sequences

Flavjo Xhelollari
3 min readMar 16, 2021

In this article we will discuss the implementation of Shellsort algorithm and its various gap sequences and their impact on the performance of the algorithm. What is Shellsort? Some argue that shellsort is mainly a variation of Insertion Sort algorithm since we are moving elements only one position ahead. So, it can be considered as generalization of insertion sort, enabling distant entries to exchange with each-other. We can summarize the idea of shellsort into the following steps:

  1. Pick a value for the gap.
  2. Divide the array into equal subarrays of size gap.
  3. Sort the subsequences using Insertion Sort.
  4. Decrease the value of the gap, and repeat the previous steps until sorted.

For the implementation of this algorithm, we are going to see different cases. To begin with, there are 2 modes of running the algorithm. Via the first one, we pass a large vector of numbers generated randomly, to the shellsort function, and also we pass the type of the shellsort gap sequence as an argument. This is how we generate the random numbers, and pass them to the shellsort method. The shellsort algorithm, is defined as follows:

Github gist containing part of the code for the execution of the program.

As mentioned above, in the first mode, we specify the size of the array in the command line as follows:

for n in $( seq 10 10 100000 )
do./program $n 1 500000 2 > /dev/null
./program $n 1 500000 2 > /dev/null
done 2> results_hibbart.dat

So, the command line input works like this : first number represent the mode (1 or 2), the second number represents the size of the array to be passed to shellsort, and the third number represents the type of gap sequence. Back to the modes, apart from the first one, the second mode works differently since the second argument in the command line is the name of a ”.dat” file which contains numbers that we shall pass to shellsort to order. The most important feature of the algorithm’s implementation is the gap sequence. This is displayed in the three different sequence gaps mentioned in the algorithm. Firstly, Donald Shell, when designing the Shellsort algorithm, proposed that the general term for the sequence is n/2^k, where n is the size of the unsorted array. Following this rule, concrete gaps would be n/2,n/4 . . . 1.This would lead us to a worst case complexity of Θ(n²). Apparently, this is not an improvement from the typical Θ(n²) algorithms. Actually this can be seen in my implementation, if we run the algorithm, and pass1as a second argument when calling shellsort function. Another possible case is when we have a different type of gap. The second one is Hibbard’s sequence. The general term for Hibbard’s sequence is 2^(k−1). This means that we have as concrete gaps, all the numbers equal to the powers of two, minus one. Concretely, we have 1, 3, 7, 15 . . . . For this specific gap, the algorithm ∈ Θ(n^(3/2)). Obviously, compared to Donald Shell’s gap implementation, this algorithm performs better. When I researched more about the topic, I came across Sedgewick’s sequence, which is accepted to be one of the most best-performing strategies on Shellsort algorithm. The general term for Sedgewick’s sequence is 4k+ 3·2k^2+ 1. The first elements of this sequence are 1, 8, 23 . . . . As we can see, the numbers in this sequence are coprime, and for different numbers in it, say h, g, a g-sorted array after h-sorted remains g-sorted and vice versa. For the Sedgewick sequence, we can say that it ∈ O(n^(4/3)). Thus, from all the strategies of sequences mentioned above, the last one is the best option when it comes to performance. When running the algorithm with large inputs, we see that for the first run (mode 1, Hibbard’s Sequence) is slower than the second run (mode 2, Sedgewick’s Sequence). To plot the graphs, we use gnuplot as following:

gnuplotset key top leftset xlabel "Input Size"set ylabel "Basic Operations"plot "hibbard.dat" using 1:2 pointtype 7 title "Hibbard", \2*x**2 linewidth 2 linecolor "red" title "2n^2"

For the second run, we use the same commands, except that :

...
plot "sedgewick.dat" using 1:2 pointtype 7 title "Sedgewick", \plot "hibbard.dat" using 1:2 pointtype 7 title "Hibbard", \2*x**2 linewidth 2 linecolor "red" title "2n^2"

Thus, there is a visible difference, and we can see that Sedgewick’s Sequence is faster than Hib-bard’s.

Sources:

  1. Shellsort. (2021, February 11). Retrieved March 01, 2021, from https://en.wikipedia.org/wiki/Shellsort
  2. Sedgewick, R. (1996). Analysis of Shellsort and related algorithms. In Algorithms — ESA ’96(pp. 1–11). Springer Berlin Heidelberg. https://doi.org/10.1007/3-540-61680-242

--

--

Flavjo Xhelollari
0 Followers

Computer Science and Mathematics recent graduate from @American University in Bulgaria, interested in Data Science and AI.