Count Sort

A sorting algorithm with O(n) time complexity with the following key points

  • Sorts the array in 2 passes
  • Does not make use of comparisons
  • Keeps a Bookkeeping array for counting number of occurrences per array item.

The algorithm's limitation is that in its worst case it could consume 2 ^ bits of memory especially when

  • 32 & 64 bit types are used,
  • with arrays where the distance between the array's MIN and MAX values is significantly large

Challenges working with integer arrays

The algorithm requires the ability to translate

  • Array values (positive or negative) to BookKeeping indexes (positive)
  • BookKeeping Index positions (positive) to negative or positive values
+---+---+---+---+---+---+---+---+     Type: Integer or Unsigned
| 2 | 2 | 5 | 5 | 8 | 1 | 5 | 3 |     (Min, Max) = (1,8)
+---+---+---+---+---+---+---+---+     (distance) = Min - Max + 1 = 8
  |   |   |   |    \______|__
   \__|    \___\_________/   \
      |           |           |
+---+---+---+---+---+---+---+---+    Type: Unsigned
| 1 | 2 | 1 | 0 | 3 | 0 | 0 | 1 |    BookKeeping (BK) capacity(8)
+---+---+---+---+---+---+---+---+    holding counts from BK[min..max]   
min(1)  BK['5'] = ^         max(8)       

Distance calculation

Therefore, knowing the distance between min & max values is fundamental to the algorithm's logic.

However, integer values can easily cause an overflow when the distance between min and max exceeds the [-127..0] or [0..127] ranges

-127            0            +127
  |-------------|-------------|        Both Min and Max are negative
   <-- dist --->                        Safe: Dist = (max - min)

-127            0            +127
  |-------------|-------------|        Min is negative, Max is positive
                                       Unsafe: (max - min) overflows
    <-------- dist --------->                

-127            0            +127
  |-------------|-------------|        Both Min & Max are positive
                  <-- dist -->         Safe: Dist = (max - min)

Therefore, when min and max have opposite signs we have to convert both to usize before we calculate the distance. Therefore the below implementation will calculate the distance correctly for both signed and unsigned types.

fn dist_from(&self, min: i8) -> usize {
    if *self > min {
        (*self as usize).wrapping_sub(min as usize)
    } else {
        (min as usize).wrapping_sub(*self as usize)
    }
}

let len: usize = max.dist_from(min);

Now that we know how to calculate the distance we can proceed with value-to-index and index-to-value translations.

Value-to-index translation

We know that

  • Min maps to Bookkeeping[0] position and
  • Max maps to BookKeeping[distance] position
  • where Min <= array[..] <= Max

Therefore, the index is found as index = value - Min which more or less is the distance from min, which we already know how to calculate. As a result the translation is given by the following implementation ...

let idx = value.dist_from(min);     // Map array value -> BK index 
BookKeeping[idx] += 1;              // increment count by 1

Index-to-value translation

This is the reverse effect, where we need to translate the index from the BookKeeping onto the corresponding array value, since we know that BookKeeping position [0] is the min value wihtin the input array.

For example, if min == 6 then the array's value at position index will be given as

  • for index = 0, array[0] = MIN + 0
  • for index = 1, array[1] = MIN + 1
  • for index = 2, array[2] = MIN + 2
  • etc

Recall that the max(index) == distance and distance

  • always fits the unsigned numbers value range
  • overflows the signed numbers value range as shown below
-127            0            +127
  |-------------|-------------|        (Min,Max) = (-123,122)
    -123 <----- dist -----> 122        distance = 245
     min                    max        value = (Min: -123 + index: 245)
                                       ^^^^^ ** OVERFLOW **

For example, i8 has i8::MIN value of -128, and by adding index with value 245 will cause an overflow of -11; this is equivalent to 245 % i8::MIN. However, the trick here is that by adding -11 to min and wrapping around, will yield the desired value.

Therefore, the steps to translate index/unsigned to value/signed are

  1. Convert index to i8 given by index % i8:MIN
  2. and do a modular add with min
Value = (Min  + (   index as i8  )) % 128                
=====   =====   ===================   ===
82    = (-123 + (-51 = 205 % -128)) % 128
113   = (-123 + (-20 = 236 % -128)) % 128
122   = (-123 + (-11 = 245 % -128)) % 128

Rust performs then above operation with the following statement and implemented asDistance::add_index()

array[index] = min.wrapping_add( index as i8 );

Final implementation

Hence, by putting all the above together, we have the following implementation for the CountSort trait

/// Sorts a given array using the Count Sort algorithm.
/// Input array NuType shouldn't exceed u16 to avoid memory issues
/// ```
/// use csx3::sort::count::CountSort;
///
/// let v = &mut [3i8,5,8,1,2,4,6,0];
///
/// v.count_sort();
/// assert_eq!(v, &[0,1,2,3,4,5,6,8]);
/// ```
pub trait CountSort {

    fn count_sort(&mut self);
}

// CountSort macro implementation for singed and unsigned types
impl<T> CountSort for [T]
    where T: Distance<T> + Copy + Ord + Debug{

    fn count_sort(&mut self) {

        if self.len() < 2 {
            return;
        }
        // find min and max elements
        // so we can construct the boundaries of the counting array
        // i.e. if (min,max) = (13232, 13233) then we need only an array with capacity(2)
        let (min, max) = min_max(self);

        // construct a counting array with length = Max - Min + 1
        let len: usize = max.dist_from(min);
        // initialise it with zero counts
        let mut count = vec![0usize; len + 1];
        // and finally measure counts per item
        self.iter()
            .for_each(|x| {
                // construct index offset based on Min value, such as, Min is at [0] position
                let idx: usize = x.dist_from(min);
                count[idx] += 1;
            });

        // play back onto the input slice the counts collected with Sum of all counts == slice.len()
        let iter = &mut self.iter_mut();
        count.into_iter()
            .enumerate()
            .filter(|(_, x)| *x > 0)
            .for_each(|(i, x)| {
                // place value at `x` positions
                iter.take(x)
                    // translate index -> value
                    // given value = Min + index
                    .for_each(|n| { *n = min.add_index(i ) });
            });
    }
}
/// Distance calculation between two types that are either both signed or unsigned
/// Returns the distance as unsigned type
pub trait Distance<T> {
    fn dist_from(&self, min: T) -> usize;
    fn add_index(&self, idx: usize) -> T;
}

/// Macro implementation of Distance trait for all signed types
macro_rules! impl_dist_signed {
    ( $($x:ty),*) => {
        $( impl Distance<$x> for $x {
            #[inline]
            fn dist_from(&self, min: $x) -> usize {
                if *self > min {
                    (*self as usize).wrapping_sub(min as usize)
                } else {
                    (min as usize).wrapping_sub(*self as usize)
                }
            }
            #[inline]
            fn add_index(&self, idx: usize) -> $x { self.wrapping_add(idx as $x) }
        } )*
    }
}
impl_dist_signed!(i8,i16,i32,isize,u8,u16,u32,usize);