Count Sort
A sorting algorithm with O(n) time complexity with the following key points
- Sorts the array in 2 passes
- Does not make use of comparisons
- Keeps a Bookkeeping array for counting number of occurrences per array item.
The algorithm's limitation is that in its worst case it could consume 2 ^ bits of memory especially when
- 32 & 64 bit types are used,
- with arrays where the distance between the array's MIN and MAX values is significantly large
Challenges working with integer arrays
The algorithm requires the ability to translate
- Array values (positive or negative) to BookKeeping indexes (positive)
- BookKeeping Index positions (positive) to negative or positive values
+---+---+---+---+---+---+---+---+ Type: Integer or Unsigned
| 2 | 2 | 5 | 5 | 8 | 1 | 5 | 3 | (Min, Max) = (1,8)
+---+---+---+---+---+---+---+---+ (distance) = Min - Max + 1 = 8
| | | | \______|__
\__| \___\_________/ \
| | |
+---+---+---+---+---+---+---+---+ Type: Unsigned
| 1 | 2 | 1 | 0 | 3 | 0 | 0 | 1 | BookKeeping (BK) capacity(8)
+---+---+---+---+---+---+---+---+ holding counts from BK[min..max]
min(1) BK['5'] = ^ max(8)
Distance calculation
Therefore, knowing the distance between min & max values is fundamental to the algorithm's logic.
However, integer values can easily cause an overflow when the distance between min and max exceeds the [-127..0] or [0..127] ranges
-127 0 +127
|-------------|-------------| Both Min and Max are negative
<-- dist ---> Safe: Dist = (max - min)
-127 0 +127
|-------------|-------------| Min is negative, Max is positive
Unsafe: (max - min) overflows
<-------- dist --------->
-127 0 +127
|-------------|-------------| Both Min & Max are positive
<-- dist --> Safe: Dist = (max - min)
Therefore, when min and max have opposite signs we have to convert both to usize before we calculate the distance. Therefore the below implementation will calculate the distance correctly for both signed and unsigned types.
fn dist_from(&self, min: i8) -> usize {
if *self > min {
(*self as usize).wrapping_sub(min as usize)
} else {
(min as usize).wrapping_sub(*self as usize)
}
}
let len: usize = max.dist_from(min);
Now that we know how to calculate the distance we can proceed with value-to-index and index-to-value translations.
Value-to-index translation
We know that
Minmaps toBookkeeping[0]position andMaxmaps toBookKeeping[distance]position- where
Min <= array[..] <= Max
Therefore, the index is found as index = value - Min which more or less is the distance from min, which we already know how to calculate.
As a result the translation is given by the following implementation ...
let idx = value.dist_from(min); // Map array value -> BK index
BookKeeping[idx] += 1; // increment count by 1
Index-to-value translation
This is the reverse effect, where we need to translate the index from the BookKeeping onto the corresponding array value, since we know that BookKeeping position [0] is the min value wihtin the input array.
For example, if min == 6 then the array's value at position index will be given as
- for
index = 0,array[0] = MIN + 0 - for
index = 1,array[1] = MIN + 1 - for
index = 2,array[2] = MIN + 2 - etc
Recall that the max(index) == distance and distance
- always fits the unsigned numbers value range
- overflows the signed numbers value range as shown below
-127 0 +127
|-------------|-------------| (Min,Max) = (-123,122)
-123 <----- dist -----> 122 distance = 245
min max value = (Min: -123 + index: 245)
^^^^^ ** OVERFLOW **
For example, i8 has i8::MIN value of -128, and by adding index with value 245 will cause an overflow of -11; this is equivalent to 245 % i8::MIN.
However, the trick here is that by adding -11 to min and wrapping around, will yield the desired value.
Therefore, the steps to translate index/unsigned to value/signed are
- Convert
indextoi8given byindex % i8:MIN - and do a modular add with
min
Value = (Min + ( index as i8 )) % 128
===== ===== =================== ===
82 = (-123 + (-51 = 205 % -128)) % 128
113 = (-123 + (-20 = 236 % -128)) % 128
122 = (-123 + (-11 = 245 % -128)) % 128
Rust performs then above operation with the following statement and implemented asDistance::add_index()
array[index] = min.wrapping_add( index as i8 );
Final implementation
Hence, by putting all the above together, we have the following implementation for the CountSort trait
/// Sorts a given array using the Count Sort algorithm.
/// Input array NuType shouldn't exceed u16 to avoid memory issues
/// ```
/// use csx3::sort::count::CountSort;
///
/// let v = &mut [3i8,5,8,1,2,4,6,0];
///
/// v.count_sort();
/// assert_eq!(v, &[0,1,2,3,4,5,6,8]);
/// ```
pub trait CountSort {
fn count_sort(&mut self);
}
// CountSort macro implementation for singed and unsigned types
impl<T> CountSort for [T]
where T: Distance<T> + Copy + Ord + Debug{
fn count_sort(&mut self) {
if self.len() < 2 {
return;
}
// find min and max elements
// so we can construct the boundaries of the counting array
// i.e. if (min,max) = (13232, 13233) then we need only an array with capacity(2)
let (min, max) = min_max(self);
// construct a counting array with length = Max - Min + 1
let len: usize = max.dist_from(min);
// initialise it with zero counts
let mut count = vec![0usize; len + 1];
// and finally measure counts per item
self.iter()
.for_each(|x| {
// construct index offset based on Min value, such as, Min is at [0] position
let idx: usize = x.dist_from(min);
count[idx] += 1;
});
// play back onto the input slice the counts collected with Sum of all counts == slice.len()
let iter = &mut self.iter_mut();
count.into_iter()
.enumerate()
.filter(|(_, x)| *x > 0)
.for_each(|(i, x)| {
// place value at `x` positions
iter.take(x)
// translate index -> value
// given value = Min + index
.for_each(|n| { *n = min.add_index(i ) });
});
}
}
/// Distance calculation between two types that are either both signed or unsigned
/// Returns the distance as unsigned type
pub trait Distance<T> {
fn dist_from(&self, min: T) -> usize;
fn add_index(&self, idx: usize) -> T;
}
/// Macro implementation of Distance trait for all signed types
macro_rules! impl_dist_signed {
( $($x:ty),*) => {
$( impl Distance<$x> for $x {
#[inline]
fn dist_from(&self, min: $x) -> usize {
if *self > min {
(*self as usize).wrapping_sub(min as usize)
} else {
(min as usize).wrapping_sub(*self as usize)
}
}
#[inline]
fn add_index(&self, idx: usize) -> $x { self.wrapping_add(idx as $x) }
} )*
}
}
impl_dist_signed!(i8,i16,i32,isize,u8,u16,u32,usize);