Count Sort
A sorting algorithm with O(n) time complexity with the following key points
- Sorts the array in 2 passes
- Does not make use of comparisons
- Keeps a Bookkeeping array for counting number of occurrences per array item.
The algorithm's limitation is that in its worst case it could consume 2 ^ bits
of memory especially when
- 32 & 64 bit types are used,
- with arrays where the distance between the array's MIN and MAX values is significantly large
Challenges working with integer arrays
The algorithm requires the ability to translate
- Array values (positive or negative) to BookKeeping indexes (positive)
- BookKeeping Index positions (positive) to negative or positive values
+---+---+---+---+---+---+---+---+ Type: Integer or Unsigned
| 2 | 2 | 5 | 5 | 8 | 1 | 5 | 3 | (Min, Max) = (1,8)
+---+---+---+---+---+---+---+---+ (distance) = Min - Max + 1 = 8
| | | | \______|__
\__| \___\_________/ \
| | |
+---+---+---+---+---+---+---+---+ Type: Unsigned
| 1 | 2 | 1 | 0 | 3 | 0 | 0 | 1 | BookKeeping (BK) capacity(8)
+---+---+---+---+---+---+---+---+ holding counts from BK[min..max]
min(1) BK['5'] = ^ max(8)
Distance calculation
Therefore, knowing the distance
between min
& max
values is fundamental to the algorithm's logic.
However, integer values can easily cause an overflow when the distance
between min
and max
exceeds the [-127..0]
or [0..127]
ranges
-127 0 +127
|-------------|-------------| Both Min and Max are negative
<-- dist ---> Safe: Dist = (max - min)
-127 0 +127
|-------------|-------------| Min is negative, Max is positive
Unsafe: (max - min) overflows
<-------- dist --------->
-127 0 +127
|-------------|-------------| Both Min & Max are positive
<-- dist --> Safe: Dist = (max - min)
Therefore, when min
and max
have opposite signs we have to convert both to usize
before we calculate the distance
. Therefore the below implementation will calculate the distance correctly for both signed
and unsigned
types.
fn dist_from(&self, min: i8) -> usize {
if *self > min {
(*self as usize).wrapping_sub(min as usize)
} else {
(min as usize).wrapping_sub(*self as usize)
}
}
let len: usize = max.dist_from(min);
Now that we know how to calculate the distance
we can proceed with value-to-index and index-to-value translations.
Value-to-index translation
We know that
Min
maps toBookkeeping[0]
position andMax
maps toBookKeeping[distance]
position- where
Min <= array[..] <= Max
Therefore, the index is found as index = value - Min
which more or less is the distance
from min
, which we already know how to calculate.
As a result the translation is given by the following implementation ...
let idx = value.dist_from(min); // Map array value -> BK index
BookKeeping[idx] += 1; // increment count by 1
Index-to-value translation
This is the reverse effect, where we need to translate the index
from the BookKeeping onto the corresponding array value
, since we know that BookKeeping position [0]
is the min
value wihtin the input array.
For example, if min == 6
then the array's value
at position index
will be given as
- for
index = 0
,array[0] = MIN + 0
- for
index = 1
,array[1] = MIN + 1
- for
index = 2
,array[2] = MIN + 2
- etc
Recall that the max(index) == distance
and distance
- always fits the unsigned numbers value range
- overflows the signed numbers value range as shown below
-127 0 +127
|-------------|-------------| (Min,Max) = (-123,122)
-123 <----- dist -----> 122 distance = 245
min max value = (Min: -123 + index: 245)
^^^^^ ** OVERFLOW **
For example, i8
has i8::MIN
value of -128
, and by adding index
with value 245
will cause an overflow of -11
; this is equivalent to 245 % i8::MIN
.
However, the trick here is that by adding -11
to min
and wrapping around, will yield the desired value
.
Therefore, the steps to translate index/unsigned
to value/signed
are
- Convert
index
toi8
given byindex % i8:MIN
- and do a modular add with
min
Value = (Min + ( index as i8 )) % 128
===== ===== =================== ===
82 = (-123 + (-51 = 205 % -128)) % 128
113 = (-123 + (-20 = 236 % -128)) % 128
122 = (-123 + (-11 = 245 % -128)) % 128
Rust performs then above operation with the following statement and implemented asDistance::add_index()
array[index] = min.wrapping_add( index as i8 );
Final implementation
Hence, by putting all the above together, we have the following implementation for the CountSort
trait
/// Sorts a given array using the Count Sort algorithm.
/// Input array NuType shouldn't exceed u16 to avoid memory issues
/// ```
/// use csx3::sort::count::CountSort;
///
/// let v = &mut [3i8,5,8,1,2,4,6,0];
///
/// v.count_sort();
/// assert_eq!(v, &[0,1,2,3,4,5,6,8]);
/// ```
pub trait CountSort {
fn count_sort(&mut self);
}
// CountSort macro implementation for singed and unsigned types
impl<T> CountSort for [T]
where T: Distance<T> + Copy + Ord + Debug{
fn count_sort(&mut self) {
if self.len() < 2 {
return;
}
// find min and max elements
// so we can construct the boundaries of the counting array
// i.e. if (min,max) = (13232, 13233) then we need only an array with capacity(2)
let (min, max) = min_max(self);
// construct a counting array with length = Max - Min + 1
let len: usize = max.dist_from(min);
// initialise it with zero counts
let mut count = vec![0usize; len + 1];
// and finally measure counts per item
self.iter()
.for_each(|x| {
// construct index offset based on Min value, such as, Min is at [0] position
let idx: usize = x.dist_from(min);
count[idx] += 1;
});
// play back onto the input slice the counts collected with Sum of all counts == slice.len()
let iter = &mut self.iter_mut();
count.into_iter()
.enumerate()
.filter(|(_, x)| *x > 0)
.for_each(|(i, x)| {
// place value at `x` positions
iter.take(x)
// translate index -> value
// given value = Min + index
.for_each(|n| { *n = min.add_index(i ) });
});
}
}
/// Distance calculation between two types that are either both signed or unsigned
/// Returns the distance as unsigned type
pub trait Distance<T> {
fn dist_from(&self, min: T) -> usize;
fn add_index(&self, idx: usize) -> T;
}
/// Macro implementation of Distance trait for all signed types
macro_rules! impl_dist_signed {
( $($x:ty),*) => {
$( impl Distance<$x> for $x {
#[inline]
fn dist_from(&self, min: $x) -> usize {
if *self > min {
(*self as usize).wrapping_sub(min as usize)
} else {
(min as usize).wrapping_sub(*self as usize)
}
}
#[inline]
fn add_index(&self, idx: usize) -> $x { self.wrapping_add(idx as $x) }
} )*
}
}
impl_dist_signed!(i8,i16,i32,isize,u8,u16,u32,usize);