Selection Algorithm
if you had an array entry of 4 elements, containing the numbers 10, 8, 2 and 4, and you were looking for the 3rd statistic that would be 8.
The first order statistic is just the minimum element of the array. That's easier to find with a linear scan. The nth order statistic is just the maximum, again easier, easy to find with a linear scan. The middle element is the median.
For all other cases, the selection algorithm returns the answers in O(n) time
Implementation flavours
Randomised approach
Where pivot selected is chosen randomly based on the 75/25 rule
pub trait Select<T> {
fn r_selection(&mut self, nth_min: usize) -> &T;
}
impl<T> Select<T> for [T]
where T: Copy + Ord {
/// Find the nth order statistic within an unordered set with O(n) performance
/// using nth_min as 1 will return the smallest item; 2 the second smallest, etc
/// When function returns, the input array has been rearranged so that ```item == array[ nth order ]```
/// ```
/// use csx3::select::Select;
///
/// let (arr, nth_order) = (&mut [23,43,8,22,15,11], 1usize);
///
/// let ret_val = arr.r_selection(nth_order);
/// assert_eq!(ret_val, &8);
/// assert_eq!(&arr[nth_order-1], &8);
/// ```
fn r_selection(&mut self, nth_min: usize) -> &T
{
// println!("Input: {:?}::{}th", v, order_nth);
if self.len() == 1 {
return &self[0];
}
// pick an index at random based on a uniform distribution
let idx = rand::thread_rng().gen_range(0..(self.len() - 1));
// find out the nth order of this sample
let (left_partition, nth, right_partition) = self.partition_at_idx(idx);
let order = left_partition.len() + 1;
// println!("\tAsked:{}ord Picked:{}th, {:?} {:?}ord {:?}", nth_min, idx, left_partition, order, right_partition);
// is nth order sampled over, equal or above the desired nth_min ?
match nth_min.cmp(&order) {
// we've found the item in nth_min order
Ordering::Equal => nth,
// the nth_min is below the nth found so recurse on the left partition
Ordering::Less =>
left_partition.r_selection(nth_min),
// the nth_min is above the nth found so recurse on the right partition with adjusted order
Ordering::Greater =>
right_partition.r_selection(nth_min - order),
}
}
}
Deterministic approach
Where the pivot selected is always the median of the recursive set provided by the below implementation
The idea is to
- break the array into chunks of 5 elements
- sort them and pick
chunk[3]
as the median - Collect all medians into a new
array
- recurse until you converge to the median
/// Returns a vector of N/5 medians where N = input array length
/// It breaks array into N/5 sub-arrays of length 5 for cheap sorting and picking the median value
///
pub fn medians_of_medians<T>(v:&mut [T]) -> Vec<T>
where T : Copy + Ord + Debug {
// extract median of medians array
// split input slice into n/5 groups of 5
v.chunks_mut(5)
.map(|chunk| {
// sort each group
chunk.mergesort_mut(Merge::merge_mut_adjacent);
// pull the median out
chunk[ chunk.len() >> 1]
})
// return as vector
.collect()
}
/// Finds the median value within an array of N elements
pub fn find_median<T>(v:&mut [T]) -> T
where T : Copy + Ord + Debug {
if v.len() == 1 {
return v[0]
}
let mut medians: Vec<T> = medians_of_medians(v);
find_median(&mut medians)
}