Selection Algorithm

if you had an array entry of 4 elements, containing the numbers 10, 8, 2 and 4, and you were looking for the 3rd statistic that would be 8.

The first order statistic is just the minimum element of the array. That's easier to find with a linear scan. The nth order statistic is just the maximum, again easier, easy to find with a linear scan. The middle element is the median.

For all other cases, the selection algorithm returns the answers in O(n) time

Implementation flavours

Randomised approach

Where pivot selected is chosen randomly based on the 75/25 rule

pub trait Select<T> {
    fn r_selection(&mut self, nth_min: usize) -> &T;
}

impl<T> Select<T> for [T]
    where T: Copy + Ord {
    /// Find the nth order statistic within an unordered set with O(n) performance
    /// using nth_min as 1 will return the smallest item; 2 the second smallest, etc
    /// When function returns, the input array has been rearranged so that ```item == array[ nth order ]```
    /// ```
    /// use csx3::select::Select;
    ///
    /// let (arr, nth_order) = (&mut [23,43,8,22,15,11], 1usize);
    ///
    /// let ret_val = arr.r_selection(nth_order);
    /// assert_eq!(ret_val, &8);
    /// assert_eq!(&arr[nth_order-1], &8);
    /// ```
    fn r_selection(&mut self, nth_min: usize) -> &T
    {

        // println!("Input: {:?}::{}th", v, order_nth);
        if self.len() == 1 {
            return &self[0];
        }

        // pick an index at random based on a uniform distribution
        let idx = rand::thread_rng().gen_range(0..(self.len() - 1));
        // find out the nth order of this sample
        let (left_partition, nth, right_partition) = self.partition_at_idx(idx);

        let order = left_partition.len() + 1;
        // println!("\tAsked:{}ord Picked:{}th, {:?} {:?}ord {:?}", nth_min, idx, left_partition, order, right_partition);

        // is nth order sampled over, equal or above the desired nth_min ?
        match nth_min.cmp(&order) {
            // we've found the item in nth_min order
            Ordering::Equal => nth,
            // the nth_min is below the nth found so recurse on the left partition
            Ordering::Less =>
                left_partition.r_selection(nth_min),
            // the nth_min is above the nth found so recurse on the right partition with adjusted order
            Ordering::Greater =>
                right_partition.r_selection(nth_min - order),
        }
    }
}

Deterministic approach

Where the pivot selected is always the median of the recursive set provided by the below implementation

The idea is to

  • break the array into chunks of 5 elements
  • sort them and pick chunk[3] as the median
  • Collect all medians into a new array
  • recurse until you converge to the median
/// Returns a vector of N/5 medians where N = input array length
/// It breaks array into N/5 sub-arrays of length 5 for cheap sorting and picking the median value
///
pub fn medians_of_medians<T>(v:&mut [T]) -> Vec<T>
    where T : Copy + Ord + Debug {

    // extract median of medians array
    // split input slice into n/5 groups of 5
    v.chunks_mut(5)
        .map(|chunk| {
            // sort each group
            chunk.mergesort_mut(Merge::merge_mut_adjacent);
            // pull the median out
            chunk[ chunk.len() >> 1]
        })
        // return as vector
        .collect()
}

/// Finds the median value within an array of N elements
pub fn find_median<T>(v:&mut [T]) -> T
    where T : Copy + Ord + Debug {

    if v.len() == 1 {
        return v[0]
    }
    let mut medians: Vec<T> = medians_of_medians(v);
    find_median(&mut medians)
}