Merge Algorithms

Merging two or more ordered sets

The basic use case for these algorithms is merging two ordered arrays into a single and ordered array. Although simple, it becomes far more complicated when you consider

  • Very large datasets spanning many processing nodes (segmentation, map/reduce, etc)
  • Memory and cpu constraints on embedded systems (in-place, out-of-place)

Benchmarks

The following benchmarks provide an indicative performance comparison between the different merge implementations. The input size used is 5,000 (2 x 2,500) elements.

Out of place merge function
===========================
test bench_merge_iterator          ... bench:      61,250 ns/iter (+/- 5,708)

In place merge functions
========================
test bench_merge_lazy              ... bench:      80,606 ns/iter (+/- 2,367)
test bench_merge_mut               ... bench:      68,282 ns/iter (+/- 8,597)
test bench_merge_mut_adjacent      ... bench:      43,533 ns/iter (+/- 655)

Implementation

The chapters that follow, provide a detailed explanation on how the below implementation works

/// Applies memory efficient in-place merging when two slices are adjacent to each other.
/// ```
/// use csx3::merge::merge_mut_fast;
///
/// let mut input = vec![1, 3, 5, 7, 9, 2, 4, 6, 8, 10];
/// let (s1,s2) = input.split_at_mut(5);
///
/// merge_mut_fast(s1,s2);
/// assert_eq!(input, vec![1,2,3,4,5,6,7,8,9,10]);
/// ```
/// Panics in case the two slices are found not to be adjacent. For safety, always use *ONLY* against slices that have been mutable split from an existing slice
pub fn merge_mut_fast<T>(s1: &mut [T], s2: &mut [T]) -> usize where T: Ord + Debug {
    let ws: &mut [T];

    unsafe {
        ws = from_raw_parts_mut(s1.as_mut_ptr(), s1.len() + s2.len());
        assert!(s2[0] == ws[s1.len()]);
    }

    let (mut p, mut c, mut j, llen, mut inversion ) = (0usize, 0usize, s1.len(), s1.len(), 0usize);
    let mut idx_rfl: Vec<usize> = Vec::from_iter(0..ws.len());
    let len = idx_rfl.len();

    //println!("{ws:?}::{idx_rfl:?}, ({i},{c},{j})");

    unsafe {
        let idxp = idx_rfl.as_mut_ptr();
        let wsp = ws.as_mut_ptr();

        let mut cc; // c' definition
        let mut pp; // p' definition

        loop {
            cc = *idxp.add(c);
            pp = *idxp.add(p);
            match (j < len && j != p, p < len-1 && c < llen -1) {
                (true, _) if (*wsp.add(cc)).cmp(&(*wsp.add(j))) == Ordering::Greater => {
                    inversion += j - p;
                    wsp.add(p).swap( wsp.add(j));
                    //idx_rfl.swap(ii, j);
                    idxp.add(pp).write(j);
                    idxp.add(j).write(pp);
                    j += 1;
                },
                (_, true) => {
                    wsp.add(p).swap(wsp.add(cc));
                    //idx_rfl.swap(pp, c);
                    idxp.add(cc).write(pp);
                    idxp.add(pp).write(cc);
                    c += 1;
                },
                (_,_) => break,
            };
            p += 1;
            //println!("{ws:?}::{idx_rfl:?}, ({i},{c},{j})");
        }
    }
    //println!("Merge Done!");
    inversion
}