Merge Algorithms
Merging two or more ordered sets
The basic use case for these algorithms is merging two ordered arrays into a single and ordered array. Although simple, it becomes far more complicated when you consider
- Very large datasets spanning many processing nodes (segmentation, map/reduce, etc)
- Memory and cpu constraints on embedded systems (in-place, out-of-place)
Benchmarks
The following benchmarks provide an indicative performance comparison between the different merge implementations. The input size used is 5,000 (2 x 2,500) elements.
Out of place merge function
===========================
test bench_merge_iterator ... bench: 61,250 ns/iter (+/- 5,708)
In place merge functions
========================
test bench_merge_lazy ... bench: 80,606 ns/iter (+/- 2,367)
test bench_merge_mut ... bench: 68,282 ns/iter (+/- 8,597)
test bench_merge_mut_adjacent ... bench: 43,533 ns/iter (+/- 655)
Implementation
The chapters that follow, provide a detailed explanation on how the below implementation works
/// Applies memory efficient in-place merging when two slices are adjacent to each other.
/// ```
/// use csx3::merge::merge_mut_fast;
///
/// let mut input = vec![1, 3, 5, 7, 9, 2, 4, 6, 8, 10];
/// let (s1,s2) = input.split_at_mut(5);
///
/// merge_mut_fast(s1,s2);
/// assert_eq!(input, vec![1,2,3,4,5,6,7,8,9,10]);
/// ```
/// Panics in case the two slices are found not to be adjacent. For safety, always use *ONLY* against slices that have been mutable split from an existing slice
pub fn merge_mut_fast<T>(s1: &mut [T], s2: &mut [T]) -> usize where T: Ord + Debug {
let ws: &mut [T];
unsafe {
ws = from_raw_parts_mut(s1.as_mut_ptr(), s1.len() + s2.len());
assert!(s2[0] == ws[s1.len()]);
}
let (mut p, mut c, mut j, llen, mut inversion ) = (0usize, 0usize, s1.len(), s1.len(), 0usize);
let mut idx_rfl: Vec<usize> = Vec::from_iter(0..ws.len());
let len = idx_rfl.len();
//println!("{ws:?}::{idx_rfl:?}, ({i},{c},{j})");
unsafe {
let idxp = idx_rfl.as_mut_ptr();
let wsp = ws.as_mut_ptr();
let mut cc; // c' definition
let mut pp; // p' definition
loop {
cc = *idxp.add(c);
pp = *idxp.add(p);
match (j < len && j != p, p < len-1 && c < llen -1) {
(true, _) if (*wsp.add(cc)).cmp(&(*wsp.add(j))) == Ordering::Greater => {
inversion += j - p;
wsp.add(p).swap( wsp.add(j));
//idx_rfl.swap(ii, j);
idxp.add(pp).write(j);
idxp.add(j).write(pp);
j += 1;
},
(_, true) => {
wsp.add(p).swap(wsp.add(cc));
//idx_rfl.swap(pp, c);
idxp.add(cc).write(pp);
idxp.add(pp).write(cc);
c += 1;
},
(_,_) => break,
};
p += 1;
//println!("{ws:?}::{idx_rfl:?}, ({i},{c},{j})");
}
}
//println!("Merge Done!");
inversion
}