Day 6: Solution Explanation
Approach
Day 6's problem involves finding the first occurrence of a sequence of unique characters in a datastream. The approach is to:
- Process the input datastream as a sequence of bytes
- Examine consecutive windows of characters (of length 4 for part 1, 14 for part 2)
- Check each window for duplicate characters
- Find the position of the first window that contains no duplicates
The solution uses Rust's trait system to create reusable functionality for checking duplicates and finding marker positions.
Implementation Details
Detecting Duplicates
The first key component is a trait for checking whether a slice contains any duplicate elements:
#![allow(unused)] fn main() { trait Duplicate { fn has_duplicates(&self) -> bool; } impl<T> Duplicate for [T] where T: Debug + Copy + PartialEq + Ord { fn has_duplicates(&self) -> bool { let mut tmp = self.to_vec(); tmp.sort(); tmp.windows(2).any(|a| a[0]==a[1]) } } }
This implementation:
- Creates a copy of the slice
- Sorts the copy (bringing identical elements next to each other)
- Checks adjacent pairs for equality using
windows(2)
The trait is implemented generically for any slice type [T]
where T
supports debugging, copying, equality comparison, and ordering.
Finding Marker Positions
The second key component is a trait for finding the position of a marker in a datastream:
#![allow(unused)] fn main() { trait Signaling { fn marker_position(&self, len:usize) -> usize; } impl<T> Signaling for [T] where T : Debug + Copy + PartialEq + Ord { fn marker_position(&self, len: usize) -> usize { self.windows(len) .enumerate() .skip_while(|&(_,stm)| stm.has_duplicates() ) .next() .map(|(i,_)| i + len) .unwrap_or_else(|| panic!("marker_position(): Ops!")) } } }
This implementation:
- Creates sliding windows of the specified length using
windows(len)
- Pairs each window with its index using
enumerate()
- Skips windows that contain duplicates using
skip_while
- Takes the first window that has no duplicates
- Returns the position after this window (index + window length)
Main Solution
With these traits defined, the main solution becomes remarkably simple:
fn main() { let data = std::fs::read_to_string("src/bin/day6_input.txt").expect(""); let out = data.bytes().collect::<Vec<_>>(); println!("Marker Length @4 = {}", out.marker_position(4)); println!("Marker Length @14 = {}", out.marker_position(14)); }
The solution reads the input file, converts it to a vector of bytes, and then calls marker_position
with the appropriate lengths for part 1 (4) and part 2 (14).
Algorithm Analysis
Time Complexity
The time complexity of this solution depends on the length of the input (n
) and the marker length (m
):
- Checking for duplicates in a window takes O(m log m) time due to the sorting operation
- In the worst case, we check every window in the input, giving us O(n) windows
- Overall time complexity: O(n * m log m)
For this problem, m
is small (4 or 14), so the logarithmic factor isn't significant, making the effective complexity close to O(n).
Space Complexity
The space complexity is O(n) to store the input as a vector of bytes, plus O(m) temporary storage for each duplicate check.
Alternative Approaches
Using a HashSet for Duplicate Detection
A common alternative approach would be to use a HashSet
to check for duplicates:
#![allow(unused)] fn main() { fn has_unique_chars(window: &[u8]) -> bool { let mut set = HashSet::new(); window.iter().all(|&c| set.insert(c)) } }
This would have O(m) time complexity for checking duplicates instead of O(m log m), but at the cost of using HashSet
which has more overhead than simple sorting for small datasets.
Using Frequency Counting
Another approach would be to count the frequency of each character:
#![allow(unused)] fn main() { fn has_unique_chars(window: &[u8]) -> bool { let mut counts = [0; 256]; // For ASCII for &c in window { counts[c as usize] += 1; if counts[c as usize] > 1 { return false; } } true } }
This has O(m) time complexity and uses a fixed amount of space, but is limited to ASCII or other bounded character sets.
Using a Bit Set
For even more efficiency, a bit set could be used for the specific case of lowercase ASCII characters:
#![allow(unused)] fn main() { fn has_unique_chars(window: &[u8]) -> bool { let mut bits = 0u32; for &c in window { let mask = 1 << (c - b'a'); if (bits & mask) != 0 { return false; } bits |= mask; } true } }
This has O(m) time complexity and uses only a single integer for storage, but is limited to a single case of character set.
Conclusion
The solution demonstrates the power of Rust's traits for creating reusable, generic functionality. By separating the concerns of duplicate detection and marker finding into traits, the code becomes more modular and expressive. The generic implementation allows the solution to work with any type of element, not just characters, making it more versatile than specialized approaches.