- How does hash-based duplicate detection work?
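- The tool computes a cryptographic hash (SHA256) of each file's content. The hash is deterministic, so identical content always produces the same hash, and the chance of two different files sharing one is astronomically low; files with identical hashes are therefore treated as duplicates. Each file is read once and only the short digests are compared, which scales better than comparing every pair of files byte by byte, and it works across any file type.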
- Can two different files have the same hash?
- In theory, yes (a hash collision). SHA256 has 2^256 (about 10^77) possible outputs, so the probability that two specific different files collide is approximately 1 in 10^77, which is effectively impossible in practice. MD5 is weaker: collisions have been demonstrated and can be crafted deliberately, but an accidental MD5 collision between two ordinary files is still extremely unlikely.
- Does it find duplicates with different filenames?
- Yes. Hashing compares content, not filenames: two files named photo.jpg and IMG_0001.jpg with identical content are flagged as duplicates, however different their names are.
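
The answers above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the function names `sha256_of_file` and `find_duplicates` and the 1 MiB chunk size are assumptions made for illustration. It hashes every regular file under a directory with SHA256 and groups paths by digest, so files with identical content end up together regardless of their names.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file (hypothetical helper).

    The file is read in chunks (1 MiB by default, an arbitrary choice)
    so large files do not have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group regular files under root by content hash (hypothetical helper).

    Returns a list of groups; each group is a list of paths whose
    content hashes match. Filenames play no part in the grouping.
    """
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[sha256_of_file(path)].append(path)
    # Keep only digests shared by two or more files.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_duplicates("photos")` on a folder that holds photo.jpg and IMG_0001.jpg with identical bytes would return one group containing both paths. A common refinement, not shown here, is to group files by size first and hash only those whose sizes match, since files of different sizes cannot be identical.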