Collapse duplicate lines and show how many times each one occurs. Part of the DevTools Surf developer suite. Browse more tools in the Text / String collection.
Use Cases
Data engineers removing duplicate entries from ETL pipeline output
DevOps teams deduplicating repeated log messages before analysis
Developers cleaning up duplicated imports or dependency lists in config files
QA testers checking test output files for unexpected duplicate results
Tips
Paste text with repeated lines to see unique entries and occurrence counts
Use it to clean up duplicated entries in CSV or config files
Check the count column to identify the most frequently repeated lines
Fun Facts
The Unix 'uniq' command, which collapses adjacent duplicate lines, first appeared in Version 3 Unix (1973), so line deduplication is a problem more than 50 years old.
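The 'sort | uniq' pipeline is one of the most common Unix command combinations: sorting first makes duplicates adjacent so that 'uniq' can collapse them. 'sort -u' does both steps in one command and is a long-standing option specified by POSIX, while 'sort | uniq -c' adds an occurrence count to each line.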
Hash-based deduplication (using a Set or HashMap) runs in O(n) expected time, while sort-based dedup runs in O(n log n). The hash approach typically uses more memory, because it keeps every distinct line in the table.
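The two approaches can be sketched in Python. This is only an illustration of the trade-off, not the tool's actual implementation; the function names count_with_hash and count_with_sort are hypothetical.

```python
from collections import Counter
from itertools import groupby


def count_with_hash(lines):
    """Hash-based: a single pass, expected O(n) time.

    Every distinct line is kept as a key in the hash table.
    """
    return dict(Counter(lines))


def count_with_sort(lines):
    """Sort-based: O(n log n) time.

    Sorting makes equal lines adjacent, so each run can be counted
    in one sweep, much like 'sort | uniq -c'.
    """
    return {line: sum(1 for _ in run) for line, run in groupby(sorted(lines))}


lines = ["b", "a", "b", "c", "a", "b"]
hash_counts = count_with_hash(lines)  # keys in first-seen order
sort_counts = count_with_sort(lines)  # keys in sorted order
```

Both functions produce the same counts; only the key order differs, since the hash version keeps first-seen order and the sort version emits lines in sorted order.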
FAQ
How is this different from Line Deduper?
Line Dedupe shows occurrence counts per duplicate line — useful when you're triaging which duplicates matter. Line Deduper just removes duplicates without counts.
Can I filter by count?
Yes — show only lines that appear more than N times. Useful for finding the most-duplicated entries in a log.
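As a rough sketch of what a count filter does, the Python below keeps only lines that appear more than N times. The function name lines_over_threshold and the sample log are made up for illustration and do not reflect the tool's internals.

```python
from collections import Counter


def lines_over_threshold(text, n):
    """Return {line: count} for lines that appear more than n times."""
    counts = Counter(text.splitlines())
    return {line: count for line, count in counts.items() if count > n}


# Hypothetical log excerpt with repeated messages.
log = "timeout\nok\ntimeout\nretry\ntimeout\nretry"
frequent = lines_over_threshold(log, 1)
```

With n set to 1, the result contains only lines that occur at least twice, so one-off messages such as "ok" drop out.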
Are counts case-sensitive?
By default, yes, the same as Line Deduper. Toggle 'case-insensitive' to group case variations together; the counts then reflect the grouping.
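The effect of case-insensitive grouping on counts can be sketched as follows. This assumes grouping is done by case-folding each line before counting; the tool may work differently (for example, it may display the first-seen spelling rather than the folded one), and count_lines is a hypothetical name.

```python
from collections import Counter


def count_lines(lines, case_insensitive=False):
    """Count lines, optionally treating case variations as the same line."""
    if case_insensitive:
        # casefold() is a more aggressive lower() intended for comparisons.
        return Counter(line.casefold() for line in lines)
    return Counter(lines)


lines = ["Error", "error", "ERROR", "ok"]
exact = count_lines(lines)
folded = count_lines(lines, case_insensitive=True)
```

In exact mode each spelling is its own entry with a count of 1; in folded mode the three spellings merge into a single "error" entry with a count of 3.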
Can it work as a frequency histogram?
Yes — sort the output by count descending and you have a frequency distribution. Paired with the Word Frequency tool, this covers both whole-line and word-level analysis.
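Sorting by count descending can be sketched in Python with the standard library's Counter.most_common. The function name frequency_table and the sample lines are illustrative assumptions, and the text bars are just one possible way to display the distribution.

```python
from collections import Counter


def frequency_table(lines):
    """Return (line, count) pairs sorted by count, highest first."""
    return Counter(lines).most_common()


# Hypothetical request log lines.
rows = frequency_table(["GET /", "GET /api", "GET /", "POST /api", "GET /"])

# Render a simple text histogram: count, bar, then the line itself.
for line, count in rows:
    print(f"{count:>3}  {'#' * count}  {line}")
```

Lines with equal counts stay in the order they were first seen, which keeps the output stable between runs.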