Duplicate Line Remover — Clean Text Lists Instantly
Paste any list of lines and this tool removes every duplicate, keeping exactly one copy of each line. Options let you control case sensitivity, trim leading and trailing whitespace before comparing, and choose whether to keep the first or last occurrence of each duplicate group.
How it works
When duplicate lines actually matter
Duplicate lines are innocuous in poetry, but they create real problems in structured data. In a CSV file, a duplicated row silently inflates record counts, skews aggregates, and can cause primary-key violations when the file is imported into a database. In server log files, repeated identical log entries from a crashing process can fill a disk within minutes. In configuration files, duplicate keys are silently overwritten — or worse, cause parse errors — depending on the parser.
Code repositories also suffer from accidental line duplication: copy-paste errors in dependency lists (requirements.txt, package.json), repeated import statements, or duplicate entries in .gitignore. Automated CI checks often catch these, but a quick paste-and-deduplicate before committing is faster than debugging a pipeline failure.
Blank lines deserve special attention. Repeated blank lines are technically duplicates, but many text formats (Markdown, Python source, email bodies) use blank lines as intentional separators. This tool treats a blank line like any other line, so all but one blank line is removed. With 'Trim whitespace' enabled, lines that only appear blank because of stray spaces or tabs are counted as blank too. If your content relies on blank lines as separators, review the output before using it.
Case sensitivity: when it matters and when it doesn't
By default this tool compares lines case-insensitively, so 'Apple', 'apple', and 'APPLE' all count as the same line and only the first is kept. This is the right setting for human-readable lists like tag clouds, keyword lists, city names, and email addresses where you want to collapse variants of the same thing.
Case-sensitive mode is essential for code. In Python, 'import os' and 'import OS' refer to different module names (the second fails unless a module named 'OS' actually exists), and variable names like 'Result' and 'result' are distinct identifiers. SQL keywords are case-insensitive, but table names can be case-sensitive: MySQL on Linux, for example, treats them as case-sensitive by default because tables map to files on a case-sensitive file system. When deduplicating code snippets, configuration keys, or paths, always enable case-sensitive mode.
A practical middle ground is to run the tool twice: once case-insensitively to spot near-duplicates for review, and once case-sensitively for the authoritative deduplication. This two-pass approach catches typos and inconsistent capitalisation that automated dedup alone would miss.
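The review pass of this two-pass approach can be sketched in Python. This is a minimal illustration, not the tool's own code: the function name case_variant_groups is hypothetical, and str.casefold() is an assumed stand-in for whatever case folding the tool applies. It reports only the groups of lines that match when case is ignored but are spelled differently, which are the ones worth reviewing by hand.

```python
from collections import defaultdict


def case_variant_groups(lines):
    """Return groups of lines that are equal ignoring case but spelled differently.

    Hypothetical helper for illustration; casefold() is an assumption about
    how case-insensitive comparison is done.
    """
    groups = defaultdict(list)
    for line in lines:
        key = line.casefold()
        # Record each distinct spelling once, in order of first appearance.
        if line not in groups[key]:
            groups[key].append(line)
    # A group with a single spelling is an exact duplicate, not a near-duplicate.
    return [variants for variants in groups.values() if len(variants) > 1]


lines = ["Apple", "apple", "Banana", "APPLE", "cherry", "Banana"]
print(case_variant_groups(lines))  # [['Apple', 'apple', 'APPLE']]
```

'Banana' appears twice but always with the same spelling, so it is not reported: the case-sensitive pass would already handle it.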
How blank-line and whitespace handling works
Trailing spaces are invisible in most text editors and often sneak in through copy-paste, yet they make lines compare as unequal. 'apple  ' (with two trailing spaces) and 'apple' are different strings, so without trimming you would keep both, leaving a subtle duplicate in the output. The 'Trim whitespace' option strips leading and trailing whitespace from each line before comparing. The trimmed text is also what is written to the result, so surviving lines lose their surrounding whitespace.
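The effect of trimming can be sketched in Python as follows. This is an illustrative approximation under stated assumptions, not the tool's implementation: dedupe_lines and its trim parameter are hypothetical names, comparison here is case-sensitive to isolate the whitespace behaviour, and the first occurrence is kept.

```python
def dedupe_lines(lines, trim=True):
    """Remove duplicate lines, keeping the first occurrence of each.

    Hypothetical sketch: when trim is True, surrounding whitespace is
    stripped before comparing and the trimmed text is what gets returned.
    """
    seen = set()
    result = []
    for line in lines:
        candidate = line.strip() if trim else line
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


raw = ["apple", "apple  ", "  pear", "pear"]
print(dedupe_lines(raw, trim=False))  # ['apple', 'apple  ', '  pear', 'pear']
print(dedupe_lines(raw, trim=True))   # ['apple', 'pear']
```

Without trimming, all four lines survive because each is a distinct string. With trimming, the two variants of each word collapse into one.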
Leading whitespace in indented code or outline lists is meaningful: trimming would destroy the indentation hierarchy. For code or structured outlines, disable trimming and use case-sensitive mode to preserve exact line content. For flat text lists such as keywords, city names, or URL paths, trimming is almost always the right choice.
The 'keep first vs keep last' toggle matters most when position in the list carries meaning. Consider a log of events where later entries represent more recent states: keeping the last occurrence leaves each line at its most recent position, while keeping the first leaves it at its earliest, possibly stale, position. With case-insensitive comparison, the toggle also decides which capitalisation of a line survives.
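The difference between the two settings can be sketched in Python. The function names below are hypothetical and the comparison is exact (case-sensitive, no trimming) to keep the illustration small. Both versions preserve the input order of the surviving lines.

```python
def dedupe_keep_first(lines):
    """Keep the first occurrence of each line (dicts preserve insertion order)."""
    return list(dict.fromkeys(lines))


def dedupe_keep_last(lines):
    """Keep the last occurrence of each line, still in input order."""
    last_index = {}
    for i, line in enumerate(lines):
        last_index[line] = i  # later occurrences overwrite earlier ones
    return [line for i, line in enumerate(lines) if last_index[line] == i]


events = ["start", "login", "start", "logout", "login"]
print(dedupe_keep_first(events))  # ['start', 'login', 'logout']
print(dedupe_keep_last(events))   # ['start', 'logout', 'login']
```

The same three lines survive in both cases. Only their positions differ, which is what matters when later entries are more recent.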
Frequently asked questions
›Does the tool preserve line order?
Yes. Lines are not sorted — only duplicates are removed. The relative order of surviving lines is identical to the input order.
›What counts as a duplicate?
Two lines are duplicates if they are identical after applying your chosen options (case folding and/or whitespace trimming). Only the text content is compared; line numbers and positions are not considered.
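One way to picture this rule is as a comparison key computed from each line: two lines are duplicates when their keys are equal. The Python sketch below is an assumption-laden illustration rather than the tool's code. The name comparison_key and its parameters are hypothetical, casefold() stands in for the tool's case folding, and the trim default shown is a guess (the page states only that case-insensitive comparison is the default).

```python
def comparison_key(line: str, ignore_case: bool = True, trim: bool = False) -> str:
    """Build the value used to decide whether two lines are duplicates.

    Hypothetical helper: the options mirror the tool's settings, but the
    exact normalisation the tool performs is an assumption.
    """
    key = line.strip() if trim else line
    return key.casefold() if ignore_case else key


# Case-insensitive (the documented default): these count as duplicates.
print(comparison_key("Apple") == comparison_key("APPLE"))  # True

# Case-sensitive: the same pair is distinct.
print(comparison_key("Apple", ignore_case=False)
      == comparison_key("APPLE", ignore_case=False))  # False

# Trimming makes surrounding whitespace irrelevant to the comparison.
print(comparison_key("  apple ", trim=True) == comparison_key("apple", trim=True))  # True
```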
›Will blank lines be removed?
Blank lines are only removed if they are duplicates of each other. If your text has three blank lines (consecutive or not) and you have 'Keep first' enabled, only the first blank line survives. If you want all blank lines removed, use a text sorter with the 'remove empty lines' option instead.
›What does 'Keep last occurrence' do?
When 'Keep first' is unchecked, the tool keeps the last occurrence of each duplicate group instead of the first. The result still appears in the original document order — only the surviving instance changes.
›Can I use this to deduplicate email lists?
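Yes. Paste one email address per line. Use case-insensitive mode (the default): the domain part of an address is case-insensitive by specification, and although the local part is technically case-sensitive, nearly all mail providers ignore case there too. Trim whitespace is also recommended for copy-pasted email data.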
›Is there a line count limit?
There is no enforced limit. The tool processes everything in your browser's memory. Practically, modern browsers handle hundreds of thousands of lines without noticeable lag.
›Does my data leave my browser?
No. All processing happens entirely in JavaScript on your device. Nothing is uploaded to any server.
›How do I deduplicate a CSV by a specific column?
This tool works on whole lines. To deduplicate by a single column, extract that column into a one-value-per-line list, deduplicate it here, and then use the surviving values to select rows in the original file. For large datasets, a dedicated tool such as pandas in Python (DataFrame.drop_duplicates with a subset of columns) or a spreadsheet UNIQUE function is more precise.
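For small files, column-based deduplication can also be sketched with Python's standard csv module. This is a minimal example under stated assumptions: the function name dedupe_csv_by_column and the sample data are hypothetical, the file is assumed to have a header row, and the first row for each key is kept.

```python
import csv
import io


def dedupe_csv_by_column(text, column):
    """Keep the first row for each distinct value in the given column.

    Hypothetical helper for illustration; assumes the CSV has a header row
    that contains the named column.
    """
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    rows = []
    for row in reader:
        key = row[column]
        if key not in seen:
            seen.add(key)
            rows.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample = "id,email\n1,a@example.com\n2,b@example.com\n3,a@example.com\n"
print(dedupe_csv_by_column(sample, "email"))
# id,email
# 1,a@example.com
# 2,b@example.com
```

Row 3 is dropped because its email value already appeared in row 1, even though the two rows differ as whole lines.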