Remove Duplicate Lines
Paste text with one item per line; duplicate lines are removed (first occurrence kept).
About Remove Duplicate Lines
This tool removes duplicate lines from text using Set-based deduplication. It tracks each unique line it has seen and keeps only the first occurrence, removing subsequent duplicates. The algorithm runs in O(n) time overall, with O(1) average-case lookup per line via JavaScript's Set data structure.
It is useful for cleaning up email lists, removing duplicate entries from exported data, deduplicating keyword lists for SEO, cleaning log files with repeated entries, preparing unique value lists for programming, and consolidating search results.
Deduplication Algorithm
The tool uses a Set to track seen lines:
Algorithm Steps:
1. Split the input text into lines on newlines (\n or \r\n)
2. Create an empty Set to track seen lines
3. Create an empty array for the output
4. For each line:
a. Check whether the line exists in the Set (O(1) average case)
b. If not in the Set: add it to the output and to the Set
c. If already in the Set: skip it (duplicate detected)
5. Join the output array with newlines
JavaScript Implementation:
function dedupe(text) {
  const lines = text.split(/\r?\n/);
  const seen = new Set();
  const output = [];
  for (const line of lines) {
    if (!seen.has(line)) {
      seen.add(line);
      output.push(line);
    }
  }
  return output.join('\n');
}
Time Complexity: O(n) where n = number of lines
Space Complexity: O(n) for Set storage
Lookup Time: O(1) average case per line
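As a quick sanity check, the implementation above (repeated here so the snippet runs on its own) produces the following on a small input:

```javascript
// Same Set-based dedupe as above, repeated so this snippet is standalone
function dedupe(text) {
  const lines = text.split(/\r?\n/);
  const seen = new Set();
  const output = [];
  for (const line of lines) {
    if (!seen.has(line)) {
      seen.add(line);
      output.push(line);
    }
  }
  return output.join('\n');
}

console.log(dedupe("apple\nbanana\napple\ncherry\nbanana"));
// apple
// banana
// cherry
```

Because the split pattern is /\r?\n/, Windows-style \r\n line endings are handled the same as plain \n.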
Deduplication Methods Comparison
| Method | Order Preserved | Time Complexity | Use Case |
|---|---|---|---|
| Set-based (this tool) | Yes | O(n) | General purpose, order matters |
| Sort then uniq | No | O(n log n) | When sorted output needed |
| Nested loop comparison | Yes | O(n²) | Small lists, no Set support |
| Hash map counting | Yes | O(n) | When count statistics needed |
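The order-preservation column can be made concrete with a small sketch contrasting the first two rows (`sortUniq` here is an illustrative stand-in for the sort-then-uniq approach, not part of the tool):

```javascript
const lines = ["cherry", "apple", "cherry", "banana"];

// Set-based: first occurrences, original order kept
const setBased = [...new Set(lines)];
console.log(setBased); // [ 'cherry', 'apple', 'banana' ]

// Sort-then-uniq: duplicates grouped, original order lost
const sortUniq = [...lines].sort().filter((l, i, a) => i === 0 || l !== a[i - 1]);
console.log(sortUniq); // [ 'apple', 'banana', 'cherry' ]
```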
Common Use Cases
| Use Case | Description | Example |
|---|---|---|
| Email List Cleanup | Remove duplicate email addresses | [email protected] x3 → x1 |
| Keyword Deduplication | Clean SEO keyword lists | "buy shoes" x5 → x1 |
| Log File Cleaning | Remove repeated log entries | Error messages x100 → x1 |
| Export Data Cleaning | Deduplicate database exports | Customer IDs x multiple → x1 |
| Search Results | Consolidate duplicate search hits | URLs appearing multiple times |
| Programming Lists | Create unique value arrays | Unique tags, categories, IDs |
| Contact List Merge | Merge contact lists without dupes | Phone numbers, names |
Code Examples by Language
JavaScript:
// Using Set (ES6)
const unique = [...new Set(lines)];
// Manual implementation
function dedupe(lines) {
  const seen = new Set();
  return lines.filter(l => {
    if (seen.has(l)) return false;
    seen.add(l);
    return true;
  });
}
Python:
# Using set (order not preserved)
unique = list(set(lines))
# Order-preserving (Python 3.7+)
unique = list(dict.fromkeys(lines))
# Manual implementation
def dedupe(lines):
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result
Bash/sort:
# Sort and remove duplicates
sort -u file.txt
# Remove duplicates, preserve order
awk '!seen[$0]++' file.txt
# Using sort with uniq
sort file.txt | uniq
PowerShell:
# Get unique lines (first occurrence kept; comparison is case-sensitive)
Get-Content file.txt | Select-Object -Unique
# Manual deduplication with a hashtable
$seen = @{}
Get-Content file.txt | Where-Object {
  if ($seen[$_]) { $false }
  else { $seen[$_] = $true; $true }
}
PHP:
// Using array_unique (preserves keys)
$unique = array_unique($lines);
// Using array_flip (fast, reindexes)
$unique = array_keys(array_flip($lines));
Java:
// Using LinkedHashSet (order-preserving)
Set<String> unique = new LinkedHashSet<>(lines);
// Using Stream API (Java 8+)
List<String> unique = lines.stream()
    .distinct()
    .collect(Collectors.toList());
Case Sensitivity and Comparison
Comparison Behavior:
Default (Case-Sensitive):
Input:
"Apple
apple
APPLE
Apple"
Output:
"Apple
apple
APPLE"
(All three case variants kept; the second "Apple" is removed as an exact duplicate)
Case-Insensitive Deduplication:
Input:
"Apple
apple
APPLE
Apple"
Output (normalized to lowercase):
"apple"
Or output (keeping the first occurrence's original casing):
"Apple"
Implementation:
// Case-insensitive in JavaScript
const seen = new Set();
lines.filter(l => {
  const lower = l.toLowerCase();
  if (seen.has(lower)) return false;
  seen.add(lower);
  return true;
});
Trimming Before Comparison:
Input (the second line has a trailing space, the third a leading space):
"apple
apple 
 apple"
With trim:
"apple"
Without trim:
"apple
apple 
 apple"
(All three kept because the whitespace makes each line a different exact string)
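One way to ignore whitespace differences is to trim each line for the Set lookup while keeping the original line in the output. A sketch (`dedupeTrimmed` is a hypothetical helper, not part of the tool):

```javascript
// Trim each line before comparing, but emit the original, untrimmed line
function dedupeTrimmed(lines) {
  const seen = new Set();
  return lines.filter(line => {
    const key = line.trim();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

console.log(dedupeTrimmed(["apple", "apple ", "  apple"]));
// [ 'apple' ]
```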
Deduplication Examples
Example 1: Simple Duplicates
Input:
"apple
banana
apple
cherry
banana"
Output:
"apple
banana
cherry"
Example 2: Consecutive Duplicates
Input:
"line1
line1
line1
line2
line2
line3"
Output:
"line1
line2
line3"
Example 3: Email List
Input:
"[email protected]
[email protected]
[email protected]
[email protected]
[email protected]"
Output:
"[email protected]
[email protected]
[email protected]"
Example 4: Mixed Case
Input:
"Hello
hello
HELLO
World"
Output (case-sensitive):
"Hello
hello
HELLO
World"
Example 5: With Empty Lines
Input:
"apple

banana

apple
cherry"
Output:
"apple

banana
cherry"
(An empty line is treated as a value like any other: the first is kept, later empty lines are removed as duplicates)
Example 6: Numeric IDs
Input:
"1001
1002
1001
1003
1002
1001"
Output:
"1001
1002
1003"
Example 7: URLs
Input:
"https://example.com/page1
https://example.com/page2
https://example.com/page1
https://other.com/"
Output:
"https://example.com/page1
https://example.com/page2
https://other.com/"
Example 8: Log Entries
Input:
"ERROR: Connection failed
INFO: Retrying
ERROR: Connection failed
INFO: Success
ERROR: Connection failed"
Output:
"ERROR: Connection failed
INFO: Retrying
INFO: Success"
Advanced Deduplication Options
Option 1: Keep Last Occurrence
Instead of first, keep last duplicate:
function dedupeLast(lines) {
  const seen = new Set();
  const result = [];
  // Iterate backwards so the last occurrence is encountered first
  for (let i = lines.length - 1; i >= 0; i--) {
    if (!seen.has(lines[i])) {
      seen.add(lines[i]);
      result.unshift(lines[i]);
    }
  }
  return result;
}
Option 2: Count Occurrences
Track how many times each line appears:
function countOccurrences(lines) {
  const counts = new Map();
  for (const line of lines) {
    counts.set(line, (counts.get(line) || 0) + 1);
  }
  return counts;
}
Option 3: Keep Only Lines Appearing Exactly Once
Remove every line that appears two or more times, keeping only lines with a single occurrence:
function keepUniqueOnly(lines) {
  const counts = new Map();
  for (const line of lines) {
    counts.set(line, (counts.get(line) || 0) + 1);
  }
  return lines.filter(l => counts.get(l) === 1);
}
Option 4: Fuzzy Matching
Remove near-duplicates using similarity:
(Requires external library like string-similarity)
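For illustration only, near-duplicate detection can also be sketched without a library, using Levenshtein edit distance; a real implementation should use a dedicated package, since the pairwise comparison below is O(n²) rather than the Set approach's O(n). Both `editDistance` and `dedupeFuzzy` are hypothetical helpers:

```javascript
// Minimal Levenshtein edit distance (dynamic programming)
function editDistance(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                    // deletion
        dp[i][j - 1] + 1,                                    // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Keep a line only if no already-kept line is within maxDist edits of it
function dedupeFuzzy(lines, maxDist = 2) {
  const kept = [];
  for (const line of lines) {
    if (!kept.some(k => editDistance(k, line) <= maxDist)) kept.push(line);
  }
  return kept;
}

console.log(dedupeFuzzy(["color", "colour", "flavor"]));
// [ 'color', 'flavor' ]
```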
Best Practices
- Backup original data: Keep a copy of input before deduplication; process is not reversible.
- Consider case sensitivity: Decide if "Hello" and "hello" should be treated as same.
- Check for whitespace: Leading/trailing spaces can prevent duplicate detection; trim if needed.
- Review results: Verify deduplicated output makes sense for your use case.
- Combine with sorting: For organized output, sort lines after deduplication.
- Handle empty lines: Decide if empty lines should be kept or removed separately.
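Several of these practices can be combined in one pass. The sketch below (a hypothetical `dedupeAndSort` helper, not the tool's behavior) dedupes, drops empty lines as one possible policy, and sorts the result:

```javascript
// Dedupe, drop empty lines, then sort for organized output
function dedupeAndSort(text) {
  const unique = [...new Set(text.split(/\r?\n/))];
  return unique.filter(l => l.trim() !== '').sort().join('\n');
}

console.log(dedupeAndSort("banana\n\napple\nbanana\napple"));
// apple
// banana
```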
Limitations
- Exact match only: Does not detect near-duplicates or fuzzy matches.
- Case-sensitive: "Hello" and "hello" are treated as different lines.
- Whitespace-sensitive: Lines with different spacing are not detected as duplicates.
- Memory limits: Very large files may cause browser performance issues.
- No context awareness: Cannot determine if duplicates are intentional or errors.
- Line-by-line only: Does not detect duplicates spanning multiple lines.
Frequently Asked Questions
- How does duplicate line removal work?
- The tool uses a Set data structure to track seen lines. It iterates through each line, checking if it exists in the Set. If not seen, the line is kept and added to the Set. If already in the Set, it's skipped. This ensures O(1) lookup time and preserves the first occurrence of each line.
- Does this tool preserve the original order of lines?
- Yes, the tool preserves the original order of lines. It keeps the first occurrence of each unique line and removes subsequent duplicates. For example, if 'apple' appears at line 1 and line 5, only the line 1 occurrence is kept.
- Is the comparison case-sensitive?
- Yes, the comparison is case-sensitive by default. 'Hello' and 'hello' are treated as different lines. For case-insensitive deduplication, convert text to lowercase first using a text transformation tool or text editor function.
- What are common use cases for removing duplicate lines?
- Common uses include: cleaning up email lists, removing duplicate entries from exported data, deduplicating keyword lists for SEO, cleaning log files with repeated entries, preparing unique value lists for programming, and consolidating search results.
- How does this compare to sorting and removing duplicates?
- Sorting first groups duplicates together but changes the original order; this tool preserves order but doesn't group duplicates visually. For unique sorted output, use a sort-lines tool first, then remove duplicates. The key difference is order preservation.
- Can this tool handle large lists?
- The tool uses JavaScript Set which has O(1) lookup time, making it efficient for large lists. However, browser memory limits apply. For very large files (100,000+ lines), consider using command-line tools like sort -u or dedicated deduplication software.