Email Extractor

Scan text and extract all email addresses using regex pattern matching with automatic deduplication.

Back to all tools on ToolForge

About Email Extractor

This email extractor uses regular expressions to scan text and identify email addresses matching standard format patterns. Results are automatically deduplicated, producing a clean list of unique email addresses for analysis, migration, or compliance workflows.

Email Regex Pattern Explained

The extraction pattern trades full RFC 5322 compliance for a short pattern that covers common real-world addresses:

Pattern: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/gi

Component breakdown:
┌───────────────────┬──────────────────────────────────────────┐
│ [a-zA-Z0-9._%+-]+ │ Local part (before @)                    │
│                   │ - Letters (a-z, A-Z)                     │
│                   │ - Digits (0-9)                           │
│                   │ - Dot, underscore, percent, plus, hyphen │
│ @                 │ At symbol (required separator)           │
│ [a-zA-Z0-9.-]+    │ Domain name                              │
│                   │ - Letters, digits, dots, hyphens         │
│ \.                │ Literal dot before TLD                   │
│ [a-zA-Z]{2,}      │ Top-level domain (minimum 2 letters)     │
└───────────────────┴──────────────────────────────────────────┘

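Flags:
  g = global (find all matches, not just the first)
  i = case-insensitive matching (redundant here, because the character
      classes already include both A-Z and a-z; matched text keeps its
      original case)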
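To make the breakdown concrete, the sketch below rebuilds the same pattern from its three character classes and shows what the flags change. The names LOCAL_PART, DOMAIN, TLD, and buildEmailPattern are hypothetical, chosen only for illustration.

```javascript
// Hypothetical names; each string mirrors one row of the breakdown table.
const LOCAL_PART = "[a-zA-Z0-9._%+-]+"; // characters allowed before @
const DOMAIN = "[a-zA-Z0-9.-]+";        // labels, dots, and hyphens
const TLD = "[a-zA-Z]{2,}";             // top-level domain, 2+ letters

function buildEmailPattern(flags) {
  // "\\." in the template literal becomes \. in the pattern: a literal dot.
  return new RegExp(`${LOCAL_PART}@${DOMAIN}\\.${TLD}`, flags);
}

const sample = "Write to Ann@Example.COM or bob@example.org";

// With g, String.prototype.match returns every matched substring.
const all = sample.match(buildEmailPattern("gi"));
console.log(all); // [ 'Ann@Example.COM', 'bob@example.org' ]

// Without g, it returns only the first match (with index and input).
const first = sample.match(buildEmailPattern("i"));
console.log(first[0]); // Ann@Example.COM
```

The output shows that the i flag does not normalize case: Ann@Example.COM comes back exactly as written, so any lowercasing has to be a separate step.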

Email Format Compliance Matrix

Format Type            Example                  RFC 5322             This Tool
Simple ASCII           user@example.com         ✓ Valid              ✓ Matched
With dots              first.last@example.com   ✓ Valid              ✓ Matched
Plus tagging           user+tag@example.com     ✓ Valid              ✓ Matched
Hyphenated             first-last@example.com   ✓ Valid              ✓ Matched
Subdomain              user@mail.example.com    ✓ Valid              ✓ Matched
Country code TLD       user@example.co.uk       ✓ Valid              ✓ Matched
Quoted string          "john.doe"@example.com   ✓ Valid              ✗ Not matched
International (UTF-8)  用户@例子。广告          ✓ Valid (RFC 6531)   ✗ Not matched
Obfuscated             user [at] domain.com     ✗ Invalid            ✗ Not matched
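The "This Tool" column can be checked directly by running the tool's pattern over one sample per row. The sample addresses and the checkSamples helper are illustrative assumptions; the pattern is the one documented above.

```javascript
// The tool's extraction pattern, as documented above.
const EMAIL_PATTERN = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

// Hypothetical sample for several rows of the matrix.
const samples = {
  "Simple ASCII": "user@example.com",
  "Plus tagging": "user+news@example.com",
  "Country code TLD": "user@mail.example.co.uk",
  "Quoted string": '"john.doe"@example.com',
  "International": "用户@例子。广告",
  "Obfuscated": "user [at] example.com",
};

// Map each label to the first extracted address, or null if none matched.
function checkSamples(table) {
  const report = {};
  for (const [label, address] of Object.entries(table)) {
    const found = address.match(EMAIL_PATTERN);
    report[label] = found ? found[0] : null;
  }
  return report;
}

console.log(checkSamples(samples));
```

A null entry means nothing was extracted at all. The quoted-string sample fails because the character right before @ is a double quote, which is outside the local-part class.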

Extraction Algorithm

JavaScript Email Extraction Implementation:

function extractEmails(text) {
  // Step 1: Define regex pattern
  const pattern = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/gi;

  // Step 2: Find all matches (returns array or null)
  const matches = text.match(pattern) || [];

  // Step 3: Deduplicate using Set
  const unique = Array.from(new Set(matches));

  // Step 4: Return results
  return unique;
}

// Usage example:
const text = "Contact alice@example.com or bob@example.com";
const emails = extractEmails(text);
console.log(emails); // ["alice@example.com", "bob@example.com"]

Common Use Cases

Email Extraction Example

Input Text:
"Contact our team at support@example.com for assistance.
 You can also reach sales@example.com or billing@example.com.
 For press inquiries, email press@example.com or support@example.com again.
 Alternative: admin subdomain at admin@mail.example.com"

Extraction Process:
  Match 1: support@example.com
  Match 2: sales@example.com
  Match 3: billing@example.com
  Match 4: press@example.com
  Match 5: support@example.com (duplicate)
  Match 6: admin@mail.example.com

Output (deduplicated, one per line):
support@example.com
sales@example.com
billing@example.com
press@example.com
admin@mail.example.com

Total: 5 unique emails found
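A match-by-match trace like the one above can be reproduced with String.prototype.matchAll, which requires the g flag and yields each match with its position. The sample text uses placeholder addresses, and traceExtraction is a hypothetical helper name.

```javascript
// Sample text with placeholder addresses (one repeated on purpose).
const input = `Contact our team at support@example.com for assistance.
You can also reach sales@example.com or billing@example.com.
For press inquiries, email press@example.com or support@example.com again.
Alternative: admin subdomain at admin@mail.example.com`;

const EMAIL_PATTERN = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

// Walk every match in order, flag repeats, and collect first occurrences.
function traceExtraction(text) {
  const seen = new Set();
  const steps = [];
  for (const m of text.matchAll(EMAIL_PATTERN)) {
    const email = m[0];
    steps.push({ index: m.index, email, duplicate: seen.has(email) });
    seen.add(email); // Set keeps insertion order, so output order is stable
  }
  return { steps, unique: [...seen] };
}

const { steps, unique } = traceExtraction(input);
steps.forEach((s, i) =>
  console.log(`Match ${i + 1}: ${s.email}${s.duplicate ? " (duplicate)" : ""}`)
);
console.log(`Total: ${unique.length} unique emails found`);
```

Note that billing@example.com is extracted without its sentence-ending period. The TLD class accepts letters only, so the regex backtracks and ends the match at "com".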

Extraction Limitations

Email Validation vs Extraction

Validation Level       What It Checks                            This Tool
Syntax validation      Matches email format pattern              ✓ Yes
DNS validation         Checks domain has DNS records             ✗ No
MX record check        Confirms domain accepts email             ✗ No
SMTP verification      Connects to mail server, checks mailbox   ✗ No
Mailbox verification   Sends confirmation email                  ✗ No
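Only the first row, syntax validation, overlaps with what this tool does, and even there extraction and validation differ. Extraction looks for an address anywhere in the text, while validation should require the entire input to be one address. Here is a minimal sketch, assuming the same character classes; the function names are hypothetical.

```javascript
// Extraction pattern: unanchored, finds an address anywhere in a string.
const EXTRACT = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/;
// Hypothetical validation variant: same classes, anchored with ^ and $
// so the whole (trimmed) input must be a single address.
const VALIDATE = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

function containsEmail(text) {
  return EXTRACT.test(text);
}

function isSyntacticallyValidEmail(text) {
  return VALIDATE.test(text.trim());
}

console.log(containsEmail("mail me: user@example.com!"));             // true
console.log(isSyntacticallyValidEmail("mail me: user@example.com!")); // false
console.log(isSyntacticallyValidEmail("user@example.com"));           // true
```

Neither regex carries the g flag, so repeated test() calls do not depend on lastIndex state. Neither function performs the DNS, MX, or SMTP checks in the later rows.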

Deduplication with Set

JavaScript's Set deduplicates efficiently, in O(n) time on average for n values:

// Set automatically removes duplicates
const emails = [
  "alice@example.com",
  "bob@example.com",
  "alice@example.com",  // duplicate
  "carol@example.com"
];

const uniqueSet = new Set(emails);
// Set(3) { "alice@example.com", "bob@example.com", "carol@example.com" }

const uniqueArray = Array.from(uniqueSet);
// ["alice@example.com", "bob@example.com", "carol@example.com"]

// Note: Set uses SameValueZero comparison, which is case-sensitive for strings:
// "Alice@Example.com" and "alice@example.com" are different values.
// For case-insensitive dedup, convert to lowercase first:
const lowercaseUnique = Array.from(
  new Set(emails.map(e => e.toLowerCase()))
);
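Lowercasing before building the Set also discards the original spelling. For case-insensitive deduplication that keeps the first spelling seen, a Map keyed by the lowercased address works. This is a sketch with a hypothetical helper name. Lowercasing is safe for the domain, which is case-insensitive, but for the local part it is only a common convention: RFC 5321 lets servers treat the local part as case-sensitive.

```javascript
// Hypothetical helper: case-insensitive dedup that keeps the first spelling.
function dedupeCaseInsensitive(emails) {
  const firstSeen = new Map(); // lowercased key -> original spelling
  for (const email of emails) {
    const key = email.toLowerCase();
    if (!firstSeen.has(key)) firstSeen.set(key, email);
  }
  // Map iterates in insertion order, so input order is preserved.
  return [...firstSeen.values()];
}

const mixed = ["Alice@Example.com", "bob@example.com", "alice@example.com"];
console.log(dedupeCaseInsensitive(mixed)); // [ 'Alice@Example.com', 'bob@example.com' ]
```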

How to Extract Emails from Text

  1. Paste text: Enter or paste the text containing email addresses.
  2. Click Extract: The tool scans for all email patterns using regex.
  3. Review results: Unique emails appear in the output box, one per line.
  4. Copy output: Click "Copy Result" to use the email list elsewhere.
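The four steps map onto a short pipeline. This sketch assumes the tool's pattern and a Set for deduplication, as described above. renderResult is a hypothetical name and does not represent the tool's actual code.

```javascript
// Hypothetical sketch of the paste -> extract -> review flow.
const EMAIL_PATTERN = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

function renderResult(pastedText) {
  // Step 2: scan with the regex; Set removes exact duplicates.
  const unique = [...new Set(pastedText.match(EMAIL_PATTERN) || [])];
  // Step 3: one address per line, plus a count for the summary.
  return { output: unique.join("\n"), count: unique.length };
}

const { output, count } = renderResult("a@example.com, b@example.org; a@example.com");
console.log(output);
console.log(`Total: ${count} unique emails found`);
```

In a browser, the Copy step could pass output to navigator.clipboard.writeText(), which returns a Promise and is only available in secure contexts.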

Frequently Asked Questions

How does regex-based email extraction work?
Email extraction uses regular expressions to match a simplified form of the local-part@domain format defined in RFC 5322. The pattern scans text for a run of allowed characters before @, a domain name made of letters, digits, dots, and hyphens, and a TLD of 2+ letters. This tool finds all matches and removes duplicates with a JavaScript Set.
What is the RFC 5322 email standard?
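RFC 5322 defines the Internet Message Format, including email address syntax. In its common dot-atom form, the local part (before @) is made of letters, digits, and the special characters !#$%&'*+-/=?^_`{|}~, in runs separated by single dots, with no spaces; this tool accepts only a subset of those specials (._%+-). The domain part (after @) is a domain name or a bracketed address literal. RFC 5322 itself does not require a dot in the domain, but this tool's pattern does, because it requires a TLD of at least 2 letters. Full RFC compliance also allows quoted strings and escaped characters, but most tools use a simplified pattern for practical extraction.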
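To see how much narrower the tool's local-part class is, the sketch below encodes the RFC 5322 dot-atom form (atext characters in runs separated by single dots) and compares it with the tool's class on a few local parts. It deliberately ignores quoted strings and obsolete syntax, and all names are hypothetical.

```javascript
// RFC 5322 atext: letters, digits, and these specials (dot excluded).
const ATEXT = "A-Za-z0-9!#$%&'*+/=?^_`{|}~-";
// dot-atom-text: atext runs joined by single dots, so no leading,
// trailing, or consecutive dots.
const DOT_ATOM = new RegExp(`^[${ATEXT}]+(?:\\.[${ATEXT}]+)*$`);
// This tool's local-part class, anchored so the whole string must match.
const TOOL_LOCAL = /^[a-zA-Z0-9._%+-]+$/;

function compareLocalPart(local) {
  return { rfcDotAtom: DOT_ATOM.test(local), toolClass: TOOL_LOCAL.test(local) };
}

for (const local of ["first.last", "o'brien", "a..b", "user{tag}"]) {
  console.log(local, compareLocalPart(local));
}

// During extraction, an unsupported character truncates the match instead
// of rejecting it: the apostrophe ends the local part here.
const truncated = "o'brien@example.com".match(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/);
console.log(truncated[0]); // brien@example.com
```

The disagreement runs both ways. The tool's class rejects valid characters such as the apostrophe and braces, yet accepts dot sequences that RFC 5322 forbids, such as a..b.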
Why doesn't this extract all valid email formats?
Full RFC 5322 regex is extremely complex (hundreds of characters) because it allows edge cases like quoted strings ("john.doe"@example.com), comments, and escaped characters. This tool uses a practical pattern that covers the large majority of everyday addresses while remaining readable and fast. Exotic formats are rare outside of test suites.
Are extracted emails validated for existence?
No. This tool only validates format, not deliverability. It does not check DNS MX records, send verification emails, or confirm mailboxes exist. Format validation ensures the email looks correct; deliverability requires separate verification services that perform SMTP handshakes or send confirmation messages.
How does deduplication work?
The tool uses a JavaScript Set to remove duplicates. A Set stores only unique values, so adding an existing value has no effect. Deduplication takes O(n) time on average, and the comparison is case-sensitive (User@Example.com and user@example.com are treated as different emails).
What about internationalized email addresses?
This tool uses ASCII-only regex patterns. Internationalized Email Addresses (RFC 6531) allow UTF-8 characters in local parts and domains (用户@例子。广告). These require Unicode-aware regex and are not yet widely supported. For most use cases, ASCII emails cover the vast majority of addresses in circulation.
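A Unicode-aware variant can be sketched with regex property escapes, which need the u flag (ES2018 or later). This is an illustrative assumption and does not describe this tool's behavior. \p{L} (any letter) and \p{N} (any number) stand in for the ASCII letters and digits. U+3002, the ideographic full stop used in the example above, is accepted as a domain separator alongside the ASCII dot.

```javascript
// Hypothetical Unicode-aware version of the tool's pattern.
// \p{L} = any letter, \p{N} = any number; \u3002 is the ideographic full stop.
const UNICODE_EMAIL = /[\p{L}\p{N}._%+-]+@[\p{L}\p{N}.\u3002-]+[.\u3002]\p{L}{2,}/gu;

function extractEmailsUnicode(text) {
  // Same dedup approach as the ASCII tool: first occurrence wins.
  return [...new Set(text.match(UNICODE_EMAIL) || [])];
}

console.log(extractEmailsUnicode("Write to 用户@例子。广告 or bob@example.com today"));
// [ '用户@例子。广告', 'bob@example.com' ]
```

This works only when addresses are separated from surrounding words by spaces or punctuation. Chinese and Japanese text often has no spaces, and there the letter classes absorb neighboring words: in 请联系用户@例子。广告 the match starts at 请.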