Punycode Converter
Encode and decode Punycode for IDN (Internationalized Domain Names). Convert Unicode domains to xn-- format and back.
About Punycode Converter
This Punycode converter helps you encode and decode Internationalized Domain Names (IDN) according to RFC 3492. Unicode domain names like münchen.de are converted to ASCII-compatible Punycode (xn--mnchen-3ya.de) for DNS compatibility. The tool uses the standard punycode.js library for accurate conversion.
It is useful when working with non-ASCII domain names, debugging IDN issues, preparing domains for registration and DNS configuration, and verifying Punycode encoding for international websites.
Punycode Encoding Algorithm
Punycode converts Unicode to ASCII using a specialized encoding scheme:
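Punycode Encoding Process (applied to each domain label separately):
1. Separate basic ASCII from non-ASCII characters
   - ASCII chars: copied to the output as-is, in their original order
   - Non-ASCII: encoded separately
2. Encode non-ASCII characters
   - Uses base-36 digits (a-z = 0-25, 0-9 = 26-35)
   - Each character becomes a variable-length integer (a "delta") that combines its code point and its insertion position
   - A bias value is re-adapted after every delta so that typical deltas stay short
3. Add 'xn--' prefix
   - Marks the label as Punycode-encoded
   - Added by IDNA, not by the Punycode algorithm itself

Example: "münchen.de"
Step 1: In label "münchen", Basic = "mnchen", Non-ASCII = "ü"
Step 2: Encode "ü" → "mnchen-3ya"
Step 3: Add prefix → "xn--mnchen-3ya.de" (the label "de" is pure ASCII and stays unchanged)

Technical Details:
- Base: 36 (26 letters + 10 digits)
- Initial bias: 72
- Initial code point (n): 128
- Damp factor: 700
- Skew: 38
- Thresholds: tmin = 1, tmax = 26; the threshold for each digit is (position - bias) clamped to that range

RFC 3492 Specification: https://tools.ietf.org/html/rfc3492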
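The encoding process above can be sketched in Python. This is a minimal, unoptimized sketch of the RFC 3492 encoding procedure for a single label, not the code punycode.js runs; the names `adapt` and `punycode_encode` are chosen here for illustration, and the sketch omits the overflow checks and the 'xn--' prefix.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"  # a-z = 0-25, 0-9 = 26-35


def adapt(delta, num_points, first_time):
    """Bias adaptation from RFC 3492, section 6.1."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def punycode_encode(label):
    """Encode one label (no 'xn--' prefix, no IDNA mapping or validation)."""
    basic = [c for c in label if ord(c) < 128]
    output = list(basic)
    if basic:
        output.append("-")  # delimiter between basic chars and deltas
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    handled = len(basic)
    while handled < len(label):
        # Next-smallest code point that has not been encoded yet.
        m = min(ord(c) for c in label if ord(c) >= n)
        delta += (m - n) * (handled + 1)
        n = m
        for c in label:
            cp = ord(c)
            if cp < n:
                delta += 1
            elif cp == n:
                # Write delta as a variable-length base-36 integer.
                q, k = delta, BASE
                while True:
                    t = min(max(k - bias, TMIN), TMAX)
                    if q < t:
                        break
                    output.append(DIGITS[t + (q - t) % (BASE - t)])
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(DIGITS[q])
                bias = adapt(delta, handled + 1, handled == len(basic))
                delta = 0
                handled += 1
        delta += 1
        n += 1
    return "".join(output)


print(punycode_encode("münchen"))  # → mnchen-3ya
print(punycode_encode("café"))     # → caf-dma
```

Python's standard library ships the same algorithm as the `punycode` codec (`"münchen".encode("punycode")`), which is the better choice outside of learning exercises.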
IDN Encoding Examples
| Unicode Domain | Punycode | Language |
|---|---|---|
| münchen.de | xn--mnchen-3ya.de | German |
| 例え.jp | xn--r8jz45g.jp | Japanese |
| 한국.kr | xn--3e0b707e.kr | Korean |
| пример.ru | xn--e1afmkfd.ru | Russian |
| مثال.com | xn--mgbh0fb.com | Arabic |
| 例子。中国 | xn--fsqu00a.xn--fiqs8s | Chinese |
| café.fr | xn--caf-dma.fr | French |
| naïve.com | xn--nave-6pa.com | French |
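Rows like these can be spot-checked with Python's standard-library `punycode` codec. The sketch below is an assumption-laden helper, not part of this tool: the names `label_to_ace` and `domain_to_ace` are hypothetical, and it performs no IDNA mapping or validation beyond treating the ideographic full stop (。) as a label separator.

```python
def label_to_ace(label: str) -> str:
    """Return the ASCII-compatible form of a single label."""
    if label.isascii():
        return label  # pure-ASCII labels are left untouched
    return "xn--" + label.encode("punycode").decode("ascii")


def domain_to_ace(domain: str) -> str:
    """Convert every label of a domain; U+3002 is treated like '.'."""
    labels = domain.replace("\u3002", ".").split(".")
    return ".".join(label_to_ace(label) for label in labels)


for name in ["münchen.de", "café.fr", "naïve.com", "例子。中国"]:
    print(name, "->", domain_to_ace(name))
```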
Punycode Structure
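Punycode Label Format: xn--[basic-characters]-[encoded-deltas]

Components:
- xn-- : ACE prefix (ASCII-Compatible Encoding)
- basic-characters : the label's ASCII characters, copied in their original order
- hyphen : delimiter, present only when the label contains at least one ASCII character
- encoded-deltas : base-36 variable-length integers, one per non-ASCII character

Punycode has no checksum; the trailing part consists only of the encoded deltas.

Encoding Pattern: [basic-ASCII][hyphen][encoded-non-ASCII]

Examples Breakdown:
"münchen" → "mnchen-3ya"
- "mnchen" : Basic ASCII characters (ü removed)
- "-3ya" : Delimiter plus the encoded position and value of "ü"
"例え" → "r8jz45g"
- No ASCII chars, so there is no delimiter and everything is encoded
- "r8jz45g" represents both Japanese characters
"café" → "caf-dma"
- "caf" : ASCII portion
- "dma" : Encoded "é"

Decoding Process:
1. Find the last hyphen
2. Part before = ASCII chars (the starting output string)
3. Part after = encoded deltas for the non-ASCII characters
4. Decode each base-36 integer into a code point and an insertion position, then insert that character into the output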
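The decoding steps above can be sketched as follows. This is a minimal illustration of the RFC 3492 decoding procedure for one label with the 'xn--' prefix already removed; the function names are chosen here for illustration, and the sketch skips overflow checks, so malformed or truncated input raises an ordinary `ValueError` or `IndexError` rather than a descriptive error.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"  # a-z = 0-25, 0-9 = 26-35


def adapt(delta, num_points, first_time):
    """Bias adaptation from RFC 3492, section 6.1."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def punycode_decode(encoded):
    """Decode one label that has already had its 'xn--' prefix removed."""
    # Steps 1-3: split at the last hyphen (if any) into basic chars and deltas.
    basic, _, tail = encoded.rpartition("-")
    output = list(basic)
    n, i, bias = 128, 0, 72
    pos = 0
    while pos < len(tail):
        # Step 4: read one variable-length base-36 integer into i.
        old_i, weight, k = i, 1, BASE
        while True:
            digit = DIGITS.index(tail[pos].lower())
            pos += 1
            i += digit * weight
            t = min(max(k - bias, TMIN), TMAX)
            if digit < t:
                break
            weight *= BASE - t
            k += BASE
        length = len(output) + 1
        bias = adapt(i - old_i, length, old_i == 0)
        n += i // length   # code point of the character to insert
        i %= length        # position at which to insert it
        output.insert(i, chr(n))
        i += 1
    return "".join(output)


print(punycode_decode("mnchen-3ya"))  # → münchen
print(punycode_decode("caf-dma"))     # → café
```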
IDN Implementation Standards
| Standard | Description | Status |
|---|---|---|
| RFC 3492 | Punycode encoding algorithm | Current Standard |
| RFC 5890 | IDNA definitions and framework | Current Standard |
| RFC 5891 | IDNA protocol (IDNA2008) | Current Standard |
| RFC 5892 | Unicode tables for IDNA | Current Standard |
| RFC 5893 | Right-to-left handling | Current Standard |
| IDNA2003 (RFC 3490) | Original IDNA specification | Obsoleted by RFC 5890 and RFC 5891 |
Code Examples by Language
JavaScript (Browser):
// Using punycode.js library
// https://github.com/bestiejs/punycode.js
// Encode Unicode to Punycode
punycode.toASCII('münchen.de');
// → "xn--mnchen-3ya.de"
// Decode Punycode to Unicode
punycode.toUnicode('xn--mnchen-3ya.de');
// → "münchen.de"
// Encode single label
punycode.encode('münchen');
// → "mnchen-3ya"
// Decode single label
punycode.decode('mnchen-3ya');
// → "münchen"
Node.js (Built-in):
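// Built-in punycode module (deprecated but available)
const punycode = require('punycode');
punycode.toASCII('例え.jp'); // → "xn--r8jz45g.jp"
punycode.toUnicode('xn--r8jz45g.jp'); // → "例え.jp"
// Non-deprecated alternative from the built-in url module
const { domainToASCII, domainToUnicode } = require('url');
domainToASCII('例え.jp'); // → "xn--r8jz45g.jp"
domainToUnicode('xn--r8jz45g.jp'); // → "例え.jp"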
Python:
# Built-in support via the idna codec (implements IDNA2003, RFC 3490)
domain = "münchen.de"
# Encode to Punycode
punycode = domain.encode('idna').decode('ascii')
# → "xn--mnchen-3ya.de"
# Decode from Punycode
unicode = punycode.encode('ascii').decode('idna')
# → "münchen.de"
# Using idna library (recommended)
# pip install idna
import idna
idna.encode('münchen.de') # → b"xn--mnchen-3ya.de"
idna.decode('xn--mnchen-3ya.de') # → "münchen.de"
PHP:
// Using intl extension
idn_to_ascii('münchen.de');
// → "xn--mnchen-3ya.de"
idn_to_utf8('xn--mnchen-3ya.de');
// → "münchen.de"
// With explicit options (UTS #46 processing, the variant PHP offers for IDNA2008-era rules)
idn_to_ascii('münchen.de', IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
Java:
// Using java.net.IDN
import java.net.IDN;
String ascii = IDN.toASCII("münchen.de");
// → "xn--mnchen-3ya.de"
String unicode = IDN.toUnicode("xn--mnchen-3ya.de");
// → "münchen.de"
C#:
// Using IdnMapping class
using System.Globalization;
var idn = new IdnMapping();
string ascii = idn.GetAscii("münchen.de");
// → "xn--mnchen-3ya.de"
string unicode = idn.GetUnicode("xn--mnchen-3ya.de");
// → "münchen.de"
Browser Behavior with IDN
Display Behavior:
Modern browsers decide per domain whether to show an IDN as Unicode or as Punycode:
1. Safe IDNs (single script)
   - Displayed as Unicode
   - Example: "münchen.de" shows as "münchen.de"
2. Suspicious IDNs (mixed scripts)
   - Displayed as Punycode
   - Example: "раураl.com" (Cyrillic letters imitating "paypal", followed by a Latin 'l')
   - Shows as "xn--l-7sba6dbr.com"
3. TLD restrictions
   - Some TLDs force Punycode display
   - Based on registry policies

Security Measures:
Homograph Attack Prevention:
- Browsers detect visually similar characters
- Mixed-script domains show as Punycode
- Exception lists for valid mixed domains

Example Attack:
Legitimate: "paypal.com"
Fake: "раураl.com" (Cyrillic р, а, у, р, а, then Latin l)
Browser displays fake as: "xn--l-7sba6dbr.com"

User Settings:
- Chrome: no dedicated user setting; the IDN display policy is applied automatically
- Firefox: network.IDN_show_punycode preference (about:config)
- Safari: Automatic security handling
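A rough mixed-script check can be sketched in Python. This is only an approximation and not how any browser implements its policy: the standard library has no Unicode script property, so the sketch assumes the first word of each character's Unicode name (LATIN, CYRILLIC, ...) is a usable stand-in for its script. The function names are hypothetical.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the set of scripts used in a label.

    Assumption: the first word of the Unicode character name stands in
    for the script. ASCII digits and hyphens are ignored.
    """
    scripts = set()
    for ch in label:
        if ch.isascii() and not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def looks_mixed_script(label: str) -> bool:
    return len(scripts_in_label(label)) > 1


fake = "\u0440\u0430\u0443\u0440\u0430l"  # Cyrillic р, а, у, р, а + Latin l
print(scripts_in_label(fake))         # contains both CYRILLIC and LATIN
print(looks_mixed_script(fake))       # → True
print(looks_mixed_script("münchen"))  # → False
```

The heuristic has clear limits. Legitimate Japanese labels such as "例え" mix kanji and hiragana and would be flagged, while a lookalike written entirely in one script (for example the all-Cyrillic "аррӏе") passes, so a real check needs proper script data and confusable tables.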
Common Use Cases
| Use Case | Description | Example |
|---|---|---|
| Domain Registration | Register IDN domains with registrar | Verify Punycode before registration |
| DNS Configuration | Configure DNS records for IDN | Use Punycode in zone files |
| SSL Certificate | Request certificates for IDN domains | CSR uses Punycode format |
| Email Configuration | Set up email for IDN domains | MX records with Punycode |
| Security Audit | Check for homograph attacks | Verify suspicious domains |
| International SEO | Optimize IDN sites for search | Hreflang with Unicode URLs |
Best Practices
- Verify encoding: Always verify Punycode output matches expected Unicode input before using in production.
- Test in browsers: Check how target browsers display your IDN to ensure proper user experience.
- Consider homograph attacks: Be aware that similar-looking characters from different scripts can confuse users.
- Use IDNA2008: Prefer IDNA2008 (RFC 5890-5893) over IDNA2003 for new implementations.
- Handle right-to-left: Arabic and Hebrew domains require additional bidi handling per RFC 5893.
- Document both forms: Keep records of both Unicode and Punycode versions for domain management.
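The "verify encoding" practice can be automated with a round-trip check. The sketch below uses Python's built-in `idna` codec, which implements IDNA2003; the helper names `to_ace`, `to_unicode`, and `round_trips` are hypothetical and chosen for illustration.

```python
def to_ace(domain: str) -> str:
    """Unicode domain -> ASCII (Punycode) form via the built-in idna codec."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ace: str) -> str:
    """ASCII (Punycode) form -> Unicode domain."""
    return ace.encode("ascii").decode("idna")


def round_trips(domain: str) -> bool:
    """True if encoding and then decoding returns the exact input."""
    return to_unicode(to_ace(domain)) == domain


print(to_ace("münchen.de"))       # → xn--mnchen-3ya.de
print(round_trips("münchen.de"))  # → True
print(round_trips("München.de"))  # → False (the codec lowercases "M")
```

A failed round trip is not necessarily an error. It shows that the input was mapped (here, case-folded) before encoding, which is worth knowing before recording the Unicode form of a domain.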
Limitations
- External library: Requires punycode.js library loaded from CDN for accurate conversion.
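- IDNA version: punycode.js implements only the RFC 3492 algorithm and does not apply the IDNA2003 or IDNA2008 mapping and validation rules, so output can differ slightly from IDNA-aware tools.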
- No validation: Does not validate if domain is registrable or follows TLD rules.
- Single conversion: Processes one domain at a time; no batch conversion support.
- No TLD lookup: Does not check if specific TLD supports the requested character set.
- Browser differences: Actual browser display may differ from tool output due to security policies.
Frequently Asked Questions
- What is Punycode and why is it used?
- Punycode is an encoding algorithm that converts Unicode strings to an ASCII-compatible format. It is used for Internationalized Domain Names (IDN) to represent non-ASCII characters (like ü, 中文, 日本語) in the DNS, which only supports ASCII. Punycode-encoded domain labels start with the 'xn--' prefix.
- How does Punycode encoding work?
- Punycode separates basic ASCII characters from non-ASCII characters. ASCII characters are kept as-is, while each non-ASCII character is encoded as a variable-length integer written with 36 alphanumeric digits (a-z, 0-9). The 'xn--' prefix marks the label as Punycode-encoded for browsers and other applications; to DNS servers it is an ordinary ASCII label.
- What is the xn-- prefix?
- The 'xn--' prefix is the ACE (ASCII-Compatible Encoding) prefix that identifies Punycode-encoded domain labels. It tells browsers and other IDN-aware software that the rest of the label is encoded Unicode. For example, 'münchen.de' becomes 'xn--mnchen-3ya.de', where 'xn--' marks the encoded label.
- What are Internationalized Domain Names (IDN)?
- IDNs are domain names containing non-ASCII characters, allowing users to register domains in their native scripts (Chinese, Arabic, Cyrillic, etc.). IDNs are converted to Punycode for DNS resolution while displaying in Unicode form in browsers.
- Are there security concerns with Punycode domains?
- Yes, homograph attacks use visually similar characters from different scripts to create deceptive domains (e.g., using Cyrillic 'а' instead of Latin 'a'). Modern browsers show Punycode instead of Unicode for domains with mixed scripts or suspicious patterns to prevent phishing.
- Which top-level domains support IDN?
- Most major TLDs support IDN including .com, .net, .org, .de, .jp, .cn, .kr, .ru, and many country-code TLDs. However, support varies by registry. Some TLDs have restrictions on which character sets are allowed based on language requirements.