Punycode Converter

Encode and decode Punycode for IDN (Internationalized Domain Names). Convert Unicode domains to xn-- format and back.


About Punycode Converter

This Punycode converter helps you encode and decode Internationalized Domain Names (IDN) according to RFC 3492. Unicode domain names like münchen.de are converted to ASCII-compatible Punycode (xn--mnchen-3ya.de) for DNS compatibility. The tool uses the standard punycode.js library for accurate conversion.

It is useful when working with non-ASCII domain names, debugging IDN issues, preparing domains for registration and DNS configuration, and verifying Punycode encoding for international websites.

Punycode Encoding Algorithm

Punycode converts Unicode to ASCII using a specialized encoding scheme:

Punycode Encoding Process:

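1. Separate basic ASCII from non-ASCII characters
   - ASCII chars: copied unchanged, in order, to the start of the output
   - Non-ASCII: encoded after a hyphen delimiter

2. Encode non-ASCII characters
   - Uses base-36 digits (a-z = 0-25, 0-9 = 26-35)
   - Variable-length integers encode each character's code point and position
   - Bias adapts after every encoded character to keep typical output short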

3. Add 'xn--' prefix
   - Marks label as Punycode-encoded
   - Required for DNS compatibility

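Example: "münchen.de" (each label is encoded separately)
  Step 1: Label "münchen": Basic = "mnchen", Non-ASCII = "ü"
  Step 2: Encode "ü" → "mnchen-3ya"
  Step 3: Add prefix → "xn--mnchen-3ya"; the all-ASCII label "de" stays unchanged → "xn--mnchen-3ya.de"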
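
The three steps can be traced for a single label with Python's built-in "punycode" codec. This is a minimal sketch, not the tool's own implementation: `encode_label_steps` is a hypothetical helper name, and it skips the case mapping and normalization that a full IDNA implementation performs first.

```python
def encode_label_steps(label: str) -> dict:
    """Return the intermediate results of the three steps for one label."""
    # Step 1: separate basic ASCII characters from non-ASCII characters.
    basic = "".join(ch for ch in label if ch.isascii())
    non_ascii = [ch for ch in label if not ch.isascii()]
    if not non_ascii:
        # All-ASCII labels are left untouched and get no prefix.
        return {"basic": basic, "non_ascii": [], "punycode": label, "ace": label}
    # Step 2: the codec writes the basic characters, a hyphen, then the encoded part.
    punycode = label.encode("punycode").decode("ascii")
    # Step 3: add the ACE prefix.
    return {"basic": basic, "non_ascii": non_ascii,
            "punycode": punycode, "ace": "xn--" + punycode}

for label in "münchen.de".split("."):
    print(label, encode_label_steps(label))
```

Joining the "ace" values of all labels with dots gives the DNS form of the whole domain.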

Technical Details (RFC 3492 parameters):
  - Base: 36 (26 letters + 10 digits)
  - Initial bias: 72
  - Damp factor: 700
  - Skew: 38
  - Thresholds: tmin = 1, tmax = 26; the threshold for each digit position depends on the current bias
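
To show how these parameters interact, here is a compact sketch of the encoder written from the RFC 3492 pseudocode. It uses the RFC's parameter values, including skew = 38, tmin = 1, tmax = 26 and initial n = 128. The function names are illustrative, and the sketch omits the overflow checks and mixed-case handling described in the RFC.

```python
BASE, TMIN, TMAX = 36, 1, 26
SKEW, DAMP = 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"  # digit values 0..35

def adapt(delta: int, numpoints: int, first_time: bool) -> int:
    """Bias adaptation: recompute the bias after each encoded delta."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)

def threshold(k: int, bias: int) -> int:
    """Threshold for the digit at position k, clamped to [TMIN, TMAX]."""
    return TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias

def punycode_encode(text: str) -> str:
    basic = [c for c in text if ord(c) < INITIAL_N]
    output = basic + (["-"] if basic else [])  # delimiter only if basic chars exist
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    handled = b = len(basic)
    while handled < len(text):
        # Next-smallest code point that has not been encoded yet.
        m = min(ord(c) for c in text if ord(c) >= n)
        delta += (m - n) * (handled + 1)
        n = m
        for c in text:
            cp = ord(c)
            if cp < n:
                delta += 1
            elif cp == n:
                # Write delta as a variable-length base-36 integer.
                q, k = delta, BASE
                while True:
                    t = threshold(k, bias)
                    if q < t:
                        break
                    output.append(DIGITS[t + (q - t) % (BASE - t)])
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(DIGITS[q])
                bias = adapt(delta, handled + 1, handled == b)
                delta = 0
                handled += 1
        delta += 1
        n += 1
    return "".join(output)

print(punycode_encode("münchen"))  # mnchen-3ya
```

The result can be cross-checked against Python's built-in codec with `"münchen".encode("punycode")`.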

RFC 3492 Specification:
  https://tools.ietf.org/html/rfc3492

IDN Encoding Examples

Unicode Domain | Punycode | Language
münchen.de | xn--mnchen-3ya.de | German
例え.jp | xn--r8jz45g.jp | Japanese
한국어.kr | xn--3e0bk47br7k.kr | Korean
пример.ru | xn--e1afmkfd.ru | Russian
مثال.com | xn--mgbh0fb.com | Arabic
例子。中国 | xn--fsqu00a.xn--fiqs8s | Chinese
café.fr | xn--caf-dma.fr | French
naïve.com | xn--nave-6pa.com | French
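
Conversions like these can be reproduced for whole domains with a few lines of Python. The sketch below is an illustration under stated assumptions: `domain_to_ace` is a hypothetical helper, it uses the built-in "punycode" codec, and it treats the ideographic full stop "。" and the other IDNA dot variants as label separators. It does not apply the mapping and validation rules of a full IDNA implementation.

```python
import re

# Label separators recognised by IDNA: "." plus U+3002, U+FF0E and U+FF61.
SEPARATORS = re.compile("[.\u3002\uff0e\uff61]")

def domain_to_ace(domain: str) -> str:
    """Convert every non-ASCII label of a domain to its xn-- form."""
    labels = SEPARATORS.split(domain)
    return ".".join(
        label if label.isascii()
        else "xn--" + label.encode("punycode").decode("ascii")
        for label in labels
    )

for name in ["café.fr", "пример.ru", "例子。中国"]:
    print(name, "->", domain_to_ace(name))
```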

Punycode Structure

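Punycode Label Format:

  xn--[basic-ASCII]-[encoded-non-ASCII]

Components:
  xn--       : ACE prefix (ASCII-Compatible Encoding)
  basic      : ASCII characters of the label, copied in their original order
  -          : Delimiter (the last hyphen); omitted when the label has no ASCII characters
  encoded    : Base-36 digits giving the position and code point of each non-ASCII character

Punycode has no checksum; a label is validated by decoding it.

Encoding Pattern:
  [basic-ASCII][hyphen][encoded-non-ASCII]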

Examples Breakdown:

  "münchen" → "mnchen-3ya"
  - "mnchen" : Basic ASCII characters (ü removed)
  - "-3ya"   : Delimiter plus the encoded position and value of "ü"

  "例え" → "r8jz45g"
  - No ASCII chars, so all encoded and no delimiter is written
  - "r8jz45g" represents both Japanese characters

  "café" → "caf-dma"
  - "caf" : ASCII portion
  - "dma" : Encoded "é"
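
The same breakdown can be done mechanically by splitting at the last hyphen. A small sketch follows; `split_ace_label` is a hypothetical helper name, not part of any library.

```python
from typing import Tuple

def split_ace_label(label: str) -> Tuple[str, str]:
    """Return (basic_part, encoded_part) of an xn-- label."""
    if not label.lower().startswith("xn--"):
        raise ValueError("not an ACE label: " + label)
    body = label[4:]
    # rpartition splits at the LAST hyphen; without a hyphen the basic part is empty.
    basic, _, encoded = body.rpartition("-")
    return basic, encoded

print(split_ace_label("xn--caf-dma"))  # ('caf', 'dma')
print(split_ace_label("xn--fiqs8s"))   # ('', 'fiqs8s')
```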

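Decoding Process:
  1. Find last hyphen (if there is none, the whole string is encoded data)
  2. Part before = ASCII chars, copied to the output as-is
  3. Part after = base-36 digits that form variable-length integers
  4. Each integer yields a Unicode code point and the position at which to insert it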
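
These steps can be sketched as a small decoder based on the RFC 3492 pseudocode. The function names are illustrative, the input is the label without its "xn--" prefix, and the sketch leaves out the overflow and basic-code-point checks that the RFC requires of a robust implementation.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128

def adapt(delta: int, numpoints: int, first_time: bool) -> int:
    """Bias adaptation, identical to the one used when encoding."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)

def digit_value(ch: str) -> int:
    """Map a-z to 0..25 and 0-9 to 26..35."""
    ch = ch.lower()
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError("invalid Punycode digit: " + ch)

def punycode_decode(encoded: str) -> str:
    # Steps 1-3: split at the last hyphen into basic chars and encoded digits.
    basic, _, digits = encoded.rpartition("-")
    output = list(basic)
    n, i, bias, pos = INITIAL_N, 0, INITIAL_BIAS, 0
    while pos < len(digits):
        # Step 4: read one variable-length integer.
        old_i, weight, k = i, 1, BASE
        while True:
            if pos >= len(digits):
                raise ValueError("truncated Punycode input")
            digit = digit_value(digits[pos])
            pos += 1
            i += digit * weight
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if digit < t:
                break
            weight *= BASE - t
            k += BASE
        bias = adapt(i - old_i, len(output) + 1, old_i == 0)
        # The integer carries both the code point (n) and the insert position (i).
        n += i // (len(output) + 1)
        i %= len(output) + 1
        output.insert(i, chr(n))
        i += 1
    return "".join(output)

print(punycode_decode("mnchen-3ya"))  # münchen
```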

IDN Implementation Standards

Standard | Description | Status
RFC 3492 | Punycode encoding algorithm | Current Standard
RFC 5890 | IDNA definitions and framework | Current Standard
RFC 5891 | IDNA protocol (IDNA2008) | Current Standard
RFC 5892 | Unicode tables for IDNA | Current Standard
RFC 5893 | Right-to-left handling | Current Standard
IDNA2003 | Original IDNA implementation | Obsolete
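
The difference between the two IDNA generations shows up with characters such as "ß". The sketch below relies on the fact that Python's built-in "idna" codec follows IDNA2003, whose preparation step maps "ß" to "ss", whereas IDNA2008 treats "ß" as a valid character of its own. The second conversion applies raw Punycode to the unmapped label to approximate the IDNA2008 result; it is an illustration, not a full IDNA2008 implementation.

```python
name = "straße.de"

# IDNA2003 behaviour (built-in codec): "ß" is mapped to "ss" before encoding.
idna2003 = name.encode("idna").decode("ascii")

# No mapping applied: the label keeps "ß" and is Punycode-encoded as-is.
label, tld = name.split(".")
unmapped = "xn--" + label.encode("punycode").decode("ascii") + "." + tld

print(idna2003)  # strasse.de
print(unmapped)  # xn--strae-oqa.de
```

The two results are different DNS names, which is why it matters which standard a library implements.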

Code Examples by Language

JavaScript (Browser):
  // Using punycode.js library
  // https://github.com/bestiejs/punycode.js

  // Encode Unicode to Punycode
  punycode.toASCII('münchen.de');
  // → "xn--mnchen-3ya.de"

  // Decode Punycode to Unicode
  punycode.toUnicode('xn--mnchen-3ya.de');
  // → "münchen.de"

  // Encode single label
  punycode.encode('münchen');
  // → "mnchen-3ya"

  // Decode single label
  punycode.decode('mnchen-3ya');
  // → "münchen"

Node.js (Built-in):
  // Built-in punycode module (deprecated but available)
  const punycode = require('punycode');

  punycode.toASCII('例え.jp');  // → "xn--r8jz45g.jp"
  punycode.toUnicode('xn--r8jz45g.jp');  // → "例え.jp"

Python:
  # Built-in support via idna encoding (implements the older IDNA2003 rules)
  domain = "münchen.de"

  # Encode to Punycode
  punycode = domain.encode('idna').decode('ascii')
  # → "xn--mnchen-3ya.de"

  # Decode from Punycode
  unicode = punycode.encode('ascii').decode('idna')
  # → "münchen.de"

  # Using idna library (recommended)
  # pip install idna
  import idna
  idna.encode('münchen.de')  # → b"xn--mnchen-3ya.de"
  idna.decode('xn--mnchen-3ya.de')  # → "münchen.de"

PHP:
  // Using intl extension
  idn_to_ascii('münchen.de');
  // → "xn--mnchen-3ya.de"

  idn_to_utf8('xn--mnchen-3ya.de');
  // → "münchen.de"

  // With explicit options (UTS #46 processing, the default variant since PHP 7.4)
  idn_to_ascii('münchen.de', IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);

Java:
  // Using java.net.IDN
  import java.net.IDN;

  String ascii = IDN.toASCII("münchen.de");
  // → "xn--mnchen-3ya.de"

  String unicode = IDN.toUnicode("xn--mnchen-3ya.de");
  // → "münchen.de"

C#:
  // Using IdnMapping class
  using System.Globalization;

  var idn = new IdnMapping();
  string ascii = idn.GetAscii("münchen.de");
  // → "xn--mnchen-3ya.de"

  string unicode = idn.GetUnicode("xn--mnchen-3ya.de");
  // → "münchen.de"

Browser Behavior with IDN

Display Behavior:

Modern browsers display IDNs differently based on security:

1. Safe IDNs (single script)
   - Displayed as Unicode
   - Example: "münchen.de" shows as "münchen.de"

2. Suspicious IDNs (mixed scripts)
   - Displayed as Punycode
   - Example: "раураl.com" (Cyrillic р, а, у mixed with a Latin l, imitating "paypal")
   - Shows as "xn--l-7sba6dbr.com"

3. TLD restrictions
   - Some TLDs force Punycode display
   - Based on registry policies

Security Measures:

Homograph Attack Prevention:
  - Browsers detect visually similar characters
  - Mixed-script domains show as Punycode
  - Exception lists for valid mixed domains

Example Attack:
  Legitimate: "paypal.com"
  Fake:       "раураl.com" (Cyrillic р, а, у, р, а followed by Latin l)

  Browser displays fake as: "xn--l-7sba6dbr.com"
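
A rough version of the mixed-script check can be written with Python's `unicodedata` module. This is only a heuristic sketch: it guesses the script from the first word of each character's Unicode name, and the helper names `scripts_in_label` and `display_form` are hypothetical. Real browsers apply considerably more detailed rules.

```python
import unicodedata

def scripts_in_label(label: str) -> set:
    """Approximate the set of scripts used by the letters of a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # First word of the character name, e.g. "LATIN" or "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def display_form(label: str) -> str:
    """Show single-script labels as Unicode, mixed-script labels as Punycode."""
    if len(scripts_in_label(label)) <= 1:
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

fake = "\u0440\u0430\u0443\u0440\u0430l"  # Cyrillic er, a, u, er, a + Latin l
print(sorted(scripts_in_label(fake)))      # ['CYRILLIC', 'LATIN']
print(display_form(fake))                  # shown in xn-- form
print(display_form("münchen"))             # single script, shown as Unicode
```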

User Settings:
  - Chrome: No user setting; the IDN display policy is applied automatically
  - Firefox: network.IDN_show_punycode preference (about:config)
  - Safari: Automatic security handling

Common Use Cases

Use Case | Description | Example
Domain Registration | Register IDN domains with registrar | Verify Punycode before registration
DNS Configuration | Configure DNS records for IDN | Use Punycode in zone files
SSL Certificate | Request certificates for IDN domains | CSR uses Punycode format
Email Configuration | Set up email for IDN domains | MX records with Punycode
Security Audit | Check for homograph attacks | Verify suspicious domains
International SEO | Optimize IDN sites for search | Hreflang with Unicode URLs
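
For the registration and DNS cases, it helps to check the encoded form against the DNS length limits before using it. The sketch below assumes the usual limits of 63 octets per label and 253 characters per name, which come from the DNS specifications rather than from this page. `zone_file_name` is a hypothetical helper built on the "punycode" codec.

```python
MAX_LABEL, MAX_NAME = 63, 253  # DNS limits, counted on the encoded (ASCII) form

def zone_file_name(domain: str) -> str:
    """Return the fully qualified xn-- form of a domain for use in a zone file."""
    labels = []
    for label in domain.rstrip(".").split("."):
        ace = (label if label.isascii()
               else "xn--" + label.encode("punycode").decode("ascii"))
        if not 1 <= len(ace) <= MAX_LABEL:
            raise ValueError(f"label {label!r} is {len(ace)} octets after encoding")
        labels.append(ace)
    name = ".".join(labels)
    if len(name) > MAX_NAME:
        raise ValueError("encoded name exceeds 253 characters")
    return name + "."  # trailing dot marks a fully qualified name

# Example MX record line for a zone file
print(zone_file_name("münchen.de"), "IN MX 10", zone_file_name("mail.münchen.de"))
```

A label that looks short in Unicode can exceed the limit once encoded, so the check is done on the xn-- form.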

Best Practices

Limitations

Frequently Asked Questions

What is Punycode and why is it used?
Punycode is an encoding algorithm that converts Unicode strings to an ASCII-compatible format. It's used for Internationalized Domain Names (IDN) to represent non-ASCII characters (like ü, 中文, 日本語) in the DNS system, which only supports ASCII. Punycode-encoded labels start with the 'xn--' prefix.

How does Punycode encoding work?
Punycode separates basic ASCII characters from non-ASCII characters. ASCII chars are kept as-is, while Unicode characters are encoded using a variable-length integer system based on 36 alphanumeric characters (a-z, 0-9). The 'xn--' prefix indicates Punycode encoding to DNS systems.

What is the xn-- prefix?
The 'xn--' prefix is the ACE (ASCII-Compatible Encoding) prefix that identifies Punycode-encoded domain labels. It tells applications and browsers that the following characters are encoded Unicode. For example, 'münchen.de' becomes 'xn--mnchen-3ya.de' where 'xn--' marks the encoded label.

What are Internationalized Domain Names (IDN)?
IDNs are domain names containing non-ASCII characters, allowing users to register domains in their native scripts (Chinese, Arabic, Cyrillic, etc.). IDNs are converted to Punycode for DNS resolution while displaying in Unicode form in browsers.

Are there security concerns with Punycode domains?
Yes. Homograph attacks use visually similar characters from different scripts to create deceptive domains (e.g., using Cyrillic 'а' instead of Latin 'a'). Modern browsers show Punycode instead of Unicode for domains with mixed scripts or suspicious patterns to prevent phishing.

Which top-level domains support IDN?
Most major TLDs support IDN, including .com, .net, .org, .de, .jp, .cn, .kr, .ru, and many country-code TLDs. However, support varies by registry. Some TLDs restrict which character sets are allowed based on language requirements.