Question 1

What is a Unicode code point?

Accepted Answer

A code point is the numeric value that Unicode assigns to a character. It is written as U+ followed by four to six hexadecimal digits. For example, U+0041 is LATIN CAPITAL LETTER A.
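As a small illustration, Python's built-in ord() and chr() convert between a character and its code point. The helper name format_code_point is made up for this example.

```python
def format_code_point(ch: str) -> str:
    """Return the U+XXXX notation for a single character."""
    # ord() gives the code point as an integer; pad to at least 4 hex digits.
    return f"U+{ord(ch):04X}"


print(format_code_point("A"))  # U+0041
print(chr(0x0041))             # A
```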

Question 2

What is the difference between UTF-8 and UTF-16?

Accepted Answer

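UTF-8 uses 1 to 4 bytes per code point and is backward compatible with ASCII: code points U+0000 to U+007F encode as the same single bytes. UTF-16 uses 2 or 4 bytes per code point (one or two 16-bit code units) and is common in Windows and Java. Both can encode every Unicode character.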
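One way to see the size difference is to encode a few characters and count the bytes. This is only a sketch: encoded_lengths is a hypothetical helper, and it uses the "utf-16-le" codec so that no byte order mark is counted.

```python
def encoded_lengths(ch: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    # "utf-16-le" fixes the byte order, so no BOM is prepended.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in ["A", "\u00E9", "\u20AC", "\U0001F600"]:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```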

Question 3

What are surrogate pairs?

Accepted Answer

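A surrogate pair is two 16-bit code units that together represent one code point above U+FFFF. The first unit (the high surrogate) is in the range U+D800-DBFF and the second (the low surrogate) is in U+DC00-DFFF. UTF-16 uses surrogate pairs for characters such as the emoji U+1F600.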
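The arithmetic behind a surrogate pair can be sketched as follows. The function name to_surrogate_pair is an assumption for illustration. It subtracts 0x10000, then splits the remaining 20 bits into two 10-bit halves.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000 to U+10FFFF use surrogate pairs")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00
```

The result matches what Python's own "utf-16-be" codec produces for the same character.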

Question 4

Why do some Unicode characters display as boxes?

Accepted Answer

A character displays as a box when no available font contains a glyph for it. The character exists in Unicode, but your font or system lacks its visual representation.

Question 5

What is Unicode normalization?

Accepted Answer

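Normalization converts equivalent sequences to the same binary form. NFC composes characters where possible, and NFD decomposes them. For example, "é" can be stored as the single code point U+00E9 or as "e" followed by the combining acute accent U+0301. This matters for string comparison: normalize both strings to the same form before comparing them.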
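A minimal sketch using Python's standard unicodedata module shows why this matters for comparison. The helper equal_normalized is hypothetical, and NFC is chosen only as an example default.

```python
import unicodedata


def equal_normalized(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "\u00E9"      # single code point: e with acute
decomposed = "e\u0301"   # "e" followed by combining acute accent

print(composed == decomposed)                  # False
print(equal_normalized(composed, decomposed))  # True
```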

Question 6

What is the valid Unicode range?

Accepted Answer

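Valid code points range from U+0000 to U+10FFFF. Not every value in that range is an assigned character: some ranges are reserved (unassigned), some are set aside for private use, some are noncharacters, and U+D800 to U+DFFF is reserved for UTF-16 surrogates.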
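The special ranges can be checked with plain integer comparisons. The following is a rough sketch, not a complete validator. The function name and category labels are made up here, and it cannot tell assigned characters from reserved ones, since that requires the Unicode Character Database.

```python
def classify_code_point(cp: int) -> str:
    """Roughly classify an integer against the fixed Unicode ranges."""
    if not 0 <= cp <= 0x10FFFF:
        return "out of range"
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # Noncharacters: U+FDD0-FDEF plus the last two code points of every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    # Private use: U+E000-F8FF, and planes 15 and 16 minus their noncharacters.
    if 0xE000 <= cp <= 0xF8FF or 0xF0000 <= cp <= 0xFFFFD or 0x100000 <= cp <= 0x10FFFD:
        return "private use"
    return "assigned or reserved"


for cp in (0x41, 0xD800, 0xE000, 0xFFFE, 0x110000):
    print(f"{cp:04X}: {classify_code_point(cp)}")
```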

Block	Range	Description
Basic Latin	U+0000-007F	ASCII characters, control codes
Latin-1 Supplement	U+0080-00FF	Western European letters
General Punctuation	U+2000-206F	Spaces, dashes, quotes
Arrows	U+2190-21FF	Directional arrows
Mathematical Operators	U+2200-22FF	Math symbols
Box Drawing	U+2500-257F	Line drawing characters
Emoticons	U+1F600-1F64F	Emoji faces
