AZ Tools

Unicode Normalizer (NFC, NFD, NFKC, NFKD)


The same visible text can be stored as different code-point sequences: é can be one precomposed character (U+00E9) or 'e' plus a combining acute accent (U+0065 U+0301). Unicode normalization rewrites text into a canonical form so that equal-looking strings compare equal, sort predictably, and round-trip through filesystems and databases. NFC decomposes and then recomposes into precomposed characters wherever possible (best default for storage and the web); NFD fully decomposes (common in macOS filenames); NFKC and NFKD additionally apply compatibility mappings, folding ligatures (ﬁ → fi), full-width characters (２０２４ → 2024), and forms like Roman numerals (Ⅻ → XII). Optionally strip combining marks to remove accents entirely. The comparison table shows the code-point count and UTF-8 byte length of each form and flags which one your input already matches, which is handy for spotting NFD data where you expected NFC. Everything runs locally; your text never leaves the browser.
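As a rough illustration of the four forms described above, here is a minimal Python sketch using the standard library's `unicodedata.normalize`. The helper name `all_forms` is hypothetical, and this tool itself runs in the browser, so this is not its actual implementation.

```python
import unicodedata

def all_forms(text):
    """Return the text in each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text)
            for form in ("NFC", "NFD", "NFKC", "NFKD")}

# 'e' followed by COMBINING ACUTE ACCENT composes to U+00E9 under NFC.
accent = all_forms("e\u0301")
print([hex(ord(ch)) for ch in accent["NFC"]])   # ['0xe9']
print([hex(ord(ch)) for ch in accent["NFD"]])   # ['0x65', '0x301']

# Ligature fi, full-width digits 2024, Roman numeral twelve.
sample = "\ufb01 \uff12\uff10\uff12\uff14 \u216b"
compat = all_forms(sample)
print(compat["NFC"] == sample)   # True: canonical forms keep these as-is
print(compat["NFKC"])            # fi 2024 XII
```

The canonical forms leave the ligature, full-width digits, and Roman numeral untouched; only the compatibility forms fold them.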

Normalized output · 17 chars · 32 bytes

Café ﬁle ２０２４ Ⅻ ①

Input is already in NFC.

All forms compared

Form   chars   bytes   = input?
NFC    17      32      yes
NFD    18      33      no
NFKC   20      21      no
NFKD   21      22      no
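A comparison like the one in the table can be sketched in Python as follows. The function name `compare_forms` is hypothetical, and the sample string is an assumption: a ligature and full-width digits are chosen so that the counts come out as in the table above.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def compare_forms(text):
    """For each form, return (form, code points, UTF-8 bytes, equals input)."""
    rows = []
    for form in FORMS:
        normalized = unicodedata.normalize(form, text)
        rows.append((form,
                     len(normalized),                  # code points
                     len(normalized.encode("utf-8")),  # UTF-8 bytes
                     normalized == text))              # already in this form?
    return rows

# Assumed sample: "Café" with precomposed é, the fi ligature (U+FB01),
# full-width 2024, Roman numeral twelve (U+216B), circled one (U+2460).
sample = "Caf\u00e9 \ufb01le \uff12\uff10\uff12\uff14 \u216b \u2460"

for form, chars, nbytes, same in compare_forms(sample):
    print(f"{form:<6}{chars:>5}{nbytes:>7}   {'yes' if same else 'no'}")
```

In Python, `len()` on a string counts code points, so no extra handling is needed for characters outside the Basic Multilingual Plane.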

NFC is the safest default for storage and the web. NFKC/NFKD and diacritic-stripping are lossy — don't use them on text you must keep exact.

How to use

  1. Paste or type text into the input box.
  2. Pick a target form (NFC, NFD, NFKC, NFKD) and copy the normalized output.
  3. Toggle 'Strip diacritics' to also remove accents, and read the table to see which form your input already is.

Frequently asked questions

Which form should I use?
NFC is the safest default for storage, transport, and the web: it is usually the most compact canonical form and what most systems expect. Use NFD when a system requires decomposed text (e.g. some macOS contexts). Use NFKC/NFKD only when you deliberately want compatibility folding (ligatures, full-width forms, and super/subscripts collapsed to plain characters), since those are lossy transformations.
What does 'strip diacritics' do?
It decomposes the text (NFD), removes all combining marks, then re-normalizes to your chosen form — so 'café' becomes 'cafe' and 'Crème Brûlée' becomes 'Creme Brulee'. This is handy for building ASCII slugs or accent-insensitive search keys, but it changes meaning in many languages, so don't use it on text you need to keep correct.
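The three steps in this answer can be sketched in Python as follows. The name `strip_diacritics` is hypothetical, and treating every character in a Unicode "Mark" category (Mn, Mc, Me) as a combining mark is an assumption about what the tool removes.

```python
import unicodedata

def strip_diacritics(text, form="NFC"):
    """Decompose, drop combining marks, then re-normalize to `form`."""
    decomposed = unicodedata.normalize("NFD", text)
    # Assumption: "combining mark" means any general category starting with M.
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.category(ch).startswith("M"))
    return unicodedata.normalize(form, stripped)

print(strip_diacritics("caf\u00e9"))                   # cafe
print(strip_diacritics("Cr\u00e8me Br\u00fbl\u00e9e")) # Creme Brulee
print(strip_diacritics("\u00f8") == "\u00f8")          # True
```

The last line shows a limit of this approach: letters such as ø (U+00F8) or ł (U+0142) have no canonical decomposition, so there is no combining mark to remove and they pass through unchanged.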
Why do the byte counts differ between forms?
Decomposed forms (NFD/NFKD) often use more code points — a precomposed 'é' is one 2-byte character in UTF-8, while 'e' + combining acute is two characters totaling 3 bytes. Compatibility forms can go either way. The table lets you compare exact code-point and byte lengths.
Is normalization reversible?
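Converting between NFC and NFD is information-preserving: canonically equivalent text round-trips, so NFC text converted to NFD and back comes out unchanged. Input that was in neither form is not restored to its original code points, though. NFKC/NFKD are not reversible: once a ligature or full-width digit is folded, the original distinction is lost. Stripping diacritics is also one-way.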
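A short Python sketch can make the reversibility claims concrete. The helper name `survives_nfd_round_trip` is hypothetical, and the sample strings are chosen only for illustration.

```python
from unicodedata import normalize

def survives_nfd_round_trip(text):
    """True if the NFC form of `text` is unchanged after NFD and NFC again."""
    start = normalize("NFC", text)
    return normalize("NFC", normalize("NFD", start)) == start

print(survives_nfd_round_trip("Cr\u00e8me Br\u00fbl\u00e9e"))  # True

# Compatibility folding maps distinct inputs to the same output,
# so the original cannot be recovered from the result.
print(normalize("NFKC", "\ufb01") == normalize("NFKC", "fi"))  # True

# Caveat: ANGSTROM SIGN (U+212B) is canonically equivalent to
# LATIN CAPITAL LETTER A WITH RING ABOVE (U+00C5), so NFC replaces it.
print(hex(ord(normalize("NFC", "\u212b"))))                    # 0xc5
```

The last case is why canonical normalization preserves meaning but not necessarily the exact original code points of text that was not already normalized.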
