Unicode Normalizer (NFC, NFD, NFKC, NFKD)
The same visible text can be stored as different code-point sequences: é can be one precomposed character (U+00E9) or 'e' followed by a combining acute accent (U+0065 U+0301). Unicode normalization rewrites text into a standard form so that canonically equivalent strings compare equal, sort predictably, and round-trip through filesystems and databases.

NFC composes characters into their precomposed forms where one exists (the best default for storage and the web). NFD fully decomposes them (often seen in macOS filenames). NFKC and NFKD additionally apply compatibility mappings, folding ligatures (ﬁ → fi), full-width characters (２０２４ → 2024), and forms such as Roman numerals (Ⅻ → XII). You can also strip combining marks to remove accents entirely.

The comparison table shows the code-point count and UTF-8 byte length of each form and flags which forms your input already matches, which is handy for spotting NFD data where you expected NFC. Everything runs locally; your text never leaves the browser.
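A minimal Python sketch of the four forms, assuming the standard library's `unicodedata.normalize` as a stand-in (the page doesn't say how the tool itself is implemented). `all_forms` is a hypothetical helper, and the samples are written as escapes so every code point is explicit.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def all_forms(text: str) -> dict[str, str]:
    """Hypothetical helper: return text in each of the four normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}

samples = {
    "precomposed e-acute": "\u00e9",             # U+00E9
    "decomposed e-acute": "e\u0301",             # U+0065 U+0301
    "fi ligature": "\ufb01",                     # U+FB01
    "full-width 2024": "\uff12\uff10\uff12\uff14",
    "Roman numeral twelve": "\u216b",            # U+216B
}

for label, text in samples.items():
    # ascii() prints escapes, so differences in code points stay visible.
    print(f"{label:22}", {form: ascii(s) for form, s in all_forms(text).items()})
```

The canonical forms (NFC/NFD) only change the two e-acute samples; the ligature, full-width digits, and Roman numeral pass through them untouched and are folded only by NFKC/NFKD.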
Normalized output · 17 code points · 32 bytes
Café ﬁle ２０２４ Ⅻ ①
Input is already in NFC.
All forms compared
| Form | Code points | UTF-8 bytes | Matches input? |
|---|---|---|---|
| NFC | 17 | 32 | yes |
| NFD | 18 | 33 | no |
| NFKC | 20 | 21 | no |
| NFKD | 21 | 22 | no |
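The table can be reproduced with a short Python sketch, again assuming the standard `unicodedata` module (Python 3.9+ for the type hints; `is_normalized` needs 3.8+). `compare_forms` is a hypothetical helper; the sample is the page's example text written with escapes.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def compare_forms(text: str) -> list[tuple[str, int, int, bool]]:
    """Hypothetical helper: one (form, code points, UTF-8 bytes, matches input) row per form."""
    rows = []
    for form in FORMS:
        normalized = unicodedata.normalize(form, text)
        rows.append((
            form,
            len(normalized),                        # len() of a str counts code points
            len(normalized.encode("utf-8")),        # encoded length in bytes
            unicodedata.is_normalized(form, text),  # True if text is already in this form
        ))
    return rows

# Sample text: precomposed e-acute, the fi ligature U+FB01, full-width digits,
# U+216B ROMAN NUMERAL TWELVE and U+2460 CIRCLED DIGIT ONE.
sample = "Caf\u00e9 \ufb01le \uff12\uff10\uff12\uff14 \u216b \u2460"

for form, code_points, n_bytes, matches in compare_forms(sample):
    print(f"{form:5} {code_points:3} {n_bytes:3} {'yes' if matches else 'no'}")
```

For this sample the rows come out as NFC 17/32 yes, NFD 18/33 no, NFKC 20/21 no, NFKD 21/22 no. NFKC has more code points but far fewer bytes, because three-byte characters such as the full-width digits and ① fold to one-byte ASCII.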
NFC is the safest default for storage and the web. NFKC/NFKD and diacritic-stripping are lossy — don't use them on text you must keep exact.
How to use
- Paste or type text into the input box.
- Pick a target form (NFC, NFD, NFKC, NFKD) and copy the normalized output.
- Toggle 'Strip diacritics' to also remove accents, and check the table to see which forms your input already matches.
Frequently asked questions
- Which form should I use?
- NFC is the safest default for storage, transport, and the web — it's the shortest canonical form and what most systems expect. Use NFD when a system requires decomposed text (e.g. some macOS contexts). Use NFKC/NFKD only when you deliberately want compatibility folding (ligatures, full-width, super/subscripts collapsed), since those are lossy transformations.
- What does 'strip diacritics' do?
- It decomposes the text (NFD), removes all combining marks, then re-normalizes to your chosen form — so 'café' becomes 'cafe' and 'Crème Brûlée' becomes 'Creme Brulee'. This is handy for building ASCII slugs or accent-insensitive search keys, but it changes meaning in many languages, so don't use it on text you need to keep correct.
- Why do the byte counts differ between forms?
- Decomposed forms (NFD/NFKD) often use more code points — a precomposed 'é' is one 2-byte character in UTF-8, while 'e' + combining acute is two characters totaling 3 bytes. Compatibility forms can go either way. The table lets you compare exact code-point and byte lengths.
- Is normalization reversible?
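- Converting between NFC and NFD loses nothing that matters: canonically equivalent text stays canonically equivalent, so you can switch between the two forms freely. Neither form promises to restore the exact code points of un-normalized input, though. NFKC/NFKD are not reversible: once a ligature or full-width digit is folded, the original distinction is lost. Stripping diacritics is also one-way.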
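Following the advice under "Which form should I use?", a common pattern is to normalize both sides to NFC before comparing, and to reserve NFKC for lossy matching keys. This Python sketch assumes the standard `unicodedata` module; `canonically_equal` and `compatibility_key` are hypothetical helper names.

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def compatibility_key(s: str) -> str:
    """Hypothetical helper: lossy NFKC key for matching, never for storage."""
    return unicodedata.normalize("NFKC", s)

composed = "Caf\u00e9"      # precomposed U+00E9
decomposed = "Cafe\u0301"   # 'e' + U+0301 COMBINING ACUTE ACCENT

print(composed == decomposed)                                      # False
print(canonically_equal(composed, decomposed))                     # True
print(canonically_equal("\ufb01le", "file"))                       # False
print(compatibility_key("\ufb01le") == compatibility_key("file"))  # True
```

The ligature case shows why NFC alone isn't enough for search: ﬁ and "fi" are only compatibility-equivalent, so they match under NFKC but not under NFC.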
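The three steps described under "What does 'strip diacritics' do?" translate into a short Python sketch. The exact rule the tool uses to identify combining marks isn't stated; this version assumes any code point whose Unicode general category starts with "M" (Mn, Mc, Me) counts as one. `strip_diacritics` is a hypothetical helper.

```python
import unicodedata

def strip_diacritics(text: str, form: str = "NFC") -> str:
    """Hypothetical helper: decompose, drop combining marks, re-normalize to `form`."""
    # Step 1: NFD splits precomposed letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2 (assumption): treat general categories Mn, Mc and Me as combining marks.
    kept = "".join(ch for ch in decomposed if not unicodedata.category(ch).startswith("M"))
    # Step 3: re-normalize to the caller's chosen form.
    return unicodedata.normalize(form, kept)

print(strip_diacritics("café"))
print(strip_diacritics("Crème Brûlée"))
```

One limitation of this approach: letters such as ø and ł have no canonical decomposition, so NFD leaves them intact and they survive stripping unchanged.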
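The reversibility points under "Is normalization reversible?" can be checked directly. This Python sketch assumes the standard `unicodedata` module; `nf` is a hypothetical shorthand. The last line uses U+212B ANGSTROM SIGN, whose canonical decomposition is the ordinary letter Å, to show that even canonical normalization doesn't preserve the original code points.

```python
import unicodedata

def nf(form: str, s: str) -> str:
    """Hypothetical shorthand for unicodedata.normalize."""
    return unicodedata.normalize(form, s)

text = "Cafe\u0301 \ufb01le \uff12\uff10\uff12\uff14"

# NFC <-> NFD round-trips: switching between the canonical forms loses nothing.
assert nf("NFD", nf("NFC", text)) == nf("NFD", text)
assert nf("NFC", nf("NFD", text)) == nf("NFC", text)

# NFKC is one-way: the ligature and full-width digits become plain ASCII,
# and re-normalizing to a canonical form does not bring them back.
folded = nf("NFKC", text)
print(ascii(folded))                # 'Caf\xe9 file 2024'
assert nf("NFC", folded) != nf("NFC", text)

# Canonical forms still replace some code points: ANGSTROM SIGN becomes U+00C5.
print(ascii(nf("NFC", "\u212b")))   # '\xc5'
```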