Unicode Character Inspector
Text
Each character you enter gets a row showing: the character itself, the codepoint in `U+HHHH` hex, the UTF-8 byte sequence, an HTML decimal entity (`&#NNNN;`), a CSS escape (`\HHHH`), and the Unicode block it belongs to. Useful for debugging mojibake, finding the exact codepoint of a confusing character (is that a hyphen-minus or an em dash?), or seeing how many bytes your string takes in UTF-8 storage. Input is iterated with `Array.from`, which walks codepoints rather than UTF-16 code units, so a surrogate pair stays together as one row.
- `string.length`: 13
- Codepoints: 12
- UTF-8 bytes: 19
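The three counts above can be reproduced with standard JavaScript APIs. The following is a minimal sketch, not the tool's actual source: the function name `countUnits` and the shape of the returned object are assumptions made for illustration, and the sample string is inferred from the table rows.

```javascript
// Hypothetical helper (name and return shape are illustrative only).
function countUnits(text) {
  return {
    // UTF-16 code units: what string.length reports
    utf16Units: text.length,
    // The string iterator used by Array.from yields whole codepoints
    codepoints: Array.from(text).length,
    // TextEncoder always encodes to UTF-8
    utf8Bytes: new TextEncoder().encode(text).length,
  };
}

// "Hello, 世界! 🌏" written with escapes so the source file's encoding cannot alter it
const sample = "Hello, \u4E16\u754C! \u{1F30F}";
console.log(countUnits(sample));
// { utf16Units: 13, codepoints: 12, utf8Bytes: 19 }
```

The emoji accounts for the gap between the first two numbers: it is one codepoint but two UTF-16 code units.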
| Char | Codepoint | UTF-8 bytes | HTML entity | CSS escape | Block |
|---|---|---|---|---|---|
| H | U+0048 | 48 | `&#72;` | \0048 | Basic Latin (ASCII) |
| e | U+0065 | 65 | `&#101;` | \0065 | Basic Latin (ASCII) |
| l | U+006C | 6C | `&#108;` | \006C | Basic Latin (ASCII) |
| l | U+006C | 6C | `&#108;` | \006C | Basic Latin (ASCII) |
| o | U+006F | 6F | `&#111;` | \006F | Basic Latin (ASCII) |
| , | U+002C | 2C | `&#44;` | \002C | Basic Latin (ASCII) |
| ␠ | U+0020 | 20 | `&#32;` | \0020 | Basic Latin (ASCII) |
| 世 | U+4E16 | E4 B8 96 | `&#19990;` | \4E16 | CJK Unified Ideographs |
| 界 | U+754C | E7 95 8C | `&#30028;` | \754C | CJK Unified Ideographs |
| ! | U+0021 | 21 | `&#33;` | \0021 | Basic Latin (ASCII) |
| ␠ | U+0020 | 20 | `&#32;` | \0020 | Basic Latin (ASCII) |
| 🌏 | U+1F30F | F0 9F 8C 8F | `&#127759;` | \1F30F | Miscellaneous Symbols & Pictographs |
Codepoints are iterated with `Array.from` (surrogate-pair safe). Block names cover the most common Unicode ranges; niche blocks may show a dash instead of a name.
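One way the per-character columns in the table could be derived is sketched below. This is an illustration under stated assumptions rather than the tool's implementation: `inspect` and its field names are hypothetical, and the Unicode block lookup is left out because it needs a range table that the page does not show.

```javascript
// Hypothetical row builder; field names are illustrative only.
function inspect(text) {
  const encoder = new TextEncoder();
  // Array.from iterates by codepoint, so a surrogate pair arrives as one `char`
  return Array.from(text, (char) => {
    const cp = char.codePointAt(0);
    const hex = cp.toString(16).toUpperCase().padStart(4, "0");
    // Format each UTF-8 byte as two uppercase hex digits
    const utf8 = Array.from(encoder.encode(char), (b) =>
      b.toString(16).toUpperCase().padStart(2, "0")
    ).join(" ");
    return {
      char,
      codepoint: `U+${hex}`,
      utf8,
      htmlEntity: `&#${cp};`,   // decimal entity
      cssEscape: `\\${hex}`,    // backslash followed by the hex codepoint
    };
  });
}

console.table(inspect("Hi \u{1F30F}"));
```

Encoding each character separately keeps the byte column aligned with its row. A lone surrogate has no valid UTF-8 form, and `TextEncoder` substitutes U+FFFD (EF BF BD) for it, so a real implementation may want to flag that case.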
How to use
- Paste or type text in the input box.
- Read each character's metadata in the table.
- Copy the parsed table as TSV with the copy button.
Frequently asked questions
- Why is 🌏 one row but `string.length` is 2?
- Emoji and other supplementary-plane codepoints (above U+FFFF) take two UTF-16 code units in JavaScript strings, but each is a single codepoint. The tool counts codepoints (`Array.from`) for the row count and reports `string.length` separately so you can see the discrepancy.
- Is the total byte count the same as the UTF-8 column?
- Yes. Total bytes = sum of each row's UTF-8 byte count, computed via TextEncoder for accuracy on edge cases. Useful for sizing storage or wire format.
- What's mojibake?
- Garbled text from interpreting bytes in the wrong encoding. Classic: UTF-8 'é' (C3 A9) read as Latin-1 becomes 'Ã©'. This tool can help diagnose it: paste the garbled string and see if the codepoints match what 'wrong-decoded UTF-8' would produce.
- What about combining characters / grapheme clusters?
- We show codepoints, not graphemes. 'é' can be one codepoint (U+00E9) or two (e + combining acute, U+0065 + U+0301). The visual character is the same; the byte representation isn't. For proper grapheme counting, you'd need `Intl.Segmenter`, which is beyond this tool's scope.
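The mojibake and grapheme answers above can be checked with a short sketch. Both function names are hypothetical and none of this is part of the tool. The grapheme count assumes a runtime that ships `Intl.Segmenter`.

```javascript
// Reproduce classic mojibake: encode as UTF-8, then read each byte as Latin-1.
// In ISO-8859-1 every byte maps to the codepoint with the same value.
function misreadUtf8AsLatin1(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// Count user-perceived characters (grapheme clusters) instead of codepoints.
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text)).length;
}

const precomposed = "\u00E9"; // é as a single codepoint
const decomposed = "e\u0301"; // e followed by a combining acute accent

console.log(misreadUtf8AsLatin1(precomposed));          // Ã©
console.log(Array.from(precomposed).length);            // 1
console.log(Array.from(decomposed).length);             // 2
console.log(countGraphemes(decomposed));                // 1
console.log(decomposed.normalize("NFC") === precomposed); // true
```

If a pasted string shows U+00C3 followed by U+00A9 where you expected U+00E9, the text was most likely UTF-8 decoded as Latin-1 somewhere upstream.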
Related tools
URL Slug Generator
Turn any text into a clean URL slug: strip accents, choose a separator, set a max length.
Markdown Table Generator
Paste CSV, TSV, or pipe-delimited data and get a properly aligned GitHub-flavored Markdown table.
Text Diff Viewer
Compare two pieces of text and see line-by-line or word-by-word additions and removals.
Lorem Ipsum Generator
Generate placeholder text by paragraphs, sentences, or words.
Case Converter
Convert text between UPPER, lower, Title, camelCase, snake_case and more.
Character & Word Counter
Count characters, words, sentences, lines, and bytes in real time.