AZ Tools

File Encoding Detector

When a file shows up with mojibake (괆쒋쀎), the first step is figuring out what encoding it actually uses. This detector reads the file's bytes in your browser (nothing is uploaded) and runs the standard heuristics in order: BOM byte sequences first (FF FE for UTF-16 LE, EF BB BF for UTF-8, etc.), then null-byte density (ASCII content stored as UTF-16 has a null in every other byte), then UTF-8 validity checking (text in Latin-1 or another legacy encoding almost never happens to form valid UTF-8 multi-byte sequences, so a file with non-ASCII bytes that validates as UTF-8 is very probably UTF-8). It returns the detected encoding, a confidence percentage, the BOM bytes if present, and a side-by-side hex/text preview so you can sanity-check the decode visually.
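
The no-BOM part of that chain can be sketched roughly as follows. This is a minimal illustration, not this tool's actual source: the function name `guessWithoutBom`, the `Guess` type, and the 40% / 10% thresholds are all assumptions chosen for the example. It relies only on the standard `TextDecoder` API with `fatal: true`, which throws on invalid UTF-8.

```typescript
// Hypothetical sketch: names and thresholds are illustrative assumptions.
type Guess = "ascii" | "utf-8" | "utf-16le" | "utf-16be" | "unknown";

function guessWithoutBom(bytes: Uint8Array): Guess {
  if (bytes.length === 0) return "ascii";

  let evenNulls = 0; // zero bytes at offsets 0, 2, 4, ...
  let oddNulls = 0;  // zero bytes at offsets 1, 3, 5, ...
  let highBytes = 0; // bytes >= 0x80 (non-ASCII)
  bytes.forEach((b, i) => {
    if (b === 0) {
      if (i % 2 === 0) evenNulls++;
      else oddNulls++;
    } else if (b >= 0x80) {
      highBytes++;
    }
  });

  // Null-byte density: ASCII stored as UTF-16 LE puts zeros at odd offsets
  // (41 00), UTF-16 BE puts them at even offsets (00 41).
  const pairs = bytes.length / 2;
  if (pairs >= 1) {
    if (oddNulls / pairs > 0.4 && evenNulls / pairs < 0.1) return "utf-16le";
    if (evenNulls / pairs > 0.4 && oddNulls / pairs < 0.1) return "utf-16be";
  }

  // No nulls and no high bytes: plain ASCII.
  if (evenNulls + oddNulls === 0 && highBytes === 0) return "ascii";

  // UTF-8 validity: a fatal decoder throws on the first invalid sequence.
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return "utf-8";
  } catch {
    return "unknown"; // not UTF-8; likely a legacy single- or multi-byte encoding
  }
}
```

For example, the Latin-1 bytes for "café" (`63 61 66 E9`) fail the UTF-8 check because `E9` opens a three-byte sequence that never completes, while the UTF-8 bytes for the same word (`63 61 66 C3 A9`) pass.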

How to use

  1. Drop or pick any text file. Detection runs in your browser — the file never leaves the page.
  2. Check the BOM panel first: a BOM means the encoding is essentially certain. No BOM means heuristic detection.
  3. Compare the hex view to the decoded text preview. If the non-ASCII characters look right, the detection is almost certainly correct.
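
A hex/text comparison like the one in step 3 could be produced along these lines. This is a sketch under assumptions: `hexPreview` and `decodePreview` are hypothetical helper names, and the row layout (offset, hex bytes, ASCII gutter) is just one common convention, not necessarily what this page renders.

```typescript
// Hypothetical helper: one row per `width` bytes, formatted as
// "offset  hex bytes  |printable ASCII|".
function hexPreview(bytes: Uint8Array, width = 16): string[] {
  const rows: string[] = [];
  for (let off = 0; off < bytes.length; off += width) {
    const chunk = Array.from(bytes.subarray(off, off + width));
    const hex = chunk.map((b) => b.toString(16).padStart(2, "0")).join(" ");
    // Printable ASCII is shown as-is; every other byte becomes ".".
    const gutter = chunk
      .map((b) => (b >= 0x20 && b < 0x7f ? String.fromCharCode(b) : "."))
      .join("");
    const offset = off.toString(16).padStart(8, "0");
    rows.push(`${offset}  ${hex.padEnd(width * 3 - 1)}  |${gutter}|`);
  }
  return rows;
}

// Hypothetical helper: decode the first `maxBytes` bytes with the detected
// encoding label. Cutting mid-character may leave a U+FFFD at the very end.
function decodePreview(bytes: Uint8Array, label: string, maxBytes = 256): string {
  return new TextDecoder(label).decode(bytes.subarray(0, maxBytes));
}
```

With the bytes `48 69 C3 A9` and a detected encoding of UTF-8, the hex row shows `c3 a9` while the decoded preview shows "Hié". If the preview showed "HiÃ©" instead, the bytes were decoded as Latin-1 or Windows-1252 rather than UTF-8.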

Frequently asked questions

Why doesn't this tool use a detection library like chardet?
Browsers don't ship chardet, and importing a large encoding-detection library (the JS port of ICU's `CharsetDetector` is ~200KB) for a few common cases is overkill. This tool covers the common cases: BOMs, ASCII-only, valid UTF-8, and UTF-16 by null-byte pattern. Legacy East Asian encodings (Shift_JIS, GB2312, EUC-KR) carry no BOM and need a statistical detector such as chardet, but this tool will still tell you 'not UTF-8' so you know to look elsewhere.
What's the deal with BOMs?
Byte Order Marks are 2-4 byte prefixes that explicitly mark the encoding. The UTF-8 BOM is `EF BB BF` (technically unnecessary and controversial: Microsoft tools add it, Unix tools usually strip it). UTF-16/32 BOMs (`FF FE` etc.) are useful because they also signal endianness. If a text file has a BOM, you can treat its encoding as essentially certain.
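
BOM sniffing amounts to a prefix comparison against a small table. The sketch below is illustrative only: `Bom`, `BOMS`, and `sniffBom` are hypothetical names, though the byte signatures themselves are the standard Unicode ones. The one subtlety is ordering: the UTF-32 LE BOM (`FF FE 00 00`) begins with the UTF-16 LE BOM (`FF FE`), so longer signatures are checked first.

```typescript
// Hypothetical types and names; the byte signatures are the standard Unicode BOMs.
interface Bom {
  encoding: string;
  bytes: number[];
}

// Longest signatures first, so FF FE 00 00 is not mistaken for UTF-16 LE.
const BOMS: Bom[] = [
  { encoding: "utf-32le", bytes: [0xff, 0xfe, 0x00, 0x00] },
  { encoding: "utf-32be", bytes: [0x00, 0x00, 0xfe, 0xff] },
  { encoding: "utf-8", bytes: [0xef, 0xbb, 0xbf] },
  { encoding: "utf-16le", bytes: [0xff, 0xfe] },
  { encoding: "utf-16be", bytes: [0xfe, 0xff] },
];

// Returns the matching BOM entry, or null when the data has no known BOM.
function sniffBom(data: Uint8Array): Bom | null {
  for (const bom of BOMS) {
    const matches =
      data.length >= bom.bytes.length &&
      bom.bytes.every((b, i) => data[i] === b);
    if (matches) return bom;
  }
  return null;
}
```

One residual ambiguity: a UTF-16 LE file whose first character after the BOM is U+0000 also starts with `FF FE 00 00`. That is rare in real text, and this sketch resolves it in favor of UTF-32 LE.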
