File Encoding Detector
When a file shows up as mojibake (괆쒋쀎), the first step is figuring out what encoding it actually uses. This detector reads the file's bytes in your browser (nothing is uploaded) and runs the standard heuristics in order: BOM byte sequences first (FF FE for UTF-16 LE, EF BB BF for UTF-8, etc.), then null-byte density (ASCII-range text in UTF-16 has a null in every other byte), then UTF-8 validity checking (non-ASCII text in a legacy single-byte encoding such as Latin-1 almost never forms valid UTF-8 multi-byte sequences, so a file that validates as UTF-8 is very probably UTF-8). It returns the detected encoding, a confidence percentage, the BOM bytes if present, and a side-by-side hex/text preview so you can sanity-check the decode visually.
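The order of checks described above can be sketched in a few lines of Python. This is only an illustration of the heuristics, not the tool's actual code: the function name `guess_encoding`, the 4096-byte sample, the 0.3 null-ratio threshold, and the confidence numbers are all assumptions made for the example, and only the most common BOMs are checked.

```python
import codecs
from typing import Tuple

# Hypothetical tuning values, chosen for illustration only.
SAMPLE_SIZE = 4096
NULL_RATIO_THRESHOLD = 0.3

def guess_encoding(data: bytes) -> Tuple[str, int]:
    """Return (Python codec name, rough confidence percent) for raw bytes."""
    # 1. BOM check. A fuller table would test the UTF-32 marks first,
    #    because the UTF-32 LE BOM also begins with FF FE.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig", 100
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", 100
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", 100

    # 2. Null-byte density. ASCII-range text in UTF-16 LE has its nulls at
    #    odd offsets (41 00 42 00 ...); UTF-16 BE has them at even offsets.
    sample = data[:SAMPLE_SIZE]
    half = max(len(sample) // 2, 1)
    even_nulls = sample[0::2].count(0)
    odd_nulls = sample[1::2].count(0)
    if odd_nulls / half >= NULL_RATIO_THRESHOLD and odd_nulls > even_nulls:
        return "utf-16-le", 80
    if even_nulls / half >= NULL_RATIO_THRESHOLD and even_nulls > odd_nulls:
        return "utf-16-be", 80

    # 3. Pure ASCII decodes identically in nearly every common encoding.
    if data.isascii():
        return "ascii", 100

    # 4. UTF-8 validity. Legacy 8-bit text with non-ASCII bytes rarely
    #    happens to form well-formed UTF-8 sequences.
    try:
        data.decode("utf-8")
        return "utf-8", 95
    except UnicodeDecodeError:
        return "unknown (not UTF-8)", 0
```

For example, `guess_encoding("héllo".encode("latin-1"))` falls through to the last branch, because the lone byte `E9` is a UTF-8 lead byte that is not followed by continuation bytes.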
How to use
- Drop or pick any text file. Detection runs in your browser — the file never leaves the page.
- Check the BOM panel first: a BOM means the encoding is essentially certain. No BOM means heuristic detection.
- Compare the hex view to the decoded text preview. If the non-ASCII characters look right, the detection is almost certainly correct.
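To see what the hex/text comparison in the last step amounts to, here is a minimal Python sketch of a side-by-side preview. The function name `hex_preview`, the row layout, and the default width are assumptions for the example, not the tool's real output format. It uses an incremental decoder so that a multi-byte character split across two rows is shown on the row where it completes.

```python
import codecs
from typing import List

def hex_preview(data: bytes, encoding: str, width: int = 16, rows: int = 4) -> List[str]:
    """Build 'offset  hex bytes  decoded text' rows for the start of a file."""
    limit = min(len(data), width * rows)
    # The incremental decoder keeps partial multi-byte sequences between rows.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    lines = []
    for offset in range(0, limit, width):
        chunk = data[offset:min(offset + width, limit)]
        is_last = offset + width >= limit
        text = decoder.decode(chunk, final=is_last)
        # Show control characters as dots so every row stays on one line.
        shown = "".join(c if c.isprintable() else "." for c in text)
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {shown}")
    return lines

if __name__ == "__main__":
    raw = "héllo".encode("utf-8")
    print("\n".join(hex_preview(raw, "utf-8")))    # text column reads: héllo
    print("\n".join(hex_preview(raw, "latin-1")))  # text column reads: hÃ©llo
```

Decoding the same bytes with the wrong codec produces the familiar two-character garbage in place of `é`, which is exactly the mismatch the visual check is meant to catch.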
Frequently asked questions
- Why isn't there a 'detect' library like chardet?
- Browsers don't ship chardet, and importing a large encoding-detection library (the JS port of ICU's `CharsetDetector` is ~200KB) for a few common cases is overkill. This tool covers the 95% case: BOMs, ASCII-only, valid UTF-8, and UTF-16 by null-byte pattern. For legacy East Asian encodings (Shift_JIS, GB2312, EUC-KR), which carry no BOM, you'll need a statistical detector such as chardet. This tool will still tell you 'not UTF-8' so you know to look elsewhere.
- What's the deal with BOMs?
- Byte Order Marks are 2-4 byte prefixes that explicitly mark the encoding. The UTF-8 BOM is `EF BB BF`; it is technically unnecessary and controversial (Microsoft tools often add it, while Unix tools usually omit it and may treat it as stray bytes). UTF-16/32 BOMs (`FF FE`, `FE FF`, etc.) are useful because they also signal endianness. If a text file has a BOM, trust it: the encoding is essentially certain.
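A BOM lookup is small enough to sketch in full. The Python below is a hedged illustration, with `BOMS` and `split_bom` as made-up names rather than anything from the tool itself. The one subtlety is ordering: the UTF-32 LE mark `FF FE 00 00` begins with the UTF-16 LE mark `FF FE`, so the longer marks have to be tested first.

```python
import codecs
from typing import Optional, Tuple

# Longest marks first, so UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]

def split_bom(data: bytes) -> Tuple[Optional[str], bytes, bytes]:
    """Return (codec name or None, BOM bytes, remaining payload)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, bom, data[len(bom):]
    return None, b"", data

if __name__ == "__main__":
    name, bom, payload = split_bom(b"\xff\xfeh\x00i\x00")
    print(name, bom.hex(" "), payload.decode(name))  # utf-16-le ff fe hi
```

Returning the payload separately means the caller can decode it with the plain codec name and never sees the mark as a stray U+FEFF character.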