AZ Tools

Mojibake Fixer (Repair Garbled UTF-8)

Mojibake is the garbled text you get when UTF-8 bytes are mistakenly read as a single-byte encoding, almost always Windows-1252, the legacy Windows default in Western locales. An é becomes Ã©, a curly apostrophe becomes â€™, a non-breaking space becomes Â followed by a non-breaking space, and an emoji turns into a run of four odd characters like ðŸ˜€. This tool reverses that: it maps each character back to the Windows-1252 byte it came from and decodes the resulting bytes as UTF-8, recovering the original text. It applies the fix repeatedly for doubly-mangled text, and it's safe by design: correctly-encoded text either contains characters that have no Windows-1252 byte (most non-Latin scripts) or maps to bytes that are not valid UTF-8, so it's left untouched rather than damaged. Paste the broken text and copy the repaired version. Everything runs locally; nothing is uploaded.
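The reversal described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function name `fix_once` is hypothetical, and Python's built-in `cp1252` codec stands in for the tool's character-to-byte mapping. That codec leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so a fuller version would need extra handling for them.

```python
def fix_once(text: str) -> str:
    """Undo one layer of UTF-8-read-as-Windows-1252 mojibake.

    Returns the input unchanged when the reversal is not possible.
    """
    try:
        # Map each character back to the single Windows-1252 byte it
        # stands for, then reinterpret that byte sequence as UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a character has no Windows-1252 byte, or the bytes are
        # not valid UTF-8: treat the input as already correct.
        return text


print(fix_once("cafÃ©"))     # café
print(fix_once("donâ€™t"))  # don’t
print(fix_once("日本語"))    # 日本語 (unchanged)
```

Plain ASCII survives the round trip as-is, so it also comes back unchanged.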

Reverses UTF-8 mis-decoded as Windows-1252. Correct text (any script) is detected as valid and left unchanged.

How to use

  1. Paste the garbled text.
  2. Read the repaired output — the tool shows how many passes it took, or that nothing needed fixing.
  3. Copy the corrected text.

Frequently asked questions

What causes mojibake?
It happens when text saved as UTF-8 is later read using a different, single-byte encoding, most often Windows-1252 or ISO-8859-1. Every non-ASCII character is stored as two to four UTF-8 bytes, and a single-byte encoding turns each of those bytes into its own character: é (bytes C3 A9) shows up as the two characters Ã©. CSV imports, database migrations, and copy-paste between mismatched systems are common culprits.
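The effect is easy to reproduce. The sketch below encodes text as UTF-8 and then decodes the bytes with Python's `cp1252` codec, which is assumed here to approximate what a mismatched system does; `garble` is a hypothetical helper name.

```python
def garble(text: str) -> str:
    """Simulate reading UTF-8 bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


print("é".encode("utf-8"))   # b'\xc3\xa9'
print(garble("é"))           # Ã©
print(len(garble("é")))      # 2
print(len(garble("😀")))     # 4
```

One character in, one garbled character out per UTF-8 byte: two for é, four for the emoji.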
Will it damage text that's already correct?
No. The repair only succeeds when every character maps back to a Windows-1252 byte and those bytes form valid UTF-8. Genuine mojibake passes both checks; correctly-encoded text such as 'café', 'Köln', '한국어', or '日本語' fails one of them, so it is left exactly as it is and the tool reports that no fix was needed.
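To see which check each sample trips, the following sketch reports where the reversal breaks down. The function name and the reason strings are illustrative assumptions, and Python's `cp1252` codec again stands in for the tool's mapping.

```python
from typing import Optional


def why_not_mojibake(text: str) -> Optional[str]:
    """Return why the reversal fails, or None if it succeeds."""
    try:
        raw = text.encode("cp1252")
    except UnicodeEncodeError:
        return "has characters outside Windows-1252"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "bytes are not valid UTF-8"
    return None


for sample in ["café", "Köln", "한국어", "日本語", "cafÃ©"]:
    print(sample, "->", why_not_mojibake(sample))
```

The two Latin samples fail at the UTF-8 step, the Korean and Japanese samples fail at the Windows-1252 step, and only the garbled 'cafÃ©' reverses cleanly.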
Why does it sometimes apply more than one pass?
If text was mis-decoded twice — for example UTF-8 read as Windows-1252, saved, then read as Windows-1252 again — the garbling is layered. The tool repeats the repair until the text stops changing or no longer reverses to valid UTF-8, and tells you how many passes it used.
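A repeat-until-stable loop along these lines would handle layered garbling. It is a sketch under assumptions: `repair` and the `max_passes` safety cap are hypothetical and not taken from the tool.

```python
from typing import Tuple


def repair(text: str, max_passes: int = 5) -> Tuple[str, int]:
    """Apply the reversal until it fails or stops changing the text.

    Returns the repaired text and the number of passes applied.
    """
    passes = 0
    while passes < max_passes:
        try:
            candidate = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # no longer reverses to valid UTF-8
        if candidate == text:
            break  # nothing left to change (for example plain ASCII)
        text = candidate
        passes += 1
    return text, passes


# Build doubly-garbled text: UTF-8 read as Windows-1252, twice.
double = "é".encode("utf-8").decode("cp1252").encode("utf-8").decode("cp1252")
print(double)          # ÃAƒA‚A© style garbling: four characters for one é
print(repair(double))  # ('é', 2)
```

The loop stops on the third attempt because a lone é maps to the byte E9, which is not valid UTF-8 by itself.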
It didn't fix my text — why?
Either the text is already correct, or the corruption isn't the common UTF-8-as-Windows-1252 kind (for example it was mis-decoded as Shift_JIS or EUC-KR, or bytes were actually lost). This tool targets the most frequent case; for opening a file in a specific legacy encoding, use a text-encoding converter instead.
