Number Bases and Text Encodings: A Developer's Cheat Sheet

Numbers and text both end up as the same thing inside a computer: bits. But developers constantly switch between human-friendly and machine-friendly representations — hex colour codes, Base64 tokens, URL-escaped query strings. This guide is a practical map of the bases and encodings you meet every day, and what each one is actually for.

Number bases: the same value, different clothes

A 'base' is simply how many distinct digits a positional number system uses before it rolls over to the next place. Decimal (base 10) uses 0–9; binary (base 2) uses 0 and 1; octal (base 8) uses 0–7; hexadecimal (base 16) uses 0–9 then A–F. The number itself doesn't change between bases — only the way it's written. 11111111 in binary, 377 in octal, 255 in decimal and FF in hex are all the very same value.
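
One way to confirm this is to parse each spelling and compare the results. The sketch below relies only on Python's built-in int(text, base) and format(); the variable names are illustrative.

```python
# Four spellings of the same value, each parsed with an explicit base.
spellings = {
    2: "11111111",
    8: "377",
    10: "255",
    16: "FF",
}

values = {base: int(text, base) for base, text in spellings.items()}
print(values)  # every entry is the integer 255

# Going the other way: one integer, written out in each base.
n = 255
as_binary = format(n, "b")  # '11111111'
as_octal = format(n, "o")   # '377'
as_hex = format(n, "X")     # 'FF'
print(as_binary, as_octal, as_hex)
```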

Hex is popular with programmers because one hex digit maps exactly to four bits (a nibble), so two hex digits are exactly one byte. That makes hex a compact, lossless way to read raw binary — which is why colours (#FF8800), memory addresses and byte dumps are written in it.
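
As a small illustration of the two-digits-per-byte rule, the sketch below splits a colour code into its three bytes. The function name parse_hex_colour is hypothetical, and the sketch assumes the six-digit #RRGGBB form only.

```python
def parse_hex_colour(colour: str) -> tuple[int, int, int]:
    """Split a '#RRGGBB' string into three byte values (0-255)."""
    digits = colour.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hex digits, e.g. '#FF8800'")
    raw = bytes.fromhex(digits)  # every two hex digits become one byte
    return raw[0], raw[1], raw[2]


red, green, blue = parse_hex_colour("#FF8800")
print(red, green, blue)  # 255 136 0

# One hex digit is one nibble: 'F' is 1111, '8' is 1000, '0' is 0000.
nibbles = [format(int(d, 16), "04b") for d in "FF8800"]
print(nibbles)
```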

Converting between bases

These conversions are purely mechanical — exactly the sort of thing a tool should do for you — but knowing the shape of the operation helps you sanity-check a result and understand what the 0x, 0o and 0b prefixes mean in code.

  • Binary ↔ hex: group the bits into fours from the right; each group of four is one hex digit (1010 1100 = AC).
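  • Any base → decimal: multiply each digit by its place value (the base raised to the digit's position, counting from 0 at the rightmost digit) and add the results. For example, 377 in octal is 3×64 + 7×8 + 7×1 = 255.
  • Decimal → any base: repeatedly divide by the base until the quotient is 0; the remainders, read from last to first, are the digits. For example, 255 ÷ 16 = 15 remainder 15, then 15 ÷ 16 = 0 remainder 15, giving FF.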
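
The three rules above can be sketched in a few lines of Python. The function names and the DIGITS table are illustrative, and the sketch assumes non-negative integers and bases from 2 to 36; in real code the built-ins int(text, base) and format() already cover these cases.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_decimal(text: str, base: int) -> int:
    """Any base -> integer: sum of digit * base**position."""
    total = 0
    # Position 0 is the rightmost digit, so walk the string in reverse.
    for position, char in enumerate(reversed(text.upper())):
        digit = DIGITS.index(char)  # raises ValueError for unknown characters
        if digit >= base:
            raise ValueError(f"{char!r} is not a valid base-{base} digit")
        total += digit * base ** position
    return total


def from_decimal(n: int, base: int) -> str:
    """Non-negative integer -> any base by repeated division."""
    if n < 0 or not 2 <= base <= 36:
        raise ValueError("expected n >= 0 and 2 <= base <= 36")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, base)
        remainders.append(DIGITS[r])
    # The first remainder is the last digit, so read them back to front.
    return "".join(reversed(remainders))


def binary_to_hex(bits: str) -> str:
    """Group bits in fours from the right; each group is one hex digit."""
    bits = bits.replace(" ", "")
    width = -(-len(bits) // 4) * 4  # round up to a multiple of four
    padded = bits.zfill(width)      # pad on the left so groups align from the right
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(DIGITS[int(group, 2)] for group in groups)


print(to_decimal("377", 8))        # 255
print(from_decimal(255, 2))        # 11111111
print(binary_to_hex("1010 1100"))  # AC

# The prefixes tell the parser which base a literal is written in.
print(0xFF == 0o377 == 0b11111111 == 255)  # True
```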

From characters to bytes: UTF-8

Text isn't bytes until you encode it. A character set assigns each character a number (a 'code point') — Unicode is the universal one — and an encoding decides how those numbers become bytes. UTF-8 is the dominant encoding on the web: ASCII characters take one byte, and other characters take two to four, which keeps English text compact while still representing every script and emoji.

This is why a string's length in characters and its length in bytes can differ: 'café' is four characters but five bytes in UTF-8, because é needs two. Hashing, size limits and binary formats all care about bytes, so the distinction matters in practice.
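
A short Python check makes the difference visible. It assumes the é is the single precomposed code point U+00E9, written as an escape so the source file's own encoding cannot change the result; the sample characters in the second half are arbitrary.

```python
word = "caf\u00e9"  # 'café' with a precomposed é (U+00E9)
encoded = word.encode("utf-8")

print(len(word))     # 4 characters
print(len(encoded))  # 5 bytes
print(encoded)       # b'caf\xc3\xa9'

# Bytes needed per character: a, é, € and an emoji (U+1F600).
samples = "a\u00e9\u20ac\U0001F600"
widths = [len(ch.encode("utf-8")) for ch in samples]
print(widths)  # [1, 2, 3, 4]
```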

Base64: bytes as safe text

Base64 does the opposite job: it turns arbitrary bytes into a string drawn from 64 safe characters (A–Z, a–z, 0–9, plus + and /), with = used as padding at the end. It exists because many systems (email, JSON, URLs, HTML data: URIs) were built for text and can mangle raw binary. Because + and / have their own meaning inside a URL, a URL-safe variant swaps them for - and _. Base64 lets you carry an image or a key through a text-only channel intact.

The cost is size: Base64 encodes every 3 bytes as 4 characters, so the output is about 33% larger than the input. It is encoding, not encryption — anyone can decode it — so it protects against corruption in transit, never against prying eyes.
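
The size overhead and the lack of secrecy are both easy to see with Python's standard base64 module. The input bytes here are arbitrary sample data chosen for illustration.

```python
import base64

raw = bytes(range(256)) * 3  # 768 arbitrary bytes, including non-text values
encoded = base64.b64encode(raw)

print(len(raw), len(encoded))  # 768 1024: every 3 bytes became 4 characters

# No key is involved: anyone can reverse the encoding.
decoded = base64.b64decode(encoded)
print(decoded == raw)  # True

# Inputs that are not a multiple of 3 bytes are padded with '='.
short = base64.b64encode(b"hi").decode("ascii")
print(short)  # aGk=
```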

URL and percent encoding

URLs have their own rules. Characters that are reserved as delimiters (&, ?, #, /) or not allowed at all (spaces) must be 'percent-encoded' whenever they appear as data: a % followed by the byte value in hex. A space becomes %20, an ampersand becomes %26. This is what lets a query string carry arbitrary text without breaking the address.

Percent encoding operates on the UTF-8 bytes of a character, so é (bytes C3 A9) becomes %C3%A9. That ties all of this together: text becomes code points, code points become UTF-8 bytes, and bytes become %XX pairs. Most of the encoding bugs developers hit come from confusing one of these layers for another.
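
The layering can be traced step by step in Python and compared against the standard library's urllib.parse functions. This is a sketch for illustration; the sample query value is made up.

```python
from urllib.parse import quote, unquote, urlencode

ch = "\u00e9"                      # é
code_point = ord(ch)               # text -> code point (233, i.e. U+00E9)
utf8_bytes = ch.encode("utf-8")    # code point -> UTF-8 bytes (C3 A9)
manual = "".join(f"%{b:02X}" for b in utf8_bytes)  # bytes -> %XX pairs

print(code_point, utf8_bytes, manual)  # 233 b'\xc3\xa9' %C3%A9
print(manual == quote(ch))             # True: quote() applies the same layers

print(quote(" "), quote("&"))          # %20 %26
print(quote("a/b", safe=""))           # a%2Fb ('/' is left alone by default)

# Building a query string; quote_via=quote gives %20 rather than '+' for spaces.
query = urlencode({"q": "caf\u00e9 & tea"}, quote_via=quote)
print(query)                           # q=caf%C3%A9%20%26%20tea

print(unquote("caf%C3%A9") == "caf\u00e9")  # True
```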
