Why does my text have different byte counts in different encodings?

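UTF-8 uses 1–4 bytes per code point, depending on its value. UTF-16 uses 2 or 4 bytes. ASCII represents only 128 characters, in 1 byte each; anything else cannot be encoded. CJK text differs the most between UTF-8 and UTF-16: most CJK characters take 3 bytes in UTF-8 but 2 in UTF-16. Most emoji take 4 bytes in both.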
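As a rough illustration of the comparison above, here is a minimal Python sketch. The helper name `byte_counts` is made up for this example and is not how the tool itself is implemented. It uses `utf-16-le` so that no BOM is included in the UTF-16 figure.

```python
def byte_counts(text: str) -> dict:
    """Return the encoded size of `text` in bytes for three encodings."""
    counts = {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte BOM that plain "utf-16" prepends.
        "utf-16": len(text.encode("utf-16-le")),
    }
    try:
        counts["ascii"] = len(text.encode("ascii"))
    except UnicodeEncodeError:
        # The text contains characters outside the 128 ASCII characters.
        counts["ascii"] = None
    return counts


print(byte_counts("hello"))
# Three CJK characters, written as escapes: U+65E5 U+672C U+8A9E
print(byte_counts("\u65e5\u672c\u8a9e"))
```

For the ASCII-only string all three encodings succeed, with UTF-16 taking twice the bytes of UTF-8. For the CJK string the relationship flips and the ASCII entry is `None`.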

What's the byte count of an emoji?

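Most emoji are 4 bytes in UTF-8. Compound emoji are longer because they are sequences of several code points. A flag is two regional indicator symbols (8 bytes). A family emoji is several emoji joined with zero-width joiners (U+200D, 3 bytes each); for example, a three-person family is 18 bytes.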
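The arithmetic can be checked with a short Python sketch. The emoji are written as escape sequences so the code points are visible, and `utf8_len` is a hypothetical helper name used only here.

```python
def utf8_len(text: str) -> int:
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# One code point: GRINNING FACE
grinning = "\U0001F600"
# Two regional indicator symbols (J + P), rendered as a flag
flag = "\U0001F1EF\U0001F1F5"
# MAN + ZWJ + WOMAN + ZWJ + GIRL, rendered as one family emoji
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

for name, emoji in [("grinning", grinning), ("flag", flag), ("family", family)]:
    # len() on a Python str counts code points, not bytes.
    print(name, "code points:", len(emoji), "utf-8 bytes:", utf8_len(emoji))
```

The family sequence is 5 code points: three 4-byte emoji plus two 3-byte joiners, which gives 18 bytes.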

Does it count the BOM?

Optional: toggle 'include BOM'. The UTF-8 BOM is 3 bytes and the UTF-16 BOM is 2 bytes. Many decoders strip it on read, so include it in the count only when you care about file size on disk.
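A sketch of how an 'include BOM' option might be modelled in Python follows. The function `encoded_size` and its `include_bom` parameter are assumptions made for illustration; the BOM constants come from the standard `codecs` module.

```python
import codecs

# BOM byte sequences for the encodings handled in this sketch.
BOMS = {
    "utf-8": codecs.BOM_UTF8,          # 3 bytes: EF BB BF
    "utf-16-le": codecs.BOM_UTF16_LE,  # 2 bytes: FF FE
    "utf-16-be": codecs.BOM_UTF16_BE,  # 2 bytes: FE FF
}


def encoded_size(text: str, encoding: str = "utf-8", include_bom: bool = False) -> int:
    """Size of `text` in bytes, optionally counting a leading BOM."""
    size = len(text.encode(encoding))
    if include_bom:
        size += len(BOMS[encoding])
    return size


print(encoded_size("hi"))                                 # payload only
print(encoded_size("hi", include_bom=True))               # payload + UTF-8 BOM
print(encoded_size("hi", "utf-16-le", include_bom=True))  # payload + UTF-16 BOM
```

Python's own `utf-8-sig` and `utf-16` codecs write the BOM automatically when encoding, which is an alternative to adding the length by hand.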

How is byte count different from character count?

Byte count depends on encoding; character count depends on how you define 'character' (code unit, codepoint, or grapheme cluster). For user-visible length use grapheme count; for storage use bytes.
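To make the distinction concrete, the sketch below reports bytes, UTF-16 code units, and code points for the same string. The name `length_report` is hypothetical. Grapheme counting is not shown, because it needs a Unicode text segmentation implementation (UAX #29), which typically comes from a third-party package.

```python
def length_report(text: str) -> dict:
    """Measure `text` three ways; each answers a different question."""
    return {
        "utf8_bytes": len(text.encode("utf-8")),
        # Each UTF-16 code unit is 2 bytes.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # len() on a Python str counts code points.
        "code_points": len(text),
    }


# "e" followed by U+0301 COMBINING ACUTE ACCENT: displayed as one character
accented = "e\u0301"
# THUMBS UP SIGN followed by a skin tone modifier: displayed as one emoji
thumbs = "\U0001F44D\U0001F3FD"

print(length_report(accented))
print(length_report(thumbs))
```

Both strings appear as a single character to a reader, yet none of the three measures returns 1, which is why user-visible length calls for grapheme counting.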

Byte Counter | DevTools Surf

About Byte Counter

Show byte size of text in UTF-8, UTF-16, and ASCII. Part of the DevTools Surf developer suite. Browse more tools in the Text / String collection.

Use Cases

Checking SMS message byte length for multi-part thresholds
Verifying database VARCHAR column capacity for Unicode text
Measuring HTTP header sizes against server limits
Estimating bandwidth for real-time text streaming applications

Tips

Compare UTF-8 and UTF-16 sizes for the same string
Check byte size before storing text in fixed-length database columns
Verify payload size stays within API request limits
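The checks in these tips could be sketched in Python as below. Both function names are made up for this example, and the limit you pass in would come from your own column definition or API documentation.

```python
def fits(text: str, max_bytes: int, encoding: str = "utf-8") -> bool:
    """True if `text` encodes to at most `max_bytes` bytes."""
    return len(text.encode(encoding)) <= max_bytes


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Shorten `text` to at most `max_bytes` of UTF-8 without splitting a code point."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes may cut a multi-byte sequence in half; errors="ignore"
    # drops that incomplete trailing sequence instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


print(fits("cafe", 4))                 # plain ASCII, 4 bytes
print(fits("caf\u00e9", 4))            # U+00E9 needs 2 bytes, so 5 in total
print(truncate_utf8("caf\u00e9", 4))   # the half-cut final character is dropped
```

This truncation respects code point boundaries only. It can still separate a base character from a combining mark or break a compound emoji, so text shown to users may need grapheme-aware truncation instead.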

Fun Facts

UTF-8 was designed by Ken Thompson and Rob Pike in September 1992 on a placemat at a New Jersey diner.
A single emoji can take 28 bytes or more in UTF-8 when it combines base characters, skin tone modifiers, and zero-width joiners.
ASCII defines 128 characters using 7 bits each; UTF-8 is backward-compatible and encodes those same 128 characters in one byte each.

Related Text / String Tools

URL Encoder
URL Decoder
HTML Entity Encode
HTML Entity Decode
JWT Decoder
Lorem Ipsum Generator
Word Counter
Text Diff