- Why does my text have different byte counts in different encodings?
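- UTF-8 uses 1–4 bytes per codepoint: 1 for ASCII, up to 3 for the rest of the Basic Multilingual Plane (BMP), and 4 above it. UTF-16 uses 2 bytes for BMP codepoints and 4 bytes (a surrogate pair) for everything else. ASCII covers only 128 characters at 1 byte each; any other character cannot be encoded. CJK characters show the largest gap between the two Unicode encodings (typically 3 bytes in UTF-8, 2 in UTF-16), while most emoji take 4 bytes in both.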
- What's the byte count of an emoji?
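- Most emoji are 4 bytes in UTF-8. Compound emoji are longer because they consist of several codepoints. A flag is two regional-indicator codepoints (8 bytes). A family emoji is several person emoji joined with zero-width joiners, which add 3 bytes each, so a three-person family is 18 bytes.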
- Does it count the BOM?
- Only if you turn on the 'include BOM' toggle. The UTF-8 BOM is 3 bytes (EF BB BF) and the UTF-16 BOM is 2 bytes (FE FF or FF FE, depending on byte order). Most systems strip it, so include it in the count only when you care about file size on disk.
- How is byte count different from character count?
- Byte count depends on encoding; character count depends on how you define 'character' (code unit, codepoint, or grapheme cluster). For user-visible length use grapheme count; for storage use bytes.
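
The numbers in the answers above can be checked with a short script. This is a minimal sketch using only Python's standard library, not the tool's actual implementation: `byte_count`, `length_summary`, and the `include_bom` parameter are hypothetical names chosen to mirror the toggle described above. Grapheme-cluster counting is left out because the standard library has no Unicode text segmentation; a third-party library would be needed for that.

```python
import codecs

# Hypothetical helper names for illustration; only str.encode and the
# codecs BOM constants come from the standard library.
BOM_SIZES = {"utf-8": len(codecs.BOM_UTF8), "utf-16": len(codecs.BOM_UTF16_LE)}


def byte_count(text: str, encoding: str = "utf-8", include_bom: bool = False) -> int:
    """Bytes needed to store `text` in "utf-8", "utf-16", or "ascii"."""
    # Python's plain "utf-16" codec always prepends a BOM, so encode with the
    # explicit little-endian variant and add the BOM size only on request.
    codec = "utf-16-le" if encoding == "utf-16" else encoding
    size = len(text.encode(codec))  # raises UnicodeEncodeError if not representable
    if include_bom:
        size += BOM_SIZES.get(encoding, 0)
    return size


def length_summary(text: str) -> dict:
    """Report the same string under several definitions of 'length'."""
    return {
        "codepoints": len(text),
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": byte_count(text, "utf-8"),
        "utf16_bytes": byte_count(text, "utf-16"),
    }


# Sample strings written as escapes so the file itself stays ASCII.
CJK = "\u4e2d"                                         # one CJK ideograph
GRIN = "\U0001F600"                                    # one emoji codepoint
FLAG = "\U0001F1EF\U0001F1F5"                          # two regional indicators
FAMILY = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # three emoji, two ZWJs

if __name__ == "__main__":
    for label, sample in [("CJK", CJK), ("grin", GRIN), ("flag", FLAG), ("family", FAMILY)]:
        print(label, length_summary(sample))
    print("UTF-8 'A' with BOM:", byte_count("A", "utf-8", include_bom=True))  # 4
    try:
        byte_count(CJK, "ascii")
    except UnicodeEncodeError:
        print("ASCII cannot encode the CJK sample")
```

The family sample comes out as 5 codepoints, 8 UTF-16 code units, 18 UTF-8 bytes, and 16 UTF-16 bytes, even though a reader sees a single character. That is why the three counts should be kept separate.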