- Which model tokenizers are supported?
- OpenAI (tiktoken, covering GPT-3.5, GPT-4, and GPT-4o), Anthropic (Claude), and a generic BPE tokenizer for approximate counts. Select the target model before counting.
- Is the count exact?
- For OpenAI models, yes: the tool runs the actual tiktoken library compiled to WASM. For Anthropic models, the count is a close approximation, because Anthropic doesn't publish the exact tokenizer; the estimate is typically within 5% of the true count.
- Why does token count matter?
- Cost and context limits. GPT-4o costs $2.50 per million input tokens; Claude Sonnet 4.5 costs $3 per million. For high-volume apps, knowing the exact count before sending a request lets you budget spend and stay within the context window.
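- What about the rule that 1 token ≈ 4 characters?
- That is a rule of thumb for English text. Other languages, such as Japanese and Chinese, often use about 1 token per character. Code also tends to use more tokens per character than English prose because of its punctuation and symbols.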
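
To make the cost arithmetic and the characters-per-token rule of thumb concrete, here is a minimal Python sketch. It is not a tokenizer and not how the tool counts tokens. The function names, the `kind` labels, and the model keys are hypothetical; the prices and ratios are the ones quoted in the answers above, and real counts will differ.

```python
import math

# Input prices in USD per million tokens, as quoted in the FAQ above.
# The dictionary keys are illustrative labels, not official model IDs.
PRICE_PER_MILLION_INPUT = {
    "gpt-4o": 2.50,
    "claude-sonnet-4.5": 3.00,
}

# Rough characters-per-token ratios from the rule of thumb above.
# These are heuristics only; use a real tokenizer for exact counts.
CHARS_PER_TOKEN = {
    "english": 4.0,  # ~4 characters per token
    "cjk": 1.0,      # ~1 token per character (Japanese, Chinese)
}


def estimate_tokens(text: str, kind: str = "english") -> int:
    """Estimate a token count from character length (rounded up)."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN[kind])


def input_cost_usd(tokens: int, model: str) -> float:
    """Convert an input token count into an approximate cost in USD."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT[model]


if __name__ == "__main__":
    prompt = "Summarize the following support ticket in two sentences."
    tokens = estimate_tokens(prompt)
    print(tokens, input_cost_usd(tokens, "gpt-4o"))
```

At these prices, 10,000 requests of 2,000 input tokens each (20 million tokens) would cost about $50 on GPT-4o and about $60 on Claude Sonnet 4.5, which is why a 5% counting error can matter at volume.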