Tokens: The Currency of LLMs
Learn what tokens are and why they matter for cost, context, and real application design
Learning goals
- Understand what tokens are and how text is tokenized
- Learn why token count matters for cost and context limits
- Recognize how different content types tokenize differently
What Are Tokens?
A token is a chunk of text that the model processes as a single unit. Tokens can be:
- Whole words: "hello" → 1 token
- Word pieces: "unhappiness" → ["un", "happiness"] → 2 tokens
- Punctuation: "!" → 1 token
- Numbers: "2024" might be 1-2 tokens depending on the tokenizer
As a rough rule for English text: 1 token ≈ 4 characters, or about 100 tokens ≈ 75 words.
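That heuristic is easy to turn into code. The sketch below is an approximation only; for exact counts you would run the model's actual tokenizer (e.g. a BPE tokenizer), and both function names here are made up for illustration:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token rule."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(text: str) -> int:
    """Alternative estimate using the ~75 words per 100 tokens rule."""
    words = len(text.split())
    return max(1, round(words * 100 / 75))
```

Either estimate is fine for rough budgeting; they diverge on text with long words or heavy punctuation, which is exactly where real tokenizers also behave differently.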
Why Tokens Matter
Cost
API pricing is typically per-token. Both input (prompt) and output (completion) tokens are counted, and output tokens are often priced higher than input tokens. A verbose prompt costs more than a concise one.
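A per-request cost calculation can be sketched as follows. The prices below are placeholders, not any provider's real rates; real pricing varies by model and changes over time:

```python
# Hypothetical per-million-token prices (placeholders, not real rates).
INPUT_PRICE_PER_M = 3.00    # USD per 1M input (prompt) tokens
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output (completion) tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call in USD: input and output are billed separately."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
```

At these placeholder rates, a call with a 1,000-token prompt and a 500-token reply costs about one cent, which is why trimming verbose prompts matters at scale.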
Context Window
Each model has a maximum context length (e.g., 8K, 32K, 128K tokens). This limit includes both your input AND the model's output. A 32K context model can process about 24,000 words total.
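Because the limit covers input and output together, a common pre-flight check is to verify that the prompt leaves room for the reply. A minimal sketch (the function names are illustrative, not from any SDK):

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int = 32_000) -> bool:
    """The context window must hold the prompt AND the model's reply."""
    return prompt_tokens + max_output_tokens <= context_window

def max_reply_budget(prompt_tokens: int, context_window: int = 32_000) -> int:
    """Tokens left for the model's output after the prompt is sent."""
    return max(0, context_window - prompt_tokens)
```

A 30,000-token prompt in a 32K window leaves only about 2,000 tokens for the answer; applications that stuff the context with retrieved documents hit this ceiling quickly.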
Performance
Longer contexts can degrade response quality. Information at the beginning and end of a long prompt tends to be weighted more heavily than information in the middle, an effect sometimes called "lost in the middle".
Tokenization Differences
Different models use different tokenizers:
- GPT-4 uses the cl100k_base tokenizer
- Claude uses its own tokenizer
- Open-source models often use SentencePiece or custom tokenizers
The same text may have different token counts across models. For example, in GPT tokenizers the word "indescribable" is split into ["ind", "esc", "rib", "able"] = 4 tokens. Code tends to tokenize less efficiently than prose: "function(){}" might be 5+ tokens because each special character often becomes its own token.
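The point that token counts depend on the tokenizer can be demonstrated with two deliberately simple toy tokenizers. These are illustrative only; real tokenizers such as cl100k_base or SentencePiece use learned subword vocabularies, not fixed rules like these:

```python
import re

def word_tokenize(text: str) -> list[str]:
    """Toy tokenizer A: whole words and numbers, plus one token per symbol."""
    return re.findall(r"\w+|[^\w\s]", text)

def char_pair_tokenize(text: str) -> list[str]:
    """Toy tokenizer B: greedy two-character chunks, crudely subword-like."""
    return [text[i:i + 2] for i in range(0, len(text), 2)]

code = "function(){}"
print(len(word_tokenize(code)))       # symbol-heavy code splits into many tokens
print(len(char_pair_tokenize(code)))  # a different scheme gives a different count
```

The same 12-character string yields different token counts under the two schemes, which is why token budgets and costs computed against one model's tokenizer do not transfer exactly to another.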