Foundations
12 min · Lesson 2 of 14
Tokens: The Currency of LLMs
Learn what tokens are and why they matter for cost, context, and real application design
Learning goals
- Understand what tokens are and how text is tokenized
- Learn why token count matters for cost and context limits
- Recognize how different content types tokenize differently
What Are Tokens?
A token is a chunk of text that the model processes as a single unit. Tokens can be:
- Whole words: "hello" → 1 token
- Word pieces: "unhappiness" → ["un", "happiness"] → 2 tokens
- Punctuation: "!" → 1 token
- Numbers: "2024" might be 1-2 tokens depending on the tokenizer
As a rough rule: 1 token ≈ 4 characters or 100 tokens ≈ 75 words in English.
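Under this rule of thumb, a quick character-based estimate is often good enough for budgeting before reaching for a real tokenizer. This is a hypothetical helper, not any library's API, and it is only a rough approximation for English prose:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic."""
    return max(1, round(len(text) / 4))

# A 400-character paragraph is roughly 100 tokens (about 75 English words).
estimate_tokens("a" * 400)  # 100
```

For exact counts you need the model's own tokenizer, since the real ratio varies with language and content type.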
Why Tokens Matter
Cost
API pricing is typically per-token. Both input (prompt) and output (completion) tokens are counted, so a verbose prompt costs more than a concise one.
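Because both sides of the exchange are billed, a simple cost model multiplies each token count by its per-million-token rate. This is a minimal sketch; the helper name and the prices used below are placeholders, not any provider's actual rates:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the dollar cost of one API call.

    Prices are expressed per 1 million tokens, as most providers quote them.
    """
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# Hypothetical pricing: $3.00 per 1M input tokens, $15.00 per 1M output tokens.
# A 2,000-token prompt with a 500-token completion:
cost = request_cost(2_000, 500, 3.00, 15.00)  # 0.0135 dollars
```

Note that output tokens are often priced several times higher than input tokens, which is why requesting "a detailed explanation" can dominate the cost of a call.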
Context Window
Each model has a maximum context length (e.g., 8K, 32K, or 128K tokens). This limit covers both your input AND the model's output. A 32K-token context can hold roughly 24,000 English words in total.
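Since the prompt and the requested completion share one window, a request should be checked against the limit before it is sent. A minimal sketch of that check (the function name is illustrative, not a library API):

```python
def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """Return True if the prompt plus the reserved output budget fit.

    Input and output tokens draw from the same context window.
    """
    return prompt_tokens + max_output_tokens <= context_window

# A 30,000-token prompt asking for up to 4,000 output tokens
# does NOT fit in a 32,768-token window:
fits_context(30_000, 4_000, 32_768)  # False
```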
Performance
Longer contexts can degrade response quality. Information at the beginning and end of a long prompt tends to be weighted more heavily than information buried in the middle, an effect often called "lost in the middle."
Tokenization Differences
Different models use different tokenizers:
- GPT-4 uses the cl100k_base tokenizer
- Claude uses its own tokenizer
- Open-source models often use SentencePiece or custom tokenizers
The same text can therefore have different token counts across models. For example, in GPT tokenizers the word "indescribable" splits into ["ind", "esc", "rib", "able"], i.e. 4 tokens. Code tends to tokenize less efficiently than prose: "function(){}" might be 5+ tokens because of its special characters.
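Real tokenizers such as cl100k_base use byte-pair encoding learned from large corpora, but the word-piece splitting behavior can be illustrated with a toy greedy longest-match tokenizer over a hypothetical vocabulary. This is a teaching sketch only, not how any production tokenizer is implemented:

```python
def wordpiece_split(word: str, vocab: set[str]) -> list[str]:
    """Split a word greedily into the longest vocabulary pieces.

    A simplified stand-in for subword tokenization: try the longest
    matching piece first, and fall back to single characters.
    """
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown character: emit it alone
            i += 1
    return pieces

# Toy vocabulary (hypothetical): known pieces the tokenizer can emit.
vocab = {"un", "happiness", "hello"}
wordpiece_split("unhappiness", vocab)  # ["un", "happiness"] -> 2 tokens
wordpiece_split("hello", vocab)        # ["hello"] -> 1 token
```

Two models with different vocabularies will split the same word into different pieces, which is exactly why token counts differ across tokenizers.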
Common mistakes
- Ignoring token costs in production—verbose prompts at scale become expensive quickly
- Not accounting for output tokens—a request for "a detailed explanation" generates many output tokens
- Assuming word count equals token count—special characters and code tokenize differently
- Filling context windows completely—this can degrade response quality
Key takeaways
- Tokens are the unit of measurement for LLM input and output, approximately 4 characters each
- Both input and output tokens count toward cost and context limits
- Different models tokenize the same text differently
- Efficient prompting means getting good results with fewer tokens