Foundations
12 min · Lesson 2 of 14
Tokens: The Currency of LLMs
Learn what tokens are and why they matter for cost, context, and real application design
Learning goals
- Understand what tokens are and how text is tokenized
- Learn why token count matters for cost and context limits
- Recognize how different content types tokenize differently
What Are Tokens?
A token is a chunk of text that the model processes as a single unit. Tokens can be:
- Whole words: "hello" → 1 token
- Word pieces: "unhappiness" → ["un", "happiness"] → 2 tokens
- Punctuation: "!" → 1 token
- Numbers: "2024" might be 1-2 tokens depending on the tokenizer
As a rough rule: 1 token ≈ 4 characters or 100 tokens ≈ 75 words in English.
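Under this rule of thumb, a quick character-based estimate is often good enough for budgeting before reaching for a real tokenizer. This is a hypothetical helper, not any library's API, and it is only a rough approximation for English prose:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic."""
    return max(1, round(len(text) / 4))

# A 400-character paragraph is roughly 100 tokens (about 75 English words).
estimate_tokens("a" * 400)  # 100
```

For exact counts you need the model's own tokenizer, since the real ratio varies with language and content type.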
Why Tokens Matter
Cost
API pricing is typically per-token. Both input (prompt) and output (completion) tokens are counted, so a verbose prompt costs more than a concise one.
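Because both sides of the exchange are billed, a simple cost model multiplies each token count by its per-million-token rate. This is a minimal sketch; the helper name and the prices used below are placeholders, not any provider's actual rates:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the dollar cost of one API call.

    Prices are expressed per 1 million tokens, as most providers quote them.
    """
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# Hypothetical pricing: $3.00 per 1M input tokens, $15.00 per 1M output tokens.
# A 2,000-token prompt with a 500-token completion:
cost = request_cost(2_000, 500, 3.00, 15.00)  # 0.0135 dollars
```

Note that output tokens are often priced several times higher than input tokens, which is why requesting "a detailed explanation" can dominate the cost of a call.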
Context Window
Each model has a maximum context length (e.g., 8K, 32K, or 128K tokens). This limit covers both your input AND the model's output. A 32K-token context can hold roughly 24,000 English words in total.
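Since the prompt and the requested completion share one window, a request should be checked against the limit before it is sent. A minimal sketch of that check (the function name is illustrative, not a library API):

```python
def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """Return True if the prompt plus the reserved output budget fit.

    Input and output tokens draw from the same context window.
    """
    return prompt_tokens + max_output_tokens <= context_window

# A 30,000-token prompt asking for up to 4,000 output tokens
# does NOT fit in a 32,768-token window:
fits_context(30_000, 4_000, 32_768)  # False
```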
Performance
Longer contexts can degrade response quality. Information at the beginning and end of a long prompt tends to be weighted more heavily than information buried in the middle, an effect often called "lost in the middle."
Tokenization Differences
Different models use different tokenizers:
- GPT-4 uses the cl100k_base tokenizer
- Claude uses its own tokenizer
- Open-source models often use SentencePiece or custom tokenizers
The same text can therefore have different token counts across models. For example, in GPT tokenizers the word "indescribable" splits into ["ind", "esc", "rib", "able"], i.e. 4 tokens. Code tends to tokenize less efficiently than prose: "function(){}" might be 5+ tokens because of its special characters.
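Real tokenizers such as cl100k_base use byte-pair encoding learned from large corpora, but the word-piece splitting behavior can be illustrated with a toy greedy longest-match tokenizer over a hypothetical vocabulary. This is a teaching sketch only, not how any production tokenizer is implemented:

```python
def wordpiece_split(word: str, vocab: set[str]) -> list[str]:
    """Split a word greedily into the longest vocabulary pieces.

    A simplified stand-in for subword tokenization: try the longest
    matching piece first, and fall back to single characters.
    """
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown character: emit it alone
            i += 1
    return pieces

# Toy vocabulary (hypothetical): known pieces the tokenizer can emit.
vocab = {"un", "happiness", "hello"}
wordpiece_split("unhappiness", vocab)  # ["un", "happiness"] -> 2 tokens
wordpiece_split("hello", vocab)        # ["hello"] -> 1 token
```

Two models with different vocabularies will split the same word into different pieces, which is exactly why token counts differ across tokenizers.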
Common mistakes
- Ignoring token costs in production—verbose prompts at scale become expensive quickly
- Not accounting for output tokens—a request for "a detailed explanation" generates many output tokens
- Assuming word count equals token count—special characters and code tokenize differently
- Filling context windows completely—this can degrade response quality
Key takeaways
- Tokens are the unit of measurement for LLM input and output, approximately 4 characters each
- Both input and output tokens count toward cost and context limits
- Different models tokenize the same text differently
- Efficient prompting means getting good results with fewer tokens