Frodex
Foundations: 1 Introduction · 2 Tokens · 3 Controlling the Model
Communicating with LLMs: 4 Anatomy of a Good Prompt · 5 System Prompts and Personas · 6 Few-Shot Learning
Structured Outputs: 7 JSON Mode and Structured Output · 8 Function Calling
Advanced Techniques: 9 Chain of Thought Reasoning · 10 Managing the Context Window · 11 Embeddings and Semantic Search
Production Systems: 12 Retrieval-Augmented Generation (RAG) · 13 Streaming Responses · 14 Evaluation and Cost Optimization
Foundations · Lesson 2 of 14 · 12 min

Tokens: The Currency of LLMs

Learn what tokens are and why they matter for cost, context, and real application design

Learning goals

  • Understand what tokens are and how text is tokenized
  • Learn why token count matters for cost and context limits
  • Recognize how different content types tokenize differently

What Are Tokens?

A token is a chunk of text that the model processes as a single unit. Tokens can be:

  • Whole words: "hello" → 1 token
  • Word pieces: "unhappiness" → ["un", "happiness"] → 2 tokens
  • Punctuation: "!" → 1 token
  • Numbers: "2024" might be 1-2 tokens depending on the tokenizer

As a rough rule: 1 token ≈ 4 characters or 100 tokens ≈ 75 words in English.
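This rule of thumb is easy to turn into a quick estimate in code. A minimal sketch (this uses the approximate 4-characters-per-token ratio above; only a model's actual tokenizer gives exact counts):

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb.

    This is an approximation for English prose; code and non-English text
    can deviate significantly, so use the real tokenizer when cost matters.
    """
    return max(1, math.ceil(len(text) / 4))

print(estimate_tokens("hello world"))                                    # 3
print(estimate_tokens("The quick brown fox jumps over the lazy dog."))   # 11
```

Note that the second sentence has 9 words and an estimated 11 tokens, which lines up with the 100 tokens ≈ 75 words ratio.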

Why Tokens Matter

Cost: API pricing is typically per-token. Both input (prompt) and output (completion) tokens are counted. A verbose prompt costs more than a concise one.
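To make the cost intuition concrete, here is a small sketch. The per-million-token rates below are hypothetical placeholders, not real prices for any model:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in dollars for one request, with rates given per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Hypothetical rates: $3 per 1M input tokens, $15 per 1M output tokens.
verbose = request_cost(2_000, 500, 3.0, 15.0)   # wordy prompt
concise = request_cost(200, 500, 3.0, 15.0)     # trimmed prompt, same output
print(f"verbose: ${verbose:.4f}, concise: ${concise:.4f}")
```

At scale the gap compounds: across a million requests, the difference between these two prompts is thousands of dollars for identical output.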

Context Window: Each model has a maximum context length (e.g., 8K, 32K, 128K tokens). This limit includes both your input AND the model's output. A 32K context model can process about 24,000 words total.
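Because input and output share the same window, a common pattern is to budget explicitly. A minimal sketch, assuming a hypothetical 32K-token model and a fixed reservation for the reply:

```python
def max_input_tokens(context_window: int, max_output: int,
                     safety_margin: int = 256) -> int:
    """Tokens left for the prompt after reserving room for the model's reply.

    The safety margin absorbs tokenizer-count estimation error.
    """
    budget = context_window - max_output - safety_margin
    if budget <= 0:
        raise ValueError("output reservation exceeds the context window")
    return budget

print(max_input_tokens(32_000, 1_024))  # room left for prompt text
```

If the prompt (plus retrieved documents, history, etc.) exceeds this budget, something has to be truncated or summarized before the request is sent.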

Performance: Longer contexts can affect response quality. Information at the beginning and end of long prompts tends to be weighted more heavily than information in the middle.

Tokenization Differences

Different models use different tokenizers:

  • GPT-4 uses the cl100k_base tokenizer
  • Claude uses its own tokenizer
  • Open-source models often use SentencePiece or custom tokenizers

The same text may have different token counts across models. The word "indescribable" is split into ["ind", "esc", "rib", "able"] = 4 tokens in GPT tokenizers. Code often has higher token density: "function(){}" might be 5+ tokens due to special characters.

Common mistakes

× Ignoring token costs in production—verbose prompts at scale become expensive quickly
× Not accounting for output tokens—a request for "a detailed explanation" generates many output tokens
× Assuming word count equals token count—special characters and code tokenize differently
× Filling context windows completely—this can degrade response quality

Key takeaways

+ Tokens are the unit of measurement for LLM input and output, approximately 4 characters each
+ Both input and output tokens count toward cost and context limits
+ Different models tokenize the same text differently
+ Efficient prompting means getting good results with fewer tokens

Playground

Try These Experiments

Compare short vs. wordy prompts and see how they change token usage and output. A good starting point is a compact instruction that uses few input tokens and encourages a tiny, cheap output.