Foundations
15 min · Lesson 3 of 14
Controlling the Model: Generation Parameters
Master the parameters that control how LLMs generate text in real applications
Learning goals
- Understand temperature and its effect on output randomness
- Learn about top-p, max_tokens, and other generation parameters
- Know when to use which parameters for different tasks
Temperature (0.0 - 2.0)
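Temperature controls the randomness of token selection by rescaling the model's token probabilities before sampling. The allowed range depends on the provider: 0.0 to 2.0 is common, though some APIs cap it at 1.0.
- 0.0: Effectively deterministic. Always picks the highest-probability token, although hosted APIs may still show small run-to-run variation. Best for factual tasks.
- 0.7: Balanced. A good default for most tasks.
- 1.0+: Creative. More random, unexpected outputs. Good for brainstorming.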
Think of temperature as the "creativity dial."
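To make the dial concrete, here is a minimal sketch of how temperature is commonly applied: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The function name and the example scores are made up for illustration, and a temperature of 0 is treated here as "always pick the top token".

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability on the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.0, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the top token's share of the probability shrinking as the temperature rises, which is why higher settings produce more varied output.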
Top-P (Nucleus Sampling)
Top-P (0.0 - 1.0) limits token selection to the smallest set of tokens whose cumulative probability reaches P. The model then samples only from that set:
- 0.1: Very focused. Considers only the most likely tokens.
- 0.9: Broad. Considers more diverse options.
- 1.0: Considers all tokens.
A common recommendation is to adjust either temperature or top-p, not both at once, because their combined effect is hard to predict.
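The selection rule can be sketched in a few lines. This is an illustrative version, not any particular library's implementation: it ranks tokens by probability, keeps them until the running total reaches the threshold, then renormalizes. The function name and the toy distribution are assumptions.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the surviving probabilities sum to 1 again.
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

# Toy next-token distribution (made-up values).
probs = {"the": 0.5, "a": 0.3, "one": 0.15, "zebra": 0.05}
print(top_p_filter(probs, 0.1))  # keeps only "the"
print(top_p_filter(probs, 0.9))  # keeps "the", "a", and "one"
```

With a low threshold the unlikely tail ("zebra") can never be sampled, no matter how the remaining tokens are drawn.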
Max Tokens
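Sets the maximum number of tokens the model may generate in a response. Important for:
- Cost control: Caps the output tokens billed per request
- Response length: Puts a hard ceiling on length. The model is cut off at the limit rather than writing more concisely, so ask for brevity in the prompt as well
- Context management: Leaves room for follow-up exchanges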
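As a rough sketch of the cost and context points, the helpers below pick a max_tokens value that fits the remaining context window and compute an upper bound on output cost. The helper names, the 8192-token window, and the price are hypothetical; check your provider's actual limits and pricing.

```python
def output_budget(context_window, prompt_tokens, desired_max_tokens):
    """Return a max_tokens value that fits in the remaining context window."""
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(desired_max_tokens, remaining)

def estimate_output_cost(max_tokens, price_per_million_output_tokens):
    """Upper bound on output cost for one request, in the price's currency."""
    return max_tokens * price_per_million_output_tokens / 1_000_000

# Hypothetical numbers: an 8192-token window with a 7000-token prompt.
budget = output_budget(context_window=8192, prompt_tokens=7000, desired_max_tokens=2000)
print(budget)
print(estimate_output_cost(budget, price_per_million_output_tokens=10.0))
```

The cost figure is a ceiling: a response that ends naturally before the limit is billed only for the tokens actually generated.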
Other Parameters
Stop Sequences: Strings that terminate generation when encountered. Useful for structured outputs.
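The effect of a stop sequence can be imitated after the fact with plain string handling. This sketch cuts text at the earliest stop sequence and drops the sequence itself. Real APIs stop during generation instead, and whether the stop string appears in the output varies by provider. The function name and sample text are illustrative.

```python
def apply_stop_sequences(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]

# Made-up model output that runs on into a second question.
raw = "Answer: 42\n\nQuestion: what is next?"
print(apply_stop_sequences(raw, ["\n\nQuestion:", "END"]))
```

Here the text is trimmed to the first answer, which is the typical use: keeping the model from continuing past the part of a structured format you care about.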
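Frequency Penalty (0.0 - 2.0): Reduces repetition by penalizing tokens in proportion to how often they have already appeared. Higher values = less repetition. Some APIs also accept negative values, which encourage repetition.
Presence Penalty (0.0 - 2.0): Penalizes any token that has already appeared at least once, regardless of how many times, which nudges the model toward new topics. Higher values = more diverse content.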
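The difference between the two penalties is easier to see in code. The sketch below uses the additive form described in OpenAI's API documentation, where the frequency penalty scales with a token's count and the presence penalty is a flat deduction once a token has appeared. Other providers may compute penalties differently, and the function name and scores are made up.

```python
from collections import Counter

def apply_penalties(logits, generated_tokens, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the scores of tokens that already appear in the generated text."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        adjusted[token] = (
            logit
            - count * frequency_penalty                       # grows with each repeat
            - (1.0 if count > 0 else 0.0) * presence_penalty  # flat, applied once
        )
    return adjusted

logits = {"cat": 2.0, "dog": 1.5, "fish": 1.0}  # made-up scores
history = ["cat", "cat", "dog"]                 # tokens generated so far
print(apply_penalties(logits, history, frequency_penalty=0.5, presence_penalty=0.2))
```

"cat" loses more than "dog" because it appeared twice, while "fish", which has not appeared, keeps its original score.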
Common mistakes
× Using high temperature for tasks requiring accuracy: more random sampling makes errors and hallucinations more likely
× Setting max_tokens too low: responses get cut off mid-sentence
× Tuning both temperature and top_p at once: their effects interact, so change one and leave the other at its default
× Not testing parameters: optimal values vary by use case
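One simple way to test parameters is to run the same prompt across a small grid of settings and compare the results. The sketch below builds such a grid. `fake_generate` is a stand-in for whatever model call your application makes, not a real API, and the values in the grid are arbitrary starting points.

```python
from itertools import product

def parameter_grid(temperatures, max_tokens_options):
    """Build every combination of the given temperature and max_tokens values."""
    return [
        {"temperature": t, "max_tokens": m}
        for t, m in product(temperatures, max_tokens_options)
    ]

def fake_generate(prompt, temperature, max_tokens):
    # Hypothetical stand-in for a real model call; replace with your client code.
    return f"[t={temperature}, max={max_tokens}] reply to: {prompt}"

for params in parameter_grid([0.0, 0.7, 1.0], [64, 256]):
    print(fake_generate("Summarize this ticket.", **params))
```

The grid varies temperature only and leaves top_p alone, so any difference between runs can be traced to a single setting.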
Key takeaways
+ Temperature controls randomness: low for accuracy, high for creativity
+ Top-P is an alternative to temperature for controlling diversity
+ Max tokens limits output length and controls costs
+ Always test different parameter combinations for your specific use case