Frodex

Beta
Foundations
1. Introduction
2. Tokens
3. Controlling the Model

Communicating with LLMs
4. Anatomy of a Good Prompt
5. System Prompts and Personas
6. Few-Shot Learning

Structured Outputs
7. JSON Mode and Structured Output
8. Function Calling

Advanced Techniques
9. Chain of Thought Reasoning
10. Managing the Context Window
11. Embeddings and Semantic Search

Production Systems
12. Retrieval-Augmented Generation (RAG)
13. Streaming Responses
14. Evaluation and Cost Optimization
Foundations
15 min · Lesson 3 of 14

Controlling the Model: Generation Parameters

Master the parameters that control how LLMs generate text in real applications

Learning goals

  • Understand temperature and its effect on output randomness
  • Learn about top-p, max_tokens, and other generation parameters
  • Know when to use which parameters for different tasks

Temperature (0.0 - 2.0)

Temperature controls the randomness of token selection:

  • 0.0: deterministic; always picks the highest-probability token. Best for factual tasks.
  • 0.7: balanced; a good default for most tasks.
  • 1.0+: creative; more random, unexpected outputs. Good for brainstorming.

Think of temperature as the "creativity dial."
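Under the hood, temperature divides the model's logits before the softmax. A minimal Python sketch of that math (the function name and logit values are illustrative, not any provider's API):

```python
import math

def apply_temperature(logits, temperature):
    """Rescale logits by temperature, then softmax into probabilities.

    Temperature near 0 makes the top token dominate (near-deterministic);
    temperature above 1 flattens the distribution (more random picks).
    """
    scaled = [logit / temperature for logit in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]               # hypothetical next-token logits
cold = apply_temperature(logits, 0.1)  # top token gets almost all the mass
hot = apply_temperature(logits, 2.0)   # probability spreads across tokens
```

At temperature 0.1 the first token's probability is effectively 1.0, which is why low temperature behaves deterministically; at 2.0 the lower-ranked tokens get a real chance of being sampled.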

Top-P (Nucleus Sampling)

Top-P (0.0 - 1.0) limits token selection to the smallest set whose cumulative probability exceeds P:

  • 0.1: Very focused—considers only the most likely tokens
  • 0.9: Broad—considers more diverse options
  • 1.0: Considers all tokens
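The "smallest set whose cumulative probability exceeds P" rule can be sketched in a few lines; `nucleus` and the probability values below are hypothetical illustrations, not a real library call:

```python
def nucleus(probs, top_p):
    """Return indices of the smallest set of tokens whose cumulative
    probability reaches top_p, taking highest-probability tokens first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical token probabilities
nucleus(probs, 0.1)  # -> [0]           only the most likely token
nucleus(probs, 0.9)  # -> [0, 1, 2]     0.5 + 0.3 + 0.15 covers 0.9
nucleus(probs, 1.0)  # -> [0, 1, 2, 3]  all tokens
```

The model then samples only from the kept set, which is why a low top-p produces focused output even without touching temperature.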

Use either temperature OR top-p for control, not both simultaneously.

Max Tokens

Sets the maximum length of the generated response. Important for:

  • Cost control: limits the output tokens you are billed for
  • Response length: caps how long the answer can run (set it too low and the answer is truncated)
  • Context management: leaves room for follow-up exchanges
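For the context-management point, a common pattern is to derive max_tokens from the model's context window rather than hard-coding it. A hedged sketch, assuming a hypothetical helper and an 8,192-token window:

```python
def max_tokens_budget(context_window, prompt_tokens, reserve_for_followups=500):
    """Pick a max_tokens value that fits the model's context window
    while leaving headroom (reserve_for_followups) for later turns."""
    available = context_window - prompt_tokens - reserve_for_followups
    return max(0, available)  # never return a negative budget

# e.g. an 8,192-token window with a 1,200-token prompt:
max_tokens_budget(8192, 1200)  # -> 6492
```

The window size, reserve, and helper name are illustrative assumptions; the point is that prompt tokens and output tokens share one budget.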

Other Parameters

Stop Sequences: strings that terminate generation when encountered. Useful for structured outputs.
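Providers typically cut the output at the first stop sequence and do not include it in the response. A simplified client-side sketch of that behavior (the helper name is made up for illustration):

```python
def truncate_at_stop(text, stop_sequences):
    """Cut generated text at the earliest stop sequence found,
    mimicking how an API would terminate generation server-side."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)  # keep the earliest match
    return text[:cut]

raw = '{"answer": 42}\nUser:'
truncate_at_stop(raw, ["\nUser:"])  # -> '{"answer": 42}'
```

Here the stop sequence `"\nUser:"` prevents the model from hallucinating the next conversational turn after a structured answer.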

Frequency Penalty (0.0 - 2.0): reduces repetition by penalizing each token in proportion to how many times it has already appeared. Higher values = less repetition.

Presence Penalty (0.0 - 2.0): encourages the model to introduce new topics by applying a flat penalty to any token that has appeared at all. Higher values = more diverse content.
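OpenAI-style APIs document the two penalties as a per-token logit adjustment: a frequency penalty scaled by how many times the token has appeared, plus a flat presence penalty if it has appeared at all. A simplified sketch of that formula (function name and values are illustrative):

```python
def penalize(logits, counts, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower each token's logit by frequency_penalty per prior occurrence,
    plus a one-time presence_penalty if the token has appeared at all."""
    return [
        logit
        - counts[i] * frequency_penalty
        - (1.0 if counts[i] > 0 else 0.0) * presence_penalty
        for i, logit in enumerate(logits)
    ]

logits = [2.0, 2.0, 2.0]
counts = [3, 1, 0]  # token 0 already appeared 3 times, token 1 once
penalize(logits, counts, frequency_penalty=0.5, presence_penalty=0.2)
# token 0 drops the most, token 2 (unseen) is untouched
```

Lower logits mean lower sampling probability, so heavily repeated tokens become progressively less likely to be chosen again.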

Common mistakes

× Using high temperature for tasks requiring accuracy: more randomness means more errors and made-up details
× Setting max_tokens too low: responses get cut off mid-sentence
× Using both temperature and top_p: they compete; use one or the other
× Not testing parameters: optimal values vary by use case

Key takeaways

+ Temperature controls randomness: low for accuracy, high for creativity
+ Top-P is an alternative to temperature for controlling diversity
+ Max tokens limits output length and controls costs
+ Always test different parameter combinations for your specific use case

Playground

Try These Experiments

Why This Experiment?

Experiment with different parameter settings on realistic tasks to see how they affect determinism, creativity, and length.


At low temperature, the model should consistently return `200`. This mirrors production use cases where you want stable, deterministic answers for factual questions.
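To reproduce this experiment against an OpenAI-style chat completions endpoint, the request payload might look like the sketch below; the model name is a placeholder, not a real model:

```python
# Hedged sketch: the payload shape follows the common OpenAI-style chat
# completions API; "example-model" is a placeholder model name.
payload = {
    "model": "example-model",
    "messages": [
        {"role": "user", "content": "What HTTP status code means OK? Reply with the number only."}
    ],
    "temperature": 0.0,  # deterministic: always pick the top token
    "max_tokens": 10,    # a short factual answer needs little room
}
# Sending this payload repeatedly should yield the same answer ("200")
# on every run, since temperature 0 removes sampling randomness.
```

Raising `temperature` toward 1.0 and re-running the same prompt is the quickest way to see the determinism disappear.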