Production Systems · Lesson 13 of 14 · 12 min

Streaming Responses

Deliver faster perceived performance with streaming in production LLM interfaces

Learning goals

  • Understand why streaming improves UX
  • Learn to implement streaming in different frameworks
  • Handle streaming edge cases

Why Stream?

Without streaming:

  • The user waits for the entire response before seeing anything; perceived latency is the full generation time
  • Long responses mean long waits
  • No feedback during generation

With streaming:

  • First tokens appear in ~200ms
  • The response builds in real-time
  • Better perceived performance

For a 500-token response, streaming can make the experience feel 10x faster.

Implementing Streaming

// Using the Vercel AI SDK
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = streamText({
  model: openai('gpt-4'),
  prompt: 'Write a short story',
});

// textStream is an async iterable of text chunks;
// print each one as soon as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

Server-Sent Events (SSE) or WebSockets deliver chunks to the browser.
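
As a rough sketch, an SSE endpoint can forward chunks to the browser as they arrive. The /chat semantics, port, and [DONE] sentinel below are illustrative assumptions, not part of any framework:

// Minimal SSE server sketch using Node's built-in http module.
import http from 'node:http';
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

http.createServer(async (req, res) => {
  // SSE requires this content type; disable caching so chunks flush promptly.
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });

  const result = streamText({
    model: openai('gpt-4'),
    prompt: 'Write a short story',
  });

  // Each SSE message is "data: <payload>\n\n"; JSON.stringify escapes newlines.
  for await (const chunk of result.textStream) {
    res.write(`data: ${JSON.stringify(chunk)}\n\n`);
  }
  res.write('data: [DONE]\n\n'); // hypothetical end-of-stream sentinel
  res.end();
}).listen(3000);

On the client, new EventSource('/chat') then fires an onmessage event for each chunk.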

Streaming Considerations

Error Handling

  • Connection drops
  • Rate limits
  • Model errors (a recovery sketch follows this list)
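
A stream can fail after it has already produced partial output. A minimal recovery sketch, where appendToUI and showRetryButton are hypothetical UI callbacks:

let received = '';
try {
  for await (const chunk of result.textStream) {
    received += chunk;
    appendToUI(chunk); // hypothetical: render the chunk as it arrives
  }
} catch (err) {
  // Keep the partial text, surface the error, and offer a retry.
  console.error('Stream failed mid-response:', err);
  showRetryButton(received); // hypothetical retry affordance
}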

Parsing Structured Output

If streaming JSON, wait for complete object before parsing.
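
A minimal buffering sketch: accumulate the chunks and parse only after the stream ends, since JSON.parse throws on incomplete input.

let buffer = '';
for await (const chunk of result.textStream) {
  buffer += chunk; // partial JSON is not parseable yet
}
const data = JSON.parse(buffer); // safe: the object is now complete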

UI/UX

  • Show typing indicator during generation
  • Handle rapid content updates efficiently (see the batching sketch below)
  • Consider smooth scroll behavior
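
One way to handle rapid updates is to batch chunks with requestAnimationFrame, so the DOM updates at most once per frame instead of once per token. A browser-side sketch, where outputEl is a hypothetical DOM node:

let pending = '';
let scheduled = false;

function appendChunk(chunk) {
  pending += chunk;
  if (!scheduled) {
    scheduled = true;
    requestAnimationFrame(() => {
      outputEl.textContent += pending; // flush all buffered chunks at once
      pending = '';
      scheduled = false;
    });
  }
}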

Common mistakes

× Not handling connection errors: streams can fail mid-response
× Parsing partial JSON: wait for complete structures
× Ignoring rate limits: streaming doesn't prevent rate limiting
× No loading states: users need feedback while waiting for the first token

Key takeaways

+ Streaming dramatically improves perceived performance
+ First tokens appear in ~200ms regardless of total response length
+ Handle errors gracefully: streams can fail mid-response
+ Buffer structured output until it's complete
