Production Systems
Lesson 13 of 14 · 12 min
Streaming Responses
Deliver faster perceived performance with streaming in production LLM interfaces
Learning goals
- Understand why streaming improves UX
- Learn to implement streaming in different frameworks
- Handle streaming edge cases
Why Stream?
Without streaming:
- The user waits for the entire response
- Long responses mean long waits
- No feedback during generation

With streaming:
- First tokens appear in ~200ms
- The response builds in real time
- Better perceived performance
For a 500-token response, streaming can make the experience feel roughly 10x faster: the user sees the first words within a fraction of a second instead of staring at a spinner while the whole response generates.
Implementing Streaming
```typescript
// Using the Vercel AI SDK
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

const result = streamText({
  model: openai('gpt-4'),
  prompt: 'Write a short story',
});

// Print each chunk as it arrives instead of waiting for the full response
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```
Server-Sent Events (SSE) or WebSockets deliver chunks to the browser.
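As a minimal sketch of the SSE side, each chunk must be framed as a `data:` message terminated by a blank line; the `toSSE` helper below is illustrative, not part of the AI SDK:

```typescript
// Frame a stream chunk as a Server-Sent Events message.
// Newlines inside a chunk must be split across multiple "data:" lines,
// and each message ends with a blank line.
function toSSE(chunk: string): string {
  return (
    chunk
      .split('\n')
      .map((line) => `data: ${line}`)
      .join('\n') + '\n\n'
  );
}

// On the server, you would set "Content-Type: text/event-stream" and
// write toSSE(chunk) to the response for every chunk of result.textStream.
```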
Streaming Considerations
Error Handling

Errors can occur mid-stream. Always handle:
- Connection drops
- Rate limits
- Model errors
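One way to handle mid-stream failures is to keep whatever text arrived before the error so the UI can still show it. This sketch uses a simulated stream in place of `result.textStream`:

```typescript
// Simulated stream that fails partway through, standing in for a real
// model stream that hits a connection drop or rate limit.
async function* flakyStream(): AsyncGenerator<string> {
  yield 'Once upon ';
  yield 'a time';
  throw new Error('connection dropped');
}

// Consume a stream defensively, returning the partial text plus any error.
async function consume(
  stream: AsyncIterable<string>
): Promise<{ text: string; error?: string }> {
  let text = '';
  try {
    for await (const chunk of stream) {
      text += chunk;
    }
    return { text };
  } catch (err) {
    // Keep the partial text so the UI can render what arrived so far.
    return { text, error: err instanceof Error ? err.message : String(err) };
  }
}
```

The key design choice is that the `try`/`catch` wraps the iteration itself, since a stream can throw on any `await`, not just when it is first created.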
Parsing Structured Output

If streaming JSON, wait for the complete object before parsing.
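A simple buffering approach, sketched below, accumulates chunks and attempts a parse on each update, treating a parse failure as "still incomplete" (the `JsonBuffer` class is illustrative):

```typescript
// Buffer streamed JSON text and only return a value once it parses
// as a complete document.
class JsonBuffer {
  private buffer = '';

  // Returns the parsed value once the buffered JSON is complete,
  // or undefined while it is still partial.
  push(chunk: string): unknown | undefined {
    this.buffer += chunk;
    try {
      return JSON.parse(this.buffer);
    } catch {
      return undefined; // partial JSON throws; keep buffering
    }
  }
}
```

Note this relies on partial input being invalid JSON, which holds for objects and arrays (a dangling `{` or `[` never parses) but not for bare numbers.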
UI/UX

- Show a typing indicator during generation
- Handle rapid content updates efficiently
- Consider smooth scroll behavior
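To handle rapid updates efficiently, chunks can be coalesced so the UI re-renders on a timer or animation frame rather than on every token. A minimal sketch, where `render` stands in for your framework's state setter:

```typescript
// Coalesce rapid chunk arrivals into fewer renders: push() on every
// chunk, flush() on a timer or requestAnimationFrame.
function createBatcher(render: (text: string) => void) {
  let pending = '';
  let text = '';
  return {
    push(chunk: string) {
      pending += chunk;
    },
    flush() {
      if (!pending) return; // nothing new since the last render
      text += pending;
      pending = '';
      render(text);
    },
  };
}
```

Batching like this keeps a fast stream from triggering hundreds of re-renders per second while the displayed text still appears to flow continuously.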
Common mistakes
- Not handling connection errors—streams can fail mid-response
- Parsing partial JSON—wait for complete structures
- Ignoring rate limits—streaming doesn't prevent rate limiting
- No loading states—users need feedback while waiting for the first token
Key takeaways
- Streaming dramatically improves perceived performance
- First tokens appear in ~200ms regardless of total response length
- Handle errors gracefully—streams can fail mid-response
- Buffer structured output until it's complete