Advanced Techniques · Lesson 11 of 14 · 18 min
Embeddings and Semantic Search
Convert text to vectors for similarity search in real applications
Learning goals
- Understand what embeddings are and how they work
- Learn to implement semantic search
- Know when to use embeddings vs. other approaches
What Are Embeddings?
Embeddings convert text into numerical vectors that capture semantic meaning:
- Similar meanings → Similar vectors
- Different meanings → Different vectors
A sentence like "I love pizza" becomes a vector like [0.23, -0.45, 0.87, ...] (1,536 dimensions for OpenAI's text-embedding-ada-002; dimensionality varies by model).
Key property: You can measure similarity between embeddings using cosine similarity or dot product.
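To make this concrete, here is a minimal cosine-similarity helper. The `cosineSimilarity` function is a hypothetical name used for illustration, not part of any SDK:

```typescript
// Cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A score of 1 means the vectors point in the same direction (very similar meaning); 0 means they are unrelated.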
Generating Embeddings
```typescript
const response = await openai.embeddings.create({
  model: "text-embedding-ada-002",
  input: "Your text here",
});

const vector = response.data[0].embedding; // a 1536-dimensional vector
```
Store these vectors in a vector database for efficient similarity search.
Semantic Search
Traditional search does exact keyword matching; semantic search matches on meaning:
- "automobile" finds documents about "cars"
- "happy" finds documents about "joyful" or "pleased"
- "ML" finds documents about "machine learning"
The process:
1. Embed your query.
2. Find the most similar vectors in your database.
3. Return the corresponding documents.
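The steps above can be sketched as a linear scan over an in-memory store. This is a minimal sketch that assumes your documents are already embedded; `Doc`, `cosine`, and `search` are hypothetical names for illustration:

```typescript
// A stored document paired with its precomputed embedding.
type Doc = { text: string; vector: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Steps 2 and 3: rank stored documents by similarity to the
// (already embedded) query vector and return the top matches.
function search(queryVector: number[], docs: Doc[], topK = 3): Doc[] {
  return [...docs]
    .sort(
      (x, y) =>
        cosine(queryVector, y.vector) - cosine(queryVector, x.vector)
    )
    .slice(0, topK);
}
```

At scale, a vector database replaces this linear scan with an approximate nearest-neighbor index, but the query flow is the same.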
Common mistakes
- Embedding very long texts: chunk into smaller pieces for better retrieval
- Using the wrong embedding model: match the model to your use case and language
- Ignoring embedding costs: embeddings are cheaper than completions but add up at scale
- Not normalizing vectors: some similarity measures require normalized vectors
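On the last point, L2 normalization is a one-liner. This is a sketch with a hypothetical `normalize` helper; once vectors are unit-length, the dot product equals cosine similarity:

```typescript
// Scale a vector to unit length (L2 norm of 1).
// Zero vectors are returned unchanged to avoid dividing by zero.
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? v : v.map((x) => x / norm);
}
```

For example, `normalize([3, 4])` yields `[0.6, 0.8]`, a vector of length 1.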
Key takeaways
- Embeddings convert text to vectors that capture semantic meaning
- Similar meanings produce similar vectors, enabling semantic search
- Chunk long documents for better retrieval precision
- Use embeddings for search, clustering, and recommendation systems