Vector Databases
Vector databases are specialized databases designed to store and search high-dimensional vectors (embeddings). They're essential for RAG, semantic search, and recommendation systems.
What are Embeddings?
Embeddings are numerical representations of text, images, or other data. Similar items have similar vectors, enabling semantic search.
"cat" → [0.2, 0.8, 0.1, ...] (768 dimensions)
"dog" → [0.3, 0.7, 0.2, ...] (similar to cat!)
"car" → [0.9, 0.1, 0.8, ...] (different)Popular Vector Databases
Pinecone
Fully managed, serverless. Easy to use, scales automatically.
Best for: Production apps, no ops
Weaviate
Open-source, GraphQL API. Built-in vectorization.
Best for: Flexibility, self-hosting
Chroma
Lightweight, Python-first. Great for prototyping.
Best for: Local dev, quick start
Qdrant
Rust-based, high performance. Advanced filtering.
Best for: Speed, complex queries
Vector Search Example
Simplified example of similarity search using cosine similarity.
python
Output:
Click "Run Code" to see output
Key Concepts
- ANN (Approximate Nearest Neighbors): Fast search algorithm (HNSW, IVF)
- Indexing: Organize vectors for efficient retrieval
- Metadata Filtering: Combine vector search with traditional filters
- Hybrid Search: Combine semantic (vector) + keyword (BM25) search