Word Embeddings
Word embeddings represent words as dense vectors in continuous space, where similar words are close together. They capture semantic meaning and relationships between words.
Key Insight: "You shall know a word by the company it keeps" (J.R. Firth, 1957). Words appearing in similar contexts have similar meanings.
Why Embeddings?
One-Hot Encoding ✗
Traditional approach: sparse, high-dimensional.
dog: [0, 1, 0, 0, ..., 0]
Vocabulary size: 50,000+
✗ No semantic similarity, huge dimensions
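A minimal numpy sketch with a hypothetical three-word vocabulary, showing why one-hot vectors carry no similarity signal:

```python
import numpy as np

# Hypothetical toy vocabulary; real vocabularies run to 50,000+ entries.
vocab = {"cat": 0, "dog": 1, "car": 2}

def one_hot(word):
    """Sparse vector: all zeros except a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[vocab[word]] = 1.0
    return vec

dog, cat = one_hot("dog"), one_hot("cat")
print(dog)        # [0. 1. 0.]
print(dog @ cat)  # 0.0 -- every pair of distinct words is orthogonal,
                  # so one-hot encoding expresses no similarity at all
```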
Word Embeddings ✓
Dense vectors in low-dimensional space.
dog: [0.3, -0.4, 0.7, ...]
Dimensions: 100-300
✓ Captures meaning, similar words are close
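By contrast, dense vectors support graded similarity. A sketch with made-up 4-dimensional values (real embeddings have 100-300 dimensions and come from training):

```python
import numpy as np

# Illustrative hand-picked values, not from a trained model.
emb = {
    "dog": np.array([0.3, -0.4, 0.7, 0.1]),
    "cat": np.array([0.35, -0.35, 0.65, 0.2]),
    "car": np.array([-0.5, 0.8, 0.05, -0.3]),
}

def cosine(a, b):
    """Cosine similarity: 1 = same direction, near 0 = unrelated."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(emb["dog"], emb["cat"]))  # high: related words point the same way
print(cosine(emb["dog"], emb["car"]))  # low: unrelated words diverge
```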
Popular Embedding Methods
Word2Vec (2013)
Learns embeddings by predicting words from their contexts (or vice versa).
Skip-gram: predict context words from the center word
CBOW: predict the center word from its surrounding context
Trained on large text corpus
Fast, captures semantic relationships (king - man + woman ≈ queen)
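A minimal skip-gram training sketch using the gensim library; the four-sentence corpus and the hyperparameters are toy placeholders, not recommended settings:

```python
from gensim.models import Word2Vec

# Toy corpus; real models train on millions of sentences.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["a", "man", "walks", "his", "dog"],
    ["a", "woman", "walks", "her", "dog"],
]

model = Word2Vec(
    sentences,
    vector_size=100,  # embedding dimensionality
    window=5,         # context window size
    min_count=1,      # keep every word (only sensible for a toy corpus)
    sg=1,             # 1 = skip-gram, 0 = CBOW
)

print(model.wv["king"].shape)                # (100,)
print(model.wv.similarity("king", "queen"))  # cosine similarity of two words
```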
GloVe (2014)
Global Vectors for Word Representation.
Combines global matrix factorization + local context
Pre-trained on Wikipedia, Common Crawl
Good for capturing word analogies and relationships
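Pre-trained GloVe vectors ship as plain text, one word and its float values per line. A minimal loader, assuming glove.6B.100d.txt has been downloaded from the Stanford NLP site:

```python
import numpy as np

def load_glove(path):
    """Parse a GloVe text file: each line is 'word v1 v2 ... vd'."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

glove = load_glove("glove.6B.100d.txt")  # file path is an assumption
print(glove["dog"].shape)  # (100,)
```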
FastText (2016)
Extension of Word2Vec with subword information.
Example: "where" → ["wh", "whe", "her", "ere", "re"]
Handles out-of-vocabulary words
✓ Works with rare words, morphologically rich languages
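A gensim sketch of the subword idea; corpus and settings are again toy placeholders:

```python
from gensim.models import FastText

sentences = [["where", "is", "the", "dog"], ["there", "is", "the", "cat"]]

model = FastText(
    sentences,
    vector_size=100,
    window=3,
    min_count=1,
    min_n=3,  # shortest character n-gram
    max_n=6,  # longest character n-gram
)

# "whereabouts" never appeared in training, but FastText assembles a
# vector for it from the character n-grams it shares with seen words.
print(model.wv["whereabouts"].shape)  # (100,)
```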
Contextual Embeddings (BERT, GPT)
Different embeddings for same word in different contexts.
Generated by transformer models
State-of-the-art for most NLP tasks
✓ Captures context, polysemy | ✗ Computationally expensive
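A sketch of context-dependent vectors using the Hugging Face transformers library (requires transformers and torch); the sentences are illustrative:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    """Contextual embedding of `word` in `sentence` (assumes `word` is one BERT token)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    idx = inputs.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    return hidden[idx]

a = word_vector("I deposited cash at the bank", "bank")
b = word_vector("We sat on the river bank", "bank")
# Same word, two different vectors -- similarity is well below 1.
print(torch.cosine_similarity(a, b, dim=0).item())
```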
Embedding Properties
Semantic Similarity
Similar words have similar vectors.
cosine_similarity(cat, dog) ≈ 0.8 (related)
cosine_similarity(cat, car) ≈ 0.2 (unrelated)
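These numbers are illustrative; the pattern can be checked against real pre-trained vectors via gensim's downloader (fetches the model on first use):

```python
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")  # pre-trained GloVe vectors

print(wv.similarity("cat", "dog"))  # high: related
print(wv.similarity("cat", "car"))  # low: unrelated
```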
Analogies (Vector Arithmetic)
Relationships encoded in vector space.
Paris - France + Italy ≈ Rome
walking - walk + swim ≈ swimming
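The same pre-trained vectors reproduce these analogies via most_similar, which adds and subtracts word vectors and returns the nearest remaining word:

```python
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")  # lowercase vocabulary

# paris - france + italy ≈ rome
print(wv.most_similar(positive=["paris", "italy"], negative=["france"], topn=1))

# walking - walk + swim ≈ swimming
print(wv.most_similar(positive=["walking", "swim"], negative=["walk"], topn=1))
```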
Clustering
Related words cluster together.
Colors: red, blue, green, yellow
Countries: USA, France, Japan, Brazil
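A clustering sketch with scikit-learn's KMeans over the same pre-trained vectors; with these word lists the two groups should separate cleanly:

```python
import gensim.downloader as api
import numpy as np
from sklearn.cluster import KMeans

wv = api.load("glove-wiki-gigaword-100")

words = ["red", "blue", "green", "yellow", "usa", "france", "japan", "brazil"]
X = np.stack([wv[w] for w in words])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for word, label in zip(words, labels):
    print(label, word)  # colors and countries land in separate clusters
```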
Using Pre-trained Embeddings
Popular Pre-trained Models
word2vec-google-news-300: Word2Vec, 300 dims, trained on Google News (loadable via gensim)
glove-wiki-gigaword-100 (glove.6B): GloVe, 50-300 dims, Wikipedia + Gigaword
cc.en.300: fastText, 300 dims, Common Crawl
bert-base-uncased: contextual, 768 dims, via Hugging Face
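A common pattern is to initialize a model's embedding layer from pre-trained vectors. A PyTorch sketch, where the three-word task vocabulary is a hypothetical placeholder:

```python
import gensim.downloader as api
import numpy as np
import torch
import torch.nn as nn

wv = api.load("glove-wiki-gigaword-100")  # pre-trained GloVe via gensim

vocab = ["the", "dog", "barked"]  # hypothetical downstream-task vocabulary
matrix = torch.tensor(np.stack([wv[w] for w in vocab]))

# freeze=True keeps the pre-trained vectors fixed during training.
embedding = nn.Embedding.from_pretrained(matrix, freeze=True)

token_ids = torch.tensor([0, 1, 2])  # "the dog barked"
print(embedding(token_ids).shape)    # torch.Size([3, 100])
```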
Applications
Text classification and sentiment analysis
Semantic search and information retrieval
Named entity recognition
Machine translation
Document similarity and recommendation
Choosing Embeddings
Limited compute or simple tasks: static embeddings (Word2Vec, GloVe)
Rare words or morphologically rich languages: FastText
Context-sensitive tasks where accuracy matters most: contextual embeddings (BERT)
Key Takeaway: Word embeddings transform words into meaningful vectors. Word2Vec and GloVe are classics, but contextual embeddings (BERT) are now state-of-the-art for most tasks.