Loss Functions & Optimizers
Loss functions measure how wrong your model is. Optimizers adjust weights to minimize loss. Together, they drive the learning process in neural networks.
Training Loop: Forward pass → Calculate loss → Backward pass (gradients) → Update weights with optimizer → Repeat
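The loop above can be sketched in a few lines of NumPy for a one-weight linear model (all names here are illustrative, not from any particular framework):

```python
import numpy as np

# Toy data: y = 3x plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + rng.normal(0, 0.01, size=100)

w = 0.0        # single weight to learn
alpha = 0.1    # learning rate

for step in range(200):
    y_pred = w * x                        # forward pass
    loss = np.mean((y_pred - y) ** 2)     # calculate loss (MSE)
    grad = np.mean(2 * (y_pred - y) * x)  # backward pass: dL/dw
    w -= alpha * grad                     # optimizer update (plain SGD)
```

After training, `w` lands near the true slope of 3.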
Common Loss Functions
Mean Squared Error (MSE)
For regression tasks. Penalizes large errors heavily.
Use for: Predicting continuous values (prices, temperatures)
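A minimal MSE implementation (the function name is illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of squared differences.
    Squaring means an error of 2 costs 4x as much as an error of 1."""
    return np.mean((np.asarray(y_true, dtype=float)
                    - np.asarray(y_pred, dtype=float)) ** 2)
```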
Binary Cross-Entropy
For binary classification (2 classes).
Use for: Spam/not spam, fraud detection, yes/no predictions
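A sketch of binary cross-entropy, assuming `y_pred` holds probabilities in (0, 1); the `eps` clip guards against `log(0)`:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """BCE = -mean(y*log(p) + (1-y)*log(1-p))."""
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

A maximally uncertain prediction (p = 0.5) costs ln 2 ≈ 0.693 per example regardless of the label.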
Categorical Cross-Entropy
For multi-class classification (3+ classes).
Use for: Image classification, text categorization
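A sketch assuming one-hot labels and predicted class probabilities that sum to 1 per sample:

```python
import numpy as np

def categorical_cross_entropy(y_true_onehot, y_pred_probs, eps=1e-12):
    """CCE = -mean over samples of sum(y * log(p)) across classes.
    With one-hot labels this is just -log(probability of the true class)."""
    p = np.clip(np.asarray(y_pred_probs, dtype=float), eps, 1.0)
    return -np.mean(np.sum(np.asarray(y_true_onehot) * np.log(p), axis=1))
```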
Huber Loss
Quadratic for small errors (like MSE), linear for large ones (like MAE). Robust to outliers.
Smoother than MAE
Use for: Regression with noisy data
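A sketch of Huber loss with the usual threshold parameter `delta` (default 1.0 here, chosen for illustration):

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Quadratic (0.5*err^2) when |err| <= delta, linear beyond it,
    so single outliers cannot dominate the loss."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    quadratic = 0.5 * err ** 2
    linear = delta * (np.abs(err) - 0.5 * delta)
    return np.mean(np.where(np.abs(err) <= delta, quadratic, linear))
```

An error of 0.5 costs 0.125 (quadratic branch); an error of 2 costs only 1.5 rather than MSE's 4 (linear branch).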
Optimization Algorithms
Stochastic Gradient Descent (SGD)
Update weights by stepping against the gradient of the loss:
θ = θ - α·∇L(θ)
α = learning rate
✓ Simple, works well | ✗ Can be slow, sensitive to learning rate
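The update rule as a one-step helper (name and signature are illustrative):

```python
import numpy as np

def sgd_step(theta, grad, alpha=0.01):
    """One SGD update: theta <- theta - alpha * grad."""
    return np.asarray(theta, dtype=float) - alpha * np.asarray(grad, dtype=float)
```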
SGD with Momentum
Accelerate in consistent directions, dampen oscillations.
v = β·v + ∇L(θ)
θ = θ - α·v
β = momentum (typically 0.9)
✓ Faster convergence, escapes local minima better
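A sketch of one momentum update; the velocity `v` accumulates gradients that point the same way and cancels ones that flip sign:

```python
import numpy as np

def momentum_step(theta, v, grad, alpha=0.01, beta=0.9):
    """v <- beta*v + grad; theta <- theta - alpha*v."""
    v = beta * v + np.asarray(grad, dtype=float)
    return theta - alpha * v, v
```

Minimizing the toy loss 0.5·θ² (whose gradient is θ) drives θ to 0:

```python
theta, v = 1.0, 0.0
for _ in range(1000):
    theta, v = momentum_step(theta, v, theta)
```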
RMSprop
Adaptive learning rate per parameter.
Good for non-stationary objectives
✓ Works well for RNNs
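A sketch of one RMSprop update; `s` is a moving average of squared gradients, so parameters with consistently large gradients get smaller effective steps:

```python
import numpy as np

def rmsprop_step(theta, s, grad, alpha=0.001, rho=0.9, eps=1e-8):
    """s <- rho*s + (1-rho)*grad^2; theta <- theta - alpha*grad/(sqrt(s)+eps)."""
    g = np.asarray(grad, dtype=float)
    s = rho * s + (1 - rho) * g ** 2
    return theta - alpha * g / (np.sqrt(s) + eps), s
```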
Adam (Adaptive Moment Estimation)
Combines momentum and RMSprop. Most popular optimizer.
Adaptive learning rates + momentum
✓ Works well out-of-the-box, default choice for most tasks
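A sketch of one Adam update with the standard defaults (α=0.001, β₁=0.9, β₂=0.999); `t` is the 1-based step count used for bias correction:

```python
import numpy as np

def adam_step(theta, m, v, grad, t, alpha=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """m: first moment (momentum); v: second moment (RMSprop-style)."""
    g = np.asarray(grad, dtype=float)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction: moments start at 0
    v_hat = v / (1 - beta2 ** t)
    return theta - alpha * m_hat / (np.sqrt(v_hat) + eps), m, v
```

On the first step the bias-corrected update is roughly α in the direction of the gradient's sign, regardless of the gradient's magnitude.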
AdamW
Adam with decoupled weight decay (better regularization).
Better generalization
✓ State-of-the-art for transformers and large models
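The AdamW variant differs from Adam in one line: weight decay is applied directly to the weights instead of being folded into the gradient. A sketch:

```python
import numpy as np

def adamw_step(theta, m, v, grad, t, alpha=0.001, beta1=0.9,
               beta2=0.999, eps=1e-8, wd=0.01):
    """Adam update plus decoupled weight decay (the wd*theta term)."""
    g = np.asarray(grad, dtype=float)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * (m_hat / (np.sqrt(v_hat) + eps) + wd * theta)
    return theta, m, v
```

With a zero gradient, plain Adam leaves the weight untouched while AdamW still shrinks it by α·wd·θ, which is the regularization effect.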
Learning Rate
The most important hyperparameter. Controls step size during optimization.
Too High
Overshoots minimum, loss diverges.
Loss explodes
Just Right
Converges smoothly to minimum.
Good convergence
Too Low
Slow progress, may get stuck.
May not converge
Learning Rate Schedules
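Rather than a fixed α, schedules shrink the learning rate as training progresses. Two common choices, step decay and cosine annealing, sketched with illustrative helper functions:

```python
import math

def step_decay(lr0, step, drop=0.5, every=10):
    """Multiply the learning rate by `drop` every `every` steps."""
    return lr0 * (drop ** (step // every))

def cosine_decay(lr0, step, total_steps):
    """Smoothly anneal from lr0 down to 0 over total_steps."""
    return 0.5 * lr0 * (1 + math.cos(math.pi * step / total_steps))
```

Step decay gives sudden drops (0.1 → 0.05 → 0.025 …); cosine decay changes every step, fast in the middle of training and slowly near the start and end.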
Choosing Loss & Optimizer
Key Takeaway: Loss functions define what to optimize. Optimizers determine how to optimize. Adam is a safe default, but experiment with learning rates and schedules for best results.