Model Evaluation
How do you know if your model is good? Model evaluation measures performance, detects overfitting, and helps you choose the best model. Never deploy a model without proper evaluation!
Golden Rule: Always evaluate on data the model hasn't seen during training (test set).
Train/Test Split
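As a concrete illustration, here is a minimal sketch of a train/test split using scikit-learn's train_test_split; the dataset here is placeholder toy data generated with make_classification.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy dataset (placeholder): 1,000 samples, 20 features
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out 20% of the data as an unseen test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (800, 20) (200, 20)
```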
Classification Metrics
Accuracy
Percentage of correct predictions.
⚠️ Misleading with imbalanced classes
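A quick sketch of why accuracy misleads on imbalanced data: a model that always predicts the majority class still scores high. The labels below are toy values.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Toy imbalanced labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)

# A "model" that always predicts the majority class
y_pred = np.zeros_like(y_true)

# 95% accuracy despite never detecting a single positive case
print(accuracy_score(y_true, y_pred))  # 0.95
```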
Precision
Of predicted positives, how many are actually positive?
Use when: False positives are costly (spam filter)
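A minimal sketch of precision = TP / (TP + FP), on toy predictions:

```python
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

# Precision = TP / (TP + FP): of everything flagged positive, how much was right?
print(precision_score(y_true, y_pred))  # 3 TP / (3 TP + 2 FP) = 0.6
```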
Recall (Sensitivity)
Of actual positives, how many did we find?
Use when: False negatives are costly (disease detection)
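Using the same toy predictions, recall = TP / (TP + FN):

```python
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

# Recall = TP / (TP + FN): of the actual positives, how many did we catch?
print(recall_score(y_true, y_pred))  # 3 TP / (3 TP + 1 FN) = 0.75
```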
F1-Score
Harmonic mean of precision and recall.
Balances precision and recall
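And the F1-score combines the two, again on the same toy predictions:

```python
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

# F1 = 2 * (precision * recall) / (precision + recall)
print(f1_score(y_true, y_pred))  # ≈ 0.667 for precision 0.6, recall 0.75
```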
Confusion Matrix
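The confusion matrix tabulates true negatives, false positives, false negatives, and true positives in one table. A minimal sketch with the same toy labels:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

# Rows = actual class, columns = predicted class:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
# [[2 2]
#  [1 3]]
```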
Regression Metrics
MAE (Mean Absolute Error)
Average absolute difference.
Easy to interpret, same units as target
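A minimal sketch on toy regression values:

```python
from sklearn.metrics import mean_absolute_error

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

# MAE = mean(|y_true - y_pred|), in the same units as the target
print(mean_absolute_error(y_true, y_pred))  # (0.5 + 0 + 0.5 + 1.0) / 4 = 0.5
```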
MSE (Mean Squared Error)
Average squared difference.
Penalizes large errors more
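The same toy values, scored with MSE:

```python
from sklearn.metrics import mean_squared_error

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

# MSE = mean((y_true - y_pred)^2); squaring penalizes large errors more
print(mean_squared_error(y_true, y_pred))  # (0.25 + 0 + 0.25 + 1.0) / 4 = 0.375
```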
RMSE (Root MSE)
Square root of MSE.
Same units as target, interpretable
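One simple way to compute RMSE is to take the square root of MSE yourself:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

# RMSE = sqrt(MSE), bringing the error back to the target's units
print(np.sqrt(mean_squared_error(y_true, y_pred)))  # sqrt(0.375) ≈ 0.612
```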
R² (R-squared)
Proportion of variance explained.
At most 1 (can be negative for very poor models); higher is better
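And R² on the same toy values:

```python
from sklearn.metrics import r2_score

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

# R² = 1 - SS_residual / SS_total: proportion of variance explained
print(r2_score(y_true, y_pred))  # ≈ 0.88
```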
Cross-Validation
More reliable than a single train/test split: every data point is used for both training and testing (in different folds).
K-Fold Cross-Validation
Typical: K=5 or K=10
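A minimal 5-fold sketch using scikit-learn's cross_val_score; the model and dataset are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Toy dataset and model (placeholders)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: each fold serves as the test set exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(scores)                        # one accuracy score per fold
print(scores.mean(), scores.std())   # average performance and its variability
```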
Overfitting vs Underfitting
Underfitting
Model too simple.
• High training and test error
• Model hasn't learned the underlying pattern
Fix: More complex model, more features
Good Fit
Just right!
• Low test error
• Generalizes well
Goal: Achieve this balance
Overfitting
Model too complex.
• Low training error, high test error
• Memorized training data
Fix: Regularization, more data, simpler model
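One practical way to spot these regimes, sketched here with a decision tree and toy data, is to compare training and test scores: a large gap suggests overfitting, while low scores on both suggest underfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

for depth in (None, 3):  # unconstrained tree vs. a simpler, depth-limited one
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train={model.score(X_train, y_train):.2f}, "
          f"test={model.score(X_test, y_test):.2f}")
# A large train/test gap signals overfitting; low scores on both signal underfitting.
```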
Best Practices
Key Takeaway: Proper evaluation is critical. Use appropriate metrics, cross-validation, and always test on unseen data to ensure your model generalizes well.