Feature Engineering
Feature engineering is the art of creating better input features for machine learning models. Good features can make the difference between a mediocre model and a great one.
Key Insight: Better features often beat better algorithms. A simple model with well-crafted features frequently outperforms a complex model with poor features.
Feature Encoding
Convert categorical variables into numerical format that models can understand.
Label Encoding
Assign each category a unique integer.
⚠️ Implies an order, so use it for ordinal data only (e.g. small < medium < large)
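A minimal sketch using scikit-learn's OrdinalEncoder; the "size" column and its categories are invented for illustration. Passing the categories explicitly preserves the intended order instead of falling back to alphabetical order.

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Explicit category order: small(0) < medium(1) < large(2)
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
df["size_encoded"] = encoder.fit_transform(df[["size"]]).ravel()
print(df)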
One-Hot Encoding
Create binary column for each category.
'red' → [1, 0, 0]
'blue' → [0, 1, 0]
'green' → [0, 0, 1]
✓ No ordinal assumption, works for nominal data
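A quick sketch with pandas.get_dummies; the "color" column is hypothetical. One binary indicator column is created per category, with no order implied.

import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})

# Produces color_blue, color_green, color_red indicator columns
encoded = pd.get_dummies(df["color"], prefix="color")
print(encoded)

For high-cardinality categories, one-hot encoding can explode the feature count, which is one reason target encoding (next) exists.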
Target Encoding
Replace each category with the mean of the target variable within that category.
'LA' → average salary in LA
✓ Captures the relationship with the target; ⚠️ risk of overfitting, especially for rare categories
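A sketch of mean target encoding with simple additive smoothing; the "city" and "salary" columns are made up. Smoothing shrinks rare categories toward the global mean, and in practice the encoding should be fit on training folds only so the target doesn't leak into the features.

import pandas as pd

df = pd.DataFrame({
    "city": ["LA", "LA", "NY", "NY", "SF"],
    "salary": [90_000, 110_000, 120_000, 130_000, 150_000],
})

global_mean = df["salary"].mean()
stats = df.groupby("city")["salary"].agg(["mean", "count"])

# m controls shrinkage: larger m pulls rare categories toward the global mean
m = 10
smoothed = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
df["city_encoded"] = df["city"].map(smoothed)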
Feature Scaling
Normalize features to similar ranges so no single feature dominates.
Standardization (Z-score)
z = (x − mean) / std → mean 0, standard deviation 1
Use for: most ML algorithms, especially when features are roughly normally distributed
Normalization (Min-Max)
x' = (x − min) / (max − min) → values in [0, 1]
Use for: neural networks, or whenever bounded values are needed
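A side-by-side sketch with scikit-learn; the income values are invented. Note that a scaler should be fit on the training split only and then reused to transform validation and test data, to avoid leakage.

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[20_000.0], [45_000.0], [70_000.0], [120_000.0]])

# Standardization: subtract the mean, divide by the standard deviation
X_std = StandardScaler().fit_transform(X)   # mean ~0, std ~1

# Min-max: map the observed range onto [0, 1]
X_mm = MinMaxScaler().fit_transform(X)      # 20k -> 0.0, 120k -> 1.0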
Feature Creation
Create new features from existing ones to capture relationships the model cannot easily learn on its own: ratios, interactions, and date decompositions are common patterns (a short sketch follows).
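A few common patterns with pandas; the housing columns here are hypothetical.

import pandas as pd

df = pd.DataFrame({
    "price": [250_000, 410_000],
    "sqft": [1_000, 2_050],
    "sold_on": pd.to_datetime(["2024-03-15", "2024-11-02"]),
})

df["price_per_sqft"] = df["price"] / df["sqft"]    # ratio feature
df["sold_month"] = df["sold_on"].dt.month          # seasonal signal
df["sold_dayofweek"] = df["sold_on"].dt.dayofweek  # 0 = Monday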
Feature Selection
Choose the most important features to reduce dimensionality and improve performance.
Filter Methods
Select features based on statistical tests (correlation, chi-square, mutual information)
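A filter-method sketch using scikit-learn's SelectKBest with mutual information; the dataset is synthetic.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)

# Keep the 3 features with the highest mutual-information scores
selector = SelectKBest(score_func=mutual_info_classif, k=3)
X_selected = selector.fit_transform(X, y)
print(selector.get_support())  # boolean mask of kept features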
Wrapper Methods
Use model performance to select features (forward selection, backward elimination, recursive feature elimination (RFE))
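A wrapper-method sketch using RFE, which repeatedly fits a model and drops the weakest feature until the target count remains; synthetic data again.

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)

rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)
print(rfe.ranking_)  # rank 1 marks a selected feature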
Embedded Methods
Feature selection happens during model training (Lasso's L1 penalty zeroes out weak coefficients; tree-based models expose feature importances). Note that Ridge (L2) only shrinks coefficients without zeroing them, so it regularizes rather than selects.
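An embedded-method sketch: fitting a Lasso model performs selection as part of training, since the L1 penalty drives unhelpful coefficients to exactly zero. The data is synthetic.

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
print(lasso.coef_)  # zero coefficients mark dropped features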
Best Practices
Key Takeaway: Feature engineering is both art and science. Combine domain expertise with experimentation, and validate each new feature on held-out data to confirm it actually helps the model learn.