Calculus for AI

Calculus is the mathematics of change. In AI, we use calculus to optimize models—finding the best parameters that minimize error. Understanding derivatives and gradients is essential for training neural networks.

Why it matters: Every time a neural network learns, it uses calculus to adjust weights. Backpropagation is just the chain rule applied repeatedly.

Core Concepts

Derivative

Rate of change of a function: how much does the output change when the input changes slightly?

f(x) = x²
f'(x) = 2x

Used for: Finding slopes, optimization
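
A quick numerical check (a minimal sketch; the test point and step size h are arbitrary choices, not from the text) compares a finite-difference estimate with the analytic derivative 2x:

```python
# Numerical derivative of f(x) = x^2 via central differences,
# compared with the analytic derivative f'(x) = 2x.
def f(x):
    return x ** 2

def numerical_derivative(func, x, h=1e-5):
    # Central difference: (f(x + h) - f(x - h)) / (2h)
    return (func(x + h) - func(x - h)) / (2 * h)

x = 3.0                            # arbitrary test point
print(numerical_derivative(f, x))  # ~6.0
print(2 * x)                       # exact: 6.0
```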

Partial Derivative

Derivative with respect to one variable, holding others constant.

f(x,y) = x² + y²
∂f/∂x = 2x

Used for: Multi-variable optimization
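
The same finite-difference idea, holding y fixed, gives a minimal sketch of the partial derivative (the sample point is an arbitrary choice):

```python
# Partial derivative of f(x, y) = x^2 + y^2 with respect to x:
# hold y constant and differentiate in x only.
def f(x, y):
    return x ** 2 + y ** 2

def partial_x(func, x, y, h=1e-5):
    # Finite difference in x while y is held fixed
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

print(partial_x(f, 3.0, 4.0))  # ~6.0, matching the analytic ∂f/∂x = 2x
```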

Gradient

Vector of all partial derivatives. Points in the direction of steepest increase.

∇f = [∂f/∂x₁, ∂f/∂x₂, ...]

Used for: Gradient descent, backpropagation
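
A minimal sketch, assuming the illustrative function f(x) = sum of squared components (my choice, not from the text), builds the gradient one component at a time with NumPy and compares it with the analytic gradient 2x:

```python
import numpy as np

# Gradient of f(x) = sum(x_i^2); the analytic gradient is 2x.
def f(x):
    return np.sum(x ** 2)

def gradient(func, x, h=1e-5):
    # Build the gradient one component at a time via central differences
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (func(x + step) - func(x - step)) / (2 * h)
    return grad

x = np.array([1.0, -2.0, 3.0])  # arbitrary test point
print(gradient(f, x))           # ~[2, -4, 6]
print(2 * x)                    # exact gradient
```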

Chain Rule

Derivative of composite functions. Essential for backpropagation.

d/dx[f(g(x))] = f'(g(x))·g'(x)

Used for: Computing gradients through layers
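
A minimal sketch with the illustrative composite sin(x²) (my choice, not from the text) checks the chain-rule result cos(x²)·2x against a numerical estimate:

```python
import math

# Chain rule on f(g(x)) with f(u) = sin(u) and g(x) = x^2:
# d/dx sin(x^2) = cos(x^2) * 2x
def g(x):
    return x ** 2

def composite(x):
    return math.sin(g(x))

def chain_rule_derivative(x):
    return math.cos(g(x)) * 2 * x   # f'(g(x)) * g'(x)

x = 1.5                            # arbitrary test point
h = 1e-6
numeric = (composite(x + h) - composite(x - h)) / (2 * h)
print(numeric)                     # numerical check
print(chain_rule_derivative(x))    # analytic chain rule
```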

Gradient Descent

The fundamental optimization algorithm in machine learning. Follow the negative gradient to find the minimum.

1. Compute gradient: calculate ∇L, the gradient of the loss function.
2. Update parameters: θ = θ - α·∇L (move opposite to the gradient).
3. Repeat until convergence (gradient ≈ 0); see the sketch below.

Advanced Concepts

Jacobian Matrix

Matrix of all first-order partial derivatives of a vector-valued function. Describes how each output changes with each input.

J = [∂f₁/∂x₁ ∂f₁/∂x₂]
    [∂f₂/∂x₁ ∂f₂/∂x₂]

Used in: Backpropagation, transformations
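
A minimal sketch, using the illustrative function f(x, y) = [xy, x + y] (my choice, not from the text), estimates the Jacobian column by column with finite differences:

```python
import numpy as np

# Jacobian of f(x, y) = [x * y, x + y]:
# analytic J = [[y, x], [1, 1]]
def f(v):
    x, y = v
    return np.array([x * y, x + y])

def jacobian(func, v, h=1e-5):
    # One column per input variable: J[i, j] = d f_i / d x_j
    m = len(func(v))
    J = np.zeros((m, len(v)))
    for j in range(len(v)):
        step = np.zeros_like(v)
        step[j] = h
        J[:, j] = (func(v + step) - func(v - step)) / (2 * h)
    return J

print(jacobian(f, np.array([2.0, 3.0])))  # ~[[3, 2], [1, 1]]
```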

Hessian Matrix

Matrix of second-order partial derivatives. Describes curvature of the loss surface.

H = [∂²f/∂x₁² ∂²f/∂x₁∂x₂]
    [∂²f/∂x₂∂x₁ ∂²f/∂x₂²]

Used in: Second-order optimization (Newton's method)
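
A minimal sketch, using the illustrative function f(x, y) = x² + 3xy + y² (my choice, not from the text), estimates the Hessian with second-order central differences:

```python
import numpy as np

# Hessian of f(x, y) = x^2 + 3*x*y + y^2:
# analytic H = [[2, 3], [3, 2]]
def f(v):
    x, y = v
    return x ** 2 + 3 * x * y + y ** 2

def hessian(func, v, h=1e-4):
    # Second-order central differences: H[i, j] = d^2 f / dx_i dx_j
    n = len(v)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.zeros(n), np.zeros(n)
            e_i[i], e_j[j] = h, h
            H[i, j] = (func(v + e_i + e_j) - func(v + e_i - e_j)
                       - func(v - e_i + e_j) + func(v - e_i - e_j)) / (4 * h ** 2)
    return H

print(hessian(f, np.array([1.0, 2.0])))  # ~[[2, 3], [3, 2]]
```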

AI Applications

Backpropagation: Chain rule to compute gradients through network layers
Gradient Descent: Optimize model parameters to minimize loss
Learning Rate Scheduling: Adjust step size based on gradient behavior
Momentum: Use gradient history to accelerate convergence (see the sketch after this list)
Adam Optimizer: Adaptive learning rates using gradient moments
Loss Functions: Derivatives tell us how to reduce error
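
As a concrete illustration of the momentum entry above, here is a minimal SGD-with-momentum sketch on a toy quadratic loss (the loss, learning rate, and momentum coefficient are illustrative choices, not from the text):

```python
import numpy as np

# Minimal SGD-with-momentum update on the toy loss L(theta) = sum(theta^2).
# The velocity term accumulates past gradients to smooth and speed up descent.
def grad(theta):
    return 2 * theta              # gradient of sum(theta^2)

theta = np.array([3.0, -2.0])     # initial parameters (arbitrary)
velocity = np.zeros_like(theta)
alpha, beta = 0.1, 0.9            # learning rate and momentum (illustrative)

for _ in range(200):
    velocity = beta * velocity + grad(theta)   # blend in gradient history
    theta = theta - alpha * velocity           # step along the velocity

print(theta)  # close to [0, 0]
```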

Key Takeaway: Calculus enables learning. Every weight update in a neural network is guided by derivatives computed through backpropagation.