Calculus for AI

Calculus is the mathematics of change. In AI, we use calculus to optimize models—finding the best parameters that minimize error. Understanding derivatives and gradients is essential for training neural networks.

Why it matters: Every time a neural network learns, it uses calculus to adjust weights. Backpropagation is just the chain rule applied repeatedly.

Core Concepts

Derivative

Rate of change of a function: how much does the output change when the input changes slightly?

f(x) = x²
f'(x) = 2x

Used for: Finding slopes, optimization
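
A quick numerical check (a minimal sketch; the test point and step size h are arbitrary choices, not from the text) compares a finite-difference estimate with the analytic derivative 2x:

```python
# Numerical derivative of f(x) = x^2 via central differences,
# compared with the analytic derivative f'(x) = 2x.
def f(x):
    return x ** 2

def numerical_derivative(func, x, h=1e-5):
    # Central difference: (f(x + h) - f(x - h)) / (2h)
    return (func(x + h) - func(x - h)) / (2 * h)

x = 3.0                            # arbitrary test point
print(numerical_derivative(f, x))  # ~6.0
print(2 * x)                       # exact: 6.0
```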

Partial Derivative

Derivative with respect to one variable, holding others constant.

f(x,y) = x² + y²
∂f/∂x = 2x

Used for: Multi-variable optimization
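
The same finite-difference idea, holding y fixed, gives a minimal sketch of the partial derivative (the sample point is an arbitrary choice):

```python
# Partial derivative of f(x, y) = x^2 + y^2 with respect to x:
# hold y constant and differentiate in x only.
def f(x, y):
    return x ** 2 + y ** 2

def partial_x(func, x, y, h=1e-5):
    # Finite difference in x while y is held fixed
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

print(partial_x(f, 3.0, 4.0))  # ~6.0, matching the analytic ∂f/∂x = 2x
```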

Gradient

Vector of all partial derivatives. Points in the direction of steepest increase.

∇f = [∂f/∂x₁, ∂f/∂x₂, ...]

Used for: Gradient descent, backpropagation
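
A minimal sketch, assuming the illustrative function f(x) = sum of squared components (my choice, not from the text), builds the gradient one component at a time with NumPy and compares it with the analytic gradient 2x:

```python
import numpy as np

# Gradient of f(x) = sum(x_i^2); the analytic gradient is 2x.
def f(x):
    return np.sum(x ** 2)

def gradient(func, x, h=1e-5):
    # Build the gradient one component at a time via central differences
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (func(x + step) - func(x - step)) / (2 * h)
    return grad

x = np.array([1.0, -2.0, 3.0])  # arbitrary test point
print(gradient(f, x))           # ~[2, -4, 6]
print(2 * x)                    # exact gradient
```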

Chain Rule

Derivative of composite functions. Essential for backpropagation.

d/dx[f(g(x))] = f'(g(x))·g'(x)

Used for: Computing gradients through layers
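
A minimal sketch with the illustrative composite sin(x²) (my choice, not from the text) checks the chain-rule result cos(x²)·2x against a numerical estimate:

```python
import math

# Chain rule on f(g(x)) with f(u) = sin(u) and g(x) = x^2:
# d/dx sin(x^2) = cos(x^2) * 2x
def g(x):
    return x ** 2

def composite(x):
    return math.sin(g(x))

def chain_rule_derivative(x):
    return math.cos(g(x)) * 2 * x   # f'(g(x)) * g'(x)

x = 1.5                            # arbitrary test point
h = 1e-6
numeric = (composite(x + h) - composite(x - h)) / (2 * h)
print(numeric)                     # numerical check
print(chain_rule_derivative(x))    # analytic chain rule
```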

Gradient Descent

The fundamental optimization algorithm in machine learning. Follow the negative gradient to find the minimum.

1. Compute gradient: calculate ∇L, the gradient of the loss function.
2. Update parameters: θ = θ - α·∇L (move opposite to the gradient).
3. Repeat until convergence (gradient ≈ 0); see the sketch below.

Advanced Concepts

Jacobian Matrix

Matrix of all first-order partial derivatives of a vector-valued function. Describes how each output changes with each input.

J = [∂f₁/∂x₁ ∂f₁/∂x₂]
    [∂f₂/∂x₁ ∂f₂/∂x₂]

Used in: Backpropagation, transformations
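
A minimal sketch, using the illustrative function f(x, y) = [xy, x + y] (my choice, not from the text), estimates the Jacobian column by column with finite differences:

```python
import numpy as np

# Jacobian of f(x, y) = [x * y, x + y]:
# analytic J = [[y, x], [1, 1]]
def f(v):
    x, y = v
    return np.array([x * y, x + y])

def jacobian(func, v, h=1e-5):
    # One column per input variable: J[i, j] = d f_i / d x_j
    m = len(func(v))
    J = np.zeros((m, len(v)))
    for j in range(len(v)):
        step = np.zeros_like(v)
        step[j] = h
        J[:, j] = (func(v + step) - func(v - step)) / (2 * h)
    return J

print(jacobian(f, np.array([2.0, 3.0])))  # ~[[3, 2], [1, 1]]
```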

Hessian Matrix

Matrix of second-order partial derivatives. Describes curvature of the loss surface.

H = [∂²f/∂x₁² ∂²f/∂x₁∂x₂]
    [∂²f/∂x₂∂x₁ ∂²f/∂x₂²]

Used in: Second-order optimization (Newton's method)
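
A minimal sketch, using the illustrative function f(x, y) = x² + 3xy + y² (my choice, not from the text), estimates the Hessian with second-order central differences:

```python
import numpy as np

# Hessian of f(x, y) = x^2 + 3*x*y + y^2:
# analytic H = [[2, 3], [3, 2]]
def f(v):
    x, y = v
    return x ** 2 + 3 * x * y + y ** 2

def hessian(func, v, h=1e-4):
    # Second-order central differences: H[i, j] = d^2 f / dx_i dx_j
    n = len(v)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.zeros(n), np.zeros(n)
            e_i[i], e_j[j] = h, h
            H[i, j] = (func(v + e_i + e_j) - func(v + e_i - e_j)
                       - func(v - e_i + e_j) + func(v - e_i - e_j)) / (4 * h ** 2)
    return H

print(hessian(f, np.array([1.0, 2.0])))  # ~[[2, 3], [3, 2]]
```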

AI Applications

Backpropagation: Chain rule to compute gradients through network layers
Gradient Descent: Optimize model parameters to minimize loss
Learning Rate Scheduling: Adjust step size based on gradient behavior
Momentum: Use gradient history to accelerate convergence (see the sketch after this list)
Adam Optimizer: Adaptive learning rates using gradient moments
Loss Functions: Derivatives tell us how to reduce error
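
As a concrete illustration of the momentum entry above, here is a minimal SGD-with-momentum sketch on a toy quadratic loss (the loss, learning rate, and momentum coefficient are illustrative choices, not from the text):

```python
import numpy as np

# Minimal SGD-with-momentum update on the toy loss L(theta) = sum(theta^2).
# The velocity term accumulates past gradients to smooth and speed up descent.
def grad(theta):
    return 2 * theta              # gradient of sum(theta^2)

theta = np.array([3.0, -2.0])     # initial parameters (arbitrary)
velocity = np.zeros_like(theta)
alpha, beta = 0.1, 0.9            # learning rate and momentum (illustrative)

for _ in range(200):
    velocity = beta * velocity + grad(theta)   # blend in gradient history
    theta = theta - alpha * velocity           # step along the velocity

print(theta)  # close to [0, 0]
```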

Key Takeaway: Calculus enables learning. Every weight update in a neural network is guided by derivatives computed through backpropagation.