Calculus for Machine Learning

The Mathematics of Change and Optimization

Why Calculus?

Calculus is the mathematical foundation of machine learning. It enables us to:

  • Measure how a model's loss changes as its parameters change (derivatives)
  • Find the direction of steepest decrease for minimizing loss (gradients)
  • Propagate error signals through layered models (the chain rule)

Without calculus, we couldn't train models to learn from data.

Derivatives: Measuring Change

A derivative tells us how a function changes at a specific point. It's the slope of the function at that point.

Formal Definition

The derivative of function $f$ at point $x$ is:

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

This represents the instantaneous rate of change at point $x$.

Simple Example: Quadratic Function

Consider $f(x) = x^2$

The derivative is:

$$f'(x) = 2x$$

At specific points:

  • At $x = 0$: $f'(0) = 0$ (flat point, minimum)
  • At $x = 1$: $f'(1) = 2$ (increasing with slope 2)
  • At $x = 2$: $f'(2) = 4$ (steeper increase)
  • At $x = -1$: $f'(-1) = -2$ (decreasing)
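These values are easy to check numerically from the limit definition by using a small but finite $h$. The helper below is a minimal sketch (the function name and step size are illustrative choices, not from the text):

```python
def numerical_derivative(f, x, h=1e-6):
    """Approximate f'(x) with a central difference: (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 2

# Compare against the analytic derivative f'(x) = 2x at the points above
for x in [0.0, 1.0, 2.0, -1.0]:
    approx = numerical_derivative(f, x)
    print(f"x = {x:+.1f}: numeric {approx:+.4f}, analytic {2 * x:+.4f}")
```

A central difference is used rather than the one-sided quotient from the definition because it cancels the leading error term and is accurate to $O(h^2)$.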

Partial Derivatives: Multiple Variables

In machine learning, we work with multivariate functions - functions with many input variables. A partial derivative shows how the function changes with respect to one variable, keeping others constant.

Notation

For function $f(x, y, z)$, the partial derivatives are:

$$\frac{\partial f}{\partial x}, \quad \frac{\partial f}{\partial y}, \quad \frac{\partial f}{\partial z}$$

Each one shows how $f$ changes along one dimension.

Example: Neural Network Cost Function

A cost function with two weights:

$$J(w_1, w_2) = (w_1 - 3)^2 + (w_2 + 2)^2$$

Partial derivatives:

$$\frac{\partial J}{\partial w_1} = 2(w_1 - 3), \quad \frac{\partial J}{\partial w_2} = 2(w_2 + 2)$$

These tell us to decrease $w_1$ if it's greater than 3 and decrease $w_2$ if it's greater than $-2$, moving both weights toward the minimum at $(3, -2)$.
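Each partial derivative can be verified the same way as an ordinary derivative: hold the other variable fixed and take a finite difference. A quick sketch (the evaluation point $(5, 0)$ is an illustrative choice):

```python
def J(w1, w2):
    """Cost function from the text: J(w1, w2) = (w1 - 3)^2 + (w2 + 2)^2."""
    return (w1 - 3) ** 2 + (w2 + 2) ** 2

def grad_J(w1, w2):
    """Analytic partials: dJ/dw1 = 2(w1 - 3), dJ/dw2 = 2(w2 + 2)."""
    return 2 * (w1 - 3), 2 * (w2 + 2)

# Central-difference check of each partial at (w1, w2) = (5, 0),
# varying one variable at a time while keeping the other constant
h = 1e-6
d_w1 = (J(5 + h, 0) - J(5 - h, 0)) / (2 * h)
d_w2 = (J(5, 0 + h) - J(5, 0 - h)) / (2 * h)
print(d_w1, d_w2, grad_J(5, 0))
```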

The Gradient: Direction of Steepest Increase

The gradient is a vector containing all partial derivatives. It points in the direction of steepest increase of a function.

Gradient Vector

For function $f(x_1, x_2, ..., x_n)$:

$$\nabla f = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{pmatrix}$$

This is the gradient vector in n-dimensional space.

Key Properties

  • $\nabla f$ points in the direction of steepest increase of $f$
  • $-\nabla f$ points in the direction of steepest decrease — the direction gradient descent follows
  • The magnitude $\|\nabla f\|$ gives the rate of change along that direction
  • At a local minimum or maximum, $\nabla f = 0$

The Chain Rule: Computing Nested Derivatives

Neural networks are compositions of functions. The chain rule tells us how to compute derivatives through these nested functions.

Chain Rule Formula

If $y = f(g(x))$, then:

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$$

where $u = g(x)$.

Neural Network Example

Consider a 2-layer network:

  • First layer: $z^{(1)} = w^{(1)} x + b^{(1)}$, then $a^{(1)} = \sigma(z^{(1)})$
  • Second layer: $\hat{y}$ is produced from $a^{(1)}$
  • Loss: $L(\hat{y}, y)$

To update $w^{(1)}$, we need: $\frac{\partial L}{\partial w^{(1)}}$

Using chain rule:

$$\frac{\partial L}{\partial w^{(1)}} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial a^{(1)}} \cdot \frac{\partial a^{(1)}}{\partial z^{(1)}} \cdot \frac{\partial z^{(1)}}{\partial w^{(1)}}$$

This is the essence of backpropagation!
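The chain-rule product above can be written out directly for a tiny network and checked against a finite difference. This is a minimal sketch: the sigmoid first layer, linear second layer, and squared-error loss are assumptions made for concreteness, since the text leaves the layers unspecified.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def forward(w1, w2, x, y):
    """Tiny 2-layer network (illustrative): sigmoid hidden unit, linear output."""
    z1 = w1 * x                     # first-layer pre-activation
    a1 = sigmoid(z1)                # first-layer activation
    y_hat = w2 * a1                 # output of the (linear) second layer
    L = 0.5 * (y_hat - y) ** 2      # squared-error loss (assumed)
    return z1, a1, y_hat, L

def grad_w1(w1, w2, x, y):
    """Chain rule: dL/dw1 = dL/dy_hat * dy_hat/da1 * da1/dz1 * dz1/dw1."""
    z1, a1, y_hat, _ = forward(w1, w2, x, y)
    dL_dyhat = y_hat - y            # derivative of the squared-error loss
    dyhat_da1 = w2                  # linear second layer
    da1_dz1 = a1 * (1 - a1)         # sigmoid'(z1)
    dz1_dw1 = x                     # z1 = w1 * x
    return dL_dyhat * dyhat_da1 * da1_dz1 * dz1_dw1

# Numerical check of the chain-rule product
w1, w2, x, y = 0.5, -1.2, 0.8, 0.3
h = 1e-6
numeric = (forward(w1 + h, w2, x, y)[3] - forward(w1 - h, w2, x, y)[3]) / (2 * h)
print(grad_w1(w1, w2, x, y), numeric)
```

Backpropagation is this same product, computed efficiently from right to left and reused across all the weights of a layer.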

Common Derivatives in ML

These are functions whose derivatives you'll encounter repeatedly:

| Function $f(x)$ | Derivative $f'(x)$ | Used In |
|---|---|---|
| $x^n$ | $nx^{n-1}$ | Power rule, polynomials |
| $e^x$ | $e^x$ | Softmax, cross-entropy |
| $\ln(x)$ | $\frac{1}{x}$ | Log-likelihood |
| $\sin(x)$ | $\cos(x)$ | Positional encoders |
| $\sigma(x) = \frac{1}{1+e^{-x}}$ | $\sigma(x)(1-\sigma(x))$ | Sigmoid activation |
| $\text{ReLU}(x) = \max(0,x)$ | $\begin{cases} 0 & x < 0 \\ 1 & x > 0 \end{cases}$ | ReLU activation |
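The sigmoid identity $\sigma'(x) = \sigma(x)(1-\sigma(x))$ is worth memorizing; a short sketch confirms it numerically at a few points (the test points are arbitrary):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Check sigma'(x) = sigma(x) * (1 - sigma(x)) against a central difference
h = 1e-6
for x in [-2.0, 0.0, 1.5]:
    analytic = sigmoid(x) * (1 - sigmoid(x))
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    print(f"x = {x:+.1f}: analytic {analytic:.6f}, numeric {numeric:.6f}")
```

Note that the maximum of $\sigma'$ is $0.25$ at $x = 0$, which is one reason deep sigmoid networks suffer from vanishing gradients.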

Optimization: Using Calculus to Train

Machine learning is optimization. We use calculus to find parameters that minimize loss:

Gradient Descent Update Rule

$$w := w - \alpha \nabla J(w)$$

Move parameters opposite to gradient direction with learning rate $\alpha$

Why This Works

  • The gradient $\nabla J(w)$ points in the direction of steepest increase of the loss
  • Stepping in the opposite direction, $-\nabla J(w)$, decreases the loss fastest locally
  • The learning rate $\alpha$ controls the step size: too large overshoots the minimum, too small converges slowly
  • Repeating the update drives $w$ toward a (local) minimum, where $\nabla J(w) = 0$

Real Example: In neural networks, we compute the loss over many samples, take its gradient with respect to all weights, and update all weights simultaneously. This is exactly what happens during backpropagation!
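The update rule can be sketched on the cost function $J(w_1, w_2) = (w_1 - 3)^2 + (w_2 + 2)^2$ from the partial-derivatives section, whose minimum is at $(3, -2)$. The starting point, learning rate, and step count below are illustrative choices:

```python
def grad(w1, w2):
    """Gradient of J(w1, w2) = (w1 - 3)^2 + (w2 + 2)^2."""
    return 2 * (w1 - 3), 2 * (w2 + 2)

w1, w2 = 0.0, 0.0     # arbitrary starting point
alpha = 0.1           # learning rate (illustrative)
for step in range(100):
    g1, g2 = grad(w1, w2)
    w1 -= alpha * g1  # w := w - alpha * dJ/dw1
    w2 -= alpha * g2  # w := w - alpha * dJ/dw2

print(round(w1, 4), round(w2, 4))  # converges toward (3, -2)
```

Try $\alpha = 1.1$ to watch the iterates diverge: each step overshoots the minimum by more than the last, which is why the learning rate matters.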

Second Derivatives: Curvature Matters

The second derivative shows how the first derivative changes - it measures curvature.

Second Derivative

$$f''(x) = \frac{d}{dx}\left(\frac{df}{dx}\right)$$

Curvature Interpretation

  • $f''(x) > 0$: the function curves upward (convex); a critical point here is a minimum
  • $f''(x) < 0$: the function curves downward (concave); a critical point here is a maximum
  • $f''(x) = 0$: possible inflection point; the second-derivative test is inconclusive

Advanced Optimization

Advanced optimizers use second derivatives:

  • Newton's method: scales the gradient by the inverse curvature, $w := w - \frac{f'(w)}{f''(w)}$ (in many dimensions, the Hessian matrix of second partial derivatives replaces $f''$)
  • Quasi-Newton methods such as BFGS and L-BFGS: build a cheap approximation of the Hessian instead of computing it exactly
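A minimal sketch of why curvature helps: on a quadratic, Newton's update $w := w - f'(w)/f''(w)$ lands on the minimum in a single step, because the second derivative describes the bowl exactly. (The function and starting point below are illustrative.)

```python
# Newton's method on J(w) = (w - 3)^2, minimized at w = 3
def J_prime(w):
    return 2 * (w - 3)    # first derivative

def J_double_prime(w):
    return 2.0            # second derivative (constant curvature)

w = 10.0
w = w - J_prime(w) / J_double_prime(w)   # one Newton step
print(w)  # 3.0
```

Plain gradient descent from the same start would need many steps with a small $\alpha$; Newton's method pays for its speed with the cost of computing (or approximating) second derivatives.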

Integration (Brief Overview)

While derivatives are central to ML, integration appears in:

  • Probability: densities must integrate to 1, and expectations are integrals
  • Bayesian methods: marginalizing out unknown variables means integrating over them
  • Expected loss: the quantity training ultimately targets is an integral over the data distribution
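In practice these integrals are rarely solved in closed form; they are estimated by sampling. A minimal Monte Carlo sketch, estimating $\mathbb{E}[X^2] = \int x^2 \, p(x) \, dx$ for $X \sim \mathcal{N}(0,1)$, whose true value is 1 (the seed and sample count are illustrative):

```python
import random

# Monte Carlo estimate of E[X^2] for X ~ N(0, 1): draw samples,
# average f(x) = x^2 over them. True value: Var(X) = 1.0.
random.seed(0)
n = 200_000
estimate = sum(random.gauss(0, 1) ** 2 for _ in range(n)) / n
print(round(estimate, 3))
```

The estimate's error shrinks like $1/\sqrt{n}$, which is why Monte Carlo methods trade exactness for scalability in high dimensions.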


Key Takeaways

  • Derivatives measure instantaneous change; in ML they tell us how the loss responds to each parameter
  • The gradient collects all partial derivatives and points in the direction of steepest increase
  • The chain rule lets us differentiate composed functions — the essence of backpropagation
  • Gradient descent updates parameters opposite to the gradient, scaled by the learning rate $\alpha$
  • Second derivatives measure curvature and power more advanced optimizers