The Perceptron
The First Artificial Neuron
The perceptron (Rosenblatt, 1958) was the first artificial neuron that could learn from data. Though simple, it is the foundation of modern neural networks and demonstrates core machine learning principles.
Historical Significance
Frank Rosenblatt's perceptron proved that a machine could "learn" to classify patterns. This sparked decades of neural network research.
The Perceptron Model
A perceptron takes $n$ inputs, applies weights, sums them, and passes through an activation function:
Linear combination:
$$z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b$$
Activation (step function):
$$\hat{y} = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{if } z < 0 \end{cases}$$
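The two equations above fit in a few lines of Python. This is a minimal sketch: the `predict` helper and the hand-picked weights are illustrative, not part of any standard API.

```python
# Perceptron forward pass: weighted sum plus bias, then a hard
# threshold at zero (the step activation).

def predict(weights, bias, inputs):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z >= 0 else 0

# Hand-picked weights for illustration
print(predict([0.5, 0.5], -0.7, [1, 1]))  # 1 (z is about 0.3)
print(predict([0.5, 0.5], -0.7, [1, 0]))  # 0 (z is about -0.2)
```

Note that the threshold at zero means the bias $b$ effectively sets where the decision boundary sits.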
The Perceptron Learning Rule
The perceptron learns by adjusting weights when it makes mistakes:
If prediction is wrong, update:
$$w_i := w_i + \alpha (y - \hat{y}) x_i$$
$$b := b + \alpha (y - \hat{y})$$
where $\alpha$ is the learning rate, $y$ is the true label, and $\hat{y}$ is the prediction.
Key insight: Only update when wrong. The error signal $(y - \hat{y})$ drives learning.
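The update rule is one line per parameter. A minimal sketch (the `update` helper and the example values are illustrative):

```python
# One application of the perceptron learning rule:
#   w_i := w_i + alpha * (y - y_hat) * x_i,   b := b + alpha * (y - y_hat)

def update(weights, bias, inputs, y, y_hat, alpha=0.1):
    error = y - y_hat  # 0 when the prediction is right, +1 or -1 when wrong
    new_weights = [w + alpha * error * x for w, x in zip(weights, inputs)]
    new_bias = bias + alpha * error
    return new_weights, new_bias

# A wrong prediction (y=1, y_hat=0) nudges weights toward the positive class;
# a correct one (error = 0) leaves everything untouched.
print(update([0.5, 0.5], -0.7, [1, 0], y=1, y_hat=0))
print(update([0.5, 0.5], -0.7, [1, 0], y=0, y_hat=0))
```

Note that an input of zero ($x_i = 0$) leaves its weight unchanged even on a mistake; only the bias and the active inputs move.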
Convergence: The Perceptron Theorem
Perceptron Convergence Theorem: if the data is linearly separable, the perceptron will find a separating hyperplane in a finite number of updates.
- ✅ Works: When classes are separable by a line (2D) or hyperplane (nD)
- ❌ Fails: When classes overlap or aren't linearly separable
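The separable case is easy to see on the AND function, whose positive class $\{(1,1)\}$ can be split off by a line. A minimal sketch of the full training loop (the `train` helper is illustrative; learning rate 1 keeps the weights as integers):

```python
# Train a perceptron on AND, which is linearly separable, so the
# convergence theorem guarantees a mistake-free pass after finitely
# many updates.

def train(data, alpha=1, epochs=50):
    w, b = [0, 0], 0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            y_hat = 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0
            if y_hat != y:
                w = [wi + alpha * (y - y_hat) * xi for wi, xi in zip(w, x)]
                b += alpha * (y - y_hat)
                mistakes += 1
        if mistakes == 0:  # a clean pass over the data: converged
            break
    return w, b

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(AND)
print([1 if w[0]*x[0] + w[1]*x[1] + b >= 0 else 0 for x, _ in AND])  # [0, 0, 0, 1]
```

The theorem bounds the number of updates, not the number of epochs, and the bound depends on the margin of the data; here convergence happens within a handful of passes.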
Limitations
- Binary classification only in basic form
- Linear decision boundary - can't learn XOR
- Step function - non-differentiable, hard to optimize
- No probability - just 0 or 1
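The XOR limitation is easy to demonstrate: no line separates $\{(0,1),(1,0)\}$ from $\{(0,0),(1,1)\}$, so the same update rule never achieves a mistake-free pass. A minimal sketch:

```python
# XOR is not linearly separable: the perceptron update rule keeps
# cycling and never finds weights that classify all four points.

XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

w, b = [0, 0], 0
for epoch in range(100):
    mistakes = 0
    for x, y in XOR:
        y_hat = 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0
        if y_hat != y:
            w = [wi + (y - y_hat) * xi for wi, xi in zip(w, x)]
            b += y - y_hat
            mistakes += 1
    if mistakes == 0:
        break
print(mistakes)  # still nonzero after 100 epochs: no separating line exists
```

A mistake-free epoch would mean a fixed weight vector classifies all four points, which is impossible for XOR, so the loop always runs to its epoch limit.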
From Perceptron to Modern Networks
Modern neural networks extend the perceptron:
- 💡 Smooth activation: Use sigmoid/ReLU instead of step function → differentiable
- 🏗️ Multiple layers: Stack perceptrons to learn non-linear patterns
- 📊 Probabilistic: Output probabilities instead of hard decisions
- 🔄 Better learning: Gradient descent instead of simple error correction
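Three of these extensions can be sketched on a single neuron: swapping the step for a sigmoid makes the output a probability and the model differentiable, so gradient descent applies. This is a minimal sketch, not a production recipe; the names and hyperparameters are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]  # AND again
w, b, lr = [0.0, 0.0], 0.0, 0.5

for _ in range(2000):
    for x, y in data:
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)  # probability of class 1
        # Gradient of the cross-entropy loss w.r.t. z is (p - y): the same
        # shape as the perceptron rule, but applied on every example rather
        # than only on mistakes, and with a smooth, graded error signal.
        w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
        b -= lr * (p - y)

# Probabilities are now close to 0 for the negatives and close to 1 for (1,1)
print([round(sigmoid(w[0]*x[0] + w[1]*x[1] + b), 2) for x, _ in data])
```

This single logistic neuron is exactly a perceptron with the step replaced by a sigmoid; stacking layers of such units is what unlocks non-linear patterns like XOR.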
Learn More
→ Neural Network Architecture • Activation Functions • Backpropagation