The Perceptron
The First Artificial Neuron
The perceptron (Rosenblatt, 1958) was the first artificial neuron that could learn from data. Though simple, it is the foundation of modern neural networks and demonstrates core machine learning principles.
Historical Significance
Frank Rosenblatt's perceptron proved that a machine could "learn" to classify patterns. This sparked decades of neural network research.
The Perceptron Model
A perceptron takes $n$ inputs, applies weights, sums them, and passes through an activation function:
Linear combination:
$$z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b$$
Activation (step function):
$$\hat{y} = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{if } z < 0 \end{cases}$$
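The two equations above fit in a few lines of Python. This is a minimal sketch: the `predict` helper and the hand-picked weights are illustrative, not part of any standard API.

```python
# Perceptron forward pass: weighted sum plus bias, then a hard
# threshold at zero (the step activation).

def predict(weights, bias, inputs):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z >= 0 else 0

# Hand-picked weights for illustration
print(predict([0.5, 0.5], -0.7, [1, 1]))  # 1 (z is about 0.3)
print(predict([0.5, 0.5], -0.7, [1, 0]))  # 0 (z is about -0.2)
```

Note that the threshold at zero means the bias $b$ effectively sets where the decision boundary sits.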
The Perceptron Learning Rule
The perceptron learns by adjusting weights when it makes mistakes:
If prediction is wrong, update:
$$w_i := w_i + \alpha (y - \hat{y}) x_i$$
$$b := b + \alpha (y - \hat{y})$$
where $\alpha$ is the learning rate, $y$ is the true label, and $\hat{y}$ is the prediction.
Key insight: Only update when wrong. The error signal $(y - \hat{y})$ drives learning.
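The update rule is one line per parameter. A minimal sketch (the `update` helper and the example values are illustrative):

```python
# One application of the perceptron learning rule:
#   w_i := w_i + alpha * (y - y_hat) * x_i,   b := b + alpha * (y - y_hat)

def update(weights, bias, inputs, y, y_hat, alpha=0.1):
    error = y - y_hat  # 0 when the prediction is right, +1 or -1 when wrong
    new_weights = [w + alpha * error * x for w, x in zip(weights, inputs)]
    new_bias = bias + alpha * error
    return new_weights, new_bias

# A wrong prediction (y=1, y_hat=0) nudges weights toward the positive class;
# a correct one (error = 0) leaves everything untouched.
print(update([0.5, 0.5], -0.7, [1, 0], y=1, y_hat=0))
print(update([0.5, 0.5], -0.7, [1, 0], y=0, y_hat=0))
```

Note that an input of zero ($x_i = 0$) leaves its weight unchanged even on a mistake; only the bias and the active inputs move.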
Convergence: The Perceptron Theorem
Perceptron Convergence Theorem: if the data is linearly separable, the perceptron will find a separating hyperplane in a finite number of updates.
- ✅ Works: When classes are separable by a line (2D) or hyperplane (nD)
- ❌ Fails: When classes overlap or aren't linearly separable
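The separable case is easy to see on the AND function, whose positive class $\{(1,1)\}$ can be split off by a line. A minimal sketch of the full training loop (the `train` helper is illustrative; learning rate 1 keeps the weights as integers):

```python
# Train a perceptron on AND, which is linearly separable, so the
# convergence theorem guarantees a mistake-free pass after finitely
# many updates.

def train(data, alpha=1, epochs=50):
    w, b = [0, 0], 0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            y_hat = 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0
            if y_hat != y:
                w = [wi + alpha * (y - y_hat) * xi for wi, xi in zip(w, x)]
                b += alpha * (y - y_hat)
                mistakes += 1
        if mistakes == 0:  # a clean pass over the data: converged
            break
    return w, b

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(AND)
print([1 if w[0]*x[0] + w[1]*x[1] + b >= 0 else 0 for x, _ in AND])  # [0, 0, 0, 1]
```

The theorem bounds the number of updates, not the number of epochs, and the bound depends on the margin of the data; here convergence happens within a handful of passes.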
Limitations
- Binary classification only in basic form
- Linear decision boundary - can't learn XOR
- Step function - non-differentiable, hard to optimize
- No probability - just 0 or 1
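The XOR limitation is easy to demonstrate: no line separates $\{(0,1),(1,0)\}$ from $\{(0,0),(1,1)\}$, so the same update rule never achieves a mistake-free pass. A minimal sketch:

```python
# XOR is not linearly separable: the perceptron update rule keeps
# cycling and never finds weights that classify all four points.

XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

w, b = [0, 0], 0
for epoch in range(100):
    mistakes = 0
    for x, y in XOR:
        y_hat = 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0
        if y_hat != y:
            w = [wi + (y - y_hat) * xi for wi, xi in zip(w, x)]
            b += y - y_hat
            mistakes += 1
    if mistakes == 0:
        break
print(mistakes)  # still nonzero after 100 epochs: no separating line exists
```

A mistake-free epoch would mean a fixed weight vector classifies all four points, which is impossible for XOR, so the loop always runs to its epoch limit.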
From Perceptron to Modern Networks
Modern neural networks extend the perceptron:
- 💡 Smooth activation: Use sigmoid/ReLU instead of step function → differentiable
- 🏗️ Multiple layers: Stack perceptrons to learn non-linear patterns
- 📊 Probabilistic: Output probabilities instead of hard decisions
- 🔄 Better learning: Gradient descent instead of simple error correction
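Three of these extensions can be sketched on a single neuron: swapping the step for a sigmoid makes the output a probability and the model differentiable, so gradient descent applies. This is a minimal sketch, not a production recipe; the names and hyperparameters are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]  # AND again
w, b, lr = [0.0, 0.0], 0.0, 0.5

for _ in range(2000):
    for x, y in data:
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)  # probability of class 1
        # Gradient of the cross-entropy loss w.r.t. z is (p - y): the same
        # shape as the perceptron rule, but applied on every example rather
        # than only on mistakes, and with a smooth, graded error signal.
        w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
        b -= lr * (p - y)

# Probabilities are now close to 0 for the negatives and close to 1 for (1,1)
print([round(sigmoid(w[0]*x[0] + w[1]*x[1] + b), 2) for x, _ in data])
```

This single logistic neuron is exactly a perceptron with the step replaced by a sigmoid; stacking layers of such units is what unlocks non-linear patterns like XOR.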
Learn More
→ Neural Network Architecture • Activation Functions • Backpropagation