Neural Network Architecture
Building Blocks of Deep Learning
1. The Artificial Neuron
A single neuron computes a weighted sum of its inputs, adds a bias, and applies an activation function: \( y = f\left(\sum_i w_i x_i + b\right) \), where:
- \( x_i \): Input features
- \( w_i \): Weights (learned parameters)
- \( b \): Bias (learned parameter)
- \( f \): Activation function (e.g., ReLU, sigmoid)
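The formula above can be sketched directly in NumPy; the input, weight, and bias values here are arbitrary illustrations:

```python
import numpy as np

def relu(z):
    # ReLU activation: max(0, z)
    return np.maximum(0.0, z)

def neuron(x, w, b, f=relu):
    """Weighted sum of inputs plus bias, passed through activation f."""
    return f(np.dot(w, x) + b)

x = np.array([1.0, 2.0, 3.0])    # inputs x_i
w = np.array([0.5, -0.25, 0.1])  # weights w_i (learned in practice)
b = 0.2                          # bias b (learned in practice)
print(neuron(x, w, b))           # 0.5*1 - 0.25*2 + 0.1*3 + 0.2 = 0.5
```

Swapping `f` for `sigmoid` or another activation changes only the final squashing step, not the weighted sum.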
2. Layer Types
Input Layer
Receives raw data. Size = number of features. No computation, just passes data forward.
Hidden Layers
Extract features and learn representations. More layers = deeper network = more complex patterns.
Output Layer
Produces final prediction. Size and activation depend on task (regression vs classification).
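The three layer types compose into a single forward pass. A minimal sketch, assuming an arbitrary 4-feature input, two ReLU hidden layers, and a sigmoid output for binary classification (all sizes here are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes: 4 input features -> two hidden layers of 8 -> 1 output
sizes = [4, 8, 8, 1]
weights = [rng.normal(0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

def forward(x):
    # Input layer: no computation, data just passes forward
    a = x
    # Hidden layers: extract intermediate representations
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(W @ a + b)
    # Output layer: sigmoid squashes to a probability for binary classification
    return sigmoid(weights[-1] @ a + biases[-1])

p = forward(np.array([0.5, -1.2, 3.0, 0.0]))  # shape (1,), value in (0, 1)
```

For regression the output layer would drop the sigmoid; for multi-class classification it would use softmax over several output neurons.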
3. Universal Approximation Theorem
A feedforward network with a single hidden layer and a non-linear activation can approximate any continuous function on a compact domain to arbitrary accuracy, given enough hidden neurons. The theorem says nothing about how many neurons are needed or how to learn the weights, however, and in practice deeper networks often learn better representations with fewer neurons.
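A tiny hand-built example of the idea: with ReLU activations, a single hidden layer can represent piecewise-linear functions exactly. Here two hand-chosen hidden neurons reproduce the non-linear absolute-value function via the identity \( |x| = \mathrm{ReLU}(x) + \mathrm{ReLU}(-x) \):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def tiny_net(x):
    # One hidden layer, two ReLU neurons, weights chosen by hand
    hidden = relu(np.array([1.0, -1.0]) * x)  # hidden activations: relu(x), relu(-x)
    return np.array([1.0, 1.0]) @ hidden      # output weights sum them, zero bias

for x in (-2.0, -0.5, 0.0, 3.0):
    assert tiny_net(x) == abs(x)  # exact match, not just an approximation
```

Approximating smoother or more complex functions takes more hidden neurons, which is exactly the cost the theorem leaves unquantified.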
4. Interactive Network Builder
Design your own neural network architecture. Click neurons to see activation flow.
5. Depth vs Width
| Aspect | Deeper Networks | Wider Networks |
|---|---|---|
| Learning | Hierarchical features | More capacity per layer |
| Parameters | Fewer (more efficient) | More parameters |
| Training | Harder (vanishing gradients) | Easier to train |
| Use Case | Complex tasks (vision, NLP) | Simpler structured data |
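The parameter-count row of the table can be checked directly. This sketch compares a deeper narrow network against a shallower wide one with the same input and output sizes (the specific sizes are arbitrary examples):

```python
def mlp_param_count(sizes):
    """Total learned parameters (weights + biases) of a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

deep = mlp_param_count([100, 64, 64, 64, 10])  # three hidden layers of 64
wide = mlp_param_count([100, 256, 10])         # one hidden layer of 256
print(deep, wide)  # 15434 28426: the deeper network uses fewer parameters
```

The deeper network spends its budget on stacked transformations (hierarchical features), while the wider one spends it on capacity within a single layer.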
6. Common Architectures
Feedforward Neural Network (FNN)
Standard architecture: Input → Hidden₁ → Hidden₂ → ... → Output. No cycles.
Convolutional Neural Network (CNN)
For images. Uses convolution layers to extract spatial features.
Recurrent Neural Network (RNN)
For sequences. Has loops: the hidden state is fed back as input at the next timestep.
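The "loop" is just a second weight matrix applied to the previous hidden state. A minimal sketch of one vanilla RNN update, with illustrative sizes and random weights:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden = 3, 5
W_xh = rng.normal(0, 0.1, (n_hidden, n_in))      # input -> hidden
W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))  # hidden -> hidden (the loop)
b_h = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    """One timestep: the new state depends on the input AND the previous state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(n_hidden)                       # initial hidden state
for x_t in rng.normal(size=(4, n_in)):       # a sequence of 4 timesteps
    h = rnn_step(x_t, h)                     # state carries information forward
```

Because `W_hh` is applied once per timestep, gradients flow through many repeated multiplications, which is where vanishing-gradient problems come from.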
Transformer
Modern architecture for NLP. Uses self-attention instead of recurrence.
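Self-attention replaces the RNN's step-by-step loop with direct pairwise comparisons between all tokens. A minimal single-head sketch of scaled dot-product self-attention (sizes and random projections are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # pairwise token similarities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # each output is a weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                            # 4 tokens, model width 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                    # same shape as X: (4, 8)
```

Every token attends to every other token in one step, so no information has to survive a long chain of recurrent updates.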
7. How Many Layers? How Many Neurons?
- Start simple: Try 1-2 hidden layers first
- Hidden neurons: between the input and output sizes; a common rule of thumb is 2/3 of the input size plus the output size
- Add layers if validation accuracy improves and you have enough data
- More data lets you train larger networks without overfitting
- Use regularization (dropout, L2) if overfitting occurs
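The sizing rule of thumb above is easy to encode as a starting point; the function name and example sizes here are illustrative, and the result should always be tuned against validation accuracy:

```python
def suggested_hidden_size(n_inputs, n_outputs):
    """Rule-of-thumb starting point: roughly 2/3 of the
    input size plus the output size. A heuristic, not a law."""
    return round(2 * n_inputs / 3) + n_outputs

# e.g. 30 input features, 10 output classes
print(suggested_hidden_size(30, 10))  # 30
```

Treat the result as the first architecture to try, then grow or shrink it based on validation performance and the amount of training data available.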