Neural Network Architecture

Building Blocks of Deep Learning

1. The Artificial Neuron

A single neuron computes a weighted sum of inputs, adds a bias, and applies an activation function.

Computation: $$ y = f\left(\sum_{i=1}^n w_i x_i + b\right) = f(\mathbf{w}^T\mathbf{x} + b) $$ where:
  • \( x_i \): Input features
  • \( w_i \): Weights (learned parameters)
  • \( b \): Bias (learned parameter)
  • \( f \): Activation function (e.g., ReLU, sigmoid)
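The computation above can be sketched in a few lines of Python (function and variable names are illustrative):

```python
import math

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def sigmoid(z):
    """Logistic sigmoid: squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, f=relu):
    # weighted sum of inputs plus bias, then activation: y = f(w·x + b)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)

y = neuron([1.0, 2.0], [0.5, -0.25], 0.1)  # z = 0.5 - 0.5 + 0.1 = 0.1
```

The weights `w` and bias `b` are the parameters that training adjusts; the activation `f` is fixed in advance.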

2. Layer Types

Input Layer

Receives raw data. Size = number of features. No computation, just passes data forward.

Hidden Layers

Extract features and learn representations. More layers = deeper network = more complex patterns.

Output Layer

Produces final prediction. Size and activation depend on task (regression vs classification).
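As a sketch of how the output layer depends on the task (all names here are illustrative):

```python
import math

def identity(z):
    # regression: one linear output per target value
    return z

def sigmoid(z):
    # binary classification: a single output in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # multi-class classification: k outputs that sum to 1
    m = max(zs)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])    # class probabilities for 3 classes
```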

3. Universal Approximation Theorem

🎯 Key Insight: A neural network with just one hidden layer and enough neurons can approximate any continuous function to arbitrary accuracy!

However, deeper networks often learn better representations with fewer neurons.
Formally: for any continuous function \( g: K \to \mathbb{R}^m \) on a compact set \( K \subset \mathbb{R}^n \) and any \( \epsilon > 0 \), there exists a one-hidden-layer network \( f \) (with a non-polynomial activation such as sigmoid or ReLU) such that: $$ \|f(x) - g(x)\| < \epsilon \quad \forall x \in K $$
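
A tiny hand-constructed illustration (not a trained network): the difference of two steep sigmoid neurons already approximates an indicator "bump" on an interval, which is the basic building block behind proof sketches of the theorem. All names below are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bump(x, a=0.0, b=1.0, k=50.0):
    # difference of two steep sigmoids ≈ indicator function of [a, b];
    # larger k makes the edges sharper
    return sigmoid(k * (x - a)) - sigmoid(k * (x - b))

inside, outside = bump(0.5), bump(2.0)   # close to 1.0 and close to 0.0
```

Summing many such bumps with suitable heights lets one hidden layer approximate any continuous function on the interval.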

4. Interactive Network Builder

Design your own neural network architecture. Click neurons to see activation flow.


5. Depth vs Width

| Aspect     | Deeper Networks              | Wider Networks           |
|------------|------------------------------|--------------------------|
| Learning   | Hierarchical features        | More capacity per layer  |
| Parameters | Fewer (more efficient)       | More parameters          |
| Training   | Harder (vanishing gradients) | Easier to train          |
| Use Case   | Complex tasks (vision, NLP)  | Simpler structured data  |
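
The parameter trade-off can be made concrete by counting weights and biases (the layer sizes below are arbitrary examples):

```python
def param_count(layer_sizes):
    # each pair of adjacent layers contributes (n_in * n_out) weights + n_out biases
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

deep_narrow = param_count([100, 64, 64, 64, 10])   # 4 weight matrices
shallow_wide = param_count([100, 256, 10])         # 2 weight matrices
# the deeper network here uses roughly half the parameters of the wider one
```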

6. Common Architectures

Feedforward Neural Network (FNN)

Standard architecture: Input → Hidden₁ → Hidden₂ → ... → Output. No cycles.
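
A minimal forward pass for such a feedforward stack (weights are hand-picked for the example; names are illustrative):

```python
def relu(z):
    return max(0.0, z)

def dense(x, W, b, f):
    # one fully connected layer: f(row · x + bias) for each output unit
    return [f(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def forward(x, layers):
    # chain the layers: each output becomes the next input (no cycles)
    for W, b, f in layers:
        x = dense(x, W, b, f)
    return x

layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),         # hidden: 2 -> 2
    ([[1.0, 1.0]],              [0.1],      lambda z: z),  # output: 2 -> 1
]
y = forward([2.0, 1.0], layers)
```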

Convolutional Neural Network (CNN)

For images. Uses convolution layers to extract spatial features.

Typical: Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Dense → Output
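
The Conv and Pool steps can be sketched in one dimension (the kernel values here are arbitrary):

```python
def conv1d(x, kernel):
    # slide the kernel over x, taking a dot product at each position ("valid" mode)
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def maxpool(x, size=2):
    # downsample by keeping the maximum of each non-overlapping window
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

signal = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0]
edges = conv1d(signal, [1.0, 0.0, -1.0])   # a simple edge-detecting kernel
pooled = maxpool(edges)
```

In a real CNN the kernel values are learned, and 2-D versions of these operations slide over image rows and columns.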

Recurrent Neural Network (RNN)

For sequences. Has loops: the hidden state is fed back as input at the next timestep.
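
A scalar sketch of the recurrence (the weights are arbitrary constants, not trained):

```python
import math

def rnn(seq, w_x=0.5, w_h=0.8, b=0.0):
    # the hidden state h is fed back at each timestep:
    #   h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    h = 0.0
    for x in seq:
        h = math.tanh(w_x * x + w_h * h + b)
    return h  # final hidden state summarizes the whole sequence

h_final = rnn([1.0, 0.5, -0.25])
```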

Transformer

Modern architecture, originally for NLP. Uses self-attention instead of recurrence, so all positions in a sequence can be processed in parallel.
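
Scaled dot-product self-attention can be sketched without a framework; Q, K, V are small lists of vectors here, and all names are illustrative:

```python
import math

def softmax(zs):
    m = max(zs)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    # each query scores every key, and the output for that query is the
    # softmax-weighted average of the value vectors
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With identical keys the weights are uniform, so each query just averages the values; training shapes Q and K so the weights become selective.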

7. How Many Layers? How Many Neurons?

Rules of Thumb:
  • Start simple: Try 1-2 hidden layers first
  • Hidden neurons: between the input and output layer sizes; a common heuristic is 2/3 of the input size plus the output size
  • Add layers if validation accuracy improves and you have enough data
  • More data = can use larger networks without overfitting
  • Use regularization (dropout, L2) if overfitting occurs
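
The sizing heuristic above, as a quick calculation (purely a rule of thumb, not a guarantee):

```python
def suggested_hidden_size(n_inputs, n_outputs):
    # rough heuristic: about two-thirds of the input size, plus the output size
    return round(2 * n_inputs / 3) + n_outputs

size = suggested_hidden_size(100, 10)   # 67 + 10 = 77
```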