Neural Network Architecture

Building Blocks of Deep Learning

1. The Artificial Neuron

A single neuron computes a weighted sum of inputs, adds a bias, and applies an activation function.

Computation: $$ y = f\left(\sum_{i=1}^n w_i x_i + b\right) = f(\mathbf{w}^T\mathbf{x} + b) $$ where:
  • \( x_i \): Input features
  • \( w_i \): Weights (learned parameters)
  • \( b \): Bias (learned parameter)
  • \( f \): Activation function (e.g., ReLU, sigmoid)
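The computation above can be sketched in a few lines of Python (function and variable names are illustrative):

```python
import math

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def sigmoid(z):
    """Logistic sigmoid: squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, f=relu):
    # weighted sum of inputs plus bias, then activation: y = f(w·x + b)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)

y = neuron([1.0, 2.0], [0.5, -0.25], 0.1)  # z = 0.5 - 0.5 + 0.1 = 0.1
```

The weights `w` and bias `b` are the parameters that training adjusts; the activation `f` is fixed in advance.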

2. Layer Types

Input Layer

Receives raw data. Size = number of features. No computation, just passes data forward.

Hidden Layers

Extract features and learn representations. More layers = deeper network = more complex patterns.

Output Layer

Produces final prediction. Size and activation depend on task (regression vs classification).
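As a sketch of how the output layer depends on the task (all names here are illustrative):

```python
import math

def identity(z):
    # regression: one linear output per target value
    return z

def sigmoid(z):
    # binary classification: a single output in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # multi-class classification: k outputs that sum to 1
    m = max(zs)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])    # class probabilities for 3 classes
```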

3. Universal Approximation Theorem

🎯 Key Insight: A neural network with just one hidden layer and enough neurons can approximate any continuous function to arbitrary accuracy!

However, deeper networks often learn better representations with fewer neurons.
Formally: for any continuous function \( g: K \to \mathbb{R}^m \) on a compact set \( K \subset \mathbb{R}^n \) and any \( \epsilon > 0 \), there exists a one-hidden-layer network \( f \) (with a non-polynomial activation such as sigmoid or ReLU) such that: $$ \|f(x) - g(x)\| < \epsilon \quad \forall x \in K $$
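
A tiny hand-constructed illustration (not a trained network): the difference of two steep sigmoid neurons already approximates an indicator "bump" on an interval, which is the basic building block behind proof sketches of the theorem. All names below are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bump(x, a=0.0, b=1.0, k=50.0):
    # difference of two steep sigmoids ≈ indicator function of [a, b];
    # larger k makes the edges sharper
    return sigmoid(k * (x - a)) - sigmoid(k * (x - b))

inside, outside = bump(0.5), bump(2.0)   # close to 1.0 and close to 0.0
```

Summing many such bumps with suitable heights lets one hidden layer approximate any continuous function on the interval.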

4. Interactive Network Builder

Design your own neural network architecture. Click neurons to see activation flow.


5. Depth vs Width

| Aspect     | Deeper Networks              | Wider Networks           |
|------------|------------------------------|--------------------------|
| Learning   | Hierarchical features        | More capacity per layer  |
| Parameters | Fewer (more efficient)       | More parameters          |
| Training   | Harder (vanishing gradients) | Easier to train          |
| Use Case   | Complex tasks (vision, NLP)  | Simpler structured data  |
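
The parameter trade-off can be made concrete by counting weights and biases (the layer sizes below are arbitrary examples):

```python
def param_count(layer_sizes):
    # each pair of adjacent layers contributes (n_in * n_out) weights + n_out biases
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

deep_narrow = param_count([100, 64, 64, 64, 10])   # 4 weight matrices
shallow_wide = param_count([100, 256, 10])         # 2 weight matrices
# the deeper network here uses roughly half the parameters of the wider one
```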

6. Common Architectures

Feedforward Neural Network (FNN)

Standard architecture: Input → Hidden₁ → Hidden₂ → ... → Output. No cycles.
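
A minimal forward pass for such a feedforward stack (weights are hand-picked for the example; names are illustrative):

```python
def relu(z):
    return max(0.0, z)

def dense(x, W, b, f):
    # one fully connected layer: f(row · x + bias) for each output unit
    return [f(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def forward(x, layers):
    # chain the layers: each output becomes the next input (no cycles)
    for W, b, f in layers:
        x = dense(x, W, b, f)
    return x

layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),         # hidden: 2 -> 2
    ([[1.0, 1.0]],              [0.1],      lambda z: z),  # output: 2 -> 1
]
y = forward([2.0, 1.0], layers)
```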

Convolutional Neural Network (CNN)

For images. Uses convolution layers to extract spatial features.

Typical: Conv → ReLU → Pool → Conv → ReLU → Pool → Flatten → Dense → Output
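
The Conv and Pool steps can be sketched in one dimension (the kernel values here are arbitrary):

```python
def conv1d(x, kernel):
    # slide the kernel over x, taking a dot product at each position ("valid" mode)
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def maxpool(x, size=2):
    # downsample by keeping the maximum of each non-overlapping window
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

signal = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0]
edges = conv1d(signal, [1.0, 0.0, -1.0])   # a simple edge-detecting kernel
pooled = maxpool(edges)
```

In a real CNN the kernel values are learned, and 2-D versions of these operations slide over image rows and columns.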

Recurrent Neural Network (RNN)

For sequences. Has loops: the hidden state is fed back as input at the next timestep.
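
A scalar sketch of the recurrence (the weights are arbitrary constants, not trained):

```python
import math

def rnn(seq, w_x=0.5, w_h=0.8, b=0.0):
    # the hidden state h is fed back at each timestep:
    #   h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    h = 0.0
    for x in seq:
        h = math.tanh(w_x * x + w_h * h + b)
    return h  # final hidden state summarizes the whole sequence

h_final = rnn([1.0, 0.5, -0.25])
```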

Transformer

Modern architecture, originally for NLP. Uses self-attention instead of recurrence, so all positions in a sequence can be processed in parallel.
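
Scaled dot-product self-attention can be sketched without a framework; Q, K, V are small lists of vectors here, and all names are illustrative:

```python
import math

def softmax(zs):
    m = max(zs)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    # each query scores every key, and the output for that query is the
    # softmax-weighted average of the value vectors
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With identical keys the weights are uniform, so each query just averages the values; training shapes Q and K so the weights become selective.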

7. How Many Layers? How Many Neurons?

Rules of Thumb:
  • Start simple: Try 1-2 hidden layers first
  • Hidden neurons: between the input and output layer sizes; a common heuristic is 2/3 of the input size plus the output size
  • Add layers if validation accuracy improves and you have enough data
  • More data = can use larger networks without overfitting
  • Use regularization (dropout, L2) if overfitting occurs
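
The sizing heuristic above, as a quick calculation (purely a rule of thumb, not a guarantee):

```python
def suggested_hidden_size(n_inputs, n_outputs):
    # rough heuristic: about two-thirds of the input size, plus the output size
    return round(2 * n_inputs / 3) + n_outputs

size = suggested_hidden_size(100, 10)   # 67 + 10 = 77
```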