Regularization Techniques

Preventing Overfitting

Overfitting: The Problem

Models that fit training data too well often perform poorly on new data. Regularization techniques prevent this by adding constraints or noise during training.

The Overfitting Problem

- Underfitting: high bias, high training loss
- Overfitting: low training loss, high test loss
- Just right: low training loss and low test loss

L1 Regularization (Lasso)

Adds penalty proportional to absolute value of weights to loss function:

$$L_{L1} = L_{original} + \lambda \sum_{i} |w_i|$$

$\lambda$ controls regularization strength. Higher = more regularization.
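A minimal NumPy sketch of the L1 term (the weights and $\lambda$ value here are illustrative):

```python
import numpy as np

def l1_penalty(weights, lam):
    """Return lambda * sum(|w_i|) -- the L1 term added to the loss."""
    return lam * np.sum(np.abs(weights))

w = np.array([0.5, -2.0, 0.0, 1.5])
penalty = l1_penalty(w, lam=0.1)  # 0.1 * (0.5 + 2.0 + 0.0 + 1.5) = 0.4
```

Because the penalty's gradient is $\lambda \cdot \text{sign}(w_i)$, a constant pull toward zero, L1 tends to drive small weights exactly to zero, performing feature selection.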

L2 Regularization (Ridge)

Adds penalty proportional to square of weights:

$$L_{L2} = L_{original} + \lambda \sum_{i} w_i^2$$
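A matching sketch for the L2 term, including its gradient $2\lambda w$, which is why L2 regularization is often implemented as "weight decay" (values illustrative):

```python
import numpy as np

def l2_penalty(weights, lam):
    """Return lambda * sum(w_i^2) -- the L2 term added to the loss."""
    return lam * np.sum(weights ** 2)

def l2_grad(weights, lam):
    """Gradient of the L2 term: 2 * lambda * w ('weight decay')."""
    return 2 * lam * weights

w = np.array([0.5, -2.0, 1.5])
penalty = l2_penalty(w, lam=0.1)  # 0.1 * (0.25 + 4.0 + 2.25) = 0.65
```

Unlike L1, the gradient shrinks proportionally to each weight, so L2 pushes weights toward small values but rarely exactly to zero.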

Elastic Net

Combines L1 and L2:

$$L = L_{original} + \lambda_1 \sum_i |w_i| + \lambda_2 \sum_i w_i^2$$

Gets benefits of both: feature selection from L1, smoothness from L2.
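The combined penalty is just the sum of the two terms; a sketch with illustrative $\lambda_1$, $\lambda_2$ values:

```python
import numpy as np

def elastic_net_penalty(weights, lam1, lam2):
    """L1 term (sparsity) plus L2 term (smoothness)."""
    return lam1 * np.sum(np.abs(weights)) + lam2 * np.sum(weights ** 2)

w = np.array([0.5, -2.0, 1.5])
penalty = elastic_net_penalty(w, lam1=0.1, lam2=0.01)
# = 0.1 * 4.0 + 0.01 * 6.5 = 0.465
```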

Dropout

Randomly disable neurons during training. Forces network to learn redundant representations.

How It Works

During training: randomly set each neuron's output to 0 with probability $p$ (e.g., $p = 0.5$)

During inference: use all neurons, but scale outputs by $(1-p)$ so expected values match training. (Most frameworks instead use "inverted" dropout: scale kept activations by $1/(1-p)$ during training, so inference needs no scaling.)
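A sketch of the classic dropout forward pass described above (NumPy, drop probability `p` illustrative):

```python
import numpy as np

def dropout_forward(x, p, training, rng):
    """Classic dropout: zero each activation with probability p at train
    time; scale by (1 - p) at inference so expected values match."""
    if training:
        mask = rng.random(x.shape) >= p  # keep with probability 1 - p
        return x * mask
    return x * (1 - p)

rng = np.random.default_rng(0)
x = np.ones(10000)
train_out = dropout_forward(x, p=0.5, training=True, rng=rng)
test_out = dropout_forward(x, p=0.5, training=False, rng=rng)
# train_out.mean() is close to test_out.mean() == 0.5
```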

Early Stopping

Stop training when validation loss stops improving.

Standard practice: Always monitor validation loss and use early stopping.
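A patience-based early stopping loop sketch; `train_one_epoch` and `validate` are hypothetical placeholders for your own training and validation code:

```python
def early_stopping_fit(train_one_epoch, validate, max_epochs=100, patience=5):
    """Stop once validation loss fails to improve for `patience` epochs."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val_loss = validate()
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0  # reset patience on improvement
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss stopped improving
    return best_loss
```

In practice you would also checkpoint the model weights at each new best validation loss and restore them after stopping.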

Data Augmentation

Create more training examples through random transformations: crop, flip, rotate, zoom, add noise.
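Two of these transformations sketched in plain NumPy (real pipelines typically use a library's transform utilities; the noise scale here is illustrative):

```python
import numpy as np

def augment(image, rng):
    """Randomly flip horizontally, then add Gaussian pixel noise."""
    if rng.random() < 0.5:
        image = image[:, ::-1]  # horizontal flip
    noise = rng.normal(0.0, 0.05, image.shape)
    return np.clip(image + noise, 0.0, 1.0)  # keep pixels in [0, 1]

rng = np.random.default_rng(0)
img = rng.random((32, 32))
augmented = augment(img, rng)
```

Each epoch sees a slightly different version of every image, which acts like an enlarged training set.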

Batch Normalization as Regularization

Batch normalization has a regularizing side effect: each example is normalized with the mean and variance of its mini-batch, so its activations vary with the random composition of the batch. This batch-dependent noise discourages the network from relying on any single activation pattern, somewhat like a weak form of dropout.
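A NumPy sketch showing where the noise comes from, assuming a plain batch norm without learned scale/shift parameters:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature using the current mini-batch's statistics."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 4))

# Normalize the same example as part of two different mini-batches:
out_a = batch_norm(data[0:32])[0]
out_b = batch_norm(np.vstack([data[0:1], data[500:531]]))[0]
# out_a and out_b differ: the example's normalized activations depend on
# its batch-mates, which injects mild noise during training.
```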

When to Use What

💡 Key insight: Regularization trades training performance for better test performance. There's no free lunch - reducing overfitting typically means accepting higher training loss.