Dimensionality Reduction
PCA, t-SNE, and UMAP
The Curse of Dimensionality
High-dimensional data causes problems: high computational cost, overfitting, and difficulty of visualization. Dimensionality reduction finds low-dimensional representations that preserve important structure.
Why Reduce?
- 🚀 Speed: Fewer features = faster training
- 🔍 Visualization: See 10,000D data in 2D
- 🧠 Remove noise: Keep signal, discard junk
- 📉 Prevent overfitting: Simpler models generalize better
Principal Component Analysis (PCA)
Goal: Find directions of maximum variance in data.
PCA finds vectors $v_1, v_2, \ldots, v_k$ that maximize:
$$\text{Var}(Xv_i)$$
Subject to: $\|v_i\| = 1$ and $v_i \perp v_j$ for $i \neq j$ (without the unit-norm constraint, the variance would be unbounded).
These are eigenvectors of the covariance matrix, ordered by eigenvalue (variance).
- Linear: Captures only linear structure in the data
- Interpretable: Components are linear combinations of original features
- Fast: Just eigendecomposition
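As a concrete sketch of the eigendecomposition view above (using NumPy and scikit-learn on synthetic data; the variable names and data are illustrative, not from the original notes), the components returned by `sklearn.decomposition.PCA` match the top eigenvectors of the covariance matrix, up to sign:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data with correlated features (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(axis=0)  # center before computing covariance

# Eigendecomposition of the covariance matrix, sorted by descending eigenvalue
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# PCA's components are these eigenvectors (each defined only up to sign)
pca = PCA(n_components=2).fit(X)
for k in range(2):
    assert abs(abs(eigvecs[:, k] @ pca.components_[k]) - 1.0) < 1e-8
```

The projection onto the first $k$ components is then just `pca.transform(X)`, i.e. `Xc @ eigvecs[:, :k]`.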
t-SNE (t-Distributed Stochastic Neighbor Embedding)
Goal: Preserve local structure - nearby points stay nearby.
Unlike PCA, t-SNE is nonlinear and great for visualization.
- ✅ Excellent for exploratory visualization
- ✅ Reveals natural clusters
- ❌ Slower than PCA
- ❌ Impractical beyond 2–3 output dimensions
- ❌ Not suited to prediction: non-parametric, so new points can't be embedded without refitting
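A minimal t-SNE sketch using scikit-learn's digits dataset (the subsample size and `perplexity` value here are illustrative choices, not tuned):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 64-dimensional handwritten-digit images; subsample for speed (illustrative)
X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]

# Embed into 2D; perplexity roughly controls the neighborhood size
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # one 2D point per input sample
```

Scatter-plotting `emb` colored by `y` typically shows the digit classes separating into clusters, which is the kind of local structure t-SNE is designed to preserve.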
UMAP (Uniform Manifold Approximation and Projection)
Goal: Preserve both local and global structure.
A modern alternative to t-SNE: faster, and it often preserves global structure better.
- ✅ Faster than t-SNE
- ✅ Better global structure
- ✅ Works for many dimensions
- ✅ Has theoretical grounding
Comparison
| Method | Type | Speed | Best For |
|---|---|---|---|
| PCA | Linear | ⚡ Very Fast | Prediction, linear data |
| t-SNE | Nonlinear | 🐢 Slow | Exploratory visualization |
| UMAP | Nonlinear | ⚡ Fast | Visualization + performance |