Linear Algebra for Machine Learning

Vectors, Matrices, and Transformations

1. Vectors: Direction and Magnitude

A vector represents both a direction and a magnitude. In machine learning, vectors represent features, weights, and gradients.

$$ \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} $$

Magnitude (L2 Norm):

$$ ||\vec{v}|| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} $$
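A quick NumPy sketch of the L2 norm (the vector values here are illustrative):

```python
import numpy as np

# A feature vector in R^3 (values chosen so the norm comes out clean)
v = np.array([3.0, 4.0, 12.0])

# L2 norm: sqrt(3^2 + 4^2 + 12^2) = sqrt(169) = 13
magnitude = np.linalg.norm(v)
print(magnitude)  # 13.0
```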

2. Dot Product: Measuring Similarity

The dot product tells us how much two vectors point in the same direction.

$$ \vec{a} \cdot \vec{b} = a_1b_1 + a_2b_2 + \cdots + a_nb_n = ||\vec{a}|| \, ||\vec{b}|| \cos\theta $$

Key Properties:

- Commutative: \( \vec{a} \cdot \vec{b} = \vec{b} \cdot \vec{a} \)
- Distributive over addition: \( \vec{a} \cdot (\vec{b} + \vec{c}) = \vec{a} \cdot \vec{b} + \vec{a} \cdot \vec{c} \)
- Relates to magnitude: \( \vec{a} \cdot \vec{a} = ||\vec{a}||^2 \)
- Sign indicates alignment: positive when the vectors point the same way, zero when they are orthogonal, negative when they point in opposite directions

💡 In ML: Dot products are everywhere! Attention mechanisms, cosine similarity, neural network layers (matrix multiplication is repeated dot products).
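A minimal sketch of the dot product and cosine similarity in NumPy (the two vectors are illustrative, chosen to be parallel):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # b = 2a, so the vectors are parallel

dot = np.dot(a, b)

# Cosine similarity: dot product divided by the product of the magnitudes
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_theta)  # ~1.0 for parallel vectors
```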

3. Matrix Multiplication: Linear Transformations

Multiplying by a matrix transforms a vector. It can rotate, scale, shear, or project it.

$$ \vec{y} = A\vec{x} $$ where \( A \) is an \( m \times n \) matrix and \( \vec{x} \) is an \( n \times 1 \) vector.

Example: Rotation matrix (rotate by angle \( \theta \))

$$ R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} $$
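The rotation matrix above can be sketched directly in NumPy; rotating the x-axis unit vector by 90° should land it on the y-axis:

```python
import numpy as np

theta = np.pi / 2  # 90 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([1.0, 0.0])
y = R @ x  # rotates the x-axis unit vector onto the y-axis
print(np.round(y, 6))  # [0. 1.]
```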

4. Eigenvalues and Eigenvectors

For a matrix \( A \), if there exists a non-zero vector \( \vec{v} \) such that:

$$ A\vec{v} = \lambda \vec{v} $$

Then \( \vec{v} \) is an eigenvector and \( \lambda \) is an eigenvalue. This means \( A \) only scales \( \vec{v} \) (doesn't rotate it).

🔍 Why important? PCA (dimensionality reduction) finds eigenvectors of the covariance matrix. They point in directions of maximum variance.
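A small sketch of the defining equation \( A\vec{v} = \lambda\vec{v} \) with NumPy (the diagonal matrix here is illustrative; its eigenvalues are just its diagonal entries):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# np.linalg.eig returns (eigenvalues, eigenvectors-as-columns)
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)  # [2. 3.]

# Verify the defining equation for the first eigenpair:
# multiplying by A only scales the eigenvector, it doesn't rotate it
v = eigenvectors[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))  # True
```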

5. Matrix Transformations of the Plane

Different 2×2 matrices transform 2D space in different ways: the identity leaves every point fixed, diagonal matrices stretch along the axes, rotations turn the plane, and shears slant it. The determinant measures how a transformation scales area; the identity transformation has determinant 1.00.
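A minimal NumPy sketch of a 2D transformation; the shear matrix here is illustrative, and its determinant of 1 means it distorts the unit square without changing its area:

```python
import numpy as np

# A shear matrix: slides each point's x-coordinate by its y-coordinate
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])

square = np.array([[0, 1, 1, 0],
                   [0, 0, 1, 1]])  # columns are the corners of the unit square
sheared = S @ square  # corner (1, 1) moves to (2, 1)

# Determinant 1.0: the shear preserves area
print(np.linalg.det(S))
```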

6. Special Matrices

Identity Matrix (I)

Leaves vectors unchanged: \( I\vec{v} = \vec{v} \). Diagonal of 1s, rest 0s.

Diagonal Matrix

Only scales along axes. Off-diagonal = 0. Fast to compute and invert.

Orthogonal Matrix

Preserves lengths and angles. Columns are orthonormal. \( Q^TQ = I \).

Symmetric Matrix

\( A = A^T \). Real eigenvalues, orthogonal eigenvectors. Common in ML (covariance).
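The properties above are easy to check numerically; this sketch uses an illustrative rotation matrix (which is orthogonal) and a small symmetric matrix:

```python
import numpy as np

theta = np.pi / 4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotations are orthogonal

# Orthogonal: Q^T Q = I, and lengths are preserved
print(np.allclose(Q.T @ Q, np.eye(2)))  # True
v = np.array([3.0, 4.0])
print(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))  # True

# Symmetric: A = A^T, so eigenvalues are real
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues = np.linalg.eigvalsh(A)  # eigvalsh assumes a symmetric matrix
print(eigenvalues)  # [1. 3.]
```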

7. Matrix Decompositions

Singular Value Decomposition (SVD)

$$ A = U\Sigma V^T $$

Any matrix factors into a rotation (\( U \)), a scaling (\( \Sigma \)), and another rotation (\( V^T \)); applied to a vector, \( V^T \) acts first, then \( \Sigma \), then \( U \). Used in: PCA, recommender systems, image compression.
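A short NumPy sketch of SVD on an illustrative matrix, including the rank-1 approximation that underlies compression and recommender applications:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Thin SVD: A = U @ diag(s) @ Vt, with s sorted largest-first
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True: exact reconstruction

# Rank-1 approximation: keep only the largest singular value
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(np.round(A1, 2))
```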

Eigendecomposition

$$ A = Q\Lambda Q^{-1} $$

For symmetric matrices, \( Q \) contains orthonormal eigenvectors as columns (so \( Q^{-1} = Q^T \)) and \( \Lambda \) is diagonal with the eigenvalues. Used in: PCA, spectral clustering, understanding system dynamics.
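The eigendecomposition can be sketched with NumPy's symmetric solver on an illustrative covariance-like matrix:

```python
import numpy as np

# A small symmetric (covariance-like) matrix
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# eigh is the symmetric-matrix routine: Q is orthogonal, so Q^{-1} = Q^T
lam, Q = np.linalg.eigh(A)

# Reconstruct A = Q @ Lambda @ Q^T
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))  # True
print(lam)  # real eigenvalues, sorted ascending
```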