Pooling Layers

Downsampling in CNNs

Pooling layers reduce spatial dimensions by aggregating information in local windows. They typically follow convolutional layers, making networks cheaper to compute and more robust to small spatial shifts.

Why Pool?

  • 📉 Reduce computation: smaller feature maps mean fewer operations downstream
  • 🎯 Invariance: small shifts in the input barely change the pooled features
  • 🔍 Zoom out: later layers see a larger effective receptive field
  • ⚡ Faster: fewer values to store and process

Max Pooling

Takes the maximum value in each local window; this is the most common pooling operation.

Example: 2×2 Max Pooling with stride 2

Input (4×4):
 1  2 |  5  6
 3  4 |  7  8
------+------
 9 10 | 13 14
11 12 | 15 16

Output (2×2):
4 8
12 16

Each 2×2 window keeps only its maximum, so the position with the strongest activation "wins".
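The worked example above can be reproduced with a minimal NumPy sketch (the function name `max_pool2d` is just illustrative, not a library API):

```python
import numpy as np

def max_pool2d(x, pool=2, stride=2):
    """Max pooling over a single 2-D feature map."""
    h, w = x.shape
    h_out = (h - pool) // stride + 1
    w_out = (w - pool) // stride + 1
    out = np.empty((h_out, w_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            # Slice the pool×pool window and keep its maximum
            window = x[i*stride : i*stride + pool,
                       j*stride : j*stride + pool]
            out[i, j] = window.max()
    return out

x = np.array([[ 1,  2,  5,  6],
              [ 3,  4,  7,  8],
              [ 9, 10, 13, 14],
              [11, 12, 15, 16]])
print(max_pool2d(x))
# → [[ 4  8]
#    [12 16]]
```

Real frameworks vectorize this and pool each channel independently; the double loop here is only for clarity.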

Average Pooling

Takes the average of the values in each window, producing a smoother result; it is less common than max pooling.
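Average pooling is the same sliding-window scheme with `mean` in place of `max`. A minimal sketch (again, `avg_pool2d` is an illustrative name, not a library function), applied to the same 4×4 input as above:

```python
import numpy as np

def avg_pool2d(x, pool=2, stride=2):
    """Average pooling over a single 2-D feature map."""
    h, w = x.shape
    h_out = (h - pool) // stride + 1
    w_out = (w - pool) // stride + 1
    out = np.empty((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            # Average instead of max: every value in the window contributes
            out[i, j] = x[i*stride : i*stride + pool,
                          j*stride : j*stride + pool].mean()
    return out

x = np.array([[ 1.,  2,  5,  6],
              [ 3,  4,  7,  8],
              [ 9, 10, 13, 14],
              [11, 12, 15, 16]])
print(avg_pool2d(x))
# → [[ 2.5  6.5]
#    [10.5 14.5]]
```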

Parameters

Pooling layers have no learned weights; their behavior is set by two hyperparameters: the pool (window) size $P$ and the stride $S$.

Effect on Dimensions

For a feature map of size $H \times W$, with pool size $P$ and stride $S$:

$$H_{out} = \left\lfloor \frac{H - P}{S} \right\rfloor + 1$$ $$W_{out} = \left\lfloor \frac{W - P}{S} \right\rfloor + 1$$

Example: 28×28 input, 2×2 pooling, stride 2 → 14×14 output
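The output-size formula translates directly into a one-line helper (a sketch; the function name is illustrative):

```python
def pool_output_size(size, pool, stride):
    """Apply H_out = floor((H - P) / S) + 1 to one spatial dimension."""
    return (size - pool) // stride + 1

# The example from the text: 28×28 input, 2×2 pooling, stride 2
print(pool_output_size(28, pool=2, stride=2))  # → 14
```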

Typical Architecture

In classic CNN designs, pooling alternates with convolution:

Conv → Conv → Pool → Conv → Conv → Pool → …

Pattern: each pooling step halves the spatial dimensions, allowing deeper layers to see larger context.
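The halving pattern can be traced with the dimension formula. This sketch assumes a hypothetical stack of same-padded 3×3 convolutions (which keep the spatial size) followed by 2×2/stride-2 pools:

```python
# Trace the spatial size through three conv+pool blocks
size = 32  # assumed input resolution, e.g. a 32×32 image
for block in range(3):
    # Same-padded 3×3 conv: spatial size unchanged
    # 2×2 max pool, stride 2: H_out = (H - 2) // 2 + 1
    size = (size - 2) // 2 + 1
    print(f"after block {block + 1}: {size}x{size}")
# → after block 1: 16x16
# → after block 2: 8x8
# → after block 3: 4x4
```

Each block halves both spatial dimensions, so the feature map shrinks geometrically while channel depth typically grows.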

Modern Trends

Many recent architectures use pooling sparingly: strided convolutions can learn their own downsampling, and a single global average pooling often replaces fully connected layers before the classifier.

Learn More

→ Convolution Operation
→ Neural Architecture