Topic 3: Attention Mechanisms & Transformers

🎯 Research Topic Guide

Complete resource guide for Attention & Transformers research

📚 What to Learn

Core Concepts

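Attention mechanism: a weighted sum of values, with weights computed from query-key similarity
Self-attention: queries, keys, and values all come from the same sequence
Multi-head attention: several attention operations run in parallel on different learned projections
Transformer architecture: encoder-decoder stacks of attention and feed-forward layers
Vision Transformers: transformers applied to images split into patches
Efficient Transformers: variants that reduce the quadratic cost of attention in sequence length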
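As a concrete reference for the first three concepts, here is a minimal NumPy sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V as defined in "Attention Is All You Need", wrapped in multi-head self-attention. The projection matrices are random and untrained, and the sizes (5 tokens, d_model = 16, 4 heads) are arbitrary illustrative choices; masking, dropout, and biases are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for q: (..., n_q, d_k), k: (..., n_k, d_k), v: (..., n_k, d_v)."""
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)   # (..., n_q, n_k)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    return weights @ v, weights

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention: queries, keys and values are all projections of the same x (n, d_model)."""
    n, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):
        # (n, d_model) -> (num_heads, n, d_head)
        return t.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    heads, weights = scaled_dot_product_attention(q, k, v)   # (num_heads, n, d_head)
    # Concatenate the heads back to (n, d_model) and apply the output projection
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)
    return concat @ w_o, weights

rng = np.random.default_rng(0)
n, d_model, num_heads = 5, 16, 4
x = rng.standard_normal((n, d_model))
# Random (untrained) projection matrices, for illustration only
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(4))
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, attn.shape)  # (5, 16) (4, 5, 5)
```

Each head works on a 4-dimensional slice of the projected queries, keys, and values, so the heads can attend to different positions while the total cost stays close to that of one full-width head.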

Key Skills

Understanding attention mechanisms
Implementing transformers
Vision transformer architectures
Efficient transformer methods
Pre-training and fine-tuning transformers

Learning Path

Start with attention basics, then the original Transformer, then Vision Transformers, and finally efficient variants.

📖 Survey Papers (Start Here!)

📋 Essential Survey Papers

"Attention Mechanisms in Computer Vision: A Survey" (2022)
- Authors: Guo, Xu, Liu, et al.
- Link: https://arxiv.org/abs/2111.07624
- Why: Comprehensive vision attention survey
- Difficulty: Intermediate
"Efficient Transformers: A Survey" (2020)
- Authors: Tay, Dehghani, Bahri, Metzler
- Link: https://arxiv.org/abs/2009.06732
- Why: Survey of efficient transformer variants
- Difficulty: Intermediate
"A Survey of Transformers" (2022)
- Authors: Lin, Wang, Liu, et al.
- Link: https://arxiv.org/abs/2106.04554
- Why: Comprehensive transformer survey
- Difficulty: Intermediate

Start with Surveys

Surveys help you understand the landscape before diving into specific architectures.

🏛️ Classic Papers (Must Read)

⭐ Foundational Papers

"Attention Is All You Need" (2017) - Transformer
- Authors: Vaswani, Shazeer, Parmar, et al.
- Link: https://arxiv.org/abs/1706.03762
- Code: https://github.com/tensorflow/tensor2tensor
- Impact: Started transformer revolution
- Difficulty: Medium (but essential)
"Squeeze-and-Excitation Networks" (2017)
- Authors: Hu, Shen, Sun
- Link: https://arxiv.org/abs/1709.01507
- Code: https://github.com/hujie-frank/SENet
- Impact: Popularized channel attention in CNNs
- Difficulty: Easy-Medium
"CBAM: Convolutional Block Attention Module" (2018)
- Authors: Woo, Park, Lee, Kweon
- Link: https://arxiv.org/abs/1807.06521
- Code: https://github.com/Jongchan/attention-module
- Impact: Easy to add to any CNN
- Difficulty: Easy
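Of the foundational papers, SE-Net's block is the quickest to write down. The NumPy sketch below follows the three steps described in the paper for a single (C, H, W) feature map: squeeze (global average pooling), excitation (a two-layer bottleneck with ReLU then sigmoid), and scale (reweighting each channel). The weights are random and the sizes are illustrative assumptions; the reduction ratio r = 16 matches the paper's default setting.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation on one feature map x of shape (C, H, W)."""
    z = x.mean(axis=(1, 2))              # squeeze: global average pool -> (C,)
    h = np.maximum(0.0, w1 @ z + b1)     # excitation FC 1 + ReLU -> (C // r,)
    s = sigmoid(w2 @ h + b2)             # excitation FC 2 + sigmoid -> (C,), values in (0, 1)
    return x * s[:, None, None], s       # scale: reweight each channel by its gate

rng = np.random.default_rng(0)
C, H, W, r = 32, 8, 8, 16                # r = reduction ratio of the bottleneck
x = rng.standard_normal((C, H, W))
w1, b1 = rng.standard_normal((C // r, C)) * 0.1, np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)) * 0.1, np.zeros(C)
y, s = se_block(x, w1, b1, w2, b2)
```

Because the block only rescales channels, its output has the same shape as its input, which is why it can be dropped into an existing CNN after almost any convolutional stage.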

🚀 Modern Papers (Recent & Important)

🔥 Recent Important Papers

"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (2020) - ViT
- Authors: Dosovitskiy, Beyer, Kolesnikov, et al.
- Link: https://arxiv.org/abs/2010.11929
- Code: https://github.com/google-research/vision_transformer
- Venue: ICLR 2021
- Difficulty: Medium
"Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" (2021)
- Authors: Liu, Lin, Cao, et al.
- Link: https://arxiv.org/abs/2103.14030
- Code: https://github.com/microsoft/Swin-Transformer
- Venue: ICCV 2021
- Difficulty: Medium
"EfficientNetV2: Smaller Models and Faster Training" (2021)
- Authors: Tan, Le
- Link: https://arxiv.org/abs/2104.00298
- Code: https://github.com/google/automl/tree/master/efficientnetv2
- Venue: ICML 2021
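- Difficulty: Medium
- Note: A convolutional network (its MBConv stages use SE attention), not a transformer; useful as an efficiency baseline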
"MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer" (2021)
- Authors: Mehta, Rastegari
- Link: https://arxiv.org/abs/2110.02178
- Code: https://github.com/apple/ml-cvnets
- Venue: ICLR 2022
- Difficulty: Medium
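Swin Transformer restricts self-attention to non-overlapping M x M windows. In alternating blocks it cyclically shifts the feature map by floor(M/2) positions so that information crosses window borders. The NumPy sketch below shows the partition, its inverse, and the shift (done with np.roll). Attention inside each window would reuse ordinary self-attention, and the mask that the shifted configuration needs for wrapped-around regions is omitted. The sizes are illustrative.

```python
import numpy as np

def window_partition(x, m):
    """Split an (H, W, C) map into non-overlapping m x m windows -> (num_windows, m*m, C)."""
    h, w, c = x.shape
    x = x.reshape(h // m, m, w // m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, m * m, c)

def window_reverse(windows, m, h, w):
    """Inverse of window_partition: (num_windows, m*m, C) -> (H, W, C)."""
    c = windows.shape[-1]
    x = windows.reshape(h // m, w // m, m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)

H, W, C, M = 8, 8, 3, 4
x = np.arange(H * W * C, dtype=float).reshape(H, W, C)

# Regular windows (W-MSA blocks): tokens attend only within their own window
windows = window_partition(x, M)

# Shifted windows (SW-MSA blocks): roll the map by -floor(M/2) on both spatial axes first
shifted = np.roll(x, shift=(-(M // 2), -(M // 2)), axis=(0, 1))
shifted_windows = window_partition(shifted, M)
print(windows.shape)  # (4, 16, 3)
```

After attention, the shifted result is mapped back with window_reverse and rolled by +floor(M/2) to restore the original token positions.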

📝 Tutorial Papers (Beginner-Friendly)

🎓 Tutorial & Educational Papers

"The Illustrated Transformer" - Jay Alammar
- Link: http://jalammar.github.io/illustrated-transformer/
- Why: Best visual explanation of transformers
- Difficulty: Beginner-friendly
"The Annotated Transformer" - Harvard NLP
- Link: http://nlp.seas.harvard.edu/annotated-transformer/
- Why: Code + explanation
- Difficulty: Intermediate
"Vision Transformer Explained" - Papers With Code
- Link: https://paperswithcode.com/method/vision-transformer
- Why: Clear explanation with code
- Difficulty: Beginner-friendly
"PyTorch Transformer Tutorial"
- Link: https://pytorch.org/tutorials/beginner/transformer_tutorial.html
- Why: Official PyTorch tutorial
- Difficulty: Intermediate

💻 Code Implementation Papers

🔧 Papers with Excellent Code

"Transformer" (Original)
- Code: https://github.com/tensorflow/tensor2tensor
- Framework: TensorFlow
- Quality: Official (now in maintenance mode; the maintainers recommend the successor library Trax)
"Vision Transformer (ViT)"
- Code: https://github.com/google-research/vision_transformer
- Framework: JAX/Flax
- Quality: Official, well-documented
"Swin Transformer"
- Code: https://github.com/microsoft/Swin-Transformer
- Framework: PyTorch
- Quality: Official
"Hugging Face Transformers"
- Code: https://github.com/huggingface/transformers
- Framework: PyTorch/TensorFlow
- Quality: Industry standard, many models
"CBAM" (Easy to implement)
- Code: https://github.com/Jongchan/attention-module
- Framework: PyTorch
- Quality: Simple, well-documented
"SE-Net"
- Code: https://github.com/hujie-frank/SENet
- Framework: Caffe
- Quality: Official

Start with CBAM or SE-Net

These are the easiest to understand and implement. Then move on to full transformers.
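For CBAM, the paper's formulation also fits in a short NumPy sketch. Channel attention applies a shared MLP to the average- and max-pooled channel descriptors, adds the two results, and applies a sigmoid. Spatial attention then runs a 7x7 convolution over the channel-wise average and max maps and applies another sigmoid. The two are applied in sequence, channel first. The explicit-loop convolution, the random bias-free weights, and the sizes (including reduction ratio r = 4) are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(inp, kernel):
    """Single-output 2-D convolution (cross-correlation) with zero 'same' padding.
    inp: (C_in, H, W), kernel: (C_in, k, k) -> (H, W)."""
    _, h, w = inp.shape
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(inp, ((0, 0), (p, p), (p, p)))   # zero padding keeps H, W unchanged
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return out

def channel_attention(f, w0, w1):
    """sigmoid(MLP(avgpool(F)) + MLP(maxpool(F))) with one shared MLP; f: (C, H, W) -> (C,)."""
    mlp = lambda z: w1 @ np.maximum(0.0, w0 @ z)
    return sigmoid(mlp(f.mean(axis=(1, 2))) + mlp(f.max(axis=(1, 2))))

def spatial_attention(f, kernel):
    """sigmoid(conv7x7([avg over channels; max over channels])); f: (C, H, W) -> (H, W)."""
    pooled = np.stack([f.mean(axis=0), f.max(axis=0)])   # (2, H, W)
    return sigmoid(conv2d_same(pooled, kernel))

def cbam(f, w0, w1, kernel):
    f = f * channel_attention(f, w0, w1)[:, None, None]   # refine channels first...
    return f * spatial_attention(f, kernel)[None, :, :]   # ...then spatial positions

rng = np.random.default_rng(0)
C, H, W, r = 16, 10, 10, 4
f = rng.standard_normal((C, H, W))
w0 = rng.standard_normal((C // r, C)) * 0.1     # shared MLP, layer 1
w1 = rng.standard_normal((C, C // r)) * 0.1     # shared MLP, layer 2
kernel = rng.standard_normal((2, 7, 7)) * 0.1   # 7x7 kernel over the [avg; max] maps
out = cbam(f, w0, w1, kernel)
```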

📊 Where to Track Papers

Paper Discovery Platforms

🔍 Paper Discovery

Papers With Code - Transformers
- URL: https://paperswithcode.com/method/transformer
- Features: Papers with code, leaderboards
- Best for: Finding implementations
Papers With Code - Vision Transformer
- URL: https://paperswithcode.com/method/vision-transformer
- Features: ViT papers and code
- Best for: Vision transformer research
arXiv - Machine Learning
- URL: https://arxiv.org/list/cs.LG/recent
- Search: "transformer" OR "attention"
- Best for: Latest papers
Google Scholar
- Search: "transformer" OR "attention mechanism"
- Best for: Comprehensive search
Connected Papers
- Start with: "Attention Is All You Need"
- Best for: Exploring transformer area
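The "transformer" OR "attention" search can also be run against arXiv's public query API (http://export.arxiv.org/api/query). The API accepts field prefixes such as cat: and abs:, combined with AND/OR. This standard-library sketch builds such a query and parses the Atom feed the API returns. The helper names and the choice to match on abstracts are assumptions made for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

def arxiv_query_url(terms, category="cs.LG", max_results=20):
    """Hypothetical helper: newest papers in `category` whose abstract mentions any term."""
    any_term = " OR ".join(f"abs:{t}" for t in terms)
    params = {
        "search_query": f"cat:{category} AND ({any_term})",
        "sortBy": "submittedDate",   # newest submissions first
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

def fetch_titles(url):
    """Hypothetical helper: download the Atom feed and return whitespace-normalized entry titles."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        root = ET.fromstring(resp.read())
    return [" ".join(e.findtext("atom:title", default="", namespaces=ATOM).split())
            for e in root.findall("atom:entry", ATOM)]

url = arxiv_query_url(["transformer", "attention"])
# titles = fetch_titles(url)  # requires network access
```

arXiv's API guidelines ask clients to leave a few seconds between consecutive requests, so poll infrequently.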

Conference Proceedings

📅 Top Venues

NeurIPS (December)
- URL: https://papers.nips.cc/
- Search: Transformer, attention
- Best for: Latest transformer research
ICLR (May)
- URL: https://openreview.net/group?id=ICLR.cc
- Search: Transformer, vision transformer
- Best for: Vision transformers
CVPR (June)
- URL: https://openaccess.thecvf.com/CVPR
- Search: Vision transformer, attention
- Best for: Vision applications
ICML (July)
- URL: https://proceedings.mlr.press/
- Search: Transformer, efficient transformer
- Best for: Efficient methods

📥 How to Get Papers

Free Access Methods

🆓 Free Access

arXiv - All papers free
- Most transformer papers on arXiv
- Direct PDF download
OpenReview - ICLR papers
- All ICLR papers free
- Includes reviews
CVF Open Access - CVPR/ICCV
- All papers free
- Direct PDF links
Google Scholar - PDF links
- Check "All versions" for free PDFs
Author Websites
- Many authors post PDFs
- Check personal pages

Getting Papers

Most transformer papers are on arXiv (free). Very accessible.

📚 Learning Resources

Courses & Tutorials

🎓 Courses

CS224n - Stanford NLP
- URL: https://web.stanford.edu/class/cs224n/
- Focus: Transformers for NLP
- Level: Intermediate
CS231n - Stanford Vision
- URL: https://cs231n.stanford.edu/
- Focus: Deep learning for computer vision, including attention and vision transformers
- Level: Intermediate
Hugging Face Course
- URL: https://huggingface.co/course
- Focus: Transformers practical
- Level: Beginner to Intermediate

Blogs & Articles

📰 Blogs

Jay Alammar's Blog
- URL: http://jalammar.github.io/
- Focus: Transformers, BERT, GPT explained
- Why: Best visual explanations
Lil'Log by Lilian Weng
- URL: https://lilianweng.github.io/
- Focus: Attention, transformers summaries
The Gradient
- URL: https://thegradient.pub/
- Focus: Transformer research articles

🎯 Reading Strategy

Week 1: Attention Basics

Read "Illustrated Transformer" (tutorial)
Read SE-Net paper (easy)
Read CBAM paper (easy)
Implement CBAM or SE-Net
Read "Attention Is All You Need" (foundational)

Week 2: Vision Transformers

Read ViT paper
Read Swin Transformer paper
Try Hugging Face ViT
Implement simple ViT
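For the "implement simple ViT" step, it can help to see the whole forward pass in one place before using a library model. This NumPy sketch follows the ViT recipe: split the image into patches, embed them linearly, prepend a [class] token, add learned position embeddings, run a pre-norm encoder block, and classify from the [class] token. It is deliberately tiny and untrained. It uses one block and a single attention head, leaves out the learnable LayerNorm parameters, and uses illustrative sizes (32x32 RGB input, 8x8 patches).

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, p):
    """(H, W, C) image -> (num_patches, p*p*C) flattened non-overlapping patches."""
    h, w, c = img.shape
    x = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * c)

def layer_norm(x, eps=1e-6):
    # Normalize over the feature axis; learnable scale/shift omitted for brevity
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

class TinyViT:
    """Untrained, single-head, one-block ViT forward pass (illustrative sizes)."""

    def __init__(self, img_size=32, patch=8, channels=3, dim=64, mlp_dim=128, num_classes=10):
        n = (img_size // patch) ** 2
        init = lambda *shape: rng.standard_normal(shape) * 0.02
        self.patch = patch
        self.w_embed = init(patch * patch * channels, dim)   # linear patch projection
        self.cls = init(1, dim)                              # learnable [class] token
        self.pos = init(n + 1, dim)                          # learnable 1-D position embeddings
        self.w_qkv, self.w_o = init(dim, 3 * dim), init(dim, dim)
        self.w_mlp1, self.w_mlp2 = init(dim, mlp_dim), init(mlp_dim, dim)
        self.w_head = init(dim, num_classes)

    def __call__(self, img):
        x = patchify(img, self.patch) @ self.w_embed          # (N, dim)
        x = np.concatenate([self.cls, x]) + self.pos          # (N + 1, dim)
        # Pre-norm encoder block: x + Attn(LN(x)), then x + MLP(LN(x))
        q, k, v = np.split(layer_norm(x) @ self.w_qkv, 3, axis=-1)
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        x = x + (attn @ v) @ self.w_o
        x = x + gelu(layer_norm(x) @ self.w_mlp1) @ self.w_mlp2
        return layer_norm(x[0]) @ self.w_head                 # classify from the [class] token

model = TinyViT()
logits = model(rng.standard_normal((32, 32, 3)))
print(logits.shape)  # (10,)
```

A real ViT stacks many such blocks with multi-head attention and is trained on large datasets. The sketch is only meant to make the data flow and tensor shapes explicit.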

Week 3: Efficient Transformers

Read EfficientNetV2
Read MobileViT
Read efficient transformer survey
Compare efficiency methods
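One concrete way to compare efficiency methods is to count attention cost. The Swin Transformer paper considers an h x w grid of patch tokens with C channels, ignoring softmax. It gives 4hwC^2 + 2(hw)^2 C for global multi-head self-attention and 4hwC^2 + 2M^2 hwC for attention inside M x M windows. The sketch evaluates both at Swin-T's first-stage size (56 x 56 tokens, C = 96, M = 7 for a 224 x 224 input).

```python
def msa_cost(h, w, c):
    """Global multi-head self-attention on h*w tokens with c channels: 4hwC^2 + 2(hw)^2 C."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c

def wmsa_cost(h, w, c, m):
    """Self-attention inside non-overlapping m x m windows: 4hwC^2 + 2 M^2 hw C."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c

# Swin-T stage 1 for a 224x224 image: 4x4 patches -> 56x56 tokens, C = 96, window M = 7
h, w, c, m = 56, 56, 96, 7
full, windowed = msa_cost(h, w, c), wmsa_cost(h, w, c, m)
print(f"MSA: {full / 1e9:.2f}G  W-MSA: {windowed / 1e9:.2f}G  ratio: {full / windowed:.1f}x")
# MSA: 2.00G  W-MSA: 0.15G  ratio: 13.8x
```

The global cost grows with (hw)^2, while the windowed cost is linear in hw for a fixed M. Doubling the resolution in each direction therefore multiplies the window-attention cost by exactly 4, but the global cost by about 15.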

Week 4: Recent Work

Follow arXiv for latest
Read 2-3 recent papers
Implement one method
Write summary

Reading Plan

Start with tutorials and easy papers (SE-Net, CBAM), then move to full transformers.

🔔 Stay Updated

RSS Feeds & Alerts

📡 Alerts

arXiv RSS Feed
- URL: https://rss.arxiv.org/rss/cs.LG (the listing page https://arxiv.org/list/cs.LG/recent is not itself a feed)
- Filter for: "transformer" OR "attention"
- Check: Daily
Google Scholar Alerts
- Setup: Alert for "vision transformer"
- Frequency: Weekly
Papers With Code Newsletter
- URL: https://paperswithcode.com/newsletter
- Frequency: Weekly
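A feed cannot be searched directly, but the keyword filter ("transformer" OR "attention") can be applied locally after downloading it, for example with urllib.request. This standard-library sketch scans a generic RSS 2.0 document for <item> elements whose title or description mentions any keyword, case-insensitively. The function name is hypothetical, and the two-item feed is a made-up placeholder, not real arXiv output.

```python
import xml.etree.ElementTree as ET

def matching_titles(rss_xml, keywords):
    """Hypothetical helper: titles of RSS 2.0 <item>s whose title or description mentions a keyword."""
    root = ET.fromstring(rss_xml)
    kws = [k.lower() for k in keywords]
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        text = (title + " " + item.findtext("description", default="")).lower()
        if any(k in text for k in kws):
            hits.append(title)
    return hits

# Made-up placeholder feed standing in for the downloaded feed text
sample = """<rss version="2.0"><channel>
<item><title>Example paper on sparse attention</title><description>Placeholder text.</description></item>
<item><title>Example paper on graph coloring</title><description>No keywords here.</description></item>
</channel></rss>"""
print(matching_titles(sample, ["transformer", "attention"]))  # ['Example paper on sparse attention']
```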

📱 Social Tracking

Twitter/X
- Follow: @paperswithcode, @huggingface
- Hashtag: #Transformers, #VisionTransformer
Reddit
- r/MachineLearning
- Search: Transformer, attention

📋 To-Do Checklist

Beginner Level

Read "Illustrated Transformer" tutorial
Read SE-Net paper (easy)
Read CBAM paper (easy)
Implement CBAM or SE-Net
Read "Attention Is All You Need"

Intermediate Level

Advanced Level

🔗 Quick Links

Papers With Code: https://paperswithcode.com/method/transformer
Hugging Face: https://huggingface.co/
Illustrated Transformer: http://jalammar.github.io/illustrated-transformer/
ViT Code: https://github.com/google-research/vision_transformer
Swin Transformer: https://github.com/microsoft/Swin-Transformer

Next Steps: Start with "Illustrated Transformer" tutorial, then read SE-Net/CBAM (easy), then full Transformer paper.