Topic 3: Attention Mechanisms & Transformers
🎯 Research Topic Guide
Complete resource guide for Attention & Transformers research
📚 What to Learn
Core Concepts
- Attention mechanism: How attention works
- Self-attention: Attention within sequences
- Multi-head attention: Multiple attention heads
- Transformer architecture: Encoder-decoder structure
- Vision Transformers: Transformers for images
- Efficient Transformers: Optimized variants
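The first three concepts above can be made concrete with a small sketch. The following is a minimal pure-Python illustration of scaled dot-product attention as defined in "Attention Is All You Need", assuming no learned projections and a toy three-token sequence. The function names and the `tokens` data are illustrative and not taken from any library.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of vectors.

    Each query is scored against every key, the scores are divided by
    sqrt(d_k), softmax turns them into weights, and each output is the
    weighted sum of the value vectors.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        w = softmax(scores)
        out = [sum(wj * v[i] for wj, v in zip(w, values))
               for i in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Self-attention: queries, keys, and values all come from the same sequence.
# Real layers first apply learned projections W_Q, W_K, W_V (omitted here).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # illustrative toy data
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each row of `weights` sums to 1 and says how much one token attends to every other token. Multi-head attention runs several such computations in parallel, each on its own learned projection of the inputs, and concatenates the results.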
Key Skills
- Understanding attention mechanisms
- Implementing transformers
- Vision transformer architectures
- Efficient transformer methods
- Pre-training and fine-tuning transformers
Learning Path
Start with attention basics, then study the original Transformer, then Vision Transformers, and finish with efficient variants.
📖 Survey Papers (Start Here!)
📋 Essential Survey Papers
- "Attention Mechanisms in Computer Vision: A Survey" (2022)
  - Authors: Guo, Xu, Liu, et al.
  - Link: https://arxiv.org/abs/2111.07624
  - Why: Comprehensive survey of attention in vision models
  - Difficulty: Intermediate
- "Efficient Transformers: A Survey" (2020)
  - Authors: Tay, Dehghani, Bahri, Metzler
  - Link: https://arxiv.org/abs/2009.06732
  - Why: Survey of efficient transformer variants
  - Difficulty: Intermediate
- "A Survey of Transformers" (2022)
  - Authors: Lin, Wang, Liu, et al.
  - Link: https://arxiv.org/abs/2106.04554
  - Why: Comprehensive transformer survey
  - Difficulty: Intermediate
Start with Surveys
Surveys help you understand the landscape before diving into specific architectures.
🏛️ Classic Papers (Must Read)
⭐ Foundational Papers
- "Attention Is All You Need" (2017) - Transformer
  - Authors: Vaswani, Shazeer, Parmar, et al.
  - Link: https://arxiv.org/abs/1706.03762
  - Code: https://github.com/tensorflow/tensor2tensor
  - Impact: Introduced the Transformer architecture, built entirely on attention
  - Difficulty: Medium (but essential)
- "Squeeze-and-Excitation Networks" (2017)
  - Authors: Hu, Shen, Sun
  - Link: https://arxiv.org/abs/1709.01507
  - Code: https://github.com/hujie-frank/SENet
  - Impact: Popularized channel attention in CNNs
  - Difficulty: Easy-Medium
- "CBAM: Convolutional Block Attention Module" (2018)
  - Authors: Woo, Park, Lee, Kweon
  - Link: https://arxiv.org/abs/1807.06521
  - Code: https://github.com/Jongchan/attention-module
  - Impact: Easy to add to any CNN
  - Difficulty: Easy
🚀 Modern Papers (Recent & Important)
🔥 Recent Important Papers
- "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (2020) - ViT
  - Authors: Dosovitskiy, Beyer, Kolesnikov, et al.
  - Link: https://arxiv.org/abs/2010.11929
  - Code: https://github.com/google-research/vision_transformer
  - Venue: ICLR 2021
  - Difficulty: Medium
- "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" (2021)
  - Authors: Liu, Lin, Cao, et al.
  - Link: https://arxiv.org/abs/2103.14030
  - Code: https://github.com/microsoft/Swin-Transformer
  - Venue: ICCV 2021
  - Difficulty: Medium
- "EfficientNetV2: Smaller Models and Faster Training" (2021)
  - Authors: Tan, Le
  - Link: https://arxiv.org/abs/2104.00298
  - Code: https://github.com/google/automl/tree/master/efficientnetv2
  - Venue: ICML 2021
  - Difficulty: Medium
- "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer" (2021)
  - Authors: Mehta, Rastegari
  - Link: https://arxiv.org/abs/2110.02178
  - Code: https://github.com/apple/ml-cvnets
  - Venue: ICLR 2022
  - Difficulty: Medium
📝 Tutorial Papers (Beginner-Friendly)
🎓 Tutorial & Educational Papers
- "The Illustrated Transformer" - Jay Alammar
  - Link: http://jalammar.github.io/illustrated-transformer/
  - Why: Best visual explanation of transformers
  - Difficulty: Beginner-friendly
- "The Annotated Transformer" - Harvard NLP
  - Link: http://nlp.seas.harvard.edu/annotated-transformer/
  - Why: Code + explanation
  - Difficulty: Intermediate
- "Vision Transformer Explained" - Papers With Code
  - Link: https://paperswithcode.com/method/vision-transformer
  - Why: Clear explanation with code
  - Difficulty: Beginner-friendly
- "PyTorch Transformer Tutorial"
  - Link: https://pytorch.org/tutorials/beginner/transformer_tutorial.html
  - Why: Official PyTorch tutorial
  - Difficulty: Intermediate
💻 Code Implementation Papers
🔧 Papers with Excellent Code
- "Transformer" (Original)
  - Code: https://github.com/tensorflow/tensor2tensor
  - Framework: TensorFlow
  - Quality: Official
- "Vision Transformer (ViT)"
  - Code: https://github.com/google-research/vision_transformer
  - Framework: JAX/Flax
  - Quality: Official, well-documented
- "Swin Transformer"
  - Code: https://github.com/microsoft/Swin-Transformer
  - Framework: PyTorch
  - Quality: Official
- "Hugging Face Transformers"
  - Code: https://github.com/huggingface/transformers
  - Framework: PyTorch/TensorFlow
  - Quality: Industry standard, many models
- "CBAM" (Easy to implement)
  - Code: https://github.com/Jongchan/attention-module
  - Framework: PyTorch
  - Quality: Simple, well-documented
- "SE-Net"
  - Code: https://github.com/hujie-frank/SENet
  - Framework: Caffe
  - Quality: Official
Start with CBAM or SE-Net
These are the easiest to understand and implement. Then move on to full transformers.
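To show why SE-Net is a gentle starting point, here is a minimal pure-Python sketch of the squeeze, excitation, and scale steps described in the Squeeze-and-Excitation paper. It assumes a feature map stored as nested lists, omits biases, and uses hand-picked weights in place of learned ones. The name `se_block` and all numbers are illustrative.

```python
import math


def se_block(feature_map, w1, w2):
    """Reweight channels in the style of a Squeeze-and-Excitation block.

    feature_map: list of C channels, each an H x W list of lists.
    w1: (C // r) x C weights, w2: C x (C // r) weights (biases omitted).
    """
    # Squeeze: global average pooling gives one number per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_map]
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * x for x in row] for row in ch]
              for g, ch in zip(gates, feature_map)]
    return scaled, gates


# Two 2x2 channels with reduction ratio r = 2; the weights are hand-picked
# for illustration, whereas a real block learns them during training.
fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, 1.0]]     # shape 1 x 2
w2 = [[1.0], [-1.0]]  # shape 2 x 1
scaled, gates = se_block(fmap, w1, w2)
```

Each gate lies between 0 and 1, so the block can only dampen or preserve a channel. A framework implementation would express the same three steps with pooling and linear layers.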
📊 Where to Track Papers
Paper Discovery Platforms
🔍 Paper Discovery
- Papers With Code - Transformers
  - URL: https://paperswithcode.com/method/transformer
  - Features: Papers with code, leaderboards
  - Best for: Finding implementations
- Papers With Code - Vision Transformer
  - URL: https://paperswithcode.com/method/vision-transformer
  - Features: ViT papers and code
  - Best for: Vision transformer research
- arXiv - Machine Learning
  - URL: https://arxiv.org/list/cs.LG/recent
  - Search: "transformer" OR "attention"
  - Best for: Latest papers
- Google Scholar
  - Search: "transformer" OR "attention mechanism"
  - Best for: Comprehensive search
- Connected Papers
  - Start with: "Attention Is All You Need"
  - Best for: Exploring transformer area
Conference Proceedings
📅 Top Venues
- NeurIPS (December)
  - URL: https://papers.nips.cc/
  - Search: Transformer, attention
  - Best for: Latest transformer research
- ICLR (May)
  - URL: https://openreview.net/group?id=ICLR.cc
  - Search: Transformer, vision transformer
  - Best for: Vision transformers
- CVPR (June)
  - URL: https://openaccess.thecvf.com/CVPR
  - Search: Vision transformer, attention
  - Best for: Vision applications
- ICML (July)
  - URL: https://proceedings.mlr.press/
  - Search: Transformer, efficient transformer
  - Best for: Efficient methods
📥 How to Get Papers
Free Access Methods
🆓 Free Access
- arXiv - All papers free
  - Most transformer papers on arXiv
  - Direct PDF download
- OpenReview - ICLR papers
  - All ICLR papers free
  - Includes reviews
- CVF Open Access - CVPR/ICCV
  - All papers free
  - Direct PDF links
- Google Scholar - PDF links
  - Check "All versions" for free PDFs
- Author Websites
  - Many authors post PDFs
  - Check personal pages
Getting Papers
Most transformer papers are on arXiv and free to download.
📚 Learning Resources
Courses & Tutorials
🎓 Courses
- CS224n - Stanford NLP
  - URL: https://web.stanford.edu/class/cs224n/
  - Focus: Transformers for NLP
  - Level: Intermediate
- CS231n - Stanford Vision
  - URL: https://cs231n.stanford.edu/
  - Focus: Vision transformers
  - Level: Intermediate
- Hugging Face Course
  - URL: https://huggingface.co/course
  - Focus: Practical use of transformers
  - Level: Beginner to Intermediate
Blogs & Articles
📰 Blogs
- Jay Alammar's Blog
  - URL: http://jalammar.github.io/
  - Focus: Transformers, BERT, GPT explained
  - Why: Best visual explanations
- Lil'Log by Lilian Weng
  - URL: https://lilianweng.github.io/
  - Focus: Summaries of attention and transformer research
- The Gradient
  - URL: https://thegradient.pub/
  - Focus: Transformer research articles
🎯 Reading Strategy
Week 1: Attention Basics
- Read "Illustrated Transformer" (tutorial)
- Read SE-Net paper (easy)
- Read CBAM paper (easy)
- Implement CBAM or SE-Net
- Read "Attention Is All You Need" (foundational)
Week 2: Vision Transformers
- Read ViT paper
- Read Swin Transformer paper
- Try Hugging Face ViT
- Implement simple ViT
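For the "Implement simple ViT" step, the first piece is turning an image into a sequence of patch tokens. The following is a minimal sketch, assuming a single-channel image stored as nested lists. `patchify` and `num_patches` are hypothetical helper names, not part of any library.

```python
def patchify(image, patch_size):
    """Split an H x W image into flattened, non-overlapping square patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch in row-major order.
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


def num_patches(image_size, patch_size):
    """Number of patch tokens for a square image and square patches."""
    return (image_size // patch_size) ** 2


# Toy 4x4 image with 2x2 patches: 4 patches of 4 values each.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = patchify(image, 2)
```

With a 224x224 input and 16x16 patches, as in the ViT paper, `num_patches(224, 16)` gives 196 patch tokens. A full model would then apply a linear projection to each patch, prepend a class token, and add position embeddings before the Transformer encoder.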
Week 3: Efficient Transformers
- Read EfficientNetV2 (a convolutional network, useful as an efficiency baseline)
- Read MobileViT
- Read efficient transformer survey
- Compare efficiency methods
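One way to start the "Compare efficiency methods" step is to count attention scores. The sketch below contrasts full self-attention, which scores every pair of tokens, with attention restricted to non-overlapping local windows, the idea behind Swin-style window attention. The function names are hypothetical, and the token and window counts are illustrative, chosen to match a 56x56 token grid with 7x7 windows.

```python
def full_attention_scores(num_tokens):
    """Every token attends to every token: quadratic in sequence length."""
    return num_tokens * num_tokens


def windowed_attention_scores(num_tokens, window_tokens):
    """Tokens attend only to tokens inside their own non-overlapping window."""
    if num_tokens % window_tokens:
        raise ValueError("num_tokens must be divisible by window_tokens")
    num_windows = num_tokens // window_tokens
    return num_windows * window_tokens * window_tokens


num_tokens = 56 * 56   # illustrative token grid
window_tokens = 7 * 7  # illustrative window size
full = full_attention_scores(num_tokens)
windowed = windowed_attention_scores(num_tokens, window_tokens)
print(full, windowed, full // windowed)  # 9834496 153664 64
```

This counts only score computations. It ignores projections, feed-forward layers, and the cross-window mixing that windowed designs need, so treat it as a rough first comparison and not a benchmark.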
Week 4: Recent Work
- Follow arXiv for latest
- Read 2-3 recent papers
- Implement one method
- Write summary
Reading Plan
Start with tutorials and easy papers (SE-Net, CBAM), then move to full transformers.
🔔 Stay Updated
RSS Feeds & Alerts
📡 Alerts
- arXiv RSS Feed
  - URL: https://arxiv.org/list/cs.LG/recent
  - Search: "transformer" OR "attention"
  - Check: Daily
- Google Scholar Alerts
  - Setup: Alert for "vision transformer"
  - Frequency: Weekly
- Papers With Code Newsletter
  - URL: https://paperswithcode.com/newsletter
  - Frequency: Weekly
Social Media
📱 Social Tracking
- Twitter/X
  - Follow: @paperswithcode, @huggingface
  - Hashtag: #Transformers, #VisionTransformer
- Reddit
  - r/MachineLearning
  - Search: Transformer, attention
📋 To-Do Checklist
Beginner Level
- Read "Illustrated Transformer" tutorial
- Read SE-Net paper (easy)
- Read CBAM paper (easy)
- Implement CBAM or SE-Net
- Read "Attention Is All You Need"
Intermediate Level
- Read Vision Transformer (ViT) paper
- Read Swin Transformer paper
- Use Hugging Face transformers
- Implement simple ViT
- Read 5 recent papers
Advanced Level
- Read efficient transformer papers
- Implement efficient variants
- Read foundation model papers
- Contribute to open-source
- Write transformer review
🔗 Quick Links
- Papers With Code: https://paperswithcode.com/method/transformer
- Hugging Face: https://huggingface.co/
- Illustrated Transformer: http://jalammar.github.io/illustrated-transformer/
- ViT Code: https://github.com/google-research/vision_transformer
- Swin Transformer: https://github.com/microsoft/Swin-Transformer
Next Steps: Start with the "Illustrated Transformer" tutorial, then read the SE-Net and CBAM papers (easy), then the full Transformer paper.