Skip to content

Topic 3: Attention Mechanisms & Transformers

🎯 Research Topic Guide

Complete resource guide for Attention & Transformers research

📚 What to Learn

Core Concepts

  • Attention mechanism: How attention works
  • Self-attention: Attention within sequences
  • Multi-head attention: Multiple attention heads
  • Transformer architecture: Encoder-decoder structure
  • Vision Transformers: Transformers for images
  • Efficient Transformers: Optimized variants

Key Skills

  • Understanding attention mechanisms
  • Implementing transformers
  • Vision transformer architectures
  • Efficient transformer methods
  • Pre-training and fine-tuning transformers

Learning Path

Start with attention basics, then Transformer, then Vision Transformers, finally efficient variants.

📖 Survey Papers (Start Here!)

📋 Essential Survey Papers
  1. "Attention Mechanisms in Computer Vision: A Survey" (2022)

    • Authors: Guo, Han, Cheng, et al.
    • Link: https://arxiv.org/abs/2111.07624
    • Why: Comprehensive vision attention survey
    • Difficulty: Intermediate
  2. "Efficient Transformers: A Survey" (2020)

    • Authors: Tay, Dehghani, Bahri, Metzler
    • Link: https://arxiv.org/abs/2009.06732
    • Why: Survey of efficient transformer variants
    • Difficulty: Intermediate
  3. "A Survey of Transformers" (2022)

    • Authors: Lin, Wang, Liu, et al.
    • Link: https://arxiv.org/abs/2106.04554
    • Why: Comprehensive transformer survey
    • Difficulty: Intermediate

Start with Surveys

Surveys help understand the landscape before diving into specific architectures.

🏛️ Classic Papers (Must Read)

⭐ Foundational Papers
  1. "Attention Is All You Need" (2017) - Transformer

    • Authors: Vaswani, Shazeer, Parmar, et al.
    • Link: https://arxiv.org/abs/1706.03762
    • Code: https://github.com/tensorflow/tensor2tensor
    • Impact: Started transformer revolution
    • Difficulty: Medium (but essential)
  2. "Squeeze-and-Excitation Networks" (2017)

    • Authors: Hu, Shen, Sun
    • Link: https://arxiv.org/abs/1709.01507
    • Code: https://github.com/hujie-frank/SENet
    • Impact: Popular attention in CNNs
    • Difficulty: Easy-Medium
  3. "CBAM: Convolutional Block Attention Module" (2018)

    • Authors: Woo, Park, Lee, Kweon
    • Link: https://arxiv.org/abs/1807.06521
    • Code: https://github.com/Jongchan/attention-module
    • Impact: Easy to add to any CNN
    • Difficulty: Easy

🚀 Modern Papers (Recent & Important)

🔥 Recent Important Papers
  1. "An Image is Worth 16x16 Words: Transformers for Image Recognition" (2020) - ViT

    • Authors: Dosovitskiy, Beyer, Kolesnikov, et al.
    • Link: https://arxiv.org/abs/2010.11929
    • Code: https://github.com/google-research/vision_transformer
    • Venue: ICLR 2021
    • Difficulty: Medium
  2. "Swin Transformer: Hierarchical Vision Transformer" (2021)

    • Authors: Liu, Lin, Cao, et al.
    • Link: https://arxiv.org/abs/2103.14030
    • Code: https://github.com/microsoft/Swin-Transformer
    • Venue: ICCV 2021
    • Difficulty: Medium
  3. "EfficientNetV2: Smaller Models and Faster Training" (2021)

    • Authors: Tan, Le
    • Link: https://arxiv.org/abs/2104.00298
    • Code: https://github.com/google/automl/tree/master/efficientnetv2
    • Venue: ICML 2021
    • Difficulty: Medium
  4. "MobileViT: Light-weight Vision Transformer" (2021)

    • Authors: Mehta, Rastegari
    • Link: https://arxiv.org/abs/2110.02178
    • Code: https://github.com/apple/ml-cvnets
    • Venue: ICLR 2022
    • Difficulty: Medium

📝 Tutorial Papers (Beginner-Friendly)

🎓 Tutorial & Educational Papers
  1. "The Illustrated Transformer" - Jay Alammar

    • Link: http://jalammar.github.io/illustrated-transformer/
    • Why: Best visual explanation of transformers
    • Difficulty: Beginner-friendly
  2. "The Annotated Transformer" - Harvard NLP

    • Link: http://nlp.seas.harvard.edu/annotated-transformer/
    • Why: Code + explanation
    • Difficulty: Intermediate
  3. "Vision Transformer Explained" - Papers With Code

    • Link: https://paperswithcode.com/method/vision-transformer
    • Why: Clear explanation with code
    • Difficulty: Beginner-friendly
  4. "PyTorch Transformer Tutorial"

    • Link: https://pytorch.org/tutorials/beginner/transformer_tutorial.html
    • Why: Official PyTorch tutorial
    • Difficulty: Intermediate

💻 Code Implementation Papers

🔧 Papers with Excellent Code
  1. "Transformer" (Original)

    • Code: https://github.com/tensorflow/tensor2tensor
    • Framework: TensorFlow
    • Quality: Official
  2. "Vision Transformer (ViT)"

    • Code: https://github.com/google-research/vision_transformer
    • Framework: JAX/Flax
    • Quality: Official, well-documented
  3. "Swin Transformer"

    • Code: https://github.com/microsoft/Swin-Transformer
    • Framework: PyTorch
    • Quality: Official
  4. "Hugging Face Transformers"

    • Code: https://github.com/huggingface/transformers
    • Framework: PyTorch/TensorFlow
    • Quality: Industry standard, many models
  5. "CBAM" (Easy to implement)

    • Code: https://github.com/Jongchan/attention-module
    • Framework: PyTorch
    • Quality: Simple, well-documented
  6. "SE-Net"

    • Code: https://github.com/hujie-frank/SENet
    • Framework: PyTorch
    • Quality: Official

Start with CBAM or SE-Net

These are easiest to understand and implement. Then move to full transformers.

📊 Where to Track Papers

Paper Discovery Platforms

🔍 Paper Discovery
  1. Papers With Code - Transformers

    • URL: https://paperswithcode.com/method/transformer
    • Features: Papers with code, leaderboards
    • Best for: Finding implementations
  2. Papers With Code - Vision Transformer

    • URL: https://paperswithcode.com/method/vision-transformer
    • Features: ViT papers and code
    • Best for: Vision transformer research
  3. arXiv - Machine Learning

    • URL: https://arxiv.org/list/cs.LG/recent
    • Search: "transformer" OR "attention"
    • Best for: Latest papers
  4. Google Scholar

    • Search: "transformer" OR "attention mechanism"
    • Best for: Comprehensive search
  5. Connected Papers

    • Start with: "Attention Is All You Need"
    • Best for: Exploring transformer area

Conference Proceedings

📅 Top Venues
  1. NeurIPS (December)
  2. URL: https://papers.nips.cc/
  3. Search: Transformer, attention
  4. Best for: Latest transformer research

  5. ICLR (May)

  6. URL: https://openreview.net/group?id=ICLR.cc
  7. Search: Transformer, vision transformer
  8. Best for: Vision transformers

  9. CVPR (June)

  10. URL: https://openaccess.thecvf.com/CVPR
  11. Search: Vision transformer, attention
  12. Best for: Vision applications

  13. ICML (July)

  14. URL: https://proceedings.mlr.press/
  15. Search: Transformer, efficient transformer
  16. Best for: Efficient methods

📥 How to Get Papers

Free Access Methods

🆓 Free Access
  1. arXiv - All papers free
  2. Most transformer papers on arXiv
  3. Direct PDF download

  4. OpenReview - ICLR papers

  5. All ICLR papers free
  6. Includes reviews

  7. CVF Open Access - CVPR/ICCV

  8. All papers free
  9. Direct PDF links

  10. Google Scholar - PDF links

  11. Check "All versions" for free PDFs

  12. Author Websites

    • Many authors post PDFs
    • Check personal pages

Getting Papers

Most transformer papers are on arXiv (free). Very accessible.

📚 Learning Resources

Courses & Tutorials

🎓 Courses
  1. CS224n - Stanford NLP

    • URL: https://web.stanford.edu/class/cs224n/
    • Focus: Transformers for NLP
    • Level: Intermediate
  2. CS231n - Stanford Vision

    • URL: https://cs231n.stanford.edu/
    • Focus: Vision transformers
    • Level: Intermediate
  3. Hugging Face Course

    • URL: https://huggingface.co/course
    • Focus: Transformers practical
    • Level: Beginner to Intermediate

Blogs & Articles

📰 Blogs
  1. Jay Alammar's Blog

    • URL: http://jalammar.github.io/
    • Focus: Transformers, BERT, GPT explained
    • Why: Best visual explanations
  2. Lil'Log by Lilian Weng

    • URL: https://lilianweng.github.io/
    • Focus: Attention, transformers summaries
  3. The Gradient

    • URL: https://thegradient.pub/
    • Focus: Transformer research articles

🎯 Reading Strategy

Week 1: Attention Basics

  1. Read "Illustrated Transformer" (tutorial)
  2. Read SE-Net paper (easy)
  3. Read CBAM paper (easy)
  4. Implement CBAM or SE-Net
  5. Read "Attention Is All You Need" (foundational)

Week 2: Vision Transformers

  1. Read ViT paper
  2. Read Swin Transformer paper
  3. Try Hugging Face ViT
  4. Implement simple ViT

Week 3: Efficient Transformers

  1. Read EfficientNetV2
  2. Read MobileViT
  3. Read efficient transformer survey
  4. Compare efficiency methods

Week 4: Recent Work

  1. Follow arXiv for latest
  2. Read 2-3 recent papers
  3. Implement one method
  4. Write summary

Reading Plan

Start with tutorials and easy papers (SE-Net, CBAM), then move to full transformers.

🔔 Stay Updated

RSS Feeds & Alerts

📡 Alerts
  1. arXiv RSS Feed

    • URL: https://arxiv.org/list/cs.LG/recent
    • Search: "transformer" OR "attention"
    • Check: Daily
  2. Google Scholar Alerts

    • Setup: Alert for "vision transformer"
    • Frequency: Weekly
  3. Papers With Code Newsletter

    • URL: https://paperswithcode.com/newsletter
    • Frequency: Weekly

Social Media

📱 Social Tracking
  1. Twitter/X

    • Follow: @paperswithcode, @huggingface
    • Hashtag: #Transformers, #VisionTransformer
  2. Reddit

    • r/MachineLearning
    • Search: Transformer, attention

📋 To-Do Checklist

Beginner Level

  • Read "Illustrated Transformer" tutorial
  • Read SE-Net paper (easy)
  • Read CBAM paper (easy)
  • Implement CBAM or SE-Net
  • Read "Attention Is All You Need"

Intermediate Level

  • Read Vision Transformer (ViT) paper
  • Read Swin Transformer paper
  • Use Hugging Face transformers
  • Implement simple ViT
  • Read 5 recent papers

Advanced Level

  • Read efficient transformer papers
  • Implement efficient variants
  • Read foundation model papers
  • Contribute to open-source
  • Write transformer review
  • Papers With Code: https://paperswithcode.com/method/transformer
  • Hugging Face: https://huggingface.co/
  • Illustrated Transformer: http://jalammar.github.io/illustrated-transformer/
  • ViT Code: https://github.com/google-research/vision_transformer
  • Swin Transformer: https://github.com/microsoft/Swin-Transformer

Next Steps: Start with "Illustrated Transformer" tutorial, then read SE-Net/CBAM (easy), then full Transformer paper.