Topic 2: Data Augmentation for Deep Learning
Research Topic Guide
Complete resource guide for Data Augmentation research
What to Learn
Core Concepts
- Geometric transformations: Rotation, translation, scaling
- Color space augmentations: Brightness, contrast, saturation
- Mix-based methods: Mixup, CutMix, AugMix
- Auto-augmentation: Learning augmentation policies
- Adversarial augmentation: Robustness through augmentation
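The first two concept families above can be illustrated with a few lines of array code. The sketch below assumes images are NumPy float arrays of shape (H, W, C) with values in [0, 1]; the function names are hypothetical and do not come from any particular library.

```python
import numpy as np

def horizontal_flip(img):
    """Mirror an (H, W, C) image left to right."""
    return img[:, ::-1, :]

def translate(img, dx, dy):
    """Shift an image by (dx, dy) pixels and fill the exposed border with zeros."""
    h, w = img.shape[:2]
    # Clamp the shift so the slicing below never goes out of range.
    dx = int(np.clip(dx, -w, w))
    dy = int(np.clip(dy, -h, h))
    out = np.zeros_like(img)
    # Source window: the part of the input that stays visible after the shift.
    sx0, sx1 = max(0, -dx), min(w, w - dx)
    sy0, sy1 = max(0, -dy), min(h, h - dy)
    out[sy0 + dy:sy1 + dy, sx0 + dx:sx1 + dx] = img[sy0:sy1, sx0:sx1]
    return out

def adjust_brightness_contrast(img, brightness=0.0, contrast=1.0):
    """Scale contrast around the image mean, add a brightness offset, clip to [0, 1]."""
    mean = img.mean()
    return np.clip((img - mean) * contrast + mean + brightness, 0.0, 1.0)
```

In practice the shift, brightness, and contrast values would be drawn at random for every training sample, for example `translate(img, rng.integers(-4, 5), rng.integers(-4, 5))` with `rng = np.random.default_rng()`.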
Key Skills
- Implementing augmentation pipelines
- Understanding augmentation effects
- Auto-augmentation methods
- Mix-based augmentation techniques
- Evaluation of augmentation strategies
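An augmentation pipeline, the first skill listed above, is usually a chain of callables, some of which fire only with a given probability. The sketch below uses only the standard library; `Compose` and `RandomApply` are hypothetical helpers written for this illustration, loosely modeled on the compose-style interface common in augmentation libraries, and are not any library's actual API.

```python
import random

class RandomApply:
    """Apply `transform` with probability `p`; otherwise return the input unchanged."""

    def __init__(self, transform, p=0.5, rng=None):
        self.transform = transform
        self.p = p
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, x):
        return self.transform(x) if self.rng.random() < self.p else x

class Compose:
    """Run a sequence of transforms in order, feeding each output to the next."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, x):
        for transform in self.transforms:
            x = transform(x)
        return x
```

A pipeline is then built once and called per sample, for example `Compose([RandomApply(flip, p=0.5), normalize])`, where `flip` and `normalize` stand for any functions that take and return a sample.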
Learning Path
Start with simple geometric augmentations, move on to mix-based methods, and finish with auto-augmentation.
Survey Papers (Start Here!)
Essential Survey Papers
-
"A Survey on Image Data Augmentation for Deep Learning" (2019)
- Authors: Shorten, Khoshgoftaar
- Link: https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0197-0
- Why: Comprehensive survey of augmentation methods
- Difficulty: Beginner-friendly
-
"Data Augmentation: A Comprehensive Survey of Modern Approaches" (2022)
- Authors: Mumuni, Mumuni
- Link: https://arxiv.org/abs/2209.02897
- Why: Modern survey covering recent methods
- Difficulty: Intermediate
-
"A Survey of Data Augmentation Approaches for NLP" (2021)
- Authors: Feng, Gangal, Wei, et al.
- Link: https://arxiv.org/abs/2105.03075
- Why: NLP-focused augmentation survey
- Difficulty: Intermediate
Start with Surveys
Surveys give you a complete overview before you dive into specific methods.
Classic Papers (Must Read)
Foundational Papers
-
"Mixup: Beyond Empirical Risk Minimization" (2017)
- Authors: Zhang, Cisse, Dauphin, Lopez-Paz
- Link: https://arxiv.org/abs/1710.09412
- Code: https://github.com/facebookresearch/mixup-cifar10
- Impact: Started mix-based augmentation trend
- Difficulty: Easy (simple concept, easy to implement)
-
"AutoAugment: Learning Augmentation Strategies from Data" (2019)
- Authors: Cubuk, Zoph, Mane, et al.
- Link: https://arxiv.org/abs/1805.09501
- Code: https://github.com/tensorflow/models/tree/master/research/autoaugment
- Impact: First successful auto-augmentation
- Difficulty: Medium
-
"CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features" (2019)
- Authors: Yun, Han, Oh, et al.
- Link: https://arxiv.org/abs/1905.04899
- Code: https://github.com/clovaai/CutMix-PyTorch
- Impact: Popular mix-based method
- Difficulty: Easy-Medium
Modern Papers (Recent & Important)
Recent Important Papers
-
"RandAugment: Practical automated data augmentation with a reduced search space" (2020)
- Authors: Cubuk, Zoph, Shlens, Le
- Link: https://arxiv.org/abs/1909.13719
- Code: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
- Venue: CVPR 2020
- Difficulty: Easy (simpler than AutoAugment)
-
"AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty" (2020)
- Authors: Hendrycks, Mu, Cubuk, et al.
- Link: https://arxiv.org/abs/1912.02781
- Code: https://github.com/google-research/augmix
- Venue: ICLR 2020
- Difficulty: Medium
-
"TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation" (2021)
- Authors: Müller, Hutter
- Link: https://arxiv.org/abs/2103.10158
- Code: https://github.com/automl/trivialaugment
- Venue: ICCV 2021
- Difficulty: Easy (parameter-free)
-
"Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation" (2021)
- Authors: Ghiasi, Cui, Srinivas, et al.
- Link: https://arxiv.org/abs/2012.07177
- Code: https://github.com/facebookresearch/detection/tree/main/projects/SimpleCopyPaste
- Venue: CVPR 2021
- Difficulty: Medium
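Of the methods listed above, Copy-Paste is the one whose mechanics are least obvious from the title. A minimal sketch of the core idea follows, assuming (H, W, C) image arrays and boolean (H, W) instance masks. The function name and signature are hypothetical, and the random scale jittering and flipping that the paper applies before pasting are left out.

```python
import numpy as np

def copy_paste(target_img, target_masks, source_img, source_mask):
    """Paste the pixels selected by `source_mask` from `source_img` onto `target_img`.

    Returns the composited image and the updated list of instance masks:
    existing target masks lose the pixels that the pasted object now covers,
    and the pasted object's mask is appended as a new instance.
    """
    paste = source_mask.astype(bool)
    # Broadcast the 2-D mask over the channel axis to pick pixels per location.
    out = np.where(paste[..., None], source_img, target_img)
    updated = [mask.astype(bool) & ~paste for mask in target_masks]
    updated.append(paste)
    return out, updated
```

Updating the target masks matters for instance segmentation: without it, the ground truth would still claim pixels that the pasted object has occluded.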
Tutorial Papers (Beginner-Friendly)
Tutorial & Educational Papers
-
"Writing Custom Datasets, DataLoaders and Transforms" - PyTorch Tutorial
- Link: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
- Why: Practical implementation guide
- Difficulty: Beginner
-
"Data Augmentation in PyTorch" - Official Docs
- Link: https://pytorch.org/vision/stable/transforms.html
- Why: Complete transform reference
- Difficulty: Beginner
-
"Albumentations Library"
- Link: https://albumentations.ai/
- Why: Fast augmentation library with examples
- Difficulty: Beginner
Code Implementation Papers
Papers with Excellent Code
-
"Mixup"
- Code: https://github.com/facebookresearch/mixup-cifar10
- Framework: PyTorch
- Quality: Official, simple (~20 lines)
-
"CutMix"
- Code: https://github.com/clovaai/CutMix-PyTorch
- Framework: PyTorch
- Quality: Official, well-documented
-
"RandAugment"
- Code: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
- Framework: TensorFlow
- Quality: Official
-
"AugMix"
- Code: https://github.com/google-research/augmix
- Framework: PyTorch
- Quality: Official
-
"TrivialAugment"
- Code: https://github.com/automl/trivialaugment
- Framework: PyTorch
- Quality: Official, very simple
-
"Albumentations" (Library)
- Code: https://github.com/albumentations-team/albumentations
- Framework: PyTorch/TensorFlow
- Quality: Industry standard, 70+ transforms
Start with Mixup
Mixup is the easiest method to implement (10-20 lines). Start there, then move on to CutMix.
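To give a sense of how small Mixup is, here is a minimal NumPy sketch under stated assumptions: `x` is a batch of inputs, `y_onehot` holds one-hot labels, and the function name and the default `alpha` are illustrative choices rather than values taken from the official repository.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=1.0, rng=None):
    """Blend each sample with a randomly chosen partner from the same batch."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)       # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))     # partner index for every sample
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed, lam
```

Because cross-entropy is linear in the target, training on `y_mixed` is equivalent to weighting the losses against the two original labels by `lam` and `1 - lam`, which is convenient when labels are stored as class indices.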
Where to Track Papers
Paper Discovery Platforms
Paper Discovery
-
Papers With Code - Data Augmentation
- URL: https://paperswithcode.com/task/data-augmentation
- Features: Papers with code, implementations
- Best for: Finding code
-
arXiv - Computer Vision
- URL: https://arxiv.org/list/cs.CV/recent
- Search: "data augmentation" OR "augmentation"
- Best for: Latest papers
-
Google Scholar
- Search: "data augmentation" deep learning
- Best for: Comprehensive search
-
Semantic Scholar
- Search: Data augmentation
- Best for: Related papers
-
Connected Papers
- Start with: Mixup or AutoAugment paper
- Best for: Exploring augmentation area
Conference Proceedings
Top Venues
-
CVPR (June)
- URL: https://openaccess.thecvf.com/CVPR
- Search: Data augmentation
- Best for: Vision augmentation
-
ICCV (October, biennial)
- URL: https://openaccess.thecvf.com/ICCV
- Search: Augmentation
- Best for: Vision methods
-
ICLR (May)
- URL: https://openreview.net/group?id=ICLR.cc
- Search: Data augmentation
- Best for: Learning-based augmentation
How to Get Papers
Free Access Methods
Free Access
-
arXiv - All papers free
- Most augmentation papers on arXiv
- Direct PDF download
-
OpenReview - ICLR papers
- All ICLR papers free
- Includes reviews
-
CVF Open Access - CVPR/ICCV
- All papers free
- Direct PDF links
-
Google Scholar - PDF links
- Check "All versions" for free PDFs
-
Author Websites
- Many authors post PDFs
- Check personal pages
Getting Papers
Most augmentation papers are freely available on arXiv or CVF Open Access.
Learning Resources
Courses & Tutorials
Courses
-
Fast.ai - Practical Deep Learning
- URL: https://course.fast.ai/
- Focus: Practical data augmentation
- Level: Beginner-friendly
-
CS231n - Stanford
- URL: https://cs231n.stanford.edu/
- Focus: Data augmentation in vision
- Level: Intermediate
Libraries & Tools
Libraries
-
Albumentations
- URL: https://albumentations.ai/
- Why: Fast, 70+ transforms, well-documented
- Best for: Production use
-
torchvision.transforms
- URL: https://pytorch.org/vision/stable/transforms.html
- Why: PyTorch official, simple
- Best for: Basic augmentations
-
imgaug
- URL: https://github.com/aleju/imgaug
- Why: Many transforms, flexible
- Best for: Research
Blogs & Articles
Blogs
-
"Understanding Data Augmentation" - Towards Data Science
- Search: Data augmentation deep learning
- Why: Practical explanations
-
Albumentations Blog
- URL: https://albumentations.ai/blog
- Why: Tutorials and examples
Reading Strategy
Week 1: Foundations
- Read survey paper (#1)
- Read Mixup paper (easiest)
- Implement Mixup (10-20 lines)
- Read CutMix paper
- Implement CutMix
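For the CutMix step, the main difference from Mixup is that a rectangular patch is swapped instead of blending whole images. The sketch below assumes batches shaped (N, C, H, W) and one-hot labels; the helper names are hypothetical, and the label weight is recomputed from the patch area that was actually pasted, since the box can be clipped at the image border.

```python
import numpy as np

def rand_bbox(height, width, lam, rng):
    """Sample a box covering roughly a (1 - lam) fraction of the image area."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = int(rng.integers(height)), int(rng.integers(width))
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    return y0, y1, x0, x1

def cutmix_batch(x, y_onehot, alpha=1.0, rng=None):
    """Replace one random box in each image with the same box from a partner image."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    h, w = x.shape[-2:]
    y0, y1, x0, x1 = rand_bbox(h, w, lam, rng)
    x_mixed = x.copy()
    x_mixed[..., y0:y1, x0:x1] = x[perm][..., y0:y1, x0:x1]
    # Weight the labels by the area that really came from each image.
    lam = 1.0 - (y1 - y0) * (x1 - x0) / (h * w)
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed, lam
```

All images in the batch share one box here, which keeps the sketch short; sampling a separate box per image is an equally reasonable design.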
Week 2: Auto-Augmentation
- Read AutoAugment paper
- Read RandAugment paper (simpler)
- Try RandAugment implementation
- Read TrivialAugment (easiest auto method)
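The difference between RandAugment and TrivialAugment is easiest to see side by side. The sketch below is an illustration under assumptions, not either paper's implementation: the four operations are simplified stand-ins for the papers' operation lists, images are float arrays in [0, 1], and the 0 to 30 magnitude scale is assumed to mirror common implementations.

```python
import numpy as np

MAX_MAGNITUDE = 30  # assumed scale: 0 leaves the image unchanged, 30 is strongest

def identity(img, level):
    return img

def brightness(img, level):
    return np.clip(img + 0.5 * level, 0.0, 1.0)

def contrast(img, level):
    mean = img.mean()
    return np.clip((img - mean) * (1.0 + level) + mean, 0.0, 1.0)

def shift_x(img, level):
    # Wrap-around shift of up to 30% of the width (a simplified translation).
    return np.roll(img, int(round(0.3 * level * img.shape[1])), axis=1)

OPS = [identity, brightness, contrast, shift_x]

def rand_augment(img, num_ops=2, magnitude=9, rng=None):
    """RandAugment-style: `num_ops` random ops, all at one fixed magnitude."""
    if rng is None:
        rng = np.random.default_rng()
    level = magnitude / MAX_MAGNITUDE
    for idx in rng.integers(len(OPS), size=num_ops):
        img = OPS[idx](img, level)
    return img

def trivial_augment(img, rng=None):
    """TrivialAugment-style: one random op at a uniformly random magnitude."""
    if rng is None:
        rng = np.random.default_rng()
    op = OPS[rng.integers(len(OPS))]
    level = rng.integers(MAX_MAGNITUDE + 1) / MAX_MAGNITUDE
    return op(img, level)
```

RandAugment leaves two values to tune (`num_ops` and `magnitude`), while the TrivialAugment-style function has none, which is why the latter is described as parameter-free.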
Week 3: Advanced Methods
- Read AugMix paper
- Read Copy-Paste paper
- Implement 2-3 methods
- Compare results
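As a starting point for the AugMix implementation, the mixing step can be sketched as follows. Several short chains of random operations are combined with Dirichlet weights, and the result is blended with the original image using a Beta-distributed weight. The three operations are simplified stand-ins chosen for this illustration, images are assumed to be floats in [0, 1], and the Jensen-Shannon consistency loss that the paper pairs with this augmentation is not shown.

```python
import numpy as np

def posterize(img):
    # Quantize intensities to a handful of levels.
    return np.floor(img * 4.0) / 4.0

def solarize(img):
    # Invert every value at or above the 0.5 threshold.
    return np.where(img < 0.5, img, 1.0 - img)

def shift_x(img):
    # Wrap-around shift by two pixels along the width axis.
    return np.roll(img, 2, axis=1)

OPS = [posterize, solarize, shift_x]

def augmix(img, width=3, max_depth=3, alpha=1.0, rng=None):
    """Mix `width` random augmentation chains, then blend with the original image."""
    if rng is None:
        rng = np.random.default_rng()
    weights = rng.dirichlet([alpha] * width)  # one weight per chain, summing to 1
    m = rng.beta(alpha, alpha)                # share kept from the original image
    mixed = np.zeros(img.shape, dtype=np.float64)
    for w in weights:
        chain = img
        depth = int(rng.integers(1, max_depth + 1))
        for _ in range(depth):
            chain = OPS[int(rng.integers(len(OPS)))](chain)
        mixed += w * chain
    return m * img + (1.0 - m) * mixed
```

Since every step is a convex combination of images in [0, 1], the output stays in that range without extra clipping.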
Week 4: Recent Work
- Follow arXiv for latest
- Read 2-3 recent papers
- Implement one new method
- Write comparison
Reading Plan
Start with Mixup (easiest), then CutMix, then auto-augmentation methods.
Stay Updated
RSS Feeds & Alerts
Alerts
-
arXiv RSS Feed
- URL: https://rss.arxiv.org/rss/cs.CV (feed); https://arxiv.org/list/cs.CV/recent (browsable listing)
- Search: "data augmentation"
- Check: Daily
-
Google Scholar Alerts
- Setup: Alert for "data augmentation"
- Frequency: Weekly
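The daily check can also be scripted against the public arXiv API. The sketch below only builds the query URL and parses titles out of an Atom response, using the standard library; the function names, the default category, and the default phrase are illustrative assumptions.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by Atom feeds

def build_query_url(phrase="data augmentation", category="cs.CV", max_results=10):
    """Build an arXiv API URL for the newest papers matching `phrase` in `category`."""
    params = {
        "search_query": f'cat:{category} AND all:"{phrase}"',
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)

def parse_titles(atom_xml):
    """Return the entry titles from an Atom feed, with whitespace collapsed."""
    root = ET.fromstring(atom_xml)
    return [
        " ".join(entry.findtext(ATOM + "title", default="").split())
        for entry in root.iter(ATOM + "entry")
    ]
```

Fetching is then one call, for example `parse_titles(urllib.request.urlopen(build_query_url()).read())` after `import urllib.request`; it is left out of the functions so the parsing can be tried offline.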
Social Media
Social Tracking
-
Twitter/X
- Follow: @paperswithcode
- Hashtag: #DataAugmentation
-
Reddit
- r/MachineLearning
- Search: Data augmentation
To-Do Checklist
Beginner Level
- Read survey paper on data augmentation
- Read Mixup paper (easiest)
- Implement Mixup (10-20 lines of code)
- Read CutMix paper
- Implement CutMix
- Try Albumentations library
Intermediate Level
- Read AutoAugment paper
- Read RandAugment paper
- Implement RandAugment
- Read TrivialAugment (parameter-free)
- Compare different methods
- Read 5 recent papers
Advanced Level
- Read AugMix paper
- Read Copy-Paste paper
- Implement advanced methods
- Experiment with combinations
- Write augmentation pipeline
- Contribute to open-source
Quick Links
- Papers With Code: https://paperswithcode.com/task/data-augmentation
- Albumentations: https://albumentations.ai/
- PyTorch Transforms: https://pytorch.org/vision/stable/transforms.html
- Mixup Code: https://github.com/facebookresearch/mixup-cifar10
- CutMix Code: https://github.com/clovaai/CutMix-PyTorch
Next Steps: Start with Mixup (easiest to implement), then CutMix, then explore auto-augmentation methods.